Why Unkey Ditched Serverless to Improve API Performance

Introduction: The Shift from Hype to Performance Optimization

In the rapidly evolving landscape of cloud infrastructure, the narrative has long been dominated by the allure of serverless computing. The promise of infinite scalability, pay-per-use pricing models, and zero infrastructure management seduced developers and CTOs alike. However, as the ecosystem matures, high-performance startups are beginning to reassess the trade-offs. A prime example of this pivot is Unkey, an API key management platform that made headlines by migrating away from a purely serverless architecture to improve API performance.

This strategic move highlights a critical reality in software engineering: there is no one-size-fits-all solution. While serverless functions (like AWS Lambda or Vercel Functions) are revolutionary for event-driven tasks and unpredictable traffic, they introduce specific latency hurdles—primarily cold starts and connection overheads—that can cripple high-frequency, low-latency applications. For companies where speed is the product, milliseconds equal revenue.

At XS One Consultants, we specialize in dissecting these architectural decisions. Through our expert technology consultancy, we help businesses look past the hype to implement infrastructure that aligns with their specific performance goals. In this comprehensive analysis, we will explore why Unkey ditched serverless, the technical bottlenecks of serverless performance issues, and how long-running servers can drastically reduce latency for API-first companies.

The Serverless Promise vs. The Latency Reality

Understanding the Serverless Appeal

Before criticizing serverless, it is essential to understand why it became the default choice for modern web development. Serverless architectures allow developers to ship code without provisioning servers. The cloud provider handles the allocation of resources, scaling up to meet demand and scaling down to zero when idle. This reduces operational overhead and, in theory, costs for sporadic workloads.

The “Cold Start” Bottleneck

The Achilles’ heel of serverless computing in a high-performance context is the “cold start.” When a serverless function is triggered after a period of inactivity, the cloud provider must spin up a new container, download the code, start the runtime, and execute the function. This process can add anywhere from 200ms to several seconds to a request.
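To make the mechanics concrete, here is a minimal sketch of a Node.js Lambda handler (the handler shape and the `loadConfig` helper are illustrative assumptions, not Unkey’s code). Everything at module scope executes during the cold start; warm invocations skip it entirely:

```typescript
import type { APIGatewayProxyHandler } from "aws-lambda";

// Module scope runs once per cold start: runtime boot plus this code
// are the "cold start tax" added to the first request.
let isColdStart = true;
const config = loadConfig(); // stand-in for expensive one-time setup

export const handler: APIGatewayProxyHandler = async () => {
  const wasCold = isColdStart;
  isColdStart = false; // warm invocations reuse this frozen container
  return {
    statusCode: 200,
    body: JSON.stringify({ coldStart: wasCold, region: config.region }),
  };
};

function loadConfig(): { region: string } {
  // Hypothetical setup work: env parsing, secret fetches, SDK clients.
  return { region: process.env.AWS_REGION ?? "unknown" };
}
```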

For a standard website, a 500ms delay might be negligible. However, for an API key verification service like Unkey, which sits in the critical path of every request a user makes, adding latency is unacceptable. If the gatekeeper is slow, the entire application feels sluggish. This is where custom software development requires a nuanced approach, choosing architectures that prioritize connection persistence over ease of deployment.

Deconstructing Unkey’s Decision

Unkey’s core product is verifying API keys. This verification needs to happen globally, instantly, and reliably. When they relied on serverless functions, they faced two distinct technical challenges that eventually forced their hand:

1. The Cost of TCP and TLS Handshakes

In a serverless environment, functions are ephemeral. They spin up, do a job, and die (or freeze). This means that for many requests, the function must establish a fresh database connection or open new network connections to other services. Establishing a secure connection involves both a TCP handshake and a TLS handshake.

These handshakes require multiple round-trips between the client and the server before any data is actually exchanged. In a serverless environment, you pay this “handshake tax” frequently. By moving to long-running servers (or persistent containers), Unkey could utilize connection pooling. This allows the application to keep connections open and reuse them, bypassing the handshake overhead for subsequent requests and shaving precious milliseconds off the total response time.
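As an illustration of what pooling buys you, here is a minimal sketch using the node-postgres (`pg`) library; the connection string, table name, and query are hypothetical stand-ins, not Unkey’s actual schema:

```typescript
import { Pool } from "pg"; // node-postgres

// On a long-running server, one pool is created at boot and reused for
// the life of the process. TCP + TLS handshakes are paid once per
// pooled connection, not once per request.
const pool = new Pool({
  connectionString: process.env.DATABASE_URL, // hypothetical env var
  max: 20,                  // upper bound on open connections
  idleTimeoutMillis: 30_000 // recycle connections idle for 30s
});

export async function verifyKey(keyHash: string): Promise<boolean> {
  // pool.query checks out a warm connection, runs, and returns it.
  const { rows } = await pool.query(
    "SELECT 1 FROM api_keys WHERE key_hash = $1 AND revoked = false",
    [keyHash],
  );
  return rows.length > 0;
}
```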

2. The Execution Environment Limitation

Serverless functions often run in constrained environments with limits on execution time and memory. While adequate for microservices, they struggle with high-throughput I/O operations that require sustained network performance. Unkey found that by controlling the runtime environment on a long-running server, they could optimize the network stack and memory allocation specifically for their high-speed verification logic.
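A rough sketch of what that control looks like in practice, assuming a Node.js runtime (the endpoint, port, and timeout values are illustrative choices, not Unkey’s configuration):

```typescript
import http from "node:http";
import https from "node:https";

// Outbound: a shared keep-alive agent reuses sockets to upstream
// services instead of paying a TCP/TLS handshake per call.
const upstreamAgent = new https.Agent({ keepAlive: true, maxSockets: 128 });

function pingUpstream(): void {
  // Hypothetical upstream; the agent keeps this socket warm.
  // Called wherever the hot path needs the upstream service.
  https.get("https://internal.example.com/health",
    { agent: upstreamAgent }, (res) => res.resume());
}

const server = http.createServer((_req, res) => {
  res.writeHead(200, { "content-type": "application/json" });
  res.end(JSON.stringify({ ok: true }));
});

// Inbound: keep sockets open longer than the load balancer's idle
// timeout so the LB never reuses a socket the server just closed.
server.keepAliveTimeout = 65_000; // ms
server.headersTimeout = 66_000;   // must exceed keepAliveTimeout

server.listen(8080);
```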

The Architecture Shift: From Ephemeral to Persistent

The solution for Unkey was not to abandon the cloud, but to abandon the ephemeral nature of serverless for their core hot path. They moved to an edge-compatible, long-running compute model (using platforms like Fly.io or AWS Fargate) where the code stays warm.
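As a rough illustration of an “always warm” configuration, here is a hypothetical fly.toml sketch; the app name, region, and values are assumptions for illustration, not Unkey’s deployment:

```toml
# Hypothetical fly.toml: keep the service always on instead of
# scaling to zero, so no request ever pays a cold start.
app = "keys-api"
primary_region = "iad"

[http_service]
  internal_port = 8080
  auto_stop_machines = false  # never scale to zero
  auto_start_machines = true
  min_machines_running = 2    # at least two warm instances
```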

Benefits of Long-Running Servers for APIs

  • Zero Cold Starts: The server is always running and ready to accept traffic. The latency is purely network travel time plus processing time.
  • Connection Reuse: Database connections remain open. This is critical for AI-powered applications or high-frequency database workloads where the cost of connecting is high.
  • Predictable Performance: Without the variance of container spin-up times, P99 latency (the speed of the slowest 1% of requests) stabilizes significantly.
  • In-Memory Caching: A long-running server can hold local state in memory (RAM). This allows for ultra-fast caching strategies that aren’t possible in stateless serverless functions (see the sketch below).
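Here is a minimal sketch of such an in-memory cache, viable only because the process (and its RAM) outlives any single request; the TTL value and database helper are hypothetical:

```typescript
// Minimal in-process TTL cache: only viable on a long-running server,
// where RAM survives between requests.
type Entry<T> = { value: T; expiresAt: number };

class TtlCache<T> {
  private store = new Map<string, Entry<T>>();
  constructor(private ttlMs: number) {}

  get(key: string): T | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.store.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// Hypothetical usage: serve repeat verifications from RAM,
// falling back to the database on a miss.
const keyCache = new TtlCache<boolean>(60_000); // 60s TTL

export async function isValidKey(keyHash: string): Promise<boolean> {
  const cached = keyCache.get(keyHash);
  if (cached !== undefined) return cached; // sub-millisecond hit
  const valid = await lookupInDatabase(keyHash);
  keyCache.set(keyHash, valid);
  return valid;
}

async function lookupInDatabase(keyHash: string): Promise<boolean> {
  // Stand-in for a real query (see the pooling sketch above).
  return keyHash.length > 0;
}
```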

The User Experience and SEO Correlation

Why does this technical nuance matter to a business owner? Because latency kills conversion. Google’s Core Web Vitals, specifically Interaction to Next Paint (INP) and Largest Contentful Paint (LCP), heavily penalize slow, unresponsive pages. If your API backend is slow, your frontend is slow.

When Unkey improved their API performance, they didn’t just make developers happy; they enabled their customers to build faster, more responsive user interfaces. In the world of UI/UX design, perceived performance is reality. A delay in data fetching can lead to layout shifts, loading spinners, and user frustration. Furthermore, faster API responses contribute directly to better search rankings, creating a symbiotic relationship between backend architecture and SEO services.

When Should You Stick with Serverless?

Despite Unkey’s exit, serverless is not dead. It remains an incredible tool for specific use cases. At XS One Consultants, we advise clients to retain serverless architectures for:

  • Asynchronous Tasks: Image processing, sending emails, or generating PDFs where immediate response time is not critical.
  • Spiky Traffic: Marketing campaigns or events where traffic might jump from zero to ten thousand requests in seconds.
  • Prototyping: Building an MVP quickly without worrying about infrastructure management.

However, for the “hot path”—the core functionality that a user waits for—a hybrid approach or a dedicated server is often the superior choice.
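In practice, a hybrid split can be as simple as answering the hot path inline and handing everything else to a queue. Here is a hedged sketch using the AWS SDK v3 SQS client; the queue URL and helper functions are hypothetical stand-ins, not a documented Unkey pattern:

```typescript
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

const sqs = new SQSClient({}); // region picked up from the environment
const ANALYTICS_QUEUE_URL = process.env.ANALYTICS_QUEUE_URL!; // hypothetical

// Hot path: answer the caller from the always-warm server.
// Cold path: hand analytics off to a queue consumed by a serverless
// function, where latency doesn't matter.
export async function handleVerification(keyHash: string): Promise<boolean> {
  const valid = await isValidKey(keyHash);

  // Fire-and-forget: don't make the caller wait on analytics.
  void sqs.send(new SendMessageCommand({
    QueueUrl: ANALYTICS_QUEUE_URL,
    MessageBody: JSON.stringify({ keyHash, valid, at: Date.now() }),
  })).catch((err) => console.error("analytics enqueue failed", err));

  return valid;
}

async function isValidKey(keyHash: string): Promise<boolean> {
  return keyHash.length > 0; // stand-in; see the earlier sketches
}
```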

Architecting for the Edge

Unkey’s move also emphasizes the importance of the “Edge.” Modern infrastructure allows long-running servers to be deployed in multiple regions simultaneously, close to the user. This is distinct from the old model of one central server.

By deploying persistent servers in regions like US-East, Europe-West, and Asia-Pacific, companies can combine the low latency of geographical proximity with the performance benefits of persistent computing. This global distribution is often a key component in enterprise-grade mobile app development, ensuring that users in Tokyo get the same snappy experience as users in New York.
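As a small illustration, Fly.io exposes the serving region to each machine through the FLY_REGION environment variable, so the same code can run everywhere and still report where it answered from (the port and header name below are arbitrary choices for the sketch):

```typescript
import http from "node:http";

// On Fly.io, each machine runs the same code in its own region;
// the platform exposes the region via the FLY_REGION env var.
const region = process.env.FLY_REGION ?? "local";

http.createServer((_req, res) => {
  res.writeHead(200, {
    "content-type": "application/json",
    "x-served-from": region, // lets clients verify geographic routing
  });
  res.end(JSON.stringify({ ok: true, region }));
}).listen(8080);
```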

Comparison: Serverless vs. Long-Running Containers

Feature            | Serverless (Lambda/Vercel)   | Long-Running (Fly.io/EC2)
-------------------|------------------------------|----------------------------------
Cold Starts        | Frequent (200ms – 1s+)       | None (always on)
Connection Pooling | Difficult / requires a proxy | Native / efficient
Cost Model         | Pay per request              | Pay per compute hour
Maintenance        | Low (no OS management)       | Medium (container management)
Ideal Use Case     | Event-driven, sporadic tasks | High-throughput APIs, WebSockets

Frequently Asked Questions

1. What exactly is a “cold start” in serverless computing?

A cold start occurs when a serverless platform initiates a new instance of a function to handle a request. This involves allocating resources and loading the code, which causes a delay. Once the function is “warm,” subsequent requests are faster, but if traffic is sporadic, cold starts can occur frequently.

2. Did Unkey completely abandon serverless?

Unkey moved their core API verification engine to long-running servers to solve the latency problem, but like most modern companies, they can still take a hybrid approach: non-critical, asynchronous background tasks (like analytics logging or email notifications) are likely to remain on serverless functions due to their cost-efficiency.

3. Is serverless always cheaper than long-running servers?

Not always. Serverless is cheaper for low traffic or sporadic usage. However, at high scale, paying for every millisecond of execution time can become significantly more expensive than paying a flat rate for a dedicated server or container that runs 24/7.
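A back-of-envelope calculation makes the crossover visible. The prices below are illustrative assumptions (loosely modeled on published per-request and per-GB-second serverless rates), not quotes; always check your provider’s current pricing:

```typescript
// Back-of-envelope comparison. All three prices are assumptions
// for illustration, not quotes.
const PRICE_PER_MILLION_REQUESTS = 0.20;  // USD, assumed
const PRICE_PER_GB_SECOND = 0.0000166667; // USD, assumed
const CONTAINER_MONTHLY_FLAT = 120;       // USD, assumed: a few small VMs

function serverlessMonthlyCost(
  requestsPerMonth: number,
  avgDurationMs: number,
  memoryGb: number,
): number {
  const requestCost = (requestsPerMonth / 1e6) * PRICE_PER_MILLION_REQUESTS;
  const gbSeconds = requestsPerMonth * (avgDurationMs / 1000) * memoryGb;
  return requestCost + gbSeconds * PRICE_PER_GB_SECOND;
}

// 500M requests/month at 100ms average on 512MB:
// ≈ $100 (requests) + ≈ $417 (compute) ≈ $517/month,
// versus a flat ~$120/month for always-on containers.
console.log(serverlessMonthlyCost(500e6, 100, 0.5).toFixed(0)); // "517"
```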

4. How does latency affect SEO?

Google uses Core Web Vitals as ranking signals. Slow API responses delay the rendering of content (LCP) and the responsiveness of the page (INP). High latency can lead to poor user experience metrics, which Google interprets as a lower-quality site, potentially harming your search rankings.

5. What is the alternative to AWS Lambda for high performance?

Alternatives include Container-as-a-Service (CaaS) platforms like AWS Fargate, Google Cloud Run (configured with minimum instances), or edge-compute platforms like Fly.io that allow for persistent, long-running containers distributed globally.
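For example, Cloud Run can be told to keep warm instances around via its minimum-instances setting; a hypothetical deployment might look like this (service and image names are placeholders):

```sh
# Hypothetical service; --min-instances keeps N instances warm
# so requests never hit a cold container.
gcloud run deploy keys-api \
  --image gcr.io/my-project/keys-api \
  --min-instances=2 \
  --region=us-east1
```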

Conclusion

The story of Unkey ditching serverless is a powerful reminder that in technology, architectural decisions must be driven by data and performance requirements, not just industry trends. While serverless revolutionized deployment workflows, its inherent limitations regarding cold starts and connection overheads make it unsuitable for ultra-low-latency requirements.

For businesses looking to scale, the transition to long-running servers offers a pathway to stability, speed, and better user experiences. Understanding the intricacies of TCP handshakes, memory persistence, and global distribution is what separates a good application from a world-class one. Whether you are building the next global API or optimizing a legacy system, choosing the right infrastructure is paramount.

At XS One Consultants, we help organizations navigate these complex decisions. If you are facing performance bottlenecks or need to audit your cloud architecture, contact us today to build a faster, more resilient future for your software.