Heroku's Ugly Secret
The story of how the cloud-king turned its back on Rails
and swindled its customers
A Rails dyno isn't what it used to be. In mid-2010, Heroku quietly redesigned its routing system, and the change — nowhere documented, nowhere instrumented — radically degraded throughput on the platform. Dollar for dollar a dyno became worth a fraction of its former self.
UPDATE 2/14: We've addressed a few of the popular suggestions and reactions, click here to find out more
Rap Genius is blowing up. Traffic is growing, the rate of growth is growing, superstar artists are signing up for their own accounts, smart people have bet on us to the tune of $15M and day by day we are accumulating more of the prize itself, irrepressible meme value:
Of course for the tech side of the house, this means we’re finally running into those problems that everyone says they want to run into. We are finally hitting that point where our optimizations aren’t premature. With nearly 15 million monthly uniques we are, as they say, “at scale.”
As exciting as that is it’s also frightening, and in this moment more than others it’s great to have a strong technical partner — like Heroku, the hosting service that makes ops as easy as playing with sliders:
If you had asked us a couple of weeks ago, we would have told you that we were happy to be one of Heroku’s largest customers, happy even to be paying their eye-popping monthly bill (~$20,000). “As devs,” we would have said, “we don’t want to manage infrastructure, we want to build features. If Heroku lets us do that, they’ve earned their keep.”
But then they told us the truth.
“Requests are waiting in a queue at the dyno level,” a Heroku engineer told us, “then being served quickly (thus the Rails logs appear fast), but the overall time is slower because of the wait in the queue.”
Waiting in a queue at the dyno level? What?
How you probably think Heroku works
To understand how weird this response was, you first must understand how everyone thinks Heroku works.
When you deploy an app to Heroku, you actually deploy it to a bunch of different “dynos” (virtualized Ubuntu servers) that live on AWS. For a Rails app, each dyno is capable of serving one request at a time. They each cost $36 per month, or $79.20 per month if you buy the New Relic add-on.
When someone requests a page from your site, that request first goes through Heroku’s router (they call it the “routing mesh”), which decides which dyno should work on it. The ostensible purpose of the router is to balance load intelligently between dynos, so that a single dyno doesn’t end up working non-stop while the others do nothing. If at any given moment all the dynos are busy, the router should queue the request and give it to the first one that becomes available.
And indeed this is what Heroku claims on their "How it Works" page:
Their documentation tells a similar story. Here's a page from 2009:
Intelligent routing: The routing mesh tracks the availability of each dyno and balances load accordingly. Requests are routed to a dyno only once it becomes available. If a dyno is tied up due to a long-running request, the request is routed to another dyno instead of piling up on the unavailable dyno’s backlog.
The 2013 version of that doc is a bit more cryptic...
Intelligent routing: The routing mesh tracks the location of all dynos running web processes (web dynos) and routes HTTP traffic to them accordingly.
But elsewhere in their current docs, they make the same old statement loud and clear:
The heroku.com stack only supports single threaded requests. Even if your application were to fork and support handling multiple requests at once, the routing mesh will never serve more than a single request to a dyno at a time.
The Heroku log format doesn't even include an entry for time spent in the in-dyno queue, because the assumption is that such a queue does not exist. The entries that are included are for the router queue:
Same for New Relic: When it reports “Request Queuing,” it’s talking about time spent at the router. For Rap Genius, on a bad day, that amounts to a tiny imperceptible tax of about 10ms per request.
Which brings us back to...
"Queuing at the Dyno Level"
This is why the Heroku engineer's comment about requests “waiting in a queue at the dyno level” struck us as so bizarre — we were under the impression that this could never happen. The whole point of "intelligent load distribution as you scale" is that you shouldn't send requests to dynos unless they're free! And even if all the dynos are full, it's better for the router to hold on to requests until one frees up (rather than risk stacking them behind slow requests).
If you're lucky enough to find the correct doc — a doc that contradicts all the others, and the logs, and the marketing material — you'll find that Heroku replaced its "intelligent load distribution," once a cornerstone of its platform, with "random load distribution":
The routing mesh uses a random selection algorithm for HTTP request load balancing across web processes.
That’s important enough to repeat:
In mid-2010, Heroku redesigned its routing mesh so that new requests would be routed, not to the first available dyno, but randomly, regardless of whether a request was in progress at the destination.
That decision was not announced. The bulk of Heroku's documentation explicitly says, or implicitly assumes, the opposite. “Time spent in the dyno queue” is nowhere reported in their logs, and nowhere exposed by their (very expensive) analytics partner New Relic. And, crucially, this change didn't affect their prices — Heroku has charged $36 per month per dyno since launch.
Why does this matter? Because routing requests randomly is dumb!
It would be like if those machines at the Whole Foods checkout line didn’t send you to the first available register, but to a random register where other customers were already standing in line. How much longer would it take to get out of the store? How much more time would the checkout clerks spend idling? If you owned that store and one day the manager, without telling you, replaced your fancy checkout routing system with a pair of dice, and his nightly reports to you never changed — he never told you how long people were waiting at individual registers, that they even could (wasn’t preventing that the whole point of having a routing system?) — that would be bad, right?
In the old regime, which Heroku called “intelligent routing,” a dyno was a dyno was a dyno. When you bought one, you bought a predictable increase in concurrency (the capacity for simultaneous requests). In fact Heroku defines concurrency as "exactly equal to the number of dynos you have running for your app."
But that's no longer true, because the routing system is no longer intelligent. When you route requests randomly — we’ll call this the “naive” approach — concurrency can be significantly less than the number of dynos. That’s because unused dynos only have some probability of seeing a request, and that probability decreases as the number of dynos grows. It’s no longer possible to reliably “soak up” excess load with fresh dynos, because you have no guarantee that requests will find them.
Clearly, under Heroku's random routing approach you need more dynos to achieve the same throughput you had when they routed requests intelligently. But how many more dynos do you need? If your app needed 10 dynos under the old regime, how many does it need under the new regime? 20? If so, Heroku is overcharging you by a factor of 2, which you might playfully refer to as the Heroku Swindle Factor™.
Intuitively, how much worse do you think random routing is? What's the true value of the HSF™? 2? 5? TEN?!
Actually, for big apps, it's about FIFTY. That's right — if your app needs 80 dynos with an intelligent router, it needs 4,000 with a random router. So if you're on Rails (or any other single-threaded, one-request-at-a-time framework), Heroku is overcharging you by a factor of 50.
This we discovered by simulating (in R — here's our annotated source) both routing regimes on a model of the Rap Genius application with these properties:
- 9,000 requests per minute (arriving as in a Poisson process)
- Mean request time: 306ms
- Median request time: 46ms
- Request times following this distribution (from a sample of 212k actual Rap Genius requests; use a Weibull distribution to approximate this at home):
1% 5% 10% 25% 50% 75% 90% 99% 99.9%
7ms 8ms 13ms 23ms 46ms 255ms 923ms 3144ms 7962ms
Below you can see a minute's worth of the simulation. The first animation shows what happens in a world with naive routing. Notice that as time goes on, requests pile up on individual dynos, each dyno represented by a bar that's as high as its current queue of requests.
Now let's turn on intelligent routing, holding the other parameters in the simulation constant. Watch what happens. The bars never grow, because dynos never see more than one request at a time. Requests respond as quickly as Rails can process them:
Here are our final aggregated results:
If Heroku were using intelligent routing, an app with 75 dynos that receives 9,000 requests per minute will never have to queue a request. But with a naive (random) router, that same app — with the same number of dynos, the same rate of incoming requests, the same distribution of response times — will now see a 62% queue rate, with a mean queue time of 2.763 seconds. On average each request will spend almost 6x longer in queue than in the app.
And since each additional dyno adds less and less to your app's concurrency (since it's less and less likely to get used), you have to add a lot of dynos to get the queue rate down. In fact to cut your percentage of queued requests by half, you have to double your allotment of dynos. And even as you do that, the average amount of time that queued requests spend in the queue (column 4) stubbornly holds above 1s.
To bring queuing down to an acceptable level (<10ms), you’d need to crank your app to 4,000 dynos, or fifty times your allotment in the intelligent case.
But of course you can’t actually crank your app to 4,000 dynos. For one thing it’d cost over $300k per month. For another, Postgres can’t handle that many simultaneous connections.
So the only solution is for Heroku to return to routing requests intelligently. They claim that this is hard for them to scale, and that it complicates things for more “modern” concurrent apps like those built with Node.js and Tornado. But Rails is and always has been Heroku’s bread and butter, and Rails isn’t multi-threaded.
In fact a routing layer designed for non-blocking, evented, realtime app servers like Node and its ilk — a routing layer that assumes every dyno in a pool is as capable of serving a request as any other — is about as bad as it gets for Rails, where almost the opposite is true: the available dynos are perfectly snappy and the others, until they become available, are useless. The unfortunate conclusion being that Heroku is not appropriate for any Rails app that’s more than a toy.
We tried convincing Heroku to return to intelligent routing, but they don’t think what they’re doing now is a problem. Hit up email@example.com if you disagree.
(If you enjoyed this article, remember: Rap Genius is hiring!)
Edit news description to add:
- Historical context: how the event or text affects the world and history
- An explanation of the work's overall story (example: "Here, President Obama confirms the legality of drone strikes...")
- The work's impact on current issues