The Technical Blog

Orchestration without Kubernetes

Tue 25-Feb-2020 10:23 AM +00:00

A couple of huge trends have swept through coding over the past several years: microservices and containers.

For many developers and shops now, "microservice" has come to mean "container", and once you have containers, well... you have to orchestrate them, right? And that means Kubernetes.

Kubernetes is great. It's come a long way. The hosted versions of it, like Azure Kubernetes Service, completely manage the VMs for you so you can focus on deploying your pods.

And yet... Kubernetes has paper cuts. It works, but once you get past the Kubernetes 101 demos, and you get into production, you start to see the levels of care and feeding that you're responsible for. It takes some skill and experience to run it well.

Questioning the assumptions

So... Kubernetes good. Kubernetes kinda hard sometimes.

Is it necessary? Well... not really. Not for most solutions, I'd say. And I want to talk about why, if for no other reason than to have an interesting discussion about solutions architecture. I'm not saying "don't use Kubernetes" at all... I'm just thinking through the choices.

And does "microservice" have to = "container"? I like what a friend of mine said about containers: "I love them for dev environments for my team, because it eliminates setup time, and I can trust the versions, but I don't deploy them to production. They're unnecessary."

So, yes, "microservice" = "container" for dev, but maybe not for prod. I like that perspective. (And in case you're still not using containers even in dev... don't worry about it. You don't have to.)

In other words, remember when we used to just run servers, and we made sure they had the right runtime versions installed? And it worked? It still does, and you don't need an orchestrator for it.

Orchestration using Platform-as-a-Service

I spent years of my career as a network administrator, and I've installed and managed more servers in my life than I can remember. I still do it when I have to, but if I don't have to, I use Platform-as-a-Service. And almost all of the time, I don't have to.

I'm going to use Azure App Service and Azure Functions as my example deployment vehicles for this exercise. You might be familiar with other PaaS application services, like Heroku, so feel free to swap the product names in your head.

Azure App Service lets me deploy just-my-code to a managed, fully secure, highly tuned VM with predictable versions of runtimes installed by Microsoft.

And if I want to run a container on App Service, great. And if I want to bring-my-own-runtime, great. But I don't need to.

What do I want from containers?

What I really want from a container is: a process, running on some computer, with predictable versions of runtimes (so I don't waste time chasing runtime version issues).

With App Service, I get exactly that: a process, running on some computer, with predictable versions of runtimes (installed by Microsoft instead of docker pull).

I don't need containers in production to get the benefits I want from containers. There are other ways to get those benefits that keep things much simpler. No YAML, no multi-gigabyte network transfers to get things into test and production.

What do I want out of orchestration?

So that's what I want out of containers... what about orchestration? What do we get out of it? Can we get that some other way?

Kubernetes gives fine-grained control over how many replicas of each pod are running on which nodes. That's awesome. If you need that level of control, it's there.

Do we always need that level of control over exactly how services get deployed? I don't think that we do. Sometimes we do, for sure, but not always.

What I think we really want from orchestration, most of the time, is:

  • Can I make sure I have more than one instance of a service running, for redundancy?
  • Does each service have enough resources (CPU, memory, network bandwidth, etc.) to run well?
  • Can I auto-scale capacity up when I need to deliver more resources? Can I auto-scale down when things are slower?
  • Can I roll out updates in a safe way that doesn't cause site outages?

Something like that. You might have more requirements, but that's mostly what we get from Kubernetes.

And, for this exercise, I'll suggest that Azure App Service (with Azure Monitor for fine-grained control over auto-scaling) gives you those features without having to write a single line of YAML.

Group services by their scaling requirements

Imagine that we have 100 services that we need to deploy onto a set of App Service machines. How should we organize them for the best experience for our users, and our lowest Azure bill at the end of the month?

I'll start by suggesting that you think about your services in terms of which resource they're likely to run out of first: CPU, memory, or bandwidth; or which external resources indicate pressure, like queue lengths or task pool task length.

I think these are a good first guess for most organizations on how to deploy services together, but - I want to be clear - they're not intended to be an authoritative list. At all. Once you start thinking about your services you might find other perspectives to take on your services that help you scale them up and down properly. But let's proceed with that list.

Services with similar scaling requirements get deployed together.

So, back to the 100 services, let's imagine that most of them break down like this:

  • 50 are CPU-bound with low calls/sec
  • 20 are CPU-bound with high calls/sec
  • 5 are memory-bound
  • 5 are bandwidth-bound
  • 10 are message-queue-length-bound
  • 5 are task pool bound
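
To make the grouping idea concrete, here is a minimal Python sketch. The service names and bottleneck labels are made up for illustration, and nothing here talks to Azure; it only buckets services by the resource they run out of first, with one bucket standing in for one App Service Plan.

```python
from collections import defaultdict

# Hypothetical services, each tagged with the resource it runs out of first.
SERVICES = [
    ("orders-api", "cpu-low-traffic"),
    ("invoice-api", "cpu-low-traffic"),
    ("search-api", "cpu-high-traffic"),
    ("report-builder", "memory"),
    ("media-proxy", "bandwidth"),
    ("email-worker", "queue-length"),
]


def group_by_bottleneck(services):
    """Return {bottleneck: [service names]}; each group maps to one plan."""
    plans = defaultdict(list)
    for name, bottleneck in services:
        plans[bottleneck].append(name)
    return dict(plans)


plans = group_by_bottleneck(SERVICES)
for bottleneck, names in plans.items():
    print(f"plan-{bottleneck}: {', '.join(names)}")
```

The point of keeping it this simple is that the grouping is just a label per service. If evidence later says a service belongs elsewhere, you change one label and redeploy.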

For the services that are CPU-bound, what we're saying is that if I have sufficient CPU available for this service, it should run fine. If two or three services that are all CPU-bound run together on the same servers, and they need to auto-scale up, it doesn't matter which service caused it... they'll all run on additional CPUs and all have enough to run well.

I definitely don't need to decide up-front how much memory each service needs; the Windows and Linux memory managers are far better at allocating memory across dozens of processes than I will ever be.

And if 50 (or more) services all were CPU-bound, stateless, and relatively low calls/sec, we could run all of them together in one Azure App Service Plan with a large enough VM size chosen, and all we have to do is set auto-scaling to be based on CPU %. Or split them up into two. It's not like a server can't run 50 processes without a problem.

Whatever. It works. They're just processes. I never have to think about it.

For the services that are memory-bound, we can deploy them all together on sufficiently large VMs that they run well, and we can use Azure Monitor to set up auto-scale parameters around not just CPU, but also things like available memory and process memory sizes.

For the services that are bandwidth-heavy, same thing: we use Azure Monitor to keep an eye on network traffic, and scale accordingly. The services requiring the bandwidth don't have to care which service caused a scaling event, as long as they all have sufficient available bandwidth.
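
Here is a rough sketch of the kind of rule each plan ends up with. This is not the Azure Monitor API; the `ScaleRule` class, the metric names, and every threshold are assumptions chosen for illustration. It shows why services sharing a bottleneck can share one rule: the decision only looks at a plan-wide metric, not at which service drove it.

```python
from dataclasses import dataclass


@dataclass
class ScaleRule:
    """One autoscale rule for a plan. All numbers here are illustrative."""
    metric: str              # e.g. "cpu_percent" or "memory_percent"
    scale_out_above: float
    scale_in_below: float
    min_instances: int = 2   # keep at least two instances for redundancy
    max_instances: int = 10


def desired_instances(rule, current, metric_value):
    """Return the instance count the rule asks for, given a plan-wide metric."""
    if metric_value > rule.scale_out_above:
        target = current + 1
    elif metric_value < rule.scale_in_below:
        target = current - 1
    else:
        target = current
    # Clamp to the configured floor and ceiling.
    return max(rule.min_instances, min(rule.max_instances, target))


cpu_rule = ScaleRule(metric="cpu_percent", scale_out_above=70, scale_in_below=30)
print(desired_instances(cpu_rule, current=3, metric_value=85))  # 4
print(desired_instances(cpu_rule, current=2, metric_value=10))  # 2
```

A memory-bound or bandwidth-bound plan would use the same shape with a different metric and thresholds.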

As an aside, in many cases, being queue-length-bound or task-pool-bound shows up as just using a lot of CPU, and if you throw more CPU at either problem, it should go away. But it's entirely possible to be bound by some external service or resource that shows up in your system as a queue-length issue.

Fine tuning

Start simple. When you identify a service or two that really should be run separately from the others, move it to another App Service Plan. No big deal. Let evidence be your guide.

What about Azure Functions?

I'll bet that some of those services don't get all that many calls every day, and I'll bet that some of them work asynchronously - responding to queue messages, or handling tasks not directly related to UX - and for those services, Azure Functions is great. Let Azure handle everything, and only pay for the actual CPU time and memory your code is using.

And if you want to use Azure Functions for your services, but need to make sure that you never hit a cold start, you can deploy an Azure Functions project on an App Service Plan that you own exclusively. You can deploy it side-by-side with Web Apps on the same Plan. Totally up to you. "Serverless" model, but on servers you have to yourself.

Wrapping up

Again, I'm not saying that Kubernetes isn't good at what it does, or that it's not an appropriate choice for some solutions. I'm just saying that it does come with quite a bit of care and feeding required, and for many solutions - many more than we seem to have settled on as an industry - you don't need it. You can deploy dozens of services, handling thousands of requests/second, that will run very well without an orchestrator, or a single IaaS VM, or YAML file, in sight.

Really, what I'm encouraging you to do is to think for yourself. Don't just default to what's in fashion right now. Every piece of technology you choose to operate yourself incurs tech debt, takes up engineering time, causes outages, requires updates, etc. After a lot of years in the business, I want as little of that as possible.

Azure App Service and Azure Functions are amazing Platform-as-a-Service pieces to build on that give you utility and performance with as little management overhead as possible. I use them whenever I can, but, either way... think about PaaS instead of IaaS, think about what you really need, not just what the cool kids are doing, and keep things simple.



Azure Queue Storage vs. Azure Service Bus

Mon 17-Feb-2020 09:44 PM +00:00

TL;DR: Always choose Service Bus.

I'll be honest, this isn't going to be close. I've used both, and while they both sort-of do the same thing, Azure Service Bus has so many more features for each message, and is now cheaper at higher volumes, that there's no reason to use Azure Queue Storage for a production application unless you have low message volumes and want to save less than $10/month.

Pricing

Azure Queue Storage

Here's the thing: Azure Queue Storage used to be cheaper, but if you're running General Purpose v2 Storage Accounts, it's not anymore. There are two costs for Queues: storage, and transactions. Since queue messages won't often accumulate much, they won't take up much storage, so that'll be like $0.05/month. For General Purpose v2 storage accounts, Queue messages cost the same as they do for Table and Blob Storage: 10,000 transactions for $0.0040. (The per-transaction cost is now higher with v2.)

In other words, if you send 10,000 queue messages/day for a 30-day month, that's 300,000 messages, or 600,000 transactions, and it'll cost you about $0.24. (A message incurs two transactions: one send, one receive.)

Azure Service Bus

Azure Service Bus, on the other hand, starts its pricing at $10.00/month for up to 13,000,000 operations, with the next 74M operations billed at $0.20/million, and other rates at higher numbers. (Like Queue Storage, sends and receives both count as operations.)

Pricing comparison (each message counts for two operations)

For low numbers of messages, Azure Queue Storage is cheaper because of the minimum $10.00 cost.

Messages / Operations per month               Azure Service Bus   Azure Queue Storage
1,000,000 / 2,000,000 (~0.38 msg/sec)         $10.00              $0.80
12,000,000 / 24,000,000 (~5 msg/sec)          $12.20              $9.60
18,500,000 / 37,000,000 (~7 msg/sec)          $14.80              $14.80  ⬅ break-even
50,000,000 / 100,000,000 (~20 msg/sec)        $35.20              $40.00
250,000,000 / 500,000,000 (~100 msg/sec)      $115.20             $200.00
500,000,000 / 1,000,000,000 (~200 msg/sec)    $215.20             $400.00

The break-even point is about 18.5M messages/month.
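
If you want to check the arithmetic, here is a small Python sketch using only the rates quoted above: $0.0040 per 10,000 Queue Storage transactions, and $10.00 for the first 13M Service Bus operations plus $0.20/million for the next 74M. It deliberately stops at 87M operations, since the rates beyond that aren't listed here. The function names are mine.

```python
from decimal import Decimal

QUEUE_STORAGE_PER_10K = Decimal("0.0040")   # per 10,000 transactions
SERVICE_BUS_BASE = Decimal("10.00")         # covers the first 13M operations
SERVICE_BUS_INCLUDED = 13_000_000
SERVICE_BUS_NEXT_TIER = 74_000_000          # next 74M operations
SERVICE_BUS_PER_MILLION = Decimal("0.20")   # rate within that next tier


def queue_storage_cost(operations):
    """Monthly transaction cost for Queue Storage (storage itself ignored)."""
    return Decimal(operations) / 10_000 * QUEUE_STORAGE_PER_10K


def service_bus_cost(operations):
    """Monthly Service Bus cost, modeled only up to 13M + 74M operations."""
    if operations > SERVICE_BUS_INCLUDED + SERVICE_BUS_NEXT_TIER:
        raise ValueError("higher tiers are not modeled in this sketch")
    billable = max(0, operations - SERVICE_BUS_INCLUDED)
    return SERVICE_BUS_BASE + Decimal(billable) / 1_000_000 * SERVICE_BUS_PER_MILLION


# Break-even, in millions of operations: 10 + 0.20 * (x - 13) = 0.40 * x
queue_per_million = QUEUE_STORAGE_PER_10K * 100
break_even_millions = (SERVICE_BUS_BASE - 13 * SERVICE_BUS_PER_MILLION) / (
    queue_per_million - SERVICE_BUS_PER_MILLION
)

for ops in (2_000_000, 24_000_000, 37_000_000):
    print(f"{ops:>11,} ops  SB ${service_bus_cost(ops):.2f}  QS ${queue_storage_cost(ops):.2f}")
print(f"break-even at {break_even_millions}M operations")
```

Solving that equation gives 37M operations, which at two operations per message is the 18.5M messages/month shown in the table.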

So Queue Storage is cheaper at lower message rates, but if you're serious about messaging, whether in terms of volume or features, you want Service Bus.

Features

Speaking of features...

Azure Queue Storage

Uh… um… well, you can definitely send messages up to 64KB using it. Yeah. And you can get up to 2,000 messages/second (i.e. 2,000 transactions/partition in Azure Storage) from each queue.

Azure Service Bus

On the other hand, Azure Service Bus actually has features.

One-to-Many Routing: Topics and Subscriptions

Azure Service Bus has the expected one-to-one routing of Queues. It also has one-to-many routing using Topics and Subscriptions.

Service Bus Topics receive messages just like queues, but then they distribute them to as many Subscriptions as you want. Each Topic can have up to 2,000 Subscriptions, so, you know, that's a lot of fanning out.

One excellent use of this is for sharing overall system status between servers. You could have one well-known Topic that receives all of the updates from all of the servers, and then have each server register itself as the receiver of a Subscription on that Topic. Now every server gets to see all of the status updates from all of the other servers.
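
The fan-out behavior is easy to picture with an in-memory stand-in. This Python sketch is not the Service Bus SDK; the `Topic` class and the server names are hypothetical, and it only models the one-to-many copy that a Topic performs.

```python
from collections import deque


class Topic:
    """In-memory stand-in for a topic: every subscription gets its own copy."""

    def __init__(self):
        self.subscriptions = {}

    def subscribe(self, name):
        """Create a subscription and return its inbox."""
        self.subscriptions[name] = deque()
        return self.subscriptions[name]

    def send(self, message):
        """Deliver one message to every subscription."""
        for inbox in self.subscriptions.values():
            inbox.append(message)


status = Topic()
inboxes = {name: status.subscribe(name) for name in ("web-1", "web-2", "worker-1")}

status.send({"from": "web-1", "state": "healthy"})
status.send({"from": "worker-1", "state": "draining"})

for name, inbox in inboxes.items():
    print(name, [m["state"] for m in inbox])
```

Every server sees both updates, including the one it sent itself.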

CorrelationId

The Message, which is the .NET class we use to send and receive messages with Service Bus, has a lot of great features. The first one I want to highlight is the CorrelationId. The CorrelationId is a field you can set with an arbitrary string you use to track an operation all the way through your system. For instance, let's say a client sends an update message to a Service Bus Topic. You might want to have the client generate a GUID that gets put on the message. Your server logic would take that GUID and pass it along with each step in a process, including all logging that takes place on the way. Having it right on the Message instance is very handy, and I've used it to great effect.
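
As a sketch of that pattern, here is the same idea in Python with a plain dict standing in for the message. The step names (`validate`, `store`) and the dict keys are made up; the only point is that the ID is generated once by the client and then copied into every log line and every follow-up message.

```python
import logging
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("orders")


def new_message(body):
    """Client side: stamp the message with a fresh GUID."""
    return {"correlation_id": str(uuid.uuid4()), "body": body}


def validate(message):
    """First server step: log with the ID and pass the message along."""
    log.info("[%s] validating", message["correlation_id"])
    return message


def store(message):
    """Second server step: the follow-up message carries the same ID."""
    log.info("[%s] storing", message["correlation_id"])
    return {"correlation_id": message["correlation_id"], "body": "stored"}


request = new_message({"order": 42})
result = store(validate(request))
```

Searching your logs for that one GUID then returns every step of the operation, in order.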

Properties

Another great feature of Message is the UserProperties property (yeah, I know). This is a simple IDictionary<string, object> that you can use to put any kind of metadata on the message. I've used these properties to hold context about the message; while I passed the serialized payload in the body of the message (so I can deserialize to the same type), I was also able to include information about how to handle the message in UserProperties without having to change the structure of the data itself.

Labels

The Label property lets you assign labels to your messages. This could also be done in Properties, but I like that Label is a first-class property. I think it shows thoughtful design in Service Bus.

Sessions

When you have to guarantee delivery order of a series of messages, Service Bus has Sessions to help. When you create a Queue or Subscription that's session-enabled, every message that has the same SessionId is guaranteed to be delivered in-order.
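
Here is a small Python model of what that guarantee means. It is only an illustration of the ordering, not of how Service Bus implements sessions, and the session IDs and payloads are invented. Messages from different sessions arrive interleaved, but within one session they stay in the order they were sent.

```python
def split_by_session(messages):
    """Group (session_id, payload) pairs by session, keeping arrival order."""
    sessions = {}
    for session_id, payload in messages:
        sessions.setdefault(session_id, []).append(payload)
    return sessions


# Two shopping carts sending interleaved messages.
incoming = [
    ("cart-A", "add item"),
    ("cart-B", "add item"),
    ("cart-A", "apply coupon"),
    ("cart-B", "checkout"),
    ("cart-A", "checkout"),
]

sessions = split_by_session(incoming)
for session_id, payloads in sessions.items():
    print(session_id, payloads)
```

A receiver handling cart-A never sees "checkout" before "apply coupon", no matter what cart-B is doing.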

Filters

Every Subscription can have a filter applied to it, to allow it to tell Service Bus which messages to deliver, and which ones not to deliver. The filters look like SQL WHERE clauses, and can save you from seeing messages that you'd rather not see.

A great place to use this is, again, the example of a well-known topic that servers subscribe to in order to communicate with each other. Each subscriber should place a filter on the subscription it sets up to filter out messages it sent itself... it already knows, no need to get the message again.
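
A quick Python sketch of that self-filter, using a predicate in place of a real subscription rule. The `sender` property and the server names are assumptions; on a real subscription you would express the same condition as a SQL-like filter, something along the lines of `sender <> 'web-1'`.

```python
def not_from(server_name):
    """Build a filter that drops messages this server sent itself."""
    return lambda message: message["properties"].get("sender") != server_name


def deliver(messages, subscription_filter):
    """Return only the messages the subscription's filter lets through."""
    return [m for m in messages if subscription_filter(m)]


updates = [
    {"properties": {"sender": "web-1"}, "body": "healthy"},
    {"properties": {"sender": "web-2"}, "body": "restarting"},
    {"properties": {"sender": "worker-1"}, "body": "healthy"},
]

seen_by_web_1 = deliver(updates, not_from("web-1"))
print([m["properties"]["sender"] for m in seen_by_web_1])  # ['web-2', 'worker-1']
```

The filtering happens on the subscription, so the unwanted message is never delivered to the server at all.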

And more...

...lots more. ContentType. Batch sending and receiving. AMQP support. And more.

Wrapping up

Choose Azure Service Bus, because duh.

One caveat

If you really don't care about doing anything but pumping messages through queues, and you're not going to track them, sure, Azure Queue Storage would work absolutely fine. (If you'll blow past 2,000 messages/second, just do sharded queues.) I'd like to add: especially if you don't mind losing a message here and there... not because I think Azure Storage will lose any messages, but because you're giving up your ability to correlate your queue messages all the way through the operation.
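
For what sharded queues might look like, here is a minimal Python sketch. It assumes you spread load by hashing a key such as a customer ID; the queue name prefix, the helper names, and that choice of key are all mine. The 2,000 messages/second figure is the per-queue limit mentioned above.

```python
import hashlib
import math

PER_QUEUE_LIMIT = 2_000  # messages/second per queue, as quoted above


def shards_needed(messages_per_second):
    """Smallest number of queues that keeps each one under the limit."""
    return max(1, math.ceil(messages_per_second / PER_QUEUE_LIMIT))


def pick_queue(key, shard_count, prefix="orders"):
    """Map a key to one of shard_count queue names, stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:8], "big") % shard_count
    return f"{prefix}-{shard}"


count = shards_needed(5_000)
print(count)                              # 3
print(pick_queue("customer-1017", count))
```

A cryptographic hash is overkill for security purposes here, but unlike Python's built-in `hash()` it returns the same value in every process, so every sender agrees on which queue a key belongs to.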

But, *sigh*, even then, use Azure Service Bus. If you're at the scale of 2,000+ queue messages/second, you're running a serious system. Just use Service Bus.