[MUSIC PLAYING] Hi. And welcome to the Virtual Experts Conference sponsored by Quest. I'm Paul Robichaux and today I'm going to be talking about how service throttling in Microsoft 365 stinks, but also what you can do about it. So I come to this honestly, I've worked as an Office Apps and Services MVP since 2003. And right now I'm the Senior Director of Product Management at Keepit where I manage our backup products, but before that, I had a stint at Quest managing their SaaS application products and before that at Quadrotech.
So I've been around the 365 ecosystem for quite a while. And you can see Pancake the cat has joined me for this meeting. That's something the virtual folks are getting that the live event attendees in Atlanta did not get because there was no cat. So what are we talking about? Well, very quickly, we're going to talk about what throttling is, how it works, when it happens, and what you can do.
But I want to sum up at the very beginning with a quote from my former coworker Randy Rempel at Quest, "If you don't want to get throttled, don't migrate," because throttling is an inextricable part of pretty much every large scale thing that you might do with this service. And so understanding when it happens, when it's likely, and how to respond is really critical because there is no world in which you're not going to get throttled some of the time working with the service. So when we talk about throttling, there are two meanings in the dictionary. One is to attack or kill someone by choking them and the other is to control an engine or a vehicle with a throttle.
So neither one of these is quite exactly right for what we want, but if we treat throttling like a crime and ask what a real detective or a CSI detective would look for, we're looking for a motive, a means, and an opportunity. Well for throttling, let's define what we actually think the crime is in this case. If it's not choking someone or controlling an engine, it's an intentional restriction of performance to keep the system from getting into a bad state. So if the restriction is accidental because the system designers didn't put enough scale into the system, it's not throttling. If the mechanism isn't to restrict performance in some way, it's probably also not throttling.
And then if you're doing it for some reason other than to manage performance, generally, most people would say that's not considered to be throttling. For example, Exchange supports a mechanism called tarpitting that slows down incoming connections when the Exchange server thinks that someone is using it to spam. That's not exactly throttling. It's not meant to keep the system from going into an undesirable state, it's meant to keep the riffraff out, if I can say it that way.
Autoscaling is a little bit different, this is automatically changing the amount of resources that are assigned to something to keep a desired performance level. So these two are sort of opposite. Throttling means I have a certain amount of resources and I'm going to restrict people's ability to use it to maintain performance. Autoscaling is I want to maintain a level of performance and I will throw more resources at the problem. This is what Xbox Live does.
When you are playing a multiplayer Xbox Live game the back end of that game will autoscale by adding more Azure VMs or more VMs on whatever platform the game is running on to provide enough capacity to support party chat and the lobby and so on. If a million people are playing Halo there are going to be more autoscale VMs running than if 10 people are playing it. Throttling would just say, oops, sorry. Halo is too busy, you can't play right now. Or you would play, but you'd get one frame per second and you'd be sad.
Now that we have a better understanding of what throttling is, let's talk about the motive. Why would Microsoft do this? You know it's not going to make people happy. The truth of the matter is that in modern environments at cloud scale vendors have to do this, if they don't they can't control costs, they can't deliver reliable performance, or reliable service quality. Throttling can help prevent denial-of-service attacks, it prevents a big company or a big tenant from hogging all the capacity that smaller tenants might want, it helps even out spikes in demand during busy periods, and it helps provide more predictable performance for everybody.
You would hate it if Microsoft didn't throttle. That doesn't mean that we love it when they do, but things would be much worse without a degree of throttling applied throughout the Microsoft 365 platform. So how does it work? Well, I talked earlier about crimes having means, motive, and opportunities. It turns out throttling has actually got multiple means that can be applied to enforce it.
The first thing I want you to understand is that throttling can be applied across all levels of the stack, from the network to the physical server, the mailbox, et cetera. And we'll talk more about what those are in a minute, but it helps if we separate different types of throttling or different layers where throttling is applied and call them something, I'm going to arbitrarily choose the term domains. So most of the time in 365, domains are throttled independently of one another, which means you may get throttled using the CSOM API to access SharePoint data, but that same throttling won't necessarily affect you when you are doing operations against the same tenant with Microsoft Graph. And that's important, you'll see why in a little bit.
Now, if you think about all of the different domains where Microsoft might potentially throttle some set of operations that you're trying to execute, they include the network, and users or things that belong to users, mailboxes being probably the most obvious example, but also OneDrive document collections. Then there are APIs, so right now Microsoft has got separate throttling limits in place for using Exchange Web Services, CSOM, and Graph. So if you're throttled on Graph that doesn't mean you'll be throttled on EWS and if you're throttled on both EWS and Graph that still doesn't mean that you will necessarily be throttled on some other endpoint someplace else. Throttling can apply at the tenant level, this gets a lot of people into trouble when they're migrating or backing up their data. Throttling can happen at the workload level, so Microsoft may choose to throttle only operations against Teams or SharePoint or OneDrive or Exchange Online and not throttle operations against other workloads, even though they're in the same tenant or the same region.
Microsoft can throttle individual servers, now this takes us back to on-prem days when you would see self-throttling in Exchange servers when you didn't have enough CPU or RAM or disk IOPS to satisfy a particular workload. And then finally, they can throttle in regions. We don't see this as much because Microsoft has built their worldwide infrastructure out with generally enough capacity per region, but they can do it if there's a reason for them to do so. Now, super important point that I want you to keep in mind, for most of these domains Microsoft will not change the limits. You can call them and you can beg, or plead, or threaten, or offer bribes and they won't change those limits.
The other thing is that, in general, Microsoft will not tell you that you're throttled. Now, you'll see there is a significant exception that I'll talk about when we talk about how throttling manifests itself to your applications. But it's not like you can go into the 365 admin portal and see a blinking red light that says your domain is being throttled right now, which would be really helpful if we did have that, but we don't. So with that beginning understanding of what throttling is, your next question is very likely to be, OK, well now that I know what it is, what makes it happen? When is it likely to affect what I'm doing?
One case is if you do one thing to lots of objects very quickly. So imagine that you make backup software, one of the things you might like to do is back up all the messages in a selected set of mailboxes. And one way that you might do this is you might open a mailbox and read a message and at the same time open another mailbox and read a message and continue doing that until you've gotten all the messages you want. If you do that to enough objects at the same time, you'll get throttled.
Now, there's another case, which is sort of the opposite side of this coin. Imagine that you want to do some kind of high volume transactional processing, like sending all of the customer account statements via email at the last hour of the last day of each month. And you think, OK, I'm going to write a program and we'll use EWS to pump all these messages out by putting them in the sent items-- or sorry, in the pickup folder of a single mailbox. You're going to get throttled. You're doing many things to the same mailbox.
The other thing, and this is much more difficult to plan for and to work around, is that sometimes you will try to do multiple complicated things at the same time and that will get you throttled. You can think of throttling as being like a bucket. If you are doing things that fill the bucket up, then when the bucket fills, instead of overflowing you get throttled. So you can fill a bucket by pouring one liquid into it or by pouring a lesser volume of two different liquids or five different liquids into it, but what matters is that when the bucket fills it overflows. Throttling works exactly like that.
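The bucket analogy can be sketched in a few lines of Python. This is only an illustration: `ThrottleBucket`, the capacity, the drain rate, and the per-request costs are all invented for the example and do not describe how Microsoft actually implements its limits.

```python
class ThrottleBucket:
    """Toy model of a shared throttling bucket (all numbers are invented)."""

    def __init__(self, capacity: float, drain_per_second: float):
        self.capacity = capacity
        self.drain_per_second = drain_per_second
        self.level = 0.0

    def tick(self, seconds: float) -> None:
        # The bucket empties at a steady rate as time passes.
        self.level = max(0.0, self.level - self.drain_per_second * seconds)

    def pour(self, cost: float) -> bool:
        # True if the request fits; False if it would overflow (throttled).
        if self.level + cost > self.capacity:
            return False
        self.level += cost
        return True


bucket = ThrottleBucket(capacity=100, drain_per_second=10)

# A migration and a backup pour into the same bucket.
migration_ok = [bucket.pour(10) for _ in range(6)]  # uses 60 units
backup_ok = [bucket.pour(10) for _ in range(6)]     # only 40 units are left

print(migration_ok.count(True), backup_ok.count(True))  # 6 4

bucket.tick(2)          # two seconds later, 20 units have drained
print(bucket.pour(10))  # True
```

Neither workload is excessive on its own here; the backup gets throttled only because the migration already filled most of the shared bucket.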
So doing something like a migration from on-prem SharePoint to SharePoint Online and trying to back up your SharePoint Online estate at the same time is a great recipe to fill your throttling bucket. Keep in mind that operations in the service are not all created equal. Microsoft does do a pretty good job of defining, at least for SharePoint, which of these operations are more likely to trigger throttling than others. So they specifically call out Sites.Read.All permission queries. They call out certain types of write operations that are going to have a heavier penalty.
We found, experientially, that using the Graph API endpoint that you use to read authentication methods for an Azure AD user is about 5x more likely to get you throttled than reading the user data itself from Azure AD. Now, the reason why there's a difference in these throttling costs is because on the back end some APIs are heavier weight, is a good way to put it, they're more computationally expensive than others. And so Microsoft adjusts those throttling levels accordingly. In general, any time you make a change to a permission you run the risk that you may invalidate the cached permission data that Microsoft tries to keep. And so when you do that, if you're going to invalidate the cache, you're going to have to pay for it. In general, though, if you think about the delta between different operations, the difference between an expensive operation and an inexpensive one, five to one is a fair way to plan.
It's not always that ratio, but that's not a bad ballpark. Now, I included this note, which is a screenshot of the note from the Microsoft Documentation. They may change these limits at any time, including while you're in the middle of doing a project. So just keep that in mind.
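As a rough planning aid, the five-to-one ballpark can be turned into simple arithmetic. The sketch below assumes a cheap call costs 1 unit and an expensive call costs 5; those weights, the budget figures, and the function names are illustrative assumptions, not published Microsoft values.

```python
CHEAP_COST = 1      # assumed cost of an inexpensive call
EXPENSIVE_COST = 5  # the rough 5:1 planning ratio from the talk


def budget_used(cheap_calls: int, expensive_calls: int) -> int:
    """Total units consumed by a mix of cheap and expensive calls."""
    return cheap_calls * CHEAP_COST + expensive_calls * EXPENSIVE_COST


def max_expensive_calls(budget: int, cheap_calls: int) -> int:
    """How many expensive calls still fit after the cheap ones are counted."""
    remaining = budget - cheap_calls * CHEAP_COST
    return max(0, remaining // EXPENSIVE_COST)


print(budget_used(100, 20))            # 200
print(max_expensive_calls(1000, 400))  # 120
```

Because the real limits can change at any time, numbers like these are only useful for sizing a first attempt, not as guarantees.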
Backups and migrations tend to trigger throttling, big time. So if you think about why that is, it's because backups and migrations both are doing a high volume of operations with high concurrency and some of those operations may be expensive. By concurrency I mean that you may have lots of things going on in parallel and I'll explain in a minute why that's important. In general, experientially, and I've confirmed this with Randy and the tenant-to-tenant and migration teams at Quest, you're most likely to see throttling when you're doing heavyweight operations against SharePoint Online. Second place is likely to be OneDrive, third place is likely to be Teams, and then Exchange.
This may seem a little counterintuitive, but remember Exchange and the Exchange engineering team have decades of experience dealing with high volume operations with high concurrency for backup, for restore, for migration, for all kinds of tenant-to-tenant operations, mailbox moves, et cetera. So those teams have a great deal of experience knowing exactly where the pain points are for them and adjusting throttling limits to minimize the impact. Keep in mind, and I apologize, I just realized this is cut off on the bottom boundary of the window, remember that we share domains across operations. So if you think about getting throttled on Graph during a SharePoint migration, that may impact your throughput for some other operation using Graph, like loading a web page.
We actually ran into this, this is fascinating, with a non Microsoft workload that we back up. It turns out if a backup is running beyond a certain degree of concurrency and the user tries to use a web application, front end for this app-- I'm not going to say who it is because that's not relevant. But a backup can throttle things so badly that users who are trying to use the front end to do the application work are not able to because they get throttled. Now, Microsoft knows better than that, this third party knows better than that too now. We've reported some bugs and we just actually hired one of their engineering directors to try to get this fixed, but remember that whatever one operation pours into the bucket that may impact the ability of another application to use resources that are throttled or controlled by that same bucket.
So I promised that I would talk a little bit about concurrency. And the useful-- or a useful way to think about this is suppose that you want to read 100 messages from each of 100 mailboxes. So one way that you could do, is what I said before, you can start off and you could open mailbox one and read message one and open mailbox two and read message one and continue doing that until you've read one message from each of the 100 mailboxes and then go on to the next message. A slightly smarter way to do that would be to, at the same time, open, let's just say, ten mailboxes. There's going to be a limit on how many resources you can work on in parallel.
So let's say, just for the sake of argument, that you're going to open ten mailboxes as your width and do one operation at a time. So you're going to open simultaneously or concurrently mailboxes one through ten and you're going to read message one, then you're going to read message two, and so on. And when you're done with mailboxes one through ten then you will concurrently open 11 through 20, and so on. So that's one way to do this.
Another way to do it is that you open one mailbox and you read messages in chunks of ten. OK, now think about the difference between those two. In both cases, you're still getting 10 messages at a time, but in one case you're spreading that load across ten resources, which means you're not hitting a single object, you may not be hitting a single server, you may not be hitting a single disk, you may not even be hitting a single region. So there's some advantage in going wide. Now, the trick is your operation has to take into account how Microsoft is going to slap your hand when you've gone too wide. Don't forget, like I said on the previous slide, we share domains.
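To make the difference between going wide and going deep concrete, here is a small sketch that only builds the two request schedules for 100 messages in each of 100 mailboxes. It makes no service calls, and the function and mailbox names are hypothetical.

```python
def wide_batches(mailboxes, messages_per_mailbox, width):
    """Yield batches that read one message from each of `width` mailboxes."""
    for start in range(0, len(mailboxes), width):
        group = mailboxes[start:start + width]
        for msg in range(messages_per_mailbox):
            yield [(mbx, msg) for mbx in group]


def deep_batches(mailboxes, messages_per_mailbox, chunk):
    """Yield batches that read `chunk` messages from a single mailbox."""
    for mbx in mailboxes:
        for start in range(0, messages_per_mailbox, chunk):
            stop = min(start + chunk, messages_per_mailbox)
            yield [(mbx, msg) for msg in range(start, stop)]


mailboxes = [f"mailbox{n}" for n in range(1, 101)]
wide = list(wide_batches(mailboxes, 100, width=10))
deep = list(deep_batches(mailboxes, 100, chunk=10))

# Same number of batches, ten requests per batch either way...
print(len(wide), len(deep))  # 1000 1000

# ...but a wide batch touches ten mailboxes and a deep batch touches one.
print(len({m for m, _ in wide[0]}), len({m for m, _ in deep[0]}))  # 10 1
```

The total work is identical; what changes is how many distinct resources each batch leans on, which is what per-mailbox and per-server limits care about.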
So it's tempting to say, OK, well what I'm going to do is I'm going to go as wide as I can and go until I see throttling kick in and then start to back off. But the problem is that by doing that you may sabotage your ability to use another endpoint that doesn't have anything to do with that primary workload. There's another potential problem inherent in this approach too. Imagine that you're mailing letters in batches of ten and they're all going to unique addresses.
So you mail letter number one, and you mail letter number two, and you mail letter number three. And then you get letter number one back from the post office with a big invalid address scrawled on it, but you've already put letters four, five, and six in the outbox waiting for the mail carrier to come pick them up. So maybe a better way to think of it is, instead of putting these in the mail, you're throwing messages in bottles into the water or something. The point is that if you do something early on that triggers throttling, all the requests after that are automatically going to get throttled as soon as you make them, which drives your concurrency, your rate of successful concurrency, way down.
Now, I mentioned earlier that Microsoft does not generally give you the ability or didn't have the ability, themselves, to adjust these throttling limits. This is true for concurrency too. So it's not really that Microsoft doesn't want to be helpful. When their support engineers say, oh, you're being throttled you should call your vendor or you should change the way that you're doing whatever it is you're doing, there is no way for them to adjust many of these limits. And when I say, they, I mean support.
Of course, at the end of the day, the engineers who are building 365 and operating it always have the ability to go into the code or go into the back end and change these knobs, but the support teams don't. And even if they did, they are very unlikely to be given permission to do so just because it inconveniences you from a performance standpoint, which is really important to keep in mind. I don't want it to sound like I think Microsoft is deliberately being unhelpful, but the fact of the matter is there's not a lot that they will do when you run into throttling. That means the onus is going to be on you to try to follow better practices.
Now, why did I say better practices instead of best practices? It's simple, there is no one perfect set of best practices for managing throttling. Because remember what I said earlier, you may get throttled by doing one thing across a large set of resources or by doing many things across a smaller set. And so the right practices for you to turn down the throttling pain are going to depend on exactly what you're doing, where you're doing it, what resources you're doing it against, what APIs you're using, and so on.
And I'll give you an example of a real world situation that we were recently in. We had a customer who was complaining that they were getting throttled during a backup operation. This is a large scale backup so it's not unheard of for them to get throttled, but when we looked into it we thought, oh, it looks like they're being throttled much more severely than we would expect. This customer was in the Nordic region and so all of their resources were being served out of the same region by Microsoft, their tenant was homed in that region. And we thought, OK, what's really going on?
Well, it turns out they were doing something that was not great from a throttling perspective. The second bullet, it says don't overlap operations, they were doing a backup at the same time that they were doing the concurrent SharePoint migration that I mentioned earlier. And so depending on the time of day, during non-working hours the migration was running and it was pouring stuff into the bucket so fast that the backup couldn't run. Then during the day the backup engine was trying to catch up on all the work it didn't get to do overnight because it had been throttled. And it was consuming so much throttling bandwidth that other things that they were running against their SharePoint tenant were being throttled.
So, if you can, don't overlap multiple operations that may cause throttling. Also, keep in mind, and I've seen this many times in the tenant-to-tenant and data migration world, you're not necessarily going to get better throughput by doing the same thing with multiple instances of a tool. Kind of the canonical example of this is people will say, oh, you know what, if I'm not getting enough migration throughput what I'll do is I'll have five service accounts instead of just one and I'll run five copies of my migrator and migrate as though I were having five completely separate migration streams. Well, depending on what you're migrating and when and what tool you're using, that may help or it may make things worse because the exact results are going to depend on what APIs that tool uses, what time of day it is, what else is going on in the system, whether you happen to be running in a region that has lower throttling limits because it has less capacity, et cetera.
In general though, don't count on being able to multitask like that as a means to evade throttling. Now, this last one is super important and I wish more people understood it. When you're getting throttled don't make things worse on yourself. What does that mean? Well, Microsoft actually gives back information that vendors can use to help them understand when a workload is being throttled.
So this second bullet, Retry-After and Rate-Limit, both of those are headers that Microsoft returns when throttling is happening and they tell the offending application, here's when you can try again. OK, I like to use the example of a toddler. Many of you have had toddlers or been around toddlers and you know sometimes they will just continue to ask the same question over and over and over. Can I have some ice cream? No, not right now.
5 minutes later, can I have some ice cream now? No, not right now. Can I have some ice cream now? No, you can't have any ice cream. Retry-After is a header that the service uses to tell you, you can't have any ice cream until after 1745 Zulu, in other words, don't ask again for a certain period of time.
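Honoring Retry-After mostly means turning the header value into a wait time. In HTTP the value can be either a number of seconds or an HTTP date, so the sketch below handles both forms; `retry_delay_seconds` is a hypothetical helper name, and the sample values are made up.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(retry_after: str, now: datetime) -> float:
    """Return how long to wait before asking again, in seconds."""
    value = retry_after.strip()
    if value.isdigit():
        # Form 1: a plain number of seconds, e.g. "30".
        return float(value)
    # Form 2: an HTTP date, e.g. "Mon, 01 Jan 2024 17:45:00 GMT".
    retry_at = parsedate_to_datetime(value)
    if retry_at.tzinfo is None:
        # Treat a date without an offset as UTC (an assumption).
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    return max(0.0, (retry_at - now).total_seconds())


now = datetime(2024, 1, 1, 17, 40, tzinfo=timezone.utc)
print(retry_delay_seconds("30", now))                             # 30.0
print(retry_delay_seconds("Mon, 01 Jan 2024 17:45:00 GMT", now))  # 300.0
```

A well-behaved client sleeps for at least that long before retrying, rather than asking again every few seconds like the toddler.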
Rate-Limit is a really interesting header, it's an Internet Engineering Task Force (IETF) draft standard that Microsoft does not fully implement yet and their own documentation will tell you this. But what it's intended to do is to tell the calling application exactly how much of the throttling limit it has used. So Rate-Limit headers may indicate you're allowed 200 requests per second and right now you're running at a sustained rate of 185 requests per second. As an application maker, man, that's gold because now I know exactly how much more headroom I have before the bucket's full. So I can do that thing that a good bartender will do where I slowly pour into the glass just at exactly the right rate to keep the meniscus on top of the glass nicely curved with liquid without overfilling it.
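Here is a minimal sketch of how an application might read that headroom. It assumes the header names `RateLimit-Limit` and `RateLimit-Remaining` from the IETF draft; whether a given endpoint returns them, and with what exact semantics, has to be checked against the service documentation. The 20 percent threshold and the function names are arbitrary choices for illustration.

```python
from typing import Mapping, Optional


def headroom(headers: Mapping[str, str]) -> Optional[float]:
    """Fraction of the current budget still unused, or None if not reported."""
    try:
        limit = int(headers["RateLimit-Limit"])
        remaining = int(headers["RateLimit-Remaining"])
    except (KeyError, ValueError):
        return None  # headers absent or malformed: no information
    if limit <= 0:
        return None
    return remaining / limit


def should_slow_down(headers: Mapping[str, str], threshold: float = 0.2) -> bool:
    """True when less than `threshold` of the budget is left."""
    left = headroom(headers)
    return left is not None and left < threshold


# 185 of 200 requests used, so 15 remain. A plain dict is used here, so the
# keys are case-sensitive, unlike real HTTP headers.
sample = {"RateLimit-Limit": "200", "RateLimit-Remaining": "15"}
print(headroom(sample))          # 0.075
print(should_slow_down(sample))  # True
print(headroom({}))              # None
```

When the headers are missing, the sketch deliberately reports nothing rather than guessing, since absence does not mean the budget is full.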
Decoration headers are a SharePoint thing. I don't know of any major vendors that don't correctly implement this, but if you're writing your own applications you need to know you have the ability to include what Microsoft calls decorations that tell them what application you are so they can go, oh, OK, this isn't some hacker mounting a distributed denial-of-service attack, it's actually a migration application or a data analysis application or a backup application. Manage your concurrency properly.
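For anyone writing their own application, decoration comes down to sending a descriptive User-Agent string. The sketch below follows the `ISV|CompanyName|AppName/Version` and `NONISV|CompanyName|AppName/Version` pattern described in Microsoft's SharePoint throttling guidance as I understand it; treat the exact format as an assumption to verify, and note that the company and application names here are placeholders.

```python
def decorated_user_agent(company: str, app: str, version: str, isv: bool) -> str:
    """Build a User-Agent value that identifies the calling application."""
    prefix = "ISV" if isv else "NONISV"
    return f"{prefix}|{company}|{app}/{version}"


# Placeholder names for an in-house (non-ISV) backup tool.
headers = {"User-Agent": decorated_user_agent("Contoso", "BackupTool", "1.0", isv=False)}
print(headers["User-Agent"])  # NONISV|Contoso|BackupTool/1.0
```

The resulting dictionary would be passed as the request headers by whatever HTTP client the application uses.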
Man, I can't emphasize this strongly enough. As an application developer, concurrency is one of the most difficult problems you're ever going to run into when you develop cloud applications, but you have to really be smart, or maybe clever is a better word, in how you do that when you're running against the service. And these headers are critical to getting good results because using .NET, in particular, it's really easy to write concurrent applications because you can define that you've got a block of something that should be a thread and you just spin out a million of those. And if you do that, you're going to get throttled faster than anything. If you manage your concurrency properly and honor the headers that you get back, that means you'll be able to, if the throttling limit is up here, you'll be able to get close to it and hopefully not bust through and end up in throttling land.
Use batching and paging. So this is interesting, I see a lot of people struggle with this concept when they start working with PowerShell because you can open up a PowerShell session, you can say something like Get-Mailbox and PowerShell will happily go out and get you a bunch of mailboxes. But the more mailboxes you ask for at once, the longer that call is going to take. And so Microsoft sensibly put some restrictions in place to limit the max number of results that you get back. So that teaches you that, OK, if I really want to get all of the mailboxes in a 100,000-mailbox tenant, I need to figure out how I'm going to do that.
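One common way to keep concurrency under control is to cap the number of requests in flight instead of spinning out unlimited workers. The sketch below does that in Python with an `asyncio.Semaphore`; `run_bounded` and `fake_read` are hypothetical names, the width of 10 is arbitrary, and the fake worker stands in for a real service call that would also need to honor Retry-After.

```python
import asyncio


async def run_bounded(items, worker, width):
    """Run `worker(item)` for every item with at most `width` in flight."""
    gate = asyncio.Semaphore(width)

    async def guarded(item):
        async with gate:  # wait here while `width` calls are already running
            return await worker(item)

    return await asyncio.gather(*(guarded(item) for item in items))


# Stand-in for a real service call; it only tracks how many calls overlap.
in_flight = 0
peak = 0


async def fake_read(mailbox):
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0)  # yield so other tasks get a turn
    in_flight -= 1
    return mailbox


results = asyncio.run(run_bounded(range(100), fake_read, width=10))
print(len(results), peak <= 10)  # 100 True
```

In a real client the width would be adjusted at run time, narrowing when throttling responses appear and widening again cautiously once they stop.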
I can't get all 100,000 at once. Maybe I can get the first 500 and then the next 500 and so on. Well that's just paging. So learn how to use paging with the Graph API, super important. One of the things that Graph will do is give you hints to tell you how many items you can get back in the next page that you ask for, which gives you this nice dynamic sort of sliding window mechanism that will help make your transfers more efficient.
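Paging with Graph generally means following the `@odata.nextLink` value in each response until it is no longer present. The sketch below shows that loop against a fake, in-memory set of pages so it runs without a network; `iter_pages`, the `fetch` callable, and the page names are illustrative, and a real `fetch` would perform an authenticated HTTP GET and return the parsed JSON.

```python
def iter_pages(fetch, first_url):
    """Yield every item, following `@odata.nextLink` until it disappears."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get("value", [])
        url = page.get("@odata.nextLink")


# Fake three-page result set standing in for a real HTTP client.
FAKE_PAGES = {
    "page1": {"value": [1, 2], "@odata.nextLink": "page2"},
    "page2": {"value": [3, 4], "@odata.nextLink": "page3"},
    "page3": {"value": [5]},
}

items = list(iter_pages(FAKE_PAGES.__getitem__, "page1"))
print(items)  # [1, 2, 3, 4, 5]
```

Because the function is a generator, the caller can process each page as it arrives and stop early without fetching the rest.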
This last bullet, I cannot emphasize this enough. If you're an ISV, even if you're writing applications for your own internal line of business use, please tell users when they're being throttled. If you imagine the Windows progress dialogue, everybody's seen this when you're copying files, a little progress bar will say your file copy will be done in 6 minutes, 16 minutes, 4 minutes, 14 minutes, 18 minutes, 3 minutes, 2 minutes, now it's done. It just bounces around all over the place. That's not because of throttling, that's because the system is making its best attempt to estimate the finish time based on the real world throughput it's getting. You can do the same thing.
If you know what your real world throughput looks like, when you see it change, for better or worse, you can keep track of that. And when you start seeing Retry-After or Rate-Limit headers come back then you're either throttled or about to get throttled and you can show the user that. How you show them is up to you. You can use color, you can use the little blinky icon, you can log something in a log somewhere, but helping users understand that the application is working, but it's being slowed because of Microsoft's throttling, is really doing them a good deed.
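As one possible shape for that feedback, the sketch below builds a status message from observed throughput and adds a note when throttling has been detected. `status_line` and its wording are hypothetical; the `throttled` flag is assumed to be set elsewhere, for example when a Retry-After header comes back.

```python
def status_line(done: int, total: int, elapsed_s: float, throttled: bool) -> str:
    """Describe progress, estimated time left, and whether we are throttled."""
    rate = done / elapsed_s if elapsed_s > 0 else 0.0  # items per second
    if rate > 0:
        eta = f"about {round((total - done) / rate)} s remaining"
    else:
        eta = "time remaining unknown"
    note = " (slowed by service throttling)" if throttled else ""
    return f"{done}/{total} items, {eta}{note}"


print(status_line(500, 2000, 100.0, throttled=False))
# 500/2000 items, about 300 s remaining
print(status_line(500, 2000, 100.0, throttled=True))
# 500/2000 items, about 300 s remaining (slowed by service throttling)
```

The estimate will bounce around just like the Windows copy dialog, but the added note tells the user why.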
Now, many of you are probably going to be interested in learning a little bit more about the nuts and bolts of how Graph throttling works and about how the other throttling subsystems that Microsoft implements works so I've included these links that will tell you more, but not everything. There's still a lot of this throttling stuff that is not well documented or not documented at all because in some ways, as consumers of the service, as the admins whose users are using it, we're not supposed to care. Throttling is a protection mechanism for the operators of the service, not necessarily something that end users or tenant admins will be aware of. But the reality is, because you may have it enforced on you in some circumstances, you do need to be aware of it. And the more you learn, the better equipped you'll be to avoid throttling problems.
So with that said, let's take a few minutes for questions.
[MUSIC PLAYING]