Simulate rate limits on any API with Microsoft 365 Developer Proxy


Simulate rate limits for any API and verify that your app won’t break when it exceeds the API quota.

Even cloud has its limits

Most cloud APIs come with service limits. Typically, depending on your plan and the API, you get to call the API an agreed number of times in a given period. It makes sense. Cloud API providers use service limits to ensure fair use of their APIs and to handle scale properly. Like with any limit though, there’s a chance that you’ll exceed it. And when you do, your app could break.

If built properly, APIs use response headers to communicate information about rate limits. For example, when you call the GitHub API, you get a response similar to the following:

$ curl -i https://api.github.com/users/octocat

HTTP/2 200
[...] trimmed for brevity
x-ratelimit-limit: 60
x-ratelimit-remaining: 59
x-ratelimit-reset: 1694784833
x-ratelimit-used: 1

{
  "login": "octocat",
  "id": 583231,
  "node_id": "MDQ6VXNlcjU4MzIzMQ==",
  [...] trimmed for brevity
}

The x-ratelimit-... headers communicate information about the rate limits for the API. In this case:

  • you’ve got a total of 60 resources in the time window (x-ratelimit-limit)
  • you’ve used 1 resource in the current time window (x-ratelimit-used)
  • you’ve got 59 resources left until the window reset (x-ratelimit-remaining)
  • the window will reset at 1694784833 (UTC epoch seconds; x-ratelimit-reset)
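To make the header semantics concrete, here is a minimal Python sketch that reads the four headers from the response above into a typed structure. The names `RateLimitInfo` and `parse_github_rate_limit` are made up for this illustration and are not part of any library.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Mapping


@dataclass
class RateLimitInfo:
    limit: int          # total resources in the time window
    used: int           # resources used in the current window
    remaining: int      # resources left until the window resets
    reset_at: datetime  # moment the window resets, in UTC


def parse_github_rate_limit(headers: Mapping[str, str]) -> RateLimitInfo:
    """Read the x-ratelimit-* headers into a RateLimitInfo (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    h = {name.lower(): value for name, value in headers.items()}
    return RateLimitInfo(
        limit=int(h["x-ratelimit-limit"]),
        used=int(h["x-ratelimit-used"]),
        remaining=int(h["x-ratelimit-remaining"]),
        # GitHub reports the reset moment as UTC epoch seconds.
        reset_at=datetime.fromtimestamp(int(h["x-ratelimit-reset"]), tz=timezone.utc),
    )


# Header values copied from the sample response above.
info = parse_github_rate_limit({
    "x-ratelimit-limit": "60",
    "x-ratelimit-remaining": "59",
    "x-ratelimit-reset": "1694784833",
    "x-ratelimit-used": "1",
})
print(info)
```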

Each API has its own definition of the time window duration, the number of resources available in a time window, and the cost per request. Often, these values depend on your subscription level for the given API.

The trouble with rate limits

While there’s a proposal for how rate limits should be implemented in an API, the reality is that each API does it differently.

Following the IETF draft, APIs should use ratelimit- response headers (without the x- prefix) to communicate rate limits to the user. The reset value should represent the number of seconds left until reset. If you look at the GitHub API example I’ve just shown you, you’ll see that it uses prefixed headers and that the reset header contains the reset timestamp in UTC epoch seconds.
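Because the two conventions differ, a client that talks to more than one API has to normalize the reset value before acting on it. The sketch below shows one way to do that. The function name and the two format labels (`seconds_left`, `utc_epoch_seconds`) are assumptions made for this example, not identifiers from the draft or from any API.

```python
import math
import time
from typing import Optional


def seconds_until_reset(reset_value: str, reset_format: str,
                        now: Optional[float] = None) -> int:
    """Return how many seconds remain until the rate limit window resets."""
    value = int(reset_value)
    if reset_format == "seconds_left":
        # IETF-draft style: the header already holds the seconds left.
        return max(0, value)
    if reset_format == "utc_epoch_seconds":
        # GitHub style: the header holds an absolute UTC timestamp.
        current = time.time() if now is None else now
        return max(0, math.ceil(value - current))
    raise ValueError(f"unknown reset format: {reset_format}")
```

Passing `now` explicitly keeps the function easy to test. In production code you would leave it out so that the current time is used.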

While the IETF draft mentions that the API will stop serving your requests after you exceed the quota, it doesn’t prescribe a specific way of handling it. OneDrive APIs on Microsoft Graph, for example, return a 429 Too Many Requests response. GitHub, on the other hand, fails with a 403 Forbidden error.
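In client code, this difference means that checking for 429 alone is not enough. Here is a hedged sketch of a detection helper. It assumes that a 403 caused by rate limiting can be told apart from a genuine permission error by a zero `x-ratelimit-remaining` header or by the presence of a `retry-after` header. Verify that assumption against the API you call.

```python
from typing import Mapping


def is_rate_limited(status_code: int, headers: Mapping[str, str]) -> bool:
    """Guess whether a response signals an exceeded rate limit (illustrative only)."""
    h = {name.lower(): value for name, value in headers.items()}
    if status_code == 429:
        # Too Many Requests is unambiguous.
        return True
    if status_code == 403:
        # Assumption: a rate-limit 403 reports no remaining quota
        # or tells the client when to retry.
        return h.get("x-ratelimit-remaining") == "0" or "retry-after" in h
    return False
```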

There are other nuances too. For example, Microsoft Graph only communicates rate limits once you exceed 80% of the quota, while GitHub APIs communicate them on every response.

How will your app react to exceeding cloud API service limits?

When you build an application connected to a cloud API, it most likely has a service limit defined for its usage. The last thing you want is for your app to fail miserably when it exceeds the API’s service limits. So to prevent that, you should test how your app will react when it exceeds the API service limits. But how do you do that?

Brute-force testing exceeding service limits

One way to test how your app will react to exceeding service limits is… to exceed service limits! Keep calling the API until you exceed the limit and start getting errors.

While it works, it’s problematic for at least two reasons.

First, if you burn through your resources, you won’t be able to do anything until the window resets. Depending on the API, it could be minutes, hours, or days, which isn’t great for your productivity.

Second, from the API provider’s point of view, excessive use of the API isn’t welcome. After all, you’re putting unreasonable stress on their service. In extreme cases, it could even get your account suspended. Again, not something you want.

Luckily, there’s a better way.

Simulate rate limits with Microsoft 365 Developer Proxy

You can easily simulate rate limits using Microsoft 365 Developer Proxy. Here are some examples to help you get started.

Simulate Microsoft Graph rate limits

The Rate Limiting Plugin is configured by default to match the behavior of Microsoft Graph. Using its configuration, you can define how many requests you’re allowed to make before you exceed the limit. These values are meant for your convenience during testing and don’t represent the actual service limits.

Download the configuration.

{
  "plugins": [
    {
      "name": "RetryAfterPlugin",
      "enabled": true,
      "pluginPath": "~appFolder\\plugins\\m365-developer-proxy-plugins.dll"
    },
    {
      "name": "RateLimitingPlugin",
      "enabled": true,
      "pluginPath": "~appFolder\\plugins\\m365-developer-proxy-plugins.dll"
    }
  ],
  "urlsToWatch": [
    "https://graph.microsoft.com/*/drive*",
    "https://graph.microsoft.com/*/shares/*",
    "https://graph.microsoft.com/*/sites/*",
    "https://graph.microsoft.us/*/drive*",
    "https://graph.microsoft.us/*/shares/*",
    "https://graph.microsoft.us/*/sites/*",
    "https://dod-graph.microsoft.us/*/drive*",
    "https://dod-graph.microsoft.us/*/shares/*",
    "https://dod-graph.microsoft.us/*/sites/*",
    "https://microsoftgraph.chinacloudapi.cn/*/drive*",
    "https://microsoftgraph.chinacloudapi.cn/*/shares/*",
    "https://microsoftgraph.chinacloudapi.cn/*/sites/*"
  ],
  "labelMode": "text",
  "logLevel": "info",
  "rateLimitingPlugin": {
    "costPerRequest": 1,
    "rateLimit": 40,
    "retryAfterSeconds": 20
  }
}

Following this configuration, you’ll be able to call OneDrive APIs 40 times in 1 minute. If you call them more frequently, you’ll be throttled and asked to back off for 20 seconds.
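With the proxy throttling requests, you can check whether your app backs off as instructed. The sketch below shows the kind of retry loop you might exercise against it. It assumes that the throttled response is a 429 that carries a `Retry-After` header expressed in seconds. The `send` callable and the fake responses are stand-ins for your own HTTP client so that the example runs without a network.

```python
import time
from typing import Callable, List, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


def call_with_backoff(send: Callable[[], Response],
                      max_attempts: int = 3,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send(), waiting as long as Retry-After asks whenever it returns 429."""
    response = send()
    for _ in range(max_attempts - 1):
        status, headers = response
        if status != 429:
            break
        h = {name.lower(): value for name, value in headers.items()}
        # Assumption: Retry-After holds seconds; fall back to 1 second if absent.
        sleep(float(h.get("retry-after", "1")))
        response = send()
    return response


# Fake transport: throttled once, then successful.
fake_responses = iter([(429, {"Retry-After": "20"}), (200, {})])
waits: List[float] = []
result = call_with_backoff(lambda: next(fake_responses), sleep=waits.append)
print(result[0], waits)
```

Injecting `sleep` lets a test record the requested delay instead of actually waiting 20 seconds.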

Simulate GitHub API rate limits

As I mentioned previously, the GitHub API behaves differently. It shows rate limit information on each response, uses a different format for the reset value, and returns a 403 Forbidden response when you exceed the quota.

To help you simulate rate limits on the GitHub API, but also to show you how to implement custom rate limit behaviors, in Microsoft 365 Developer Proxy v0.12.0-beta.1, we extended the Rate Limiting Plugin with support for custom behaviors.

Download the configuration.

First, we define the proxy rate limiting configuration:

{
  "plugins": [
    {
      "name": "RateLimitingPlugin",
      "enabled": true,
      "pluginPath": "~appFolder\\plugins\\m365-developer-proxy-plugins.dll",
      "configSection": "rateLimiting"
    }
  ],
  "urlsToWatch": [
    "https://api.github.com/*"
  ],
  "rateLimiting": {
    "headerLimit": "X-RateLimit-Limit",
    "headerRemaining": "X-RateLimit-Remaining",
    "headerReset": "X-RateLimit-Reset",
    "costPerRequest": 1,
    "resetTimeWindowSeconds": 3600,
    "warningThresholdPercent": 0,
    "rateLimit": 60,
    "resetFormat": "UtcEpochSeconds",
    "whenLimitExceeded": "Custom",
    "customResponseFile": "github-rate-limit-exceeded.json"
  },
  "rate": 50,
  "labelMode": "text",
  "logLevel": "info"
}

In the configuration, we specify the header names, window duration, quota, and the reset value format. We also indicate that, when the app exceeds the quota, the proxy should use a custom behavior rather than throttling.

Then, we specify the custom response in the same format that we use for mocks in Microsoft 365 Developer Proxy:

{
  "responseCode": 403,
  "responseHeaders": {
    "Content-Type": "application/json; charset=utf-8"
  },
  "responseBody": {
    "message": "You have exceeded a secondary rate limit and have been temporarily blocked from content creation. Please retry your request again later.",
    "documentation_url": "https://docs.github.com/rest/overview/resources-in-the-rest-api#secondary-rate-limits"
  }
}

Using this configuration, you can simulate rate limits on the GitHub API.
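To see how the settings above fit together, here is a rough Python model of a fixed-window limiter driven by the same values (rateLimit, costPerRequest, resetTimeWindowSeconds, and a 403 once the quota is exceeded). It is only an illustration of the behavior described in this article, not the proxy's actual implementation, and the class name `FixedWindowLimiter` is made up.

```python
from typing import Dict, Tuple


class FixedWindowLimiter:
    """Toy model of the configured rate limit (not the proxy's real code)."""

    def __init__(self, window_start: float, rate_limit: int = 60,
                 cost_per_request: int = 1,
                 reset_time_window_seconds: int = 3600) -> None:
        self.rate_limit = rate_limit
        self.cost_per_request = cost_per_request
        self.window = reset_time_window_seconds
        self.window_start = window_start
        self.remaining = rate_limit

    def handle(self, now: float) -> Tuple[int, Dict[str, str]]:
        """Return (status, headers) for a request arriving at time `now`."""
        if now - self.window_start >= self.window:
            # The window elapsed: start a new one with a full quota.
            self.window_start = now
            self.remaining = self.rate_limit
        if self.remaining >= self.cost_per_request:
            self.remaining -= self.cost_per_request
            status = 200
        else:
            status = 403  # mirrors the custom response configured above
        headers = {
            "X-RateLimit-Limit": str(self.rate_limit),
            "X-RateLimit-Remaining": str(self.remaining),
            # Reset moment as epoch seconds, like resetFormat UtcEpochSeconds.
            "X-RateLimit-Reset": str(int(self.window_start + self.window)),
        }
        return status, headers


limiter = FixedWindowLimiter(window_start=1000.0)
statuses = [limiter.handle(1000.0 + i)[0] for i in range(61)]
print(statuses.count(200), statuses[-1])
```

In this model, the first 60 requests in the window succeed, and every further request fails until an hour has passed since the window started.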

Microsoft 365 Developer Proxy simulating exceeded rate limit on GitHub APIs

Summary

Cloud API providers use rate limits to ensure fair use of their APIs and handle scale. When using cloud APIs, you need to properly handle rate limit information to get the maximum throughput and offer your customers a great user experience when using your app. Microsoft 365 Developer Proxy helps you simulate rate limits on cloud APIs so that you don’t need to use excessive load to force exceeding cloud API limits.
