Features
Ratelimit Types
Ratelimit AI provides three types of rate limits specifically designed for LLM API usage. You can use these limits individually or combine them based on your needs.
Requests Per Minute (RPM)
The number of requests allowed per minute. This limit is useful for:
- Controlling the number of requests made to the LLM API in a short period
- Preventing API throttling
- Managing concurrent request load
The RPM limit is checked before each request. If a user exceeds the limit in a minute, subsequent requests will be rate limited until the minute window resets.
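The check described above can be pictured as a fixed-window counter. The following is a minimal in-memory sketch, not Ratelimit AI's implementation: the class name `FixedWindowLimiter`, its fields, and the choice of a fixed (rather than sliding) window are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FixedWindowLimiter:
    """Hypothetical in-memory fixed-window counter, for illustration only."""

    limit: int                     # requests allowed per window
    window_seconds: int = 60       # one-minute window for RPM
    _counts: dict = field(default_factory=dict)

    def allow(self, identifier: str, now: float) -> bool:
        # Every timestamp inside the same minute maps to the same window index.
        window = int(now // self.window_seconds)
        key = (identifier, window)
        used = self._counts.get(key, 0)
        if used >= self.limit:
            return False           # rejected until the window rolls over
        self._counts[key] = used + 1
        return True


limiter = FixedWindowLimiter(limit=2)
print(limiter.allow("user-1", now=0.0))    # True
print(limiter.allow("user-1", now=10.0))   # True
print(limiter.allow("user-1", now=20.0))   # False
print(limiter.allow("user-1", now=60.0))   # True (a new window has started)
```

The sketch never evicts old windows, so a real deployment would need expiring keys in a shared store rather than a process-local dictionary.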
Requests Per Day (RPD)
The number of requests allowed per day. This limit is useful for:
- Implementing daily usage quotas
- Setting up tiered access levels
The RPD counter resets at midnight UTC, making it ideal for implementing quotas like “1,000 requests per day per API key”.
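To make the midnight-UTC reset concrete, here is a small sketch of how a per-day counter key and the time remaining until the reset could be computed. The function names and the `rpd:` key format are hypothetical and are not taken from the library.

```python
from datetime import datetime, timedelta, timezone


def daily_window_key(identifier: str, now: datetime) -> str:
    """Counter key that changes at midnight UTC, so each day starts from zero."""
    day = now.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return f"rpd:{identifier}:{day}"   # hypothetical key layout


def seconds_until_reset(now: datetime) -> float:
    """Seconds remaining until the next midnight UTC."""
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now_utc).total_seconds()


now = datetime(2024, 1, 1, 23, 0, tzinfo=timezone.utc)
print(daily_window_key("api-key-1", now))  # rpd:api-key-1:2024-01-01
print(seconds_until_reset(now))            # 3600.0
```

Pass timezone-aware datetimes; a caller in another timezone still sees the counter roll over at 00:00 UTC, not at local midnight.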
Tokens Per Minute (TPM)
A specialized limit that tracks total token usage per minute, which is essential for LLM API cost management:
- Counts both input (prompt) and output (completion) tokens
- Provides more granular cost control than request-based limits
- Aligns with LLM provider pricing models
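As an illustration of token-based accounting, the sketch below adds prompt and completion tokens to a per-minute total. It is an in-memory approximation under assumed names (`TokenWindowLimiter`, `record`, `remaining`); how Ratelimit AI itself stores counts, or estimates tokens before a response arrives, is not described here.

```python
class TokenWindowLimiter:
    """Hypothetical per-minute token budget, for illustration only."""

    def __init__(self, tokens_per_minute: int):
        self.tokens_per_minute = tokens_per_minute
        self._used = {}  # (identifier, minute index) -> tokens consumed

    def _key(self, identifier: str, now: float):
        return (identifier, int(now // 60))

    def remaining(self, identifier: str, now: float) -> int:
        used = self._used.get(self._key(identifier, now), 0)
        return max(0, self.tokens_per_minute - used)

    def record(self, identifier: str, prompt_tokens: int,
               completion_tokens: int, now: float) -> None:
        # Input and output tokens both count against the same budget.
        key = self._key(identifier, now)
        self._used[key] = self._used.get(key, 0) + prompt_tokens + completion_tokens


tpm = TokenWindowLimiter(tokens_per_minute=1000)
tpm.record("user-1", prompt_tokens=300, completion_tokens=500, now=5.0)
print(tpm.remaining("user-1", now=30.0))   # 200
print(tpm.remaining("user-1", now=61.0))   # 1000 (next minute)
```

Two requests of very different sizes consume the budget in proportion to their tokens, which is what a request-count limit cannot express.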
Request Scheduling
When rate limits are hit, Ratelimit AI can automatically schedule requests for later processing.
This feature requires the QSTASH_TOKEN environment variable to be set.
When a request hits the rate limit:
- The request is automatically scheduled in QStash
- QStash waits until the rate limit resets
- The request is executed: the prompt is sent to the LLM API, and the response (completion) is delivered to your callback URL
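The flow above can be sketched with an in-memory queue standing in for QStash. Nothing below uses the QStash API: `DeferredQueue`, `seconds_until_window_reset`, and the `call_llm` and `deliver` callables are hypothetical placeholders that only mirror the three steps (schedule, wait for the reset, execute and deliver to a callback).

```python
import heapq
from typing import Callable


def seconds_until_window_reset(now: float, window_seconds: int = 60) -> float:
    """Time left in the current fixed window (assumes windows start at multiples of the size)."""
    return window_seconds - (now % window_seconds)


class DeferredQueue:
    """In-memory stand-in for a delayed-delivery service; not the QStash API."""

    def __init__(self) -> None:
        self._heap = []
        self._seq = 0  # tie-breaker so equal run times keep insertion order

    def schedule(self, run_at: float, prompt: str) -> None:
        heapq.heappush(self._heap, (run_at, self._seq, prompt))
        self._seq += 1

    def run_due(self, now: float, call_llm: Callable[[str], str],
                deliver: Callable[[str], None]) -> int:
        """Execute every request whose wait is over and pass the result to the callback."""
        ran = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, prompt = heapq.heappop(self._heap)
            deliver(call_llm(prompt))
            ran += 1
        return ran


queue = DeferredQueue()
now = 45.0                                   # request arrives 45 s into the window
queue.schedule(now + seconds_until_window_reset(now), "hello")

received = []
queue.run_due(59.0, call_llm=str.upper, deliver=received.append)  # too early, nothing runs
queue.run_due(60.0, call_llm=str.upper, deliver=received.append)  # window has reset
print(received)                              # ['HELLO']
```

Here `str.upper` plays the role of the LLM call and `received.append` the role of the callback URL, so the example runs without any network access.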
Analytics
RatelimitAI can collect analytics about your rate limit usage. Analytics tracking is disabled by default and can be enabled during initialization.
When analytics is enabled, RatelimitAI will collect information about the number of requests made, rate limit successes, and failures. This data can be viewed in the Upstash Console.
Dashboard
The Upstash Console provides a Rate Limit Analytics dashboard where you can monitor your usage. Access it by clicking the three-dot menu on your Redis database page and selecting Rate Limit Analytics.
The dashboard displays three main categories of requests: allowed requests, which are successful API calls; rate-limited requests, which show the identifiers that hit a limit; and denied requests, which are blocked API calls. You can view this data over time and see usage patterns for each rate limit type.
If you’ve configured RatelimitAI with a custom prefix, enter the same prefix in the dashboard’s top left corner to filter your analytics data.
For each rate-limited request, the analytics system records the identifier, timestamp, limit type (RPM/RPD/TPM), and status. For token-based limits, it also tracks the number of tokens used. This information helps you understand your API usage patterns and optimize your rate limit configurations.
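The recorded fields listed above suggest a record shape along the following lines. This is only a sketch: the `RateLimitEvent` class, its field names, and the status labels are assumptions, not the schema Ratelimit AI actually stores.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RateLimitEvent:
    """Hypothetical analytics record; field names are illustrative."""

    identifier: str
    timestamp: float
    limit_type: str                 # "RPM", "RPD", or "TPM"
    status: str                     # assumed labels: "allowed", "rate_limited", "denied"
    tokens: Optional[int] = None    # only meaningful for token-based limits


def summarize(events: Iterable[RateLimitEvent]) -> Counter:
    """Count events per (limit type, status) pair, similar to a dashboard breakdown."""
    return Counter((e.limit_type, e.status) for e in events)


events = [
    RateLimitEvent("user-1", 0.0, "RPM", "allowed"),
    RateLimitEvent("user-1", 1.0, "RPM", "rate_limited"),
    RateLimitEvent("user-2", 2.0, "TPM", "rate_limited", tokens=1200),
]
summary = summarize(events)
print(summary[("RPM", "rate_limited")])   # 1
print(summary[("TPM", "rate_limited")])   # 1
```

Grouping by limit type and status is one way to see which limit is triggered most often before adjusting its configuration.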