Elijas/token-throttle

Multi-resource rate limiting for LLM APIs. Reserve tokens before you call, refund what you don't use, stay under the limit across workers.

Score: 40 / 100 (Emerging)

When you make many LLM API calls, especially across applications or in batch jobs, it is easy to hit provider rate limits and see errors or dramatic slowdowns. token-throttle manages those limits by letting you reserve the tokens you expect a call to consume before making it, then refund any unused capacity afterward. This keeps utilization high without exceeding your provider's limits, making it well suited to developers building LLM-powered applications.
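The reserve-then-refund pattern described above can be sketched in a few lines. This is a minimal, hypothetical illustration of the idea, not token-throttle's actual API: the `TokenBudget` class and its `reserve`/`refund` methods are invented names for this example.

```python
import threading


class TokenBudget:
    """Minimal sketch of reserve-before-call / refund-unused rate limiting.

    Hypothetical interface for illustration; not token-throttle's real API.
    """

    def __init__(self, limit: int):
        self.limit = limit        # tokens allowed in the current window
        self.reserved = 0         # tokens currently reserved by in-flight calls
        self._lock = threading.Lock()

    def reserve(self, estimate: int) -> bool:
        """Claim estimated capacity up front; refuse if it would exceed the limit."""
        with self._lock:
            if self.reserved + estimate > self.limit:
                return False
            self.reserved += estimate
            return True

    def refund(self, unused: int) -> None:
        """Return capacity the call did not actually consume."""
        with self._lock:
            self.reserved = max(0, self.reserved - unused)


budget = TokenBudget(limit=1000)
ok = budget.reserve(400)       # reserve an estimate before the LLM call
# ... make the call; suppose actual usage turns out to be 250 tokens ...
budget.refund(400 - 250)       # refund the 150 unused tokens
print(ok, budget.reserved)     # → True 250
```

Reserving pessimistically and refunding afterward means concurrent workers never collectively exceed the limit, while unused headroom is released as soon as each call's true usage is known.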

Use this if you are a developer integrating LLMs into applications and need to manage API rate limits efficiently across multiple concurrent calls or distributed systems.

Not ideal if you are making only occasional, single LLM calls and do not need to optimize for high-volume or concurrent usage.

Topics: LLM API management, API rate limiting, distributed systems, concurrent processing, application development

No package · No dependents
Maintenance 10 / 25
Adoption 6 / 25
Maturity 15 / 25
Community 9 / 25


Stars: 17
Forks: 2
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/agents/Elijas/token-throttle"

Open to everyone: 100 requests/day with no key needed. Get a free API key for 1,000 requests/day.