Elijas/token-throttle

Multi-resource rate limiting for LLM APIs. Reserve tokens before you call, refund what you don't use, stay under the limit across workers.

Score: 40 / 100 (Emerging)

When you make many LLM API calls, especially across applications or in batch jobs, it is easy to hit provider rate limits and see errors or dramatic slowdowns. token-throttle manages those limits by letting you reserve the tokens you expect a call to consume before making it, then refund any unused capacity afterward. This keeps utilization high without exceeding your provider's limits, making it well suited to developers building LLM-powered applications.
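The reserve-then-refund pattern described above can be sketched in a few lines. This is a minimal, hypothetical illustration of the idea, not token-throttle's actual API: the `TokenBudget` class and its `reserve`/`refund` methods are invented names for this example.

```python
import threading


class TokenBudget:
    """Minimal sketch of reserve-before-call / refund-unused rate limiting.

    Hypothetical interface for illustration; not token-throttle's real API.
    """

    def __init__(self, limit: int):
        self.limit = limit        # tokens allowed in the current window
        self.reserved = 0         # tokens currently reserved by in-flight calls
        self._lock = threading.Lock()

    def reserve(self, estimate: int) -> bool:
        """Claim estimated capacity up front; refuse if it would exceed the limit."""
        with self._lock:
            if self.reserved + estimate > self.limit:
                return False
            self.reserved += estimate
            return True

    def refund(self, unused: int) -> None:
        """Return capacity the call did not actually consume."""
        with self._lock:
            self.reserved = max(0, self.reserved - unused)


budget = TokenBudget(limit=1000)
ok = budget.reserve(400)       # reserve an estimate before the LLM call
# ... make the call; suppose actual usage turns out to be 250 tokens ...
budget.refund(400 - 250)       # refund the 150 unused tokens
print(ok, budget.reserved)     # → True 250
```

Reserving pessimistically and refunding afterward means concurrent workers never collectively exceed the limit, while unused headroom is released as soon as each call's true usage is known.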

Use this if you are a developer integrating LLMs into applications and need to manage API rate limits efficiently across multiple concurrent calls or distributed systems.

Not ideal if you are making only occasional, single LLM calls and do not need to optimize for high-volume or concurrent usage.

Topics: LLM API management, API rate limiting, distributed systems, concurrent processing, application development

No package · No dependents
Maintenance 10 / 25
Adoption 6 / 25
Maturity 15 / 25
Community 9 / 25


Stars: 17
Forks: 2
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/agents/Elijas/token-throttle"

Open to everyone: 100 requests/day with no key needed. Get a free API key for 1,000 requests/day.