phospho-app/fastassert

Dockerized LLM inference server with constrained output (JSON mode), built on top of vLLM and outlines. Faster, cheaper, and free of rate limits. Compare the quality and latency against your current LLM API provider.

15 / 100 (Experimental)

This project helps developers and MLOps engineers deploy and manage Large Language Models (LLMs) more efficiently. It takes text prompts and desired JSON or regex output formats as input, and provides faster, more cost-effective, and rate-limit-free LLM responses. It's designed for technical teams who integrate LLMs into applications and need reliable, structured outputs.
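The constrained-output idea that outlines implements at the logits level can be shown with a toy sketch: at every decoding step, mask out any token that could not extend to a valid output, then sample only from what remains. Everything below is a hypothetical pure-Python illustration of the technique, not fastassert's or outlines' actual API; the character-level "vocabulary" and mock scoring function stand in for a real tokenizer and model.

```python
def allowed_tokens(prefix: str, vocab: list[str], targets: list[str]) -> list[str]:
    """Tokens that keep the output a prefix of some valid completion."""
    return [t for t in vocab if any(s.startswith(prefix + t) for s in targets)]

def constrained_greedy(score, vocab: list[str], targets: list[str]) -> str:
    """Greedy decoding, restricted at each step to constraint-allowed tokens."""
    out = ""
    while out not in targets:
        candidates = allowed_tokens(out, vocab, targets)
        # pick the highest-scoring token among those the constraint permits
        out += max(candidates, key=lambda t: score(out, t))
    return out

# Mock "model": strongly prefers 'y'; the constraint does the rest.
score = lambda prefix, tok: {"y": 2.0, "e": 1.0, "s": 1.0}.get(tok, 0.0)
vocab = list("abcdefghijklmnopqrstuvwxyz")
print(constrained_greedy(score, vocab, targets=["yes", "no"]))  # -> yes
```

Real implementations apply the same mask to the model's logits over a full token vocabulary, with the valid-prefix check compiled from a regex or JSON schema into an automaton rather than a list of target strings.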

No commits in the last 6 months.

Use this if you are a developer or MLOps engineer looking to self-host LLM inference, guarantee structured JSON or regex outputs, and reduce costs and latency compared to commercial LLM APIs.

Not ideal if you are a non-technical user or lack the infrastructure (Linux, CUDA 12.1, and at least 16 GB of GPU RAM) to run a local inference server.

LLM deployment · MLOps · API integration · backend development · AI infrastructure
No License · Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 8 / 25
Community 0 / 25


Stars: 27
Forks:
Language: Jupyter Notebook
License: none
Last pushed: Feb 17, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/phospho-app/fastassert"

Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.