Langfuse and LangKit

These tools are complementary: LangKit extracts monitoring signals (text quality, safety metrics) from LLM inputs and outputs, which Langfuse can ingest and visualize within its broader observability platform.
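A minimal sketch of how the two could be wired together. The function names and the score-record shape below are illustrative assumptions, not the actual LangKit or Langfuse APIs: LangKit-style signals are computed per prompt/response pair, then attached to the matching trace as numeric scores that a Langfuse-style dashboard can aggregate.

```python
# Illustrative only: shows the shape of a "metrics extractor feeds
# observability scores" pipeline, not the real LangKit/Langfuse APIs.

def extract_signals(prompt: str, response: str) -> dict:
    """Compute simple monitoring signals for one LLM interaction."""
    flagged = {"password", "ssn", "credit card"}
    return {
        "prompt_length": len(prompt),
        "response_length": len(response),
        "response_word_count": len(response.split()),
        # Count of sensitive terms in the response (toy safety signal).
        "flagged_terms": sum(term in response.lower() for term in flagged),
    }

def to_score_records(trace_id: str, signals: dict) -> list[dict]:
    """Convert signals into per-trace score records that an
    observability platform could ingest and chart over time."""
    return [{"trace_id": trace_id, "name": name, "value": float(value)}
            for name, value in signals.items()]

records = to_score_records(
    "trace-123",
    extract_signals("What is my balance?", "Your balance is $40."),
)
```

In practice the extraction step would run in the application's logging path (or as a batch job over stored traces), with one score record per metric per trace.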

|                | langfuse        | langkit                                    |
|----------------|-----------------|--------------------------------------------|
| Overall score  | 82 (Verified)   | 43 (Emerging)                              |
| Maintenance    | 22/25           | 0/25                                       |
| Adoption       | 15/25           | 10/25                                      |
| Maturity       | 25/25           | 16/25                                      |
| Community      | 20/25           | 17/25                                      |
| Stars          | 23,106          | 976                                        |
| Forks          | 2,333           | 70                                         |
| Downloads      | —               | —                                          |
| Commits (30d)  | 252             | 0                                          |
| Language       | TypeScript      | Jupyter Notebook                           |
| License        | —               | Apache-2.0                                 |
| Risk flags     | None            | Stale (6 months), no package, no dependents |

About langfuse

langfuse/langfuse

🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

This platform helps AI application developers build, test, and improve their large language model (LLM)-powered products. It collects data from your LLM application's usage and provides tools for debugging, evaluating performance, and managing prompts. Its end users are developers, machine learning engineers, and product managers working on AI applications.
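As an illustration of the kind of record such a platform ingests, one LLM call might be captured as a generation event with its model, token usage, and latency. The field names here are assumptions for the sketch, not Langfuse's actual schema:

```python
from dataclasses import dataclass, asdict

# Hypothetical trace event; real observability platforms such as
# Langfuse define their own (richer) schemas.
@dataclass
class GenerationEvent:
    trace_id: str
    model: str
    input: str
    output: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

event = GenerationEvent(
    trace_id="trace-123",
    model="gpt-4o-mini",
    input="Summarize the release notes.",
    output="The release adds prompt versioning and faster evals.",
    prompt_tokens=12,
    completion_tokens=11,
    latency_ms=340.5,
)
payload = asdict(event)  # what a client SDK might serialize and send
```

Capturing usage and latency per call is what enables the debugging, cost-tracking, and evaluation views described above.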

AI-application-development LLM-observability prompt-engineering AI-testing machine-learning-operations

About langkit

whylabs/langkit

🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring safety & security. 🛡️ Features include text quality, relevance metrics, & sentiment analysis. 📊 A comprehensive tool for LLM observability. 👀

This toolkit helps data scientists and ML engineers proactively monitor the behavior of their language models, including LLMs, in production. It takes the text prompts and responses from your model and extracts signals such as text quality, relevance, sentiment, and potential security risks. The output is a set of metrics that shows how your language model is performing and interacting with users.
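For a concrete sense of what such a signal looks like, here is a toy relevance metric: plain token overlap between prompt and response. LangKit's actual relevance metrics are more sophisticated (e.g. embedding similarity), so treat this only as an analogy:

```python
def token_overlap_relevance(prompt: str, response: str) -> float:
    """Toy relevance proxy: fraction of distinct prompt tokens that
    reappear in the response (0.0 = no overlap, 1.0 = full overlap)."""
    prompt_tokens = set(prompt.lower().split())
    response_tokens = set(response.lower().split())
    if not prompt_tokens:
        return 0.0
    return len(prompt_tokens & response_tokens) / len(prompt_tokens)

# High overlap: the response reuses the prompt's vocabulary.
on_topic = token_overlap_relevance("describe the solar system",
                                   "the solar system has eight planets")
# Low overlap: the response ignores the prompt.
off_topic = token_overlap_relevance("describe the solar system",
                                    "i like pancakes")
```

Computed on every prompt/response pair, a score like this can be tracked over time so that a sudden drop flags off-topic or degraded model behavior.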

LLM-observability AI-safety production-monitoring model-governance NLP-metrics

Scores updated daily from GitHub, PyPI, and npm data.