KarineAyrs/knowledge-distillation-semantic-search
KDSS is a framework for knowledge distillation from LLMs.
This framework helps machine learning practitioners and researchers fine-tune smaller, more efficient language models for semantic search tasks. You provide your specialized documents, and the framework uses the knowledge of large language models (such as OpenAI models or Alpaca) to create training data. The output is a smaller, fine-tuned model (e.g., BERT) that generates embeddings for your documents, making them semantically searchable.
Use this if you need to create a custom semantic search engine for your specific domain and want to leverage the power of large language models to train a smaller, faster model on your data without manual labeling.
Not ideal if you don't have a collection of domain-specific documents or if you require an off-the-shelf, immediately deployable semantic search solution without model training.
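To illustrate the last step of the description above (searching documents by their embeddings), here is a minimal sketch. It assumes the embeddings have already been produced by a fine-tuned model; the three-dimensional vectors, the file names, and the helper names `cosine_similarity` and `rank_documents` are made up for illustration and are not part of KDSS.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings; in practice these would come from the fine-tuned model.
doc_embeddings = {
    "contract_law.txt": [0.9, 0.1, 0.0],
    "tax_guide.txt": [0.2, 0.8, 0.1],
    "hr_policy.txt": [0.0, 0.3, 0.9],
}
query_embedding = [0.8, 0.2, 0.1]  # hypothetical embedding of the user's query

ranking = rank_documents(query_embedding, doc_embeddings)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Real embeddings typically have hundreds of dimensions, and larger collections usually call for a vector index rather than a linear scan, but the ranking principle is the same.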
Stars: 12
Forks: 2
Language: Python
License: MIT
Category:
Last pushed: Nov 05, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/KarineAyrs/knowledge-distillation-semantic-search"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
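The same request can be made from Python. The sketch below uses only the standard library and rebuilds the URL shown in the curl command. Two things are assumptions rather than documented behavior: that the path segments are a category (`nlp`), an owner, and a repository name, and that the endpoint returns JSON. The function names are illustrative.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL; the category/owner/repo layout is inferred from the curl example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the listing data. Assumes the response body is JSON (not confirmed by the page)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_quality_url("nlp", "KarineAyrs", "knowledge-distillation-semantic-search")
print(url)
# To perform the request (counts against the daily limit):
# data = fetch_quality("nlp", "KarineAyrs", "knowledge-distillation-semantic-search")
```

The network call is left commented out so that running the snippet does not consume any of the daily request allowance.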
Higher-rated alternatives
airaria/TextBrewer
A PyTorch-based knowledge distillation toolkit for natural language processing
sunyilgdx/NSP-BERT
The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original...
princeton-nlp/CoFiPruning
[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
kssteven418/LTP
[KDD'22] Learned Token Pruning for Transformers
georgian-io/Transformers-Domain-Adaptation
[DEPRECATED] Adapt Transformer-based language models to new text domains