UKPLab/gpl
Unsupervised domain adaptation method for dense retrieval. It requires only an unlabeled corpus and yields large improvements over the unadapted model: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577
This tool helps improve how well a search or recommendation system finds relevant information within a specific subject area, even if you don't have existing labeled data for that area. It takes a collection of documents (your 'corpus') and automatically generates training data, then uses that to fine-tune a search model. Data scientists and machine learning engineers who need to build or enhance domain-specific information retrieval systems would find this useful.
340 stars. No commits in the last 6 months. Available on PyPI.
Use this if you need to adapt a general-purpose search model to perform much better on a specialized collection of documents without manually labeling any queries or passages.
Not ideal if you already have a large, high-quality dataset of domain-specific search queries and relevant document pairs.
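The workflow described above (take an unlabeled corpus, generate training data from it automatically, then fine-tune on that data) can be illustrated with a toy sketch. This is not the gpl package's API: `token_overlap`, `first_words_query`, and `build_pseudo_labeled_data` are hypothetical stand-ins written for this illustration, and simple word overlap replaces the neural models that a real pipeline would use for query generation, negative mining, and scoring. The fine-tuning step itself is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TrainingExample:
    query: str          # synthetic query generated from the positive passage
    positive_id: str    # passage the query was generated from
    negative_id: str    # highest-scoring other passage (a "hard" negative)
    margin: float       # score(query, positive) - score(query, negative)


def token_overlap(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query words found in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


def first_words_query(passage: str, n_words: int = 3) -> str:
    """Toy query generator: reuse the first few words of the passage."""
    return " ".join(passage.split()[:n_words])


def build_pseudo_labeled_data(
    corpus: Dict[str, str],
    generate_query: Callable[[str], str] = first_words_query,
    score: Callable[[str, str], float] = token_overlap,
) -> List[TrainingExample]:
    """Turn an unlabeled corpus into (query, positive, negative, margin) tuples."""
    examples: List[TrainingExample] = []
    for pos_id, passage in corpus.items():
        query = generate_query(passage)
        candidates = [pid for pid in corpus if pid != pos_id]
        if not candidates:
            continue  # a negative needs at least one other passage
        # Pick the other passage that looks most relevant to the query.
        neg_id = max(candidates, key=lambda pid: score(query, corpus[pid]))
        margin = score(query, passage) - score(query, corpus[neg_id])
        examples.append(TrainingExample(query, pos_id, neg_id, margin))
    return examples


corpus = {
    "d1": "insulin regulates blood glucose levels",
    "d2": "blood pressure rises with stress",
    "d3": "transformers use attention layers",
}
examples = build_pseudo_labeled_data(corpus)
for ex in examples:
    print(ex)
```

Each resulting tuple carries a soft label (the margin) rather than a hard relevant/irrelevant flag, so no manual annotation is needed. In this toy version the same overlap score both selects the negative and produces the label; that shortcut is an assumption made to keep the sketch dependency-free.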
Stars: 340
Forks: 38
Language: Python
License: Apache-2.0
Category: (none listed)
Last pushed: Jul 06, 2023
Commits (30d): 0
Dependencies: 4
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/UKPLab/gpl"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
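The same request can be made from Python using only the standard library. This is a minimal sketch: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the split of the path into a category segment and a repository segment is inferred from the single URL shown above, and the response body is returned as raw text because its format is not documented here.

```python
import urllib.request

# Base path taken from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL, e.g. category='transformers', repo='UKPLab/gpl'."""
    return f"{API_BASE}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the raw response body as text (no API key is sent)."""
    url = build_quality_url(category, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `fetch_quality("transformers", "UKPLab/gpl")` issues the same GET request as the curl command and counts against the keyless daily limit.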
Related models
galilai-group/stable-pretraining
Reliable, minimal and scalable library for pretraining foundation and world models
CognitiveAISystems/MAPF-GPT
[AAAI-2025] This repository contains MAPF-GPT, a deep learning-based model for solving MAPF...
larslorch/avici
Amortized Inference for Causal Structure Learning, NeurIPS 2022
svdrecbd/mhc-mlx
MLX + Metal implementation of mHC: Manifold-Constrained Hyper-Connections by DeepSeek-AI.
kyegomez/MHMoE
Community Implementation of the paper: "Multi-Head Mixture-of-Experts" In PyTorch