UKPLab/gpl

Powerful unsupervised domain adaptation method for dense retrieval. Requires only an unlabeled corpus and yields massive improvements: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval" https://arxiv.org/abs/2112.07577

/ 100

Established

This tool helps improve how well a search or recommendation system finds relevant information within a specific subject area, even if you don't have existing labeled data for that area. It takes a collection of documents (your 'corpus') and automatically generates training data, then uses that to fine-tune a search model. Data scientists and machine learning engineers who need to build or enhance domain-specific information retrieval systems would find this useful.
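
To make the "automatically generates training data" step concrete, here is a minimal sketch of a GPL-style pseudo-labeling loop. It assumes the general recipe described in the GPL paper: generate a query per passage, mine a hard negative, and score the pair with a stronger model to obtain a margin label. Every helper below (`generate_query`, `overlap_score`, `mine_negative`, `build_pseudo_labels`) is a hypothetical toy stand-in written for illustration and is not part of the gpl package; a real setup would use neural models in their place.

```python
# Toy sketch of a GPL-style pseudo-labeling loop (standard library only).
# All helpers are hypothetical stand-ins, not functions from the gpl package.

def tokens(text):
    """Lowercased set of whitespace-separated tokens."""
    return set(text.lower().split())

def generate_query(passage):
    # Stand-in for a learned query generator: reuse the first three words.
    return " ".join(passage.split()[:3])

def overlap_score(query, passage):
    # Stand-in for a cross-encoder: fraction of query tokens found in the passage.
    q = tokens(query)
    return len(q & tokens(passage)) / len(q) if q else 0.0

def mine_negative(query, positive, corpus):
    # Stand-in for retriever-based negative mining:
    # pick the best-scoring passage other than the positive one.
    candidates = [p for p in corpus if p != positive]
    return max(candidates, key=lambda p: overlap_score(query, p))

def build_pseudo_labels(corpus):
    """Return (query, positive, negative, margin) tuples for every passage."""
    examples = []
    for positive in corpus:
        query = generate_query(positive)
        negative = mine_negative(query, positive, corpus)
        # The margin between positive and negative scores is the pseudo label.
        margin = overlap_score(query, positive) - overlap_score(query, negative)
        examples.append((query, positive, negative, margin))
    return examples

corpus = [
    "Index funds track a market benchmark at low cost",
    "Index cards are useful for organizing study notes",
    "Bond yields fall when bond prices rise",
]
pseudo_labels = build_pseudo_labels(corpus)
for query, positive, negative, margin in pseudo_labels:
    print(f"{query!r}: margin={margin:.2f}")
```

In this sketch the resulting tuples would then serve as training targets for the search model, which is the fine-tuning step mentioned above. No manual labels appear anywhere in the loop; the only input is the corpus itself.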

340 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to adapt a general-purpose search model to perform much better on a specialized collection of documents without manually labeling any queries or passages.

Not ideal if you already have a large, high-quality dataset of domain-specific search queries and relevant document pairs.

information-retrieval search-systems natural-language-processing unsupervised-learning text-analytics

Stale 6m

Maintenance 0 / 25

Adoption 10 / 25

Maturity 25 / 25

Community 17 / 25

Stars 340

Forks

Language Python

License Apache-2.0

Related models

galilai-group/stable-pretraining

Reliable, minimal and scalable library for pretraining foundation and world models

CognitiveAISystems/MAPF-GPT

[AAAI-2025] This repository contains MAPF-GPT, a deep learning-based model for solving MAPF...

larslorch/avici

Amortized Inference for Causal Structure Learning, NeurIPS 2022

svdrecbd/mhc-mlx

MLX + Metal implementation of mHC: Manifold-Constrained Hyper-Connections by DeepSeek-AI.

kyegomez/MHMoE

Community Implementation of the paper: "Multi-Head Mixture-of-Experts" In PyTorch
