UKPLab/gpl

Powerful unsupervised domain adaptation method for dense retrieval. Requires only an unlabeled corpus and yields massive improvements. Paper: "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval", https://arxiv.org/abs/2112.07577

52 / 100 (Established)

This tool helps improve how well a search or recommendation system finds relevant information within a specific subject area, even if you don't have existing labeled data for that area. It takes a collection of documents (your 'corpus') and automatically generates training data, then uses that to fine-tune a search model. Data scientists and machine learning engineers who need to build or enhance domain-specific information retrieval systems would find this useful.
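In the GPL paper, the generated training data consists of synthetic queries written for each passage, negative passages mined with existing retrievers, and pseudo labels from a cross-encoder. The dense model is then trained with a MarginMSE loss to reproduce the cross-encoder's score margin between the positive and the negative passage. The sketch below illustrates only that labeling-and-loss step in plain Python. It assumes toy word-overlap scorers (`toy_cross_encoder` and `toy_student`, both hypothetical stand-ins for real neural models) and is not the `gpl` package's API.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]
# (generated query, positive passage, mined negative passage)
Triple = Tuple[str, str, str]
# Same triple plus the teacher's margin label
Labeled = Tuple[str, str, str, float]


def pseudo_label(triples: List[Triple], cross_encoder: Scorer) -> List[Labeled]:
    """Label each triple with the teacher margin CE(q, pos) - CE(q, neg)."""
    return [
        (q, pos, neg, cross_encoder(q, pos) - cross_encoder(q, neg))
        for q, pos, neg in triples
    ]


def margin_mse(labeled: List[Labeled], student: Scorer) -> float:
    """Mean squared error between the student's margin and the teacher's margin."""
    errors = []
    for q, pos, neg, teacher_margin in labeled:
        student_margin = student(q, pos) - student(q, neg)
        errors.append((student_margin - teacher_margin) ** 2)
    return sum(errors) / len(errors)


def toy_cross_encoder(query: str, passage: str) -> float:
    """Hypothetical teacher: number of shared lowercase tokens."""
    return float(len(set(query.lower().split()) & set(passage.lower().split())))


def toy_student(query: str, passage: str) -> float:
    """Hypothetical, deliberately miscalibrated student: half the teacher's score."""
    return 0.5 * toy_cross_encoder(query, passage)


triples = [
    ("what is gpl", "gpl is a domain adaptation method", "the weather is sunny today"),
    ("dense retrieval models", "dense retrieval maps text to vectors", "cooking pasta takes ten minutes"),
]
labeled = pseudo_label(triples, toy_cross_encoder)
loss = margin_mse(labeled, toy_student)
print([round(t[3], 2) for t in labeled], round(loss, 3))
```

Training on margins rather than hard relevant/irrelevant labels lets the student learn how much more relevant the positive passage is, which tolerates noisy generated queries better. In a real setup, the loss would be minimized by gradient descent over the dense encoder's parameters.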

340 stars. No commits in the last 6 months. Available on PyPI.

Use this if you need to adapt a general-purpose search model to perform much better on a specialized collection of documents without manually labeling any queries or passages.

Not ideal if you already have a large, high-quality dataset of domain-specific search queries and relevant document pairs.

information-retrieval search-systems natural-language-processing unsupervised-learning text-analytics
Stale 6m
Maintenance 0 / 25
Adoption 10 / 25
Maturity 25 / 25
Community 17 / 25


Stars: 340
Forks: 38
Language: Python
License: Apache-2.0
Last pushed: Jul 06, 2023
Commits (last 30 days): 0
Dependencies: 4

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/UKPLab/gpl"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.