BatsResearch/cross-lingual-detox
Code for "Preference Tuning For Toxicity Mitigation Generalizes Across Languages." Paper accepted at Findings of EMNLP 2024.
This project helps large language model (LLM) developers reduce the toxic content their models generate, including in languages other than English. It fine-tunes the LLM on English examples of preferred, less toxic text. The result is a modified LLM that also produces less toxic output in other languages, with no language-specific training, which benefits developers building multilingual AI applications.
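To make the idea of preference tuning concrete, here is a minimal sketch of a pairwise preference loss in the style of Direct Preference Optimization (DPO). The listing above only says "preference tuning", so the choice of DPO, the function name `dpo_loss`, and the `beta` default are illustrative assumptions rather than details taken from the repository. The inputs are summed log-probabilities of a preferred (less toxic) and a rejected (more toxic) continuation under the model being tuned and under a frozen reference copy.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, not taken from the repository
) -> float:
    """DPO-style loss for one (chosen, rejected) pair of continuations.

    Each argument is the summed token log-probability of a continuation
    under either the tuned policy or the frozen reference model.
    """
    # How much more (or less) likely each continuation became after tuning.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    # The loss is -log(sigmoid(margin)), which equals softplus(-margin).
    margin = beta * (chosen_shift - rejected_shift)
    return softplus(-margin)


if __name__ == "__main__":
    # Untuned policy (identical to the reference) versus a policy that has
    # shifted probability toward the less toxic continuation.
    print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
    print(dpo_loss(-4.0, -8.0, -5.0, -5.0))
```

The loss falls as the tuned model raises the likelihood of the preferred continuation relative to the rejected one, which is the signal that pushes generation away from toxic text.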
No commits in the last 6 months.
Use this if you are an LLM developer aiming to reduce toxicity in your models' outputs across multiple languages with a single English-based fine-tuning process.
Not ideal if you are an end-user of an LLM and want to filter out toxic content from an existing model without direct access to its development or fine-tuning process.
Stars: 18
Forks: n/a
Language: Jupyter Notebook
License: BSD-3-Clause
Category:
Last pushed: Mar 25, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/BatsResearch/cross-lingual-detox"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
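The same request as the curl command above can be sketched in Python using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, the path layout (category, then owner, then repository) is inferred from the single URL shown, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.request

# Base path copied from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request and counts against the daily quota.
    print(fetch_quality("nlp", "BatsResearch", "cross-lingual-detox"))
```

Keeping URL construction separate from the network call makes it possible to check the request target without spending any of the daily request allowance.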
Higher-rated alternatives
unitaryai/detoxify
Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built...
kensk8er/chicksexer
A Python package for gender classification.
Infinitode/ValX
ValX is an open-source Python package for text cleaning tasks, including profanity detection and...
PavelOstyakov/toxic
Toxic Comment Classification Challenge
minerva-ml/open-solution-toxic-comments
Open solution to the Toxic Comment Classification Challenge