s-nlp/parallel_detoxification_dataset

Data from the paper "Crowdsourcing of Parallel Corpora: the Case of Style Transfer for Detoxification".


Experimental

This dataset helps content moderators, online community managers, and social media platforms build systems that rephrase toxic online comments into civil language. It provides pairs of original toxic sentences and their human-written detoxified paraphrases, which can be used to train models that automatically rewrite harmful user-generated content into acceptable text.
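Because the data is parallel, every record pairs one toxic sentence with one detoxified rewrite. The sketch below reads such pairs from a tab-separated file. The file layout and the column names `toxic` and `neutral` are assumptions made for illustration, and `load_pairs` is a hypothetical helper. Check the repository for the actual file names and headers before using it.

```python
import csv
import io
from typing import List, Tuple

# Hypothetical layout (an assumption, not the repository's confirmed format):
# a TSV file with a header row and the columns "toxic" and "neutral".
SAMPLE_TSV = "toxic\tneutral\nthis is a stupid idea\tthis is not a good idea\n"


def load_pairs(handle) -> List[Tuple[str, str]]:
    """Read (toxic, detoxified) sentence pairs from a TSV file handle."""
    # QUOTE_NONE keeps quotation marks inside comments as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    pairs = []
    for row in reader:
        toxic = (row.get("toxic") or "").strip()
        neutral = (row.get("neutral") or "").strip()
        if toxic and neutral:  # skip rows where either side is empty
            pairs.append((toxic, neutral))
    return pairs


pairs = load_pairs(io.StringIO(SAMPLE_TSV))
print(pairs)
```

Reading through `csv.DictReader` rather than splitting lines by hand means a reordered column layout still works, as long as the header names match.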

No commits in the last 6 months.

Use this if you need to build or evaluate systems that automatically detect and rewrite toxic user comments into neutral, polite versions.

Not ideal if you're looking for a tool that performs detoxification directly: this is a dataset for training models, not a ready-to-use application.
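The human rewrites can serve as references for evaluating a detoxification system. The sketch below scores system outputs with a simple token-level F1 against those references. This metric is an illustrative assumption, not the evaluation used in the paper. It measures only content overlap. A full evaluation of style transfer typically also checks style (whether the output is still toxic) and fluency.

```python
from collections import Counter
from typing import List


def unigram_f1(hypothesis: str, reference: str) -> float:
    """Token-level F1 between a system rewrite and a human rewrite."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not hyp or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def corpus_score(hypotheses: List[str], references: List[str]) -> float:
    """Mean unigram F1 over aligned system outputs and human references."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must be aligned")
    if not hypotheses:
        return 0.0
    total = sum(unigram_f1(h, r) for h, r in zip(hypotheses, references))
    return total / len(hypotheses)


print(corpus_score(["this is not a good idea"], ["this is not a good idea"]))
```

A system that copies its toxic input unchanged can still score well on overlap alone, which is why a toxicity check should accompany this kind of content score.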

content-moderation online-safety natural-language-processing community-management social-media-management

No License · Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 8 / 25

Community 10 / 25



Higher-rated alternatives

unitaryai/detoxify

Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built...

kensk8er/chicksexer

A Python package for gender classification.

Infinitode/ValX

ValX is an open-source Python package for text cleaning tasks, including profanity detection and...
