franciellevargas/HateBR

HateBR is the first large-scale, expert-annotated dataset of Brazilian Instagram comments for hate speech and offensive language detection on the web and social media.

46 / 100 (Emerging)

HateBR is a dataset for identifying hate speech and offensive language in Brazilian Portuguese social media comments. It consists of Instagram comments, primarily directed at politicians, each labeled as either offensive or non-offensive. The labeled comments can be used to train and evaluate automated systems for content moderation or social listening. It is suited to social media analysts, content moderation teams, and researchers studying online communication in Brazil.

Use this if you need to build or evaluate a system for automatically detecting offensive language or hate speech in Brazilian Portuguese social media content.

Not ideal if your focus is on a language other than Brazilian Portuguese, or if you need to analyze content from platforms other than Instagram.
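As a rough illustration of the evaluation use case, the sketch below scores a binary offensive/non-offensive classifier by precision, recall, and F1 on the offensive class. The encoding (1 = offensive, 0 = non-offensive) and the function name `binary_f1` are assumptions made for this example, and the label lists are placeholders rather than HateBR data.

```python
from typing import Dict, Sequence


def binary_f1(gold: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall and F1 for the positive class (assumed: 1 = offensive)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Count true positives, false positives and false negatives.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Placeholder labels for illustration only; not taken from HateBR.
gold = [1, 0, 1, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]
scores = binary_f1(gold, pred)
print(scores)
```

In practice `gold` would come from the dataset's annotations and `pred` from the system under evaluation; how the labels are stored in the repository's files should be checked against the repository itself.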

content-moderation social-listening online-safety Portuguese-language-processing social-media-analysis
No Package · No Dependents
Maintenance 6 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 16 / 25


Stars: 45
Forks: 8
Language: (not listed)
License: (not listed)
Last pushed: Jan 05, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/franciellevargas/HateBR"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
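The same request can be made from Python with only the standard library. This is a minimal sketch: the helper names `quality_url` and `fetch_quality` are hypothetical, and it assumes the endpoint returns a JSON body, which the page does not state. No response fields are assumed.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for an owner/repo pair (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming it is JSON."""
    with urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("franciellevargas", "HateBR")
# print(json.dumps(data, indent=2))
```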