gambolputty/textstelle
Textstelle is a collection of corpora for the creation of bots and other things that generate text
This project provides organized collections of text, primarily in German, that you can use as source material for creating text-generating bots or other automated writing tools. It takes raw text from various sources and offers it in a ready-to-use format. Writers, artists, or researchers interested in computational creativity or natural language generation, particularly with German text, would find this useful.
No commits in the last 6 months.
Use this if you need diverse German text datasets to train a chatbot, an AI writer, or any application that generates original text based on existing patterns.
Not ideal if you're looking for datasets for tasks like sentiment analysis, machine translation, or text classification, or if your primary language isn't German.
Stars: 21
Forks: 3
Language: —
License: —
Category:
Last pushed: Oct 19, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/gambolputty/textstelle"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
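The same request can be made from a script. The following is a minimal sketch using only the Python standard library. It assumes the endpoint returns JSON and that the path follows the category/owner/repository pattern seen in the curl URL above; neither is confirmed here. The helper names `build_quality_url` and `fetch_quality` are hypothetical and exist only for illustration.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL, percent-encoding each path segment."""
    segments = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the response body is JSON. If it is not, json.loads
    raises a ValueError and the raw body should be inspected instead.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request against the URL from the curl example.
    data = fetch_quality("nlp", "gambolputty", "textstelle")
    print(json.dumps(data, indent=2, ensure_ascii=False))
```

No API key is sent in this sketch, so it falls under the keyless daily limit stated above.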
Higher-rated alternatives
Helsinki-NLP/OpusFilter
OpusFilter - Parallel corpus processing toolkit
natasha/corus
Links to Russian corpora + Python functions for loading and parsing
SergeyShk/ruTS
Library for extracting statistics from Russian-language texts.
darija-open-dataset/dataset
darija <-> english dataset
omicsNLP/Auto-CORPus
Auto-CORPus pipeline developed by a University of Nottingham and Imperial College London...