abhaskumarsinha/Corpus2GPT

Corpus2GPT: A project that lets users train their own GPT models on diverse datasets, including local-language and other specialized corpora, using Keras with a TensorFlow, PyTorch, or JAX backend, and then store or share the resulting models.

Score: 34/100 (Emerging)

This tool helps researchers, data scientists, and linguists train custom GPT-style language models using their own text data, including content in various local languages. You provide your text corpus, and it produces a trained language model ready for use in generating text or other NLP tasks. It's designed for anyone looking to build specialized language models without deep technical overhead.

No commits in the last 6 months.

Use this if you need to train a custom GPT model on your specific datasets, especially if those datasets include diverse languages or unique text types.

Not ideal if you're looking for an out-of-the-box, pre-trained language model for general use without any custom training.

natural-language-processing custom-language-models linguistic-research text-generation multi-language-data
Status: Stale (6 months) · No package · No dependents
Maintenance 0 / 25
Adoption 4 / 25
Maturity 16 / 25
Community 14 / 25


Stars: 7
Forks: 3
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: Oct 18, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/abhaskumarsinha/Corpus2GPT"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
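The example URL above suggests an endpoint pattern of the form `/api/v1/quality/llm-tools/{owner}/{repo}`; that generalization is an assumption based on the single URL shown, as is the guess that the response body is JSON. A minimal stdlib-only sketch for building and fetching such a URL:

```python
import json
import urllib.request

# Base path inferred from the documented example URL.
BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-report URL for a GitHub owner/repo pair
    (assumed pattern: BASE/{owner}/{repo})."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the report, assuming a JSON response.
    Requires network access; subject to the 100 requests/day limit."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # The repository described on this page.
    print(quality_url("abhaskumarsinha", "Corpus2GPT"))
```

`fetch_quality` is not called at import time, so the sketch can be reused without triggering a network request until you actually want the data.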