abhaskumarsinha/Corpus2GPT

Corpus2GPT: A project that lets users train their own GPT models on diverse datasets, including local-language and other specialized corpora, using Keras with a TensorFlow, PyTorch, or JAX backend, and then store or share the resulting models.

Score: 34/100 (Emerging)

This tool helps researchers, data scientists, and linguists train custom GPT-style language models using their own text data, including content in various local languages. You provide your text corpus, and it produces a trained language model ready for use in generating text or other NLP tasks. It's designed for anyone looking to build specialized language models without deep technical overhead.

No commits in the last 6 months.

Use this if you need to train a custom GPT model on your specific datasets, especially if those datasets include diverse languages or unique text types.

Not ideal if you're looking for an out-of-the-box, pre-trained language model for general use without any custom training.

natural-language-processing custom-language-models linguistic-research text-generation multi-language-data
Status: Stale (6 months) · No package · No dependents
Maintenance 0 / 25
Adoption 4 / 25
Maturity 16 / 25
Community 14 / 25


Stars: 7
Forks: 3
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: Oct 18, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/abhaskumarsinha/Corpus2GPT"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
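The example URL above suggests an endpoint pattern of the form `/api/v1/quality/llm-tools/{owner}/{repo}`; that generalization is an assumption based on the single URL shown, as is the guess that the response body is JSON. A minimal stdlib-only sketch for building and fetching such a URL:

```python
import json
import urllib.request

# Base path inferred from the documented example URL.
BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-report URL for a GitHub owner/repo pair
    (assumed pattern: BASE/{owner}/{repo})."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the report, assuming a JSON response.
    Requires network access; subject to the 100 requests/day limit."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # The repository described on this page.
    print(quality_url("abhaskumarsinha", "Corpus2GPT"))
```

`fetch_quality` is not called at import time, so the sketch can be reused without triggering a network request until you actually want the data.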