abhaskumarsinha/Corpus2GPT
Corpus2GPT: A project that lets users train their own GPT models on diverse datasets, including local languages and varied corpus types. Built on Keras, it works with TensorFlow, PyTorch, or JAX backends, and trained models can be stored or shared.
This tool helps researchers, data scientists, and linguists train custom GPT-style language models on their own text data, including content in various local languages. You provide a text corpus, and it produces a trained language model ready for text generation or other NLP tasks. It's designed for anyone who wants to build a specialized language model without deep technical overhead.
No commits in the last 6 months.
Use this if you need to train a custom GPT model on your specific datasets, especially if those datasets include diverse languages or unique text types.
Not ideal if you're looking for an out-of-the-box, pre-trained language model for general use without any custom training.
Stars
7
Forks
3
Language
Jupyter Notebook
License
Apache-2.0
Category
Last pushed
Oct 18, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/abhaskumarsinha/Corpus2GPT"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
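The endpoint above can also be queried programmatically. A minimal Python sketch follows; the URL path comes from the curl example, but the `build_url` and `fetch_quality` helper names are my own, and the JSON response schema is assumed rather than documented here.

```python
import json
from urllib.request import urlopen

# Base path taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def build_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given GitHub owner/repo pair."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record and parse it as JSON.

    The response shape is an assumption; inspect the returned dict
    before relying on specific keys.
    """
    with urlopen(build_url(owner, repo)) as resp:
        return json.load(resp)

# Usage (performs a network request, subject to the 100 requests/day limit):
#   data = fetch_quality("abhaskumarsinha", "Corpus2GPT")
```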
Higher-rated alternatives
Nixtla/nixtla
TimeGPT-1: production ready pre-trained Time Series Foundation Model for forecasting and...
andrewdalpino/NoPE-GPT
A GPT-style small language model (SLM) with no positional embeddings (NoPE).
sigdelsanjog/gptmed
pip install gptmed
akanyaani/gpt-2-tensorflow2.0
OpenAI GPT2 pre-training and sequence prediction implementation in Tensorflow 2.0
samkamau81/FinGPT_
FinGPT is an AI language model designed to understand and generate financial content. Built upon...