aws-samples/fine-tuning-llm-with-domain-knowledge
This repo walks you through using transfer learning to fine-tune an LLM (large language model), with UK Supreme Court case law as the domain-specific training dataset. The model being fine-tuned is the Hugging Face GPT-J 6B model.
This project helps machine learning engineers or data scientists enhance a large language model's ability to understand and generate text specific to a particular domain. By feeding it a dataset of UK Supreme Court case law, you can transform a general-purpose language model into one proficient in legal terminology and context. The input is a pre-trained LLM and domain-specific text documents, and the output is a fine-tuned LLM ready for specialized tasks.
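As a rough illustration of the data side of this workflow, the sketch below shows one common way to turn raw domain text into fixed-length training examples for a causal language model: concatenate the token ids of all documents, cut them into equal blocks, and use a copy of each block as its labels. It is not taken from the repository. The whitespace-based `toy_tokenize` function, the `block_size` value, and the sample sentences are all assumptions made for illustration; a real run would use the model's own tokenizer and the actual case-law text.

```python
from typing import Dict, List


def toy_tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Hypothetical stand-in for a real tokenizer: one id per whitespace-separated word."""
    ids = []
    for word in text.lower().split():
        if word not in vocab:
            vocab[word] = len(vocab)  # assign the next free id to an unseen word
        ids.append(vocab[word])
    return ids


def group_into_blocks(documents: List[str], block_size: int) -> List[Dict[str, List[int]]]:
    """Concatenate all token ids and split them into blocks of block_size tokens."""
    vocab: Dict[str, int] = {}
    all_ids: List[int] = []
    for doc in documents:
        all_ids.extend(toy_tokenize(doc, vocab))

    # Drop the trailing remainder so every example has exactly block_size tokens.
    usable = (len(all_ids) // block_size) * block_size

    examples = []
    for start in range(0, usable, block_size):
        block = all_ids[start:start + block_size]
        # For causal language modelling the labels are a copy of the inputs;
        # the one-token shift is normally applied inside the model's loss.
        examples.append({"input_ids": block, "labels": list(block)})
    return examples


# Made-up sentences standing in for domain documents (assumption, not real data).
documents = [
    "The appeal is dismissed for the reasons given below",
    "The court held that the statute applies to the appellant",
]
examples = group_into_blocks(documents, block_size=4)
```

With a real tokenizer the same grouping step would typically use a much larger `block_size`, bounded by the model's context length.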
No commits in the last 6 months.
Use this if you need to create a language model that is highly knowledgeable and accurate within a very specific field, such as legal research or medical documentation, beyond what a general LLM can offer.
Not ideal if you are a legal professional looking for a direct legal advice tool, or if you lack experience with AWS SageMaker and machine learning model training.
Stars: 42
Forks: 6
Language: Jupyter Notebook
License: MIT-0
Category:
Last pushed: Aug 08, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/aws-samples/fine-tuning-llm-with-domain-knowledge"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
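The curl request above could be reproduced from Python roughly as follows, using only the standard library. This is a sketch under assumptions: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the URL layout is inferred from the single example shown, and the response is assumed to be JSON, which the page does not state.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL; the segment layout is inferred from the one example shown."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return API_BASE + "/" + "/".join(parts)


def fetch_quality(url: str, timeout: float = 10.0) -> dict:
    """Fetch the URL and decode the body, assuming (not confirmed) a JSON response."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


url = build_quality_url(
    "transformers", "aws-samples", "fine-tuning-llm-with-domain-knowledge"
)
```

Calling `fetch_quality(url)` performs the network request; without a key it counts against the 100 requests/day limit mentioned above.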
Higher-rated alternatives
OptimalScale/LMFlow
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
adithya-s-k/AI-Engineering.academy
Mastering Applied AI, One Concept at a Time
jax-ml/jax-llm-examples
Minimal yet performant LLM examples in pure JAX
young-geng/scalax
A simple library for scaling up JAX programs
riyanshibohra/TuneKit
Upload your data → Get a fine-tuned SLM. Free.