Sreyan88/DALE
Code for EMNLP 2023 paper: DALE: Generative Data Augmentation for Low-Resource Legal NLP
This tool is aimed at legal professionals, researchers, and legal tech developers who work with limited legal text data. It takes existing legal documents or case texts and generates new, diverse variations of them. Training on this augmented data helps improve the performance of machine learning models for tasks such as legal document classification or information extraction, even when little labeled data is available.
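As a rough illustration of mask-and-regenerate data augmentation in general (not DALE's actual implementation), the sketch below masks a random span in each labeled example, asks a generator to fill the gap, and keeps the original label on every new variant. `mask_spans`, `augment`, and `toy_fill` are hypothetical names, the example clauses are invented, and `toy_fill` is only a placeholder for the trained sequence-to-sequence model a real pipeline would call.

```python
import random
from typing import Callable, List, Tuple


def mask_spans(tokens: List[str], mask_ratio: float, rng: random.Random,
               mask_token: str = "<mask>") -> List[str]:
    """Replace one random contiguous span of tokens with a single mask token."""
    if not tokens:
        return tokens
    span_len = max(1, int(len(tokens) * mask_ratio))
    start = rng.randrange(0, len(tokens) - span_len + 1)
    return tokens[:start] + [mask_token] + tokens[start + span_len:]


def augment(dataset: List[Tuple[str, str]],
            fill: Callable[[str], str],
            n_per_example: int = 2,
            mask_ratio: float = 0.3,
            seed: int = 0) -> List[Tuple[str, str]]:
    """Return the original (text, label) pairs plus generated variants with the same label."""
    rng = random.Random(seed)
    augmented = list(dataset)
    for text, label in dataset:
        tokens = text.split()
        for _ in range(n_per_example):
            masked = " ".join(mask_spans(tokens, mask_ratio, rng))
            augmented.append((fill(masked), label))
    return augmented


# Hypothetical stand-in generator: a real system would have a trained
# seq2seq model infill the masked span instead of inserting a fixed string.
def toy_fill(masked_text: str) -> str:
    return masked_text.replace("<mask>", "[generated span]")


data = [
    ("The lessee shall pay rent on the first day of each month.", "obligation"),
    ("This agreement is governed by the laws of the State of New York.", "governing_law"),
]
out = augment(data, toy_fill)
print(len(out))  # 2 originals + 2 variants each = 6
```

The label is copied unchanged because the masked span is kept short relative to the sentence; with a real generator, how aggressively to mask is the main trade-off between diversity and keeping the original meaning.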
No commits in the last 6 months.
Use this if you need to train or improve an NLP model on legal texts but have a small amount of annotated data, and traditional data augmentation methods aren't effective for complex legal language.
Not ideal if you want an out-of-the-box legal text analysis tool that requires no programming or machine learning model integration.
Stars: 10
Forks: 2
Language: Python
License: MIT
Last pushed: Oct 27, 2023
Commits (30d): 0
Higher-rated alternatives
sdv-dev/SDV
Synthetic data generation for tabular data
sdv-dev/SDGym
Benchmarking synthetic data generation methods.
NVIDIA-NeMo/DataDesigner
🎨 NeMo Data Designer: A general library for generating high-quality synthetic data from scratch...
AlexanderVNikitin/tsgm
Generation and evaluation of synthetic time series datasets (also, augmentations,...
mostly-ai/mostlyai
Synthetic Data SDK ✨