HaoAreYuDong/MachineLearningLM
Scaling In-context Learning from Few-shot to 1,024-shot on Tabular ML
This project provides an end-to-end system for evaluating how well large language models (LLMs) perform on machine learning tasks, particularly with tabular data. It takes raw tabular or text datasets as input and generates comprehensive evaluation reports showing how different LLMs handle tasks such as classification and regression. It is designed for researchers and machine learning engineers who need to benchmark LLM capabilities for practical applications.
Use this if you need to systematically evaluate the performance of large language models on machine learning tasks using your own datasets, from data preprocessing to final reports.
Not ideal if you're looking for a simple, plug-and-play solution for a single machine learning model without needing extensive LLM evaluation.
Stars
59
Forks
2
Language
Python
License
MIT
Last pushed
Dec 12, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/HaoAreYuDong/MachineLearningLM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
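The curl call above can also be made programmatically. A minimal Python sketch, assuming only the endpoint shown on this page (the JSON response schema is not documented here, so inspect a real response before relying on specific field names):

```python
# Sketch: query the pt-edge quality API for a repo's metadata.
# Only the URL pattern is taken from this page; the response schema
# is an assumption -- check a live response before using field names.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(ecosystem: str, repo_slug: str) -> str:
    """Build the API URL for a repo, mirroring the curl example above."""
    return f"{BASE}/{ecosystem}/{repo_slug}"


def fetch_quality(ecosystem: str, repo_slug: str) -> dict:
    """Fetch and decode the JSON payload (no API key: 100 requests/day)."""
    with urllib.request.urlopen(quality_url(ecosystem, repo_slug), timeout=10) as resp:
        return json.load(resp)
```

For example, `fetch_quality("transformers", "HaoAreYuDong/MachineLearningLM")` hits the same URL as the curl command; with a free key the daily limit rises to 1,000 requests.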
Higher-rated alternatives
mlabonne/llm-datasets
Curated list of datasets and tools for post-training.
malteos/llm-datasets
A collection of datasets for language model pretraining including scripts for downloading,...
magpie-align/magpie
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your...
jd-coderepos/llms4subjects
The official SemEval 2025 Task 5 - LLMs4Subjects - Shared Task Dataset repository
willxxy/ECG-Bench
A Unified Framework for Benchmarking Generative Electrocardiogram-Language Models (ELMs)