HaoAreYuDong/MachineLearningLM
Scaling In-context Learning from Few-shot to 1,024-shot on Tabular ML
This project provides an end-to-end system for evaluating how well large language models (LLMs) perform on machine learning tasks, particularly with tabular data. It takes raw tabular or text datasets as input and generates comprehensive evaluation reports showing how different LLMs handle tasks such as classification and regression. It is designed for researchers and machine learning engineers who need to benchmark LLM capabilities for practical applications.
Use this if you need to systematically evaluate the performance of large language models on machine learning tasks using your own datasets, from data preprocessing to final reports.
Not ideal if you're looking for a simple, plug-and-play solution for a single machine learning model without needing extensive LLM evaluation.
Stars
59
Forks
2
Language
Python
License
MIT
Last pushed
Dec 12, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/HaoAreYuDong/MachineLearningLM"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
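The curl call above can also be made programmatically. A minimal Python sketch, assuming only the endpoint shown on this page (the JSON response schema is not documented here, so inspect a real response before relying on specific field names):

```python
# Sketch: query the pt-edge quality API for a repo's metadata.
# Only the URL pattern is taken from this page; the response schema
# is an assumption -- check a live response before using field names.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(ecosystem: str, repo_slug: str) -> str:
    """Build the API URL for a repo, mirroring the curl example above."""
    return f"{BASE}/{ecosystem}/{repo_slug}"


def fetch_quality(ecosystem: str, repo_slug: str) -> dict:
    """Fetch and decode the JSON payload (no API key: 100 requests/day)."""
    with urllib.request.urlopen(quality_url(ecosystem, repo_slug), timeout=10) as resp:
        return json.load(resp)
```

For example, `fetch_quality("transformers", "HaoAreYuDong/MachineLearningLM")` hits the same URL as the curl command; with a free key the daily limit rises to 1,000 requests.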
Higher-rated alternatives
mlabonne/llm-datasets
Curated list of datasets and tools for post-training.
malteos/llm-datasets
A collection of datasets for language model pretraining including scripts for downloading,...
magpie-align/magpie
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your...
jd-coderepos/llms4subjects
The official SemEval 2025 Task 5 - LLMs4Subjects - Shared Task Dataset repository
willxxy/ECG-Bench
A Unified Framework for Benchmarking Generative Electrocardiogram-Language Models (ELMs)