Y-Research-SBU/CSR

Official Repository for CSR - ICML 2025 Oral

Experimental

This project helps machine learning practitioners efficiently process and retrieve information from large datasets of images, text, or both. It converts existing embeddings into a sparse representation, which allows faster and cheaper search while maintaining accuracy. It is aimed at researchers and engineers who build and deploy AI models.

Use this if you need to perform accurate content retrieval or classification on large image, text, or multimodal datasets with significantly reduced computational cost and faster inference.

Not ideal if your primary goal is to train a model from scratch without leveraging pre-trained embeddings or if your datasets are very small and efficiency is not a critical concern.
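
To make the idea of searching over sparse representations concrete, here is a minimal sketch in plain Python. It is not the repository's method or API: it assumes a simple top-k magnitude rule for sparsifying an embedding, and the function names (`sparsify_top_k`, `sparse_dot`, `search`) are hypothetical, introduced only for illustration.

```python
import heapq


def sparsify_top_k(dense, k):
    """Keep the k largest-magnitude entries of a dense vector.

    Assumption: plain top-k selection stands in for whatever learned
    sparsification the project actually uses.
    Returns a dict mapping index -> value.
    """
    top = heapq.nlargest(k, range(len(dense)), key=lambda i: abs(dense[i]))
    return {i: dense[i] for i in sorted(top)}


def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {index: value} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(v * b[i] for i, v in a.items() if i in b)


def search(query, corpus, top_n=1):
    """Return the indices of the top_n corpus vectors by sparse dot product."""
    scores = [(sparse_dot(query, doc), idx) for idx, doc in enumerate(corpus)]
    return [idx for _, idx in heapq.nlargest(top_n, scores)]


if __name__ == "__main__":
    # Toy 5-dimensional embeddings, made up for this example.
    docs = [
        [0.1, -0.9, 0.05, 0.7, 0.0],
        [0.8, 0.0, 0.6, -0.1, 0.02],
    ]
    sparse_docs = [sparsify_top_k(d, 2) for d in docs]
    sparse_query = sparsify_top_k([0.9, 0.1, 0.5, 0.0, 0.0], 2)
    print(search(sparse_query, sparse_docs))
```

Because each vector keeps only k nonzero entries, scoring a pair touches at most k positions instead of the full embedding dimension, which is where the saving in compute and storage comes from in this toy setting.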

information-retrieval machine-learning-engineering multimodal-analytics large-scale-data-processing computational-efficiency

No License No Package No Dependents

Maintenance 10 / 25

Adoption 6 / 25

Maturity 7 / 25

Community 4 / 25

Language

Python

License

—

Higher-rated alternatives

jncraton/languagemodels

Explore large language models in 512MB of RAM

microsoft/unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

haizelabs/verdict

Inference-time scaling for LLMs-as-a-judge.

albertan017/LLM4Decompile

Reverse Engineering: Decompiling Binary Code with Large Language Models

bytedance/Sa2VA

Official Repo For Pixel-LLM Codebase
