LivingFutureLab/UQABench

[KDD 2025] The source code for UQABench

Experimental

This benchmark helps teams behind e-commerce platforms and personalized recommendation systems evaluate how well their large language models (LLMs) answer customer questions in a personalized way. It takes historical user interaction data, such as past purchases or clicks, and outputs metrics showing how accurately an LLM provides tailored answers. It is intended for researchers and engineers working on personalized customer service or product recommendations.
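
As a rough illustration of the flow described above (interaction history in, accuracy metric out), here is a minimal sketch. It is not UQABench's actual code or API: the `Example` fields, the `accuracy` function, and the toy `most_frequent_item` baseline are all hypothetical names, and exact-match accuracy is only an assumed choice of metric.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Example:
    """One hypothetical benchmark item (field names are illustrative only)."""
    history: Sequence[str]  # past interactions, e.g. purchased or clicked items
    question: str           # question asked about this user
    answer: str             # reference answer


def accuracy(examples: Sequence[Example],
             model: Callable[[Sequence[str], str], str]) -> float:
    """Fraction of questions answered with a case-insensitive exact match."""
    if not examples:
        return 0.0
    correct = sum(
        model(ex.history, ex.question).strip().lower() == ex.answer.strip().lower()
        for ex in examples
    )
    return correct / len(examples)


def most_frequent_item(history: Sequence[str], question: str) -> str:
    """Toy stand-in for an LLM: always answers with the most common history item."""
    if not history:
        return ""
    return Counter(history).most_common(1)[0][0]


# Made-up data, purely for demonstration.
examples = [
    Example(["shoes", "shoes", "hat"], "Which item does this user buy most?", "shoes"),
    Example(["phone", "case", "case"], "Which item did this user buy first?", "phone"),
]

print(accuracy(examples, most_frequent_item))
```

In practice the `model` callable would wrap an LLM prompted with the user's history, so that different personalization methods can be compared on the same set of examples.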

No commits in the last 6 months.

Use this if you need a standardized way to test and compare different methods of personalizing LLM responses for individual users based on their historical behavior.

Not ideal if you are looking for a plug-and-play LLM solution or a general-purpose question-answering system without a strong focus on user personalization benchmarks.

e-commerce personalized-recommendations customer-experience AI-evaluation LLM-benchmarking

No License, Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 5 / 25

Maturity 8 / 25

Community 11 / 25

Language: Python

License: none

Higher-rated alternatives

ContextualAI/gritlm

Generative Representational Instruction Tuning

xlang-ai/instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings

liuqidong07/LLMEmb

[AAAI'25 Oral] The official implementation code of LLMEmb

hpcaitech/CachedEmbedding

A memory efficient DLRM training solution using ColossalAI

ritesh-modi/embedding-hallucinations

This repo shows how foundational model hallucinates and how we can fix such hallucinations using...
