reddy-lab-code-research/XLCoST

Code and data for XLCoST: A Benchmark Dataset for Cross-lingual Code Intelligence

Score: 34 / 100 (Emerging)

XLCoST is a benchmark dataset for training and evaluating machine learning models that work with code across programming languages. It provides aligned code snippets and full programs in 7 languages (C++, Java, Python, C#, JavaScript, PHP, C), along with corresponding English comments and problem descriptions. Researchers and engineers building code intelligence tools can use it for tasks such as code translation, summarization, and search.

No commits in the last 6 months.

Use this if you are building or evaluating AI models for code translation, summarization, or searching across multiple programming languages.

Not ideal if you need a dataset focused on a single programming language or if your task doesn't involve natural language descriptions.

Tags: code-translation, code-summarization, code-search, software-engineering, developer-tools
Status: stale (6 months), no package, no dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 9 / 25
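
The four category scores above are each out of 25, and here they add up to the overall 34 / 100. The sketch below checks that arithmetic. Treating the overall score as an unweighted sum is an assumption inferred from these numbers, not a documented formula, and the `category_scores` name is only illustrative.

```python
# Category scores as listed above (each out of 25).
category_scores = {
    "Maintenance": 0,
    "Adoption": 9,
    "Maturity": 16,
    "Community": 9,
}

# Assumption: the overall score is the unweighted sum of the four categories.
overall = sum(category_scores.values())
max_overall = 25 * len(category_scores)

print(f"{overall} / {max_overall}")  # 34 / 100
```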


Stars: 91
Forks: 6
Language: C
License: Apache-2.0
Last pushed: Jan 21, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/reddy-lab-code-research/XLCoST"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
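
The same request can be made from Python using only the standard library. This is a minimal sketch: the helper names `quality_url` and `fetch_quality` are hypothetical, the `category` parameter name is a guess at what the `ml-frameworks` path segment represents, and the code assumes the endpoint returns a JSON body, which the page does not state.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the record. Assumes the response body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("ml-frameworks", "reddy-lab-code-research", "XLCoST")` issues the same GET request as the curl command, so each call counts against the daily request limit.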