reddy-lab-code-research/XLCoST

Code and data for XLCoST: A Benchmark Dataset for Cross-lingual Code Intelligence

Emerging

XLCoST is a benchmark dataset for training and evaluating machine learning models that work with code across programming languages. It provides aligned code snippets and full programs in 7 languages (C++, Java, Python, C#, JavaScript, PHP, C), along with corresponding English comments and problem descriptions. Software engineers, researchers, and developers working on intelligent code tools can use it to build models for tasks such as code translation, summarization, and search.

No commits in the last 6 months.

Use this if you are building or evaluating AI models for code translation, summarization, or searching across multiple programming languages.

Not ideal if you need a dataset focused on a single programming language or if your task doesn't involve natural language descriptions.

code-translation code-summarization code-search software-engineering developer-tools

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 9 / 25

Maturity 16 / 25

Community 9 / 25

License

Apache-2.0

Higher-rated alternatives

facebookresearch/fairseq2

FAIR Sequence Modeling Toolkit 2

lhotse-speech/lhotse

Tools for handling multimodal data in machine learning projects.

google/sequence-layers

A neural network layer API and library for sequence modeling, designed for easy creation of...

awslabs/sockeye

Sequence-to-sequence framework with a focus on Neural Machine Translation based on PyTorch

OpenNMT/OpenNMT-tf

Neural machine translation and sequence learning using TensorFlow
