UKPLab/acl2024-ircoder
Data creation, training and eval scripts for the IRCoder paper
This project gives machine learning researchers and engineers a way to improve how their code-generating language models (Code-LMs) handle multiple programming languages. It trains Code-LMs on source code files from various languages together with their corresponding compiler intermediate representations (IR). The result is a more robust, multilingual Code-LM that performs better at code completion, code understanding, and instruction following across programming languages.
No commits in the last 6 months.
Use this if you are a machine learning researcher or engineer looking to enhance the multilingual capabilities and robustness of your code-generating language models by leveraging compiler intermediate representations.
Not ideal if you are an end-user developer simply looking for an off-the-shelf code generation tool without needing to train or fine-tune models.
Stars: 20
Forks: 2
Language: Python
License: —
Category:
Last pushed: May 31, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/UKPLab/acl2024-ircoder"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
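For readers who would rather call the endpoint from Python than from curl, the request above might be sketched as follows. This is a minimal sketch under stated assumptions: the `category/owner/repo` path layout is inferred from the single example URL, the response is assumed to be JSON (the page does not say), and the names `build_url` and `fetch_quality` are hypothetical helpers, not part of any published client.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL, percent-encoding each path segment."""
    segments = [urllib.parse.quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(segments)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send an unauthenticated GET request and decode the body.

    Assumption: the endpoint returns JSON. If it does not, json.loads
    raises a ValueError and the raw body should be inspected instead.
    """
    url = build_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "UKPLab", "acl2024-ircoder")` would issue the same request as the curl command. No key is sent, so the call counts against the keyless daily limit.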
Higher-rated alternatives
luheng/deep_srl
Code and pre-trained model for: Deep Semantic Role Labeling: What Works and What's Next
sileod/tasksource
Datasets collection and preprocessings framework for NLP extreme multitask learning
loomchild/maligna
Bilingual sentence aligner
CK-Explorer/DuoSubs
Semantic subtitle aligner and merger for bilingual subtitle syncing.
coastalcph/lex-glue
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English