Program translation retrieval system

Program translation is in growing demand in software engineering. Manual program translation requires programming expertise in both the source and target languages. One way to automate this process is to exploit the large volume of existing programs. However, existing code retrieval techniques are not designed for cross-language code retrieval, and other data-driven approaches require human effort to construct cross-language parallel datasets for training translation models. We built a code translation retrieval system. It uses a lightweight but informative program representation that can be generalized to all imperative programming languages. In addition, we implement a customized index structure and a hierarchical filtering mechanism for efficient code retrieval from large code collections.

Dependencies

Python3
MongoDB
pymongo
ANTLR (Java target)

Create a feature database

In the offline phase, the system constructs a feature representation for each program in the database and saves it in a feature database for later use.

sh create_feature_db.sh path_to_your_database
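
To illustrate what a per-program feature representation might look like, here is a minimal sketch. It is not the repository's implementation: it assumes a feature is a count of (parent node type, child node type) edges in the syntax tree, and it uses Python's built-in ast module as a stand-in for an ANTLR parse tree. The function names and the document layout (_id, lang, features) are hypothetical.

```python
import ast
from collections import Counter


def path_type_features(source: str) -> Counter:
    """Count (parent type > child type) edges in the syntax tree of `source`.

    Assumption: a length-one path between node types is used here as a
    simple stand-in for the system's path-type features.
    """
    tree = ast.parse(source)
    features = Counter()
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            key = f"{type(parent).__name__}>{type(child).__name__}"
            features[key] += 1
    return features


def to_document(program_id: str, language: str, source: str) -> dict:
    """Build a plain dict that could be stored in a document database.

    The field names are hypothetical, not the repository's schema.
    """
    return {
        "_id": program_id,
        "lang": language,
        "features": dict(path_type_features(source)),
    }


doc = to_document("demo-1", "Python", "def add(a, b):\n    return a + b\n")
print(doc["features"])
```

A dict of this shape could be written to MongoDB with pymongo's insert_one, which is presumably the role of the feature database in the offline phase.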

Create index

The system constructs a path-type bucket index over the feature representations.

python3 create_index.py

Users can specify the maximum bucket size in bucket_size.json.
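
The idea of a bucketed index with a size cap can be sketched as follows. This is an illustration under assumptions, not the code in create_index.py: it assumes each feature (path type) maps to a list of postings sorted by count, and that postings are split into buckets holding at most max_bucket_size entries. The cap is passed as a parameter here because the schema of bucket_size.json is not shown in this README; build_bucket_index and the sample data are hypothetical.

```python
from collections import defaultdict


def build_bucket_index(feature_db: dict, max_bucket_size: int) -> dict:
    """Build {path_type: [bucket, ...]} from {program_id: {path_type: count}}.

    Each bucket is a list of (count, program_id) pairs sorted by count and
    holds at most `max_bucket_size` entries (an assumed bucketing rule).
    """
    if max_bucket_size < 1:
        raise ValueError("max_bucket_size must be at least 1")

    # Collect one posting list per path type.
    postings = defaultdict(list)
    for program_id, features in feature_db.items():
        for path_type, count in features.items():
            postings[path_type].append((count, program_id))

    # Sort each posting list and cut it into fixed-size buckets.
    index = {}
    for path_type, entries in postings.items():
        entries.sort()
        index[path_type] = [
            entries[i:i + max_bucket_size]
            for i in range(0, len(entries), max_bucket_size)
        ]
    return index


feature_db = {
    "p1": {"Return>BinOp": 1, "BinOp>Name": 2},
    "p2": {"Return>BinOp": 3},
    "p3": {"Return>BinOp": 2, "BinOp>Name": 1},
}
index = build_bucket_index(feature_db, max_bucket_size=2)
print(index["Return>BinOp"])
```

In this sketch, a smaller cap yields more buckets per path type, so a lookup can skip buckets whose count range is far from the query's count.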

Retrieve translation for a program

sh translate.sh path_to_input_code target_language
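
A two-stage retrieval with hierarchical filtering could look roughly like the sketch below. It is an assumption-laden illustration, not the logic in retrieval.py: stage one keeps only candidates in the target language that share at least one feature with the query, and stage two ranks the survivors by weighted Jaccard similarity over feature counts. The similarity measure, the function names, and the candidate layout are all hypothetical choices made for this example.

```python
def weighted_jaccard(a: dict, b: dict) -> float:
    """Weighted Jaccard similarity between two feature-count dicts."""
    keys = set(a) | set(b)
    overlap = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    union = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return overlap / union if union else 0.0


def retrieve(query_features: dict, candidates: list, target_language: str,
             top_k: int = 1) -> list:
    """Return the ids of the top_k candidates in the target language."""
    # Stage 1 (cheap filter): right language and at least one shared feature.
    pool = [
        c for c in candidates
        if c["lang"] == target_language
        and set(c["features"]) & set(query_features)
    ]
    # Stage 2 (more expensive): rank the remaining candidates by similarity.
    ranked = sorted(
        pool,
        key=lambda c: weighted_jaccard(query_features, c["features"]),
        reverse=True,
    )
    return [c["_id"] for c in ranked[:top_k]]


query = {"Return>BinOp": 1, "BinOp>Name": 2}
candidates = [
    {"_id": "j1", "lang": "Java",
     "features": {"Return>BinOp": 1, "BinOp>Name": 2, "Block>Return": 1}},
    {"_id": "j2", "lang": "Java", "features": {"Return>BinOp": 1}},
    {"_id": "j3", "lang": "Java", "features": {"Loop>Call": 4}},
    {"_id": "p1", "lang": "Python",
     "features": {"Return>BinOp": 1, "BinOp>Name": 2}},
]
print(retrieve(query, candidates, "Java", top_k=2))
```

The cheap first stage matters because the expensive similarity computation then runs only on a small pool rather than on every program in the database.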

This repository supports four programming languages: Python, Java, C++, and JavaScript.

To support more languages, run the ANTLR parser for the desired language and add that language to the lang_collection list in each file.

Evaluations

Comparison with Data-driven Program Translation

We compare the effectiveness and efficiency of our system against the following state-of-the-art baselines:

1pSMT: phrase-based SMT on sequential programs.
mppSMT: multi-phase phrase-based SMT.
Tree2tree: tree-to-tree neural networks.
TransCoder: weakly supervised translation model.
