Startups

DeepCode
DeepCode offers the first AI-based code review system.

Statistical Engines

JSNice
JSNice de-obfuscates JavaScript programs. It is a popular system in the JavaScript community, used by tens of thousands of programmers worldwide.
Nice2Predict
An efficient and scalable open-source framework for structured prediction that makes it faster to build new statistical engines.
DeGuard
Based on Nice2Predict, DeGuard reverses the layout obfuscation applied by Android obfuscation tools. It enables security analyses such as code inspection and library prediction.
DEBIN
Based on Nice2Predict, DEBIN recovers debug information (e.g., names and types) from stripped binaries, which helps with analysis tasks such as decompilation, malware inspection, and binary similarity analysis.

Datasets and Models

150k Python Dataset
Dataset of 150,000 Python ASTs
150k JavaScript Dataset
Dataset of 150,000 JavaScript files and their parsed ASTs
Probabilistic models
Synthesized programs for probabilistic models (trained on the above datasets)
JSNice artifact
JSNice artifact containing the engine, a trained model, and the evaluation dataset
JSNice dataset
List of GitHub repositories used to train JSNice

Publications

DEBIN: Predicting Debug Information in Stripped Binaries
Jingxuan He, Pesho Ivanov, Petar Tsankov, Veselin Raychev, Martin Vechev
ACM CCS 2018
Inferring Crypto API Rules from Code Changes
Rumen Paletov, Petar Tsankov, Veselin Raychev, Martin Vechev
ACM PLDI 2018
Learning a Static Analyzer from Data
Pavol Bielik, Veselin Raychev, Martin Vechev
CAV 2017
Program Synthesis for Character Level Language Modeling
Pavol Bielik, Veselin Raychev, Martin Vechev
ICLR 2017
Probabilistic Model for Code with Decision Trees
Veselin Raychev, Pavol Bielik, Martin Vechev
ACM OOPSLA 2016
Statistical Deobfuscation of Android Applications
Benjamin Bichsel, Veselin Raychev, Petar Tsankov, Martin Vechev
ACM CCS 2016
Predicting Program Properties from "Big Code"
Veselin Raychev, Martin Vechev, Andreas Krause
ACM POPL 2015
Programming with Big Code: Lessons, Techniques and Applications
Pavol Bielik, Veselin Raychev, Martin Vechev
SNAPL 2015
Code Completion with Statistical Language Models
Veselin Raychev, Martin Vechev, Eran Yahav
ACM PLDI 2014
Phrase-Based Statistical Translation of Programming Languages
Svetoslav Karaivanov, Veselin Raychev, Martin Vechev
ACM Onward! 2014

Talks

Learning to Analyze Programs at Scale
Machine Learning for Programming Workshop, FLOC 2018
Learning a Static Analyzer from Data
Computer Aided Verification 2017
Programming Languages and Machine Learning
Neural Abstract Machines & Program Induction (NIPS'16 workshop)
Machine Learning for Programming
Invited Talk at ML4PL'15
Machine Learning for Programming
Invited Talk at MIT ExCAPE'15 Summer School
Machine Learning for Programming
Invited Talk at TCE'15 Conference
Programming Tools based on Big Data and Conditional Random Fields
Zurich Machine Learning and Data Science Meet-up
Code Completion with Statistical Language Models
Talk given at the University of Washington and Microsoft Research (by Veselin Raychev) and at EPFL and ETH (by Martin Vechev)

Funded by ERC grant BIGCODE (#680358)