Funded by ERC grant BIGCODE - #680358

## Startups

## Statistical Engines

JSNice

JSNice de-obfuscates JavaScript programs. It is a popular system in the JavaScript community, used by tens of thousands of programmers worldwide.

Nice2Predict

Efficient and scalable open-source framework for structured prediction, enabling one to build new statistical engines more quickly.

DeGuard

Based on Nice2Predict, DeGuard reverses the process of layout obfuscation done by Android obfuscation systems. It enables security analyses, including code inspection and predicting libraries.

## Datasets and Models

150k Python Dataset

Dataset consisting of 150'000 Python ASTs

150k JavaScript Dataset

Dataset consisting of 150'000 JavaScript files and their parsed ASTs

Probabilistic models

Synthesized programs for probabilistic models (on the above datasets)

JSNice artifact

JSNice artifact that contains an engine, trained model and evaluation dataset

JSNice dataset

List of GitHub repositories used to train JSNice

## Publications

## 2020

Learning Fast and Precise Numerical Analysis

Jingxuan He, Gagandeep Singh, Markus Püschel, Martin Vechev

PLDI 2020

Guiding Program Synthesis by Learning to Generate Examples

Larissa Laich, Pavol Bielik, Martin Vechev

ICLR 2020

## 2019

Learning to Infer User Interface Attributes from Images

Philippe Schlattner, Pavol Bielik, Martin Vechev

ArXiv 2019

Learning to Fuzz from Symbolic Execution with Application to Smart Contracts

Jingxuan He, Mislav Balunovic, Nodar Ambroladze, Petar Tsankov, Martin Vechev

ACM CCS 2019

Unsupervised Learning of API Aliasing Specifications

Jan Eberhardt, Samuel Steffen, Veselin Raychev, Martin Vechev

PLDI 2019

Scalable Taint Specification Inference with Big Code

Victor Chibotaru, Benjamin Bichsel, Veselin Raychev, Martin Vechev

PLDI 2019

## 2018

Robust Relational Layout Synthesis from Examples for Android

Pavol Bielik, Marc Fischer, Martin Vechev

ACM OOPSLA 2018

DEBIN: Predicting Debug Information in Stripped Binaries

Jingxuan He, Pesho Ivanov, Petar Tsankov, Veselin Raychev, Martin Vechev

ACM CCS 2018

Inferring Crypto API Rules from Code Changes

Rumen Paletov, Petar Tsankov, Veselin Raychev, Martin Vechev

PLDI 2018

## 2017

Program Synthesis for Character Level Language Modeling

Pavol Bielik, Veselin Raychev, Martin Vechev

ICLR 2017

## 2016

Probabilistic Model for Code with Decision Trees

Veselin Raychev, Pavol Bielik, Martin Vechev

ACM OOPSLA 2016

Statistical Deobfuscation of Android Applications

Benjamin Bichsel, Veselin Raychev, Petar Tsankov, Martin Vechev

ACM CCS 2016

## 2015

Predicting Program Properties from "Big Code"

Veselin Raychev, Martin Vechev, Andreas Krause

ACM POPL 2015

Programming with Big Code: Lessons, Techniques and Applications

Pavol Bielik, Veselin Raychev, Martin Vechev

SNAPL 2015

## 2014

Code Completion with Statistical Language Models

Veselin Raychev, Martin Vechev, Eran Yahav

ACM PLDI 2014

Phrase-Based Statistical Translation of Programming Languages

Svetoslav Karaivanov, Veselin Raychev, Martin Vechev

Onward 2014

## Talks

Learning to Analyze Programs at Scale

Machine Learning for Programming Workshop, FLOC 2018

Learning a static analyzer from data

Computer Aided Verification 2017

Probabilistic and Interpretable Models for Code

SYNT workshop, FLOC 2018

Machine Learning for Programming

iFM 2017 Keynote Talk

DeGuard: Statistical Deobfuscation for Android

Android Security Symposium 2017

Programming Languages and Machine Learning

Neural Abstract Machines & Program Induction (NIPS'16 workshop)

Statistical Deobfuscation of Android Applications

CCS 2016 talk

Machine Learning for Programs

CAV'16 Tutorial

Probabilistic Learning from Big Code

ISSTA'16 Keynote Talk

PHOG: Probabilistic Model for Code

ICML 2016 talk

Learning Programs from Noisy Data

POPL 2016 talk

Machine Learning for Programming

Invited Talk at ML4PL'15

Machine Learning for Code Analytics

PLDI'15 Tutorial

Machine Learning for Programming

Invited Talk at MIT ExCAPE'15 Summer School

Machine Learning for Programming

Invited Talk at TCE'15 Conference

Programming with Probabilistic Graphical Models

EPFL Colloquium, December 2014

Programming Tools based on Big Data and Conditional Random Fields

Zurich Machine Learning and Data Science Meet-up

Statistical Program Analysis and Synthesis

HVC'14 Keynote

Statistical Program Analysis and Synthesis

ETH Workshop 2014

Code Completion with Statistical Language Models

Talk given at the University of Washington and Microsoft Research (by Veselin Raychev), and at EPFL and ETH (by Martin Vechev)

## Resources

- A new web site for learning from Big Code has been released. The web site contains data sets, systems, and challenge problems from groups working in the area.
- We are co-organizing a Dagstuhl Seminar on "Programming with Big Code", Nov 15-18, 2015.