Funded by ERC grant BIGCODE - #680358
Startups
Statistical Engines
JSNice
      JSNice de-obfuscates JavaScript programs. It is a popular system in the JavaScript community, used by tens of thousands of programmers worldwide.
      
      
    Nice2Predict
      Efficient and scalable open-source framework for structured prediction, enabling one to build new statistical engines more quickly.
      
      
    DeGuard
      Based on Nice2Predict, DeGuard reverses the layout obfuscation applied by Android obfuscation tools. It enables security analyses, including code inspection and library prediction.
      
      
    Datasets and Models
        150k Python Dataset
        
          
    
    
    Dataset consisting of 150'000 Python ASTs
        
        
      
        150k JavaScript Dataset
        
          
    
    
    Dataset consisting of 150'000 JavaScript files and their parsed ASTs
        
        
      
        Probabilistic models
        
          
    
    
    Synthesized programs for probabilistic models (on the above datasets)
        
        
      
        JSNice artifact
        
          
    
    
    JSNice artifact that contains an engine, trained model and evaluation dataset
        
        
      
        JSNice dataset
        
          
    
    
    
  List of GitHub repositories used to train JSNice
        
        
      Publications
2022
        On Distribution Shift in Learning-based Bug Detectors
        
          
    
    
    
  Jingxuan He, Luca Beurer-Kellner, Martin Vechev
        
        
          
      ICML 
      2022 
      
      
      
    
        
      2021
        Learning to Explore Paths for Symbolic Execution
        
          
    
    
    
  Jingxuan He, Gishor Sivanrupan, Petar Tsankov, Martin Vechev
        
        
          
      ACM CCS 
      2021 
      
      
      
    
        
      
        TFix: Learning to Fix Coding Errors with a Text-to-Text Transformer
        
          
    
    
    
  Berkay Berabi, Jingxuan He, Veselin Raychev, Martin Vechev
        
        
          
      ICML 
      2021 
      
      
      
    
        
      
        Learning to Find Naming Issues with Big Code and Small Supervision
        
          
    
    
    
  Jingxuan He, Cheng-Chun Lee, Veselin Raychev, Martin Vechev
        
        
          
      PLDI 
      2021 
      
      
      
    
        
      
        Robustness Certification with Generative Models
        
          
    
    
    
  Matthew Mirman, Alexander Hägele, Timon Gehr, Pavol Bielik, Martin Vechev
        
        
          
      PLDI 
      2021 
      
      
      
    
        
      2020
        Learning Fast and Precise Numerical Analysis
        
          
    
    
    
  Jingxuan He, Gagandeep Singh, Markus Püschel, Martin Vechev
        
        
          
      PLDI 
      2020 
      
      
      
    
        
      
        Guiding Program Synthesis by Learning to Generate Examples
        
          
    
    
    
  Larissa Laich, Pavol Bielik, Martin Vechev
        
        
          
      ICLR 
      2020 
      
      
      
    
        
      2019
        Learning to Infer User Interface Attributes from Images
        
          
    
    
    
  Philippe Schlattner, Pavol Bielik, Martin Vechev
        
        
          
      arXiv 
      2019 
      
      
      
    
        
      
        Learning to Fuzz from Symbolic Execution with Application to Smart Contracts
        
          
    
    
    
  Jingxuan He, Mislav Balunović, Nodar Ambroladze, Petar Tsankov, Martin Vechev
        
        
          
      ACM CCS 
      2019 
      
      
      
    
        
      
        Unsupervised Learning of API Aliasing Specifications
        
          
    
    
    
  Jan Eberhardt, Samuel Steffen, Veselin Raychev, Martin Vechev
        
        
          
      PLDI 
      2019 
      
      
      
    
        
      
        Scalable Taint Specification Inference with Big Code
        
          
    
    
    
  Victor Chibotaru, Benjamin Bichsel, Veselin Raychev, Martin Vechev
        
        
          
      PLDI 
      2019 
      
      
      
    
        
      2018
        Robust Relational Layouts Synthesis from Examples for Android
        
          
    
    
    
  Pavol Bielik, Marc Fischer, Martin Vechev
        
        
          
      ACM OOPSLA 
      2018 
      
      
      
    
        
      
        DEBIN: Predicting Debug Information in Stripped Binaries
        
          
    
    
    
  Jingxuan He, Pesho Ivanov, Petar Tsankov, Veselin Raychev, Martin Vechev
        
        
          
      ACM CCS 
      2018 
      
      
      
    
        
      
        Inferring Crypto API Rules from Code Changes
        
          
    
    
    
  Rumen Paletov, Petar Tsankov, Veselin Raychev, Martin Vechev
        
        
          
      PLDI 
      2018 
      
      
      
    
        
      2017
        Program Synthesis for Character Level Language Modeling
        
          
    
    
    
  Pavol Bielik, Veselin Raychev, Martin Vechev
        
        
          
      ICLR 
      2017 
      
      
      
    
        
      2016
        Probabilistic Model for Code with Decision Trees
        
          
    
    
    
  Veselin Raychev, Pavol Bielik, Martin Vechev
        
        
          
      ACM OOPSLA 
      2016 
      
      
      
    
        
      
        Statistical Deobfuscation of Android Applications
        
          
    
    
    
  Benjamin Bichsel, Veselin Raychev, Petar Tsankov, Martin Vechev
        
        
          
      ACM CCS 
      2016 
      
      
      
    
        
      2015
        Predicting Program Properties from "Big Code"
        
          
    
    
    
  Veselin Raychev, Martin Vechev, Andreas Krause
        
        
          
      ACM POPL 
      2015 
      
      
      
    
        
      
        Programming with Big Code: Lessons, Techniques and Applications
        
          
    
    
    
  Pavol Bielik, Veselin Raychev, Martin Vechev
        
        
          
      SNAPL 
      2015 
      
      
      
    
        
      2014
        Code Completion with Statistical Language Models
        
          
    
    
    
  Veselin Raychev, Martin Vechev, Eran Yahav
        
        
          
      ACM PLDI 
      2014 
      
      
      
    
        
      
        Phrase-Based Statistical Translation of Programming Languages
        
          
    
    
    
  Svetoslav Karaivanov, Veselin Raychev, Martin Vechev
        
        
          
      Onward 
      2014 
      
      
      
    
        
      Talks
        Learning to Analyze Programs at Scale
        
          
    
    
    
  Machine Learning for Programming Workshop, FLOC 2018
        
        
      
        Learning a Static Analyzer from Data
        
          
    
    
    
  Computer Aided Verification 2017
        
        
      
        Probabilistic and Interpretable Models for Code
        
          
    
    
    
  SYNT workshop, FLOC 2018
        
        
      
        Machine Learning for Programming
        
          
    
    
    
  iFM 2017 Keynote Talk
        
        
      
        DeGuard: Statistical Deobfuscation for Android
        
          
    
    
    
  Android Security Symposium 2017
        
        
      
        Programming Languages and Machine Learning
        
          
    
    
    
  Neural Abstract Machines & Program Induction (NIPS'16 workshop)
        
        
      
        Statistical Deobfuscation of Android Applications
        
          
    
    
    
  CCS 2016 talk
        
        
      
        Machine Learning for Programs
        
          
    
    
    
  CAV'16 Tutorial
        
        
      
        Probabilistic Learning from Big Code
        
          
    
    
    
  ISSTA'16 Keynote Talk
        
        
      
        PHOG: Probabilistic Model for Code
        
          
    
    
    
  ICML 2016 talk
        
        
      
        Learning Programs from Noisy Data
        
          
    
    
    
  POPL 2016 talk
        
        
      
        Machine Learning for Programming
        
          
    
    
    Invited Talk at ML4PL'15
        
        
      
        Machine Learning for Code Analytics
        
          
    
    
    
  PLDI'15 Tutorial
        
        
      
        Machine Learning for Programming
        
          
    
    
    
  Invited Talk at MIT ExCAPE'15 Summer School
        
        
      
        Machine Learning for Programming
        
          
    
    
    
  Invited Talk at TCE'15 Conference
        
        
      
        Programming with Probabilistic Graphical Models
        
          
    
    
    
  EPFL Colloquium, Dec, 2014
        
        
      
        Programming Tools based on Big Data and Conditional Random Fields
        
          
    
    
    Zurich Machine Learning and Data Science Meet-up
        
        
      
        Statistical Program Analysis and Synthesis
        
          
    
    
    HVC'14 Keynote
        
        
      
        Statistical Program Analysis and Synthesis
        
          
    
    
    
  ETH Workshop 2014
        
        
      
        Code Completion with Statistical Language Models
        
          
    
    
    
  Talk given at the University of Washington and Microsoft Research (by Veselin Raychev) and at EPFL and ETH (by Martin Vechev)
        
        
      Resources
- A new web site for learning from Big Code has been released. The web site contains data sets, systems, and challenge problems from groups working in the area.
 - We are co-organizing a Dagstuhl Seminar on "Programming with Big Code", Nov 15-18, 2015
 
 
      



