The objectives of the seminar are to:

  • Introduce students to the emerging field of Deep Learning for Big Code.
  • Show how machine learning models can be used to solve practical challenges in software engineering and programming, beyond what traditional methods allow.
  • Highlight the latest research on this topic and the work opportunities it offers in industry and academia.

The seminar consists of student presentations (two per lecture), each on a paper chosen from the list below. The grade is based on the presentation, the handling of questions and answers, and participation.


21.02
  • Introduction to the seminar (topics, objectives, structure): Veselin Raychev (PDF)

07.03
  • Learning type annotation: is big data enough? [Ali; Luca Beurer-Kellner]
  • Synchromesh: Reliable Code Generation from Pre-trained Language Models [Kajetan; Nikola Jovanović]

14.03 (zoom day)
  • Learning to Execute Programs with Instruction Pointer Attention Graph Neural Networks [Anton; Nikola Jovanović]
  • ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations [Pascal; Mislav Balunović]

21.03
  • Robust relational layout synthesis from examples for Android [Alexis; Marc Fischer]
  • Synthesis of web layouts from examples [Clément; Pesho Ivanov]

28.03
  • Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection [Lovro; Mark Niklas Müller]
  • GraphCodeBERT: Pre-training Code Representations with Data Flow [Lennart; Maximilian Baader]

04.04
  • Learned garbage collection [Lukas; Luca Beurer-Kellner]
  • AutoPandas: neural-backed generators for program synthesis [Sebastian; Mark Niklas Müller]

11.04 (zoom day)
  • Explaining mispredictions of machine learning models using rule induction [Ambarish; Marc Fischer]
  • Program Synthesis with Large Language Models [Jelte; Nikola Jovanović]

02.05
  • TFix: Learning to fix coding errors with a text-to-text transformer [James; Matthew Mirman]
  • CC2Vec: distributed representations of code changes [Hrishikesh; Matthew Mirman]

09.05
  • DeepProbLog: Neural probabilistic logic programming [Robert; Marc Fischer]
  • The Effectiveness of Pre-Trained Code Embeddings [Christoffer; Mislav Balunović]

16.05
  • CURE: Code-Aware Neural Machine Translation for Automatic Program Repair [Mert; Luca Beurer-Kellner]

23.05 (zoom day)
  • Leveraging Automated Unit Tests for Unsupervised Code Translation [Benjamin; Mislav Balunović]
  • Using Active Learning to Synthesize Models of Applications That Access Databases [Moritz; Maximilian Baader]

30.05
  • Type4Py: Practical Deep Similarity Learning-Based Type Inference for Python [Vithurjan; Mark Niklas Müller]
  • SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair [Yan (moved from May 16); Maximilian Baader]