Reliable and Trustworthy Artificial Intelligence

Semester:Fall 2025

Number:263-2400-00L

Head TA:

Nikola Jovanović (use Moodle for questions unless they contain sensitive information)

TA:

Jasper Dekoninck, Giovanni De Muri, Kazuki Egashira, Thibaud Gloaguen, Nikola Jovanović, Yuhao Mao, Robin Staab, Chenhao Sun

Lecture: Wed 14-16, HG G 3 (recordings on the ETH video portal)

Exercise: Mon 12-14, CAB G 56 or Wed 12-14, CAB G 51

Credits:6

Overview

Reliability, security, privacy, and robustness are core challenges in achieving trustworthy AI and are of fundamental importance. The goal of this course is to teach both the mathematical foundations of this emerging field and to introduce students to the latest and most exciting advances. To facilitate deeper understanding, the course includes a group coding project where students will build a system based on the learned material.

The course is structured in three parts:

Robustness in Machine Learning

Adversarial attacks and defenses on deep learning models.
Automated certification of deep learning models (convex relaxations, branch and bound, randomized smoothing).
Certified training of deep neural networks (combining symbolic and continuous methods).
State-of-the-art attacks and novel attack vectors for large language models (LLMs).

Privacy in Machine Learning

Threat models (e.g., data stealing, model poisoning, membership inference).
Privacy attacks in decentralized (federated) machine learning.
Protection via differential privacy; applications to centralized and decentralized model training.
Memorization in generative AI models; training data extraction attacks.
Private attribute inference with generative AI models.
Securing data flows in agentic AI systems.

Provenance and Evaluation in Generative AI

Reliable detection of AI-generated content via watermarking.
Removing and forging watermarks; data watermarking.
Dataset contamination: detecting and evading detection.
Trustworthy evaluation of LLMs: challenges in benchmarking and rating.
Bridging AI regulation (e.g., EU AI Act) and technical evaluations.

Lectures

Use your NETHZ account to access the files.

Date	Content	Slides	Solutions
Sep 17	Course Introduction
Sep 24	Adversarial Attacks and Defenses
Oct 1	Neural Network Certification: Box, MILP (Part 1)
Oct 8	Neural Network Certification: MILP (Part 2), DeepPoly
Oct 15	Branch and Bound, Certified Defenses, Combining Logic and Deep Learning
Oct 22	Randomized Smoothing, Course Project Introduction
Oct 29	LLM Attacks and Defenses		*
Nov 5	Introduction to Privacy, Federated Learning Attacks
Nov 12	Differential Privacy	*
Nov 19	LLM Privacy: Membership Inference, Memorization, Prompt Injection, Security of AI Agents
Nov 26	LLM Watermarks
Dec 3	LLM Evaluation and Benchmarking
Dec 10	Guest Lecture (LatticeFlow): Beyond AI Governance

Recordings

All lecture recordings from this year will be available on the ETH video portal, in the same way as the recordings from 2024. Another useful resource is our Youtube playlist of lecture recordings from 2020. However, note that the course syllabus has been significantly modified in the meantime.

Course Project

The project description is on these slides. The code release is scheduled for Oct 25.

Previous Exams

Previous exams (formerly, this course was named "Reliable and Interpretable Artificial Intelligence") are available in the exam collection of the student association (VIS).

Course Organization

Lectures

The lecture will take place physically in room HG G 3, but will be recorded.
For additional questions, we have prepared a Moodle forum.

Exercises

Every week, we will publish an exercise sheet and its solutions on this page, by Thursday evening.
The exercise session will consist of a discussion of selected exercises (potentially not all exercises). On demand, the teaching assistant can also discuss questions on specific exercises brought up by students.
Some exercise sessions will also discuss prerequisites for the course. The material covered in these sessions will be available online. This will be the case in the first exercise on Sep 22/24. For other exercise sessions, we will announce by mail if they discuss prerequisites.
Attending the exercise sessions is optional. We will not cover new material in the exercise sessions, except for prerequisites (see above). Therefore, we will also not record the exercise sessions.
We strongly recommend solving the exercises before next week's exercise session, and before looking at the solutions. The style of the exam will be similar to the exercises, so first-hand experience solving exercises is critical.
For additional questions, we have prepared a Moodle forum.
There is no need to attend both exercise sessions, as their contents will be equivalent.

Communication

All communication (e.g., special announcements) will be sent out by e-mail.

Prerequisites

While not a formal requirement, the course assumes a good understanding of linear algebra, analysis, probability theory and machine learning (especially neural networks). These topics are usually covered in “Intro to ML” classes at most institutions (e.g., “Introduction to Machine Learning” at ETH).

The coding project will utilize Python and PyTorch. Thus programming experience in Python is expected. Students without prior knowledge of PyTorch are expected to acquire it prior to the course.

Literature

For students who would like to brush up on the basics of machine learning used in this course, we recommend

Section 3 (Background) of the publication An Abstract Domain for Certifying Neural Networks by Gagandeep Singh, Timon Gehr, Markus Püschel, and Martin Vechev
Neural Networks and Deep Learning by Michael Nielsen
Deep Learning book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville