Symbolic execution is a powerful technique that can generate tests steering program execution into desired paths. However, the scalability of symbolic execution is often limited by path explosion, i.e., the number of symbolic states representing the paths under exploration quickly explodes as execution goes on. Therefore, the effectiveness of symbolic execution engines hinges on the ability to select and explore the right symbolic states.

In this work, we propose a novel learning-based strategy, called Learch, able to effectively select promising states for symbolic execution to tackle the path explosion problem. Learch directly estimates the contribution of each state towards the goal of maximizing coverage within a time budget, as opposed to relying on manually crafted heuristics based on simple statistics as a crude proxy for the objective. Moreover, Learch leverages existing heuristics in training data generation and feature extraction, and can thus benefit from any new expert-designed heuristics.

We instantiated Learch in KLEE, a widely adopted symbolic execution engine. We evaluated Learch on a diverse set of programs, showing that Learch is practically effective: it covers more code and detects more security violations than existing manual heuristics, as well as combinations of those heuristics. We also show that using tests generated by Learch as initial fuzzing seeds enables the popular fuzzer AFL to find more paths and security violations.

Machine Learning for Code