FSE 2025, June, Trondheim, Norway 🇳🇴
Wei-Hao Chen
Purdue University
Jia Lin Cheoh
Purdue University
Manthan Keim
Purdue University
Sabine Brunswicker
Purdue University
Tianyi Zhang
Purdue University
Programming in data science differs from traditional software development and often relies on computational notebooks such as Jupyter. This paper investigates fine-grained programming mistakes and fixing behaviors in data science by analyzing 390 Jupyter notebooks created by 67 participants during a six-week competition, complemented by follow-up interviews with 10 practitioners. We find that errors occur most frequently during the data preprocessing and data exploration stages, and that the most common error types include ValueError and NameError. We identify 12 fix patterns and 5 types of debugging actions. Our results suggest a need for tools that better support debugging in iterative workflows and managing cell dependencies in computational notebooks.
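To make the cell-dependency problem concrete, here is a minimal sketch that simulates a notebook in plain Python, assuming (as in a Jupyter kernel) that every cell executes in one shared global namespace. The `run_cells` helper and the cell sources are hypothetical illustrations, not material from the study's dataset. Running the exploration cell before the preprocessing cell that defines `clean` raises a NameError, and a naive type conversion raises a ValueError, the two error types named above.

```python
# Hypothetical sketch: notebook cells modeled as source strings that all run
# in one shared namespace, as cells do in a Jupyter kernel.

def run_cells(cells, order):
    """Execute cells[i] for each i in `order` in a single shared namespace.

    Returns the namespace and a list of (cell_index, exception_name) pairs
    for the cells that raised, continuing past failures as a user might.
    """
    namespace = {}
    failures = []
    for i in order:
        try:
            exec(cells[i], namespace)
        except Exception as exc:
            failures.append((i, type(exc).__name__))
    return namespace, failures


# Hypothetical three-cell notebook: load, preprocess, explore.
cells = [
    "raw = ['3', '5', 'n/a', '8']",
    "clean = [int(x) for x in raw if x.isdigit()]",
    "mean = sum(clean) / len(clean)",
]

# Out-of-order execution: the exploration cell runs before `clean` exists.
_, failures = run_cells(cells, order=[0, 2, 1])
print(failures)  # [(2, 'NameError')]

# Top-to-bottom execution succeeds.
ns, failures = run_cells(cells, order=[0, 1, 2])
print(failures, ns["mean"])  # [] 5.333333333333333

# A naive preprocessing cell without the isdigit() filter fails on 'n/a'.
naive = [cells[0], "clean = [int(x) for x in raw]"]
_, failures = run_cells(naive, order=[0, 1])
print(failures)  # [(1, 'ValueError')]
```

In a real notebook the execution order is visible only through execution counts, so a tool that records which cell defines each name could flag the NameError case before the cell runs; this is one possible reading of the cell-dependency support the results point to.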
Stage | AttributeError | TypeError | ValueError | NameError | NotFnd | BadReq | KeyError | SyntaxError | IndexError | ModuleNotFoundError | Misc | Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Data Loading | 9 | 3 | 0 | 25 | 28 | 27 | 6 | 2 | 1 | 0 | 0 | 101 |
Data Preprocessing | 25 | 17 | 32 | 25 | 0 | 0 | 25 | 12 | 4 | 0 | 0 | 140 |
Data Exploration | 24 | 18 | 37 | 31 | 0 | 0 | 20 | 22 | 14 | 0 | 4 | 170 |
Modeling | 1 | 0 | 8 | 3 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 15 |
Prediction | 0 | 1 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 7 |
Evaluation | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 |
Visualization | 9 | 5 | 13 | 16 | 0 | 0 | 8 | 8 | 0 | 0 | 2 | 56 |
Result Saving | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Comment Only | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Helper Functions | 0 | 0 | 19 | 1 | 0 | 0 | 0 | 4 | 0 | 12 | 1 | 37 |
Total | 70 | 44 | 111 | 103 | 28 | 27 | 55 | 50 | 19 | 12 | 10 | 529 |
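As a quick check on the abstract's claims, the following sketch computes each error type's share of all 529 errors, plus the combined share of the preprocessing and exploration stages, using only the table's Total row and Total column. Mapping the abbreviated headers to Python exception names (for example ValErr to ValueError) is an assumption; NotFnd, BadReq, and Misc are kept as written because their exact meaning is not stated here.

```python
# Counts copied from the "Total" row and "Total" column of the table above.
# Exception-name labels are an assumed reading of the abbreviated headers.
ERROR_TOTALS = {
    "AttributeError": 70, "TypeError": 44, "ValueError": 111, "NameError": 103,
    "NotFnd": 28, "BadReq": 27, "KeyError": 55, "SyntaxError": 50,
    "IndexError": 19, "ModuleNotFoundError": 12, "Misc": 10,
}
STAGE_TOTALS = {
    "Data Loading": 101, "Data Preprocessing": 140, "Data Exploration": 170,
    "Modeling": 15, "Prediction": 7, "Evaluation": 3, "Visualization": 56,
    "Result Saving": 0, "Comment Only": 0, "Helper Functions": 37,
}
GRAND_TOTAL = 529


def shares(totals, grand_total):
    """Return (label, count, percent) tuples sorted by count, descending."""
    return sorted(
        ((label, n, 100 * n / grand_total) for label, n in totals.items()),
        key=lambda item: item[1],
        reverse=True,
    )


# Three most frequent error types and their share of all errors.
for label, n, pct in shares(ERROR_TOTALS, GRAND_TOTAL)[:3]:
    print(f"{label:<15} {n:>4} {pct:5.1f}%")

# Combined share of the two stages the abstract singles out.
pre_explore = STAGE_TOTALS["Data Preprocessing"] + STAGE_TOTALS["Data Exploration"]
print(f"Preprocessing + exploration: {pre_explore} ({100 * pre_explore / GRAND_TOTAL:.1f}%)")
```

Run as-is, it lists ValueError (21.0%), NameError (19.5%), and AttributeError (13.2%) as the most frequent types, and reports that preprocessing and exploration together account for 310 of 529 errors (58.6%).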
@inproceedings{chen2025mistakes,
  title={Towards Understanding Fine-Grained Programming Mistakes and Fixing Patterns in Data Science},
  author={Chen, Wei-Hao and Cheoh, Jia Lin and Keim, Manthan and Brunswicker, Sabine and Zhang, Tianyi},
  booktitle={Proceedings of the 2025 ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
  year={2025},
  organization={ACM}
}