Towards Understanding Fine-Grained Programming Mistakes and Fixing Patterns in Data Science

FSE 2025, June, Trondheim, Norway 🇳🇴

Wei-Hao Chen

Purdue University

Jia Lin Cheoh

Purdue University

Manthan Keim

Purdue University

Sabine Brunswicker

Purdue University

Tianyi Zhang

Purdue University

Abstract

Programming in data science is distinct from traditional software development and often relies on tools such as Jupyter Notebooks. This paper investigates fine-grained programming mistakes and fixing behaviors in data science through an analysis of 390 Jupyter notebooks created by 67 participants over a six-week competition, complemented by follow-up interviews with 10 practitioners. We find that errors occur most often during the data preprocessing and data exploration stages, with ValueError and NameError among the most common error types. We identify 12 fix patterns and 5 types of debugging actions. Our results suggest a need for tools that better support debugging in iterative workflows and managing cell dependencies in computational notebooks.
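
To make the cell-dependency issue concrete, the following is a minimal sketch of how a NameError can arise when notebook cells run out of order. It is an illustration only, not taken from the study's data: the `cells` dictionary and the `run_cells` helper are hypothetical, and each string stands in for the source of one notebook cell executed in a shared namespace, roughly as a kernel would.

```python
# Hypothetical notebook: each string stands in for one cell's source code.
cells = {
    "load": "values = [3, 1, 2]",
    "clean": "cleaned = sorted(values)",  # depends on the "load" cell
}


def run_cells(order):
    """Execute cells in the given order in one shared namespace.

    Returns "ok" if every cell runs, or a short description of the
    first NameError that is raised.
    """
    namespace = {}
    for name in order:
        try:
            exec(cells[name], namespace)
        except NameError as err:
            return f"{name}: {type(err).__name__}: {err}"
    return "ok"


print(run_cells(["load", "clean"]))   # cells run in dependency order
print(run_cells(["clean", "load"]))   # "clean" runs before "values" exists
```

The second call fails because the `clean` cell reads `values` before the `load` cell has defined it, which is the kind of hidden ordering dependency that a linear script would not allow.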

Error Distribution across DS Stages

Stage               AttrErr  TypeErr  ValErr  NameErr  NotFnd  BadReq  KeyErr  SyntaxErr  IndexErr  ModNotFnd  Misc  Total
Data Loading              9        3       0       25      28      27       6          2         1          0     0    101
Data Preprocessing       25       17      32       25       0       0      25         12         4          0     0    140
Data Exploration         24       18      37       31       0       0      20         22        14          0     4    170
Modeling                  1        0       8        3       0       0       0          1         0          0     0     15
Prediction                0        1       2        1       0       0       0          0         0          0     2      7
Evaluation                2        0       0        0       0       0       0          0         0          0     0      3
Visualization             9        5      13       16       0       0       8          8         0          0     2     56
Result Saving             0        0       0        0       0       0       0          0         0          0     0      0
Comment Only              0        0       0        0       0       0       0          0         0          0     0      0
Helper Functions          0        0      19        1       0       0       0          4         0         12     1     37
Total                    70       44     111      103      28      27      55         50        19         12    10    529
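
As a rough aid for reading the table, the sketch below computes each stage's share of all recorded errors. It assumes the values in the Total column as printed above; the names `stage_totals` and `stage_shares` are illustrative and not part of the paper.

```python
# Row totals copied from the "Total" column of the table above.
stage_totals = {
    "Data Loading": 101,
    "Data Preprocessing": 140,
    "Data Exploration": 170,
    "Modeling": 15,
    "Prediction": 7,
    "Evaluation": 3,
    "Visualization": 56,
    "Result Saving": 0,
    "Comment Only": 0,
    "Helper Functions": 37,
}

# Sum of the per-stage totals; this agrees with the table's overall total.
grand_total = sum(stage_totals.values())


def stage_shares(totals):
    """Return each stage's percentage share of all errors."""
    overall = sum(totals.values())
    return {stage: 100 * count / overall for stage, count in totals.items()}


shares = stage_shares(stage_totals)

# Print stages from the largest share to the smallest.
for stage, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{stage:<20}{pct:5.1f}%")
```

Under these totals, data exploration and data preprocessing together account for a little under 60% of the 529 errors, consistent with the abstract's observation about where errors concentrate.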

Citation

@inproceedings{chen2025mistakes,
  title={Towards Understanding Fine-Grained Programming Mistakes and Fixing Patterns in Data Science},
  author={Chen, Wei-Hao and Cheoh, Jia Lin and Keim, Manthan and Brunswicker, Sabine and Zhang, Tianyi},
  booktitle={Proceedings of the 2025 ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
  year={2025},
  organization={ACM}
}