Hi, my name is Weihao Chen (陳威豪). I'm a third-year Ph.D. student in the Department of Computer Science at Purdue University, and a member of the Human-Centered Software Systems Lab led by Professor Tianyi Zhang. My research focuses on Human-Computer Interaction (HCI), Software Engineering, and Data Science, and I'm currently developing tools to help data scientists.
Web automation is frequently used by data scientists, domain experts, and programmers to complete time-consuming data collection tasks. However, developing web automation scripts requires familiarity with a programming language and HTML, which remains a key learning barrier for non-expert users. We present MIWA, a mixed-initiative web automation system that enables users to create web automation scripts by demonstrating what content they want to extract from the target websites. Compared to existing web automation tools, MIWA helps users better understand a generated script and build trust in it by (1) providing a step-by-step explanation of the script's behavior with visual correspondence to the target website, (2) supporting greater autonomy and control over web automation via step-through debugging and fine-grained demonstration refinement, and (3) automatically detecting potential corner cases that the generated script handles improperly. We conducted a within-subjects user study with 24 participants, comparing MIWA with Rousillon, a state-of-the-art web automation tool. Results showed that, compared to Rousillon, MIWA reduced task completion time by half while helping participants gain more confidence in the generated script.
Keywords: Programming by Demonstration, Web Automation, Data Science
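To give a flavor of the HTML knowledge that hand-written web automation demands (and that MIWA aims to spare users from), here is a minimal, self-contained Python sketch that extracts name/price pairs from a page. The page content and class names are hypothetical, inlined purely for illustration; this is not MIWA's output or any tool's actual code.

```python
from html.parser import HTMLParser

# A hypothetical product-listing page, inlined for illustration.
PAGE = """
<ul>
  <li class="item"><span class="name">Widget</span><span class="price">$3</span></li>
  <li class="item"><span class="name">Gadget</span><span class="price">$7</span></li>
</ul>
"""

class ItemExtractor(HTMLParser):
    """Collects (name, price) pairs from <span class="name"> / <span class="price">."""
    def __init__(self):
        super().__init__()
        self.field = None      # which field the parser is currently inside
        self.current = {}
        self.rows = []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self.field = cls

    def handle_data(self, data):
        if self.field:
            self.current[self.field] = data.strip()
            self.field = None
            if len(self.current) == 2:   # both fields seen: emit one row
                self.rows.append((self.current["name"], self.current["price"]))
                self.current = {}

extractor = ItemExtractor()
extractor.feed(PAGE)
print(extractor.rows)  # [('Widget', '$3'), ('Gadget', '$7')]
```

Even this toy scraper requires the user to know tag names, class attributes, and the page's nesting structure, which is exactly the barrier programming-by-demonstration systems remove.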
In the era of big data, Data Science (DS) plays a crucial role in gaining valuable insights from data across various domains. However, our understanding of DS programmers' coding behavior remains insufficient. Previous studies mainly analyzed DS code from public code-sharing platforms such as GitHub and Kaggle, which are limited to code changes committed to the version history, omitting many coding errors that are resolved before code commits. To bridge this gap, we present a comprehensive analysis of the fine-grained logs of a DS hackathon in which participants wrote Jupyter Notebooks over a period of more than six weeks. We recorded all code changes and program execution logs, enabling us to identify common programming mistakes and bugs across different data science stages, as well as programmers' debugging behavior. This work enhances our understanding of DS coding errors and the debugging practices of DS programmers, highlighting several future opportunities for designing new tool support for DS programming.
Keywords: Data Science, Jupyter Notebook, Programming Behavior
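The kind of analysis described above, mapping recorded errors to data science stages, can be sketched in a few lines. The log format, stage labels, and exception types below are hypothetical placeholders, not data from the study.

```python
from collections import Counter

# Hypothetical execution-log records: (cell_id, ds_stage, exception_type).
# None means the cell ran successfully. All values are illustrative.
log = [
    (3, "data cleaning", "KeyError"),
    (5, "modeling", "ValueError"),
    (3, "data cleaning", "KeyError"),
    (8, "visualization", "TypeError"),
    (5, "modeling", None),   # successful re-run after a fix
]

# Tally failed executions per stage, ignoring successful runs.
errors_by_stage = Counter(stage for _, stage, exc in log if exc is not None)
print(errors_by_stage.most_common())
# [('data cleaning', 2), ('modeling', 1), ('visualization', 1)]
```

Aggregations like this are what fine-grained execution logs enable and what commit-level GitHub or Kaggle snapshots cannot recover, since failed runs rarely survive into version history.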