What makes confusing code so confusing? Current Atoms initiatives look for the Hows, Whys and Wheres
2018-06-05 · Posted by: Lois Anne DeLong · Categories: Atoms of Confusion

The Atoms of Confusion project deals with perhaps the most random variable in the development of software: the human programmer who writes and/or maintains the code. The contention of the Atoms research team is that there can be issues within a piece of code that cause programmers to make costly or potentially damaging assumptions about its output. As such, the project explores how elements in the code affect human comprehension.
In a relatively short span of time, the group has conducted and evaluated two carefully constructed user studies and obtained empirical evidence that small, self-contained patterns within lines of code can cause programmer confusion. To date, 15 of these patterns have been identified and confirmed as “Atoms of Confusion,” including several that most commonly used style guides ignore, and one, the use of curly braces, for which our findings contradicted both the NASA and Linux style guides.
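To make the idea concrete, here is a minimal, illustrative C sketch of the kind of pattern involved; the snippet and variable names are an illustration of one such pattern (a post-increment whose value is used in the same expression) and are not taken verbatim from the published study materials.

```c
#include <stdio.h>

int main(void) {
    /* Potentially confusing form: the copy of V2 and the increment of V2
       happen in a single expression, so the reader must remember that a
       post-increment yields the value V2 held before it was incremented. */
    int V2 = 1;
    int V1 = V2++;      /* V1 is 1, V2 is now 2 */

    /* Functionally equivalent, more explicit form:
       the copy and the increment are separate statements. */
    int W2 = 1;
    int W1 = W2;        /* W1 is 1 */
    W2 = W2 + 1;        /* W2 is now 2 */

    printf("%d %d | %d %d\n", V1, V2, W1, W2);   /* prints: 1 2 | 1 2 */
    return 0;
}
```

Both versions compute the same result; the difference lies only in how much the reader must hold in mind at once, which is the kind of effect the user studies were designed to measure.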
By the time the paper documenting these study results was presented at the Foundations of Software Engineering Conference in September 2017, the research team (which, in addition to students, faculty, and staff at NYU Tandon, also includes personnel at the Pennsylvania State University and the University of Colorado, Colorado Springs) had opened several new research fronts. One group began to measure factors that can affect levels of confusion, such as where the atoms are located within a piece of code, while another looked at how these confusing patterns influence brain activity in developers. Still another group set out to see just how pervasive these atoms are within code “in the wild,” by conducting a quantitative assessment of the frequency of atoms in real-world software.
Here is a brief review of the research initiatives the Atoms of Confusion project has undertaken over the past year.
Atoms Do Exist in the Wild
While the initial studies did demonstrate the existence of Atoms of Confusion, their original code corpus was selected precisely for its likelihood to contain atoms. That left an important question to answer: were these confusing patterns just as prevalent “in the wild”? A group led by NYU Tandon Ph.D. student Dan Gopstein identified a corpus of 14 of “the most popular and influential open source C and C++ projects” to measure how many atoms, if any, they might contain. They found that the 15 confirmed atoms occurred “millions of times in programs like the Linux kernel and GCC, appearing on average once every 23 lines.”
The research team, which also included Hongwei Henry Zhou and faculty members Phyllis Frankl and Justin Cappos, summed up the significance of this work by noting it demonstrated “that beyond simple misunderstanding in a lab setting, Atoms of Confusion are both prevalent—occurring often in real projects—and meaningful,” as they are “being removed by bug-fix commits at an elevated rate.” A paper documenting the Atoms Finder work was presented at the Mining Software Repositories conference in Gothenburg, Sweden, in May, where it was honored as a distinguished paper. This marks the second time an Atoms paper has received this recognition, and the second time a paper on which Gopstein was lead author has been so honored.
What Programmers Are Really Thinking When They Think about Code
As the Atoms project evolves, one question the team continues to return to is “Why are these code patterns confusing?” While the answer probably lies in a confluence of physical and psychological factors, one line of investigation that could prove helpful is learning how confusion manifests itself in the brain waves of programmers. Martin K.C. Yeh, an Atoms team member from the Pennsylvania State University, has been using an inexpensive, non-invasive EEG device to record the brain activity of developers as they view both confusing and non-confusing code snippets. The results of a pilot study with eight subjects indicate that more neurons may be active when a subject is working through confusing code snippets.
Yeh summarized his findings from the pilot study in a paper, delivered at the Frontiers in Education Conference in October 2017. Coauthored by Gopstein, Yanyan Zhuang, and Yu Yan, the paper also suggested that “intelligent tutoring systems” might be able to incorporate EEG “as an input to provide detailed explanations, extra practices, additional examples, or select different instructional strategies” when brainwaves suggest confusion.
Since that presentation, Yeh has completed a second round of tests using a larger subject pool and is currently analyzing the results.
Other Initiatives
The Atoms of Confusion team has three other initiatives currently in progress:
- Examining the influence of an atom’s position within a snippet of code on its ability to create confusion
- Identifying more potential atoms for testing and confirmation
- Determining the effect of developers’ native languages on their perception of code, and their susceptibility to atoms

To follow the progress of the Atoms of Confusion project, check our web site.