Software bugs are endemic to computing. They emerge in small ways, like when your word processor won’t format text the way it’s supposed to, and big ways, as when the 2014 Heartbleed bug, a coding flaw in the widely used OpenSSL encryption library, compromised the security of 500,000 of the most-visited websites in the world. These errors have major consequences, but they’re hard to guard against completely. That’s because modern software is so complicated, and designed with such widely varying standards, that even the most diligent coders can’t be sure their programs will always do exactly what they’re supposed to and, critically, nothing else.
Now, a trio of Penn professors is helping to develop an entirely different approach to programming that has the potential to eliminate software bugs completely. Steve Zdancewic, Stephanie Weirich, and Benjamin Pierce, all professors of computer and information science, are part of a newly launched project called DeepSpec that intends to bring the rigor and precision of mathematics to the world of computers, so that programmers can prove the accuracy of their code with the same complete confidence with which mathematicians establish truths about numbers or geometry.
“The question is, how do you know you’re actually preventing all the problems you think you are, and what if you make a mistake?” says Zdancewic. “Part of what DeepSpec is trying to study is how we can turn these challenging engineering efforts into a science.”
Programming bugs have existed as long as computers have been around. They’ve become a more urgent problem as we’ve started to ask software to do more important tasks. If a video game malfunctions, it’s no big deal; if air traffic control software hits a glitch, look out below.
For this reason, in January the National Science Foundation awarded a $10 million, five-year grant to launch DeepSpec, a group of six researchers headed by computer scientist Andrew Appel at Princeton. DeepSpec is a response to a condition of modern software: It has become so complicated that the verification methods that grew up with the field are no longer adequate.
“If you look at Microsoft Word, it’s millions of lines of code that all have to interact with each other in subtle ways” that aren’t always easy to predict, says Weirich.
Another challenge is that even if software seems like it always works correctly—that for every input, it returns the correct output—it’s hard to rule out the possibility that there remains some particular set of conditions under which the software will malfunction.
“Good testing is hard, because it’s hard to think of all the ‘corner cases,’ which are where the bugs hide,” says Pierce, the Henry Salvatori Professor of Computer and Information Science.
The way software is typically written makes it especially hard, if not practically impossible, to anticipate all those corner cases. In general terms, programmers will have a task they want to perform and a range of tools they can use to engineer software that accomplishes it. Within that framework, they feel their way towards a solution. In an exaggerated sense, it’s similar to rigging up a knot that will support a large weight. If you approach the task naively, you could probably come up with a knot that looks formidable. But unless you have a predetermined vision of the knot, and an exact understanding of how all the loops work together, you can’t rule out the possibility of a weak point that, when pulled, causes the whole thing to come undone.
In software terms, the way to rule out such possibilities is through the process of specification (which is where the “spec” in DeepSpec comes from). Specification involves defining exactly what a program is intended to do. It’s easier said than done. To see why, consider self-driving cars—one domain in which it will be important that software never makes mistakes. In plain terms, we might be able to explain what we want a self-driving car to do. But specifying those tasks in terms a computer can understand, and that account for all the possible scenarios that could unfold on a road, is considerably harder.
“At a very high level, you can say, ‘I don’t want this self-driving car to drive off the road, I want it to stay in its lane,’” says Robert Constable, a computer scientist at Cornell University who works on software verification and knows all three of the Penn professors professionally. “But what precisely does that mean, ‘stay in your lane’? What happens when the lane markers disappear, the road narrows, [or] there’s a road closure? You have the standard image that the car should stay in its lane, but no, it changes all the time.”
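The gap between a plain-language goal and a precise specification shows up even in tiny programs. The sketch below is an illustration, not part of DeepSpec: it uses sorting as a stand-in task, and the names `meets_weak_spec`, `meets_full_spec`, and `lazy_sort` are hypothetical. It shows how a specification that sounds reasonable ("the output is in order") can be satisfied by a program that is plainly wrong.

```python
from collections import Counter


def is_ordered(items):
    """True if every element is <= the one that follows it."""
    return all(a <= b for a, b in zip(items, items[1:]))


def meets_weak_spec(original, result):
    # Plain-language goal: "the output should be in order."
    return is_ordered(result)


def meets_full_spec(original, result):
    # Tighter goal: in order AND made of exactly the same elements,
    # with the same number of copies of each.
    return is_ordered(result) and Counter(original) == Counter(result)


def lazy_sort(items):
    # Deliberately wrong "sort": it throws the data away.
    return []


data = [3, 1, 2]
print(meets_weak_spec(data, lazy_sort(data)))  # the wrong program passes
print(meets_full_spec(data, lazy_sort(data)))  # the tighter spec rejects it
print(meets_full_spec(data, sorted(data)))     # a real sort satisfies it
```

The empty list is the sorting equivalent of a car that stays in its lane by never leaving the driveway: it meets the letter of the loose requirement while missing the point entirely.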
Through DeepSpec, each of the Penn professors intends to build tools that enable the creation of a fully specified computer system. Zdancewic is working on creating specifications for LLVM, a compiler framework that translates high-level code, written in languages like C, into machine code that the computer’s processor (like an Intel chip) can understand.
“Because all of the code [programmers] write gets transformed by this tool, if there are bugs, it can lead to serious vulnerabilities,” Zdancewic says.
Pierce will be developing a method called “property-based random testing” that will allow programmers to quickly probe the accuracy of their code using millions of randomly yet strategically generated inputs, before they write out a full specification. And Weirich will be establishing a new effort to fully specify the core elements of a mathematically derived programming language called Haskell.
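To give a feel for the idea, here is a minimal sketch of property-based random testing written with only Python's standard library. It is not Pierce's tooling; the helper `check_property`, the generator `random_text`, and the run-length encoder under test are all hypothetical examples. Instead of hand-picking test cases, the programmer states a property that should hold for every input (here, that decoding an encoded string returns the original) and lets the machine try thousands of generated inputs.

```python
import random


def run_length_encode(text):
    """Compress 'aaab' into [('a', 3), ('b', 1)]."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def run_length_decode(runs):
    """Expand [('a', 3), ('b', 1)] back into 'aaab'."""
    return "".join(ch * count for ch, count in runs)


def random_text(rng, max_len=20):
    # "Strategic" generation: a two-letter alphabet makes repeated runs,
    # and the empty string, come up often. Those are the corner cases.
    length = rng.randint(0, max_len)
    return "".join(rng.choice("ab") for _ in range(length))


def check_property(prop, generate, trials=10_000, seed=0):
    """Return the first generated input that violates prop, or None."""
    rng = random.Random(seed)  # fixed seed so a failure can be replayed
    for _ in range(trials):
        value = generate(rng)
        if not prop(value):
            return value
    return None


def round_trip(text):
    # The property: decoding an encoding gives back the original text.
    return run_length_decode(run_length_encode(text)) == text


counterexample = check_property(round_trip, random_text)
print("no counterexample found" if counterexample is None else counterexample)
```

A passing run is evidence, not proof: it says only that none of the sampled inputs broke the property. That is why this kind of testing is pitched as a quick check before the heavier work of full specification and proof.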
In all of this work, the DeepSpec team will be using a program called Coq, which provides programmers with a highly structured environment in which to write code. Coq is based on a branch of mathematics called type theory, in which elements of a program’s code can be defined in precise relation to each other. This gives software written in Coq a modular flavor that makes it easier to verify in a comprehensive way.
Coq is a kind of programming environment called a “proof assistant”; it treats code as a mathematical or logical argument in which, given some starting assumptions, or axioms, certain conclusions emerge as necessary while others can be ruled out completely. Coq can be used to create a specification of what a program is supposed to do, and it can also follow the logic in a program’s code, to either verify that the code completely and correctly enacts those specifications, with no holes or bugs, or identify coding errors that need to be corrected.
“You use a rich mathematical language provided by Coq to describe the intended behaviors, and then you have to go through a lot of work to prove the program or hardware actually agrees with that specification,” says Zdancewic.
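A toy version of that workflow looks like the following. This sketch is written in Lean 4, a different proof assistant in the same family as Coq, so it illustrates the style of work rather than DeepSpec's actual code. The function `double` and the theorem name `double_spec` are made up for the example, and it assumes a recent Lean 4 toolchain in which the `omega` arithmetic tactic is built in.

```lean
-- The program: a hypothetical function that doubles a natural number.
def double (n : Nat) : Nat := n + n

-- The specification, stated as a theorem: for every input n,
-- double returns exactly 2 * n.
theorem double_spec (n : Nat) : double n = 2 * n := by
  unfold double  -- replace `double n` with its definition, `n + n`
  omega          -- linear arithmetic closes the goal `n + n = 2 * n`
```

If the definition and the specification disagreed, the proof assistant would refuse to accept the theorem. Unlike a test, the accepted proof covers every possible input at once; the "lot of work" Zdancewic describes comes from the fact that real programs need far longer proofs than this two-line one.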
The end result is a strong security guarantee. If the functionality of a program can be precisely specified, and those specifications can be verified in Coq, then developers know that their software is capable of doing exactly what it’s supposed to do and nothing more. This eliminates the possibility of a bad actor exploiting a bug and using software in ways programmers never intended it to be used.
One of DeepSpec’s primary goals is to move specification from the margins of computer science to the heart of the way critical computer systems are assembled. Over the next few years the team intends to create a project, like a fully verified web server or fully verified voting software, that will demonstrate the power and practicality of specification to industry giants like Microsoft and Google. In addition, Pierce is working on developing curricular materials—including an updated version of his textbook, Software Foundations—that will make it easier for computer science departments to teach students how to formally specify programs.
It would be a major shift in the way software is written, but it’s also a necessary one. Software is already involved in many critical parts of our lives and its influence is only growing. If computers are going to drive cars and robots are going to perform surgery, then human beings are going to have to find better ways to guarantee that the underlying software really works.