“I could tell you a story, but I don’t want to come across as really ridiculous,” Tauberer begins, after a characteristic pause. “So, in third grade, you’re supposed to bring in a book to read. And I brought in a programming manual. It’s not narrative, right? Every page is just one particular function that you can program, and I’m just flipping through the pages and learning the functions.”
Tauberer’s father, a civil engineer, taught him to program when he was seven or eight years old. At Plainview-Old Bethpage John F. Kennedy High School on Long Island (“which apparently is one of the best high schools in the country,” he says), Tauberer won a website-design competition for a site called “Webcytology” that he created with his best friend. Another friend came up with the name, he says, a mashup of website and cytology (cell biology).
“It was an educational website that included a cool simulation of unicellular life,” he explains. “You would design your own single-celled organism by choosing things like how many mitochondria to put in it, and the website would put your organism in a simulated colony with other users’ organisms, and you would watch it replicate over days and weeks.”
But even before he got to college in 2000, Tauberer knew that he didn’t just want to be a programmer. “Which is weird,” he says. “I should do what I’m good at, but I didn’t want to.”
A class discussion about the 1998 Digital Millennium Copyright Act in a course on copyright law, free speech, and technology sparked the idea of applying his talents to government transparency. The DMCA was intended to prevent people from circumventing copyright protection measures but ended up penalizing innocent people, he says—for instance, by blocking them from transferring a legally purchased book or video from one device to another. “In the class we were being told all the reasons why it was not a great law, and it seemed like an obviously bad law,” Tauberer says. “And I thought, if the American public had better information about what was happening in Congress we might actually be able to prevent bad laws from happening, or at least hold people accountable.”
In his free time Tauberer started searching for ways to stay informed about Congress. The Library of Congress’s website, THOMAS, established in 1995, listed the status of bills, but it was notoriously unstable. (“The links on THOMAS break after 5 or 10 minutes,” says Daniel Schuman.) The House and Senate websites were supposed to provide voting records, but those weren’t always updated or complete. At the time (2001), there certainly wasn’t any one place where he could get all the information he was looking for—congressional voting records, summaries of bills, notifications of when a bill was changed.
So Tauberer decided to build one.
The first component of GovTrack was the actual website that allows users to stalk Congress’s every move. Among its key features are research tools that allow the user to search for a bill on a particular subject area and get email updates every time something happens in that arena. So if you are a doctor, for example, you can stay abreast of every law related to medicine. It is also the only site that displays edited bills in a marked-up form, similar to Microsoft Word’s track changes feature. Before GovTrack, the only way to compare current and past versions of bills was to read them side-by-side and look for changes.
The site also provides Prognosis, a statistical analysis tool that calculates each new bill’s chances in Congress. This is a particular boon to activists with limited resources, who must decide where to concentrate their efforts, and it helps to level the playing field. “He’s giving other people the opportunity to have access to high-quality information,” says Schuman. “You don’t have to just be a very wealthy corporation to know what’s going on. You can be anyone.”
As an example, take H.R. 2397, the Department of Defense Appropriations Act, 2014. The site offers summaries of the bill from the Library of Congress and House Republicans, explains its current status (as of July 24, it had been passed by the House and sent on to the Senate for consideration), and predicts its chance of passage: 20 percent.
When he launched the site in 2004, “I had no conception that [it] could be a career in any way,” Tauberer says. “It was a website. It was interesting. It had a couple of people visiting it. It was losing money, not generating money, which it does now.”
His expectations for the site were so low that in 2004 he decided to enter Penn’s doctoral program in linguistics, a subject he had always found interesting and, in many ways, similar to programming. “A lot of linguistics and a lot of programming is information management,” he says. “What are the relationships between things and how do you express those relationships in a simple but [as] correct way as possible?”
But despite Tauberer’s initial doubts, GovTrack has been hugely popular. In 2012, the site had five million users, including journalists, lobbyists for businesses and other causes, and members of Congress. “Everybody uses it!” exclaims Schuman. “The current system that Congress has available [to track legislation] isn’t particularly good, and what Josh has built is much, much, much better.”
Perhaps the greatest service performed by GovTrack is the database of political information that Tauberer has assembled to create it. While the federal government has a huge database related to the affairs of Congress (all the bills in both chambers, all the action on those bills, who serves on what committee, etc.), it doesn’t release that information directly to the public. Instead, information is spread out in bits and pieces across multiple websites—THOMAS, the House and Senate websites, the Government Printing Office’s Federal Digital System (gpo.gov/fdsys), and others.
For an individual looking for information, this system can kind of work—if you’re willing to take the time and trouble to search the various websites and piece things together. Not so much if you’re trying to build your own website or app harnessing that data.
Tauberer initially tried to get access to the original database, reasoning that, as an American, he deserved to have information about his own government in any form he found useful. But this reasoning proved unconvincing to the agencies responsible for that information.
“We periodically receive requests such as yours, so I know the answer is no, we are not able to provide anyone with direct access to our data,” a Library of Congress staffer replied in May 2001 to Tauberer’s request. The letter went on to note that all material on THOMAS was in the public domain, and “no permission is required to use it,” and concluded, “Good luck with your project.”
The question ever since, Tauberer says, has been “Where did this no-data-sharing policy come from? To be honest, I’m still not sure.” But what it boils down to, he adds, is an attitude that “the public can’t be trusted to have more information about government. It’s perverse. And un-American.”
The letter also suggested that Tauberer could use “robots” to gather data from the site. This is the same thing as “screen scraping,” Tauberer says, which is ultimately what he did, using software that recognizes different pieces of information on a web page and then compiles them in a central location.
Screen scraping, by the way, isn’t easy, says Tauberer, likening the process to the story of Humpty Dumpty. “It takes years to be able to do it accurately,” he explains. “You never really know which pieces fit where, exactly, until things happen. So, a veto is a really rare occurrence. Until a veto occurs, you can’t really tell how it is going to appear on Congress’s website, so you can’t really predict how to program for it. And then, once it occurs you have to scramble and figure out, ‘OK, now how do I add this to the database?’”
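To make the idea concrete, here is a minimal Python sketch of screen scraping in the sense Tauberer describes: software picks recognizable pieces of information out of a web page and compiles them into records. It is not GovTrack's actual code. The sample markup, the dates, the status labels, and the names SAMPLE_PAGE, KNOWN_ACTIONS, ActionExtractor, and scrape_actions are all invented for illustration; only the standard-library html.parser and re modules are real. The sketch also shows the veto problem: an action that matches no known pattern cannot be classified and is set aside for a human to handle.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup and dates, invented for illustration; real pages differ.
SAMPLE_PAGE = """
<ul class="actions">
  <li>1/10/2001: Passed House by recorded vote.</li>
  <li>1/17/2001: Received in the Senate.</li>
  <li>2/05/2001: Vetoed by President.</li>
</ul>
"""

# Assumed phrasing rules that map action text to a status label.
KNOWN_ACTIONS = [
    (re.compile(r"passed house", re.I), "passed_house"),
    (re.compile(r"received in the senate", re.I), "received_senate"),
]


class ActionExtractor(HTMLParser):
    """Collect the text of every <li> element on the page."""

    def __init__(self):
        super().__init__()
        self.in_item = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False

    def handle_data(self, data):
        # Text may arrive in chunks, so append to the current item.
        if self.in_item:
            self.items[-1] += data


def scrape_actions(page):
    """Return (records, unrecognized) parsed from the page's action list."""
    parser = ActionExtractor()
    parser.feed(page)
    records, unrecognized = [], []
    for item in parser.items:
        line = item.strip()
        date, _, text = line.partition(": ")
        for pattern, status in KNOWN_ACTIONS:
            if pattern.search(text):
                records.append({"date": date, "status": status})
                break
        else:
            # No rule matched, e.g. the first veto the scraper has ever seen.
            unrecognized.append(line)
    return records, unrecognized


records, unrecognized = scrape_actions(SAMPLE_PAGE)
```

Run on the sample page, the sketch classifies the first two actions and leaves the veto line in the unrecognized list, which is the point at which a scraper's author has to scramble and write a new rule.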
Tauberer not only used the database to create GovTrack, but also made it available to any programmer or developer for use in their projects. “Many, many dozens of people, many hundreds now, have taken this data and built something or at least tried to build something” with it, he says.
For example, MAPLight.org, which tracks the correlations between votes and campaign contributions, uses GovTrack’s data, as does Filibusted.us, which records which Senators filibuster the most bills. And the House Democratic caucus uses GovTrack’s data to run the internal web portal that keeps track of their legislative agenda—this after getting the same No answer Tauberer did from the Library of Congress to a request for the database, which he calls “a real example of just how locked down the data in Congress has been.”
“It’s not just that Josh has gone and figured out how to unscramble the egg,” says Schuman. “He’s figured out how to unscramble the eggs and make the eggs available to everyone to use for free.”
“He illustrates pro-bono probably better than anybody else I’ve seen,” echoes the Cato Institute’s Jim Harper. “He’s doing it because it’s interesting to him, and it’s going to help other people. What more do you need?”
“Once I started looking for data, I was insulted that the information wasn’t available for free anywhere,” says Tauberer, explaining his motivation. “I guess I got it into my head that there was a moral obligation for the government to make core information available.”