Congressional Data Mining: Coming Soon?

How a little-noticed provision in a House spending bill could revolutionize access to congressional information.

—Photo by flickr user I am I.A.M. used under a Creative Commons license.
Thu March 5, 2009 10:04 AM PST

By slipping a simple, three-sentence provision into the gargantuan spending bill passed by the House of Representatives last week, a congressman from Silicon Valley is trying to nudge Congress into the 21st century. Rep. Mike Honda (D-Calif.) placed a measure in the bill directing Congress and its affiliated organs, including the Library of Congress and the Government Printing Office, to make their data available to the public in raw form. This would enable members of the public and watchdog groups to craft websites and databases showcasing government data that are more user-friendly than the government's own.

If the Senate passes the bill with the provision intact, citizens seeking information about Congress' activities—such as bill names and numbers, amendments, votes, and committee reports—won't have to rely on government websites, which often filter information, are incomplete, or are difficult to use. Instead, the underlying data will be available to anyone who wants to build a superior site or tool to sift through it. "The language is groundbreaking in that it supports providing unfiltered legislative information to the public," says Honda's online communications director, Rob Pierson. "Instead of silo-ing the information, and only allowing access through a limited web form, access to the raw data will make it easier for people to learn what their government is doing."

Successful, privately created websites that provide the public with information about Congress' actions already exist. OpenCongress.org, GovTrack.us, Legistorm.com, and MAPLight.org all make legislative data available to the public in ways that are easier to navigate than Congress' primary web portal, a system called Thomas. Those sites currently get their data through techies who "scrape" Thomas and other government websites, which means they use bots to process the HTML and extract the useful pieces. The process is labor-intensive and imprecise. "It's difficult to keep the data up to date, in some cases impossible, and occasionally there are errors in the data," says Josh Tauberer, the 26-year-old who runs GovTrack.us and does much of the "scraping" that others rely on. "This could all be fixed by a bulk data download."
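
To make the contrast concrete, here is a minimal Python sketch. It is not drawn from GovTrack.us or any real site: the HTML fragment, the class names, and the JSON record layout are all invented for illustration, and neither reflects what Thomas actually serves or what a bulk download would contain. The scraper has to walk page markup and guess which cells matter, while the bulk path is a single parse of structured data.

```python
from html.parser import HTMLParser
import json

# Hypothetical page markup, standing in for a bill listing on a government site.
SAMPLE_HTML = """
<table>
  <tr><td class="bill">H.R. 1</td><td class="title">Example Act</td></tr>
  <tr><td class="bill">H.R. 2</td><td class="title">Another Example Act</td></tr>
</table>
"""

# Hypothetical bulk-data export carrying the same records as structured JSON.
SAMPLE_BULK = (
    '[{"bill": "H.R. 1", "title": "Example Act"},'
    ' {"bill": "H.R. 2", "title": "Another Example Act"}]'
)


class BillScraper(HTMLParser):
    """Collects one dict per table row, keyed by each cell's class attribute."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None   # class of the <td> currently being read, if any
        self._current = {}   # the row being assembled

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current = {}
        elif tag == "td":
            self._field = dict(attrs).get("class")

    def handle_data(self, data):
        # Only text inside a recognized cell is kept; everything else is ignored.
        if self._field in ("bill", "title"):
            previous = self._current.get(self._field, "")
            self._current[self._field] = previous + data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._current:
            self.rows.append(self._current)
            self._current = {}


def scrape_bills(html):
    """Scraping path: recover records from presentation markup."""
    parser = BillScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


def load_bulk(text):
    """Bulk-data path: the records arrive already structured."""
    return json.loads(text)


if __name__ == "__main__":
    print(scrape_bills(SAMPLE_HTML))
    print(load_bulk(SAMPLE_BULK))
```

In this sketch, if the page's markup changes (say the `bill` class is renamed), the scraper silently drops that field rather than raising an error, which is one way the errors Tauberer describes can creep in.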

Tauberer expects that the availability of additional and easier-to-use congressional data will spur innovation. "You can expect to see other sites spring up doing new and interesting things with the information." He anticipates charts, graphs, and maps that represent congressional goings-on visually—"ways of visualizing the congressional process that we couldn't yet imagine." Honda, with his Silicon Valley roots, expects that developers and coders will quickly outpace the government's efforts to date. "We hope that we can learn from the wisdom of crowds," says Pierson.

Some government agencies already publish massive amounts of raw data. The Census Bureau provides huge amounts of information in raw form, allowing academics, statisticians, and think tank scholars to comb through it in any way they please. The Federal Election Commission publishes unedited data on campaign contributions, giving rise to sites like OpenSecrets.org, which allow the public to see who is donating to whom and let journalists and watchdogs investigate the influence of money in politics.

"In our Web 2.0 world, we can empower the public by providing them with raw data that they can remix and reuse in new and innovative ways," Honda told Mother Jones in a statement. (Disclosure: In the summer of 2002, I briefly worked as an intern in Honda's district office.) Honda's provision, however, pertains only to legislative data. Federal departments like the Environmental Protection Agency, the Food and Drug Administration, and the Department of Energy have reams of data that political scientists, economists, and researchers of all stripes would love to get their hands on. Many who work at the intersection of technology, politics, and transparency believe that the key player in broadening Honda's effort to include the executive branch will be Vivek Kundra, the former Chief Technology Officer of the District of Columbia who was named Obama's Chief Information Officer on Thursday. According to the National Journal's Tech Daily Dose, Kundra "told reporters Thursday he will launch data.gov, a Web site intended to 'democratize data' by giving the public raw feeds of information from a range of agencies."

John Wonderlich, the policy director at the Sunlight Foundation, which has created or funded several tools that make government data easier to analyze, is holding out hope that the president's Open Government Directive, which is due at the end of May, will further address the issue of data availability. He applauds Honda for putting Congress, at least, on the right track. "Without Honda's attention to this issue, congressional level attention to bulk data access would be unlikely," he says. "We're happy to see this first step."

Comments

Congressional Data Mining.

Congressional Data Mining. Jonathan is really onto something here. The idea of the public getting raw data and records of transactions from government databases has the potential to transform government. Can you imagine how the Bush administration would have had to act differently if its actions were immediately open to public view? So access is the first thing needed. Second, someone has to have the dedication to do the searches, pull the important stuff out of the data, and bring it to the public. You can bet that special interest groups will, and there will be new spin doctors. Hopefully organizations like MoJo will pull out the data / records / transactions and objectively present them to the public. We're not going to do it. I could see a blog on "Today's Fact Mining" where MoJo feeds us information daily.

Imagine this...

Can you imagine how different the first 5 weeks of the Obama administration would have been if their actions were open to public view instead of behind closed-door sessions? We continue to learn hard lessons in the age of information: too much information can be bad. Take identity theft, for one. I think we must tread lightly and with great caution when going down this path. There is also potential for unseen pitfalls.

I support this in a broad sense. It's just too bad it isn't in place before we go on the multi trillion dollar shopping spree.

We should fix it instead of masking it ...

In my opinion, all this is well and good. The danger here is that if we now have "another" layer that gets to the information (which appears to be somewhat not up to snuff), are we not creating a risk of layers and layers of information that is not up to snuff? Hmm, maybe we should "fix" the base and then work at layering. Otherwise, we may end up not knowing what is clean or dirty information to begin with. All the same, the number of terabytes of information per person on the planet is doubling rapidly. We will now all drown in this mire while we feed the terabytes with our precious power and cooling infrastructure that spits out all that nasty mercury and whatnot into the atmosphere.

Striking a blow against government secrecy

This is welcome news, and hopefully the legislation will still contain the provision mandating public access to raw congressional data if/when the bill passes.

Government in the United States is so huge, so complicated, and therefore so out of touch with the people, that if it is to be accountable to us at all, it is vital that it be subjected to strong Sunshine laws such as this.

Ninety-nine percent of what government hides from public knowledge should be made public.

US Government Web Services and XML Datasources

www.USGovXML.com is an index of publicly available web services and XML data sources provided by the US government. It includes detailed descriptions of the data sources and their operations. Links to the host systems for documentation, tech support, etc. are also available. Source code snippets are provided to help developers better understand how to use the data sources. Web-based applets for use on mobile devices (e.g., smartphones) have also been provided. The mobile applets are available at www.USGovXML.com/mobile.

Thanks for sharing this

Thanks for sharing this post! The salaries of Capitol Hill staff have been continuously tracked by the website database LegiStorm. LegiStorm may be too hard for certain people to weather, but it remains the only place on the web where you can find congressional staff salaries. "Fact finding trips" have been lampooned as wasteful spending for years, as they don't always reveal anything, and it's not like the funds used are a personal loan – the funds come from the taxpayers. The only people protesting it so far are the people whose information is posted, and doubtless they would get a personal loan to quash LegiStorm.

I have no idea how to write

I have no idea how to write an essay on this topic.
