Meet the People Behind the Wayback Machine, One of Our Favorite Things About the Internet
The Internet Archive is home to more than 15 million gigabytes of free digital information—and it's just getting started.
Brewster Kahle is quick to point out that we are not standing inside a former Scientology church. Visitors to this looming white building in San Francisco's Inner Richmond District often get its past life wrong: it was a meeting place for Christian Scientists, not Scientologists. It is now a different kind of house of worship, known as the Internet Archive, where free digital access to all knowledge is the canon.
"The average life of a web page is about 100 days before it's either changed or deleted," says Kahle. "Even if it's supported by big companies: Google Video came down, Yahoo Video came down, Apple went and wiped out all the pages in Mobile Me." Capturing this transient web was Kahle's original mission for the Internet Archive when he founded it in 1996. Nearly two decades later, the 53-year-old compares his organization to a "Library of Alexandria, version two."
That may be an understatement. In addition to hosting the Wayback Machine, an ever-growing collection of more than 400 billion copies of web pages, the Internet Archive has also expanded its services by providing millions of free digitized books, TV shows, movies, songs, documents, and software titles. Want to see what MotherJones.com looked like in 1996? Here you go. Are you a Deadhead in search of rare recordings? There are more than 9,000 to choose from. Remember when federal websites were closed for business during the government shutdown? They were still available thanks to the Internet Archive.
Walking through the Internet Archive's physical headquarters, which has occupied this former church since 2009, is a surreal experience. Built in 1923, the grand worship hall on the second floor remains intact, with wooden pews lining the floor and a podium sitting atop a stage. But stacks of humming, blinking server racks now rest against the walls. And then there are the figurines: dozens of half-size human models that populate the outside rows of pews and immortalize Archive employees and volunteers throughout the years. Kahle's mini-mannequin stands in the front row. Next to him is Aaron Swartz, the "Internet folk hero" who was a volunteer and contractor from 2007 to 2009. Swartz committed suicide in 2013 following a federal indictment for downloading the contents of the digital library JSTOR from the Massachusetts Institute of Technology. Kahle remains disappointed with how prosecutors, MIT, and JSTOR handled the Swartz case. "Shame on them," he says. "I think it's a symbol of the old world and the old approach that must be overturned. There are some organizations that are still built around this idea of restricting, restricting, restricting, and that's not going to fly."
While Kahle is against restricting access to knowledge, he adamantly supports internet users' right to privacy. In 2007, the FBI sent the Internet Archive a secret National Security Letter seeking information about one of its patrons. With the help of the Electronic Frontier Foundation, Kahle challenged the request and won. "That a library has to sue the US government is not terribly appropriate," he says. But the Internet Archive's relationship with the feds is not entirely prickly. It also provides web crawling and book scanning services for the Library of Congress. Kahle says the Patent and Trademark Office has used the Wayback Machine to research whether ideas are novel.
A collection like the Internet Archive's is extremely valuable. Kahle estimates it has about 15 petabytes of information (a petabyte is approximately one million gigabytes of data). That's a lot less than Facebook's estimated 300 petabytes, but there's a big difference: "The Internet Archive is a nonprofit, and nope, there's no buying it," says Kahle. Kahle has sold other companies in the past. The Internet Archive was started with funding from the 1995 sale of his search system WAIS, which AOL purchased for $15 million. His online tracking service Alexa was sold to Amazon for $250 million in 1999. The Internet Archive's current budget is around $12 million.
One of the Internet Archive's fastest growing collections is its TV News Archive. Twenty-four hours a day, seven days a week, HD feeds from more than 65 news channels, both foreign and domestic, are recorded on the Internet Archive's servers. The US feeds are fully searchable the following day. Roger Macdonald, who runs the entire Television Archive project, preaches treating all media as data. He says many TV and cable networks are "scared about experimenting" with closed captioning data that could make their content searchable by a global audience. By making its videos text-searchable, "our service has vaulted over the confines of the linear video storytelling," he says. For example, when Harvard and MIT researchers studied how the media covered the Trayvon Martin shooting, they turned to the TV News Archive, using its closed captioning data to help map the story's evolution.
In 2013, the Internet Archive received an unusual message from Michael Metelits. Metelits's mother, Marion Stokes, who had recently passed away, had recorded more than 35 years of TV news in Philadelphia and Boston with her VHS and Betamax machines. Metelits was left with approximately 40,000 well-organized tapes, but he had nowhere to put them. So he emailed the Archive. "I thought there might be a typo in his email," Macdonald recalls. "I couldn't imagine an individual doing that."
The donated collection turned out to be a goldmine. The TV News Archive began recording in 2000; Stokes had them beat by more than 20 years. And not only were her tapes in good condition, they also recorded closed captioning data, providing vital metadata. Digitizing and logging the massive trove, now stored in Richmond, California, is a challenge, to say the least. Macdonald says they've "only just scratched the surface of imagining what's there."
Looming above the Richmond storage facility where the Stokes collection resides is another element of Kahle's ongoing mission. It's an antenna broadcasting free internet, one of two free wi-fi access points the Archive provides to San Francisco Bay Area residents. (A third free wi-fi setup is in North Carolina.) Kahle says cities "haven't been doing their part" to provide faster access to the web and that communication infrastructure is "just as much the lifeblood as water or transportation to a city."
Adding to its long list of projects, the Internet Archive is also taking a swing at the housing market. Kahle wants to apply the tech industry concept of "open sourcing" to disrupt (if you will) the Bay Area's affordable housing crisis, which has been fueled in part by the booming tech industry. The Internet Archive has set up a separate nonprofit to purchase an 11-unit apartment building six blocks from its San Francisco headquarters, which it hopes will offer "debt free" housing to nonprofit employees. Macdonald says the first Internet Archive employee will move in later this year. Eventually, Kahle's dream is "to transition 5 percent of all housing into a new housing class that would be dedicated to supporting the nonprofit sector."
Even as he sets more ambitious goals, Kahle worries that the end of net neutrality could spell the end of the open web he's fought to preserve. "If we lose net neutrality," he says, "or if we let monopolization happen, whether it's Comcast and AT&T in the United States, or other players in other countries, we will lose the magic that we've had for the last 20 or 30 years with this internet." He urges other technologists to get involved. "We can't just wait on government to do something. They'll be bashed around by the commercial players that have all to gain from monopolization."
Thinking about the current state of the internet, Kahle says, "I wake up sometimes really depressed, and sometimes really optimistic." But, he adds, "As they said in other struggles, you should know which side you're on, and at least the Internet Archive knows which side it's on."