Future researchers can rest easy: Know Your Meme, Urban Dictionary, Creepypasta and Cute Overload have all been preserved by the Library of Congress. So have the band website for They Might Be Giants and the entire published output of The Toast, the humor site that shut down in 2016.
And while the Library of Congress owns a rare print copy of the Gutenberg Bible, the web archive features the LOLCat Bible Translation Project, which rendered the Bible in LOLspeak.
For the past 20 years, a small team of archivists at the Library of Congress has been collecting the web, quietly and dutifully in its way. The initiative was born out of a desire to collect and preserve open-access materials from the web, especially U.S. government content around elections, which makes this the team’s busy season.
But the project has turned into a sweeping catalog of internet culture, defunct blogs, digital chat rooms, web comics, tweets and most other aspects of online life.
“Suddenly, these new technologies and social media platforms come in, and these new types of ways people were communicating or sharing data online,” said Abbie Grotke, who leads the archiving team and has worked for the program since 2002, two years after its founding. “And we had to keep up with it all. There’s always something new the web is throwing at us.”
March turned out to be particularly chaotic. With an entire team working from home, the web archivists are participating in an international project to collect content around the coronavirus, as well as adding to the library’s own collections about the pandemic. And, of course, it’s still technically campaign season.
“We do an all-hands-on-deck,” Ms. Grotke said. “And we don’t delete anything. We’re digital hoarders.”
The Criteria for Selection
The web archive team has grown from one librarian who used to read newspapers and circle mentions of websites to a staff of five, along with employees from other departments who pitch in. It is hardly adequate, given their monumental task.
Already the library has amassed more than 2.129 petabytes of data — or put another way, 18 billion digital documents. And that’s just a sliver of the internet.
“In the vastness of the web, what is the sampling of stuff that we can pull together that demonstrates what’s going on now?” said John Fenn, the head of research and programs at the American Folklife Center. He is also one of about 80 recommending officers, who make suggestions for the library’s archive — in Mr. Fenn’s case, for the Web Cultures collection. (It is one of several thematic groupings in the archive, along with the Webcomics collection, American Music Creators and dozens more.)
“It’s like whack-a-mole,” said Gina Jones, a digital projects coordinator on the team.
The criteria that print archivists typically use for selection, such as value to future scholars and uniqueness of the material, still apply to the web archivists, though the high extinction rate of digital matter factors into decision making. One of the most recent acquisitions is the now-defunct Design Sponge, an interior decorating website that ran for 15 years. (Though it will cease to exist as a website, every single blog post will be fully accessible through the library’s web archive.)
The earliest material in the archive dates to the 2000 elections, when the web archive was still a pilot program. After the terrorist attacks of 9/11, when heart-rending memorials and fierce political debates played out online, the library recognized the need for an official digital record.
For years, collecting was keyed to major news events: the Iraq War, the 2004 elections. Then, around 2009, came a more continuous, expanded approach that sought to reflect the web in all its dizzying newness.
It is inevitable that many things go uncollected or are lost forever. The recommending officers have regrets.
Megan Halsband, who oversees the Webcomics collection, still mourns the death of Joey Manley in 2013, and with him, the influential sites he published like Serializer and Girlamatic. And she has so far been unable to archive another popular webcomics site, The Oatmeal, because in that case, the cartoonist who runs it has never responded to her emails seeking permission. (The library has an opt-in policy.)
“It probably goes into their spam,” Ms. Halsband said.
The Library of Congress web archive isn’t the only in-depth record of the internet, and it is not as comprehensive as the Wayback Machine, a project of the Internet Archive, a nonprofit in San Francisco. (The Internet Archive has been crawling the web since 1996 and launched the Wayback Machine in 2001; it has preserved more than 411 billion web pages by recent count.)
But the Library of Congress digital collection carries with it the heft of the federal government and the official stamp of American history. Digital material that is chosen by the web archivists will live alongside the rough draft of the Declaration of Independence, “Moby Dick” and other sacrosanct print holdings.
Ms. Grotke, 52, is of the generation who were adults when they first learned about the internet. In her case, it was back around 1993, at a house in the Dupont Circle neighborhood of Washington, D.C., where a friend of her brother’s lived. “He brought us over and we got to see Mosaic, an early browser,” she said. “I remember clicking and, like, whoa, there’s hyperlinks.”
That wide-eyed reaction to the internet has morphed, over 18 years of trying to corral it, into a more seasoned outlook. “The web is messy, and the web archives are messier,” she likes to say.
In addition to running the team, Ms. Grotke is working to make the public aware of the archive’s existence. The archive’s website is available to anyone with an internet connection, but after 20 years it remains underused by both the general public and the scholars who may benefit from it most.
Ian Milligan, an associate professor of history at the University of Waterloo in Canada, has used the web archive to research the 1990s, and as a teaching tool in the classroom. Historians have had a long tradition of sitting down in a reading room, looking through tidily packaged print material, he said.
But with a digital archive, “we’re talking petabytes of information. You need technical skills to work with this — skills that are beyond almost anyone in the social sciences,” or the general public.
Ms. Grotke said that if the archive is slightly impenetrable at present, it’s a consequence of limited resources and the ever-expanding ocean of digital content. “We don’t have time to stop and make it more user friendly. We’re just trying to collect it all” before it disappears, she said.
“Our archives are just massive and keep growing and growing,” Ms. Grotke said. “And I have the same number of staff.”
Safe From Eradication
When Grace Bonney, the creator of Design Sponge, decided to stop publishing, she thought she would leave the website online indefinitely as a static archive for anyone who still wanted to read it.
But after a talk with her accountant, who explained the hosting fees would run to thousands of dollars each month — an unaffordable sum — Ms. Bonney ran into a paradoxical truth about the internet: Information doesn’t live forever online.
Even websites as successful as Design Sponge, which played a pivotal role in teaching the first internet generation how to decorate and what to cook and where to buy paper goods sourced from Uruguay, are prone to disappear overnight.
Ms. Bonney had resigned herself to that fate when, last fall, she received an email that began: “The United States Library of Congress has selected your website for inclusion in its web archives. We consider your website to be an important part of this collection and the historical record.”
To be asked to be part of the national record was “surreal,” Ms. Bonney said. “When you work on the internet, it’s easy to feel nothing ever happens. It can disappear in blink of eye,” she said, adding, “to be one of the lucky few to have our work saved is like winning the lottery.”