We explore the need for digital archives like the Wayback Machine, and why they're crucial to building a record of facts on the web.
February 20, 2024
Digital archives: a time machine for the web
In the summer of 2023, the New York Times ran an article titled “Ways You Can Still Cancel Your Federal Student Loan Debt.”
The article outlined six ways to cancel student debt, with the final being:
"Death
This is not something that most people would choose as a solution to their debt burden."
At least, that was the sixth way until the New York Times revised it with a stealth edit ([link removed]). If you read the article ([link removed]) today, the suggestion of death as a solution to a debt burden has been replaced, but there’s no mention that the article was revised. The timestamp still shows the day it was originally published.
If not for the Internet Archive’s ([link removed]) Wayback Machine ([link removed]), this discrepancy wouldn’t have been caught. The Wayback Machine is a digital archive of the internet, and it had captured multiple earlier versions of the article ([link removed]).
The internet is constantly being revised in ways that allow history to be rewritten and a shared sense of truth to be questioned. With AI-generated disinformation, the potential to exert control over the future by rewriting the past has never been greater.
This week we’re exploring how digital archives are crucial in developing a record of truth in an ever-changing web.
// The need for digital archives
Mark Graham ([link removed]), Director of the Wayback Machine, spoke with the Project Liberty Foundation about why the need for digital archives is greater than ever:
- The importance of the internet. So much of what humanity publishes and makes available lives only on the internet. Given how much time we spend online, the internet has become a central medium of human expression, history, and culture.
- The fragile and ephemeral nature of the internet. Graham shared two stats that underscore how fragile today’s internet is:
- A study ([link removed]) of the two million hyperlinks in New York Times articles published from 1996 to 2019 found that 25% of them were broken, a problem known as link rot ([link removed]).
- The Wayback Machine has fixed 20 million broken links in Wikipedia articles by pointing them to archived copies of the missing pages.
“The web itself is a living thing. Webpages change. They go away on quite a frequent basis. There's no backup system or version control system for the web,” Graham explained. That is, except for archives like the Wayback Machine.
// The Wayback Machine
The Wayback Machine is a “time machine for the web,” in Graham’s words. It allows users to trace the evolution (or disappearance) of a webpage over time, enabling them to establish a record of what happened on the internet.
- For example, the Apple.com ([link removed]) URL has been archived 539,000 times ([link removed]) since it was first captured in October 1996.
- The Wayback Machine has archived over 866 billion webpages in its 28-year history. Today, it archives hundreds of millions of webpages every day and has become one of the most important archives of online content in the world.
// How it works
- The Wayback Machine “crawls” the web and downloads publicly accessible information. Webpages, documents, and data are stored with a time-stamped URL.
- For information that isn’t publicly accessible, the Internet Archive offers web archiving services through Archive-It ([link removed]) to 1,200 organizations in 24 countries around the world, from libraries to research institutions.
- The Wayback Machine also relies on everyday people to help archive the internet. Anyone can use Save Page Now ([link removed]) to archive a webpage or article.
- The Wayback Machine partners with 1,200 fact-checking organizations globally so that web material that was a source of disinformation can be referenced. It has built a library of more than 200,000 examples in which a claim was made and the Wayback Machine provided additional context on whether that claim is true (known as a claim review).
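For readers curious what a “time-stamped URL” looks like in practice, here is a minimal Python sketch. It assumes the commonly seen public address pattern https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>, where the 14-digit stamp is the capture time in UTC. The helper names wayback_url and parse_wayback_url are hypothetical, and the sketch only builds and parses addresses; it does not contact the archive.

```python
from datetime import datetime, timezone

# Assumed public pattern: https://web.archive.org/web/<YYYYMMDDhhmmss>/<original URL>
WAYBACK_PREFIX = "https://web.archive.org/web/"
STAMP_FORMAT = "%Y%m%d%H%M%S"  # 14 digits: year, month, day, hour, minute, second


def wayback_url(target: str, when: datetime) -> str:
    """Build a time-stamped archive address for `target` at moment `when`.

    `when` should be timezone-aware; it is converted to UTC before formatting.
    """
    stamp = when.astimezone(timezone.utc).strftime(STAMP_FORMAT)
    return f"{WAYBACK_PREFIX}{stamp}/{target}"


def parse_wayback_url(archived: str) -> tuple[str, datetime]:
    """Split a time-stamped archive address back into (original URL, UTC time).

    Handles only the plain form above, not variants with modifiers after the stamp.
    """
    if not archived.startswith(WAYBACK_PREFIX):
        raise ValueError(f"not a Wayback Machine address: {archived}")
    # Everything up to the first "/" is the stamp; the rest is the original URL.
    stamp, _, target = archived[len(WAYBACK_PREFIX):].partition("/")
    when = datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
    return target, when


if __name__ == "__main__":
    # The date this newsletter went out, used purely as an example timestamp.
    url = wayback_url("https://www.apple.com/", datetime(2024, 2, 20, tzinfo=timezone.utc))
    print(url)  # https://web.archive.org/web/20240220000000/https://www.apple.com/
    print(parse_wayback_url(url))
```

Because the stamp encodes a moment rather than naming a specific stored file, requesting a time with no exact capture generally lands you on the nearest capture the archive holds.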
// Archive of facts
Fixing links, archiving webpages, and fact-checking digital articles are part of a deeper, more important project to chronicle digital history and establish a record of facts.
- Last month, the archive of press releases from a sitting member of Congress, New York’s Elise Stefanik, vanished after she came under scrutiny ([link removed]). The Wayback Machine documented this erasure and provided a time-stamped record of past versions of her website and press releases.
- In 2018, a US appeals court ruled ([link removed]) that webpages archived by the Wayback Machine can be used as legitimate legal evidence.
- The Internet Archive has countless examples ([link removed]) of the press citing the Wayback Machine to correct disinformation and dispel rumors. In one example from last year, the Associated Press relied on the Wayback Machine ([link removed]) to set the record straight that the CDC did not say the polio vaccine gave millions of Americans a “cancer virus.”
With the rise of AI-generated disinformation, there’s reason to believe that such attempts at rewriting history ([link removed]), even if that history is just yesterday, will become more prevalent, and that the social contract that has governed web crawlers ([link removed]) is coming to an end.
// A citizen-powered web
Building digital archives is a defense against those attempting to rewrite history and spread misinformation. An archived, time-stamped webpage is not just unimpeachable evidence; it’s a foundational building block of a shared sense of reality.
In 2014, when Malaysia Airlines Flight 17 was shot down over Ukraine, the Wayback Machine captured evidence that a pro-Russian group was behind the missile attack. But it wasn’t the Wayback Machine’s crawlers that captured the evidence; it was an individual ([link removed]) who found an obscure blog post from a Ukrainian separatist leader boasting about shooting down a plane. That individual judged the blog post important enough to archive, and the saved copy became a critical piece of evidence after the original post disappeared from the internet.
As Graham said, “You don't know what you got until it's gone. If you see something, save something.”
What pages can you help archive? Save them to the Wayback Machine with Save Page Now ([link removed]).
Project Liberty Foundation roles
// Project Liberty Foundation is seeking a Research and Governance Program Manager for a six-month cover role from March 2024 to September 2024. Learn more and apply here ([link removed]).
Other notable headlines
// 🏛 An article in Tech Policy Press ([link removed]) asks: can democracy survive artificial general intelligence?
// 🕵 The New York Times reported ([link removed]) that hackers working for China, Russia, and North Korea have used OpenAI’s systems to help create their cyberattacks.
// 🗳 According to an article in the Wall Street Journal ([link removed]), a new era of AI deepfakes will complicate the 2024 elections.
// 🤖 What will happen when AI starts training itself? An article in The Atlantic ([link removed] ) explored the implications of AI training on AI.
// 📝 An article in WIRED ([link removed] ) explored what would happen if 26 words in Section 230 were removed.
// 🖥 An article in the Wall Street Journal ([link removed]) explored how AI will lead to the end of the internet as we know it.
// 🦺 AI doesn’t have to be a job destroyer. It could help rebuild the middle class, according to an article in Noema Magazine ([link removed]).
// 🧑‍🤝‍🧑 It won’t be long before you know someone with an AI significant other. An article in Fast Company ([link removed] ) explored the rise of romantic chatbot apps.
// 🇪🇺 Big Tech companies signed an accord in Europe to combat AI-generated election disinformation, according to an article in Euronews ([link removed]).
Partner news & opportunities
// Virtual event on online governance
February 28th at 12pm ET
Nathan Schneider ([link removed]), founder of the Media Economies Design Lab at the University of Colorado, is releasing a book, Governable Spaces: Democratic Design for Online Life. In a book launch seminar with Metagov ([link removed]), he will explore why governance in our everyday online spaces matters. Register here ([link removed]).
// Data Empowerment Fund: $50,000 & $100,000 grants available
The Data Empowerment Fund ([link removed]) is now accepting proposals. The fund, which is powered by the Omidyar Network ([link removed]) and other partners, supports initiatives that enable greater individual agency or community control over data. Learn more and apply here ([link removed]).
/ Project Liberty Foundation is advancing responsible development of the internet, designed and governed for the common good. /
Thank you for reading.
501 W 30th Street, Suite 40A,
New York, New York, 10001
© 2023 Project Liberty