Internet Archive
The Internet Archive (IA) is a non-profit organisation dedicated to maintaining an archive of Web and multimedia resources. Located at the Presidio in San Francisco, California, this archive includes "snapshots of the World Wide Web" (archived copies of pages, taken at various points in time), software, movies, books, and audio recordings (including recordings of live concerts from bands that allow it). The Archive makes the collections available at no cost to researchers, historians, and scholars.
The Archive was founded by
Brewster Kahle in
1996.
According to its website:
"Most societies place importance on preserving artifacts of their culture and heritage. Without such artifacts, civilization has no memory and no mechanism to learn from its successes and failures. Our culture now produces more and more artifacts in digital form. The Archive's mission is to help preserve those artifacts and create an Internet library for researchers, historians, and scholars. The Archive collaborates with institutions including the Library of Congress and the Smithsonian."
Because of its goal of preserving human knowledge and artifacts, and making its collection available to all, proponents of the archive have likened it to the Library of Alexandria.
The archive also maintains the "Wayback Machine", with content from
Alexa Internet. This service allows users to see archived versions of
web pages, what the Archive calls a "three dimensional index".
Snapshots do not appear in the Wayback Machine immediately: it can take from six to twelve months for an archived snapshot to become available. Librarians and scholars who want to permanently archive material and immediately cite an archived version can use the Archive-It system instead.
As of 2004 the Wayback Machine contained approximately a petabyte of data and was growing at a rate of 20 terabytes per month, a two-thirds increase over the growth rate of 12 terabytes per month reported in 2003. Its growth rate eclipses the amount of text contained in the world's largest libraries, including the Library of Congress.
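The arithmetic behind these figures can be checked with a short sketch. The function names are illustrative only, "approximately a petabyte" is assumed to mean 1000 terabytes, and the last calculation assumes a constant growth rate, which is a simplification rather than a forecast.

```python
from fractions import Fraction

# Figures quoted in the text; everything else here is an illustrative assumption.
GROWTH_2003_TB_PER_MONTH = 12
GROWTH_2004_TB_PER_MONTH = 20
ARCHIVE_SIZE_TB = 1000  # assumption: "approximately a petabyte" taken as 1000 TB


def relative_increase(old_rate, new_rate):
    """Return the fractional increase from old_rate to new_rate."""
    return Fraction(new_rate - old_rate, old_rate)


def months_to_add(size_tb, rate_tb_per_month):
    """Months needed to add size_tb at a constant monthly rate."""
    return size_tb / rate_tb_per_month


if __name__ == "__main__":
    increase = relative_increase(GROWTH_2003_TB_PER_MONTH, GROWTH_2004_TB_PER_MONTH)
    print(f"Increase in growth rate: {increase}")  # prints 2/3

    months = months_to_add(ARCHIVE_SIZE_TB, GROWTH_2004_TB_PER_MONTH)
    print(f"Months to add another petabyte at the 2004 rate: {months}")  # prints 50.0
```

Under these assumptions, the 2004 rate would add a second petabyte in roughly four years.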
A copy of the data is also maintained at
Bibliotheca Alexandrina.
The name "Wayback Machine" is a reference to a segment from
The Rocky and Bullwinkle Show in which
Mr. Peabody, a
bowtie-wearing
dog with a
professorial air, and his human assistant, Sherman, use a
time machine called the "WABAC machine" to witness famous events in
history.
* The Internet Archive at the New Library of Alexandria
* The UK National Archives Government Archive
Most of the Archive's movies, books, and recordings are
public domain or licensed under a
Creative Commons License. The audio section largely includes music from independent artists, as well as more established artists and musical ensembles with permissive rules regarding the recording of their concerts (e.g. The Grateful Dead, String Cheese Incident, Toad the Wet Sprocket, 311, and Fugazi).
The Internet Archive operates the
Open Library where a small number of scanned public domain books are made available in an easily browsable and printable format.
Moving Image Collection
Aside from feature films, the Moving Image collection includes newsreels; classic cartoons; pro- and anti-war propaganda; Skip Elsheimer's "A.V. Geeks" collection; and ephemeral material from Prelinger Archives, such as advertising, educational and industrial films, and amateur and home movie collections.
Their
Brick Films collection contains
stop-motion animation filmed with
LEGO bricks, some of which are 'remakes' of feature films. The
Election 2004 collection is a non-partisan public resource for sharing video materials related to the
2004 United States Presidential Election. The
Independent News collection includes sub-collections such as the Internet Archive's
World At War competition from 2001, in which contestants created short films demonstrating "why access to history matters." Among their most-downloaded video files are eyewitness recordings of the devastating
2004 tsunami.
Scientology sites
In late
2002, the Internet Archive removed various sites critical of
Scientology from the Wayback Machine. The error message stated that this was in response to a "request by the site owner". However, it was later clarified that lawyers from the Church of Scientology had demanded the removal, on unknown legal grounds, and that the actual site owners did
not want their material removed.
Archived web pages as evidence
In an October 2004 case called "Telewizja Polska USA, Inc. v. Echostar Satellite", the Wayback Machine archives were used as a source of admissible evidence, perhaps for the first time. Telewizja Polska is the provider of TVP Polonia, and EchoStar operates the Dish Network. During the proceedings, EchoStar offered Wayback Machine snapshots as proof of the past content of Telewizja Polska's website. Telewizja Polska attempted to suppress the snapshots on the grounds that they were hearsay and came from an unauthenticated source, but Magistrate Judge Arlander Keys rejected the hearsay assertion and accepted an affidavit from an Internet Archive employee as sufficient to authenticate the snapshots for admissibility.
Grateful Dead
In November 2005, free downloads of
Grateful Dead concerts were removed from the site.
John Perry Barlow identified
Bob Weir,
Mickey Hart, and
Bill Kreutzmann as the instigators of the change, according to a
New York Times article.
Phil Lesh commented on the change in a
November 30,
2005 posting to his personal website:
"It was brought to my attention that all of the Grateful Dead shows were taken down from Archive.org right before Thanksgiving. I was not part of this decision making process and was not notified that the shows were to be pulled. I do feel that the music is the Grateful Dead's legacy and I hope that one way or another all of it is available for those who want it."
A November 30 forum post from
Brewster Kahle summarized what appeared to be the compromise reached among the band members. Audience recordings could be downloaded or
streamed, but
soundboard recordings were to be available for streaming only. The concerts have since been re-added.
* WebCite
* Alexa Internet
* Digital library
* Digital preservation
* Heritrix
* Link rot
* Project Gutenberg
* Universal library
* Web archiving
* Web crawler
Scientology controversy
* CNET story
* Forum post at archive.org
* LawMeme article
Wayback Machine archives as legally admissible evidence
* Internet Archive's Web Page Snapshots Held Admissible as Evidence, from a Stanford University website
Grateful Dead controversy
* Wrath of Deadheads stalls Web crackdown, a New York Times story via the International Herald Tribune
* Phil Lesh's Hotline, where a November 30, 2005 message commented on the controversy
* Good News and an Apology: GD on the Internet Archive, Brewster Kahle's forum post at archive.org
* The Internet Archive
* The Open Library
* Petabox, a useful invention created in collaboration with the Internet Archive
* Wayback Machine
* Pictures and descriptions of the Wayback Machine hardware, with cost information
* Form 990-PF for Internet Archive (2003)
* Archive-It 1.5 Press Release and Archive-It FAQ
* Warrick - a tool for recovering websites from the Internet Archive and search engine caches