Terror of terabytes.
One terabyte, the smallest practical measure for our project, is a million megabytes, which is equivalent to the textual content of a million books. An exabyte, the unit in which we report the final results, is a billion gigabytes.
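The unit relationships above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (power-of-ten) prefixes and roughly one megabyte of text per book, which is the figure the million-books comparison implies. The constant and function names are illustrative only.

```python
# Decimal (SI) byte units. This is an assumption: the text does not say
# whether it counts in powers of ten or powers of two.
MEGABYTE = 10**6
GIGABYTE = 10**9
TERABYTE = 10**12
PETABYTE = 10**15
EXABYTE = 10**18

# Implied by "a terabyte holds the textual content of a million books".
BYTES_PER_BOOK = TERABYTE // 10**6


def convert(amount, from_unit, to_unit):
    """Convert an amount expressed in one byte unit into another byte unit."""
    return amount * from_unit / to_unit


print(convert(1, TERABYTE, MEGABYTE))  # megabytes in one terabyte
print(convert(1, EXABYTE, GIGABYTE))   # gigabytes in one exabyte
print(BYTES_PER_BOOK == MEGABYTE)      # one book is about one megabyte of text
```

With binary prefixes (powers of 1,024) the round numbers would shift by a few percent, which does not matter at the precision of these estimates.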
Capabilities of compression.
Converting content to ASCII, MP3, MP4 and other compact formats reduces storage requirements dramatically, by one to two orders of magnitude.
Democratization of data.
Individuals produce significant amounts of non-digital information. As photos and videos move to digital formats, households will have to manage terabytes of data.
Dominance of digital.
Ninety-three percent of the information produced each year is stored in digital form. Hard drives in stand-alone PCs account for 55% of total storage shipped each year.
Print and film content is rapidly moving to magnetic and optical storage. This is true for professional use now, and will become increasingly true at the level of individual users.
Tape in transition.
Magnetic tape storage is about 10 times as large as disk storage in total capacity, but is used almost exclusively for archives. Disk storage is much more attractive, even for archives, because its cost is declining rapidly and data stored on disk is much easier to access.
Paucity of print.
If all printed material published in the world each year were expressed in ASCII, it could be stored in less than 5 terabytes.
Immensity of images.
Over 80 billion photographs are taken every year, which would take over 400 petabytes to store, more than 80,000 times the storage required for a year's printed text at 5 terabytes.
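The photograph figures can be cross-checked against the print estimate with simple division. This is a rough sketch that assumes decimal units and takes the quoted values (80 billion photographs, 400 petabytes, and a 5-terabyte ceiling for printed text) at face value. The variable names are illustrative.

```python
# Decimal byte units (an assumption; the text does not specify).
TERABYTE = 10**12
PETABYTE = 10**15

photos_per_year = 80 * 10**9    # "over 80 billion photographs"
photo_storage = 400 * PETABYTE  # "over 400 petabytes"
print_storage = 5 * TERABYTE    # ceiling quoted for a year's printed text

# Average size implied for a single stored photograph, in bytes.
bytes_per_photo = photo_storage // photos_per_year

# How many times larger the photo total is than the print total.
ratio_to_print = photo_storage // print_storage

print(bytes_per_photo)
print(ratio_to_print)
```

Under these assumptions the figures imply about 5 megabytes per photograph, and a photo total roughly 80,000 times the 5-terabyte print ceiling.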
Convenience of copies.
There is a lot of redundancy both across and within media. A newspaper, for example, is composited using digital technology, printed on paper, then archived on microfilm. Estimates of "unique" information can only be taken as approximate.
Ubiquity of the US.
The US produces 35% of all print material, 40% of the images and well over 50% of the digitally stored content produced in the world each year.