How Much Information?




Conversion Factors

To estimate the storage requirements for these non-digital information sources, one must make some assumptions about the form in which they will be stored and the degree to which they will be compressed. There is considerable variation in the "bytes per page" proposed by various sources, partly because of differing assumptions about the content (e.g. ratio of text to pictures), formatting (e.g. number of characters per page, amount of white space), and file storage method (e.g. whether converted to a text or compressed document image file). Here is a sample of the range:
  • According to Webopedia: "A disk that can hold 1.44 megabytes, for example, is capable of storing approximately 1.4 million characters, or about 3,000 pages of information." 1.44 MB = 1,509,949 bytes; spread over 3,000 pages, that is about 503 bytes per page.
  • According to Horison Information Strategies: 2,400 bytes (2.3 KB) for one page of paper. 300 - 500 MB for large encyclopedia
  • According to Michael Lesk: 5,000 bytes (about 5 KB) per page (Source: "How Much Information is There in the World?", 1997)
  • 30 - 50 KB for a page of mathematical text in bitmap form (Source: Andrew Odlyzko, "Tragic loss or good riddance? The impending demise of traditional scholarly journals," 1994)
  • According to Roy Williams of Caltech, "Data Powers of Ten": 10 KB per encyclopedic (small font, dense formatting) page, 50 KB for a compressed document image page
  • According to ArchiveBuilders.com: 50 KB for one scanned page, but 800 KB for one scanned engineering drawing
  • According to JSTOR, a digital library of scholarly journals, one scanned document page is 130 KB; compressed using Cartesian Perceptual Compression (CPC) a page is only 26 KB.
  • According to Internet Archive: 1 MB for one mystery novel. 1 copy of Encyclopedia Britannica = 1 gigabyte (2,619 pages per copy, so 409,981 bytes or about 400 KB per page)
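The per-page figures derived in the first and last items of the list above can be reproduced with a short calculation. The following is a minimal sketch in Python. It assumes binary units (1 MB = 1,048,576 bytes, 1 GB = 1,073,741,824 bytes), which is what the byte counts quoted above imply; the function and variable names are illustrative only.

```python
# Bytes-per-page figures implied by two of the sources listed above.
# Assumption: binary units, which match the byte counts quoted in the list.
MB = 1024 ** 2
GB = 1024 ** 3


def bytes_per_page(total_bytes: float, pages: int) -> float:
    """Average number of bytes needed to store one page."""
    return total_bytes / pages


floppy = bytes_per_page(1.44 * MB, 3_000)   # floppy-disk example
britannica = bytes_per_page(1 * GB, 2_619)  # encyclopedia example

print(f"Floppy disk:  {floppy:.1f} bytes per page")
print(f"Encyclopedia: {britannica:,.1f} bytes per page")
```

The two results differ by a factor of roughly 800, which illustrates how strongly the assumed content and storage method drive any "bytes per page" figure.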
To demonstrate this range, we will use three different numbers in our estimates: one for a scanned image (uncompressed, at archival quality, 600 dpi), one for a compressed image file, and one for an ASCII (or plain text) file. For the summary statements, we will use the mid-range number, as most documents are scanned with some measure of compression. We opted to use Cartesian Perceptual Compression (CPC) as our compression standard because, although it is a lossy algorithm, it is "non-degrading": the human eye cannot detect any difference, and it keeps the full 600 dpi resolution.



Originals

Books

Conversion Factors

To estimate the size (in bytes) of all books published in the world, use the following formula:

Average number of pages per book * Average file size per page * Number of books published per year.


Table 1: Books Conversion Factors

Average # of Pages per Item | Storage Format | Average File Size (per page) | Conversion Factor (rounded)
300 | Scanned TIFF (600 dpi) | 130 KB | 40 MB/item
300 | Compressed | 26 KB | 8 MB/item
300 | Plain text | 2.5 KB | 1 MB/item
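The conversion factors in Table 1 follow from multiplying pages per book by file size per page. Below is a minimal sketch of that arithmetic, assuming decimal units (1 MB = 1,000 KB), which is consistent with the rounded values in the table. The names are illustrative, not part of the original study.

```python
# Reproduce the Table 1 conversion factors: pages per book x file size per page.
# Assumption: decimal units (1 MB = 1,000 KB).
PAGES_PER_BOOK = 300

KB_PER_PAGE = {
    "Scanned TIFF (600 dpi)": 130,
    "Compressed": 26,
    "Plain text": 2.5,
}


def mb_per_item(pages: int, kb_per_page: float) -> float:
    """Storage for one item in megabytes."""
    return pages * kb_per_page / 1_000


book_factors = {
    fmt: mb_per_item(PAGES_PER_BOOK, kb) for fmt, kb in KB_PER_PAGE.items()
}

for fmt, mb in book_factors.items():
    print(f"{fmt}: {mb:.2f} MB per book")
```

The unrounded results (39, 7.8, and 0.75 MB) correspond to the table's rounded factors of 40, 8, and 1 MB per item.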


Flow

World

Worldwide, 808,066 titles were published in 1996, according to UNESCO's Statistical Yearbook 1998, as cited in The Bowker Annual Library and Book Trade Almanac. At the compressed rate of 8 MB per book, this is roughly equivalent to 7 TB/year. The editor of the Almanac notes problems with UNESCO's data gathering (inconsistencies in reporting, data collection lags, and missing data); nonetheless, the editor acknowledges that UNESCO provides researchers the best and most usable overview of international book title output. If one includes the most recently available title production figures for countries that did not show 1996 data (such as China, France, and the Netherlands), the annual world total increases to 968,735, or about 8 TB/year. This number is still not complete because about 70 countries are not documented by UNESCO. However, it is the best estimate we were able to locate.

United States

In 1997, 64,711 books were published in the United States, according to the U.S. Census Bureau's 1999 Statistical Abstract of the United States. The Bowker Annual Library and Book Trade Almanac provides a similar figure for 1997 (65,796 titles) and notes that this is a decrease from 1996's all-time high of 68,175 titles. The preliminary estimate for 1998 suggested another decrease, to 56,129 books, roughly equivalent to 0.5 TB per year.
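The world and US flow figures above can be checked by applying the formula from the Conversion Factors section. This sketch assumes the rounded compressed factor of 8 MB per book from Table 1 and decimal units (1 TB = 1,000,000 MB); the function name and labels are illustrative.

```python
# Annual flow of new book titles, converted to terabytes.
# Assumptions: 8 MB per book (rounded compressed factor from Table 1)
# and decimal units (1 TB = 1,000,000 MB).
MB_PER_BOOK = 8
MB_PER_TB = 1_000_000


def titles_to_tb(titles: int, mb_per_book: float = MB_PER_BOOK) -> float:
    """Storage in terabytes for a given number of book titles."""
    return titles * mb_per_book / MB_PER_TB


flows = {
    "World, 1996 (UNESCO)": 808_066,
    "World, with late-reporting countries": 968_735,
    "United States, 1998 (preliminary)": 56_129,
}

book_flow_tb = {label: titles_to_tb(n) for label, n in flows.items()}

for label, tb in book_flow_tb.items():
    print(f"{label}: {tb:.2f} TB/year")
```

Under these assumptions the results are about 6.5, 7.7, and 0.45 TB per year, which the text reports as roughly 7, 8, and 0.5 TB.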

Stock

World

To estimate the international stock of books currently available for purchase, we note that the United States produces about 40% of the world's printed material (Source: US Industry and Trade Outlook 2000). The national library and copyright repository of the United States, the Library of Congress, contains about 26 million books. Therefore, using the 40% rule of thumb (26 million / 0.40), the world stock of books might be approximately 65 million titles.

United States

Books in Print is an authoritative source for books published and/or distributed in the United States. The 1999-2000 edition of Books in Print (with supplement) includes 1,663,815 books, about 13 TB in all. (This number does not include: books not published and/or not distributed in the U.S.; books not available to the trade or general public; free books not included with a title for sale; unbound material, pamphlets, and booklets; periodicals and serials; books only available to members of an organization; subscription-only material; and music manuscripts, sheet music, and librettos.) According to a press release from January 2000, booksinprint.com 2000 includes 3.2 million titles, about 26 TB total. (Of these, about 500,000 titles are out of print.) This figure is consistent with online booksellers: Barnes & Noble.com asserts that it has 3 million titles available, while Amazon claims to have 4.5 million titles. (Source: Wall Street Journal, July 17, 2000, "A New Chapter: Independent Booksellers Hope to Find Strength in Numbers" by Scott Eden.)
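The stock estimates above combine two steps: scaling the Library of Congress holdings by the US share of world output, and converting title counts to terabytes. A minimal sketch follows, assuming the rounded compressed factor of 8 MB per book from Table 1 and decimal units; all names are illustrative.

```python
# Stock of books: world title estimate and US storage estimates.
# Assumptions: 8 MB per book (compressed, Table 1), 1 TB = 1,000,000 MB.
LOC_BOOKS = 26_000_000      # Library of Congress holdings
US_SHARE_OF_WORLD = 0.40    # rule of thumb used in the text
MB_PER_BOOK = 8
MB_PER_TB = 1_000_000


def titles_to_tb(titles: int) -> float:
    """Storage in terabytes for a given number of titles."""
    return titles * MB_PER_BOOK / MB_PER_TB


world_titles = LOC_BOOKS / US_SHARE_OF_WORLD
books_in_print_tb = titles_to_tb(1_663_815)    # Books in Print, 1999-2000
booksinprint_com_tb = titles_to_tb(3_200_000)  # booksinprint.com 2000

print(f"World stock: about {world_titles / 1e6:.0f} million titles")
print(f"Books in Print: {books_in_print_tb:.1f} TB")
print(f"booksinprint.com: {booksinprint_com_tb:.1f} TB")
```

Note that the 40% scaling treats the Library of Congress as a proxy for total US holdings, which is an assumption of the text rather than a measured quantity.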

Rate of change

The number of titles published each year in the United States and internationally has risen gradually and fairly consistently over the last 10 years, apart from a slight downturn in US title production in 1997.



Newspapers

Conversion Factors

A large metropolitan daily US newspaper runs approximately 60 pages each day, varying from 48 to 72 pages. One year of a large US newspaper would therefore consist of about 21,900 pages. To account for smaller regional papers and those that are not published daily, we opted to use 30 as the average number of pages per day, which results in 10,950 pages per year.

A double truck (centerfold) full broadsheet is 24 X 36 inches. Because a newspaper page would be scanned at higher resolution and contains detailed graphics, a double truck would require about 1 MB (uncompressed) and a single full broadsheet page (18 X 24 inches) would require about 0.5 MB.


Table 2: Newspapers Conversion Factors

Average Number of Pages per Year | Storage Format | Average File Size (per page) | Conversion Factor (rounded)
10,950 | Scanned TIFF (600 dpi) | 500 KB | 5,475 MB/item
10,950 | Compressed | 100 KB | 1,095 MB/item
10,950 | Plain text | 10 KB | 110 MB/item


Flow

World

The number of newspapers in the world as of 1999 is 22,643, according to the International Standard Serial Number Register. UNESCO's Statistical Yearbook from 1996 offers a different, much smaller figure: 8,391. The discrepancy is due largely to the fact that multiple ISSNs may be assigned to what is essentially the same information product if it exists in different formats (paper, online, floppy disk, CD-ROM, microform). For example, the New York Times print version and NYT online have two different ISSNs. Other differences may be attributed to the fact that UNESCO excludes non-daily newspapers from its statistics or that the UNESCO data set is less complete than that of ISSN.

Using the ISSN figure and the compressed format:

0.1 MB/page * 10,950 pages/year * 22,643 newspapers = about 25 TB/year
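The newspaper figures above, from pages per year through the Table 2 factors to the world total, can be traced in a few lines. This is a sketch under the text's own assumptions (30 pages per day, 365 issues per year) plus decimal units; the names are illustrative.

```python
# Newspaper conversion factors and world flow.
# Assumptions: 30 pages/day, 365 issues/year, decimal units
# (1 MB = 1,000 KB, 1 TB = 1,000,000 MB).
PAGES_PER_DAY = 30
DAYS_PER_YEAR = 365
NEWSPAPERS_WORLD = 22_643  # ISSN Register figure, 1999

KB_PER_PAGE = {
    "Scanned TIFF (600 dpi)": 500,
    "Compressed": 100,
    "Plain text": 10,
}

pages_per_year = PAGES_PER_DAY * DAYS_PER_YEAR

# Megabytes needed to store one newspaper title for one year.
mb_per_newspaper = {
    fmt: pages_per_year * kb / 1_000 for fmt, kb in KB_PER_PAGE.items()
}

# World flow, using the compressed format as in the text.
world_tb = mb_per_newspaper["Compressed"] * NEWSPAPERS_WORLD / 1_000_000

print(f"Pages per year: {pages_per_year:,}")
for fmt, mb in mb_per_newspaper.items():
    print(f"{fmt}: {mb:,.1f} MB per newspaper per year")
print(f"World flow (compressed): {world_tb:.1f} TB/year")
```

Because the ISSN count includes separate entries for different formats of the same paper, this total is more likely an overestimate than an underestimate.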

United States

According to the Newspaper Association of America, there were 1,489 daily newspapers and 897 Sunday-only newspapers published in the United States in 1999.

Stock

The Library of Congress provides useful statistics on the nationwide stock of printed periodicals and newspapers. Most library print periodical holdings have been transferred to microfilm, microfiche, or simply made accessible online, through efforts such as the National Endowment for the Humanities' Newspaper Project. However, bound periodicals do still exist at the LOC: about 30,570 newspapers and 557,738 serials.

Rate of Change

The number of newspaper titles and the circulation figures have both declined slowly over the past 10 years.



Periodicals

Conversion Factors


Table 3: Periodicals Conversion Factors

Periodical Type | Average # of Pages per Item (annual total) | Storage Format | Average File Size (per page) | Conversion Factor (rounded)
Scholarly Journal | 1,723 | Scanned TIFF (600 dpi) | 130 KB | 225 MB/item
Scholarly Journal | 1,723 | Compressed | 26 KB | 45 MB/item
Scholarly Journal | 1,723 | Plain text | 2.5 KB | 4 MB/item
Mass-Market Periodical | 5,000 | Scanned TIFF (600 dpi) | 130 KB | 650 MB/item
Mass-Market Periodical | 5,000 | Compressed | 26 KB | 130 MB/item
Mass-Market Periodical | 5,000 | Plain text | 2.5 KB | 13 MB/item
Newsletter | 150 | Scanned TIFF (600 dpi) | 130 KB | 20 MB/item
Newsletter | 150 | Compressed | 26 KB | 4 MB/item
Newsletter | 150 | Plain text | 2.5 KB | 0.4 MB/item
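The Table 3 factors are the product of annual pages per title and file size per page for each periodical type. The sketch below reproduces the unrounded values, assuming decimal units (1 MB = 1,000 KB); the dictionary names are illustrative.

```python
# Reproduce the Table 3 conversion factors for each periodical type.
# Assumption: decimal units (1 MB = 1,000 KB).
PAGES_PER_YEAR = {
    "Scholarly journal": 1_723,
    "Mass-market periodical": 5_000,
    "Newsletter": 150,
}

KB_PER_PAGE = {
    "Scanned TIFF (600 dpi)": 130,
    "Compressed": 26,
    "Plain text": 2.5,
}

# Megabytes per title per year, by periodical type and storage format.
periodical_factors = {
    ptype: {fmt: pages * kb / 1_000 for fmt, kb in KB_PER_PAGE.items()}
    for ptype, pages in PAGES_PER_YEAR.items()
}

for ptype, by_format in periodical_factors.items():
    for fmt, mb in by_format.items():
        print(f"{ptype}, {fmt}: {mb:.2f} MB per title per year")
```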


World

According to the 2000 Ulrich's International Periodicals Directory, there are 158,000 unique periodical titles in the world. If one wished to include all serial publications (for example, yearbooks and annual publications of all kinds, not just journals), one could cite the International Standard Serial Number statistics: 617,801 serials around the world.

It is useful to differentiate between scholarly journals and mass-market periodicals because of the differences in number of pages per year. According to a recent study on Economic Cost Models of Scientific Scholarly Journals, "A typical 1995 scientific scholarly journal has 8.3 issues... and 1,723 pages (or 208 pages per issue)." (Source: http://www.bodley.ox.ac.uk/icsu/kingppr.htm) A typical mass-market periodical that is published weekly (such as Time, Newsweek, or People) will be about 5,000 pages per year. (Source: Robert M. Hayes, "The Economics of Digital Libraries", http://www.usp.br/sibi/economics.html) There are also newsletters, smaller than both the scholarly journal and mass-market publications, at about 150 pages per year, the figure used in Table 3.

Using the ratios within the US periodical statistics as a reference point (see below), we can estimate that about half of the world's periodical publications are mass-market magazines/tabloids, one-fourth are scholarly journals, and one-fourth are newsletters.

United States

According to the 1999 US Statistical Abstract, there are 12,036 periodicals published in the United States. Given the world total of 158,000, this seems low. Another estimate on United States periodical production comes from the National Directory of Magazines: 20,613 titles as of August 2000 (this figure includes Canada).

The estimated number of scientific journals published in the United States increased 62 percent, from 4,175 in 1975 to 6,771 in 1995. (Source: Designing Electronic Journals With 30 Years of Lessons from Print, Tenopir and King, 1998, http://www.press.umich.edu/jep/04-02/king.html) Data on periodicals in the field of modern languages and literature may serve as a rough index of the trends in the general availability of scholarly journals outside the sciences. According to the MLA Directory of Periodicals, there are currently 3,700 periodicals in the areas of literature, language, linguistics, and folklore covered regularly in the MLA International Bibliography. Adding these to the 6,771 scientific journals gives about 10,500 scholarly journals in the United States.

According to the Newsletter and Electronic Publisher's Association (NEPA), there is no way of really knowing exactly how many newsletters exist, because they are not required to be registered anywhere. There are two directories of newsletters: the Hudson Subscription Newsletter Directory lists over 5,000 subscription newsletters while the Oxbridge Directory of Newsletters includes over 20,000 newsletter titles, of which they say 9,000 are subscription newsletters. NEPA's best guess is that there are approximately 10,000 subscription newsletters in print today. (Source: http://www.newsletters.org/faqs.htm)



Office Documents

Conversion Factors

Table 4: Office Documents Conversion Factors

Average # of Document Pages per Year | Storage Format | Average File Size (per page) | Total Annual Production
7,500,000,000 | Scanned TIFF (600 dpi) | 130 KB | 975 TB/year
7,500,000,000 | Compressed | 26 KB | 195 TB/year
7,500,000,000 | Plain text | 2.5 KB | 19 TB/year


Flow

United States

To estimate the amount of information generated by offices, one might begin by looking at the statistics of the Federal Government, which is the single largest employer in the United States, with 1.8 million civilian workers as of 1998 and 1.2 million individuals in the armed services. The Federal Government, in total, employs about 3% of the nation's workforce.

The National Archives in Washington D.C. retains 2% of what the government produces, across a range of media. According to Archives representative Eric Chaskas, the Archives retains only what is deemed to be of some permanent historical value. Document types include correspondence, registers, reports, forms, treaties, case files, and log books. The perceived value determines how long a record will be retained--some will be kept indefinitely, while others are retained for no more than 6 months. An effort is made to prevent duplicating records but there is still some degree of overlap. Although the NARA representative was unable to quantify how much new information is accessioned each year, he did state that the current archival holdings, as of July 2000, include 4 billion pieces of paper, occupying a total of 21.5 million cubic feet. That is about 200 pieces of paper per cubic foot.

If one divides 4 billion by the number of years the United States government has existed (224 years), one obtains a rough number of pages collected per year: about 18,000,000.

The current accession rate, however, appears to be much higher. Each year, Federal agencies submit about 4,000 items and about 75% of these (3,000) are processed for archiving. Although the Archives does not publish statistics on the average size of these items, it is known that NARA adds a total of 500,000 cubic feet of mostly paper-based records each year. As previously noted, 1 cubic foot in the archives holds about 200 pieces of paper, so the total annual accession rate is about 100,000,000 pages per year.
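The three National Archives figures above (paper density, long-run average, and current accession rate) come from simple divisions and multiplications, sketched here. The constants are taken from the text; the variable names are illustrative.

```python
# National Archives arithmetic: density, historical average, current rate.
HOLDINGS_PAGES = 4_000_000_000      # pieces of paper held, July 2000
HOLDINGS_CUBIC_FEET = 21_500_000    # space occupied by the holdings
YEARS_OF_GOVERNMENT = 224
ANNUAL_CUBIC_FEET = 500_000         # records added per year
PAGES_PER_CUBIC_FOOT = 200          # rounded density used in the text

# Observed density of the existing holdings.
density = HOLDINGS_PAGES / HOLDINGS_CUBIC_FEET

# Long-run average number of pages collected per year.
historical_pages_per_year = HOLDINGS_PAGES / YEARS_OF_GOVERNMENT

# Current accession rate, using the rounded density.
current_pages_per_year = ANNUAL_CUBIC_FEET * PAGES_PER_CUBIC_FOOT

print(f"Density: {density:.0f} pages per cubic foot")
print(f"Historical average: {historical_pages_per_year:,.0f} pages/year")
print(f"Current accession rate: {current_pages_per_year:,} pages/year")
```

The observed density works out to roughly 186 pages per cubic foot, which the text rounds to 200. The current rate is more than five times the long-run average, which supports the observation that accessions have accelerated.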

If this represents the output of 3% of the nation's workforce, then one could estimate that United States employers produce a total of more than 3 billion archivable pages each year (100 million / 0.03), equivalent to 78 terabytes at 26 KB per page.

World

Using our rule of thumb that the US produces about 40% of the world's printed materials, we can estimate that each year the world produces 7.5 billion archivable pages (3 billion / 0.40), which would be equivalent to 195 terabytes.

Stock

Again, consider the US National Archives with 4 billion pieces of paper. If this is 3% of the United States total, then the total stock of paper held by US companies is somewhere on the order of 130 billion sheets of paper - about 3,380 terabytes.
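The flow and stock extrapolations for office documents chain several rules of thumb together. The sketch below follows the text's own rounding (3 billion US pages per year, 130 billion sheets of stock) so that the terabyte figures match; it assumes 26 KB per compressed page and decimal units, and all names are illustrative.

```python
# Office documents: extrapolate from the National Archives to US and world.
# Assumptions: 26 KB per compressed page, 1 TB = 1,000,000,000 KB.
KB_PER_PAGE = 26
KB_PER_TB = 1_000_000_000
FEDERAL_SHARE_OF_WORKFORCE = 0.03
US_SHARE_OF_WORLD = 0.40


def pages_to_tb(pages: float) -> float:
    """Storage in terabytes for a number of compressed pages."""
    return pages * KB_PER_PAGE / KB_PER_TB


# Flow: 100 million archived pages per year scaled up to the whole US.
us_pages_unrounded = 100_000_000 / FEDERAL_SHARE_OF_WORKFORCE  # ~3.33 billion
us_pages = 3_000_000_000                   # rounded figure used in the text
world_pages = us_pages / US_SHARE_OF_WORLD

# Stock: 4 billion archived pages scaled up to the whole US.
us_stock_unrounded = 4_000_000_000 / FEDERAL_SHARE_OF_WORKFORCE  # ~133 billion
us_stock = 130_000_000_000                 # rounded figure used in the text

print(f"US flow: {pages_to_tb(us_pages):,.0f} TB/year")
print(f"World flow: {pages_to_tb(world_pages):,.0f} TB/year")
print(f"US stock: {pages_to_tb(us_stock):,.0f} TB")
```

Using the unrounded intermediate values instead would raise each result by roughly 3 to 11 percent, which is small compared with the uncertainty in the 3% and 40% rules of thumb themselves.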

Rate of Change

Despite the increasing use of computers for communication and storage, the production and retention of paper office records is on the rise. The materials within the National Archives have increased in volume by more than one-third in the past decade alone.



Visual Material (Maps, Posters, Prints and Drawings)

This is a difficult category to quantify because, unlike the book, newspaper, and periodical industries, there is no central agency monitoring the production of originals or serving as a repository for the items. The best method for approximating the production statistics is to consider the holdings of the National Archives and the Library of Congress.

Conversion Factors

According to the Library of Congress, maps scanned in at 300 dpi tend to be large, approximately 100 to 300 MB. (Source: http://lcweb2.loc.gov/ammem/formats.html#V)

Flow

For photographs, see the Film Summary portion of this paper.

During 1998, the Library of Congress added 60,150 maps, 1,881 posters, and 4,723 prints and drawings to its collections.

Stock

World

[???]

United States

The Library of Congress has 4,523,049 maps, 85,216 posters, and 405,708 prints and drawings in its collections. The National Archives has another 5 million maps and drawings, as well as 13 million still photographs and 9 million aerial photographs.



Copies

For the past 10 years, the US has produced about 30% of the world's paper and paperboard output (US Industry & Trade Outlook 2000). Other major producers are Asia, Latin America, Canada, and Europe. As of 1998, total global capacity was 333.6 million metric tons, with growth predicted so that by 2001 the annual total is expected to be 348.1 million metric tons. A recent article in the Economist supports this forecast, stating that during the period 1993-1998, "the production of printing and writing paper in North America has grown by over 13%. Worldwide it has doubled since 1982."

There are four different commodity segments in the paper/paperboard industry, only two of which are relevant to this study: printing and writing paper, and newsprint. In 1999, the US produced 23.8 million metric tons of printing and writing paper and 6.4 million metric tons of newsprint. The most recent global production figures (available from UNESCO's Statistical Yearbook) are for 1997, at which time the world was producing 90 million metric tons of printing and writing paper and 36 million metric tons of newsprint.

For the estimates of printing/writing paper, we used the midrange figure of 26 KB per 8 1/2 X 11 page. We estimated that 500 sheets of standard grade 8 1/2 X 11 paper weigh 5 pounds. Therefore, each metric ton (2,204 pounds) equals about 440 reams of 500 sheets, or 220,000 sheets. Multiplying by 26 KB per page results in about 6 GB per metric ton, for a potential annual total of 142,800 terabytes (US) and 540,000 terabytes (world).

For newspapers, we estimate that storage requirements would be about 2 times those for writing/printing paper (12 GB per metric ton). Newspapers tend to contain more words and graphics per page, requiring, on average, 1 MB per page. Furthermore, newsprint is thinner and lighter, so each metric ton contains more individual sheets. US newsprint production (1999) equals about 80,000 terabytes and international production (1997) about 432,000 terabytes.
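The tonnage-to-terabyte conversion above can be written out step by step. The sketch below first derives the per-ton figure from ream weight, then applies the text's rounded factors of 6 GB and 12 GB per metric ton to the production figures. Decimal units are assumed, and the names are illustrative.

```python
# Paper production converted to potential storage.
# Assumptions: 5 lb per 500-sheet ream, 2,204 lb per metric ton,
# 26 KB per page, decimal units (1 GB = 1,000,000 KB, 1 TB = 1,000 GB).
POUNDS_PER_METRIC_TON = 2_204
POUNDS_PER_REAM = 5
SHEETS_PER_REAM = 500
KB_PER_PAGE = 26

sheets_per_ton = POUNDS_PER_METRIC_TON * SHEETS_PER_REAM / POUNDS_PER_REAM
gb_per_ton_derived = sheets_per_ton * KB_PER_PAGE / 1_000_000

# Rounded factors used in the text.
GB_PER_TON_PAPER = 6
GB_PER_TON_NEWSPRINT = 12


def tons_to_tb(metric_tons: int, gb_per_ton: float) -> float:
    """Potential storage in terabytes for a tonnage of paper."""
    return metric_tons * gb_per_ton / 1_000


paper_us_tb = tons_to_tb(23_800_000, GB_PER_TON_PAPER)
paper_world_tb = tons_to_tb(90_000_000, GB_PER_TON_PAPER)
newsprint_us_tb = tons_to_tb(6_400_000, GB_PER_TON_NEWSPRINT)
newsprint_world_tb = tons_to_tb(36_000_000, GB_PER_TON_NEWSPRINT)

print(f"Sheets per metric ton: {sheets_per_ton:,.0f}")
print(f"Derived factor: {gb_per_ton_derived:.2f} GB per metric ton")
print(f"Printing/writing paper: US {paper_us_tb:,.0f} TB, world {paper_world_tb:,.0f} TB")
print(f"Newsprint: US {newsprint_us_tb:,.0f} TB, world {newsprint_world_tb:,.0f} TB")
```

The derived factor is about 5.7 GB per metric ton, which the text rounds to 6, and the US newsprint result of 76,800 TB is reported in the text as about 80,000 TB.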

These figures provide an extreme upper bound on the total number of bytes required to digitally store information currently produced in printed format each year.

Books

About 1.1 billion books were sold in the United States in 1999. (Source: Wall Street Journal, July 17, 2000, "A New Chapter: Independent Booksellers Hope to Find Strength in Numbers" by Scott Eden.) This is an increase of more than 3% from 1998, when about 1.037 billion books were sold. (Source: The 1999 Consumer Research Study on Book Purchasing, (Section II) Detailed Findings: Consumer Adult Book Purchasing.)

Newspapers

In the United States, 55,979,332 daily newspapers and 59,894,381 Sunday newspapers circulate each year. (Source: Newspaper Association of America.)

Periodicals

The total number of US magazines circulated annually exceeds 500 million. (Source: US Industry and Trade Outlook 2000.)

Office Documents

Each year, almost 500 billion copies are produced on copiers in the US (13,000 TB); nearly 15 trillion copies are produced on copiers, printers, and multifunction machines combined, which is roughly equal to 390,000 TB. This accounts for the majority of the printing/writing paper used each year. (Source: Xerox PARC.) For specific information on fax printing, see the Telecommunications Summary of this report.

A 1998 Coopers & Lybrand study showed that the average office makes 19 copies of each document and loses 1 out of 20 office documents. It then costs $120 to search for the document and $250 to recreate it if it is lost (1 lost document = $370). (Source: NAGARA, Records Management Technical Bulletin, www.nagara.org/rmbulletins/bulletin_2.htm)
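The terabyte equivalents quoted for office copies follow from the midrange figure of 26 KB per page. A brief sketch, assuming decimal units (1 TB = 1,000,000,000 KB) and using illustrative names:

```python
# Office copies converted to terabytes at 26 KB per page.
# Assumption: decimal units (1 TB = 1,000,000,000 KB).
KB_PER_PAGE = 26
KB_PER_TB = 1_000_000_000

copier_copies = 500_000_000_000        # copies made on copiers per year (US)
all_device_copies = 15_000_000_000_000  # copiers, printers, multifunction machines

copier_tb = copier_copies * KB_PER_PAGE / KB_PER_TB
all_device_tb = all_device_copies * KB_PER_PAGE / KB_PER_TB

print(f"Copiers only: {copier_tb:,.0f} TB/year")
print(f"All devices: {all_device_tb:,.0f} TB/year")
```

These totals count every copy as if it were unique content, so they measure paper output rather than original information.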

The Government Printing Office, which handles the printing needs of the entire Federal community, uses or sells more than 55 million pounds of paper each year. Approximately 10,000 titles are available for sale to the public at any given time.



Fun Facts About Print Media

Paper Size Equivalencies

According to ArchiveBuilders.com: 1 pulp tree (loblolly pine) = 1/10th cord of wood = 10,000 pages = 1 file cabinet = 4 boxes = 1/2 Gigabyte = 1 CD

Scope of the Library of Congress Collections

"LC21: A Digital Strategy for the Library of Congress," (a report from an advisory group at the National Academy of Sciences about strategic directions for the applications of information technology in the Library) reports the following statistics about the collections of the Library of Congress:
  • 26 million volumes (pages 2-3)
  • 1 million serials (pages 2-3)
  • 4.5 million maps (pages 2-5)
  • 1,000,000 audio recordings (pages 2-6)
  • 200,000 cans of film (pages 2-6)
  • 13.5 million images (pages 2-7)
  • 70,000 periodicals (pages 2-7)
  • 1,400 newspapers (pages 2-7)
The Library receives some 22,000 items each working day and adds approximately 10,000 items to the collections daily.

Paperless Society?

With the spread of the personal computer, many predicted the advent of the paperless office. So far, this prediction has not come to pass. On the one hand, according to BIFMA, a business and institutional furniture manufacturers' association, as of Dec. 16, 1999, the percentage of file cabinets in relation to other types of office furniture, like seating, desks, and tables, has steadily decreased, from 16% in 1988 to 12.9% in 1998. This would lead one to believe that less paper is being generated to require storage space. On the other hand, the installation of home computers and expansion of home offices has led some analysts to predict that by 2003 the consumption of standard office paper will double from the 1996 level (Source: 1999 U.S. Industry and Trade Outlook, 10-3).

Growth of Online Periodicals

Of the more than 157,000 serial titles reported in the latest edition of the international periodicals directory, 10,332 were available exclusively online or in addition to a paper counterpart (Source: Ulrich's International Periodicals Directory, 1999, p. vii).

Bulk Mail

As of 1990, the United States Postal Service handled almost 90 billion pieces of first-class mail per year and another 75 billion pieces of second-, third-, and fourth-class mail.

Americans receive almost 4 million tons of junk mail a year. About 44% of the junk mail is never opened. Every person in the United States receives junk mail that represents the equivalent of one and a half trees a year. (Source: The Consumer Research Institute's Stop Junk Mail Page)

For more information on flows of information through the mail, please see the Mail Summary of this report.



Print Media Bibliography


Charts



© 2000 Regents of the University of California