Magnetic Storage Media
The first magnetic recording device was the Telegraphone, invented in 1898 by the Danish scientist Valdemar Poulsen.
The digital revolution has so far been magnetic: the vast majority of the world's data is now created, transported and stored in electromagnetic systems.
Hard Disk Drives (World)
No storage medium has ever had the explosive growth demonstrated by the hard disk.
Source: IDC (1999) "1999 Winchester Disk Drive Market Forecast and Review"
The incredible growth in hard disk shipments has been accompanied by a relentless decrease in the cost per gigabyte of storage capacity.
Rates of Growth
Over the next few years, the number of hard disk units shipped is projected to grow at an annual rate of 15 to 20%, while the total capacity shipped is projected to grow much faster, at 70 to 80% annually.
According to Disk/Trend, 75% of the disk drives sold are for desktop computers, 13% are for servers and 12% are mobile drives for laptop computers. Therefore, although each enterprise-level disk storage system may have a huge capacity, it is the vast number of disks deployed in individual workstations that accounts for the enormous scale of the world's current digital storage capability.
The lifespan of a hard disk is approximately 3 years, so the drives shipped in 1998, 1999 and 2000 approximate the installed base. Their combined storage capacity is 4,672 petabytes, or roughly 5 exabytes. To appreciate the scale of this figure, consider that Roy Williams of Caltech estimates that 5 exabytes is equivalent to all the words ever spoken by human beings.
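Because total capacity shipped equals units shipped times average capacity per drive, the two projected growth rates together imply how quickly the average drive must grow. A minimal sketch, assuming both rates compound annually; pairing the range endpoints and using the midpoints (17.5% and 75%) is our illustrative choice, not a figure from the source:

```python
def implied_per_drive_growth(unit_growth: float, capacity_growth: float) -> float:
    """Annual growth in average capacity per drive.

    Total capacity shipped = units shipped * average capacity per drive,
    so the annual growth factors divide.
    """
    return (1 + capacity_growth) / (1 + unit_growth) - 1


# (unit growth, capacity growth): the slowest pairing, the midpoints, the fastest pairing
for units, capacity in [(0.20, 0.70), (0.175, 0.75), (0.15, 0.80)]:
    rate = implied_per_drive_growth(units, capacity)
    print(f"units +{units:.1%}, capacity +{capacity:.0%} -> per drive +{rate:.0%}")
```

On these assumptions the average capacity of each drive would have to grow by roughly 42% to 57% a year.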
The hard disk typically shipped with a desktop personal computer in 2000 holds 10 gigabytes.
Disk drives are expected to be deployed in applications other than personal computers, such as TV set-top boxes. These increasingly popular devices let users store TV shows on disk rather than tape, and pause and rewind a broadcast while still recording it. IDEMA predicts that by 2003, such devices will account for 8 to 10 percent of the disk drive market. Similarly, starting in 2000, audio jukeboxes and computer game consoles will also include hard disk drives. The higher resolutions of digital cameras are already producing such large files that small hard disks are expected to be incorporated into cameras as well.
Floppy Disks (World)
The number of floppy disk drives sold every year has remained relatively constant at around 100 million units for the past several years. Little change is expected. (Source: Computer Tech Review, April 1, 1999). The number of floppy disks being sold is diminishing rapidly as their storage capacity is too small to be useful in light of the much larger file sizes now common.
Source: International Recording Media Association
The Japanese Recording Media Industries Association, however, claims somewhat higher figures for floppy disk production, while confirming that the trend is still sharply downward; for example, it anticipated sales of 2.21 billion floppies in 1998. A reasonable compromise between these figures would be an estimate of 2 billion for 1998, 1.5 billion for 1999 and 1 billion for 2000. Assuming that three years of production make up the stock in use, this indicates a total stock of about 4.5 billion floppy disks.
One of the world's largest producers of 3.5 inch floppy disks is CMC Magnetics. The company claims to have produced over 700 million disks in 1998 and is reported to account for 56% of the world's floppy disk production in 2000, implying a total market of somewhat more than 1 billion disks.
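The stock estimate treats every disk produced within an assumed three-year service life as still in use. A minimal sketch of that rule; the function name `installed_stock` is hypothetical, and the inputs are the compromise figures from the paragraph above:

```python
def installed_stock(annual_output: dict[int, float], lifespan_years: int, as_of: int) -> float:
    """Sum production over the last `lifespan_years` years up to and including `as_of`,
    treating everything produced in that window as still in use."""
    return sum(
        units
        for year, units in annual_output.items()
        if as_of - lifespan_years < year <= as_of
    )


# Compromise estimates from the text, in billions of floppy disks
floppies = {1998: 2.0, 1999: 1.5, 2000: 1.0}
print(installed_stock(floppies, lifespan_years=3, as_of=2000))  # 4.5
```

The same rule, with a different lifespan, underlies the hard disk stock figure based on the 1998 to 2000 shipments.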
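The market size follows from dividing one producer's output by its market share. A minimal sketch; note that it combines CMC's 1998 production figure with its 2000 market share, so the result is only an approximation:

```python
def implied_market(producer_units: float, market_share: float) -> float:
    """Total market size implied by one producer's output and its share of the market."""
    if not 0 < market_share <= 1:
        raise ValueError("market_share must be a fraction in (0, 1]")
    return producer_units / market_share


# CMC Magnetics: ~700 million disks (1998) and a 56% share (2000)
market = implied_market(700e6, 0.56)
print(f"{market / 1e9:.2f} billion disks")  # 1.25 billion disks
```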
Floppy disks are used primarily for backup and are little used now for original content creation.
Removable magnetic disk drives (World)
Removable drives are used primarily for backup, for transferring files (e.g., desktop publishing files sent to service bureaus), and for video or image editing. Sales of low-end disks (capacities of around 100 to 250 megabytes, e.g. Iomega Zip drives) are generally trending upward, and if incompatibilities between manufacturers are ever resolved, this format could well replace the 1.44 MB floppy. High-capacity removable drives, with capacities in the gigabyte range or above, are being replaced by recordable CDs, which in turn may be replaced by recordable DVDs.
High-capacity drives such as the Iomega Jaz come with one free cartridge, and we assume that three additional cartridges are sold with each drive. Source: San Francisco Chronicle, Jan. 23, 1998 "Does Bad News for Iomega Mean Horrible News for HMT?"
The amount of original content created directly to this medium is, therefore, probably quite low. Furthermore, the disks are regularly reused and not generally viewed as archival solutions.
Tape was the primary storage medium for the first generation of electronic computers in the 1950s. Reel-to-reel half-inch tape was used for data storage on mainframe computers from the earliest days of computing into the 1970s, and numerous tape formats have been developed since. The worldwide installed base of tape drive units is 25.2 million. Michael Lesk estimated that in 1995 the magnetic tape industry would ship 200 petabytes of blank tape (Lesk, M., "Preserving Digital Objects: Recurrent Needs and Challenges").
Current estimates are that approximately $1 billion of tape media will be sold every year (Source: Infostore, July 1, 1999, "Tape: The Media Is the Message").
In a July 1999 filing with the United States Securities and Exchange Commission, StorageTek, a large tape drive manufacturer, wrote that the cost of data storage on computer tape media was less than $0.005 per megabyte ($5.00 per gigabyte). If, as predicted, approximately $1 billion of computer tape media is sold every year, that implies a worldwide annual tape storage capacity of 200 petabytes. These figures may be somewhat inconsistent, however, because the $1 billion may reflect manufacturers' revenue rather than the retail cost to end users, which would presumably be much higher. In fact, Department of Commerce figures for the mid-1990s showed factory revenue for computer tape manufacturers of around $600 to 700 million. Since substantial amounts of tape media are manufactured in other countries, the $1 billion is likely a producer revenue figure.
If the retail price of tape media is assumed to be twice the manufacturer's price, then $2 billion of retail tape sales would work out to around 400 petabytes of storage capacity. The retail markup over manufacturers' prices is likely limited by competition and by the common practice of buying tape in bulk quantities. The lower price of DAT-format media moderates the average price further.
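Both capacity figures come from dividing spending by the price per gigabyte. A minimal sketch, assuming decimal units (1 PB = 1,000,000 GB) and the $5.00 per GB figure from the StorageTek filing; the function name is hypothetical:

```python
GB_PER_PB = 1e6        # decimal units assumed
USD_PER_GB = 5.00      # StorageTek: less than $0.005 per megabyte


def capacity_pb(spend_usd: float, usd_per_gb: float = USD_PER_GB) -> float:
    """Petabytes of tape capacity purchasable with a given spend."""
    return spend_usd / usd_per_gb / GB_PER_PB


for label, spend in [("$1 billion (producer revenue)", 1e9),
                     ("$2 billion (retail, 2x markup)", 2e9)]:
    print(f"{label}: {capacity_pb(spend):.0f} PB")
```

Treating the $1 billion as retail spending gives 200 PB; doubling it to reflect retail prices gives 400 PB.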
The Imation DC2000, or Travan, quarter-inch tape drive is a low-end product used primarily to back up desktop PCs. Its capacity generally ranges from 500 megabytes to 4 gigabytes. In August 2000, a Sony Travan Formatted MiniCartridge holding 4 GB uncompressed was advertised for sale on the Internet for $29.49, and a comparable 4 GB cartridge manufactured by Maxell was available for $30.79.
Tandberg SLR (Scalable Linear Recording) is also a backup format for desktops and workstations and typically stores 350 megabytes to 4 gigabytes.
4 mm tape drives are the largest segment of the market and use the digital audio tape (DAT) format. They are commonly deployed to back up PC servers and generally provide 5 to 40 gigabytes of backup capacity (uncompressed). This format has an installed base of 7.6 million users.
8 mm tape drives provide storage in the 14 to 50 gigabyte range. Vendors include Exabyte (Mammoth), Sony (AIT) and IBM (Magstar 3570).
DLT (Digital Linear Tape), produced by Quantum Corporation, is used for mid-range computer backup and offers 15 to 40 gigabytes of native capacity. More than 1.4 million DLT drives are deployed, and approximately 40 million tape cartridges in this format have been sold. Quantum estimates that by the end of 2000, 1.9 million DLT drives will have been shipped to customers.
LTO (Linear Tape-Open) Ultrium is a new format from a consortium of IBM, Seagate and Hewlett-Packard. The Ultrium specification calls for 100 gigabytes of native storage.
Enterprise Level Formats
1/2-inch cartridge: The dominant format in the mainframe, enterprise level storage market.
Automated tape libraries - which provide completely automated hands-off storage management, including random tape access, sophisticated robotics, unattended backup, and reduced labor costs - are expected to grow from less than 18,000 units shipped in 1996 to close to 120,000 units by 2002. (Source: Freeman Reports)
There is an industry rule of thumb that suggests a three-to-one ratio of disk capacity over tape be maintained.
In August 2000, the retail price of 3590 tape cartridges with 10 GB native capacity was $53.21. Fuji Film DLT cartridges with 10 GB native capacity were available at retail for $51.10, and Sony DLT tapes holding 10 GB uncompressed were selling for $49.72. (http://www.cleansweepsupply.com/pages/skugroup2599.html)
From low-end formats such as Travan, through the most popular format, DLT, to the high-end 3590 format, a retail price of roughly $5.00 per gigabyte of native tape storage capacity seems a reasonable estimate. (DAT tapes are the only exception; they are much cheaper.) Of course, if large purchases bring substantial discounts, the revenue assumption would shift correspondingly toward the $1 billion wholesale estimate, so the calculation based on price per gigabyte of storage capacity would not really be affected.
According to Computer Technology Review (March 1998), the total storage at a typical Fortune 1000 site is projected to escalate from 10 TB in 1997 to 1 PB by 2000. Over the next five years, a typical large database system for US Government agencies is expected to accept 5 TB per day and archive from 15 to 100 PB.
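A quick check of the per-gigabyte prices implied by the August 2000 retail figures quoted in this section, using native (uncompressed) capacities. On these numbers the DLT and 3590 cartridges come to roughly $5.00 to $5.30 per GB, while the 4 GB Travan-class cartridges run closer to $7.40 to $7.70 per GB, so $5.00 is best read as a rough figure weighted toward the higher-capacity formats:

```python
# August 2000 retail prices quoted in the text: (price in USD, native capacity in GB)
cartridges = {
    "Sony Travan MiniCartridge": (29.49, 4),
    "Maxell 4 GB cartridge": (30.79, 4),
    "3590 cartridge": (53.21, 10),
    "Fuji Film DLT": (51.10, 10),
    "Sony DLT": (49.72, 10),
}

# Price per native gigabyte for each cartridge
per_gb = {name: price / gb for name, (price, gb) in cartridges.items()}
for name, usd in per_gb.items():
    print(f"{name:26s} ${usd:.2f} per GB")
```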
In 1995 Freeman Associates predicted that the total number of tape libraries would increase from 6,454 in 1994 to about 90,000 by the year 2000.
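The storage and tape library projections in the preceding paragraphs can be compared by converting each to a compound annual growth rate. A minimal sketch, assuming smooth compounding between the endpoints; because 1996 shipments were "less than 18,000" units, the first rate is a lower bound:

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


projections = [
    ("Tape library units shipped, 1996-2002", 18_000, 120_000, 6),
    ("Fortune 1000 site storage (TB), 1997-2000", 10, 1_000, 3),
    ("Installed tape libraries, 1994-2000", 6_454, 90_000, 6),
]
for label, start, end, years in projections:
    print(f"{label}: {cagr(start, end, years):.0%} per year")
```

These work out to roughly 37%, 364% and 55% per year respectively.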
The estimate of the amount of original data stored on tape will therefore focus only on mass storage applications, ranging from large-scale scientific applications to heavily transaction-oriented business applications. IDC estimates the installed base of IBM mainframe OS/390-class computers at around 16,500 in 2000.
Relatively few tape cartridges are needed to back up small computer disks, and their total capacity will never substantially exceed the capacity necessary to back up the entire hard disk or disk array. A typical backup strategy is to store the entire file system once and then make incremental backups of any subsequent changes, which reduces the storage needed to keep a current copy of the entire file system at hand.
In large-scale tape libraries, there may be thousands or even tens of thousands of magnetic tapes providing primary storage for application data. Storage requirements are growing rapidly as new facilities, such as the Large Hadron Collider, are built and begin performing experiments. Large-scale databases are also becoming more common as corporations make increasing efforts to track consumer transactions comprehensively.
The number of households banking over the Internet may reach 32 million by 2003. An Internet banking transaction costs an estimated 1 cent, compared with $1.14 per transaction at a teller, 55 cents by phone, 29 cents at an ATM and 2 cents through a proprietary computer system. (Source: "Banking on the Internet")
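The advantage of the full-plus-incremental strategy can be seen by comparing it with taking a full copy every day. A minimal sketch under hypothetical inputs: the 10 GB disk matches the typical 2000 desktop drive mentioned earlier, but the 0.2 GB of daily changes and the 30-day retention window are invented for illustration, and the function name is hypothetical:

```python
def backup_storage_gb(full_gb: float, daily_change_gb: float, days: int,
                      incremental: bool) -> float:
    """Tape capacity needed to keep `days` of backups of one file system."""
    if days < 1:
        raise ValueError("days must be at least 1")
    if incremental:
        # One full copy, then one incremental per later day holding only the changes
        return full_gb + daily_change_gb * (days - 1)
    # Otherwise a complete copy every day
    return full_gb * days


# Hypothetical desktop: 10 GB disk, 0.2 GB changed per day, 30 days retained
print(f"incremental: {backup_storage_gb(10, 0.2, 30, True):.1f} GB")
print(f"daily full:  {backup_storage_gb(10, 0.2, 30, False):.1f} GB")
```

Under these assumptions the incremental scheme needs about 15.8 GB of tape, not far above the disk's own size, versus 300 GB for daily full copies.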
The importance of massive scale databases in general commercial arenas is exemplified by the experience of Wal-Mart, a leader in so-called "data mining" technology and the owner of one of the largest privately held data sets. The U.S. Department of Commerce in its July 2000 report "Digital Economy 2000" points out that "over a three-year period, Wal-Mart achieved a 47 percent increase in sales on only a 7 percent increase in inventories by using a relational database system running on massively parallel computers. The system allows vendors to access almost realtime information on sales and customer transactions and handles 120,000 queries each week from 7,000 suppliers."
Digital Data Creation
For the most part, computers may not contribute greatly to the production of new, original data; the great exception is scientific research, where huge data sets are commonplace and new discoveries rely on computing and storage.
High Energy Physics
The Large Hadron Collider is being built at CERN in Switzerland and is expected to begin production experiments around 2005. It is expected to generate approximately 20 petabytes of data per experiment, at rates of 100 to 1,500 megabytes per second. By comparison, current high energy physics experiments generate data at about 35 megabytes per second and many hundreds of terabytes per experiment. All of this is original data.
Source: Shiers, Jamie, "Massive-Scale Data Management using Standards-Based Solutions" IEEE 16th Symposium on Mass Storage Systems.
The BaBar experiment at SLAC will generate approximately 200 TB of data per year, at a rate of 10 MB/s, for 10 years.
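The volume and rate figures for the LHC and BaBar can be cross-checked against each other. A minimal sketch, assuming decimal units and continuous (24-hour, 365-day) recording, which real experiments do not achieve:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
MB = 1e6   # decimal units assumed throughout
TB = 1e12
PB = 1e15


def years_to_accumulate(total_bytes: float, bytes_per_second: float) -> float:
    """Years of continuous recording needed to reach `total_bytes`."""
    return total_bytes / bytes_per_second / SECONDS_PER_YEAR


# LHC: ~20 PB per experiment at 100 to 1,500 MB/s
for rate in (100, 1500):
    print(f"LHC at {rate} MB/s: {years_to_accumulate(20 * PB, rate * MB):.2f} years")

# BaBar: 10 MB/s recorded around the clock, compared with the quoted 200 TB/year
continuous_tb = 10 * MB * SECONDS_PER_YEAR / TB
print(f"BaBar, continuous: {continuous_tb:.0f} TB/year; "
      f"200 TB/year implies a ~{200 / continuous_tb:.0%} duty cycle")
```

At the low LHC rate, 20 PB takes over six years of continuous recording; at the high rate, about five months. BaBar's 200 TB per year corresponds to recording roughly 63% of the time at 10 MB/s.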
Los Alamos National Laboratory estimated total storage capacity in its open storage system at 243 terabytes and in its secure system at 2.31 petabytes as of 1998. It is also anticipated that storage capacity will grow to 5 petabytes in 2001.
The majority of data held and administered by the National Oceanic and Atmospheric Administration are held at three national data centers: the National Climatic Data Center at Asheville, NC; the National Oceanographic Data Center at Silver Spring, MD; and the National Geophysical Data Center at Boulder, CO. The climatic data is by far the largest of the three collections, holding approximately 640 terabytes on 350,000 magnetic tapes. The geophysical and oceanographic data total a combined 12 terabytes on 14,500 tapes.
The NASA Center for Computational Sciences in Greenbelt, MD, held data on 27,692 tapes as of August 2000. The Center uses 3590 and 9840 tapes, each of which holds 20 GB uncompressed, and automatically makes duplicate tapes of all newly generated data. As of August 2000, this storage facility held 92.5 terabytes of unique data and over 162 terabytes including the duplicates. New data arrives at approximately 200 to 300 gigabytes per month.
The University of Tokyo stores satellite images in an environmental digital library of approximately 6 terabytes, consisting of about 60,000 images averaging 100 megabytes each.
As of August 1998, the San Diego Supercomputer Center (SDSC) stored approximately 65 terabytes on approximately 11,000 tapes. As of mid-2000, the SDSC was storing 1.5 petabytes (1.5 × 10^15 bytes).
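Dividing each archive's holdings by its tape count gives a rough sense of how much data sits on an average cartridge. A minimal sketch using the figures reported above, assuming 1 TB = 1,000 GB; the NASA figure includes duplicate copies:

```python
# (terabytes stored, number of tapes) as reported in the text
archives = {
    "NOAA NCDC climatic data": (640, 350_000),
    "NOAA geophysical + oceanographic": (12, 14_500),
    "NASA NCCS, Aug 2000 (incl. duplicates)": (162, 27_692),
    "SDSC, Aug 1998": (65, 11_000),
}


def gb_per_tape(terabytes: float, tapes: int) -> float:
    """Average gigabytes stored per tape (decimal units)."""
    return terabytes * 1000 / tapes


for name, (tb, tapes) in archives.items():
    print(f"{name}: {gb_per_tape(tb, tapes):.1f} GB per tape")
```

The NOAA collections average under 2 GB per tape, while NASA and SDSC average about 5.9 GB, well below the 20 GB native capacity quoted for the 3590 and 9840 cartridges.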
The National Center for Atmospheric Research (NCAR) in 1996 had about 68 TB and was growing at the rate of 1.5 TB per month.
Analog Storage Tape
The first uses of magnetic media for data storage occurred about fifty years ago with the development of magnetic tape. A number of formats have evolved over the decades, but today by far the most prevalent is the cassette tape, used for mass-market distribution of prerecorded music. The other important use of magnetic tape for storing analog signals is videotape recording.
Analog Audio Tape
The distribution of prerecorded music is one of the most common uses of magnetic tape. Sales of music in this format, however, are now much smaller than they have been historically and are generally expected to keep declining as digital media become more prevalent and convenient.
Although overall sales of cassette tapes have declined dramatically as music has become available in alternative formats, the market for books-on-tape is booming. According to the Audio Publishers Association, this market has grown an estimated 100% since 1990. Reasons include that cassettes can run 40 minutes longer than CDs, that they have a built-in "bookmark", and that audiobooks are frequently listened to in the car, where 75% of cars are manufactured with only an AM/FM cassette radio.
The IFPI reports that worldwide sales of prerecorded music on cassette fell by 11% in 1998, to 1.2 billion units, due to depressed sales in Asia.
Blank Audio Tape
U.S. shipments of blank audio tape dropped dramatically during the 1990s.
Worldwide shipments of blank audio tape are expected to decline in 2000 to 921 million units from 971 million in 1999, with an anticipated market for 771 million cassette tapes by 2003. Source: Consumer MultiMedia Report, Dec. 27, 1999.
Analog Video Tape
Prerecorded VideoTapes (VHS Format) (World)
1997 - 1.666 billion
1998 - 1.719 billion
1999 - 1.748 billion
2000 - 1.664 billion
2001 - 1.561 billion
Source: International Recording Media Association
The main use of blank videotape is consumers' recording of television programs. It is anticipated that sales of this tape would drop sharply if pay-per-view television shows carried copy protection. A very large share of video cassette recorder owners are estimated to use them for time-shifting, that is, recording programs to watch at a later time.
Blank VideoTapes (VHS Format T-120 equivalent units) (World)
1997 - 1,485 million
1998 - 1,446 million
1999 - 1,463 million
2000 - 1,400 million
2001 - 1,275 million
Source: International Recording Media Association
Another view of the blank videotape market comes from the British research firm Understanding & Solutions, which predicted 1.147 billion blank videocassettes in 1999, compared with 1.146 billion in 1998.
Richard Kelly of Cambridge Associates estimated the stock of videotape in 1997 at about 4.6 billion (IRMA Newsletter, Feb. 1997). It is not clear whether this figure covers all videotapes, including those sold blank, or only prerecorded ones, nor whether it applies to the United States alone or to the whole world. We have taken this estimate into account but have not used it directly, because the worldwide flow of new videotapes each year would yield a considerably higher figure, even assuming that many videotapes are treated as disposable after a few years. Instead, we estimate the world stock of videotape of all formats at around 10 billion.
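The flow argument can be made concrete with the IRMA VHS tables in this section. A minimal sketch, assuming (as with floppy disks) that roughly three years of shipments remain in use; it counts VHS only, so it excludes other formats and older tapes still in circulation:

```python
# World VHS shipments in millions of units, from the IRMA tables in this section
prerecorded_vhs = {1997: 1666, 1998: 1719, 1999: 1748}
blank_vhs_t120 = {1997: 1485, 1998: 1446, 1999: 1463}

three_year_flow = sum(prerecorded_vhs.values()) + sum(blank_vhs_t120.values())
print(f"VHS tapes shipped 1997-1999: {three_year_flow / 1000:.2f} billion")  # 9.53 billion
```

Three years of VHS shipments alone come to about 9.5 billion tapes, more than twice the 4.6 billion stock estimate and close to the 10 billion figure used here.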
Audio Conversion Issues
To translate the vast quantity of audio information on cassette tape into its digital equivalent, we have chosen the CD format: linear PCM audio with a 16-bit word length and a 44.1 kHz sample rate. Although professional recording studios often use a sampling rate of 96 kHz, the vast majority of tape-recorded audio is music for consumer use, and the CD format is the digital format of choice for this application. The amount of data this format generates is easily calculated: 44,100 16-bit samples are taken each second for each of two tracks, giving about 1.4 million bits per second and 5.08 gigabits per hour. Converted to bytes, this is about 605 MB per hour (using 1 MB = 1,048,576 bytes). This data is uncompressed and gives a representation of music that is adequate for most listeners.
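The CD-audio arithmetic above can be reproduced step by step. A minimal sketch using the parameters stated in the text (44.1 kHz, 16 bits, two tracks, binary megabytes):

```python
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2              # the two tracks of stereo audio
BYTES_PER_MB = 1_048_576  # the binary megabyte used in the text

bits_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS   # 1,411,200
bits_per_hour = bits_per_second * 3600                          # 5,080,320,000
bytes_per_hour = bits_per_hour // 8                             # 635,040,000
mb_per_hour = bytes_per_hour / BYTES_PER_MB                     # about 605.6

print(f"{bits_per_second:,} bit/s, {bits_per_hour / 1e9:.2f} Gbit/h, {mb_per_hour:.1f} MB/h")
```

This prints `1,411,200 bit/s, 5.08 Gbit/h, 605.6 MB/h`, matching the figures in the text.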
Video Conversion Issues
In estimating the size of analog videotape stores, we have assumed conversion using the MPEG-2 video compression standard. This conversion factor is appropriate for videotape because MPEG-2 was designed as a generic format for digital multimedia and includes coding schemes for both video and audio.
Because video generates such massive amounts of data, any practical use requires some compression scheme. MPEG-2 is now the international standard for video storage. It achieves compression in two ways: spatial compression and temporal compression. Spatial compression reduces the number of bits used to represent a single frame. Temporal compression, which yields the bulk of the savings, encodes mainly the portions of a frame that have changed relative to a previously coded reference frame.
The amount of compression MPEG-2 achieves varies considerably; we have assumed that 2 gigabytes is adequate to represent 1 hour of high-fidelity audio and high-quality video.
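The idea behind temporal compression can be illustrated with a toy example that stores only the pixels that changed since the previous frame, a simple scheme sometimes called conditional replenishment. This is a deliberately simplified sketch: MPEG-2 itself uses motion-compensated prediction and DCT-based coding, not this scheme, and the function names are hypothetical:

```python
def encode_delta(prev: list[int], curr: list[int]) -> list[tuple[int, int]]:
    """Keep only (position, new value) for pixels that differ from `prev`."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same size")
    return [(i, c) for i, (p, c) in enumerate(zip(prev, curr)) if p != c]


def decode_delta(prev: list[int], delta: list[tuple[int, int]]) -> list[int]:
    """Rebuild the current frame by applying the stored changes to `prev`."""
    frame = list(prev)
    for i, value in delta:
        frame[i] = value
    return frame


frame1 = [10] * 16      # a flat 4x4 grey frame, stored row by row
frame2 = list(frame1)
frame2[5] = 200         # a small bright object appears
frame2[6] = 200

delta = encode_delta(frame1, frame2)
print(f"stored {len(delta)} of {len(frame2)} pixels")  # stored 2 of 16 pixels
assert decode_delta(frame1, delta) == frame2
```

When little changes between frames, as in most video, the stored difference is far smaller than the full frame, which is why temporal compression provides most of the savings.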
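The 2 GB per hour assumption can be expressed as an average bit rate. A minimal sketch, assuming decimal gigabytes (binary gigabytes would raise the rate by about 7%); the result, roughly 4.4 Mbit/s, is comparable to standard-definition DVD video:

```python
BYTES_PER_GB = 1e9   # decimal gigabyte (an assumption; 2**30 gives ~7% more)
SECONDS_PER_HOUR = 3600


def average_bitrate_mbps(bytes_per_hour: float) -> float:
    """Average bit rate in megabits per second for a given hourly data volume."""
    return bytes_per_hour * 8 / SECONDS_PER_HOUR / 1e6


rate = average_bitrate_mbps(2 * BYTES_PER_GB)
print(f"2 GB per hour = {rate:.2f} Mbit/s")  # 2 GB per hour = 4.44 Mbit/s

# A full T-120 cassette holds 120 minutes at standard play speed
t120_gb = 2 * 120 / 60
print(f"full T-120 cassette ~ {t120_gb:.0f} GB")
```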
References and Resources
Gibson, G.D. (1994): Audio, Film and Video Survey. A report on an international survey of 500 audio, motion picture film and video archives. Library of Congress, Washington DC.
Moving Picture Experts Group (MPEG)