Sometimes this data has been protected, backed up, and stored for decades or longer. In many cases, the tape drives or storage subsystems are out of maintenance, out of support, and problematic to keep using.
Other times, the physical media itself is aging out: tape disintegrates over time, and it becomes uncertain whether the data stored on it can still be read.
Until the time comes to move all that data, many customers simply think of it as a number, whether they’ve got a 200-terabyte or a 60-petabyte pile of data. These are amounts too large to picture.
Data managers have no way to visualize the actual volume of data they’re dealing with, and most are flabbergasted when they truly understand the practical implications of moving it forward into a modern backup environment or growing it under current business requirements.
A petabyte of data is equivalent to 1,000 terabytes, or 1,000,000 gigabytes, of storage. Numbers like these beg for real-world comparisons to put them into perspective.
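As a quick sanity check on those conversions, here’s a minimal Python sketch using decimal (base-1000) units; the chart at the end of this article shows the other common convention, binary (base-1024) units.

```python
# Decimal (base-1000) storage units, in bytes
GB = 1000**3
TB = 1000**4
PB = 1000**5

one_petabyte = 1 * PB
print(one_petabyte // TB)  # 1000 terabytes
print(one_petabyte // GB)  # 1000000 gigabytes
```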
Over the years, my teams have used a small number of ways to help customers visualize this pile. We had another opportunity on a recent visit to a customer who has over 150 petabytes of primary data on tape to move.
When I say over 150 petabytes of data on tape, I should mention that the customer also has a second copy on tape, which brings the total data stored on tape in their universe to over 300 petabytes. We started looking at all this data and asserted:
“Alright, if everything lines up and everything works perfectly in your environment today, and if we can run about 24/7 with the resources you currently have available, then you can move these 300 petabytes of data in a bit more than 7 years.”
The customer’s response was, “We knew it was going to be a long time but … what?!” So, we offered to put it into perspective using the ISBN metric.
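Before the books, though, the seven-year figure itself is worth unpacking. Here’s a minimal back-of-envelope sketch in Python; the sustained throughput is a hypothetical stand-in for illustration, since the real rate depends on the customer’s drives, network, and people:

```python
PB = 1000**5  # decimal petabyte, in bytes

total_bytes = 300 * PB
throughput = 1.3e9  # assumed sustained aggregate rate in bytes/second (hypothetical)

seconds = total_bytes / throughput
years = seconds / (365.25 * 24 * 3600)
print(f"about {years:.1f} years")  # about 7.3 years of 24/7 data movement
```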
What’s the ISBN metric? We agreed that most everyone knows what an International Standard Book Number (ISBN) is. I like to use The Complete Sherlock Holmes, printed in 1927, as an example. (It’s one of my favorite books of all time.)
According to Amazon, the standard hardcover collection measures 6.8 x 2.8 x 10 inches and holds approximately 4 megabytes of data in the rich stories within. Using this as our metric, we can count how many copies of the book it would take to represent the customer’s data volume.
In this customer’s case, 300 petabytes of stored data is equivalent to moving a physical cube of books: about 75,000,000,000 copies.
"If you stacked these copies height wise from end to end, they would stack over 11,835,000 miles. This means 300 petabyte's worth of The Complete Sherlock Holmes would stack from the Earth's surface to the moon 49 times."
As you can imagine, it takes time and technology and people to move this amount of data. The ISBN metric brings this colossal amount of data out of the abstract and puts it in terms the customer can truly understand.
To assist customers, we discover the requirements and the facts and determine the “why” behind so much data. Then, together with the customer, we illustrate how to surgically extract only the data that’s needed, remove it from the legacy environment, and bring the valuable information into the modernized data protection environment: efficiently and quickly, while still governed by the laws of physics.
We typically see two main types of data migration in which the chain of custody is not disrupted, and, for the migrated data, two main types of policy management.
Sifting through a mountain of data is challenging, if not impossible, for even the most tenured data professionals.
To solve this issue, we at Cobalt Iron built analytics and automation into our Compass platform. This enables end users to assess and tackle the monumental task of data migration more efficiently by “just feeding the tape.” In some cases, it has even allowed end users to eliminate tape altogether.
In short, we take the mystery out of that mountain of data, and we enable effective movement into an environment that unshackles end users’ data for business use across the enterprise.
Equally or more important for our customers: Information that previously was locked up in tape and essentially inaccessible becomes instantly available, cataloged, and valuable to the enterprise not only for compliance or legal audits but also as a rich resource of historical data that can be analyzed and leveraged to drive better business outcomes.
The true size of a petabyte is daunting when put into practical terms such as an equivalent stack of Sherlock Holmes compilations. Nevertheless, it is possible to get over that mountain and realize the business benefits that come from maintaining data in a modern enterprise backup environment. For comparison, here is a chart detailing the various sizes of data:
Byte (B): 1 Byte
Kilobyte (KB): 1,024 Bytes
Megabyte (MB): 1,048,576 Bytes
Gigabyte (GB): 1,073,741,824 Bytes
Terabyte (TB): 1,099,511,627,776 Bytes
Petabyte (PB): 1,125,899,906,842,624 Bytes
Exabyte (EB): 1,152,921,504,606,846,976 Bytes
Zettabyte (ZB): 1,180,591,620,717,411,303,424 Bytes
Yottabyte (YB): 1,208,925,819,614,629,174,706,176 Bytes
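Note that this chart uses binary (base-1024) units, in contrast to the decimal convention used earlier in the article. A short Python sketch reproduces it:

```python
units = ["Byte", "Kilobyte", "Megabyte", "Gigabyte", "Terabyte",
         "Petabyte", "Exabyte", "Zettabyte", "Yottabyte"]
abbrevs = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

# Each unit is 1024 times the previous one
for power, (name, abbr) in enumerate(zip(units, abbrevs)):
    count = 1024**power
    label = "Byte" if count == 1 else "Bytes"
    print(f"{name} ({abbr}): {count:,} {label}")
```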
A special thank you to Brittany at Madison Square Garden for providing the distance from the court to the base of the scoreboard, and to Kelley at the Library of Congress.