Sometimes this data has been protected, backed up, and stored for decades or longer. In many cases, the tape drives or storage subsystems are out of maintenance, out of support, and problematic to keep using.
Other times, the physical media itself is aging out: tape disintegrates over time, and it becomes uncertain whether the data stored on it can still be read.
Until the time comes to move all that data, many customers simply think of it as a number, whether they’ve got a 200-terabyte or a 60-petabyte pile of data. These are amounts too large to picture.
Data managers have no way to visualize the actual volume of data they’re dealing with, and most are flabbergasted when they truly understand the practical implications of moving it forward into a modern backup environment or growing it under current business requirements.
A petabyte of data is equivalent to 1,000 terabytes, or 1,000,000 gigabytes, of storage. Numbers like these beg for real-world comparisons to put them into perspective.
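As a quick sanity check on those conversions, here’s a minimal Python sketch using decimal (base-1000) units; the chart at the end of this article shows the other common convention, binary (base-1024) units.

```python
# Decimal (base-1000) storage units, in bytes
GB = 1000**3
TB = 1000**4
PB = 1000**5

one_petabyte = 1 * PB
print(one_petabyte // TB)  # 1000 terabytes
print(one_petabyte // GB)  # 1000000 gigabytes
```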
Over the years, my teams have used a small number of ways to help customers visualize this pile. We had another opportunity on a recent visit to a customer who has over 150 petabytes of primary data on tape to move.
When I say over 150 petabytes of data on tape, I should mention that the customer also has a second copy on tape, which brings the total data stored on tape in their universe to over 300 petabytes. We started looking at all this data and asserted:
“Alright, if everything lines up and everything works perfectly in your environment today, and if we can run about 24/7 with the resources you currently have available, then you can move these 300 petabytes of data in a bit more than 7 years.”
The customer’s response was, “We knew it was going to be a long time but … what?!” So, we offered to put it into perspective using the ISBN metric.
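Before the books, though, the seven-year figure itself is worth unpacking. Here’s a minimal back-of-envelope sketch in Python; the sustained throughput is a hypothetical stand-in for illustration, since the real rate depends on the customer’s drives, network, and people:

```python
PB = 1000**5  # decimal petabyte, in bytes

total_bytes = 300 * PB
throughput = 1.3e9  # assumed sustained aggregate rate in bytes/second (hypothetical)

seconds = total_bytes / throughput
years = seconds / (365.25 * 24 * 3600)
print(f"about {years:.1f} years")  # about 7.3 years of 24/7 data movement
```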
What’s the ISBN metric? We agreed that most everyone knows what an International Standard Book Number (ISBN) is. I like to use The Complete Sherlock Holmes, printed in 1927, as an example. (It’s one of my favorite books of all time.)
According to Amazon, the standard hardcover collection measures 6.8 x 2.8 x 10 inches and holds approximately 4 megabytes of data in the rich stories within. Using this as our metric, we can count how many copies of the book it would take to represent the customer’s data volume.
In this customer’s case, 300 petabytes of stored data is equivalent to moving a physical cube of books: about 75,000,000,000 copies.
"If you stacked these copies height wise from end to end, they would stack over 11,835,000 miles. This means 300 petabyte's worth of The Complete Sherlock Holmes would stack from the Earth's surface to the moon 49 times."
As you can imagine, it takes time and technology and people to move this amount of data. The ISBN metric brings this colossal amount of data out of the abstract and puts it in terms the customer can truly understand.
To assist customers, we discover the requirements and the facts and determine the “why” behind so much data. Then, together with the customer, we illustrate how to surgically extract only the data that’s needed, remove it from the legacy environment, and bring the valuable information into the modernized data protection environment: efficiently and quickly, while still governed by the laws of physics.
We typically see two main types of data migration in which the chain of custody is not disrupted, and, for the migrated data, two main types of policy management.
Sifting through a mountain of data is challenging, if not impossible, for even the most tenured data professionals.
To solve this issue, we at Cobalt Iron built analytics and automation into our Compass platform. This enables end users to assess and tackle the monumental task of data migration more efficiently by “just feeding the tape.” In some cases, it has even allowed end users to eliminate tape altogether.
In short, we take the mystery out of that mountain of data, and we enable effective movement into an environment that unshackles end users’ data for business use across the enterprise.
Equally or more important for our customers: Information that previously was locked up in tape and essentially inaccessible becomes instantly available, cataloged, and valuable to the enterprise not only for compliance or legal audits but also as a rich resource of historical data that can be analyzed and leveraged to drive better business outcomes.
The true size of a petabyte is daunting when put into practical terms such as an equivalent stack of Sherlock Holmes compilations. Nevertheless, it is possible to get over that mountain and realize the business benefits that come from maintaining data in a modern enterprise backup environment. For comparison, here is a chart detailing the various sizes of data:
Byte (B): 1 Byte
Kilobyte (KB): 1,024 Bytes
Megabyte (MB): 1,048,576 Bytes
Gigabyte (GB): 1,073,741,824 Bytes
Terabyte (TB): 1,099,511,627,776 Bytes
Petabyte (PB): 1,125,899,906,842,624 Bytes
Exabyte (EB): 1,152,921,504,606,846,976 Bytes
Zettabyte (ZB): 1,180,591,620,717,411,303,424 Bytes
Yottabyte (YB): 1,208,925,819,614,629,174,706,176 Bytes
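Note that this chart uses binary (base-1024) units, in contrast to the decimal convention used earlier in the article. A short Python sketch reproduces it:

```python
units = ["Byte", "Kilobyte", "Megabyte", "Gigabyte", "Terabyte",
         "Petabyte", "Exabyte", "Zettabyte", "Yottabyte"]
abbrevs = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

# Each unit is 1024 times the previous one
for power, (name, abbr) in enumerate(zip(units, abbrevs)):
    count = 1024**power
    label = "Byte" if count == 1 else "Bytes"
    print(f"{name} ({abbr}): {count:,} {label}")
```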
A special thank you to Brittany at Madison Square Garden for providing the distance from the court to the base of the scoreboard, and to Kelley at the Library of Congress.