Simple logic, greener data protection

People who want to make the digital revolution more energy-efficient, says Tiffany Jing Li, might gain motivation by visiting the U.S. Library of Congress.

The world’s largest library has assembled 29 million books and periodicals, 15 million recordings and photographs, and 5 million maps in its 210-year history.

If that sounds impressive, says Li, consider this: The world today generates an equivalent amount of new digital information every 15 minutes.

Whether the videos, photos and tunes taking their place in cyberspace are as worthy of preservation as congressional tomes is a matter of conjecture.

But there is little doubt, says Li, an associate professor of electrical and computer engineering, that the explosion of digital data is exacting a steep energy price tag.

“The proliferation of information has created an exponential demand for digital storage. This requires massive data centers that are becoming notorious energy hogs.”

The cost of redundant data protection

A PC or laptop by itself generates an inconsequential amount of heat, says Li. But when hundreds or thousands join forces to back up data at a bank, corporation or military base, cooling is essential. In 2006, the U.S. burned through 61 billion kilowatt-hours, at a cost of $4.5 billion, to power and cool data centers.
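Dividing the 2006 cost by the energy consumed gives a rough average price per kilowatt-hour, with power and cooling lumped together:

```latex
\frac{\$4.5 \times 10^{9}}{61 \times 10^{9}\ \text{kWh}} \approx \$0.074\ \text{per kWh}
```

That is about 7.4 cents for every kilowatt-hour spent running and cooling the nation's data centers that year.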

Much of that energy, says Li, goes to provide redundant data protection.

“If one disk fails, you need to replace it and restore its data from a duplicate disk. If the second disk fails during the restoration, you need a third disk with the data backed up and protected. Replicating in triplicate enables you to support two concurrent failures.

“All three disks have to be operating simultaneously—collecting and filing data, incorporating updates, using energy. So when you measure energy usage, you are always multiplying by at least three.

“If you can reduce your storage disks without sacrificing reliability or robustness, you can cut costs significantly.”
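The arithmetic behind “multiplying by at least three” can be sketched in a few lines. This minimal Python sketch rests on two stated assumptions: the number of powered disks stands in for energy use, and the alternative is an idealized maximum distance separable (MDS) erasure code, such as Reed-Solomon, in which k data disks plus m parity disks survive any m concurrent failures. The MDS code is a textbook yardstick, not a description of Li's codes, and the function names are hypothetical.

```python
# Assumption: energy use scales with the number of powered disks. This is a
# simplification; real data centers also spend energy on CPUs, networking and cooling.


def replicated_disks(data_disks: int, failures_tolerated: int) -> int:
    """Disks needed when every data disk is copied (failures_tolerated + 1) times."""
    copies = failures_tolerated + 1
    return data_disks * copies


def mds_coded_disks(data_disks: int, failures_tolerated: int) -> int:
    """Disks needed by an idealized MDS erasure code (hypothetical comparison):
    k data disks plus m parity disks survive any m simultaneous failures."""
    return data_disks + failures_tolerated


if __name__ == "__main__":
    k, m = 10, 2  # ten disks of data, survive two concurrent failures
    rep = replicated_disks(k, m)
    ec = mds_coded_disks(k, m)
    print(f"triplication: {rep} disks ({rep / k:.1f}x)")
    print(f"MDS code:     {ec} disks ({ec / k:.1f}x)")
```

With ten disks of data protected against two simultaneous failures, this prints 30 disks (3.0x) for triplication and 12 disks (1.2x) for the MDS code.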

Government and industry have sought to make data centers more efficient by arranging servers more optimally, improving air flow, and developing better lighting and cooling systems.

Computing networks for leaner data storage

Li wants to cut energy consumption by making data storage itself more efficient. She designs erasure codes that restore lost data when storage disks fail, reducing the need for redundant replication schemes. An efficient code enables one storage disk to protect multiple disks of data with parity checks that rely on a simple logical operation.
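The article does not name the logical operation, but the classic example is exclusive-or (XOR), as used in RAID 5 parity. Assuming XOR, a minimal Python sketch of one parity disk protecting several data disks might look like this; `make_parity` and `recover` are hypothetical names, and real systems work on large fixed-size blocks rather than short byte strings.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (XOR is assumed as the logical operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_disks: list[bytes]) -> bytes:
    """Hypothetical helper: one parity block protecting every data disk."""
    return xor_blocks(data_disks)


def recover(disks: list[bytes | None], parity: bytes) -> list[bytes]:
    """Rebuild at most one missing data disk (marked None) from survivors plus parity."""
    missing = [i for i, d in enumerate(disks) if d is None]
    if len(missing) > 1:
        raise ValueError("a single XOR parity disk can repair only one failure")
    restored = list(disks)
    if missing:
        survivors = [d for d in disks if d is not None]
        restored[missing[0]] = xor_blocks(survivors + [parity])
    return restored


if __name__ == "__main__":
    data = [b"disk", b"data", b"here"]  # three data disks, equal-sized blocks
    parity = make_parity(data)          # stored on a fourth disk
    damaged = [data[0], None, data[2]]  # the second disk fails
    print(recover(damaged, parity))     # [b'disk', b'data', b'here']
```

Because XOR is its own inverse, XORing the surviving disks with the parity reproduces the missing one. A single parity disk can repair only one failure at a time, which is why codes that tolerate two or more concurrent failures need more carefully designed parity checks.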

While the idea sounds attractive, erasure codes are difficult to design, implement and verify in large-scale data storage systems, says Li.

“Erasure codes must be designed just right. The optimal code has to be computationally simple. It has to provide maximum protection against erasures with a minimal number of parity checks.

“In the end, it’s not just a storage network but also a computing network that is needed.”
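Verifying a code, one of the hurdles Li describes, can at least be illustrated at toy scale. The sketch below assumes a hypothetical binary linear code in which every disk stores the XOR of some subset of the data blocks, written as a bitmask. It checks by brute force that every combination of up to t failed disks leaves survivors whose bitmasks still have full rank over GF(2), which is exactly the condition for rebuilding all the data. The names `gf2_rank` and `tolerates` are illustrative, not taken from Li's work.

```python
from __future__ import annotations

from itertools import combinations


def gf2_rank(vectors: list[int]) -> int:
    """Rank over GF(2) of bitmask vectors (bit j set = data block j is XORed in)."""
    pivots: dict[int, int] = {}  # leading bit -> basis vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]  # clear the leading bit and keep reducing
    return len(pivots)


def tolerates(columns: list[int], num_data: int, max_failures: int) -> bool:
    """True if the data survives every pattern of up to max_failures lost disks.

    columns[i] is the bitmask of data blocks that disk i stores, XORed together.
    """
    for f in range(1, max_failures + 1):
        for lost in combinations(range(len(columns)), f):
            lost_set = set(lost)
            survivors = [c for i, c in enumerate(columns) if i not in lost_set]
            if gf2_rank(survivors) < num_data:
                return False  # this failure pattern loses data
    return True


if __name__ == "__main__":
    triplication = [0b1, 0b1, 0b1]                # one data block on three disks
    single_parity = [0b001, 0b010, 0b100, 0b111]  # three data disks + one XOR parity
    print(tolerates(triplication, 1, 2))   # True
    print(tolerates(single_parity, 3, 1))  # True
    print(tolerates(single_parity, 3, 2))  # False
```

Triplication passes the two-failure check at a cost of three disks per data disk; the single-parity layout spends only one extra disk on three data disks but fails once two disks are lost. The number of failure patterns grows combinatorially with the number of disks, so brute force like this only suits small examples.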

Li and her students have designed optimal erasure codes with the smallest possible computational complexity promised by theory. They have performed theoretical analyses and run computer simulations, and will work with data centers to check the codes against real-world failure frequencies and protection requirements.

Li’s research is funded by a Faculty Innovation Grant from Lehigh.


Photo by Douglas Benedict