UW alum masterminding next-generation data storage: A solution to the datapocalypse?
CATALOG, a Boston company co-founded by a recent UW–Madison Ph.D., is preparing to demonstrate the world’s fastest, densest DNA-based data warehouse.
In a meeting at the Weinert Center for Entrepreneurship at the Wisconsin School of Business, Hyunjun Park said the device will hold digital information in DNA – life’s evolution-perfected “data storage” molecule.
The goal of the demonstration, says Park, is to store 125 gigabytes, or about 200 full-length CDs, in 24 hours, in less than 1 cubic centimeter of DNA. And to do it for $7,000.
That’s a lot more expensive than a hard drive, but Park says it’s a million times cheaper than a DNA “drive” demonstrated by Microsoft, and 100,000 times faster. Indeed, faster and cheaper are CATALOG’s two competitive advantages. (The company name is in capital letters because it contains C, A, T and G, the shorthand for the four “letters” that carry information in DNA.)
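A quick back-of-the-envelope check of the demonstration targets: 125 GB in 24 hours for $7,000. This is only a sketch. It assumes a full-length CD holds about 650 MB and uses decimal units (1 GB = 1,000 MB); neither assumption comes from Park, and the function names are illustrative.

```python
# Rough check of the demonstration targets: 125 GB in 24 hours for $7,000.
# Assumptions (not from the article): a full-length CD holds about 650 MB,
# and 1 GB = 1,000 MB (decimal units).

TARGET_GB = 125
TARGET_HOURS = 24
TARGET_COST_USD = 7_000
CD_CAPACITY_MB = 650  # assumed capacity of one CD


def cds_needed(gigabytes: float, cd_mb: float = CD_CAPACITY_MB) -> float:
    """How many CDs would hold the same amount of data."""
    return gigabytes * 1_000 / cd_mb


def cost_per_gb(total_usd: float, gigabytes: float) -> float:
    """Average cost of writing one gigabyte."""
    return total_usd / gigabytes


def gb_per_hour(gigabytes: float, hours: float) -> float:
    """Average write throughput needed to hit the target."""
    return gigabytes / hours


print(f"{cds_needed(TARGET_GB):.0f} CDs")                          # 192 CDs
print(f"${cost_per_gb(TARGET_COST_USD, TARGET_GB):.2f} per GB")    # $56.00 per GB
print(f"{gb_per_hour(TARGET_GB, TARGET_HOURS):.1f} GB per hour")   # 5.2 GB per hour
```

Swapping in 700 MB, the other common CD capacity, gives about 179 CDs; either way, “about 200” is a round figure.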
Data from businesses, spy satellites, telescopes, weather stations, government agencies and the countless gadgets in the “Internet of Things” is becoming the currency of the era.
That picture was already emerging by the time Park earned a Ph.D. in bacteriology at UW–Madison in 2014. His next step was a postdoctoral position at MIT in synthetic biology, an emerging field dedicated to creating, not growing, organisms and biological structures.
Before leaving Madison, Park attended the week-long Morgridge Entrepreneurial Boot Camp on campus and began considering leaving academia for the rough-and-tumble of entrepreneurship.
At MIT, Park met Nathaniel Roquet, a Harvard Ph.D. student in biophysics who became CATALOG’s other co-founder and chief technology innovation officer.
With eight employees in Boston and investments totaling more than $10 million, CATALOG is built for speed in the race to find the next great storage system. “We have started from zero, and hopefully are getting to one,” says Park, lapsing into dataspeak.
For a startup, a solution is less important than a solid problem, Park told the Weinert Center’s Distinguished Entrepreneurs Lunch on Feb. 27. And Park’s problem, the glut of information sometimes called the “datapocalypse,” is the result of a tsunami of data from pretty much every sphere of human activity.
With CATALOG, so far, so good, says Dan Olszewski, director of the Weinert Center, who invited Park to speak at the event as a distinguished alumnus. “Hyunjun is a great example of the creativity of someone who was trained to an amazing level of science at UW–Madison. He overlays that on his entrepreneurial creativity, and comes up with an innovative idea that bridges those realms. His solution reads like science fiction, but it makes sense. He’s got the persistence and resilience that are fundamental to entrepreneurs.”
Slowing improvements in hard drives and solid-state drives make DNA a tantalizing alternative, and Microsoft, Intel and other techno-bigs are chasing the grail of a medium that offers millennium-level stability.
Curiously, DNA, life’s data-storage molecule, lasts far longer than disks and magnetic tape, which must be replaced every few years.
Measured by how much digital data fits in a given volume, DNA is also millions of times more capacious. A 2017 demonstration at Columbia University showed a data density of 215 billion megabytes per gram of DNA, enough to store more than 100 million movies.
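The Columbia figure is easier to picture with a little arithmetic. This sketch uses only the article’s two numbers and treats “more than 100 million movies” as exactly 100 million, so the per-movie size it prints is an upper bound, not a reported value. Function names are illustrative.

```python
# Unpack the Columbia density figure using the article's numbers.
# Assumption: "more than 100 million movies" is treated as exactly 100 million,
# so the result is an upper bound on the average movie size.

DENSITY_MB_PER_GRAM = 215e9  # 215 billion megabytes per gram of DNA
MOVIES = 100e6               # 100 million movies


def petabytes(megabytes: float) -> float:
    """Convert megabytes to petabytes (decimal: 1 PB = 1e9 MB)."""
    return megabytes / 1e9


def mb_per_movie(total_mb: float, movies: float) -> float:
    """Average space available per movie."""
    return total_mb / movies


print(f"{petabytes(DENSITY_MB_PER_GRAM):.0f} PB per gram")               # 215 PB per gram
print(f"{mb_per_movie(DENSITY_MB_PER_GRAM, MOVIES):,.0f} MB per movie")  # 2,150 MB per movie
```

In other words, one gram holds about 215 petabytes, or roughly 2 GB for each of 100 million movies.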
On the downside, the DNA in the Columbia demonstration cost $3,500 per megabyte to assemble. For comparison, today roughly the same $3,500 buys 5 million megabytes of storage (in a combination disk drive and solid-state unit).
We did the math: per megabyte, conventional storage is about 5 million times cheaper.
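The “5 million times cheaper” estimate is a single division. The sketch below assumes “about the same price” means exactly $3,500 for the 5-million-megabyte drive; the constant and function names are illustrative.

```python
# Reproduce the "about 5 million times cheaper" comparison.
# Assumption: "about the same price" means exactly $3,500 for the drive.

DNA_USD_PER_MB = 3_500    # Columbia 2017 cost to assemble one megabyte in DNA
DRIVE_PRICE_USD = 3_500   # assumed price of the conventional drive
DRIVE_CAPACITY_MB = 5e6   # 5 million megabytes (5 TB)


def drive_usd_per_mb(price_usd: float, capacity_mb: float) -> float:
    """Cost of one megabyte on the conventional drive."""
    return price_usd / capacity_mb


def cost_ratio(dna_usd_per_mb: float, price_usd: float, capacity_mb: float) -> float:
    """How many times more a megabyte costs in DNA than on the drive."""
    # Multiplying first keeps every intermediate value a whole number,
    # so this particular calculation comes out exact.
    return dna_usd_per_mb * capacity_mb / price_usd


print(f"${drive_usd_per_mb(DRIVE_PRICE_USD, DRIVE_CAPACITY_MB):.4f} per MB")      # $0.0007 per MB
print(f"{cost_ratio(DNA_USD_PER_MB, DRIVE_PRICE_USD, DRIVE_CAPACITY_MB):,.0f}x")  # 5,000,000x
```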
As these numbers show, the problem dogging DNA data storage is the same one that hounded hard-disk technology in the 1950s: cost. Assembling the four “letters” of DNA is expensive and slow.
Nonetheless, promising technologies can have a slow start and then pick up speed. When Park and Roquet formed CATALOG in 2016, they shunned the idea of assembling bases one by one to represent the digital “alphabet.”
“We don’t think about how nature does it,” says Park. “We scrap all that. We think of DNA as a medium, a polymer, and ask, ‘What is the best way to generate a lot of different molecules?’”
CATALOG opted for prefab: it buys or makes DNA fragments “in massive quantities” and then assembles them with a custom-made liquid-handling robot.
“DNA molecules are like Lego blocks,” says Park. “We can string them together in virtually infinite combinations. We take advantage of that and start with a few hundred molecules to generate in the end, trillions of different molecules.”
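Park’s Lego comparison is, at heart, a counting argument. The sketch below uses a hypothetical layout, not CATALOG’s published design: fragments are grouped into layers, and every assembled molecule takes exactly one fragment from each layer. With 8 layers of 40 fragments (320 fragments, within “a few hundred”), the number of distinct molecules is 40 to the 8th power. The layer counts and names are assumptions chosen for illustration.

```python
# Counting argument behind combinatorial DNA assembly.
# Hypothetical layout (illustration only, not CATALOG's design): fragments are
# grouped into layers, and each molecule uses exactly one fragment per layer.
from itertools import product


def total_fragments(layers: int, fragments_per_layer: int) -> int:
    """Number of prefabricated fragments that must be stocked."""
    return layers * fragments_per_layer


def distinct_molecules(layers: int, fragments_per_layer: int) -> int:
    """Number of different molecules one-per-layer assembly can produce."""
    return fragments_per_layer ** layers


LAYERS, PER_LAYER = 8, 40  # hypothetical parameters
print(total_fragments(LAYERS, PER_LAYER))            # 320
print(f"{distinct_molecules(LAYERS, PER_LAYER):,}")  # 6,553,600,000,000

# Spell out every molecule of a toy library: 3 layers x 2 fragments = 8 molecules.
# "L0F1" names fragment 1 of layer 0.
toy = [[f"L{layer}F{frag}" for frag in range(2)] for layer in range(3)]
for molecule in product(*toy):
    print("-".join(molecule))
```

Adding one more layer multiplies the count by 40 while adding only 40 fragments to the stockroom, which is how a modest library can reach trillions of combinations.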
Park likens the approach to movable type. Instead of writing out every letter each time they wanted to print something, old-style typesetters cast their letters in advance and then slotted them into position.
The result is a warp-speed improvement in assembly, applied to a medium that life has perfected over billions of years of evolution.
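The movable-type comparison can also be sketched in code. In this hypothetical scheme, which is not a description of CATALOG’s actual encoding, each layer acts like a digit position and each of its 40 pre-made fragments like a digit value. A number is “typeset” by picking one fragment per position rather than synthesizing a new sequence base by base. The functions `encode` and `decode` are illustrative names.

```python
# Movable-type analogy: write a number by choosing pre-made fragments,
# one per position, instead of building a new sequence base by base.
# Hypothetical scheme (illustration only): every layer offers the same
# number of fragment choices, so the choices behave like base-N digits.


def encode(value: int, layers: int, per_layer: int) -> list[int]:
    """Fragment index to slot in at each layer, most significant layer first."""
    if not 0 <= value < per_layer ** layers:
        raise ValueError("value does not fit in this many layers")
    digits = []
    for _ in range(layers):
        value, digit = divmod(value, per_layer)
        digits.append(digit)
    return digits[::-1]


def decode(indices: list[int], per_layer: int) -> int:
    """Read the fragment choices back into the original number."""
    value = 0
    for digit in indices:
        value = value * per_layer + digit
    return value


picks = encode(123_456_789, layers=8, per_layer=40)
print(picks)                          # [0, 0, 1, 8, 9, 0, 19, 29]
print(decode(picks, per_layer=40))    # 123456789
```

Like a typesetter’s case of letters, the 320 fragments are made once and reused; only the choice of which fragment goes in each slot changes from one value to the next.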
If he’s daunted by having Intel and Microsoft as competitors, Park does not show it. “We’re creating a new medium for digital data storage, so the opportunity is big enough for more than one company. If large companies like Microsoft are interested, it only helps to validate the idea and to build up the ecosystem for a technology like this.”