Big Data in a tiny package
A team of PolyU researchers is pioneering a new method that could revolutionise how we store data
When scientists discuss peptides, it is usually in relation to biomedical research and the development of new pharmaceutical drugs. But now, a team of researchers at PolyU is investigating peptides for an entirely different purpose — as a medium for storing vast quantities of data.
According to International Data Corporation (IDC), the amount of data we generate worldwide is growing at an exponential rate — from 33 zettabytes in 2018 to 175 zettabytes by 2025. (One zettabyte is roughly equal to a billion terabytes or a trillion gigabytes.)
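As a rough check on these figures, the short sketch below works out the annual growth rate implied by the two IDC numbers and confirms the unit conversion in the parentheses. It assumes steady compound growth over the seven years and decimal (SI) units; the function name `implied_annual_growth` is purely illustrative.

```python
# Decimal (SI) byte units, assumed here for the conversion check.
GIGABYTE = 10**9
TERABYTE = 10**12
ZETTABYTE = 10**21


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Return the compound annual growth rate that takes `start` to `end`."""
    return (end / start) ** (1 / years) - 1


# IDC figures quoted above: 33 ZB in 2018, 175 ZB by 2025.
rate = implied_annual_growth(33, 175, 2025 - 2018)
print(f"Implied growth: {rate:.1%} per year")

# One zettabyte expressed in terabytes and gigabytes.
print(ZETTABYTE // TERABYTE, "terabytes")
print(ZETTABYTE // GIGABYTE, "gigabytes")
```

Under that assumption, the volume of data grows by a little over a quarter each year, which means it roughly doubles every three years.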
The question is, how can we store all this data? Most magnetic storage devices today are not large enough and last for only 10 to 20 years.
Fortunately, there is a solution in sight thanks to the efforts of a team at PolyU led by Dr Yao Zhongping, Associate Professor, Department of Applied Biology and Chemical Technology, and Professor Francis Lau, Associate Head, Department of Electronic and Information Engineering. Researchers from the Hong Kong University of Science and Technology and the Chinese University of Hong Kong are also participating in this revolutionary new project.
Converting data into peptides
The approach developed by Dr Yao and Professor Lau’s team incorporates peptides as a storage medium for the first time. Compared with existing data storage devices and developing technologies such as DNA data storage, peptides offer much higher storage density, are more stable and can last for millions of years.
To write data with this new method, raw data is encoded as 1s and 0s, translated into peptide sequences, and finally stored as synthesised peptides in powder or solution. To retrieve the data, the peptides are sequenced using techniques such as tandem mass spectrometry, and decoded back to the raw data. Once the technology becomes mature, it would then be possible to store big data, such as the entire collections of national libraries, in vials of peptides occupying the same space as a shoebox.
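The write and read steps described above can be illustrated with a minimal sketch. It is not the PolyU team's encoding scheme, which the article does not describe: the eight-letter amino acid alphabet, the three-bits-per-residue mapping and the function names are all assumptions chosen for simplicity, and the synthesis and mass spectrometry steps are replaced by a plain text string.

```python
# Hypothetical alphabet: eight amino acids (one-letter codes), so each
# residue carries 3 bits. This mapping is an assumption for illustration.
ALPHABET = "AVLSTFYE"
BITS_PER_RESIDUE = 3


def encode(data: bytes) -> str:
    """Translate raw bytes into a peptide sequence (string of residues)."""
    # Step 1: raw data as 1s and 0s.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad with zeros so the bit string splits evenly into 3-bit groups.
    bits += "0" * (-len(bits) % BITS_PER_RESIDUE)
    # Step 2: each 3-bit group selects one amino acid.
    return "".join(
        ALPHABET[int(bits[i:i + BITS_PER_RESIDUE], 2)]
        for i in range(0, len(bits), BITS_PER_RESIDUE)
    )


def decode(peptide: str, n_bytes: int) -> bytes:
    """Translate a sequenced peptide back into the original bytes."""
    bits = "".join(
        f"{ALPHABET.index(residue):0{BITS_PER_RESIDUE}b}" for residue in peptide
    )
    # Drop the padding bits added during encoding.
    bits = bits[: n_bytes * 8]
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = b"PolyU"
peptide = encode(message)
print(peptide)
print(decode(peptide, len(message)))
```

In this toy version the decoder has to be told the original length in bytes so that it can discard the padding. A practical scheme would presumably also need to split data across many short peptides and guard against sequencing errors, neither of which is attempted here.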
Although the team has filed patents for its new method and received encouraging feedback, there are still challenges to be overcome. These relate mainly to the high cost of reading and writing data and to the time needed for sequencing, which is still too slow for everyday use.
The research team, however, has set clear targets for overcoming these challenges. These include making further improvements in data density, designing a better encoding scheme and creating prototypes for this next-generation peptide-based storage system.
To achieve these targets, the team has received a grant of more than HK$9.7 million from the Research Impact Fund of the University Grants Committee, in recognition of PolyU’s leadership and expertise in this field.
A particularly exciting prospect for the future of this project is space exploration, where massive amounts of data must be stored and managed in a confined area for long durations. In fact, this is already starting to happen: when China’s Long March-5B rocket was launched on 5 May 2020, the spacecraft carried peptides encoding the PolyU motto, “To learn and to apply, for the benefit of mankind”, and the phrase “PolyU 80th Anniversary”. The purpose of this experiment is to test the stability and reliability of the peptides after exposure to the space environment. The PolyU team will decode and retrieve the data once the spacecraft and the experimental materials have returned safely to Earth.
In the meantime, the team looks forward to collaborating with organisational partners such as Huawei, the multinational technology company, and developing commercial applications for its project.
Entering a new era of data storage
New developments are anticipated for this project in the next few years that will involve sequencing optimisation and software development as well as storage and retrieval of real-life data.
As Dr Yao points out, “We are very excited about this opportunity to continue developing our project. We believe it offers tremendous benefits to government agencies and corporations that generate and archive large volumes of data — it truly has the potential to radically transform the data storage industry and the way we manage data.”
Dr Yao Zhongping (middle) and Professor Francis Lau (second from left) with fellow researchers Dr Albert Ng (left) and, from right, Dr Tam Wai-man and Dr So Pui-kin