Growing volumes of data call for storage media that can often hold information for many years. Synthetic polymers are a viable alternative to traditional storage media because they can preserve information while consuming less space and energy. However, when mass spectrometry is used to read the data back, the length of the polymer chains is limited, and with it the amount of information each chain can hold. Researchers have now published a method in the journal Angewandte Chemie that overcomes this limitation and also allows direct access to specific bits of information without reading the entire polymer chain.
The constant influx of data arises from numerous sources such as business transactions, process monitoring, quality control, and tracking product lots. Storing this data for extended periods requires significant space and energy. For the long-term storage of substantial data volumes that are accessed infrequently, macromolecules with a predetermined structure, such as DNA and synthetic polymers, emerge as appealing alternatives.
Synthetic polymers offer several benefits over DNA: they are easier to synthesize, provide a higher storage density, and can withstand harsh conditions. Their drawback is that the information encoded in them is read out using mass spectrometry (MS) or tandem mass spectrometry sequencing (MS2). These techniques only work for molecules of limited size, which drastically reduces the storage capacity of each polymer chain. Furthermore, the entire chain must be decoded sequentially, one segment after another, much like having to read an entire book rather than flipping to the desired page. In contrast, long DNA strands can be cut into random fragments, sequenced separately, and then pieced back together computationally to recover the original sequence.
Kyoung Taek Kim and his research team from the Department of Chemistry at Seoul National University (Republic of Korea) have developed a new technique for efficiently decoding very long synthetic polymer chains whose molecular weights exceed the analytical limits of MS and MS2. For their demonstration, the team wrote their university address in ASCII and translated it, together with an error detection code (a cyclic redundancy check, or CRC, an established method for verifying data integrity), into binary code, a sequence of ones and zeroes. This 512-bit sequence was stored in a polymer chain built from two different monomers: lactic acid representing a 1 and phenyllactic acid representing a 0. At irregular intervals, they also inserted fragmentation codes containing mandelic acid. When chemically activated, the chains cleave at these points. In the demonstration, this produced 18 fragments of varying sizes that could be decoded individually by MS2 sequencing.
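The encoding step described above can be illustrated with a minimal Python sketch. It is not the team's software. The article does not say which CRC variant was used, so the 32-bit CRC from Python's zlib module serves as a stand-in. The single-letter monomer codes (L, P) and all function names are hypothetical, and the mandelic acid fragmentation codes are left out.

```python
import zlib

# Hypothetical one-letter codes: L = lactic acid (bit 1), P = phenyllactic acid (bit 0).
MONOMER = {"1": "L", "0": "P"}


def text_to_bits(text: str) -> str:
    """Write each ASCII character as 8 bits, most significant bit first."""
    return "".join(f"{byte:08b}" for byte in text.encode("ascii"))


def encode_with_crc(text: str) -> str:
    """Return the payload bits followed by a 32-bit CRC (assumed variant: zlib CRC-32)."""
    crc = zlib.crc32(text.encode("ascii"))
    return text_to_bits(text) + f"{crc:032b}"


def bits_to_monomers(bits: str) -> str:
    """Map a bit string to a monomer sequence."""
    return "".join(MONOMER[bit] for bit in bits)


def crc_matches(bits: str) -> bool:
    """Recompute the CRC over the payload and compare it with the trailing 32 bits."""
    payload, crc_bits = bits[:-32], bits[-32:]
    data = bytes(int(payload[i:i + 8], 2) for i in range(0, len(payload), 8))
    return zlib.crc32(data) == int(crc_bits, 2)


if __name__ == "__main__":
    bits = encode_with_crc("chemistry")  # placeholder text, not the real address
    print(len(bits), bits_to_monomers(bits)[:16])
    print(crc_matches(bits))
```

A decoder can use a check like crc_matches to confirm that a reassembled bit sequence is intact. A single flipped bit in the payload always changes the CRC-32 value.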
Custom-designed software first identified the fragments by their mass and end groups as they appear in the MS spectra. During the MS2 analysis, the measured molecular ions broke down further, and these smaller pieces were analyzed as well. The sequence of each fragment was derived from the mass differences between the pieces. With the help of the CRC error detection code, the software then reconstructed the sequence of the complete polymer chain, overcoming the previous limit on chain length.
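Reading a sequence from mass differences can be sketched as follows. This is an idealized illustration, not the published algorithm. It assumes a clean ladder of fragment-ion masses in which neighbouring peaks differ by exactly one monomer unit. The repeat-unit masses are approximate monoisotopic values calculated for C3H4O2 (lactic acid unit) and C9H8O2 (phenyllactic acid unit) and do not come from the article. The tolerance and function names are hypothetical.

```python
# Approximate monoisotopic repeat-unit masses in Da (assumed values, see lead-in).
RESIDUE_MASS = {
    "1": 72.021,   # lactic acid unit, C3H4O2
    "0": 148.052,  # phenyllactic acid unit, C9H8O2
}


def read_ladder(masses, tol=0.02):
    """Turn an ascending ladder of fragment masses into a bit string.

    Each step between neighbouring peaks is matched against the two
    repeat-unit masses; a step that matches neither raises ValueError.
    """
    bits = []
    for lighter, heavier in zip(masses, masses[1:]):
        delta = heavier - lighter
        for bit, unit_mass in RESIDUE_MASS.items():
            if abs(delta - unit_mass) <= tol:
                bits.append(bit)
                break
        else:
            raise ValueError(f"unassigned mass step of {delta:.3f} Da")
    return "".join(bits)


def make_ladder(bits, start_mass=100.0):
    """Build a synthetic ladder for testing (start_mass stands in for an end group)."""
    masses = [start_mass]
    for bit in bits:
        masses.append(masses[-1] + RESIDUE_MASS[bit])
    return masses


if __name__ == "__main__":
    print(read_ladder(make_ladder("1011")))
```

Real spectra contain noise, missing peaks, and adduct ions, so a practical decoder needs more than this step-by-step lookup. The sketch only shows why two monomers with clearly different masses make each bit readable.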
Additionally, the researchers demonstrated the ability to decode specific bits of information without needing to sequence the entire polymer chain (random access), such as extracting the word “chemistry” from their address code. By recognizing that the address components follow a particular order (department, institution, city, postal code, country) and are separated by commas, they could pinpoint where the desired information was located within the chain and sequence only the relevant fragments.
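The bookkeeping behind such random access can be sketched in a few lines of Python. The sketch assumes that the bit length of every fragment is known and that the character position of the wanted word has already been worked out from the field order. How the team's software does this is not described here, so the function names and the fragment lengths in the example are hypothetical.

```python
from itertools import accumulate


def char_span_to_bits(start_char, n_chars, bits_per_char=8):
    """Convert a character range into a half-open bit range (start, end)."""
    start = start_char * bits_per_char
    return start, start + n_chars * bits_per_char


def fragments_covering(fragment_lengths, start_bit, end_bit):
    """Return the indices of the fragments that overlap the bit range [start_bit, end_bit)."""
    ends = list(accumulate(fragment_lengths))
    starts = [0] + ends[:-1]
    return [
        index
        for index, (frag_start, frag_end) in enumerate(zip(starts, ends))
        if frag_start < end_bit and frag_end > start_bit
    ]


if __name__ == "__main__":
    # Made-up fragment sizes in bits; the real chain yielded 18 fragments.
    lengths = [24, 40, 16, 32]
    start, end = char_span_to_bits(start_char=4, n_chars=5)
    print(fragments_covering(lengths, start, end))  # prints [1, 2]
```

Only the fragments returned by fragments_covering would need to be sequenced by MS2. The rest of the chain can stay unread.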