First DNA Data Storage Specification Released: First Step Towards Commercialization
by Anton Shilov on March 15, 2024 12:00 PM EST
The DNA Data Storage Alliance introduced its inaugural specifications for DNA-based data storage this week. The specifications outline a method for encoding essential information within a DNA data archive, which is crucial for developing and commercializing an interoperable storage ecosystem.
DNA data storage uses short strings of deoxyribonucleic acid (DNA) called oligonucleotides (oligos), mixed together without a specific physical ordering scheme. This storage medium lacks a dedicated controller and any organizational means of determining the proximity of one media subcomponent to another. DNA storage therefore differs significantly from traditional media such as tape, HDDs, and SSDs, which have fixed structures and controllers that read and write data on the structured media. DNA's lack of physical structure requires a unique approach to initiating data retrieval, which brings its own standardization challenges.
To address this, the SNIA DNA Archive Rosetta Stone (DARS) working group, part of the DNA Data Storage Alliance, has developed two specifications, Sector Zero and Sector One, to facilitate the process of starting a DNA archive.
Sector Zero serves as the starting point, providing the minimal details necessary for the archive reader to identify the entity responsible for synthesizing the DNA (e.g., Dell, Microsoft, Twist Bioscience) and the codec used for encoding Sector One (e.g., Super Codec, Hyper Codec, Jimbob's Codec). Sector Zero consists of 70 bases: the first 35 bases identify the vendor, and the second 35 bases identify the codec. The information in Sector Zero enables access to and decoding of the data stored in Sector One. The amount of data stored in Sector Zero is small and fits into a single oligonucleotide.
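To make the layout concrete, here is a minimal Python sketch that splits a 70-base Sector Zero read into its two 35-base fields. It assumes the read is already available as a string of A, C, G, and T; the function name `split_sector_zero` is hypothetical, and the sketch does not attempt to map either field to an actual vendor or codec, since that mapping is defined by the specification and not described here.

```python
from typing import Tuple

# Field widths as described above: 70 bases, 35 for the vendor, 35 for the codec.
SECTOR_ZERO_LENGTH = 70
FIELD_LENGTH = 35
VALID_BASES = set("ACGT")


def split_sector_zero(sequence: str) -> Tuple[str, str]:
    """Split a Sector Zero read into (vendor_field, codec_field).

    Hypothetical helper for illustration only: it checks length and
    alphabet, then slices; it does not interpret the fields.
    """
    seq = sequence.strip().upper()
    if len(seq) != SECTOR_ZERO_LENGTH:
        raise ValueError(
            f"expected {SECTOR_ZERO_LENGTH} bases, got {len(seq)}"
        )
    invalid = set(seq) - VALID_BASES
    if invalid:
        raise ValueError(f"unexpected symbols: {sorted(invalid)}")
    # First 35 bases identify the vendor, the second 35 identify the codec.
    return seq[:FIELD_LENGTH], seq[FIELD_LENGTH:]
```

A reader would take the two returned fields and look them up against whatever vendor and codec registries the specification defines before attempting to decode Sector One.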
Sector One expands on this by including a description of the contents, a file table, and the parameters required for transferring data to a sequencer. This specification ensures that the main body of the archive is accessible and readable, paving the way for data retrieval. Sector One contains exactly 150,000 bases and will span multiple oligonucleotides.
"A key goal of the DNA Data Storage Alliance is to set and publish specifications and standards that allow an interoperable DNA data storage ecosystem to grow," said Dave Landsman, of the DNA Data Storage Alliance Board of Directors. "With the publishing of the Alliance's first specifications, we take an important step in achieving that goal. Sector Zero and Sector One are now publicly available, allowing companies working in the space to adopt and implement."
The DNA Data Storage Alliance is led by Catalog Technologies, Inc., Quantum Corporation, Twist Bioscience Corporation, and Western Digital (though we are unsure whether Western Digital's NAND or HDD division is responsible for developing the specification). Meanwhile, numerous industry giants, including Microsoft, support the DNA Data Storage Alliance.
Source: SNIA
13 Comments
GeoffreyA - Friday, March 15, 2024
Fantastic. Thanks for the article.
Threska - Friday, March 15, 2024
Somewhere in all that error correction needs to be in place.
Scabies - Friday, March 22, 2024
Data Mutation, coming to a cold storage near you!
ballsystemlord - Friday, March 15, 2024
I was of the understanding that it was too difficult, at present, to read/write DNA at a reasonable scale -- even as a one-off/prototype project. Subsequently, a specification is pointless.
RedGreenBlue - Friday, March 15, 2024
It is, but the methods of reading and changing DNA have never had much need to be faster, most purposes today don’t have much time pressure. Trying to develop other methods for computers would improve it and the usage time period for this is nowhere near. This is for when we’ve exhausted HAMR and bit-patterned media with dots the size of a few atoms in like 20 or 30 years. Can’t start developing it without any ground rules for the industry to agree on. Also, error correction would probably be done in the sectors.
RedGreenBlue - Friday, March 15, 2024
Also, in 20 or 30 years they won’t care if someone uses this technology to bring dinosaurs back. This is how you leave your mark on the world.
GeoffreyA - Saturday, March 16, 2024
If I understand it correctly, I think it would have been better for the vendor identification to have been done away with entirely, especially when I see Microsoft there. All that's needed is an identifier for the codec.
ballsystemlord - Saturday, March 16, 2024
I have to agree with you on that count. It's odd that they'd even need to identify the vendor. Technical details are what is needed.
GeoffreyA - Sunday, March 17, 2024
Exactly. Who knows to what extent this may be used in the future, and then there's going to be a little Made by XYZ stamp on it. This smacks of Blade Runner.
Threska - Sunday, March 17, 2024
Liability, IP, or support reasons.