The quest to bring back the mammoth came a step closer this week with the development of a new mouse.
Behind Colossal Biosciences’ breakthrough achievement is a growing computing infrastructure that aims to bring DNA into the digital world.
This week, the biotechnology firm announced the creation of the woolly mouse, genetically modifying fertilized mouse eggs and embryonic mouse stem cells to mimic traits of the woolly mammoth.
This successful test is part of a first step towards recreating the mammoth, or at least a hybrid Asian elephant-woolly mammoth, with the first calf expected by the end of 2028.
"We're not putting mammoth DNA sequences into mice," Beth Shapiro, chief science officer at Colossal Biosciences, tells DCD. "There's 200 million years of evolutionary distance between them, so it doesn't make that much sense to do it that way.
"Instead, we're looking for variants where we know that a particular change in the mouse genome leads to a healthy, happy mouse that has a particular trait we're interested in. And then we combined all of those together and created these ultra-woolly mice based on a variation that exists in mice."
The new creation has woolly hair and fat cells more suited for the cold tundras that once served as the home for the mammoth.
Cloning the mammoth is impossible. What is left of the giant mammal is found in fragments, echoes of the past. Instead of a complete replica, Colossal aims to piece together DNA from other animals alongside what remains, creating new creatures entirely.
The idea is to edit DNA to include traits from other species, building out a blueprint for a mammoth-like being.
The company is taking a similar approach in its effort to resurrect the thylacine, more commonly known as the Tasmanian tiger.
While the carnivorous marsupial superficially resembles dogs and foxes, its closest living relative is actually the mouse-like fat-tailed dunnart, a small marsupial that lives in southern Australia.
"The thylacine's got 70 million years of genetic divergence between it and its closest living relative," Ben Lamm, co-founder and CEO of Colossal, says. The company may instead borrow from an entirely different family to help fill in the gaps.
"What's interesting is, if you look at a thylacine skull and you look at a wolf skull, they look virtually the same. Unless you know exactly what you're looking for, if you're just looking at the skull, they look almost identical, except for these very, very minor things. And that's due to a process that we know in science called convergent evolution."
Colossal hopes to build on the wolf DNA, editing it to remove those differences and come closer to what once was. "We're not trying to clone these species, but as we make edits for our mammoth, our thylacine, and our dodo, we want to continually do more and more sequencing and comparative genomics," Lamm says.
"We've built this macro genome across canines and wolves so that we can understand the craniofacial morphology edits that we are already making to our thylacines."
The comparative genomics process can take "hundreds of thousands of edits," Lamm explains, so the company instead hopes to build out its understanding of macro genomes across similar phenotypes in other species. "That requires even more compute," he adds.
"I would argue the better we are at computational analysis, and the more we do, and the more money we spend, the fewer edits we have to make. And so, on that craniofacial example, we now have a list of about 450 edits that we believe drive core craniofacial hypercarnivorism, which will result in a phenotype similar to that of a thylacine."
Phenotypes are how genotypes (DNA) actually express themselves in the real world, once a being has been exposed to its environment. "We want to truly understand genotype to phenotype expression," Lamm says.
This means that the company is not just analyzing the remnants of extinct species and their closest living relatives, but "going so much deeper and wider on this," he says. "The next step is saying, 'what other species, even though they have massive distances of genetic divergence on the phylogenetic tree, had similar characteristics? And can we build macro genomes across those and then do this comparative genome work?'"
As the company ramps up its collection of DNA and other genetic data, storage needs have ballooned. Colossal currently stores around 3.8 petabytes of data, with an expectation that that will expand significantly as more species are added. There's also a lot of useless data that is hoovered up as part of the process.
"We are storing a lot of data that's maybe not necessary," Shapiro says. "But mapping genomes is incredibly challenging - if I sequence billions of fragments of DNA from a mammoth bone, and each of those are strings of data, many of those aren't going to be things that we're actually interested in."
In that same bone is DNA from microbes, fungi, and bacteria, along with other unrelated data. "We have to hold a genome in memory and then map each of these things to that genome and see what's happening. And we create a bunch of intermediate files doing these things. That's a ton of data."
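The mapping step Shapiro describes can be sketched in miniature. The toy code below is purely illustrative and is not Colossal's pipeline: it indexes a reference sequence held in memory and checks short fragments ("reads") against it by exact match, discarding anything that doesn't map. Production aligners such as BWA or Bowtie 2 tolerate mismatches, process billions of reads, and generate the large intermediate SAM/BAM files the article alludes to.

```python
# Illustrative sketch only: exact-match read mapping against an in-memory
# reference genome. Real aligners allow mismatches and sequencing errors.

def build_index(reference: str, k: int) -> dict:
    """Index every k-mer of the reference by its start positions."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index

def map_reads(reads, reference, k=8):
    """Return (read, position) pairs for reads found in the reference.
    Reads that don't map (e.g. microbial or fungal DNA from the same
    bone) are simply dropped."""
    index = build_index(reference, k)
    hits = []
    for read in reads:
        for pos in index.get(read[:k], []):
            if reference[pos:pos + len(read)] == read:
                hits.append((read, pos))
                break
    return hits

reference = "ACGTACGTTTGACCATGGCATTACGGAT"
reads = ["TTGACCAT", "GGCATTAC", "CCCCCCCC"]  # the last read won't map
print(map_reads(reads, reference))  # → [('TTGACCAT', 8), ('GGCATTAC', 16)]
```

Even this toy version hints at the memory pressure involved: the index grows with the reference, and every batch of reads produces intermediate results before anything useful is kept.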
Managing all of this data and turning it into something usable would not have been possible without automation and machine learning, Lamm says. "We're now turning out projects that are going to take three to five years, but they would take 50 years or never without this compute."
The company has developed assistive guide design tools and other machine learning models, but is currently not using much generative AI. "[Co-founder and lead geneticist] George Church and I have been having a lot of conversations over generative AI in biology," Lamm says, "but some of the claims that have come out... I don't know if I buy that, because biology and wetware work very differently than stuff just in a simulation."
For its own simulation work and storage, Colossal currently uses Google Cloud, Amazon Web Services, and some internal compute. But as its ambitions grow, so will its data hosting needs.
Alongside de-extinction, the company hopes to get involved in conservation, building Noah's Arks of genetic data on the animals that are at risk of disappearing.
"We're now working on a model process for how we think it's best to optimize storage for multiple critically endangered species, and doing a population genetics study so that we can have all of that analysis, to understand genetic diversity and genetic drift within a kind of microcosm of the population," Lamm says.
The company plans to build out bio vaults that will "have a combination of actual wetware and wet storage and cloud-based storage, so that it's accessible to researchers." Alongside that, because of "the amount of data that we will have, it'll have a component of cold storage too, so that we don't have like a trillion petabytes."
That would mean operating its own cloud service for researchers, and expanding its own compute. "We will have to build those specific data centers," Lamm says. "I do think that we will have to be in that purchasing conversation."
Expected compute advancements from the latest GPUs and accelerators are set to only speed up the company's work. "Faster and more powerful computing will compress our timeline by enabling more design iterations in silico before moving to wet lab work," Lamm says.
"This extends from genome assembly (in particular for ancient DNA where preserved DNA fragments are super short), where better chips would allow more accurate reconstructions and faster iterations, to guide design for gene-editing through more complex simulations predicting off-target effects and phenotypic outcomes. Our artificial womb and embryo development research would also benefit from improved modeling of developmental biology and adaptive AI."
The company could benefit from other, less certain, advances. "Every two years, we all hear about the mythical quantum computing," Lamm says. "I don't think we'll get there [soon], but eventually we're going to get there. And I think that access to simultaneous compute on that scale will also dramatically change costs."
Similarly, the company could follow the path of other large-scale compute users with very specific needs and make its own hardware. "We've had a lot of conversations on chip assembly and chip design, specifically around DNA synthesis, but there are lots of problems," Lamm explains. "Having brittle DNA survive and be assembled is one thing, but then also having it accurate is another."
As we talk, Lamm lists multiple technical and esoteric challenges of the technological process, as well as the fundamental constraints of different partners working on different aspects of the delivery mechanism. "There are many innovations that have to come in that area," he says.
This past week, Lamm attended a meeting on chip architecture design specifically for DNA synthesis, he reveals. "I think most likely for everything that we will have to run long term, we'll just use off the shelf, or some version modified off the shelf or cloud, unless we decide to get really further into the DNA synthesis approach to an engineering world."
There are only a few companies in this space, he says. "They're all roughly at the same scale, and they're roughly at the same standard in terms of what they can output. So the question is: Do we buy one of those guys? Do we just work with them, or do we go into the architecture ourselves? I want to wait and see where some of the kind of dust settles in the next year."
Similarly, on the storage side, DCD posited that the company's experience with DNA could make it more open to using it as a storage medium, as pitched by companies like Catalog and Biomemory.
"I'm certainly intrigued by the idea of repurposing a billions-of-years-old information storage system for our modern digital needs," Lamm says. "DNA data storage offers remarkable theoretical data density and durability, making it compelling for archival purposes, but we're not there yet.
"We're great at reading DNA, but the writing process is still too error-prone, although it continues to improve. I also worry a bit about the speed of encoding/decoding, which is why I think it is most promising for specialized long-term archival applications rather than general computing storage."
For now, however, such ideas remain future concerns, with the company already focused on its own mammoth task. “We don't want to do any experiments on elephants if we can avoid it,” Shapiro says. “When we start working on elephants, we want to have as much information as we possibly can before we start doing any work like that. Also, elephants have a 22-month gestation, so it wouldn't be a fast process to test these hypotheses.
"So we look, then, to a relative of the elephant - the mouse. And that's how we got the woolly mouse."