Sanger sequencing projects generate huge volumes of data, but processing it with current software tools can be a cumbersome task. Credit: ktsimage/Getty Images
Sanger sequencing is the unsung workhorse of molecular biology, one whose value has long been undermined by the tedious nature of processing its data. Converting files of raw sequencing data into a polished report requires users to navigate a patchwork of often outdated and incongruous software tools. For those working in regulated spaces, the task is further complicated by security standards and the need for good manufacturing practices, both of which require data processing to be well documented in ways that existing software rarely supports. What should be a routine process is often a time-consuming and output-limiting challenge.
“You could be generating hundreds of files’ worth of data, but if analysis is your bottleneck, you’re not going to be able to take advantage of that throughput,” says Nicolette Dutken, a senior application support development manager at Thermo Fisher Scientific.
It’s a snagging point that is particularly noticeable in the biopharmaceutical space, where Sanger sequencing is frequently used for biologics manufacturing. To clear this bottleneck, Dutken and her colleagues have been developing a new web-based software platform that aims to be an all-in-one solution for studying Sanger sequencing data.
Sanger sequencing in biologics manufacturing
Invented in 1977, Sanger sequencing became the quintessential first-generation sequencing technology and provided the foundation on which next-generation sequencing (NGS) was built. But where NGS is designed for scale and breadth, Sanger sequencing has found a niche in focused sequencing applications. Its longer read length is well suited to resolving complex genetic features that are challenging for NGS’s short-read technology, such as tandem repeats or GC-rich regions. And while it is not capable of multiplexing, Sanger sequencing’s speed, accuracy and low cost have made it a reliable resource in applications that involve the focused sequencing of a single gene or locus.
One such application lies in quality control for biologics manufacturing.
“Whether you’re developing an mRNA vaccine or a gene therapy, you’ll need to confirm that the construct you’re using and its transcribed products are made as you intended,” says Thermo Fisher application scientist Claudia Litterst, who works with biopharma customers to develop tools and protocols that support their research. “Sanger sequencing is one thing they’re very interested in.”
On a recent project, for example, Litterst deployed Sanger sequencing in support of mRNA vaccine production. Typically, such a vaccine is made by first synthesizing the gene fragment corresponding to the target RNA, placing it into a plasmid for in vitro transcription, and then purifying the resulting mRNA. Throughout this process, various factors can lead to errant mRNA species that diminish vaccine safety and efficacy. Manufacturers must therefore perform quality control sequencing on every batch of vaccine, confirming the exact sequence of the plasmid-embedded gene and its purified mRNA product.
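Confirming an exact sequence comes down to aligning each read against the intended sequence and listing every difference. The sketch below is a minimal illustration, assuming a plain Needleman-Wunsch global alignment with illustrative scores (match +1, mismatch -1, gap -2); it is not InnoviGene's method, and `global_align` and `find_differences` are hypothetical names.

```python
from typing import List, Tuple


def global_align(ref: str, read: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(ref), len(read)
    # score[i][j] = best score for aligning ref[:i] with read[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if ref[i - 1] == read[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell along one optimal path
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if ref[i - 1] == read[j - 1] else mismatch):
            a.append(ref[i - 1]); b.append(read[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(ref[i - 1]); b.append("-"); i -= 1   # base missing from read
        else:
            a.append("-"); b.append(read[j - 1]); j -= 1  # extra base in read
    return "".join(reversed(a)), "".join(reversed(b))


def find_differences(ref: str, read: str) -> List[Tuple[int, str, str]]:
    """Return (1-based reference position, ref base, read base) per difference.

    Insertions ('-' on the reference side) are reported at the preceding
    reference position.
    """
    aligned_ref, aligned_read = global_align(ref, read)
    diffs = []
    pos = 0  # current reference coordinate
    for r, q in zip(aligned_ref, aligned_read):
        if r != "-":
            pos += 1
        if r != q:
            diffs.append((pos, r, q))
    return diffs
```

For example, `find_differences("ACGTACGT", "ACGAACGT")` flags the T-to-A substitution at reference position 4. A read that covers only part of a long plasmid would be better served by a semi-global or local alignment that does not penalize the unaligned ends of the reference.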
InnoviGene can produce PDF reports as part of the quality control process. The content and quality control thresholds can be customized, and data can be colour-coded for clarity. Credit: Thermo Fisher Scientific
Given the repetitiveness of plasmids and the singular focus of this sequencing need, Sanger sequencing is an excellent tool for the job — but it’s held back by the challenges of data processing.
Sanger sequencing data must undergo a multi-step process to convert raw data into a usable state. “It’s a headache,” Dutken says.
Exactly how sequencing data is processed will vary from lab to lab, but in general, data quality must first be assessed to ensure the sequencing process was unperturbed. The data then needs to be assembled into a coherent strand of digital DNA and compared to a reference, with any mismatches flagged for inspection. Each step may require its own dedicated software, and data files must be escorted between these disparate systems and potentially modified to match the requirements of each.
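Quality assessment typically begins by trimming the low-quality ends of each read. One widely used approach is Mott's modified trimming algorithm, which Biopython's `abi-trim` reader applies: each base scores `cutoff - p_error`, where `p_error = 10^(-Q/10)` for Phred score `Q`, and the read is cut to the contiguous segment with the highest total score. The sketch below assumes a cutoff of 0.05 (roughly Q13); `mott_trim` is a hypothetical name, and the source does not say how InnoviGene itself trims reads.

```python
from typing import Sequence, Tuple


def mott_trim(quals: Sequence[int], cutoff: float = 0.05) -> Tuple[int, int]:
    """Return the half-open (start, end) of the highest-scoring segment.

    Bases whose error probability is below `cutoff` add to the running
    score and worse bases subtract from it; the best segment is found
    with a maximum-subarray (Kadane-style) scan.
    """
    best_sum, best_start, best_end = 0.0, 0, 0
    running, run_start = 0.0, 0
    for i, q in enumerate(quals):
        p_err = 10 ** (-q / 10)           # Phred score -> error probability
        running += cutoff - p_err         # positive for bases better than cutoff
        if running <= 0:
            running, run_start = 0.0, i + 1   # restart after a poor stretch
        elif running > best_sum:
            best_sum, best_start, best_end = running, run_start, i + 1
    return best_start, best_end
```

`read[start:end]` then yields the trimmed sequence; a read with no usable bases returns `(0, 0)`. With Biopython installed, the Phred scores of an `.ab1` trace are available as `SeqIO.read(path, "abi").letter_annotations["phred_quality"]`.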
“You're forced to learn multiple old software interfaces, and each of them may have several ways to do something,” explains Dutken. “That really slows down your ability to look at your data and for others to replicate it.”
In biopharmaceutical manufacturing, this piecemeal process also creates regulatory and cybersecurity issues. Outdated software is vulnerable to unauthorized access, leaving company intellectual property at risk. And to comply with regulatory standards, such as the 21 CFR Part 11 requirements for security, auditing and electronic signatures (SAE), typically managed through an SAE console, every modification of a sequencing file needs to be documented and accompanied by an electronic signature. With the existing patchwork of software, manufacturers must often perform much of the record-keeping manually.
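The record-keeping described above lends itself to automation. The sketch below is a minimal illustration, not a validated compliance system and not InnoviGene's implementation. It appends one entry per modification carrying the signer's name, a UTC timestamp, the action taken and the meaning of the signature (such as review or approval), which are the elements 21 CFR 11.50 asks signed records to show. Chaining each entry to the SHA-256 hash of the previous one is an assumption added here so that later tampering becomes detectable; `AuditTrail`, `record` and `verify` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class AuditTrail:
    """Append-only, hash-chained log of edits to a sequencing file (illustrative)."""
    entries: List[dict] = field(default_factory=list)

    def record(self, user: str, action: str, meaning: str) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        entry = {
            "user": user,                                          # who signed
            "timestamp": datetime.now(timezone.utc).isoformat(),   # when
            "action": action,                                      # what changed
            "meaning": meaning,                                    # e.g. "review"
            "prev_hash": prev_hash,                                # chain link
        }
        payload = json.dumps(entry, sort_keys=True).encode()
        entry["hash"] = hashlib.sha256(payload).hexdigest()
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any edited or reordered entry breaks the chain."""
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash:
                return False
            payload = json.dumps(body, sort_keys=True).encode()
            if hashlib.sha256(payload).hexdigest() != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

Altering any stored field, for instance rewriting an `action` after the fact, makes `verify()` return `False`. A real deployment would also need authenticated users and protected storage, which this sketch does not attempt.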
To bring the field’s data processing standards into the 21st century, Thermo Fisher has developed the InnoviGene platform, which provides a single, unified system for processing raw Sanger sequencing files, from quality trimming to mismatch identification and report generation. “It covers all the necessary steps of processing and reporting the data,” says Litterst.
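The final reporting step, with its customizable quality control thresholds, can be sketched as a simple per-sample classification. The metrics, the default threshold values and the PASS/REVIEW/FAIL labels below are illustrative assumptions, not InnoviGene's report format; `SampleResult`, `qc_status` and `report` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SampleResult:
    name: str
    trimmed_length: int   # bases remaining after quality trimming
    mean_quality: float   # mean Phred score of the trimmed read
    mismatches: int       # differences from the reference sequence


# Hypothetical defaults; each lab would set its own thresholds.
DEFAULT_THRESHOLDS: Dict[str, float] = {
    "min_length": 500,
    "min_mean_quality": 30.0,
    "max_mismatches": 0,
}


def qc_status(s: SampleResult, t: Dict[str, float] = DEFAULT_THRESHOLDS) -> str:
    """Return 'PASS', 'REVIEW' or 'FAIL' for one sample."""
    if s.trimmed_length < t["min_length"] or s.mean_quality < t["min_mean_quality"]:
        return "FAIL"     # read too short or too noisy to support a call
    if s.mismatches > t["max_mismatches"]:
        return "REVIEW"   # good read, but it disagrees with the reference
    return "PASS"


def report(samples: List[SampleResult],
           t: Dict[str, float] = DEFAULT_THRESHOLDS) -> str:
    """Render a fixed-width text table with one row per sample."""
    lines = [f"{'sample':<12}{'length':>8}{'meanQ':>8}{'mism':>6}  status"]
    for s in samples:
        lines.append(f"{s.name:<12}{s.trimmed_length:>8}{s.mean_quality:>8.1f}"
                     f"{s.mismatches:>6}  {qc_status(s, t)}")
    return "\n".join(lines)
```

Passing a different threshold dictionary changes the verdicts without touching the code; a colour-coded report could simply map each status to a colour.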
Thermo Fisher senior research scientist Wesley Morovic, who was an early user of the software suite, was impressed with the intuitiveness of InnoviGene’s design. “My first and ongoing impression is that it’s extremely user-friendly,” he says, “especially for users without a lot of bioinformatics experience.”
Like Litterst, Morovic often processes Sanger sequencing data to support biopharma customers with tasks such as transgene identification, mRNA sequence identification and the scrutiny of plasmids used to produce therapeutic substances. InnoviGene’s compliance features were critical to Morovic’s decision to onboard the software. “I could see pretty quickly that it was 21 CFR Part 11-compliant,” he says. “It had features to limit access, create audit trails and set automatic report generation.”
In addition to regulatory compliance features, InnoviGene is designed to make collaboration easy. Rather than limit the software to a local machine, InnoviGene is browser-based. This allows access to the software from any device or instrument connected to the local area network, making file sharing and collaboration simple.
Combined with advanced features such as AI-driven base calling and mismatch detection, these capabilities allow InnoviGene to knock down barriers in Sanger sequencing and, ultimately, iron out some of the most irksome wrinkles in biopharmaceutical research.
“It’s a one-stop shop that covers everything and is compliant with 21 CFR Part 11,” says Litterst. “And on top of all that, it’s just easy to use.”