Learn more about Complete Genomics’ sequencing service.

Access the Complete Genomics Technology Paper

For more information about Complete Genomics’ sequencing service, please contact info@completegenomics.com

Systems

Complete Genomics’ novel DNA sequencing platform is based on a range of proprietary biochemistry, nanotechnology, instrumentation and computing technologies.

The main components of its DNA sequencing platform fall into five categories:

DNA Libraries

Arrays

Assay

Instruments

Software

DNA Libraries

Complete Genomics DNA libraries, in conjunction with its proprietary Combinatorial Probe-Anchor Ligation (cPAL™) chemistry, are used to obtain the sequencing reads. Currently, 35-base paired-end reads are generated from genomic fragments of approximately 500 base pairs. This fragment size is sufficient to span very common repetitive elements, in particular Alu repeats, which comprise 10% of the genome.

Library Construction Process

Complete Genomics paired-end DNA libraries consist of genomic DNA fragments with known synthetic DNA sequences (called adaptors) interspersed at regular intervals. The adaptors act as starting points for reading up to 10 bases from each adaptor-genomic DNA junction.

Complete Genomics uses a proprietary library construction process to insert four adaptors into each DNA fragment (Figure 1). A four-adaptor approach enables Complete Genomics sequencing to support 70-base reads (35 bases from each end of the pair). The read length may be increased by inserting more adaptors.

Figure 1: Multiple Adaptor Library Construction Process

Arrays

Complete Genomics has developed ultra-high density DNA nanoarrays that can be read with standard fluorescence chemistry and imagers constructed from commercial components, which minimizes the cost of both reagents and imaging. Unlike alternative approaches, clonal DNA amplification is not performed in emulsions or on surfaces. The amplification process occurs in solution and in a single reaction chamber, allowing for higher density and lower reagent usage. Additionally, since the DNA nanoball (DNB™) production process inherently produces clonal amplicons, it is not subject to the stochastic variation from limiting dilution that is inherent in alternative approaches.
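The "stochastic variation from limiting dilution" mentioned above can be illustrated with the textbook Poisson model. The sketch below is not taken from Complete Genomics; it assumes templates are distributed at random among compartments (droplets or surface sites) with a given mean number of templates per compartment, and the function names are hypothetical.

```python
import math

def poisson_fraction(k: int, mean_templates: float) -> float:
    """Probability that a compartment receives exactly k templates."""
    return math.exp(-mean_templates) * mean_templates ** k / math.factorial(k)

def clonal_fraction(mean_templates: float) -> float:
    """Fraction of compartments holding exactly one template (clonal)."""
    return poisson_fraction(1, mean_templates)

if __name__ == "__main__":
    # Illustrative loading levels only; none of these values come from the source.
    for lam in (0.1, 0.5, 1.0, 2.0):
        print(f"mean={lam:.1f}  empty={poisson_fraction(0, lam):.3f}  "
              f"clonal={clonal_fraction(lam):.3f}")
```

Under this model the clonal fraction peaks at 1/e (about 37%) when the mean is one template per compartment, whatever dilution is chosen.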

Clonal DNA Amplification in Solution — DNA Nanoballs (DNBs)

Complete Genomics sequencing is performed on amplified DNA clusters termed DNA nanoballs (DNBs). The amplification avoids the cost and challenges of relying on single fluorophore measurements, such as those used by single-molecule sequencing systems.

Figure 2: DNA Nanoball (DNB) Formation

Starting with a small circular DNA template (Figure 2) consisting of approximately 80 bases of genomic DNA and four synthetic adaptors, Complete Genomics generates a head-to-tail concatemer consisting of more than 200 copies of the circular template. Complete Genomics has developed a variety of proprietary techniques for forming this concatemer into a ball (a DNA nanoball, or DNB) as well as controlling its size, density and binding affinity to surfaces and to other DNBs. One milliliter (ml) of reaction volume generates over 10 billion DNBs, sufficient for sequencing an entire human genome.
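As a rough check on the claim that 10 billion DNBs suffice for a human genome, the arithmetic can be sketched as follows. This is an upper-bound estimate under stated assumptions: every DNB is assumed to reach the array and yield a full 70-base read (the source does not claim this), and the genome size of about 3.1 billion bases is an approximate outside figure.

```python
HUMAN_GENOME_BASES = 3.1e9  # assumption: approximate haploid human genome size

def raw_coverage(num_dnbs: float, bases_per_dnb: int,
                 genome_bases: float = HUMAN_GENOME_BASES) -> float:
    """Average number of times each genome position is read (raw, unfiltered)."""
    return num_dnbs * bases_per_dnb / genome_bases

def genomic_bases_per_dnb(copies: int, genomic_bases_per_copy: int) -> int:
    """Genomic (non-adaptor) bases packed into one concatemer."""
    return copies * genomic_bases_per_copy

if __name__ == "__main__":
    print(f"raw coverage: {raw_coverage(10e9, 70):.0f}x")
    print(f"genomic bases per DNB: {genomic_bases_per_dnb(200, 80)}")
```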

Patterned Substrates

Complete Genomics produces patterned substrates (Figure 3) with two-dimensional arrays of spots that are activated to capture and hold DNBs. The patterned surfaces are produced using standard silicon processing techniques.

Figure 3: Patterned Substrate

Complete Genomics patterned arrays achieve a significantly higher density of DNA spots than the unpatterned arrays that are typically used, so that fewer pixels are needed per base read, processing is faster, and reagents are used more efficiently. The Company’s first-generation commercial patterned substrates are 25 mm by 75 mm (1” x 3”) standard microscope slides, each with the capacity to hold approximately 1 billion individual spots that can bind DNBs. DNA nanoarrays with 2.85 billion spots are now being deployed.
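The spot counts above imply a spot spacing that can be estimated with a short calculation. The sketch assumes a square grid covering the entire 25 mm by 75 mm slide, which is an idealization: any unusable border would make the true spacing smaller than the value computed here.

```python
import math

def spot_pitch_um(width_mm: float, height_mm: float, num_spots: float) -> float:
    """Center-to-center spacing (micrometers) of a square grid filling the slide."""
    area_um2 = (width_mm * 1000.0) * (height_mm * 1000.0)
    return math.sqrt(area_um2 / num_spots)

if __name__ == "__main__":
    for spots in (1e9, 2.85e9):
        print(f"{spots:.2e} spots: pitch at most {spot_pitch_um(25, 75, spots):.2f} um")
```

Under this assumption, 1 billion spots corresponds to a spacing of roughly 1.4 micrometers and 2.85 billion spots to roughly 0.8 micrometers.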

Self-assembling DNA Nanoarrays

Figure 4: Slide Preparation
Figure 5: Four-color Image of a DNA Nanoarray

Complete Genomics makes a DNA nanoarray by introducing the DNBs to the patterned surface (Figure 4). The DNBs stick to the activated, or “sticky,” spots, and do not stick to the fields between the spots. Once a single DNB has stuck to a spot, it repels other DNBs, resulting in at most one DNB per spot. DNBs are three-dimensional, resulting in more DNA copies per square nanometer of binding surface than traditional DNA arrays. This unique three-dimensional quality further reduces the quantity of sequencing reagents required, while resulting in brighter spots and more efficient imaging. In practice, DNA nanoarray occupancies exceed 90% (Figure 5). A high-density DNA nanoarray thus “self-assembles” from DNBs in solution, eliminating one of the most costly aspects of producing traditional patterned oligo or DNA arrays.

Assay

The historical drawback of sequencing by ligation has been short read length, which is typically limited to approximately six bases from the ligation site. Complete Genomics has increased the read length to 10 bases. In addition, by inserting multiple adaptors into each genomic fragment, each of which has two ligation sites, multiple adjacent 10-base segments of genomic DNA may be read.
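The relationship between adaptor count and read length can be written as a simple upper bound. The sketch below only restates the figures given in the text (two ligation sites per adaptor, up to 10 bases per site); the function name is hypothetical.

```python
def max_read_length(num_adaptors: int,
                    bases_per_site: int = 10,
                    sites_per_adaptor: int = 2) -> int:
    """Upper bound on bases read per fragment if every ligation site is read fully."""
    return num_adaptors * sites_per_adaptor * bases_per_site

if __name__ == "__main__":
    for adaptors in (1, 2, 4):
        print(f"{adaptors} adaptor(s): up to {max_read_length(adaptors)} bases")
```

For four adaptors the bound is 80 bases. The 70-base read length quoted for the current libraries sits below it, so not every site contributes a full 10 bases in that configuration; the text does not say which sites are read short.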

Combinatorial Probe-Anchor Ligation (cPAL™): Ligation-based DNA Sequencing

Complete Genomics’ approach combines hybridization and ligation to produce high-accuracy reads with minimal reagent usage. The Complete Genomics sequencing assay, called Combinatorial Probe-Anchor Ligation (cPAL™), has many of the advantages of sequencing by hybridization (SBH), including DNA array parallelism, independent and non-iterative base reading, and the capacity to read multiple bases per reaction. In addition, cPAL resolves two SBH limitations: the inability to read simple repeats and the need for intensive computation.

cPAL uses pools of probes labeled with four distinct dyes (one per base) to read the positions adjacent to each adaptor (Figure 6). There is a separate pool of probes for each read position. Complete Genomics’ proprietary approach allows 10 contiguous bases to be read from each end of an adaptor. Ligating the matching probes with the adjacent anchors dramatically improves the full-match specificity of the probe binding when compared to hybridization without ligation. Under optimal fluidics and imaging conditions, the raw error rate of this assay can be below 0.1%.

Figure 6: Combinatorial Probe-Anchor Ligation (cPAL™) Chemistry

After each base is read, the entire anchor-probe complex is washed away. The next anchor is then hybridized, and the next probe is ligated to the anchor. There is no chaining of consecutive probes, and thus no accumulation of errors.

One of the unique advantages of cPAL chemistry is random access (independent and non-iterative base reading). Each base-read cycle does not depend on the completeness of any of the previous cycles. This provides excellent fault tolerance: if a base read fails, it does not prevent interpretation of the rest of the reads for that DNB, and, if desired, the failed base can simply be re-assayed.

Another key advantage of independent base reading is its tolerance to low ligation yield per cycle. This dramatically reduces the required probe and enzyme concentrations, thereby substantially reducing reagent costs. cPAL further allows for reading multiple positions per cycle, which is not possible with sequencing by synthesis. Reading multiple positions per cycle decreases the number of cycles, again reducing reagent consumption and imaging time.
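The tolerance to low per-cycle yield can be illustrated with a deliberately simplified model. In a chained chemistry, where each cycle builds on the product of the previous one, the fraction of templates still contributing correct signal is assumed here to shrink geometrically with the cycle number. With independent reads, each cycle starts from a fresh anchor, so the fraction stays at the per-cycle yield. The yield values are illustrative, and the model ignores every other source of signal loss.

```python
def chained_signal_fraction(yield_per_cycle: float, cycle: int) -> float:
    """Simplified model: signal fraction at a given cycle when cycles are chained."""
    return yield_per_cycle ** cycle

def independent_signal_fraction(yield_per_cycle: float, cycle: int) -> float:
    """Independent reads: the fraction does not depend on the cycle number."""
    return yield_per_cycle

if __name__ == "__main__":
    # 0.9 is an illustrative per-cycle yield, not a figure from the source.
    for cycle in (1, 5, 10):
        print(f"cycle {cycle:2d}: chained={chained_signal_fraction(0.9, cycle):.3f}  "
              f"independent={independent_signal_fraction(0.9, cycle):.3f}")
```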

Instruments

Complete Genomics’ high-speed instrument design is highly modular and allows for rapid reading of sub-micron DNA nanoarrays. Each of its components may be independently upgraded as suppliers release newer, improved versions. By relying on standardized components, the Company is able to track its suppliers’ technology roadmaps to consistently deliver state-of-the-art performance, while leveraging the continuous cost reductions of standardized components. High-volume purchases will likewise enable the Company to work with component suppliers to continually improve performance of the Complete Genomics sequencing instrument.

Figure 7: Complete Genomics Sequencing Instrument

Complete Genomics’ sequencing instrument (Figure 7) consists of three loosely coupled standardized sub-systems:

  • DNA nanoarrays, packaged into flow slides
  • Standard liquid-handling robot
  • High-speed imager

This modular design enables Complete Genomics to adjust components easily as specifications or performance criteria change, and allows for rapid reconfiguration that keeps pace with hardware technology development.

Flow Slides
Complete Genomics has developed a powerful flow slide platform for minimizing reagent use and simplifying fluorescence imaging (Figure 8).

Figure 8: Flow Slides

Micro-channels formed on top of the patterned substrates enable efficient reagent delivery and eliminate dead volume while simultaneously satisfying the optical requirements for high-resolution imaging. Process capacity (DNA spots measured per cycle) may be increased by adding more flow slides to the liquid handling deck. This ensures that increases in imager speed are matched by increases in process capacity.

Fluidics Robot
Complete Genomics uses standard, off-the-shelf liquid handling robots to pipette reagents to the flow slides. When reactions are complete and a flow slide is ready to be imaged, a robotic arm transfers the slide from the liquid handling deck to the imager stage. Each instrument can run 2 to 12 slides in parallel: while one slide is being imaged, the remaining slides are in various stages of preparation for imaging.
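The reason for running several slides in parallel is to keep the imager busy while other slides undergo biochemistry. A minimal sketch of that balance follows. The timings are hypothetical placeholders (the source gives none), and the robot's slide-transfer time is ignored.

```python
import math

def slides_to_saturate_imager(biochem_minutes: float, imaging_minutes: float) -> int:
    """Smallest number of slides that keeps the imager continuously occupied."""
    cycle_minutes = biochem_minutes + imaging_minutes
    return math.ceil(cycle_minutes / imaging_minutes)

if __name__ == "__main__":
    # Hypothetical timings chosen only to show the relationship.
    print(slides_to_saturate_imager(biochem_minutes=50, imaging_minutes=10))
    print(slides_to_saturate_imager(biochem_minutes=55, imaging_minutes=10))
```

In this model a faster imager (a smaller imaging time) raises the number of slides needed, which matches the earlier point that process capacity is increased by adding flow slides to the deck.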

Imager
The imager is constructed from off-the-shelf components to form a four-color fluorescence microscope attached to a high-speed camera.

Software

Complete Genomics has developed its own suite of base-calling, mapping, assembly, and analysis software for rapidly reconstructing genomes from billions of paired-end reads. The base-calling software receives data from the imager after each reaction cycle. Images are processed to determine the bases at each position on a DNA nanoarray. Called bases for each DNB are collated to form raw read data. Mapping, assembly, and analysis software operate on read data and produce a variety of outputs, including reads aligned to a reference genome and consensus sequence assembly of overlapping DNB reads.

Base-Calling Software

Four images, one for each color dye, are generated for each queried genomic position. The position of each spot in an image and the resulting intensities for each of the four colors are determined, with adjustment for crosstalk between dyes and for background intensity. A quantitative model is fit to the resulting four-dimensional dataset. A base is called for a given spot, with a quality score that reflects how well the four intensities fit the model.
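The steps above (background adjustment, crosstalk correction, base call with a score) can be sketched in a few lines. This is a generic illustration, not the Complete Genomics base caller: the linear crosstalk model, the example matrix values, and the contrast-based score are all assumptions standing in for the fitted quantitative model, which the text does not specify.

```python
BASES = "ACGT"

# Hypothetical crosstalk matrix: entry [channel][dye] is the fraction of a
# dye's signal that appears in a given color channel.
EXAMPLE_CROSSTALK = [
    [1.0, 0.2, 0.0, 0.0],
    [0.1, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.3],
    [0.0, 0.0, 0.1, 1.0],
]

def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x

def call_base(intensities, background, crosstalk):
    """Return (base, score); the base is 'N' when no channel has positive signal."""
    corrected = [i - b for i, b in zip(intensities, background)]
    signal = solve_linear(crosstalk, corrected)
    ranked = sorted(range(4), key=lambda i: signal[i], reverse=True)
    best, runner_up = signal[ranked[0]], signal[ranked[1]]
    if best <= 0:
        return "N", 0.0
    # Stand-in score: contrast between the two strongest channels (0 to 1).
    return BASES[ranked[0]], (best - max(runner_up, 0.0)) / best

if __name__ == "__main__":
    print(call_base([25, 105, 5, 5], [5, 5, 5, 5], EXAMPLE_CROSSTALK))
```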

Read Data Format

Read data includes both a called base and a quality score. The quality score is correlated with base accuracy. Analysis software, including sequence assembly software, uses the score to determine the contribution of evidence from individual bases within a read.

Reads are “gapped” due to the DNB structure. Gap sizes vary (usually +/-1 base) due to the variability inherent in enzyme digestion. Due to the random-access nature of cPAL, reads may occasionally have an unread base (‘no-call’) in an otherwise high-quality DNB. Read pairs are mated as described in the DNA libraries section.
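One way to picture a gapped read with per-base scores and no-calls is the small data structure below. It is a hypothetical layout for illustration only and does not reproduce any Complete Genomics file format; the field names, the use of 'N' for a no-call, and the example segment sizes are all assumptions.

```python
from dataclasses import dataclass
from typing import List

NO_CALL = "N"  # assumption: symbol used here for an unread base

@dataclass
class ReadSegment:
    bases: str          # called bases, NO_CALL where a position was not read
    scores: List[int]   # one quality score per base
    gap_after: int = 0  # estimated gap (bases) before the next segment

@dataclass
class ArmRead:
    segments: List[ReadSegment]

    def called_length(self) -> int:
        """Number of positions with an actual base call."""
        return sum(1 for seg in self.segments for b in seg.bases if b != NO_CALL)

    def span(self) -> int:
        """Genomic span covered, counting the gaps between segments."""
        return sum(len(seg.bases) + seg.gap_after for seg in self.segments)

if __name__ == "__main__":
    arm = ArmRead([
        ReadSegment("ACGTACGTAC", [30] * 10, gap_after=6),
        ReadSegment("GGTNACCTGA", [28] * 3 + [0] + [28] * 6),
    ])
    print(arm.called_length(), arm.span())
```

Because gap sizes vary by about one base, software consuming such reads would need to treat `gap_after` as an estimate rather than an exact offset.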

Mapping

Complete Genomics has developed high-speed mapping software capable of aligning read data to a reference sequence. The software runs on a commodity Linux cluster and scales horizontally with more central processing units (CPUs).

The mapping is tolerant of small variations from a reference sequence, such as those caused by individual genomic variation, read errors or unread bases. To support assembly of larger variations, including large-scale structural changes or regions of dense variation, each 35-base arm of a DNB is mapped separately, with mate pairing constraints applied after alignment.
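Mapping each arm separately and then applying a mate-pairing constraint can be sketched as follows. This is a minimal illustration, not the Complete Genomics mapper: it assumes both arms map to the same chromosome and strand, takes the expected separation from the roughly 500-base-pair fragment size mentioned in the DNA Libraries section, and uses a hypothetical tolerance.

```python
from typing import List, Tuple

def pair_mates(left_hits: List[int], right_hits: List[int],
               expected_distance: int = 500,
               tolerance: int = 100) -> List[Tuple[int, int]]:
    """Keep (left, right) candidate positions whose separation fits the fragment size."""
    pairs = []
    for left in left_hits:
        for right in right_hits:
            if abs((right - left) - expected_distance) <= tolerance:
                pairs.append((left, right))
    return pairs

if __name__ == "__main__":
    # Each arm maps to two candidate locations on its own (for example, in a repeat).
    print(pair_mates(left_hits=[1000, 52000], right_hits=[1480, 90000]))
```

In the example, each arm is ambiguous on its own, but only one combination of positions satisfies the constraint.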

Assembly

Complete Genomics has developed sequence assembly software that runs on a commodity Linux cluster and scales horizontally with more CPUs. The sequence assembly software supports the DNB read structure (mated, gapped reads with non-called bases).

The Complete Genomics assembler currently calls SNPs and short insertions, deletions, and block substitutions up to approximately 50 bp. The algorithm utilizes a combination of evidential (Bayesian) reasoning and de Bruijn graph-based algorithms. The use of a statistical model, which is empirically calibrated to each dataset, allows all read data to be used without pre-filtering or data trimming.
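For readers unfamiliar with de Bruijn graphs, the textbook construction is sketched below. It is a generic illustration only: the text does not describe how the Complete Genomics assembler builds its graph or combines it with Bayesian reasoning, and skipping k-mers that contain a no-call ('N') is an assumption made here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

def build_de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" in kmer:  # skip k-mers that include an unread base
                continue
            graph[kmer[:-1]].add(kmer[1:])
    return graph

if __name__ == "__main__":
    # Two reads that differ at a single position (a SNP-like difference).
    graph = build_de_bruijn(["ACGTTGCA", "ACGATGCA"], k=4)
    for node in sorted(graph):
        print(node, "->", sorted(graph[node]))
```

The single-base difference shows up as a branch at node `ACG` whose two paths rejoin at `TGC`, the kind of local structure a variant caller can then score statistically.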

Genome Center

Complete Genomics is building the world’s largest commercial human genome sequencing center to provide turnkey, outsourced complete human genome sequencing to customers worldwide. The center comprises a sequencing operations center and a data center, and will have the capacity to sequence approximately ten thousand genomes in 2010.
