Complete Genomics Sequencing Results

Analyses of Genomes from Science Publication

Complete Genomics has published a report describing three human genome sequences in the journal Science. Two of the samples were derived from potentially different passages of cell lines used in the International HapMap project: 1) a Caucasian male of European descent (NA07022), and 2) a Yoruban female (NA19240). The third sample was generated from lymphoblast DNA from a Personal Genome Project (PGP) Caucasian male sample (NA20431). Sequencing of these genomes was conducted at Complete Genomics’ commercial-scale genome center. Results presented in this paper and discussed below include:

NOTE: Analysis of these data is ongoing, and we have made considerable additions to our production analysis software since this paper was written. Please refer to the Sequence Data Available for Download page for updated results.

Summary of Genomes Sequenced

Summary information from mapping and assembly of the three genomes.

| Sample | Mapped sequence (Gb) | Average coverage depth (fold) | Genome fully called | Genome partially called |
|---|---|---|---|---|
| NA19240 | 178 | 63 | 95% | 1% |
| NA07022 | 241 | 87 | 91% | 2% |
| NA20431 | 124 | 45 | 86% | 3% |

Results were obtained by mapping sequence reads to the human genome reference (NCBI Build 36) and assembling variants with custom algorithms specifically designed for Complete Genomics data. Between 124 and 241 gigabases (Gb) were mapped, for an overall mean depth of coverage of 45- to 87-fold per genome. Fully called regions are those where both diploid alleles could be determined at high accuracy (see below), while partially called regions are those where one of the two alleles was determined but the second was not.
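As a rough consistency check on the figures above, average depth is approximately the mapped sequence divided by the length of genome being covered. The sketch below inverts that relation to show the genome length each row implies. The function name and the treatment of depth as a simple ratio are assumptions made for illustration; they are not a description of how Complete Genomics computed coverage.

```python
# Mapped sequence (Gb) and average coverage depth (fold), copied from the summary table.
SAMPLES = {
    "NA19240": (178, 63),
    "NA07022": (241, 87),
    "NA20431": (124, 45),
}


def implied_genome_length_gb(mapped_gb, depth_fold):
    """Genome length (Gb) implied by depth = mapped sequence / genome length.

    Assumption: depth is a plain ratio of mapped bases to covered length.
    """
    return mapped_gb / depth_fold


for sample, (mapped_gb, depth) in SAMPLES.items():
    length = implied_genome_length_gb(mapped_gb, depth)
    print(f"{sample}: {mapped_gb} Gb / {depth}-fold = {length:.2f} Gb")
```

All three rows imply a length of roughly 2.8 Gb, so the mapped-sequence and depth columns are mutually consistent under this simple model.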

Summary of Variations

Variations detected relative to reference genome (NCBI Build 36).

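| Category | Variation type | NA19240 count | NA19240 novel | NA07022 count | NA07022 novel | NA20431 count | NA20431 novel |
|---|---|---|---|---|---|---|---|
| SNPs | All | 4,042,801 | 19% | 3,076,869 | 10% | 2,905,517 | 10% |
| SNPs | Homozygous | 1,297,601 | 4% | 1,097,899 | 2% | 965,029 | 1% |
| SNPs | Heterozygous | 2,639,864 | 27% | 1,800,287 | 15% | 1,657,540 | 16% |
| SNPs | Transitions | 3,635,882 | | 2,858,818 | | 2,658,112 | |
| SNPs | Transversions | 1,706,195 | | 1,316,837 | | 1,213,232 | |
| SNPs | Coding | 23,000 | 16% | 18,723 | 9% | 16,532 | 10% |
| SNPs | Non-synonymous | 11,400 | 19% | 9,286 | 11% | 8,215 | 12% |
| Indels | Short insertions | 242,391 | 40% | 168,909 | 37% | 136,786 | 37% |
| Indels | Short deletions | 253,803 | 44% | 168,726 | 37% | 133,008 | 36% |
| Indels | Total | 496,194 | | 337,635 | | 269,794 | |
| Indels | Coding short indels | 549 | 56% | 556 | 58% | 435 | 59% |
| Indels | Frameshifting short indels | 327 | 61% | 310 | 62% | 299 | 71% |
| Block substitutions | Length conserving | 54,054 | 39% | 40,103 | 42% | 38,449 | 33% |
| Block substitutions | Length changing | 34,432 | 64% | 22,680 | 61% | 18,166 | 60% |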

Between 2.91 and 4.04 million single nucleotide polymorphisms (SNPs) with respect to the reference genome were identified in each genome. Of these SNPs, 81% to 90% have been previously reported in dbSNP build 129. This is consistent with reports of other complete human genome sequences from different ethnicities compared to this reference.
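The 81% to 90% figure follows directly from the "Novel" column of the variation table, and the transition and transversion rows allow a transition/transversion ratio to be derived. The sketch below works through both using the table's numbers. The dictionary layout and function names are illustrative assumptions, and the Ts/Tv ratio is a derived figure that is not quoted on this page.

```python
# Values copied from the variation table: % novel among all SNPs,
# and the transition (ts) and transversion (tv) counts.
SNP_SUMMARY = {
    "NA19240": {"novel_pct": 19, "ts": 3_635_882, "tv": 1_706_195},
    "NA07022": {"novel_pct": 10, "ts": 2_858_818, "tv": 1_316_837},
    "NA20431": {"novel_pct": 10, "ts": 2_658_112, "tv": 1_213_232},
}


def known_pct(novel_pct):
    """Percent of SNPs already in dbSNP build 129 (the complement of % novel)."""
    return 100 - novel_pct


def ts_tv_ratio(ts, tv):
    """Ratio of transition count to transversion count."""
    return ts / tv


for sample, row in SNP_SUMMARY.items():
    ratio = ts_tv_ratio(row["ts"], row["tv"])
    print(f"{sample}: {known_pct(row['novel_pct'])}% known, Ts/Tv = {ratio:.2f}")
```

Note that the transition and transversion counts in the table sum to more than the "All" SNP count for each genome, so they appear to be tallied on a different basis; the page does not say which.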

Concordance with HapMap and other orthogonal technologies

The data generated show excellent concordance with SNP genotypes generated by the HapMap project, particularly with the highest quality Illumina Infinium™ subset. The HapMap paper can be found at: http://hapmap.org/downloads/presentations/nature_hapmap3.pdf; see Supplementary Table 3 for details of genotyping accuracy by technology and center: http://www.hapmap.org/downloads/presentations/nature_supp3.pdf. The high concordance of our genotypes with those generated using independent technologies affirms the accuracy of Complete Genomics' sequencing technology for discovery and validation of polymorphisms.

Sample NA19240

Genotype calls compared against HapMap Phase I and II genotypes and the HapMap Infinium subset.

| Metric | HapMap Phase I & II SNPs | HapMap Infinium subset |
|---|---|---|
| # reported | 3.8M | 144K |
| % called | 98.46% | 98.45% |
| % locus concordance | 99.14% | 99.85% |
| HapMap homozygous ref (% concordance) | 99.22% | 99.92% |
| HapMap heterozygous (% concordance) | 99.62% | 99.81% |
| HapMap homozygous alt (% concordance) | 98.26% | 99.79% |
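To make the "% called" and "% locus concordance" rows concrete, the sketch below shows one plausible way such figures could be computed from two sets of genotype calls. The definitions used here (% called as the share of reported loci with a sequencing call, and locus concordance as the share of called loci whose genotype matches) are assumptions, since this page does not define them. The function name, locus names, and genotypes are invented toy data.

```python
def concordance_summary(reference_calls, sequencing_calls):
    """Compare sequencing genotypes against reference genotypes.

    reference_calls:  dict of locus -> genotype string, e.g. "AG"
    sequencing_calls: dict of locus -> genotype string, or None for a no-call
    Returns (% called, % locus concordance among called loci).
    Both definitions are assumptions made for this sketch.
    """
    reported = len(reference_calls)
    if reported == 0:
        raise ValueError("no reference loci to compare")
    called = 0
    concordant = 0
    for locus, ref_genotype in reference_calls.items():
        seq_genotype = sequencing_calls.get(locus)
        if seq_genotype is None:
            continue  # no-call: counts against % called only
        called += 1
        # Genotypes are unordered, so "AG" and "GA" are the same call.
        if sorted(seq_genotype) == sorted(ref_genotype):
            concordant += 1
    pct_called = 100.0 * called / reported
    pct_concordant = 100.0 * concordant / called if called else 0.0
    return pct_called, pct_concordant


# Toy data, invented for illustration only.
array_calls = {"locus1": "AA", "locus2": "AG", "locus3": "GG", "locus4": "CT"}
sequenced = {"locus1": "AA", "locus2": "GA", "locus3": "AG", "locus4": None}

pct_called, pct_concordant = concordance_summary(array_calls, sequenced)
print(f"% called: {pct_called:.2f}, % locus concordance: {pct_concordant:.2f}")
```

In this toy case three of four loci are called and two of those three agree, so a no-call lowers "% called" without affecting concordance.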

Sample NA07022

Genotype calls compared against Infinium 1M, HapMap Phase I and II genotypes, and the HapMap Infinium subset. To determine whether discordances were due to errant calls in the Complete Genomics data or the Infinium subset of HapMap, discordant loci were tested by Sanger sequencing.

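The last three columns report the HapMap Infinium SNPs tested for accuracy by Sanger sequencing.

| Metric | Infinium 1M | HapMap Phase I & II SNPs | HapMap Infinium subset | Sanger: these data correct | Sanger: these data incorrect | Sanger: % affirmed |
|---|---|---|---|---|---|---|
| # reported | 1M | 3.9M | 143K | | | |
| % called | 95.98% | 94.39% | 96.00% | | | |
| % locus concordance | 99.89% | 99.15% | 99.88% | | | |
| HapMap homozygous ref (% concordance) | 99.96% | 99.34% | 99.96% | 18 | 2 | 90% |
| HapMap heterozygous (% concordance) | 99.78% | 99.39% | 99.80% | 28 | 46 | 38% |
| HapMap homozygous alt (% concordance) | 99.81% | 98.14% | 99.84% | 28 | 12 | 70% |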
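The "% affirmed" values in the Sanger columns of the table above are consistent with the share of tested discordant loci at which Sanger sequencing agreed with the Complete Genomics call. The short sketch below reproduces them from the correct and incorrect counts; the function name is an illustrative assumption.

```python
# (these data correct, these data incorrect) per HapMap genotype class,
# copied from the Sanger columns of the NA07022 table.
SANGER_RESULTS = {
    "Homozygous ref": (18, 2),
    "Heterozygous": (28, 46),
    "Homozygous alt": (28, 12),
}


def pct_affirmed(correct, incorrect):
    """Percent of Sanger-tested discordant loci where these data were correct."""
    return 100.0 * correct / (correct + incorrect)


for genotype, (correct, incorrect) in SANGER_RESULTS.items():
    tested = correct + incorrect
    print(f"{genotype}: {correct}/{tested} = {pct_affirmed(correct, incorrect):.0f}% affirmed")
```

The counts apply only to loci where the two data sets disagreed, so they describe how discordances were resolved rather than overall accuracy.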

Additionally, to determine a whole-genome false positive rate, 291 novel non-synonymous variants (a category enriched for errors) were tested with Sanger sequencing. This approach yielded an extrapolated false positive rate of approximately 1 in 100 kilobases. For more detail, see Supplemental Table S8 in the Science publication.

Sample NA20431

Genotype calls compared to Affymetrix® 500K SNP genotypes. Genotypes were assayed in duplicate, and only SNPs with identical calls between the replicates are considered.

| Metric | Affymetrix 500K SNP chip |
|---|---|
| # reported | 475K |
| % called | 94.18% |
| % locus concordance | 99.75% |
| Array homozygous ref (% concordance) | 99.88% |
| Array heterozygous (% concordance) | 99.45% |
| Array homozygous alt (% concordance) | 99.78% |

©2010 Complete Genomics, Inc. All rights reserved. cPAL and DNB are trademarks of Complete Genomics, Inc. in the US and certain other countries. All other trademarks are the property of their respective owners.