Complete Genomics has made available several data sets from NA19240, an International
HapMap Project Yoruban female sample. This sample was sequenced using Complete Genomics’
third-generation human genome sequencing technology. High-level summary information
describing the sequencing results for this sample may be reviewed in the NA19240
Data Sheet.
For more detailed data and results, see the Variation Data,
NA19240 Chromosome 21 Dataset, Science Paper Data Sets, and Complete NA19240 Data Set sections below.
Documentation regarding the file formats is available in DataFileFormats.pdf, listed under Variation Data.
Complete Genomics has posted recently updated NA19240 results derived from the
same source data used in our Science publication
(Science 327 (5961), 78. [DOI: 10.1126/science.1181498]). While the published data
set is of excellent quality, Complete Genomics has improved its service since that
time, in particular by providing additional data analysis results. These data have been reanalyzed
using a newer version of our pipeline software (v1.7.0) and are therefore provided in an
updated file format. Please note: Complete Genomics requests that the version number of the
assembly software be included in any publications and presentations referencing these data.
Variation Data
The variation file for NA19240 contains a complete description of this genome as compared
to the NCBI reference genome (Build 36). Every base of this genome, in reference genome
coordinate space, is described as variant, same as reference, or no-call. In addition,
annotation files are available showing the sequence of NA19240 at sites of known polymorphisms
(from dbSNP 129) and in the coding regions of known genes. A summary table of functional
changes categorized by transcript is also available. The current software version available for
this data set is 1.7.3.
- Documentation:
DataFileFormats.pdf (784 KB) contains detailed descriptions of Complete Genomics data formats.
Note that not all data described in DataFileFormats.pdf are available for download below.
- Variations:
GS000000320-ASM.tsv.bz2 (318,975 KB): Called sequence relative to NCBI reference including
all base calls and no-calls from NA19240.
- Annotations:
dbSNPAnnotated-GS000000320-ASM.tsv.bz2 (327,016 KB):
For each entry in dbSNP 129, states the call in NA19240.
gene-GS000000320-ASM.tsv.bz2 (112,500 KB):
Functional annotation of individual variants called in NA19240
(nonsynonymous SNPs, frameshifting indels, etc.)
gene-var-summary-GS000000320-ASM.tsv (2,318 KB):
For each transcript, summarizes the number of variations with certain functional impacts
(counts of nonsynonymous SNPs, etc.)
summary-GS000000320-ASM.tsv (2 KB):
Overall information and statistics for this sequencing project.
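The compressed tab-separated files listed above can be streamed directly, without first unpacking them to disk. The following is a minimal Python sketch, not Complete Genomics tooling. It assumes, without confirmation from DataFileFormats.pdf, that metadata lines begin with `#`, that the first remaining non-blank line is a column header (optionally prefixed with `>`), and that every later line is a data row. The helper names `iter_tsv_rows` and `count_by_column` are hypothetical.

```python
import bz2
import csv
from collections import Counter


def iter_tsv_rows(path):
    """Yield one dict per data row of a .tsv or .tsv.bz2 file.

    Assumptions (verify against DataFileFormats.pdf): lines starting with
    '#' are metadata, blank lines are ignored, and the first remaining line
    is the column header, possibly prefixed with '>'.
    """
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rt", newline="") as handle:
        # Drop blank lines and assumed '#' metadata lines lazily.
        lines = (ln for ln in handle if ln.strip() and not ln.startswith("#"))
        # QUOTE_NONE: treat quote characters in fields as ordinary text.
        reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader, None)
        if header is None:
            return  # empty file: nothing to yield
        header[0] = header[0].lstrip(">")
        for fields in reader:
            yield dict(zip(header, fields))


def count_by_column(path, column):
    """Tally the values found in one named column (missing -> '')."""
    return Counter(row.get(column, "") for row in iter_tsv_rows(path))
```

For instance, `count_by_column("GS000000320-ASM.tsv.bz2", "varType")` would tally rows by call type, provided the variation file really has a column with that name; the column name here is an assumption and should be checked against the format documentation for the software version in use.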
NA19240 Chromosome 21 Dataset
A complete data set for NA19240 chromosome 21 is also available for download. It includes a subset of the variant files described
above, and in addition the assemblies, mapping data, reads and quality scores, coverage data and other computed results. This data
set is useful for understanding the data structure and file formats of Complete Genomics’ sequence data, for downstream software
and analysis development, as well as for a variety of other applications.
Data included in this release are broken into five portions for ease of downloading.
- GS19240-173-21-pt1-ASM.tar (564 MB): Contains the contents of the ASM and LIB folders.
These folders contain the complete set of derived results from the genome sequencing:
variant calls, annotations, coverage information and assemblies. This also includes the individual reads, quality scores, and
initial mapping results for a single lane, primarily as an example.
- GS19240-173-21-pt2-MAP.tar (3 GB): Contains
individual reads, quality scores, and initial mapping results for about a third of the data set by size.
- GS19240-173-21-pt3-MAP.tar (3 GB): Contains
individual reads, quality scores, and initial mapping results for roughly another third of the data set.
- GS19240-173-21-pt4-MAP.tar (2.997 GB): Contains
all remaining individual reads, quality scores, and initial mapping results, also roughly a third of the data set.
- GS19240-173-21-DOC.zip (1.75 GB): Contains
the documentation for this data set.
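Because the archives above are several gigabytes each, it can help to inspect one before extracting it. The sketch below uses Python's standard `tarfile` module to count the members of a downloaded `.tar` file, sum their sizes, and list the first few paths. The function name `summarize_tar` and the `limit` parameter are hypothetical, and nothing here assumes a particular folder layout inside the archives.

```python
import tarfile


def summarize_tar(path, limit=20):
    """Return (member_count, total_file_bytes, first_names) without extracting.

    `limit` caps how many member paths are collected for display.
    """
    count = 0
    total = 0
    names = []
    # Mode "r" lets tarfile detect compression, if any, on its own.
    with tarfile.open(path, "r") as archive:
        for member in archive:  # iterates TarInfo entries in archive order
            count += 1
            if member.isfile():
                total += member.size
            if len(names) < limit:
                names.append(member.name)
    return count, total, names
```

Calling `summarize_tar("GS19240-173-21-pt1-ASM.tar")` on a downloaded archive returns the three values as a tuple; printing the names gives a quick view of the top of the folder structure before committing disk space to a full extraction.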
Science Paper Data Sets
The read and variation data from the three human genomes (NA19240, NA07022, and NA20431/PGP1)
in the Science publication, based on the CGI software in use at the time of that submission, have been submitted to the
Short Read Archive
at NCBI (SRA008092).
Complete NA19240 Data Set
The complete dataset of sample NA19240 for the entire genome is also available by request.
This data set (~400 Gb) is provided in the format and structure of a typical customer result.
Legal Disclaimer
These human genomic sequence data are preliminary and may contain errors.
Complete Genomics, Inc. reserves all rights to the genome sequence data and related variations file(s) downloaded here or obtained from Complete Genomics. With access to these data, you agree not to redistribute the genome sequence data or variations file(s) without express written permission. When these genome sequence data (and/or information on sequence variations from a variation file) are used for any publications, you agree to (i) reference the Science publication (R. Drmanac, et al. Science, 5 November 2009 (10.1126/science.1181498)) and (ii) provide the version number of the Complete Genomics assembly software from which the data were generated.
Disclaimer of Warranties
COMPLETE GENOMICS, INC. PROVIDES THESE DATA IN GOOD FAITH TO THE RECIPIENT “AS IS.” COMPLETE GENOMICS, INC. MAKES NO REPRESENTATION OR WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE OR USE, OR ANY OTHER STATUTORY WARRANTY. COMPLETE GENOMICS, INC. ASSUMES NO LEGAL LIABILITY OR RESPONSIBILITY FOR ANY PURPOSE FOR WHICH THE DATA ARE USED.
Any permitted redistribution of the data should carry the Disclaimer of Warranties provided above.
Data file formats and any corresponding API are expected to evolve over time. Complete Genomics cannot guarantee backward compatibility of any file format or API.