Google Blog: Analyzing 3K rice genomes characterized by DeepVariant

The 3,000 Rice Genomes Project (3K RGP) is a collaborative, international research program that has sequenced 3,024 rice varieties from 89 countries. This massive dataset is a powerful resource for understanding natural genetic variation in rice as well as for large-scale discovery of new genes associated with economically important traits. It will help accelerate the development of improved rice varieties around the globe to feed a growing population, estimated to reach more than 9.6 billion by 2050, with half of humanity relying on rice for sustenance and livelihood.

Three research institutions, the International Rice Research Institute (IRRI), the Chinese Academy of Agricultural Sciences (CAAS), and the Beijing Genomics Institute (BGI) Shenzhen, collaborated to sequence the genomes of 3,024 rice varieties and lines housed in the IRRI (82%) and CAAS (18%) genebanks. The sequencing and initial analysis were funded by grants from the Bill & Melinda Gates Foundation and the Chinese Ministry of Science and Technology. This dataset contains millions of genomic sequences from a diverse set of rice varieties. Combined with phenotyping observations, gene expression, and other information, it provides an important step in establishing gene-trait associations, building predictive models, and applying these models to breeding.

The result of this collaboration to sequence and characterize the genomic variation of the Rice 3K dataset was published in April 2018.

In this blog post, Google engineers and analysts explore the identification and analysis of different rice genome mutations with a tool called DeepVariant. They conducted a re-analysis of the Rice 3K dataset and have made the data publicly available as part of the Google Cloud Public Dataset Program pre-publication and under the terms of the Toronto Statement.