Moon Young Kim (a,1), Sunghoon Lee (b,1,2), Kyujung Van (a,1), Tae-Hyung Kim (b,1,2), Soon-Chun Jeong (c), Ik-Young Choi (d), Dae-Soo Kim (b), Yong-Seok Lee (b), Daeui Park (b), Jianxin Ma (e), Woo-Yeon Kim (b), Byoung-Chul Kim (b), Sungjin Park (b), Kyung-A Lee (b), Dong Hyun Kim (a), Kil Hyun Kim (a), Jin Hee Shin (a), Young Eun Jang (a), Kyung Do Kim (a), Wei Xian Liu (a), Tanapon Chaisan (a), Yang Jae Kang (a), Yeong-Ho Lee (a), Kook-Hyung Kim (f), Jung-Kyung Moon (g), Jeremy Schmutz (h), Scott A. Jackson (e), Jong Bhak (b,2,3), and Suk-Ha Lee (a,i,3)
(a) Department of Plant Science and Research Institute for Agriculture and Life Sciences, Seoul National University, Seoul 151-921, Korea;
(b) Korean Bioinformation Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 305-806, Korea;
(c) Bio-Evaluation Center, Korea Research Institute of Bioscience and Biotechnology, Chungbuk 363-883, Korea;
(d) National Instrumentation Center for Environmental Management, Seoul National University, Seoul 151-921, Korea;
(e) Department of Agronomy, Purdue University, West Lafayette, IN 47906;
(f) Department of Agricultural Biotechnology, Seoul National University, Seoul 151-921, Korea;
(g) Rural Development Administration, Gyeonggi 441-770, Korea;
(h) HudsonAlpha Genome Sequencing Center, Huntsville, AL 35806; and
(i) Plant Genomics and Breeding Institute, Seoul National University, Seoul 151-921, Korea
Edited* by Ronald L. Phillips, University of Minnesota, St. Paul, MN, and approved October 29, 2010 (received for review July 12, 2010)
The genome of soybean (Glycine max), a commercially important crop, was recently sequenced, making soybean one of six crop species with a sequenced genome. Here we report the genome sequence of G. soja, the undomesticated ancestor of G. max (in particular, G. soja var. IT182932). We aligned 48.8 Gb of Illumina Genome Analyzer (Illumina-GA) short DNA reads to the G. max reference genome and determined a consensus sequence for G. soja. This consensus sequence spanned 915.4 Mb, representing a coverage of 97.65% of the published G. max genome sequence at an average mapping depth of 43-fold. The nucleotide sequence of the G. soja genome, which contains 2.5 Mb of substituted bases and 406 kb of small insertions/deletions relative to G. max, differs from that of G. max by ∼0.31%. In addition to the mapped 915.4-Mb consensus sequence, we detected 32.4 Mb of large deletions and 8.3 Mb of novel sequence contigs in the G. soja genome. Validation of the G. soja versus G. max nucleotide variants by Roche Genome Sequencer FLX sequencing showed 99.99% concordance for single-nucleotide polymorphism calls and 98.82% agreement for insertion/deletion calls made from the Illumina-GA reads. The data presented in this study suggest that the G. soja/G. max complex may be at least 0.27 million y old, and therefore appeared well before the relatively recent domestication event (6,000 to 9,000 y ago). This finding suggests that soybean domestication is complicated and that more in-depth study of its population genetics is needed. In any case, genome comparison of the domesticated and undomesticated forms of soybean can facilitate soybean improvement.
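The figures in the abstract can be cross-checked with a little arithmetic. The sketch below is only an illustration, not the authors' procedure. The abstract does not say which denominator underlies the ∼0.31% difference, so both candidates are computed: the 915.4-Mb consensus and the reference size implied by the 97.65% coverage figure. All variable and function names are hypothetical.

```python
# Back-of-the-envelope check of the abstract's figures (all sizes in Mb).
# Assumption: the ~0.31% difference is (substitutions + small indels)
# divided by a genome length; the abstract does not state which length.

CONSENSUS_MB = 915.4        # mapped G. soja consensus sequence
COVERAGE_FRACTION = 0.9765  # share of the G. max reference covered
SUBSTITUTIONS_MB = 2.5      # substituted bases relative to G. max
SMALL_INDELS_MB = 0.406     # small insertions/deletions (406 kb)


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Reference length implied by "915.4 Mb is 97.65% of the reference".
implied_reference_mb = CONSENSUS_MB / COVERAGE_FRACTION

# Total length of small-scale variation.
variant_mb = SUBSTITUTIONS_MB + SMALL_INDELS_MB

# Difference under each candidate denominator.
diff_vs_consensus = percent(variant_mb, CONSENSUS_MB)
diff_vs_reference = percent(variant_mb, implied_reference_mb)

print(f"Implied reference size: {implied_reference_mb:.1f} Mb")
print(f"Difference vs. consensus: {diff_vs_consensus:.3f}%")
print(f"Difference vs. implied reference: {diff_vs_reference:.3f}%")
```

With these rounded inputs, the two ratios come out at roughly 0.317% and 0.310%, so either denominator is broadly consistent with the reported ∼0.31%.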
massively parallel sequencing, sequence variation, wild soybean, divergence, genome duplication
(3) To whom correspondence may be addressed.
Author contributions: M.Y.K., K.V., S.-C.J., J.M., J.B., and S.-H.L. designed research; I.-Y.C., D.-S.K., D.H.K., K.H.K., J.H.S., Y.E.J., K.D.K., W.X.L., T.C., and Y.-H.L. performed research; S.L., T.-H.K., D.-S.K., Y.-S.L., D.P., W.-Y.K., B.-C.K., S.P., K.-A.L., and Y.J.K. analyzed data; and M.Y.K., S.L., K.V., T.-H.K., S.-C.J., K.-H.K., J.-K.M., J.S., S.A.J., J.B., and S.-H.L. wrote the paper.
(1) M.Y.K., S.L., K.V., and T.-H.K. contributed equally to this work.
(2) Present address: Personal Genomics Institute, Suwon, Gyeonggi 433-759, Korea.
*This Direct Submission article had a prearranged editor.
The authors declare no conflict of interest.
Database deposition: The sequence data from this study have been deposited in the National Center for Biotechnology Information Short Read Archive, www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi (accession no. SRA009252).
This article contains supporting information online at