Metagenome sampling biased toward particular geographical locations and lifestyles is partially responsible for the incomplete catalog of reference genomes for gut microbial species. Thus, genome assembly from currently under-represented populations may effectively expand the reference gut microbiome and improve taxonomic and functional profiling.
We assembled genomes using public whole-metagenomic shotgun sequencing (WMS) data for 110 and 645 fecal samples from India and Japan, respectively. In addition, we assembled genomes from newly generated WMS data for 90 fecal samples collected from Korea. Because genome assembly for low-abundance species may require much deeper sequencing than is usually employed, we performed ultra-deep WMS (> 30 Gbp or > 100 million read pairs) for the fecal samples from Korea. We consequently assembled 29,082 prokaryotic genomes from 845 fecal metagenomes from the three under-represented Asian countries and combined them with the Unified Human Gastrointestinal Genome (UHGG) to generate an expanded catalog, the Human Reference Gut Microbiome (HRGM).
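The sample counts and the depth threshold above can be checked with simple arithmetic. The following is a minimal sketch, not part of the study's pipeline: it assumes 150-bp paired-end reads (the read length is not stated in the text), under which 100 million read pairs correspond to 30 Gbp. The names `READ_LENGTH_BP` and `read_pairs_to_gbp` are hypothetical and introduced only for illustration.

```python
# Assumption: 150-bp paired-end reads; the read length is not given in the text.
READ_LENGTH_BP = 150


def read_pairs_to_gbp(read_pairs: int, read_length_bp: int = READ_LENGTH_BP) -> float:
    """Convert a paired-end read-pair count to total sequenced bases in Gbp."""
    # Each pair contributes two reads of read_length_bp bases.
    return read_pairs * 2 * read_length_bp / 1e9


# Fecal sample counts per country, as reported in the text.
samples = {"India": 110, "Japan": 645, "Korea": 90}
total_samples = sum(samples.values())

print(total_samples)                    # 845 metagenomes in total
print(read_pairs_to_gbp(100_000_000))   # 30.0 Gbp under the 150-bp assumption
```

With a different read length the two thresholds would no longer coincide, which is why the assumption is made explicit here.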
HRGM contains 232,098 non-redundant genomes for 5414 representative prokaryotic species including 780 that are novel, > 103 million unique proteins, and > 274 million single-nucleotide variants. This represents an increase of more than 10% over the UHGG. The 780 novel species were enriched for the Bacteroidaceae family, including species associated with high-fiber and seaweed-rich diets. Single-nucleotide variant density was positively associated with the speciation rate of gut commensals. We found that ultra-deep sequencing facilitated the assembly of genomes for low-abundance taxa, and that deep sequencing (e.g., > 20 million read pairs) may be needed for the profiling of low-abundance taxa. Importantly, the HRGM significantly improved the taxonomic and functional classification of sequencing reads from fecal samples. Finally, analysis of human self-antigen homologs in the HRGM species genomes suggested that bacterial taxa with high cross-reactivity potential may contribute more to the pathogenesis of gut microbiome-associated diseases than those with low cross-reactivity potential by promoting inflammatory conditions.
By including gut metagenomes from previously under-represented Asian countries (Korea, India, and Japan), we developed a substantially expanded microbiome catalog, HRGM. Information on the microbial genomes and coding genes is publicly available (www.mbiomenet.org/HRGM/). HRGM will facilitate the identification and functional analysis of disease-associated gut microbiota.