Consortium Produces Pangenome Reference Spanning Dozens of Chinese Populations

NEW YORK – Members of the Chinese Pangenome Consortium (CPC) have published new research on dozens of populations in China, assembling a draft pangenome reference that contains previously unreported sequences, including some originating in archaic hominins.

The findings appeared in Nature on Wednesday.

"[W]e attempted to uncover missing sequences and hidden variations that have not been identified before in Chinese ethnic groups," co-senior and co-corresponding author Shuhua Xu, a human population omics group researcher affiliated with Fudan University, the Chinese Academy of Sciences, ShanghaiTech University, and Jiangsu Normal University, said in an email, adding that the CPC pangenome reference "undoubtedly provides a more comprehensive understanding of genomic variation in Asian populations, particularly those of Chinese ancestry."

Using Pacific Biosciences or Oxford Nanopore Technologies long sequence reads, together with linked reads, Hi-C data, and Illumina short reads, the researchers put together 116 high-quality, haplotype-phased de novo genome assemblies from 58 individuals representing three dozen minority Chinese ethnic groups that have been underrepresented in prior research efforts.

In the process, they unearthed some 15.9 million small variants, including single-nucleotide variants and small insertions or deletions, along with 78,000 structural variants (SVs). Of these, around 5.9 million small variants and 34,000 SVs had not been identified in the past.

Relative to the GRCh38 reference sequence, the team's new pangenome adds 189 million bases of polymorphic sequence in euchromatic regions, Xu explained, and includes 1,367 duplications involving protein-coding gene sequences.

"[A]bout 18.4 percent of the small variants and 17.1 percent of the SVs identified were specific to the CPC assemblies compared with a recently released pangenome reference by the Human Pangenome Reference Consortium (HPRC)," he explained. "These newly identified genomic variations are more informative and thus can facilitate uncovering finer-scale population relationships, as the majority of the novel variations are population-specific."

The team also found that Chinese genome sequences aligned better to the CPC pangenome reference than to the available HPRC reference.

"Compared with the HPRC graph reference, using the CPC graph reference improved the perfect alignment rate of short reads in East Asian samples," Xu noted, explaining that this improved alignment "would also help to improve the accuracy of profiling parts of the genome enriched with complex sequence variations (such as genes regulating the immune system)."

The new sequence data is expected to improve understanding of the sequences behind specific traits and conditions, Xu explained, helping to account for missing heritability in complex diseases and to trace associations back to genes and genetic variants that originated in archaic hominins.

"Overall, such efforts would aid genomic analysis for human evolutionary and medical research," Xu said, noting that the current work "is just the first step" toward the team's goal of establishing a comprehensive high-quality genome reference for populations in China and other parts of Asia.
