The US National Institutes of Health deleted gene sequences of early Covid-19 cases from a key scientific database at the request of Chinese researchers, a Seattle-based virologist has claimed.
Jesse Bloom, a virologist at the Fred Hutchinson Cancer Research Center in Seattle, described the removal of the sequencing data in a new paper posted online on bioRxiv on Tuesday.
The paper, which has not been peer-reviewed, raises concerns that the absence of these key gene sequences may hamper scientists’ ongoing probe into the origin of the pandemic.
The paper claims that Chinese researchers took virus samples from some of the earliest Covid patients in Wuhan in January and February of 2020, then posted the viral sequences to a widely used US database. After three months, the sequences were removed to “obscure their existence”, an editorial in the journal Science reported on Wednesday.
“Here I identify a data set containing SARS-CoV-2 sequences from early in the Wuhan epidemic that has been deleted from the NIH’s Sequence Read Archive,” Bloom posted on bioRxiv.
“I recover the deleted files from the Google Cloud, and reconstruct partial sequences of 13 early epidemic viruses. Phylogenetic analysis of these sequences in the context of carefully annotated existing data suggests that the Huanan Seafood Market sequences that are the focus of the joint WHO-China report are not fully representative of the viruses in Wuhan early in the epidemic.
“Instead, the progenitor of known SARS-CoV-2 sequences likely contained three mutations relative to the market viruses that made it more similar to SARS-CoV-2’s bat coronavirus relatives,” Bloom wrote.
Meanwhile, the US NIH has confirmed that it deleted the sequences after receiving a request from a Chinese researcher who had submitted them three months earlier, the Wall Street Journal reported on Wednesday.
“Submitting investigators hold the rights to their data and can request withdrawal of the data,” the NIH said in a statement.
The scientist “indicated the sequence information had been updated, was being submitted to another database, and wanted the data removed from SRA to avoid version control issues,” NIH said.
Bloom said he started his research into the origins of the pandemic after a team led by the World Health Organization submitted its report early in March this year. The report was heavily criticized by many scientists for deeming it “extremely unlikely” that SARS-CoV-2 escaped from a laboratory.
Bloom’s search led him to a study that listed all SARS-CoV-2 sequences submitted before March 31, 2020, to the Sequence Read Archive (SRA) — a database overseen by the National Center for Biotechnology Information, a division of NIH. But when he checked SRA for one of the listed projects, he couldn’t find its sequences, the Science report said.
Further research led him to another study, by Ming Wang from Wuhan University’s Renmin Hospital, China, which was published in the journal Small. While the paper lists some of the earliest Wuhan Covid patients and the specific mutations in their viruses, it doesn’t give the full sequence data.
Additional internet sleuthing led Bloom to discover that SRA backs up its information on Google’s Cloud Platform, and a search there turned up files containing some of the earlier data submissions from Wang’s team.
The paper in Small makes no mention of any corrections to the viral sequences that might explain why they were removed from SRA. That led Bloom to conclude in his preprint that “the trusting structures of science have been abused to obscure sequences relevant to the early spread of SARS-CoV-2 in Wuhan”, the report said.