Continuous genomic diversification of long polynucleotide fragments drives the emergence of new SARS-CoV-2 variants of concern

March 10, 2022

Highly transmissible or immuno-evasive SARS-CoV-2 variants have intermittently emerged and outcompeted previously circulating strains, resulting in repeated COVID-19 surges, reinfections, and breakthrough infections in vaccinated individuals. With over 6 million SARS-CoV-2 genomes sequenced globally over the last 2 years, there is unprecedented data to decipher how competitive viral evolution results in the emergence of fitter SARS-CoV-2 variants. Much attention has been directed to studying how specific mutations in the Spike protein impact its binding to the ACE2 receptor or viral neutralization by antibodies, but there is limited knowledge of a genomic signature that is shared primarily by the sequential dominant variants. Here we introduce a methodology to quantify the genome-wide distinctiveness of polynucleotide fragments of various lengths (3- to 240-mers) that constitute SARS-CoV-2 sequences (freely available at …). Compared to standard phylogenetic distance metrics and overall mutational load, the quantification of distinctive 9-mer polynucleotides provides a higher resolution of separation between VOCs (Reference = 89, IQR: 65–108; Alpha = 166, IQR: 149–181; Beta = 131, IQR: 114–149; Gamma = 164, IQR: 150–178; Delta = 235, IQR: 217–255; and Omicron = 459, IQR: 395–521). Omicron's exceptionally high genomic distinctiveness may confer a competitive advantage over both prior VOCs (including Delta) and the recently emerged and highly mutated B.1.640.2 (IHU) lineage. Expanding on this analysis, evaluation of genomic distinctiveness weighted by intra-lineage 9-mer conservation for 883 lineages highlights that genomic distinctiveness has increased over time (R² = 0.37) and that VOCs score significantly higher than contemporary non-VOC lineages, with Omicron among the most distinctive lineages observed to date.
This study demonstrates the value of characterizing new SARS-CoV-2 variants by the newly introduced metric of genome-wide polynucleotide distinctiveness and emphasizes the need to go beyond a narrow set of mutations at known functionally or antigenically salient sites on the Spike protein. The consistently higher distinctiveness of each emerging VOC compared to prior VOCs suggests that real-time monitoring of genomic distinctiveness would aid in more rapid assessment of viral fitness.
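
As a rough illustration of the idea of counting distinctive 9-mers, the sketch below counts the k-mers in a query genome that do not occur in any sequence of a background set. This is only one plausible reading of "distinctiveness". The abstract does not specify the exact definition, the background set, or the conservation weighting, so the function names (`kmers`, `distinctive_kmer_count`), the choice to skip k-mers containing ambiguous bases, and the toy sequences are all assumptions made for illustration.

```python
from typing import Iterable, Set

VALID_BASES = set("ACGT")


def kmers(sequence: str, k: int = 9) -> Set[str]:
    """Return the set of k-mers in a sequence, skipping any k-mer
    that contains a base other than A, C, G, or T (assumed policy)."""
    seq = sequence.upper()
    found = set()
    for i in range(len(seq) - k + 1):
        fragment = seq[i:i + k]
        if set(fragment) <= VALID_BASES:
            found.add(fragment)
    return found


def distinctive_kmer_count(query: str, background: Iterable[str], k: int = 9) -> int:
    """Count k-mers present in the query but absent from every
    background sequence (hypothetical definition of distinctiveness)."""
    seen: Set[str] = set()
    for sequence in background:
        seen |= kmers(sequence, k)
    return len(kmers(query, k) - seen)


if __name__ == "__main__":
    # Toy sequences, not real SARS-CoV-2 genomes; k=4 keeps the example small.
    background = ["ACGTACGTACGT"]
    query = "ACGTACGTACGA"
    print(distinctive_kmer_count(query, background, k=4))
```

In this toy case the only 4-mer in the query that the background lacks is `ACGA`, so the count is 1. A real analysis would run over full genomes with k = 9 and a much larger background set, where a single substitution can create up to k new k-mers.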


Karthik Murugadoss, Michiel J.M. Niesen, Bharathwaj Raghunathan, Patrick J. Lenehan, Pritha Ghosh, Tyler Feener, Praveen Anand, Safak Simsek, Rohit Suratekar, Travis K. Hughes, Venky Soundararajan

nference, Cambridge, MA 02142, USA


Correspondence to:

Venky Soundararajan (