Learning the language of viral evolution and escape
Science, 2021, 371, 284
DOI: 10.1126/science.abd7331
Abstract
- Viral escape patterns were modeled by machine learning algorithms.
- Viral escape mutations preserve viral infectivity but cause a virus to look different to the immune system.
- The algorithm predicts escape mutation patterns and provides a conceptual bridge between natural language and viral evolution.
Introduction
Previous: Escape mutations were identified with high-throughput experimental techniques.
This work: Train an algorithm to model viral escape from sequence data alone.
A computational model of protein evolution
Previous: Models focused on either fitness or function.
This work: Develop a single model that captures both simultaneously.
Method
Use machine learning algorithms called language models.
Treat a protein sequence as a sentence in a language.
The algorithm learns the probability of an amino acid given its sequence context.
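As a rough illustration of "the probability of an amino acid given its sequence context", the sketch below estimates that probability by simple counting over the immediate left and right neighbors. This is a stand-in chosen for brevity, not the neural language model used in the paper; the function names, the smoothing parameter `alpha`, and the toy sequences are all hypothetical.

```python
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def train_context_model(sequences):
    """Count how often each residue appears between a given left and right neighbor."""
    counts = defaultdict(Counter)
    for seq in sequences:
        padded = "^" + seq + "$"  # sentinel characters mark the sequence ends
        for i in range(1, len(padded) - 1):
            context = (padded[i - 1], padded[i + 1])
            counts[context][padded[i]] += 1
    return counts

def residue_probability(counts, left, residue, right, alpha=1.0):
    """Estimate p(residue | left, right) with add-alpha smoothing over 20 amino acids."""
    context_counts = counts.get((left, right), Counter())
    total = sum(context_counts.values()) + alpha * len(AMINO_ACIDS)
    return (context_counts[residue] + alpha) / total

# Hypothetical toy sequences, not real viral data.
toy_sequences = ["MKTAYIAK", "MKSAYIAK", "MKTAYLAK"]
model = train_context_model(toy_sequences)

p_seen = residue_probability(model, "K", "T", "A")    # T was observed between K and A
p_unseen = residue_probability(model, "K", "W", "A")  # W was never observed there
print(p_seen, p_unseen)
```

A residue that fits its context gets a higher probability than one that was never seen there. A real language model conditions on the whole sequence rather than two neighbors, but the quantity it outputs plays the same role.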
Semantic change -> antigenic change.
Grammaticality -> viral fitness
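The two mappings above suggest a way to rank candidate escape mutations: prefer mutations with a large semantic change that remain grammatical. The sketch below measures semantic change as the distance between embeddings of the wild-type and mutant sequences, takes grammaticality as the model's probability for the mutation, and combines the two by summing their ranks. The L1 distance, the rank-sum rule, the weight `beta`, and every mutation name and number are assumptions for illustration; check the paper for the exact scoring rule.

```python
def l1_distance(u, v):
    """Sum of absolute coordinate differences between two embeddings."""
    return sum(abs(a - b) for a, b in zip(u, v))

def rank_ascending(values):
    """Return ranks 1..n, where the largest value receives rank n."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def escape_scores(wildtype_embedding, candidates, beta=1.0):
    """Score each (name, embedding, probability) candidate by rank(semantic change) + beta * rank(grammaticality)."""
    semantic = [l1_distance(wildtype_embedding, emb) for _, emb, _ in candidates]
    grammatical = [prob for _, _, prob in candidates]
    sem_rank = rank_ascending(semantic)
    gram_rank = rank_ascending(grammatical)
    return {
        name: s + beta * g
        for (name, _, _), s, g in zip(candidates, sem_rank, gram_rank)
    }

# Hypothetical embeddings and probabilities, not model output.
wildtype = [0.0, 0.0]
candidates = [
    ("A10V", [0.1, 0.0], 0.30),   # small semantic change, grammatical
    ("K20E", [0.9, 0.8], 0.35),   # large semantic change, grammatical
    ("G30W", [1.0, 1.0], 0.01),   # large semantic change, ungrammatical
    ("L40I", [0.05, 0.0], 0.02),  # small semantic change, ungrammatical
]
scores = escape_scores(wildtype, candidates)
best = max(scores, key=scores.get)
print(best, scores)
```

Ranks are used instead of raw values so that the two quantities, which live on different scales, contribute comparably. In this toy input only `K20E` scores well on both axes, so it ranks first.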
Three viral proteins
Influenza A hemagglutinin (HA)
HIV-1 envelope glycoprotein (Env)
SARS-CoV-2 spike glycoprotein (Spike)
All three are found on the viral surface, bind host cells, are targeted by antibodies, and are drug targets.
Result
Understanding the semantic patterns
- When visualized, the semantic embeddings cluster by subtype and host species.
-> The algorithm recovered these virus classifications from sequences alone.
Language model grammaticality was significantly correlated with viral fitness.
The fitness dataset was obtained by measuring the dissociation constant between mutated viral proteins and human receptors.
-> Escape mutations keep grammaticality (fitness) the same while changing semantics (antigen recognition).
-> Limitation: the model has no information about post-translational modifications.
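The correlation between grammaticality and measured fitness noted above can be quantified with a rank correlation. The sketch below computes Spearman's rank correlation in plain Python; choosing Spearman is an assumption here (the notes only say "significantly correlated"), and the grammaticality and fitness values are hypothetical numbers in arbitrary units.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-mutant values, not data from the paper.
grammaticality = [0.01, 0.05, 0.20, 0.40, 0.10]
fitness = [-2.0, -1.5, 0.1, -0.2, -0.5]

rho = spearman(grammaticality, fitness)
print(round(rho, 3))
```

A rank correlation only asks whether more grammatical mutants tend to be fitter, without assuming a linear relationship between model probability and the experimental readout.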
Conclusion
- Viral escape patterns were predicted using a machine learning technique called a language model.
- The finding that evolutionary selection is reflected in sequence variation may generalize beyond viral escape to other forms of natural selection.
Comments