Learning the language of viral evolution and escape


Science, 2021, 371, 284
DOI: 10.1126/science.abd7331

Abstract

  • Viral escape patterns were modeled by machine learning algorithms.
  • Viral escape mutations preserve viral infectivity but cause a virus to look different to the immune system.
  • The algorithm can predict escape mutation patterns and represents a conceptual bridge between natural language and viral evolution.

Introduction

Goal: to understand viral escape patterns.

Previous: High-throughput experimental techniques

This work: train an algorithm to model viral escape

A computational model of protein evolution

Previous: models focused on either fitness or function

This work: a single model that captures both simultaneously

Method

The method uses machine learning algorithms called language models.

A protein sequence is treated as text in a language.

The algorithm learns the probability of an amino acid given its sequence context.
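
To make "the probability of an amino acid given its sequence context" concrete, here is a minimal sketch. It swaps in a simple neighbor-counting model for the paper's neural language model, which conditions on the whole sequence; the toy sequences and the function names `train_context_model` and `residue_probability` are made up for illustration.

```python
from collections import Counter, defaultdict

def train_context_model(sequences):
    """Count which residue appears between each (left, right) neighbor pair."""
    counts = defaultdict(Counter)
    for seq in sequences:
        padded = "^" + seq + "$"  # boundary markers for the sequence ends
        for i in range(1, len(padded) - 1):
            context = (padded[i - 1], padded[i + 1])
            counts[context][padded[i]] += 1
    return counts

def residue_probability(counts, left, right, residue):
    """Estimate p(residue | left, right); returns 0.0 for an unseen context."""
    seen = counts.get((left, right))
    if not seen:
        return 0.0
    return seen[residue] / sum(seen.values())

# Made-up toy sequences, not real viral proteins.
toy_sequences = ["MKTAY", "MKSAY", "MKTAF"]
model = train_context_model(toy_sequences)

p_t = residue_probability(model, "K", "A", "T")  # T seen twice between K and A
p_s = residue_probability(model, "K", "A", "S")  # S seen once between K and A
print(p_t, p_s)
```

A residue that is frequent in a given context gets a high probability, which is the quantity these notes later call grammaticality.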

Semantic change -> antigenic change.

Grammaticality -> viral fitness
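
The two quantities above can be sketched as small functions. This assumes that semantic change is measured as an L1 distance between the embeddings of the original and mutated sequences, and that grammaticality is the model's probability of the mutant residue at its position. The embedding vectors and probabilities below are made-up numbers, not model outputs.

```python
def semantic_change(z_wildtype, z_mutant):
    """L1 distance between two embedding vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(z_wildtype, z_mutant))

def grammaticality(position_probs, position, residue):
    """Probability of `residue` at `position`, given the original sequence."""
    return position_probs[position].get(residue, 0.0)

# Made-up embeddings for an original sequence and a single-residue mutant.
z_wt = [0.2, -0.1, 0.5]
z_mut = [0.1, 0.3, 0.5]

# Made-up per-position residue probabilities from a hypothetical model.
probs = {10: {"A": 0.70, "V": 0.25, "P": 0.05}}

delta = semantic_change(z_wt, z_mut)   # larger = more change in meaning
p_mut = grammaticality(probs, 10, "V")  # larger = more plausible residue
print(delta, p_mut)
```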

Three viral proteins studied

Influenza A hemagglutinin (HA)

HIV-1 envelope glycoprotein (Env)

SARS-CoV-2 spike glycoprotein (Spike)

All three are found on the viral surface, are responsible for binding host cells, are targeted by antibodies, and are drug targets.

Result

Understanding the semantic patterns
  • Visualized semantic embeddings correspond to subtype and host species.

-> The algorithm recovered the classification of viruses from sequences alone.

The relationship between viral fitness and language model grammaticality

Language model grammaticality was significantly correlated with viral fitness.
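
One common way to quantify a relationship like this is a rank correlation. The sketch below computes a Spearman correlation in plain Python; whether it matches the exact statistic used in the paper is an assumption, and the grammaticality and fitness values are made-up toy numbers.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Pearson correlation of the ranks (undefined if either input is constant)."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Made-up values for four hypothetical mutants.
toy_grammaticality = [0.01, 0.20, 0.05, 0.60]
toy_fitness = [0.1, 0.7, 0.4, 0.9]

rho = spearman(toy_grammaticality, toy_fitness)
print(rho)
```

A rank correlation only asks whether mutants with higher grammaticality also tend to have higher measured fitness, so the two quantities do not need to share a scale.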

The fitness dataset was obtained by measuring the dissociation constant between mutated viral proteins and human receptors.

Prediction of mutations that lead to viral escape by combining semantic change and grammaticality

-> Escape candidates keep grammaticality (fitness) high while changing semantics (antigen recognition)

-> Limitation: the model has no information about post-translational modifications
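
The combination of semantic change and grammaticality described in this section can be sketched as a rank-based score: rank every candidate mutation by each quantity, then add the ranks. The weighting parameter `beta`, the function names, and the mutation labels and values below are illustrative assumptions, not the paper's data.

```python
def rank_ascending(values):
    """Rank 1 = smallest value, so larger values receive larger ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def escape_priority(mutations, semantic_changes, grammaticalities, beta=1.0):
    """Sort mutations by rank(semantic change) + beta * rank(grammaticality)."""
    sem_rank = rank_ascending(semantic_changes)
    gram_rank = rank_ascending(grammaticalities)
    scores = [s + beta * g for s, g in zip(sem_rank, gram_rank)]
    # Highest combined score first: large semantic change AND high grammaticality.
    return sorted(zip(mutations, scores), key=lambda pair: pair[1], reverse=True)

# Hypothetical mutations with made-up scores.
mutations = ["A10V", "K20E", "S30P"]
semantic = [0.2, 0.9, 0.8]
grammar = [0.5, 0.4, 0.001]

ranked = escape_priority(mutations, semantic, grammar)
for name, score in ranked:
    print(name, score)
```

In this toy example "S30P" changes the embedding a lot but is very unlikely under the model, so it ranks below "K20E", which scores reasonably on both axes. Using ranks rather than raw values avoids having to put the two quantities on a common scale.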

Conclusion

  • Viral escape patterns were predicted using a machine learning technique called a language model.
  • The finding that evolutionary selection is reflected in sequence variation may generalize beyond viral escape to other forms of natural selection.
