Machine learning could help to identify the viruses most likely to spill over from animals to people and cause future pandemics.
In February 2021, seven Russian poultry-farm workers were reported to have been infected with H5N8 avian influenza. This subtype of bird flu had never before been known to infect people, and the virus’s genetic sequence was quickly uploaded to the genetic data repository GISAID. For Colin Carlson, a biologist at Georgetown University in Washington DC, it presented an opportunity. “I immediately thought, ‘I want to run this through FluLeap’,” he says.
FluLeap is a machine-learning algorithm that uses sequence data to classify influenza viruses as either avian or human. The model had been trained on a huge number of influenza genomes — including examples of H5N8 — to learn the differences between those that infect people and those that infect birds. But the model had never seen an H5N8 virus categorized as human, and Carlson was curious to see what it made of this new subtype.
Somewhat surprisingly, the model identified it as human with 99.7% confidence. Rather than simply reproducing patterns in its training data, such as the fact that H5N8 viruses do not typically infect people, the model seemed to have inferred some biological signature of compatibility with humans. “It’s stunning that the model worked,” says Carlson. “But it’s one data point; it would be more stunning if I could do it a thousand more times.”
Most pandemics are caused by zoonotic spillover, in which viruses jump from wildlife to people. As climate change and human encroachment on animal habitats make these events more frequent, understanding zoonoses is crucial to efforts to prevent pandemics, or at least to prepare for them better.