At the heart of Britain’s publicly funded health-care system lies a contradiction. The National Health Service generates and holds vast swathes of data on Britons’ health, organised using NHS numbers assigned to every person in its care. The system enables world-leading studies, like the RECOVERY trial during the pandemic, which discovered treatments for covid-19. You might suppose it to be a treasure trove for artificial-intelligence (AI) developers eager to bring their models to bear on improving human health. Yet if you put this to a developer they will roll their eyes and tell you why all is not as rosy as it seems.
That is because the kinds of tabular data that inform clinical trials (who took which drug, what the outcome was) are not the same as those most useful for training machine-learning models, such as scans or genomes, which hold more information about a patient. Much of this sort of NHS data is a mess, organised in ways which serve doctors treating patients, but not AI developers hoping to feed it to computers. Making it suitable for those models is a task with which the NHS has not yet come to grips. It is often easier for those seeking to organise these richer data to start from scratch, as with a vast data-collection exercise now under way.