AI models for health care that predict disease are not as accurate as reports might suggest. Here’s why

We use tools that rely on artificial intelligence (AI) every day, with voice assistants like Alexa and Siri being among the most common. These consumer products work reasonably well—Siri understands most of what we say—but they are by no means perfect. We accept their limitations and adapt how we use them until they get the right answer, or we give up. After all, the consequences of Siri or Alexa misunderstanding a user request are usually minor.

However, mistakes by AI models that support doctors’ clinical decisions can mean life or death. Therefore, it’s critical that we understand how well these models work before deploying them. Published reports of this technology currently paint an overly optimistic picture of its accuracy, which at times translates into sensationalized stories in the press. The media are rife with discussions of algorithms that can diagnose early Alzheimer’s disease with up to 74 percent accuracy or that are more accurate than clinicians. The scientific papers detailing such advances may become foundations for new companies, new investments and lines of research, and large-scale implementations in hospital systems. In most cases, however, the technology is not ready for deployment.

Here’s why: As researchers feed more data into AI models, the models are expected to become more accurate, or at least not get worse. However, our work and the work of others have identified the opposite: the reported accuracy of published models decreases as data set size increases.

The cause of this counterintuitive scenario lies in how the reported accuracy of a model is estimated and reported by scientists. Under best practices, researchers train their AI model on a portion of their data set, holding the rest in a “lockbox.” They then use that “held-out” data to test their model for accuracy. For example, say an AI program is being developed to distinguish people with dementia from people without it by analyzing how they speak. The model is developed using training data that consist of spoken language samples and dementia diagnosis labels, to predict whether a person has dementia from their speech. It is then tested against held-out data of the same type to estimate how accurately it will perform. That estimate of accuracy then gets reported in academic publications; the higher the accuracy on the held-out data, the better the scientists say the algorithm performs.
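The held-out evaluation described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration of the general practice, not the actual model from any study: the single numeric feature standing in for a speech measurement, the 80/20 split, and the threshold classifier are all assumptions made for demonstration.

```python
import random

# Hypothetical data: each sample is (feature, label), where label 1 means
# "dementia". The feature is a stand-in for a single number extracted from
# a speech sample (an illustrative assumption, not a real speech measure).
random.seed(0)
labels = [random.randint(0, 1) for _ in range(200)]
samples = [(random.gauss(1.0 if y else 0.0, 0.5), y) for y in labels]

# Best practice: split off a held-out "lockbox" set before any training.
random.shuffle(samples)
split = int(0.8 * len(samples))
train, held_out = samples[:split], samples[split:]

# "Train" a trivial threshold classifier using the training portion only.
threshold = sum(x for x, _ in train) / len(train)
predict = lambda x: 1 if x > threshold else 0

# Estimate accuracy on the held-out data. This estimate is the number
# that typically gets reported in publications.
correct = sum(predict(x) == y for x, y in held_out)
accuracy = correct / len(held_out)
print(f"held-out accuracy: {accuracy:.2f}")
```

The key point is that the model never sees the held-out samples during training; the accuracy computed on them is meant to approximate performance on genuinely new patients.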

Published by Veille-cyber