AI algorithms rely on reliable data to generate optimal results – if the data is biased, incomplete, insufficient, or inaccurate, the consequences can be devastating.
AI systems that identify patient diseases are an excellent example of how poor data quality can lead to adverse outcomes. When fed insufficient or low-quality data, these systems produce false diagnoses and inaccurate predictions, resulting in misdiagnoses and delayed treatments. For example, a University of Cambridge study of over 400 tools developed for diagnosing Covid-19 found that none were fit for clinical use, largely because of flawed datasets.
In other words, your AI initiatives will have devastating real-world consequences if your data isn’t good enough.
What Does “Good Enough” Data Mean?
There is quite a debate over what “good enough” data means. Some say good enough data doesn’t exist. Others say the pursuit of good data causes analysis paralysis – while HBR states outright that if your data is bad, your machine learning tools are useless.
At WinPure, we define good enough data as “complete, accurate, valid data that can be confidently used for business processes with acceptable risks, the level of which is subject to the individual objectives and circumstances of a business.”
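To make that definition concrete, the sketch below scores a small dataset on completeness and validity and compares the result to an acceptable-risk threshold. The field names, validation rules, and the 0.8 threshold are illustrative assumptions for this example, not part of any WinPure product or standard.

```python
# Illustrative sketch: scoring a dataset against "good enough" thresholds.
# Fields, rules, and the threshold are assumptions chosen for this example.

records = [
    {"email": "a@example.com", "age": 34},
    {"email": None, "age": 29},            # incomplete: missing email
    {"email": "not-an-email", "age": -5},  # invalid email and invalid age
]

def completeness(records, field):
    """Share of records where the field is present and non-empty."""
    return sum(1 for r in records if r.get(field) not in (None, "")) / len(records)

def validity(records, field, rule):
    """Share of records whose field value passes a validation rule."""
    return sum(1 for r in records if rule(r.get(field))) / len(records)

email_complete = completeness(records, "email")
email_valid = validity(records, "email", lambda v: isinstance(v, str) and "@" in v)
age_valid = validity(records, "age", lambda v: isinstance(v, int) and 0 <= v <= 120)

# "Acceptable risk" is a business decision; 0.8 is an assumed threshold here.
THRESHOLD = 0.8
good_enough = min(email_complete, email_valid, age_valid) >= THRESHOLD
```

In this toy dataset, email completeness is 2/3 and email validity only 1/3, so the data falls short of the threshold. The point is that “good enough” is measurable: each quality dimension gets a score, and the pass/fail bar reflects the risk a given business is willing to accept.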
Most companies struggle with data quality and governance more than they admit. Adding to the tension, they are overwhelmed and under immense pressure to deploy AI initiatives to stay competitive. Sadly, this means problems like dirty data don’t even reach boardroom discussions until they cause a project to fail.