Applications that use artificial intelligence and machine learning techniques present unique challenges to testers. These systems are largely black boxes that use multiple algorithms—sometimes hundreds of them—to process data in a series of layers and return a result.
While testing can be a complex endeavor for any application, at a fundamental level it involves verifying that the results returned are those expected for a given input. With AI/ML systems, that's a problem: the software returns an answer, but testers have no independent way to determine whether it's the correct one, because they don't necessarily know what the right answer should be for a given set of inputs.
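A minimal sketch makes the contrast concrete. In a conventional test, the expected output is known exactly; for an ML prediction, there is no oracle to assert against. The spam classifier below is a hypothetical stand-in, shown only for illustration:

```python
# Conventional unit test: one input, one known expected output.
def to_celsius(fahrenheit):
    return (fahrenheit - 32) * 5 / 9

assert to_celsius(212) == 100  # pass/fail is unambiguous

# ML-style prediction: the tester has no oracle for the "right" answer.
# (spam_model is a hypothetical trained classifier, not a real library.)
# label = spam_model.predict(["Limited-time offer, click now!"])
# assert label == ???  # there is no single expected value to assert against
```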
In fact, some application results may be laughable. E-commerce recommendation engines, for example, often get individual recommendations wrong, but as long as the recommendations collectively induce shoppers to add items to their carts, the engines are considered a business success. So how do you determine whether your ML application achieves the needed level of success before deployment?
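One common approach, sketched below under assumed details, is to stop judging each individual answer and instead measure aggregate behavior on a labeled holdout set, releasing only if a business-defined metric clears an agreed threshold. The toy model, data, and 90% accuracy threshold here are illustrative assumptions, not a prescription:

```python
def evaluate_for_release(model, holdout_inputs, holdout_labels, threshold=0.90):
    """Accept the model for deployment only if aggregate accuracy on a
    labeled holdout set clears a business-defined threshold."""
    predictions = [model(x) for x in holdout_inputs]
    correct = sum(p == y for p, y in zip(predictions, holdout_labels))
    accuracy = correct / len(holdout_labels)
    return accuracy >= threshold, accuracy

# Illustrative use with a trivial stand-in "model":
model = lambda score: "buy" if score > 0.5 else "skip"
ok, acc = evaluate_for_release(model,
                               holdout_inputs=[0.9, 0.2, 0.7, 0.4],
                               holdout_labels=["buy", "skip", "buy", "buy"])
print(f"accuracy={acc:.2f}, release={'yes' if ok else 'no'}")
```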