Companies rely on real-world data to train artificial-intelligence models that can identify anomalies, make predictions and generate insights. But often, it isn’t enough.
To detect credit-card fraud, for example, researchers train AI models to look for specific patterns of known suspicious behavior, gleaned from troves of data. But unique, or rare, types of fraud are difficult to detect when there isn’t enough data to support the algorithm’s training.
To get around that, companies are learning to fake it, building so-called synthetic data sets designed to augment training data.
Rare or uncommon types of fraud can be overlooked by American Express’s AI-based fraud-detection model if the algorithms don’t have enough training examples of that type of fraud, Mr. Efimov said.
“There are a lot of different kinds of patterns, the number of fraud patterns in real life is pretty big,” Mr. Efimov said. “Some fraud patterns happen more often than others, and some patterns are very rare.”
American Express is working on improving these models by experimenting with generative adversarial networks, a technique to create synthetic data on uncommon fraud patterns. That data is then used to augment the company’s existing data set of fraud behaviors to improve its overall AI-based fraud-detection models.
“We started thinking, can we balance the presence of different fraud patterns? That’s where [generative adversarial networks] come up,” he said.
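As a rough illustration of what balancing fraud patterns could mean in practice, the sketch below counts how many examples of each pattern a data set holds and how many synthetic examples would bring every pattern up to the most common one. The pattern names, the counts and the `synthetic_needed` helper are hypothetical, and matching the largest class is only one simple assumption, not a description of American Express’s method.

```python
from collections import Counter


def synthetic_needed(pattern_labels, target=None):
    """Return how many synthetic examples each pattern would need.

    Assumption: every pattern is topped up to `target` examples,
    which defaults to the size of the most common pattern.
    """
    counts = Counter(pattern_labels)
    if not counts:
        return {}
    if target is None:
        target = max(counts.values())
    # Patterns already at or above the target need nothing extra.
    return {pattern: max(0, target - n) for pattern, n in counts.items()}


# Hypothetical labels for an imbalanced fraud data set.
labels = (
    ["card_not_present"] * 900
    + ["account_takeover"] * 90
    + ["rare_pattern"] * 10
)
needed = synthetic_needed(labels)
# needed == {"card_not_present": 0, "account_takeover": 810, "rare_pattern": 890}
```

A lower `target` can be passed in to generate less synthetic data per pattern.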
A generative adversarial network is an AI technique commonly used to create simulation data to train the underlying AI models that power self-driving cars. It is also used to create deepfakes, that is, photographs or videos of people that are often indistinguishable from reality.
One AI model acts as a “generator” that produces new data, and a second model tries to determine whether the data is real or fake, Mr. Efimov said. In the “perfect” generative adversarial network, the second model cannot tell fake data from real, he said.
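The two-model setup can be sketched in a few dozen lines. The toy example below is a minimal illustration under stated assumptions, not the company’s system: the “real” data is a single made-up number per record, the generator is a straight-line function of random noise, and the second model is a one-variable logistic classifier. All names and settings (`train_toy_gan`, `generate`, the learning rate, the step count) are hypothetical.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(real_samples, steps=2000, lr=0.01, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator: g(z) = a*z + b
    w, c = 0.1, 0.0  # second model: D(x) = sigmoid(w*x + c)
    for _ in range(steps):
        x_real = rng.choice(real_samples)
        z = rng.gauss(0.0, 1.0)
        x_fake = a * z + b

        # Second model's step: raise log D(real) + log(1 - D(fake)),
        # i.e. get better at telling real from fake.
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        w += lr * ((1.0 - d_real) * x_real - d_fake * x_fake)
        c += lr * ((1.0 - d_real) - d_fake)

        # Generator's step: raise log D(fake), i.e. get better at
        # fooling the updated second model.
        d_fake = sigmoid(w * x_fake + c)
        grad_x = (1.0 - d_fake) * w  # derivative of log D(x) w.r.t. x
        a += lr * grad_x * z
        b += lr * grad_x
    return (a, b), (w, c)


def generate(gen_params, n, seed=1):
    # Draw n synthetic values from the trained generator.
    a, b = gen_params
    rng = random.Random(seed)
    return [a * rng.gauss(0.0, 1.0) + b for _ in range(n)]


# Made-up "real" data: 500 values centered on 4.0.
data_rng = random.Random(42)
real = [data_rng.gauss(4.0, 0.5) for _ in range(500)]

gen_params, disc_params = train_toy_gan(real)
synthetic = generate(gen_params, 5)
```

With models this simple, the second model can only notice a shift in the average value, so the sketch shows the back-and-forth training loop rather than realistic output. Toy setups like this one can also oscillate instead of settling.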
Personally identifiable information isn’t used at any stage of the process, he said.
The effort is still in the research stages, in part because it is difficult to determine how much of each unique fake fraud pattern the AI model should generate, he said. But early tests are promising. Experiments have shown that for specific types of fraud, the fake data does improve the AI-based fraud-detection model, he said.
American Express has had the lowest U.S. fraud-loss rates among the major banks for the past 14 years, according to a February Nilson Report, a source of news and statistics on the payment industry.
Synthetic data has already found uses in other industries. Hospitals, for example, are using synthetic data based on real medical records from patients to make medical decisions.
Startup Moveworks Inc. generates synthetic data to improve its AI-based chatbots, used by corporate customers to answer employee questions related to information technology, finance and human resources, said Vaibhav Nivargi, co-founder and chief technology officer.
Moveworks’ customers supply it with technical documents to help answer IT questions related to, for example, computer memory, Mr. Nivargi said. But that data is frequently insufficient to train its chatbots to answer questions.