In recent years, voice-based virtual assistants such as Google Assistant and Amazon Alexa have grown popular, which presents both opportunities and challenges for natural language understanding (NLU) systems. The production systems behind these devices are often trained with supervised learning and rely heavily on annotated data. However, data annotation is costly and time-consuming. Furthermore, model updates based on offline supervised learning can take a long time and miss trending requests.
In the underlying architecture of voice-based virtual assistants, the NLU model often categorizes user requests into hypotheses for downstream applications to fulfill. A hypothesis comprises two kinds of tags: the user's intention (intent) and the named entities identified by Named Entity Recognition (NER). For example, the valid hypothesis for “play a Madonna song” is the PlaySong intent with the entity ArtistName: Madonna.
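To make the structure of a hypothesis concrete, the following is a minimal Python sketch of how the example above could be represented. The class name `Hypothesis`, its fields, and the `label` helper are hypothetical and chosen only for illustration; they do not describe the data model of any production assistant.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Hypothesis:
    """Illustrative container for one NLU hypothesis (hypothetical schema)."""

    # The user's intention, e.g. "PlaySong".
    intent: str
    # Named entities as a mapping from entity type to the tagged text.
    entities: Dict[str, str] = field(default_factory=dict)

    def label(self) -> str:
        """Return a compact string form, with entity types in sorted order."""
        parts = [f"{name}:{value}" for name, value in sorted(self.entities.items())]
        return "|".join([self.intent] + parts)


# The hypothesis for the request "play a Madonna song".
hyp = Hypothesis(intent="PlaySong", entities={"ArtistName": "Madonna"})
print(hyp.label())
```

Storing entities as a mapping keeps the intent and entity tags separate, so a downstream application could dispatch on the intent and then read only the entity types it needs.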