- MURAL uses multitask learning applied to image–text pairs in combination with translation pairs covering over 100 languages.
Around 7,000 languages are spoken in the world, and there is often no direct one-to-one translation between two of them. Even when a translation exists, it may not be exact, and associations and connotations are easily lost on a non-native speaker. Pairing the text with a supporting image can help convey the intended meaning. However, such image–text pair data does not exist for most languages; it is mostly available for high-resource languages such as English and Chinese.
To address this, Google AI has released MURAL (“Multimodal, Multitask Representations Across Languages”), a model for image–text matching. It applies multitask learning to image–text pairs combined with translation pairs covering more than 100 languages.
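The multitask idea described above can be illustrated with a small sketch. It is not MURAL's actual implementation: it assumes a contrastive, in-batch softmax objective for both tasks, and the function names, the `temperature` value, and the task weights `w_i2t` and `w_t2t` are hypothetical choices made only for this example. The point it shows is that one combined loss can pull an image toward its caption and a sentence toward its translation at the same time.

```python
import numpy as np

def l2_normalize(x):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def contrastive_loss(a, b, temperature=0.1):
    """Symmetric in-batch softmax loss: row i of `a` should match row i of `b`."""
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature  # pairwise similarities, shape (batch, batch)

    def cross_entropy(l):
        # Numerically stable log-softmax over each row; the correct
        # match for row i sits on the diagonal.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the a->b and b->a directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

def multitask_loss(image_emb, caption_emb, source_emb, target_emb,
                   w_i2t=1.0, w_t2t=1.0):
    # Hypothetical weighting of the two tasks:
    # image-text matching plus translation-pair matching.
    return (w_i2t * contrastive_loss(image_emb, caption_emb)
            + w_t2t * contrastive_loss(source_emb, target_emb))

# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
images, captions = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
sources, targets = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
print(multitask_loss(images, captions, sources, targets))
```

In a real system the embeddings would come from trained image and text encoders rather than random vectors; sharing the text encoder between the two tasks is what would let translation pairs benefit languages that have little image–text data.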