Meta’s quest to translate underserved languages has notched its first victory: the open source release of a machine translation model that covers 202 languages.
Dubbed NLLB-200 after Meta’s No Language Left Behind initiative, the model is the first able to translate so many languages, according to its makers, who aim to improve translation for languages overlooked by similar projects.
“The vast majority of improvements made in machine translation in the last decades have been for high-resource languages,” Meta researchers wrote in a paper [PDF]. “While machine translation continues to grow, the fruits it bears are unevenly distributed,” they said.
According to the announcement of NLLB-200, the model can translate 55 African languages “with high-quality results.” Before NLLB-200, Meta said, fewer than 25 African languages were covered by widely used translation tools. Measured by BLEU, an automatic metric that scores machine output against human reference translations, NLLB-200 showed an average improvement of 44 percent over other state-of-the-art translation models, Meta said. For some African and Indian languages, the improvement reportedly went as high as 70 percent.