Open-source language AI challenges big tech’s models

Open-source language AI challenges big tech's models

An international team of around 1,000 largely academic volunteers has tried to break big tech’s stranglehold on natural-language processing and reduce its harms. Trained with US$7-million-worth of publicly funded computing time, the BLOOM language model will rival in scale those made by firms Google and OpenAI, but will be open-source. BLOOM will also be the first model of its scale to be multilingual.

The collaboration, called BigScience, launched an early version of the model on 17 June, and hopes that it will ultimately help to reduce harmful outputs of artificial intelligence (AI) language systems. Models that recognize and generate language are increasingly used by big tech firms in applications from chat bots to translators, and can sound so eerily human that a Google engineer this month claimed that the firm’s AI model was sentient (Google strongly denies that the AI possesses sentience). But such models also suffer from serious practical and ethical flaws, such as parroting human biases. These are difficult to tackle because the inner workings of most such models are closed to researchers.

As well being a tool to explore AI, BLOOM will be open for a range of research uses, such as extracting information from historical texts and making classifications in biology. “We think that access to the model is an essential step to do responsible machine learning,” says Thomas Wolf, co-founder of Hugging Face, a company that hosts an open-source platform for AI models and data sets, and has helped to spearhead the initiative.

Read more