A new method automatically describes, in natural language, what the individual components of a neural network do.
Neural networks are sometimes called black boxes because, despite the fact that they can outperform humans on certain tasks, even the researchers who design them often don’t understand how or why they work so well. But if a neural network is used outside the lab, perhaps to classify medical images that could help diagnose heart conditions, knowing how the model works helps researchers predict how it will behave in practice.
MIT researchers have now developed a method that sheds some light on the inner workings of black box neural networks. Modeled off the human brain, neural networks are arranged into layers of interconnected nodes, or “neurons,” that process data. The new system can automatically produce descriptions of those individual neurons, generated in English or another natural language.