Imagine typing “dramatic intro music” and hearing a soaring symphony or writing “creepy footsteps” and getting high-quality sound effects. That’s the promise of Stable Audio, a text-to-audio AI model announced Wednesday by Stability AI that can synthesize music or sounds from written descriptions. Before long, similar technology may challenge musicians for their jobs.
If you’ll recall, Stability AI is the company that helped fund the creation of Stable Diffusion, a latent diffusion image synthesis model released in August 2022. Not content to limit itself to generating images, the company branched out into audio by backing Harmonai, an AI lab that launched music generator Dance Diffusion in September.
Now Stability and Harmonai want to break into commercial AI audio production with Stable Audio. Judging by production samples, it seems like a significant audio quality upgrade from previous AI audio generators we’ve seen.
On its promotional page, Stability provides examples of the AI model in action with prompts like “epic trailer music intense tribal percussion and brass” and “lofi hip hop beat melodic chillhop 85 bpm.” It also offers samples of sound effects generated using Stable Audio, such as an airline pilot speaking over an intercom and people talking in a busy restaurant.