Artificial intelligence developer Stability AI has unveiled Stable Audio 2.0, the next iteration of its text-to-music generation system.
The latest version supports artists and musicians with a wider range of creative tools and the ability to produce full-length music tracks “with conventional song structure and high audio quality” using natural language prompts, the company said Wednesday (April 3).
Stable Audio 1.0, launched last September, captured attention with its ability to craft short audio clips based on textual descriptions. It was named one of TIME’s Best Inventions of 2023.
The new version expands on this foundation, allowing users to generate full songs up to three minutes long at 44.1 kHz stereo. This extended timeframe opens doors for a wider variety of musical creations, from full instrumentals to structured compositions with intros, development sections, and outros.
“Stable Audio 2.0 sets a new standard in AI-generated audio,” Stability AI said in a blog post. “The new model introduces audio-to-audio generation by allowing users to upload and transform samples using natural language prompts.”
Beyond the increased length, Stable Audio 2.0 also offers other features, including new “audio-to-audio” capabilities that allow users to upload their own audio samples to set the style and sound of AI-generated outputs.
“Our most advanced audio model yet expands the creative toolkit for artists and musicians with its new functionalities. With both text-to-audio and audio-to-audio prompting, users can produce melodies, backing tracks, stems, and sound effects, thus enhancing the creative process,” Stability AI said.
The release of Stable Audio 2.0 comes amid a period of internal change at Stability AI. Ed Newton-Rex, the company’s former Vice President of Audio, recently departed due to disagreements over the use of copyrighted materials in training datasets.
“Companies worth billions of dollars are, without permission, training generative AI models on creators’ works, which are then being used to create new content that in many cases can compete with the original works. I don’t see how this can be acceptable in a society that has set up the economics of the creative arts such that creators rely on copyright,” Newton-Rex, who helped develop Stable Audio, said in a public resignation letter. He has since launched an initiative to evaluate and certify AI models based on their respect for creators’ rights.
Stability AI addressed copyright concerns about its AI development, saying, “Stable Audio 2.0 was exclusively trained on a licensed dataset from the AudioSparx music library, honoring opt-out requests and ensuring fair compensation for creators.”
The 1.0 model was also trained using data from AudioSparx, which consists of over 800,000 audio files containing music, sound effects, and single-instrument stems, and corresponding text metadata.
The update also integrates Audible Magic to scan audio uploads for copyright infringement. Audible Magic offers content recognition technology that performs real-time content matching to prevent copyright infringement.
Stable Audio 2.0 also introduces features like Style Transfer to match generated or uploaded audio to existing tracks, SFX creation, and variations.
“Stable Audio 2.0 is one of the most powerful and versatile generative AI music tools available and makes it possible for musicians, producers, and other creators to use AI as a collaborative tool for music composition, audio experimentation, and content creation, like never before,” Stability AI said in a statement.
Stability AI also offers technical details about the model’s architecture, explaining its effectiveness in generating high-quality musical compositions.
“A new, highly compressed autoencoder compresses raw audio waveforms into much shorter representations. For the diffusion model, we employ a diffusion transformer (DiT), akin to that used in Stable Diffusion 3, in place of the previous U-Net, as it is more adept at manipulating data over long sequences.
“The combination of these two elements results in a model capable of recognizing and reproducing the large-scale structures that are essential for high-quality musical compositions.”
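A rough back-of-the-envelope sketch in Python shows why a shorter representation matters for a three-minute track. The sample rate, channel count, and duration come from the figures reported above. The downsampling factor is a purely hypothetical placeholder, since the quoted description gives no compression ratio, and the function names are illustrative only.

```python
# Figures reported in the article.
SAMPLE_RATE_HZ = 44_100   # 44.1 kHz
CHANNELS = 2              # stereo
DURATION_S = 3 * 60       # three minutes

# ASSUMPTION: hypothetical factor chosen for illustration only.
# It is NOT a published figure for Stable Audio 2.0.
ASSUMED_DOWNSAMPLING_FACTOR = 1_000


def raw_frames(duration_s: int, sample_rate_hz: int) -> int:
    """Number of time steps (frames) in the raw waveform."""
    return duration_s * sample_rate_hz


def latent_length(frames: int, factor: int) -> int:
    """Sequence length after downsampling by `factor`, rounded up."""
    return -(-frames // factor)  # ceiling division


frames = raw_frames(DURATION_S, SAMPLE_RATE_HZ)
total_samples = frames * CHANNELS
latents = latent_length(frames, ASSUMED_DOWNSAMPLING_FACTOR)

print(f"raw frames per channel: {frames:,}")
print(f"raw samples (stereo):   {total_samples:,}")
print(f"latent steps (assumed): {latents:,}")
```

Under these assumptions, the raw waveform has 7,938,000 frames per channel, far more than a sequence model would comfortably process step by step, which is the motivation the company gives for compressing the audio before the diffusion transformer sees it.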
The new model is available to use for free on the Stable Audio website and will soon be available on the Stable Audio API.
Stability AI has also launched Stable Radio, a 24/7 live stream that features tracks generated by Stable Audio.
Music Business Worldwide