Multi-Source Diffusion Models for Simultaneous Music Generation and Separation

Tech talk

Tech talk with Emilian Postolache

PhD student at La Sapienza University

Bio

Emilian Postolache is a PhD student interested in generative models and audio processing. His work has focused on enabling source separation using latent autoregressive models in VQ-VAE domains, and he has proposed a Bayesian sampling technique based on fully discrete likelihood functions. He completed an internship at Dolby Laboratories, where he improved universal sound separation using adversarial techniques. He then combined his music generation and source separation expertise to develop a diffusion-based model that performs both tasks simultaneously.
He is committed to pushing the boundaries of what is possible in the field and is eager to continue making impactful contributions.

Abstract

In this technical talk, Emilian will present his paper “Multi-Source Diffusion Models for Simultaneous Music Generation and Separation”. The work introduces a diffusion-based generative model (MSDM) that performs both music synthesis and source separation. It also presents the partial inference task of source imputation, in which a subset of the sources is generated given the others. This task has practical applications in music production and performance, since new musical pieces can be created by generating sources that complement existing ones. He will introduce diffusion models and highlight the differences between the audio setting and the more commonly considered image setting, followed by a detailed explanation of MSDM. He will also compare MSDM with other generative audio models, including those based on diffusion (such as Moûsai and AudioLDM) and latent autoregressive models (such as MusicLM). By highlighting the similarities and differences between these models, he will show what is novel about the approach.
