Subtractive Training for Music Stem Insertion using Latent Diffusion Models


*Indicates Equal Contribution

Abstract

We present Subtractive Training, a simple and novel method for synthesizing individual musical instrument stems given other instruments as context. This method pairs a dataset of complete music mixes with 1) a variant of the dataset lacking a specific stem, and 2) LLM-generated instructions describing how the missing stem should be reintroduced. We then fine-tune a pretrained text-to-audio diffusion model to generate the missing instrument stem, guided by both the existing stems and the text instruction. Our results demonstrate Subtractive Training's efficacy in creating authentic drum stems that seamlessly blend with the existing tracks. We also show that we can use the text instruction to control the generation of the inserted stem in terms of rhythm, dynamics, and genre, allowing us to modify the style of a single instrument in a full song while keeping the remaining instruments the same. Lastly, we extend this technique to MIDI formats, successfully generating compatible bass, drum, and guitar parts for incomplete arrangements.
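The data pairing described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each song is already available as separate, equal-length stems (the abstract does not say how the stem-free variant is obtained), represents audio as plain lists of samples, and uses hypothetical names (`SubtractiveExample`, `make_example`). Whether the model is trained to output the full mix or only the isolated stem is also left open here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubtractiveExample:
    """One training item (hypothetical structure, for illustration only)."""
    background: List[float]  # mix with one stem removed; used as context
    full_mix: List[float]    # complete mix containing every stem
    removed_stem: str        # name of the stem that was taken out
    instruction: str         # text describing how to reintroduce the stem


def mix(stems: List[List[float]]) -> List[float]:
    """Sum equal-length stems sample by sample."""
    if not stems:
        return []
    length = len(stems[0])
    if any(len(s) != length for s in stems):
        raise ValueError("all stems must have the same length")
    return [sum(samples) for samples in zip(*stems)]


def make_example(
    stems: Dict[str, List[float]], removed: str, instruction: str
) -> SubtractiveExample:
    """Pair a full mix with its 'subtracted' variant and a text instruction."""
    if removed not in stems:
        raise KeyError(f"unknown stem: {removed}")
    full = mix(list(stems.values()))
    # Subtract the chosen stem from the full mix to get the background.
    background = [f - s for f, s in zip(full, stems[removed])]
    return SubtractiveExample(background, full, removed, instruction)


# Toy three-sample "song"; real audio would have many more samples.
song = {
    "drums": [0.5, -0.5, 0.25],
    "bass": [0.25, 0.25, 0.0],
    "guitar": [0.0, 0.5, -0.25],
}
example = make_example(song, "drums", "Add rock drums")
```

In this sketch the instruction string is passed in by the caller; in the method described above it would come from an LLM.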

Subtractive Training Generation Examples

Columns: Background, Target, Generated

Add soft acoustic drums to enhance emotion

Add indie drums with punchy beats

Generate percussion with a lively Latin flair

Add reggae beats

Add rock drums

Subtractive Training Style Transfer Examples

Columns: Background, Target, Generated

add jazzy drums

add reggae beats

add aggressive rock drums with cymbal crashes

Subtractive Training MIDI Examples

Guitar Only

Guitar and Generated Bass

Drums Only

Drums and Generated Guitar
