Splitting A Recording By Voice

Richard Audette included in Computing Project

2024-12-30

/splitting-podcast-by-speaker/images/Lerwick.jpg

I have been bouncing around ideas on how to make generated content more interesting for a couple of years now. In April 2023, I created a video with generated content, using a green-screen effect to put a real street scene in the background. In January 2024, I tried emulating the podcasts I love by adding a co-host: I had ChatGPT generate a dialog between two characters, then used a different voice for each to create a discussion.

Google’s NotebookLM does this REALLY well, SO MUCH better than just creating a two-person dialog. The conversation flows naturally, and you get the “oh yeah?” and “interesting” interjections you would hear in a real conversation. NotebookLM creates a single WAV file, though. I wanted a separate WAV file for each “speaker” so I could run each one through NVIDIA’s Audio2Face model. I created a tool that splits a WAV file per speaker using AssemblyAI’s transcription service. If this is useful for anyone, you can check it out here: https://github.com/raudette/conversationsplit

/splitting-podcast-by-speaker/images/SpeakerSplit.png: Audacity screenshot illustrating how the combined NotebookLM track has been split
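
To give a rough idea of what the splitting step involves, here is a minimal Python sketch. It is not the code from the conversationsplit repo, just an illustration under some assumptions: the transcription step has already produced a list of utterances, each with a `speaker` label and `start`/`end` times in milliseconds (that shape, the function name `split_by_speaker`, and the output file naming are all hypothetical), and the input is an uncompressed PCM WAV. Each speaker gets a track the same length as the original, with silence wherever someone else is talking, so the tracks stay in sync.

```python
import wave
from pathlib import Path


def split_by_speaker(wav_path, utterances, out_dir):
    """Write one WAV per speaker, same length as the input, silent elsewhere.

    utterances: iterable of dicts with "speaker", "start" and "end" keys,
    times in milliseconds (an assumed shape, not a specific API's format).
    Returns a dict mapping each speaker label to its output path.
    """
    with wave.open(str(wav_path), "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)

    frame_size = params.sampwidth * params.nchannels  # bytes per frame
    n_frames = len(frames) // frame_size
    # 8-bit PCM WAV is unsigned (silence = 0x80); wider samples are signed (silence = 0x00).
    silence_byte = b"\x80" if params.sampwidth == 1 else b"\x00"

    tracks = {}
    for u in utterances:
        if u["speaker"] not in tracks:
            # Start every speaker with a full-length silent track.
            tracks[u["speaker"]] = bytearray(silence_byte * (n_frames * frame_size))
        # Convert millisecond timestamps to frame indexes, clamped to the file length.
        first = min(n_frames, max(0, int(u["start"] * params.framerate / 1000)))
        last = min(n_frames, max(first, int(u["end"] * params.framerate / 1000)))
        lo, hi = first * frame_size, last * frame_size
        # Copy this utterance's audio into the speaker's track at the same position.
        tracks[u["speaker"]][lo:hi] = frames[lo:hi]

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = {}
    for speaker, track in tracks.items():
        out_path = out_dir / f"{Path(wav_path).stem}_speaker_{speaker}.wav"
        with wave.open(str(out_path), "wb") as dst:
            dst.setparams(params)
            dst.writeframes(bytes(track))
        outputs[speaker] = out_path
    return outputs
```

Calling `split_by_speaker("podcast.wav", utterances, "out")` with two speaker labels would produce two files in `out/`, one per speaker, which can then be loaded side by side in Audacity or fed to an avatar model one at a time.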

For an example of how this all comes together, check out this video of two avatars chatting about an internet forum thread on making the perfect coffee: Brewing Perfection: Inside the World of At-Home Espresso Machines - YouTube