This Mathcad A-Z entry is going to be different from the others for a few reasons. The first reason is that Mona Zeftel, our usual contributor, is on vacation so I’m trying my hand at this. Mona knows a lot more tips and tricks and everything else about Mathcad. Most of the stuff I know, you guys probably already know!
Now the second reason, I think, is the bigger reason. And that is “J” is a hard letter to write about, whether it relates to Mathcad or not. My first thought was: “J is for Bessel Functions.” I thought it would be clever since I’d be using the mathematical notation for the function as opposed to its actual name, as in “B is for Bessel Functions.” But the more I thought about it, the more I realized that the entry would just be a few copy/paste selections from the Wikipedia entry on Bessel functions of the first kind.
After scratching that Bessel function idea, the second “J” word that came to mind was “jazz.” Okay, next… The imaginary number, j. Jupiter. Jamaica. Wait a minute, jazz? Can I perhaps write about jazz?
I did a quick YouTube search on Louis Armstrong, the first artist that came to mind. C’est si bon came up. I’d never heard the song before, but like his other songs, it was easy on the ears. And who can say ‘no’ to Armstrong’s signature voice? The video starts off with a quick introduction. Then Armstrong and the All Stars play their instruments for about a minute and a half before Armstrong busts out his windpipes. So this got me thinking – why not see if I can perform some karaoke machine magic and remove Armstrong’s voice from the song?
Now there is karaoke machine software out there. I think most of it works pretty much the same way – it tries to remove frequencies found in the human voice. But that gets tricky because those frequencies are often found in the music as well. Regardless, I figured I would give it a try in Mathcad. Even if I don’t succeed, it’s still a fun exercise!
The first step is to load the music into Mathcad. In Mathcad, we can use GETWAVINFO to get information on the WAV file – the number of channels, the sample rate, the bit resolution, and average number of bytes per second. Then we use READWAV to read in the actual audio data.
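For anyone following along outside Mathcad, here is a rough Python sketch of the same two steps using only the standard library. The helper names `wav_info` and `read_wav` are made up for this example, and the generated tone is just a stand-in for the real recording; it also assumes 16-bit PCM, which may not match your file.

```python
import math
import os
import struct
import tempfile
import wave


def wav_info(path):
    """Return header details: channels, sample rate, bit depth, bytes/second."""
    with wave.open(path, "rb") as w:
        channels = w.getnchannels()
        rate = w.getframerate()
        width = w.getsampwidth()  # bytes per sample
    return {
        "channels": channels,
        "sample_rate": rate,
        "bits": 8 * width,
        "avg_bytes_per_sec": rate * channels * width,
    }


def read_wav(path):
    """Return 16-bit PCM audio as a list of frames, one entry per channel."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = w.getnchannels()
        raw = w.readframes(w.getnframes())
    flat = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return [flat[i:i + channels] for i in range(0, len(flat), channels)]


# Stand-in for the real recording: one second of a 440 Hz tone, mono, 8 kHz.
fd, path = tempfile.mkstemp(suffix=".wav")
os.close(fd)
with wave.open(path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    tone = [int(20000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(8000)]
    w.writeframes(struct.pack("<8000h", *tone))

info = wav_info(path)
samples = read_wav(path)
os.remove(path)
print(info, len(samples))
```

The average bytes per second is simply sample rate × channels × bytes per sample, so it can be computed from the other three header values.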
So I figured I would start by using the clip of Armstrong’s intro at the beginning, where there is just his voice and no music. I’ll take a look at the FFT and see what frequencies are present in his voice. This should give me an advantage over karaoke machine software – having a sample of the singer’s voice without the background music.
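The idea of looking at the FFT and picking out the dominant voice frequencies can be sketched in plain Python like this. The three tones below are invented stand-ins somewhere in the range of a low voice, not measurements from the recording, and `fft`, `magnitude_spectrum` and `strongest_peaks` are illustrative helpers, not Mathcad functions.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def magnitude_spectrum(samples, rate):
    """Return (frequency in Hz, magnitude) pairs for bins 0 .. n/2."""
    n = len(samples)
    spec = fft(samples)
    return [(k * rate / n, abs(spec[k])) for k in range(n // 2 + 1)]


def strongest_peaks(spectrum, count):
    """Return the frequencies of the `count` largest local maxima, ascending."""
    peaks = [
        spectrum[k]
        for k in range(1, len(spectrum) - 1)
        if spectrum[k][1] > spectrum[k - 1][1] and spectrum[k][1] >= spectrum[k + 1][1]
    ]
    peaks.sort(key=lambda p: p[1], reverse=True)
    return sorted(freq for freq, _ in peaks[:count])


# Hypothetical "voice-only" clip: three tones with made-up frequencies/levels.
rate, n = 8192, 1024
tones = [(248, 1.0), (400, 0.6), (552, 0.3)]
voice = [
    sum(a * math.sin(2 * math.pi * f * i / rate) for f, a in tones)
    for i in range(n)
]

peak_freqs = strongest_peaks(magnitude_spectrum(voice, rate), 3)
print(peak_freqs)
```

The tones are chosen to fall exactly on FFT bins so the peaks come out clean; a real voice clip would smear across neighbouring bins and would probably need a window function and some smoothing before peak picking.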
Right away we can see three peaks in the 200-600 Hz range. Next, I read in a clip of the song where Armstrong is singing, and take an FFT of that. I can plot the FFTs of the two together. The subscript ‘v’ is for voice-only, and ‘s’ for singing.
At first I thought I could just use a band-stop filter to wipe out the biggest peak. Because the frequency band I’m trying to get rid of starts and stops at 0.9% and 0.13% of the sampling rate, a band-stop filter there is actually very similar to a high-pass filter. The magnitude response for both the high-pass filter (c1) and the band-stop filter (c2) is shown below. Note that I need to use about eight times as many coefficients with the band-stop filter in order to get comparable attenuation in the desired frequency band.
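To get a feel for why a narrow band-stop filter eats up so many coefficients, here is a small Python sketch. It assumes a Hamming-windowed sinc FIR design, which is my choice for illustration and not necessarily what Mathcad's filter functions do, and the band edges (4.5% and 5.5% of the sampling rate) and tap counts are made up rather than taken from the worksheet.

```python
import cmath
import math


def lowpass(cutoff, taps):
    """Hamming-windowed sinc low-pass; cutoff is a fraction of the sampling
    rate and taps must be odd."""
    mid = (taps - 1) // 2
    h = []
    for n in range(taps):
        x = n - mid
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        h.append(ideal * window)
    total = sum(h)
    return [c / total for c in h]  # normalise to unity gain at DC


def highpass(cutoff, taps):
    """Spectral inversion: a unit impulse minus the matching low-pass."""
    h = [-c for c in lowpass(cutoff, taps)]
    h[(taps - 1) // 2] += 1.0
    return h


def bandstop(f_lo, f_hi, taps):
    """Low-pass below f_lo plus high-pass above f_hi."""
    return [a + b for a, b in zip(lowpass(f_lo, taps), highpass(f_hi, taps))]


def gain(h, f):
    """Magnitude response of FIR coefficients h at normalised frequency f."""
    return abs(sum(c * cmath.exp(-2j * math.pi * f * n) for n, c in enumerate(h)))


# Hypothetical narrow band to remove, as fractions of the sampling rate.
f_lo, f_hi = 0.045, 0.055
centre = (f_lo + f_hi) / 2

for taps in (101, 801):
    bs = gain(bandstop(f_lo, f_hi, taps), centre)
    hp = gain(highpass(f_hi, taps), 0.0)
    print(taps, "taps: band-stop gain at centre =", bs, " high-pass gain at DC =", hp)
```

With this design the width of each transition region shrinks roughly in proportion to the number of taps. A band-stop filter has two transitions that both have to fit inside and around a narrow band, so the short filter barely dents the centre of the band, while the high-pass only has one transition to worry about.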
I wrote out the filtered audio with WRITEWAV and played it. The filter had taken out a lot of the instrumental music, so I was left with a chipmunk speaking – a result of the higher harmonics of the voice frequency. A double whammy! I lost the music and I still have Armstrong’s voice (albeit in chipmunk form). What I want is just the music and no voice. A quick aside: even though Armstrong’s voice sounds like a chipmunk, it’s at normal speed. Typically, to get the chipmunk effect, you speed up the clip, so the chipmunk voice really is sped up.
So I decided to try a much more manual process where I “took out” bands where the peaks were in the voice-only audio clip. I started with the three peaks marked in the FFT plot. For frequencies in those peaks, I multiplied the FFT at those points by a scaling factor, thereby reducing the magnitude. The result was a bit better, but still not great. I retained more of the background instruments. There was still loss, but not as bad as with the high-pass filter. I was, however, still picking up harmonics of Armstrong’s voice.
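The manual approach described above (scale down the FFT inside chosen bands, then transform back) might look something like this in Python. The band limits, the 0.1 scaling factor and the two test tones are all invented for illustration, and `attenuate_bands` is a hypothetical helper, not something from the worksheet.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    """Inverse FFT via the conjugate trick."""
    n = len(spectrum)
    y = fft([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in y]


def attenuate_bands(samples, rate, bands, scale):
    """Multiply every FFT bin whose frequency lies in one of `bands`
    (list of (low_hz, high_hz)) by `scale`, then return the real signal."""
    n = len(samples)
    spectrum = fft(samples)
    for k in range(n):
        # Bins above n/2 are the negative-frequency mirror images; scale them
        # too so the result stays real.
        freq = abs(k if k <= n // 2 else k - n) * rate / n
        if any(lo <= freq <= hi for lo, hi in bands):
            spectrum[k] *= scale
    return [v.real for v in ifft(spectrum)]


# Made-up mix: a 400 Hz "voice" tone plus a 1600 Hz "instrument" tone.
rate, n = 8192, 1024
mixed = [
    math.sin(2 * math.pi * 400 * i / rate) + math.sin(2 * math.pi * 1600 * i / rate)
    for i in range(n)
]

cleaned = attenuate_bands(mixed, rate, [(380, 420)], 0.1)

before, after = fft(mixed), fft(cleaned)
voice_ratio = abs(after[50]) / abs(before[50])     # bin 50 is 400 Hz
music_ratio = abs(after[200]) / abs(before[200])   # bin 200 is 1600 Hz
print(voice_ratio, music_ratio)
```

This treats the whole clip as a single block, which is fine for a short test signal. On real audio, anything the instruments play inside the same bands is scaled down too, and any voice harmonics outside the bands pass straight through, which matches what I heard.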
So I’ll leave this one open-ended for you guys. What else can I try to get rid of the voice?