Voice Detection with the Client SDK

Anton Venema Jul 27, 2020 11:48:00 AM
Voice Detection 
Voice detection is an audio processing technique for detecting the presence or absence of human speech. Being able to monitor and mute inactive speakers has many benefits, but primarily it helps:
 
  • Reduce background noise for an improved user experience
  • Save network bandwidth by avoiding the unnecessary transmission of audio packets

How you can monitor and mute inactive speakers

The LiveSwitch Client SDK makes it easy to monitor audio levels. It’s as simple as wiring up a single event handler to your `LocalMedia` instance:
 

localMedia.OnAudioLevel += (level) =>
{
    Log.WriteLine($"Local media {localMedia.Id} audio level is {level}.");
};

Adding the above snippet to your app will flood your log with local microphone capture levels as soon as you start the local media.

The same event is available for monitoring inbound remote audio levels using `RemoteMedia`:


remoteMedia.OnAudioLevel += (level) =>
{
    Log.WriteLine($"Remote media {remoteMedia.Id} audio level is {level}.");
};

You can also work with audio tracks directly, whether you are creating your own or using the prebuilt tracks that underpin `LocalMedia` and `RemoteMedia`:


audioTrack.OnLevel += (level) =>
{
    Log.WriteLine($"Audio track {audioTrack.Id} level is {level}.");
};

Warning: this event is raised for every single audio frame on a time-sensitive audio thread. Unless you’ve got a big battery or a really fast device, make sure any work done in your event handler is performed as quickly as possible.
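One way to keep the handler cheap is to do almost nothing on most frames and only report occasionally. Here’s a minimal sketch of that idea. `LevelSampler` and `TryReport` are hypothetical names of our own, not part of the LiveSwitch SDK, and the sketch assumes the event is always raised on the same thread (it is not thread-safe):

```csharp
using System;

// Hypothetical helper (not part of the LiveSwitch SDK): remembers the peak
// level and reports it once every `interval` frames.
public sealed class LevelSampler
{
    private readonly int _interval;
    private int _count;
    private double _peak;

    public LevelSampler(int interval)
    {
        if (interval < 1) throw new ArgumentOutOfRangeException(nameof(interval));
        _interval = interval;
    }

    // Returns true when a report is due. `peak` then holds the highest
    // level seen since the previous report; otherwise it is 0.0.
    public bool TryReport(double level, out double peak)
    {
        if (level > _peak) _peak = level;
        _count++;
        if (_count < _interval)
        {
            peak = 0.0;
            return false;
        }
        peak = _peak;
        _count = 0;
        _peak = 0.0;
        return true;
    }
}
```

Inside the `OnAudioLevel` handler we would call `TryReport(level, out var peak)` and only write to the log when it returns true, so the expensive work happens once every `interval` frames instead of on every frame.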

Let’s try muting our microphone when we’re not speaking.

Muting itself is trivial:


localMedia.MuteAudio();

… and we can easily do it in response to a certain audio level threshold:


localMedia.OnAudioLevel += (level) =>
{
    // InactiveThreshold is an app-defined level below which we treat the speaker as inactive.
    if (level < InactiveThreshold)
    {
        localMedia.MuteAudio();
    }
    Log.WriteLine($"Local media {localMedia.Id} audio level is {level}.");
};

There’s a problem with this, though. Once we call this method, the audio levels all read out as 0.0 going forward. This is technically correct - the audio buffers are perfectly silent - but it means we have no way to detect when we start speaking again.
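To see why this gets stuck, here’s a small simulation that doesn’t touch the SDK at all. `FakeMicrophone` and `LatchDemo` are hypothetical stand-ins, and the sketch assumes only what we observed above: once muted, the reported level is 0.0.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a microphone whose reported level is 0.0 once muted.
public sealed class FakeMicrophone
{
    public bool Muted { get; private set; }

    public void Mute() => Muted = true;

    // The level a handler would see: silent while muted, the true level otherwise.
    public double Report(double trueLevel) => Muted ? 0.0 : trueLevel;
}

public static class LatchDemo
{
    // Applies the "mute when below threshold" rule to each frame and
    // returns the levels the handler actually observed.
    public static List<double> Run(double[] trueLevels, double threshold)
    {
        var mic = new FakeMicrophone();
        var seen = new List<double>();
        foreach (var trueLevel in trueLevels)
        {
            var level = mic.Report(trueLevel);
            seen.Add(level);
            if (level < threshold)
            {
                mic.Mute();
            }
        }
        return seen;
    }
}
```

Feeding in a loud frame, a quiet frame, and then two more loud frames, the handler sees the first two levels and then nothing but 0.0: the quiet frame mutes the microphone, and every later check is made against silence.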

Let’s try it a different way, using the `OnRaiseFrame` event of the `AudioSource`. This event gives direct access to the audio buffers as they are raised by the source:


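localMedia.AudioSource.OnRaiseFrame += (frame) =>
{
    var buffer = frame.LastBuffer;
    // Measure the raw buffer before deciding whether to silence it.
    var level = buffer.CalculateLevel();
    // InactiveThreshold is an app-defined level below which we treat the speaker as inactive.
    if (level < InactiveThreshold)
    {
        // Silence only this buffer; the next frame is measured afresh.
        buffer.Mute();
    }
    Log.WriteLine($"Local media {localMedia.Id} audio level is {level}.");
};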

Using this event, we can calculate the audio level every time, even while muting. It also automatically “unmutes” (by doing nothing) when the level rises above our threshold.
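A bare threshold check can clip the quiet tail of a word or flicker during short pauses. One possible refinement is to wait for a few consecutive quiet frames before muting. This is a sketch under our own assumptions, not something the SDK provides: `VoiceGate`, `ShouldMute`, and the hold-frame count are hypothetical, and suitable values depend on your audio.

```csharp
using System;

// Hypothetical helper (not part of the LiveSwitch SDK): mutes only after
// `holdFrames` consecutive frames have fallen below the threshold.
public sealed class VoiceGate
{
    private readonly double _threshold;
    private readonly int _holdFrames;
    private int _quietFrames;

    public VoiceGate(double threshold, int holdFrames)
    {
        if (holdFrames < 0) throw new ArgumentOutOfRangeException(nameof(holdFrames));
        _threshold = threshold;
        _holdFrames = holdFrames;
    }

    // Returns true when the current frame should be muted.
    public bool ShouldMute(double level)
    {
        if (level >= _threshold)
        {
            // Speech detected: reset the quiet counter and pass the frame through.
            _quietFrames = 0;
            return false;
        }
        if (_quietFrames < _holdFrames)
        {
            // Still inside the hold window: let this quiet frame through.
            _quietFrames++;
            return false;
        }
        return true;
    }
}
```

In the `OnRaiseFrame` handler, the `level < InactiveThreshold` check would become a call to `ShouldMute(level)`. Because the level is still calculated from the raw buffer on every frame, the gate reopens as soon as the level rises above the threshold again.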

The LiveSwitch Client SDK is designed for maximum flexibility and provides unprecedented media pipeline access, so you can build virtually anything you can dream up. Can you shoot yourself in the foot? Absolutely. All that power comes with a bit of a learning curve, so we are always hard at work enhancing our documentation to make it easier to find the solution you’re looking for.
