
Creating a Custom Audio Sink in LiveSwitch

Jacob Steele Jun 30, 2023 11:54:27 AM

In this article, we provide a detailed guide on creating a custom audio sink in LiveSwitch using C# or F# and the .NET Core libraries. Understanding what an audio sink is and the role it plays in the media pipeline is essential to leveraging LiveSwitch effectively.

 

Introduction

An audio sink in LiveSwitch is the final element in the media pipeline on the receiving end (for example, on an SFU downstream connection). The receiving pipeline consists of several components:

  • De-Packetizer: De-packetizes the incoming stream from the connection.
  • Decoder: Decodes the stream into raw data (PCM 48,000 Hz stereo audio or I420 video frames).
  • Converter: Converts the raw data into the format the sink requires.
  • Sink: Renders the video, plays back the audio, etc.
Note: The pipeline can be modified and branched into multiple elements.

As you can see, the audio sink is the final step in the pipeline and generally plays the audio to the user through speakers. However, it has many other potential applications, such as transcriptions (which I will cover in a future blog post). Now, let's explore the requirements for creating your own AudioSink.
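
To make the list above concrete, here is roughly what that chain looks like in code, assuming the custom sink built later in this post (the same wiring appears again in the "Wiring It Up" section):

// A minimal sketch of the receiving chain: de-packetize -> decode -> convert -> sink.
var track = new AudioTrack(new Opus.Depacketizer())
    .Next(new Opus.Decoder())
    .Next(new SoundConverter(new AudioConfig(16000, 1)))
    .Next(new CustomAudioSink());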

 

Audio Sink

In C# or F#, a custom AudioSink has four members that need to be provided or overridden:

  • Constructor
  • Label
  • DoDestroy
  • DoProcessFrame

Constructor

The constructor must call the base constructor with the AudioFormat the sink expects. In most cases, this will be either PCM 48,000 Hz, 2-channel (the default output of the Opus decoder) or PCM 16,000 Hz, mono (1 channel), which is what various audio processors expect.
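
For reference, the base call for each of those two common formats looks like this (the Pcm.Format constructor takes the clock rate followed by the channel count):

// PCM 48,000 Hz, 2 channels (the default output of the Opus decoder):
: base(new FM.LiveSwitch.Pcm.Format(48000, 2))

// PCM 16,000 Hz, mono (the format many audio processors expect):
: base(new FM.LiveSwitch.Pcm.Format(16000, 1))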

Label

The Label property returns a string representing the name of your AudioSink. This name propagates up the chain and is used for logging purposes.

DoDestroy

The DoDestroy method is used to clean up any additional services being used when the pipeline is destroyed (e.g. when a participant leaves the conference).
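
As a hypothetical illustration (the _waveWriter field is not a LiveSwitch type, just a stand-in for whatever resource your sink owns):

protected override void DoDestroy()
{
    // Hypothetical example: release whatever this sink holds on to,
    // e.g. a file writer, a network client, or a transcription session.
    _waveWriter?.Dispose();
    _waveWriter = null;
}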

DoProcessFrame

The DoProcessFrame method is where the magic happens. When a frame passes through the pipeline, it reaches the DoProcessFrame method as the final step. This is where you can perform audio processing on the received audio frame.
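
As a rough sketch of what that processing might look like (this assumes the AudioBuffer exposes its raw 16-bit PCM bytes through a DataBuffer with Data, Index, and Length; adjust to the API surface of your LiveSwitch version):

protected override void DoProcessFrame(AudioFrame frame, AudioBuffer inputBuffer)
{
    // Sketch: scan the 16-bit PCM samples in this frame and track the peak level.
    var dataBuffer = inputBuffer.DataBuffer;
    var peak = 0;
    for (var i = 0; i + 1 < dataBuffer.Length; i += 2)
    {
        var sample = System.BitConverter.ToInt16(dataBuffer.Data, dataBuffer.Index + i);
        var magnitude = System.Math.Abs((int)sample);
        if (magnitude > peak)
        {
            peak = magnitude;
        }
    }
    // Hand the peak off to whatever consumes it (a level meter, a transcriber, etc.).
}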

 

Example Code

C#

using FM.LiveSwitch;

namespace Example
{
    public class CustomAudioSink : FM.LiveSwitch.AudioSink
    {
        public CustomAudioSink()
            // We want PCM (raw) audio at 16000 Hz, mono (1 channel).
            : base(new FM.LiveSwitch.Pcm.Format(16000, 1))
        {
        }

        public override string Label => "My Custom Audio Sink";

        protected override void DoDestroy()
        {
            // Clean up services.
        }

        protected override void DoProcessFrame(AudioFrame frame, AudioBuffer inputBuffer)
        {
            // Process the frame here! The Magic Method!
        }
    }
}

F#

module Example

open FM.LiveSwitch

type CustomAudioSink() =
    // We want PCM (raw) audio at 16000 Hz, mono (1 channel).
    inherit AudioSink(new Pcm.Format(16000, 1))

    override this.Label : string = "My Custom Audio Sink"

    override this.DoDestroy() =
        // Clean up services.
        ()

    override this.DoProcessFrame(frame: AudioFrame, buf: AudioBuffer) =
        // Process the frame here! The Magic Method!
        ()

 

Wiring It Up

Using the Chat Example (C# only)

In our Chat Example's RemoteMedia.cs file, at line 47, you can create the audio sink by overriding the CreateAudioSink method:

// Line 47 of RemoteMedia.cs in our Chat Example

protected override AudioSink CreateAudioSink(AudioConfig config)
{
  return new CustomAudioSink(); // new NAudio.Sink(config);
}

Console Application (Hard Way)

In both C# and F#, you can create custom pipelines for remote media. Instead of relying on the predefined media pipeline, we create our own and assign it to the connection object.

F#

let audioSink = CustomAudioSink()

let audioTrack : AudioTrack =
    AudioTrack(Opus.Depacketizer())
        .Next(Opus.Decoder())
        .Next(SoundConverter(AudioConfig(16000, 1)))
        .Next(audioSink)

let audioStream : AudioStream =
    if connInfo.HasAudio then AudioStream(null, audioTrack) else null

let connection = channel.CreateSfuDownstreamConnection(connInfo, audioStream)
C#

var sink = new CustomAudioSink();

var audioTrack = new AudioTrack(new Opus.Depacketizer())
    .Next(new Opus.Decoder())
    .Next(new SoundConverter(new AudioConfig(16000, 1)))
    .Next(sink);

var audioStream = connInfo.HasAudio ? new AudioStream(null, audioTrack) : null;
var connection = channel.CreateSfuDownstreamConnection(connInfo, audioStream);

Instead of using the default RemoteMedia class, we created our own AudioTrack and built a custom media pipeline by chaining the required elements. Finally, we attached the pipeline to the connection. Note that for MCU and P2P connections, you may need to create separate audio pipelines for capturing and rendering audio (if sending audio is desired).
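
For example, here is a hedged sketch of what an MCU connection might look like with both directions wired up (the local capture track is only outlined in comments; the remote track is the same receive chain used above, and this assumes channel.CreateMcuConnection accepts the stream the same way the SFU helper does):

// Sketch: an MCU connection carries both directions, so the AudioStream gets
// a local (send) track and a remote (receive) track instead of null.
// localTrack: source -> encoder -> packetizer, built from whatever capture source you use.
// remoteTrack: depacketizer -> decoder -> converter -> CustomAudioSink, as wired above.
var audioStream = new AudioStream(localTrack, remoteTrack);
var mcuConnection = channel.CreateMcuConnection(audioStream);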

 

Advanced Concept

LiveSwitch's media pipelines are powerful and can be utilized to create multiple outputs. In the following example, we output the same frame to three different sinks:

.Next(new AudioTrack[] { new CustomAudioSink1(), new CustomAudioSink2(), new CustomAudioSink3() })
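
Here is a slightly fuller, hedged sketch of a branched pipeline; the sink class names are hypothetical, and each branch is wrapped in its own AudioTrack, which is the explicit form of the shorthand above:

// Sketch: one decoded stream fanned out to three independent sinks.
var branchedTrack = new AudioTrack(new Opus.Depacketizer())
    .Next(new Opus.Decoder())
    .Next(new AudioTrack[]
    {
        new AudioTrack(new CustomAudioSink1()),
        new AudioTrack(new CustomAudioSink2()),
        new AudioTrack(new CustomAudioSink3())
    });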

For further reading, stay tuned for our upcoming LiveSwitch VOSK Custom Audio Sink Transcriber blog post!

That wraps up the process of creating custom audio sinks in LiveSwitch using F# and C#. I hope this guide helps you get started with extending LiveSwitch's capabilities for your specific audio processing requirements. Feel free to explore the possibilities and unleash your creativity!

 

Need assistance in architecting the perfect WebRTC application? Let our team help out! Get in touch with us today!