Opening Up Media and Signaling

Anton Venema Jan 1, 2020 11:39:00 AM

 

IceLink 3’s integrated media pipeline allows powerful and fine-grained control of both local and remote media handling.
 
In this article, we’ll walk through setting up custom audio and video tracks and then demonstrate signaling by simulating two peers in a single process.

If you haven’t already, please read our previous article, which provides an overview of the IceLink 3 media pipeline and introduces some of the terminology we’ll use here.
 
We’ll be using C# and Visual Studio for this walkthrough, but the same concepts and API will work for Java, Objective-C, or Swift. JavaScript and HTML5 won’t let us operate at this level, so for now, we’ll be talking about native apps exclusively.
 
The first thing we need is a baseline for our local and remote audio and video tracks:


using System;
using FM.IceLink;
using AForge = FM.IceLink.AForge;
using G711 = FM.IceLink.G711;
using H264 = FM.IceLink.H264;
using NAudio = FM.IceLink.NAudio;
using OpenH264 = FM.IceLink.OpenH264;
using Opus = FM.IceLink.Opus;
using Pcma = FM.IceLink.Pcma;
using Pcmu = FM.IceLink.Pcmu;
using Vp8 = FM.IceLink.Vp8;
using WinForms = FM.IceLink.WinForms;
using Yuv = FM.IceLink.Yuv;
  
public class Client : IDisposable
{
  public AudioTrack LocalAudioTrack  { get; private set; }
  public VideoTrack LocalVideoTrack  { get; private set; }
  public AudioTrack RemoteAudioTrack { get; private set; }
  public VideoTrack RemoteVideoTrack { get; private set; }
  
  public WinForms.PictureBoxSink LocalVideoSink  { get; private set; }
  public WinForms.PictureBoxSink RemoteVideoSink { get; private set; }
  
  public Client()
  {
    // Create a local and remote video sink. We need to add their Views
    // to the UI at some point, so we construct them separately so as
    // to keep a reference around. Note that we can configure the scale
    // of the view as well as apply mirroring for the local preview.
    LocalVideoSink = new WinForms.PictureBoxSink()
    {
      ViewScale = LayoutScale.Contain,
      ViewMirror = true
    };
    RemoteVideoSink = new WinForms.PictureBoxSink()
    {
      ViewScale = LayoutScale.Contain
    };
  
    // Create a new audio track that starts with an NAudio-based microphone
    // source running at the default configuration for Opus - 48 kHz stereo.
    LocalAudioTrack = new AudioTrack(new NAudio.Source(Opus.Format.DefaultConfig))
      .Next(new[]
      {
        // To properly support WebRTC/ORTC, we need to enable Opus and
        // the two G.711 codecs - PCMU and PCMA. We're already capturing
        // at the audio configuration required for Opus, so we can branch
        // here to (1) feed directly into the Opus encoder/packetizer
        // here to (1) feed directly into the Opus encoder/packetizer
        // and (2) downsample to 8 kHz mono before feeding into the PCMU
        // and PCMA encoders/packetizers.
        new AudioTrack(new Opus.Encoder())
                 .Next(new Opus.Packetizer()),
        new AudioTrack(new SoundConverter(G711.Format.DefaultConfig))
          .Next(new[]
          {
            new AudioTrack(new Pcmu.Encoder())
                     .Next(new Pcmu.Packetizer()),
            new AudioTrack(new Pcma.Encoder())
                     .Next(new Pcma.Packetizer())
          })
      });
  
    // Create a new video track that starts with an AForge.NET-based camera
    // source capturing 640x480 images at 30 FPS.
    LocalVideoTrack = new VideoTrack(new AForge.CameraSource(new VideoConfig(640, 480, 30)))
      .Next(new[]
      {
        // Immediately branch out to our local video sink so we have a
        // live preview and also to an image converter that can take the
        // RGB images coming from our camera and convert them to I420. A
        // specific format of YUV, I420 is the input format for both the
        // VP8 and H.264 video encoders. The fantastic libyuv library is
        // used to optimize this operation for the current processor.
        new VideoTrack(LocalVideoSink),
        new VideoTrack(new Yuv.ImageConverter(VideoFormat.I420))
          .Next(new[]
          {
            // For maximum compatibility, we enable both VP8 and H.264
            // video encoders.
            new VideoTrack(new Vp8.Encoder())
                     .Next(new Vp8.Packetizer()),
            new VideoTrack(new OpenH264.Encoder())
                     .Next(new H264.Packetizer())
          })
      });
  
    // Create a new audio track that starts with a few depacketizers, one
    // for each of the audio codecs we support. From there, it's out into
    // a decoder and then an NAudio-based sink that renders to the default
    // audio output device.
    RemoteAudioTrack = new AudioTrack(new[]
    {
      new AudioTrack(new Opus.Depacketizer())
               .Next(new Opus.Decoder())
               .Next(new NAudio.Sink()),
      new AudioTrack(new Pcmu.Depacketizer())
               .Next(new Pcmu.Decoder())
               .Next(new NAudio.Sink()),
      new AudioTrack(new Pcma.Depacketizer())
               .Next(new Pcma.Decoder())
               .Next(new NAudio.Sink())
    });
  
    // Create a new video track that starts like the audio track, with a
    // couple of depacketizers, one for each of the video codecs we support.
    // From there, it's out into a decoder and then into a shared image
    // converter that will give us the RGB format we need for the remote
    // video sink, which will allow us to view the remote feed.
    RemoteVideoTrack = new VideoTrack(new[]
    {
      new VideoTrack(new Vp8.Depacketizer())
               .Next(new Vp8.Decoder()),
      new VideoTrack(new H264.Depacketizer())
               .Next(new OpenH264.Decoder())
    }).Next(new Yuv.ImageConverter(VideoFormat.Bgr))
      .Next(RemoteVideoSink);
  }
  
  public void Dispose()
  {
    LocalAudioTrack.Destroy();
    LocalVideoTrack.Destroy();
    RemoteAudioTrack.Destroy();
    RemoteVideoTrack.Destroy();
  }
}

These four components - local audio track, local video track, remote audio track, and remote video track - represent everything you need at the media layer for a WebRTC-compatible connection.

We can now proceed to create a Connection, which will manage the network layer, and signal with the other side. In this case, the “other side” will simply be another Connection in the same process. In a real-world scenario, the “other side” would be on another device, so you would need to signal through a real-time messaging system like WebSync, SIP, or socket.io.
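
In a real deployment, it can help to hide that messaging system behind a small interface so the transport stays swappable. The sketch below is purely hypothetical - ISignalingChannel and its members are invented names, not part of the IceLink API - though SessionDescription and Candidate are the types that the Connection’s offer/answer and candidate APIs deal in:

using System;
using FM.IceLink;

// A hypothetical wrapper around whatever messaging system you choose
// (WebSync, SIP, socket.io, etc.). These names are illustrative only -
// nothing here comes from the IceLink API.
public interface ISignalingChannel
{
  // Deliver an offer or answer description to the named peer.
  void SendDescription(string peerId, SessionDescription description);

  // Deliver a network candidate to the named peer.
  void SendCandidate(string peerId, Candidate candidate);

  // Raised when the remote peer sends us a description or candidate.
  event Action<SessionDescription> DescriptionReceived;
  event Action<Candidate> CandidateReceived;
}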

First, we need to create the actual Connection, which is rather straightforward:


public class Client
{
  …
  
  public AudioStream AudioStream { get; private set; }
  public VideoStream VideoStream { get; private set; }
  public Connection Connection { get; private set; }
  
  public Client()
  {
    …
  
    // Create audio and video streams using the local tracks as the
    // sources to feed into them, and the remote tracks as the sinks
    // to process the incoming media.
    AudioStream = new AudioStream(LocalAudioTrack, RemoteAudioTrack);
    VideoStream = new VideoStream(LocalVideoTrack, RemoteVideoTrack);
  
    // Create a connection class to manage the streams.
    Connection = new Connection(new Stream[] { AudioStream, VideoStream });
  }
}

Next, we can perform signaling. Again, keep in mind that the signaling here happens in memory; in the real world, the offer/answer descriptions would be sent through a centralized messaging system to the remote peer:


var alice = new Client(); // the caller
var bob   = new Client(); // the callee
  
// 1) Alice creates an offer for Bob…
alice.Connection.CreateOffer().Then((offer) =>
{
  // 2) … locks the offer in as her local description…
  return alice.Connection.SetLocalDescription(offer);
}).Then((offer) =>
{
  // 3) … and then signals the offer over to Bob.
  
  // 4) Bob locks the offer in as his remote description…
  return bob.Connection.SetRemoteDescription(offer);
}).Then((offer) =>
{
  // 5) … creates an answer for Alice…
  return bob.Connection.CreateAnswer();
}).Then((answer) =>
{
  // 6) … locks the answer in as his local description…
  return bob.Connection.SetLocalDescription(answer);
}).Then((answer) =>
{
  // 7) … and then signals the answer over to Alice.
  
  // 8) Alice locks the answer in as her remote description.
  return alice.Connection.SetRemoteDescription(answer);
});
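
One thing this happy-path chain glosses over is failure: each step can reject, for example if a description cannot be applied. A minimal sketch of the first step with error handling, assuming the Fail callback that IceLink’s promise objects expose:

alice.Connection.CreateOffer().Then((offer) =>
{
  return alice.Connection.SetLocalDescription(offer);
}).Fail((exception) =>
{
  // Creating the offer or setting the local description rejected;
  // log it and tear the call down rather than hanging silently.
  Console.WriteLine("Signaling failed: " + exception.Message);
});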

Note that even though signaling takes place in memory here, IceLink still runs the media streams through a local network interface, since it uses network IP addresses for all peer-to-peer communication. If you run this code on a device with no network adapters, the connections will time out and fail.

Because it’s not necessary for this example, we have skipped over the exchange of public (a.k.a. server reflexive) and relay network candidates - IP addresses and ports where you might be reachable over the public Internet, used for firewall and router traversal.

We don’t need it here because everything is running on the same computer (with no pesky firewalls or routers to get in the way). But if you were to use a proper signaling system and put “Alice” and “Bob” on different devices on different networks, you would need to watch for the public/relay candidates discovered by the Connection and signal them to the remote peer in parallel with your offer/answer signaling.
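
When Alice and Bob really are on different networks, you would also tell each Connection where to discover those public and relay candidates. A hedged sketch, assuming IceLink’s IceServer type; the server addresses are placeholders for your own STUN/TURN deployment:

// Placeholder addresses - substitute your own STUN/TURN servers.
alice.Connection.IceServers = new[]
{
  new IceServer("stun:stun.example.com:3478"),
  new IceServer("turn:turn.example.com:3478", "username", "password")
};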

This parallel exchange of candidates is known as “trickle ICE”, so named because of how the candidates trickle in from the remote peer:


// When Alice raises a network candidate…
alice.Connection.OnLocalCandidate += (connection, candidate) =>
{
  // … signal it over to Bob…
  
  // … so Bob can add it to his collection.
  bob.Connection.AddRemoteCandidate(candidate);
};
  
// When Bob raises a network candidate…
bob.Connection.OnLocalCandidate += (connection, candidate) =>
{
  // … signal it over to Alice…
  
  // … so Alice can add it to her collection.
  alice.Connection.AddRemoteCandidate(candidate);
};

Network candidates are raised as soon as both a local and a remote description are set, so you should add these handlers before starting the offer/answer logic.
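
In other words, the safe ordering is handlers first, offer second:

// 1) Attach candidate handlers before any descriptions are set…
alice.Connection.OnLocalCandidate += (connection, candidate) =>
  bob.Connection.AddRemoteCandidate(candidate);
bob.Connection.OnLocalCandidate += (connection, candidate) =>
  alice.Connection.AddRemoteCandidate(candidate);

// 2) … and only then kick off the offer/answer exchange.
alice.Connection.CreateOffer().Then((offer) =>
{
  return alice.Connection.SetLocalDescription(offer);
}); // … continuing with the chain shown earlier.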

The only step left is to start the local media sources and add the video views to the UI so we can actually see them:


// Start the local audio and video sources.
alice.LocalAudioTrack.Source.Start().WaitForResult();
alice.LocalVideoTrack.Source.Start().WaitForResult();
bob.LocalAudioTrack.Source.Start().WaitForResult();
bob.LocalVideoTrack.Source.Start().WaitForResult();
  
FormClosing += (sender, args) =>
{
    // Stop the local audio and video sources.
    alice.LocalAudioTrack.Source.Stop().WaitForResult();
    alice.LocalVideoTrack.Source.Stop().WaitForResult();
    bob.LocalAudioTrack.Source.Stop().WaitForResult();
    bob.LocalVideoTrack.Source.Stop().WaitForResult();
};
  
// Get references to our video views.
var aliceLocalView = alice.LocalVideoSink.View;
var aliceRemoteView = alice.RemoteVideoSink.View;
var bobLocalView = bob.LocalVideoSink.View;
var bobRemoteView = bob.RemoteVideoSink.View;
  
// Add the views to the UI.
panel.Controls.Add(aliceLocalView);
panel.Controls.Add(aliceRemoteView);
panel.Controls.Add(bobLocalView);
panel.Controls.Add(bobRemoteView);
  
// Set the view widths.
var width = (panel.Width / 2);
aliceLocalView.Width = width;
aliceRemoteView.Width = width;
bobLocalView.Width = width;
bobRemoteView.Width = width;
  
// Set the view heights.
var height = (panel.Height / 2);
aliceLocalView.Height = height;
aliceRemoteView.Height = height;
bobLocalView.Height = height;
bobRemoteView.Height = height;
  
// Set the view positions.
aliceLocalView.Top = 0;
aliceLocalView.Left = 0;
aliceRemoteView.Top = 0;
aliceRemoteView.Left = width;
bobLocalView.Top = height;
bobLocalView.Left = 0;
bobRemoteView.Top = height;
bobRemoteView.Left = width;

That’s it! If all went well, this should give you some nice audio and video feedback loops.

From here, the sky's the limit. Try removing Opus from the pipeline to force PCMU, changing the audio configuration, or using H.264 by ordering it before VP8 in the track arrays. Replace the AForge.CameraSource with an AForge.MotionJpegSource to capture media from an IP camera instead of your device webcam, or even feed directly into the Vp8.Packetizer if you have a file with encoded VP8 media.
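
For instance, preferring H.264 only requires swapping the encoder branches in the local video track from earlier, since the branch listed first takes precedence during negotiation:

LocalVideoTrack = new VideoTrack(new AForge.CameraSource(new VideoConfig(640, 480, 30)))
  .Next(new[]
  {
    new VideoTrack(LocalVideoSink),
    new VideoTrack(new Yuv.ImageConverter(VideoFormat.I420))
      .Next(new[]
      {
        // H.264 first, so it takes precedence during negotiation.
        new VideoTrack(new OpenH264.Encoder())
                 .Next(new H264.Packetizer()),
        new VideoTrack(new Vp8.Encoder())
                 .Next(new Vp8.Packetizer())
      })
  });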

IceLink media tracks can be customized to meet any application’s needs. What will you build?