While the concept of telemedicine has been around for over a century, using synchronous video consults to facilitate care is relatively new. Many healthcare providers have successfully integrated video consults on a one-to-one basis without the use of medical peripheral devices, but scaling beyond simple use cases poses much more of a challenge, and not all video conferencing solutions are created equal.
Some telehealth organizations know at the outset that they need to be able to support a specific complex use case. Some common use cases that we see include:
- Remote consultation or triage
- Broadcasting to teams
- Facilitating training sessions
- Remote patient monitoring
- Chronic disease management
Many organizations, however, will start with a relatively simple use case only to find that in a short period of time, their needs have evolved and their video conferencing software has become obsolete or ineffective.
The importance of developing a plan for current and future video conferencing scalability cannot be overstated. In this blog post, we will go in depth on how you can future-proof your telemedicine application by incorporating these best practices for video conferencing scalability.
What is scaling?
Before we dive in too deep, we must first have a good understanding of what we mean by scaling -- and more specifically, the types of network topologies available for video conferencing systems and their impact on scalability.
Within the world of video conferencing, scaling can mean many things. For instance, if you have a server that is no longer able to effectively handle your video conference, you can either increase the processing power or memory of the single server (vertical scaling), or you could distribute the load over multiple servers (horizontal scaling).
While these types of scaling should be considered (our LiveSwitch server scales both horizontally and vertically), for the purposes of this post we are going to focus on how to scale a single conference beyond a one-to-one call without any disruption to the end users.
To do this, you want a video conferencing system that supports three main topologies: peer-to-peer (P2P), selective forwarding (SFU), and multipoint control (MCU).
In its simplest form, a peer-to-peer (P2P) network is when two or more participants are directly connected in a video conference without the use of a server. While this type of connection is perfect for a simple doctor-to-patient use case and provides the lowest operating cost, it tends to break down quickly as conferences grow both in number of participants and complexity (e.g. recording).
In a P2P conversation, every participant must upload their video stream to every other participant and download a stream from each of them. This can become too CPU- and bandwidth-intensive as the conversation moves beyond two individuals and often puts too much stress on older devices.
Additionally, in this mode complex functions such as recording or PSTN integration have to take place on the end users' devices, increasing time to market and support requirements since implementations must be rebuilt on a per-platform basis.
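To make that growth concrete, here is a minimal Python sketch of the arithmetic for a full-mesh P2P call. The function name and the 1.5 Mbps per-stream bitrate are illustrative assumptions, not measurements from any particular product.

```python
def mesh_load(participants: int, stream_mbps: float = 1.5) -> dict:
    """Estimate the per-participant load in a full-mesh (P2P) call.

    stream_mbps is an assumed bitrate for one video stream; real bitrates
    vary with resolution, codec, and network conditions.
    """
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    peers = participants - 1  # everyone else in the call
    return {
        "uploads_per_participant": peers,
        "downloads_per_participant": peers,
        "uplink_mbps": peers * stream_mbps,
        # Each pair of participants shares one direct connection.
        "total_connections": participants * peers // 2,
    }


for n in (2, 4, 6):
    print(n, mesh_load(n))
```

Under these assumptions, each added participant costs every existing participant one more upload and one more download, which is why older devices struggle first.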
When a P2P network will no longer suffice, a Selective Forwarding Unit (SFU) can be used. This option reduces the amount of CPU required on the local device by adding a server into the mix that can take some of the stress off of the participants.
When an SFU is used, every participant only needs to upload their encrypted video stream one time to the server. The server then forwards those streams to each of the other participants. This can reduce latency for people that are highly distributed. It also permits things like transcoding, recording, and other server-side integrations such as SIP which would be much more difficult in a peer-to-peer connection.
Multipoint Control (Mixing)
While the SFU is sufficient in many use cases, if the conversation grows beyond a certain point (e.g. 6 or 7 participants) or if any of the participants are using underpowered devices (e.g. an older iPhone or Android), then a different topology is needed.
A Multipoint Control Unit (MCU) allows your participants to upload individual encrypted video streams to a server. The server then mixes the incoming streams from all participants and sends the result back to each client as a single feed. This means that every participant only needs a device powerful enough to handle a single bidirectional connection, regardless of the number of clients present. This option requires the least bandwidth and device CPU, but it does require additional server CPU to mix the audio and video into single streams.
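The difference between the three topologies is easiest to see by counting the video streams each client has to handle. The sketch below does only that counting; the function name and the topology labels are hypothetical, and it ignores audio, simulcast, and any server-side limits.

```python
from typing import Tuple


def streams_per_client(topology: str, participants: int) -> Tuple[int, int]:
    """Return (uploads, downloads) of video streams for a single client."""
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    others = participants - 1
    if topology == "p2p":
        # Send to and receive from every other participant directly.
        return (others, others)
    if topology == "sfu":
        # Upload once to the server, receive one forwarded stream per peer.
        return (1, others)
    if topology == "mcu":
        # Upload once, receive one mixed stream from the server.
        return (1, 1)
    raise ValueError(f"unknown topology: {topology}")


for topology in ("p2p", "sfu", "mcu"):
    print(topology, streams_per_client(topology, 7))
```

For a seven-person conference this gives 6 up and 6 down for P2P, 1 up and 6 down for an SFU, and 1 up and 1 down for an MCU. The client-side savings of the MCU are paid for with mixing work on the server.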
Hybrid Topology - The best of all worlds
Hybrid architectures are, as their name implies, a combination of peer-to-peer, selective forwarding, and multipoint control (mixing). In a hybrid environment, participants can join a session based on whatever makes the most sense for the session or the devices in use. For simple two-party doctor/patient calls, a P2P connection is simple and requires minimal server resources. For small group sessions or sessions involving more advanced functionality like telephony using SIP or recording, forwarding will better meet your needs. For larger group sessions or for poor network connections, a multipoint control unit is often the only practical option. Our LiveSwitch server stack is a great example of a hybrid topology and is one of the few hybrid media servers on the market today.
For another perspective into the intricacies of scaling, check out this post by our CTO Anton Venema: How to Successfully Scale your WebRTC Application.
Best Practice #1: Avoid video conferencing solutions that rely solely on peer-to-peer networking, as they are inherently limited.
If you want to have a video conferencing solution with the ability to scale beyond 2-3 individuals, you need to have a solution that does not solely rely on peer-to-peer networks. As conferences grow, connection topologies can and should change.
Different types of conferences lend themselves to different topologies.
It is important that your solution support all three connection types to provide you with flexible application options that will grow with your organization.
Best Practice #2: Ask potential solution providers whether they support a hybrid architecture and whether they can support multiple connection types at the same time.
Before making a purchase, you need to have a good understanding of what connection types your system will support and understand any limitations that it may have. For example, a peer-to-peer network may technically be able to support 4 users in a video consult under the right conditions, but it would have major ramifications for bandwidth, CPU usage and video quality on each connected participant.
Be sure to ask all potential providers if they support all three of the connection types and if they can support all of them simultaneously in a single conference. This will allow for the greatest flexibility and will allow you to cost-optimize the connection types based on the number of users.
To illustrate this need, consider the above illustration. In conversation A, we have a nurse and a patient in a P2P conversation. When the doctor is brought into the consult (conversation B), we shift some of the stress to the SFU in order to make the most of the available bandwidth and reduce the CPU burden. If a fourth person wants to join the consult (conversation C) but is on a poor network (e.g. 3G), an MCU connection can be opened up for that individual while maintaining an SFU connection for the rest of the participants. At any point, any individual in the consult can move in and out of poor network areas and the system will dynamically change the connection type without any disruption to the users, providing the best possible user experience for your patients.
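The decision logic in this scenario can be sketched in a few lines of Python. This is an illustration of the idea only, not how LiveSwitch or any other media server implements it: the `Participant` class, the `choose_connections` function, and the `sfu_limit` of 6 (borrowed from the "6 or 7 participants" rule of thumb earlier in this post) are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Participant:
    name: str
    poor_network: bool = False         # e.g. currently on 3G
    underpowered_device: bool = False  # e.g. an older phone


def choose_connections(
    participants: List[Participant],
    needs_server_features: bool = False,  # e.g. recording or SIP
    sfu_limit: int = 6,                   # assumed threshold, tune per deployment
) -> Dict[str, str]:
    """Pick a connection type per participant for a hybrid conference."""
    if len(participants) < 2:
        raise ValueError("a call needs at least two participants")
    # A plain two-party call can stay peer-to-peer.
    if len(participants) == 2 and not needs_server_features:
        return {p.name: "p2p" for p in participants}
    too_large = len(participants) > sfu_limit
    return {
        p.name: "mcu"
        if (too_large or p.poor_network or p.underpowered_device)
        else "sfu"
        for p in participants
    }


consult = [Participant("nurse"), Participant("patient")]
print(choose_connections(consult))  # conversation A: both on p2p

consult.append(Participant("doctor"))
print(choose_connections(consult))  # conversation B: everyone on sfu

consult.append(Participant("relative", poor_network=True))
print(choose_connections(consult))  # conversation C: relative on mcu, rest on sfu
```

Re-running the function whenever a participant joins, leaves, or changes network conditions gives the dynamic switching described above. A real system would also need to move media between connections without interrupting the call, which this sketch does not attempt.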