The technology behind 200-person video calls
A businessman is on a call with his client. The call drops. He tries to call back, but the connection is either lost or choppy.
That’s a common problem around the world. But conference calls are here to stay. There’s no better alternative at the moment. 76 percent of business decision makers use video conferencing at work, according to Polyvore. More than half have at least one meeting over video every week.
But conference calls shouldn’t be limited to remote teams; their possibilities extend to large-scale meetings.
Line Group Call allows users to make voice calls with up to 200 people simultaneously. Behind the scenes, there’s nifty work done so that embarrassment from dropped calls or poor connectivity won’t be amplified 200-fold.
Don’t mesh things up
The mesh network, a way to connect devices without access to the internet or cloud, may have an increasingly important role to play in our lives. It links devices to one another through wireless nodes rather than through a centralized access point or a wired local area network (LAN).
A mesh network is low-cost because it does not need cables. It can make homes smart, for example by turning on the lights once a user gets home. It can also extend Wi-Fi coverage so that people can get online from every nook and cranny.
Mesh networks can even help during disasters, because they avoid any single point of failure.
Instead of using a full mesh, Line uses an adaptation of the mesh network with a centralized server. Without a centralized server, traffic increases as more users get on the call, because everyone is connected to each other. Imagine having to process both video and voice data from 199 people on a mobile phone.
With the main server, Line can minimize the user’s traffic. The server itself is split into two: one for managing Line Talk and the other specifically for mixing the voice and video. Servers are located around the world so that an Argentinian team can speak comfortably with, say, an Indonesian team across the Pacific Ocean.
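To see why a full mesh gets out of hand at 200 participants, here is a rough back-of-the-envelope sketch. It assumes that in a full mesh every device receives one stream from every other device, and that with a central server every device keeps a single connection and gets one mixed stream back. The function names are made up for illustration and say nothing about how Line's servers are actually built.

```python
def full_mesh_streams(participants: int) -> dict:
    """Count streams when every device connects directly to every other device."""
    incoming_per_device = participants - 1
    # Each pair of devices shares one link: n * (n - 1) / 2 links in total.
    total_links = participants * (participants - 1) // 2
    return {"incoming_per_device": incoming_per_device, "total_links": total_links}


def central_server_streams(participants: int) -> dict:
    """Count streams when every device talks only to a central server (assumed model)."""
    return {"incoming_per_device": 1, "total_links": participants}


mesh = full_mesh_streams(200)
star = central_server_streams(200)
print("full mesh:     ", mesh)
print("central server:", star)
```

Under these assumptions, a phone in a 200-person full mesh would handle 199 incoming streams, while a phone connected to a central server handles one.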
Can you hear me?
The business executive is finally back on the call with the team after the latency issues. As he launches into the climax of his pitch, someone in the room starts coughing. The coughs echo, creating an unbearable feedback loop of terrible phlegmy hacks. It’s distracting and horrible.
Imagine the noise echoing a hundred times. Noise increases as more users join the call, and its source becomes far harder to identify.
To beat this, Line Group Call uses a Voice Quality Enhancement (VQE) software module. The input signal is broken down into three components: echoes, the voice, and noise from the environment. Echoes and noise are undesirable, and Line estimates how much of these unwanted signals to remove. To do so, the engineers divided sounds into these four categories:
1. The user’s voice
2. Another user’s sound (echo)
3. Users talking simultaneously
4. No one talking (background noise)
Line’s modules estimate echo and noise from categories 2 and 4 and remove them, while preserving the users’ voices in categories 1 and 3. Line also uses an Automatic Gain Controller (AGC) to amplify or reduce the volume of different sounds. No more ear-clutching moments.
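The four categories can be pictured with a small sketch. It assumes the module already knows, for each short audio frame, whether the local user and the remote side are talking; those two flags, the function names, and the AGC target and cap values are all hypothetical. How Line detects speech or computes its estimates is not described here, so this only illustrates the sorting logic and a very simple gain rule.

```python
from enum import Enum
from typing import Optional


class FrameType(Enum):
    USER_VOICE = 1   # only the local user is talking
    ECHO = 2         # only the other side is talking, so the mic picks up echo
    DOUBLE_TALK = 3  # both sides are talking at once
    BACKGROUND = 4   # nobody is talking, only background noise


def classify_frame(near_active: bool, far_active: bool) -> FrameType:
    """Sort a frame into one of the four categories (inputs are assumed flags)."""
    if near_active and far_active:
        return FrameType.DOUBLE_TALK
    if near_active:
        return FrameType.USER_VOICE
    if far_active:
        return FrameType.ECHO
    return FrameType.BACKGROUND


def estimate_to_update(frame: FrameType) -> Optional[str]:
    """Echo is estimated from category 2, noise from category 4, nothing otherwise."""
    if frame is FrameType.ECHO:
        return "echo"
    if frame is FrameType.BACKGROUND:
        return "noise"
    return None


def agc_gain(frame_rms: float, target_rms: float = 0.1, max_gain: float = 4.0) -> float:
    """Toy gain control: scale a frame toward a target loudness, capped at max_gain."""
    if frame_rms <= 0:
        return 1.0  # silent frame, leave it alone
    return min(max_gain, target_rms / frame_rms)


for near, far in [(True, False), (False, True), (True, True), (False, False)]:
    frame = classify_frame(near, far)
    print(frame.name, "-> update:", estimate_to_update(frame))
```

Frames with a voice in them (categories 1 and 3) never feed the echo or noise estimates in this sketch, which mirrors the idea of removing the unwanted signals while preserving speech.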
It’s an uphill task. According to Line, there are over 20,000 different devices around the globe with various audio specifications. The engineers are still identifying different characteristics to achieve good audio for all users.
Look good for 199 other people
Some people sound good, but look even better. That’s what the businessman thinks of his face during video meetings. “If only my face weren’t a blur of skin-colored pixels for my remote clients,” he thinks.
To solve the pixelation problem, Line uses a multi-layer video stream – a sequence of video data spliced into three different resolutions.
In transcoding, a file is converted from one encoding to another during a file transfer. A user’s device sends the video in the highest resolution, and a server transcodes it into various resolutions before sending it to other users. But this is resource intensive. With 200 participants and three separate resolutions, the server has to do the work 600 times. It’s akin to asking the sole cook in a restaurant to create 600 dishes from scratch in one go to suit 200 diners’ tastes and preferences.
Line believes a multi-layer video stream is more helpful than transcoding. The stream builds the video from multiple layers, which can be combined to produce different resolutions. The user’s device captures the video and sends a multi-layer video stream to the central server. The server then simply has to combine the right layers to send each user the video at the most manageable resolution. Using the same restaurant analogy, the cook simply has to combine the right ingredients for 200 different diners this time.
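The difference between the two approaches can be sketched in a few lines. This assumes a stream made of one base layer plus two enhancement layers, with the server forwarding as many layers as a receiver's bandwidth allows. The layer names, bitrates, and function names are invented for illustration; the actual layer layout and selection rules Line uses are not given here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    name: str
    kbps: int  # extra bitrate this layer adds on top of the layers below it


# Hypothetical three-layer stream, ordered from base to highest enhancement.
LAYERS = [
    Layer("base (low resolution)", 150),
    Layer("enhancement 1 (medium resolution)", 350),
    Layer("enhancement 2 (high resolution)", 1000),
]


def transcodes_needed(senders: int, resolutions: int) -> int:
    """Transcoding approach: the server re-encodes every sender once per resolution."""
    return senders * resolutions


def layers_for_receiver(available_kbps: int, layers: List[Layer] = LAYERS) -> List[Layer]:
    """Layered approach: forward layers in order until the next one no longer fits."""
    chosen: List[Layer] = []
    used = 0
    for layer in layers:
        if used + layer.kbps > available_kbps:
            break
        chosen.append(layer)
        used += layer.kbps
    # Always forward at least the base layer so the receiver sees something.
    return chosen or layers[:1]


print("transcodes for 200 senders, 3 resolutions:", transcodes_needed(200, 3))
for bandwidth in (100, 600, 2000):
    names = [layer.name for layer in layers_for_receiver(bandwidth)]
    print(bandwidth, "kbps ->", names)
```

In the layered case the server does no re-encoding at all in this sketch. It only decides which of the layers it already received to pass along, which is the "combine the right ingredients" part of the analogy.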
“I can’t take this conference call” isn’t going to be a viable excuse anymore. It’s hard to back out when 199 others can make it.