If you’ve ever taken part in a meeting inside a small, glass-walled conference room, you’ve likely experienced it: your voice sounds muffled, and people on the other end struggle to understand you. The problem isn’t that you’re speaking too quietly – it’s about how sound travels in that kind of space.
This article explores a technical challenge that may seem simple at first glance, but is in fact complex, multi-layered, and still unresolved across much of the industry. It also tells the story of how Yealink’s team has worked persistently, step by step, to find a solution.
Why is audio so challenging in glass-walled spaces?
Compared to traditional meeting rooms, glass rooms present far harsher acoustic conditions. Large, reflective glass surfaces and a lack of sound-absorbing materials mean that sound waves bounce repeatedly off the walls, ceiling, and table with very little energy absorbed, producing strong reverberation and a distorted, hollow sound.
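A rough sense of the scale comes from Sabine's classic reverberation formula, RT60 ≈ 0.161 · V / A, where V is the room volume and A is the total absorption (each surface's area times its absorption coefficient, summed over all surfaces). Plate glass absorbs only a few percent of the sound energy that hits it at speech frequencies, so in a mostly-glass room A stays small and the reverberation time can stretch well beyond a second, while comfortable speech typically wants roughly half a second or less. (These figures are illustrative values from standard acoustics references, not measurements of any particular room.)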
The main problems include:
- Low frequencies become blurred, making speech less intelligible.
- Early reflections interfere with the natural tone of speech.
- Traditional echo cancellation algorithms often fail in real-time situations.
- Overly aggressive filtering can ruin the natural sound of the voice.
Because of this, many in the industry fall back on physical workarounds, such as sound-absorbing furniture or additional microphones placed around the room. These fixes are expensive, however, and often produce inconsistent results.
But what if a single device could deliver clear, understandable audio in any meeting room?
AI-Based Noise Reduction
In 2022, Yealink began rolling out its AI-based noise reduction algorithms in the A20, A30, and IWB devices. These algorithms proved effective against both steady and sudden background noise, providing a reliable foundation for clean audio capture.
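Yealink has not published the internals of these algorithms, but most on-device noise reducers in this space follow a mask-based pattern, and a toy sketch of that pattern helps make the later steps concrete: a small causal network predicts a per-frequency gain for each audio frame, attenuating noise-dominated bins while leaving speech intact. Every class name and size below is invented for illustration.

```python
import torch
import torch.nn as nn

# Toy mask-based noise suppressor (illustrative only, not Yealink's model).
# Input: magnitude spectrogram frames; output: denoised magnitudes.
class ToyDenoiser(nn.Module):
    def __init__(self, n_freq=257, hidden=128):
        super().__init__()
        # A GRU is causal, so the same model can run frame by frame in real time.
        self.rnn = nn.GRU(n_freq, hidden, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, mag):            # mag: (batch, frames, n_freq)
        h, _ = self.rnn(mag)
        return mag * self.mask(h)      # scale each bin by a 0..1 gain
```

The appeal of this design for embedded hardware is that the network only scales its input, so even when it is unsure it tends to degrade gracefully rather than invent sound.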
During this development, three key technologies were established:
- Building real-world meeting audio datasets.
- Designing efficient AI models that run directly on the device.
- Creating a flexible deployment framework compatible with various hardware platforms.
This foundation paved the way for tackling the reverberation problem directly.
Step by Step: Tackling Reverberation
The team began by addressing what’s known as “late reverberation” – the lingering echoes that can still be heard even after someone has finished speaking. The goal was to develop a model that could operate effectively in real-world environments without increasing the load on the device.
They recorded real audio from office environments, refined their acoustic models, and developed a custom real-time algorithm.
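Yealink's real-time algorithm is proprietary, so the sketch below only illustrates the underlying idea in its simplest single-channel form, following the classic statistical model of Lebart et al.: treat late-reverberation power as a delayed, exponentially decayed copy of the signal power and subtract that estimate in each STFT bin. All parameters (frame size, RT60, the 80 ms "late" boundary) are illustrative.

```python
import numpy as np
from scipy.signal import stft, istft

# Illustrative single-channel late-reverberation suppressor
# (not Yealink's algorithm). Late reverb power is modeled as a
# delayed, exponentially decayed copy of the signal power.
def suppress_late_reverb(x, fs, rt60=0.6, late_ms=80.0, gain_floor=0.1):
    nper, nover = 512, 384
    _, _, X = stft(x, fs=fs, nperseg=nper, noverlap=nover)
    power = np.abs(X) ** 2
    hop_s = (nper - nover) / fs                    # STFT hop in seconds
    d = max(1, round(late_ms / 1000.0 / hop_s))    # frames until reverb counts as "late"
    decay = np.exp(-13.8 * d * hop_s / rt60)       # power falls 60 dB over one RT60
    late = np.zeros_like(power)
    late[:, d:] = decay * power[:, :-d]            # predicted late-reverb power
    gain = np.sqrt(np.clip(1.0 - late / np.maximum(power, 1e-12), gain_floor**2, 1.0))
    _, y = istft(gain * X, fs=fs, nperseg=nper, noverlap=nover)
    return y[: len(x)]
```

Running this over a 16 kHz recording (y = suppress_late_reverb(x, 16000)) audibly shortens the room tail; a production system would typically replace the fixed decay model with learned, room-adaptive estimates.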
The results were promising:
- Reverberation was noticeably reduced across various types of rooms.
- The solution received positive feedback during evaluation at the Microsoft Teams AV certification lab.
- Users reported improved speech clarity, especially in medium-sized meeting spaces.
It was a key milestone – but not the final goal.
The Hardest Part: Early Reverberation
To achieve truly clear and intelligible speech, the team had to solve a much tougher issue: early reflections, which arrive within a few tens of milliseconds of the direct sound. These reflections are so closely intertwined with the direct speech signal that traditional methods struggle to remove them without distorting the natural sound of the voice.
After testing several approaches, they ultimately chose to work with a generative model. This allowed the system to reconstruct the speech signal while significantly reducing unwanted echo—with minimal distortion. However, the trade-off was that the model required substantially more computing power.
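The model itself is unpublished; the toy sketch below only illustrates the architectural shift relative to the masking approach shown earlier. Instead of predicting a gain that scales the reverberant input, a generative-style decoder outputs clean magnitudes directly, which lets it restore speech that early reflections have smeared into the direct sound, at the cost of a larger network. All names and sizes are invented.

```python
import torch
import torch.nn as nn

# Illustrative contrast with the masking approach (not Yealink's model):
# the decoder regenerates clean magnitudes outright instead of scaling
# the input, trading extra compute for less distortion on early reflections.
class ToyGenerativeDereverb(nn.Module):
    def __init__(self, n_freq=257, hidden=512):
        super().__init__()
        self.encoder = nn.GRU(n_freq, hidden, num_layers=2, batch_first=True)
        self.decoder = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_freq), nn.Softplus(),  # non-negative magnitudes
        )

    def forward(self, reverberant_mag):   # (batch, frames, n_freq)
        h, _ = self.encoder(reverberant_mag)
        return self.decoder(h)            # regenerated clean magnitudes
```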
The Solution: Efficient On-Device Operation Using an NPU
To run this advanced model on cost-effective devices, the team developed a custom runtime framework optimized for Rockchip NPU chips. This made it possible for the more complex model to operate quickly, efficiently, and with low power consumption on embedded systems—a crucial step for product viability.
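Yealink's runtime framework is custom and unpublished. Purely as an illustration of what this deployment step involves, the sketch below converts a trained model for a Rockchip NPU using the vendor's public RKNN-Toolkit2; the file names, the target SoC, and the calibration list are placeholders, and the real pipeline may differ.

```python
from rknn.api import RKNN  # Rockchip's public model-conversion toolkit

rknn = RKNN()
rknn.config(target_platform='rk3588')      # placeholder target SoC
rknn.load_onnx(model='dereverb.onnx')      # exported speech model (placeholder name)
rknn.build(do_quantization=True,           # INT8 for speed and power efficiency
           dataset='calib_frames.txt')     # calibration input list (placeholder)
rknn.export_rknn('dereverb.rknn')          # artifact the NPU runtime loads
rknn.release()
```

Quantizing to INT8 and compiling ahead of time is what makes a model of this size practical on a low-power embedded SoC.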
What may appear to be a simple conference room is, in fact, a highly complex technical testbed. Their work brings together multiple areas:
- Acoustic engineering
- Audio data collection and AI model training
- AI algorithm development and deployment
- Fast and efficient inference optimization
Throughout development, the focus has not only been on algorithmic accuracy, but also on how the solution actually sounds to real people in real-world environments.
Always One Step Ahead
“We know that clear speech is the foundation of successful communication—especially in glass-walled spaces with challenging acoustics. Achieving that requires serious innovation, long-term thinking, and deep technical commitment.
We’re proud of how far we’ve come—but we’re not done yet. From one room to the next, one voice at a time, we’re pushing the boundaries of what’s possible.” — The Yealink Team