POLQA vs PESQ – “Why does PESQ give a lower score?”

"Do I really need POLQA?"

The question is understandable, as POLQA can cost a bit more than PESQ. MultiDSLA customers recognize great value and sometimes want to pursue it to the limit! Well, predictably, the answer is "It depends..." but to be more helpful, Opale has published a video describing specific cases where POLQA is the best solution, and others where the PESQ metric - about 10 years older than POLQA - might be perfectly good enough. The video is available here, in case you missed it.

From time to time, customer questions bring to mind the practical reality of the PESQ vs. POLQA debate, and that's exactly what happened recently, when we got this question:

"Why do I only get a PESQ score of 1.90, when POLQA gives 3.86?"


How do you answer a question like that?


By the way, we posted another video about analysis techniques; if you want to refer to it, here it is.

MultiDSLA Analysis

But getting back to that question, here are the two sets of results - POLQA first, then PESQ:




So how does the investigation proceed?

Step 1: Listen

Listening to the degraded signal alone doesn't reveal much. In fact, it sounds reasonably good, not excessively distorted, and there are no "gaps" in the speech. But then listening to the reference signal a few times for comparison reveals something interesting - the speech in the degraded signal is actually faster during the first of the two sentences!


Step 2: Look

Next, there's a visual clue, clear enough in the screenshots above. The faintest of the three traces - the "error surface" in the POLQA analysis - shows small but fairly consistent errors. The same trace in the PESQ analysis (highlighted) shows larger errors throughout the first sentence and right at the beginning of the second. What is going on?


Step 3: Going Further

It's time to go through some of the other POLQA and PESQ graphs provided by MultiDSLA to see what we can discover. When we get to the Sample (Frame) Time Offset view, this is what we see, again POLQA first:



Well, they both tell the same story, sort of...


Latency/Delay Changes

POLQA tells us that during the first of the two sentences, the one-way latency (delay) decreases by more than 200 ms over a period of about 2.1 s. POLQA indicates that this is an approximately linear fit, corresponding to a speech "speed-up" of about 9.5%. Interestingly, speech pitch appears normal while this is occurring. One would expect that "speeding up" speech by nearly 10% would raise the pitch, but that is not what we hear.
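The "about 9.5%" figure can be checked with a little arithmetic. The sketch below assumes the speed-up is simply the delay change divided by the interval over which it happens; the function name is ours, for illustration only, and is not part of MultiDSLA or POLQA.

```python
def delay_slope_percent(delay_change_ms: float, interval_s: float) -> float:
    """Return the delay change as a percentage of the interval it spans.

    Assumption: a linear delay ramp, so the ratio of delay change to
    interval length approximates the speech speed-up.
    """
    return 100.0 * delay_change_ms / (interval_s * 1000.0)


# Figures quoted above: roughly 200 ms of delay recovered over roughly 2.1 s.
speed_up = delay_slope_percent(200.0, 2.1)
print(f"{speed_up:.1f}%")  # prints "9.5%"
```

Since the measured delay change was "more than 200 ms", the true figure would be slightly above this value.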


What does PESQ say? Well, PESQ shows a similar overall effect but interprets the latency reduction as a series of discrete steps rather than a continuous process. So what's going on?
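To get a feel for why a stepped view of a smooth delay ramp matters, here is a minimal sketch. It is not PESQ's actual alignment algorithm: the 300 ms step length is a hypothetical value chosen purely for illustration, and the ramp uses the approximate 200 ms over 2.1 s figures from above.

```python
def linear_delay_ms(t_ms: int, total_drop_ms: float = 200.0,
                    duration_ms: int = 2100) -> float:
    """Delay offset (ms, relative to the start) for a linear ramp."""
    t = min(max(t_ms, 0), duration_ms)  # clamp to the ramp interval
    return -total_drop_ms * t / duration_ms


def stepped_delay_ms(t_ms: int, step_ms: int = 300) -> float:
    """Staircase model: hold the ramp value from the start of each step.

    step_ms is a hypothetical segment length, not a PESQ parameter.
    """
    segment_start = (t_ms // step_ms) * step_ms
    return linear_delay_ms(segment_start)


# Sample the first sentence every 10 ms and find the worst mismatch
# between the continuous ramp and the staircase approximation.
times_ms = range(0, 2100, 10)
worst = max(abs(linear_delay_ms(t) - stepped_delay_ms(t)) for t in times_ms)
print(f"worst residual offset: {worst:.1f} ms")
```

With these assumed numbers the staircase drifts up to roughly 28 ms away from the ramp inside each step. A comparison that assumes the delay is constant within a step would see that drift as a mismatch between the degraded and reference signals.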


Codec behavior discovered

We find the explanation in the way some codecs handle adverse network conditions through a process called "time warping." Codecs with this capability belong to a class called relaxed code excited linear prediction (RCELP) and can vary the rate at which the jitter buffer is read while approximately preserving the original signal pitch. Reading the signal out faster than normal allows the codec to regain the time it "lost" when the jitter buffer was enlarged to accommodate a late-arriving packet.


So what? And how does this explain the low PESQ score and the high POLQA score?


What do users think?

Here's the problem: PESQ can't track this kind of codec behavior - it simply treats it as distortion because the "accelerated" degraded signal no longer lines up with the clean (original) reference signal. But how do users perceive time-warping? Most of the time, they won't notice: after all, when you make a phone call, you can't compare what you hear to a "reference". So we have a technique that is effective in preserving voice quality but that PESQ cannot handle, and as a result PESQ tends to give scores lower than what might be considered reasonable - that is, lower than what human subjects would tend to give.


Mobile Codecs - State of the Art

The example in this blog is from a Skype Internet call, and the effect is pronounced and easy to see. But the same technique works on a smaller scale in codecs used in mobile networks, and that's where the limitations of PESQ became clear over a decade ago. This prompted interested parties to publish technical reports suggesting that inappropriate use of PESQ measurements could result in unfair, or even invalid, characterization of codec performance. You can see an example of this here.


And now there is POLQA

The partial obsolescence of PESQ was also one of the driving forces behind the development of a new objective voice quality analysis technique, which eventually led to the first commercial version of POLQA in 2011.


So, our customer's simple question about PESQ vs. POLQA took us a long way, but we got to the explanation and discovered - guess what? - that you have to use the right tool for the job!
