POLQA vs PESQ – “Why does PESQ give a lower score?”

Nick Fox

“Do I really need POLQA?”

The question is understandable, as POLQA can cost a little more than PESQ. MultiDSLA customers recognize great value and sometimes want to pursue it to the limit! Predictably, the answer is “It depends…”, but to be more helpful, Opale published a video outlining specific cases where POLQA is the better solution, and others where the PESQ metric (some 10 years older than POLQA) is perfectly good enough. The video is available here, in case you missed it.

From time to time, customers’ questions bring home the practical reality of the PESQ vs POLQA debate, and that’s just what happened recently, when we got this question:

“Why do I only get a score of 1.90 from PESQ, when POLQA gives 3.86?”

How do we go about responding to a question like that?

By the way, we published another video that discusses analysis techniques; if you want to refer to it, here it is.

MultiDSLA Analysis

But to get back to this question, here is how the two sets of results appeared – POLQA first, then PESQ:

So how does the investigation go?

Step 1: Listening

Listening to the Degraded signal alone does not reveal much.  Actually, it sounds reasonably good, not excessively distorted, and it doesn’t have ‘gaps’ in the speech.  But then listening to the Reference signal a couple of times does show up something of interest – the speech in the Degraded signal actually comes out faster during the first of the two sentences!

Step 2: Looking

Then there is a visual clue, quite clear in the screenshots above. The lowest of the three traces – the “error surface” in the POLQA analysis – shows small but fairly consistent errors. The same trace in the PESQ analysis (highlighted) shows larger errors throughout the first sentence and just at the beginning of the second. What is going on?

Step 3: Going deeper

It’s time to scan through some of the other POLQA and PESQ graphs provided by MultiDSLA to see what we can discover. When we get to the Sample (Frame) Time Offset view, here is what we see, again POLQA first:

Well, they both tell the same story, kind of…

Latency/Delay Changes

POLQA is telling us that during the first of the two sentences the one-way latency (delay) reduces by more than 200 ms over a period of around 2.1 s. POLQA indicates that this is an approximately linear adjustment, corresponding to a ‘speeding up’ of the speech by around 9.5%. It’s interesting to notice that the pitch of the speech sounds normal whilst this happens. We might expect that ‘speeding up’ the speech by almost 10% would cause the pitch to rise, but somehow that is not what we hear.
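For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes the rounded figures quoted above (a 200 ms delay reduction over 2.1 s) and, for the pitch estimate, that the speed-up was done by plain resampling, which is exactly what does not seem to be happening here. The function names are ours, purely for illustration.

```python
import math


def speedup_from_delay_change(delay_drop_s: float, window_s: float) -> float:
    """Fractional play-out speed-up implied by a linear delay reduction."""
    return delay_drop_s / window_s


def naive_pitch_shift_semitones(speedup: float) -> float:
    """Pitch rise in semitones IF the speed-up came from plain resampling."""
    return 12.0 * math.log2(1.0 + speedup)


# Rounded figures from the POLQA time-offset view (assumed, not exact).
speedup = speedup_from_delay_change(0.200, 2.1)
shift = naive_pitch_shift_semitones(speedup)

print(f"speed-up: {speedup:.1%}")
print(f"naive pitch rise: {shift:.2f} semitones")
```

A rise of roughly a semitone and a half would be audible to most listeners, so the fact that the pitch sounds normal suggests something smarter than simple resampling is at work.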

What does PESQ have to say?  Well, PESQ shows a similar overall effect but interprets the latency reduction as a series of discrete steps rather than a continuous process.  What is going on?
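To get a feel for why a stepped view of a smooth delay change matters, here is a rough sketch. It is not PESQ's actual alignment algorithm: it simply approximates a linear delay ramp with a number of constant-delay segments and reports the worst-case timing mismatch left over. The starting delay (300 ms) and the number of steps (8) are hypothetical values chosen for illustration.

```python
def linear_delay(t, start_ms, drop_ms, window_s):
    """Delay in ms at time t for a linear ramp-down, clamped to the window."""
    frac = min(max(t / window_s, 0.0), 1.0)
    return start_ms - drop_ms * frac


def stepped_delay(t, start_ms, drop_ms, window_s, n_steps):
    """Piecewise-constant approximation: each segment holds its mid-ramp delay."""
    frac = min(max(t / window_s, 0.0), 1.0)
    seg = min(int(frac * n_steps), n_steps - 1)  # index of the segment holding t
    mid_frac = (seg + 0.5) / n_steps
    return start_ms - drop_ms * mid_frac


def max_residual_ms(start_ms, drop_ms, window_s, n_steps, samples=2101):
    """Largest gap between the smooth ramp and the stepped version, on a time grid."""
    worst = 0.0
    for i in range(samples):
        t = window_s * i / (samples - 1)
        err = abs(linear_delay(t, start_ms, drop_ms, window_s)
                  - stepped_delay(t, start_ms, drop_ms, window_s, n_steps))
        worst = max(worst, err)
    return worst


# Hypothetical: 300 ms starting delay, 200 ms drop over 2.1 s, 8 discrete steps.
residual = max_residual_ms(300.0, 200.0, 2.1, 8)
print(f"worst-case misalignment with 8 steps: {residual:.1f} ms")
```

With these assumed numbers the leftover mismatch is about 12.5 ms at worst (half of one step). More steps shrink it, but a stepped track never exactly follows a continuous ramp, so some of the speech is always compared slightly out of alignment.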

Codec Behavior Uncovered

We find the explanation in the way some codecs handle adverse network conditions by a process called ‘time-warping’. Codecs with this capability belong to a class called relaxed code-excited linear prediction (RCELP), which can vary the play-out rate of a jitter buffer whilst approximating the original pitch of the signal. Playing out a signal faster than normal allows the codec to regain time which it ‘lost’ after the de-jitter buffer was increased to accommodate a period of delayed packet arrival.
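The ‘regaining time’ idea can be sketched as simple bookkeeping. The snippet below does not implement any pitch-preserving time-warping; it only tracks how long it takes to drain extra buffer delay when each frame is played out slightly faster than real time. The 20 ms frame size and the 1.095 play-out rate are assumptions chosen to match the figures in this example, not values taken from any particular codec.

```python
def catch_up(excess_ms: float, rate: float, frame_ms: float = 20.0):
    """Return (frames_played, wall_clock_s) needed to drain excess buffer delay.

    rate is the play-out speed relative to real time (e.g. 1.095 = 9.5% faster).
    """
    if rate <= 1.0:
        raise ValueError("rate must exceed 1.0 to regain time")
    played_ms = 0.0
    frames = 0
    while excess_ms > 0:
        out_ms = frame_ms / rate          # compressed play-out duration of one frame
        excess_ms -= frame_ms - out_ms    # time regained by playing this frame faster
        played_ms += out_ms
        frames += 1
    return frames, played_ms / 1000.0


# Assumed: 200 ms of extra delay, frames played out 9.5% faster than normal.
frames, wall_clock_s = catch_up(200.0, 1.095)
print(f"{frames} frames, about {wall_clock_s:.2f} s to regain 200 ms")
```

Under these assumptions the buffer is back to normal after a little over 2 s, which is in line with the delay ramp seen in the time-offset view.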

So what?  And how does this explain the low score of PESQ and the high score of POLQA?

What do Users Think?

Here’s the thing: PESQ cannot track this kind of codec behavior – it just sees it as a distortion because the ‘speeded-up’ degraded signal no longer matches the clean reference (original) signal.  But how do users perceive time-warping?  Most of the time they will be unaware it is happening: after all, in a phone call you can’t compare what you are hearing to a ‘reference’.  So, we have a technique which is effective in preserving voice quality, and which PESQ cannot deal with, tending to give lower scores than might be considered reasonable – that is, lower than human subjects would tend to give.

Mobile Codecs – State of the Art

The example in this blog comes from a Skype call over the Internet, and the effect is pronounced and easy to see. But the same technique works on a smaller scale in codecs used in mobile networks, and it was here that the limitations of PESQ became clear over a decade ago. This prompted interested parties to publish technical reports which suggested that inappropriate use of PESQ measurements could result in unfair or even invalid characterization of codec performance. You can see an example of this here.

And now there is POLQA

The partial obsolescence of PESQ was also one of the drivers for the development of a new objective technique for voice quality analysis, which eventually led to the first commercial release of POLQA in 2011.

So, our customer’s straightforward question about PESQ vs POLQA has led us down a long path, but we got to the explanation and discovered – guess what?  That you need to use the right tools for the job!

Talk to Opale Systems today to learn about the right tools for your voice quality application.