Background to the P.863 POLQA Bug Fixes

Over the last few months a number of bugs have been found and fixed in POLQA. The fixes have been published in ITU Recommendation P.Imp863. Why a new Recommendation? This is standard practice in the ITU: rather than reissuing the original Recommendation, the bug fixes are published in an implementer's guide. The new Recommendation includes 9 fixes.

MultiDSLA v4.3.2 and subsequent versions include the new POLQA release. Users of earlier versions should download and install the update.

Should I update?

Everyone is expected to use the new version of POLQA, because these fixes can affect the results obtained by POLQA. The new Recommendation also includes new conformance data for POLQA.

What are the POLQA bug fixes?

Most of the bug fixes resolve coding mistakes rather than change how P.863 works. The only algorithmic change is to a re-sampling threshold in the Idealization process.

The POLQA re-sampling mechanism accounts for 'time-scaling' distortions. Time-scaling occurs when a signal is compressed or stretched in time while the nominal sample rate stays constant. This type of distortion is common in modern telecommunications as a result of:

  • advanced jitter buffer adaptation
  • packet loss error concealment
  • poor clock generators in A/D or D/A converters
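
To make the idea of time-scaling concrete, the following is a minimal Python sketch that stretches or compresses a test tone by a given percentage using linear interpolation. It is an illustration only and is not part of POLQA or P.863. The function name time_scale, the 8000 Hz sample rate and the 440 Hz tone are all assumptions chosen for the example.

```python
import math

def time_scale(samples, percent):
    """Stretch (percent > 0) or compress (percent < 0) a signal in time.

    Hypothetical helper for illustration: uses simple linear interpolation.
    The result is still treated as having the same nominal sample rate,
    which is what makes this a 'time-scaling' distortion.
    """
    if not samples:
        return []
    factor = 1.0 + percent / 100.0
    out_len = max(1, round(len(samples) * factor))
    out = []
    for n in range(out_len):
        pos = n / factor              # matching position in the original signal
        i = int(pos)
        if i >= len(samples) - 1:     # at or past the last sample: hold it
            out.append(samples[-1])
            continue
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return out

SAMPLE_RATE = 8000  # assumed nominal sample rate in Hz
# One second of a 440 Hz tone as a stand-in for a reference signal.
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

stretched = time_scale(tone, 3.0)     # 3% longer, same nominal sample rate
print(len(tone), len(stretched))      # 8000 8240
```

Played back at the same 8000 Hz, the stretched signal lasts 1.03 s instead of 1 s and its pitch is slightly lower, yet to a listener it sounds almost the same as the original.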

Without compensation for 'time-scaling', large time differences can appear between the reference and degraded speech files, even though the perceived degradation is very small. POLQA's re-sampling mechanism tries to match the 'time-scaling' of the reference and degraded signals, so that this time difference no longer influences the quality prediction.
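
A short calculation shows how quickly the time difference grows. The Python sketch below is illustrative only: the function name drift_ms and the 8 second file length are assumptions, not values taken from P.863.

```python
def drift_ms(duration_s, scaling_percent):
    """Time offset (in ms) accumulated by the end of a file of the given
    duration when it is time-scaled by scaling_percent.
    Hypothetical helper for illustration only."""
    return duration_s * 1000.0 * scaling_percent / 100.0

# Assumed example: an 8 second speech file.
for pct in (1.0, 3.0):
    print(f"{pct}% over 8 s -> {drift_ms(8.0, pct):.0f} ms")
# 1.0% over 8 s -> 80 ms
# 3.0% over 8 s -> 240 ms
```

An offset of this size is far too large to compare the two signals frame by frame, which is why the time-scaling has to be matched before the quality prediction is made.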

While characterising the performance of P.863 it was found that this mechanism worked well for small 'sample-rate' differences of around 1%, but was less robust beyond that level. For example, a difference of 0.45 MOS might be seen between a signal with no time-scaling and the same signal with 3% time-scaling. The change to the re-sampling threshold significantly increases robustness: scores are now typically within 0.15 MOS for time-scaling variations from -3% to +3%.
