We added POLQA (ITU-T P.863) to our MultiDSLA product some seven months ago. Since then, we have been busy discovering what works and what doesn’t work quite so well in the new algorithm. Most of this learning comes from working with our early adopters of POLQA. We continue to gain further knowledge about using POLQA and how to understand results that don’t make sense. We thought it would be useful to share these odd behaviours over the coming weeks. Here is the second one.
2. Ensure reference files pass the transparency test
POLQA does not always produce a perfect score when a reference file is compared with itself. This is known as the transparency issue. It is being studied, but there is no workaround today. Until the standard changes, you must check the suitability of your reference material before you start using it with POLQA.
The standard describes the characteristics of the reference file recording. In particular, the mean active speech level should be -26 dBov and the noise floor should be below -80 dBov(A). There should be leading and trailing silence, and a gap of one second between the sentences. In addition, the reference file should be sampled at 48 kHz and processed with a super-wideband audio bandwidth filter. If your file does not meet these requirements, it may produce incorrect results. Even when these requirements are met, the file may still display the transparency issue.
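As a rough first screen before running POLQA, the sample rate and levels of a candidate file can be checked with a few lines of Python. This is a minimal sketch under stated assumptions, not the measurement method in the standard: it assumes a mono 16-bit PCM WAV file, and it approximates the active speech level with a simple energy gate over 20 ms frames rather than a proper ITU-T P.56 speech level meter. The idle level it reports is unweighted, so it is only indicative of the A-weighted noise floor. The function names, the -50 dBov gate, and the frame length are our own illustrative choices.

```python
import math
import struct
import wave

FULL_SCALE = 32768.0  # overload point for 16-bit PCM (0 dBov reference)


def read_mono_16bit(path):
    """Return (sample_rate, samples) for a mono 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    return rate, list(struct.unpack("<%dh" % (len(raw) // 2), raw))


def level_dbov(samples):
    """RMS level in dBov; returns -inf for digital silence."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_square / FULL_SCALE ** 2)


def precheck(rate, samples, gate_dbov=-50.0, frame_ms=20):
    """Crude screen: sample rate, gated 'active' level, level of the rest.

    Frames louder than gate_dbov count as active speech. This is an
    assumption standing in for a P.56 meter, not an equivalent of it.
    """
    frame_len = rate * frame_ms // 1000
    active, idle = [], []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        (active if level_dbov(frame) > gate_dbov else idle).extend(frame)
    return {
        "sample_rate_ok": rate == 48000,
        "active_level_dbov": level_dbov(active) if active else None,
        "idle_level_dbov": level_dbov(idle) if idle else None,
    }
```

Calling `precheck(*read_mono_16bit("candidate.wav"))` (the file name is a placeholder) returns a small report. An active level far from -26 dBov or a sample rate other than 48 kHz is a reason to fix the recording before going any further.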
The transparency issue arises because POLQA assumes that a reference signal will have a balanced timbre, and it judges deviations from a balanced timbre as a degradation. If the timbre in the reference is not balanced, for example because of a bass boost or a lot of sibilance, a reference-to-reference POLQA comparison with no impairments will not return the maximum score for the scale. Typically you’ll see a drop of 0.1 or 0.2 MOS, although it can be larger.
A simple way to check for a transparency issue is to compare a candidate reference recording (48 kHz sample rate, super-wideband) with itself using the File Processor in MultiDSLA. If the result is not equal to the maximum theoretical score of the model, the recording is not suitable for use as a reference signal for POLQA. In super-wideband mode, the reference-to-reference comparison should score 4.75. If the candidate reference file is filtered to have no energy above 3.8 kHz and then compared with itself in narrowband mode, the score should be 4.5.
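If you log the self-comparison scores, the pass/fail rule above is easy to automate. The sketch below is illustrative only: the scores are typed in by hand from your own test runs, the function name is ours, and the small tolerance is an assumption to allow for rounding in reported scores. It does not call POLQA or MultiDSLA.

```python
# Maximum theoretical POLQA scores quoted above, per operating mode.
MAX_SCORE = {"super-wideband": 4.75, "narrowband": 4.5}


def is_transparent(score, mode, tolerance=0.005):
    """True when a reference-to-reference score reaches the model maximum.

    tolerance is an assumed allowance for rounding in the reported score.
    """
    return score >= MAX_SCORE[mode] - tolerance


# Hypothetical self-comparison results for two candidate files.
candidates = {
    "candidate_a.wav": ("super-wideband", 4.75),
    "candidate_b.wav": ("super-wideband", 4.61),
}
for name, (mode, score) in candidates.items():
    verdict = "suitable" if is_transparent(score, mode) else "reject"
    print(f"{name}: {score:.2f} in {mode} mode -> {verdict}")
```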
To further confirm the appropriateness of a reference signal, the same tests should be performed after adding a small time offset (10 ms and 15 ms) to the start of a copy of the reference before presenting it as the degraded file input to POLQA. The predicted score should remain equal to the maximum theoretical score of the model.
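One way to prepare the offset copies is to prepend digital silence to the reference samples. This is a sketch of our reading of "adding a small time offset", assuming a mono 16-bit PCM file at 48 kHz. The function names are illustrative, and the samples are taken to be a list of integers already read from the reference file.

```python
import struct
import wave


def offset_samples(offset_ms, sample_rate=48000):
    """Number of samples that corresponds to offset_ms at sample_rate."""
    return round(sample_rate * offset_ms / 1000)


def with_leading_offset(samples, offset_ms, sample_rate=48000):
    """Return a copy of samples delayed by offset_ms of digital silence."""
    return [0] * offset_samples(offset_ms, sample_rate) + list(samples)


def write_mono_16bit(path, samples, sample_rate=48000):
    """Write integer samples as a mono 16-bit PCM WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

At 48 kHz, the 10 ms and 15 ms offsets come to 480 and 720 samples respectively. Each delayed copy is then used as the degraded input, with the untouched file as the reference.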
Note: a drop in score of 0.1 MOS is generally not audible to a listener, but when selecting a reference speech file it makes sense to choose one that achieves a perfect score when compared with itself.
Part 1 Don’t use more than 10s of speech in narrowband mode