POLQA v2.4 – Notes on the Upgrade’s Behaviours (Part 2)

Olivier Willi

Thursday, 23rd June 2022


Comparing Performance Differences between ITU-T Rec. P.863 v2.4 and v1.1 for Speech Quality Metrics

ITU-T Rec. P.863 (POLQA) v2.4 differs from v1.1 in many aspects of the algorithm, so its performance also differs from v1.1 in some respects. In particular, POLQA v2.4 is more sensitive to speaker and language. Some examples are shown below.

Like codecs, speech quality metrics perform somewhat differently depending on the spectral content of the speech material. The background noise level and the dynamic range of the voice also affect metric performance.

Impact of Speaker and Language Sensitivity in POLQA v2.4: A Comparative Analysis with v1.1

These graphs show how the speech quality score differs between the POLQA versions when evaluating speech with white Gaussian noise added at different levels in the analogue domain, for a number of speakers. The speech files were selected from the material in ITU-T Rec. P.501 and filtered with the MIRS characteristic. The x-axis shows the level of added noise. The y-axis shows the score from the narrowband model of P.862.1 PESQ, P.863 POLQA v1.1 and POLQA v2.4.
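The tests described above added the noise in the analogue domain. As a rough digital illustration of the same idea, the sketch below adds white Gaussian noise at a chosen level to a list of speech samples. The function names, the use of floating-point samples, and the choice of dB relative to full scale (1.0) as the level reference are all assumptions made for this example, not details of the original test setup.

```python
import math
import random


def rms(samples):
    """Root-mean-square value of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def add_white_gaussian_noise(speech, noise_level_db, seed=None):
    """Return the speech samples with white Gaussian noise added.

    noise_level_db is the target noise RMS in dB relative to full scale
    (1.0). This reference point is an assumption made for the sketch.
    """
    rng = random.Random(seed)
    # Convert the level in dB to a linear RMS amplitude. For zero-mean
    # Gaussian noise the RMS equals the standard deviation.
    sigma = 10.0 ** (noise_level_db / 20.0)
    return [s + rng.gauss(0.0, sigma) for s in speech]
```

Sweeping `noise_level_db` over a range of values and scoring each degraded file against the clean reference would give one curve per speaker, similar in form to the graphs described above.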

These three examples show the best and worst speaker sensitivities found in a larger set of tests. POLQA v1.1 exhibited some speaker sensitivity, but v2.4 may show more. There is almost no difference in score between v1.1 and v2.4 for the Japanese female speech, but the English male speech shows a greater range. The Russian male speech scores consistently 0.2 MOS lower when evaluated by v2.4.

Using many different speakers when testing speech transmission systems has always been best practice. To minimise error due to speaker dependency, MultiDSLA now includes male and female speakers in eight different languages (the ITU-T Rec. P.501 Annex C material).

When these 32 speakers are used to evaluate the performance of PESQ and the two versions of POLQA NB for simple codecs, columns 1-64 show the wide range of scores that can be obtained. Column 65 is the mean for the condition. The G.711 A Law scores are generally lower with v2.4 than with v1.1. The G.711 µ Law scores are only slightly lower.
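The spread and the mean for one condition can be summarised with a few lines of code. The helper below is a minimal sketch: `summarise_condition` is a hypothetical name, and it assumes the per-speaker scores for a condition are already available as a list of MOS values exported from the measurement tool.

```python
from statistics import mean


def summarise_condition(scores):
    """Summarise per-speaker MOS scores for one codec condition.

    scores: non-empty list of floats, one per speech sample.
    Returns the lowest and highest score, their difference (the range
    caused by speaker dependency) and the mean for the condition.
    """
    lowest, highest = min(scores), max(scores)
    return {
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
        "mean": mean(scores),
    }
```

A large `range` relative to the difference between two conditions suggests that a comparison based on a single speaker could be misleading, which is the reason for averaging over many speakers.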

The mean scores for the different conditions are shown below.

Condition	POLQA NB v1.1	POLQA NB v2.4
G.711 µ Law	4.44	4.42
G.711 A Law	4.44	4.36
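The change between versions can be read directly from the table above. As a small worked example, the following sketch computes the v2.4 minus v1.1 difference for each condition. The dictionary layout and the name `version_delta` are illustrative choices; the values are copied from the table.

```python
# Mean scores copied from the table above (MOS).
mean_scores = {
    "G.711 µ Law": {"v1.1": 4.44, "v2.4": 4.42},
    "G.711 A Law": {"v1.1": 4.44, "v2.4": 4.36},
}


def version_delta(scores):
    """Return v2.4 minus v1.1, rounded to the table's two decimals."""
    return round(scores["v2.4"] - scores["v1.1"], 2)


deltas = {cond: version_delta(s) for cond, s in mean_scores.items()}
for cond, delta in deltas.items():
    print(f"{cond}: {delta:+.2f} MOS")
# Prints:
# G.711 µ Law: -0.02 MOS
# G.711 A Law: -0.08 MOS
```

The mean A Law score drops by 0.08 MOS with v2.4, four times the 0.02 MOS drop for µ Law.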

Contact Opale Systems or your distributor for more information.
