HD Voice

Aymeric TIBURCE How-to, Lab

What is HD Voice?

HD Voice is short for High-Definition Voice, and is also called Wideband.  In Internet and mobile telephony, it refers to the use of Wideband technology to provide greater clarity and a better audio experience during a telephone call.  The higher frequencies transmitted in Wideband make it easier to recognise un-voiced sounds such as “ss”, “f” and “sh”.  Speech is easier to understand, particularly when there is a high level of background noise or more than one person is speaking.

The Wideband frequency range is from 50 Hz to 7 kHz, which is roughly double the highest Narrowband frequency (3.4 kHz).  This adds significant depth and nuance to the transmitted sound.
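The link between audio bandwidth and sample rate can be made concrete with a short calculation. The Python sketch below is illustrative only: it takes the upper band limits of the linear conditions in the score table later in this article (3.4 kHz, 7 kHz and 14 kHz), applies the Nyquist rule that the sample rate must be at least twice the highest audio frequency, and picks the smallest rate from a list of common telephony rates. The list `STANDARD_RATES_HZ` and the function name `min_sample_rate` are assumptions made for this example and are not part of any test tool.

```python
# Common sample rates in Hz (an assumption of this sketch, not a tool setting).
STANDARD_RATES_HZ = (8000, 16000, 32000, 48000)


def min_sample_rate(audio_bandwidth_hz):
    """Return the smallest listed rate that is at least twice the bandwidth."""
    nyquist_rate = 2 * audio_bandwidth_hz
    for rate in STANDARD_RATES_HZ:
        if rate >= nyquist_rate:
            return rate
    raise ValueError("bandwidth too high for the listed sample rates")


# Upper band limits taken from the linear conditions in the score table.
BANDS_HZ = (("Narrowband", 3400), ("Wideband", 7000), ("Super-Wideband", 14000))

for label, upper_hz in BANDS_HZ:
    print(label, upper_hz, "Hz ->", min_sample_rate(upper_hz), "Hz")
```

A node may be run at a higher rate than this minimum; the node settings described later use 16k for Narrowband and HD Voice testing and 48k when all bandwidths are to be covered.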

HD Voice technology uses advanced codecs to transmit the higher quality sound without using more bandwidth than Narrowband transmission.  HD Voice currently uses a number of Wideband codecs including AMR-WB, EVRC-WB, G.722 and G.722.1.  Other codecs, including SILK and iSAC, can operate in Narrowband, Wideband and Super-Wideband modes.  Fourth-generation (4G, or Long Term Evolution, LTE) mobile services will enable network operators and service providers to offer “Voice over LTE” (VoLTE) which in many cases will provide HD Voice and even super-Wideband services.

Where is HD Voice Used?

HD Voice is used in internet telephony and mobile communications, audio and video conferencing and also in some office communication systems.

What is the best method of testing HD Voice performance?

ITU-T Rec. P.863 POLQA is the prime choice for testing HD Voice and services up to Super-Wideband.

The Background

Although the PESQ metric has both Narrowband and Wideband modes, objective testing of HD Voice quality using PESQ can lead to confusing and seemingly unsatisfactory conclusions.  To understand the reasons for this it is necessary to appreciate that PESQ attempts to model the outcomes of subjective testing. Different methods are used for subjective assessment of Narrowband and Wideband speech but the 1-5 scale is the same.  The Narrowband subjective test involves the use of a loudspeaker or a handset, so that listeners are aware of background noises around them. The Wideband subjective test, on the other hand, uses headphones, so that virtually no background noise is audible. This means that Wideband subjective testing is much more critical than for Narrowband and therefore the results come out on different scales; 3.5 Narrowband is not as good as 3.5 Wideband.

Testing HD-Voice with PESQ: Potential for Confusion

When PESQ is used to compare the performance of, for example, mobile devices or networks that can operate in both Narrowband and Wideband modes, there is an understandable expectation that the Wideband score will be higher than the Narrowband score.  In practice, the Wideband score may be only equal to, or even lower than, the Narrowband score, even though the Wideband sample sounds “better” than the Narrowband sample, because the two scores are reported on different scales.  This makes testers inclined to distrust objective methods, as the intuitive belief is that “what sounds better should score higher”.

No Confusion with POLQA

POLQA has solved this problem by defining a Super-Wideband (SWB) scale such that Narrowband, Wideband and Super-Wideband performance can be measured, and compared, on a common scale.  Using the POLQA SWB scale, the relative scores are generally in line with intuitive expectation if the same handset is tested in Narrowband or Wideband mode.

Narrowband scores on the POLQA SWB scale will tend to be numerically lower than those on the PESQ Narrowband scale, because the POLQA Narrowband scores are compressed so that the better Wideband and Super-Wideband scores can fit into a common scale of 1 to 5.

Best-case Scores with POLQA SWB Scale

The following table compares the scores returned by POLQA SWB, POLQA NB, PESQ WB and PESQ NB, for a variety of Narrowband and Wideband conditions. A dash marks a scale for which no score is given.

Condition                POLQA SWB   PESQ WB (P.862.2)   POLQA NB   PESQ NB (P.862.1)
14 kHz 16-bit Linear     4.75        -                   -          -
7 kHz 16-bit Linear      4.5         4.6                 -          -
AMR-WB                   4.0         3.6                 -          -
3.4 kHz 16-bit Linear    3.8         3.6                 4.5        4.5
G.711                    3.7         -                   4.3        4.5
EFR / AMR-FR 12.2 kbps   3.6         -                   4.1        4.1
EVRC 9.5 kbps            3.4         -                   3.9        3.7
EVRC-B 9.5 kbps          3.5         -                   4.0        3.8
AMR-HR 7.95 kbps         3.4         -                   3.8        3.6
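The effect of the common scale can be checked against the figures in the table. The Python sketch below is a minimal illustration, not part of any PESQ or POLQA tool: the names `SCORES`, `pesq_native` and `compare` are invented for this example, and the column assigned to each score (POLQA SWB, PESQ WB, POLQA NB, PESQ NB) is this sketch's reading of the table. It compares a Wideband condition with a Narrowband one, first using the PESQ scale that matches each condition's own bandwidth, then using the POLQA SWB scale for both.

```python
# Best-case scores copied from the table above; None means no score is given.
# The assignment of scores to columns is an assumption of this sketch.
SCORES = {
    "AMR-WB": {"polqa_swb": 4.0, "pesq_wb": 3.6, "polqa_nb": None, "pesq_nb": None},
    "G.711": {"polqa_swb": 3.7, "pesq_wb": None, "polqa_nb": 4.3, "pesq_nb": 4.5},
}


def pesq_native(condition):
    """PESQ score on the scale matching the condition's own bandwidth."""
    row = SCORES[condition]
    # Narrowband conditions have a PESQ NB score; Wideband ones only PESQ WB.
    return row["pesq_nb"] if row["pesq_nb"] is not None else row["pesq_wb"]


def compare(wideband, narrowband):
    """Return (PESQ difference, POLQA SWB difference), Wideband minus Narrowband."""
    pesq_gap = pesq_native(wideband) - pesq_native(narrowband)
    polqa_gap = SCORES[wideband]["polqa_swb"] - SCORES[narrowband]["polqa_swb"]
    return round(pesq_gap, 2), round(polqa_gap, 2)


pesq_gap, polqa_gap = compare("AMR-WB", "G.711")
print("PESQ (separate scales):", pesq_gap)
print("POLQA SWB (common scale):", polqa_gap)
```

With these figures the PESQ difference is negative, so the Wideband codec appears to score lower, while the POLQA SWB difference is positive, which matches the expectation that AMR-WB sounds better than G.711.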

Testing HD Voice with MultiDSLA

First configure the nodes, then choose the correct task list.

Configuring the Nodes

When configuring the nodes, bear in mind the following:

For DSLA nodes, specify the following properties:

Application:   Narrowband, HD-Voice      All bandwidths
Property       PESQ and/or POLQA         POLQA only
Sample Rate    16k                       48k
Scale Type     Wideband/Super-Wideband (both columns)
Metric         PESQ and/or POLQA         POLQA

For sVN nodes, the sample rate depends on the codec. Select one of the codecs that support Wideband: AMR (Adaptive Multi-Rate) or G.722. For AMR, set the sample rate to 16k.
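The node properties above can be summarised as a small lookup. The Python sketch below is only a way of writing the settings down; it is not a MultiDSLA API, and the names `NodeSettings`, `dsla_node_settings` and `svn_sample_rate` are hypothetical. It assumes that the Wideband/Super-Wideband scale type applies to both applications, and it returns no sample rate for sVN codecs whose rate is not stated in the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class NodeSettings:
    sample_rate: str
    scale_type: str
    metrics: Tuple[str, ...]


def dsla_node_settings(all_bandwidths: bool) -> NodeSettings:
    """Settings for a DSLA node, following the property list above."""
    if all_bandwidths:
        # Testing up to Super-Wideband: POLQA only, at 48k.
        return NodeSettings("48k", "Wideband/Super-Wideband", ("POLQA",))
    # Narrowband and HD Voice: PESQ and/or POLQA, at 16k.
    return NodeSettings("16k", "Wideband/Super-Wideband", ("PESQ", "POLQA"))


def svn_sample_rate(codec: str) -> Optional[str]:
    """Sample rate for an sVN node; only the AMR value is given in the text."""
    stated_rates = {"AMR": "16k"}
    return stated_rates.get(codec)


print(dsla_node_settings(all_bandwidths=False))
print(dsla_node_settings(all_bandwidths=True))
print(svn_sample_rate("AMR"))
```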

Selecting the Tasklist

When selecting the tasklist, bear in mind the following:

  • P.501 tasklists use speech files that are suitable for use with POLQA. These are the only types of tasklist that support Super-Wideband and will therefore work with all three bands.
  • ‘Cellular’ denotes Narrowband cellular material which uses mIRS (modified Intermediate Reference Send) filtering.
  • ‘NextGen’ denotes material which uses Wideband or Super-Wideband filtering, so HD Voice fits into this category.
  • ‘PSTN’ denotes Narrowband material which uses IRS (Intermediate Reference Send) filtering.

Conformance with ITU-T Recommendations

PESQ Narrowband conforms to ITU-T Recommendations P.862 and P.862.1.

PESQ Wideband conforms to ITU-T Recommendation P.862.2.

POLQA conforms to ITU-T Recommendation P.863.