In this blog, we will explore:
- What Causes VoIP Call Quality Problems?
- How to Measure VoIP Call Quality
- The Role of Session Border Controllers (SBCs) in Monitoring Voice Quality
- How VoIP Systems Monitor and Report Voice Quality
- How to Identify and Fix VoIP Call Quality Issues
Poor voice quality is something most people have experienced, even as consumers. It does not take much degradation in the voice path to cause errors when reading an address, a telephone number, or a ZIP code. In contact centers, this slows down agent handling and causes stress for the customer. Straining to listen through a crunchy call, a call with echo, or one with a lot of latency can really harm customer satisfaction.
Maintaining high voice quality on VoIP systems is critically important.
What Causes VoIP Call Quality Problems?
In an ideal world, there would be no impairments. Packets traveling over Real-time Transport Protocol (RTP) from point A to point B would all arrive in order, right on time, and intact. Most of the time, networks come close to that ideal, with little or no packet loss or jitter. But when issues appear, they tend to fall into a few categories.
1. Packet Loss
Packet loss means a packet goes missing. The other packets arrive on time, but one does not. That forces the destination VoIP system to take a best guess at what audio was in that hole, using a codec technique called packet loss concealment. When a call sounds robotic, like a simulation of a voice, it usually means a number of packets are missing and the codec is trying to make up speech that sounds close to what was transmitted.
2. Jitter
Jitter is when packets do not arrive predictably on time or they arrive out of order. This causes problems for codec algorithms. If data is missing, that is one thing, but if it shows up late or out of order, then the system has to reorder the packets. To do that, it has to delay, buffer, or queue packets, and of course that causes latency.
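The reordering step described above can be sketched as a small jitter buffer. This is only a minimal illustration under stated assumptions, not how any particular product works: the class name `JitterBuffer` and the `depth` parameter are hypothetical, and sequence-number wraparound is ignored for brevity.

```python
import heapq


class JitterBuffer:
    """Hypothetical fixed-depth buffer that releases packets in sequence order."""

    def __init__(self, depth=3):
        self.depth = depth      # packets held back before playout starts
        self._heap = []         # min-heap ordered by sequence number
        self._next_seq = None   # next sequence number due at playout

    def push(self, seq, payload):
        """Store an arriving packet; reject it if its playout slot has passed."""
        if self._next_seq is not None and seq < self._next_seq:
            return False        # arrived too late, treated as lost
        heapq.heappush(self._heap, (seq, payload))
        return True

    def pop(self):
        """Return the payload for the next playout slot, or None for a gap."""
        if self._next_seq is None:
            if len(self._heap) < self.depth:
                return None     # still filling: this wait is the added latency
            self._next_seq = self._heap[0][0]
        seq = self._next_seq
        self._next_seq += 1
        # Discard duplicates of packets that were already played out.
        while self._heap and self._heap[0][0] < seq:
            heapq.heappop(self._heap)
        if self._heap and self._heap[0][0] == seq:
            return heapq.heappop(self._heap)[1]
        return None             # packet missing: concealment would fill this slot
```

A deeper buffer tolerates more jitter but adds more delay before playout, which is the trade-off the paragraph above describes.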
3. Latency
Latency is the total time delay between a caller speaking and a response being produced and heard. Up to around 300 milliseconds of total latency is fine. Once total latency goes over 300 milliseconds, you begin to hear your own voice back and delays become noticeable. As a result, people mistakenly think the other person has stopped talking, and interruptions increase. Network echo cancellers also sometimes cannot deal with large latency figures, so echo can end up in the call.
4. Codec Mismatch
Codec mismatch is usually a configuration problem. A codec compresses and transmits voice during a VoIP call. If the caller and the receiver do not support the same codec, they cannot understand each other’s audio format. For example, device A uses the G.711 codec and device B uses the G.729 codec.
If the system is not configured to convert between them (transcoding), the call may have problems such as no audio, poor voice quality, or call failure.
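The mismatch check can be illustrated with a short sketch. The function name `negotiate_codec` is hypothetical, and the codec labels are the standard RTP encoding names (`PCMU` and `PCMA` for G.711, `G729` for G.729). Real negotiation happens in SIP/SDP and involves more than a list comparison.

```python
def negotiate_codec(offered, supported):
    """Return the first offered codec the other side also supports, else None."""
    supported_set = {codec.upper() for codec in supported}
    for codec in offered:
        if codec.upper() in supported_set:
            return codec
    return None


# Device A offers only G.711 variants; device B supports only G.729.
device_a = ["PCMU", "PCMA"]
device_b = ["G729"]

if negotiate_codec(device_a, device_b) is None:
    print("No common codec: transcoding is needed or the call may fail")
```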
5. Dual-Tone Multi-Frequency (DTMF) Transport Issue
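Dual-Tone Multi-Frequency (DTMF) transport is how keypad tones (the numbers you press on a phone) are sent during a call accurately and reliably, so that IVR systems can understand which button the caller pressed. Problems typically arise when the tones are carried as audio through a compressing codec that distorts them, or when the two sides are configured for different DTMF transport methods.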
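One common way to carry key presses is as RTP telephone-event packets (RFC 4733) rather than as audio. As a rough sketch, the 4-byte event payload can be decoded as shown below. The function name `parse_telephone_event` is an assumption for illustration, and the code expects the RTP header to have been stripped already.

```python
import struct

# Event codes 0-15 map to the sixteen DTMF keys in RFC 4733.
DTMF_EVENTS = "0123456789*#ABCD"


def parse_telephone_event(payload):
    """Decode the 4-byte telephone-event payload of an RTP packet."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    # Layout: event (8 bits), E and R flags plus volume (8 bits), duration (16 bits).
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(DTMF_EVENTS):
        raise ValueError("event %d is not a DTMF key" % event)
    return {
        "digit": DTMF_EVENTS[event],
        "end": bool(flags & 0x80),   # E bit: last packet of this key press
        "volume": flags & 0x3F,      # tone level, 0 to 63 (in -dBm0)
        "duration": duration,        # in RTP timestamp units
    }
```

If an IVR misses or doubles digits, comparing what was decoded here against what the caller pressed is one way to tell a transport problem from a user error.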
6. One-way Audio
In one-way audio, usually a configuration problem, sound is unidirectional during a call. Person A can hear person B but person B cannot hear person A.
7. Echo
With echo, the speaker hears his or her own voice back during the call shortly after speaking. Echo is not uncommon in VoIP or phone calls. It can happen because microphones pick up sound from speakers, because of a poor headset or speaker setup, because of network delays, or because of problems with echo cancellation in devices.
How to Measure VoIP Call Quality
There are various models to measure VoIP call quality. These models are:
- Recommendation P.800
- ITU-T P.863
- E-Model
Recommendation P.800
The International Telecommunication Union (ITU) created a standard called Recommendation P.800 to measure how good or bad voice quality sounds during phone calls. This standard measures Mean Opinion Score (MOS), a standardized way of scoring how human speech is perceived by human listeners. It uses 60 listeners, usually with headphones in booths, who score speech samples from five down to one. From that, a mean opinion score is derived.
This is great if there are facilities and people available and speech is being processed through an algorithm to determine whether it improves or harms perception. But it is not very practical for real-time telecommunication systems because it requires 60 people listening in real time.
ITU-T P.863
A more current algorithmic objective method is ITU-T P.863. No people are involved. It compares a reference input against a degraded or impaired output. The algorithm predicts the cognitive or perceptual result that a real listener would have and determines the MOS score. It is an excellent predictor of MOS and can be done in a lab environment with computers, but it still requires the reference input and the degraded output under test. That makes it impractical for a real-time, on-the-fly telecommunications measurement system.
The E-Model
In the practical real world, voice quality is measured by taking information about the codec, collecting latency figures, packet loss figures, and jitter figures, crunching them together into an E-model calculation, producing an R-factor. The R-factor ranges from 0 to 100 and can be mapped to a MOS score.
The call quality can be calculated automatically using network data from the call. During a VoIP call, the system already collects technical information such as the number of packets lost, the jitter measured on that call, and the latency if RTCP is enabled. If all of that is available, a MOS score can be calculated automatically, without any people involved and without the original source audio.
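The calculation described above can be sketched roughly as follows. The R-to-MOS mapping is the conversion published in ITU-T G.107. Everything else is a simplification and should be read as an assumption: the base value of 93.2, the reduced delay term, the random-loss model, and the per-codec constants in `CODEC_PARAMS`. Jitter is assumed to be already reflected in the delay and loss figures passed in, through jitter buffer delay and late-packet discards. Real products may calibrate differently, so absolute scores can differ.

```python
def r_to_mos(r):
    """Map an E-model R-factor to MOS using the ITU-T G.107 conversion."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6


# Assumed (Ie, Bpl) pairs: codec impairment and robustness to packet loss.
# Commonly cited figures; verify against ITU-T G.113 before relying on them.
CODEC_PARAMS = {
    "G.711": (0, 25.1),
    "G.729": (11, 19.0),
    "G.723.1": (15, 16.1),
}


def estimate_r(codec, one_way_delay_ms, loss_percent):
    """Simplified R-factor: base value minus delay and equipment impairments."""
    ie, bpl = CODEC_PARAMS[codec]
    d = one_way_delay_ms
    # Delay impairment: a widely used simplification of the full G.107 term.
    delay_impairment = 0.024 * d
    if d > 177.3:
        delay_impairment += 0.11 * (d - 177.3)
    # Effective equipment impairment under random packet loss.
    ie_eff = ie + (95 - ie) * loss_percent / (loss_percent + bpl)
    return 93.2 - delay_impairment - ie_eff


def estimate_mos(codec, one_way_delay_ms, loss_percent):
    """Convenience wrapper: network figures in, estimated MOS out."""
    return r_to_mos(estimate_r(codec, one_way_delay_ms, loss_percent))
```

For example, `estimate_mos("G.729", 80, 1.0)` returns a lower score than `estimate_mos("G.711", 80, 1.0)`, because the compressed codec starts with a higher impairment and tolerates loss less well.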
Within VoIP products like media gateways and session border controllers, each call that passes through the devices produces a MOS score and a network quality score based on the codecs in use and network data.
Of the three codecs G.711, G.729, and G.723, G.711 gives the best results. An ideal call on G.711 scores a MOS of 4.3. It is not a five because G.722 wideband can do better than G.711, using a better way of compressing information and a wider spectrum. But the regular TDM network and the regular IP network commonly use G.711 as the default codec.
The Role of Session Border Controllers (SBCs) in Monitoring Voice Quality
A session border controller sits in a unique place in the network. As the name hints, it acts as the border of a voice network, the point of ingress or egress between a carrier and another carrier, or between a carrier and an enterprise. This demarcation point makes it a great place to measure, track, and report voice quality because it sees all the voice traffic coming and going. It sees the call setup and all the RTP traffic as it passes in or out of the network.
RTP (Real-time Transport Protocol) and RTCP (RTP Control Protocol) are network protocols used in VoIP, video conferencing, and real-time media communication over IP networks. They usually work together to deliver and monitor audio or video during a call.
For every call between service providers, there is two-way audio, with an RTP stream in each direction. These streams pass through the session border controller. The opportunity is to measure the performance of the RTP as it enters the SBC. This is done with statistical modeling by measuring jitter, latency, and packet loss, collecting those numbers, and calculating a score on the incoming RTP.
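As a rough sketch of measuring an incoming stream, the class below tracks packet loss from sequence numbers and computes interarrival jitter with the running estimate defined in RFC 3550. The class name `RtpStreamStats` and its method names are assumptions for illustration, and 16-bit sequence wraparound is ignored. Latency is left out because it cannot be derived from one-way RTP alone.

```python
class RtpStreamStats:
    """Running loss and jitter figures for one incoming RTP stream."""

    def __init__(self, clock_rate=8000):
        self.clock_rate = clock_rate   # RTP timestamp units per second
        self.jitter = 0.0              # interarrival jitter, in timestamp units
        self.received = 0
        self._first_seq = None
        self._max_seq = None
        self._prev_transit = None

    def on_packet(self, seq, rtp_timestamp, arrival_seconds):
        """Update the counters for one received packet."""
        self.received += 1
        if self._first_seq is None:
            self._first_seq = self._max_seq = seq
        else:
            self._max_seq = max(self._max_seq, seq)
        # Relative transit time; only its change between packets matters.
        transit = arrival_seconds * self.clock_rate - rtp_timestamp
        if self._prev_transit is not None:
            d = abs(transit - self._prev_transit)
            self.jitter += (d - self.jitter) / 16.0   # RFC 3550 smoothing
        self._prev_transit = transit

    @property
    def lost(self):
        """Packets expected from the sequence range minus packets received."""
        if self._first_seq is None:
            return 0
        expected = self._max_seq - self._first_seq + 1
        return max(expected - self.received, 0)

    @property
    def jitter_ms(self):
        """Current jitter estimate converted to milliseconds."""
        return self.jitter / self.clock_rate * 1000.0
```

Figures like these, gathered per call, are the kind of input a scoring model needs on the incoming RTP.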
If RTCP responses from the far end are accessible, statistics can also be collected from them. With either RTP alone, or in the best case both RTP and RTCP, figures can be calculated and voice quality can be determined. This can be used to record and store information, or to make intelligent routing decisions if a provider, leg, or WAN link is having trouble.
How VoIP Systems Monitor and Report Voice Quality
Voice quality can be monitored and analyzed via:
- Call Trace Analysis
- Using Call Detail Records (CDRs)
Call Trace Analysis
One method is to put the information in a call trace. For each call that goes through the SBC, the web interface provides access to the call trace. Opening it shows the full call flow of that specific call, who called from which network, which IP was used, the RTP and User Datagram Protocol (UDP) ports used, and at the end of the call, information such as how many packets were received, whether there were any packet errors, RTP events, fax information, and the calculated MOS score for that particular call.
In a VoIP call, there are two audio streams: One stream sends voice from person A to person B and another stream sends voice from person B to person A. Each stream can have its own MOS (Mean Opinion Score) to measure voice quality.
If one MOS value is 0.0 and the other is 4.3, only the MOS score for the received leg was calculated. RTCP was not enabled on the other side of the call, so the system cannot get quality statistics for the other stream. The 4.3 indicates that the voice quality on the measured side was really good.
At the end of the call, this allows a clear conclusion about whether it was a good call, at least from an ingress standpoint.
Using Call Detail Records (CDRs)
Another method is through call detail records. When enabled on the SBC, call detail records can be kept in text files and they preserve the MOS information of every single call that came through the system.
These records include much more than just the MOS score. They show how many packets were dropped, how many were lost, how many had sequence errors, the jitter, and the latency. This gives all the information needed for a complete analysis of a particular communication.
How to Identify and Fix VoIP Call Quality Issues
Reducing voice quality issues really boils down to logical thinking. It means taking known factors and starting to make logical decisions.
Questions to ask include whether the impairments are carrier dependent, on a particular WAN or LAN segment, time-of-day dependent, or related to something that recently changed. It requires understanding the full network and having the details of each call that has gone through the system, including which carrier the call used, which LAN or WAN segment it went on, the time of day, and the other related factors.
With that information, it becomes possible to distill what is happening and make changes.
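One way to start distilling the data is to average MOS per carrier and per hour of day. The sketch below assumes each call record has already been parsed into a dictionary with hypothetical keys `carrier`, `start` (an ISO 8601 timestamp), and `mos`. Real CDR field names and file layouts vary by product.

```python
from collections import defaultdict
from datetime import datetime


def mean_mos_by(calls, key_func):
    """Average MOS per group; a MOS of 0.0 means not measured and is skipped."""
    totals = defaultdict(lambda: [0.0, 0])
    for call in calls:
        if call["mos"] <= 0.0:
            continue
        bucket = totals[key_func(call)]
        bucket[0] += call["mos"]
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def by_carrier(call):
    """Group key: the carrier the call used."""
    return call["carrier"]


def by_hour(call):
    """Group key: the hour of day (0 to 23) in which the call started."""
    return datetime.fromisoformat(call["start"]).hour


# Made-up records, for illustration only.
calls = [
    {"carrier": "A", "start": "2024-01-01T09:15:00", "mos": 4.2},
    {"carrier": "A", "start": "2024-01-01T13:05:00", "mos": 3.0},
    {"carrier": "B", "start": "2024-01-01T13:40:00", "mos": 3.2},
    {"carrier": "B", "start": "2024-01-01T09:50:00", "mos": 0.0},
]
per_carrier = mean_mos_by(calls, by_carrier)
per_hour = mean_mos_by(calls, by_hour)
```

If the per-hour averages dip at the same time every day across all carriers, congestion on a shared segment is a more likely suspect than any single provider.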
Poor MOS Based on Time of Day
If voice quality seems to drop in the middle of the day, one potential cause is network congestion. In enterprise applications, this may happen because a WAN leg is undersized.
Possible remedies include:
- adding WAN capacity
- choosing an alternate carrier
- turning on voice compression to reduce congestion, even though it may lower everybody’s voice quality during those congested times
One-Way Voice
One-way audio is a common problem. It often relates to an Application Layer Gateway (ALG), a firewall, or NAT behavior.
When the SBC is connected directly to a public network or directly on a virtual private network, there is no need to go through a firewall and data goes both ways without a problem. But sometimes it has to pass through a firewall or another endpoint, and then the SIP traffic may be easy to forward while the RTP traffic is not. RTP requires many ports to be open in the firewall. Application level gateways can be installed on firewalls to control this, but they can be complicated to configure and can cause problems. NAT behaves similarly if it remaps ports incorrectly.
Possible remedies include:
- checking configuration
- bypassing the ALG or firewall if possible
- setting up port forwarding and pinholes to allow voice and RTP through
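When checking configuration, one frequent sign of a NAT problem is a private IP address advertised in the SDP body of the SIP message, which tells the far end to send RTP somewhere it cannot reach. The sketch below looks for that. The function names are hypothetical, and it only inspects `c=` lines, so treat it as a starting point rather than a full diagnosis.

```python
import ipaddress


def sdp_connection_addresses(sdp_text):
    """Return the addresses found in the SDP c= (connection data) lines."""
    addresses = []
    for line in sdp_text.splitlines():
        line = line.strip()
        if line.startswith("c="):
            # Format: c=<nettype> <addrtype> <address>, e.g. c=IN IP4 10.0.0.5
            parts = line[2:].split()
            if len(parts) == 3:
                addresses.append(parts[2].split("/")[0])
    return addresses


def private_media_addresses(sdp_text):
    """Return advertised media addresses that fall in private ranges."""
    private = []
    for address in sdp_connection_addresses(sdp_text):
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            continue   # a hostname rather than a literal IP address
        if ip.is_private:
            private.append(address)
    return private


example_sdp = (
    "v=0\r\n"
    "o=- 1 1 IN IP4 192.168.1.20\r\n"
    "s=-\r\n"
    "c=IN IP4 192.168.1.20\r\n"
    "t=0 0\r\n"
    "m=audio 49170 RTP/AVP 0\r\n"
)
suspect = private_media_addresses(example_sdp)
```

A non-empty result on a call that crosses the public internet suggests that the NAT, ALG, or SBC settings are not rewriting the media address as intended.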
Low MOS Scores Due to Service Provider Congestion
If the service provider’s network has congestion, there is not much that can be fixed inside that network except complaining about it.
Possible remedies are:
- rerouting traffic
- complaining to the service provider
- changing service providers
Rerouting traffic is actually not a bad idea and is sometimes a practical tool.
Echo
Echo is generally caused by endpoints, either with acoustic echo or by the hybrid in the device at the far end. It may also come from a media gateway or another far-end network element that is not able to squash the echo because the echo cancellation algorithm does not converge.
Possible remedies include:
- checking endpoint configuration
- making sure echo cancellation is turned on
- fixing anything misconfigured
- trying a different endpoint
- trying a different number
- simply calling again, because sometimes the echo canceller converges on the next call
That is often what happens on cell phones with very echoey calls.
Voice quality measurement, monitoring, reporting, and routing all play an important role in delivering high-quality VoIP experiences. With the right metrics, the right visibility into RTP and RTCP, and a logical approach to identifying impairments, it becomes possible not only to troubleshoot issues but also to proactively manage service quality and hold providers to their quality objectives.
FAQs about VoIP Call Quality
1. What Causes VoIP Call Quality Problems?
Most of the time, networks perform well, with little or no packet loss or jitter. But when issues appear, they are due to packet loss, jitter, latency, codec mismatch, Dual-Tone Multi-Frequency (DTMF) transport issues, one-way audio, or echo.
2. How to Measure VoIP Call Quality
In the practical real world, voice quality is measured by taking information about the codec, collecting latency figures, packet loss figures, and jitter figures, crunching them together into an E-model calculation, producing an R-factor. The R-factor ranges from 0 to 100 and can be mapped to a MOS score.
3. How does jitter impact VoIP call quality?
Jitter is when packets do not arrive predictably on time or they arrive out of order. This causes problems for codec algorithms. If data is missing, that is one thing, but if it shows up late or out of order, then the system has to reorder the packets. To do that, it has to delay, buffer, or queue packets, and of course that causes latency.
4. What is a codec mismatch in VoIP call quality?
Codec mismatch is usually a configuration problem. A codec compresses and transmits voice during a VoIP call. If the caller and the receiver do not support the same codec, they cannot understand each other’s audio format. For example, device A uses the G.711 codec and device B uses the G.729 codec.
If the system is not configured to convert between them (transcoding), the call may have problems such as no audio, poor voice quality, or call failure.



