The Multi-RAB (Multiple Radio Access Bearer) Dropped Call Problem
With the proliferation of smartphones, the number and share of multi-Radio Access Bearer calls on your network continue to grow. In fact, you’ve probably experienced issues that arise from multi-RAB capable handsets. It is quite common for multi-RAB calls, with circuit switched (CS) and packet switched (PS) services running in parallel, to drop more often than calls with a single CS or PS Radio Access Bearer. These problems can be quite challenging because they can go undetected until customers start to complain. They then become harder still once they are assigned to you or your support teams to troubleshoot.
About the Case Study
This is a case study of a mobile carrier that experienced dropped call complaints for seven months. The RF engineers assigned to the problem were highly skilled, but were not armed with the proper tools to troubleshoot effectively. Tekcomms was enlisted to assist in finding the root causes. This article describes the methodology and tools used to troubleshoot the problem.
Tools and Tool Operators
There are many ways to discover a problem, ranging from customer complaints, to analyzing key performance indicators, to testing. But you can’t fix a problem until you know the root cause. Finding the source of an issue is the real art.
The radio environment is an extremely complex ecosystem. The right set of monitoring and troubleshooting tools, coupled with RF engineering expertise, will take you beyond the symptoms to the source of the problem. Even though the tools are smart and can draw geographical, pictorial and tabular views of network conditions, RF engineering expertise is frequently needed to arrive at the root cause. Arriving at a root cause may require following a series of paths, many of which will end with “performing as expected,” thus forcing you to pursue alternate routes. Tools coupled with RF expertise will help you negotiate these paths.
The Journey to the Root Cause
After reviewing the symptoms of the problem with the carrier’s RF Engineering staff, we used the Iris Performance Intelligence tool to examine the Radio Network Drop Call reports and identify the causes of the dropped calls, starting with the “Top N Worst Performing eNodeB” report for Detach Attempts and Failures.
We found that CS and PS calls were dropping at an abnormally high rate, so we filtered the data to isolate the problem. Using Iris Session Analyzer’s call trace function, we drilled down into the failed CS and PS call record summaries, which revealed that most failures involved multi-RAB calls rather than single-RAB calls.
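The multi-RAB versus single-RAB comparison can be sketched in a few lines of Python. This is a minimal illustration only: the record layout (`rabs`, `dropped`) and the sample values are hypothetical and do not reflect the Iris Session Analyzer export format or the carrier's data.

```python
from collections import defaultdict

def drop_rate_by_rab_type(calls):
    """Return {rab_type: dropped / total} for an iterable of call records.

    Each record is a dict with hypothetical keys:
      'rabs'    - list of bearer domains, e.g. ['CS'] or ['CS', 'PS']
      'dropped' - True if the call ended abnormally
    """
    totals = defaultdict(int)
    drops = defaultdict(int)
    for call in calls:
        # More than one bearer on the same call means multi-RAB
        kind = "multi-RAB" if len(call["rabs"]) > 1 else "single-RAB"
        totals[kind] += 1
        if call["dropped"]:
            drops[kind] += 1
    return {kind: drops[kind] / totals[kind] for kind in totals}

# Illustrative records only, not measured data
sample_calls = [
    {"rabs": ["CS"], "dropped": False},
    {"rabs": ["PS"], "dropped": False},
    {"rabs": ["CS"], "dropped": True},
    {"rabs": ["PS"], "dropped": False},
    {"rabs": ["CS", "PS"], "dropped": True},
    {"rabs": ["CS", "PS"], "dropped": False},
]
rates = drop_rate_by_rab_type(sample_calls)
```

Comparing rates rather than raw counts keeps the result meaningful even when one call type is far more common than the other.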
Drilling into the Multi-RAB Data
Next we drilled down into the multi-RAB calls themselves. We opted for a customized report using the Network and Service Analyzer (NSA) tool to allow for deeper root cause analysis. Because the customer’s expert RF engineering staff had deep knowledge of the RF environment, we chose a statistical view over a pictorial one. The problem was systemic, so we were able to use historical data. Historical data capture is a time-saving benefit of the Tekcomms tool suite, because we didn’t have to set up a special test environment to capture the data.
A quick view of the statistics on "Handset Type" showed that the majority of Dropped Multi-RAB calls were experienced by Apple iPhone users.
To avoid jumping to the conclusion that the iPhone was the source of the problem, we compared this data with the “IPI Predefined Top 10 Device Report on handset usage statistics”, which showed that most of the calls in this network were made using Apple iPhones.
We were able to conclude statistically that other handsets experienced multi-RAB dropped calls in the same proportion as the iPhone. This eliminated any particular handset as the root contributor and helped us focus on the other potential sources.
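The proportionality argument amounts to comparing each handset's share of dropped calls with its share of all calls. A minimal sketch follows, assuming per-handset counts are available as plain dictionaries; the function name, handset labels and numbers are hypothetical and are not taken from the IPI report.

```python
def drop_to_usage_ratio(drops_by_handset, calls_by_handset):
    """Return {handset: drop_share / usage_share}.

    A ratio near 1.0 means the handset drops calls in proportion to how
    much it is used; a ratio well above 1.0 would single it out.
    """
    total_drops = sum(drops_by_handset.values())
    total_calls = sum(calls_by_handset.values())
    if total_drops == 0 or total_calls == 0:
        raise ValueError("need at least one call and one drop")
    ratios = {}
    for handset, calls in calls_by_handset.items():
        if calls == 0:
            continue  # no usage, so no meaningful ratio
        usage_share = calls / total_calls
        drop_share = drops_by_handset.get(handset, 0) / total_drops
        ratios[handset] = drop_share / usage_share
    return ratios

# Illustrative counts only: one handset dominates both usage and drops
calls = {"handset_a": 600, "handset_b": 300, "handset_c": 100}
drops = {"handset_a": 60, "handset_b": 30, "handset_c": 10}
ratios = drop_to_usage_ratio(drops, calls)
```

In this made-up sample every ratio comes out at about 1.0, which is the pattern that rules out a single handset model as the cause.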
A Deeper Look at the Multi-RAB Drops
To characterize our situation, we looked into RSCP (Received Signal Code Power) and EcNo (Received energy per chip divided by the power density in the band). This first analysis showed protocol causes along with the initial and last measurement results of RSCP and EcNo.
We assumed that an RSCP of -100 dBm or higher was required for good coverage, and that interference should not push EcNo below -14 dB. With these assumptions in place, it was easy to evaluate which of the failed multi-RAB calls dropped due to radio conditions and which must have had other root causes.
The next level of analysis brought us to a scatter plot of EcNo vs. RSCP:
To help the executive management team see things as we saw them, we simplified this visual by drawing a pie chart from the scatter plot data. We arrived at the view in Figure 7, which shows that most of the drops occurred despite very good RF conditions.
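The step from scatter plot to pie chart is a bucketing exercise: classify each dropped call by its RSCP and EcNo, then report the share of each bucket. The sketch below uses the -100 dBm and -14 dB thresholds from the text and reads the EcNo assumption as "-14 dB or better". The category names, the choice of one measurement pair per dropped call, and the sample values are assumptions made for illustration.

```python
from collections import Counter

RSCP_MIN_DBM = -100.0  # coverage threshold stated in the text
ECNO_MIN_DB = -14.0    # interference threshold stated in the text

def classify_rf(rscp_dbm, ecno_db):
    """Label one (RSCP, EcNo) measurement with a hypothetical RF category."""
    good_coverage = rscp_dbm >= RSCP_MIN_DBM
    low_interference = ecno_db >= ECNO_MIN_DB
    if good_coverage and low_interference:
        return "good RF"
    if not good_coverage and not low_interference:
        return "poor coverage and interference"
    if not good_coverage:
        return "poor coverage"
    return "interference"

def rf_category_shares(measurements):
    """Return {category: fraction of drops} for (rscp_dbm, ecno_db) pairs."""
    counts = Counter(classify_rf(rscp, ecno) for rscp, ecno in measurements)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

# Illustrative measurements only, one pair per dropped call
sample = [(-85.0, -8.0), (-90.0, -10.0), (-105.0, -9.0), (-95.0, -16.0)]
shares = rf_category_shares(sample)
```

The resulting fractions are what a pie chart would display, with the "good RF" slice representing drops that radio conditions cannot explain.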
Narrowing the Path to the Root Cause
We had eliminated handsets and RF conditions as major contributors, so we went deeper into the dropped calls report and found the following trends:
- 50% of the drops were caused by network elements rather than by the phone. These were split evenly between NodeB and RNC issues and covered several types of fault. One example is shown in Figure 8, where the multi-RAB call dropped after one of the involved NodeBs was unable to prepare a serving HSDPA cell change (“HSDPA Handover”) due to an issue in its state machine. After five tries the RNC dropped the call.
- Figure 9 shows another contributor to the multi-RAB drop problem. Activation of compressed mode in the NodeB led to an NBAP radio link failure, resulting in a lost connection with the UE. We doubted that a radio link failure had actually occurred, because UL speech frames with good quality were still being received on the user plane. Even if there had been an issue with the compressed mode activation procedure, there was no need for the RNC to drop the call.
- Next we found that only 20% of the calls dropped due to radio conditions that could be optimized, so we had to look further.
- Ultimately we found that the chief contributor was that the UE did not increase UL Tx Power after Spreading Factor Reconfigurations or Active Set Update Radio Link Addition Procedures. The typical radio chart for this issue is shown in Figure 10 below:
Our conclusion led us to the manufacturer of the RAN equipment, which was not instructing the mobiles to increase transmit power. We were able to provide all the proof necessary to get the manufacturer to own the issue and commit to a fix.
We set out to solve this problem with a top-down approach. We started with the symptoms and sifted through the data until we found potential causes. We used an experienced set of RF engineers along with our Tekcomms toolset to examine the data. This combination of expertise and tools allowed us to narrow our path to the root cause.
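The transmit power finding can be checked per call by comparing the UE's UL Tx power just before a reconfiguration event with the power reported shortly afterwards. The sketch below is illustrative only: the function name, the 2-second window and the 1 dB minimum rise are hypothetical parameters chosen for the example, not values from the case study or from any specification.

```python
def tx_power_increased(samples, event_time, window_s=2.0, min_rise_db=1.0):
    """Check whether UL Tx power rose after an event.

    samples     - list of (time_s, ul_tx_power_dbm), sorted by time
    event_time  - time of the Spreading Factor Reconfiguration or
                  Radio Link Addition, in the same time base
    window_s    - hypothetical observation window after the event
    min_rise_db - hypothetical minimum rise that counts as an increase

    Returns True or False, or None when there is no sample on one side
    of the event and the question cannot be answered.
    """
    before = [p for t, p in samples if t <= event_time]
    after = [p for t, p in samples if event_time < t <= event_time + window_s]
    if not before or not after:
        return None
    # Compare the peak after the event with the last value before it
    return max(after) - before[-1] >= min_rise_db

# Illustrative traces only
flat_trace = [(0.0, 10.0), (0.5, 10.0), (1.0, 10.0), (1.5, 10.0)]
rising_trace = [(0.0, 10.0), (0.5, 10.0), (1.0, 12.0), (1.5, 14.0)]
flat_result = tx_power_increased(flat_trace, event_time=0.5)
rising_result = tx_power_increased(rising_trace, event_time=0.5)
```

A flat trace after the event, as in the first sample, is the pattern the text describes for the dropped calls.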