1 Introduction

VoIP is a class of new technologies that deliver voice calls over packet-switched networks instead of the legacy circuit-switched telecom network, i.e., the so-called Public Switched Telephone Network (PSTN). By transmitting the voice data over the Internet, VoIP offers clear benefits over the PSTN calling service, including improved quality of service, high-fidelity codecs, and lower monetary costs. As a result, network operators are actively promoting VoIP on modern Android smartphones [1,2,3], with the latest VoLTE (Voice over LTE) and VoWiFi (or Wi-Fi Calling) schemes being deployed.

Existing works on Android VoIP security, however, are far from comprehensive. They focus either on the weaknesses of the VoIP network infrastructure, e.g., the insecure deployment of VoIP protocols on the network service providers’ side, or on the privacy concerns of third-party VoIP apps. Notably, Li et al. [4] and Kim et al. [5] discovered multiple vulnerabilities in both the control- and data-plane functions of VoLTE, and Xie et al. [6] uncovered four vulnerabilities in operational Wi-Fi calling services. Regarding Android VoIP’s client-side security, only the privacy risks of some VoIP apps have been tested [7, 8], e.g., whether their traffic is encrypted with SSL/TLS. It is thus unclear whether Android’s VoIP integration at the operating system level is secure.

In this paper, we conduct the first study to systematically analyze Android VoIP’s (in)security at the system level. Our study begins with a demystification of Android VoIP’s protocol stack and its attack surfaces. Specifically, we study VoIP-related Android system code to identify VoIP components and their implementations, including SIP (Session Initiation Protocol) via the nist-sip library, SDP (Session Description Protocol) via gov.nist.javax.sdp, RTP (Real-time Transport Protocol) via librtp-jni.so, codecs via libstagefright, and the SIP user agent via the system phone and dialer apps. Furthermore, we identify all four potential attack surfaces, which allow physical, local, remote, and nearby attacks against Android VoIP.

With these components and their attack surfaces in mind, we propose a novel vulnerability assessment approach that combines on-device Intent/API fuzzing, network-side packet fuzzing, and targeted code auditing. First, we perform Android Intent and system API fuzzing to comprehensively fuzz the local surface. Second, we set up a dedicated VoIP testbed to perform three kinds of protocol fuzzing, which mutate different fields in the SIP, SDP, and RTP protocols either directly from a user agent or through a Man-In-The-Middle proxy. Lastly, we combine the automatic fuzzing tests with targeted code auditing, including log-driven and protocol specification based auditing, to eventually determine vulnerabilities.

By periodically fuzzing VoIP components on recent Android OS versions from 7.0 to 9.0 over two years, we have discovered a total of nine zero-day vulnerabilities, eight of which are system vulnerabilities and have been confirmed by Google with bug bounty awards. Two-thirds of these vulnerabilities can be exploited by a network-side adversary, which suggests that Android VoIP’s major risks come from the remote and nearby attack surfaces. Moreover, the severity of six of the nine vulnerabilities was rated by the Google Android security team as high or critical (the two most serious levels), which implies that most Android VoIP vulnerabilities are serious. The resulting security consequences include denying voice calls, caller ID spoofing, unauthorized call operations, and remote code execution. Furthermore, we uncover a new root cause, incompatible processing between VoIP and PSTN calls, that leads to six VoIP vulnerabilities and requires developers’ extra attention in their future design and implementation.

To summarize, we have made the following contributions in this paper:

  • The first demystification of Android VoIP’s protocol stack and all its four attack surfaces (Sect. 3);

  • A novel approach that assembles on-device Intent/API fuzzing, network-side packet fuzzing, and targeted code auditing (Sect. 4);

  • New and comprehensive vulnerability assessment results, with nine zero-day vulnerabilities analyzed and their root causes uncovered (Sect. 5 and Sect. 6).

Fig. 1. A typical network infrastructure of SIP.

2 Background

Before presenting our work, we first introduce the necessary background on VoIP and Android in this section.

2.1 VoIP Background

Android VoIP mainly uses SIP (Session Initiation Protocol), which was specified by the IETF in RFC 3261 [9]. As a VoIP signaling protocol, SIP provides a mechanism for one or more participants to create, modify, and terminate sessions. Fig. 1 presents a typical network infrastructure of SIP, which consists of the following components:

  • User Agent (UA): A SIP user agent is a logical network node of SIP, which is responsible for creating, sending, and receiving SIP messages and for maintaining a SIP session.

  • Proxy Server: A SIP proxy server helps deliver SIP messages between different user agents. It can also perform routing control and check the integrity of SIP messages.

  • Registrar Server: A SIP registrar server accepts SIP REGISTER requests from user agents and places the location information it receives in those requests into the location service for the domain it handles.

Similar to HTTP, SIP is a text-based protocol. It employs SDP (Session Description Protocol) to describe session contents. A typical SIP message can be an INVITE, REGISTER, OPTIONS, BYE, or CANCEL request. One important field in the SIP header is the SIP URI (Uniform Resource Identifier), which represents the sender or receiver address. A SIP URI has the format sip:user_name@server_ip_address, e.g., sip:anonymous@192.168.8.151.
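
To make the URI format concrete, the following minimal Python sketch splits a simple SIP URI into its user and host parts. The names SipUri and parse_sip_uri are illustrative choices and not part of any SIP library, and the sketch handles only the basic sip:user@host form shown above (no ports, URI parameters, or sips: scheme).

```python
from typing import NamedTuple


class SipUri(NamedTuple):
    """User and host parts of a basic SIP URI (illustrative type)."""
    user: str
    host: str


def parse_sip_uri(uri: str) -> SipUri:
    """Split a simple 'sip:user@host' URI into its user and host parts.

    Only the basic form used in the text is handled; ports, URI
    parameters, and the 'sips:' scheme are deliberately out of scope.
    """
    scheme, sep, rest = uri.partition(":")
    if sep != ":" or scheme.lower() != "sip":
        raise ValueError(f"not a sip: URI: {uri!r}")
    # Split on the last '@' so that the host is whatever follows it.
    user, at, host = rest.rpartition("@")
    if at != "@" or not user or not host:
        raise ValueError(f"expected user@host in {uri!r}")
    return SipUri(user, host)


if __name__ == "__main__":
    print(parse_sip_uri("sip:anonymous@192.168.8.151"))
```

Note that the sketch places no limit on the length or character set of the user part, which is exactly the kind of freedom the later vulnerabilities rely on.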

A SIP call involves three phases: the initial signaling phase, the conversation phase, and the end signaling phase. The INVITE and BYE requests are used in the two signaling phases. During the conversation phase, the two calling parties exchange audio/video streams over RTP (Real-time Transport Protocol) [10], using the codecs negotiated in the SDP session description.

2.2 Android Background

On Android, each application, whether a system app or a third-party app, runs in its own app sandbox [11]. Different apps communicate with each other through a Binder-based IPC (Inter-Process Communication) channel called Intent. Each app has its own private data and requires permissions to access system resources. For example, system VoIP apps have the RECORD_AUDIO and CALL_PRIVILEGED permissions.

There are four kinds of Android components: the user interface based Activity, the long-running Service, the event-triggered Broadcast Receiver, and the database-like Content Provider. Although Intent-based inter-component communication (ICC [12]) enables flexible code and data sharing across different components, it also brings a widespread threat called component hijacking [13, 14]. By sending a crafted (malicious) Intent message to an exported component that holds dangerous permissions or sensitive data, an adversary could misuse the permissions [15, 16] or manipulate private data [13, 17]. In this paper, besides system-level vulnerabilities, we also uncover one component hijacking vulnerability in a popular VoIP application.

3 Demystifying Android VoIP

In this section, we demystify Android VoIP’s implementation and all its four attack surfaces. To the best of our knowledge, we are the first to give this demystification.

3.1 Android VoIP’s Protocol Stack

Fig. 2. Android’s integration of VoIP protocol stack.

By studying Android’s source code, we are able to depict its implementation of VoIP protocol at different layers. Figure 2 highlights Android VoIP’s protocol stack in the gray color. Starting from the bottom layer, the stack consists of the following components:

  • SIP (Session Initiation Protocol): Android’s SIP implementation directly uses the nist-sip library, which was developed by the National Institute of Standards and Technology (NIST). It is a pure Java SIP implementation, and Android provides API classes (e.g., SipSession and SipProfile) via the android.net.sip package.

  • SDP (Session Description Protocol): Similar to SIP, Android’s SDP also uses the NIST implementation (gov.nist.javax.sdp), and provides a hidden API class called SdpSessionDescription.

  • RTP (Real-time Transport Protocol): Android implements RTP in a C/C++ dynamic link library called librtp-jni.so. It also provides a few API classes via the android.net.rtp package.

  • Audio or Video Codec: Android VoIP supports only a handful of codecs, including PCM (Pulse-Code Modulation) type A and type U codec, AMR (Adaptive Multi-Rate) codec, and GSM EFR (Enhanced Full Rate) codec. Supporting these codecs relies on libstagefright.

  • SIP UA (User Agent): Android VoIP implements its UA in the system phone app (com.android.phone). It is a high-privilege app under the Linux user group radio. Hence, it can access not only typical phone-related permissions (e.g., accessing user contacts and making a phone call) but also low-level resources in the Telephony Manager and Radio Interface Layer (RIL). Additionally, displaying VoIP caller numbers is handled by the system dialer app (com.android.dialer).

It is worth noting that these VoIP components are not isolated in Android. Indeed, a VoIP session on Android always starts from the SIP UA and goes through all those protocol and codec components. As a result, by targeting the system phone and dialer apps, we can trigger Android VoIP’s code flows and test all of the Android VoIP components.

3.2 Android VoIP’s Attack Surfaces

Figure 3 shows all the potential surfaces through which Android VoIP could be attacked:

Fig. 3. Android VoIP’s four attack surfaces: physical, local, remote, and nearby.

  • Physical Attack Surface: If an adversary can physically access a victim user’s phone, he is able to set the phone’s VoIP configuration without authorization, causing a security breach. Although such an attack is rare, it still needs to be considered, as we will demonstrate in Sect. 5.

  • Local Attack Surface: Since the system phone app is a privileged app, it can access not only permission-protected resources but also system interfaces in the Telephony Manager and Radio Interface Layer (RIL). An on-device malicious app can thus attack the phone app via IPC to obtain VoIP-related privileges.

  • Remote Attack Surface: Since the phone needs to communicate with the outside world over IP and mobile networks, this introduces another attack surface. Specifically, a network-side adversary can send crafted payloads in SIP/SDP/RTP packets to exploit Android VoIP components remotely, causing remote denial of service and code execution.

  • Nearby Attack Surface: With the popularity of HFP (Hands-Free Profile) devices, a user may use a Bluetooth earphone or a Bluetooth car kit during her VoIP call. These nearby Bluetooth devices bring a new attack surface. On the one hand, a malicious payload in VoIP traffic may reach the system Bluetooth components. On the other hand, malicious traffic from Bluetooth devices may also attack VoIP components.

4 Methodology

After understanding Android’s VoIP integration and its attack surfaces, we propose a novel approach to systematically assess Android VoIP’s vulnerabilities. In this approach, we first automatically test Android VoIP components via on-device and network-side fuzzing, and then combine them with targeted code auditing to eventually determine vulnerabilities. In this section, we present these three modules, among which network-side packet fuzzing is the most distinctive.

4.1 On-Device Intent/API Fuzzing

To comprehensively fuzz the local surface of Android VoIP components, we perform both Android Intent fuzzing and system API fuzzing. Specifically, Intent fuzzing aims to test exported components in VoIP system apps, while system API fuzzing tries to discover unprotected VoIP system service interfaces. In this subsection, we first introduce the fuzzing framework before presenting its two fuzzing methods in detail.

Fig. 4. The on-device fuzzing framework, with not only the conventional Intent fuzzing but also the creative system API fuzzing based on Java reflection.

On-Device Fuzzing Framework. As shown in Fig. 4, we develop an on-device fuzzing framework based on Drozer [18]. We use a drozer console on PC to control the fuzzing process on a test phone via its drozer agent. We deliver fuzzing commands through Android’s adb forward command and receive fuzzing logs through the adb logcat command. For both Intent and system API fuzzing, we perform these three steps: identifying exposed surfaces, mutating parameters, and recording logs.

On-Device Intent Fuzzing. In the Intent fuzzing, exposed surfaces are VoIP apps’ exported components that can be accessed by any other third-party apps on the same phone. We identify these exported components by analyzing component information in the app’s AndroidManifest.xml file. To mutate Intent parameters, we try both empty (i.e., null) parameters and the parameters that satisfy a component’s data schemes (e.g., content:// and vk.voip).
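
The first step, identifying exported components, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the tooling used in the study: it assumes the manifest has already been decoded to textual XML (inside an APK it is stored in a binary form), the function name exported_components and the sample manifest are hypothetical, and the rule that a component with an intent filter is exported when the attribute is omitted is a simplification that matches activities, services, and receivers on Android versions before 12.

```python
import xml.etree.ElementTree as ET
from typing import List

ANDROID_NS = "http://schemas.android.com/apk/res/android"
COMPONENT_TAGS = ("activity", "activity-alias", "service", "receiver", "provider")


def exported_components(manifest_xml: str) -> List[str]:
    """Return the names of components that other apps can reach."""
    root = ET.fromstring(manifest_xml)
    app = root.find("application")
    if app is None:
        return []
    found = []
    for comp in app:
        if comp.tag not in COMPONENT_TAGS:
            continue
        exported = comp.get(f"{{{ANDROID_NS}}}exported")
        has_filter = comp.find("intent-filter") is not None
        # An explicit attribute wins; otherwise fall back to the
        # simplified "intent filter implies exported" default.
        if exported == "true" or (exported is None and has_filter):
            found.append(comp.get(f"{{{ANDROID_NS}}}name"))
    return found


# Hypothetical manifest used only to exercise the function.
SAMPLE_MANIFEST = """\
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.voip">
  <application>
    <activity android:name=".LinkActivity">
      <intent-filter>
        <action android:name="android.intent.action.VIEW"/>
        <data android:scheme="content"/>
      </intent-filter>
    </activity>
    <service android:name=".CallService" android:exported="false"/>
    <receiver android:name=".BootReceiver" android:exported="true"/>
  </application>
</manifest>
"""

if __name__ == "__main__":
    print(exported_components(SAMPLE_MANIFEST))
```

Each name returned this way is a candidate target for the null and scheme-conforming Intents described above.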

On-Device System API Fuzzing. In the system API fuzzing, exposed surfaces are those unprotected system service interfaces. We identify them by using Java reflection to invoke Android ServiceManager’s listServices function, which can list not only all the available system service interfaces but also their accepted parameter types. We then launch targeted fuzzing against these exposed service interfaces according to their parameter types.
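
The type-driven mutation can be sketched as follows. The Java reflection step is not reproduced here; the sketch assumes that the parameter types of an interface have already been listed and shows one plausible way to derive test inputs from them. The value pools in FUZZ_VALUES and the helper fuzz_cases are illustrative assumptions, not the inputs used in the study.

```python
from itertools import islice, product
from typing import Any, Dict, List, Optional, Sequence, Tuple

# Boundary and malformed values per Java parameter type (assumed pools).
FUZZ_VALUES: Dict[str, List[Any]] = {
    "int": [0, -1, 1, 2**31 - 1, -(2**31)],
    "long": [0, -1, 2**63 - 1, -(2**63)],
    "boolean": [True, False],
    "java.lang.String": [None, "", "A" * 1024, "../", "&"],
}


def fuzz_cases(param_types: Sequence[str],
               limit: Optional[int] = None) -> List[Tuple[Any, ...]]:
    """Build argument tuples for an interface with the given parameter types.

    Unknown types fall back to a single null argument. `limit` caps the
    number of generated tuples, since the product grows quickly.
    """
    pools = [FUZZ_VALUES.get(t, [None]) for t in param_types]
    return list(islice(product(*pools), limit))


if __name__ == "__main__":
    for case in fuzz_cases(["int", "boolean"], limit=4):
        print(case)
```

In a real harness, each tuple would be passed to the target service method and the resulting log output recorded for later auditing.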

4.2 Network-Side Packet Fuzzing

To test Android VoIP’s network components, we need to launch network-side packet fuzzing. In this subsection, we first introduce our testbed for network-side fuzzing, and then present the three protocol fuzzing methods and the two fuzzing modes.

Fig. 5. Our testbed for network-side fuzzing.

Setting up the Testbed. Figure 5 shows the architecture of our testbed for network-side fuzzing, where an Android phone acts as the victim user and a mjSIP-based User Agent mimics the adversary. Note that mjSIP [19] is a command-line based SIP UA implementation with flexible options. Additionally, we use OpenSIPS [20] to establish a SIP proxy server, and connect all these three parties in the same Wi-Fi network.

Fuzzing Different Protocols. We leverage mjSIP (uac.sh) to fuzz all the three protocols in the Android VoIP stack (see Sect. 3), namely SIP, SDP, and RTP fuzzing. Listing 1.1 shows the mjUA commands used in our three fuzzing methods. Additionally, we install an AutoAnswer app in the Android phone to automate the entire fuzzing process.

  • SIP Fuzzing: In this fuzzing, we mutate the user name and server name in a SIP URI. For example, we can use a long SIP name to launch the fuzzing: $./uac.sh --user <long_SIP_name>. Additionally, we can also change the display SIP name using the corresponding option, as shown in Listing 1.1.

  • SDP Fuzzing: In this fuzzing, we mutate different fields in the SDP’s media description. We launch the SDP fuzzing by preparing variants of a mjSIP configuration file: $./uac.sh -f configFile.cfg. The media format of this configuration file is listed in Listing 1.2. Specifically, we can change the “media” and “media_spec” parameters in multiple ways. For example, we can use a different media type, port, and protocol/codec for the “media” parameter and specify different media attributes for the “media_spec” parameter.

  • RTP Fuzzing: To fuzz RTP codecs, we generate codec corpora and send them to the Android phone one by one via mjUA’s send-file option. The detailed fuzzing code is shown in Fig. 6. Specifically, we first prepare a seed file called sample-gsm-8000.gsm, and use this seed file to randomly generate different audio files (fuzz_$i.tone).
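
The corpus generation step for RTP fuzzing can be approximated by the following Python sketch. It is not the code of Fig. 6: the mutation strategy (overwriting a few randomly chosen bytes), the function names, and the default parameters are assumptions made for illustration. Only the idea of deriving files named fuzz_$i.tone from a single seed comes from the text.

```python
import random
from typing import Dict


def mutate(seed: bytes, rng: random.Random, max_mutations: int = 8) -> bytes:
    """Overwrite between 1 and `max_mutations` random bytes of the seed."""
    if not seed:
        return seed
    data = bytearray(seed)
    for _ in range(rng.randint(1, max_mutations)):
        pos = rng.randrange(len(data))
        data[pos] = rng.randrange(256)
    return bytes(data)


def generate_corpus(seed: bytes, count: int, rng_seed: int = 0) -> Dict[str, bytes]:
    """Map file names fuzz_<i>.tone to mutated copies of the seed.

    A fixed `rng_seed` makes the corpus reproducible, which helps when a
    crash has to be traced back to the exact input that caused it.
    """
    rng = random.Random(rng_seed)
    return {f"fuzz_{i}.tone": mutate(seed, rng) for i in range(count)}


if __name__ == "__main__":
    # In practice the seed would be read from a file such as
    # sample-gsm-8000.gsm; a dummy buffer stands in for it here.
    corpus = generate_corpus(bytes(range(33)) * 4, count=3)
    for name, blob in corpus.items():
        print(name, len(blob))
```

Each generated file would then be written to disk and sent to the phone in turn through the send-file option mentioned above.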

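Listing 1.1. The mjUA commands used in our three fuzzing methods.

Listing 1.2. The media format of the mjSIP configuration file.

Fig. 6. A code illustration of our RTP/Codec fuzzing.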

Direct Fuzzing and MITM Fuzzing. As shown in Fig. 5, we provide two fuzzing modes: direct fuzzing from the UA and MITM (Man-In-The-Middle) fuzzing. To enable the MITM fuzzing, we use the following Ettercap [21] command to perform ARP spoofing and construct a transparent proxy: sudo ettercap -T -V hex -F rtpfuzz.ef -M arp /192.168.8.152// /192.168.8.191//. With such a MITM proxy, it is convenient for us to leverage existing VoIP traffic for mutation. For example, we can mutate RTP headers by setting an Ettercap filter, which specifies which packets to match and how to manipulate them. The mutated packets are then forwarded to the Android phone.
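
To illustrate what mutating an RTP header involves, the sketch below parses the 12-byte fixed RTP header defined in RFC 3550 and rewrites its payload type field. It is a Python model of the packet manipulation, not an Ettercap filter, and the function names are illustrative.

```python
import struct
from typing import Dict

# Fixed RTP header: V/P/X/CC byte, M/PT byte, sequence number,
# timestamp, SSRC (network byte order, 12 bytes in total).
RTP_HEADER = struct.Struct("!BBHII")


def parse_rtp_header(packet: bytes) -> Dict[str, int]:
    """Decode the fixed RTP header at the start of `packet`."""
    if len(packet) < RTP_HEADER.size:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    return {
        "version": b0 >> 6,
        "padding": (b0 >> 5) & 1,
        "extension": (b0 >> 4) & 1,
        "csrc_count": b0 & 0x0F,
        "marker": b1 >> 7,
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def set_payload_type(packet: bytes, payload_type: int) -> bytes:
    """Return a copy of `packet` with a new 7-bit payload type."""
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type must fit in 7 bits")
    if len(packet) < RTP_HEADER.size:
        raise ValueError("packet shorter than the fixed RTP header")
    out = bytearray(packet)
    out[1] = (out[1] & 0x80) | payload_type  # keep the marker bit
    return bytes(out)


if __name__ == "__main__":
    pkt = RTP_HEADER.pack(0x80, 3, 7, 160, 0x1234) + b"\x00" * 33
    print(parse_rtp_header(set_payload_type(pkt, 127)))
```

A MITM fuzzer would apply such a rewrite to each intercepted packet before forwarding it, leaving the rest of the stream intact.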

4.3 Targeted Code Auditing

To eventually determine vulnerabilities, it is necessary to launch manual code auditing after the automatic fuzzing. In this subsection, we propose two targeted code auditing methods that leverage fuzzing logs and protocol specification to reduce manual efforts.

Table 1. Zero-day Android VoIP vulnerabilities discovered in our work.

Log-Driven Auditing. Both on-device and network-side fuzzing generate a number of fuzzing logs. We thus leverage them for log-driven code auditing. Specifically, for a process crash produced by our fuzzing, we can collect either a Java exception for Java components (e.g., IllegalStateException: Reject SDP: no suitable codecs) or a fault status for native code (e.g., pid: 8112, tid: 8161, name: XXX, signal 11 (SIGSEGV), fault addr: YYY). Moreover, we can obtain the detailed location where the code encounters an error, e.g., createAnswer(SipAudioCall.java:805) and libbluetooth_jni.so(clccResponseNative+30). We then use these code locations to drive our auditing.
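
Extracting such code locations from a log can be automated with a few regular expressions, as in the minimal sketch below. The two patterns are assumptions derived from the example locations quoted above; real logs contain other frame formats that this sketch does not attempt to cover.

```python
import re
from typing import List, Tuple

# Java frame such as: createAnswer(SipAudioCall.java:805)
JAVA_FRAME = re.compile(
    r"(?P<method>[\w$<>]+)\((?P<file>[\w$]+\.java):(?P<line>\d+)\)")
# Native frame such as: libbluetooth_jni.so (clccResponseNative+30)
NATIVE_FRAME = re.compile(
    r"(?P<lib>[\w.+-]+\.so)\s*\((?P<symbol>[\w:~]+)\+(?P<offset>\d+)\)")

Location = Tuple[str, str, str, int]


def crash_locations(log_text: str) -> List[Location]:
    """Collect (kind, file or library, method or symbol, line or offset)."""
    locations: List[Location] = []
    for m in JAVA_FRAME.finditer(log_text):
        locations.append(("java", m["file"], m["method"], int(m["line"])))
    for m in NATIVE_FRAME.finditer(log_text):
        locations.append(("native", m["lib"], m["symbol"], int(m["offset"])))
    return locations


if __name__ == "__main__":
    sample = (
        "at android.net.sip.SipAudioCall.createAnswer(SipAudioCall.java:805)\n"
        "#00 pc 0000 /system/lib/libbluetooth_jni.so (clccResponseNative+30)\n"
    )
    for loc in crash_locations(sample):
        print(loc)
```

The resulting list gives the auditor a short set of source files and functions to inspect first.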

Protocol Specification Based Auditing. PSTN and VoIP protocols have specifications that we can leverage for targeted auditing. For example, special attributes in PSTN, e.g., the call transfer splitting character “&” and the phone number prefix “phone-context”, may behave differently in VoIP, as we will illustrate later. We then leverage this kind of difference between protocol specifications for efficient auditing.

5 Evaluation

In this section, we present our results of fuzzing VoIP components on the recent Android OS from version 7.0 to 9.0. Since this is a periodic fuzzing effort (i.e., not a single experiment) over a period of around two years, we focus on reporting our findings in this paper. As shown in Table 1, we have discovered a total of nine zero-day vulnerabilities, eight of which are system vulnerabilities and have been confirmed by Google with bug bounty awards. Table 1 lists the meta information of these vulnerabilities, including the entry components where vulnerabilities can be triggered from, the severity level rated by Google Android Security team, and the corresponding security consequence.

5.1 Vulnerabilities Discovered via On-Device Fuzzing

By performing on-device fuzzing, we find that Android VoIP generally protects its local attack surface, with only one vulnerability discovered by the system API fuzzing and no vulnerable component identified by the Intent fuzzing. To also demonstrate the effectiveness of our Intent fuzzing, we test and identify a VoIP vulnerability in a very popular app called VK, which has over 100 million cumulative installs on Google Play.

V1: Maliciously Triggering a VoIP call in the VK App. We found that the VK app (version 5.13) contains an exported component, LinkRedirActivity, which accepts an Intent with the content:// scheme and the vk.voip data type. Surprisingly, LinkRedirActivity directly makes a VoIP call to the VK user account specified by the vk.voip data. As a result, an on-device malicious app can send a crafted Intent to trigger a VoIP call without the user’s consent, even when the phone screen is turned off. More seriously, the victim user could be eavesdropped on if the callee VK account is set to an account under the attacker’s control, an idea similar to the login CSRF (Cross-Site Request Forgery) [22] attack in web security. To patch this vulnerability, VK added a user confirmation dialog before LinkRedirActivity can make any VoIP call.

V2: Unauthorized Call Transfer in the IMS Interface. Android has a system service called QtilMS, which provides IMS (IP Multimedia Subsystem) related functionality and is implemented by Qualcomm. However, our system API fuzzing found that QtilMS exposed two VoIP APIs, SendCallTransferRequest and SendCallForwardUncondTimer, to any third-party app. Normally, these two system APIs are accessible only to apps with the CALL_PRIVILEGED permission. However, our fuzzing shows that any app without the permission can also invoke the APIs, because QtilMS enforces no check. As a result, an on-device malicious app can misuse those two privileged APIs to set up unauthorized call transfer. To mitigate this, Qualcomm added a permission check for access to those two QtilMS APIs.

5.2 Vulnerabilities Discovered via Network-Side Fuzzing

Compared to the on-device fuzzing, our network-side fuzzing discovered more VoIP vulnerabilities, as shown in Table 1. This suggests that Android VoIP’s major risks come from the remote and nearby attack surfaces. In this subsection, we first introduce two vulnerabilities that can be exploited remotely, and then present another two vulnerabilities that involve the nearby Bluetooth-based HFP (Hands-Free Profile) devices.

Fig. 7. A demo of exploiting V3.

Fig. 8. A demo of exploiting V4.

V3: Undeniable VoIP Call Spam Due to Long SIP Name. We discovered this vulnerability through a SIP fuzzing test using a long SIP name: $./uac.sh --user <long_SIP_name><victim’s sip account>. As shown in Fig. 7, the callee’s VoIP phone interface can be filled up by the very long SIP name, e.g., 1,043 characters in our test case. In this scenario, the victim user cannot answer or reject the call, because no button is shown. If the adversary frequently launches this undeniable VoIP call spam, the victim has to disable the network connection or shut down her phone. We call this kind of denial-of-service attack a “VoIP call bomb”, by analogy with the SMS bomb [23]. To defend against this attack, Google restricted the length of the SIP user name.

V4: Remote DoS in Telephony Once Accepting a Call. We discovered this vulnerability through the SDP fuzzing using a malformed configuration file (malformed.cfg). As shown in Fig. 8, it can crash the victim’s phone process once she accepts the call, causing a remote DoS (denial of service). Our fuzzing identified two weaknesses in the affected Telephony module, either of which could be exploited for the attack. One way is to use a codec that is not in the supported codec list (see Sect. 3.1). For example, if we add “media_spec=audio 102 G726-24 8000 60” to the malformed.cfg file, the phone process crashes with an illegal state exception “Reject SDP: no suitable codecs”. The other way is to use an invalid SDP description. For example, if we add “media=AAAA 4000” to the malformed.cfg file, the phone process crashes with an illegal SDP argument exception. To patch these weaknesses, Google added exception catch statements for those two unhandled exceptions.
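
The class of fix described here, handling the rejection instead of letting it propagate and crash the process, can be modeled by the short Python sketch below. It is not Android’s code: the codec names in SUPPORTED_CODECS, the assumption that the codec name is the third field of a media_spec line, and all function names are illustrative.

```python
from typing import Iterable

# Assumed names for the codecs listed in Sect. 3.1.
SUPPORTED_CODECS = {"PCMU", "PCMA", "AMR", "GSM-EFR"}


class SdpRejected(Exception):
    """Raised when an offer contains no usable codec."""


def codec_from_media_spec(line: str) -> str:
    """Extract the codec name, assumed to be the third field of the value."""
    _, _, value = line.partition("=")
    fields = value.split()
    if len(fields) < 3:
        raise ValueError(f"malformed media description: {line!r}")
    return fields[2].upper()


def choose_codec(lines: Iterable[str]) -> str:
    """Pick the first offered codec that is supported."""
    for line in lines:
        codec = codec_from_media_spec(line)
        if codec in SUPPORTED_CODECS:
            return codec
    raise SdpRejected("no suitable codecs")


def answer_call(lines: Iterable[str]) -> str:
    """Answer a call, declining gracefully when the offer is unusable."""
    try:
        return f"accepted: {choose_codec(lines)}"
    except SdpRejected as exc:
        return f"declined: {exc}"
    except ValueError:
        return "declined: malformed media description"


if __name__ == "__main__":
    print(answer_call(["media_spec=audio 102 G726-24 8000 60"]))
    print(answer_call(["media=AAAA 4000"]))
```

Without the two except clauses in answer_call, either input would terminate the caller with an unhandled exception, which mirrors the crash behavior observed in the phone process.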

Fig. 9. A model of Bluetooth-involved VoIP vulnerabilities.

The Model of Bluetooth-Involved VoIP Vulnerabilities. As shown in Table 1, the V5 and V6 vulnerabilities can be triggered only when the phone is connected to a nearby Bluetooth device. Hence, we first explain the model of these Bluetooth-involved VoIP vulnerabilities before presenting their specific weaknesses. Figure 9 depicts such a vulnerability model. Specifically, the mobile phone acts as an AG (Audio Gateway) in the HFP (Hands-Free Profile) communication, and the Bluetooth earphone or Bluetooth car kit is the HF (Hands-Free) device. When a remote attacker makes a VoIP call to a phone connected to an HF device, the HF device will query all the call information (e.g., caller number) from the phone via HFP’s AT+CLCC command. As a result, the VoIP call input will be delivered to libbluetooth-jni for processing. A vulnerability can arise if this library cannot process an unexpected VoIP call input (e.g., a long user name), because the Bluetooth code may consider only traditional phone calls and not VoIP calls.

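Listing 1.3. The vulnerable code that prepares the CLCC response (V5).

Listing 1.4. The vulnerable code that handles call state changes (V6).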

V5: Remote Code Execution Due to Stack Buffer Overflow. Both V5 and V6 suffer from an unexpectedly long user name (or caller number) in a VoIP call. For V5, the vulnerable code is located in the function that prepares the CLCC response, as shown in Listing 1.3. It tries to return the caller number in the CLCC response, but uses only a 513-byte array (dialnum) to store it. A stack buffer overflow thus happens when a caller number longer than 513 bytes is supplied. This vulnerability allows an adversary to overwrite the return address of the ClccResponse function, causing remote code execution. For example, the adversary can launch the exploit using this command: $./uac.sh --user $(python -c 'print "8"*1055').

V6: Remote DoS in Bluetooth Once Receiving a Call. This vulnerability is similar to V5, but it is triggered when the call state changes, i.e., BTHF_CALL_INCOMING in Listing 1.4. Here, too, the developers did not anticipate the long caller number of a VoIP call. Specifically, the return value of the first snprintf statement can exceed the 513 bytes of sizeof(ag_res.str). Because the variable involved is unsigned, a value that should be negative instead becomes a very large positive integer, which eventually triggers the abort check and causes a remote DoS. Compared to the DoS in V4, triggering the DoS in V6 requires a connected Bluetooth device, but the victim only needs to receive, rather than answer, a call.

To patch V5 and V6, Google restricted the length of caller number inputted in the Bluetooth module.
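
The arithmetic behind V6 can be reproduced with a small Python model. The sketch does not reproduce Listing 1.4: it only models two well-known C behaviors, namely that snprintf returns the length the full output would have had even when it truncates, and that subtracting a larger value from a smaller one in unsigned arithmetic wraps around. The 32-bit width of the size type and the function names are assumptions.

```python
from typing import Tuple

BUF_SIZE = 513  # size of the response buffer mentioned in the text


def snprintf_model(buf_size: int, text: str) -> Tuple[str, int]:
    """Model C snprintf: store at most buf_size - 1 characters, but
    return the length that the untruncated output would have had."""
    stored = text[: max(buf_size - 1, 0)]
    return stored, len(text)


def remaining_unsigned(buf_size: int, written: int, bits: int = 32) -> int:
    """Model `buf_size - written` computed in an unsigned type."""
    return (buf_size - written) % (1 << bits)


def fits(buf_size: int, text: str) -> bool:
    """Length check of the kind a fix would add before formatting."""
    return len(text) < buf_size


if __name__ == "__main__":
    caller = "8" * 1055
    stored, ret = snprintf_model(BUF_SIZE, caller)
    print(len(stored), ret, remaining_unsigned(BUF_SIZE, ret))
    print(fits(BUF_SIZE, caller))
```

With a 1055-character caller number, the modeled return value exceeds the buffer size, so the remaining-space computation yields a huge positive number instead of a negative one. Restricting the input length up front, as in fits, removes that possibility.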

5.3 Vulnerabilities Discovered via Code Auditing

In this subsection, we present the vulnerabilities that were discovered specifically by our targeted code auditing. We are able to use protocol specification based auditing to discover these vulnerabilities because their root cause is the inconsistency between VoIP’s specification and Android’s traditional phone call processing.

Listing 1.5. The code vulnerable to path traversal (V7).

V7: Data Leak and Permanent DoS Due to Path Traversal. In this vulnerability, we exploit the inconsistency between SIP URIs and Android/Linux file paths. Specifically, a SIP URI treats “..” and “/” as normal characters, whereas they are special characters in Android’s file naming convention. As a result, a path traversal vulnerability appears in the code shown in Listing 1.5. The directory that contains the serialized “.pobj” SIP profile file is named in this format: “sip_user@server_ip”, e.g., “alice@171.11.160.202”. An attacker can thus misuse these two names to manipulate the path of mProfileDirectory. For example, by physically setting “sip_user” and “server_ip” in the format of Fig. 10(a), mProfileDirectory becomes “ ” and leaks the sensitive SIP profile file to the public SD card. A permanent DoS could also happen if “server_ip” is set to overwrite another system app’s file, e.g., mmssms.db shown in Fig. 10(b). Because of this fake mmssms.db file, the real one cannot be created, which denies all SMS functionality. Only a factory reset can recover the phone.
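
The effect of “..” and “/” on the profile directory can be demonstrated with Python’s posixpath module. The base directory, the attacker-chosen server name, and the function names below are hypothetical; the sketch only shows how an unvalidated “sip_user@server_ip” name can escape its intended parent directory and how a containment check detects this.

```python
import posixpath


def profile_directory(base: str, sip_user: str, server_ip: str) -> str:
    """Build and normalize the profile directory path for an account."""
    return posixpath.normpath(posixpath.join(base, f"{sip_user}@{server_ip}"))


def is_inside(base: str, candidate: str) -> bool:
    """Check that `candidate` stays within `base` after normalization."""
    base = posixpath.normpath(base)
    candidate = posixpath.normpath(candidate)
    return posixpath.commonpath([base, candidate]) == base


if __name__ == "__main__":
    base = "/data/app_home/profiles"  # hypothetical location
    benign = profile_directory(base, "alice", "171.11.160.202")
    crafted = profile_directory(base, "alice", "x/../../../sdcard")
    print(benign, is_inside(base, benign))
    print(crafted, is_inside(base, crafted))
```

The benign account stays under the base directory, whereas the crafted server name resolves to a location outside it. Rejecting names for which is_inside is false, or rejecting “/” and “..” outright, closes this path.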

Fig. 10. Demo screenshots of exploiting the vulnerability V7.

V8: Caller ID Spoofing Due to Mis-parsing “&”. The last two vulnerabilities, V8 and V9, are due to the inconsistency between the SIP URI and the PSTN (Public Switched Telephone Network) number format. Vulnerability V8 is related to the special character “&” in the caller number. For a caller number with “&”, the system dialer app treats the number before “&” as the actual calling number and the number after “&” as the call transfer number, according to PSTN’s convention. However, the dialer does not treat an incoming VoIP call differently and applies the same parsing to a VoIP call number. As a result, an adversary can mimic any phone number by simply adding a “&” character at the end, causing a caller ID spoofing attack. For example, the attacker can mimic the emergency number by setting the SIP name as “911&”, as shown in Fig. 11(a). He can also spoof a contact number of the victim if he knows the number, and the dialer will display the name and profile photo of the spoofed contact, as shown in Fig. 11(b).
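
The mis-parsing can be modeled in a few lines of Python. The sketch is not the dialer’s code: the function names are hypothetical, and it only reproduces the convention described above, in which everything before “&” is taken as the calling number.

```python
def pstn_style_display_number(caller: str) -> str:
    """Apply the PSTN convention: the part before '&' is the caller."""
    number, _, _transfer = caller.partition("&")
    return number


def is_display_ambiguous(sip_user: str) -> bool:
    """Flag SIP user names whose displayed form would differ from the
    name that was actually sent."""
    return pstn_style_display_number(sip_user) != sip_user


if __name__ == "__main__":
    for name in ["911&", "5551234&5559999", "alice"]:
        print(name, pstn_style_display_number(name), is_display_ambiguous(name))
```

A SIP user name of “911&” is displayed as “911” under this convention, which is the spoofing effect shown in Fig. 11(a). A check such as is_display_ambiguous is one way a VoIP-aware dialer could detect the mismatch.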

Fig. 11. Demo screenshots of exploiting the vulnerabilities V8 and V9.

Table 2. Incompatible behaviors between VoIP and PSTN calls.

V9: Caller ID Spoofing Due to “phone-context”. Another inconsistency between the SIP URI and the PSTN number format is the “phone-context” parameter [24], which can be used to specify the prefix of a phone number. For example, in PSTN’s convention, the number “650253000;phone-context=+1” is equivalent to “+1650253000”, where the value of “phone-context” becomes the prefix of the number. However, such a convention should not apply to VoIP calls, a distinction that the dialer app unfortunately ignores. As a result, an adversary can intentionally set the caller number as “650253000;phone-context=+1”, and the dialer app will interpret it as “+1650253000” and display it as a call from Google, as shown in Fig. 11(c). Note that the mapping from “+1650253000” to Google is performed automatically by Android’s CallerID mechanism [25], which in the normal scenario tries to identify well-known phone numbers and mark spam numbers. Here, however, it makes the attack more severe.
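
The prefixing behavior can likewise be modeled in Python. The function name pstn_style_resolve is hypothetical, and the sketch implements only the convention described in this paragraph (a “phone-context” value starting with “+” is prepended to the local number), not the full parameter syntax.

```python
def pstn_style_resolve(number: str) -> str:
    """Resolve a number the way the PSTN convention in the text does:
    a global 'phone-context' value becomes the prefix of the number."""
    local, *params = number.split(";")
    for param in params:
        key, _, value = param.partition("=")
        if key == "phone-context" and value.startswith("+"):
            return value + local
    return local


if __name__ == "__main__":
    print(pstn_style_resolve("650253000;phone-context=+1"))
    print(pstn_style_resolve("650253000"))
```

Applied to a SIP user name, this resolution turns an arbitrary attacker-chosen string into a well-formed global number, which a caller ID lookup can then match against a known entity.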

6 A New Root Cause

Besides the vulnerability-level cause analysis in Sect. 5, we try to uncover the root causes underneath those vulnerabilities. Among the nine vulnerabilities we discovered, three have previously known root causes, i.e., no protection of exported components in V1 [13, 15], no checking of system APIs in V2 [26, 27], and missing error handling in V4 [28]. For the remaining six vulnerabilities, we identify a new root cause that is specific to Android VoIP and was not known before.

We call this root cause “incompatible processing between VoIP and PSTN calls”. Specifically, since both VoIP calls and traditional PSTN calls are handled by the Android telephony system, there exist some incompatible processing behaviors between VoIP and PSTN calls. Such incompatibility is the root cause of the six VoIP vulnerabilities summarized in Table 2. For example, regarding phone number length, a VoIP SIP name can use more than 513 bytes, whereas a traditional PSTN phone number uses fewer than 513 bytes. Other examples are the special strings “../”, “&”, and “phone-context”, which can be treated as part of the URI in VoIP SIP but originally have special meanings in the Linux and PSTN specifications, causing incorrect processing in the Android VoIP code. Understanding these incompatible behaviors and other potential incompatibilities between VoIP and PSTN calls can help us further improve Android VoIP security. We thus call for VoIP developers’ extra attention in their future design and implementation.

7 Related Work

In this section, we present the closely related research on VoIP security, protocol fuzzing, and Android dynamic testing.

VoIP Security. For more than ten years, researchers [29,30,31,32,33] have explored the general security issues of VoIP, e.g., denial of service, eavesdropping, and call hijacking. In particular, the VOIPSA organization gave a clear taxonomy [34] of VoIP’s threats. Recently, with the high popularity of Android phones and mobile networks, researchers started to investigate the security of VoIP apps and network infrastructure in the real world. They have identified the privacy risks in some VoIP apps [7, 8] and infrastructure vulnerabilities in several mobile carriers [4,5,6]. In particular, both Li et al. [4] and Kim et al. [5] identified a number of serious vulnerabilities in mobile carriers’ VoLTE networks, including free data, caller spoofing, over-billing, and denial-of-service. Compared with all these works, we are the first to systematically study the security of the system-level VoIP implementation on Android, with 8 zero-day vulnerabilities identified and confirmed by Google.

Protocol Fuzzing. Our network-side fuzzing in Sect. 4.2 belongs to the general category of network protocol fuzzing. In the classic book Fuzzing: Brute Force Vulnerability Discovery [35], the authors explained network protocol fuzzing on both Windows and Unix. Regarding stateful network protocol fuzzing, SNOOZE [36] and Prospex [37] are two pioneering systems. AutoFuzz [38] is an open-source network protocol fuzzing framework. There are also some fuzzers specific to certain protocols, such as the OPC protocol [39] and TLS libraries [40, 41]. Moreover, KiF [42] is a dedicated SIP fuzzer that was released in 2007, but unfortunately, it does not apply to Android phones. Very recently, Pham et al. proposed AFLNet [43], a greybox fuzzer based on the popular AFL (American Fuzzy Lop) to specifically fuzz network protocol implementations. In this paper, our network-side fuzzing tool is the first Android VoIP fuzzer for SIP, SDP, and RTP fuzzing.

Android Dynamic Testing. Our on-device fuzzing in Sect. 4.1 is related to general Android dynamic testing [44,45,46,47,48]. For example, SMV-Hunter [44] and FileCross [45] leveraged Android adb commands to dynamically test Android apps’ security vulnerabilities. AppIntent [46] further instrumented the Android operating system for effective dynamic testing of Android apps. Two crowdsourcing apps, UpDroid [47] and NetMon [48], were recently proposed to leverage crowds’ user interaction for dynamic app tests in the wild. Besides general Android dynamic testing, the closest work to our Intent fuzzing is IntentFuzzer [49], which also leveraged Drozer for Intent fuzzing. The difference is that our fuzzing targets VoIP components, instead of the permission-protected components in IntentFuzzer [49]. Additionally, buzzer (Binder Fuzzer) [50] analyzed input validation vulnerabilities associated with Android system services, which is similar to our system API fuzzing except that we use Java reflection to effectively identify service interfaces and their parameters. Furthermore, our on-device fuzzing is a unified framework that performs both Intent and system API fuzzing.

8 Conclusion

In this paper, we conducted the first study to systematically investigate the (in)security of Android’s VoIP integration at the system level. We began with a demystification of Android VoIP’s protocol stack and all its four attack surfaces. We then proposed a novel vulnerability assessment approach that first employs on-device Intent/API fuzzing and network-side packet fuzzing to automatically test Android VoIP components, and further combines them with targeted code auditing to eventually determine vulnerabilities. By periodically fuzzing VoIP components on the recent Android OS from version 7.0 to 9.0 over two years, we discovered a total of nine zero-day vulnerabilities, two-thirds of which can be exploited by a network-side adversary. These vulnerabilities caused serious security consequences, including denying voice calls, caller ID spoofing, unauthorized call operations, and remote code execution. Finally, we uncovered a new root cause, incompatible processing between VoIP and PSTN calls, that leads to six VoIP vulnerabilities and requires developers’ extra attention in their future design and implementation.