On the Feasibility of Launching the Man-In-The-Middle Attacks on VoIP from Remote Attackers

Ruishan Zhang†, Xinyuan Wang†, Ryan Farley†, Xiaohui Yang†, Xuxian Jiang‡

† Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
{rzhang3, xwangc, rfarley3, xyang3}@gmu.edu
‡ Department of Computer Science, N.C. State University, Raleigh, NC 27606, USA
[email protected]

ABSTRACT
The man-in-the-middle (MITM) attack has been shown to be one of the most serious threats to the security and trust of existing VoIP protocols and systems. For example, a MITM who is in the VoIP signaling and/or media path can easily wiretap, divert and even hijack selected VoIP calls by tampering with the VoIP signaling and/or media traffic. Since all previously identified MITM attacks on VoIP require the adversary to be initially in the VoIP signaling and/or media path, there is a common belief that it is infeasible for a remote attacker, who is not initially in the VoIP path, to launch any MITM attack on VoIP. This makes people think that securing all the nodes along the normal path of VoIP traffic is sufficient to prevent MITM attacks on VoIP.

In this paper, we demonstrate that a remote attacker who is not initially in the path of VoIP traffic can indeed launch all kinds of MITM attacks on VoIP by exploiting DNS and VoIP implementation vulnerabilities. Our case study of Vonage VoIP, the No. 1 residential VoIP service in the U.S. market, shows that a remote attacker from anywhere on the Internet can stealthily become a remote MITM through a DNS spoofing attack on a Vonage phone, as long as the remote attacker knows the phone number and the IP address of the Vonage phone. We further show that the remote attacker can effectively wiretap and hijack targeted Vonage VoIP calls after becoming the remote MITM.
Our results demonstrate that (1) the MITM attack on VoIP is much more realistic than previously thought; (2) securing all nodes along the path of VoIP traffic is not adequate to prevent MITM attacks on VoIP; (3) vulnerabilities of non-VoIP-specific protocols (e.g., DNS) can indeed lead to the compromise of VoIP.

Keywords
VoIP Security, SIP, MITM Attacks, DNS Spoofing

Categories and Subject Descriptors
C.2.0 [Computer-Communication Networks]: General - Security and protection (e.g., firewalls); C.2.3 [Computer-Communication Networks]: Network Operations - Network monitoring

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ASIACCS'09, March 10–12, 2009, Sydney, NSW, Australia.
Copyright 2009 ACM 978-1-60558-394-5/09/03 ...$5.00.

1. INTRODUCTION
VoIP has experienced explosive growth in the past few years, and it is becoming an indispensable part of more and more people's daily lives. An IDC report [4] predicted that the number of U.S. residential VoIP subscribers will reach 44 million by 2010. In addition, VoIP has been widely used for carrying mission-critical 911 calls. The Federal Communications Commission (FCC) estimated [2] that there were about 3.5 million residential VoIP 911 calls in 2006. Therefore, failures in providing reliable and trustworthy VoIP services not only disrupt the normal operation of our society but may also cost people's lives under certain circumstances.

VoIP is built upon the interaction of a number of application protocols on the Internet. The open architecture of the Internet, however, makes VoIP protocols subject to more attacks than are possible in the PSTN (public switched telephone network).
The signaling protocol and the media transport protocol are two integral components of any VoIP system. Currently, the Session Initiation Protocol (SIP) [20] and the Real Time Transport Protocol (RTP) [22] are the dominant VoIP signaling protocol and media transport protocol, respectively. In fact, most deployed VoIP services (e.g., Vonage, AT&T, Gizmo and Wengophone) use SIP and RTP. In addition, all existing VoIP systems depend on DNS to function normally. Therefore, any vulnerabilities in SIP, RTP or DNS could lead to the compromise of VoIP security and trustworthiness.

Previous research [20, 12, 1, 26, 24, 15] has shown that a man-in-the-middle (MITM) who is in the path of VoIP traffic is able to wiretap, divert and even hijack selected VoIP calls by tampering with the VoIP signaling and/or media traffic. Such MITM attacks on VoIP could cause serious consequences for the targeted VoIP users. For example, VoIP wiretapping enables attackers to collect sensitive information (e.g., credit card number, bank account number, PIN) from the victim VoIP users. Unauthorized VoIP call diversion and voice pharming [24] could trick even the most meticulous VoIP callers into talking with a bogus bank teller or interacting with bogus interactive voice response (IVR) systems. All these MITM attacks on VoIP could cause identity theft and financial loss to the victim VoIP users.

Since all previously identified MITM attacks on VoIP require the adversary to be initially in the VoIP signaling and/or media path, there is a common belief that it is infeasible for a remote attacker, who is not initially in the VoIP path, to launch any MITM attack on VoIP. As a result, many people do not believe the MITM attack is a realistic threat to current VoIP protocols and systems, and they think that securing all the nodes along the normal path of VoIP traffic is sufficient to prevent MITM attacks on VoIP.

Figure 1: An Example of Message Flow of SIP Authentication

Figure 2: Unauthorized Call Redirection via MITM

In this paper, we investigate the feasibility for a remote attacker, who is not initially in the path of VoIP traffic, to become the MITM. Our case study of the Vonage VoIP service, which is the No. 1 residential VoIP service in the U.S. [9], shows that a remote attacker from anywhere on the Internet can, by exploiting the vulnerabilities of DNS and SIP message handling in the Vonage phone, stealthily become the remote MITM and launch all kinds of MITM attacks on target VoIP phones. Specifically, we find that:
- the remote attacker can crash and reboot the targeted Vonage SIP phone by sending it crafted, malformed SIP INVITE messages. This will cause the rebooted Vonage SIP phone to send out a DNS query about the location of the SIP server to contact.
- the remote attacker can trick the Vonage SIP phone into taking any IP address as that of the Vonage SIP server via spoofed DNS responses.
- the remote attacker can cause all the calls to or from the targeted Vonage phone to pass through him.
  This makes the remote attacker a MITM and enables him to wiretap and hijack any calls to or from the targeted Vonage phone.

Note that the identified remote MITM attack on VoIP only requires knowledge of the phone number and the IP address of the targeted Vonage phone, and it works even if the targeted Vonage phone is behind NAT.

Our results demonstrate that (1) the MITM attack on VoIP is much more realistic than previously thought; (2) securing all nodes along the path of VoIP traffic is not adequate to prevent MITM attacks on VoIP; (3) vulnerabilities of non-VoIP-specific protocols (e.g., DNS) can indeed lead to the compromise of VoIP.

The rest of this paper is organized as follows. Section 2 gives a brief overview of SIP and the MITM attack. Section 3 describes our investigation approach. Section 4 presents our case study and demonstrates the DNS spoofing, wiretapping and call hijacking attacks on a Vonage SIP phone. Section 5 discusses potential mitigation strategies. Section 6 reviews related work. Finally, Section 7 concludes the paper.

2. OVERVIEW OF SIP AND THE MITM ATTACK
SIP is an HTTP-like, application-layer signaling protocol used to create, modify, and terminate multimedia sessions (e.g., VoIP calls) among Internet endpoints. The SIP specification defines the following components: user agents (UA), proxy servers, redirect servers, registrar servers, and location servers. A UA represents an endpoint of the communication (i.e., a SIP phone). The proxy server is the intermediate server that forwards SIP messages from UAs to their destinations. The various SIP servers described above are logical functions. In most deployed systems, generic SIP servers perform the functions of both registrar servers and proxy servers.

The SIP specification [20] recommends using TLS or IPSec to protect SIP signaling messages, and using S/MIME to protect the integrity and confidentiality of SIP message bodies.
However, most deployed SIP VoIP systems (e.g., Vonage, AT&T CallVantage) only use SIP authentication to protect their signaling messages.

SIP authentication is similar to digest-based HTTP authentication. Figure 1 depicts the typical SIP authentication of call registration, call setup and call termination. When a SIP server (e.g., proxy, registrar) receives a SIP request (e.g., REGISTER, INVITE, BYE) from a SIP phone, the SIP server challenges the SIP phone with either a 401 Unauthorized or a 407 Proxy-Authentication Required message. Upon receiving the 401 or 407 message, the SIP phone calculates a hash value by applying a specific digest algorithm (e.g., MD5) to the following fields: the request-URI, the username, the password shared between the phone and the SIP server, the realm, and the nonce. Then the SIP phone sends the hash value along with the original SIP request as the authentication credential.

However, existing SIP authentication only covers selected fields of a few SIP messages from a SIP phone to a SIP server. This leaves other SIP messages and fields unprotected. By exploiting the vulnerabilities of SIP and RTP, a MITM who is in the path of VoIP traffic can:
- detour any chosen call via anywhere on the Internet [24]. This would allow the attacker to wiretap selected VoIP calls and collect sensitive information (e.g., account number, PIN) from the victim.
- redirect any selected VoIP call to any third party, and manipulate and set the call forwarding setting of any selected Gizmo VoIP subscriber without authorization [24]. This would allow the attacker to hijack VoIP calls to financial institutions and pretend to be a bank representative.
- launch billing attacks [26] on selected VoIP users such that the victim VoIP users will either be overcharged for their VoIP calls or charged for calls they did not make.
- disrupt any chosen VoIP call by sending a BUSY or BYE message.

Figure 2 illustrates the message flow of the unauthorized call redirection attack by the MITM. All existing MITM attacks require the attacker to be initially in the VoIP signaling and/or media path. This somewhat limiting requirement makes many people believe that the MITM attack on VoIP is not realistic. In the following sections, we investigate how a remote attacker, who is not initially in the VoIP path, can become the remote MITM and launch all kinds of MITM attacks on targeted VoIP users.

3. INVESTIGATION APPROACH
To investigate the feasibility for the remote attacker to become the MITM of VoIP traffic, we assume the role of an active adversary who seeks to trick the targeted VoIP phone into passing all its VoIP traffic through him by exploiting the vulnerabilities of the SIP phone and all protocols it uses. We choose to experiment with Vonage VoIP, which is the most popular residential VoIP service in the U.S. market.

Our investigation is divided into two steps. First, we passively observe the network traffic between our Vonage SIP phone and its VoIP servers to spot potential weaknesses. Second, we use fuzz testing to confirm the weaknesses found by passive observation or to identify new possible flaws. Note that we treat the VoIP phone as a whole, and look for all the vulnerabilities from the embedded operating system and the upper-layer applications.
When observing the network traffic, we use Wireshark [11] to view the parsed protocols.

By observing the network traffic, we found a weakness of the Vonage phone in handling DNS. A Vonage SIP phone obtains the SIP servers' IP addresses via DNS queries [18]. Given that DNS runs over connectionless UDP, the remote attacker can forge and inject DNS response packets to the SIP phone. Whether the victim accepts the forged DNS response depends on whether the following conditions are satisfied:
- The destination IP address and the destination port number of the forged DNS response packet are the source IP address and the source port number of the DNS query packet.
- The source IP address and the source port number of the forged DNS response packet are the destination IP address and the destination port number of the DNS query packet.
- The ID field of the forged DNS response packet matches that of the DNS query packet.
- The question section of the forged DNS response packet matches the question section of one of the DNS query packets sent.

Since both the ID and the port number are 16 bits, the whole brute-force search space for a matching DNS response should be 2^32 in theory. In practice, however, if a DNS query uses predictable IDs and/or a limited port range, the brute-force search space can be greatly reduced. One key finding of our research is that the Vonage SIP phone uses a static ID and a small range of port numbers, 45000-46100, which reduces the brute-force search space to merely 1100.

In order to trick the targeted SIP phone into accepting the spoofed DNS response, the remote attacker needs to trigger a DNS query from the targeted SIP phone. We have observed that the SIP phone sends a DNS query each time it restarts.
Therefore, if the remote attacker can somehow cause the target SIP phone to reboot, he can reach this goal. After extensive fuzz testing, we identified a program flaw in handling a malformed INVITE message, which allows the remote attacker to remotely crash and reboot the Vonage SIP phone, thus triggering a DNS query.

Utilizing the above vulnerabilities and techniques, a remote attacker is able to inject fake DNS responses to the Vonage SIP phone and trick it into thinking that the remote attacker is the Vonage SIP server. By replacing the source IP address of the REGISTER message from the Vonage SIP phone with his own IP address, the remote attacker can trick the Vonage server into thinking he is the Vonage SIP phone. As a result, the remote attacker becomes a MITM on the path between the SIP phone and its SIP servers.

Our implementation of the remote attacks consists of approximately 6000 lines of C code. Logically, it consists of three parts: (1) the remote MITM module, which lets any remote attacker become the remote MITM by crashing the targeted SIP phone and injecting the spoofed DNS responses; (2) the remote wiretapping module, which allows the remote MITM to wiretap selected VoIP calls; (3) the remote call hijacking module, which allows the remote MITM to hijack selected VoIP calls.

4. CASE STUDY
In this section, we describe our case study of the Vonage VoIP service, which is the No. 1 U.S. residential VoIP service with more than 2.5 million subscribers. Note that all our exploit experiments have been conducted against our own phones and account.

We demonstrate how a remote attacker becomes a MITM by launching a DNS spoofing attack on a Vonage SIP phone. First we describe our testbed setup and the message flow of the normal startup or reboot of the Vonage SIP phone. Then we present the identified DNS implementation weaknesses of the Vonage phone and its vulnerability in handling the malformed INVITE message. Next we illustrate the message flow of the DNS spoofing attack and describe our experimental results.
Finally, after becoming a MITM, we present the remote wiretapping and remote call hijacking attacks on VoIP.

4.1 Network Setup
Figure 3 illustrates the network setup of our testbed. The remote attacker runs Red Hat Linux on a Dell D610 laptop computer. NAT router 1 is a FreeBSD machine running on a virtual machine and NAT router 2 is a Linksys router.

Figure 3: Testbed Setup. (a) SIP phone directly connected to the Internet; (b) SIP phone behind NATs

Figure 3(a) illustrates the network setup where the SIP phone is directly connected to the Internet. We use SIP/RTP server(s) to denote the SIP server and the RTP server, which handle the signaling messages and the RTP stream respectively. The remote attacker could be anywhere on the Internet. In our experiment, we use a wiretap device to capture live network traffic to and from the SIP phone. The wiretap device and the SIP phone connect to a four-port 10BASE-T Ethernet hub.

Figure 3(b) illustrates the network setup where the SIP phone is behind NATs. Note that this setup is different from the most popular setting, where the SIP phone is behind only one NAT router. We notice that the SIP phone sends destination unreachable ICMP packets to the Vonage DNS server when it receives spoofed DNS responses with unmatched port numbers. We use NAT router 2 to block this unwanted traffic from reaching the Vonage DNS server. As a result, the SIP phone is behind two NAT routers. For convenience, we placed the remote attacker outside NAT router 2 but inside the private network of NAT router 1. From the remote attacker's perspective, the targeted SIP phone is behind one NAT router, which is the most likely configuration for residential VoIP phones. In this configuration, the wiretap device and NAT router 2 connect to a four-port 10BASE-T Ethernet hub. We notice that none of the NAT routers changes the source port number of passing packets. This enables the remote attacker to become the remote MITM via the identified exploit even if the targeted Vonage phone is behind two levels of NAT routers.

4.2 Message Flow of Normal Startup or Reboot

Figure 4: Message Flow of Normal Startup or Reboot

Figure 4 depicts the message flow of the normal startup or reboot of a Vonage phone. At the beginning, the SIP phone sends a DNS query to the Vonage DNS server to ask for the SIP servers' IP addresses in step (1). All DNS queries from the Vonage SIP phone go to the Vonage DNS server at a single IP address. Then in step (2), the Vonage DNS server replies with a DNS response packet containing four IP addresses of Vonage SIP servers. At step (3), the Vonage phone sends a SIP REGISTER message to one of the four SIP servers. Then in step (4), the SIP server challenges the SIP phone with a 401 Unauthorized message. After receiving the 401 response, the SIP phone sends the SIP server a new SIP REGISTER message containing credentials. Note that the "expires" field in the SIP REGISTER message specifies the duration for which this registration will be valid. So the SIP phone needs to refresh its registration from time to time.

4.3 Vulnerabilities of Vonage SIP Phone

4.3.1 Weaknesses in the Implementation of DNS Query and Response
The implementation of DNS query/response in the Vonage phone has several weaknesses:
- The SIP phone always uses a static ID value, 0x0001, in all DNS queries.
- The source port number range of DNS queries is limited to 45000-46100.
- The question sections of all DNS queries are identical, and contain the same 11-byte string.
- The SIP phone does not check the source IP address of a DNS response. Even if the source IP address is not that of the Vonage DNS server, the Vonage phone still accepts a spoofed DNS response.

Due to these vulnerabilities, the brute-force search space for forging a matching DNS response is no more than 1100.
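The acceptance conditions of Section 3 and the weaknesses listed above can be summarized in a small model. The following is a minimal sketch, not the phone's actual logic: the class and function names are hypothetical, and it assumes that the phone still checks the source port of a response, which the observations above do not state. The figure of 1100 ports is taken from the text rather than computed from the port range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnsPacket:
    """Only the fields that matter for the acceptance conditions."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    txn_id: int       # 16-bit DNS ID field
    question: bytes   # raw question section


def strict_accept(query, resp):
    """All four conditions from Section 3, source address included."""
    return (
        resp.dst_ip == query.src_ip and resp.dst_port == query.src_port
        and resp.src_ip == query.dst_ip and resp.src_port == query.dst_port
        and resp.txn_id == query.txn_id
        and resp.question == query.question
    )


def observed_accept(query, resp):
    """Model of the observed phone: the source IP address is ignored.

    Keeping the source-port check is an assumption of this sketch.
    """
    return (
        resp.dst_ip == query.src_ip and resp.dst_port == query.src_port
        and resp.src_port == query.dst_port
        and resp.txn_id == query.txn_id
        and resp.question == query.question
    )


def search_space(id_candidates, port_candidates):
    """Number of (ID, port) pairs a blind spoofer would have to cover."""
    return id_candidates * port_candidates


FULL_SPACE = search_space(2 ** 16, 2 ** 16)  # random 16-bit ID and port
PHONE_SPACE = search_space(1, 1100)          # static ID, 1100 ports (from the text)
```

Under this model, a response that differs from the legitimate one only in its source IP address is rejected by `strict_accept` but accepted by `observed_accept`, and the search space shrinks from 2^32 to 1100 candidates.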

4.3.2 Vulnerability in Handling Malformed INVITE Messages
We have found that our Vonage SIP phone fails to handle a malformed INVITE message correctly: it reboots when it receives a malformed INVITE message with an overlong phone number in the From field. This allows the remote attacker to crash and reboot the targeted Vonage phone by sending it one malformed INVITE message. To launch such an attack, the remote attacker needs to spoof the source IP address as that of one of the Vonage SIP servers. Otherwise, the Vonage phone will discard the INVITE message. Our experiments have shown that the Vonage phone does not ring but replies with a Trying message after receiving the malformed INVITE message. Then the phone crashes and reboots almost immediately. After a few seconds (e.g., 13 seconds), the Vonage phone sends a DNS query to the Vonage DNS server. Note that the SIP phone crash attack is stealthy in that the SIP phone does not ring at all when it receives the malformed INVITE message.

4.4 DNS Spoofing Attack

4.4.1 Message Flow

Figure 5: Message Flow of DNS Spoofing Attack

Figure 6: Timeline of a Round of Attack

Figure 5 shows the SIP message flow of the DNS spoofing attack on the Vonage SIP phone. At the beginning, the remote attacker sends a malformed INVITE message to the SIP phone with a spoofed source IP address in step (1). In response, the SIP phone sends a Trying message to the real SIP server in step (2). Then the SIP phone crashes and reboots. Several seconds later, the SIP phone sends a DNS query to the Vonage DNS server asking for the SIP servers' IP addresses in step (3).
Within several milliseconds, the legitimate DNS response from the Vonage DNS server reaches the SIP phone in step (6).

If the remote attacker sends the spoofed DNS response packets to the Vonage phone within the time window from step (3) to (6), the Vonage phone will receive the spoofed DNS response before the legitimate DNS response arrives. This process is represented at step (4). Since the remote attacker does not have access to the original DNS query from the Vonage phone, he has to try each of the 1100 possible port numbers in the spoofed DNS response packets. If a spoofed DNS response packet contains the wrong port number, the Vonage phone sends a port unreachable ICMP packet to the DNS server at step (5). If a spoofed DNS response packet contains the matching port number, the Vonage phone accepts it and sends out a REGISTER message to the remote attacker at step (7), as it now thinks the remote attacker is the Vonage SIP server. Therefore, the remote attacker can determine the success of the DNS spoofing by checking whether he receives the expected REGISTER from the targeted Vonage phone within a predefined period of time.

If the remote attacker does not receive the expected REGISTER from the targeted Vonage phone within the predefined period of time, he knows that the Vonage phone has accepted the authentic DNS response from the Vonage DNS server. The remote attacker then needs to start a new round of attack by repeating steps (1-6) until he receives a REGISTER message from the SIP phone in step (7). We define steps (1) to (6) as a round of the attack. Normally it takes several rounds before the SIP phone finally sends the REGISTER message to the remote attacker.

After receiving the REGISTER message at step (7) or (11), the remote attacker forwards it to the real SIP server in step (8) or (12).
Meanwhile, the remote attacker forwards the 401 Unauthorized message at step (9) and the 200 OK message at step (13) from the SIP server to the SIP phone in steps (10) and (14). Now the remote attacker becomes the MITM in that (1) the SIP phone thinks the remote attacker is the SIP server; and (2) the SIP server thinks the remote attacker is the SIP phone.

To launch the DNS spoofing attack, the remote attacker only needs to construct 1000 fake DNS response packets with 1000 different destination port numbers. Specifically, the remote attacker just needs to:
- Fill 0x0001 into the ID field of all spoofed DNS responses.
- Fill the phone's fixed question string into the question section of all spoofed DNS responses.
- Fill the IP address of the remote attacker into the answer section of all spoofed DNS responses.
- Set the destination port number of the 1st, 2nd, ..., 1000th packet to 45000, 45001, ..., 45999.
- Set the source IP address. The SIP phone does not check the source IP address, so we set it to the IP address of the remote attacker when the victim phone is on the Internet. When the phone is behind NATs, the source IP address of spoofed DNS packets is set to that of the Vonage SIP server to pass through NAT router 2.

Figure 6 illustrates the timeline of a round of the attack. T0 is the time when the remote attacker sends a malformed INVITE. T2 and T3 are the times when the SIP phone sends a DNS query and receives the legitimate response from the DNS server, respectively. We refer to the time interval from T2 to T3 as the Vulnerable Window (VW). T1 and T4 denote the start time and end time, respectively, of sending spoofed DNS response packets. We refer to the time interval from T1 to T4 as the Attack Window (AW). Clearly, the larger the attack window is, the fewer rounds the remote attacker needs in order to succeed.

Our experiments show that the Vonage phone actually accepts a spoofed DNS response before it sends out the DNS query. In addition, if the remote attacker keeps sending many spoofed DNS response packets with a very short inter-packet arrival time, he has a good chance of blocking the targeted SIP phone from receiving the authentic DNS response. Therefore, the attack window can start earlier and end later than the vulnerable window.

Figure 7: Message Flow of Wiretapping Calls Between a SIP Phone and a PSTN Phone by the Remote Attacker. (a) PSTN Phone Calls SIP Phone; (b) SIP Phone Calls PSTN Phone

Table 1: Measured Time Interval from INVITE to DNS Query without Spoofed DNS (10 runs)

Run       1     2     3     4     5     6     7     8     9     10    Range      Average
Seconds   14.9  13.8  13.0  18.8  14.6  12.9  15.5  12.8  15.5  14.1  12.9-15.5  14.9

4.4.2 Experimental Results and Analysis
Ideally we want T1 to be earlier, but not too much earlier, than T2. We have measured the time interval from the moment the remote attacker sends the malformed INVITE message to the moment the crashed and rebooted SIP phone sends the first DNS query. Table 1 shows the measured time intervals for 10 runs of crashing the SIP phone. It shows that it takes 12.9 to 15.5 seconds for the SIP phone to send the first DNS query after receiving the malformed INVITE packet. Therefore, we set T1 at 12 seconds after T0. We have set the transmission rate of the spoofed DNS response packets at 1000 pkt/s.
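As a rough check of the choice of T1, the sketch below takes the intervals exactly as printed in Table 1 and tests whether an attack window that opens 12 seconds after T0 precedes every measured DNS query. The function and variable names are illustrative only.

```python
# Intervals in seconds from the malformed INVITE (T0) to the first DNS
# query (T2), copied as printed in Table 1.
intervals = [14.9, 13.8, 13.0, 18.8, 14.6, 12.9, 15.5, 12.8, 15.5, 14.1]

T1_OFFSET = 12.0  # attack window start, seconds after T0 (value from the text)


def window_opens_before_all(t1_offset, measured):
    """Return True if the attack window opens before every measured query."""
    return all(t1_offset < t for t in measured)


def smallest_lead(t1_offset, measured):
    """Shortest gap, in seconds, between window start and a measured query."""
    return min(measured) - t1_offset


if __name__ == "__main__":
    print(window_opens_before_all(T1_OFFSET, intervals))
    print(round(smallest_lead(T1_OFFSET, intervals), 1))
```

A later start, such as 13 seconds after T0, would miss the runs in which the query was sent before that point, so the window would open after the DNS query in those rounds.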
To maximize the chance of hitting the correct port number while keeping the duration of each round short, we set the duration of the attack window to 8 seconds. Therefore, T4 is 20 seconds after T0. In each round, the remote attacker sends the 1000 spoofed DNS response packets at most 8 times, and the duration of one round of attack is 20 seconds. As shown in Table 2, the average number of rounds and the average required time over 10 instances of the DNS spoofing attack against the SIP phone on the Internet are 39.8 and 789 seconds (about 13 minutes), respectively.

When the SIP phone is behind NATs, the attack is similar except that the source IP address of the fake DNS responses should be spoofed as that of the Vonage DNS server to pass through NAT router 2. The result of one test showed that the number of rounds is 8, and the required time is 169 seconds. Our preliminary investigation shows that the port numbers of DNS queries are all in the range 45000-45999, so the range 45000-45999 is used.

The packet size of a spoofed DNS response is 87 bytes, including 14 bytes of Ethernet header, 20 bytes of IP header, 8 bytes of UDP header and 45 bytes of UDP payload. Given that the spoofed DNS packets are transmitted at 1000 pkt/s, the transmission rate is about 700 kbps. Since most household broadband Internet access has a downstream rate of at least 2 Mbps, our DNS spoofing is practically applicable to household broadband VoIP.

4.5 Wiretapping and Call Hijacking
After becoming a MITM, the remote attacker is able, at least in theory, to launch all kinds of MITM attacks. In this subsection, we demonstrate two representative MITM attacks from the remote attacker: call wiretapping and call hijacking.

4.5.1 Wiretapping Incoming Call Remotely
Figure 7(a) shows the message flow of wiretapping the incoming calls to the Vonage phone by the remote attacker.

Table 2: Number of Rounds and Time Needed to Become the Remote MITM (10 instances)

Instance     1    2    3    4    5    6    7    8     9     10    Sum   Average
Rounds       7    9    11   15   22   28   41   54    105   106   398   39.8
Time (sec)   135  175  213  296  437  556  800  1080  2081  2117  7890  789

At the beginning, the SIP server sends an INVITE message to the remote attacker at step (1). The remote attacker modifies the IP address and port number for the upcoming RTP stream in the INVITE message so that the upcoming RTP stream from the SIP phone will go to the remote attacker's IP address and port number 12345. Then the remote attacker sends the modified INVITE message to the SIP phone at step (2). At steps (3-6), the remote attacker forwards the Trying and Ringing messages from the SIP phone to the SIP server. After the callee picks up the phone, the SIP phone sends a 200 OK message at step (7) to the remote attacker. Similar to step (2), the remote attacker sets his own IP address and port number (e.g., 12345) as the RTP stream termination point, and then sends the modified 200 OK to the SIP server at step (8). At steps (9-10), the remote attacker forwards the ACK message from the SIP server to the SIP phone. At this point, the three-way handshake for the VoIP call setup is completed. Then at step (11), the remote attacker wiretaps the RTP streams between the SIP phone and the RTP server as the remote MITM.

4.5.2 Wiretapping Outgoing Call Remotely
Figure 7(b) illustrates the message flow of wiretapping the outgoing calls from the Vonage phone by the remote attacker.

At the beginning, the SIP phone sends an INVITE message to the remote attacker at step (1). Then the remote attacker modifies the IP address and port number for the upcoming RTP stream and sends the modified INVITE message to the SIP server at step (2). At steps (3-4), the remote attacker forwards the 407 Proxy-Authentication Required message to the SIP phone. At steps (5-6), the remote attacker forwards the ACK message for the 407 Proxy-Authentication Required message to the SIP server.
At step (7), the SIP phone sends a new INVITE message with the require
