SIP Dial-Peer Redundancy or Fail-over did not work as expected

Filed under: Gateways and Trunks, Tales — 4 Comments

June 16, 2013

* Hint: Please note that some of the outputs in this post have been recreated in a lab.

On this particular issue, my task was to investigate why all external incoming calls failed to connect whenever the primary call-routing call-manager server was unreachable, even though the cluster contained multiple call-managers. Other call-managers in the cluster could have handled the calls coming from the PSTN or service provider, yet the calls always failed just because one server was unreachable. Here is a quick diagram of the topology.

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Internal phone <--- Call Manager <---(SIP Trunk)--- 2811 Voice Gateway <---(ISDN Trunk)--- Telco or Service provider

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

As the voice gateway is the first point of entry into the network, I started my investigation from there. Here are the related configurations that I found on the voice-gateway.

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

dial-peer voice 1 pots

description incoming dial-peer for pots leg
incoming called-number .
direct-inward-dial

:::::::::::::::::::::::::::::::::::::::::

dial-peer voice 5 voip
description primary dial-peer to cluster
destination-pattern ^1...$
session protocol sipv2
session target ipv4:192.168.0.99
incoming called-number .
dtmf-relay sip-kpml

:::::::::::::::::::::::::::::::::::::::::

dial-peer voice 3 voip
preference 1
description secondary dial-peer to cluster
destination-pattern ^1...$
session protocol sipv2
session target ipv4:192.168.0.55
incoming called-number .
dtmf-relay sip-kpml

:::::::::::::::::::::::::::::::::::::::::

sip-ua
timers trying 1000

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Based on the above configuration, I could tell what the intention was: a call coming into the gateway from the PSTN would be routed by dial-peer 5 to the call-manager server at IP address 192.168.0.99, and if that server did not respond, the call would be re-routed by dial-peer 3, which points to another server in the cluster. Dial-peer 5 has no preference configured (so it uses the default of 0), while dial-peer 3 has preference 1, which makes dial-peer 5 the first choice.
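The intended selection order can be sketched in a few lines of Python. This is only an illustration, not how IOS is implemented: the `DialPeer` class and `hunt_order` function are hypothetical names, the destination-pattern is assumed to be the four-digit wildcard `^1...$` and is treated as an ordinary regular expression, and the sketch ignores the longest-match and random tie-break steps of the default hunt scheme because both dial-peers here carry the same pattern.

```python
import re
from dataclasses import dataclass


@dataclass
class DialPeer:
    tag: int
    pattern: str          # destination-pattern, treated here as a regular expression
    target: str           # address from the session target line
    preference: int = 0   # value used when no "preference" line is configured


def hunt_order(peers, called_number):
    """Return the matching dial-peers, lowest preference value first."""
    matches = [p for p in peers if re.fullmatch(p.pattern, called_number)]
    return sorted(matches, key=lambda p: p.preference)


# The two VoIP dial-peers from the gateway configuration above.
peers = [
    DialPeer(tag=3, pattern=r"^1...$", target="192.168.0.55", preference=1),
    DialPeer(tag=5, pattern=r"^1...$", target="192.168.0.99"),
]

for peer in hunt_order(peers, "1007"):
    print(f"dial-peer {peer.tag} -> {peer.target} (preference {peer.preference})")
```

For the called number 1007 this lists dial-peer 5 first and dial-peer 3 second, which is the order the configuration was meant to produce.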

However, this was not working as expected, so I enabled SIP debugging (debug ccsip messages) and ISDN debugging (debug isdn q931). I then made sure that the primary call-manager was unreachable and placed a test call into the cluster. This is what I saw in the debugs.

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

An ISDN setup message is received from the service provider. The calling number is 5555 and the called number is 1007.

:::::::::::::::::::::::::::::::::::::::::::

*Jun 13 20:55:56.528: ISDN Se0/3/0:23 Q931: RX <- SETUP pd = 8 callref = 0x0081
Bearer Capability i = 0x8090A2
Standard = CCITT
Transfer Capability = Speech
Transfer Mode = Circuit
Transfer Rate = 64 kbit/s
Channel ID i = 0xA98381
Exclusive, Channel 1
Progress Ind i = 0x8183 - Origination address is non-ISDN
Calling Party Number i = 0x2180, '5555'
Plan:ISDN, Type:National
Called Party Number i = 0xA1, '1007'
Plan:ISDN, Type:National

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

The voice gateway then tries to connect the call to the call-manager at 192.168.0.99. Because I had made sure that the primary server was unreachable, there is no response from the call-manager.

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

INVITE sip:1007@192.168.0.99:5060 SIP/2.0
Via: SIP/2.0/UDP 10.10.10.3:5060;branch=
Remote-Party-ID: <sip:5555@10.10.10.3>;party=calling;screen=no;privacy=off
From: <sip:5555@10.10.10.3>;tag=1EBBA8-1ED
To: <sip:1007@192.168.0.99>
Date: Thu, 13 Jun 2013 20:55:57 GMT
Call-ID: 7A5AF3E6-D3A211E2-800D8BCB-56CC18E8@10.10.10.3
Supported: 100rel,timer,resource-priority,replaces,sdp-anat
Min-SE: 1800
Cisco-Guid: 2052580998-3550613986-2147680284-1483238696
User-Agent: Cisco-SIPGateway/IOS-12.x
Allow: INVITE, OPTIONS, BYE, CANCEL, ACK, PRACK, UPDATE, REFER, SUBSCRIBE, NOTIFY, INFO, REGISTER
CSeq: 101 INVITE
Max-Forwards: 70
Timestamp: 1371156957
Contact: <sip:5555@10.10.10.3:5060>
Expires: 180
Allow-Events: kpml, telephone-event
Content-Type: application/sdp
Content-Disposition: session;handling=required
Content-Length: 232

v=0
o=CiscoSystemsSIP-GW-UserAgent 9433 8425 IN IP4 10.10.10.3
s=SIP Call
c=IN IP4 10.10.10.3
t=0 0
m=audio 19182 RTP/AVP 18 19
c=IN IP4 10.10.10.3
a=rtpmap:18 G729/8000
a=fmtp:18 annexb=no
a=rtpmap:19 CN/8000
a=ptime:20

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

While collecting the debug output, I noticed that even though the gateway was not getting a response from the primary call-manager, it never used the secondary dial-peer to send a SIP INVITE to the backup call-manager. It just kept sending the same INVITE to the primary call-manager over and over until the call failed. Whenever the call failed, I would see the trace output below:

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

*Jun 13 20:56:06.564: ISDN Se0/3/0:23 Q931: RX <- DISCONNECT pd = 8 callref = 0x0081
Cause i = 0x82E6 - Recovery on timer expiry

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

From the above, it is clear that the call was dropped from the service provider side because a timer had expired on the ISDN circuit (Cause 0x82E6, Recovery on timer expiry). In other words, while the voice gateway kept sending the same INVITE to the non-responsive call-manager, the ISDN call was not progressing, because the gateway could not connect the ISDN call leg to a SIP call leg, and the ISDN timers ran out.
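As a side note, the two octets in the Cause value can be decoded by hand. The sketch below assumes the usual Q.850 layout of the cause information element (low four bits of the first octet are the location, low seven bits of the second octet are the cause value); the `decode_cause` function and the two lookup tables are hypothetical helpers written for this post and only list a few entries.

```python
# Partial lookup tables, enough for the value seen in the debug output.
LOCATIONS = {
    0: "user",
    1: "private network serving the local user",
    2: "public network serving the local user",
    3: "transit network",
}
CAUSES = {
    16: "Normal call clearing",
    102: "Recovery on timer expiry",
}


def decode_cause(ie_hex):
    """Split a two-octet cause value such as '0x82E6' into (location, cause)."""
    value = int(ie_hex, 16)
    location_octet = (value >> 8) & 0xFF
    cause_octet = value & 0xFF
    location = location_octet & 0x0F   # drop the extension and coding-standard bits
    cause = cause_octet & 0x7F         # drop the extension bit
    return location, cause


location, cause = decode_cause("0x82E6")
print(location, LOCATIONS.get(location, "unknown"))
print(cause, CAUSES.get(cause, "unknown"))
```

The location code points at the public network side and the cause value is 102, which lines up with the DISCONNECT arriving from the service provider with "Recovery on timer expiry".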

So I needed to stop the gateway from continuously sending INVITE messages to a server that was never going to respond. I started looking at the default SIP-related settings on the gateway and found this:

:::::::::::::::::::::::::::::::::::::::::::

Router#show sip-ua retry
SIP UA Retry Values
invite retry count = 6      response retry count = 6
bye retry count = 10        cancel retry count = 10
prack retry count = 10      update retry count = 6
reliable 1xx count = 6      notify retry count = 10
refer retry count = 10      register retry count = 6
info retry count = 6        subscribe retry count = 6
options retry count = 6

:::::::::::::::::::::::::::::::::::::::::::

As soon as I saw the output above, the problem was clear. The SIP user agent was set to send 6 SIP INVITEs to a target before giving up on it. The ISDN timers on the service provider side expired before the gateway had finished sending those 6 INVITEs, so the gateway never reached the point of trying the cluster through the secondary dial-peer.
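The timing can be put into rough numbers. The time between the SETUP and the DISCONNECT comes straight from the debug timestamps above. How long the gateway spends on one target is an estimate only: the sketch assumes the wait after the first INVITE equals the configured "timers trying 1000" (1 second) and that each later wait doubles, as RFC 3261 describes for INVITE retransmission over UDP. Whether IOS follows exactly that schedule is an assumption, so `time_to_exhaust` also accepts a fixed interval for comparison. Both function names are hypothetical.

```python
from datetime import datetime


def seconds_between(start, end):
    """Seconds between two debug timestamps of the form HH:MM:SS.mmm."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


def time_to_exhaust(invites, first_wait, doubling=True):
    """Estimated seconds spent on one target before the gateway gives up.

    Assumption: every INVITE is followed by a wait, starting at first_wait
    and doubling after each send when doubling is True.
    """
    total, wait = 0.0, first_wait
    for _ in range(invites):
        total += wait
        if doubling:
            wait *= 2
    return total


# SETUP received at 20:55:56.528, DISCONNECT received at 20:56:06.564.
window = seconds_between("20:55:56.528", "20:56:06.564")
print(f"ISDN side gave up after {window:.3f} s")

for count in range(1, 7):
    needed = time_to_exhaust(count, first_wait=1.0)
    verdict = "fits" if needed < window else "too slow"
    print(f"invite retry {count}: about {needed:.0f} s ({verdict})")
```

Under the doubling assumption, six INVITEs take far longer than the window the ISDN side allowed, which matches what the debugs showed. With a fixed 1 second interval they would fit, so treat the figures as illustrative rather than as measured gateway behaviour.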

To resolve this problem, I reduced the ‘Invite retry’ value to 2, so that the gateway sends two SIP INVITEs to the primary server and, if it does not respond, forwards the call to the secondary server using the secondary dial-peer.

:::::::::::::::::::::::::::::::::::::::::::

Router(config)#sip-ua

Router(config-sip-ua)#retry invite 2

:::::::::::::::::::::::::::::::::::::::::::

After the above configuration was added to the gateway, I tested everything again and the fail-over worked perfectly.

Hope you’ve enjoyed this entry.

Cheers

Tags: 0x82E6, dial peer, dial-peer failover, dial-peer redundancy, Recovery on timer expiry, Sip Dial-Peer Redundancy or Fail-over, Sip Timers, technology, voice gateway, voice iniciate, voiceiniciate
