Monday, January 4, 2010

%DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor (Vlan109) is down: retry limit exceeded & peer restarted

We are seeing EIGRP neighbor "peer restarted" error logs on our backbone switch, and the neighbor goes down roughly every 4 hours. Our whole network is severely impacted by this EIGRP neighbor error.
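Before troubleshooting, it helps to confirm the periodicity by pulling the neighbor-down messages out of the log and measuring the gaps between them. The Python sketch below is an illustration, not a Cisco tool. It assumes each message is prefixed with an IOS datetime timestamp such as `Jan  4 02:00:00:`. The helper names `down_events` and `flap_intervals_hours` and the sample lines are hypothetical.

```python
import re
from datetime import datetime

# Assumption: messages carry IOS "service timestamps log datetime" style
# prefixes such as "Jan  4 02:00:00:"; adjust the regex for syslog-server
# prefixes, milliseconds or an explicit year.
NBR_DOWN = re.compile(
    r"^\*?(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2})(?:\.\d+)?: "
    r"%DUAL-5-NBRCHANGE: .*?Neighbor .*?\((?P<intf>[^)]+)\) is down: (?P<reason>.+)$"
)


def down_events(lines, year=2010):
    """Yield (timestamp, interface, reason) for every EIGRP neighbor-down message.

    IOS timestamps omit the year, so one is supplied only to build datetimes.
    """
    for line in lines:
        m = NBR_DOWN.match(line.strip())
        if m:
            ts_text = " ".join(m["ts"].split())  # collapse "Jan  4" to "Jan 4"
            ts = datetime.strptime(f"{year} {ts_text}", "%Y %b %d %H:%M:%S")
            yield ts, m["intf"], m["reason"]


def flap_intervals_hours(lines, year=2010):
    """Return {interface: [hours between consecutive neighbor-down events]}."""
    last_down = {}
    intervals = {}
    for ts, intf, _reason in down_events(lines, year):
        if intf in last_down:
            hours = (ts - last_down[intf]).total_seconds() / 3600
            intervals.setdefault(intf, []).append(round(hours, 2))
        last_down[intf] = ts
    return intervals


# Hypothetical log excerpt shaped like the message in this post.
sample = [
    "Jan  4 02:00:00: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor (Vlan109) is down: retry limit exceeded",
    "Jan  4 02:00:05: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor (Vlan109) is up: new adjacency",
    "Jan  4 06:00:00: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor (Vlan109) is down: peer restarted",
    "Jan  4 10:00:00: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor (Vlan109) is down: retry limit exceeded",
]

print(flap_intervals_hours(sample))  # {'Vlan109': [4.0, 4.0]}
```

The "is up" line is skipped on purpose, so only down events count toward the interval.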

Debugging Steps
  1. Failures of IP routing protocols and duplicate-address messages from HSRP are symptoms of underlying Layer 2 problems.
  2. Analyze STP and the logs on all devices. Be aware that a malfunctioning or misconfigured access layer switch can also be the root cause of this kind of problem.
  3. Verify which switch is the root bridge for each defined VLAN; for some specific VLAN the root bridge may turn out to be a low-capacity access layer switch.
  4. Starting from the distribution or core switches, use: show spanning-tree summary
  5. For each VLAN listed in the previous output, see the details using: sh spanning-tree vlan vlan#
  6. Verify the root bridge ID for each VLAN. It should be one of the core/distribution switches; if it is not, this can be the problem.
  7. Check whether a wrong configuration caused an access layer switch to become the root bridge for a specific VLAN (for example, a bridge priority lower than that of the core switches, since the lowest bridge ID wins the root election).
  8. Try to reach each access layer switch by telnet. The log can be examined using: sh log
  9. Search for spanning-tree messages; if present, there will be a lot of them, one for each VLAN/STP instance.
  10. In this case HSRP and the routing protocols are affected just like user traffic, and their error events are only symptoms; changing the VIP address is not a solution.
  11. Also look for a possible CDP native VLAN mismatch, which can join two VLANs together on access ports (no trunking).
  12. Since in your case the problem affects multiple VLANs, the cause is most likely one of those mentioned above (a wrong root bridge or a malfunctioning device).
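Steps 3 to 7 can be partly automated once the output has been collected. The following is a hypothetical Python helper (`root_bridges`, `unexpected_roots` and `EXPECTED_ROOTS` are our own names). It parses text captured with `sh spanning-tree vlan vlan#` and flags every VLAN whose root bridge MAC is not one of the core/distribution switches. It assumes the usual Catalyst PVST+ layout, where the first `Address` line after `Root ID` is the root bridge. The sample output and MAC addresses are invented.

```python
import re

# Hypothetical bridge MAC addresses of the core/distribution switches that
# are supposed to be root; replace with your own.
EXPECTED_ROOTS = {"0011.2233.4455", "0011.2233.6677"}

VLAN_HEADER = re.compile(r"^VLAN(\d+)\s*$")
ROOT_ID = re.compile(r"^\s*Root ID\s+Priority\s+(\d+)")
ADDRESS = re.compile(r"^\s*Address\s+([0-9a-f]{4}\.[0-9a-f]{4}\.[0-9a-f]{4})")


def root_bridges(output):
    """Map VLAN id -> (root priority, root MAC) parsed from spanning-tree output."""
    roots = {}
    vlan = None
    root_priority = None
    for line in output.splitlines():
        if m := VLAN_HEADER.match(line):
            vlan, root_priority = int(m.group(1)), None
        elif m := ROOT_ID.match(line):
            root_priority = int(m.group(1))
        elif root_priority is not None and (m := ADDRESS.match(line)):
            # First Address line after "Root ID" is the root bridge; the
            # later "Bridge ID" Address line belongs to the local switch.
            roots[vlan] = (root_priority, m.group(1))
            root_priority = None
    return roots


def unexpected_roots(roots, expected):
    """Return {vlan: root MAC} for VLANs rooted outside the expected switches."""
    return {vlan: mac for vlan, (_prio, mac) in roots.items() if mac not in expected}


# Illustrative output from a distribution switch (addresses invented).
SAMPLE = """\
VLAN0001
  Spanning tree enabled protocol ieee
  Root ID    Priority    24577
             Address     0011.2233.4455
             This bridge is the root

  Bridge ID  Priority    24577  (priority 24576 sys-id-ext 1)
             Address     0011.2233.4455

VLAN0109
  Spanning tree enabled protocol ieee
  Root ID    Priority    4205
             Address     00aa.bbcc.ddee
             Cost        19

  Bridge ID  Priority    32877  (priority 32768 sys-id-ext 109)
             Address     0011.2233.4455
"""

roots = root_bridges(SAMPLE)
print(unexpected_roots(roots, EXPECTED_ROOTS))  # {109: '00aa.bbcc.ddee'}
```

In the sample, VLAN 109 is rooted on a switch with priority 4205 (4096 plus the sys-id-ext of 109). This is the kind of misconfiguration step 7 asks you to look for.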
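Steps 8, 9 and 11 come down to reading many `sh log` outputs. Assume the log text has already been collected from each access switch (the telnet part is not shown). A small Python filter can then count the messages that point at Layer 2 trouble. It relies on the standard IOS `%FACILITY-SEVERITY-MNEMONIC:` message header. The choice of mnemonics, the helper names and the sample lines are illustrative assumptions, and exact message names vary by platform and IOS release.

```python
import re
from collections import Counter

# Standard IOS message header: %FACILITY-SEVERITY-MNEMONIC:
MSG_HEADER = re.compile(r"%(?P<fac>[A-Z0-9_]+)-(?P<sev>[0-7])-(?P<mnem>[A-Z0-9_]+):")


def is_layer2_suspect(facility, mnemonic):
    """Messages treated here as Layer 2 warning signs (an assumed, editable list)."""
    return (
        facility == "SPANTREE"  # any spanning-tree event
        or (facility == "CDP" and mnemonic == "NATIVE_VLAN_MISMATCH")
        or mnemonic == "DUPADDR"  # HSRP duplicate address; facility name varies by release
    )


def summarize_log(lines):
    """Count Layer 2 suspect messages by their full FACILITY-SEVERITY-MNEMONIC tag."""
    counts = Counter()
    for line in lines:
        m = MSG_HEADER.search(line)
        if m and is_layer2_suspect(m["fac"], m["mnem"]):
            counts[f'{m["fac"]}-{m["sev"]}-{m["mnem"]}'] += 1
    return counts


# Illustrative 'sh log' lines from an access switch (interfaces and VLANs invented).
sample_log = [
    "Jan  4 09:58:02: %CDP-4-NATIVE_VLAN_MISMATCH: Native VLAN mismatch discovered on FastEthernet0/5 (109), with sw-core-1 GigabitEthernet1/1 (110).",
    "Jan  4 09:58:10: %SPANTREE-2-RECV_PVID_ERR: Received BPDU with inconsistent peer vlan id 110 on FastEthernet0/5 VLAN109.",
    "Jan  4 09:58:10: %SPANTREE-2-BLOCK_PVID_LOCAL: Blocking FastEthernet0/5 on VLAN0109. Inconsistent local vlan.",
    "Jan  4 10:00:00: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 1: Neighbor (Vlan109) is down: retry limit exceeded",
    "Jan  4 10:05:00: %SYS-5-CONFIG_I: Configured from console by vty0",
]

for tag, count in summarize_log(sample_log).most_common():
    print(tag, count)
```

A burst of SPANTREE messages across several VLANs on one access switch, or a native VLAN mismatch reported by CDP, would suggest that switch as the place to look first.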
