STP LOOP TROUBLESHOOTING


 

TOPICS

UNDERSTANDING STP COMMAND LINE

HOW LOOP OCCURS

CASE 1: DUPLEX MISMATCH

CASE 2: UDLD ISSUE

CASE 3: PACKET CORRUPTION

CASE 4: DIAMETER ISSUE

CASE 5: ANALYZE BPDU

CASE 6: CHECK PORT UTILIZATION

CASE 7: CHECK BROADCAST PACKETS

CASE 8: CHECK CPU UTILIZATION

CASE 9: MAXIMUM NUMBER OF STP INSTANCES SUPPORTED

CASE 10: IDENTIFYING LOOP

CASE 11: APPROACH TO DISABLE PORTS

CASE 12: LOG STP EVENTS

CASE 13: BPDU SKEW DETECTION   

***********************

UNDERSTANDING STP COMMAND LINE 
Reference Link

https://networkproxy.wordpress.com/2014/12/12/stp-understanding-command-line/


 

HOW LOOP OCCURS

The purpose of STP is to create a loop-free topology.

It achieves this by breaking each loop at some point.

Here, breaking means putting a port on the redundant path into the blocking state, not shutting the port down.

This whole process is part of the STP mechanism.

Any such mechanism needs a way for the participating devices to communicate.

STP bridges communicate by exchanging BPDUs (Bridge Protocol Data Units).

As long as BPDUs flow correctly through the network, STP works as designed and the topology stays loop free.

Example:

A blocked port stays in the blocking state only as long as it knows it must. It learns this from the BPDUs it receives. If those BPDUs stop arriving for any reason, the blocking port eventually moves to the forwarding state, and that creates an STP loop.

************************* 

 

CASE 1: DUPLEX MISMATCH

SWITCH A ======= SWITCH B

Switch A is the root bridge here.

Suppose switch A is in half-duplex mode and switch B is in full-duplex mode.

Switch B does not perform carrier sense, so it transmits whenever it has traffic. If B sends enough traffic to A, the frames that A sends, including BPDUs, are deferred or lost to collisions, and switch B stops receiving BPDUs.

As a result, a blocking port on B can move to forwarding when it should have stayed blocked.

Note: STP converges correctly if and only if BPDUs flow normally. A port that STP should block stays blocked only while it keeps receiving BPDUs.

 

Example Output

switchA#sh int status

Port      Name      Status       Vlan       Duplex  Speed Type
Gi2/1               disabled     1            full   1000 1000BaseSX
Gi2/2               disabled     1            full   1000 1000BaseSX
Gi2/3               disabled     1            full   1000 1000BaseSX
Gi2/4               disabled     1            full   1000 No Transceiver
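To compare the two ends of a link quickly, the duplex column of show interfaces status can be pulled out with a small script. The following is a minimal Python sketch, not a Cisco tool. The helper names duplex_of and is_duplex_mismatch are made up for illustration, the two sample lines are hypothetical, and the sketch assumes the duplex column reads full, half, a-full or a-half.

```python
import re

# Matches the Duplex column: "full", "half", or the auto-negotiated
# forms "a-full" / "a-half" (assumed column values).
DUPLEX_RE = re.compile(r"(?<!\S)(?:a-)?(half|full)(?!\S)")


def duplex_of(status_line):
    """Return 'half' or 'full' for one line of 'show interfaces status', else None."""
    match = DUPLEX_RE.search(status_line)
    return match.group(1) if match else None


def is_duplex_mismatch(local_line, remote_line):
    """True when both ends report a duplex value and the values differ."""
    local, remote = duplex_of(local_line), duplex_of(remote_line)
    return local is not None and remote is not None and local != remote


# Hypothetical lines taken from the two ends of the same link.
switch_a = "Gi2/1    connected    1    a-half    a-100 10/100BaseTX"
switch_b = "Gi2/1    connected    1      full      100 10/100BaseTX"
print(is_duplex_mismatch(switch_a, switch_b))
```

A line whose port name or description happens to be the word half or full would confuse this simple pattern, so treat the result as a hint and confirm it on the device.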

 

Solution

Permanent: Match the duplex setting

Temporary: Reboot the device

 

Reference Link

Understanding Ethernet

http://www.cisco.com/c/en/us/support/docs/lan-switching/ethernet/10561-3.html


 

CASE 2: UDLD ISSUE

SWITCHA ======= SWITCHB

On fiber links, a failure that goes without detection often causes unidirectional links.

Anything that can lead a link to stay up and provide a one-way communication is very dangerous with regard to STP.

 

Solution

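A reboot will not help.

Enable UDLD (Unidirectional Link Detection) on the affected links. UDLD is a separate Layer 2 protocol that works alongside STP, not an STP feature.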

Reference Link

Understanding UDLD

http://www.cisco.com/c/en/us/support/docs/lan-switching/spanning-tree-protocol/10591-77.html


 

CASE 3: PACKET CORRUPTION

Packet corruption can also lead to the same kind of failure. If a link has a high rate of physical errors, you can lose a certain number of consecutive BPDUs. This loss can lead a blocking port to transition to forwarding state. You do not see this case very often because STP default parameters are very conservative. The blocking port needs to miss BPDUs for 50 seconds before the transition to forwarding. The successful transmission of a single BPDU breaks the loop.
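The 50-second figure is the default max_age (20 seconds) plus two forward_delay periods (15 seconds each, for listening and learning). As a rough illustration, the Python sketch below models a blocking port that was last refreshed at time 0 and reports when it would start forwarding, given the times at which BPDUs actually arrive. The function name and the simplified model (any received BPDU resets the whole countdown) are assumptions made for this example.

```python
def first_forwarding_time(bpdu_times, observe_until, max_age=20, forward_delay=15):
    """Return the time (seconds) at which a blocking port would begin
    forwarding, or None if BPDUs keep arriving often enough.

    Simplified model: the port needs max_age + 2 * forward_delay seconds
    without a BPDU, and a single BPDU restarts the countdown.
    """
    timeout = max_age + 2 * forward_delay  # 20 + 15 + 15 = 50 s by default
    last_seen = 0  # assume a BPDU was heard at time 0
    for t in sorted(bpdu_times):
        if t - last_seen >= timeout:
            return last_seen + timeout
        last_seen = t
    if observe_until - last_seen >= timeout:
        return last_seen + timeout
    return None


# BPDUs stop after t=6 s: the port forwards 50 s later.
print(first_forwarding_time([2, 4, 6], observe_until=100))
# One BPDU gets through at t=50 s, so the port stays blocked.
print(first_forwarding_time([2, 4, 50], observe_until=90))
```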

 

Look for increments in the input errors counter of the show interfaces command. The error counters include runts, giants, no buffer, CRC, frame, overrun, and ignored counts.

 

Example:

switchA#sh int gigabitEthernet 4/1

     72215753129 packets input, 66789986050466 bytes, 0 no buffer
     Received 648957 broadcasts (648905 multicasts)
     0 runts, 0 giants, 0 throttles
     2 input errors, 2 CRC, 73 frame, 490669 overrun, 0 ignored
     0 watchdog, 0 multicast, 0 pause input
     0 input packets with dribble condition detected
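Because what matters is whether the counters are increasing, it helps to take the output twice and subtract. The Python sketch below does that for the input errors line. The function names are hypothetical, and the second snapshot in the usage example is invented purely to show the subtraction.

```python
import re

# Pattern for the "input errors" line of 'show interfaces'.
ERR_RE = re.compile(
    r"(?P<input_errors>\d+) input errors, (?P<crc>\d+) CRC, "
    r"(?P<frame>\d+) frame, (?P<overrun>\d+) overrun, (?P<ignored>\d+) ignored"
)


def parse_input_errors(text):
    """Return the error counters as a dict of ints, or None if the line is absent."""
    match = ERR_RE.search(text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def error_increments(before, after):
    """Subtract the counters of an earlier snapshot from a later one."""
    old, new = parse_input_errors(before), parse_input_errors(after)
    if old is None or new is None:
        raise ValueError("input errors line not found in one of the snapshots")
    return {name: new[name] - old[name] for name in new}


first = "     2 input errors, 2 CRC, 73 frame, 490669 overrun, 0 ignored"
# Hypothetical second reading taken a few minutes later.
second = "     9 input errors, 9 CRC, 73 frame, 490669 overrun, 0 ignored"
print(error_increments(first, second))
```

Counters that keep climbing between readings point to a live physical problem, while large but static values may be left over from an earlier event.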

 

Reference Link

Troubleshoot switch port & interface problems

http://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-series-switches/12027-53.html


 

CASE 4: DIAMETER ISSUE

STP assumes a maximum network diameter, which restricts how far apart bridges in the network can be. With the default timers, two distinct bridges cannot be more than seven hops away from each other. Part of this restriction comes from the age field that BPDUs carry.

Take special care if you plan to change STP timers from the default value. There is danger if you try to get faster reconvergence in this way. An STP timer change has an impact on the diameter of the network and the stability of the STP.
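The link between timers and diameter can be made concrete with the formulas that Cisco's STP timer tuning guidance is commonly quoted as using: max_age = 4 * hello + 2 * diameter - 2 and forward_delay = (4 * hello + 3 * diameter) / 2. The Python sketch below applies them. Treat both the formulas and the rounding up to whole seconds as assumptions of this example and verify them against the documentation for your platform.

```python
import math


def stp_timers(hello=2, diameter=7):
    """Return (max_age, forward_delay) in seconds for a given hello time
    and network diameter, using the commonly quoted tuning formulas."""
    max_age = 4 * hello + 2 * diameter - 2
    # Assumption: fractional results are rounded up to a whole second.
    forward_delay = math.ceil((4 * hello + 3 * diameter) / 2)
    return max_age, forward_delay


# The defaults (hello 2 s, diameter 7) give the familiar 20 s and 15 s.
print(stp_timers())
```

Read the other way round, shortening max_age or forward_delay without shrinking the real diameter of the network leaves less margin for BPDUs to cross it in time.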


 

CASE 5: ANALYZE BPDU

Work out which ports should and should not receive BPDUs according to the converged STP topology, then check the counters.

 

CASE: NON-ROOT BRIDGE

On non-root bridges, BPDUs should be received. Run the command twice and check that the received counter is actually increasing on the ports that face the root.

sh spanning-tree detail

Port 129 (GigabitEthernet3/1) of VLAN0001 is designated forwarding

BPDU: sent 24363169, received 9

sh spanning-tree detail

Port 129 (GigabitEthernet3/1) of VLAN0001 is designated forwarding

BPDU: sent 24363184, received 10

 

CASE: ROOT BRIDGE

On the root bridge, BPDUs should ideally not be received. Check that the received counter stays at zero.

The previous output shows BPDUs being received, which is expected only on non-root bridges.

The root bridge should only generate BPDUs, not receive them.

sh spanning-tree detail

Port 1153 (GigabitEthernet10/1) of VLAN0001 is designated forwarding

BPDU: sent 4242079, received 0

sh spanning-tree detail

Port 1153 (GigabitEthernet10/1) of VLAN0001 is designated forwarding

BPDU: sent 4242101, received 0
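Comparing two runs of sh spanning-tree detail by eye gets tedious on a switch with many ports. The Python sketch below extracts the sent and received counters per port and VLAN and returns the difference between two snapshots. The function names are made up for this example, and the regular expressions assume the line layout shown in the outputs above.

```python
import re

PORT_RE = re.compile(r"Port \d+ \((?P<intf>[^)]+)\) of (?P<vlan>\S+) is")
BPDU_RE = re.compile(r"BPDU: sent (?P<sent>\d+), received (?P<received>\d+)")


def parse_bpdu_counters(text):
    """Map (interface, vlan) -> (sent, received) from 'show spanning-tree detail'."""
    counters = {}
    current = None
    for line in text.splitlines():
        port = PORT_RE.search(line)
        if port:
            current = (port.group("intf"), port.group("vlan"))
            continue
        bpdu = BPDU_RE.search(line)
        if bpdu and current is not None:
            counters[current] = (int(bpdu.group("sent")), int(bpdu.group("received")))
    return counters


def bpdu_deltas(before, after):
    """Return (sent, received) increments for ports present in both snapshots."""
    old, new = parse_bpdu_counters(before), parse_bpdu_counters(after)
    return {
        key: (new[key][0] - old[key][0], new[key][1] - old[key][1])
        for key in new
        if key in old
    }


first = """Port 129 (GigabitEthernet3/1) of VLAN0001 is designated forwarding
BPDU: sent 24363169, received 9"""
second = """Port 129 (GigabitEthernet3/1) of VLAN0001 is designated forwarding
BPDU: sent 24363184, received 10"""
print(bpdu_deltas(first, second))
```

On the root bridge every received delta should be zero. On a non-root bridge, a port facing the root whose received delta stays at zero over several hello intervals deserves a closer look.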


 

CASE 6: CHECK PORT UTILIZATION

An interface with traffic overload can fail to transmit vital BPDUs.

Example:

switchA#sh int summ

 *: interface is up
 IHQ: pkts in input hold queue     IQD: pkts dropped from input queue
 OHQ: pkts in output hold queue    OQD: pkts dropped from output queue
 RXBS: rx rate (bits/sec)          RXPS: rx rate (pkts/sec)
 TXBS: tx rate (bits/sec)          TXPS: tx rate (pkts/sec)
 TRTL: throttle count

  Interface               IHQ   IQD  OHQ   OQD  RXBS RXPS  TXBS TXPS TRTL
  Vlan1                     0     0    0     0     0    0     0    0    0
* Vlan926                   0 32437    0     0  1000    1     0    0 1956

 

The output above is a snapshot of the traffic rates and queue drops on each interface at one point in time.


CASE 7: CHECK BROADCAST PACKETS

Check whether broadcast packets are increasing heavily on a particular interface.

switch#sh int counters

Port             InOctets   InUcastPkts   InMcastPkts   InBcastPkts
Gi2/1                   0             0             0             0
Gi2/2                   0             0             0             0
Gi4/1      66789752417866   72214840512        648905            52
Gi4/2         94524288649     125114492       8573073         95799
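A single reading only shows totals since the counters were last cleared, so the check needs two readings. The Python sketch below parses the rows of sh int counters and reports how much InBcastPkts grew per port. The function names are hypothetical, and the second snapshot in the example is invented to illustrate a port whose broadcast count is climbing fast.

```python
def parse_bcast_counters(text):
    """Map port name -> InBcastPkts from 'show interfaces counters' rows."""
    bcast = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows are a port name followed by four numeric columns.
        if len(fields) == 5 and all(f.isdigit() for f in fields[1:]):
            bcast[fields[0]] = int(fields[4])
    return bcast


def broadcast_growth(before, after):
    """Return the InBcastPkts increase for ports present in both snapshots."""
    old, new = parse_bcast_counters(before), parse_bcast_counters(after)
    return {port: new[port] - old[port] for port in new if port in old}


first = """Port InOctets InUcastPkts InMcastPkts InBcastPkts
Gi4/1 66789752417866 72214840512 648905 52
Gi4/2 94524288649 125114492 8573073 95799"""
# Hypothetical second reading taken shortly afterwards.
second = """Port InOctets InUcastPkts InMcastPkts InBcastPkts
Gi4/1 66789752419000 72214840600 648910 60
Gi4/2 94999999999 125114500 8573100 2095799"""
print(broadcast_growth(first, second))
```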


 

CASE 8: CHECK CPU UTILIZATION

Example:

switchA# sh processes cpu history

11   11 3111 11   1 1 1   1 111   41   1   111111

8823888229943049810877298081988828810398809889888798901107

100

90

80

70

60

50

40                                           *

30             *                             *

20             *                            **   *   *   *

10 ***********#*****************************#****************

0....5....1....1....2....2....3....3....4....4....5....5....

          0    5    0    5    0    5    0    5    0    5

CPU% per minute (last 60 minutes)

* = maximum CPU%   # = average CPU%

CASE 9: MAXIMUM NUMBER OF STP INSTANCES SUPPORTED

Example:

switchA#sh spanning-tree sum


24 vlans                     0         0       0       1300       1300

Here, 1300 is the total number of active STP port instances across all 24 VLANs. This number must stay within the limit that the switch's supervisor (SUP) card supports. If the limit is exceeded, BPDUs might be missed.


 

CASE 10: IDENTIFYING LOOP

The best way to identify a bridging loop is to capture the traffic on a saturated link and check that you see similar packets multiple times.

Symptom: If all users in a certain bridge domain have connectivity issues at the same time, you can already suspect a bridging loop.

Check the port utilization on your devices and look for abnormal values with the show interface summary command.
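Once a capture from the saturated link is available, spotting the same packet many times over can be automated. The Python sketch below hashes each captured frame and reports the ones seen at least a given number of times. It assumes the frames have already been exported from the capture as raw byte strings. The function name and the threshold of 3 are arbitrary choices for this example.

```python
import hashlib
from collections import Counter


def repeated_frames(frames, threshold=3):
    """Return {sha256 digest: count} for frames seen at least `threshold` times."""
    counts = Counter(hashlib.sha256(frame).hexdigest() for frame in frames)
    return {digest: n for digest, n in counts.items() if n >= threshold}


# Hypothetical capture: one frame circulating five times, one seen once.
capture = [b"\xff\xff\xff\xff\xff\xff looped broadcast"] * 5 + [b"normal frame"]
for digest, count in repeated_frames(capture).items():
    print(digest[:12], count)
```

Some repetition is normal (periodic protocol hellos, for example), so the threshold needs tuning against the capture length before drawing conclusions.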


 

CASE 11: APPROACH TO DISABLE PORTS

Bridging loops have extremely severe consequences on a bridge network. Administrators generally do not have time to look for the cause of the loop and prefer to restore connectivity as soon as possible. The easy way out in this case is to manually disable every port that provides redundancy in the network. If you can identify a part of the network that is affected most, begin to disable ports in this area. Or, if possible, initially disable ports that should be blocking. Each time you disable a port, check to see if you have restored connectivity in the network. By identifying which disabled port stops the loop, you also identify the redundant path where this port is located. If this port should have been blocking, you have probably found the link on which the failure appeared.


 

CASE 12: LOG STP EVENTS

Enable the logging of STP events on the bridges and switches of the network that experiences the failure. If you want to limit the number of devices to configure, at least enable this logging on devices that host blocked ports; the transition of a blocked port is what creates a loop.

Cisco IOS Software: Issue the exec command debug spanning-tree events to enable STP debug information. Issue the global configuration mode command logging buffered to capture this debug information in the device buffers.

CatOS: The set logging level spantree 7 default command increases the default level of events that relate to STP to the debug level. Be sure that you log a maximum number of messages in the switch buffers with use of the set logging buffer 500 command.


 

CASE 13: BPDU SKEW DETECTION 

Feature Description

STP operation relies heavily on the timely reception of BPDUs. At every hello_time message (2 seconds by default), the root bridge sends BPDUs. Non-root bridges do not regenerate BPDUs for each hello_time message, but they receive relayed BPDUs from the root bridge. Therefore, every non-root bridge should receive BPDUs on every VLAN for each hello_time message. In some cases, BPDUs are lost, or the bridge CPU is too busy to relay BPDU in a timely manner. These issues, as well as other issues, can cause BPDUs to arrive late (if they arrive at all). This issue potentially compromises the stability of the spanning tree topology.

BPDU skew detection allows the switch to keep track of BPDUs that arrive late and to notify the administrator with syslog messages. For every port on which a BPDU has ever arrived late (or has skewed), skew detection reports the most recent skew and the duration of the skew (latency). It also reports the longest BPDU delay on this particular port.

In order to protect the bridge CPU from overload, a syslog message is not generated every time BPDU skewing occurs. Messages are rate-limited to one message every 60 seconds. However, should the delay of BPDU exceed max_age divided by 2 (which equals 10 seconds by default), the message is immediately printed.

Note: BPDU skew detection is a diagnostic feature. Upon detection of BPDU skewing, it sends a syslog message. BPDU skew detection takes no further corrective action.

This is an example of a syslog message generated by BPDU skew detection:

%SPANTREE-2-BPDU_SKEWING: BPDU skewed with a delay of 10 secs (max_age/2)
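The reporting rule described above (at most one message every 60 seconds, unless the delay exceeds max_age divided by 2) can be sketched in a few lines of Python. This only illustrates the described behaviour and is not the switch's actual implementation. The function name is hypothetical, and the choice in the usage loop to restart the 60-second window after an immediate message is an assumption.

```python
def should_log_skew(delay, now, last_logged, max_age=20.0, min_interval=60.0):
    """Decide whether a late BPDU produces a syslog message.

    delay        -- how late the BPDU arrived, in seconds
    now          -- current time, in seconds
    last_logged  -- time of the previous message, or None if none yet
    """
    if delay > max_age / 2:
        return True  # severe skew is reported immediately
    if last_logged is None:
        return True  # nothing logged yet, so no rate limit applies
    return now - last_logged >= min_interval


# Hypothetical sequence of (time, delay) skew events.
last = None
for now, delay in [(0, 3), (10, 4), (30, 11), (70, 3)]:
    if should_log_skew(delay, now, last):
        print(f"t={now}s: BPDU skewed with a delay of {delay} secs")
        last = now  # assumption: every message restarts the window
```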

 

Configuration Considerations

BPDU skew detection is configured on a per-switch basis. The default setting is disabled. Issue this command in order to enable BPDU skew detection:

Cat6k> (enable) set spantree bpdu-skewing enable

Spantree bpdu-skewing enabled on this switch.

In order to see BPDU skewing information, use the show spantree bpdu-skewing <vlan>|<mod/port> command as demonstrated in this example:

 

Cat6k> (enable) show spantree bpdu-skewing 1

Bpdu skewing statistics for vlan 1
Port   Last Skew (ms)  Worst Skew (ms)  Worst Skew Time
3/12   4000            4100             Mon Nov 19 2001, 16:36:04


 

 
