How to check that the neighborship between PE & CE is working fine?
Points to check
Which protocol is running between PE & CE: use the command sh ip proto summ, which shows all the routing protocols running
Check if the neighborship is fine
If the neighborship is fine, check whether routes are flapping
If routes are flapping, the neighborship itself may be fine while the problem lies further along the path from which the routes are learned
Example: R1 --- R2 --- R3 --- R4 --- network 192.168.10.0/23
We are checking routes at R1. The R1 & R2 neighborship is fine, but R3 was rebooted, so the route to network 192.168.10.0/23 learned at R1 flapped
If BGP is running between them, simply check whether the CE is learning prefixes: use the command sh ip bgp summ
Check if input errors are increasing
Possible cause: Input rate exceeds the capacity of the router, or input queues exceed the size of output queues.
Note: Input drop problems are typically seen when traffic is being routed between faster interfaces (such as Ethernet, Token Ring, and FDDI) and serial interfaces. When traffic is light, there is no problem. As traffic rates increase, backups start occurring. Routers drop packets during these congested periods.
Solution:
1. Increase the output queue size on common destination interfaces for the interface that is dropping packets. Use the hold-queue number out interface configuration command. Increase these queues by small increments (for instance, 25 percent) until you no longer see drops in the show interfaces output. The default output hold queue limit is 100 packets.
2. Reduce the input queue size, using the hold-queue number in interface configuration command, to force input drops to become output drops. Output drops have less impact on the performance of the router than do input drops. The default input hold queue is 75 packets.
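The "small increments" advice for the output hold queue can be worked through numerically. The following is a minimal sketch, assuming a starting value of 100 packets (the default quoted above) and 25 percent steps; the function names hold_queue_steps and hold_queue_commands are hypothetical helpers for illustration, not part of any Cisco tooling.

```python
def hold_queue_steps(start=100, percent=25, steps=4):
    """Return successive queue sizes, each `percent` larger than the previous one."""
    sizes = []
    size = start
    for _ in range(steps):
        # Grow by the given percentage, rounded to a whole number of packets.
        size += max(1, round(size * percent / 100))
        sizes.append(size)
    return sizes


def hold_queue_commands(sizes, direction="out"):
    """Render each size as a hold-queue interface configuration command."""
    return ["hold-queue %d %s" % (size, direction) for size in sizes]


if __name__ == "__main__":
    # Apply one command at a time and re-check the drop counters before the next.
    for command in hold_queue_commands(hold_queue_steps()):
        print(command)
```

Each value would be tried in turn, stopping as soon as the show interfaces output no longer shows drops.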
Five problem states (shown in the interface status line of sh int serial x)
Serial x is down, line protocol is down
Serial x is up, line protocol is down
Serial x is up, line protocol is up (looped)
Serial x is up, line protocol is down (disabled)
Serial x is administratively down, line protocol is down
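The five status lines above can be recognized mechanically. Here is a small sketch, assuming the status line has exactly the wording listed above; the function name classify_status and the short labels it returns are made up for illustration.

```python
import re

# Matches e.g. "Serial0 is up, line protocol is down (disabled)".
STATUS_RE = re.compile(
    r"^(?P<intf>.+?) is (?P<link>administratively down|up|down), "
    r"line protocol is (?P<proto>up|down)(?: \((?P<flag>looped|disabled)\))?"
)


def classify_status(line):
    """Map an interface status line to one of the five problem states, or 'ok'."""
    match = STATUS_RE.match(line.strip())
    if not match:
        return "unrecognized"
    link, proto, flag = match.group("link", "proto", "flag")
    if link == "administratively down":
        return "administratively down"
    if link == "down":
        return "interface down"
    if proto == "up":
        return "looped" if flag == "looped" else "ok"
    # From here on: link is up, line protocol is down.
    return "disabled" if flag == "disabled" else "protocol down"


if __name__ == "__main__":
    print(classify_status("Serial x is up, line protocol is down"))
```

Running it on "Serial x is up, line protocol is down" gives the label protocol down, which points at the layer 2 configuration or the far end rather than the local cabling.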
Check whether the particular link (the one showing errors) is part of a Multilink. Remove the link from the Multilink by changing its encapsulation from PPP to HDLC, or by removing the encapsulation, and then perform testing over the link that has the issue
How to classify the alarm types?
Critical: Example: device down, link completely down (all T1s or E1s are down)
Major: Example: out of a number of T1s, a few are down
Minor: Latency issues, packet drops: these occur due to errors over the link
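The three severity levels can be expressed as a simple rule. This is only a sketch of the classification described above; the function name classify_alarm and its parameters are hypothetical, and a real monitoring tool would have its own severity model.

```python
def classify_alarm(device_up, links_total, links_down, link_errors=False):
    """Return 'Critical', 'Major', 'Minor' or 'None' for a site.

    device_up   -- False if the device itself is down
    links_total -- number of T1/E1 links in the bundle
    links_down  -- how many of those links are down
    link_errors -- True if latency or packet drops are reported on the link
    """
    if not device_up or (links_total > 0 and links_down >= links_total):
        return "Critical"  # device down, or every T1/E1 is down
    if links_down > 0:
        return "Major"     # only some of the T1s/E1s are down
    if link_errors:
        return "Minor"     # link is up but showing latency or drops
    return "None"


if __name__ == "__main__":
    print(classify_alarm(device_up=True, links_total=4, links_down=1))
```

For a bundle of four T1s with one down, the sketch returns Major.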
How to check live traffic on interfaces ?
Command: sh int summ
Which tools are used to check bandwidth utilization?
PRTG, NFA (NetFlow Analyzer)
What are the error types that occur over the link & how to check/remove them?
Troubleshooting Switch Port & Interface Problems
Common errors are: CRC, overruns, runts, frame
The CRC error rate is 1.75915% (greater than 1 in a million packets), and the collision rate is less than 0.1%
This can indicate excessive noise or transmission problems. A high number of CRCs is usually the result of collisions or a station transmitting bad data.
- Bad fiber cable
- Dirty optics
Check cables to determine whether any are damaged. If 100BaseTX is being used, ensure Category 5 cabling is being used and not another type, such as Category 3
Input errors: Includes runts, giants, no buffer, CRC, frame, overrun, and ignored counts. Other input-related errors can also cause the input error count to be increased, and some datagrams may have more than one error; therefore, this sum may not balance with the sum of enumerated input error counts
Frame: Shows the number of packets received incorrectly having a CRC error and a noninteger number of octets. On a LAN, this is usually the result of collisions or a malfunctioning Ethernet device.
Overrun: Shows the number of times that the receiver hardware was incapable of handing received data to a hardware buffer because the input rate exceeded the receiver's capability to handle the data
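The CRC error rate mentioned above is simply the CRC counter divided by the input packet counter. The sketch below shows that arithmetic and the "1 in a million packets" comparison; the counter values are invented so that the result matches the 1.75915% figure quoted earlier, and the function names are hypothetical.

```python
ONE_IN_A_MILLION = 1e-6


def crc_error_rate(crc_errors, input_packets):
    """Fraction of received packets that had a CRC error."""
    if input_packets == 0:
        return 0.0
    return crc_errors / input_packets


def crc_rate_is_excessive(crc_errors, input_packets, threshold=ONE_IN_A_MILLION):
    """True when the CRC error rate is above the threshold."""
    return crc_error_rate(crc_errors, input_packets) > threshold


if __name__ == "__main__":
    # Hypothetical counters as they might be read from show interfaces.
    crc, packets = 175915, 10000000
    print("CRC error rate: %.5f%%" % (crc_error_rate(crc, packets) * 100))
    print("Excessive:", crc_rate_is_excessive(crc, packets))
```

Because the counters are cumulative, it is more telling to take two readings some minutes apart and apply the same calculation to the differences.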
What is Loop testing ?
Sending a signal from the source and receiving the same signal back at the source after it is looped at the destination
What is Hard Loop & Soft Loop ?
What is Intrusive & Non-Intrusive Testing ?
Intrusive Testing: If a Multilink, consisting of a number of T1s or E1s, is completely down, intrusive testing is performed
Non-Intrusive Testing: If only a particular T1 or E1 in the Multilink is down, the affected T1 or E1 is removed from the Multilink and testing is performed on it alone
What are the common BGP commands for initial troubleshooting?
sh ip bgp nei
sh ip bgp summ: check whether the neighbor is learning prefixes. If not, check the state in which the neighborship is stuck. The first three states (Idle, Connect, Active) relate to setting up the TCP connection. The next three states (OpenSent, OpenConfirm, Established) relate to forming the complete BGP neighborship
If prefixes are not learned, there can be a number of reasons: a route map with an access list may be filtering prefixes, or the neighborship may be stuck in one of the states
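Reading the last column of sh ip bgp summ can be sketched in code: a number there means the session is Established and shows the count of prefixes received, while a state name means the session is stuck. The sample output below is invented for illustration and the column layout is assumed to follow the usual IOS format; parse_bgp_summary and session_phase are hypothetical helper names.

```python
# Hypothetical sample of 'sh ip bgp summ' neighbor table (illustration only).
SAMPLE = """\
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.0.0.2        4 65001    1200    1180       45    0    0 1d02h          12
10.0.0.6        4 65002       0       0        1    0    0 never    Active
10.0.0.10       4 65003       1       1        1    0    0 00:00:05 OpenSent
"""

TCP_STATES = ("Idle", "Connect", "Active")    # TCP connection being set up
BGP_STATES = ("OpenSent", "OpenConfirm")      # TCP is up, BGP OPEN exchange pending


def parse_bgp_summary(output):
    """Return {neighbor: prefix count (int) or state name (str)}."""
    neighbors = {}
    in_table = False
    for line in output.splitlines():
        if line.startswith("Neighbor"):
            in_table = True
            continue
        fields = line.split(None, 9)  # the 10th field keeps e.g. "Idle (Admin)" whole
        if not in_table or len(fields) < 10:
            continue
        last = fields[9].strip()
        neighbors[fields[0]] = int(last) if last.isdigit() else last
    return neighbors


def session_phase(value):
    """Tell which phase a neighbor is in: 'tcp', 'bgp', 'established' or 'unknown'."""
    if isinstance(value, int):
        return "established"
    state = value.split()[0]
    if state in TCP_STATES:
        return "tcp"
    if state in BGP_STATES:
        return "bgp"
    return "unknown"


if __name__ == "__main__":
    for neighbor, value in parse_bgp_summary(SAMPLE).items():
        print(neighbor, value, session_phase(value))
```

A neighbor in the tcp phase points to reachability or filtering of TCP port 179, while an established neighbor with 0 prefixes points to route maps or access lists.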
What is the process if a customer requests a new leased line circuit?
How to confirm to the customer that the link provided has exactly the bandwidth they requested?
One option is third-party software that pushes as much traffic as possible over the link to check its capacity (bandwidth)
Bandwidth is also capped at the PE device when the link is provisioned for the customer, which confirms the bandwidth of the link
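Whatever tool pushes the traffic, the check comes down to converting bytes transferred over a time window into a bit rate and comparing it with the ordered bandwidth. A minimal sketch of that arithmetic follows; the 10 percent tolerance for protocol overhead is an assumption, as are the function names.

```python
def measured_mbps(bytes_transferred, seconds):
    """Convert a byte count over a time window into megabits per second."""
    return bytes_transferred * 8 / seconds / 1000000


def matches_requested(measured, requested, tolerance=0.10):
    """True if the measured rate reaches the requested rate within the tolerance.

    The default tolerance of 10 percent is an assumed allowance for overhead.
    """
    return measured >= requested * (1 - tolerance)


if __name__ == "__main__":
    # Hypothetical test: 15,000,000 bytes pushed over the link in 60 seconds.
    rate = measured_mbps(15000000, 60)
    print("Measured: %.2f Mbps" % rate)
    print("Meets a 2 Mbps order:", matches_requested(rate, 2.0))
```

With these example figures the measured rate is 2.00 Mbps, which meets a 2 Mbps order.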
What are the common SLAs for different priority issues?
There are two types of SLA (Service Level Agreement):
- Response SLA: time to respond to the incident
- Resolution SLA: time to resolve the incident
Priority One Incident: Response (15 minutes), Resolution (2 hrs)
Priority Two Incident: Response (30 minutes), Resolution (4 hrs)
Priority Three Incident: Response (1 hr), Resolution (8 hrs)
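Given the figures above, the response and resolution deadlines for a ticket follow directly from the time it was opened. This is a small sketch using only the three priorities listed; the names SLA and sla_deadlines are illustrative, and real SLA clocks often pause outside business hours, which is not modeled here.

```python
from datetime import datetime, timedelta

# priority -> (response SLA, resolution SLA), taken from the list above
SLA = {
    1: (timedelta(minutes=15), timedelta(hours=2)),
    2: (timedelta(minutes=30), timedelta(hours=4)),
    3: (timedelta(hours=1), timedelta(hours=8)),
}


def sla_deadlines(priority, opened_at):
    """Return (respond_by, resolve_by) for a ticket opened at `opened_at`."""
    response, resolution = SLA[priority]
    return opened_at + response, opened_at + resolution


if __name__ == "__main__":
    opened = datetime(2024, 1, 1, 9, 0)  # hypothetical ticket open time
    respond_by, resolve_by = sla_deadlines(1, opened)
    print("Respond by:", respond_by)
    print("Resolve by:", resolve_by)
```

For a Priority One ticket opened at 09:00, the response is due by 09:15 and the resolution by 11:00.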
What is the process to handle Priority One issues?
The ticket is responded to within the SLA (Example: 15 minutes, Response SLA)
A bridge is opened and the Operation Manager, Incident Manager, Vendor Engineer (if required), TAC Engineer (if required) and an engineer from our own team are brought onto it.
An initiation mail is drafted to the Operation Manager, Incident Manager, Vendor Engineer (if required), TAC Engineer (if required) and the engineer from our own team.
Updates are sent over mail at a regular interval (generally, 30 minutes)
After resolution, an RCA (Root Cause Analysis) is prepared to find out why the issue occurred
Which reports do you generally work on?
Utilization reports using tools like NFA (NetFlow Analyzer), PRTG, e-Health. Generally, we enable monitoring on the WAN interface.
What is the last mile & what are its issues?
The last mile, as the name suggests, is the end connectivity provided to the customer
It includes the CSU/DSU, the converter (if one exists) and the end networking device
An issue is confirmed by loop testing in the above-mentioned portion
An alarm is received that a device is not accessible. What are the possible reasons & troubleshooting steps?
The device is up but unable to respond to the SNMP server due to high utilization on the device
Check with the command: sh processes cpu history
Confirm whether there is a power issue. Check if there is an out-of-band option to get access to the device. If you are able to access the device using out of band, there is no power issue
Out of band is a separate line of very low bandwidth (Example: 256 kbps), apart from the actual MPLS link, used only to get access to the device. The out-of-band line cannot handle actual traffic.
Check if all VTY sessions are in use. This also prevents us from getting access to the device. Look for an alternate way to access the device.
Example: Avocent console. This is used to take direct console access of the device. Using this option, simply log in to the device & clear the extra VTY line connections
If nothing has worked so far, the MPLS link connecting to the site may be down. Open a ticket with the vendor to confirm the issue
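The checks above can be read as a short decision sequence. The sketch below is one possible ordering, not a definitive procedure: it assumes the engineer has already gathered the three observations passed in, and the function name next_action and its return strings are invented for illustration.

```python
def next_action(oob_reachable, cpu_high=False, vty_exhausted=False):
    """Suggest the next step for a 'device not accessible' alarm.

    oob_reachable -- True if the device answers over out of band or console
    cpu_high      -- True if sh processes cpu history shows high utilization
    vty_exhausted -- True if all VTY lines are occupied
    """
    if not oob_reachable:
        # No access even out of band: the device may be powered off.
        return "Suspect a power issue and confirm power at the site"
    # Out-of-band access works, so the device has power.
    if cpu_high:
        return "Investigate high CPU; the device may be too busy to answer SNMP"
    if vty_exhausted:
        return "Clear the extra VTY sessions from the console"
    # Device is healthy when reached out of band, so suspect the path to it.
    return "MPLS link may be down; open a ticket with the vendor"


if __name__ == "__main__":
    print(next_action(oob_reachable=True, vty_exhausted=True))
```

With out-of-band access working and all VTY lines occupied, the sketch suggests clearing the extra sessions from the console.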