BASIC INTERVIEW QUESTIONS AND ANSWERS: ISP


How do you check that the neighborship between PE and CE is working?

Reference Link

http://www.cisco.com/en/US/docs/internetworking/troubleshooting/guide/tr1915.html

 

Points to check

Which protocol is running between PE and CE: use the command sh ip proto summ (show ip protocols summary). It lists all the routing protocols running on the router.

 

Check whether the neighborship is up

 

If the neighborship is fine, check whether routes are flapping

 

If routes are flapping while the neighborship is fine, the issue may lie further along the path from which the routes are learned

Example: R1 —— R2 ——- R3 ———R4——— network 192.168.10.0/23

We are checking routes at R1. The neighborship between R1 and R2 is fine, but R3 reboots, so the route to network 192.168.10.0/23 learned at R1 flaps.

 

If BGP is running between them, check whether the CE is learning prefixes with the command sh ip bgp summ (show ip bgp summary)

 

Check if input errors are increasing

Possible Problem

Input rate exceeds the capacity of the router, or input queues exceed the size of output queues.

Note: Input drop problems are typically seen when traffic is being routed between faster interfaces (such as Ethernet, Token Ring, and FDDI) and serial interfaces. When traffic is light, there is no problem. As traffic rates increase, backups start occurring. Routers drop packets during these congested periods.

Solution

  1. Increase the output queue size on common destination interfaces for the interface that is dropping packets. Use the hold-queue number out interface configuration command. Increase these queues by small increments (for instance, 25 percent) until you no longer see drops in the show interfaces output. The default output hold queue limit is 100 packets.
  2. Reduce the input queue size, using the hold-queue number in interface configuration command, to force input drops to become output drops. Output drops have less impact on the performance of the router than do input drops. The default input hold queue is 75 packets.
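The 25 percent increments mentioned in the solution can be worked out ahead of time. The following is only a rough sketch in Python: the function names are hypothetical, rounding up to a whole packet is an assumption, and the starting value of 100 is the default output hold queue quoted above.

```python
import math
from typing import List


def next_hold_queue(current: int, step: float = 0.25) -> int:
    """Return the queue size after one increment, rounded up to a whole packet."""
    return math.ceil(current * (1 + step))


def hold_queue_steps(start: int, count: int) -> List[int]:
    """List the next `count` queue sizes, each 25 percent above the previous one."""
    sizes = []
    size = start
    for _ in range(count):
        size = next_hold_queue(size)
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    # Starting from the default output hold queue of 100 packets.
    print(hold_queue_steps(100, 3))  # [125, 157, 197]
```

Each value would be tried in turn with the hold-queue number out command, stopping as soon as the drops disappear from the show interfaces output.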

 

Five problem states

Serial x is down, line protocol is down

Serial x is up, line protocol is down

Serial x is up, line protocol is up (looped)

Serial x is up, line protocol is down (disabled)

Serial x is administratively down, line protocol is down
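When many interfaces have to be checked, the status line can be sorted into these states automatically. Below is a minimal sketch in Python, assuming the status line has exactly the wording listed above; the function name, the returned labels, and the sample interface name are illustrative only.

```python
import re

# Matches the first line of "show interfaces serial" output as worded above.
STATUS_RE = re.compile(
    r"^(?P<ifname>\S+) is (?P<line>administratively down|up|down), "
    r"line protocol is (?P<proto>up|down)(?: \((?P<flag>looped|disabled)\))?"
)


def classify_status(status_line: str) -> str:
    """Map an interface status line to one of the five problem states."""
    match = STATUS_RE.match(status_line.strip())
    if match is None:
        return "unrecognized"
    line, proto, flag = match.group("line"), match.group("proto"), match.group("flag")
    if line == "administratively down":
        return "administratively down"
    if line == "down":
        return "interface down"
    if flag == "looped":
        return "looped"
    if flag == "disabled":
        return "disabled"
    if proto == "down":
        return "line protocol down"
    return "healthy"


if __name__ == "__main__":
    print(classify_status("Serial0 is up, line protocol is up (looped)"))  # looped
```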

 

Check whether the link that is showing errors is part of a Multilink bundle. If it is, remove it from the bundle by changing the encapsulation from PPP to HDLC (or by using no encapsulation), and then test the affected link on its own

 

 

How do you classify alarm types?

Critical: Example: device down, or link completely down (all T1s or E1s are down)

Major: Example: out of a number of T1s, a few T1s are down

Minor: Latency issues, packet drops. These occur due to errors over the link
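The three severities above can be expressed as a simple rule on a bundle of T1s or E1s. This is only an illustrative sketch in Python: the function name and parameters are hypothetical, and the "Clear" result for a healthy bundle is an assumption not stated above.

```python
def classify_bundle_alarm(total_links: int, down_links: int, errors_seen: bool = False) -> str:
    """Return the alarm severity for a bundle of T1/E1 links."""
    if total_links <= 0 or not 0 <= down_links <= total_links:
        raise ValueError("down_links must be between 0 and total_links")
    if down_links == total_links:
        return "Critical"  # link completely down
    if down_links > 0:
        return "Major"     # only some links in the bundle are down
    if errors_seen:
        return "Minor"     # latency or packet drops caused by link errors
    return "Clear"         # assumption: no alarm when nothing is wrong


if __name__ == "__main__":
    print(classify_bundle_alarm(total_links=4, down_links=1))  # Major
```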

 

How do you check live traffic on interfaces?

Command: sh int summ (show interfaces summary)

Which tools are used to check bandwidth utilization?

PRTG, NFA (NetFlow Analyzer)

 

What error types occur over a link, and how do you check and remove them?

Reference Link

Troubleshooting Ethernet

http://www.cisco.com/en/US/docs/internetworking/troubleshooting/guide/tr1904.html

Troubleshooting Switch Port & Interface Problems

http://www.cisco.com/c/en/us/support/docs/switches/catalyst-6500-series-switches/12027-53.html#copper

Common errors are: CRC, overruns, runts, frame

 

CRC Errors

Example

The CRC error rate is 1.75915% (greater than 1 in a million packets), and the collision rate is less than 0.1%

This can indicate excessive noise or transmission problems. A high number of CRCs is usually the result of collisions or a station transmitting bad data. Other possible causes:

  • Bad fiber cable
  • Dirty optics

Solution

Check cables to determine whether any are damaged. If 100BaseTX is being used, ensure that Category 5 cabling is in use and not another type, such as Category 3.
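The percentage quoted in the CRC example comes from dividing the CRC counter by the packets input counter in the show interfaces output. A minimal sketch in Python follows; the function names are hypothetical, and the sample counters are made-up values chosen only to reproduce the 1.75915% figure.

```python
# One error in a million packets, expressed as a percentage.
ONE_IN_A_MILLION_PERCENT = 100.0 / 1_000_000


def error_rate_percent(errors: int, packets: int) -> float:
    """Return errors as a percentage of packets received."""
    if packets <= 0:
        raise ValueError("packets must be positive")
    return 100.0 * errors / packets


def crc_rate_is_excessive(crc_errors: int, packets_input: int) -> bool:
    """True when the CRC rate is worse than one error per million packets."""
    return error_rate_percent(crc_errors, packets_input) > ONE_IN_A_MILLION_PERCENT


if __name__ == "__main__":
    # Hypothetical counters: 175,915 CRC errors in 10,000,000 packets input.
    rate = error_rate_percent(175_915, 10_000_000)
    print(f"{rate:.5f}%", crc_rate_is_excessive(175_915, 10_000_000))
```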

INPUT ERRORS

Includes runts, giants, no buffer, CRC, frame, overrun, and ignored counts. Other input-related errors can also cause the input error count to be increased, and some datagrams may have more than one error; therefore, this sum may not balance with the sum of enumerated input error counts

FRAME

Shows the number of packets received incorrectly, having a CRC error and a noninteger number of octets. On a LAN, this is usually the result of collisions or a malfunctioning Ethernet device.

OVERRUN

Shows the number of times that the receiver hardware was incapable of handing received data to a hardware buffer because the input rate exceeded the receiver’s capability to handle the data

What is Loop testing?

Sending a signal from a source and receiving the same signal back at the source from the destination

What is Hard Loop & Soft Loop ?

What is Intrusive & Non-Intrusive Testing?

Ans.

Intrusive Testing: If a Multilink bundle, consisting of a number of T1s or E1s, is completely down, intrusive testing is performed on it

Non-Intrusive Testing: If only a particular T1 or E1 in the Multilink bundle is down, the affected T1 or E1 is removed from the bundle and testing is performed on that T1 or E1 only

 

What are the common BGP commands for initial troubleshooting?

Ans.

sh ip bgp nei (show ip bgp neighbors)

 

sh ip bgp summ (show ip bgp summary): check whether the neighbor is learning prefixes and, if not, the state in which the neighborship is stuck. The first three states (Idle, Connect, Active) cover TCP session setup. The next three states (OpenSent, OpenConfirm, Established) cover the BGP exchange that completes the neighborship

If prefixes are not learned, there can be a number of reasons: a route map with an access list may be filtering prefixes, or the neighborship may be stuck in one of the earlier states
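In the sh ip bgp summ output, the last column (State/PfxRcd) shows a prefix count once the session is Established and a state name otherwise. The sketch below, in Python, reads one neighbor row on that basis. The function name is hypothetical, the sample rows are made up, and the standard ten-column IPv4 layout is assumed.

```python
from typing import Optional, Tuple


def neighbor_status(row: str) -> Tuple[str, str, Optional[int]]:
    """Return (neighbor, state, prefixes received) for one summary row.

    Assumed columns: Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
    """
    fields = row.split()
    if len(fields) < 10:
        raise ValueError("not a neighbor row")
    neighbor = fields[0]
    # Join the tail so two-word states such as "Idle (Admin)" stay intact.
    state_or_count = " ".join(fields[9:])
    if state_or_count.isdigit():
        return neighbor, "Established", int(state_or_count)
    return neighbor, state_or_count, None


if __name__ == "__main__":
    # Hypothetical rows for illustration only.
    print(neighbor_status("10.1.1.2 4 65001 120 118 45 0 0 01:52:10 12"))
    print(neighbor_status("10.1.1.6 4 65002 0 0 0 0 0 never Active"))
```

A neighbor that is Established with a count of 0 points toward filtering (route map or access list) and not toward a session problem.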

 

What is the process when a new customer requests a new leased line circuit?

 

How do you confirm to the customer that the link provided has exactly the bandwidth the customer requested?

Ans.

One option is to use third-party software that pushes as much traffic as possible over the link, to check the capacity (bandwidth) of the link

 

In addition, bandwidth is capped at the PE device when the link is provisioned for the customer, which confirms the bandwidth of the link

 

What are the common SLAs for different priority issues?

Ans.

There are two types of SLA (Service Level Agreement):

  1. Response SLA: time allowed to respond to the incident
  2. Resolution SLA: time allowed to resolve the incident

 

Priority One Incident: Response (15 minutes), Resolution (2 hrs)

Priority Two Incident: Response (30 minutes), Resolution (4 hrs)

Priority Three Incident: Response (1 hr), Resolution (8 hrs)
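Given the time a ticket was opened, the two deadlines follow directly from the figures above. A minimal sketch in Python; the function name and the sample timestamp are hypothetical, and the timers are the example values listed here, which differ from one contract to another.

```python
from datetime import datetime, timedelta
from typing import Tuple

# priority -> (response SLA, resolution SLA), using the example values above
SLA = {
    1: (timedelta(minutes=15), timedelta(hours=2)),
    2: (timedelta(minutes=30), timedelta(hours=4)),
    3: (timedelta(hours=1), timedelta(hours=8)),
}


def sla_deadlines(priority: int, opened_at: datetime) -> Tuple[datetime, datetime]:
    """Return (respond by, resolve by) for a ticket opened at `opened_at`."""
    response, resolution = SLA[priority]
    return opened_at + response, opened_at + resolution


if __name__ == "__main__":
    opened = datetime(2024, 1, 1, 9, 0)  # made-up ticket time
    respond_by, resolve_by = sla_deadlines(1, opened)
    print(respond_by, resolve_by)
```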

 

What is the process for handling Priority One issues?

Ans.

The ticket is responded to within the SLA (Example: 15 minutes, Response SLA)

A bridge is opened, bringing in the Operation Manager, Incident Manager, Vendor Engineer (if required), TAC Engineer (if required), and an engineer from your own team.

An initiation mail is drafted to the Operation Manager, Incident Manager, Vendor Engineer (if required), TAC Engineer (if required), and the engineer from your own team.

Updates are sent over mail at a regular interval (generally, every 30 minutes)

After resolution, an RCA (Root Cause Analysis) is prepared to establish why the issue occurred

 

Which reports do you generally work on?

Ans.

Utilization reports using tools like NFA (NetFlow Analyzer), PRTG, and e-Health. Generally, we enable monitoring on the WAN interface.

 

What is the last mile, and what are its issues?

Ans.

Reference

http://searchnetworking.techtarget.com/definition/last-mile-technology

 

The last mile, as the name suggests, is the end connectivity provided to the customer

It includes the CSU/DSU, the converter (if one exists), and the end networking device

An issue is confirmed by loop testing in this portion

 

An alarm is received saying that a device is not accessible. What are the possible reasons and troubleshooting steps?

Ans.

Possible reasons

The device is up but unable to respond to the SNMP server, due to high utilization on the device

Check with the command: sh processes cpu history

 

Confirm whether there is a power issue. Check if there is an out-of-band option to get access to the device. If you are able to access the device out of band, there is no power issue

Out of band is a separate line of very low bandwidth (Example: 256 KB), apart from the actual MPLS link, used only to get access to the device. The out-of-band line cannot handle actual traffic.

 

Check whether too many VTY sessions are open. If all the VTY lines are in use, we cannot get access to the device this way either. Look for an alternate option to get access to the device.

Example: Avocent console. This is used to take the direct console of the device. Using this option, log in to the device and clear the extra VTY line connections

 

If nothing has worked so far

The MPLS link connecting to the site may be down. Open a ticket with the vendor to confirm the issue

 
