BGP troubleshooting: Basic
Border Gateway Protocol (BGP) is without doubt the most complex IP routing protocol currently deployed in the Internet. Its complexity is primarily due to its focus on security and routing policies – BGP is used to exchange cooperative information (Internet routes) between otherwise competing entities (service providers) and has to be able to implement whatever has been agreed upon in the inter-provider peering agreements. (These agreements often have little to do with technically optimum solutions.)
However, a structured approach to BGP troubleshooting, as illustrated in this and the next section, can quickly lead you from initial problem diagnosis to the solution. Here we focus on a simple scenario with a single BGP-speaking router in your network (see the following diagram). Similar designs are commonly used by multi-homed customers and small Internet service providers (ISPs) that do not offer BGP connectivity to their customers.
Is it a BGP problem?
Before jumping into BGP troubleshooting, you have to identify the source of the connectivity problem you’re debugging (usually you suspect that BGP might be involved if one of your customers reports limited or no Internet connectivity beyond your network). Perform a traceroute from a workstation on the problematic LAN; if the trace reaches the first BGP-speaking router (or, even better, gets beyond the edge of your network), you’re probably dealing with a BGP issue. Otherwise, check whether the BGP-speaking router advertises a default route into your network (without a default route, other routers in your network cannot reach Internet destinations).
If you don’t have access to a LAN-attached workstation, you can perform the traceroute from the customer premises router, but you have to ensure that the source IP address used in the traceroute packets is the router’s LAN address.
Troubleshooting BGP adjacencies
BGP has to establish a TCP session between adjacent BGP routers before they can exchange routes. The first check is thus the status of the BGP sessions between the routers.
The BGP neighbors are configured manually, and the two most probable configuration errors are:
- Neighbor IP address mismatch: The destination IP address configured on one BGP neighbor has to match the source IP address (or the IP address of the directly connected interface) configured on the other.
- AS number mismatch: The neighbor AS number configured on one side of the BGP session has to match the actual BGP AS number used by the neighbor.
You could also have a problem with packet filters deployed on the BGP-speaking router. These filters have to allow packets to and from TCP port 179.
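The two configuration checks above can be sketched as a small comparison of what each side has configured. This is only an illustration: the `BgpSide` model, its field names, and the addresses and AS numbers used with it are hypothetical and do not correspond to any vendor's configuration format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BgpSide:
    """One router's view of a single BGP session (hypothetical model)."""
    local_as: int      # AS number this router actually uses
    source_ip: str     # address it sources the TCP session from
    neighbor_ip: str   # neighbor address configured on this router
    neighbor_as: int   # neighbor AS number configured on this router


def session_mismatches(a: BgpSide, b: BgpSide) -> List[str]:
    """Return a description of every address or AS mismatch between a and b."""
    problems = []
    if a.neighbor_ip != b.source_ip:
        problems.append("A's neighbor address does not match B's source address")
    if b.neighbor_ip != a.source_ip:
        problems.append("B's neighbor address does not match A's source address")
    if a.neighbor_as != b.local_as:
        problems.append("A's neighbor AS does not match B's actual AS")
    if b.neighbor_as != a.local_as:
        problems.append("B's neighbor AS does not match A's actual AS")
    return problems
```

An empty result means the two sides agree on addresses and AS numbers; packet filters blocking TCP port 179 would still have to be checked separately.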
Troubleshooting route propagation
If your users want to receive traffic from the Internet, the IP prefix assigned to your network must be visible throughout the Internet. To get there, three steps are needed:
- Your BGP router must insert your IP prefix into its BGP table.
- The IP prefix must be advertised to its BGP neighbors.
- The IP prefix must be propagated throughout the Internet.
Is the route inserted into BGP?
Most routing protocols automatically insert directly connected IP subnets into their routing tables (or databases). Owing to security requirements, BGP is an exception; it will originate an IP prefix only if it’s manually configured to do so (for example, Cisco routers use the network statement to configure advertised IP prefixes). Another option is route redistribution, which is highly discouraged in the Internet environment.
Furthermore, to avoid attracting unroutable traffic, BGP will announce a configured IP prefix only if there’s a matching route in the IP routing table. You could generate the matching IP route through route summarization, but it’s usually best to configure a static route pointing to a null interface (or its equivalent).
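The origination rule can be expressed as a two-part condition: the prefix must be configured for origination, and a matching route must exist in the IP routing table. The sketch below assumes that "matching" means an exact match on prefix and mask; the function name and the list-of-strings inputs are illustrative only.

```python
import ipaddress


def will_originate(prefix, configured_prefixes, routing_table):
    """Return True if BGP would originate the prefix under the assumed rule.

    Assumption: the prefix must be manually configured AND an exactly
    matching route (same prefix, same mask) must be in the routing table.
    """
    net = ipaddress.ip_network(prefix)
    configured = {ipaddress.ip_network(p) for p in configured_prefixes}
    table = {ipaddress.ip_network(p) for p in routing_table}
    return net in configured and net in table
```

Under this model, a static route to a null interface for the exact prefix is enough to satisfy the second condition, whereas a more specific route (for example a /25 inside a configured /24) is not.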
To check whether your IP prefix is in your BGP routing table, use a BGP show command (for example, show ip bgp prefix mask on a Cisco router).
Is the route advertised to your neighbors?
By default, all IP prefixes residing in the BGP table are announced to all BGP neighbors. Owing to security and routing policy requirements, the default behavior is usually modified with a set of output and input filters. If you have applied output filters toward your BGP neighbors, you have to check whether these filters allow your IP prefix to be propagated to the external BGP neighbors. The command to display routes advertised to a BGP neighbor on a Cisco router is show ip bgp neighbors ip-address advertised-routes.
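To reason about whether an output filter lets your prefix through, it can help to model the filter as an ordered list of rules. The sketch below is a deliberately simplified model, not vendor syntax: it assumes exact-match rules, first match wins, and an implicit deny when nothing matches.

```python
import ipaddress


def filter_permits(prefix, rules):
    """Evaluate an ordered list of (action, prefix) rules against a prefix.

    Assumptions: rules match on exact prefix and mask, the first matching
    rule decides, and a prefix that matches no rule is denied.
    """
    net = ipaddress.ip_network(prefix)
    for action, rule_prefix in rules:
        if ipaddress.ip_network(rule_prefix) == net:
            return action == "permit"
    return False  # implicit deny at the end of the list
```

For example, with the rules `[("deny", "10.0.0.0/8"), ("permit", "198.51.100.0/24")]` the prefix 198.51.100.0/24 is announced, while any prefix not listed is silently dropped.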
Is the route visible throughout the Internet?
Even if you’ve successfully announced your IP prefix to your BGP neighbors, it might still not be propagated throughout the Internet. It’s hard to figure out exactly what’s propagated beyond the boundaries of your network; the tools that can help you are called BGP looking glasses. Using these tools, you can inspect BGP tables at various points throughout the Internet and check whether your IP prefix has made it to those destinations.
There are a few factors that could cause your IP prefix to be blocked somewhere in the Internet. The most common one is BGP route flap dampening: if an IP prefix flaps (disappears and reappears) too often in a short period of time, for example because you repeatedly clear your BGP sessions or change your BGP configuration, the prefix gets blocked for an extended period of time (by default, up to an hour). If your IP prefix is dampened, there’s nothing you can do except wait it out. You could also have an invalid (or missing) entry in IP routing registries, or there may be inbound filters at one of the upstream ISPs. In all these cases, it’s best if your upstream ISP can help you resolve the problem (which is, at this point, beyond the scope of technical BGP troubleshooting).
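Route flap dampening is usually described as a penalty that grows with every flap and decays exponentially over time. The sketch below estimates how long a dampened prefix stays suppressed. All numeric values (1000 penalty points per flap, a 15-minute half-life, a reuse limit of 750 and a 60-minute cap) are assumed illustrative defaults; the actual values depend on the platform and on each ISP's configuration.

```python
import math

# Assumed illustrative parameters; real deployments may differ.
PENALTY_PER_FLAP = 1000.0
HALF_LIFE_MIN = 15.0
REUSE_LIMIT = 750.0
MAX_SUPPRESS_MIN = 60.0


def decayed_penalty(penalty, minutes, half_life=HALF_LIFE_MIN):
    """Penalty remaining after the given number of minutes of stability."""
    return penalty * 0.5 ** (minutes / half_life)


def minutes_until_reuse(penalty, reuse_limit=REUSE_LIMIT,
                        half_life=HALF_LIFE_MIN,
                        max_suppress=MAX_SUPPRESS_MIN):
    """Minutes until the penalty decays to the reuse limit, capped at the maximum."""
    if penalty <= reuse_limit:
        return 0.0
    wait = half_life * math.log2(penalty / reuse_limit)
    return min(wait, max_suppress)
```

With these assumed values, a prefix carrying three flaps' worth of penalty (3000 points) would need two half-lives, or 30 minutes, to decay to the reuse limit.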
BGP troubleshooting: Advanced
In the previous section of this e-guide we addressed some basic BGP troubleshooting skills:
- How to identify whether a routing problem is a BGP problem,
- How to troubleshoot BGP sessions,
- How to troubleshoot IP route origination and propagation.
Now let’s focus on a more advanced scenario: transit Internet service provider (ISP) networks (see the next diagram).
NOTE: Before reading this section, make sure you’ve read sections one and two to become familiar with basic Border Gateway Protocol technology as well as simple BGP troubleshooting.
To establish end-to-end connectivity across a service provider network, the ISP has to receive customers’ IP prefixes via BGP and announce them to other ISPs. The same process has to happen in reverse direction (or at least the default route has to be announced to the customer). The network-wide BGP troubleshooting is thus composed of three steps:
- Have we received the prefix?
- Is the prefix propagated across our network?
- Is the prefix sent to external BGP neighbors at the other edge of the network?
Have we received the prefix?
Troubleshooting inbound BGP problems is the toughest part of BGP troubleshooting you’ll encounter. There are two potential reasons that an IP prefix is not in your BGP table as you would expect it to be:
- The neighbor is not sending the prefix.
- Your inbound filters are blocking the prefix.
The only tool that can help you identify the problem is the debugging facility on your edge router (as you normally don’t have access to the other BGP neighbor). When doing BGP debugging, be aware that a BGP neighbor can send you several hundred thousand routes, so you have to ensure that the debugging output produced by the troubleshooting session does not overwhelm the router. Furthermore, the BGP prefixes are sent only when they change, not on a periodic basis (like RIP updates or OSPF LSA floods). Your debugging tool will thus not show you an IP prefix until it has actually changed (or you’ve cleared the BGP session with your neighbor).
Some BGP routers have the ability to store a separate copy of all routes sent by a neighbor into a parallel BGP table. (To enable this functionality on Cisco IOS, you have to configure soft-reconfiguration inbound for a BGP neighbor.) With the parallel per-neighbor table, you can pinpoint exactly what the neighbor has sent you (the contents of the parallel table) and which routes have passed your input filters (the contents of the main BGP table), but of course the parallel per-neighbor table consumes a large amount of memory.
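Once both tables are available, telling the two failure causes apart is a matter of comparing them. The sketch below assumes you have already extracted the prefixes from the per-neighbor table and from the main BGP table into two collections of strings; the function name and return messages are illustrative.

```python
def diagnose_missing_prefix(prefix, received_from_neighbor, main_bgp_table):
    """Classify why a prefix expected from one neighbor is or is not usable.

    Simplification: the main table is assumed to hold only routes learned
    from this one neighbor, so routes from other peers are ignored.
    """
    if prefix not in set(received_from_neighbor):
        return "neighbor is not sending the prefix"
    if prefix not in set(main_bgp_table):
        return "inbound filters are blocking the prefix"
    return "prefix received and accepted"
```

The same comparison done over all prefixes (the set difference between the two tables) lists everything your inbound filters have dropped.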
Is the prefix propagated across our network?
Even when an edge router receives an IP prefix via BGP, it may not be propagated to the other end of your network. To start with, internal BGP (BGP within a single autonomous system) requires a full mesh of BGP sessions among all BGP routers. As every router between every pair of edge routers has to run BGP (otherwise the traffic could be dropped inside your network), the number of BGP sessions could become excessively large. (The next diagram illustrates the BGP sessions needed in a small four-router network.)
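To see how quickly a full mesh grows, note that n routers need one session per pair of routers, that is n(n-1)/2 sessions. The short sketch below enumerates the pairs; the router names are placeholders.

```python
from itertools import combinations


def full_mesh_sessions(routers):
    """Return every pair of routers that needs an internal BGP session."""
    return list(combinations(routers, 2))


sessions = full_mesh_sessions(["R1", "R2", "R3", "R4"])
print(len(sessions))  # 6 sessions for four routers
```

The same calculation for 100 routers yields 4950 sessions, which is why route reflectors or confederations become attractive as the network grows.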
There are two tools (BGP route reflectors and BGP confederations) that can help you keep the number of BGP sessions to a sensible level, with BGP route reflectors being the most commonly used.
The BGP route reflector rules are quite simple:
- Whatever is received from a route-reflector client or an external BGP peer will be sent to every other BGP peer.
- Whatever is received from an internal BGP peer that is not a route-reflector client will be sent only to clients and external BGP peers.
With these rules in hand, you have to step through the graph of BGP sessions in your network, checking every BGP router on the way and ensuring that the route reflector rules are not violated (and that, using the rules, the BGP prefixes get from every edge router to all other routers).
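When stepping through the session graph, the two route reflector rules can be applied mechanically at each router. The sketch below models a single route reflector and returns the peers to which a route learned from a given sender would be passed on. The peer names and the three peer kinds are a simplified, hypothetical model.

```python
from typing import Dict, Set

CLIENT, NON_CLIENT, EXTERNAL = "client", "non-client", "external"


def advertise_to(sender: str, peers: Dict[str, str]) -> Set[str]:
    """Return the peers that receive a route learned from `sender`.

    `peers` maps each peer name to CLIENT, NON_CLIENT (an internal peer
    that is not a client) or EXTERNAL.
    """
    others = {name for name in peers if name != sender}
    if peers[sender] in (CLIENT, EXTERNAL):
        # Rule 1: routes from clients or external peers go to everyone else.
        return others
    # Rule 2: routes from internal non-clients go only to clients and external peers.
    return {name for name in others if peers[name] in (CLIENT, EXTERNAL)}
```

Running this for every router on the path from one edge to the other shows where a prefix stops, for example when two non-client internal peers are connected only through a router that is not configured as their route reflector.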
There is another common reason an IP prefix is not propagated across your network: The external subnets on the edge of your network are not advertised to your core routers.
The IP address of the next-hop router is not changed when an IP prefix is sent to an internal BGP neighbor. The IP next-hop of an external route is thus always the IP address of a router one hop beyond the edge of your autonomous system. The IP subnets connecting your edge routers to their external neighbors thus have to be inserted into your internal routing protocol (for example, OSPF or IS-IS), otherwise some internal BGP router will decide that the BGP next-hop is not reachable and ignore the IP prefix. (It will appear in the BGP table but will not be used or propagated to other BGP peers.)
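The next-hop check an internal BGP router performs can be approximated as a lookup of the next-hop address in the routes known from the internal routing protocol. The sketch below is a simplification under that assumption; the function name and sample addresses are illustrative.

```python
import ipaddress


def next_hop_reachable(next_hop, igp_routes):
    """Return True if any internal route covers the BGP next-hop address."""
    addr = ipaddress.ip_address(next_hop)
    return any(addr in ipaddress.ip_network(route) for route in igp_routes)
```

If the edge subnet (say 192.0.2.0/30) is missing from the internal routes, a next-hop of 192.0.2.2 fails this check and the BGP route stays in the table unused.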
Is the prefix sent to external neighbors?
As the last step in troubleshooting BGP route propagation, you have to check whether the IP prefixes transported across your network are announced to your external BGP peers.
Is the traffic traversing the network?
Even if your BGP route propagation works flawlessly, the IP packets may not be able to traverse your network. (Remember, we’re talking about pure IP networks here; things change a bit if you add MPLS to the mix.) The most common cause of a “black hole” in your network is a router in the transit path that does not run BGP and consequently has no idea how to route the received IP packet toward the destination network.
IP routing works hop by hop. Even though the ingress edge router knows exactly which egress edge router to use and how to get there, it cannot pass that information to the intermediate routers. All of them must therefore run BGP as well.
To identify a black hole in your network, perform a traceroute from your customer’s network to a destination in the Internet. The last router responding to the traceroute is one hop before the black hole.
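Reading a traceroute for this purpose comes down to finding the last hop that answered. The sketch below assumes the traceroute output has already been reduced to a list with one entry per hop, using `None` for hops that timed out; that input format is an assumption for illustration.

```python
def last_responding_hop(hops):
    """Return (hop_number, address) of the last hop that replied, or None.

    `hops` holds one entry per TTL: the responder's address, or None
    when no reply was received.
    """
    last = None
    for number, address in enumerate(hops, start=1):
        if address is not None:
            last = (number, address)
    return last
```

Keep in mind that a router may also stay silent because it filters or rate-limits traceroute replies, so the result is a starting point for inspecting the next router in the path rather than proof of a black hole.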
Even though all core routers in your network have to run BGP, the internal BGP sessions don’t have to follow the physical structure of the network. For example, you could have a few central routers acting as BGP route reflectors for all BGP routers in your network.