Auto-Discovered BGP Neighbors
The previous section demonstrated how to build an eBGP-based fabric underlay using point-to-point Layer 3 interfaces. That approach requires extensive IP address management and operational maintenance as the fabric grows. An alternative, more efficient approach is to use a BGP feature called BGP auto-discovery (also referred to as BGP unnumbered), in which a node leverages IPv6 Neighbor Discovery (ND) to learn the link-local IPv6 addresses of its directly attached neighbors and automatically peer with them. This is beneficial for several reasons:
It eliminates the need for IP address management of the underlay and enables plug-and-play insertion of new fabric nodes.
It allows for easier automation of the fabric underlay, since every fabric interface is configured identically, with no IP addressing required. BGP, unlike IGPs, is designed to peer with untrusted neighbors, hence the default requirement to specify a peer address, assign an ASN, and configure authentication for a BGP peering. In a data center, which is largely a trusted environment, BGP is used more like an IGP, which makes automating it much easier and reduces configuration complexity.
This section provides an implementation example of how to configure and deploy BGP auto-discovery, using packet captures for a deeper look at how the feature works. The topology shown in Figure 3-10 is used to demonstrate this feature.
Figure 3-10 Topology to implement BGP auto-discovered neighbors
BGP auto-discovery relies on the IPv6 Neighbor Discovery Protocol (NDP), which uses ICMPv6 messages to announce a node's link-local IPv6 address to its directly attached neighbors and to learn the neighbors' link-local IPv6 addresses from inbound ICMPv6 messages, filling the role that ARP plays in IPv4. More specifically, this is achieved using an ICMPv6 message called a Router Advertisement (RA), which is ICMPv6 type 134.
To enable BGP auto-discovery, the following steps are required:
1. Enable IPv6 on the fabric-facing point-to-point interfaces. The IPv4 family must be enabled as well if IPv4 traffic is expected on the interface. Even though the peering between neighbors uses IPv6, the interface can carry traffic for any address family. No IPv4 or IPv6 address needs to be configured on these interfaces.
2. Enable IPv6 Router Advertisements (protocols router-advertisement) on the fabric-facing interfaces (the default RA interval is 15 seconds).
3. Configure BGP to automatically discover peers using IPv6 ND: enable the underlay group for the IPv6 unicast address family, and use the dynamic-neighbor hierarchy to define neighbor discovery via IPv6 ND on the fabric-facing interfaces.
4. Configure BGP for the IPv4 unicast address family, with the extended-nexthop configuration option. This allows IPv4 routes to be advertised via BGP with an IPv6 next-hop, using a BGP capability defined in RFC 8950 (which obsoletes RFC 5549) called the Extended Next Hop Encoding capability. This capability is exchanged in the BGP OPEN message.
The configuration of spine1 is shown in Example 3-16 as a reference. For the spines, since each leaf is in a different ASN, the peer-as-list configuration option is used to specify a list of allowed peer ASNs with which a BGP peering can be established. It is important that this peer ASN list be carefully curated, since a peering request from any ASN outside of it will be rejected.
Example 3-16 BGP auto-discovery configuration on spine1
admin@spine1# show interfaces
ge-0/0/0 {
    unit 0 {
        family inet;
        family inet6;
    }
}
ge-0/0/1 {
    unit 0 {
        family inet;
        family inet6;
    }
}
ge-0/0/2 {
    unit 0 {
        family inet;
        family inet6;
    }
}

admin@spine1# show protocols router-advertisement
interface ge-0/0/0.0;
interface ge-0/0/1.0;
interface ge-0/0/2.0;

admin@spine1# show protocols bgp
group auto-underlay {
    family inet {
        unicast {
            extended-nexthop;
        }
    }
    family inet6 {
        unicast;
    }
    dynamic-neighbor underlay {
        peer-auto-discovery {
            family inet6 {
                ipv6-nd;
            }
            interface ge-0/0/0.0;
            interface ge-0/0/1.0;
            interface ge-0/0/2.0;
        }
    }
    peer-as-list leafs;
}
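The leaf configuration mirrors the spine configuration, except that the spines all share a single ASN (65500, consistent with the AS paths seen later in Example 3-20), so a leaf can use a plain peer-as statement instead of a peer ASN list. The following is a minimal sketch of leaf1's BGP configuration under that assumption, reusing the group name from Example 3-16 and leaf1's two fabric-facing interfaces (ge-0/0/0.0 and ge-0/0/1.0, per Example 3-17):

admin@leaf1# show protocols bgp
group auto-underlay {
    family inet {
        unicast {
            extended-nexthop;
        }
    }
    family inet6 {
        unicast;
    }
    dynamic-neighbor underlay {
        peer-auto-discovery {
            family inet6 {
                ipv6-nd;
            }
            interface ge-0/0/0.0;
            interface ge-0/0/1.0;
        }
    }
    peer-as 65500;
}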
Once the respective fabric interfaces are enabled with IPv6 RA, the fabric nodes discover each other’s link-local IPv6 addresses. For example, leaf1 has discovered spine1’s and spine2’s link-local IPv6 addresses (as well as the corresponding MAC addresses) over its directly attached interfaces, as shown in Example 3-17, using the show ipv6 neighbors operational mode command.
Example 3-17 IPv6 neighbors discovered using RA on leaf1
admin@leaf1> show ipv6 neighbors
IPv6 Address                 Linklayer Address  State      Exp  Rtr  Secure  Interface
fe80::e00:b3ff:fe09:1001     0c:00:b3:09:10:01  reachable  9    yes  no      ge-0/0/1.0
fe80::e00:ffff:fee3:3201     0c:00:ff:e3:32:01  reachable  14   yes  no      ge-0/0/0.0
Total entries: 2
This process of sending Router Advertisements can be seen in the packet capture shown in Figure 3-11, from the perspective of the link between leaf1 and spine1.
Figure 3-11 Packet capture of ICMPv6 Router Advertisement
Packet #4, highlighted in Figure 3-11, is an ICMPv6 Router Advertisement sent by spine1, while packet #5 is an ICMPv6 Router Advertisement sent by leaf1. These packets are sourced from the link-local IPv6 address of the egress interface and destined to the well-known all-nodes IPv6 multicast address, FF02::1. The link-local IPv6 address of leaf1's interface can be confirmed as shown in Example 3-18.
Example 3-18 IPv6 link-local address assigned to ge-0/0/0.0 on leaf1
admin@leaf1> show interfaces ge-0/0/0.0
  Logical interface ge-0/0/0.0 (Index 349) (SNMP ifIndex 540)
    Flags: Up SNMP-Traps 0x4004000 Encapsulation: ENET2
    Input packets : 847
    Output packets: 857
    Protocol inet, MTU: 1500
    Max nh cache: 100000, New hold nh limit: 100000, Curr nh cnt: 0, Curr new hold cnt: 0, NH drop cnt: 0
      Flags: Sendbcast-pkt-to-re, Is-Primary, 0x0
    Protocol inet6, MTU: 1500
    Max nh cache: 100000, New hold nh limit: 100000, Curr nh cnt: 1, Curr new hold cnt: 0, NH drop cnt: 0
      Flags: Is-Primary, 0x0
      Addresses, Flags: Is-Preferred 0x800
        Destination: fe80::/64, Local: fe80::e00:ecff:fe11:c601
    Protocol multiservice, MTU: Unlimited
      Flags: Is-Primary, 0x0
With the link-local IPv6 addresses discovered for a given link, a TCP session can be initiated to establish BGP peering between the fabric nodes. The entire communication is IPv6 only, including the initial TCP three-way handshake and all the BGP messages exchanged between the prospective neighbors, such as the BGP OPEN and the BGP UPDATE messages shown in Figure 3-11.
The entire handshake, as well as the instantiation of the BGP session, is shown in Figure 3-12 as a reference.
Figure 3-12 Packet capture of TCP three-way handshake using IPv6 link-local addresses
In the BGP OPEN message exchanged between spine1 and leaf1, the extended next-hop capability is advertised, confirming that both devices support IPv4 NLRI encoded with an IPv6 next-hop address, as shown in Figure 3-13.
Once all leafs and spines are configured in the same way, an eBGP peering is established between the fabric nodes, as shown in Example 3-19 from the perspective of spine1 and spine2.
Figure 3-13 Packet capture of BGP OPEN message from spine1 advertised with extended next-hop capability
Example 3-19 Summary of BGP peers on spine1 and spine2
admin@spine1> show bgp summary
Threading mode: BGP I/O
Default eBGP mode: advertise - accept, receive - accept
Groups: 1 Peers: 3 Down peers: 0
Auto-discovered peers: 3
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
inet.0
                       0          0          0          0          0          0
inet6.0
                       0          0          0          0          0          0
Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
fe80::e00:36ff:fe96:af01%ge-0/0/1.0 65422 207 205 0 0 1:31:38 Establ
  inet.0: 0/0/0/0
  inet6.0: 0/0/0/0
fe80::e00:bdff:fed8:c901%ge-0/0/2.0 65423 206 204 0 0 1:31:00 Establ
  inet.0: 0/0/0/0
  inet6.0: 0/0/0/0
fe80::e00:ecff:fe11:c601%ge-0/0/0.0 65421 275 273 0 0 2:02:23 Establ
  inet.0: 0/0/0/0
  inet6.0: 0/0/0/0

admin@spine2> show bgp summary
Threading mode: BGP I/O
Default eBGP mode: advertise - accept, receive - accept
Groups: 1 Peers: 3 Down peers: 0
Auto-discovered peers: 3
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
inet.0
                       0          0          0          0          0          0
inet6.0
                       0          0          0          0          0          0
Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
fe80::e00:11ff:fe86:9602%ge-0/0/1.0 65422 207 206 0 0 1:31:54 Establ
  inet.0: 0/0/0/0
  inet6.0: 0/0/0/0
fe80::e00:7dff:fe45:5902%ge-0/0/0.0 65421 211 209 0 0 1:33:18 Establ
  inet.0: 0/0/0/0
  inet6.0: 0/0/0/0
fe80::e00:95ff:feec:8502%ge-0/0/2.0 65423 206 205 0 0 1:31:16 Establ
  inet.0: 0/0/0/0
  inet6.0: 0/0/0/0
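Note that the peer addresses in Example 3-19 carry a zone identifier (the %ge-0/0/x.0 suffix), since a link-local address is unique only per link. Per-neighbor operational commands must therefore reference the interface-scoped form as well. For example, the following hypothetical invocation on leaf1 inspects its session to spine1, using spine1's link-local address as learned in Example 3-17 (output not shown):

admin@leaf1> show bgp neighbor fe80::e00:ffff:fee3:3201%ge-0/0/0.0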
The last piece of the puzzle is how IPv4 routes are advertised over this IPv6 BGP peering. Since the BGP group is configured with the extended next-hop option for the IPv4 address family, IPv4 routes can be advertised with an IPv6 next-hop address, as shown in Figure 3-14. In this packet capture, leaf1's loopback address, 192.0.2.11/32, is advertised with an IPv6 next-hop that matches leaf1's link-local address on that link.
Figure 3-14 Packet capture of leaf1’s IPv4 loopback address advertised with an IPv6 next-hop
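The configuration in Example 3-16 does not show how the loopback addresses are injected into BGP in the first place. One common approach is an export policy on the underlay group that accepts the direct loopback route. The following is a minimal sketch for leaf1; the policy name underlay-export is hypothetical, and the prefix is leaf1's loopback address:

admin@leaf1# show policy-options
policy-statement underlay-export {
    term loopback {
        from {
            protocol direct;
            route-filter 192.0.2.11/32 exact;
        }
        then accept;
    }
}

Applied at the group level with export underlay-export, this causes leaf1 to advertise its loopback to both spines over the auto-discovered sessions.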
Taking leaf1 as an example again, all remote leaf loopback addresses are now learned with IPv6 next-hop addresses, as shown in Example 3-20, which also confirms loopback-to-loopback reachability between the leafs.
Example 3-20 IPv4 loopback addresses learned with an IPv6 next-hop
admin@leaf1> show route table inet.0

inet.0: 3 destinations, 5 routes (3 active, 0 holddown, 0 hidden)
Limit/Threshold: 1048576/1048576 destinations
+ = Active Route, - = Last Active, * = Both

192.0.2.11/32      *[Direct/0] 1d 11:41:30
                    >  via lo0.0
192.0.2.12/32      *[BGP/170] 00:00:27, localpref 100
                      AS path: 65500 65422 I, validation-state: unverified
                    >  to fe80::e00:ffff:fee3:3201 via ge-0/0/0.0
                    [BGP/170] 00:00:27, localpref 100
                      AS path: 65500 65422 I, validation-state: unverified
                    >  to fe80::e00:b3ff:fe09:1001 via ge-0/0/1.0
192.0.2.13/32      *[BGP/170] 00:00:07, localpref 100
                      AS path: 65500 65423 I, validation-state: unverified
                    >  to fe80::e00:b3ff:fe09:1001 via ge-0/0/1.0
                    [BGP/170] 00:00:07, localpref 100
                      AS path: 65500 65423 I, validation-state: unverified
                    >  to fe80::e00:ffff:fee3:3201 via ge-0/0/0.0

admin@leaf1> ping 192.0.2.12 source 192.0.2.11
PING 192.0.2.12 (192.0.2.12): 56 data bytes
64 bytes from 192.0.2.12: icmp_seq=0 ttl=63 time=3.290 ms
64 bytes from 192.0.2.12: icmp_seq=1 ttl=63 time=2.319 ms
64 bytes from 192.0.2.12: icmp_seq=2 ttl=63 time=2.914 ms
64 bytes from 192.0.2.12: icmp_seq=3 ttl=63 time=2.259 ms
^C
--- 192.0.2.12 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max/stddev = 2.259/2.696/3.290/0.428 ms

admin@leaf1> ping 192.0.2.13 source 192.0.2.11
PING 192.0.2.13 (192.0.2.13): 56 data bytes
64 bytes from 192.0.2.13: icmp_seq=0 ttl=63 time=2.849 ms
64 bytes from 192.0.2.13: icmp_seq=1 ttl=63 time=2.453 ms
64 bytes from 192.0.2.13: icmp_seq=2 ttl=63 time=2.734 ms
64 bytes from 192.0.2.13: icmp_seq=3 ttl=63 time=2.936 ms
^C
--- 192.0.2.13 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max/stddev = 2.453/2.743/2.936/0.182 ms
From the perspective of the data plane, nothing changes: the underlay still performs hop-by-hop routing, with resolution of the Layer 2 (MAC) address required at every hop. This resolution has already occurred via the IPv6 Router Advertisement messages exchanged between the leafs and the spines, as shown in Example 3-17. Thus, the packet on the wire is still an IPv4 packet, as shown in Figure 3-15, which is a packet capture of leaf1 pinging leaf2's loopback address while sourcing its own loopback address.
Figure 3-15 Packet capture of leaf1’s reachability test to leaf2’s loopback, using the ping tool
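To confirm this forwarding behavior, the next hop programmed for a remote loopback can also be inspected on leaf1; the entry should resolve to one of the RA-learned link-local addresses and MAC addresses from Example 3-17 (output not shown here):

admin@leaf1> show route forwarding-table destination 192.0.2.12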