Implementing BGP for the Underlay

This section provides implementation specifics for building an eBGP underlay for an IP fabric or a VXLAN fabric on network devices running Junos. A unique ASN per fabric node is used to demonstrate how spines can end up with suboptimal paths that lead to path hunting; using the same ASN on all spines of a 3-stage Clos network is straightforward by comparison and requires no demonstration. Routing policies are then implemented to prevent path hunting. The implementation is based on the topology shown earlier in Figure 3-1.

In this network, for the underlay, each fabric-facing interface is configured as a point-to-point Layer 3 interface, as shown in Example 3-2 from the perspective of leaf1.

Example 3-2 Point-to-point Layer 3 interface configuration on leaf1 for fabric-facing interfaces

admin@leaf1# show interfaces ge-0/0/0
description "To spine1";
mtu 9100;
unit 0 {
    family inet {
        address 198.51.100.0/31;
    }
}

admin@leaf1# show interfaces ge-0/0/1
description "To spine2";
mtu 9100;
unit 0 {
    family inet {
        address 198.51.100.2/31;
    }
}

The goal of the underlay is to advertise the loopbacks of the VXLAN Tunnel Endpoints (VTEPs), since these loopbacks are used to build end-to-end VXLAN tunnels. Thus, on each VTEP (the fabric leafs, in this case), a loopback interface is configured, as shown on leaf1 in Example 3-3.

Example 3-3 Loopback interface on leaf1

admin@leaf1# show interfaces lo0
unit 0 {
    family inet {
        address 192.0.2.11/32;
    }
}

The underlay eBGP peering is between these point-to-point interfaces. Since a leaf’s loopback address is sent toward other leafs via multiple spines, each leaf is expected to install multiple, equal cost paths to every other leaf’s loopback address. In Junos, enabling ECMP routing requires explicit configuration in both the protocol (software) and the forwarding hardware. In the case of BGP, this is enabled using the multipath knob (with the multiple-as configuration option if the routes received have the same AS_PATH length but different ASNs in the list). A subset of the eBGP configuration for the underlay is shown from the perspective of both spines and leaf1 in Example 3-4.

Example 3-4 BGP configuration on spine1, spine2, and leaf1

admin@spine1# show protocols bgp
group underlay {
    type external;
    family inet {
        unicast;
    }
    neighbor 198.51.100.0 {
        peer-as 65421;
    }
    neighbor 198.51.100.4 {
        peer-as 65422;
    }
    neighbor 198.51.100.8 {
        peer-as 65423;
    }
}

admin@spine2# show protocols bgp
group underlay {
    type external;
    family inet {
        unicast;
    }
    neighbor 198.51.100.2 {
        peer-as 65421;
    }
    neighbor 198.51.100.6 {
        peer-as 65422;
    }
    neighbor 198.51.100.10 {
        peer-as 65423;
    }
}

admin@leaf1# show protocols bgp
group underlay {
    type external;
    family inet {
        unicast;
    }
    export allow-loopback;
    multipath {
        multiple-as;
    }
    neighbor 198.51.100.1 {
        peer-as 65500;
    }
    neighbor 198.51.100.3 {
        peer-as 65501;
    }
}
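
Example 3-4 references the peer ASNs but does not show where each node’s own ASN is defined. In Junos, the local ASN is typically set under the routing-options hierarchy (it can also be set per group or per neighbor with the local-as statement). A minimal sketch, assuming the ASN assignments implied by Example 3-4 (leaf1 using 65421 and spine1 using 65500), might look like this:

admin@leaf1# show routing-options
autonomous-system 65421;

admin@spine1# show routing-options
autonomous-system 65500;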

Every leaf advertises its loopback address via an export policy attached to the underlay BGP group, as shown in Example 3-4. The configuration of this policy, which allows direct routes in the 192.0.2.0/24 range to be advertised to the leaf’s eBGP peers, is shown in Example 3-5.

Example 3-5 Policy to advertise loopbacks shown on leaf1

admin@leaf1# show policy-options policy-statement allow-loopback
term loopback {
    from {
        protocol direct;
        route-filter 192.0.2.0/24 orlonger;
    }
    then accept;
}
term discard {
    then reject;
}

With the other leafs configured in the same way, the spines can successfully form an eBGP peering with each leaf, as shown in Example 3-6.

Example 3-6 eBGP peering on spine1 and spine2 with all leafs

admin@spine1> show bgp summary

Threading mode: BGP I/O
Default eBGP mode: advertise - accept, receive - accept
Groups: 1 Peers: 3 Down peers: 0
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
inet.0
                       3          3          0          0          0          0
Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
198.51.100.0          65421        191        189       0       0     1:24:41 Establ
  inet.0: 1/1/1/0
198.51.100.4          65422        184        182       0       0     1:21:12 Establ
  inet.0: 1/1/1/0
198.51.100.8          65423        180        179       0       0     1:19:35 Establ
  inet.0: 1/1/1/0

admin@spine2> show bgp summary

Threading mode: BGP I/O
Default eBGP mode: advertise - accept, receive - accept
Groups: 1 Peers: 3 Down peers: 0
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
inet.0
                       3          3          0          0          0          0
Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
198.51.100.2          65421        194        191       0       0     1:25:52 Establ
  inet.0: 1/1/1/0
198.51.100.6          65422        183        181       0       0     1:20:57 Establ
  inet.0: 1/1/1/0
198.51.100.10         65423        180        179       0       0     1:19:21 Establ
  inet.0: 1/1/1/0

With the policy configured as shown in Example 3-5, and the BGP peering between the leafs and the spines in an Established state, the loopback address of each leaf should be learned on every other leaf in the fabric.
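
Before examining the equal cost paths, a quick spot check of what each peering is sending and receiving can be useful. The following operational mode commands (output omitted here, since it varies with platform and timing) are a sketch of such a check on leaf1, using the spine1 peering address from Example 3-4:

admin@leaf1> show route receive-protocol bgp 198.51.100.1

admin@leaf1> show route advertising-protocol bgp 198.51.100.1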

Consider leaf1 now, to understand how equal cost paths to another leaf’s loopback address are installed. Leaf2’s loopback address is advertised to leaf1 by both spine1 and spine2, so leaf1 receives two routes for it. Since BGP is configured with multipath, both routes are installed as equal cost routes in software, as shown in Example 3-7.

Example 3-7 Equal cost routes to leaf2’s loopback on leaf1

admin@leaf1> show route table inet.0 192.0.2.12

inet.0: 7 destinations, 9 routes (7 active, 0 holddown, 0 hidden)
Limit/Threshold: 1048576/1048576 destinations
+ = Active Route, - = Last Active, * = Both

192.0.2.12/32      *[BGP/170] 02:10:44, localpref 100, from 198.51.100.1
                      AS path: 65500 65422 I, validation-state: unverified
                       to 198.51.100.1 via ge-0/0/0.0
                    >  to 198.51.100.3 via ge-0/0/1.0
                    [BGP/170] 02:10:44, localpref 100
                      AS path: 65501 65422 I, validation-state: unverified
                    >  to 198.51.100.3 via ge-0/0/1.0

A validation-state of unverified, as shown in Example 3-7, indicates that BGP route validation has not been configured (this is a feature that validates the origin of a BGP route to ensure that it is legitimate); the route has been accepted, but it was not validated.

These equal cost routes must also be installed in hardware. This is achieved by applying an export policy under the routing-options forwarding-table hierarchy, which configures the Packet Forwarding Engine (PFE) to install the equal cost routes and, in turn, program the hardware, as shown in Example 3-8. The policy itself simply enables per-flow load balancing. This example also demonstrates how the forwarding table, on the Routing Engine, can be viewed for a specific destination IP prefix, using the show route forwarding-table destination [ip-address] table [table-name] operational mode command.

Example 3-8 Equal cost routes in PFE of leaf1 with a policy for load-balancing per flow

admin@leaf1# show routing-options forwarding-table
export ecmp;

admin@leaf1# show policy-options policy-statement ecmp
then {
    load-balance per-flow;
}

admin@leaf1> show route forwarding-table destination 192.0.2.12/32 table default
Routing table: default.inet
Internet:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
192.0.2.12/32      user     0                    ulst  1048574     3
                              198.51.100.1       ucst      583     4 ge-0/0/0.0
                              198.51.100.3       ucst      582     4 ge-0/0/1.0

While the control plane and the route installation in both software and hardware are as expected on the leafs, the spines paint a different picture. When a leaf’s loopback address, advertised by spine1, is chosen as the best route by the other leafs, those leafs re-advertise it back toward spine2, which receives and stores these suboptimal paths in its routing table. Again, considering leaf1’s loopback address as an example here, spine2 has three paths for this route, as shown in Example 3-9.

Example 3-9 Multiple paths for leaf1’s loopback address on spine2

admin@spine2> show route table inet.0 192.0.2.11/32

inet.0: 10 destinations, 16 routes (10 active, 0 holddown, 0 hidden)
Limit/Threshold: 1048576/1048576 destinations
+ = Active Route, - = Last Active, * = Both

192.0.2.11/32      *[BGP/170] 15:05:38, localpref 100
                      AS path: 65421 I, validation-state: unverified
                    >  to 198.51.100.2 via ge-0/0/0.0
                    [BGP/170] 00:02:39, localpref 100
                      AS path: 65422 65500 65421 I, validation-state: unverified
                    >  to 198.51.100.6 via ge-0/0/1.0
                    [BGP/170] 00:01:02, localpref 100
                      AS path: 65423 65500 65421 I, validation-state: unverified
                    >  to 198.51.100.10 via ge-0/0/2.0

This includes the direct path via leaf1, an indirect path via leaf2, and another indirect path via leaf3. Thus, in this case, if spine2 loses the direct path via leaf1, it will start path hunting through the other suboptimal paths until the network fully converges, with all withdrawals processed on all fabric nodes. This problem can be addressed by applying an export policy on the spines that adds a BGP community to all advertised routes, and then using this community on the leafs to match and reject such routes from being advertised back to the spines.

In Junos, a routing policy controls the import of routes into the routing table and the export of routes from the routing table, to be advertised to neighbors. In general, a routing policy consists of terms, which include match conditions and associated actions. The routing policy on the spines is shown in Example 3-10 and includes the following two policy terms:

  • all-bgp: Matches all BGP-learned routes, adds the community value defined by the community name spine-to-leaf, and accepts them.

  • loopback: Matches all direct routes in the IPv4 subnet 192.0.2.0/24, adds the same community, and accepts them. The orlonger configuration option matches the defined prefix and any more-specific prefixes within it.

Example 3-10 Policy to add a BGP community on the spines as they advertise routes to leafs

admin@spine2# show policy-options policy-statement spine-to-leaf
term all-bgp {
    from protocol bgp;
    then {
        community add spine-to-leaf;
        accept;
    }
}
term loopback {
    from {
        protocol direct;
        route-filter 192.0.2.0/24 orlonger;
    }
    then {
        community add spine-to-leaf;
        accept;
    }
}

admin@spine2# show policy-options community spine-to-leaf
members 0:15;

Once the policy in Example 3-10 is applied as an export policy on the spines for the underlay BGP group, the leafs receive all BGP routes tagged with a BGP community value of 0:15. This can be confirmed on leaf2, taking leaf1’s loopback address into consideration, as shown in Example 3-11.
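
The application of the policy itself is not shown in the examples. On spine2, assuming the group name from Example 3-4, it amounts to a single export statement on the underlay BGP group, along the lines of the following sketch:

admin@spine2# set protocols bgp group underlay export spine-to-leaf

admin@spine2# show protocols bgp group underlay | match export
export spine-to-leaf;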

Example 3-11 Leaf1’s loopback address received with a BGP community of 0:15 on leaf2

admin@leaf2> show route table inet.0 192.0.2.11/32 extensive

inet.0: 9 destinations, 12 routes (9 active, 0 holddown, 0 hidden)
Limit/Threshold: 1048576/1048576 destinations
192.0.2.11/32 (2 entries, 1 announced)
TSI:
KRT in-kernel 192.0.2.11/32 -> {list:198.51.100.5, 198.51.100.7}
Page 0 idx 0, (group underlay type External) Type 1 val 0x85194a0 (adv_entry)
   Advertised metrics:
     Nexthop: 198.51.100.5
     AS path: [65422] 65500 65421 I
     Communities: 0:15
    Advertise: 00000002
Path 192.0.2.11
from 198.51.100.5
Vector len 4.  Val: 0
        *BGP    Preference: 170/-101
                Next hop type: Router, Next hop index: 0
                Address: 0x7a46fac
                Next-hop reference count: 3, Next-hop session id: 0
                Kernel Table Id: 0
                Source: 198.51.100.5
                Next hop: 198.51.100.5 via ge-0/0/0.0
                Session Id: 0
                Next hop: 198.51.100.7 via ge-0/0/1.0, selected
                Session Id: 0
                State: <Active Ext>
                Local AS: 65422 Peer AS: 65500
                Age: 3:35
                Validation State: unverified
                Task: BGP_65500.198.51.100.5
                Announcement bits (3): 0-KRT 1-BGP_Multi_Path 2-BGP_RT_Background
                AS path: 65500 65421 I
                Communities: 0:15
                Accepted Multipath
                Localpref: 100
                Router ID: 192.0.2.101
                Thread: junos-main
         BGP    Preference: 170/-101
                Next hop type: Router, Next hop index: 577
                Address: 0x77c63f4
                Next-hop reference count: 5, Next-hop session id: 321
                Kernel Table Id: 0
                Source: 198.51.100.7
                Next hop: 198.51.100.7 via ge-0/0/1.0, selected
                Session Id: 321
                State: <Ext>
                Inactive reason: Active preferred
                Local AS: 65422 Peer AS: 65501
                Age: 5:30
                Validation State: unverified
                Task: BGP_65501.198.51.100.7
                AS path: 65501 65421 I
                Communities: 0:15
                Accepted MultipathContrib
                Localpref: 100
                Router ID: 192.0.2.102
                Thread: junos-main

On the leafs, it is now a simple matter of rejecting any route that carries this community, to stop it from being re-advertised back to the spines. A new policy is created for this, and it is chained, using a logical AND policy expression, with the existing policy that advertises the loopback address, as shown in Example 3-12 from the perspective of leaf1.

Example 3-12 Policy on leaf1 to reject BGP routes with a community of 0:15

admin@leaf1# show policy-options policy-statement leaf-to-spine
term reject-to-spine {
    from {
        protocol bgp;
        community spine-to-leaf;
    }
    then reject;
}
term accept-all {
    then accept;
}

admin@leaf1# show policy-options community spine-to-leaf
members 0:15;

admin@leaf1# show protocols bgp
group underlay {
    type external;
    family inet {
        unicast;
    }
    export ( leaf-to-spine && allow-loopback );
    multipath {
        multiple-as;
    }
    neighbor 198.51.100.1 {
        peer-as 65500;
    }
    neighbor 198.51.100.3 {
        peer-as 65501;
    }
}

With this policy applied on all the leafs, the spines will not learn any suboptimal paths to each of the leaf loopbacks. This is confirmed in Example 3-13, with each spine learning every leaf’s loopback address via the direct path to the respective leaf.

Example 3-13 Route to each leaf’s loopback address on spine1 and spine2

admin@spine1> show route table inet.0 192.0.2.11/32

inet.0: 10 destinations, 10 routes (10 active, 0 holddown, 0 hidden)
Limit/Threshold: 1048576/1048576 destinations
+ = Active Route, - = Last Active, * = Both

192.0.2.11/32      *[BGP/170] 15:45:36, localpref 100
                      AS path: 65421 I, validation-state: unverified
                    >  to 198.51.100.0 via ge-0/0/0.0

admin@spine1> show route table inet.0 192.0.2.12/32

inet.0: 10 destinations, 10 routes (10 active, 0 holddown, 0 hidden)
Limit/Threshold: 1048576/1048576 destinations
+ = Active Route, - = Last Active, * = Both

192.0.2.12/32      *[BGP/170] 15:42:09, localpref 100
                      AS path: 65422 I, validation-state: unverified
                    >  to 198.51.100.4 via ge-0/0/1.0

admin@spine1> show route table inet.0 192.0.2.13/32

inet.0: 10 destinations, 10 routes (10 active, 0 holddown, 0 hidden)
Limit/Threshold: 1048576/1048576 destinations
+ = Active Route, - = Last Active, * = Both

192.0.2.13/32      *[BGP/170] 15:40:35, localpref 100
                      AS path: 65423 I, validation-state: unverified
                    >  to 198.51.100.8 via ge-0/0/2.0

admin@spine2> show route table inet.0 192.0.2.11/32

inet.0: 10 destinations, 10 routes (10 active, 0 holddown, 0 hidden)
Limit/Threshold: 1048576/1048576 destinations
+ = Active Route, - = Last Active, * = Both

192.0.2.11/32      *[BGP/170] 15:47:10, localpref 100
                      AS path: 65421 I, validation-state: unverified
                    >  to 198.51.100.2 via ge-0/0/0.0

admin@spine2> show route table inet.0 192.0.2.12/32

inet.0: 10 destinations, 10 routes (10 active, 0 holddown, 0 hidden)
Limit/Threshold: 1048576/1048576 destinations
+ = Active Route, - = Last Active, * = Both

192.0.2.12/32      *[BGP/170] 15:42:18, localpref 100
                      AS path: 65422 I, validation-state: unverified
                    >  to 198.51.100.6 via ge-0/0/1.0

admin@spine2> show route table inet.0 192.0.2.13/32

inet.0: 10 destinations, 10 routes (10 active, 0 holddown, 0 hidden)
Limit/Threshold: 1048576/1048576 destinations
+ = Active Route, - = Last Active, * = Both

192.0.2.13/32      *[BGP/170] 15:40:45, localpref 100
                      AS path: 65423 I, validation-state: unverified
                    >  to 198.51.100.10 via ge-0/0/2.0

Junos also offers the operator a direct way to test a policy, which can be used to confirm that a leaf’s locally owned loopback address is being advertised to the spines and that other loopback addresses learned via BGP are rejected. This uses the test policy operational mode command, as shown in Example 3-14, where only leaf1’s loopback address (192.0.2.11/32) is accepted by the policy, while leaf2’s and leaf3’s loopback addresses, 192.0.2.12/32 and 192.0.2.13/32 respectively, are rejected.

Example 3-14 Policy rejecting leaf2’s and leaf3’s loopback addresses from being advertised to the spines on leaf1

admin@leaf1> test policy leaf-to-spine 192.0.2.11/32

inet.0: 9 destinations, 11 routes (9 active, 0 holddown, 0 hidden)
Limit/Threshold: 1048576/1048576 destinations
+ = Active Route, - = Last Active, * = Both

192.0.2.11/32      *[Direct/0] 1d 04:38:27
                    >  via lo0.0

Policy leaf-to-spine: 1 prefix accepted, 0 prefix rejected

admin@leaf1> test policy leaf-to-spine 192.0.2.12/32

Policy leaf-to-spine: 0 prefix accepted, 1 prefix rejected

admin@leaf1> test policy leaf-to-spine 192.0.2.13/32

Policy leaf-to-spine: 0 prefix accepted, 1 prefix rejected

With this configuration in place, the fabric underlay is successfully built: each leaf’s loopback address is reachable from every other leaf, as shown in Example 3-15, while the routing policies prevent any path-hunting issues on the spines.

Example 3-15 Loopback reachability from leaf1

admin@leaf1> ping 192.0.2.12 source 192.0.2.11
PING 192.0.2.12 (192.0.2.12): 56 data bytes
64 bytes from 192.0.2.12: icmp_seq=0 ttl=63 time=3.018 ms
64 bytes from 192.0.2.12: icmp_seq=1 ttl=63 time=2.697 ms
64 bytes from 192.0.2.12: icmp_seq=2 ttl=63 time=4.773 ms
64 bytes from 192.0.2.12: icmp_seq=3 ttl=63 time=3.470 ms
^C
--- 192.0.2.12 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max/stddev = 2.697/3.490/4.773/0.790 ms

admin@leaf1> ping 192.0.2.13 source 192.0.2.11
PING 192.0.2.13 (192.0.2.13): 56 data bytes
64 bytes from 192.0.2.13: icmp_seq=0 ttl=63 time=2.979 ms
64 bytes from 192.0.2.13: icmp_seq=1 ttl=63 time=2.814 ms
64 bytes from 192.0.2.13: icmp_seq=2 ttl=63 time=2.672 ms
64 bytes from 192.0.2.13: icmp_seq=3 ttl=63 time=2.379 ms
^C
--- 192.0.2.13 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max/stddev = 2.379/2.711/2.979/0.220 ms
