
6.3 The iproute2 Toolset

The iproute2 package of utilities, written by Alexey Kuznetsov, is a very powerful set of tools for managing network interfaces and connections on Linux boxes. It completely replaces the functionality of the ifconfig, route, and arp tools in the more traditional net-tools toolkit. The term traditional here refers to the toolset that mimics the interface and routing-table management found on other Unices, the toolset we've been using so far. Because it is tightly integrated with the kernel, iproute2 is available only for Linux. I'm afraid this is one of the reasons that iproute2 hasn't received a lot of mainstream attention: old fogey network administrators like myself have been comfortable with the 15–20-year-old toolset we grew up with and look outside that framework only when we need something special. Well, no more. iproute2 is worth the trouble of learning, even if it is only for Linux.

To help introduce the syntax, we'll revisit some of the tasks we've already completed and then strike out from there. First off, you'll need the toolset loaded on your system. The package name is iproute for both Debian and RedHat. If you'd like to pull it from the upstream source and build it yourself, the main site is ftp://ftp.inr.ac.ru/ip-routing/. You'll be presented with a list of mirrors when you connect; you should be polite and use a nearby mirror if one is available. You'll also need to ensure that your running kernel was compiled with the following options:

Networking options --->
     [*]  Kernel/User netlink socket
     [*]      Routing messages

If you don't have these, you'll receive one of the following error messages when trying to use the tools, depending upon what exactly your kernel is missing:

tony@vivaldi:~$ /sbin/ip addr show
Cannot open netlink socket: Address family not supported by protocol

tony@bach:~$ /sbin/ip addr show
Cannot send dump request: Connection refused

tony@vivaldi:~$ /sbin/ip rule list
RTNETLINK answers: Invalid argument
Dump terminated

Once you have the tools built and/or installed, and you're running a 2.2 or 2.4 kernel with the appropriate support, you fire up ip and get the potentially bewildering:

tony@bach:~$ /sbin/ip
Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }
where OBJECT := { link | addr | route | rule | neigh | tunnel |
                       maddr | mroute | monitor }
           OPTIONS := { -V[ersion] | -s[tatistics] | -r[esolve] |
                          -f[amily] { inet | inet6 | ipx | dnet | link } | -o[neline] }

At first, this doesn't seem much better than what ifconfig -? spits out. It might be helpful to list out some of the common COMMAND keywords and to indicate that if no command is given, the default keyword is show or list (which are identical). The complete list of valid COMMAND keywords contains add, delete, set, flush, change, replace, get, and show or list, plus abbreviations for these. However, not all of those keywords are applicable to all OBJECTs. Fortunately, the command syntax is well designed and very regular (consistent). It will take you little time to pick it up, and you can always issue ip OBJECT help to view help for that object. But if you're like me, you're not going to remember the syntax diagrams presented by help; you remember things by using them.

6.3.1 iproute2: Common ifconfig, route, and arp Forms

tmancill@drone4:~$ /sbin/ip addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
      link/ether 00:50:56:40:40:92 brd ff:ff:ff:ff:ff:ff
      inet brd scope global eth0

The usage in this first form should be pretty clear. We want to view the address object corresponding to device eth0. There are several things to note here:

  • The very first 2: is a unique interface index. Later on, we'll change the name of our interface, but the interface index always sticks with the same physical device.

  • The mtu and device flags are directly from the ifconfig output.

  • qdisc is short for queuing discipline, which is Greek for "What algorithm do we use for this interface to determine what is the next packet to be sent?"

  • The qlen is the maximum depth of the queue in packets. This can be tuned to allow for greater bursts at routers acting as the boundary between high-speed and low-speed interfaces. Note that larger buffers are not always a good thing.

  • The link line shows detail about the link-level hardware, including the type (ether), the link-layer address (in this case, a MAC address), and the link-layer broadcast address.

  • The inet line is actually an address line for an inet family adapter, which is synonymous with IPv4. There may be multiple lines here—a given link (adapter) may have several addresses of one or multiple types. (If you surmise that this foreshadows IP aliases, you are correct, and there is more than that.)

  • The interface's IP address is indicated together with the netmask, which is displayed in the maskbits notation we talked about on page 27. The network is not explicitly indicated, although we know how to calculate this quickly by taking a look at the combination of the IP address and the number of bits in the subnet mask. The brd address on this line is the IP broadcast address.

  • Protocol addresses have a scope, set in this case to the default global. Don't worry about this for now, but it's part of policy-based routing, which we will touch on briefly in Section 8.4.1.
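The network and broadcast addresses implied by an address/maskbits pair can be worked out mechanically. The following is a minimal sketch using Python's standard ipaddress module. The addresses are assumed examples from the documentation ranges, not values taken from the listing above, and describe is a hypothetical helper name.

```python
import ipaddress

def describe(addr_with_maskbits):
    """Return (network, netmask, broadcast) for an address such as '192.0.2.10/24'."""
    iface = ipaddress.ip_interface(addr_with_maskbits)
    net = iface.network
    return str(net.network_address), str(net.netmask), str(net.broadcast_address)

if __name__ == "__main__":
    # Assumed example address; substitute whatever ip addr show reports.
    network, netmask, broadcast = describe("192.0.2.10/24")
    print("network  ", network)
    print("netmask  ", netmask)
    print("broadcast", broadcast)
```

The broadcast value computed this way should match the brd field on the inet line when the interface uses the conventional broadcast address for its subnet.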

There's a lot of information in there, but unlike ifconfig, you get interface statistics only if you ask for them explicitly with -statistics, which can be abbreviated with -s. The second command below displays the routing table.

tmancill@vivaldi:~$ /sbin/ip -statistics link show dev eth0
2: eth0: <BROADCAST,MULTICAST,PROMISC,UP> mtu 1500 qdisc pfifo_fast qlen 100
      link/ether 00:50:56:40:40:86 brd ff:ff:ff:ff:ff:ff
      RX: bytes     packets    errors   dropped   overrun   mcast
      18196332     50205      0           0             0              0
      TX: bytes      packets   errors    dropped   carrier     collsns
      33017745      55970     0           0              0             0

tmancill@vivaldi:~$ /sbin/ip route show
dev eth1 proto kernel scope link src
dev eth0 proto kernel scope link src
default via dev eth1

If you're asking yourself, "What's the point? I knew how to do these things before," please bear with me. We're not even scratching the surface with iproute2 yet. As you've probably noticed, all of the addresses are displayed as dotted IP addresses. If you want hostname/network name resolution, you'll have to use the -r switch to enable it. (This is a nice change; if I had a nickel for every time I've meant to add the -n switch when invoking route...) What about configuring an interface?

# assign the IP address and subnet mask
root@vivaldi:~# ip addr add dev eth1

# double check to make sure it looks OK
root@vivaldi:~# ip addr list dev eth1
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast qlen 10
      link/ether 00:50:56:40:40:87 brd ff:ff:ff:ff:ff:ff
      inet scope global eth1

# check the routing table - hey, where's the route for!
root@vivaldi:~# ip route show
dev eth0 proto kernel scope link src
default via dev eth0

# doh - the link layer has to be brought up
root@vivaldi:~# ip link set dev eth1 up

root@vivaldi:~# ip route show
dev eth1 proto kernel scope link src
dev eth0 proto kernel scope link src
default via dev eth0

# add route with nice compact /maskbits notation
root@vivaldi:~# ip route add via

The gotchas I noticed when I first started using iproute2 were either not initializing a link as up, or not indicating the subnet mask along with the IP address. When the mask is omitted, ip assumes you meant /32 (the only safe assumption to be made, and far better than using the class A, B, and C subnet semantics). The end effect is that you don't automatically get an entry in the routing table for that subnet, and you can't ping anything even though the device is up.
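To see why an omitted mask leaves you without a usable subnet route, here is a small sketch. Python's ipaddress module happens to make the same /32 assumption for a bare address, so it serves as a stand-in for the behavior described above. The addresses and the helper names connected_network and on_link are illustrative assumptions.

```python
import ipaddress

def connected_network(addr):
    """Network implied by assigning addr; a bare address is treated as /32."""
    return ipaddress.ip_interface(addr).network

def on_link(addr, peer):
    """True if peer falls inside the network implied by addr."""
    return ipaddress.ip_address(peer) in connected_network(addr)

if __name__ == "__main__":
    # Assumed example addresses from the documentation range.
    for assigned in ("192.0.2.10", "192.0.2.10/24"):
        print(assigned, "->", connected_network(assigned),
              "| neighbor 192.0.2.1 on-link:", on_link(assigned, "192.0.2.1"))
```

With the bare address, the implied network contains only the host itself, which is why no neighbor on the segment appears reachable until the address is added again with its maskbits.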

iproute2 Documentation

The primary documentation available with the package is doc/ip-cref.tex, a TeX source formatted file. You can compile the source into a DVI (DeVice Independent) file using LaTeX. However, loading a complete TeX distribution to format a single piece of documentation may be out of the question for some folks. The Debian package includes a PostScript formatted version of this file in /usr/share/doc/iproute/, and I have made the latest version of the file available via my site at http://mancill.com/linuxrouters.

See http://www.linuxdoc.org/HOWTO/Adv-Routing-HOWTO.html for a fantastic HOWTO using iproute2.

iproute2 is very much in tune with the link-layer, and as you might expect, offers arp functionality and more. You can determine what machines are visible via the various links on your system with ip neigh list, the neigh being short for neighbor or neighbour. In the output below, the first entry indicates that drone3 tried to contact but didn't receive a reply to its ARP requests. The reachable state means that all is well: this system is an active member of our ARP cache. An entry in the stale state was recently reachable but will need to be checked before use again. The delay state indicates that the kernel is in the process of checking a stale neighbor at the moment. In case you're wondering, nud is Neighbor Unreachability Detection, as in NUD STATE in the kernel.

drone3:~# ip neigh list
dev eth0 nud failed
dev eth0 lladdr 00:50:56:40:40:86 nud reachable
dev eth0 lladdr 00:50:56:40:40:bf nud stale
dev eth1 lladdr 00:50:56:c0:00:02 nud delay
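If you want to keep an eye on neighbor states from a script, the output is regular enough to parse. The sketch below assumes the line layout shown in this section (address, then keyword/value pairs for dev, lladdr, and nud); other iproute2 versions may print the state differently. The IP addresses in SAMPLE and the function names are invented for illustration.

```python
# Hypothetical sample in the layout used in this section; the IP addresses are invented.
SAMPLE = """\
192.0.2.7 dev eth0 nud failed
192.0.2.1 dev eth0 lladdr 00:50:56:40:40:86 nud reachable
192.0.2.9 dev eth0 lladdr 00:50:56:40:40:bf nud stale
198.51.100.2 dev eth1 lladdr 00:50:56:c0:00:02 nud delay
"""

def parse_neigh(line):
    """Split one neighbor line into address, dev, lladdr, and nud fields."""
    tokens = line.split()
    entry = {"address": tokens[0], "dev": None, "lladdr": None, "nud": None}
    rest = tokens[1:]
    # The remaining tokens come in keyword/value pairs.
    for key, value in zip(rest[0::2], rest[1::2]):
        if key in ("dev", "lladdr", "nud"):
            entry[key] = value
    return entry

def by_state(text):
    """Group neighbor addresses by their NUD state."""
    states = {}
    for line in text.splitlines():
        if line.strip():
            entry = parse_neigh(line)
            states.setdefault(entry["nud"], []).append(entry["address"])
    return states

if __name__ == "__main__":
    for state, addresses in by_state(SAMPLE).items():
        print(state, addresses)
```

A failed entry has no lladdr at all, so the parser leaves that field as None rather than guessing.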

6.3.2 iproute2 Parlor Tricks

By now you should be getting used to ip [OPTIONS] OBJECT COMMAND [ARGS], so let's use it to do some things we haven't done yet. From the discussion of the hacker tool hunt in Section 3.3.4, you may recall that hunt sports an ARP spoofing module that can wreak havoc, even on switched networks. And DNS spoofing just makes it worse. Now, the hacker can spoof an ARP address but cannot very easily spoof a MAC address, so let's add a permanent entry into our ARP cache for the IP address of our DNS server. Assuming that you know the current mapping is OK, do the following.

drone3:~# ip neighbor change dev eth1 to nud permanent
drone3:~# ip neighbor list
dev eth0 lladdr 00:50:56:40:40:86 nud stale
dev eth1 lladdr 00:50:56:c0:00:02 nud permanent
dev eth0 lladdr 00:50:56:40:40:bf nud stale

If you're paranoid (and it's healthy to be a little bit paranoid), use the additional lladdr macaddress argument to explicitly indicate the MAC address. (Just don't mistype the MAC address or you'll create an instant outage.)

You can use ip addr add to add IP aliases like those in Section 3.2 by supplying the address with subnet mask and indicating the correct physical link with dev dev. However, you need to consider whether or not you (or others) may use ifconfig to look at the status of the aliased interfaces. Unless you also use the label dev:alias argument when you add the address, the alias will exist but won't be visible via ifconfig.

What if we'd like to completely rename an interface? It may sound arbitrary to run around renaming interfaces—maybe I'm vain or have nothing else to do. But suppose I work in a pretty big IT shop with a large number of systems, all of which are connected to a network we refer to as the shadow lan. That network is used for backups and NFS traffic, and we like to keep tabs on network utilization of those interfaces via SNMP. The machine names of the interfaces might be anything, requiring us later to correlate the correct interface name with the function. Not so if we rename the interface. Here's how you would rename eth2 to shadow:

root@drone2:~# ip link set dev eth2 down 
root@drone2:~# ip link set dev eth2 name shadow 
root@drone2:~# ip link set dev shadow up 
root@drone2:~# ifconfig shadow 
shadow       Link encap:Ethernet HWaddr 00:50:56:40:40:81
                   UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 
                   RX packets:32 errors:0 dropped:0 overruns:0 frame:0 
                   TX packets:95 errors:0 dropped:0 overruns:0 carrier:0
                   collisions:0 txqueuelen:100
                   RX bytes:2984 (2.9 Kb) TX bytes:5558 (5.4 Kb)
                   Interrupt:10 Base address:0x10a0 

You can determine what route or interface a packet is going to take by using forms of ip route get ADDR. For the address you must specify a single IP address or a network range in network/maskbits notation. Because the TOS field is a crucial part of policy-based routing (deciding how to route a packet based on either its source address or the value of its TOS field), you can set it for your test, too. You may also specify a source address using from ADDR and ask the tool to assume that the packet is going to be received via a particular interface with iif, or sent via the interface specified with oif.

ip route get ADDR [from ADDR iif STRING] [oif STRING] [tos TOS]

The ability to specify the sending interface is interesting, because it gives you the ability to ask yourself, "What if the packet were actually sent via this other interface?" The answer includes the source address that will be placed on the packet and the gateway to be used. (By the way, there is a good section on the source-address selection algorithm in Appendix A of Kuznetsov's IP Command Reference, the documentation that accompanies iproute2.) In the example below, you'll note that the first time I specify the network to reach, the command indicates that the packet will be sent on interface vmnet4 with a source address on that network. However, if the packet were to be sent via eth0, it would have a different source address and would utilize the gateway (indicated by via). If we're experiencing problems, perhaps it's because that gateway is masquerading everything passing through it, or the hosts on don't have a route to respond to. YMMV, but I think it's a nice tool.

tony@bach:~$ ip route get
broadcast dev vmnet4 src
       cache <local,brd> mtu 1500 
tony@bach:~$ ip route get oif eth0
via dev eth0 src
       cache mtu 1500 
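To get a feel for what a lookup like this decides, here is a deliberately simplified sketch of longest-prefix matching with an optional output-interface restriction. It ignores TOS, source address, and policy rules, all of which the kernel also considers. The routing table, addresses, and the lookup function are hypothetical.

```python
import ipaddress

# Hypothetical routing table: (destination prefix, gateway or None, device).
ROUTES = [
    ("192.0.2.0/24", None, "eth0"),
    ("198.51.100.0/24", None, "vmnet4"),
    ("0.0.0.0/0", "192.0.2.1", "eth0"),
]

def lookup(dest, routes=ROUTES, oif=None):
    """Return the most specific matching route, optionally limited to one device."""
    addr = ipaddress.ip_address(dest)
    candidates = []
    for prefix, gateway, dev in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (oif is None or dev == oif):
            candidates.append((net.prefixlen, prefix, gateway, dev))
    if not candidates:
        return None
    # The longest prefix (largest prefixlen) wins.
    _, prefix, gateway, dev = max(candidates, key=lambda c: c[0])
    return prefix, gateway, dev

if __name__ == "__main__":
    print(lookup("198.51.100.5"))              # directly connected via vmnet4
    print(lookup("198.51.100.5", oif="eth0"))  # forced out eth0, so the gateway is used
```

In this toy table the same destination is reached directly over vmnet4, but only through the default gateway when the lookup is restricted to eth0, which is the shape of the what-if question the oif argument lets you ask.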

For our final trick, ip route list cache [ADDR] is going to list out routing decisions made in recent history by our system. Basically, the cache is a notepad for the routing decisions performed by the kernel. Because the kernel considers many criteria when selecting a route, there may be more entries in the output than you expect. For instance, whenever packets flow between a given source and destination, that constitutes an entry. For packets following this same path, but with a different TOS value, you'll see another (very similar) entry. There's a lot of information here, and I encourage you to play with the command on a system where you're familiar with the sort of traffic (that should be) flowing through it. Piecing together why entries are there (and how they got there) is a good exercise. I'd recommend that you grep out the local entries initially, since they can trip you up trying to grok them at first, or specify an ADDR explicitly:

tony@java:~$ ip -s route list cache
from via dev eth0 src
       cache <src-direct> users 1 used 11 age 202sec mtu 1500 iif eth1
via dev eth0 src
       cache users 1 age 206sec mtu 1500

In the output above, the entries matching are displayed. It may help to know that java is a masquerading router, with two interfaces (on the outside) and (the inside). is a client contacting. The first line tells us that we reached the address via the gateway, with a packet that originated from. The input interface for that packet was eth1, which is further confirmed by src, the IP address of the gateway used by the next hop (or the from address if it's on the same network). So we can conclude that and are on the same network (which we already knew) and that is using as its gateway to reach. The use of -s displays additional statistics about the entry in the cache, including how many different IPs used the route and how recently the route was used.

The second line of output tells us something about the return path. Packets from came via java's default gateway, over device eth0, which received the packet on address. Although we already knew that is the address of eth0, the src field is more interesting when you're working with a device that has multiple IP addresses. So ip knows a lot about routing on our systems, eh? Actually, it's parsing /proc/net/rt_cache and /proc/net/rt_cache_stat for us, but we won't quibble because it's very convenient, and this is how most of the networking commands get their information anyway. I know that there are a mind-numbing lot of numbers flying around in this example, but that's really all routers do, sling numbers around. If you really want to melt your brain, take a look at the output of ip -6 -s route cache on a moderately busy IPv6 router.

If browsing routing table minutiae doesn't strike your fancy, ip can also be used to configure tunnels. We will use this functionality throughout the rest of the chapter to build various types of tunnels.

6.3.3 iproute2: Tunneling IP over IP

The ip tunnel command is (alarmingly!) one of the simplest of the ip command forms. It covers three types of IPv4 tunnels (the modes): ipip, which is IPv4 inside of IPv4, gre (Generic Routing Encapsulation), another IPv4 within IPv4 protocol, and sit (Simple Internet Transition), which is IPv6 inside IPv4 and intended for phased migration to the IPv6 protocol. Tunnels are typically deployed when the pathway between two networks cannot route the traffic for those networks directly because the IP address space of one or both of the networks is not routable along the pathway. Conceptually, it's a little bit like the packet is masqueraded on one end so that it can traverse the tunnel and then unmasqueraded on the other end so that it can continue on toward its destination. But in reality, the IP payload on the tunnel is packaged up inside another perfectly normal packet that is sent from one end of the tunnel to the other. Take a gander at Figure 6.2 for an artist's rendition of what's happening here.

Figure 6.2 General Tunneling Setup

Simple enough, right? We have two boxes separated by an arbitrarily complex network, the key attribute of which is that we can count on most of our packets sent from the external interface (wan0) of routerA to reach the external interface of routerB. We would like systems on subnetA to be able to reach subnetB and vice versa. So we configure a tunnel to carry this traffic between the two routers and add a route to routerA to send all traffic destined for via the tunnel device on that box, and a corresponding route to routerB for the traffic.

Now, what do I mean by the term tunnel device? This is a virtual device: it shows up in an ifconfig, it has an IP address, and you can assign a route to it, but it doesn't correspond to a real physical device. By being a device (and not merely a set of rules in the kernel) it provides a handy abstraction in a paradigm that network administrators are used to manipulating. When you send something via a tunnel device, say from (clientA) to (clientB), the packet is wholly encapsulated (source and destination address included) into another packet that has source address and destination. Now, the intervening network knows how to find, and when the packet arrives, it is examined by routerB, the headers are peeled off, and it is forwarded on to clientB, just as if it had traversed the path without the tunnel. Of course, for clientB to respond to clientA, the same procedure must be followed in reverse. The nice bit is that the clients need not be aware that this is occurring.
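The encapsulation step can be sketched in a few lines. The code below is an illustration of the idea only, not how the kernel implements it: it prepends a minimal 20-byte IPv4 header carrying protocol number 4 to an existing packet and strips it again on the far side. All addresses and function names are assumptions made for the example.

```python
import socket
import struct

IPPROTO_IPIP = 4  # IP protocol number for IPv4-in-IPv4

def checksum(data):
    """Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def ipv4_header(src, dst, proto, payload_len, ttl=64):
    """Build a minimal 20-byte IPv4 header (no options, no fragmentation)."""
    fields = [0x45, 0, 20 + payload_len, 0, 0, ttl, proto, 0,
              socket.inet_aton(src), socket.inet_aton(dst)]
    header = struct.pack("!BBHHHBBH4s4s", *fields)
    fields[7] = checksum(header)  # fill in the header checksum
    return struct.pack("!BBHHHBBH4s4s", *fields)

def encapsulate(inner_packet, tunnel_local, tunnel_remote):
    """Wrap a complete inner IP packet in an outer header between the tunnel endpoints."""
    outer = ipv4_header(tunnel_local, tunnel_remote, IPPROTO_IPIP, len(inner_packet))
    return outer + inner_packet

def decapsulate(outer_packet):
    """Peel off the outer header and return the untouched inner packet."""
    header_len = (outer_packet[0] & 0x0F) * 4
    if outer_packet[9] != IPPROTO_IPIP:
        raise ValueError("not an IPIP packet")
    return outer_packet[header_len:]

if __name__ == "__main__":
    # Hypothetical clientA -> clientB packet (header only) and hypothetical endpoints.
    inner = ipv4_header("10.0.1.5", "10.0.2.9", 17, 0)
    outer = encapsulate(inner, "192.0.2.1", "198.51.100.1")
    print(len(inner), len(outer), decapsulate(outer) == inner)
```

The inner packet, including its original source and destination, rides along unchanged; only the outer header is visible to the routers in between.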

Configuring an IPIP Tunnel

As you've come to expect, you'll need kernel support for this example. The feature is compiled into the kernel if you have a message like IPv4 over IPv4 tunneling driver in the output of dmesg after boot or after you load the ipip module. The module is enabled by compiling support for:

Networking options --->
     <M>  IP: tunneling

Constructing a tunnel consists of the following steps:

  1. Create a tunnel device, specifying the type of tunnel as well as the local and remote ends of the tunnel. You must specify a name for your tunnel device. I use iptb to indicate a tunnel to subnetB and ipta for the tunnel to subnetA. The mode tells ip what type of encapsulation to use. The local address is what you want as the source address for any tunneled packets from this router; in other words, the address of wan0 on routerA. The remote address is the address to which the tunneled packets are to be sent.

  2. Assign an IP address to our tunnel device. Although you are free to choose almost anything within reason here, the convention is to use the same IP address as the internal address on the router (i.e., eth0 on routerA). The best reason for this is that when you initiate a packet directly from routerA, it won't have yet another IP that needs to be resolved and routed. Also, these addresses are incorporated already into the routes we're going to add.

  3. Configure the interface to be administratively up. (This can be easy to forget until you become accustomed to it.)

  4. Add a route for the distant subnet that points to the tunnel. The commands for both ends of the tunnel are spelled out below.

routera:~$ ip tunnel add iptb mode ipip local remote
routera:~$ ip addr add dev iptb
routera:~$ ip link set iptb up
routera:~$ ip route add dev iptb

routerb:~$ ip tunnel add ipta mode ipip local remote
routerb:~$ ip addr add dev ipta
routerb:~$ ip link set ipta up
routerb:~$ ip route add dev ipta
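Because the four steps are the same on both ends with the addresses swapped, it can help to generate the command lines from one set of parameters. This is a sketch that only prints the commands; it does not run them. Every address and subnet below is a made-up placeholder, and tunnel_commands is a hypothetical helper.

```python
def tunnel_commands(name, local, remote, tunnel_addr, remote_subnet, mode="ipip"):
    """Return the four ip commands that bring up one end of a tunnel."""
    return [
        f"ip tunnel add {name} mode {mode} local {local} remote {remote}",
        f"ip addr add {tunnel_addr} dev {name}",
        f"ip link set {name} up",
        f"ip route add {remote_subnet} dev {name}",
    ]

if __name__ == "__main__":
    # Hypothetical values: WAN addresses of the two routers, their internal
    # addresses, and the subnets behind them.
    ends = {
        "routera": ("iptb", "192.0.2.1", "198.51.100.1", "10.0.1.1", "10.0.2.0/24"),
        "routerb": ("ipta", "198.51.100.1", "192.0.2.1", "10.0.2.1", "10.0.1.0/24"),
    }
    for host, args in ends.items():
        for command in tunnel_commands(*args):
            print(f"{host}:~# {command}")
```

Note how the local and remote arguments simply trade places between the two routers, while each route points at the subnet behind the opposite end.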

Now, there had better be something on the other end expecting the IP-within-IP encapsulated packets, or they're not going to get very far. Therefore, we need to perform an almost identical configuration on routerB. This type of tunnel is not like a PPP device where the devices speak a link-layer protocol with each other. RouterA has no idea whether routerB (or the IP address specified by remote) is going to do anything with the encapsulated IP packets. It just knows to perform the encapsulation, slap the remote IP address on the header, and send the packet on its way.

I encourage you to set up a tunnel and then use a packet sniffer to take a look at the packets generated in comparison to normal IPv4 traffic (ethereal is great for this). Note: Be careful about adding default routes over your tunnels. While this may be what you want to do, keep in mind that unless you're using some source-based routing (separate routing tables, a topic in Section 8.4.1), you cannot have multiple default routes, and your router still has to be able to find the remote end of the tunnel without using the tunnel itself! Invariably, when I make these kinds of mistakes, I'm on the far end of the tunnel, which is where it really hurts, because then I need another way in. Fortunately, the first time I configured this, both ends were located in the same building, so it didn't take too much walking back and forth before I got the procedure straight.

IPIP Tunnel Deployment Considerations

Hopefully it's evident how to approach the tunneling between the Ethernet segments behind cesium and xenon. The combination of media types is immaterial. You can use tcpdump and friends to monitor the tunnel device itself, which is a great way to troubleshoot a configuration that isn't working the way you expect it to. Also, there's no reason to stop at a single tunnel between two routers. You can build arbitrarily complex topologies where a packet may traverse several tunnels before reaching its destination, the only requirements being that routes exist along the way (you'd have to have that anyway) and that there aren't any firewalls that take offense to IP protocol 4 (the packet type used for IPIP) or prevent two-way communication between the tunnel endpoints. If you might be tunneling across a firewall, realize that the IPIP protocol consists of raw IP packets of protocol 4, no UDP or TCP, and thus no ports. If you own the piece in the middle and would like, for some reason, to block tunnels (maybe because folks are using them to run telnet after the big boss made it clear that telnet was dangerous and not to be tolerated?), you can use the following iptables command on the intervening router to prevent IPIP tunneling from occurring:

iptables -t filter -A FORWARD --proto 4 -j DROP

GRE Tunnels

If you're interested in running GRE instead of IPIP, you have merely to indicate mode gre when you configure the tunnel. (Well, you'll need to modprobe ip_gre after enabling this option as a module in the kernel.) If you don't change anything else, you'll end up with essentially the same tunnel, but this time using raw IP protocol 47. This brings up a good point about IPIP tunnels and security. They're not the least bit secure in the sense of a VPN (we'll get to that later in the book); everything you send through the tunnel is available to folks in the intervening network. GRE addresses this to some extent. Take a look at the syntax for ip tunnel and you'll notice the three options on the third line that are applicable only to GRE tunnels.

tony@bach:~$ ip tunnel help
Usage: ip tunnel { add | change | del | show } [ NAME ]
                [ mode { ipip | gre | sit } ] [ remote ADDR ] [ local ADDR ]
                [ [i|o]seq ] [ [i|o]key KEY ] [ [i|o]csum ]
                [ ttl TTL ] [ tos TOS ] [ [no]pmtudisc ] [ dev PHYS_DEV ]

By specifying key KEY, where KEY is a 32-bit integer, you can prevent folks from arbitrarily inserting traffic into the tunnel. The tunnel device will only accept GRE packets containing the correct key. This buys you something if the hacker doesn't have access to capture packets in the stream between the tunnel endpoints, because the key will have to be guessed by brute force. But the key is stored in cleartext in the packet header. The csum option tells the tunnel device to generate checksums for outgoing packets and require that incoming packets contain valid checksums. Finally, the seq flag indicates that sequence numbers should be included in the GRE headers (used for sequential packet numbering). This is a counter that starts at one and increments for each packet. This actually provides somewhat better security against someone trying to slip random packets into the stream, because they have to get the sequence number correct. (Note: I noticed that you cannot use ip tunnel change to enable sequence numbers after the tunnel has been configured up, so use this option when you first add the tunnel.) The [i|o] indicates whether the parameter is to be applied to inbound or outbound traffic on the tunnel. Not specifying a direction means that the parameter should be used for both directions.
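To make the point about the cleartext key concrete, here is a sketch that packs a GRE header with the optional checksum, key, and sequence fields and then reads the key straight back out. The layout follows the published GRE specifications as I understand them (flag bits for checksum, key, and sequence, followed by the optional fields in that order). The key value, payload, and function names are assumptions for illustration, and this is not the kernel's code.

```python
import struct

GRE_CSUM = 0x8000       # checksum-present flag
GRE_KEY = 0x2000        # key-present flag
GRE_SEQ = 0x1000        # sequence-number-present flag
ETHERTYPE_IPV4 = 0x0800 # payload protocol type for IPv4

def checksum(data):
    """Internet checksum: one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def gre_header(payload, key=None, seq=None, csum=False):
    """Build a GRE header for payload with whichever options are requested."""
    flags = 0
    optional = b""
    if csum:
        flags |= GRE_CSUM
        optional += struct.pack("!HH", 0, 0)  # checksum placeholder + reserved
    if key is not None:
        flags |= GRE_KEY
        optional += struct.pack("!I", key)
    if seq is not None:
        flags |= GRE_SEQ
        optional += struct.pack("!I", seq)
    header = struct.pack("!HH", flags, ETHERTYPE_IPV4) + optional
    if csum:
        # The checksum covers the GRE header and the payload.
        value = checksum(header + payload)
        header = header[:4] + struct.pack("!H", value) + header[6:]
    return header

def read_key(header):
    """Anyone who can capture the packet can recover the key like this."""
    flags, _proto = struct.unpack("!HH", header[:4])
    if not flags & GRE_KEY:
        return None
    offset = 4 + (4 if flags & GRE_CSUM else 0)
    return struct.unpack("!I", header[offset:offset + 4])[0]

if __name__ == "__main__":
    payload = b"x" * 10  # stand-in for an inner IP packet
    header = gre_header(payload, key=0xDEADBEEF, seq=1, csum=True)
    print(len(header), hex(read_key(header)))
```

The key and sequence number raise the bar for blind injection, but as read_key shows, they offer nothing against someone who can sniff the path.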

GRE is a more complicated protocol than IPIP and therefore supports more options, such as multicast routing, recursion prevention, the options discussed in the previous paragraph, and others. The cost compared to that of IPIP is a larger header, and you're still missing the security of full-blown encryption that we'll talk about in Section 6.4. Still, it's an invaluable tool to have on your Linux router when you need to communicate with a traditional router that speaks GRE.
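The header cost is easy to quantify. The sketch below assumes a 20-byte outer IPv4 header without options, a 4-byte base GRE header, and 4 more bytes for each of the key, checksum, and sequence options. The 1500-byte link MTU is just a typical Ethernet value used as an example, and the function names are hypothetical.

```python
IPV4_HEADER = 20  # outer IPv4 header without options
GRE_BASE = 4      # flags/version plus protocol type
GRE_OPTION = 4    # each of key, checksum (+reserved), and sequence number

def tunnel_overhead(mode, key=False, csum=False, seq=False):
    """Bytes added in front of each tunneled packet."""
    if mode == "ipip":
        return IPV4_HEADER
    if mode == "gre":
        return IPV4_HEADER + GRE_BASE + GRE_OPTION * sum([key, csum, seq])
    raise ValueError("unknown tunnel mode: %s" % mode)

def payload_mtu(link_mtu, mode, **options):
    """Largest inner packet that fits in one outer packet without fragmentation."""
    return link_mtu - tunnel_overhead(mode, **options)

if __name__ == "__main__":
    print("ipip          ", payload_mtu(1500, "ipip"))
    print("gre           ", payload_mtu(1500, "gre"))
    print("gre, all opts ", payload_mtu(1500, "gre", key=True, csum=True, seq=True))
```

The difference is only a handful of bytes per packet, but it matters when you are choosing an MTU for the tunnel interface.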

Various Other ip tunnel Commands

You can add, change, del, and show (or list) tunnels on your system. You can specify specific ttl and tos values for the packets while they traverse the tunnel. If you do not, the encapsulating header inherits whatever value the packet has just before it enters the tunnel. You can indicate PMTU (Path MTU discovery) is not to occur on this tunnel, which means that the system will not negotiate an MTU. By default this is off, and it cannot be used in conjunction with setting a fixed ttl (since PMTU needs the TTL field to do its job). If you'd like to set the MTU of the tunnel interface, you can always use ip link set dev TDEV mtu MTU, just as you would for a real interface. If you receive an error message like ioctl: No buffer space available while working with your tunnel devices, make sure that you're not re-adding a tunnel device that hasn't been deleted. You can use ip -s tunnel list [dev] to list some statistics specific to the tunnel protocol you're running. Finally, by using dev DEV, you tie a tunnel device to transmit only via a particular interface on the system, meaning that if your route to the endpoint changes to point out a different physical device, the tunnel won't follow. (If you do not specify a device, the tunnel will use any device through which it can reach the far endpoint.)

Tunneling Gedanken Experiments

Can you create a tunnel across a masquerading router/NAT? Does it matter if there is more than one NAT in the path between the tunnel endpoints? Can you make it work for a single tunnel, but not if there are multiple tunnels? What about if there are multiple tunnels with a common endpoint? (You may want to use more than just your Gehirn; I found it necessary to fire up at least four different copies of tcpdump before I was satisfied with what I saw.) Think about it, try it.
