Home > Articles > Operating Systems, Server > Linux/UNIX/Open Source

RPDB—The Linux Policy Routing Implementation

Under Linux, the implementation of Policy Routing structure is carried out through the mechanism of the Routing Policy DataBase (RPDB). The RPDB is the cohesive set of routes, route tables, and rules. Since addressing is a direct function of these elements, it also is part of the system. What the RPDB primarily does is provide the internal structure and mechanism for implementing the rule element of Policy Routing. It also provides the multiple routing tables available under Linux.

Linux with the RPDB and the complete rewrite of the IP addressing and routing structures in kernel 2.1 and higher sustains 255 routing tables, and 2^32 rules. That is one rule per IP address under IPv4. In other words, you can specify a rule to govern every single address available in the entire IPv4 address space. That works out to over 4 billion rules.

The RPDB itself operates upon the rule and route elements of the triad. In the operation of RPDB, the first element considered is the operation of the rule. The rule, as you saw, may be considered as the filter or selection agent for applying Policy Routing.

The following text about the RPDB and the definition of Policy Routing is adapted from Alexey Kuznetsov's documentation for the IPROUTE2 utility suite, with Alexey's permission. I have rewritten parts of the text to clarify some points. Any errors or omissions should be directed to me.

Classic routing algorithms used on the Internet make routing decisions based only on the destination address of packets and, in theory but not in practice, on the TOS field. In some circumstances you may want to route packets differently, depending not only on the destination addresses but also on other packet fields such as source address, IP protocol, transport protocol ports, or even packet payload. This task is called Policy Routing.

To solve this task, the conventional destination-based routing table, ordered according to the longest match rule, is replaced with the RPDB, which selects the appropriate route through execution of rules. These rules may have many keys of different natures, and therefore they have no natural order except that which is imposed by the network administrator. In Linux the RPDB is a linear list of rules ordered by a numeric priority value. The RPDB explicitly allows matching packet source address, packet destination address, TOS, incoming interface (which is packet meta data, rather than a packet field), and using fwmark values for matching IP protocols and transport ports. Fwmark is the packet filtering tag that you will use in Chapter 6 and is explained later on in this chapter in the section "System Packet Paths—IPChains/NetFilter."

Each routing policy rule consists of a selector and an action predicate. The RPDB is scanned in the order of increasing priority, with the selector of each rule applied to the source address, destination address, incoming interface, TOS, and fwmark. If the selector matches the packet, the action is performed. The action predicate may return success, in which case the rule output provides either a route or a failure indication, and RPDB lookup is then terminated. Otherwise, the RPDB program continues on to the next rule.

What is the action semantically? The natural action is to select the nexthop and the output device. This is the way a packet path route is selected by Cisco IOS; let us call it "match & set." In Linux the approach is more flexible because the action includes lookups in destination-based routing tables and selecting a route from these tables according to the classic longest match algorithm. The "match & set" approach then becomes the simplest case of Linux route selection, realized when the second level routing table contains a single default route. Remember that Linux supports multiple routing tables managed with the ip route command.

At startup, the kernel configures a default RPDB consisting of three rules:

  1. Priority 0: Selector = match anything

      Action = lookup routing local table (ID 255)

      The local table is the special routing table containing high priority control routes for local and broadcast addresses.

      Rule 0 is special; it cannot be deleted or overridden.

  2. Priority 32766: Selector = match anything

      Action = lookup routing main table (ID 254)

      The main table is the normal routing table containing all non-policy routes. This rule may be deleted or overridden with other rules.

  3. Priority 32767: Selector = match anything

      Action = lookup routing table default (ID 253)

      The table default is empty and reserved for post-processing if previous default rules did not select the packet. This rule also may be deleted.

Do not mix routing tables and rules. Rules point to routing tables, several rules may refer to one routing table, and some routing tables may have no rules pointing to them. If you delete all the rules referring to a table, then the table is not used but still exists. A routing table will disappear only after all the routes contained within it are deleted.

Each RPDB entry has additional attributes attached. Each rule has a pointer to some routing table. NAT and masquerading rules have the attribute to select a new IP address to translate/masquerade. Additionally, rules have some of the optional attributes that routes have, such as realms. These values do not override those contained in routing tables; they are used only if the route did not select any of those attributes.

The RPDB may contain rules of the following types:

  • unicast—The rule prescribes returning the route found in the routing table referenced by the rule.

  • blackhole—The rule prescribes dropping a packet silently.

  • unreachable—The rule prescribes generating the error Network is unreachable.

  • prohibit—The rule prescribes generating the error Communication is administratively prohibited.

  • nat—The rule prescribes translating the source address of the IP packet to some other value.

You will see how these rule actions operate primarily in Chapters 5 and 6. There you will make hands-on use of the command set and implement several Policy Routing structures.

The RPDB was the first implementation of and first mention within the Linux community of the concept of Policy Routing. When you consider that the ip utility was first released in late spring of 1997, and that Alexey's documentation was released in April of 1999 coinciding with the official Linux 2.2 kernel release in May of 1999, then you realize that the Linux Policy Routing structure is already over four years old. In Internet time that is considered almost ancient. But as with most new network subjects, such as IPv6 and Policy Routing, Linux leads the way.

The RPDB itself was an integral part of the rewrite of the networking stack in Linux kernel 2.2. The way in which the Policy Routing extensions are accessed is through a defined set of additional control structures within the Linux kernel. These are the NETLINK and RT_NETLINK objects and related constructs. If you are curious about the programmatic details you can look through the source to the ip utility itself. The call structure and reference to the kernel internals is laid out quite well.

One of the important features that makes the RPDB implementation so special is that it is completely backward-compatible with the standard network utilities. You do not need to use the ip utility to perform standard networking tasks on your system. You can use ifconfig and route and get along quite fine. In fact, you can even compile the kernel without the NETLINK family objects and still use standard networking tools. It is only when you need to use the full features of the RPDB that you need to use the appropriate utility.

This backward compatibility is due to the RPDB being a complete replacement of the Linux networking structure, especially as it relates to routing. The addressing modalities for Policy Routing, as discussed in the "Address" section earlier in this chapter (and illustrated in depth in Chapter 5), were also implemented as part of this change. But the main changes, besides the addition of the rule element, were the changes to the route element. Drawing upon Alexey's documentation again I provide the following information on the route element construct.

In the RPDB, each route entry has a key consisting of the protocol prefix, which is the pairing of the network address and network mask length, and optionally the TOS value. An IP packet matches the route if the highest bits of the packet's destination address are equal to the route prefix, at least up to the prefix length, and if the TOS of the route is zero or equal to the TOS of the packet.

If several routes match the packet, the following pruning rules are used to select the best one:

  1. The longest matching prefix is selected; all shorter ones are dropped.

  2. If the TOS of some route with the longest prefix is equal to the TOS of the packet, routes with different TOSes are dropped.

  3. If no exact TOS match is found and routes with TOS=0 exist, the rest of the routes are pruned. Otherwise the route lookup fails.

  4. If several routes remain after steps 1–3 have been tried, then routes with the best preference value are selected.

  5. If several routes still exist, then the first of them is selected.

Note the ambiguity of action 5. Unfortunately, Linux historically allowed such a bizarre situation. The sense of the word "the first" depends on the literal order in which the routes were added to the routing table, and it is practically impossible to maintain a bundle of such routes in any such order.

For simplicity we will limit ourselves to the case wherein such a situation is impossible, and routes are uniquely identified by the triplet of prefix, TOS, and preference. Using the ip command for route creation and manipulation makes it impossible to create non-unique routes.

One useful exception to this rule is the default route on non-forwarding hosts. It is "officially" allowed to have several fallback routes in cases when several routers are present on directly connected networks. In this case, Linux performs "dead gateway detection" as controlled by Neighbour Unreachability Detection (nud) and references from the transport protocols to select the working router. Thus the ordering of the routes is not essential. However, in this specific case it is not recommended that you manually fiddle with default routes but instead use the Router Discovery protocol. Actually, Linux IPv6 does not even allow user-level applications access to default routes.

Of course, the preceding route selection steps are not performed in exactly this sequence. The routing table in the kernel is kept in a data structure that allows the final result to be achieved with minimal cost. Without depending on any particular routing algorithm implemented in the kernel, we can summarize the sequence as this: Route is identified by the triplet {prefix,tos,preference} key, which uniquely locates the route in the routing table.

Each route key refers to a routing information record. The routing information record contains the data required to deliver IP packets, such as output device and next hop router, and additional optional attributes, such as path MTU (Maximum Transmission Unit) or the preferred source address for communicating to that destination.

It is important that the set of required and optional attributes depends on the route type. The most important route type is a unicast route, which describes real paths to other hosts. As a general rule, common routing tables contain only unicast routes. However, other route types with different semantics do exist. The full list of types understood by the Linux kernel is as follows:

  • unicast—The route entry describes real paths to the destinations covered by the route prefix.

  • unreachable—These destinations are unreachable; packets are discarded and the ICMP message host unreachable (ICMP Type 3 Code 1) is generated. The local senders get error EHOSTUNREACH.

  • blackhole—These destinations are unreachable; packets are silently discarded. The local senders get error EINVAL.

  • prohibit—These destinations are unreachable; packets are discarded and the ICMP message communication administratively prohibited (ICMP Type 3 Code 13) is generated. The local senders get error EACCES.

  • local—The destinations are assigned to this host, the packets are looped back and delivered locally.

  • broadcast—The destinations are broadcast addresses, the packets are sent as link broadcasts.

  • throw—Special control route used together with policy rules. If a throw route is selected, then lookup in this particular table is terminated, pretending that no route was found. Without any Policy Routing, it is equivalent to the absence of the route in the routing table, the packets are dropped, and ICMP message net unreachable (ICMP Type 3 Code 0) is generated. The local senders get error ENETUNREACH.

  • nat—Special NAT route. Destinations covered by the prefix are considered as dummy (or external) addresses, which require translation to real (or internal) ones before forwarding. The addresses to translate to are selected with the attribute via.

  • anycast (not implemented)—The destinations are anycast addresses assigned to this host. They are mainly equivalent to local addresses, with the difference that such addresses are invalid to be used as the source address of any packet.

  • multicast—Special type, used for multicast routing. It is not present in normal routing tables.

Linux can place routes within multiple routing tables identified by a number in the range from 1 to 255 or by a name taken from the file /etc/iproute2/rt_tables. By default all normal routes are inserted to the table main (ID 254), and the kernel uses only this table when calculating routes.

Actually, another routing table always exists that is invisible but even more important. It is the local table (ID 255). This table consists of routes for local and broadcast addresses. The kernel maintains this table automatically, and administrators should not ever modify it and do not even need to look at it in normal operation.

In Policy Routing, the routing table identifier becomes effectively one more parameter added to the key triplet {prefix,tos,preference}. Thus, under Policy Routing the route is obtained by {tableid,key triplet}, identifying the route uniquely. So you can have several identical routes in different tables that will not conflict, as was mentioned earlier in the description of action 5 and "the first" mechanism associated with action 5.

These changes to the route element provide one of the core strengths of the RPDB, multiple independent route tables. As you will see in Chapter 5, the rule element alone can only perform a selection or filter operation. It is still up to the route to indicate where the packet needs to go next. Adding on top of these elements the QoS mechanisms to determine and set the TOS field and the ability to route by the TOS field provides you with the most powerful and flexible routing structure available under IPv4 and IPv6.

In summary, the RPDB is the core facility for implementing Policy Routing under Linux. The RPDB streamlines the mechanism of dealing with rules and multiple route tables. All operations of the rule and route structure are centralized into a single point of access and control. The addition of various alternate actions and destinations for routes and rules through the RPDB allows you to fine-tune the mechanism of Policy Routing without needing to hack sections of the networking code.

InformIT Promotional Mailings & Special Offers

I would like to receive exclusive offers and hear about products from InformIT and its family of brands. I can unsubscribe at any time.

Overview


Pearson Education, Inc., 221 River Street, Hoboken, New Jersey 07030, (Pearson) presents this site to provide information about products and services that can be purchased through this site.

This privacy notice provides an overview of our commitment to privacy and describes how we collect, protect, use and share personal information collected through this site. Please note that other Pearson websites and online products and services have their own separate privacy policies.

Collection and Use of Information


To conduct business and deliver products and services, Pearson collects and uses personal information in several ways in connection with this site, including:

Questions and Inquiries

For inquiries and questions, we collect the inquiry or question, together with name, contact details (email address, phone number and mailing address) and any other additional information voluntarily submitted to us through a Contact Us form or an email. We use this information to address the inquiry and respond to the question.

Online Store

For orders and purchases placed through our online store on this site, we collect order details, name, institution name and address (if applicable), email address, phone number, shipping and billing addresses, credit/debit card information, shipping options and any instructions. We use this information to complete transactions, fulfill orders, communicate with individuals placing orders or visiting the online store, and for related purposes.

Surveys

Pearson may offer opportunities to provide feedback or participate in surveys, including surveys evaluating Pearson products, services or sites. Participation is voluntary. Pearson collects information requested in the survey questions and uses the information to evaluate, support, maintain and improve products, services or sites, develop new products and services, conduct educational research and for other purposes specified in the survey.

Contests and Drawings

Occasionally, we may sponsor a contest or drawing. Participation is optional. Pearson collects name, contact information and other information specified on the entry form for the contest or drawing to conduct the contest or drawing. Pearson may collect additional personal information from the winners of a contest or drawing in order to award the prize and for tax reporting purposes, as required by law.

Newsletters

If you have elected to receive email newsletters or promotional mailings and special offers but want to unsubscribe, simply email information@informit.com.

Service Announcements

On rare occasions it is necessary to send out a strictly service related announcement. For instance, if our service is temporarily suspended for maintenance we might send users an email. Generally, users may not opt-out of these communications, though they can deactivate their account information. However, these communications are not promotional in nature.

Customer Service

We communicate with users on a regular basis to provide requested services and in regard to issues relating to their account we reply via email or phone in accordance with the users' wishes when a user submits their information through our Contact Us form.

Other Collection and Use of Information


Application and System Logs

Pearson automatically collects log data to help ensure the delivery, availability and security of this site. Log data may include technical information about how a user or visitor connected to this site, such as browser type, type of computer/device, operating system, internet service provider and IP address. We use this information for support purposes and to monitor the health of the site, identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents and appropriately scale computing resources.

Web Analytics

Pearson may use third party web trend analytical services, including Google Analytics, to collect visitor information, such as IP addresses, browser types, referring pages, pages visited and time spent on a particular site. While these analytical services collect and report information on an anonymous basis, they may use cookies to gather web trend information. The information gathered may enable Pearson (but not the third party web trend services) to link information with application and system log data. Pearson uses this information for system administration and to identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents, appropriately scale computing resources and otherwise support and deliver this site and its services.

Cookies and Related Technologies

This site uses cookies and similar technologies to personalize content, measure traffic patterns, control security, track use and access of information on this site, and provide interest-based messages and advertising. Users can manage and block the use of cookies through their browser. Disabling or blocking certain cookies may limit the functionality of this site.

Do Not Track

This site currently does not respond to Do Not Track signals.

Security


Pearson uses appropriate physical, administrative and technical security measures to protect personal information from unauthorized access, use and disclosure.

Children


This site is not directed to children under the age of 13.

Marketing


Pearson may send or direct marketing communications to users, provided that

  • Pearson will not use personal information collected or processed as a K-12 school service provider for the purpose of directed or targeted advertising.
  • Such marketing is consistent with applicable law and Pearson's legal obligations.
  • Pearson will not knowingly direct or send marketing communications to an individual who has expressed a preference not to receive marketing.
  • Where required by applicable law, express or implied consent to marketing exists and has not been withdrawn.

Pearson may provide personal information to a third party service provider on a restricted basis to provide marketing solely on behalf of Pearson or an affiliate or customer for whom Pearson is a service provider. Marketing preferences may be changed at any time.

Correcting/Updating Personal Information


If a user's personally identifiable information changes (such as your postal address or email address), we provide a way to correct or update that user's personal data provided to us. This can be done on the Account page. If a user no longer desires our service and desires to delete his or her account, please contact us at customer-service@informit.com and we will process the deletion of a user's account.

Choice/Opt-out


Users can always make an informed choice as to whether they should proceed with certain services offered by InformIT. If you choose to remove yourself from our mailing list(s) simply visit the following page and uncheck any communication you no longer want to receive: www.informit.com/u.aspx.

Sale of Personal Information


Pearson does not rent or sell personal information in exchange for any payment of money.

While Pearson does not sell personal information, as defined in Nevada law, Nevada residents may email a request for no sale of their personal information to NevadaDesignatedRequest@pearson.com.

Supplemental Privacy Statement for California Residents


California residents should read our Supplemental privacy statement for California residents in conjunction with this Privacy Notice. The Supplemental privacy statement for California residents explains Pearson's commitment to comply with California law and applies to personal information of California residents collected in connection with this site and the Services.

Sharing and Disclosure


Pearson may disclose personal information, as follows:

  • As required by law.
  • With the consent of the individual (or their parent, if the individual is a minor)
  • In response to a subpoena, court order or legal process, to the extent permitted or required by law
  • To protect the security and safety of individuals, data, assets and systems, consistent with applicable law
  • In connection the sale, joint venture or other transfer of some or all of its company or assets, subject to the provisions of this Privacy Notice
  • To investigate or address actual or suspected fraud or other illegal activities
  • To exercise its legal rights, including enforcement of the Terms of Use for this site or another contract
  • To affiliated Pearson companies and other companies and organizations who perform work for Pearson and are obligated to protect the privacy of personal information consistent with this Privacy Notice
  • To a school, organization, company or government agency, where Pearson collects or processes the personal information in a school setting or on behalf of such organization, company or government agency.

Links


This web site contains links to other sites. Please be aware that we are not responsible for the privacy practices of such other sites. We encourage our users to be aware when they leave our site and to read the privacy statements of each and every web site that collects Personal Information. This privacy statement applies solely to information collected by this web site.

Requests and Contact


Please contact us about this Privacy Notice or if you have any requests or questions relating to the privacy of your personal information.

Changes to this Privacy Notice


We may revise this Privacy Notice through an updated posting. We will identify the effective date of the revision in the posting. Often, updates are made to provide greater clarity or to comply with changes in regulatory requirements. If the updates involve material changes to the collection, protection, use or disclosure of Personal Information, Pearson will provide notice of the change through a conspicuous notice on this site or other appropriate way. Continued use of the site after the effective date of a posted revision evidences acceptance. Please contact us if you have questions or concerns about the Privacy Notice or any objection to any revisions.

Last Update: November 17, 2020