Home > Articles > Open Source > Python

This chapter is from the book

The Regular Expression Module

The re module provides two ways of working with regexes. One is to use the functions listed in Table 13.4 (rightarrow.gif 502), where each function is given a regex as its first argument. Each function converts the regex into an internal format—a process called compiling—and then does its work. This is very convenient for one-off uses, but if we need to use the same regex repeatedly we can avoid the cost of compiling it at each use by compiling it once using the re.compile() function. We can then call methods on the compiled regex object as many times as we like. The compiled regex methods are listed in Table 13.6 (rightarrow.gif 503).

Table 13.4. The Regular Expression Module's Functions

Syntax

Description

re.compile(r, f)

Returns compiled regex r with its flags set to f if specified. (The flags are described in Table 13.5.)

re.escape(s)

Returns string s with all nonalphanumeric characters backslash-escaped—therefore, the returned string has no special regex characters

re.findall(r, s, f)

Returns all nonoverlapping matches of regex r in string s (influenced by the flags f if given). If the regex has captures, each match is returned as a tuple of captures.

re.finditer(r, s, f)

Returns a match object for each nonoverlapping match of regex r in string s (influenced by the flags f if given)

re.match(r, s, f)

Returns a match object if the regex r matches at the start of string s (influenced by the flags f if given); otherwise, returns None

re.search(r, s, f)

Returns a match object if the regex r matches anywhere in string s (influenced by the flags f if given); otherwise, returns None

re.split(r, s, m, f)

Returns the list of strings that results from splitting string s on every occurrence of regex r doing up to m splits (or as many as possible if no m is given, and for Python 3.1 influenced by flags f if given). If the regex has captures, these are included in the list between the parts they split.

3.x

re.sub(r, x, s, m, f)

Returns a copy of string s with every (or up to m if given, and for Python 3.1 influenced by flags f if given) match of regex r replaced with x—this can be a string or a function; see text

re.subn(r, x, s m, f)

The same as re.sub() except that it returns a 2-tuple of the resultant string and the number of substitutions that were made

Table 13.5. The Regular Expression Module's Flags

Flag

Meaning

re.A or re.ASCII

Makes \b, \B, \s, \S, \w, and \W assume that strings are ASCII; the default is for these character class shorthands to depend on the Unicode specification

re.I or re.IGNORECASE

Makes the regex match case-insensitively

re.M or re.MULTILINE

Makes ^ match at the start and after each newline and $ match before each newline and at the end

re.S or re.DOTALL

Makes . match every character including newlines

re.X or re.VERBOSE

Allows whitespace and comments to be included

Table 13.6. Regular Expression Object Methods

Syntax

Description

rx.findall(s start, end)

Returns all nonoverlapping matches of the regex in string s (or in the start:end slice of s). If the regex has captures, each match is returned as a tuple of captures.

rx.finditer(s start, end)

Returns a match object for each nonoverlapping match in string s (or in the start:end slice of s)

rx.flags

The flags that were set when the regex was compiled

rx.groupindex

A dictionary whose keys are capture group names and whose values are group numbers; empty if no names are used

rx.match(s, start, end)

Returns a match object if the regex matches at the start of string s (or at the start of the start:end slice of s); otherwise, returns None

rx.pattern

The string from which the regex was compiled

rx.search(s, start, end)

Returns a match object if the regex matches anywhere in string s (or in the start:end slice of s); otherwise, returns None

rx.split(s, m)

Returns the list of strings that results from splitting string s on every occurrence of the regex doing up to m splits (or as many as possible if no m is given). If the regex has captures, these are included in the list between the parts they split.

rx.sub(x, s, m)

Returns a copy of string s with every (or up to m if given) match replaced with x—this can be a string or a function; see text

rx.subn(x, s m)

The same as re.sub() except that it returns a 2-tuple of the resultant string and the number of substitutions that were made

match = re.search(r"#[\dA-Fa-f]{6}\b", text)

This code snippet shows the use of an re module function. The regex matches HTML-style colors (such as #C0C0AB). If a match is found the re.search() function returns a match object; otherwise, it returns None. The methods provided by match objects are listed in Table 13.7 (rightarrow.gif 507).

Table 13.7. Match Object Attributes and Methods

Syntax

Description

m.end(g)

Returns the end position of the match in the text for group g if given (or for group 0, the whole match); returns -1 if the group did not participate in the match

m.endpos

The search's end position (the end of the text or the end given to match() or search())

m.expand(s)

Returns string s with capture markers (\1, \2, \g<name>, and similar) replaced by the corresponding captures

m.group(g, ...)

Returns the numbered or named capture group g; if more than one is given a tuple of corresponding capture groups is returned (the whole match is group 0)

m.groupdict( default)

Returns a dictionary of all the named capture groups with the names as keys and the captures as values; if a default is given this is the value used for capture groups that did not participate in the match

m.groups(default)

Returns a tuple of all the capture groups starting from 1; if a default is given this is the value used for capture groups that did not participate in the match

m.lastgroup

The name of the highest numbered capturing group that matched or None if there isn't one or if no names are used

m.lastindex

The number of the highest capturing group that matched or None if there isn't one

m.pos

The start position to look from (the start of the text or the start given to match() or search())

m.re

The regex object which produced this match object

m.span(g)

Returns the start and end positions of the match in the text for group g if given (or for group 0, the whole match); returns (-1, -1) if the group did not participate in the match

m.start(g)

Returns the start position of the match in the text for group g if given (or for group 0, the whole match); returns -1 if the group did not participate in the match

m.string

The string that was passed to match() or search()

If we were going to use this regex repeatedly, we could compile it once and then use the compiled regex whenever we needed it:

color_re = re.compile(r"#[\dA-Fa-f]{6}\b")
match = color_re.search(text)

As we noted earlier, we use raw strings to avoid having to escape backslashes. Another way of writing this regex would be to use the character class [\dA-F] and pass the re.IGNORECASE flag as the last argument to the re.compile() call, or to use the regex (?i)#[\dA-F]{6}\b which starts with the ignore case flag.

If more than one flag is required they can be combined using the OR operator (|), for example, re.MULTILINE|re.DOTALL, or (?ms) if embedded in the regex itself.

We will round off this section by reviewing some examples, starting with some of the regexes shown in earlier sections, so as to illustrate the most commonly used functionality that the re module provides. Let's start with a regex to spot duplicate words:

double_word_re = re.compile(r"\b(?P<word>\w+)\s+(?P=word)(?!\w)",
                            re.IGNORECASE)
for match in double_word_re.finditer(text):
    print("{0} is duplicated".format(match.group("word")))

The regex is slightly more sophisticated than the version we made earlier. It starts at a word boundary (to ensure that each match starts at the beginning of a word), then greedily matches one or more "word" characters, then one or more whitespace characters, then the same word again—but only if the second occurrence of the word is not followed by a word character.

If the input text was "win in vain", without the first assertion there would be one match and one capture: win in vain. There aren't two matches because while (?P<word>) matches and captures, the \s+ and (?P=word) parts only match. The use of the word boundary assertion ensures that the first word matched is a whole word, so we end up with no match or capture since there is no duplicate whole word. Similarly, if the input text was "one and and two let's say", without the last assertion there would be two matches and two captures: one and and two let's say. The use of the lookahead assertion means that the second word matched is a whole word, so we end up with one match and one capture: one and and two let's say.

The for loop iterates over every match object returned by the finditer() method and we use the match object's group() method to retrieve the captured group's text. We could just as easily (but less maintainably) have used group(1)—in which case we need not have named the capture group at all and just used the regex \b(\w+)\s+\1(?!\w). Another point to note is that we could have used a word boundary \b at the end, instead of (?!\w).

Another example we presented earlier was a regex for finding the filenames in HTML image tags. Here is how we would compile the regex, adding flags so that it is not case-sensitive, and allowing us to include comments:

image_re = re.compile(r"""
            <img\s+                     # start of tag
            [^>]*?                      # non-src attributes
            src=                        # start of src attribute
            (?:
                (?P<quote>["'])         # opening quote
                (?P<qimage>[^\1>]+?)    # image filename
                (?P=quote)              # closing quote
            |                           # ---or alternatively---
                (?P<uimage>[^"' >]+)    # unquoted image filename
            )
            [^>]*?                      # non-src attributes
            >                           # end of the tag
            """, re.IGNORECASE|re.VERBOSE)
image_files = []
for match in image_re.finditer(text):
    image_files.append(match.group("qimage") or
                       match.group("uimage"))

Again we use the finditer() method to retrieve each match and the match object's group() function to retrieve the captured texts. Each time a match is made we don't know which of the image groups ("qimage" or "uimage") has matched, but using the or operator provides a neat solution for this. Since the case insensitivity applies only to img and src, we could drop the re.IGNORECASE flag and use [Ii][Mm][Gg] and [Ss][Rr][Cc] instead. Although this would make the regex less clear, it might make it faster since it would not require the text being matched to be set to upper- (or lower-) case—but it is likely to make a difference only if the regex was being used on a very large amount of text.

One common task is to take an HTML text and output just the plain text that it contains. Naturally we could do this using one of Python's parsers, but a simple tool can be created using regexes. There are three tasks that need to be done: delete any tags, replace entities with the characters they represent, and insert blank lines to separate paragraphs. Here is a function (taken from the html2text.py program) that does the job:

def html2text(html_text):
    def char_from_entity(match):
        code = html.entities.name2codepoint.get(match.group(1), 0xFFFD)
        return chr(code)
text = re.sub(r"<!--(?:.|\n)*?-->", "", html_text)          #1
text = re.sub(r"<[Pp][^>]*?>", "\n\n", text)                #2
text = re.sub(r"<[^>]*?>", "", text)                        #3
text = re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)
text = re.sub(r"&([A-Za-z]+);", char_from_entity, text)      #5
text = re.sub(r"\n(?:[ \xA0\t]+\n)+;", "\n", text)           #6
return re.sub(r"\n\n+", "\n\n", text.strip())               #7

The first regex, <!--(?:.|\n)*?-->, matches HTML comments, including those with other HTML tags nested inside them. The re.sub() function replaces as many matches as it finds with the replacement—deleting the matches if the replacement is an empty string, as it is here. (We can specify a maximum number of matches by giving an additional integer argument at the end.)

We are careful to use nongreedy (minimal) matching to ensure that we delete one comment for each match; if we did not do this we would delete from the start of the first comment to the end of the last comment.

In Python 3.0, the re.sub() function does not accept any flags as arguments, and since . means "any character except newline", we must look for . or \n. And we must look for these using alternation rather than a character class, since inside a character class . has its literal meaning, that is, period. An alternative would be to begin the regex with the flag embedded, for example, (?s)<!--.*?-->, or we could compile a regex object with the re.DOTALL flag, in which case the regex would simply be <!--.*?-->.

3.0

From Python 3.1, re.split(), re.sub(), and re.subn(), can all accept a flags argument, so we could simply use <!--.*?--> and pass the re.DOTALL flag.

3.1

The second regex, <[Pp][^>]*?>, matches opening paragraph tags (such as <P> or <p align="center">). It matches the opening <p (or <P), then any attributes (using nongreedy matching), and finally the closing >. The second call to the re.sub() function uses this regex to replace opening paragraph tags with two newline characters (the standard way to delimit a paragraph in a plain text file).

The third regex, <[^>]*?>, matches any tag and is used in the third re.sub() call to delete all the remaining tags.

HTML entities are a way of specifying non-ASCII characters using ASCII characters. They come in two forms: &name; where name is the name of the character—for example, &copy; for ©—and &#digits; where digits are decimal digits identifying the Unicode code point—for example, &#165; for ¥. The fourth call to re.sub() uses the regex &#(\d+);, which matches the digits form and captures the digits into capture group 1. Instead of a literal replacement text we have passed a lambda function. When a function is passed to re.sub() it calls the function once for each time it matches, passing the match object as the function's sole argument. Inside the lambda function we retrieve the digits (as a string), convert to an integer using the built-in int() function, and then use the built-in chr() function to obtain the Unicode character for the given code point. The function's return value (or in the case of a lambda expression, the result of the expression) is used as the replacement text.

The fifth re.sub() call uses the regex &([A-Za-z]+); to capture named entities. The standard library's html.entities module contains dictionaries of entities, including name2codepoint whose keys are entity names and whose values are integer code points. The re.sub() function calls the local char_from_entity() function every time it has a match. The char_from_entity() function uses dict.get() with a default argument of 0xFFFD (the code point of the standard Unicode replacement character—often depicted as roundquestion.gif). This ensures that a code point is always retrieved and it is used with the chr() function to return a suitable character to replace the named entity with—using the Unicode replacement character if the entity name is invalid.

The sixth re.sub() call's regex, \n(?:[ \xA0\t]+\n)+, is used to delete lines that contain only whitespace. The character class we have used contains a space, a nonbreaking space (which &nbsp; entities are replaced with in the preceding regex), and a tab. The regex matches a newline (the one at the end of a line that precedes one or more whitespace-only lines), then at least one (and as many as possible) lines that contain only whitespace. Since the match includes the newline, from the line preceding the whitespace-only lines we must replace the match with a single newline; otherwise, we would delete not just the whitespace-only lines but also the newline of the line that preceded them.

The result of the seventh and last re.sub() call is returned to the caller. This regex, \n\n+, is used to replace sequences of two or more newlines with exactly two newlines, that is, to ensure that each paragraph is separated by just one blank line.

In the HTML example none of the replacements were directly taken from the match (although HTML entity names and numbers were used), but in some situations the replacement might need to include all or some of the matching text. For example, if we have a list of names, each of the form Forename Middlename1 ... MiddlenameN Surname, where there may be any number of middle names (including none), and we want to produce a new version of the list with each item of the form Surname, ForenameMiddlename1...MiddlenameN, we can easily do so using a regex:

new_names = []
for name in names:
    name = re.sub(r"(\w+(?:\s+\w+)*)\s+(\w+)", r"\2, \1", name)
    new_names.append(name)

The first part of the regex, (\w+(?:\s+\w+)*), matches the forename with the first \w+ expression and zero or more middle names with the (?:\s+\w+)* expression. The middle name expression matches zero or more occurrences of whitespace followed by a word. The second part of the regex, \s+(\w+), matches the whitespace that follows the forename (and middle names) and the surname.

If the regex looks a bit too much like line noise, we can use named capture groups to improve legibility and make it more maintainable:

name = re.sub(r"(?P<forenames>\w+(?:\s+\w+)*)"
              r"\s+(?P<surname>\w+)",
              r"\g<surname>, \g<forenames>", name)

Captured text can be referred to in a sub() or subn() function or method by using the syntax \i or \g<id> where i is the number of the capture group and id is the name or number of the capture group—so \1 is the same as \g<1>, and in this example, the same as \g<forenames>. This syntax can also be used in the string passed to a match object's expand() method.

Why doesn't the first part of the regex grab the entire name? After all, it is using greedy matching. In fact it will, but then the match will fail because although the middle names part can match zero or more times, the surname part must match exactly once, but the greedy middle names part has grabbed everything. Having failed, the regular expression engine will then backtrack, giving up the last "middle name" and thus allowing the surname to match. Although greedy matches match as much as possible, they stop if matching more would make the match fail.

For example, if the name is "John le Carré", the regex will first match the entire name, that is, John le Carré. This satisfies the first part of the regex but leaves nothing for the surname part to match, and since the surname is mandatory (it has an implicit quantifier of 1), the regex has failed. Since the middle names part is quantified by *, it can match zero or more times (currently it is matching twice, " le" and " Carré"), so the regular expression engine can make it give up some of its match without causing it to fail. Therefore, the regex backtracks, giving up the last \s+\w+ (i.e., " Carré"), so the match becomes John le Carré with the match satisfying the whole regex and with the two match groups containing the correct texts.

There's one weakness in the regex as written: It doesn't cope correctly with forenames that are written using an initial, such as "James W. Loewen", or "J. R. R. Tolkein". This is because \w matches word characters and these don't include period. One obvious—but incorrect—solution is to change the forenames part of the regex's \w+ expression to [\w.]+, in both places that it occurs. A period in a character class is taken to be a literal period, and character class shorthands retain their meaning inside character classes, so the new expression matches word characters or periods. But this would allow for names like ".", "..", ".A", ".A.", and so on. In view of this, a more subtle approach is required.

name = re.sub(r"(?P<forenames>\w+\.?(?:\s+\w+\.?)*)"
              r"\s+(?P<surname>\w+)",
              r"\g<surname>, \g<forenames>", name)

Here we have changed the forenames part of the regex (the first line). The first part of the forenames regex matches one or more word characters optionally followed by a period. The second part matches at least one whitespace character, then one or more word characters optionally followed by a period, with the whole of this second part itself matching zero or more times.

When we use alternation (|) with two or more alternatives capturing, we don't know which alternative matched, so we don't know which capture group to retrieve the captured text from. We can of course iterate over all the groups to find the nonempty one, but quite often in this situation the match object's lastindex attribute can give us the number of the group we want. We will look at one last example to illustrate this and to give us a little bit more regex practice.

Suppose we want to find out what encoding an HTML, XML, or Python file is using. We could open the file in binary mode, and read, say, the first 1000 bytes into a bytes object. We could then close the file, look for an encoding in the bytes, and reopen the file in text mode using the encoding we found or using a fallback encoding (such as UTF-8). The regex engine expects regexes to be supplied as strings, but the text the regex is applied to can be a str, bytes, or bytearray object, and when bytes or bytearray objects are used, all the functions and methods return bytes instead of strings, and the re.ASCII flag is implicitly switched on.

For HTML files the encoding is normally specified in a <meta> tag (if specified at all), for example, <meta http-equiv='Content-Type' content='text/html; charset=ISO-8859-1'/>. XML files are UTF-8 by default, but this can be overridden, for example, <?xml version="1.0" encoding="Shift_JIS"?>. Python 3 files are also UTF-8 by default, but again this can be overridden by including a line such as # encoding: latin1 or # -*- coding: latin1 -*- immediately after the shebang line.

Here is how we would find the encoding, assuming that the variable binary is a bytes object containing the first 1000 bytes of an HTML, XML, or Python file:

match = re.search(r"""(?<![-\w])                    #1
                      (?:(?:en)?coding|charset)     #2
                      (?:=(["'])?([-\w]+)(?(1)\1)   #3
                      |:\s*([-\w]+))""".encode("utf8"),
                  binary, re.IGNORECASE|re.VERBOSE)
encoding = match.group(match.lastindex) if match else b"utf8"

To search a bytes object we must specify a pattern that is also a bytes object. In this case we want the convenience of using a raw string, so we use one and convert it to a bytes object as the re.search() function's first argument.

The first part of the regex itself is a lookbehind assertion that says that the match cannot be preceded by a hyphen or a word character. The second part matches "encoding", "coding", or "charset" and could have been written as (?:encoding|coding|charset). We have made the third part span two lines to emphasise the fact that it has two alternating parts, =(["'])?([-\w]+)(?(1)\1) and :\s*([-\w]+), only one of which can match. The first of these matches an equals sign followed by one or more word or hyphen characters (optionally enclosed in matching quotes using a conditional match), and the second matches a colon and then optional whitespace followed by one or more word or hyphen characters. (Recall that a hyphen inside a character class is taken to be a literal hyphen if it is the first character; otherwise, it means a range of characters, for example, [0-9].)

We have used the re.IGNORECASE flag to avoid having to write (?:(?:[Ee][Nn])? [Cc][Oo][Dd][Ii][Nn][Gg]|[Cc][Hh][Aa][Rr][Ss][Ee][Tt]) and we have used the re.VERBOSE flag so that we can lay out the regex neatly and include comments (in this case just numbers to make the parts easy to refer to in this text).

There are three capturing match groups, all in the third part: (["'])? which captures the optional opening quote, ([-\w]+) which captures an encoding that follows an equals sign, and the second ([-\w]+) (on the following line) that captures an encoding that follows a colon. We are only interested in the encoding, so we want to retrieve either the second or third capture group, only one of which can match since they are alternatives. The lastindex attribute holds the index of the last matching capture group (either 2 or 3 when a match occurs in this example), so we retrieve whichever matched, or use a default encoding if no match was made.

We have now seen all of the most frequently used re module functionality in action, so we will conclude this section by mentioning one last function. The re.split() function (or the regex object's split() method) can split strings based on a regex. One common requirement is to split a text on whitespace to get a list of words. This can be done using re.split(r"\s+", text) which returns a list of words (or more precisely a list of strings, each of which matches \S+). Regular expressions are very powerful and useful, and once they are learned, it is easy to see all text problems as requiring a regex solution. But sometimes using string methods is both sufficient and more appropriate. For example, we can just as easily split on whitespace by using text.split() since the str.split() method's default behavior (or with a first argument of None) is to split on \s+.

InformIT Promotional Mailings & Special Offers

I would like to receive exclusive offers and hear about products from InformIT and its family of brands. I can unsubscribe at any time.

Overview


Pearson Education, Inc., 221 River Street, Hoboken, New Jersey 07030, (Pearson) presents this site to provide information about products and services that can be purchased through this site.

This privacy notice provides an overview of our commitment to privacy and describes how we collect, protect, use and share personal information collected through this site. Please note that other Pearson websites and online products and services have their own separate privacy policies.

Collection and Use of Information


To conduct business and deliver products and services, Pearson collects and uses personal information in several ways in connection with this site, including:

Questions and Inquiries

For inquiries and questions, we collect the inquiry or question, together with name, contact details (email address, phone number and mailing address) and any other additional information voluntarily submitted to us through a Contact Us form or an email. We use this information to address the inquiry and respond to the question.

Online Store

For orders and purchases placed through our online store on this site, we collect order details, name, institution name and address (if applicable), email address, phone number, shipping and billing addresses, credit/debit card information, shipping options and any instructions. We use this information to complete transactions, fulfill orders, communicate with individuals placing orders or visiting the online store, and for related purposes.

Surveys

Pearson may offer opportunities to provide feedback or participate in surveys, including surveys evaluating Pearson products, services or sites. Participation is voluntary. Pearson collects information requested in the survey questions and uses the information to evaluate, support, maintain and improve products, services or sites, develop new products and services, conduct educational research and for other purposes specified in the survey.

Contests and Drawings

Occasionally, we may sponsor a contest or drawing. Participation is optional. Pearson collects name, contact information and other information specified on the entry form for the contest or drawing to conduct the contest or drawing. Pearson may collect additional personal information from the winners of a contest or drawing in order to award the prize and for tax reporting purposes, as required by law.

Newsletters

If you have elected to receive email newsletters or promotional mailings and special offers but want to unsubscribe, simply email information@informit.com.

Service Announcements

On rare occasions it is necessary to send out a strictly service related announcement. For instance, if our service is temporarily suspended for maintenance we might send users an email. Generally, users may not opt-out of these communications, though they can deactivate their account information. However, these communications are not promotional in nature.

Customer Service

We communicate with users on a regular basis to provide requested services and in regard to issues relating to their account we reply via email or phone in accordance with the users' wishes when a user submits their information through our Contact Us form.

Other Collection and Use of Information


Application and System Logs

Pearson automatically collects log data to help ensure the delivery, availability and security of this site. Log data may include technical information about how a user or visitor connected to this site, such as browser type, type of computer/device, operating system, internet service provider and IP address. We use this information for support purposes and to monitor the health of the site, identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents and appropriately scale computing resources.

Web Analytics

Pearson may use third party web trend analytical services, including Google Analytics, to collect visitor information, such as IP addresses, browser types, referring pages, pages visited and time spent on a particular site. While these analytical services collect and report information on an anonymous basis, they may use cookies to gather web trend information. The information gathered may enable Pearson (but not the third party web trend services) to link information with application and system log data. Pearson uses this information for system administration and to identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents, appropriately scale computing resources and otherwise support and deliver this site and its services.

Cookies and Related Technologies

This site uses cookies and similar technologies to personalize content, measure traffic patterns, control security, track use and access of information on this site, and provide interest-based messages and advertising. Users can manage and block the use of cookies through their browser. Disabling or blocking certain cookies may limit the functionality of this site.

Do Not Track

This site currently does not respond to Do Not Track signals.

Security


Pearson uses appropriate physical, administrative and technical security measures to protect personal information from unauthorized access, use and disclosure.

Children


This site is not directed to children under the age of 13.

Marketing


Pearson may send or direct marketing communications to users, provided that

  • Pearson will not use personal information collected or processed as a K-12 school service provider for the purpose of directed or targeted advertising.
  • Such marketing is consistent with applicable law and Pearson's legal obligations.
  • Pearson will not knowingly direct or send marketing communications to an individual who has expressed a preference not to receive marketing.
  • Where required by applicable law, express or implied consent to marketing exists and has not been withdrawn.

Pearson may provide personal information to a third party service provider on a restricted basis to provide marketing solely on behalf of Pearson or an affiliate or customer for whom Pearson is a service provider. Marketing preferences may be changed at any time.

Correcting/Updating Personal Information


If a user's personally identifiable information changes (such as your postal address or email address), we provide a way to correct or update that user's personal data provided to us. This can be done on the Account page. If a user no longer desires our service and desires to delete his or her account, please contact us at customer-service@informit.com and we will process the deletion of a user's account.

Choice/Opt-out


Users can always make an informed choice as to whether they should proceed with certain services offered by InformIT. If you choose to remove yourself from our mailing list(s) simply visit the following page and uncheck any communication you no longer want to receive: www.informit.com/u.aspx.

Sale of Personal Information


Pearson does not rent or sell personal information in exchange for any payment of money.

While Pearson does not sell personal information, as defined in Nevada law, Nevada residents may email a request for no sale of their personal information to NevadaDesignatedRequest@pearson.com.

Supplemental Privacy Statement for California Residents


California residents should read our Supplemental privacy statement for California residents in conjunction with this Privacy Notice. The Supplemental privacy statement for California residents explains Pearson's commitment to comply with California law and applies to personal information of California residents collected in connection with this site and the Services.

Sharing and Disclosure


Pearson may disclose personal information, as follows:

  • As required by law.
  • With the consent of the individual (or their parent, if the individual is a minor)
  • In response to a subpoena, court order or legal process, to the extent permitted or required by law
  • To protect the security and safety of individuals, data, assets and systems, consistent with applicable law
  • In connection the sale, joint venture or other transfer of some or all of its company or assets, subject to the provisions of this Privacy Notice
  • To investigate or address actual or suspected fraud or other illegal activities
  • To exercise its legal rights, including enforcement of the Terms of Use for this site or another contract
  • To affiliated Pearson companies and other companies and organizations who perform work for Pearson and are obligated to protect the privacy of personal information consistent with this Privacy Notice
  • To a school, organization, company or government agency, where Pearson collects or processes the personal information in a school setting or on behalf of such organization, company or government agency.

Links


This web site contains links to other sites. Please be aware that we are not responsible for the privacy practices of such other sites. We encourage our users to be aware when they leave our site and to read the privacy statements of each and every web site that collects Personal Information. This privacy statement applies solely to information collected by this web site.

Requests and Contact


Please contact us about this Privacy Notice or if you have any requests or questions relating to the privacy of your personal information.

Changes to this Privacy Notice


We may revise this Privacy Notice through an updated posting. We will identify the effective date of the revision in the posting. Often, updates are made to provide greater clarity or to comply with changes in regulatory requirements. If the updates involve material changes to the collection, protection, use or disclosure of Personal Information, Pearson will provide notice of the change through a conspicuous notice on this site or other appropriate way. Continued use of the site after the effective date of a posted revision evidences acceptance. Please contact us if you have questions or concerns about the Privacy Notice or any objection to any revisions.

Last Update: November 17, 2020