
Text Processing in Python

Book

  • Sorry, this book is no longer in print.
Not for Sale

About

Features

A clear, practical guide to using Python for text processing, arguably what most programmers spend most of their time doing.

  • Demonstrates how Python is the perfect language for text-processing functions.
  • Provides practical pointers and tips that emphasize efficient, flexible, and maintainable approaches to text-processing challenges.
  • Helps programmers develop solutions for dealing with the increasing amounts of data with which we are all inundated.

Description

  • Copyright 2003
  • Dimensions: 7" x 9-1/4"
  • Pages: 544
  • Edition: 1st
  • Book
  • ISBN-10: 0-321-11254-7
  • ISBN-13: 978-0-321-11254-5

Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.

Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.

Here is some of what you will find in this book:

  • When do I use formal parsers to process structured and semi-structured data? Page 257
  • How do I work with full text indexing? Page 199
  • What patterns in text can be expressed using regular expressions? Page 204
  • How do I find a URL or an email address in text? Page 228
  • How do I process a report with a concrete state machine? Page 274
  • How do I parse, create, and manipulate internet formats? Page 345
  • How do I handle lossless and lossy compression? Page 454
  • How do I find codepoints in Unicode? Page 465



Sample Content

Table of Contents



Preface.

What Is Text Processing?

The Philosophy of Text Processing.

What You'll Need to Use This Book.

Conventions Used in This Book.

A Word on Source Code Examples.

External Resources.



1. Python Basics.

Techniques and Patterns.

Utilizing Higher-Order Functions in Text Processing.

Exercise: More on combinatorial functions.

Specializing Python Datatypes.

Base Classes for Datatypes.

Exercise: Filling out the forms (or deciding not to)

Problem: Working with lines from a large file.

Standard Modules.

Working with the Python Interpreter.

Working with the Local Filesystem.

Running External Commands and Accessing OS Features.

Special Data Values and Formats.

Other Modules in the Standard Library.

Serializing and Storing Python Objects.

Platform Specific Operations.

Working with Multi-Media Formats.

Miscellaneous Other Modules.



2. Basic String Operations.

Some Common Tasks.

Problem: Quickly sorting lines on custom criteria.

Problem: Reformatting paragraphs of text.

Problem: Column statistics for delimited or flat-record files.

Problem: Counting characters, words, lines and paragraphs.

Problem: Transmit binary data as ASCII.

Problem: Creating word or letter histograms.

Problem: Reading a file backwards by record, line, or paragraph.

Standard Modules.

Basic String Transformations.

Strings as Files, and Files as Strings.

Converting Between Binary and ASCII.

Cryptography.

Compression.

Unicode.

Solving Problems.

Exercise: Many ways to take out the garbage.

Exercise: Making sure things are what they should be

Exercise: Finding needles in haystacks (full-text indexing).



3. Regular Expressions.

A Regular Expression Tutorial.

Just What is a Regular Expression Anyway?

Matching Patterns In Text: The Basics.

Matching Patterns In Text: Intermediate.

Advanced Regular Expression Extensions.

Some Common Tasks.

Problem: Making a text block flush left.

Problem: Summarizing command-line option documentation.

Problem: Detecting duplicate words.

Problem: Checking for server errors.

Problem: Reading lines with continuation characters

Problem: Identifying URLs and email addresses in texts.

Problem: Pretty printing numbers.

Standard Modules.

Versions and optimizations.

Simple Pattern Matching.

Regular Expression Modules.



4. Parsers and State Machines.

An Introduction to Parsers.

When data becomes deep and texts become stateful.

What is a grammar?

An EBNF grammar for IF/THEN/END structures.

Pencil-and-Paper Parsing.

Exercise: Some variations on the language.

An Introduction to State Machines.

Understanding State Machines.

Text Processing State Machines.

When Not To Use A State Machine.

When to Use a State Machine.

An Abstract State Machine Class.

Processing a Report with a Concrete State Machine.

Subgraphs and State Reuse.

Exercise: Finding other solutions.

Parser Libraries for Python.

Specialized Parsers in the Standard Library.

Low-Level State Machine Parsing.

High-Level EBNF Parsing.

High-Level Programmatic Parsing.



5. Internet Tools and Techniques.

Working With Email and Newsgroups.

Manipulating and Creating Message Texts.

Communicating with Mail Servers.

Message Collections and Message Parts.

World Wide Web Applications.

Common Gateway Interface.

Parsing, Creating, and Manipulating HTML Documents.

Accessing Internet Resources.

Synopses of Other Internet Modules.

Standard Internet-Related Tools.

Third Party Internet-Related Tools.

Understanding XML.

Python Standard Library XML Modules.

Third Party XML-Related Tools



A. A Selective and Impressionistic Short Review of Python.

What Kind of Language is Python?

Namespaces and Bindings.

Assignment and Dereferencing.

Function and Class Definitions.

import Statements.

for Statements.

except Statements.

Datatypes.

Simple Types.

String Interpolation.

Printing.

Container Types.

Compound Types.

Flow Control.

if / then / else Statements.

Boolean Shortcutting.

for / continue / break Statements.

map( ), filter( ), reduce( ), and List Comprehensions.

while / else / continue / break Statements.

Functions, Simple Generators and the yield Statement.

Raising and Catching Exceptions.

Data as Code.

Functional Programming.

Emphasizing Expressions using lambda.

Special List Functions.

List-Application Functions as Flow Control.

Extended Call Syntax and apply( ).



B. A Data Compression Primer.

Introduction.

Lossless and Lossy Compression.

A Data Set Example.

Whitespace Compression.

Run-Length Encoding.

Huffman Encoding.

Lempel-Ziv Compression.

Solving the Right Problem.

A Custom Text Compressor.

References.



C. Understanding Unicode.

Some Background on Characters.

What is Unicode?

Encodings.

Declarations.

Finding Codepoints.

Resources.



D. A State-Machine for Adding Markup to Text.


E. Glossary.


Index.

Preface

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one—and preferably only one—obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea—let's do more of those!
—Tim Peters, The Zen of Python

0.1 What is Text Processing?

At the broadest level text processing is simply taking textual information and doing something with it. This doing might be restructuring or reformatting it, extracting smaller bits of information from it, algorithmically modifying the content of the information, or performing calculations that depend on the textual information. The lines between "text" and the even more general term "data" are extremely fuzzy; at an approximation, "text" is just data that lives in forms that people can themselves read—at least in principle, and maybe with a bit of effort. Most typically computer "text" is composed of sequences of bits which have a "natural" representation as letters, numerals and symbols; and most often such text is delimited (if delimited at all) by symbols and formatting that can be easily pronounced as "next datum."

The lines are fuzzy, but the data that seems least like text (and that, therefore, this particular book is least concerned with) is the data that makes up "multimedia" (pictures, sounds, video, animation, etc.) and data that makes up UI "events" (draw a window, move the mouse, open an application, etc.). Like I said, the lines are fuzzy, and some representations of the most non-textual data are themselves pretty textual. But in general, the subject of this book is all the stuff on the near side of that fuzzy line.

Text processing is arguably what most programmers spend most of their time doing. The information that lives in business software systems mostly comes down to collections of words about the application domain—maybe with a few special symbols mixed in. Internet communications protocols consist mostly of a few special words used as headers, a little bit of constrained formatting, and message bodies consisting of additional wordish texts. Configuration files, log files, CSV and fixed-length data files, error files, documentation, and source code itself, are all just sequences of words with bits of constraint and formatting applied.

Programmers and developers spend so much time with text processing that it is easy to forget that that is what we are doing. The most common text processing application is probably your favorite text editor. Beyond simple entry of new characters, text editors perform such text processing tasks as search/replace and copy/paste, which, given guided interaction with the user, accomplish sophisticated manipulation of textual sources. Many text editors go further than these simple capabilities, and include their own complete programming systems (usually called "macro processing"); in those cases where editors include "Turing-complete" macro languages, text editors suffice, in principle, to accomplish anything that the examples in this book can.

After text editors, a variety of text processing tools are widely used by developers. Tools like "File Find" under Windows, or "grep" on Unix (and other platforms) perform the basic chore of locating text patterns. "Little languages" like sed and awk perform basic text manipulation (or even non-basic). A large number of utilities—especially in Unix-like environments—perform small custom text processing tasks: wc, sort, tr, md5sum, uniq, split, strings and many others.
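As a rough illustration of how small such utilities are, here is a minimal sketch of a wc-like counter in Python. The function name count_text and the sample string are hypothetical, chosen only for this illustration; the sketch counts characters rather than bytes, which coincides with wc only for single-byte text.

```python
def count_text(text):
    """Return (lines, words, characters) for a string, roughly as wc does."""
    lines = text.count("\n")    # wc counts newline characters
    words = len(text.split())   # whitespace-delimited tokens
    chars = len(text)           # characters, not bytes
    return lines, words, chars


sample = "the quick brown fox\njumps over\nthe lazy dog\n"
print(count_text(sample))
```

A general purpose language earns its keep once a task like this needs to be combined with others, for example counting only the lines that match some pattern.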

At the top of the text processing food chain are general purpose programming languages, such as Python. I wrote this book on Python in large part because Python is such a clear, expressive, and general purpose language. But for all Python's virtues, text editors and "little" utilities will always have an important place for developers "getting the job done." As simple as Python is, it is still more complicated than you need to achieve many basic tasks. But once you get past the very simple, Python is a perfect language for making the difficult things possible (and it is also good at making the easy things simple).

0.2 The Philosophy of Text Processing

Hang around any Python discussion groups for a little while, and you will certainly be dazzled by the contributions of the Python developer, Tim Peters (and by a number of other Pythonistas). His "Zen of Python" captures much of the reason that I choose Python as the language in which to solve most programming tasks that are presented to me. But to understand what is most special about text processing as a programming task, it is worth turning to Perl creator Larry Wall's cardinal virtues of programming: Laziness, impatience, hubris.

What sets text processing most clearly apart from other tasks computer programmers accomplish is the frequency with which we perform text processing on an ad hoc or "one-shot" basis. One rarely bothers to create a one-shot GUI interface for a program. One even less frequently performs a one-shot normalization of a relational database. But every programmer with a little experience has had numerous occasions where she has received a trickle of textual information (or maybe a deluge of it) from another department, from a client, from a developer working on a different project, or from data dumped out of a DBMS; the problem in such cases is always to "process" the text so that it is usable for our own project, program, database, or work unit. Text processing to the rescue. This is where the virtue of impatience first appears: we just want the stuff processed, right now!

But text-processing tasks that were obviously one-shot tasks that we knew we would never need again have a habit of coming back like restless ghosts. It turns out that that client needs to update the one-time data they sent last month. Or the boss decides that she would really like a feature of that text summarized in a slightly different way. The virtue of laziness is our friend here—with our foresight not to actually delete those one-shot scripts, we have them available for easy reuse and/or modification when the need arises.

Enough is not enough, however. That script you reluctantly used a second time turns out to be quite similar to a more general task you will need to perform frequently, perhaps even automatically. You imagine that with only a slight amount of extra work you can generalize and expand the script, maybe add a little error checking and some runtime options while you are at it; and do it all in time and under budget (or even as a side project, off the budget). Obviously, this is the voice of that greatest of programmers' virtues: hubris.

The goal of this book is to make its readers a little lazier, a smidgeon more impatient, and a whole bunch more hubristic. Python just happens to be the language best suited to the study of virtue.

0.3 What You'll Need to Use This Book

This book is ideally suited for programmers who are a little bit familiar with Python, and whose daily tasks involve a fair amount of text processing chores. Programmers who have some background in other programming languages—especially with other "scripting" languages—should be able to pick up enough Python to get going by reading Appendix A.

While Python is a rather simple language at heart, this book is not intended as a tutorial on Python for non-programmers. Instead, this book is about two other things: getting the job done, pragmatically and efficiently; and understanding why what works works and what doesn't work doesn't work, theoretically and conceptually. As such, we hope this book can be useful both to working programmers and to students of programming at a level just past the introductory.

Many sections of this book are accompanied by problems and exercises, and these in turn often pose questions for users. In most cases, the answers to the listed questions are somewhat open-ended—there are no simple right answers. I believe that working through the provided questions will help both self-directed and instructor-guided learners; the questions can typically be answered at several levels, and often have an underlying subtlety. Instructors who wish to use this text are encouraged to contact the author for assistance in structuring a curriculum involving it. All readers are encouraged to consult the book's web site to see possible answers provided by both the author and other readers; additional related questions will be added to the web site over time, along with other resources.

The Python language itself is conservative. Almost every Python script written ten years ago for Python 1.0 will run fine in Python 2.3+. However, as versions improve, a certain number of new features have been added. The most significant changes have matched the version number changes: Python 2.0 introduced list comprehensions, augmented assignments, Unicode support, and a standard XML package. Many scripts written in the most natural and efficient manner using Python 2.0+ will not run without changes in earlier versions of Python.
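To make two of those Python 2.0 additions concrete, the following sketch contrasts an explicit loop with a list comprehension and shows an augmented assignment. The variable names and the word list are made up for illustration and do not come from the book's own examples.

```python
words = ["spam", "Eggs", "ham", "Toast"]

# Style that also works before Python 2.0: explicit loop and append
lowered_loop = []
for w in words:
    if w[0].isupper():
        lowered_loop.append(w.lower())

# Python 2.0+ style: the same filter-and-transform as a list comprehension
lowered_comp = [w.lower() for w in words if w[0].isupper()]

# Python 2.0+ augmented assignment instead of total = total + len(w)
total = 0
for w in words:
    total += len(w)

print(lowered_comp)
print(total)
```

Both forms build the same list; the comprehension simply states the filter and the transformation in one expression.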

The general target of this book will be users of Python 2.1+, but some 2.2+ specific features will be utilized in examples. Maybe half the examples in this book will run fine on Python 1.5.1+ (and slightly fewer with older versions), but examples will not necessarily indicate their requirement for Python 2.0+ (where it exists). On the other hand, new features introduced with Python 2.1 and above will only be utilized where they make a task significantly easier, or where the feature itself is being illustrated. In any case, examples requiring versions past Python 2.0 will usually indicate this explicitly.

In the case of modules and packages—whether in the standard library or third-party—we will explicitly indicate what Python version is required; and where relevant, which version added the module or package to the standard library. In some cases, it will be possible to use later standard library modules with earlier Python versions. In important cases, this possibility will be noted.

0.4 Conventions Used in This Book

All constants, functions, and classes in discussions and cross-references will be explicitly prepended with their namespace (module). Methods will, additionally, be prepended with their class. In some cases, code examples will use the local namespace, but a preference for explicit namespace identification will be present in sample code also. For example, a reference might read:

See Also: email.Generator.DecodedGenerator.flatten() 346; raw_input() 442; tempfile.mktemp() 70;

The first is a class method in the email.Generator module; the second, a built-in function; the last, a function in the tempfile module.

In the special case of built-in methods on types, the expression for an empty type object will be used in the style of a namespace modifier. For example:

Methods of built-in types include [].sort(), "".islower(), {}.keys(), and (lambda:1).func_code.

The file object type will be indicated by the name FILE in capitals. A reference to a file object method will appear as, e.g.:

See Also: FILE.flush() 16;

Brief inline illustrations of Python concepts and usage will be taken from the Python interactive shell. This approach allows readers to see the immediate evaluation of constructs, much as they might explore Python themselves. Moreover, examples presented in this manner will be self-sufficient (not requiring external data), and may be entered—with variations—by readers trying to get a grasp on a concept. For example:

>>> 13/7 # integer division
1
>>> 13/7. # float division
1.8571428571428572
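The session above reflects the division behavior of the Python 2 versions this book targets. As a hedged aside for readers on other versions: the floor-division operator // (available since Python 2.2) gives the integer result regardless of version, while in Python 3 a plain / between integers performs true division. The variable names below are illustrative only.

```python
floor_result = 13 // 7   # floor division: an integer result in any version with //
true_result = 13 / 7.    # float division, because one operand is a float

print(floor_result)
print(true_result)
```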

In documentation of module functions, where named arguments are available, they are listed with their default value. Optional arguments are listed in square brackets. These conventions are also used in the Python Library Reference. For example:

foobar.spam(s, val=23 [,taste="spicy"])
    The function foobar.spam() uses the argument s to . . .

If a named argument does not have a specifiable default value, the argument is listed followed by an equal sign and ellipsis. For example:

foobar.baz(string=. . . , maxlen=. . . )
    The foobar.baz() function . . .

With the introduction of Unicode support to Python, an equivalence between a character and a byte no longer holds in all cases. Where an operation takes a numeric argument affecting a string-like object, the documentation will specify whether characters or bytes are being counted. For example:

Operation A reads num bytes from the buffer.
Operation B reads num characters from the buffer.

The first line indicates a number of actual 8-bit bytes affected. The second line indicates an indefinite number of bytes are affected, but that they compose a number of (maybe multi-byte) characters.
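The distinction between characters and bytes can be seen in a short sketch. This one assumes Python 3 string semantics (str for text, bytes for encoded data) and a UTF-8 encoding, neither of which matches the Python 2 unicode type the book itself discusses; the sample text is arbitrary.

```python
text = "na\u00efve caf\u00e9"     # ten characters, two of them non-ASCII
encoded = text.encode("utf-8")    # the same text as a sequence of bytes

num_chars = len(text)             # counts characters
num_bytes = len(encoded)          # counts 8-bit bytes

first_five_chars = text[:5]       # five characters
first_five_bytes = encoded[:5]    # five bytes, which cover only four characters

print(num_chars, num_bytes)
print(first_five_chars, first_five_bytes.decode("utf-8"))
```

Each accented letter occupies two bytes in UTF-8, so a count of five bytes and a count of five characters select different amounts of text.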

0.5 A Word on Source Code Examples

First things first. All the source code in this book is hereby released to the public domain. You can use it however you like, without restriction. You can include it in free software, or in commercial/proprietary projects. Change it to your heart's content, and in any manner you want. If you feel like giving credit to the author (or sending him large checks) for code you find useful, that is fine—but no obligation to do so exists.

All the source code in this book, and various other public domain examples, can be found at the book's web site. If such an electronic form is more convenient for you, we hope this helps you. In fact, if you are able, you might benefit from visiting this location, where you might find updated versions of examples or other useful utilities not mentioned in the book.

First things out of the way, let us turn to second things. Little of the source code in this book is intended as a final say on how to perform a given task. Many of the examples are easy enough to copy directly into your own program, or to use as stand-alone utilities. But the real goal in presenting the examples is educational. We really hope you will think about what the examples do, and why they do it the way they do. In fact, we hope readers will think of better, faster, and more general ways of performing the same tasks. If the examples work their best, they should be better as inspirations than as instructions.

0.6 External Resources

GENERAL RESOURCES

A good clearing house for resources and links related to this book is the book's web site. Over time, I will add errata and additional examples, questions, answers, utilities, etc. to the site, so check it from time to time:

http://gnosis.cx/TPiP/

The first place you should probably turn for any question on Python programming (after this book) is:

http://www.python.org/

The Python newsgroup comp.lang.python is an amazingly useful resource, with discussion that is generally both friendly and erudite. You may also post to and follow the newsgroup via a mirrored mailing list:

http://mail.python.org/mailman/listinfo/python-list

BOOKS

This book generally aims at an intermediate reader. Other Python books are better introductory texts (especially for those fairly new to programming generally). Some good introductory texts are:

  • Core Python Programming, Wesley J. Chun, Prentice Hall/PTR, 2001. ISBN: 0-130-26036-3
  • Learning Python, Mark Lutz and David Ascher, O'Reilly, 1999. ISBN: 1-56592-464-9
  • The Quick Python Book, Daryl D. Harms and Kenneth McDonald, Manning Publications, 2000. ISBN: 1-884777-74-0.

As introductions, I would generally recommend these books in the order listed, but learning styles vary between readers.

Two texts that overlap this book somewhat, but focus more narrowly on referencing the standard library are:

  • Python Essential Reference, Second Edition, David M. Beazley, New Riders 2001. ISBN: 0-7357-1091-0.
  • Python Standard Library, Fredrik Lundh, O'Reilly 2001. ISBN: 0-596-00096-0.

For coverage of XML at a far more detailed level than this book has room for, there is the excellent text:

  • Python & XML, Christopher A. Jones and Fred L. Drake, Jr., O'Reilly 2002. ISBN: 0-596-00128-2.

SOFTWARE DIRECTORIES

Currently, the best Python-specific directory for software is the Vaults of Parnassus:

http://www.vex.net/parnassus/

SourceForge is a general open source software resource. Many projects—Python and otherwise—are hosted at that site, and the site provides search capabilities, keywords, category browsing, and the like:

http://sourceforge.net/

Freshmeat is another widely used directory of software projects (mostly open source). Like the Vaults of Parnassus, Freshmeat does not directly host project files, but simply acts as an information clearing house for finding relevant projects:

http://freshmeat.net/

SPECIFIC SOFTWARE

A number of Python projects are discussed in this book. Most of those are listed in one or more of the software directories mentioned above. A general search engine like Google, http://google.com, is also useful in locating project homepages. Below are a number of project URLs that are current at the time of this writing. If any of these fall out of date by the time you read this book, try searching in a search engine or software directory for an updated URL.

The author's Gnosis Utilities contains a number of Python packages mentioned in this book, including gnosis.indexer, gnosis.xml.indexer, gnosis.xml.pickle, and others. You can download the most current version from:

http://gnosis.cx/download/Gnosis_Utils-current.tar.gz

eGenix.com provides a number of useful Python extensions, some of which are documented in this book. These include mx.TextTools, mx.DateTime, several new datatypes, and other facilities:

http://egenix.com/files/python/eGenix-mx-Extensions.html

SimpleParse is hosted by SourceForge, at:

http://simpleparse.sourceforge.net/

The PLY parser has a home page at:

http://systems.cs.uchicago.edu/ply/ply.html




Index

Download the Index file related to this title.

Updates

Submit Errata

More Information

InformIT Promotional Mailings & Special Offers

I would like to receive exclusive offers and hear about products from InformIT and its family of brands. I can unsubscribe at any time.

Overview


Pearson Education, Inc., 221 River Street, Hoboken, New Jersey 07030, (Pearson) presents this site to provide information about products and services that can be purchased through this site.

This privacy notice provides an overview of our commitment to privacy and describes how we collect, protect, use and share personal information collected through this site. Please note that other Pearson websites and online products and services have their own separate privacy policies.

Collection and Use of Information


To conduct business and deliver products and services, Pearson collects and uses personal information in several ways in connection with this site, including:

Questions and Inquiries

For inquiries and questions, we collect the inquiry or question, together with name, contact details (email address, phone number and mailing address) and any other additional information voluntarily submitted to us through a Contact Us form or an email. We use this information to address the inquiry and respond to the question.

Online Store

For orders and purchases placed through our online store on this site, we collect order details, name, institution name and address (if applicable), email address, phone number, shipping and billing addresses, credit/debit card information, shipping options and any instructions. We use this information to complete transactions, fulfill orders, communicate with individuals placing orders or visiting the online store, and for related purposes.

Surveys

Pearson may offer opportunities to provide feedback or participate in surveys, including surveys evaluating Pearson products, services or sites. Participation is voluntary. Pearson collects information requested in the survey questions and uses the information to evaluate, support, maintain and improve products, services or sites, develop new products and services, conduct educational research and for other purposes specified in the survey.

Contests and Drawings

Occasionally, we may sponsor a contest or drawing. Participation is optional. Pearson collects name, contact information and other information specified on the entry form for the contest or drawing to conduct the contest or drawing. Pearson may collect additional personal information from the winners of a contest or drawing in order to award the prize and for tax reporting purposes, as required by law.

Newsletters

If you have elected to receive email newsletters or promotional mailings and special offers but want to unsubscribe, simply email information@informit.com.

Service Announcements

On rare occasions it is necessary to send out a strictly service related announcement. For instance, if our service is temporarily suspended for maintenance we might send users an email. Generally, users may not opt-out of these communications, though they can deactivate their account information. However, these communications are not promotional in nature.

Customer Service

We communicate with users on a regular basis to provide requested services and in regard to issues relating to their account we reply via email or phone in accordance with the users' wishes when a user submits their information through our Contact Us form.

Other Collection and Use of Information


Application and System Logs

Pearson automatically collects log data to help ensure the delivery, availability and security of this site. Log data may include technical information about how a user or visitor connected to this site, such as browser type, type of computer/device, operating system, internet service provider and IP address. We use this information for support purposes and to monitor the health of the site, identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents and appropriately scale computing resources.

Web Analytics

Pearson may use third party web trend analytical services, including Google Analytics, to collect visitor information, such as IP addresses, browser types, referring pages, pages visited and time spent on a particular site. While these analytical services collect and report information on an anonymous basis, they may use cookies to gather web trend information. The information gathered may enable Pearson (but not the third party web trend services) to link information with application and system log data. Pearson uses this information for system administration and to identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents, appropriately scale computing resources and otherwise support and deliver this site and its services.

Cookies and Related Technologies

This site uses cookies and similar technologies to personalize content, measure traffic patterns, control security, track use and access of information on this site, and provide interest-based messages and advertising. Users can manage and block the use of cookies through their browser. Disabling or blocking certain cookies may limit the functionality of this site.

Do Not Track

This site currently does not respond to Do Not Track signals.

Security


Pearson uses appropriate physical, administrative and technical security measures to protect personal information from unauthorized access, use and disclosure.

Children


This site is not directed to children under the age of 13.

Marketing


Pearson may send or direct marketing communications to users, provided that

  • Pearson will not use personal information collected or processed as a K-12 school service provider for the purpose of directed or targeted advertising.
  • Such marketing is consistent with applicable law and Pearson's legal obligations.
  • Pearson will not knowingly direct or send marketing communications to an individual who has expressed a preference not to receive marketing.
  • Where required by applicable law, express or implied consent to marketing exists and has not been withdrawn.

Pearson may provide personal information to a third party service provider on a restricted basis to provide marketing solely on behalf of Pearson or an affiliate or customer for whom Pearson is a service provider. Marketing preferences may be changed at any time.

Correcting/Updating Personal Information


If a user's personally identifiable information changes (such as a postal address or email address), we provide a way to correct or update the personal data that user provided to us. This can be done on the Account page. If a user no longer desires our service and wishes to delete his or her account, please contact us at customer-service@informit.com and we will process the deletion of the account.

Choice/Opt-out


Users can always make an informed choice as to whether they should proceed with certain services offered by InformIT. If you choose to remove yourself from our mailing list(s), simply visit the following page and uncheck any communication you no longer want to receive: www.informit.com/u.aspx.

Sale of Personal Information


Pearson does not rent or sell personal information in exchange for any payment of money.

While Pearson does not sell personal information, as defined in Nevada law, Nevada residents may email a request for no sale of their personal information to NevadaDesignatedRequest@pearson.com.

Supplemental Privacy Statement for California Residents


California residents should read our Supplemental privacy statement for California residents in conjunction with this Privacy Notice. The Supplemental privacy statement for California residents explains Pearson's commitment to comply with California law and applies to personal information of California residents collected in connection with this site and the Services.

Sharing and Disclosure


Pearson may disclose personal information, as follows:

  • As required by law
  • With the consent of the individual (or their parent, if the individual is a minor)
  • In response to a subpoena, court order or legal process, to the extent permitted or required by law
  • To protect the security and safety of individuals, data, assets and systems, consistent with applicable law
  • In connection with the sale, joint venture or other transfer of some or all of its company or assets, subject to the provisions of this Privacy Notice
  • To investigate or address actual or suspected fraud or other illegal activities
  • To exercise its legal rights, including enforcement of the Terms of Use for this site or another contract
  • To affiliated Pearson companies and other companies and organizations who perform work for Pearson and are obligated to protect the privacy of personal information consistent with this Privacy Notice
  • To a school, organization, company or government agency, where Pearson collects or processes the personal information in a school setting or on behalf of such organization, company or government agency.

Links


This web site contains links to other sites. Please be aware that we are not responsible for the privacy practices of such other sites. We encourage our users to be aware when they leave our site and to read the privacy statements of each and every web site that collects Personal Information. This privacy statement applies solely to information collected by this web site.

Requests and Contact


Please contact us about this Privacy Notice or if you have any requests or questions relating to the privacy of your personal information.

Changes to this Privacy Notice


We may revise this Privacy Notice through an updated posting. We will identify the effective date of the revision in the posting. Often, updates are made to provide greater clarity or to comply with changes in regulatory requirements. If the updates involve material changes to the collection, protection, use or disclosure of Personal Information, Pearson will provide notice of the change through a conspicuous notice on this site or in another appropriate way. Continued use of the site after the effective date of a posted revision evidences acceptance. Please contact us if you have questions or concerns about the Privacy Notice or any objection to any revisions.

Last Update: November 17, 2020