Objects and Classes in Python

Date: Aug 16, 2002

Sample Chapter is provided courtesy of Prentice Hall Professional.

Learn the basics of object-oriented programming through Python's classes and inheritance and see where Python diverges from other OO languages.

This chapter presents details on Python's classes and inheritance, the facilities essential for object-oriented programming. The classes contain methods, functions that apply to class instances. These methods access and assign attributes of the instance. Python provides multiple inheritance, so you can use methods of more than one superclass to manipulate an instance. Multiple inheritance can be confusing if there is more than one way to inherit from the same superclass, so-called diamond inheritance.

Unlike statically-typed object-oriented languages, you do not have to use inheritance in Python. All that is required is that an object have a method with the proper name and parameters available when you call it. There are no visibility restrictions on attribute names in instances that are used by methods in different classes.

4.1 Instances and Classes

You create a class object by executing a class statement, e.g.:

class point: pass

You create an instance of a class by calling the class name as a function:

p=point()

Both classes and instance objects have attributes. You get an attribute with the syntax:

object . attribute_name

The instance has a reference to its class in a special attribute __class__, so that p.__class__ is point.

Both classes and class instances have dictionaries, named __dict__. You reference the class point's dictionary as point.__dict__ and the instance p's dictionary as p.__dict__. The relationships between p and point and their dictionaries are shown in Figure 4–1.

Figure 4-1 Instances and classes.
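These relationships can be checked directly. The following is a minimal sketch, assuming a current Python 3 interpreter; the attribute name x is only an illustration.

```python
class point:
    pass

p = point()
p.x = 1  # illustrative attribute; it lands in the instance dictionary

# The instance refers back to its class through __class__.
same_class = p.__class__ is point

# The instance and the class each have their own dictionary.
instance_names = sorted(p.__dict__.keys())
class_has_x = 'x' in point.__dict__
```

Here x appears only in p.__dict__, not in point.__dict__, because it was assigned through the instance.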

4.2 Class Declarations

The basic form of a class statement is:

class name:
    suite

where name is the name of the class being declared and suite is a suite of statements to execute. All the statements in the suite of a class declaration are typically def statements, although they do not have to be.

The class declaration is an executable statement. Like a def statement, it creates an object and assigns it to the name in the current scope. The def statement creates a function object. The class statement creates a class object. But here is an important difference: The suite in a function is not executed until the function is called. The statements in a class declaration are executed while the class statement is being executed.

When a function is called, it is executed with its own local environment where local variables are stored. When the function returns, the local environment is discarded. Like a function, the statements in a class declaration are executed with the declaration's own local environment. Unlike a function, however, when the class statement finishes executing, the local environment is not thrown away. It is saved in the __dict__ dictionary of the class object. The suite of statements in a class declaration is executed to create a dictionary of class attributes, the names known in the class. In other object-oriented languages, these are known as class variables.
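The difference in timing can be observed with a short experiment. This is a sketch only; the names log, show, Demo, and limit are hypothetical and chosen for illustration.

```python
log = []

def show():
    # Runs only when show() is called.
    log.append("function body ran")

class Demo:
    # Runs immediately, while the class statement is being executed.
    log.append("class body ran")
    limit = 10  # a name left in the class's local environment

log_after_definitions = list(log)   # only the class body has run so far
show()
log_after_call = list(log)

# The local environment of the class body survives as Demo.__dict__.
limit_in_class_dict = Demo.__dict__['limit']
```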

The most common statement to include in a class declaration is the def statement, which is used to create the "methods" that operate on instances of the class (which we discuss in the next section); but it is possible to execute other assignments as well.

When you want to get the value of one of these names, you can use the dot operator. The form ClassName.attribute gives you the value of the attribute defined in the class. For example:

>>> class W:
...     y=1
...
>>> W.y
1

creates a class named W and assigns the value 1 to an attribute y. Note that y is an attribute of the class object itself, not of instances of the class. More confusing still, we can get at class attributes through instances of the class as well as through the class object. The way to get an instance of a class is by calling the class as we would call a function. So:

>>> z=W()
>>> z.y
1

shows that we can also get the value of y through the instance, z, of the class W.
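To confirm where y actually lives, one can inspect the two dictionaries. A small sketch, assuming a current Python 3 interpreter:

```python
class W:
    y = 1

z = W()

found_in_instance = 'y' in z.__dict__   # the instance holds no y of its own
found_in_class = 'y' in W.__dict__      # y is stored in the class
value_through_instance = z.y            # the lookup falls through to the class
```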

4.3 Instances

The purpose of a class is to create instances of it. We can use this for something as simple as implementing what are called structs or records in other languages. For example, class point: pass declares a class object called point. The pass statement executes no operation. We need it because the syntax requires a class statement to have a suite of one or more statements as a body.

We can create an instance of the class point by calling point() as a function:

>>> p=point()

We can then assign values to attributes of the point using the dotted notation and access them the same way:

>>> p.x=1
>>> p.y=2
>>> p.x
1
>>> p.y
2

But when we create a point, it doesn't start with any attributes.

>>> q=point()
>>> q.x
Traceback (innermost last):
  File "<stdin>", line 1, in ?
AttributeError: x

This is a major difference between instances of classes in Python and structs or records in most other languages. In most languages, you have to declare the attributes (fields, members) of the structs or records. In Python, the first assignment to an attribute creates it. The attributes are kept in the __dict__ dictionary of the instance:

>>> p.__dict__
{'x': 1, 'y': 2}

Assignment to an attribute is equivalent to an assignment to the attribute name in the __dict__ dictionary.
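The equivalence can be seen by assigning one attribute each way. This sketch assumes an ordinary instance with no special attribute-handling methods defined.

```python
class point:
    pass

p = point()
p.x = 1                # ordinary attribute assignment
p.__dict__['y'] = 2    # the same effect, written against the dictionary

y_value = p.y
all_attributes = dict(p.__dict__)
```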

A problem with Python then is, "What if we want the instance of the class to start off with a set of attributes?" To do this, we can provide an initialization procedure in our declaration of the class that will be called when an instance is created:

>>> class point:
...     def __init__(self):
...             self.x=0
...             self.y=0
...
>>> q=point()
>>> q.x
0

Here we have redeclared point, replacing pass with a function definition. The function name is __init__. When Python creates an instance of point, it calls the __init__(self) function and passes it a reference to the instance it has just created. Function __init__(self) then assigns zero to both attributes x and y via the parameter self. Just as with the assignments from outside, these assignments create the attributes.

The __init__() function is an initializer. Although it is called by Python just after the instance is created to initialize it, you can call it again at any later time to reinitialize the object.
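A brief sketch of reinitialization, using the point class declared above:

```python
class point:
    def __init__(self):
        self.x = 0
        self.y = 0

q = point()
q.x = 5
x_before = q.x

q.__init__()   # an explicit call resets the attributes
x_after = q.x
```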

If you want to know what attributes are defined in an instance of a class, you can use the dir() function (as in "directory"):

>>> dir(p)
['x', 'y']

4.4 Methods

A def statement within a class declaration declares a method, a function that operates on instances of the class. Consider the following:

>>> class X:
...     def f(self,y): return y
...

We declared a class X containing a method f. The rule is that methods can be called only for instances of the class or a subclass. When they are called, their first argument will be given a reference to the instance and their other arguments will be taken from the argument list. So method f takes two parameters: The first one, self, will be given a reference to an instance of class X. The second one will be given a value by the argument of the call.

The __init__() method shown in the preceding section is an example of a special method. There are many other special methods, and they are discussed in Chapters 14 and 15. The special methods are called when an instance is used in a special context; for example, as an operand for a binary operator.

The normal way to call a method is to use the dot operator. For example:

>>> z=X()
>>> z.f(5)
5

The call instance.name(args) will call the method name declared in the class of instance and pass it the instance as its first parameter and the other args as the subsequent parameters. You use this first parameter to access the attributes of the instance.
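Put another way, the dotted call is shorthand for looking the method up in the class and passing the instance explicitly. In this sketch, f returns both of its arguments so that the first parameter can be inspected; that variation on the class X above is an assumption made for illustration.

```python
class X:
    def f(self, y):
        # Return both parameters so we can see what self was bound to.
        return (self, y)

z = X()
via_instance = z.f(5)     # the instance is supplied automatically
via_class = X.f(z, 5)     # the instance is supplied by hand
```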

If you are familiar with other object-oriented languages, you will notice that there are no "class methods" or "static methods" in a class definition; in other words, there are no methods that do not require a reference to a class instance. Instead, you would call functions in the module that contains the class and have them assign values to variables in the module.

Let's look at the data type of a method. If we look up a method in the class object, we get an unbound method object:

>>> X.f
<unbound method X.f>

This means that it is a method, but it isn't attached to any instance of the class yet. If, however, we look it up in an instance, we get a bound method. Thus:

>>> z=X()
>>> z.f
<method X.f of X instance at 007DCEFC>

gives us a method object that is bound to an instance of the class X. We can call the bound method as a function of one parameter:

>>> g=z.f
>>> g(7)
7

But we cannot call the unbound method, f, as a function of one parameter. It needs two parameters, the first being an instance:

>>> g=X.f
>>> g(7)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unbound method must be called with class instance 1st argument
>>> g(z,7)
7

By the way, methods are not stored in the class object as unbound methods, but rather as functions. A class object contains a special attribute, __dict__, that is a dictionary containing the namespace of the class. If you look up a method in that dictionary, you find a function. Thus:

>>> X.__dict__["f"]
<function f at 007B2464>

When it looks up a function in a class object, the dot operator creates a method object. Figure 12–4 shows a picture of the relationships between method objects, classes, and modules.
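The transcripts above come from an older interpreter. In current Python 3 releases (an assumption about the reader's interpreter), there is no separate unbound method type, and X.f simply yields the function. Bound methods still exist, and they expose the function and the instance they combine as __func__ and __self__. A small sketch:

```python
class X:
    def f(self, y):
        return y

z = X()

raw_function = X.__dict__["f"]   # what is actually stored in the class
bound = z.f                      # the dot operator builds a bound method

wraps_same_function = bound.__func__ is raw_function
bound_to_instance = bound.__self__ is z
result = bound(7)
```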

Now, to show a use of the instance reference, consider the counter class shown in Figure 4–2. Instances of the counter class have an attribute, count, that starts at zero by default, or at another value specified when the object is created. The method call c.bump() will add one to c.count. The call c.bump(k) will add k to c.count. The __init__() method is called automatically when an instance of the class is created and assigns its parameter val to the count attribute of the instance. This creates the count attribute, since, like variables, attributes are created when they are first assigned a value. The val=0 parameter specifies that the default value is zero. Method bump() adds its parameter by to count and returns the new value. The default increment is specified by the parameter specification, by=1.

Figure 4-2 Counter class, file counterObj.py.
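Going only by the description above, the class could be sketched as follows. This is a reconstruction, not the original counterObj.py listing, and the module-level documentation string is omitted.

```python
class counter:
    "creates counter objects"

    def __init__(self, val=0):
        # The first assignment creates the count attribute.
        self.count = val

    def bump(self, by=1):
        # Add the increment and return the new value.
        self.count += by
        return self.count
```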

Notice, by the way, that there is no requirement that you call the first parameter of a method self. It is a custom in Python. Java and C++ programmers may prefer the name this, but many Python programmers strongly object to using anything other than self, or perhaps s for short.

Here is a test of the counter class:

>>> from counterObj import counter
>>> c=counter()
>>> c.count
0
>>> c.bump()
1
>>> c.count
1
>>> counter.__doc__
'creates counter objects'
>>> d=counter(3)
>>> d.count
3
>>> d.bump(2)
5

Also notice that both the counterObj module and the counter class begin with string literals. These are documentation strings. You can look up these strings as the __doc__ attributes of the objects containing them:

>>> print counterObj.__doc__
 counter objects:
x=counter()
x=counter(initVal)
x.count
x.bump()
x.bump(increment)
>>> counter.__doc__
'creates counter objects'

You also need to distinguish between methods that are contained in classes and attributes of objects that contain functions. Here we create a function, hi(), that writes out the string "hi". Then we assign it as the bump attribute of counter object c.

>>> def hi():print "hi"
...
>>> c.bump=hi
>>> c.bump()
hi

When we call c.bump(), we get the hi() function, not the bump method of the class. From this and the earlier discussion of bound method objects, we can see what Python does with a reference like c.bump. First it tries to find attribute bump of object c. If it finds it, it uses that. If it doesn't find an attribute with that name, Python looks in the class of object c to find a function bump. If it finds one, it creates a bound method object containing that function and the object c.
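The search order can be made visible by removing the instance attribute again. In this sketch, the counter class is a reconstruction from the description above, and hi() returns its string instead of printing it so the result can be examined.

```python
class counter:
    def __init__(self, val=0):
        self.count = val

    def bump(self, by=1):
        self.count += by
        return self.count

def hi():
    return "hi"

c = counter()
c.bump = hi              # an instance attribute named bump
shadowed = c.bump()      # found in the instance first, so hi() runs

del c.bump               # removes only the instance attribute
restored = c.bump()      # now the search reaches the method in the class
```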

Finally, let us remark again on the scope of names in methods. In most object-oriented languages, code in a method can refer to the attributes of an object that contains it by just using the name of the attribute. In Python, it must use an explicit reference to the object, such as self, and reference the attributes with the dot operator.

So what do the variable names in the method refer to? The same as in any function, they are either local variables, global variables defined in the surrounding module, or built-in names of the Python system.¹
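A short sketch of the distinction follows. The module-level variable count and the two method names are hypothetical, introduced only to contrast a bare name with an attribute reference.

```python
count = 100   # a global variable in the surrounding module

class counter:
    def __init__(self, val=0):
        self.count = val

    def global_count(self):
        # A bare name is never looked up in the instance.
        return count

    def own_count(self):
        # The attribute must be reached through self.
        return self.count

c = counter(3)
```

Calling c.global_count() yields the module's count, while c.own_count() yields the attribute of c.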

4.5 Single Inheritance

Classes without inheritance are enough for what is called object-based programming. You can create new data types (called abstract data types) that have their own operations. But for object-oriented programming, you need inheritance. Python allows a class to inherit from one or more classes—multiple inheritance. We discuss single inheritance first, and then expand the discussion to multiple inheritance.

A class declaration with single inheritance has the form:

class name(superclass):
    suite

where superclass is an expression that yields a class object. In other languages, like Java, the class declarations are handled by the compiler, so the superclass would be a name of a class. In Python, class declarations are executable, so the superclass is an expression that yields a class at run-time. You could have an array of classes and a loop creating a subclass of each, something like:

for i in range(len(X)):
    class C(Y[i]): ...
    X[i]=C

although it is hard, offhand, to think of any use for doing so. Executing the same class declaration more than once is more likely to be a bug.
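For the curious, here is a runnable variant of that loop. It is only meant to show that the superclass is an expression evaluated at run-time; the classes A and B and the method shout() are hypothetical.

```python
class A:
    def name(self):
        return "A"

class B:
    def name(self):
        return "B"

bases = [A, B]
subclasses = []
for base in bases:
    # A new class object is created on every pass through the loop.
    class C(base):
        def shout(self):
            return self.name().upper() + "!"
    subclasses.append(C)
```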

When we say the subclass inherits from its superclass, we mean that the subclass starts with all the superclass's methods. The subclass can add new methods and attributes beyond those possessed by the superclass. The subclass can override methods that it would inherit from its superclass; that is, it can provide its own declarations of some of the methods declared in the superclass. Then, when someone calls the method, they get the version provided by the subclass.

Because objects in the subclass get all the attributes and methods of the superclass, they can be used in any place an object of the superclass can be. They will respond to the same operations. This gives an "is-a" relationship between instances of the subclass and its superclass. If class Y inherits from class X, an instance of Y is an X. The is-a relationship provides what is called "polymorphism." At a particular place in the program, you may not be sure precisely what class of object is being operated on, only that it has a certain interface, that is, that it will respond to certain method calls. (Actually, polymorphism is the wrong name. It means "multiple forms," but the interface is more analogous to a form and the implementation to a substance. The interface is the same. It is the implementations that can be different.)

So what do you use inheritance for? It has a great many different uses. We discuss some of them here and some in later chapters. Many of the uses have been given names and have been classified as object-oriented design patterns, which we discuss in Chapter 5.

One use for inheritance is to add functionality. Consider the class settableCounter in Figure 4–3. It is a subclass of class counter shown in Figure 4–2. As we've already seen, counter provides three things to its user: an attribute count that contains the current count; an __init__() method that allows the counter to be initialized to a particular value or to default to zero; and a bump() method that allows you to increase the count either by one by default or by an explicit amount, positive or negative.

Figure 4–3 Class settableCounter.

The class settableCounter adds a method set() that allows you to assign an explicit value to the current count. You may be wondering why we would need a set() method. Why not just assign a value to count? Well, with the current implementation, that would work; but does counter actually promise that you will be able to assign to count? It is possible to implement counter so that you can only read count, but not assign to it. The actual count can be hidden. We'll see how to do this in Chapter 14.
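Based on this description, the subclass might be sketched as follows. Both classes are reconstructions rather than the original listings, and the parameter name to is an assumption.

```python
class counter:
    "creates counter objects"

    def __init__(self, val=0):
        self.count = val

    def bump(self, by=1):
        self.count += by
        return self.count

class settableCounter(counter):
    def set(self, to):
        # Assign an explicit value to the inherited count attribute.
        self.count = to
```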

Here is an example of a settableCounter in action:

>>> from settableCounter import settableCounter
>>> x=settableCounter(1)
>>> x.count
1
>>> x.bump()
2
>>> x.set(10)
>>> x.count
10

Clearly, when we create a settableCounter, we get an object that has the methods declared in its class and in its superclass, counter. When we created it, the __init__() method in the superclass was executed, setting count initially to 1. We got at the attribute count as easily as in a counter object. When we called bump(), we called the bump() method declared in the superclass, counter. When we called set(), we got the method declared in settableCounter.

Here's how it works: As discussed earlier, when we access an object using the dot operator—for example, x.y—Python first looks for a y attribute of object x. If it finds one, that's what it returns. Otherwise, it looks through a series of classes for a definition. It first looks in x's class. Then, if it doesn't find it there, it looks in x's superclass. It will keep on looking in superclasses until it finds the class attribute y or it comes to a class that has no superclasses.
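The search can be imitated in a few lines. The function find_attribute below is hypothetical and deliberately simplified: it follows only the first superclass of each class, and it returns the raw function stored in a class rather than building a bound method as the dot operator does. The two classes are reconstructions from the descriptions in this chapter.

```python
def find_attribute(obj, name):
    """Simplified search: instance dictionary, then class, then superclasses."""
    if name in obj.__dict__:
        return obj.__dict__[name]
    klass = obj.__class__
    while klass is not None:
        if name in klass.__dict__:
            return klass.__dict__[name]
        # Move to the superclass; stop at a class with no superclasses.
        klass = klass.__bases__[0] if klass.__bases__ else None
    raise AttributeError(name)

class counter:
    def __init__(self, val=0):
        self.count = val

    def bump(self, by=1):
        self.count += by
        return self.count

class settableCounter(counter):
    def set(self, to):
        self.count = to

x = settableCounter(1)
count_value = find_attribute(x, "count")   # found in the instance
set_function = find_attribute(x, "set")    # found in x's class
bump_function = find_attribute(x, "bump")  # found in the superclass
```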

In the settableCounter example, when we referred to x.count, Python found it in x's dictionary of attributes. When we referred to set(), Python found it in x's class object. When we referred to bump(), Python found it in x's class's superclass. Similarly, when we created a settableCounter, Python found the __init__() method in the superclass, counter, and executed that. These namespaces are shown in the contour diagram in Figure 4–4. The boxes represent nested namespaces. You start searching for a name in the innermost name space and move to each enclosing namespace in turn.

Figure 4–4 Contour model of scopes (namespaces) in a settableCounter instance.

4.6 Visibility

If you know other object-oriented languages such as Java, you will find some differences between them and Python regarding visibility. Classes in other object-oriented languages declare the visibility of their attributes. If an attribute is declared private, only that class can see it. If it is public, anyone using an object of that class can see it. And there is usually a protected visibility that says that code in that class and any subclass can see the attribute, but code outside those classes cannot. Of course, the same visibility restrictions can also be used on methods.

Python objects have a single pool of attributes. In other languages, each class has its own separate pool of attributes in the instance object. Suppose you want to use some private attribute exclusively in one class. In the other languages, each class can use the same private name as any other class, and all will be kept separate. Thus you can program a class without worrying too much about what private attribute names its superclasses are using. In Python, if two of the classes use the same name for different purposes, they will clobber each other's data and the program will probably crash.

These visibility restrictions are considered important to object-oriented programming. One goal is to have objects provide encapsulation: An object is supposed to provide an interface to the outside users of the object and hide details from them internally. Thus, programmers are required to program to the interface. The implementation of the object can change, but its users will not have to change their code.

Python has no such visibility restrictions. All attributes and methods are visible to everyone. Anyone who wishes to use knowledge about the implementation of an object can do so. That can result in more efficient code. It can also result in a crash if the implementation changes. The language does nothing to prohibit programmers from "breaking encapsulation."

However, Python does provide "name mangling" to help hide names in classes. If you begin the name of an attribute of a class with two underscores, and you don't end it with any underscores, it is automatically rewritten to include the class name. Here's an example:

>>> class XX:
...     def __init__(self):
...             self.__x=0
...
>>> z=XX()
>>> dir(z)
['_XX__x']

As you can see, attribute __x in class XX was renamed to _XX__x. It doesn't prevent anyone from accessing it, but it does make it more unpleasant, which should serve to discourage casual use. Just as important, this keeps the attributes used privately by one class separate from those used by another.
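The separation is easiest to see when a class and its subclass both use the same private name. In this sketch, the classes Base and Derived and their methods are hypothetical; the subclass calls the superclass initializer explicitly through the class name.

```python
class Base:
    def __init__(self):
        self.__x = "base"        # stored as _Base__x

    def base_x(self):
        return self.__x

class Derived(Base):
    def __init__(self):
        Base.__init__(self)
        self.__x = "derived"     # stored as _Derived__x, a different attribute

    def derived_x(self):
        return self.__x

d = Derived()
names = sorted(d.__dict__.keys())
```

Each class sees only its own __x, so neither clobbers the other's data.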

As in other object-oriented languages, a method declared in a subclass will hide a method in a superclass with the same name. Python stops looking for a method as soon as it finds one with the right name. Unlike some other object-oriented languages, there is no method overloading in Python. Method overloading allows you to declare several methods with the same name but different signatures; that is, different numbers or types of parameters. All those methods will be visible at the same time. The compiler will look at a method call and choose the correct method to execute for the argument list given in the call. Python has no such facility. There are no type declarations, so the types of parameters cannot be specified to help in choosing which method is being called, and the parameter-passing conventions are so loose, even the number of parameters would not be a good way to choose a method.

4.7 Explicit Initializer Chaining

In many object-oriented languages, the initialization code for class instances (i.e., the class's constructor) will automatically call the initialization code for its superclasses when it begins executing. But there is nothing special about the __init__() method. Python will only call one __init__() method, the first it finds. In Python, you will have to call __init__() methods of superclasses yourself.

How? The problem is, suppose settableCounter had an __init__() method:

def  __init__(self,x): ...

that needed to call the __init__() method of its superclass, counter. It couldn't just call

self.__init__(x) #won't work

That would call settableCounter's __init__() method again, since Python will start searching at the class of object self and stop at the first __init__() method it finds.

Other object-oriented languages have a keyword like super to give a method access to names known in the superclass. Python uses the class name of the superclass. Remember that you can get an unbound method object by writing classname.methodname. You use that to get a method in a superclass whose name is hidden:

counter.__init__(self,x) #would work
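Put together, a chained initializer might look like the following sketch. The classes are reconstructions from the descriptions in this chapter, and the extra attribute start is hypothetical, added only so that the subclass initializer has some work of its own.

```python
class counter:
    def __init__(self, val=0):
        self.count = val

    def bump(self, by=1):
        self.count += by
        return self.count

class settableCounter(counter):
    def __init__(self, x):
        # Call the superclass initializer explicitly, through the class name.
        counter.__init__(self, x)
        self.start = x   # hypothetical extra attribute

    def set(self, to):
        self.count = to

s = settableCounter(4)
```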

Now let's criticize the design of counter and settableCounter. It is part of the design to have an attribute count visible from outside. With the obvious implementation, users are invited to assign values to it, rather than use the set() method. It is considered poor object-oriented design to ever allow the users of a class to assign values to its attributes directly. Instead, they are supposed to call methods to ask the object to do things for them.

Also, settableCounter knows the implementation of counter and assigns a value directly to count in the set() method. This is not as bad as allowing unrelated code to assign to count. Classes generally provide a more lenient interface to their subclasses than to the rest of the world, so it is probably okay for settableCounter to access the count attribute. But this still binds the classes together, so that a change to counter may force a change in settableCounter. It would be better to program defensively and prevent changes in one class from propagating into another.

This discussion becomes more complicated still if we use the special methods __getattr__() and __setattr__() discussed in Chapter 14. They allow what looks like an attribute access to actually call a method. However, we did not use these in counter and settableCounter, and their discussion will have to wait until Chapter 14.

4.8 Example: Set Implementation

Now let's consider a more elaborate example: A class AbstractSet (Figure 4–5) is a superclass of two other classes, ListSet (Figure 4–6) and DictSet (Figure 4–7). A set, in the mathematical sense, is a collection of elements (objects) without duplications. These classes may be considered a kind of sketch of how sets could be implemented. These three classes provide two implementations of sets as follows:

Figure 4–5 AbstractSet.

Figure 4–6 ListSet.

Figure 4–7 DictSet.

AbstractSet declares all the set operations, but it doesn't implement them all. It provides some common code, but leaves many operations up to the subclasses.
ListSet implements a set using a list to hold the elements.
DictSet implements a set using a dictionary to hold the elements.

Why have two implementations? Lists and dictionaries may each be more efficient than the other for some set sizes and some uses, although in Chapter 17, we will settle on the dictionary implementation of sets and provide one that has a more complete collection of methods than these.

The operations provided by these sets are as follows:

s=ListSet(elems) or s=DictSet(elems)—Creates a set initially containing the elements of the (optional) sequence elems.
s.insert(x)—Adds element x to set s if it is not already present. Returns s.
s.contains(x)—Returns true (1) if s contains x, false (0) otherwise.
s.delete(x)—Removes element x from set s. Performs no operations if s does not contain x. Returns s.
s.members()—Returns a list of all the elements of set s.
s.new()—Returns a new empty set of the same type as s, e.g., a ListSet for a ListSet.
s.copy()—Returns a copy of set s.
s.size()—Returns the number of elements in set s.
s.insertAll(q)—Inserts all the elements in sequence q into the set s. Returns s.
s.removeAny()—Removes and returns an arbitrary element of set s. If s is empty, it returns None.
s.union(t)—Returns a new set of the same type as s that contains all the elements contained in either s or t.
s.intersection(t)—Returns a new set of the same type as s that contains all the elements contained in both s and t.
str(s)—Returns a string representation of s, listing all the elements. This is the __str__() method; it tells str() how to do its job.
repr(s)—This is the __repr__() method. For these sets, it is the same as str(s).

You can find all the methods in AbstractSet, but not all of them are implemented there. Those methods that contain raise NotImplementedError are actually implemented in the subclasses. In a language like Java, we would have to declare them "abstract," which would tell the compiler that they must be implemented in a subclass and that instances of AbstractSet cannot be created, because only instances of subclasses that have the code for the methods can be created.

Python doesn't have any special way to declare "abstract" methods, but this is the custom. You raise a NotImplementedError for the abstract method; if the method hasn't been overridden at run-time, you will find out about it.
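The custom can be sketched as follows. The class IncompleteSet is hypothetical; it stands for a subclass whose author forgot to override insert(). The AbstractSet shown here is a cut-down illustration, not the listing in Figure 4–5.

```python
class AbstractSet:
    def insert(self, x):
        # "Abstract" by convention: subclasses are expected to override this.
        raise NotImplementedError

    def insertAll(self, q):
        for x in q:
            self.insert(x)
        return self

class IncompleteSet(AbstractSet):
    pass   # insert() was never overridden

try:
    IncompleteSet().insertAll([1, 2])
    outcome = "no error"
except NotImplementedError:
    outcome = "NotImplementedError raised"
```

Nothing complains when IncompleteSet is declared or instantiated; the problem surfaces only when the missing method is actually called.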

What about removing a method from a class by implementing a subclass that overrides it with a method that raises NotImplementedError? You can do that, but it is considered an extremely bad programming practice. An instance of the subclass is supposed to have an is-a relationship to its superclass. That means that it can be used anywhere an instance of the superclass can be used, but if it lacks one of the methods of the superclass, then it cannot be used anywhere that method is needed.

The __init__() method for AbstractSet does nothing when it is called; the pass statement performs no operation. Why is it present? It is there to honor the programming practice that a class instance ought to be given a chance to initialize itself. If at some future time we were to change AbstractSet so that it did need to perform some initialization, it would be easier to have the __init__() method already in place and the subclasses already calling it.

Why have an AbstractSet? It is not essential in Python, although it would be in statically-typed object-oriented languages. It documents the operations that all sets must have. If you specify that an algorithm requires an AbstractSet, then that algorithm should use only the operations that AbstractSet provides. Since ListSet and DictSet are subclasses of AbstractSet, either of them can be provided to the algorithm and it will still work.

In object-oriented languages that use static typing, the AbstractSet class would be required to allow ListSet and DictSet objects to be used interchangeably. Variables and attributes would have to be declared with the AbstractSet class, and then objects of either subclass could be assigned to them. Python does not require this. Any object that has the required methods can be used. We could eliminate AbstractSet here if we were willing to duplicate the code for the insertAll(), removeAny(), union(), intersection(), and __str__() methods.

The reason that AbstractSet would be required in statically-typed languages, but not in Python, is that the compiler of a statically-typed language must know the type of every expression. You have to declare the types of variables and functions. The compiler would need these to check that you are performing only permissible operations and to figure out the data types of their results. So you would need the class AbstractSet in order to declare all the methods you could call for a set. This would allow you to declare a variable of type AbstractSet and assign either a ListSet or a DictSet to it and use them without knowing which one is there.

Python, however, doesn't know in general what kind of value a variable contains or whether an operation will work or not. All that's required is for Python to find the methods it's calling at run-time. So we didn't really need AbstractSet. If both ListSet and DictSet implement all the set operations, they can be used interchangeably.

However, ListSet and DictSet do not implement all the set operations. Some set operations, such as union and intersection, are implemented in AbstractSet. This demonstrates one of the most trivial uses for inheritance: code sharing.

The basis for the division of methods between those implemented in ListSet and DictSet on one hand and those implemented in AbstractSet on the other is this: ListSet and DictSet contain those methods that depend on the implementation of the set, on the kind of data structure it uses. AbstractSet implements those methods that are the same for all implementations.

A method can call other methods for the same object. If those methods are defined in different classes, two cases occur: up calls and down calls. If a method in a subclass calls a method in a superclass, it is called an up call (super to sub is interpreted as above to below). If a method in a superclass calls a method in a subclass, it is called a down call.

If you come from a non-object-oriented background, you may be saying, "I can see how an up call works. The subclass imports the superclass, so it knows the methods defined there. But how does the superclass know the names of methods defined in a subclass?" The question, however, assumes that the compiler must know what method is being called before the program runs. If you are calling a method on an object from outside the object's classes, you usually don't know what the actual class of the object will be. You just know it is supposed to have a method with a certain name, say M, that will do a certain kind of thing for you. At run-time, Python searches for method M in the class and superclasses of the object. It's exactly the same with the call self.M() within a method. Again, Python will take the actual class of the current object, self, and search that class and its superclasses for method M. Where will Python find M? Maybe in the same class the call is in, maybe in a superclass, maybe in a subclass. You don't know. You shouldn't have to care.

In each of ListSet and DictSet, there is an example of an up call. In the __init__() method there is a call of insertAll(), defined in AbstractSet, to initialize the set to the sequence of elements. It is in AbstractSet because it does not depend on the implementation of the set.

Method insertAll() contains a down call to insert(). Method insert() does depend on the representation of the set. At run-time this down call will either call the insert() in ListSet or the insert() in DictSet, depending on which type of set is present.
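
The up call and down call described above can be sketched as follows. This is a minimal illustration, not the code from the chapter's figures: the method bodies are assumptions based on the prose, and it is written for a current Python 3 interpreter rather than the dialect used in the chapter's listings.

```python
class AbstractSet:
    """Holds the methods that do not depend on the representation."""

    def __init__(self):
        pass  # does nothing, but subclasses still call it

    def insertAll(self, elements):
        for x in elements:
            self.insert(x)      # down call: found in the actual class of self
        return self


class ListSet(AbstractSet):
    def __init__(self, elements=()):
        AbstractSet.__init__(self)  # give the superclass its chance to initialize
        self.rep = []
        self.insertAll(elements)    # up call: insertAll() is in AbstractSet

    def insert(self, x):
        if x not in self.rep:
            self.rep.append(x)


class DictSet(AbstractSet):
    def __init__(self, elements=()):
        AbstractSet.__init__(self)
        self.rep = {}
        self.insertAll(elements)    # the same up call, a different down call

    def insert(self, x):
        self.rep[x] = x             # only the key matters


s = ListSet([1, 2, 2, 3])
d = DictSet([1, 2, 2, 3])
print(s.rep)           # elements kept in a list
print(sorted(d.rep))   # elements kept as dictionary keys
```

The same insertAll() code runs in both cases; which insert() it reaches is decided at run-time by the class of self.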

There are two other things to notice about the __init__() methods in ListSet and DictSet:

They call the __init__() method of AbstractSet, which may seem pointless, since it does nothing. It is nevertheless considered good programming practice: a class should be given the chance to initialize itself. Knowing that a class's initialization method does nothing is the sort of knowledge you shouldn't use. It shouldn't be part of the public definition of the class, and it could change in some later release.
They initialize an attribute, rep, to the representation of a set. ListSet initializes it to an empty list. DictSet initializes it to an empty dictionary.

ListSet keeps the elements of the set in a list. It checks for the presence of an element with the in operator. It uses list's append() method to insert an element into the set and remove() to delete it. The members() method just returns a copy of the list. The new() method returns a new ListSet, while copy() returns a new ListSet with a copy of the current object's rep attribute.

DictSet keeps the elements as keys in a dictionary. To insert an element, the element is put into the dictionary with itself as its value. The value isn't actually important, only the key. It checks for the presence of an element by the dictionary's has_key() method. It deletes an element with a del statement. It gets a list of the members of the set using the dictionary's keys() method.

In both ListSet and DictSet, there are if statements to test for the presence of an element before removing it. These are necessary to avoid having Python raise an error if the element isn't present.²

AbstractSet has the code that can be common to all sets. The method insertAll() iterates over a sequence, inserting all the elements into the set. The call t.union(s) copies set t and then inserts all the elements of set s into it. The call t.intersection(s) uses new() to create a new set of the same class as t, and then inserts all the elements of t into it that are also in s.

Later in the book, we will look at object-oriented design patterns. There are two present here:

Factories: The new() method is a factory method. It manufactures a new set object. When it is called in AbstractSet, we don't know what kind of set it will create. Why do we have it? Because when we create an actual set, we must specify the actual class, but AbstractSet shouldn't have to know anything about the actual sets, only what is common to them. It is the subclasses that know about, well, about themselves.
Template methods: The methods union() and intersection() are being used as template methods. They have the basic algorithm, but they are missing the details. These details are filled in by methods like contains() and insert(), which are defined in subclasses. The idea of a template method is that the superclass contains the general algorithm, but omits some details that are filled in by methods in a subclass. Thus the same algorithm can be implemented in several versions, sharing much of the code between them.
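
To make the two patterns concrete, here is a rough sketch of how union() and intersection() might be written as template methods that rely on a new() factory. The method bodies are assumptions reconstructed from the descriptions in this section, not the code from the figures, and only a ListSet subclass is shown.

```python
class AbstractSet:
    def insertAll(self, elements):
        for x in elements:
            self.insert(x)
        return self

    def union(self, other):
        result = self.copy()               # same class and contents as self
        result.insertAll(other.members())
        return result

    def intersection(self, other):
        result = self.new()                # factory method: empty set of self's class
        for x in self.members():
            if other.contains(x):          # detail filled in by a subclass
                result.insert(x)           # detail filled in by a subclass
        return result


class ListSet(AbstractSet):
    def __init__(self, elements=()):
        self.rep = []
        self.insertAll(elements)

    def new(self):
        return ListSet()

    def copy(self):
        return ListSet(self.rep)

    def members(self):
        return self.rep[:]

    def contains(self, x):
        return x in self.rep

    def insert(self, x):
        if x not in self.rep:
            self.rep.append(x)


a = ListSet([1, 2, 3])
b = ListSet([2, 3, 4])
print(a.union(b).members())
print(a.intersection(b).members())
```

AbstractSet never names ListSet; it gets a set of the right class only through copy() and new().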

4.9 Critique

Now let's criticize the design of these set classes. On the positive side, they do make good use of object-oriented programming techniques, and they do allow more than one implementation of sets to be used interchangeably in the same program.

On the negative side, there are two points:

First, they are not complete. There ought to be a relative complement method to give a new set containing all the elements of one set that are not in another. Although it could be programmed from the existing methods, it is used a lot and is logically one of the standard set operations, so it belongs in the classes themselves.
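
As an illustration of how little is missing, a relative complement can be written using only the methods the set classes already share. The function name difference() and the small stand-in set class below are hypothetical; the sketch assumes only that a set offers new(), members(), contains(), and insert() as described in this chapter.

```python
class TinySet:
    """Hypothetical stand-in for a set class with the chapter's interface."""

    def __init__(self, elements=()):
        self.rep = {}
        for x in elements:
            self.insert(x)

    def new(self):
        return TinySet()

    def members(self):
        return list(self.rep)

    def contains(self, x):
        return x in self.rep

    def insert(self, x):
        self.rep[x] = x


def difference(t, s):
    """Return a new set, of t's class, holding the elements of t not in s."""
    result = t.new()
    for x in t.members():
        if not s.contains(x):
            result.insert(x)
    return result


t = TinySet([1, 2, 3, 4])
s = TinySet([2, 4, 6])
print(sorted(difference(t, s).members()))
```

Written as a method of AbstractSet, the same loop would be one more template method alongside union() and intersection().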

Second, they do not have identical interfaces. You can put lists into other lists and search for them, but you cannot make list keys for hash tables, so there are operations that will succeed for ListSets that will fail for DictSets. In the following code, we create a ListSet and a DictSet and try to insert a list, [7,8], into each and look it up. We succeed only for the ListSet.

>>> import DictSet
>>> import ListSet
>>> y=ListSet.ListSet([4,5,6])
>>> x=DictSet.DictSet([1,2,3])
>>> x
{3, 2, 1}
>>> y
{4, 5, 6}
>>> y.insert([7,8])
{4, 5, 6, [7, 8]}
>>> y.contains([7,8])
1
>>> x.insert([7,8])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "DictSet.py", line 14, in insert
    self.rep[x]=x
TypeError: unhashable type

4.10 Example: BaseTimer

Here is another example of a template method. Figure 4–8 gives the code for a class named BaseTimer. This class will help us time the execution of algorithms. To use it, we create a subclass containing the algorithm:

class alg(BaseTimer):...
    def __init__(self,...):...
    def xeq(self): ...#do algorithm

Figure 4–8 BaseTimer.

This subclass must contain a method named xeq(), which will actually execute the algorithm. The __init__() method, if any, can be used to save parameters for the trial, for example, the size of the data set to use.

To run the timing trial, create an instance of the subclass containing the algorithm, call its run() method, and then call its duration() method to get the time:

t=alg(N,....)
t.run()
print "run time for size",N,"is",t.duration()
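
Since the figure is not reproduced here, the following is only a guess at the shape of such a class, written for a current Python 3 interpreter. The attribute name elapsed, the use of time.perf_counter(), and the SumTimer subclass are all assumptions made for the sketch.

```python
import time


class BaseTimer:
    """Template method run() times whatever xeq() does."""

    def run(self):
        start = time.perf_counter()
        self.xeq()                                  # down call to the subclass
        self.elapsed = time.perf_counter() - start

    def duration(self):
        return self.elapsed

    def xeq(self):
        raise NotImplementedError("BaseTimer.xeq()")


class SumTimer(BaseTimer):
    """Hypothetical trial: time the summing of the first n integers."""

    def __init__(self, n):
        self.n = n          # save the parameter for the trial

    def xeq(self):
        self.total = sum(range(self.n))


t = SumTimer(100000)
t.run()
print("run time for size", t.n, "is", t.duration())
```

The timing logic lives once in the superclass; each algorithm to be timed supplies only xeq() and, if it needs parameters, __init__().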

Figure 4–9 shows a script, TimeListSet.py, to find the execution time of ListSet. There is another script to time DictSet, which is almost the same. The built-in function xrange() is like range(), but it does not construct an entire list. When used in a for statement, xrange() generates the elements that would be in the list created by range() with the same parameters. This script is executed with the command line python TimeListSet start end step, where start is the initial data set size, end is the terminating size, and step is the increment in size. Because these are converted to integers and passed to xrange(), data set size end is not included.

Figure 4–9 Script to time ListSet.

Here are the first two times given by TimeListSet:

TimeSet, size= 10000 , time= 18.6961543995
TimeSet, size= 20000 , time= 85.1229013743

And here are the first two given by TimeDictSet:

TimeSet, size= 10000 , time= 0.188048481022
TimeSet, size= 20000 , time= 0.356522127812

Clearly, for large set sizes, DictSets are a lot faster.
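
One likely reason for the gap is that testing membership in a list scans the list, while a dictionary lookup goes through the hash table. The sketch below measures just that one operation with the standard timeit module. It is an illustration written for Python 3, not the chapter's timing script, and the sizes and repeat counts are arbitrary choices.

```python
import timeit


def membership_time(container, probe, repeats=200):
    """Seconds taken to evaluate `probe in container` `repeats` times."""
    return timeit.timeit(lambda: probe in container, number=repeats)


n = 20000
as_list = list(range(n))
as_dict = dict.fromkeys(range(n))
probe = n - 1            # worst case for the list: the last element

list_time = membership_time(as_list, probe)
dict_time = membership_time(as_dict, probe)
print("list:", list_time, "dict:", dict_time)
```

The absolute numbers depend on the machine, so none are quoted here; the point is only how the two grow as n is increased.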

4.11 Inheritance As Classification

A common use for inheritance is the hierarchical classification of objects. Suppose there is one superclass that has many subclasses. Since all the subclasses have the same operations as their superclass, they can be used wherever the superclass is expected. The superclass, then, specifies what is common to all the subclasses, and the subclasses can indicate how they differ from the common characteristics. The superclass specifies the general kind of thing, and the subclasses specify variations. This fits in with how we define things. Actually, there are two common ways that we define things: using an abstract, Aristotelian definition or using a definition by reference to an example.

In an Aristotelian definition, we define a thing by the category of thing that it is and then by how it differs from the other things in that category. Using an Aristotelian definition, the superclass is the category of thing, and the methods and attributes in the subclass specify the differences from the category.

When you are using inheritance in this Aristotelian sense, the superclass is often an abstract class. An abstract class is something like a "bird," whereas the subclasses will be things like robins, penguins, and ostriches. An abstract class is not intended to be used itself to create objects, only to group together its subclasses, just as there is no instance of "bird" that is not some particular kind of bird.

When you implement an abstract class in Python, you often do not provide implementations for all the methods shared by members of the subclasses. Some methods have no behavior that is common to all instances.

AbstractSet, Figure 4–5, is an example of an abstract class. The superclass provides an interface that can have several implementations. The algorithms that use the objects don't need to know the implementation; they only need to know the interface. There are seven methods that AbstractSet does not implement; their implementations are provided only in subclasses.

In other object-oriented languages like Java, AbstractSet would have to provide method signatures for the methods that are to be provided by the subclasses. Method signatures give the name of the method, the number and types of parameters, and the result type. The compiler needs this information to be able to compile calls to the methods.

In Python, there are no parameter types or result types to put in signatures, and methods are looked up at run-time. We did not have to put defs in AbstractSet for the seven methods. We did, however, put them in and made them all raise a NotImplementedError exception. If an instance of AbstractSet itself is created, a NotImplementedError will be raised when it is first used. If a subclass is coded without all the required methods, it will be raised as soon as a missing method is called. NotImplementedError was invented for precisely this purpose. It allows you to put defs for the required methods in the superclass, which is good for documentation, and it gives a more precise error message than that the attribute was not found. For an example of the error messages, here we create an instance of AbstractSet and try to call the copy() method and then to call a nonexistent remove() method.

>>> import AbstractSet
>>> x=AbstractSet.AbstractSet()
>>> x.copy()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "AbstractSet.py", line 9, in copy
    def copy(self): raise NotImplementedError,"set.copy()"
NotImplementedError: set.copy()
>>> x.remove(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'AbstractSet' instance has no attribute 'remove'
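
The same contrast can be reproduced in a few lines. This sketch uses the Python 3 form of the raise statement, and the wording of the messages a current interpreter prints will differ from the session above; error_name() is a hypothetical helper added only to show which exception each call raises.

```python
class AbstractSet:
    def copy(self):
        # Placeholder: documents the required method and names it in the error.
        raise NotImplementedError("set.copy()")


def error_name(call):
    """Return the name of the exception raised by call(), or None."""
    try:
        call()
    except Exception as err:
        return type(err).__name__
    return None


x = AbstractSet()
print(error_name(x.copy))                 # a required method with a placeholder
print(error_name(lambda: x.remove(1)))    # a method that was never declared
```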

So, if you are defining classes the way Aristotle suggested, you have an abstract superclass and you create concrete subclasses of it, "concrete" meaning that all the details are filled in, all the methods are defined. But that's not the way we usually understand things. Mentally, we usually use a paradigm, an example instance, and relate other things to it. For example, most people in the U.S. seem to use the robin as a paradigmatic bird. Other birds are considered birds because they resemble robins: feathers, beaks, wings, nests, eggs, flying, and so on. What about penguins and ostriches? Well, they are a lot like robins (feathers, wings, beaks), but penguins swim instead of flying and they aren't big on nests. Ostriches run.

When you program using a paradigmatic definition, you use a concrete superclass that represents the example object, and concrete subclasses that represent the related things.

Would that have worked with ListSet and DictSet?

If we made ListSet the paradigm, it would try to have its own list, and then the DictSet would override that with its own dictionary. If we used different attribute names for the data structures, then each instance of a DictSet would have both a list and a dictionary. But we programmed both of them to use the attribute name rep for their data structures. That would save space in DictSet, since it could override ListSet's list with its own dictionary.

But is that safe? ListSet contains code that assumes it's manipulating a list. If any of that code is executed, the program will crash. So DictSet would have to override all of ListSet's list manipulation code. We wrote ListSet and DictSet in such a way that that could happen. All the data structure specific code is in separate methods that can be overridden. If we had been writing ListSet in isolation, would we have done that? Would we have been so careful? Probably not. And if we weren't careful, we would have to override all the methods, rather than being able to share code for union and intersection and the others. And even if there were a few methods in ListSet that didn't need to be overridden, would it be safe to use them? If someone changed the implementation of ListSet, it could break our code. In this case, at least, where we are providing different implementations, an AbstractSet class, designed to be overridden, is a much better choice.

In Python, we can get around all this discussion of whether to inherit from an abstract superclass or a concrete one. We do not have to inherit at all. All we have to do is provide classes with the proper interface, the proper collection of methods. To adapt an old saying, if it looks like a duck and walks like a duck and quacks like a duck, I don't care whether it "really" is a duck: I'll treat it like a duck.
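
A small sketch of that idea: the two classes below share no superclass other than the implicit root, yet a function that only calls members() and contains() accepts either. The class names and the function common() are hypothetical, chosen for the illustration.

```python
class ListBag:
    """Keeps its elements in a list."""

    def __init__(self, elements):
        self.rep = list(elements)

    def members(self):
        return self.rep[:]

    def contains(self, x):
        return x in self.rep


class RangeSet:
    """Unrelated class: the integers from lo up to, but not including, hi."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def members(self):
        return list(range(self.lo, self.hi))

    def contains(self, x):
        return self.lo <= x < self.hi


def common(t, s):
    """Elements of t that are also in s; needs only members() and contains()."""
    return [x for x in t.members() if s.contains(x)]


print(common(ListBag([1, 5, 9]), RangeSet(4, 10)))
print(common(RangeSet(0, 3), ListBag([2, 3, 4])))
```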

4.12 Multiple Inheritance

In Python, a class may inherit from more than one superclass. This is called multiple inheritance. It is not absolutely essential, but it does have a few uses.

For example, lists, being mutable, cannot be used as keys in dictionaries. But suppose we need to use lists as keys. Suppose we aren't interested in looking up lists by their contents, but by their identities; that is, when we look up the same list object, we want to find it, but when we look up a different list object with the same contents, we do not. Here's what we can do.

First, we create a class Hashable that provides the functions that a dictionary needs to use an object as a key.³ See Figure 4–10. The two methods are __hash__(), which a dictionary calls to decide where in the hash table to start looking for the key, and __cmp__(), which it calls to compare two keys to see if they are equal. Both our methods will use the id() built-in function, which returns a different integer for each object. Any object inheriting these methods from Hashable will be placed in a hash table by its identity, rather than by its contents.

Figure 4–10 Class Hashable in Hashable.py, pre-version 2.1.

Now all we have to do is create a kind of list that inherits from Hashable. It would be nice to have a class that inherits from both Hashable and list, but Python's list data type is not a class. However, there is a trivial way around that. The Python library contains a module with a class UserList, which behaves exactly like a list. (It contains a list and passes all list operations on to it.) Since UserList is a class, we can inherit from it. So we create a class ListKey, Figure 4–11, that is both a list and a Hashable. All the methods are provided by its superclasses. It doesn't need any contents of its own.

Figure 4–11 Class ListKey in ListKey.py.
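
For readers on a current interpreter, the same idea can be sketched as follows. This is not the code of Figures 4–10 and 4–11: Python 3 dictionaries compare keys with __eq__() rather than __cmp__(), and UserList now lives in the collections module, so the sketch is adapted accordingly. Listing Hashable first among the superclasses is a choice made here so that its methods are found before UserList's.

```python
from collections import UserList


class Hashable:
    """Hash and compare by identity rather than by contents."""

    def __hash__(self):
        return id(self)

    def __eq__(self, other):
        return self is other


class ListKey(Hashable, UserList):
    """A list-like object usable as a dictionary key; adds nothing of its own."""
    pass


a = ListKey([1, 2])
b = ListKey([1, 2])      # same contents, different object

table = {a: "first"}
a.append(3)              # changing the contents does not disturb the lookup

print(table[a])
print(b in table)
```

Because the key is the object's identity, a is still found after it has been modified, and b is not found even though it started with the same contents.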

To understand what is happening with multiple inheritance, we need to understand the order in which Python searches the superclasses to find a method. It is no longer as simple as searching from a class to its one superclass along a chain until you find the method. Suppose a class has two superclasses and they both define the method. Which one gets used? Or does Python raise an exception if there is more than one?

To examine this, we are using a contrived example. Consider the collection of five classes given in Figure 4–12. A picture of the inheritance hierarchy is shown in Figure 4–13.

Figure 4–12 Diamond inheritance example.

Figure 4–13 Class hierarchy, diamond inheritance example.

Class E inherits from both C and D; C and D both inherit from B; and B inherits from A. Notice that both B and D define a method f(). Suppose we have an instance of an E and call method f(). Which one do we get? Let's try it:

>>> import abcde
>>> x=abcde.E()
>>> x.f()
f() in B

Okay, we get the f() method in B, even though the one in D is closer to E. In fact, the one in D lies between E and B. You might think that the definition of f() in D ought to hide the one in B from E.

The way it actually works, Python does a depth-first search for the method. When it comes to a class with more than one superclass, Python searches them and their superclasses one at a time, from left to right. So the search path from E would be E, C, B, A; then, after backing down to E, up again to D, B, and A. So, looking for f(), Python will examine E and C and then find it in B. Python won't look any further to see it in D. This leads some people to argue that the search order should be "depth first up to joins." Since the paths join at B, B wouldn't be searched until both C and D have been, but that's not how Python does it.
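
The search just described can be written out by hand. The sketch below rebuilds the five classes from the description in the text (the figure itself is not reproduced here) and walks each class's __bases__ depth first, left to right. It performs the walk explicitly instead of calling x.f(), because later versions of Python changed the order the interpreter itself uses; the function names are hypothetical.

```python
class A:
    pass

class B(A):
    def f(self):
        return "f() in B"

class C(B):
    pass

class D(B):
    def f(self):
        return "f() in D"

class E(C, D):
    pass


def classic_search_path(cls):
    """Classes in depth-first, left-to-right order, repeats included."""
    path = [cls]
    for base in cls.__bases__:
        if base is not object:          # skip the implicit root of Python 3
            path.extend(classic_search_path(base))
    return path


def classic_lookup(cls, name):
    """Return the first definition of name along the classic search path."""
    for klass in classic_search_path(cls):
        if name in vars(klass):
            return vars(klass)[name]
    raise AttributeError(name)


print([k.__name__ for k in classic_search_path(E)])
print(classic_lookup(E, "f")(E()))
```

The path visits B before D, which is why the definition in D never gets a chance under this rule.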

The contour diagram we used in Figure 4–4 won't work as easily here, since the box for one class is not enclosed in a single box. But we can use the contour model if we are willing to have classes included more than once. The boxes for the search path C-B-A would be included in the boxes for the search path D-B-A, as shown in Figure 4–14.

Figure 4–14 Contours for abcde.

4.13 Recapitulation of the Scope of Names

It is important to understand the scopes of names used in methods. There are unqualified names—in other words, variable names—and qualified names of the form object.attribute.

Unqualified names in methods are the same as unqualified names in other functions. They name parameters and local variables of the method, variables in the surrounding module, or built-in variables of the Python system. They never refer to attributes of the instance of the class or of the class itself. This is unlike many other object-oriented languages, so if you know them, you have to be careful using Python.

Your methods must get at attributes of the instance or the class using qualified names. Your method's first parameter will point to the instance your method is being called for. The custom is to call this parameter self, to remind you of its meaning. You get at all attributes of the instance using this parameter; that is, by self.name. When name isn't found in the attributes of the instance, Python searches the class of the instance and its superclasses using a depth-first, left-to-right search. If it finds a function in a class with the name you are searching for, Python gives you a bound method object that allows you to call the method for the current object. You can use this to make up calls or down calls, wherever the method is found in the class hierarchy. It's the same as calling the method from outside the object.

You can also look up functions directly in classes, which gives you an unbound method object. To call it, you have to provide it an instance as its first parameter. You use this to make explicit up calls to superclass implementations of overridden methods. A method uses this to call the method it is overriding.
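
A brief sketch of both kinds of lookup, with hypothetical class names. Fetching a method through the instance supplies self automatically; fetching it through the class means the instance has to be passed explicitly, which is how an overriding method reaches the method it overrides. (A current Python 3 interpreter returns a plain function in the second case rather than an unbound method object, but the call is written the same way.)

```python
class Counter:
    def __init__(self):
        self.count = 0

    def insert(self, x):
        self.count += 1          # attributes are reached only through self
        return self.count


class LoggingCounter(Counter):
    def __init__(self):
        Counter.__init__(self)   # explicit up call to the superclass initializer
        self.log = []

    def insert(self, x):
        self.log.append(x)
        return Counter.insert(self, x)   # call the method being overridden


c = LoggingCounter()

bound = c.insert                 # looked up through the instance: self is supplied
bound("a")

LoggingCounter.insert(c, "b")    # looked up in the class: instance passed explicitly

print(c.count, c.log)
```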

4.14 Testing Objects and Classes

If you have a reference to something and you want to know if it is an instance of some class, you can use the isinstance(obj,c) built-in function. This will return true if the object obj is an instance of the class c, or any subclass of c, so if isinstance(x,AbstractSet):stuff will execute the stuff if x is an instance of a ListSet or a DictSet.

The function isinstance() works for types as well as for classes. If c is a type object, it will return true if obj is an object of that type. One way to get a type object is to use the type(x) built-in function, which will give you the type object for x's type. For example, if isinstance(x,type(1)):stuff will execute the stuff if x is an integer. Similarly, you can find type objects in the types module. If you have imported types, then if isinstance(x,types.IntType):stuff will do the same thing.

Unlike many other object-oriented languages, classes in Python are not types. The type(x) call for instance x of class C will not give you C. (JPython, an implementation of Python in Java, may be an exception to this.) All class instances are objects of type types.InstanceType. The way you can find x's class is by using its special attribute __class__. So if x.__class__ is AbstractSet:stuff will execute the stuff if x is an instance of AbstractSet itself, but not if it is an instance of ListSet or DictSet. A "special attribute" is an attribute that's built in to an instance object and is always present.

Classes themselves are objects of type types.ClassType. The types are of type types.TypeType. Figure 4–15 diagrams the relationships among instances, classes, and types.

Figure 4–15 Relationships among instances, classes, and types.

You can test classes' relationships in an inheritance hierarchy with the issubclass(c1,c2) built-in function, which returns true if class c1 is the same as c2 or if c1 inherits from c2.
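
The tests described in this section can be tried on a stripped-down version of the set hierarchy. The empty classes below are stand-ins, not the chapter's classes, and the sketch sticks to isinstance(), issubclass(), and __class__.

```python
class AbstractSet:
    pass

class ListSet(AbstractSet):
    pass

class DictSet(AbstractSet):
    pass


x = ListSet()

print(isinstance(x, AbstractSet))        # true for instances of subclasses too
print(isinstance(x, DictSet))            # a sibling class does not count
print(x.__class__ is ListSet)            # the exact class of x
print(x.__class__ is AbstractSet)        # not the exact class, only an ancestor
print(issubclass(ListSet, AbstractSet))
print(issubclass(AbstractSet, AbstractSet))   # a class counts as its own subclass
print(isinstance(1, type(1)))            # works with type objects as well
```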

4.15 Wrap-Up

In this chapter, we've seen the object-oriented features of Python. Python provides the essentials: classes, methods, and inheritance—indeed, multiple inheritance.

Python diverges from many other object-oriented languages in several ways:

No visibility restrictions: Python does not provide private, or other limited scopes, of attributes and methods. This means that encapsulation cannot be enforced, but must be programmed on the honor system. Name mangling can be used to indicate which methods and attributes are intended to be private, but it doesn't make them invisible or prevent access to them.
Absence of overloading: Python does not let you define several methods with the same name and have one of them chosen according to the number and types of the parameters in the call.
The diamond inheritance anomaly: Because of its strictly depth-first search for methods, Python allows a method in a superclass to override one in a subclass, rather than only allowing methods in subclasses to override methods from superclasses.
A single pool of instance attributes: All attributes used by all the classes in an inheritance hierarchy are put in a single dictionary. Attribute name collisions are a likely source of bugs. Name mangling can make it a bit easier to keep separate the attributes intended to be private.
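
The single attribute pool and the effect of name mangling can both be seen in a short sketch. The class and attribute names are hypothetical; the mangled forms shown in the comments follow Python's rule of prefixing a double-underscore name with an underscore and the name of the class in which it appears.

```python
class Base:
    def __init__(self):
        self.count = 0              # ordinary name: shared with every subclass
        self.__secret = "base"      # stored as _Base__secret

    def base_secret(self):
        return self.__secret


class Derived(Base):
    def __init__(self):
        Base.__init__(self)
        self.count = "many"         # collides with, and overwrites, Base's count
        self.__secret = "derived"   # stored as _Derived__secret, no collision


d = Derived()
print(sorted(vars(d)))      # one dictionary holds every attribute
print(d.count)              # Base's value is gone
print(d.base_secret())      # Base's mangled attribute survived
print(d._Base__secret)      # mangling does not prevent access from outside
```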