Home > Articles > Open Source > Python

  • Print
  • + Share This
From the author of

Using Descriptors to Store Data

Let's look at a use of descriptors that in some ways is the complete opposite of what we've just seen. In some situations, we may prefer to store all or some of a class' data outside the class, while at the same time being able to access the data through instance attributes in the normal way.

For example, suppose we need to store large numbers of Book objects, each holding the details of a particular book. Imagine further that for some of the books we need to output the book's details as a bibliographic entry in the DocBook XML format, and that when such output is required once, it's very likely to be required again. One way of handling this situation is to use a descriptor to generate the XML—and to cache what it generates.

Here's a class that uses such a descriptor:

class Book:

    biblioentry = BiblioEntry()

    def __init__(self, isbn, title, forename, surname, year):
        self.isbn = isbn
        self.title = title
        self.forename = forename
        self.surname = surname
        self.year = year

No biblioentry data is held in Book instances. When the data is requested (for example, book.biblioentry) for the first time, the entry is computed; on the second and subsequent requests, the computed entry is returned immediately. Here's the descriptor's definition:

class BiblioEntry:

    def __init__(self):
        self.cache = {}

    def __get__(self, instance, owner=None):
        entry = self.cache.get(id(instance), None)
        if entry is not None:
            return entry
        entry = """<biblioentry><abbrev>{surname}{yr:02d}</abbrev>
<authorgroup><author><firstname>{forename}</firstname>
<surname>{surname}</surname></author></authorgroup>
<copyright><year>{year}</year></copyright>
<isbn>{isbn}</isbn><title>{title}</title>
</biblioentry>\n""".format(
        yr=(instance.year - 2000 if instance.year >= 2000
                                else instance.year - 1900),
        forename=xml.sax.saxutils.escape(instance.forename),
        surname=xml.sax.saxutils.escape(instance.surname),
        title=xml.sax.saxutils.escape(instance.title),
        isbn=instance.isbn, year=instance.year)
        self.cache[id(instance)] = entry
        return entry

Structurally, the code is quite similar to the previous example, but here we create a cache whose keys are unique instance IDs and whose values are XML biblioentry strings, suitably escaped. By using the cache we ensure that the expensive computation is done only once for each book for which it's needed. We chose to store IDs rather than the instances themselves, to avoid forcing the Book instances to be hashable.

By caching, we're trading memory for speed; whether that's the right tradeoff can only be determined on a case-by-case basis. Another issue to note: When using caching, if the data changes, some or all of the cache's contents become invalid. Since the details of published books don't change, it isn't a problem in this example, but if changes were common we must cope with them. One approach is to use a "dirty" flag and ignore the cache if instance.dirty is True; another approach is to access the cache itself and clear it.

You can provide access to an attribute's underlying descriptor by adding two lines at the beginning of the descriptor's __get__() method:

def __get__(self, instance, owner=None):
    if instance is None:
        return self
    # ...

If we had the above lines at the start of the BiblioEntry's __get__() method, we could clear the cache like this:

Book.biblioentry.cache.clear()

Now that we've seen how we can provide both computed and stored attributes using descriptors, let's look at a third use of descriptors: validation.

  • + Share This
  • 🔖 Save To Your Account