
Pro Python 3, pp 161-223

Classes

  • J. Burton Browning
  • Marty Alchin

Abstract

In Chapter 3 you reviewed how functions allow you to define code that can be reused. This allowed for general code streamlining by not having to retype "chunks" of code. However, it's often more useful to combine those same functions into logical groupings that define the behavior and attributes of a particular type of object. This is standard object-oriented (OO) programming, which is implemented in Python by way of types and classes. These, like functions, may seem simple enough on the surface, but there's a considerable amount of power behind them that you can leverage.

The most basic idea of a class is that it encapsulates the behavior of an object, whereas an instance of the class represents the data for the object. Therefore, even though data may well change from one instance to another, behavior determined by the underlying class will remain the same across those instances. Defining, extending, and altering that behavior is the focus of this chapter.

Inheritance

The simplest way to use classes is to define a single class for a single type of object. That works well for many simple applications, but you’re likely to find the need for finer-grained control over the behavior of objects. In particular, it’s common to have a single common set of behaviors for a large collection of objects, but you then need to modify them or add new ones for a smaller set of more specific objects.

To facilitate this, Python allows each class to specify one or more base classes that will provide the fundamental behavior. Then, the new class being defined can add new behaviors or override any existing ones. By default, all objects descend from the built-in object type, although that doesn’t really do anything useful on its own. It’s really just a foundation type that underpins the entire system, as everything else inherits from it.

Like most object-oriented languages, Python lets you define as many subclasses as you’d like for a given class, and you can subclass those as well, going as many levels deep as necessary. This vertical approach to inheritance is appropriate for most applications, because it maximizes the usefulness of the base classes. When a single, typically large, set of behaviors needs to be reused across a variety of other classes, vertical inheritance proves quite useful. Try a very simple Python class with an explicit constructor:

class Contact:
    def __init__(self, lName, fName): # explicit constructor for class
        self.lastName = lName
        self.firstName = fName
worker1 = Contact("Smith", "James")
print(worker1.lastName, worker1.firstName)

Python also has some built-in functions to inspect and modify your objects. This is a peek ahead to the "Attributes" section of this chapter, but these functions are getattr(obj, name) to access an attribute of an object; setattr(obj, name, value) to set an attribute; hasattr(obj, name) to check for an attribute's existence; and, finally, delattr(obj, name) to delete an attribute. Public attributes are, of course, accessible once the object is created:

class Contact:
    def __init__(self, lName, fName): # explicit constructor for class
        self.lastName = lName
        self.firstName = fName
worker1 = Contact('Smith', 'James')
print(worker1.lastName, worker1.firstName) # object.public_property
newLast = input('Enter new last name: ') # raw_input() was removed in Python 3
setattr(worker1, 'lastName', newLast) # set attribute with new value
print(worker1.lastName, worker1.firstName)
print(getattr(worker1, 'lastName')) # get existing attribute
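
The two remaining helpers, hasattr() and delattr(), can be tried the same way. A brief sketch reusing the same Contact class:

```python
class Contact:
    def __init__(self, lName, fName):
        self.lastName = lName
        self.firstName = fName

worker1 = Contact('Smith', 'James')
print(hasattr(worker1, 'lastName'))   # True: the attribute exists
delattr(worker1, 'lastName')          # remove the attribute from the instance
print(hasattr(worker1, 'lastName'))   # False: it has been deleted
```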

As yet another example, consider a common scenario involving a contact management application. At the root of all else you would have a Contact class, because, by definition, everything in the application is a contact. It would have a set of fields and behaviors associated with it, which cover only those things that are pertinent to all contacts, according to the needs of your application:

class Contact:
    name = TextField()
    email = EmailAddressField()
    phone = PhoneNumberField()
    def send_mail(self, message):
        # Email sending code would go here
        pass

For now, don't worry about the specifics of where each of the field classes comes from or how they work in the application. If you're interested, Chapter 11 demonstrates one possible framework for writing classes like this. The key for now is that each of the fields represents a single piece of data relating to the class at hand. Values might be provided by user input, results from a database query, or even a random value generator; what's important is the structure of the class and how subclasses will work with it.

Even with just a contact in place, you can create a useful application based on those core fields and behaviors. Providing additional features means adding support for different types of contacts. For instance, real people have a first name, last name, and perhaps a cell phone, whereas companies will often have only a single name and phone number. Likewise, companies will do business in particular industries, which wouldn’t make any sense in the case of individuals:

class Person(Contact):
    first_name = TextField()
    last_name = TextField()
    name = ComputedString('%(last_name)s, %(first_name)s')
    cell_phone = PhoneNumberField()
class Company(Contact):
    industry = TextField()

Now we have a basic hierarchy beginning to take shape. People are different from companies, and they each have different fields that are appropriate to each case. Python’s inheritance system automatically pulls the fields from the Contact class and makes them available on the Person and Company classes. You can subclass these as well, providing such Person types as Employee, Friend, and FamilyMember:

class Employee(Person):
    employer = RelatedContact(Company)
    job_title = TextField()
    office_email = EmailAddressField()
    office_phone = PhoneNumberField()
    extension = ExtensionField()
class Friend(Person):
    relationship = TextField()
class FamilyMember(Person):
    relationship = TextField()
    birthday = DateField()

Notice here that even though both Friend and FamilyMember have relationship fields that work identically to each other, FamilyMember doesn’t inherit from Friend. It’s not necessarily true that a family member will also be a friend, so the class structure reflects that. Each new subclass is automatically considered to be a more specific example of the class it extends, so it’s important that the inheritance scheme reflects the actual relationships being codified.

This may seem like a philosophical detail, but it has real ramifications in code as well. As will be shown in the “Introspection” section in this chapter, Python code can take a look at the inheritance structure of classes, so any mismatches can cause your code to confuse one type of class for another. The best way to avoid those problems is to think about how the objects you’re representing actually relate to one another and try to recreate those relationships in code.
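
As a small preview of that introspection, the built-in isinstance() and issubclass() functions read the inheritance structure directly. A minimal sketch, with the field definitions stripped out because they aren't needed for the checks:

```python
class Contact:
    pass

class Person(Contact):
    pass

class Friend(Person):
    pass

class FamilyMember(Person):
    pass

fred = Friend()
print(isinstance(fred, Contact))         # True: Friend descends from Contact
print(issubclass(FamilyMember, Friend))  # False: they are siblings, not ancestors
```

This is exactly why the class structure should mirror the real relationships: if FamilyMember had inherited from Friend for convenience, issubclass() would report every family member as a friend.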

Multiple Inheritance

Python also supports a horizontal approach to class inheritance, by allowing a subclass to define more than one base class at a time. This way, a class can obtain behaviors from many various classes without having to go several levels deep. Of course, that means taking a different logical approach because you’re no longer defining classes by increasing specificity. Instead, in some uses of multiple inheritance, you’re essentially building up each class as a set of components.

Building up classes like this is particularly well suited for applications in which your classes share some common behaviors but are not otherwise related to each other in a hierarchical manner. In order to make sense, this typically requires a large number of classes to be built from a reasonably large number of components. Because that’s not the way most applications are put together, it’s rarely used this way in the wild.

Instead, multiple inheritance is often called on to apply support classes, called mixins. Mixin classes don’t provide full functionality on their own; they instead supply just a small add-on feature that could be useful on a wide range of different classes. One example might be a mixin that returns None when you try to access any attribute that isn’t available on the object, rather than raising an AttributeError:

class NoneAttributes:
    def __getattr__(self, name):
        return None

The __getattr__() method, which will be described in more detail in the “Magic Methods” section later in this chapter, is called whenever an attribute is requested that isn’t available on the object. Because it works as a fallback, it’s an obvious choice for a mixin; the real class provides its own functionality, with the mixin adding onto that where applicable:

class Example(BaseClass, NoneAttributes):
      pass
e = Example()
e.does_not_exist

In typical applications, a vertical hierarchy will provide most of the functionality, with mixins adding some extras where necessary. Because of the potential number of classes involved when accessing attributes, it becomes even more important to fully understand how Python decides which class is used for each attribute and method that was accessed. To put it another way, you need to know the order in which Python resolves which method to use.
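
Because BaseClass in the snippet above is assumed to be defined elsewhere, here is a self-contained sketch of the same idea, with a trivial stand-in base class:

```python
class NoneAttributes:
    def __getattr__(self, name):
        # Only called when normal attribute lookup fails
        return None

class BaseClass:
    def greet(self):
        return 'hello'

class Example(BaseClass, NoneAttributes):
    pass

e = Example()
print(e.greet())           # 'hello' -- provided by BaseClass as usual
print(e.does_not_exist)    # None -- the mixin's fallback, instead of AttributeError
```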

Method Resolution Order

Given a class hierarchy, Python needs to determine which class to use when attempting to access an attribute by name. To do this, Python has rules that govern how to order a set of base classes when a new class is defined. For most basic usage of classes you don’t really need to know how this works, but if you work with multilevel or multiple inheritance, the details in this section will help you understand what’s really going on.

In the simple vertical-only scenario, it’s easy to imagine how the Method Resolution Order (MRO) would be created. The class you’re actually working with would be first in line, followed by its base class, followed by the base class of the base class, and so on up the line until you get back to the root object type.

At each step in the chain, Python checks to see if the class has an attribute with the name being requested, and if it does, that’s what you get. If not, it moves on to the next one. This is easy to see with a simple example. Key this in from a prompt and try it:

>>> class Book:
...     def __init__(self, title):
...         self.title = title
...         self.page = 1
...     def read(self):
...         return 'There sure are a lot of words on page %s.' % self.page
...     def bookmark(self, page):
...         self.page = page
...
>>> class Novel(Book):
...     pass
...
>>> class Mystery(Novel):
...     def read(self):
...         return "Page %s and I still don't know who did it!" % self.page
...
>>> book1 = Book('Pro Python')
>>> book1.read()
'There sure are a lot of words on page 1.'
>>> book1.bookmark(page=52)
>>> book1.read()
'There sure are a lot of words on page 52.'
>>> book2 = Novel('Pride and Prejudice')
>>> book2.read()
'There sure are a lot of words on page 1.'
>>> book3 = Mystery('Murder on the Orient Express')
>>> book3.read()
"Page 1 and I still don't know who did it!"
>>> book3.bookmark(page=352)
>>> book3.read()
"Page 352 and I still don't know who did it!"

As you can see, when calling read() on a Mystery object, you get the method that’s defined directly on that class, while using bookmark() on that same class uses the implementation from Book. Likewise, Novel doesn’t define anything on its own—it’s just there to make for a more meaningful hierarchy—so all of the methods you have access to actually come from Book. To put it more directly, the MRO for Mystery is [Mystery, Novel, Book], while the MRO for Novel is simply [Novel, Book].
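
You don't have to work these orderings out by hand: Python exposes the computed MRO of every class through its __mro__ attribute (and the equivalent mro() method). Reusing the classes above, with the bodies trimmed for brevity:

```python
class Book:
    def read(self):
        return 'reading'

class Novel(Book):
    pass

class Mystery(Novel):
    def read(self):
        return 'whodunnit'

# The implicit object root always appears at the end
print([cls.__name__ for cls in Mystery.__mro__])  # ['Mystery', 'Novel', 'Book', 'object']
print([cls.__name__ for cls in Novel.__mro__])    # ['Novel', 'Book', 'object']
```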

So what happens when you take a horizontal approach using multiple inheritance? For the sake of simplicity, we'll start with just a single layer of inheritance for each of the supplied base classes so that it's a purely horizontal approach. In this case Python goes from left to right, in the order the classes were defined as base classes. Here's what the previous example looks like once we add a purchase() method, which would allow the user to buy a copy of the book. If you still have the previous terminal session open, try the next bit to add on to what we have done:

>>> class Product:
...     def purchase(self):
...         return 'Wow, you must really like it!'
...
>>> class BookProduct(Book, Product):
...     pass
...
>>> class MysteryProduct(Mystery, Product):
...     def purchase(self):
...         return 'Whodunnit?'
...
>>> product1 = BookProduct('Pro Python')
>>> product1.purchase()
'Wow, you must really like it!'
>>> product2 = MysteryProduct('Murder on the Orient Express')
>>> product2.purchase()
'Whodunnit?'
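
Again, the __mro__ attribute confirms what happened: the entire vertical chain of the first base class is searched before the second base class, with object anchoring the end. A trimmed-down sketch:

```python
class Book:
    pass

class Novel(Book):
    pass

class Mystery(Novel):
    pass

class Product:
    pass

class MysteryProduct(Mystery, Product):
    pass

# Mystery's whole chain comes before Product because it was listed first
print([cls.__name__ for cls in MysteryProduct.__mro__])
# ['MysteryProduct', 'Mystery', 'Novel', 'Book', 'Product', 'object']
```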

Thus far, each MRO has been very straightforward and easy to understand, even if you didn’t know what was going on behind the scenes. Unfortunately, things get more complex when you start combining both forms of inheritance. It doesn’t even take a very complicated example to illustrate the problem; consider what happens when you inherit from one class that has a base class of its own and a mixin that stands alone:

class A:
    def test(self):
        return 'A'
class B(A):
    pass
class C:
    def test(self):
        return 'C'

This is simple enough, but if you create a new class, D, which subclasses both B and C, what would happen if you call its test() method? As always, it's easy enough to test this out in the interactive interpreter, where you'll see that the answer depends on which one you put first. Make sure you are in the same session, and have keyed in the aforementioned code, and then try the following to see the results:

>>> class D(B, C):
...     pass
...
>>> D().test()
'A'
>>> class D(C, B):
...     pass
...
>>> D().test()
'C'

On the surface, it seems easy to assume that Python simply goes depth first; it looks at the first base class and follows it all the way down, looking for the requested attribute, moving on to the next base class only when it can’t find what it needs. That observation is certainly true for this and many other cases, but it’s still not the whole story. What’s really going on takes the whole inheritance scheme into account.

Before clarifying the full algorithm, however, let's get one thing out of the way. The first namespace Python looks at is always the instance object. If the attribute isn't found there, it goes to the actual class that provides that object's behavior. These two namespaces are always the first two to be checked, regardless of any inheritance structure that may be in use. Python tries to locate the attribute through class inheritance only if it isn't found in either of those.
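
Those first two namespaces are easy to observe before inheritance enters the picture at all; a brief sketch:

```python
class Widget:
    label = 'class default'

w = Widget()
print(w.label)             # 'class default': not on the instance, so the class is checked
w.label = 'instance copy'  # creates an entry in the instance's own namespace
print(w.label)             # 'instance copy': the instance now shadows the class
print(Widget.label)        # 'class default': the class attribute itself is untouched
```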

Rather than looking at the whole inheritance structure as a kind of tree, Python tries to flatten it out to a single list, with each class appearing just once. This is an important distinction because it’s possible for two base classes to subclass the same class deeper in the chain, but looking at that class twice would only cause confusion later on. To resolve this and other potential issues, there needs to be a single, flat list to work with.

The first step is to identify all the different paths that can be taken to get from a class to its basemost class. There will always be at least one path, even if there’s no base class, for two reasons. For one, the MRO for a given class always includes the class itself in the first position. This may seem obvious from earlier descriptions, but the rest of the algorithm will make it clear why this is important to state explicitly. Also, every class implicitly inherits from object, so that’s at the end of every MRO.

So, for just a simple class, A, which doesn’t inherit from anything, its MRO is just a simple two-element list: [A, object]. If you have another class, B, which subclasses A, its MRO becomes fairly obvious as well, being [B, A, object]. Once you introduce a bit of multiple inheritance, it’s possible for the same class to appear more than once in the overall tree, so we need some extra work in order to sort out the MRO.

Consider a new class, C, which inherits from both B and A. Now A shows up under two different branches and at two different distances from the new class, C.

Note

It might not make sense to do this because B already inherits from A. Remember, however, that you may not always know in advance what the base classes are doing behind the scenes. You might extend classes that were passed into your code from elsewhere or were generated dynamically, such as will be shown later in this chapter. Python doesn’t know how your classes are laid out, so it has to be able to account for all the possibilities.

>>> class A:
...     pass
...
>>> class B(A):
...     pass
...
>>> class C(B, A):
...     pass
...

The MRO for object is obviously just [object], and A has already been shown to be [A, object], as you would expect. B is clearly [B, A, object], but what about C? Looking at it depth-first, you might guess [C, B, A, object] once the duplicate A is removed. Taking a breadth-first (horizontal before vertical) approach, you’d come up with [C, A, B, object].

So which way does Python go? The truth is, neither of those is accurate; Python uses an algorithm called C3. This algorithm takes all the inheritance into account, reducing it by one layer at a time, until only a single list remains. At each level, C3 processes the class lists that were created for all of that level’s parent classes. Because of this, it starts at the most generic class, object, and continues outward from there.

With C in place, we can finally see how the algorithm works in detail. By the time Python encounters C, both A and B have already been processed, so their MROs are known. In order to combine them, C3 looks at the first class in each of the parent MROs to see if it can find a candidate for inclusion in the MRO for C. Of course, that raises the question of what exactly constitutes a valid candidate.

The only criteria used to identify a candidate class is whether it exists in only the first position in any of the MRO lists being considered. It doesn’t have to be in all of them, but if it’s present, it must be the first in the list. If it’s in any other position in any of the lists, C3 will skip it until its next pass. Once it finds a valid entry, it pulls that into the new MRO and looks for the next one using the same procedure.

Example: C3 Algorithm

Because algorithms are really just code, let’s put together a simple C3 function that will perform the necessary linearization—reducing the inheritance tree into a single list. Before diving into the full implementation, however, let’s first take a look at what the function call would look like, so we know what data it’ll be working with. For C, it would look like this:

C3(C, [B, A, object], [A, object], [B, A])

The first argument is the class itself, which is followed by the known MRO lists for its parent classes, in the order they were defined on the class. The last argument, however, is simply the list of parent classes themselves, without their full MROs. As will be shown in a slight modification of C later, this extra argument is necessary to resolve some ambiguities.

As with any function, there are a few boring details that need to be put in place before the real heavy lifting can be done. In the case of C3, there will be some modification of the MRO lists along the way, and we don’t want those modifications to affect the code that called the C3 function, so we have to make copies of them to work with. In addition, we need to set up a new list to contain the final MRO being generated by the algorithm:

def C3(cls, *mro_lists):
    # Make a copy so we don't change existing content
    mro_lists = [list(mro_list[:]) for mro_list in mro_lists]
    # Set up the new MRO with the class itself
    mro = [cls]
    # The real algorithm goes here.
    return mro

We can’t just use mro_list[:] here because that only copies the outer list. All the other lists that were contained inside that list would remain, so any modifications to them would be visible outside the function. By using a list comprehension and copying each of the internal lists, we get copies of all the lists involved, so they can be safely altered.
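
The difference is easy to demonstrate with ordinary nested lists:

```python
outer = [[1, 2], [3, 4]]
shallow = outer[:]                       # copies only the outer list
shallow[0].pop(0)                        # mutates an inner list shared with outer
print(outer)                             # [[2], [3, 4]] -- the original changed too

outer = [[1, 2], [3, 4]]
safe = [list(inner) for inner in outer]  # copies each inner list as well
safe[0].pop(0)                           # mutates only the copy
print(outer)                             # [[1, 2], [3, 4]] -- the original is untouched
```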

The Robustness Principle

If you're already aware of Python's copy module—or you've skipped ahead to Chapter 6—you may wonder why we don't just use copy.deepcopy(mro_list) instead. At the very least, you may be wondering what that extra list(mro_list[:]) is for, because we're passing in lists already. By explicitly casting each of the internal sequences to lists and wrapping it all in a list comprehension, we can allow the function to accept any valid sequence types, including tuples, which aren't able to be modified after being created (like a constant perhaps). This makes the C3 function much more liberal in what it accepts.

With the housekeeping out of the way, we can move on to the main algorithm. Because we don't know in advance how many classes are in each MRO, it's best to wrap the main workload in a simple while True loop, which will execute indefinitely, so we can control its flow using break and continue. Of course, this means you shouldn't try executing this code until a bit later on, once we have the necessary control code in place.

The first task inside that loop will be to loop over each MRO list, get its first class, and see if it’s in any position other than first in any of the other lists. If it is, that class isn’t a valid candidate yet and we need to move on to the first class in the next list. Here’s the loop necessary to perform those first steps:

import itertools
def C3(cls, *mro_lists):
    # Make a copy so we don't change existing content
    mro_lists = [list(mro_list[:]) for mro_list in mro_lists]
    # Set up the new MRO with the class itself
    mro = [cls]
    while True:
        for mro_list in mro_lists:
            # Get the first item as a potential candidate for the MRO.
            candidate = mro_list[0]
            if candidate in itertools.chain(*(x[1:] for x in mro_lists)):
                # The candidate was found in an invalid position, so we
                # move on to the next MRO list to get a new candidate.
                continue
    return mro

The chain used here reduces all the non–first classes in all the MRO lists down to a single list, so it’s easier to test whether the current candidate is valid or not. Of course, the current code only responds if the candidate is invalid. If it wasn’t found in that chain, it’s a valid candidate and can be promoted to the final MRO right away.

In addition, we need to remove that candidate from the MRO list where it was found, as well as any of the others it might be found in. This is made a bit easier by the fact that we know it can only be the first item in any of the lists and that it won’t be in any of them that were already processed in this round. We can therefore just look at each of the remaining candidates and remove the class that was promoted. In any case, none of the other MRO lists should be processed for a new candidate this time around, so we also need to add a continue:

    while True:
        # Reset for the next round of tests
        candidate_found = False
        for mro_list in mro_lists:
            if not len(mro_list):
                # Any empty lists are of no use to the algorithm.
                continue
            # Get the first item as a potential candidate for the MRO.
            candidate = mro_list[0]
            if candidate_found:
                # Candidates promoted to the MRO are no longer of use.
                if candidate in mro:
                    mro_list.pop(0)
                # Don't bother checking any more candidates if one was found.
                continue
            if candidate in itertools.chain(*(x[1:] for x in mro_lists)):
                # The candidate was found in an invalid position, so we
                # move on to the next MRO list to get a new candidate.
                continue
            else:
                # The candidate is valid and should be promoted to the MRO.
                mro.append(candidate)
                mro_list.pop(0)
                candidate_found = True

Note

Now that we’re removing items from the MRO lists, we also have to add in an extra bit of code to handle the situation in which one of the lists was completely emptied. Because there’s nothing of value in an empty list, the loop just moves on to the next one.

With the candidate selection now complete, the only things left are to tell the algorithm when its job is done and it should exit the loop. As it stands it will empty the lists completely, but continue looping through them forever, without ever returning the new MRO. The key to identifying this situation is that it will indeed empty all the lists. Therefore, we can check the remaining MRO lists to see if any classes remain. If not, it’s done and can end the loop:

    while True:
        # Reset for the next round of tests
        candidate_found = False
        for mro_list in mro_lists:
            if not len(mro_list):
                # Any empty lists are of no use to the algorithm.
                continue
            # Get the first item as a potential candidate for the MRO.
            candidate = mro_list[0]
            if candidate_found:
                # Candidates promoted to the MRO are no longer of use.
                if candidate in mro:
                    mro_list.pop(0)
                # Don't bother checking any more candidates if one was found.
                continue
            if candidate in itertools.chain(*(x[1:] for x in mro_lists)):
                # The candidate was found in an invalid position, so we
                # move on to the next MRO list to get a new candidate.
                continue
            else:
                # The candidate is valid and should be promoted to the MRO.
                mro.append(candidate)
                mro_list.pop(0)
                candidate_found = True
        if not sum(len(mro_list) for mro_list in mro_lists):
            # There are no MROs to cycle through, so we're all done.
            # Note: any() returns False for an empty sequence, so it could replace the sum()
            break

This loop, inside the C3 function mentioned already, can successfully create an MRO for any valid Python inheritance scheme. Going back to the function call for the C class mentioned previously, we’d get the following result. Notice that we’re using strings here instead of the actual classes, to make it easier to illustrate. Nothing about the C3 algorithm is actually tied to classes anyway; it’s all just about flattening out a hierarchy that may contain duplicates:

>>> C3('C', ['B', 'A', 'object'], ['A', 'object'], ['B', 'A'])
['C', 'B', 'A', 'object']

That’s all well and good, but there’s another related situation that needs some attention as well: what happens when C inherits from A before B? One would logically assume that any attributes found on A would be used before those on B, even though B’s MRO puts B before A. That would violate an important consistency in class inheritance: the order of items in an MRO should be preserved in all of its future subclasses.

Those subclasses are allowed to add new items to their MRO, even inserting them in between items in the MRO of the base class, but all the MROs involved should still retain the same ordering they had originally. So when doing something like C(A, B), no result can satisfy everyone: putting A first would reverse the ordering established by B's own MRO, while putting B first would contradict the order the user declared.

That’s why the C3 algorithm requires that the base classes themselves be added to the list of MROs that are passed in. Without them, we could invoke the C3 algorithm with this new construct and get the same result that was obtained with the original ordering:

>>> C3('C', ['B', 'A', 'object'], ['A', 'object'])
['C', 'B', 'A', 'object']
>>> C3('C', ['A', 'object'], ['B', 'A', 'object'])
['C', 'B', 'A', 'object']

Even though it seems like the two should do different things, they would actually end up doing the same thing. By adding in the extra class list at the end, however, the behavior of C3 changes a bit. The first candidate is A, which is found in the second position in the MRO of B, so A is skipped for this round. The next candidate is B, which is found in the list added in the final argument, so that’s skipped, too. When the final list is examined, A is skipped once again.

This means C3 completes a full loop without finding any valid candidates, which is how it detects inappropriate constructs like C(A, B). Without a valid candidate, no items are removed from any of the lists and the main loop would run again with exactly the same data. Without any extra handling for the invalid case, our current Python implementation of C3 will simply continue on indefinitely. It would be better to raise an exception. First, however, let’s validate this assumption by examining Python’s own behavior with C(A, B). Assuming that you keyed in the previous examples, try the following:

>>> class A:
...     pass
...
>>> class B(A):
...     pass
...
>>> class C(A, B):
...     pass
...
Traceback (most recent call last):
  ...
TypeError: Cannot create a consistent method resolution
order (MRO) for bases B, A

Sure enough, Python's class system disallows this construct in an effort to force developers to only make classes that make sense. Duplicating this functionality in our own C3 function is fairly easy now that we know how to identify an invalid situation. All we have to do is check at the end of the loop and see whether a valid candidate was found. If not, we can raise a TypeError:

import itertools
def C3(cls, *mro_lists):
    # Make a copy so we don't change existing content
    mro_lists = [list(mro_list[:]) for mro_list in mro_lists]
    # Set up the new MRO with the class itself
    mro = [cls]
    while True:
        # Reset for the next round of tests
        candidate_found = False
        for mro_list in mro_lists:
            if not len(mro_list):
                # Any empty lists are of no use to the algorithm.
                continue
            # Get the first item as a potential candidate for the MRO.
            candidate = mro_list[0]
            if candidate_found:
                # Candidates promoted to the MRO are no longer of use.
                if candidate in mro:
                    mro_list.pop(0)
                # Don't bother checking any more candidates if one was found.
                continue
            if candidate in itertools.chain(*(x[1:] for x in mro_lists)):
                # The candidate was found in an invalid position, so we
                # move on to the next MRO list to get a new candidate.
                continue
            else:
                # The candidate is valid and should be promoted to the MRO.
                mro.append(candidate)
                mro_list.pop(0)
                candidate_found = True
        if not sum(len(mro_list) for mro_list in mro_lists):
            # There are no MROs to cycle through, so we're all done.
            break
        if not candidate_found:
            # No valid candidate was available, so we have to bail out.
            raise TypeError("Inconsistent MRO")
    return mro

With this last piece in place, our C3 implementation matches the behavior of Python’s own, covering all the bases. Most arbitrary class inheritance structures can be reduced to a valid MRO, so you typically don’t need to worry too much about how the algorithm works. There is one feature of classes, however—the super() function—that relies on the MRO extensively.
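As a sanity check, a condensed version of the same algorithm (the helper name c3_merge is ours, not Python's) can be compared against Python's own linearization for a small diamond hierarchy:

```python
import itertools

def c3_merge(cls, *mro_lists):
    # A compact C3 linearization, equivalent in behavior to the longer
    # version developed in the text.
    mro_lists = [list(m) for m in mro_lists]
    mro = [cls]
    while any(mro_lists):
        for mro_list in mro_lists:
            if not mro_list:
                continue
            candidate = mro_list[0]
            # A head is a valid candidate only if it appears in no list's tail
            if candidate not in itertools.chain(*(m[1:] for m in mro_lists)):
                break
        else:
            # No list offered a valid head, so the hierarchy is inconsistent
            raise TypeError("Inconsistent MRO")
        mro.append(candidate)
        # Remove the promoted candidate from the head of every list
        for mro_list in mro_lists:
            if mro_list and mro_list[0] is candidate:
                mro_list.pop(0)
    return mro

class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass

# Merge the linearizations of the bases, plus the list of bases itself
result = c3_merge(D, list(B.__mro__), list(C.__mro__), [B, C])
assert result == list(D.__mro__)  # [D, B, C, A, object]
```

Feeding it the bases' own `__mro__` tuples mirrors how the real algorithm recurses on each parent class.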

Using super() to Pass Control to Other Classes

One of the most common reasons to create a subclass is to override the behavior of some existing method. It could be as simple as logging every time the method is called, or as complex as completely replacing its behavior with a different implementation. In the case of the former, where you’re simply tweaking existing behavior, it’s quite useful to be able to use the original implementation directly so that you don’t have to reinvent the wheel just to make some minor changes.

To achieve this, Python supplies the built-in super() function, which is all too often misunderstood. The common explanation of super() is that it allows you to call a method on a base class within the overridden method on a subclass. That description works to a point, but before explaining it more fully, let’s examine how it behaves in the simple case, to see what that even means:

class A(object):
    def afunction(self):
        print('afunction from Class A')
class B(A):
    def __init__(self):
        print('B is constructed!!!') # constructor for B
    def afunction(self):
        return super(B, self).afunction()
sample1 = B()
print(sample1.afunction())

In this simple example, super() passes the call along to the next class in the MRO after B—in this case, class A—so A’s implementation runs. Notice that we say “overridden,” as we have two methods named afunction().

Next, consider an application that needs to create a dictionary that automatically returns None for any keys that don’t already have a value associated with them. This is fairly similar to defaultdict, but it doesn’t have to create a new value each time; it just returns None:

>>> class NoneDictionary(dict):
...     def __getitem__(self, name):
...         try:
...             return super(NoneDictionary, self).__getitem__(name)
...         except KeyError:
...             return None
...
>>> d = NoneDictionary()
>>> d['example']
>>> d['example'] = True
>>> d['example']
True

Before getting too much further, it’s important to realize what super() is really doing here. In some languages, super() is simply a language feature that gets compiled into some special code to access methods from other classes. In Python, however, super() returns an actual object, which has a set of attributes and methods that are based on where it was used.

From this simple example, it does seem that super() just provides access to a method on the base class, but remember that there can be any number of base classes involved, with more than one specified on each class. Given the complex nature of some inheritance structures, it should be clear by now that Python would use the MRO to determine which method to use. What may not be obvious, however, is which MRO is used when looking up the method.

Just looking at it, you might think that Python uses the MRO of the class where super() was used, which would be NoneDictionary in the example given here. Because most cases will look very much like that example, that assumption will be accurate enough to account for most cases. However, more complicated class hierarchies raise the question of what happens when the MRO gets changed in subclasses. Consider the following set of classes; however, start a new Python session, as these class definitions are a bit different than our first example:

>>> class A:
...     def test(self):
...         return 'A'
...
>>> class B(A):
...     def test(self):
...         return 'B->' + super(B, self).test()
...
>>> B().test()
'B->A'

In this example, using super() inside of B refers to its base class, A, as expected. Its test() method includes a reference to itself, so we’ll be able to see along the way if things change. Along with B, we could define another class, C, which also subclasses A. To illustrate things a bit better down the road, C will implement its own test() method, without using super():

>>> class C(A):
...     def test(self):
...         return 'C'
...
>>> C().test()
'C'

Of course, there’s nothing unusual or problematic about this so far, as it doesn’t interact with A or B in any way. Where things get interesting is when we create a new class, D, which subclasses both B and C. It doesn’t need a test() method , so we just leave its body blank, making it as simple as a class can be. Let’s see what happens to test() now:

>>> class D(B, C):
...     pass
...
>>> D().test()
'B->C'

Now we can finally see what’s going on. We can see that test() is called on B, causing its reference in the output, but when it calls super().test(), it refers to the method of C, rather than the one on A. If Python simply used the MRO of the class where the method was defined, it would reference A, not C. Instead, because it uses C, we can gain some insight into how super() really works.

In the most common case, which includes the usage shown here, super() takes two arguments: a class and an instance of that class. As our example has shown, the instance object determines which MRO will be used to resolve any attributes on the resulting object. The provided class determines a subset of that MRO, because super() only uses those entries in the MRO that occur after the class provided.

The recommended usage is to provide the class where super() was used as the first argument, and the standard self as the second argument. The resulting object will retain the instance namespace dictionary of self, but it only retrieves attributes that were defined on the classes found later in the MRO than the class provided. Technically, however, you could pass in a different class and get different results:

>>> class B(A):
...     def test(self):
...         return 'B->' + super(C, self).test()
...
>>> class D(B, C):
...     pass
...
>>> D().test()
'B->A'

In this example, where B actually references C in its invocation of super(), the resulting MRO skips C, moving straight onto A, which is shown by calling test() again. This is a dangerous thing to do in common practice, however, as shown when trying to use B on its own:

>>> B().test()
Traceback (most recent call last):
  ...
TypeError: super(type, obj): obj must be an instance or subtype of type

Because self isn’t a subclass of C in this case, C isn’t anywhere in the MRO, so super() can’t determine where it should start looking for attributes. Rather than creating a useless object that just throws an AttributeError for everything, super() fails when first called, providing a better error message.

Warning: Be Careful with Your Arguments

One common mistake when using super() is to use it on a method that won’t always have the same signature across all the various classes. In our examples here, the test() method doesn’t take any arguments, so it’s easy to make sure it’s the same across the board. Many other cases, such as __getitem__(), shown previously, are standard protocols that should never have their function signatures significantly changed by any subclass. Chapter  5 shows many of these cases in more detail.

Unfortunately you can’t always know what another class will do, so using super() can sometimes cause problems by providing the wrong arguments to the class given. Of course, this really isn’t any different than passing in an object that has a different protocol than what another function expects.

The reason it’s worth noting with super() is that it’s easy to assume you know what function you’re actually calling. Without a solid understanding of how MROs work and how super() determines which attributes to use, problems can seem to come up out of nowhere. Even with a thorough knowledge of these topics, however, the only real defense against such problems is an agreement among all the classes involved to not change method signatures.

Introspection

Given all the different inheritance options available, it’s appropriate that Python provides a number of tools to identify what structure a class uses. The most obvious introspection task for use with classes is to determine whether an object is an instance of a given class. This behavior is provided using the built-in isinstance() function, which takes any arbitrary object as its first argument and a Python class as its second argument. Only if the given class is anywhere in the inheritance chain of the object’s class will isinstance() return True:

>>> isinstance(10, int)
True
>>> isinstance('test', tuple)
False

A natural complement to isinstance() is the ability to determine whether one class has another class somewhere in its inheritance chain. This feature, provided by the built-in issubclass() function, works just like isinstance(), except that it operates on a class rather than an instance of it. If the first class contains the second anywhere in its inheritance chain, issubclass() returns True:

>>> issubclass(int, object)
True
>>> class A:
...     pass
...
>>> class B(A):
...     pass
...
>>> issubclass(B, A)
True
>>> issubclass(B, B)
True

That last example may seem odd, as B clearly can’t be a proper subclass of itself, but this behavior exists to remain consistent with isinstance(), which returns True if the type of the provided object is the exact class provided along with it. In a nutshell, the relationship between the two can be described using a simple expression, which is always true:

isinstance(obj, cls) == issubclass(type(obj), cls)
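That relationship can be verified directly with a toy hierarchy:

```python
class A: pass
class B(A): pass

obj = B()

# The identity holds whether or not cls appears in the chain at all
for cls in (A, B, object, int):
    assert isinstance(obj, cls) == issubclass(type(obj), cls)
```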

If you’d like more information about the inheritance structure for a particular class, there are a few different tools at your disposal. If you’d like to know what base classes were defined for a particular class, simply access its __bases__ attribute, which will contain those base classes in a tuple. It only provides the immediate base classes, however, without any of the classes that were extended deeper than that:

>>> B.__bases__
(<class '__main__.A'>,)

On the other side of the coin, every class also has a __subclasses__() method, which returns a list of all the subclasses of the class you’re working with. Like __bases__, this only goes one level away from the class you’re working with; tracking deeper subclasses requires some other mechanism, some of which will be discussed later in this book:

>>> A.__subclasses__()
[<class '__main__.B'>]

If you’d like even more information and control, every class also has an __mro__ attribute, which contains the full MRO for that class, in a tuple. As mentioned previously, this also includes the actual class you pass in along with any of its parent classes. You might even try this on the first example with super() used earlier:

>>> B.__mro__
(<class '__main__.B'>, <class '__main__.A'>, <class 'object'>)

How Classes Are Created

Defining a class in Python works differently than in many other languages, although the differences are not always apparent. It seems quite simple: you supply a name, possibly a base class to inherit from, some attributes, and some methods. But when Python encounters that declaration, the process that takes place actually has more in common with functions than you may realize.

To start with, the body of a class declaration is a code block. Just like if, for, and while, the body of a class block can contain any valid Python code, which will execute from top to bottom. It will follow function calls, perform error handling, read files, or anything else you ask it to do. In fact, if blocks can be quite useful inside of a class declaration:

>>> try:
...     import custom_library
... except ImportError:
...     custom_library = None
...
>>> class Custom:
...     if custom_library is not None:
...         has_library = True
...     else:
...         has_library = False
...
>>> Custom.has_library
False

Tip

This example is useful for demonstration purposes only. If you’re looking to achieve the exact effect shown here, it’s much more pragmatic to simply assign the expression custom_library is not None directly to the has_library attribute. It returns a Boolean value anyway, so the end result is identical, but it’s a much more common approach to the task at hand.

After Python finishes executing the inner code, you’ll notice that has_library becomes an attribute of the class object that’s made available to the rest of your code. This is possible because Python’s class declarations work a little bit like functions. When a new class is found, Python starts by creating a new namespace for the block of code inside it. While executing the code block, any assignments are made in that new namespace. Then the namespace created is used to populate a new object, which implements the new class.
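As a small sketch of the class body acting like an executed code block, ordinary statements inside it run in order, and later lines can use names assigned by earlier ones; everything assigned ends up in the class namespace:

```python
class Computed:
    # This body executes top to bottom when the class statement runs
    values = [n * n for n in range(4)]
    total = sum(values)                          # uses the earlier assignment
    has_even = any(v % 2 == 0 for v in values)   # arbitrary expressions work too

assert Computed.values == [0, 1, 4, 9]
assert Computed.total == 14
assert Computed.has_even is True
```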

Creating Classes at Runtime

The previous section alluded to the fact that Python creates class objects at runtime, while executing code. As with nearly everything else that happens at runtime, you can hook into that process yourself and use it to your advantage. Doing so takes advantage of what Python does behind the scenes when encountering a class.

The really important stuff happens just after the contents of the class are processed. At this point Python takes the class namespace and passes it, along with some other pieces of information, to the built-in type(), which creates, or “instantiates,” the new class object. This means that all classes are actually instances of type, which sits at the root of the class machinery. Specifically, there are three pieces of information that type() uses to instantiate a class:
  • The name of the class that was declared

  • The base classes the defined class should inherit from

  • The namespace dictionary populated when executing the class body

This information is all that’s necessary to represent the entire class, and even though Python obtains this information automatically by inspecting the class declaration, you can create a type by passing in these values directly.

The name is easiest, as it’s just a string with the name of the class. Base classes get slightly more involved, but they’re still fairly simple: just supply a sequence containing existing class objects that the new class should inherit from. The namespace dictionary is just that: a dictionary, which happens to contain everything that should be attached to the new class by name. Here’s an example of how the same class could be created in two different ways:

>>> class Example(int):
...     spam = 'eggs'
...
>>> Example
<class '__main__.Example'>
>>> Example = type('Example', (int,), {'spam': 'eggs'})
>>> Example
<class '__main__.Example'>

Don’t Repeat Yourself

You’ll notice that this example ends up having to write the name Example twice, which may seem to violate the DRY principle. Remember, however, that there are really two things going on here, and the two aren’t tied to each other. First, the class is being created, which requires us to supply a name. Second, the new class gets bound to a name in the namespace.

This example uses the same name for both operations, partly for convenience and partly for compatibility with the native class declaration above it. However, the namespace assignment is completely separate from class creation, so any name could be used. In fact, most of the time you won’t even know the name of the class in advance, so you’ll almost always use a different name in practice anyway.

As with most low-level access to a common feature, type() gives you plenty of chances to create problems. One of the three arguments to type() is the name of the class to create, so it’s possible to create multiple classes with the same name.

In addition, by passing in the attribute namespace, you can supply a new __module__ attribute to mimic its presence in a different module. It won’t actually put the class in the specified module, but it will fool any code that introspects the module later on. Having two classes with both the same name and module could potentially cause problems with tools that introspect modules to determine their structure and hierarchy.
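As a hypothetical illustration (the module name phantom_module is invented for this sketch), supplying __module__ in the namespace dictionary changes what introspection reports without actually relocating the class:

```python
import sys

# Build a class whose __module__ claims a module that does not exist
Fake = type('Fake', (), {'__module__': 'phantom_module'})

# Introspecting code now sees a module the class never lived in
assert Fake.__module__ == 'phantom_module'
assert 'phantom_module' not in sys.modules  # no such module was created
```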

Of course, it’s possible to encounter these problems even without using type() directly. If you create a class, assign it to a different name, and then create a new class with the same name as the original, you can have the exact same naming problem. Also, Python lets you supply a __module__ attribute within a standard class declaration, so you can even create clashes in code that’s not under your control.

Even though it’s possible to run into these problems without resorting to type() directly, the warning here is that type() makes it much easier to accidentally encounter problems. Without it, you’d have to write code that specifically exploits the preceding points in order to create naming conflicts. With type(), however, the values supplied might come from user input, customization settings, or any number of other places, and the code won’t look like it has any problems of this nature.

Unfortunately there are no real safeguards against these types of problems, but there are some things you can do to help reduce the risks. One approach would be to wrap all custom class creation inside of a function that keeps track of which names have been assigned and reacts appropriately when a duplicate is created. A more pragmatic option is simply to make sure any introspecting code is capable of handling a case where duplicates are encountered. Which approach to use will depend on the needs of your code.

Metaclasses

Thus far, classes have been defined as being processed by the built-in type, which accepts the class name, its base classes, and a namespace dictionary. But type is just a class like anything else; it’s only special in that it’s a class used to create classes—a metaclass. Like any other class, though, it can be subclassed to provide customized behavior for our application. Because the metaclass receives the full class declaration as soon as Python encounters it, you can unlock some pretty powerful features.

By subclassing type you can create your own metaclass, which can customize the creation of new classes to better suit the needs of your application. Like any class-based customization, this is done by creating a subclass of type and overriding any methods that make sense for the task at hand. In most cases, this is either __new__() or __init__(). The “Magic Methods” section later in this chapter will explain the difference between the two, but for this discussion we’ll just use __init__(), since it’s easier to work with.

As mentioned previously, type() takes three arguments, all of which must be accounted for in any subclasses. To start off simple, consider the following metaclass, which prints out the name of every class it encounters:

>>> class SimpleMetaclass(type):
...     def __init__(cls, name, bases, attrs):
...         print(name)
...         super(SimpleMetaclass, cls).__init__(name, bases, attrs)
...

This alone is enough to capture a class declaration. Using super() here makes sure that any other necessary initialization also takes place. Even though type doesn’t do anything in its own __init__(), remember from earlier in this chapter that this class could be part of a bigger inheritance structure. Using super() makes sure that the class gets initialized properly, regardless of what “properly” really means in the given context.

To apply this metaclass to a new class and print out its name, Python allows the class definition to specify a metaclass right alongside its parent classes. It looks like a keyword argument, but this isn’t a function call, so it’s actually part of the syntax of a class declaration. Here’s an example of how our SimpleMetaclass would work:

>>> class Example(metaclass=SimpleMetaclass):
...     pass
...
Example
>>> Example
<class '__main__.Example'>
All that was needed here was to supply the metaclass in the class definition, and Python automatically ships that definition off to the metaclass for processing. The only difference between this and a standard class definition is that it uses SimpleMetaclass instead of the standard type.

Note

The first argument to the __init__() method on a metaclass is typically called cls, although you might think it should be self, because __init__() operates on an instance object rather than a class. That’s true in general, and this case is actually no exception. The only difference here is that the instance is a class object itself, which is an instance of type, so using self would still be accurate. However, because of the differences between classes and objects, we still refer to class objects as cls rather than self, so they stay well separated.

Metaclasses can be difficult to understand without real-world examples to illustrate their usefulness. Let’s take a look at how a simple metaclass can be used to provide a powerful framework for registering and using plugins.

Example: Plugin Framework

As an application grows, flexibility becomes increasingly important, so attention often turns to plugins and whether the application can accommodate that level of modularity. There are many ways to implement plugin systems and individual plugins, but they all have three core features in common.

First, you need a way to define a place where plugins can be used. In order to plug something in, there needs to be a socket for the plug to fit into. In addition, it should be very obvious how to implement individual plugins along the way. Lastly, the framework needs to provide an easy way to access all the plugins that were found, so they can all be used. Other features may be added on top, but these are what make a plugin framework.

There are several approaches that would satisfy these requirements, but because plugins are really a form of extension, it makes sense to have them extend a base class. This makes the first requirement fairly simple to define: the point where plugins can attach themselves would be a class. As a class it takes advantage of Python’s own extension features, not only through the built-in subclass syntax but also by allowing the base class to provide some methods that constitute default functionality or offer help for common plugin needs. Here’s how such a plugin mount point might look for an application that validates user input:

class InputValidator:
    """
    A plugin mount for input validation.
    Supported plugins must provide a validate(self, input) method, which receives
    the input as a string and raises a ValueError if the input is invalid. If the
    input is valid, the method should simply return without error. Any return
    value will be ignored.
    """
    def validate(self, input):
        # The default implementation raises a NotImplementedError
        # to ensure that any subclasses must override this method.
        raise NotImplementedError

Even without any of the framework-level code that makes the plugins work, this example demonstrates one of the most important aspects of an extensible system: documentation. Only by properly documenting a plugin mount can you expect plugin authors to correctly adhere to its expectations. The plugin framework itself doesn’t make any assumptions about what requirements your application will have, so it’s up to you to document them.

With a mount point written, individual plugins can easily be created simply by writing a subclass of the mount point that’s already in place. By providing new or overridden methods to satisfy the documented requirements, they can add their own little slice of functionality to the overall application. Here’s an example validator that ensures the provided input only consists of ASCII characters:

class ASCIIValidator(InputValidator):
    """
    Validate that the input only consists of valid ASCII characters.
    >>> v = ASCIIValidator()
    >>> v.validate('sombrero')
    >>> v.validate('jalapeño')
    Traceback (most recent call last):
      ...
    UnicodeEncodeError: 'ascii' codec can't encode character '\xf1' in position
    6: ordinal not in range(128)
    """
    def validate(self, input):
        # If the encoding operation fails, str.encode() raises a
        # UnicodeEncodeError, which is a subclass of ValueError.
        input.encode('ascii')

Tip

Notice that this also provides its own documentation. Because plugins are also classes all their own, they can be subclassed by even more specialized plugins down the road. This makes it important to include thorough documentation even at this level, to help ensure proper usage later.

Now that we have two of the three components out of the way, the only thing left before tying it all together is to illustrate how to access any plugins that were defined. Because our code will already know about the plugin mount point, that makes an obvious place to access them, and as there could be anywhere from zero to hundreds of plugins, it’s optimal to iterate over them, without caring how many there are. Here’s an example function that uses any and all available plugins to determine whether some input provided by a user is valid:

def is_valid(input):
    for plugin in InputValidator.plugins:
        try:
            plugin().validate(input)
        except ValueError:
            # A ValueError means invalid input
            return False
    # All validators succeeded
    return True

Having plugins means you can extend the functionality of even a simple function like this without having to touch its code again later. Simply add a new plugin, make sure it gets imported, and the framework does the rest. With that, we finally get around to explaining the framework and how it ties all these pieces together. Because we’re working with classes whose definitions specify more than just their behavior, a metaclass would be an ideal technique.

All the metaclass really needs to do is recognize the difference between a plugin mount class and a plugin subclass and register any plugins in a list on the plugin mount, where they can be accessed later. If that sounds too simple, it’s really not. In fact, the entire framework can be expressed in just a few lines of code, and it only takes one extra line of code on the plugin mount to activate the whole thing:

class PluginMount(type):
    """
    Place this metaclass on any standard Python class to turn it into a plugin
    mount point. All subclasses will be automatically registered as plugins.
    """
    def __init__(cls, name, bases, attrs):
        if not hasattr(cls, 'plugins'):
            # The class has no plugins list, so it must be a mount point,
            # so we add one for plugins to be registered in later.
            cls.plugins = []
        else:
            # Since the plugins attribute already exists, this is an
            # individual plugin, and it needs to be registered.
            cls.plugins.append(cls)

That’s all that’s necessary to supply the entire plugin framework. When the metaclass is activated on the plugin mount, the __init__() method recognizes that the plugins attribute doesn’t yet exist, so it creates one and returns without doing anything else. When a plugin subclass is encountered the plugins attribute is available by virtue of its parent class, so the metaclass adds the new class to the existing list, thus registering it for later use.

Adding this functionality to the InputValidator mount point described previously is as simple as adding the metaclass to its class definition:

class InputValidator(metaclass=PluginMount):
    ...

Individual plugins are still defined as standard plugins, without additional effort required. Because metaclasses are inherited by all subclasses, the plugin behavior is added automatically.
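To tie the pieces together, here is one way the full framework might run end to end; the class and function bodies follow the versions shown in this section:

```python
class PluginMount(type):
    """Metaclass that turns the first class using it into a plugin mount."""
    def __init__(cls, name, bases, attrs):
        if not hasattr(cls, 'plugins'):
            # No plugins list yet, so this is the mount point itself
            cls.plugins = []
        else:
            # The attribute exists, so this is a plugin: register it
            cls.plugins.append(cls)

class InputValidator(metaclass=PluginMount):
    """Plugin mount for input validation."""
    def validate(self, input):
        raise NotImplementedError

class ASCIIValidator(InputValidator):
    """Reject any input containing non-ASCII characters."""
    def validate(self, input):
        # UnicodeEncodeError is a subclass of ValueError
        input.encode('ascii')

def is_valid(input):
    for plugin in InputValidator.plugins:
        try:
            plugin().validate(input)
        except ValueError:
            return False
    return True

assert InputValidator.plugins == [ASCIIValidator]  # registered automatically
assert is_valid('sombrero')
assert not is_valid('jalapeño')
```

Note that merely defining ASCIIValidator registered it; no explicit registration call was needed anywhere.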

Controlling the Namespace

Metaclasses can also be used to help control how Python processes the class declaration. Rather than waiting for the class to be created before acting on it, another tactic is to process the raw components of the class while Python is going through them. This is made possible by a special method on the metaclass called __prepare__().

By supplying a __prepare__() method on your metaclass, you can get early access to the class declaration. In fact, this happens so early that the body of the class definition hasn’t even been processed yet. The __prepare__() method receives just the class name and a tuple of its base classes. Rather than getting the namespace dictionary as an argument, __prepare__() is responsible for returning that dictionary itself.

The dictionary returned by __prepare__() is used as the namespace while Python executes the body of the class definition. This allows you to intercept each attribute as soon as it’s assigned to the class, so it can be processed immediately. Ordinarily this is used to return an ordered dictionary, so that attributes can be stored in the order they were declared within the class. For reference, take a look at how a metaclass would work without using __prepare__():

>>> from collections import OrderedDict
>>> class OrderedMeta(type):
...     def __init__(cls, name, bases, attrs):
...         print(attrs)
...
>>> class Example(metaclass=OrderedMeta):
...     b = 1
...     a = 2
...     c = 3
...
{'a': 2, '__module__': '__main__', 'b': 1, 'c': 3}

The default behavior returns a standard dictionary, which doesn’t keep track of how the keys are added. Adding a simple __prepare__() method provides all that’s needed to keep the ordering intact after the class is processed:

>>> class OrderedMeta(type):
...     @classmethod
...     def __prepare__(cls, name, bases):
...         return OrderedDict()
...     def __init__(cls, name, bases, attrs):
...         print(attrs)
...
>>> class Example(metaclass=OrderedMeta):
...     b = 1
...     a = 2
...     c = 3
...
OrderedDict([('__module__', '__main__'), ('b', 1), ('a', 2), ('c', 3)])

Note

The __module__ attribute is at the beginning of the attribute list because it gets added just after __prepare__() is called, before Python starts processing the body of the class.

With Great Power Comes Great Responsibility

By controlling the object used for the namespace dictionary, you can have a tremendous amount of control over how the entire class declaration behaves. Every time a line in a class references a variable or assigns an attribute, the custom namespace can intercede and change the standard behavior. One possibility is to provide decorators that can be used when defining methods within the class, without requiring a separate import to make them available to the class definition. Likewise, you can control how attributes are assigned by changing their names, wrapping them in helper objects, or removing them from the namespace completely.

This amount of power and flexibility can be easily abused to provide a level of magic not seen elsewhere. To a developer simply using your code without fully understanding how it’s implemented, it’ll look like Python itself is wildly inconsistent. Worse yet, any significant changes you make to the behavior of the class declaration could impact the behavior of other tools your users might try to combine with yours. Chapter  5 shows how you can enable these features by extending your dictionary, but be very careful when doing so.

Attributes

Once an object is instantiated, any data associated with it is kept within a new namespace dictionary that’s specific to that instance. Access to this dictionary is handled by attributes, which make for easier access than using dictionary keys. Just like dictionary keys, attribute values can be retrieved, set, and deleted as necessary.

Typically, accessing an attribute requires you to know the name of the attribute in advance. The syntax for attributes doesn’t offer the same flexibility as dictionary keys in providing variables instead of literals, so it can seem limited if you need to get or set an attribute with a name that came from somewhere else. Instead of offering a special syntax for working with attributes in this way, Python provides a trio of functions.

The first, getattr(), retrieves the value to which an attribute refers, given a variable that contains the name of the attribute. The next, setattr(), takes both the name of an attribute and its value and attaches that value to the attribute with the given name. Finally, delattr() allows you to delete an attribute value given the name as its argument. With these functions, you can work with any attribute on any object without knowing the attribute names when writing code.
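The trio can be sketched in a few lines; the Config class and attribute names here are just placeholders:

```python
class Config:
    pass

c = Config()
attr_name = 'debug'               # name only known at runtime

setattr(c, attr_name, True)       # equivalent to: c.debug = True
print(getattr(c, attr_name))      # True
print(getattr(c, 'missing', 0))   # a default avoids AttributeError: 0
delattr(c, attr_name)             # equivalent to: del c.debug
print(hasattr(c, attr_name))      # False
```

A fourth related built-in, hasattr(), checks whether an attribute exists at all, rounding out the set.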

Properties

Rather than only acting as a proxy to the standard namespace dictionary, properties allow attributes to be powered by methods that can access the full power of Python. Typically, properties are defined using the built-in @property decorator function. Applied to a method, it forces the method to be called whenever the function’s name is accessed as an attribute name:

>>> class Person:
...     def __init__(self, first_name, last_name):
...         self.first_name = first_name
...         self.last_name = last_name
...     @property
...     def name(self):
...         return '%s, %s' % (self.last_name, self.first_name)
...
>>> p = Person('Marty', 'Alchin')
>>> p.name
'Alchin, Marty'
>>> p.name = 'Alchin, Martin'  # Update it to be properly legal
Traceback (most recent call last):
  ...
AttributeError: can't set attribute

That last error isn’t terribly descriptive, but basically properties defined this way only retrieve attribute values, not set them. Function calls are only one way, so to set the value we’ll need to add another method that handles that side of things. This new method would accept another variable: the value that should be set on the attribute.

In order to mark the new method as the setter for a property, it’s decorated much like the getter property. Rather than using a built-in decorator, though, the getter gains a setter attribute that can be used to decorate the new method. This fits with the typical noun-based naming convention of decorators, while also describing which property will be managed:

>>> class Person:
...     def __init__(self, first_name, last_name):
...         self.first_name = first_name
...         self.last_name = last_name
...     @property
...     def name(self):
...         return '%s, %s' % (self.last_name, self.first_name)
...     @name.setter
...     def name(self, value):
...         self.last_name, self.first_name = value.split(', ')
...
>>> p = Person('Marty', 'Alchin')
>>> p.name
'Alchin, Marty'
>>> p.name = 'Alchin, Martin'  # Update it to be properly legal
>>> p.name
'Alchin, Martin'

Just make sure that the setter method is named the same as the original getter method, or it won’t work properly. The reason for this is that name.setter doesn’t actually update the original property with the setter method. Instead, it copies the getter onto the new property and assigns them both to the name given to the setter method. Exactly what this means behind the scenes will be explained better in the next section on descriptors.
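Behind the decorators, the same getter/setter pair can be built by calling the built-in property() directly, which makes that copying behavior easier to see. This sketch assumes a setter that splits the value back into its two parts:

```python
class Person:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    def _get_name(self):
        return '%s, %s' % (self.last_name, self.first_name)

    def _set_name(self, value):
        # Split 'Last, First' back into the two underlying attributes
        self.last_name, self.first_name = value.split(', ')

    # property() accepts fget, fset, fdel, and a docstring
    name = property(_get_name, _set_name, doc='Full name, last name first')

p = Person('Marty', 'Alchin')
p.name = 'Alchin, Martin'
print(p.name)         # Alchin, Martin
print(p.first_name)   # Martin
```

Each @name.setter application is effectively building a new property() like this one, with the getter carried over from before.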

In addition to getting and setting values, a property can also delete the current value, using a decorator similar to the setter. By applying name.deleter to a method that only accepts the usual self, you can use that method to delete values from the attribute. For the Person class shown here, that means clearing out both first_name and last_name together:

>>> class Person:
...     def __init__(self, first_name, last_name):
...         self.first_name = first_name
...         self.last_name = last_name
...     @property
...     def name(self):
...         return '%s, %s' % (self.last_name, self.first_name)
...     @name.setter
...     def name(self, value):
...         self.last_name, self.first_name = value.split(', ')
...     @name.deleter
...     def name(self):
...         del self.first_name
...         del self.last_name
...
>>> p = Person('Marty', 'Alchin')
>>> p.name
'Alchin, Marty'
>>> p.name = 'Alchin, Martin' # Update it to be properly legal
>>> p.name
'Alchin, Martin'
>>> del p.name
>>> p.name
Traceback (most recent call last):
  ...
AttributeError: 'Person' object has no attribute 'last_name'

Descriptors

One potential problem with properties is that they require all the methods to be defined as part of the class definition. It’s great for adding functionality to a class if you have control over the class yourself, but when building a framework for inclusion in other code, we’ll need another approach. Descriptors allow you to define an object that can behave in the same way as a property on any class to which it’s assigned.

In fact, properties are implemented as descriptors behind the scenes, as are methods, which will be explained in the next section. This makes descriptors perhaps one of the most fundamental aspects of advanced class behavior. They work by implementing any of three possible methods, dealing with getting, setting, and deleting values.

The first, __get__(), manages retrieval of attribute values, but unlike a property, a descriptor can manage attribute access on both the class and its instances. In order to identify the difference, __get__() receives both the object instance and its owner class as arguments. The owner class will always be provided, but if the descriptor is accessed directly on the class instead of an instance, the instance argument will be None.

A simple descriptor using only the __get__() method can be used to always provide an up-to-date value when requested. The obvious example, then, is an object that returns the current date and time without requiring a separate method call:

>>> import datetime
>>> class CurrentTime:
...     def __get__(self, instance, owner):
...         return datetime.datetime.now()
...
>>> class Example:
...     time = CurrentTime()
...
>>> Example().time
datetime.datetime(2009, 10, 31, 21, 27, 5, 236000)
>>> import time
>>> time.sleep(5 * 60) # Wait five minutes
>>> Example().time
datetime.datetime(2009, 10, 31, 21, 32, 15, 375000)

The related __set__() method manages setting a value on the attribute managed by the descriptor. Unlike __get__(), this operation can only be performed on instance objects. If you assign a value to the given name on the class instead, you’ll actually overwrite the descriptor with the new value, removing all of its functionality from the class. This is intentional, because without it, there would be no way to modify or remove a descriptor once it’s been assigned to a class.
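That overwriting behavior is easy to demonstrate with a CurrentTime-style descriptor like the one shown earlier:

```python
import datetime

class CurrentTime:
    def __get__(self, instance, owner):
        return datetime.datetime.now()

class Example:
    time = CurrentTime()

print(type(Example.__dict__['time']).__name__)  # CurrentTime

# Assigning to the name on the class replaces the descriptor itself,
# not a value it manages:
Example.time = 'overwritten'
print(Example().time)                           # overwritten
```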

Because it doesn’t need to accept the owner class, __set__() only receives the instance object and the value being assigned. The class can still be determined by accessing the __class__ attribute on the instance object provided, though, so there’s no information lost. With both __get__() and __set__() defined on a descriptor, we can do something more useful. For example, here’s a basic descriptor that behaves just like an attribute, except that it logs every time its value is changed:

>>> import datetime
>>> class LoggedAttribute:
...     def __init__(self):
...         self.log = []
...         self.value_map = {}
...     def __set__(self, instance, value):
...         self.value_map[instance] = value
...         log_value = (datetime.datetime.now(), instance, value)
...         self.log.append(log_value)
...     def __get__(self, instance, owner):
...         if instance is None:
...             return self # This way, the log is accessible
...         return self.value_map[instance]
...
>>> class Example:
...     value = LoggedAttribute()
...
>>> e = Example()
>>> e.value = 'testing'
>>> e.value
'testing'
>>> Example.value.log
[(datetime.datetime(2009, 10, 31, 21, 49, 59, 933000), <__main__.Example object at 0x...>, 'testing')]

Before going on, there are a few important things to notice here. First, when setting a value on the descriptor, __set__() adds it to a dictionary on itself, using the instance as a key. The reason for this is that the descriptor object is shared among all the instances of the class it’s attached to. If you were to set the value to the descriptor’s self, that value would be shared among all those instances as well.

Note

Using a dictionary is just one way to make sure that instances are handled, but it’s not the best. It’s used here because the preferred method, assigning directly to the instance’s namespace dictionary, is only an option once you know the name of the attribute. Descriptors on their own don’t have access to that name, so the dictionary is used here instead. Chapter  11 shows an approach to address this problem based on metaclasses.
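As of Python 3.6 there's also a simpler option than the metaclass approach: a descriptor can implement __set_name__(), which Python calls automatically with the attribute name while the owning class is being created. A minimal sketch (the InstanceAttribute name is invented here):

```python
class InstanceAttribute:
    def __set_name__(self, owner, name):
        # Called automatically during class creation; 'name' is the
        # attribute name the descriptor was assigned to ('value' below)
        self.name = name

    def __set__(self, instance, value):
        # With the name known, values can live in the instance's own
        # namespace dictionary, the preferred storage mentioned above
        instance.__dict__[self.name] = value

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__[self.name]

class Example:
    value = InstanceAttribute()

e = Example()
e.value = 'testing'
print(e.value)             # testing
print(Example.value.name)  # value
```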

Also, notice that __get__() returns self if no instance was passed in. Because the descriptor works based on setting values, it has no additional value to contribute when called on the class. Most of the time, when a descriptor is in this situation it makes more sense to raise an AttributeError to prevent users from trying something that doesn’t make sense. Doing so here would mean the value log would never be available, so the descriptor returns itself.

In addition to getting and setting values, descriptors can also delete values from the attribute or the attribute itself. The __delete__() method manages this behavior, and because it only works on instances and doesn’t care about the value, it receives the instance object as its only argument.
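Here's a short sketch of __delete__() in action; the class names and the fall-back-to-default behavior are invented for illustration:

```python
class Temperature:
    def __init__(self, default=20.0):
        self.default = default
        self.values = {}

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.values.get(instance, self.default)

    def __set__(self, instance, value):
        self.values[instance] = value

    def __delete__(self, instance):
        # Receives only the instance; there's no value to deal with
        self.values.pop(instance, None)

class Thermostat:
    temp = Temperature()

t = Thermostat()
t.temp = 25.0
print(t.temp)   # 25.0
del t.temp      # invokes Temperature.__delete__()
print(t.temp)   # falls back to the default: 20.0
```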

In addition to managing attributes, descriptors are also used to implement one of the most important aspects of object-oriented programming: methods.

Methods

When a function is defined in a class, it’s considered to be a method. Even though it still works like a function in general, it has class information available to it because functions are actually descriptors as well. Within the category of methods, however, there are two distinct types: bound and unbound methods.

Unbound Methods

Because descriptors can be accessed from the class as well as its instances, methods can be accessed from both as well. When accessing a function on a class, it becomes an unbound method. The descriptor receives the class, but methods typically require the instance, so they’re referred to as unbound when accessed without one.

Calling it an unbound method is really more of a naming convention than any formal declaration. What you get when accessing the method on a class is just the function object itself:

>>> class Example:
...     def method(self):
...         return 'done!'
...
>>> type(Example.method)
<class 'function'>
>>> Example.method
<function method at 0x...>
# self isn't passed automatically
>>> Example.method()
Traceback (most recent call last):
  ...
TypeError: method() takes exactly 1 positional argument (0 given)

It’s still callable just like any other standard function, but it also carries information about what class it’s attached to. Notice that the self argument in an unbound method isn’t passed automatically, as there’s no instance object available to bind to it.

Bound Methods

Once the class is instantiated, each method descriptor returns a function that’s bound to that instance. It’s still backed by the same function, and the original unbound method is still available on the class, but the bound method now automatically receives the instance object as its first argument:

>>> ex = Example()
>>> type(ex.method)
<class 'method'>
>>> ex.method
<bound method Example.method of <__main__.Example object at 0x...>>
# self gets passed automatically now
>>> ex.method()
'done!'
# And the underlying function is still the same
>>> Example.method is ex.method.__func__
True
# is and == are related but different: == checks whether two objects
# have equal values, while is checks whether two names refer to the
# very same object. Since the goal here is to confirm that both names
# point to the same function object, is is the right test.

As you can see, bound methods are still backed by the same function as unbound methods. The only real difference is that bound methods have an instance to receive as the first argument. It’s important to realize also that the instance object is passed as a positional argument, so the argument name doesn’t need to be self to work properly, but it’s a well-established standard that you should follow whenever possible.

Tip

Because bound methods accept an instance as the first argument, method binding can be faked by explicitly providing an instance as the first argument to an unbound method. It all looks the same to the method, and it can be a useful approach when passing functions around as callbacks.
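That tip can be sketched in a few lines:

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return 'Hello from %s' % self.name

g = Greeter('example')

# Accessed through the class, greet is a plain function, so the
# instance can be supplied explicitly as the first argument:
print(Greeter.greet(g))   # Hello from example
print(g.greet())          # Hello from example

# Handy when passing functions around as callbacks:
callback = Greeter.greet
print(callback(g))        # Hello from example
```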

Sometimes, however, the method doesn’t need access to the instance object, regardless of whether the class has been instantiated. These methods fall into two separate types.

Class Methods

When a method only needs access to the class it’s attached to, it’s considered a class method, which Python supports through the use of a built-in @classmethod decorator. This ensures that the method will always receive the class object as its first positional argument, regardless of whether it’s called as an attribute of the class or one of its instances:

>>> class Example:
...     @classmethod
...     def method(cls):
...         return cls
...
>>> Example.method()
<class '__main__.Example'>
>>> Example().method()
<class '__main__.Example'>

Once the @classmethod decorator has been applied—see the section later in this chapter for information on decorators—the method() method will never receive an instance of Example as its first argument; it will always receive the class itself or one of its subclasses. The cls argument will always be whatever class was used to call the method, rather than just the one where the method was defined.

Although it may not be clear from the previous example, class methods are actually bound instance methods, just like those described in the previous sections. Because all classes are actually instances of a built-in type, class methods are bound to the class itself:

>>> Example.method
<bound method type.method of <class '__main__.Example'>>

Class methods can also be created in another, slightly more indirect way. Because all classes are really just instances of metaclasses, you can define a method on a metaclass. Every class built from that metaclass will then have access to the method as a standard bound method. There's no need to use the @classmethod decorator, because the method is already bound to the class using the standard behavior described previously. Here's how it works:

>>> class ExampleMeta(type):
...     def method(cls):
...         return cls
...
>>> class Example(metaclass=ExampleMeta):
...     pass
...
>>> Example.method
<bound method ExampleMeta.method of <class '__main__.Example'>>
>>> Example.method()
<class '__main__.Example'>

The actual behavior of a method constructed this way is identical to a regular class method in most respects because they're built the same way internally. They can be called from the class itself, rather than requiring an instance, and they always receive the class object as an implicit first argument. The difference, however, is that decorated class methods can still be called from instances, whereas a method defined on the metaclass can only be called from the class itself.

The reason for this behavior is that the method is defined in the metaclass namespace, which only puts it in the MRO of instances of that metaclass. All classes that reference the metaclass will have access to the method, but it’s not actually in their definitions. Methods decorated with @classmethod are placed directly in the namespace of the class where they’re defined, which makes them available to instances of that class as well.

Even though this difference in visibility seems like metaclass-based class methods are just an inferior version of standard decorated class methods, there are two reasons why they may be beneficial to an application. First, class methods are generally expected to be called as attributes of the class, and are rarely called from instance objects. That’s not a universal rule, and it’s certainly not enough to justify the use of a metaclass on its own, but it’s worth noting.

Perhaps more importantly, many applications that already use a metaclass also need to add class methods to any class that uses that metaclass. In this case, it makes sense to just define the methods on the existing metaclass, rather than using a separate class to hold the class methods. This is especially useful when that extra class wouldn’t have anything valuable to add on its own; if the metaclass is the important part, it’s best to keep everything there.

Static Methods

Occasionally, even the class is more information than is necessary for a method to do its job. This is the case for static methods, which are often implemented for the sake of establishing a namespace for functions that could otherwise be implemented at the module level. Using the staticmethod decorator, the method won’t receive any implicit arguments at any time:

>>> class Example:
...     @staticmethod
...     def method():
...         print('static!')
...
>>> Example.method
<function method at 0x...>
>>> Example.method()
static!

As you can see, static methods don’t really look like methods at all. They’re just standard functions that happen to sit in a class. The next section shows how a similar effect can be achieved on instances by taking advantage of Python’s dynamic nature.

Assigning Functions to Classes and Instances

Python allows most attributes to be overwritten simply by assigning a new value, which presents an interesting opportunity for methods:

>>> def dynamic(obj):
...     return obj
...
>>> Example.method = dynamic
>>> Example.method()
Traceback (most recent call last):
  ...
TypeError: dynamic() takes exactly 1 positional argument (0 given)
>>> ex = Example()
>>> ex.method()
<__main__.Example object at 0x...>

Notice here that the function assigned to the class still needs to be written to accept an instance as its first argument. Once assigned, it works just like a regular instance method, so the argument requirement doesn’t change at all. Assigning to instances works similarly in syntax, but because the function never gets assigned to a class, there’s no binding involved at all. A function assigned directly to an instance attribute works just like a static method that was attached to the class:

>>> def dynamic():
...     print('dynamic!')
...
>>> ex.method = dynamic
>>> ex.method()
dynamic!
>>> ex.method
<function dynamic at 0x...>

Magic Methods

Objects in Python can be created, manipulated, and destroyed in a number of different ways, and most of the available behaviors can be modified by implementing some extra methods on your own custom classes. Some of the more specialized customizations can be found in Chapter  5, but there are several of these special methods that are common to all types of classes. These methods can be categorized according to what aspect of classes they deal with, so the following sections each cover a few different methods.

Creating Instances

The transition from a class to an object is called instantiation. An instance is little more than a reference to the class that provides behavior and a namespace dictionary that’s unique to the instance being created. When creating a new object without overriding any special methods, the instance namespace is just an empty dictionary, waiting for data.

Therefore, the first method most classes implement is __init__(), with the purpose of initializing the namespace with some useful values. Sometimes these are just placeholders until more interesting data arrives, while at other times the interesting data comes into the method directly, in the form of arguments. This works because any arguments passed in when instantiating the class get passed right along to __init__():

>>> class Example:
...     def __init__(self):
...         self.initialized = True
...
>>> e = Example()
>>> e.initialized
True
>>> class Example2:
...     def __init__(self, name, value=''):
...         self.name = name
...         self.value = value
...
>>> e = Example2()
Traceback (most recent call last):
  ...
TypeError: __init__() takes at least 2 positional arguments (1 given)
>>> e = Example2('testing')
>>> e.name
'testing'
>>> e.value
''

Like any Python function, you're free to do whatever you like inside of __init__(), but keep in mind that it's intended to initialize the object, nothing more. Once __init__() has finished executing, the object should be ready to be used for more practical purposes, but anything beyond basic setup should be deferred to other, more explicit methods.

Of course, the real definition of initialization could mean different things to different objects. For most objects, you’ll only need to set a few attributes to either some default values or to the values passed in to __init__(), as shown in the previous example. Other times, those initial values may require calculations, such as converting different units of time into seconds, so everything’s normalized.

In some less common cases initialization may include more complicated tasks, such as data validation, file retrieval, or even network traffic. For example, a class for working with a web service might take an API token as its only argument to __init__(). It might then make a call to the web service to convert that token into an authenticated session, which would allow other operations to take place. All of the other operations require separate method calls, but the authentication that underlies all of them could happen in __init__().

The main concern with doing too much in __init__() is that there’s no indication that anything’s going on, short of documentation. Unfortunately, some users just won’t read your documentation no matter how hard you try; they may still expect initialization to be a simple operation, and they might be surprised to see errors if they don’t have a valid network connection, for example. See the example in the next section for one way to address this.

Even though __init__() is probably the most well-known magic method of all, it's not the first that gets executed when creating a new object. After all, remember that __init__() is about initializing an object, not creating it. For the latter, Python provides the __new__() method, which gets most of the same arguments but is responsible for actually creating the new object prior to initializing it.

Rather than working with the typical instance object self, the first argument to __new__() is actually the class of the object being created. This makes it look a lot like a class method, but you don’t need to use any decorators to make it work this way—it’s a special case in Python. Technically, however, it’s a static method, so if you try to call it directly you’ll always need to supply the class; it will never be sent implicitly, like it would be if it were a true class method.

After the class parameter—typically named cls, like a regular class method—the __new__() method receives all the same arguments that __init__() would receive. Whatever you pass in to the class when trying to create the object will be passed along to __new__() to help define it. These arguments are often useful when customizing the new object for the needs at hand.

This is often different from initialization, because __new__() is typically used to change the very nature of the object being created, rather than just setting up some initial values. To illustrate, consider an example in which the class of an object can change depending on what values are passed in when creating it.

Example: Automatic Subclasses

Some libraries consist of a large variety of classes, most of which share a common set of data, but with perhaps different behaviors or other data customizations. This often requires users of the library to keep track of all the different classes and determine which features of their data correspond to the appropriate classes.

Instead, it can be much more helpful to provide a single class the user can instantiate which actually returns an object that can be of different classes depending on arguments. Using __new__() to customize the creation of new objects, this can be achieved rather simply. The exact behavior will depend on the application at hand, but the basic technique is easy to illustrate with a generic example.

Consider a class that picks a subclass randomly whenever it's instantiated into an object. This isn't the most practical use, of course, but it illustrates how the process could work. Using random.choice() to pick from the list returned by __subclasses__(), it then instantiates the subclass it finds, rather than the one originally requested:

>>> import random
>>> class Example:
...     def __new__(cls, *args, **kwargs):
...         cls = random.choice(cls.__subclasses__())
...         return super(Example, cls).__new__(cls, *args, **kwargs)
...
>>> class Spam(Example):
...     pass
...
>>> class Eggs(Example):
...     pass
...
>>> Example()
<__main__.Eggs object at 0x...>
>>> Example()
<__main__.Eggs object at 0x...>
>>> Example()
<__main__.Spam object at 0x...>
>>> Example()
<__main__.Eggs object at 0x...>
>>> Example()
<__main__.Spam object at 0x...>
>>> Example()
<__main__.Spam object at 0x...>

In another real-world example, you could pass in the contents of a file to a single File class and have it automatically instantiate a subclass whose attributes and methods are built for the format of the file provided. This can be especially useful for large classes of files, such as music or images, that behave similarly in most respects on the surface but have underlying differences that can be abstracted away.
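That file example might be sketched like this; the class names, the extension attribute, and the dispatch rule are all hypothetical:

```python
class File:
    extension = None  # subclasses declare which extension they handle

    def __new__(cls, filename, *args, **kwargs):
        if cls is File:
            # Dispatch to the subclass registered for this extension
            ext = filename.rsplit('.', 1)[-1].lower()
            for subclass in cls.__subclasses__():
                if subclass.extension == ext:
                    cls = subclass
                    break
        return super().__new__(cls)

    def __init__(self, filename):
        self.filename = filename

class MP3File(File):
    extension = 'mp3'

class PNGFile(File):
    extension = 'png'

print(type(File('track01.mp3')).__name__)  # MP3File
print(type(File('logo.png')).__name__)     # PNGFile
print(type(File('notes.txt')).__name__)    # File (no match found)
```

Callers only ever instantiate File; the subclass selection happens transparently in __new__().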

Dealing with Attributes

With an object in use, one of the more common needs is to interact with its attributes. Ordinarily this is as simple as just assigning and accessing attributes directly, given their name, such as instance.attribute. There are a few cases in which this type of access isn’t sufficient on its own, so you need more control.

If you don’t know the name of the attribute at the time you write the application, you can supply a variable for the name if you use the built-in getattr() function instead. For example, instance.attribute would become getattr(instance, attribute_name), where the value for attribute_name can be provided from anywhere, as long as it’s a string.

That approach only handles the case in which you're given a name as a string and you need to look up the instance attribute referenced by that name. On the other side of the equation, you can also tell a class how to deal with attributes it doesn't explicitly manage. This behavior is controlled by the __getattr__() method.

If you define this method, Python will call it whenever you request an attribute that hasn’t already been defined. It receives the name of the attribute that was requested, so your class can decide what should be done with it. One common example is a dictionary that allows you to retrieve values by attribute instead of just using the standard dictionary syntax:

>>> class AttributeDict(dict):
...     def __getattr__(self, name):
...         return self[name]
...
>>> d = AttributeDict(spam='eggs')
>>> d['spam']
'eggs'
>>> d.spam
'eggs'

Note

A not-so-obvious feature of __getattr__() is that it only gets called for attributes that don’t actually exist. If you set the attribute directly, referencing that attribute will retrieve it without calling __getattr__(). If you need to catch every attribute regardless, use __getattribute__() instead. It takes the same arguments and functions just like __getattr__(), except that it gets called even if the attribute is already on the instance.
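To see the difference, here's a small sketch that logs every successful lookup. Note that __getattribute__() must delegate to object.__getattribute__() internally; any ordinary attribute access inside it would recurse forever:

```python
class Logged:
    def __init__(self):
        self.existing = 'present'
        self.accesses = []

    def __getattribute__(self, name):
        # Record every lookup except for the log itself, delegating to
        # object.__getattribute__() to avoid infinite recursion
        if name != 'accesses':
            object.__getattribute__(self, 'accesses').append(name)
        return object.__getattribute__(self, name)

obj = Logged()
print(obj.existing)   # present -- intercepted even though it exists
print(obj.accesses)   # ['existing']
```

A plain __getattr__() would never have fired for obj.existing, because that attribute already exists on the instance.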

Of course, a dictionary that allows attribute access isn't terribly useful if attributes are read-only. In order to complete the picture, we should support storing values in attributes as well. Even beyond this simple dictionary example, there are a variety of needs for customizing what happens when you set a value to an attribute. As expected, Python provides a parallel in the form of the __setattr__() method.

This new method takes an extra argument because there’s also a value that needs to be managed. By defining __setattr__(), you can intercept these value assignments and handle them however your application needs. Applying this to AttributeDict is just as simple as the previous example:

>>> class AttributeDict(dict):
...     def __getattr__(self, name):
...         return self[name]
...     def __setattr__(self, name, value):
...         self[name] = value
...
>>> d = AttributeDict(spam='eggs')
>>> d['spam']
'eggs'
>>> d.spam
'eggs'
>>> d.spam = 'ham'
>>> d.spam
'ham'

Tip

Just like getattr() provides for accessing attributes with a variable in place of a hardcoded name, Python provides setattr() for setting attributes. Its arguments match those of __setattr__(), as it takes the object, the attribute name, and the value.

Even though that might look like a complete picture of attribute access, there’s still one component missing. When you no longer have use for an attribute and would like to remove it from the object altogether, Python provides the del statement. When you’re working with fake attributes managed by these special methods, however, del on its own doesn’t work.

For dealing with this situation, Python hooks into the __delattr__() method if one is present. Because the value is no longer relevant, this method only accepts the name of the attribute along with the standard self. Adding this to the existing AttributeDict is easy:

>>> class AttributeDict(dict):
...     def __getattr__(self, name):
...         return self[name]
...     def __setattr__(self, name, value):
...         self[name] = value
...     def __delattr__(self, name):
...         del self[name]
...
>>> d = AttributeDict(spam='eggs')
>>> d['spam']
'eggs'
>>> d.spam
'eggs'
>>> d.spam = 'ham'
>>> d.spam
'ham'
>>> del d.spam
>>> d.spam
Traceback (most recent call last):
  ...
KeyError: 'spam'

Warning: Raise The Right Exception

This error message brings up an important point about working with these types of overridden attributes. It’s very easy to overlook how exceptions are handled inside your function, so you may end up raising an exception that doesn’t make any sense; if an attribute doesn’t exist, you would reasonably expect to see an AttributeError, rather than a KeyError.

This may seem like an arbitrary detail, but remember that most code explicitly catches specific types of exceptions, so if you raise the wrong type, you could cause other code to take the wrong path. Therefore, always make sure to raise AttributeError explicitly when encountering something that's the equivalent of a missing attribute. Depending on what the fake attribute does, the underlying exception you need to translate might be a KeyError, an IOError, or perhaps even a UnicodeDecodeError, for example.

This will come up at various points throughout this book and elsewhere in the real world. Chapter  5 covers a variety of protocols in which it’s just as important to get the exceptions right as the arguments.
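Applied to the AttributeDict example, the fix is small: catch the KeyError and re-raise it as the exception attribute access is expected to produce:

```python
class AttributeDict(dict):
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            # Translate into the exception attribute access should raise
            raise AttributeError(name) from None

d = AttributeDict(spam='eggs')
print(d.spam)   # eggs

try:
    d.missing
except AttributeError as e:
    print('AttributeError:', e)   # AttributeError: missing

# A side benefit: hasattr() now gives the right answer, because it
# looks specifically for AttributeError
print(hasattr(d, 'missing'))      # False
```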

String Representations

Of all the different object types that are possible in Python, easily the most common is the string. From reading and writing files to interacting with web services and printing documents, strings dominate many aspects of software execution. Even though most of our data exists in other forms along the way, sooner or later most of it gets converted to a string.

In order to make that process as simple as possible, Python provides an extra hook to convert an object to its string representation. The __str__() method, when implemented on a class, allows its instances to be converted to a string using the built-in str() function, which is also used by print() and string formatting. Details on those features and more can be found in Chapter  7, but for now, look at how __str__() works in a simple class:

# First, without __str__()
>>> class Book:
...     def __init__(self, title):
...         self.title = title
...
>>> Book('Pro Python')
<__main__.Book object at 0x...>
>>> str(Book('Pro Python'))
'<__main__.Book object at 0x...>'
# And again, this time with __str__()
>>> class Book:
...     def __init__(self, title):
...         self.title = title
...     def __str__(self):
...         return self.title
...
>>> Book('Pro Python')
<__main__.Book object at 0x...>
>>> str(Book('Pro Python'))
'Pro Python'

The addition of __str__() allows the class to specify what aspects of the object should be displayed when representing the object as a string. In this example it was the title of a book, but it could also be the name of a person, the latitude and longitude of a geographic location, or anything else that succinctly identifies the object among a group of its peers. It doesn’t have to contain everything about the object, but there needs to be enough to distinguish one from another.

Notice also that when the expression in the interactive interpreter doesn’t include the call to str(), it doesn’t use the value returned by __str__(). Instead, the interpreter uses a different representation of the object, which is intended to represent the object more accurately as code. For custom classes this representation is fairly unhelpful, showing only the name and module of the object’s class and its address in memory.

For other types, however, you’ll notice that the representations can be quite useful in determining what the object is all about. In fact, the ideal goal for this representation is to present a string that, if typed back into the console, would recreate the object. This is extremely useful for getting a feel for the objects in the interactive console:

>>> dict(spam='eggs')
{'spam': 'eggs'}
>>> list(range(5))
[0, 1, 2, 3, 4]
>>> set(range(5))
{0, 1, 2, 3, 4}
>>> import datetime
>>> datetime.date.today()
datetime.date(2009, 10, 31)
>>> datetime.time(12 + 6, 30)
datetime.time(18, 30)

This alternate representation is controlled by the __repr__() method and is used primarily in cases just like this, to describe an object inside the interactive console. It’s automatically triggered when referencing an object on its own in the interpreter and is sometimes used in logging applications, where __str__() often doesn’t provide enough detail.

For the built-ins such as lists and dictionaries, the representation is a literal expression that can reproduce the object easily. For other simple objects that don’t contain very much data, the date and time examples show that simply providing an instantiation call will do the trick. Of course, datetime would have to be imported first, but it gets the job done.
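To see how a custom class can follow the same convention, here is a small sketch (the Point class is illustrative, not from the text) whose __repr__() returns exactly the instantiation call that would recreate the object:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # A literal expression that, typed back into the console,
        # would recreate this object
        return 'Point(%r, %r)' % (self.x, self.y)

p = Point(3, 4)
print(repr(p))  # Point(3, 4)
```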

In cases in which the data represented by the object is too numerous to condense into a simple representation like this, the next best thing is to provide a string, surrounded in angle brackets, which describes the object in a more reasonable amount of detail. This is often a matter of showing the class name and a few pieces of data that would identify it. For the Book example, which in the real world would have many more attributes, it could look like this:

>>> class Book:
...     def __init__(self, title, author=None):
...         self.title = title
...         self.author = author
...     def __str__(self):
...         return self.title
...     def __repr__(self):
...         return '<Book: %s by %s>' % (self.title, self.author or 'Unknown Author')
...
>>> Book('Pro Python', author='Marty Alchin')
<Book: Pro Python by Marty Alchin>
>>> str(Book('Pro Python', author='Marty Alchin'))
'Pro Python'

Exciting Python Extensions: Iterators

An iterator is an object that can be iterated over; in other terms, you could say it is an “iterable” or “loopable” item. Lists, tuples, and strings are all iterable containers. Python provides two built-in iterator objects. The first, a sequence iterator, works over an arbitrary sequence. The second calls a callable object repeatedly, stopping when it returns a sentinel value. Let’s see them in action to understand this a bit more.
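Both forms are available through the built-in iter() function. This sketch (the variable names are illustrative) shows the single-argument sequence form and the two-argument callable/sentinel form:

```python
# Sequence iterator: iter() with a single argument
seq_iter = iter([10, 20, 30])
print(next(seq_iter))  # 10

# Callable/sentinel iterator: iter() with two arguments calls the
# callable repeatedly and stops when it returns the sentinel (here, 0)
values = [1, 2, 0, 3]
for v in iter(lambda: values.pop(0), 0):
    print(v)  # prints 1, then 2, then stops at the sentinel
```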

A very simple example is a for loop, which iterates over all of the items in a container. Consider the following:

my_string = 'Hello Python!'
for item in my_string:
    print(item)

my_list = [1, 2, 3, 4]
for item in my_list:
    print(item, end=' ')  # the newline after printing is replaced with a space
print()

my_tuple = 'Fred', 'Wilma', 1, 3
for item in my_tuple:
    print(item)

Now, if you had a text file in the same folder as the Python script, such as perhaps a CSV file with data, you could do something like the following:

for the_line in open("file.csv"):
    print(the_line)
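In practice you would usually wrap that in a with block so the file is closed automatically. This sketch creates a small CSV first so it runs on its own; the filename and contents are illustrative:

```python
# Create a small CSV so the example is self-contained
with open('file.csv', 'w') as f:
    f.write('name,age\nFred,40\nWilma,38\n')

# Iterating over the file object yields one line at a time
with open('file.csv') as f:
    for the_line in f:
        print(the_line, end='')  # each line already ends with '\n'
```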

With Python iterators you can also combine structures for enhanced functionality, though you should keep the result readable. Note that we are looping through a string and counting the instances of the letter “b”:

# Combine control structures
my_string = 'ababaaaabbbbaaaabb'
counter = 0
for character in [char for char in my_string if char == 'b']:
    counter += 1
print('There were', counter, 'instances of the letter b')
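For comparison, the same count is available directly from the built-in str.count() method, which is usually the more readable choice:

```python
my_string = 'ababaaaabbbbaaaabb'
# str.count() tallies non-overlapping occurrences of a substring
print('There were', my_string.count('b'), 'instances of the letter b')
```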

Another example might be Caesar cipher encryption:

# Secret message: Caesar cipher!
my_string = input('Type secret message: ')
print(my_string)
z = input('How much to Caesar shift by? ')
for letter in my_string:
    x = ord(letter)
    t = x + int(z)
    print(chr(t), end='')
print()
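Reversing the shift recovers the original message. This sketch wraps the same loop in a helper function (caesar_shift is an illustrative name, not from the text) so it can encrypt and decrypt without interactive input:

```python
def caesar_shift(text, amount):
    # Shift every character's code point by the given amount
    return ''.join(chr(ord(ch) + amount) for ch in text)

secret = caesar_shift('hello', 3)
print(secret)                    # khoor
print(caesar_shift(secret, -3))  # hello
```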

Now let’s look at the iteration protocol. The next() function returns the first item and continues through to the last, but it raises a StopIteration exception when the loop tries to fetch a fourth item, which is not present in the three-item list:

# Simple iteration over a list
simple_list = [1, 3, 2]
simple_iter = iter(simple_list)
counter = 1
while counter <= 4:
    print(next(simple_iter))
    counter += 1

Now, you could add try and except to keep this running, but this shows how things work at a general level. Time spent working with iterators will pay off well.
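Here is one way the try and except mentioned above might look, so the loop ends cleanly when the iterator is exhausted instead of raising:

```python
simple_list = [1, 3, 2]
simple_iter = iter(simple_list)
while True:
    try:
        item = next(simple_iter)
    except StopIteration:
        break  # the iterator is exhausted
    print(item)
```

This is essentially what a for loop does for you behind the scenes.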

Taking It With You

A thorough understanding of classes unlocks a world of possibilities for your applications, whether they’re simple personal projects or large frameworks built for mass distribution. In addition to all this freedom, there is a set of established protocols that can allow your classes to work just like some of Python’s most well-known types.

Copyright information

© J. Burton Browning and Marty Alchin 2019

Authors and Affiliations

  • J. Burton Browning, Oak Island, USA
  • Marty Alchin, Agoura Hills, USA
