The Three Python Tenors Sing "Threads" and "State of Mind"
Ordinarily, we interview book authors one-on-one, asking them about their expertise in a given technology and the trends they see in their area. This time, we put three authors in the same (virtual) room and gave them free rein to talk about anything related to Python. And oh boy, did they talk about Python! As you'll see in this article, Dave Beazley, Wesley Chun, and Mark Summerfield had plenty to share with Cameron Laird about how the 2.x and 3.x strains of Python compare, how to get the most performance out of Python, how newcomers — including young students — relate to Python, and the different ways their books present key topics.
In case you're unfamiliar with these authors:
David Beazley is the author of Python Essential Reference. He's also an independent developer, and has written a number of well-known packages such as SWIG. "I also spend quite a bit of time doing Python training," he adds. All his Python material (including some he mentions here) can be found on his home page.
As you'll see, they had a comfortable and casual discussion.
Laird: One of the aspects of Python that interests me is how well it serves audiences that sometimes have no knowledge of each other. Python, for example, has an enthusiastic following among some scientific programmers, some text-oriented people doing things like PLY, some folks new to programming, sysadmins, and so on. Dave, your Reference is in its fourth edition now; do you have any sense about how well it serves newcomers to Python?
Beazley: To be honest, I've never had a good sense for how the Essential Reference serves absolute newcomers. I always intended for the book to target professional programmers. Similar to a "K&R" [Kernighan and Ritchie's The C Programming Language] book for Python perhaps.
Laird: What's new in the fourth edition?
Beazley: A major feature is almost a complete rewrite to handle Python 2.6/3.0 changes. Concurrent programming also plays a prominent role. There are major new sections that get into things like the new multiprocessing module, coroutines, and other advanced topics.
Summerfield: Huh! My Python 3 book is Python's K&R! I always thought the Essential Reference was more like Python in a Nutshell by Martelli.
Laird: You've got me laughing already. You've hit topics that each can support their own books, and about which we could chat for days.
Summerfield: I'm certainly looking forward to reading the new Essential Reference, for exactly those topics but also for its library coverage.
Beazley: I have to admit that I just jump right into the fray with the new edition. There's a practical example involving coroutines on something like page 20 of the intro.
Laird: You've been working through Python 2.x and 3.x in detail. If someone has a focus on multicore programming, is 2.x of interest, or should he push straight toward 3.x? I assume we all agree that multiprocessing is only going to become more important in day-to-day coding.
Beazley: The choice of Python versions is a really tough question. Both Python 2.x and 3.x support the same basic features and libraries in this regard.
Summerfield: Yes. Like GvR (Guido van Rossum), [Donald] Knuth, and others, I think multiprocessing seems to be a better approach than threading; it seems to me that threading pushes a whole layer of not-relevant-to-the-problem technical burden onto the Python coder. Of course multiprocessing does that too, but it [Python multiprocessing] seems less error-prone [than threading].
Beazley: For any kind of CPU-bound work, multiprocessing is definitely what you want to consider. Threads often get a bad rap, but still work pretty well if you use them for I/O processing.
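A minimal sketch of the split Beazley describes, using only the standard library. The helper names (count_primes, fake_download, run_io_in_threads, run_cpu_in_processes) are invented for illustration, and time.sleep() merely stands in for a blocking network or disk read:

```python
import multiprocessing
import threading
import time


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def fake_download(name, results):
    """I/O-bound stand-in: time.sleep() releases the GIL, as a blocking read would."""
    time.sleep(0.1)
    results[name] = "done"


def run_io_in_threads(names):
    """Start one thread per simulated download and wait for all of them."""
    results = {}
    threads = [
        threading.Thread(target=fake_download, args=(name, results))
        for name in names
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def run_cpu_in_processes(limits):
    """Spread CPU-bound calls over worker processes, each with its own interpreter and GIL."""
    with multiprocessing.Pool() as pool:
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    # The guard matters: on platforms that start workers by re-importing
    # this module, unguarded Pool creation would recurse.
    print(run_io_in_threads(["a", "b", "c"]))
    print(run_cpu_in_processes([20000, 20000, 20000, 20000]))
```

The threads overlap their waiting, so three simulated downloads take about as long as one. The prime counting gains nothing from threads under the GIL, which is why it goes to a process pool instead.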
Laird: I regard that as the best brief summary: Threading is relatively error-prone, among the alternatives.
Summerfield: Haskell has a possible solution: software transactional memory...
Beazley: The thing I like about multiprocessing is that the whole approach is much more general purpose.
Summerfield: Yes, and that's a much better match for a VHLL [very high-level language] than threading, which is intrinsically low level. Python shouldn't ask the programmer to do more than express their problem (yeah unrealistic, but an aim).
Beazley: I have to admit that I'm a bit biased though. My background prior to Python was writing large message-passing applications. So, multiprocessing appeals to me in that sense.
Summerfield: I'm biased too: but that's because I find multithreaded programs much harder to debug.
Laird: What holds Python 3 back?
Summerfield: Python 2 (especially Python 2.5 and 2.6) is really excellent. Python 3 was a point-zero [release]; but now that it is .1 there's less excuse.
- Python 3.0's I/O was slow; that's now fixed.
- Python 3 is (a bit) incompatible so there are a few things to learn.
- Above all, key libraries haven't yet been converted, e.g. NumPy. (But PyQt has.)
So why switch?
- Python 3 brings plain text out of the closet. In Python 3, plain text always has an encoding, something that is true in the real world but which most of us have ignored for far too long.
- Python 3 has nicer and more consistent syntax with only one kind of class and one int type (and the latter sensibly promotes to float when non-integer division is used).
- Python 3 makes much more use of iterators (so is more efficient).
- Python 3 doesn't impose an arbitrary ordering on incompatible types the way Python 2 does.
- Python 3 has comprehensions for lists, dicts, and sets, as well as non-empty set literals.
- Plus a zillion other improvements, such as str.format(), although quite a few of these have been backported.
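Several items on Summerfield's list can be seen in a few lines. This is only a sketch, assuming a Python 3 interpreter; the variable names are illustrative and not taken from any of the books:

```python
# A sketch for Python 3 only; several lines behave differently under Python 2.

# One int type of unlimited size; "/" always gives a float, "//" floors.
true_division = 7 / 2
floor_division = 7 // 2
big_int = 2 ** 100

# Built-ins such as map() and range() are lazy instead of building lists.
lazy_squares = map(lambda n: n * n, range(5))
squares = list(lazy_squares)

# Ordering incompatible types is an error rather than an arbitrary result.
try:
    sorted([1, "one"])
    mixed_ordering_allowed = True
except TypeError:
    mixed_ordering_allowed = False

# Comprehensions for lists, dicts, and sets, plus non-empty set literals.
words = ["spam", "eggs", "spam"]
lengths = {word: len(word) for word in words}
unique_words = {word for word in words}
vowels = {"a", "e", "i", "o", "u"}

# str.format(), one of the features also available in Python 2.6.
summary = "{0} distinct words out of {1}".format(len(unique_words), len(words))
```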
Laird: How well do your readers absorb these cultural insights? Are they looking just for library references, or do they Search the Tao, or ...?
Summerfield: Yes, I get e-mails regularly, although not in proportion to sales. I get errata e-mails too, but fortunately my code is extracted from the source rather than cut/pasted into the PDF so I think I get fewer typos than most books.
Beazley: Well, I have to admit that I've never really thought about "cultural" insights in the context of the book.
Laird: I mean things like, not just threading, or how to work with the GIL [Global Interpreter Lock], but how to think about it. Your GIL talk certainly is an example.
Beazley: In some sense, I'm going to try to give balanced treatment to every approach (threads, multiprocessing, etc.) and talk about things to avoid. Regarding e-mails, I don't get tons of e-mails about the Essential Reference — except when something is broken. Ha!
Summerfield: I'm not sure how concerned the "average" Python programmer needs to be about the GIL. I hope the "average" programmer never really needs to know or care about it. Python is high level after all.
Beazley: I think a big part of the problem is that understanding a lot of that stuff can only really come from experience — and usually bad experiences. For instance, that GIL talk I gave was really aimed at the extreme high-end: someone who might have taken a graduate course in operating systems or written multithreaded apps. I completely agree with Mark that average Python programmers shouldn't have to ever concern themselves with such matters.
Laird: All true. Yes, one can write many, many fine Python applications without once thinking about GIL (or threading, or eval(), or lambda, or ...). But as we all know, there are times when they demand attention.
Summerfield: Yes, when you hit the GIL wall, sure you have to stop! "How to think about complicated problems" is what patterns are about, in some ways. Unfortunately, because Java "normalized" threading, some programmers feel they must use it, even when they don't need to or ought not to. But it's not surprising. I found Foundations of Multithreaded, Parallel, and Distributed Programming by Andrews way too hard.
Beazley: It's unfortunate that there is so much confusion about threads generally, though. For example, a lot of people simply think that "threads are evil" without ever giving it much thought. Then, they go running off into asynchronous I/O and all sorts of other madness — which has its own peculiar set of complexities (often greater than threads).
Laird: Bluntly, you three are among the people doing the most to correct that "it's unfortunate…" situation. Your books are just as you already said: You lay out the alternatives, and show how to make best advantage of each.
Summerfield: David's right that "threads are evil" is silly but doing threads right can be hard!
Beazley: Oh yes. Wicked hard! One thing that I think is really interesting is how a totally different community of programmers is now thinking about concurrent programming.
Summerfield: I recently read The Art of Multiprocessor Programming. It is pragmatic, uses Java throughout, but it is still a very demanding book.
Beazley: Twenty years ago, I was heavily involved in parallel computing. It was basically all Fortran all of the time. That's all anyone thought about. Now, it's this whole different set of problems and concerns. So, I think a lot of what's happening in Python, Ruby, Erlang, Scala, and other languages is really quite interesting.
Summerfield: Engineers in the university near me still use Fortran with parallel libraries.
Beazley: Oh definitely, I wouldn't suggest that the Fortran crowd has gone away. However, now virtually every programmer has access to multicore machines. That's pretty interesting.
Summerfield: Well it'll be interesting to see if the Google project on Python threading succeeds.
Beazley: Agreed on the Unladen Swallow project.
Summerfield: I think multiprocessors are what have pushed threading to the forefront. Did any of you read Tim Bray's weblog on "Processors"? He said, "Now that the best and the brightest have spent a decade building and debugging threading frameworks in Java and .NET, it's increasingly starting to look like threading is a bad idea; don't go there."
Most modern threading libraries push a huge burden — essentially of bookkeeping — onto the programmer. This is rather like the situation with pointers before garbage collectors became available. Now Python is a VHLL, so naturally it has garbage collection, but right now it doesn't offer any equivalent high level threading interface. The Qt library (4.4+) has gone some way to addressing this in the C++/Qt context with its QtConcurrent module while Haskell has gone the software transactional memory route. In Python we have the multiprocessing module, but I still hope that other high level approaches will come along that take good advantage of multiple cores and processors but at the same time handle the low-level bookkeeping chores.
Laird: Time-out; we're doing it again. That is, chatting about this lovely abstraction-rich stuff that intrigues us, at the same time as we agree that most programmers can live without thinking about it. The fact is, just as Mark already said, a far more common concern with them is, "How do I make a GUI ...?" My real stake is with readers. I like Python and I like theory, but my energy goes toward helping readers get the results they want from computers.
Beazley: Definitely agree that most programmers can live without thinking about concurrency.
Summerfield: Yeah, and sad to say most don't care about GUIs. Not on the desktop; most seem interested in web GUIs which aren't nearly so nice or versatile (although when a canvas is standardized it'll help). Python will deliver that better than any other language I've tried, and Python 3 is better than Python 2 for beginners.
Beazley: I would agree. One thing that interests me is the real extent to which most Python users use some of its advanced features.
Summerfield: But what is "advanced"? I don't think comprehensions are, but in my opinion, metaclasses are advanced.
Laird: Newcomers like comprehensions, indeed.
Summerfield: When I first wanted to teach Python I couldn't present a uniform "If it has parentheses it is a function" because print was a statement! So print() actually helps start people off.
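A small sketch of what that uniformity buys in Python 3: because print() is an ordinary function, it takes keyword arguments and can be passed around like any other callable. The names buffer, captured, and emit are illustrative only:

```python
import io

# Keyword arguments replace the special syntax of the old print statement.
buffer = io.StringIO()
print("spam", "eggs", sep=", ", end="!\n", file=buffer)
captured = buffer.getvalue()

# A function can be stored in a variable and called later.
emit = print
emit("still an ordinary call", file=buffer)
line_count = len(buffer.getvalue().splitlines())
```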
Beazley: I have to admit that I've never run into many questions about print when teaching.
Chun: Same here. Print usually comes off just fine as-is. My [teaching] slides are both Python 2 and 3 compliant now, however, and colorized in a way that attendees can quickly identify the differences.
Laird: Clean-up, regularity, uniformity: That's what Python 3 gives.
Summerfield: The library has (slightly) better names and organization.
Beazley: One issue that concerns me with Python 3 is the lack of libraries. It seems like a lot of users who come into Python are doing so because of some third-party package (e.g., plotting, NumPy, etc.).
Summerfield: It is the uniformity I wanted (and now have with Python 3).
Beazley: The fact that many libraries don't support Python 3 is definitely an issue.
Chun: I always recommend people start with Python 2 first, especially if you're in industry, which is usually several releases behind. Not to mention that many dependencies haven't been ported to Python 3 yet. However, if you're starting from scratch without dependencies, then you can go with 3.
Summerfield: Yes, PyQt now supports Python 3, but the biggie, I think, is NumPy, which seems really widely used.
Beazley: I also always recommend that people start with Python 2 — especially for industry.
Summerfield: I recommend 3 if you don't need any non-ported libs to avoid the pain of the upgrade. But yes, one problem for Python 3 is that Python 2 is soooooo good!
Beazley: I still encounter groups using Python 2.3 when teaching classes — often because they're using some very specific package or library that requires it.
Chun: Exactly. At my last job, we were on 2.3 and switched to 2.4 before I left in November, 2008. At my current job, we're on 2.4, but we're using a custom version of Stackless, so porting becomes an issue.
Summerfield: The main wall I hit is just the libraries. Where's NumPy, etc.?
Laird: One of the things Python's history should have taught us is that subtle shadings accumulate to make big differences. Python is a good language — but what makes it so involves many small things combined.
Chun: Fortunately for me, my introductory and intermediate course focuses on only stdlib stuff, so issues like NumPy don't show up that regularly. Although I do point out when stdlib mods/packages change names in 3.x.
Summerfield: I think one difference with my experience is that I'm involved with GUI apps. Python gets shipped as part of the app, so they can use the version they want. But I guess web developers are often stuck with whatever's on the server.
Beazley: Unfortunately, what's on the server is usually woefully out of date.
Laird: I'm fighting in my daytime job, incidentally, with a database adapter documented for 1.5. Python's cross-version compatibility is noteworthy — the libraries, though, can be a problem.
Beazley: I'm ashamed to admit that my own Internet hosting provider still only provides Python 2.2. Which I discovered after writing some small WSGI thing that I wanted to try out.
Summerfield: Well at least you can use Apache on your local machine or one of the Python lib's web servers.
Beazley: I have to admit that I've always been a little puzzled by forwards compatibility in Python though. I've rarely had any of my own code break going forward between different Python 2.x releases.
Chun: Well, SCons [a Python-based utility for managing application generation] is 1.5-compliant. And they intend on keeping it that way and "front-port" features as they need them.
Beazley: Maybe I just don't program anything interesting enough to break. :-)
Summerfield: There is a subtle Python 3.0 versus 3.1 difference, because floating point numbers come out different (you can have fewer digits in 3.1 with no loss of accuracy).
Beazley: Yes, I saw that change in 3.1. That is really interesting.
Chun: Well, it only shows fewer digits.
Summerfield: That breaks doctests... but sure, it is a good change.
Beazley: I actually have to talk about that in classes. For instance, if you type x=3.4, why does the interpreter show 3.39999999999999999 or something?
Chun: Yeah, they changed the way it's represented. So far, I've been getting away with it by explaining the problem of using bits to represent repeating fractions, and then just telling attendees to convert them to strings if they want them to look "nicer."
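The display change under discussion can be reproduced in a modern interpreter. This is a sketch, assuming Python 3.2 or later (where Decimal accepts a float directly); the variable names are illustrative:

```python
from decimal import Decimal

x = 3.4

# Python 3.1 and later show the shortest string that converts back to the same float.
short_form = repr(x)

# Asking for 17 significant digits shows roughly what older releases displayed.
long_form = format(x, ".17g")

# Decimal(x) writes out the binary value actually stored, digit for digit.
exact_value = Decimal(x)

# Both strings name the very same float; only the display differs.
same_number = float(short_form) == float(long_form) == x
```

Here short_form is '3.4' and long_form is '3.3999999999999999', which is the kind of output Beazley mentions having to explain in class.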
Summerfield: You have to bang on about floating point issues anyway though (and that's not python-specific) so that people realize these are just approximations.
Beazley: Yes. I tell people that C and Java have the same representation — but you usually don't see it when printing.
Chun: I just give people the link for further reading.
Summerfield: I think it's actually useful, helps explain why using == isn't such a good idea for floats.
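The classic demonstration of why == is risky for floats, together with one common alternative. This is a sketch, assuming Python 3.5 or later for math.isclose(); the tolerance shown is an arbitrary illustrative choice:

```python
import math

total = 0.1 + 0.2

# The sum is off by one unit in the last place, so exact comparison fails.
exactly_equal = (total == 0.3)

# Comparing within a relative tolerance (default 1e-09) succeeds.
close_enough = math.isclose(total, 0.3)

# Near zero a relative tolerance is useless, so supply an absolute one.
near_zero = math.isclose(1e-12, 0.0, abs_tol=1e-9)
```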
Laird: Do you do a sales job at that point? Python actually has great answers, because it's so much handier than in C or Java to write your own class to get the arithmetic you want, if you're, say, in Accounting.
Summerfield: If you're in Accounting you surely don't want to use floats. :-)
Laird: In a world where people use Excel to...
Summerfield: Sure, and people think passwords are secure! :-)
Chun: Some people switch to using decimal.Decimal [a Python standard-library class which, among other things, makes decimal sums come out even].
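A brief sketch of the difference, with made-up prices. Constructing Decimal values from strings keeps them exact; the quantize() call shows one way to round to cents, with the rounding mode chosen here purely for illustration:

```python
from decimal import Decimal, ROUND_HALF_UP

prices = ["0.10", "0.10", "0.10"]

# Binary floats accumulate a small representation error.
float_total = sum(float(p) for p in prices)

# Decimal arithmetic on the same inputs is exact.
decimal_total = sum(Decimal(p) for p in prices)

# Rounding to two places with an explicit, auditable rounding rule.
rounded = Decimal("19.995").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```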
Beazley: Speaking of floating point, I recently saw a link to some bug report related to submitting a 6,000-digit floating point number into some kind of query related to Django.
Laird: 'Twould be fun to see a teach-off between the three of you.
Summerfield: I wonder if we'd all teach the same things? Or in the same order? Certainly not in the same way, since our cultures are so different.
Laird: Big Stories for Python: We've already touched on several aspects of performance and multicore. Are any of you directly in touch with Unladen Swallow or related projects? Do we know what the outcome will be?
Beazley: I'm not in direct contact with that project. However, I did give that insane GIL presentation that tried to look at some of the multicore problems.
Summerfield: The only one I closely follow is PyQt, because Phil Thompson helped a lot with the book. We keep in touch, as he makes the Python 3 version more Pythonic. I read (well, skimmed a bit of) that talk, loved the bit about how long the GIL code had been left untouched.
Beazley: For me, the big Python story that most interests me is fixing I/O performance in Python 3.
Laird: How about the Ubuntu-OneComputerPerChild-PythonInEducation-... stream?
Summerfield: I'm a skeptic about children and computers. I think that computers will end up being seen as "cheap teachers" but I won't go on about teaching. I'm a qualified teacher but I hate how "education" is done (in England & Wales).
Chun: I'm keeping my ears to the ground as far as python-in-education is concerned.
Beazley: At the risk of blasphemy, the OLPC, education, etc. angle of Python doesn't hold my interest at all.
Chun: I haven't seen that yet, but then again, I'm only barely paying attention.
Beazley: I think it's great that students are learning with Python, but it's not really my interest to develop materials or software with that in mind.
Summerfield: In California they're ditching school books and going for e-books.
Chun: I'm in touch with a few people who have been nominated for posts in the US Department of Education but beyond pleasantries, haven't "done anything yet."
Summerfield: Yes, of course it is great that children, students, etc. learn to program — especially with Python.
Chun: Vern Ceder and Jeff Elkner, mainstays at Python conferences, delivered a talk yesterday in Washington DC at the NECC.
Laird: Are any of your books currently assigned for university classes?
Beazley: I'm not aware of any classes that require my book. Maybe for some advanced University classes, but I would hope that it would only be some kind of side-reference. At PyCON/UK last year, there were some interesting talks about Python in education. Specifically, how classes in Java were mostly producing students who didn't want to do any more programming.
Chun: My book is used in a few courses in colleges... both upper-division or grad specialty courses and lower-division undergrad programming courses. It seems that most computer science courses prefer a text like Zelle's.
Chun: I recommend Dave's and Alex Martelli's Nutshell as pure references in my courses.
Summerfield: My Python 3 book has been reviewed by the UK "higher education academy." Engineers will get Fortran; management science classes, if they do any programming, might get Visual Basic.
Chun: Here's a nice slide presentation I found yesterday describing Python and Computer Literacy. Also, for us non-Manning authors, the first Python father-son book was just published a few months ago. The title is very (deliberately) similar to "CP4E" ["computer programming for everyone," an initiative to present Python to non-programmers]. I don't know if [it's] intended or not; it looks like a more "user-friendly" version of the dummies book.
Summerfield: I'd seen that. That kind of thing is great, but I prefer targeting grown ups since I can make far more assumptions about prior knowledge (and teaching kids, especially in book form — if done well — ain't easy).
Chun: I love the Computer Literacy slides though, especially with all the famous references.
Beazley: I have to admit that I feel like I'm part of a generational gap with respect to learning programming — being part of that largely self-taught group that came out of early PCs (TRS-80, Apple II, etc.).
Chun: Hey, you forgot my Commodore CBM, Pet, Vic-20, 64, 128, and Amiga!
Beazley: Actually, one thing I'd like to hear from Mark and Wesley is their thoughts on Python 3 I/O. I have to say, it's the one feature of Python 3 that's really got my ire.
Chun: You mean the new I/O package/library? Like replacing files, *StringIO, etc.? I had to change my slides a bit because file objects no longer exist.
Summerfield: Okay, it was a mistake to let 3.0 out of the door as it was. But they've fixed it. (No, he means the slow speed, I think.)
Beazley: Well, partly it is the I/O library. But mostly it's the performance of it.
Chun: Oh yeah, that's why 3.1 came out so quickly afterwards.
Summerfield: Oh, the naming is annoying, no more simple "file" objects, but instead io.Text thingys.
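For readers who haven't met the new names, here is a sketch of what open() hands back in Python 3 and of the in-memory replacements for the old StringIO modules. The file name demo.txt is arbitrary, and the temporary directory is only there to keep the example self-contained:

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.txt")

    # Text mode yields a TextIOWrapper that encodes str to bytes on the way out.
    with open(path, "w", encoding="utf-8") as text_file:
        text_kind = type(text_file).__name__
        text_file.write("caf\u00e9\n")

    # Binary mode yields a buffered reader that deals only in bytes.
    with open(path, "rb") as binary_file:
        binary_kind = type(binary_file).__name__
        raw = binary_file.read()

# In-memory equivalents live in the io module, one for each kind of data.
memory_text = io.StringIO("in-memory text")
memory_bytes = io.BytesIO(b"in-memory bytes")
```

After this runs, text_kind is 'TextIOWrapper' and binary_kind is 'BufferedReader'; there is no longer a single built-in file type covering both.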
Beazley: Do you think 3.1 is going to be fast enough? I just recently played with 3.1. It's definitely faster — but still much slower than 2.6.
Summerfield: Re speed: I just don't know, haven't had time to test it. Oh, that's a pity.
Summerfield: But I still prefer Python 3! Because Python 3 no longer lets you lie to yourself about encodings. You either have bytes or strings with a specified encoding, and to me that's worth a lot.
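A short sketch of the separation Summerfield is praising, assuming Python 3. The sample text and the choice of UTF-8 and Latin-1 are illustrative:

```python
text = "na\u00efve caf\u00e9"          # a str: a sequence of Unicode characters

# Turning text into bytes always requires naming an encoding.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# The same ten characters occupy different numbers of bytes.
sizes = (len(text), len(utf8_bytes), len(latin1_bytes))

# Decoding with the right encoding round-trips; the wrong one gives mojibake.
round_trip_ok = utf8_bytes.decode("utf-8") == text
garbled = utf8_bytes.decode("latin-1")

# Mixing the two types is refused outright instead of guessed at.
try:
    text + utf8_bytes
    mixing_allowed = True
except TypeError:
    mixing_allowed = False
```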
Beazley: The thing that worries me about I/O is that I could see that being a topic that would prevent people from adopting it. Whether or not it's a good reason.
Chun: I haven't taken any performance measurements, i.e. Python 2.6 versus 3.1.
Laird: People keep asking, "Is your book obsolete because it's [about] Python 2.5?"
Beazley: My book is obsolete and it's not even out yet! :-)
Chun: I tell them No, I'm just not comfortable putting something in writing against a ".0" release of any software, much less Python. I'll let it settle for a little while first.
Summerfield: All our books are "out of date," but are they still useful?
Summerfield: The way to stop books getting out of date is to make Python x.y an ISO standard. ;-)
Chun: Well, I try to focus on "core language features" and don't try to put releases against each other, with the hope that the book takes longer to "obsolete." For example, a third edition of my book will be both 2.x and 3.x-compliant. Yes, at some point, we need "ANSI Python" or ISO or whatever standards bodies are out there.
Beazley: I'm not too worried about the book being out of date. There's a lot of great stuff in the new edition coming out.
Chun: I can't wait to get a copy of PER4e!
Summerfield: Okay, that's what David's done. But I wanted to avoid any confusion (yes, I'm looking forward to reading it).
Beazley: I have to say, the library part of the PER4e was hell.
Chun: I think Mark's Py3 book and mine overlap the most. There can't be that much crossing with your Qt book as I only have one small Qt snippet in my GUI section.
Summerfield: That's the bit I'm most looking forward to. My book's focus is on the language.
Beazley: A major complication was actually the Python 3.0 bytes/unicode separation. A tremendous amount of existing documentation (and older editions) is vague about whether functions work with bytes or strings.
Chun: I spend very little time on delving into stdlib stuff. I only tell readers what they need to do to be able to complete the exercises at the end of every chapter.
Summerfield: Wow, I don't have any PyQt in my Py3 book.
I think the bytes/Unicode separation is a primary motive for switching to Python 3; it just doesn't let you get away with guessing. But there are issues. For example, as far as I know, subprocess uses the local 8-bit encoding and you can't change that (I reported it as a bug).
Beazley: The bytes/Unicode separation is ultimately a good thing. But man does it ever hammer on the library.
Chun: People have Unicode issues in Python 2 all the time. I got so fed up with it, that I wrote a blog posting just to combat it. Of course, I didn't truly understand the real problem until I was faced with it at work.
Beazley: Take almost any library function that manipulates "text" and it becomes a whole new thing in Python 3.
Summerfield: Yes, and the semantics: If you compare two strings with the same chars they might not compare as equal, and as for sorting strings... just look at the Unicode algorithm for that.
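A sketch of the comparison problem Summerfield mentions, assuming Python 3 and the standard unicodedata module. The two spellings of "café" below look identical on screen but use different code points:

```python
import unicodedata

composed = "caf\u00e9"       # e-acute as one precomposed character
decomposed = "cafe\u0301"    # plain "e" followed by a combining acute accent

# Same visible text, different code point sequences, so they compare unequal.
naive_equal = (composed == decomposed)

# Normalizing both to one form (NFC here) makes the comparison meaningful.
normalized_equal = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

# Default sorting goes by code point, so the accented word lands after "zebra".
by_code_point = sorted(["zebra", "\u00e9clair", "apple"])
```

Sorting the way a dictionary would takes more than normalization. Locale-aware collation, or an implementation of the Unicode Collation Algorithm, is the usual route, and is not shown here.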
Laird: This has been an incredible conversation. Thanks so much for talking, guys!