Miscellaneous Enhancements

As with any new release, there are a great many smaller enhancements to existing features and plenty of less important new features.

One new feature that I confidently expect few people to use is the option to declare a different character encoding for your Python source code. For instance, by declaring the following at the top of your Python file, you can use the full Unicode range in your code:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

It's generally a bad idea to use Chinese characters in your function names, for example, but in certain cases, it's very useful to be able to use literal Unicode characters in your strings. For instance, I've been working on an application that simulates the phonetic drift of words. For that I needed the full range of International Phonetic Alphabet glyphs, preferably recognizable in my code. This feature made that job a snip.
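As a rough illustration of literal Unicode in source code, here is a minimal sketch written for current Python 3, where UTF-8 is already the default source encoding and the declaration is optional. The table `SOUND_SHIFTS` and the function `drift` are hypothetical names invented for this example; they are not taken from the application mentioned above.

```python
# -*- coding: utf-8 -*-
# Hypothetical sound-change table: each key is a sound and each value is
# the sound it drifts to. The IPA glyph appears literally in the source.
SOUND_SHIFTS = {"p": "f", "t": "θ", "k": "x"}


def drift(word):
    """Apply every single-character shift in SOUND_SHIFTS to word."""
    return "".join(SOUND_SHIFTS.get(ch, ch) for ch in word)


shifted = drift("pater")
print(shifted)
```

Without a suitable source encoding, the same table would have to be written with escapes such as `"\u03b8"`, which is much harder to read at a glance.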

The new built-in function enumerate() loops over any iterable and yields (index, item) pairs, so you no longer need to index a sequence by hand. This means that you no longer need to write this:

>>> letters = ['a', 'b', 'c']
>>> for i in range(len(letters)):
...     print i, letters[i]
...
0 a
1 b
2 c

Instead, you can do this:

>>> for i, item in enumerate(letters):
...     print i, item
...
0 a
1 b
2 c

The second snippet looks slightly cleaner. On the other hand, I'm not going to update all of my old code to make use of this: It's not that compelling. In fact, I doubt that it's compelling enough to give up backward compatibility for.
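One practical difference is that enumerate() does not need len() or indexing at all, so it also works on generators. The following sketch is written for current Python 3; the generator `read_lines` is a hypothetical stand-in for any lazy data source, and the `start` argument was added in a later release (Python 2.6), so it is not part of 2.3.

```python
def read_lines():
    # A generator has no len() and cannot be indexed, so the
    # range(len(...)) idiom would not work on it.
    yield "alpha"
    yield "beta"
    yield "gamma"


# enumerate() pairs each item with a running index, starting at 0.
numbered = list(enumerate(read_lines()))

# Assumption: a Python version with the start= argument (2.6 or later).
from_one = list(enumerate(["a", "b", "c"], start=1))

print(numbered)
print(from_one)
```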

On the other hand, a change to the string type is so useful that I can see it being used regardless. You can now use the in operator to check whether a string is a substring of another string; previously, the left operand had to be a single character:

Python 2.1.1 (#1, Oct 25 2001, 09:53:13)
[GCC 2.95.3 20010315 (SuSE)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> 'aa' in 'abcaa'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: 'in <string>' requires character as left operand

Compare that with the same expression in Python 2.3:

>>> 'aa' in 'abcaa'
True
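For comparison, the usual idiom before this change was to call find() and test the result against -1. A small sketch, written for current Python 3, with variable names chosen only for illustration:

```python
haystack = "abcaa"

# Membership test: only answers whether the substring occurs.
found = "aa" in haystack

# Older idiom: find() returns the index of the first match, or -1.
position = haystack.find("aa")
found_old_style = position != -1

# The empty string counts as a substring of every string.
empty_found = "" in haystack

print(found, position, found_old_style, empty_found)
```

The in operator only answers yes or no; find() is still the tool to reach for when you need the position of the match.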

A problem that comes up quite often on the Python newsgroup is the "strict" way that Python tries to encode a Unicode string into a byte string. Whenever Python comes across a character that cannot be encoded into the currently used encoding (most often ASCII because that's the default), it throws a wobbly:

>>> u = u"\u3456"
>>> s = str(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character '\u3456' in position 0: ordinal not in range(128)
>>>

This happens more often than you might think, because even simple Latin-1 characters, such as a-acute or u-umlaut, cause this error. Python already offered two other strategies: skipping the offending character or replacing it with a question mark. Now, however, you can register your own callback that does something sensible with the character, such as translating it to an XML character reference. That particular callback is available by default, too. Batteries included. The following snippet registers it under the name 'xml' and uses it to translate unencodable characters into XML character references:

>>> import codecs
>>> codecs.register_error('xml', codecs.xmlcharrefreplace_errors)
>>> u = u"\u3456"
>>> s = codecs.getencoder('ASCII')(u, 'xml')
>>> s
('&#13398;', 1)
>>>

It would have been handier to be able to register the error handler with the codec so that functions such as str() would work with the error handler once set. Sadly, though, that isn't the case.
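To make the callback mechanism more concrete, here is a minimal sketch of a custom error handler next to the built-in strategies. It is written for current Python 3, where text strings are Unicode and encode() returns bytes; the handler name `bracket` and the function `bracket_codepoint` are hypothetical, chosen only for this example.

```python
import codecs


def bracket_codepoint(error):
    """Replace each unencodable character with a marker like [U+3456]."""
    # Only encoding failures are handled; anything else is re-raised.
    if not isinstance(error, UnicodeEncodeError):
        raise error
    bad = error.object[error.start:error.end]
    replacement = "".join("[U+%04X]" % ord(ch) for ch in bad)
    # Return the replacement text and the position at which to resume.
    return replacement, error.end


# Register the handler under a name that encode() can refer to.
codecs.register_error("bracket", bracket_codepoint)

text = "caf\u00e9 \u3456"

skipped = text.encode("ascii", "ignore")             # drop the character
question = text.encode("ascii", "replace")           # substitute '?'
as_xml = text.encode("ascii", "xmlcharrefreplace")   # built-in handler
as_brackets = text.encode("ascii", "bracket")        # custom handler

print(skipped, question, as_xml, as_brackets)
```

The handler has to be named on each encode() call; as noted above, registering it does not change what the default conversion does.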
