InformIT

xMail: E-mail as XML

Date: Jun 27, 2003

Sample Chapter is provided courtesy of Prentice Hall Professional.


Take the first steps toward converting your e-mail to XML for more effective processing, archiving, and searching. This sample chapter walks through converting Unix mbox and Eudora mailboxes in detail.

E-mail is a good example of a structured text format that can usefully be converted to XML for processing, archiving, and searching. In this chapter, we develop xMail, a Python application to convert e-mail to XML.

It is an unfortunate fact of life that e-mail systems differ in the way they store e-mail. Some store it in proprietary binary formats. The two e-mail notations we deal with in this chapter (Unix mbox and Eudora) are, thankfully, text based. On Linux, e-mail messages are stored so that each message begins with a line starting with From followed by a space. If that sequence of characters happens to occur at the start of a line within the body of a message, it is escaped by being prefixed with a > character. The Eudora e-mail client begins each message with a sentinel string of the form From ???@???.
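The escaping rule can be sketched in a few lines of modern Python 3 (the chapter's own listings target Python 2). This is only an illustration: the helper names escape_body and unescape_body are hypothetical, and the sketch assumes the common mbox convention in which the separator is the word From followed by a space at the start of a line.

```python
# A minimal sketch of mbox-style "From " escaping (Python 3).
# escape_body and unescape_body are hypothetical helper names.

def escape_body(lines):
    """Prefix body lines that start with 'From ' with '>'."""
    return [">" + line if line.startswith("From ") else line
            for line in lines]

def unescape_body(lines):
    """Undo the escaping applied by escape_body."""
    return [line[1:] if line.startswith(">From ") else line
            for line in lines]

body = ["Hello\n", "From now on, use XML.\n"]
escaped = escape_body(body)
print(escaped)
```

Note that this simple scheme cannot tell an escaped line from a body line that genuinely began with >From, so unescaping is not always lossless.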

Although there are differences in the way Linux and Eudora store e-mail messages, there is a lot of commonality we can exploit in the conversion code. In particular, we can take advantage of the Python standard rfc822 module to do most of the work in parsing e-mail headers.

14.1 | The rfc822 Module

The term rfc822 refers to the standard for the header information used in Internet e-mail messages. The full specification can be found at http://www.ietf.org/rfcs/rfc0822.txt. The bulk of rfc822 is concerned with specifying the syntax for the headers that accompany the body of e-mail messages; headers such as from, to, subject, and so on. Python's rfc822 module takes a file object and puts as much of the content as it can parse into headers, according to the rules of rfc822.

The following program illustrates how the rfc822 module is used.

CD-ROM reference=14001.txt
"""
Simple program to illustrate use of Python's rfc822 module

"""

import rfc822,StringIO

email = """To: sean@digitome.com
From: paul@digitome.com
Subject: Parsing e-mail headers
Reply-To: Majordomo@allrealgood.com
Message-Id: <199902051120.EAA14648@digitome.com>

Sean,

Can Python parse this?

regards,
Paul
"""
fo = StringIO.StringIO(email)
m = rfc822.Message (fo)
print "<headers>"
for (k,v) in m.items():
      print "<%s>%s</%s>" % (k,v,k)
print "</headers>"
print "<body>"
print fo.read()
print "</body>"

The result of running this program is shown below.

CD-ROM reference=14002.txt
<headers>
<subject>Parsing e-mail headers</subject>
<from>paul@digitome.com</from>
<message-id><199902051120.EAA14648@digitome.com></message-id>
<reply-to>Majordomo@allrealgood.com</reply-to>
<to>sean@digitome.com</to>
</headers>
<body>
Sean,

Can Python parse this?

regards,
Paul

</body>
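The rfc822 module was removed in Python 3, where the standard email package covers the same ground. As a rough modern counterpart to the program above (a sketch, not the chapter's code), the following parses the same message with email.message_from_string. The variable names are illustrative.

```python
# Python 3 sketch: parse headers and body with the standard email package.
import email

raw = """To: sean@digitome.com
From: paul@digitome.com
Subject: Parsing e-mail headers
Reply-To: Majordomo@allrealgood.com
Message-Id: <199902051120.EAA14648@digitome.com>

Sean,

Can Python parse this?

regards,
Paul
"""

msg = email.message_from_string(raw)

# items() yields (name, value) pairs in their original order and case.
for name, value in msg.items():
    print("%s = %s" % (name.lower(), value))

# Header lookup is case-insensitive.
reply_to = msg["reply-to"]

# For a simple, non-multipart message the payload is the body text.
body = msg.get_payload()
print(body)
```

Unlike the rfc822 program, the body comes from the message object rather than from whatever remains unread in the file object.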

14.2 | A Simple DTD for E-mail

Before going any further with parsing Linux or Eudora mailboxes, we need to settle on an XML representation of a mailbox. We will use the following simple DTD.

CD-ROM reference=14003.txt
<!--
XMail = A simple DTD for a collection of e-mail messages

An xMail file consists of zero or more message elements.

A message has a headers element that contains fields such as
from, to, subject, and so on. The body element houses
the text of the e-mail
-->

<!ELEMENT xmail   (message)*>
<!ELEMENT message (headers,body)>
<!ELEMENT headers (field)+>
<!ELEMENT field   (name,value)>
<!ELEMENT name    (#PCDATA)>
<!ELEMENT value   (#PCDATA)>
<!ELEMENT body    (#PCDATA)>

14.3 | An Example of an E-mail Message in XML

Here is an example of an e-mail message that conforms to the xMail DTD.

CD-ROM reference=14004.txt
<?xml version="1.0"?>
<!DOCTYPE xmail SYSTEM "xmail.dtd">
<xmail>
<message>
<headers>
<field>
<name>subject</name>
<value>Greetings</value>
</field>
</headers>
<body>
Hello World
</body>
</message>
</xmail>
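To show how little code is needed to read this notation back, here is a Python 3 sketch that loads the example with the standard xml.etree.ElementTree module. It is an illustration only: ElementTree does not validate against the DTD, and collecting fields into a dictionary (an assumption made here for brevity) would lose repeated headers such as received.

```python
# Python 3 sketch: read an xMail document with xml.etree.ElementTree.
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE xmail SYSTEM "xmail.dtd">
<xmail>
<message>
<headers>
<field>
<name>subject</name>
<value>Greetings</value>
</field>
</headers>
<body>
Hello World
</body>
</message>
</xmail>
"""

root = ET.fromstring(doc)

messages = []
for message in root.findall("message"):
    # Collect each field's name/value pair into a dictionary.
    headers = {}
    for field in message.findall("headers/field"):
        headers[field.findtext("name")] = field.findtext("value")
    messages.append((headers, message.findtext("body")))

print(messages)
```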

14.4 | Processing a Eudora Mailbox

The following code fragment shows the control structure required to process a Eudora mailbox into individual e-mail messages. The processing of each message has been delegated to the ProcessMessage function. This function is used by both the Linux and Eudora converters. Note how the sentinel string "From ???@???" is used to chop the contents of the mailbox into individual messages.

CD-ROM reference=14005.txt
def DoEudoraMailBox(f):
      # f is a file object.
      # Chop the contents of a Eudora mailbox
      # into individual messages for processing
      # by the ProcessMessage subroutine.
      Message = []
      L = f.readline()
      while L:
            if string.find(L,"From ???@???")!=-1:
                  # Full message accumulated in the Message list,
                  # so process it to XML.
                  ProcessMessage(Message,out)
                  Message = []
            else:
                  # Accumulate e-mail contents line by line in
                  # Message list.
                  Message.append (L)
            L = f.readline()
      if Message:
            # Last message in the mailbox
            ProcessMessage(Message,out)
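The same control structure can be expressed as a generator. The following Python 3 sketch is an alternative formulation, not the chapter's code; split_eudora and SENTINEL are hypothetical names, and the sample mailbox is abbreviated.

```python
# Python 3 sketch: split a Eudora mailbox into messages.
import io

SENTINEL = "From ???@???"

def split_eudora(f):
    """Yield each message as a list of lines, without the sentinel line."""
    message = None          # None until the first sentinel is seen
    for line in f:
        if SENTINEL in line:
            if message is not None:
                yield message
            message = []
        elif message is not None:
            message.append(line)
    if message is not None:
        yield message

sample = io.StringIO(
    "From ???@??? Mon Sep 06 14:07:14 1999\n"
    "Subject: Hello\n"
    "\n"
    "World\n"
    "From ???@??? Mon Sep 06 14:07:31 1999\n"
    "Subject: Message 2\n"
    "\n"
    "Hello\n"
)
chunks = list(split_eudora(sample))
print(len(chunks))
```

Tracking whether a sentinel has been seen means nothing is emitted for the (empty) text before the first sentinel, and the caller decides what to do with each message.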

14.5 | Processing a Linux Mailbox

To process a Linux-style mailbox into individual messages, a different control structure is required. Note, however, that the processing of each individual e-mail is handled by ProcessMessage, which is common to both Linux and Eudora converters.

CD-ROM reference=14006.txt
def DoLinuxMailBox(f):
      # f is a file object.
      L = f.readline()[:-1]
      if string.find(L,"From ")!=0:
            print 'Expected mailbox to start with "From "'
            return
      Message = []
      L = f.readline()
      while L:
            if string.find(L,"From ")==0:
                  # Full message accumulated in the Message list,
                  # so process it to XML.
                  ProcessMessage(Message,out)
                  Message = []
            else:
                  # Accumulate e-mail contents line by line in
                  # Message list.
                  Message.append (L)
            L = f.readline()
      if Message:
            # Last message in the mailbox
            ProcessMessage(Message,out)
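In current Python, the standard mailbox module can do this splitting itself. The sketch below is an alternative to the hand-written loop, not part of xMail. It writes an abbreviated two-message mailbox to a temporary file (the file name test and the sample text are assumptions for illustration) and iterates over it with mailbox.mbox.

```python
# Python 3 sketch: let the standard mailbox module split an mbox file.
import mailbox
import os
import tempfile

MBOX_TEXT = (
    "From sean@digitome.com  Mon Sep  6 13:58:36 1999\n"
    "Subject: Hello\n"
    "\n"
    "World\n"
    "\n"
    "From sean@digitome.com  Mon Sep  6 13:58:53 1999\n"
    "Subject: Message 2\n"
    "\n"
    "Hello\n"
)

subjects = []
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test")
    with open(path, "w", newline="\n") as out:
        out.write(MBOX_TEXT)
    box = mailbox.mbox(path)
    try:
        # Each item is a message object with dictionary-style headers.
        for message in box:
            subjects.append(message["subject"])
    finally:
        box.close()

print(subjects)
```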

14.6 | Processing an E-mail Message by Using the rfc822 Module

The two functions DoLinuxMailBox and DoEudoraMailBox chop up mailboxes into individual messages that are processed by the ProcessMessage function. This function uses the rfc822 module to separate the headers from the body of the message.

CD-ROM reference=14007.txt
def ProcessMessage(lines,out):
      """
      Given the lines that make up an e-mail message,
      create an XML message element. Uses the rfc822
      module to parse the e-mail headers.
      """
      out.write("<message>\n")
      # Create a single string from these lines.
      MessageString = string.joinfields(lines,"")
      # Create a file object from the string for use
      # by the rfc822 module.
      fo = StringIO.StringIO(MessageString)
      m = rfc822.Message (fo)
      # The m object now contains all the headers.
      # The headers can be accessed as a Python dictionary.
      out.write("<headers>\n")
      for (h,v) in m.items():
            out.write("<field>\n")
            out.write("<name>%s</name>\n" % XMLEscape(h))
            out.write("<value>%s</value>\n" % XMLEscape(v))
            out.write("</field>\n")
      out.write("</headers>\n")
      out.write("<body>\n")
      out.write(XMLEscape(fo.read()))
      out.write("</body>\n")
      out.write("</message>\n")
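For comparison, here is a Python 3 sketch of the same step that builds the message element with xml.etree.ElementTree instead of writing tags by hand, so escaping is handled by the serializer. It is an assumption-laden illustration: message_to_element is a hypothetical name, the email package stands in for rfc822, and only simple, non-multipart messages are considered.

```python
# Python 3 sketch: turn one raw message into an xMail <message> element.
import email
import xml.etree.ElementTree as ET

def message_to_element(raw):
    """Build a <message> element from the text of one e-mail."""
    msg = email.message_from_string(raw)
    message = ET.Element("message")
    headers = ET.SubElement(message, "headers")
    for name, value in msg.items():
        field = ET.SubElement(headers, "field")
        ET.SubElement(field, "name").text = name.lower()
        ET.SubElement(field, "value").text = value
    # Assumes a simple, non-multipart message whose payload is a string.
    ET.SubElement(message, "body").text = msg.get_payload()
    return message

raw = "From: Sean Mc Grath <sean@digitome.com>\nSubject: Hello\n\nWorld\n"
xml_text = ET.tostring(message_to_element(raw), encoding="unicode")
print(xml_text)
```

Unlike XMLEscape, the ElementTree serializer also escapes the > character, which is harmless in XML character data.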

Time to illustrate the program in action. The -l (Linux) or -e (Eudora) command-line switch tells the program what type of mailbox to process.

Here is a small Eudora mailbox.

CD-ROM reference=14008.txt

C>type test.mbx

From ???@??? Mon Sep 06 14:07:14 1999
To: sean@p13
From: Sean Mc Grath <sean@digitome.com>
Subject: Hello
Cc:
Bcc:
X-Attachments:
In-Reply-To:
References:
X-Eudora-Signature: <Standard>

World

From ???@??? Mon Sep 06 14:07:31 1999
To: sean@p13
From: Sean Mc Grath <sean@digitome.com>
Subject: Message 2
Cc:
Bcc:
X-Attachments:
In-Reply-To:
References:
X-Eudora-Signature: <Standard>

Hello

From ???@??? Mon Sep 06 14:13:41 1999
To: sean@p13
From: Sean Mc Grath <sean@digitome.com>
Subject: Message 2
Cc:
Bcc:
X-Attachments:
In-Reply-To:
References:
X-Eudora-Signature: <Standard>

From sean@digitome.com

Hello

The file can be converted to XML as follows.

CD-ROM reference=14009.txt

C>python xmail.py -e test.mbx

<?xml version="1.0"?>
<!DOCTYPE xmail SYSTEM "xmail.dtd">
<xmail>
<message>
<headers>
<field>
<name>subject</name>
<value>Hello</value>
</field>
<field>
<name>references</name>
<value></value>
</field>
<field>
<name>bcc</name>
<value></value>
</field>
<field>
<name>x-attachments</name>
<value></value>
</field>
<field>
<name>cc</name>
<value></value>
</field>
<field>
<name>in-reply-to</name>
<value></value>
</field>
<field>
<name>x-eudora-signature</name>
<value>&lt;Standard></value>
</field>
<field>
<name>from</name>
<value>Sean Mc Grath &lt;sean@digitome.com></value>
</field>
<field>
<name>to</name>
<value>sean@p13</value>
</field>
</headers>
<body>
World

</body>
</message>
<message>
<headers>
<field>
<name>subject</name>
<value>Message 2</value>
</field>
<field>
<name>references</name>
<value></value>
</field>
<field>
<name>bcc</name>
<value></value>
</field>
<field>
<name>x-attachments</name>
<value></value>
</field>
<field>
<name>cc</name>
<value></value>
</field>
<field>
<name>in-reply-to</name>
<value></value>
</field>
<field>
<name>x-eudora-signature</name>
<value>&lt;Standard></value>
</field>
<field>
<name>from</name>
<value>Sean Mc Grath &lt;sean@digitome.com></value>
</field>
<field>
<name>to</name>
<value>sean@p13</value>
</field>
</headers>
<body>
Hello

</body>
</message>
<message>
<headers>
<field>
<name>subject</name>
<value>Message 2</value>
</field>
<field>
<name>references</name>
<value></value>
</field>
<field>
<name>bcc</name>
<value></value>
</field>
<field>
<name>x-attachments</name>
<value></value>
</field>
<field>
<name>cc</name>
<value></value>
</field>
<field>
<name>in-reply-to</name>
<value></value>
</field>
<field>
<name>x-eudora-signature</name>
<value>&lt;Standard></value>
</field>
<field>
<name>from</name>
<value>Sean Mc Grath &lt;sean@digitome.com></value>
</field>
<field>
<name>to</name>
<value>sean@p13</value>
</field>
</headers>
<body>
From sean@digitome.com

Hello

</body>
</message>
</xmail>

Notice how the < character has been escaped to &lt; wherever it occurs in a header or the body of an e-mail message. The XMLEscape function treats the & character the same way, escaping it to &amp;.
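The order of the replacements matters: & has to be escaped first, otherwise the ampersands introduced by &lt; would themselves be escaped. A small Python 3 sketch of the idea follows. The function name xml_escape_minimal and the sample header are assumptions for illustration, and the standard library's xml.sax.saxutils.escape is shown alongside for comparison.

```python
# Python 3 sketch: escaping text for use as XML character data.
from xml.sax.saxutils import escape

def xml_escape_minimal(s):
    """Escape & and < only; & must be replaced first."""
    s = s.replace("&", "&amp;")
    s = s.replace("<", "&lt;")
    return s

header = "Sean Mc Grath <sean@digitome.com> & friends"
minimal = xml_escape_minimal(header)
standard = escape(header)   # also escapes ">"
print(minimal)
print(standard)
```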

Here is a small, Linux-style mailbox.

CD-ROM reference=14010.txt

$cat test

From sean@digitome.com  Mon Sep  6 13:58:36 1999
Return-Path: <sean@digitome.com>
Received: from gateway ([100.100.100.105])
        by p13.digitome.com (8.9.3/8.8.7) with SMTP id NAA07403
        for <sean@p13>; Mon, 6 Sep 1999 13:58:36 GMT
Message-Id: <3.0.6.32.19990906140714.009b0ac0@p13>
X-Sender: sean@p13
X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.6 (32)
Date: Mon, 06 Sep 1999 14:07:14 +0100
To: sean@p13.digitome.com
From: Sean Mc Grath <sean@digitome.com>
Subject: Hello
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"

World

From sean@digitome.com  Mon Sep  6 13:58:53 1999
Return-Path: <sean@digitome.com>
Received: from gateway ([100.100.100.105])
        by p13.digitome.com (8.9.3/8.8.7) with SMTP id NAA07407
        for <sean@p13>; Mon, 6 Sep 1999 13:58:52 GMT
Message-Id: <3.0.6.32.19990906140731.009b6a40@p13>
X-Sender: sean@p13
X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.6 (32)
Date: Mon, 06 Sep 1999 14:07:31 +0100
To: sean@p13.digitome.com
From: Sean Mc Grath <sean@digitome.com>
Subject: Message 2
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"

Hello

It can be converted to XML with the following command.

CD-ROM reference=14011.txt

$python xmail.py -l test

<?xml version="1.0"?>
<!DOCTYPE xmail SYSTEM "xmail.dtd">
<xmail>
<message>
<headers>
<field>
<name>subject</name>
<value>Hello</value>
</field>
<field>
<name>x-sender</name>
<value>sean@p13</value>
</field>
<field>
<name>x-mailer</name>
<value>QUALCOMM Windows Eudora Light Version 3.0.6 (32)</value>
</field>
<field>
<name>content-type</name>
<value>text/plain; charset="us-ascii"</value>
</field>
<field>
<name>message-id</name>
<value>&lt;3.0.6.32.19990906140714.009b0ac0@p13></value>
</field>
<field>
<name>to</name>
<value>sean@p13.digitome.com</value>
</field>
<field>
<name>date</name>
<value>Mon, 06 Sep 1999 14:07:14 +0100</value>
</field>
<field>
<name>mime-version</name>
<value>1.0</value>
</field>
<field>
<name>return-path</name>
<value>&lt;sean@digitome.com></value>
</field>
<field>
<name>from</name>
<value>Sean Mc Grath &lt;sean@digitome.com></value>
</field>
<field>
<name>received</name>
<value>from gateway ([100.100.100.105])
 by p13.digitome.com (8.9.3/8.8.7) with SMTP id NAA07403
 for &lt;sean@p13>; Mon, 6 Sep 1999 13:58:36 GMT</value>
</field>
</headers>
<body>
World

</body>
</message>
<message>
<headers>
<field>
<name>subject</name>
<value>Message 2</value>
</field>
<field>
<name>x-sender</name>
<value>sean@p13</value>
</field>
<field>
<name>x-mailer</name>
<value>QUALCOMM Windows Eudora Light Version 3.0.6 (32)</value>
</field>
<field>
<name>content-type</name>
<value>text/plain; charset="us-ascii"</value>
</field>
<field>
<name>message-id</name>
<value>&lt;3.0.6.32.19990906140731.009b6a40@p13></value>
</field>
<field>
<name>to</name>
<value>sean@p13.digitome.com</value>
</field>
<field>
<name>date</name>
<value>Mon, 06 Sep 1999 14:07:31 +0100</value>
</field>
<field>
<name>mime-version</name>
<value>1.0</value>
</field>
<field>
<name>return-path</name>
<value>&lt;sean@digitome.com></value>
</field>
<field>
<name>from</name>
<value>Sean Mc Grath &lt;sean@digitome.com></value>
</field>
<field>
<name>received</name>
<value>from gateway ([100.100.100.105])
 by p13.digitome.com (8.9.3/8.8.7) with SMTP id NAA07407
 for &lt;sean@p13>; Mon, 6 Sep 1999 13:58:52 GMT</value>
</field>
</headers>
<body>
Hello

</body>
</message>
</xmail>

14.7 | Sending E-mail by Using xMail

Having converted the e-mail to XML, we can process it in a variety of ways by using XML-aware databases, editors, search engines, and so on. We can also process it with Python, by using Pyxie or SAX- or DOM-style processing. One useful form of processing is to send e-mail from this XML notation. In this section, we develop a Pyxie application, sendxMail, to do that.
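As a small illustration of SAX-style processing, the Python 3 sketch below scans an xMail document and collects every subject line. It uses the standard xml.sax module rather than Pyxie. The class name SubjectCollector and the abbreviated sample document (which omits the DOCTYPE line) are assumptions made for the example.

```python
# Python 3 sketch: SAX-style scan of an xMail document for subject lines.
import xml.sax

class SubjectCollector(xml.sax.ContentHandler):
    """Collect the value of every field whose name is 'subject'."""
    def __init__(self):
        super().__init__()
        self.subjects = []
        self.text = []
        self.field_name = None

    def startElement(self, name, attrs):
        # Start gathering afresh for each name and value element.
        if name in ("name", "value"):
            self.text = []

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        if name == "name":
            self.field_name = "".join(self.text).strip()
        elif name == "value" and self.field_name == "subject":
            self.subjects.append("".join(self.text).strip())

doc = b"""<xmail>
<message><headers>
<field><name>subject</name><value>Hello</value></field>
<field><name>to</name><value>sean@p13</value></field>
</headers><body>World</body></message>
<message><headers>
<field><name>subject</name><value>Message 2</value></field>
</headers><body>Hello</body></message>
</xmail>"""

collector = SubjectCollector()
xml.sax.parseString(doc, collector)
print(collector.subjects)
```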

Sending e-mail to a group of people at the same time is common, so we start by defining an XML notation for a mailing list. Here is a sample document conforming to a contacts DTD.

CD-ROM reference=14012.txt
<!DOCTYPE contacts SYSTEM "contacts.dtd">
<contacts>
<contact>
<name>Neville Bagnall</name>
<email>neville@digitome.com</email>
</contact>
<contact>
<name>Noel Duffy</name>
<email>noel@digitome.com</email>
</contact>
<contact>
<name>Sean Mc Grath</name>
<email>sean@digitome.com</email>
</contact>
</contacts>

The DTD for this is, of course, trivial.

CD-ROM reference=14013.txt

C>type contacts.dtd

<!--

Trivial DTD for a mailing list

-->

<!ELEMENT contacts (contact)*>
<!ELEMENT contact (name,email)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>

The full source code for sendxMail is given at the end of this chapter. The program uses the smtplib Python standard library. This library allows Python programs to send e-mail messages by talking to an SMTP server. Here is a small test program that illustrates how the smtplib module works.

CD-ROM reference=14014.txt
"""
Small test program to illustrate Python's standard smtplib library
"""
import smtplib

SMTPServer = "gpo.iol.ie"

#Create an SMTP server object.
server = smtplib.SMTP (SMTPServer)

#Turn debugging on.
server.set_debuglevel(1)

# Send an e-mail. First argument is the sender. Second argument
# is a list of recipients. Third argument is the text of the
# message.
server.sendmail (
  "sean@digitome.com",
  ["paul@digitome.com"],
  "Hello World")

To execute this program, change SMTPServer to point to a suitable SMTP server. The program will produce a lot of output because debugging has been turned on. The abridged output from an execution of this program is shown here.

CD-ROM reference=14015.txt

send: 'ehlo GATEWAY\015\012'
reply: '250-gpo2.mail.iol.ie Hello dialup-024.ballina.iol.ie
[194.125.48.152], pleased to meet you\015\012'
reply: retcode (250); Msg: gpo2.mail.iol.ie Hello dialup-024.ballina.iol.ie
[194.125.48.152], pleased to meet you
send: 'mail FROM:<sean@digitome.com> size=11\015\012'
reply: '250 <sean@digitome.com>... Sender ok\015\012'
reply: retcode (250); Msg: <sean@digitome.com>... Sender ok
send: 'rcpt TO:<paul@digitome.com>\015\012'
reply: '250 <paul@digitome.com>... Recipient ok\015\012'
reply: retcode (250); Msg: <paul@digitome.com>... Recipient ok
send: 'data \015\012'
reply: '354 Enter mail, end with "." on a line by itself\015\012'
reply: retcode (354); Msg: Enter mail, end with "." on a line by itself
data: (354, 'Enter mail, end with "." on a line by itself')
send: 'Hello World'
send: '\015\012.\015\012'
reply: '250 QAA03299 Message accepted for delivery\015\012'
reply: retcode (250); Msg: QAA03299 Message accepted for delivery
data: (250, 'QAA03299 Message accepted for delivery')
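The third argument to sendmail is the complete text of the message, so any headers the recipient should see (Subject, for example) have to be part of that string. In Python 3, one way to assemble such text is the standard email.message.EmailMessage class, sketched below. The server name smtp.example.org is a placeholder, and the sending step is left commented out so the sketch runs without a network.

```python
# Python 3 sketch: build the full message text, headers included.
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sean@digitome.com"
msg["To"] = "paul@digitome.com"
msg["Subject"] = "Greetings"
msg.set_content("Hello World")

# Headers, a blank line, then the body.
wire_text = msg.as_string()
print(wire_text)

# With a reachable server, the message could then be sent like this
# (smtp.example.org is a placeholder, not a real server):
# import smtplib
# with smtplib.SMTP("smtp.example.org") as server:
#     server.send_message(msg)
```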

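To execute sendxMail, you provide it with four parameters: the sender's e-mail address, the XML file containing the mailing list, the XML file containing the message, and the name of the SMTP server.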

A sample invocation is shown below.

CD-ROM reference=14016.txt
C>python sendxMail.py sean@digitome.com PyxieList.xml Welcome.xml gpo.iol.ie
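The heart of sendxMail is turning each message element back into header lines, a blank line, and the body. The Python 3 sketch below shows that assembly step on its own, using xml.etree.ElementTree in place of Pyxie. The function name build_message_text is hypothetical, and nothing is sent.

```python
# Python 3 sketch: rebuild message text from an xMail <message> element.
import xml.etree.ElementTree as ET

def build_message_text(message):
    """Return header lines, a blank line, then the body text."""
    header = ""
    for field in message.findall("headers/field"):
        name = field.findtext("name").strip()
        value = field.findtext("value").strip()
        header += name + ": " + value + "\n"
    body = message.findtext("body")
    return header + "\n" + body.strip() + "\n"

doc = """<xmail>
<message>
<headers>
 <field><name>subject</name><value>Greetings</value></field>
</headers>
<body>
Hello World
</body>
</message>
</xmail>"""

root = ET.fromstring(doc)
texts = [build_message_text(m) for m in root.findall("message")]
print(texts[0])
```

Each resulting string is what would be passed as the message text to an SMTP client, once per message element.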

14.8 | Source Code for the sendxMail Application

CD-ROM reference=14017.txt
"""
sendxMail
XML Processing with Python
Sean Mc Grath

Send e-mail over the Internet to a group of e-mail accounts,
using the xmail XML representation.

The program connects to the specified SMTP server and uses
Python's smtplib library.

The list of addresses is also in XML.

A simple message file looks like this:

<xmail>
<message>
<headers>
 <field><name>subject</name><value>Greetings</value></field>
</headers>
<body>
Hello World
</body>
</message>
</xmail>

A simple address file looks like this:
                      
<contacts>
 <contact>
  <name>Neville Bagnall</name>
  <email>neville@digitome.com</email>
 </contact>
</contacts>

Sample invocation:
      python sendxmail.py sean@digitome.com contacts.xml email.xml gpo.iol.ie
"""
import string
import smtplib
from pyxie import *

# Class uses event-driven XML processing style to send messages
# one at a time and so inherits from xDispatch.
class xMailSender(xDispatch):
      def __init__(self,Sender,MailingListFile,MessageFile,SMTPServer):
            # PYX source for later event dispatching is the
            # message file.
            xDispatch.__init__(self,File2PYX(MessageFile))
            # The Gathered variable is used to gather characters
            # arriving in the data handler method between certain
            # start- and end-tags.
            self.Gathered = []
            self.Sender = Sender
            self.Addresses = []
            self.MailingListFile = MailingListFile
            self.MessageFile = MessageFile
            # Accumulated message header
            self.MessageHeader = ""
            # Accumulated message body
            self.MessageText = ""

            self.server = smtplib.SMTP( SMTPServer )
            self.server.set_debuglevel(1)

            # Use tree-processing style to assemble list of
            # recipients.
            T = File2Tree(self.MailingListFile)
            for n in T:
                  T.Seek(n)
                  if T.AtElement("email"):
                        email = T.JoinData(" ")
                        self.Addresses.append(email)

            # Invoke event dispatching to handler methods.
            # PYX source is the message file.
            self.Dispatch()

      def start_body(self,etn,attrs):
            # Reset gathered data for each message body.
            self.Gathered = []

      def start_name(self,etn,attrs):
            # Reset gathered data for each name element.
            self.Gathered = []

      def start_value(self,etn,attrs):
            # Reset gathered data for each value element.
            self.Gathered = []

      def end_name(self,etn):
            # Save gathered data as header field name.
            self.fieldname = string.join(self.Gathered)

      def end_value(self,etn):
            # Save gathered data as header field value.
            self.fieldvalue = string.join(self.Gathered)
            # Add the new name/value pair to the end of
            # the message header.
            self.MessageHeader = (self.MessageHeader +
                  self.fieldname + ": " + self.fieldvalue + "\n")

      def characters(self,str):
            # Handler for character data. Accumulate data in the
            # Gathered variable. Various end-tag handlers copy out
            # the accumulated contents as needed.
            self.Gathered.append (PYXDecoder(str))

      def end_body(self,etn):
            # At this point, we have everything we need
            # to send e-mail.
            self.MessageText = string.join (self.Gathered)
            self.server.sendmail (self.Sender,
                  self.Addresses,
                  self.MessageHeader+"\n"+self.MessageText)
            # Close down the SMTP connection.
            self.server.quit()

if __name__ == '__main__':
      import sys
      if len(sys.argv)==1:
            xMailSender ("sean@digitome.com",
                         "contacts.xml",
                         "email.xml",
                         "gpo.iol.ie")
      else:
            xMailSender (sys.argv[1],
                         sys.argv[2],
                         sys.argv[3],
                         sys.argv[4])

14.9 | Source Code for the xMail Application

CD-ROM reference=14018.txt

"""
xMail
Convert mailboxes to a simple XML form for
e-mail messages.
XML Processing with Python
Sean Mc Grath
The Eudora e-mail client stores e-mail messages in mailboxes.
The file format is plain text. Individual messages are separated by the string "From ???@???".
This program processes a mailbox creating an XML file
that conforms to the xmail DTD.
"""
# Import some standard modules
# rfc822 is the module for e-mail header parsing
import string,rfc822,StringIO
LINUX  = 0
EUDORA = 1
def XMLEscape(s):
      """
      Escape XML's two special characters which may
      occur within an e-mail message.
      """
      s = string.replace(s,"&","&amp;")
      s = string.replace(s,"<","&lt;")
      return s
def ProcessMessage(lines,out):
      """
      Given the lines that make up an e-mail message,
      create an XML message element. Uses the rfc822
      module to parse the e-mail headers.
      """
      out.write("<message>\n")
      # Create a single string from these lines.
      MessageString = string.joinfields(lines,"")
      # Create a file object from the string for use
      # by the rfc822 module.
      fo = StringIO.StringIO(MessageString)
      m = rfc822.Message (fo)
      # The m object now contains all the headers.
      # The headers can be accessed as a Python dictionary.
      out.write("<headers>\n")
      for (h,v) in m.items():
            out.write("<field>\n")
            out.write("<name>%s</name>\n" % XMLEscape(h))
            out.write("<value>%s</value>\n" % XMLEscape(v))
            out.write("</field>\n")
      out.write("</headers>\n")
      out.write("<body>\n")
      out.write(XMLEscape(fo.read()))
      out.write("</body>\n")
      out.write("</message>\n")

def DoEudoraMailBox(MailBox):
      """
      Given a Eudora mailbox, convert its contents to XML
      conforming to the xmail DTD.
      """
      f = open (MailBox,"r")
      l = f.readline()[:-1]
      if string.find(l,"From ???@???")==-1:
            # "From ???@???" is the sentinel that separates e-mail
            # messages in the Eudora mbx notation.
            print 'Expected mailbox "%s"' % MailBox,
            print 'to start with "From ???@???"'
            return
      if MailBox[-4:] != ".mbx":
            print "Expected mailbox to have .mbx file extension", MailBox
            return
      # Output file has same base name but .xml extension.
      out = open(MailBox[:-3]+"xml","w")
      out.write ('<?xml version="1.0"?>\n')
      out.write ('<!DOCTYPE xmail SYSTEM "xmail.dtd">\n')
      out.write ('<xmail>\n')
      Message = []
      l = f.readline()
      while l:
            if string.find(l,"From ???@???")!=-1:
                  # Full message accumulated in the Message list,
                  # so process it to XML.
                  ProcessMessage(Message,out)
                  Message = []
            else:
                  # Accumulate e-mail contents line by line in
                  # Message list.
                  Message.append (l)
            l = f.readline()
      if Message:
            # Last message in the mailbox
            ProcessMessage(Message,out)
      out.write ('</xmail>\n')
      f.close()
      out.close()

def DoLinuxMailBox(MailBox):
      """
      Given a Unix mbox style mailbox, convert its contents to XML
      conforming to the xmail DTD.
      """
      f = open (MailBox,"r")
      l = f.readline()[:-1]
      if string.find(l,"From ")!=0:
            print 'Expected mailbox "%s" to start with "From "' % MailBox
            return
      # Output file has same name as mailbox but with ".xml" added.
      out = open(MailBox+".xml","w")
      out.write ('<?xml version="1.0"?>\n')
      out.write ('<!DOCTYPE xmail SYSTEM "xmail.dtd">\n')
      out.write ('<xmail>\n')
      Message = []
      l = f.readline()
      while l:
            if string.find(l,"From ")==0:
                  # Full message accumulated in the Message list,
                  # so process it to XML.
                  ProcessMessage(Message,out)
                  Message = []
            else:
                  # Accumulate e-mail contents line by line in
                  # Message list.
                  Message.append (l)
            l = f.readline()
      if Message:
            # Last message in the mailbox
            ProcessMessage(Message,out)
      out.write ('</xmail>\n')
      f.close()
      out.close()

if __name__=="__main__":
      import sys,getopt
      format = LINUX
      (options,remainder) = getopt.getopt (sys.argv[1:],"le")
      for (option,value) in options:
            if option == "-l":
                  format = LINUX
            elif option == "-e":
                  format = EUDORA
      if len(remainder)!=1:
            print "Usage: %s -l|-e mailbox" % sys.argv[0]
            sys.exit()
      if format==EUDORA:
            DoEudoraMailBox(remainder[0])
      elif format==LINUX:
            DoLinuxMailBox(remainder[0])
