
SAX and PHP

Last updated May 13, 2004.

The Simple API for XML, or SAX, was developed by members of the XML-DEV mailing list. Rather than treating an XML document as a tree-like structure, SAX treats it as a series of events such as startDocument or endElement. To accomplish this, a SAX application consists of a parser that sends these events to "handlers," methods or functions designated to handle them.

In PHP, handlers can be plain functions or methods of a class. In this example, we'll keep everything together, handling the entire process from within a single class.
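For comparison, here is a minimal sketch of the plain-function style. The handler names startHandler and endHandler and the one-line document are assumptions made for illustration; they are not part of the order example that follows.

```php
<?php
// Handlers as plain functions. The names startHandler and endHandler are
// hypothetical; any callable that accepts these parameters will do.
$GLOBALS['events'] = array();

function startHandler($parser, $name, $attrs) {
    $GLOBALS['events'][] = "start " . $name;
}

function endHandler($parser, $name) {
    $GLOBALS['events'][] = "end " . $name;
}

$parser = xml_parser_create();
xml_set_element_handler($parser, "startHandler", "endHandler");

// The third argument tells the parser this is the final piece of data.
xml_parse($parser, "<order><delivery>Overnight</delivery></order>", true);

// Names come back uppercased because case folding is on by default.
echo implode("\n", $GLOBALS['events']), "\n";
```

The class-based version used in the rest of this guide does the same job but keeps the handlers and the data they collect in one place.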

Creating a SAX application involves processing events as they arrive, keeping in mind that the handler class knows only about the current event; if you need information about previous events, you need to save it yourself. For example, consider this order file:

<?xml version="1.0"?>
<order orderid="THX1138" customerNumber="3263827">
    <lineitem itemid="C33">
       <item>3/4" Hex Bolt</item>
       <quantity>36</quantity>
       <unitprice currency="dollars">.35</unitprice>
    </lineitem>
    <lineitem itemid="M48">
       <item>Condenser</item>
       <quantity>1</quantity>
       <unitprice currency="dollars">2200</unitprice>
    </lineitem>
    <delivery>Overnight</delivery>
</order>

We can create a SAX application that lists the order information, including the extended total for each item and the grand total for the order. We'd start by creating the main SAX application:

<?php

class OrderProcessor {

   function OrderProcessor(){
   }

   function ProcessOrder($url) {

      $parser = xml_parser_create();

      $fp = fopen($url, "r");
      while(!feof($fp)) {
         $line = fgets($fp, 4096);
         xml_parse($parser, $line);
      }
      fclose($fp);

      xml_parser_free($parser);

   }

}

$order = new OrderProcessor();
$success = $order->ProcessOrder("order.xml");

?>

We start by creating the new object and calling its ProcessOrder method. The method creates the parser and feeds the file to it one line at a time, reading at most 4K per call. At this point, however, the script doesn't actually do anything. In order to get it to act on the file, we need to assign handlers:

<?php

class OrderProcessor {

   function OrderProcessor(){
   }

   function ProcessOrder($url) {

      $parser = xml_parser_create();

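      // The handlers are methods, so tell the parser which object owns them
      xml_set_object($parser, $this);

      xml_set_element_handler($parser, "_startElement", "_endElement");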
      xml_set_character_data_handler($parser, "_charHandler");

      $fp = fopen($url, "r");
      while(!feof($fp)) {
         $line = fgets($fp, 4096);
         xml_parse($parser, $line);
      }
      fclose($fp);

      xml_parser_free($parser);

   }

   function _startElement($parser, $name, $attrs) {
   }

   function _endElement($parser, $name) {
   }

   function _charHandler($parser, $data) {
   }

}

$order = new OrderProcessor();
$success = $order->ProcessOrder("order.xml");

?>

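The xml_set_element_handler() function sets both the "start element" and "end element" handlers, and the xml_set_character_data_handler() function takes care of, well, setting the character data handler. Because these handlers are methods rather than standalone functions, the parser also needs to know which object they belong to, which is the job of xml_set_object().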

Note that in PHP there is no way to set handlers for the "start document" and "end document" events, so we'll run them manually within the application:

<?php

class OrderProcessor {

   var $totalPrice;

   function OrderProcessor(){ }

   function ProcessOrder($url) {

      $parser = xml_parser_create();
      xml_set_object($parser, $this);

      xml_set_element_handler($parser, "_startElement", "_endElement");
      xml_set_character_data_handler($parser, "_characters");

      $this->_startDocument($parser);
   
      $fp = fopen($url, "r");
      while(!feof($fp)) {
         $line = fgets($fp, 4096);
         xml_parse($parser, $line);
      }
      fclose($fp);

      $this->_endDocument($parser);

      xml_parser_free($parser);

   }

   function _startDocument($parser){
      $this->totalPrice = 0;
   }

   function _endDocument($parser){
      echo("<br />Order total: ".$this->totalPrice."<br />");
   }

   function _startElement($parser, $name, $attrs) {
   }

   function _endElement($parser, $name) {
   }

   function _characters($parser, $data) {
   }

}

$order = new OrderProcessor();
$success = $order->ProcessOrder("order.xml");

?>

In this case, we're going to gather information about the order, so we'll start by initializing the totalPrice variable before we start parsing, and displaying its value when parsing is finished.

Most events fire multiple times. For example, the first events in the sample document are:

startDocument
startElement (order)
characters (white space)
startElement (lineitem)
characters (white space)
startElement (item)
characters (3/4" Hex Bolt)
endElement (item)
characters (white space)
startElement (quantity)
characters (36)
endElement (quantity)
...

Now, it's important to understand that each of these events is completely independent of the others. When the characters event fires for the text 3/4" Hex Bolt (more on the character data handler in a moment), the handler has no way of knowing that the text is part of the item element. If this information is important (as it is here), we need to keep track of it ourselves.
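One way to see the event stream for yourself is to log every handler call. The following is a sketch only: it uses a cut-down copy of the order file held in a string, and the $log array and the anonymous functions are assumptions made for illustration. How the white space between elements is divided among characters events can vary, so treat the white space entries as approximate.

```php
<?php
// Trace the events fired for a cut-down copy of the order file.
$xml = <<<'XML'
<?xml version="1.0"?>
<order orderid="THX1138" customerNumber="3263827">
    <lineitem itemid="C33">
       <item>Hex Bolt</item>
       <quantity>36</quantity>
    </lineitem>
</order>
XML;

$log = array();

$parser = xml_parser_create();
// Keep names as written instead of uppercasing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$log) { $log[] = "startElement ($name)"; },
    function ($p, $name) use (&$log) { $log[] = "endElement ($name)"; }
);
xml_set_character_data_handler(
    $parser,
    function ($p, $data) use (&$log) {
        // Label white-space-only data so the log stays readable.
        $log[] = trim($data) === "" ? "characters (white space)" : "characters ($data)";
    }
);

xml_parse($parser, $xml, true);

echo implode("\n", $log), "\n";
```

Notice that nothing in a characters entry says which element the text belongs to; that context exists only in the order of the log.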

For our purposes, that means that when we close an element we're tracking, such as item or quantity, we need to store the text that's been flowing through the character data handler, like so:

<?php

class OrderProcessor {

   var $totalPrice = 0; 
   var $itemid = ""; 
   var $itemname = ""; 
   var $quantity = 0;
   var $unitprice = 0; 
   var $currentElement = ""; 
   var $thisText = ""; 

   function OrderProcessor(){ }

   function ProcessOrder($url) {

      $parser = xml_parser_create();
      xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
      xml_set_object($parser, $this);

      xml_set_element_handler($parser, "_startElement", "_endElement");
      xml_set_character_data_handler($parser, "_charHandler");

      $this->_startDocument($parser);
   
      $fp = fopen($url, "r");
      while(!feof($fp)) {
         $line = fgets($fp, 4096);
         xml_parse($parser, $line);
      }
      fclose($fp);

      $this->_endDocument($parser);

      xml_parser_free($parser);

   }

   function _startDocument($parser){
      $this->totalPrice = 0;
   }

   function _endDocument($parser){
      echo("<br />Order total: ".$this->totalPrice."<br />");
   }

   function _startElement($parser, $name, $attrs) {

      if ($name == "order"){ 
  
         $orderid = $attrs["orderid"]; 
         $customerid = $attrs["customerNumber"]; 
         echo("Order ".$orderid." for customer ".$customerid.":<br /><br />"); 

      } else if ($name == "lineitem"){ 
         $this->itemid = $attrs["itemid"]; 
      }    

      $this->currentElement = $name;

   }

   function _endElement($parser, $name) {

      // Drop the white space picked up between elements
      $text = trim($this->thisText);
      if (strlen($text) > 0) { 
         if ($name == "item"){ 
            $this->itemname = $text;
         } else if ($name == "quantity"){ 
            $this->quantity = $text; 
         } else if ($name == "unitprice"){ 
            $this->unitprice = $text; 
         } 
      } 
      $this->thisText = ""; 
      if ($name == "lineitem"){ 
         // Cast the collected strings to numbers before doing arithmetic
         $extendedPrice = (int)$this->quantity * (float)$this->unitprice; 
         echo(" Item: ".$this->itemname." (".$this->itemid.") ".$this->quantity. " @ ".$this->unitprice." = ".$extendedPrice."<br />"); 
         $this->totalPrice = $this->totalPrice + $extendedPrice; 
         // Reset the per-item state for the next lineitem
         $this->itemname = ""; 
         $this->quantity = 0; 
         $this->unitprice = 0; 
      }

   }

   function _charHandler($parser, $data) {

      $this->thisText = $this->thisText . $data;

   }

}

$order = new OrderProcessor();
$success = $order->ProcessOrder("order.xml");

?>

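Let's start with _startElement. If we've run across the order element or the lineitem element, we pull the appropriate information from its attributes, which are passed to the function as an associative array. Because we turned off case folding with xml_parser_set_option(), element and attribute names arrive exactly as they appear in the document rather than in uppercase. In any case, we store the name of the element.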
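To see why the case folding option matters for the attribute lookups, here is a small sketch that collects the names the parser reports with folding on (the default) and off. The collectNames() helper is hypothetical and exists only for this illustration.

```php
<?php
// Collect element and attribute names as the parser reports them.
// collectNames() is a hypothetical helper, not part of OrderProcessor.
function collectNames($xml, $folding) {
    $names = array();
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $folding);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$names) {
            $names[] = $name;
            foreach (array_keys($attrs) as $key) {
                $names[] = $key;
            }
        },
        function ($p, $name) {
            // Nothing to do when an element closes.
        }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

$xml = '<order orderid="THX1138" customerNumber="3263827"/>';

print_r(collectNames($xml, 1));  // ORDER, ORDERID, CUSTOMERNUMBER
print_r(collectNames($xml, 0));  // order, orderid, customerNumber
```

With folding left on, a lookup such as $attrs["customerNumber"] would find nothing, because the key would be CUSTOMERNUMBER.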

In most cases, the next event that will fire is the characters event as the content of the element is processed. One thing that's a little strange about SAX is that you never really know just how text will be delivered. You might get it all in one big chunk, or you might get it in a series of smaller pieces. Because of this little idiosyncrasy, we append the data from each call to the thisText variable. When we get to the end of the element, the _endElement function executes, and we can check (and clear) the contents of the variable.
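The sketch below illustrates why accumulating is the safe choice. It feeds a single item element to the parser in two pieces, with an entity reference in the middle for good measure, and counts the calls to the character data handler. The $chunks and $text variables are assumptions for this illustration, and the number of calls you see depends on the underlying XML library, so only the accumulated text should be relied on.

```php
<?php
// Feed one element to the parser in two pieces and accumulate its text.
$chunks = array();  // the data from each call to the character data handler
$text = "";         // the accumulated content

$parser = xml_parser_create();
xml_set_character_data_handler(
    $parser,
    function ($p, $data) use (&$chunks, &$text) {
        $chunks[] = $data;
        $text .= $data;
    }
);

// The element's text is split across two calls to xml_parse().
xml_parse($parser, '<item>3/4&quot; Hex ', false);
xml_parse($parser, 'Bolt</item>', true);

echo count($chunks), " call(s), full text: ", $text, "\n";
```

Any single entry in $chunks may hold only part of the text, which is why _endElement is the right place to use it.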

Note that our method of saving the "current" element only works because we're only looking for the text children of simple elements. If we needed to track multiple levels of elements, we'd have to find another way of storing the information (or use another way of parsing the document, such as DOM). In this case, though, it's sufficient, so as each element closes, we check to see what it was and perform the appropriate actions. If it was an item, quantity, or unitprice element, we simply store the appropriate values. If, on the other hand, it's the end of a lineitem element, we perform the appropriate calculations, display the information for that item, and reinitialize the variables.
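If you did need to track multiple levels, one common approach is to keep a stack of the elements that are currently open. The following is a minimal sketch of that idea, not part of the order example: the PathTracker class, its method names, and the slash-separated path keys are all assumptions made for illustration.

```php
<?php
// PathTracker is a hypothetical class that records text by element path.
class PathTracker {
    private $stack = array();  // names of the elements currently open
    private $text = "";        // text collected since the last tag
    public $values = array();  // collected text, keyed by element path

    public function parse($xml) {
        $parser = xml_parser_create();
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
        xml_set_element_handler($parser, array($this, "onStart"), array($this, "onEnd"));
        xml_set_character_data_handler($parser, array($this, "onText"));
        xml_parse($parser, $xml, true);
    }

    public function onStart($parser, $name, $attrs) {
        $this->stack[] = $name;  // push the newly opened element
        $this->text = "";
    }

    public function onEnd($parser, $name) {
        $text = trim($this->text);
        if ($text !== "") {
            // The stack gives the full context of the element being closed.
            $this->values[implode("/", $this->stack)] = $text;
        }
        array_pop($this->stack);  // the element is closed, so drop it
        $this->text = "";
    }

    public function onText($parser, $data) {
        $this->text .= $data;
    }
}

$tracker = new PathTracker();
$tracker->parse("<order><lineitem><item>Condenser</item></lineitem><delivery>Overnight</delivery></order>");
print_r($tracker->values);
```

Here the item text is stored under order/lineitem/item and the delivery text under order/delivery, so elements with the same name at different levels no longer collide.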

Calling up the PHP page displays the following result:

Order THX1138 for customer 3263827:

Item: 3/4" Hex Bolt (C33) 36 @ .35 = 12.6
Item: Condenser (M48) 1 @ 2200 = 2200

Order total: 2212.6

SAX is, in many cases, faster and more efficient than DOM, because it only deals with the information that's relevant at that particular moment rather than keeping the entire tree in memory at once. It may take a little getting used to, but you'll find that it can be an extremely versatile item in your toolbox.
