
Parsing the XML

Before you can do anything with the XML data, you need a way to parse it into the tree view. Parsing is a two-step process:

  1. Locate the elements that you want to parse. An XML document can contain comments and processing instructions as well as elements. In addition, you might not want to start with the root node (Categories, in this case).

  2. Starting from that point, work through the elements, at least until you reach the depth within the document that you want. The example uses recursion for this task and keeps parsing until it has processed the entire document.
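Step 1 can be sketched in C# as follows. This is a minimal sketch, not the article's code: the class name StartingPointSketch, the helper FindFirstElement(), and the sample XML (with its comment and processing instruction) are all hypothetical. It shows that the document-level ChildNodes collection contains the XML declaration, comments, and processing instructions as well as the root element, which is why you locate the element you want before parsing begins.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class StartingPointSketch
{
    // Hypothetical sample: a declaration, a comment, and a processing
    // instruction all precede the Categories root element.
    public const string SampleXml =
        "<?xml version='1.0'?>" +
        "<!-- hypothetical category list -->" +
        "<?app-hint refresh='daily'?>" +
        "<Categories><Hardware Parent='0'/><Software Parent='0'/></Categories>";

    // Skips declarations, comments, and processing instructions at the
    // document level and returns the first element it finds.
    public static XmlElement FindFirstElement(XmlDocument doc)
    {
        foreach (XmlNode node in doc.ChildNodes)
        {
            if (node.NodeType == XmlNodeType.Element)
                return (XmlElement)node;
        }
        return null; // Not reachable for a well-formed document.
    }

    // Lists the node types found at the document level, for illustration.
    public static List<XmlNodeType> TopLevelNodeTypes(XmlDocument doc)
    {
        var types = new List<XmlNodeType>();
        foreach (XmlNode node in doc.ChildNodes)
            types.Add(node.NodeType);
        return types;
    }
}
```

In everyday code, XmlDocument.DocumentElement returns the root element directly; a loop like this matters only when you want to inspect or skip the other document-level nodes yourself.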

Starting the Parsing Process

The sample application makes one assumption about the XML document: Categories is the root node. (You could easily modify the code to handle other document types.) Listing 2 shows how the example starts the parsing process by looking for the Categories node.

Listing 2 Starting the Parsing Process

private void mnuFileParse_Click(object sender, System.EventArgs e)
{
  XmlDocument Doc; // Holds the XML data.
  XmlNodeList Cats; // All of the categories.

  // Load the XML document.
  Doc = new XmlDocument();
  Doc.Load("CategoryList.XML");

  // Get the list of categories.
  Cats = Doc["Categories"].ChildNodes;

  // Parse the list.
  CheckChildren(Cats, null);
}

NOTE

The example leaves out error trapping for the sake of clarity.

The code begins by loading an XML document that resides on the local hard drive, but the technique doesn't depend on where the document resides. XmlDocument.Load() also accepts an HTTP URL, so you could download the XML data from a web site, or you could obtain it as part of a web service call. To use this technique, however, you must have the complete XML document, not just an XML fragment. In addition, the document must be well-formed; otherwise, Load() throws an XmlException.
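Because the listing leaves out error trapping, here is one way it might be added. This sketch assumes the XML arrives as a string and uses LoadXml(); the class SafeLoadSketch, the helper TryLoad(), and its message format are hypothetical. Both Load() and LoadXml() report a malformed document, including a fragment with more than one top-level element, by throwing XmlException. (Load() from a file can additionally throw file-access exceptions such as FileNotFoundException.)

```csharp
using System;
using System.Xml;

public static class SafeLoadSketch
{
    // Hypothetical helper: parses XML text and reports failure through
    // the out parameter instead of letting XmlException escape.
    public static XmlDocument TryLoad(string xmlText, out string error)
    {
        var doc = new XmlDocument();
        try
        {
            doc.LoadXml(xmlText);
            error = null;
            return doc;
        }
        catch (XmlException ex)
        {
            // LineNumber and LinePosition locate the offending markup.
            error = $"Line {ex.LineNumber}, position {ex.LinePosition}: {ex.Message}";
            return null;
        }
    }
}
```

A caller can then show the error text to the user instead of letting the exception end the click handler.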

After the document loads, the code obtains the child nodes of the Categories root node. Skipping the root node saves some processing and avoids a potential source of problems, but you could easily process the root node as well. Finally, the code passes these nodes to CheckChildren(), described in the next section, along with a null parent node, because the top-level nodes have no parent in the tree view.
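If you can't be sure that the root node is Categories, a check like the following avoids a NullReferenceException in Listing 2's Doc["Categories"].ChildNodes expression. The node indexer returns the first child element with the given name, or null when none exists. GetCategoryNodes() and its class are hypothetical, and throwing InvalidOperationException is just one reasonable way to report the problem.

```csharp
using System;
using System.Xml;

public static class RootChildrenSketch
{
    // Same idea as Doc["Categories"].ChildNodes in Listing 2, but verifies
    // that a Categories element exists before using it.
    public static XmlNodeList GetCategoryNodes(XmlDocument doc)
    {
        XmlElement root = doc["Categories"]; // null if no such child element
        if (root == null)
            throw new InvalidOperationException(
                "Expected <Categories> as the root element, found <" +
                doc.DocumentElement?.Name + ">.");
        return root.ChildNodes;
    }
}
```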

Using Recursion for Parsing

Recursion can be a wonderful technique for working with documents of unknown configuration, because you don't have to track individual levels; the code does it for you automatically. The cost is memory. Every recursive call adds a frame to the call stack, so stack use grows with the nesting depth of the document, and XmlDocument already holds the entire document in memory. On most modern machines this isn't a problem: a document would have to be immense, or nested extraordinarily deeply, before you ran out of memory, at which point you should consider other parsing techniques. Listing 3 shows how this example uses recursion for parsing.
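One such alternative is XmlReader, which reads the document forward-only without building a DOM tree or a recursive call chain. This sketch (the class StreamingDepthSketch and its sample markup are hypothetical) counts elements and records the deepest nesting level through the reader's Depth property. It illustrates the technique only; it does not populate a TreeView.

```csharp
using System;
using System.IO;
using System.Xml;

public static class StreamingDepthSketch
{
    // Streams through the XML text once. The root element is at depth 0.
    public static void Measure(string xmlText, out int elements, out int maxDepth)
    {
        elements = 0;
        maxDepth = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xmlText)))
        {
            while (reader.Read())
            {
                // Count start tags (and empty elements); end tags are EndElement.
                if (reader.NodeType == XmlNodeType.Element)
                {
                    elements++;
                    maxDepth = Math.Max(maxDepth, reader.Depth);
                }
            }
        }
    }
}
```

Only the reader's current position is held in memory, which is why streaming suits very large documents better than XmlDocument plus recursion.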

Listing 3 Using Recursion To Parse the XML Document

private void CheckChildren(XmlNodeList List, TreeNode ParentNode)
{
  Int32  NewNode;  // Index of the newly added node.

  // Look at each node.
  foreach (XmlNode ThisNode in List)
  {
    // Determine whether this is a text (value) node.
    if (ThisNode.NodeType == XmlNodeType.Text)
    {
      // Store the value in the parent's Tag property.
      ParentNode.Tag = ThisNode.Value;

      // Go on to the next sibling node.
      continue;
    }

    // Skip comments and other non-element nodes (they have no attributes).
    if (ThisNode.NodeType != XmlNodeType.Element)
      continue;

    // Determine whether this is a top-level node.
    if (ThisNode.Attributes["Parent"].Value == "0")
    {
      // Add the node to the top of the tree.
      NewNode = tvNodes.Nodes.Add(new TreeNode(ThisNode.Name, 0, 1));

      // Process the children, if any. Otherwise, fall through to the
      // next sibling (a return here would skip the remaining siblings).
      if (ThisNode.ChildNodes.Count > 0)
        CheckChildren(ThisNode.ChildNodes, tvNodes.Nodes[NewNode]);
    }
    else if (ThisNode.ChildNodes.Count == 0)
    {
      // Add a subordinate leaf node, then go on to the next sibling.
      ParentNode.Nodes.Add(new TreeNode(ThisNode.Name, 2, 3));
    }
    else
    {
      // Add a subordinate parent node.
      NewNode = ParentNode.Nodes.Add(new TreeNode(ThisNode.Name, 0, 1));

      // Process the next level down.
      CheckChildren(ThisNode.ChildNodes, ParentNode.Nodes[NewNode]);
    }
  }
}

The code begins by taking a shortcut: it uses a foreach statement to select each node in turn for processing, and it continues until it has processed every node at the current level. Recursion can be hard to wrap your brain around, but the CLR keeps each level separate on the call stack: every call to CheckChildren() gets its own copies of List, ParentNode, and NewNode.
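To see how the call stack keeps the levels apart, here is a stripped-down traversal that follows the same foreach-plus-recursion pattern as CheckChildren() but only records each element name, indented by its depth. The class, the depth parameter, and the sample XML are hypothetical additions for illustration. Because depth is a parameter, each call holds its own value in its own stack frame, so no shared level counter is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class RecursionTraceSketch
{
    // Visits siblings with foreach and recurses into each element's
    // children, recording names in the order they are visited.
    public static void Walk(XmlNodeList list, int depth, List<string> output)
    {
        foreach (XmlNode node in list)
        {
            if (node.NodeType != XmlNodeType.Element)
                continue; // Ignore text and comment nodes in this trace.

            output.Add(new string(' ', depth * 2) + node.Name);
            Walk(node.ChildNodes, depth + 1, output); // depth + 1 lives in the new frame
        }
    }
}
```

Each nested call finishes all of its siblings before control returns to the caller, which then resumes with its own next sibling.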

Every call to CheckChildren() begins a new level of processing. Some levels consist of a text node that contains the value of a particular element. The code checks for text nodes first and saves the value in the parent node's Tag property. When an element doesn't have a value associated with it, its Tag property remains null. Consequently, it's easy to tell the elements that hold values (the leaves of the data) from the elements that only group other elements.
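The relationship between an element and its value can be checked directly. In the DOM, an element such as <Printers>Laser and inkjet</Printers> has a single child of type XmlNodeType.Text whose Value is the string, while an element without a value has no text child. GetValue() is a hypothetical helper that returns what CheckChildren() would place in the matching node's Tag property; the sample XML is assumed, not the article's CategoryList.XML.

```csharp
using System;
using System.Xml;

public static class ElementValueSketch
{
    // Returns the Value of the element's text child, or null when the
    // element has none (in which case the Tag property would stay null).
    public static string GetValue(XmlNode element)
    {
        foreach (XmlNode child in element.ChildNodes)
        {
            if (child.NodeType == XmlNodeType.Text)
                return child.Value;
        }
        return null;
    }
}
```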

The top-level nodes, which the example identifies by a Parent attribute value of "0", require special processing. Perhaps you want to assign them a special icon, or simply want to reduce processing time because you know that the top-level nodes always have children. More importantly, the top-level nodes are the nodes to which all other nodes attach in the tree view. Consequently, they don't have any parent nodes, and you must handle them differently.

The code begins by adding the node to the TreeView control, tvNodes. Notice that the code relies on the Add() method and passes it a new TreeNode object. To add icons to the various layers, create an ImageList control and assign it to the tvNodes ImageList property. The example uses four icons: parent node closed (index 0), parent node open (1), leaf node closed (2), and leaf node open (3). Because the TreeNode constructor takes an image index followed by a selected-image index, each "open" icon actually appears when the user selects the node. When the top-level node has one or more children, the code calls CheckChildren() recursively. Otherwise, it ends processing for the current node and moves on to the next node.

Nodes at the second level or deeper might not have any children. When a child node has no children, the code adds it to its parent by using the leaf node icons. Notice that the code adds the node to the parent node, not to tvNodes. Because of this, you don't have to track which level the code is at; the code works directly with tvNodes only when it adds a top-level node.
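Because TreeView is a Windows Forms control, the following sketch ports Listing 3's decisions to a hypothetical TreeItem class so the logic can be exercised without a form. Top-level nodes go into a roots list (standing in for tvNodes.Nodes) and every other node goes into its parent's Nodes list, so no level counter is needed. The port uses continue whenever a node needs no further work, so the remaining siblings are always visited, and it skips non-element nodes such as comments. Treating every node in the first call (where parent is null) as top-level is an added assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical stand-in for the WinForms TreeNode.
public class TreeItem
{
    public string Text;
    public int ImageIndex;
    public int SelectedImageIndex;
    public object Tag;
    public List<TreeItem> Nodes = new List<TreeItem>();

    public TreeItem(string text, int imageIndex, int selectedImageIndex)
    {
        Text = text;
        ImageIndex = imageIndex;
        SelectedImageIndex = selectedImageIndex;
    }
}

public static class TreeBuilderSketch
{
    // Mirrors Listing 3; 'roots' plays the role of tvNodes.Nodes.
    public static void CheckChildren(XmlNodeList list, TreeItem parent, List<TreeItem> roots)
    {
        foreach (XmlNode node in list)
        {
            if (node.NodeType == XmlNodeType.Text)
            {
                if (parent != null)
                    parent.Tag = node.Value;   // Value node: store it on the parent.
                continue;                      // Keep going with the siblings.
            }
            if (node.NodeType != XmlNodeType.Element)
                continue;                      // Skip comments and similar nodes.

            // Assumption: Parent='0' marks a top-level node; the parent == null
            // check additionally treats every first-level node as top-level.
            XmlAttribute parentAttr = node.Attributes["Parent"];
            bool topLevel = parent == null || (parentAttr != null && parentAttr.Value == "0");
            bool hasChildren = node.ChildNodes.Count > 0;

            // Parent icons are 0/1 and leaf icons 2/3; top-level nodes always use 0/1.
            TreeItem item = (topLevel || hasChildren)
                ? new TreeItem(node.Name, 0, 1)
                : new TreeItem(node.Name, 2, 3);

            // Top-level nodes attach to the tree; all others attach to their parent.
            if (topLevel)
                roots.Add(item);
            else
                parent.Nodes.Add(item);

            if (hasChildren)
                CheckChildren(node.ChildNodes, item, roots);
        }
    }
}
```

Note that in this port Printers, whose only child is a text node, still receives the parent icons (index 0), just as it would in Listing 3.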

A child node with children receives the parent node icons. Note that the XML DOM treats an element with a value as a node that has children: the value is stored in a child text node rather than in an element. Consequently, if you want the code to use a leaf node icon for every node that has no element children, you need to peek at the next level to see what kind of children the node contains. As with a top-level node, the code makes a recursive call to CheckChildren() to continue processing the node levels.
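Peeking at the next level only takes a short loop over the child nodes. HasElementChildren() is a hypothetical helper, not part of System.Xml; it returns true only when at least one child is an element, so a node whose only child holds its value counts as a leaf.

```csharp
using System;
using System.Xml;

public static class LeafPeekSketch
{
    // True only if at least one child is an element. A node whose only
    // child is a text (value) node returns false, even though
    // ChildNodes.Count is greater than zero.
    public static bool HasElementChildren(XmlNode node)
    {
        foreach (XmlNode child in node.ChildNodes)
        {
            if (child.NodeType == XmlNodeType.Element)
                return true;
        }
        return false;
    }
}
```

In Listing 3, testing !HasElementChildren(ThisNode) instead of ThisNode.ChildNodes.Count == 0 would select the leaf icons for value-bearing nodes. You would still need to recurse, or read the text child directly, so that the value reaches the Tag property.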
