Creating C++ Interpreters for XML Extension Languages

Fabio Arciniegas, author of C++ XML, shows you how to create C++ interpreters for custom XML extension languages. The process is exemplified with the construction of an XML language to control a graphic processing application, including image manipulation primitives, loops, and conditionals.
This article is excerpted from C++ XML, by Fabio Arciniegas.

The first mechanism for implementing our own scripting (or extension) XML languages on top of C++ applications is to build an interpreter for the XML format into our program. The theory and implementation options behind writing an interpreter can be overwhelming. We are going to stick with a clean and proven--if not the most efficient in every case--pattern called "Little Language."

Creating Program Tree Objects from XML

The following sections describe how to design an XML language, including concepts such as variables and control flow structures, for scripting a program that manipulates images.

Why Use XML for Extension Languages?

Some platforms already define mechanisms for programs to expose interfaces so scripting languages can manipulate them. Furthermore, XML syntax can be verbose, so why create extension languages based on XML?

The answer is twofold. First, from the user's perspective, writing scripts in XML can be much more pleasant and safe than writing Lisp, Visual Basic, or some other scripting language. The user already has the tools to do it, knows the conventions and underlying syntax, and can use a multitude of readily available free tools to generate and audit scripts. Second, from the developer's perspective, using XML brings the promise of portability, transparency, the possibility to easily implement cross-platform extension hooks without relying on any particular platform or middleware (such as CORBA or COM), and many chances for code reuse and robustness (for example, you no longer need to write a low-level parser for the language; you already have SAX to do that).

Overview of the Mechanism

Without getting into the details of programming language theory, it can be said that an expression or program can be seen as a tree, where each element can be evaluated to a certain value, either because it is an atomic "terminal" (such as the number "5" in the arithmetic example) or because it can be evaluated by applying the semantics of the "non-terminal" to its children (for example, the result of an "add" element is determined by adding its two children). Figure 1 further illustrates the point.

Figure 1 Expression As Tree

The evaluation of some nodes may produce side effects (for example, a call to printf produces output according to its arguments).

An XML document, as you well know, is a tree. Therefore, you can represent programs using XML documents where the non-leaf nodes represent functions and control structures, and the leaves represent atomic values. The interpretation of these files will result in object hierarchies such as the one described in the figure, which can be evaluated at runtime.
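As a minimal sketch of this idea, the following C++ fragment models a terminal, a non-terminal, and a node with a side effect. The names Node, Number, Add, Print, and buildExample are hypothetical and chosen only for illustration; they are not the classes used later in the article.

```cpp
#include <cassert>
#include <iostream>
#include <memory>
#include <utility>

// Hypothetical node types for illustration only.
struct Node {
    virtual ~Node() = default;
    virtual int eval() const = 0;
};

// Terminal: evaluates to its own value.
struct Number : Node {
    explicit Number(int v) : value(v) {}
    int eval() const override { return value; }
    int value;
};

// Non-terminal: evaluates both children and adds the results.
struct Add : Node {
    Add(std::unique_ptr<Node> l, std::unique_ptr<Node> r)
        : left(std::move(l)), right(std::move(r)) {
        assert(left && right);
    }
    int eval() const override { return left->eval() + right->eval(); }
    std::unique_ptr<Node> left, right;
};

// Non-terminal with a side effect: prints the value of its child.
struct Print : Node {
    explicit Print(std::unique_ptr<Node> c) : child(std::move(c)) {
        assert(child);
    }
    int eval() const override {
        int v = child->eval();
        std::cout << v << '\n';
        return v;
    }
    std::unique_ptr<Node> child;
};

// Builds the tree for print(add(5, add(2, 3))).
std::unique_ptr<Node> buildExample() {
    auto inner = std::make_unique<Add>(std::make_unique<Number>(2),
                                       std::make_unique<Number>(3));
    auto outer = std::make_unique<Add>(std::make_unique<Number>(5),
                                       std::move(inner));
    return std::make_unique<Print>(std::move(outer));
}
```

Calling eval() on the root returned by buildExample() evaluates the whole tree from the leaves upward and prints the result as a side effect.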

Designing the Language

The testImage program is a powerful set of tools for performing all sorts of image manipulation, including dozens of effects and support for many file types. The winConvert application is a Windows front end to testImage, adding the ability to visually select the files to treat and the options to use.

winConvert exposes only a very small subset of the functionality of the core program; namely, it allows the transformation of the file into three output formats, cropping to a particular rectangle, and adding a message at the top of the image. winConvert is limited, but the goal is to define XML extension languages, not comprehensive image manipulation. So, it will serve your purposes well (for more about all the possibilities of the testImage program and image manipulation software related to it, go to the postgraphy site).

The goal of this section is to create an XML scripting language for applying winConvert functions automatically to several files.

Real-World Examples

The code included and the components it links are the basis of real products. The script manipulation techniques shown here, and the core image manipulation engine, are the foundations of some of the products of my own company, postgraphy.

Philosophy

The first step in the creation of the language is coming up with high-level decisions about what functionality to expose and the paradigm used to do it. In particular, it is important to decide whether XML will reflect the underlying API or provide higher level abstractions.

In this case, the underlying API is basically that of the CImage object (see Figure 2). You could very well expose its methods directly as XML elements, but here it is preferable to expose a higher-level view of the functionality. For example, Image objects provide a method to change their own output format, but instead of saying "save this image as JPG" you want to say "all the save operations from this point on should be done using JPG format". The language must reflect such decisions.

Figure 2 CImage Object

Primitives

Following the philosophy for this language, you come to the following basic primitive functions:

  • Set output format

  • Crop image

  • Annotate image

To provide a way to specify the input filename, you will provide a treat element, which can contain both crop and annotate instances. Turning the above into a DTD, you obtain Listing 1.

Listing 1: convertScript_0_1.dtd

<!ELEMENT convertScript  (setOutputFormat|treat)*>
<!ELEMENT setOutputFormat EMPTY>
<!ATTLIST setOutputFormat 
     to       (JPEG|TIFF|GIF)   #REQUIRED>
<!ELEMENT treat      (crop?,annotate?)>
<!ATTLIST treat 
     file      CDATA        #REQUIRED>

<!ELEMENT crop      (x,y,xf,yf)>
<!ELEMENT x        (#PCDATA)>
<!ELEMENT y        (#PCDATA)>
<!ELEMENT xf       (#PCDATA)>
<!ELEMENT yf       (#PCDATA)>

<!ELEMENT annotate    (#PCDATA)>

Using this language, you can already define somewhat interesting scripts, such as the one in Listing 2.

Listing 2: script_0_1.xml

<?xml version="1.0"?>
<!DOCTYPE convertScript SYSTEM "convertScript_0_1.dtd">
<convertScript>
<setOutputFormat to="JPEG"/>
<treat file="c:\temp\Leira.gif">
<crop>
<x>0</x><y>0</y>
<xf>102</xf><yf>49</yf>
</crop>
</treat>
<treat file="c:\temp\Et.tif">
<annotate>second file</annotate>
</treat>
<treat file="c:\temp\Oeluc.gif">
</treat>
<setOutputFormat to="GIF"/>
<!-- treat some more images here...-->
</convertScript>

Variables and Operators

It would be useful to be able to define variables in your language (for example, to reuse an annotation). In Listing 3, the convertScript DTD is expanded to allow the assignment and retrieval of values to and from variables (through the assign and variable elements). The equal, greater than, and plus operators are also introduced.

Note that I keep the number of operators at a minimum for space and complexity reasons, but the language can be easily extended with other primitives if you so desire.

Listing 3: Adding Variables and Operators to the Language

<!ELEMENT convertScript  (assign|setOutputFormat|treat)*>

<!ENTITY % operator    "equal|plus|greaterThan">
<!ELEMENT variable     EMPTY>
<!ATTLIST variable
     name       CDATA   #REQUIRED>
<!ELEMENT assign     (variable,value)>
<!ELEMENT value      (#PCDATA|variable|%operator;)*>
<!ELEMENT equal      (operand,operand)>
<!ELEMENT plus      (operand,operand)>
<!ELEMENT greaterThan   (operand,operand)>

<!ELEMENT operand     (#PCDATA|variable)*>

<!ELEMENT setOutputFormat EMPTY>
<!ATTLIST setOutputFormat 
     to       (JPEG|TIFF|GIF)   #REQUIRED>
<!ELEMENT treat      (crop?,annotate?)>
<!ATTLIST treat 
     file      CDATA        #REQUIRED>

<!-- now the values of crop can be also complex expressions -->
<!ELEMENT crop      (x,y,xf,yf)>
<!ELEMENT x        (#PCDATA|variable|%operator;)*>
<!ELEMENT y        (#PCDATA|variable|%operator;)*>
<!ELEMENT xf       (#PCDATA|variable|%operator;)*>
<!ELEMENT yf       (#PCDATA|variable|%operator;)*>

<!ELEMENT annotate    (#PCDATA|variable|%operator;)*>

Control Structures

Finally, the language should support at least one control structure, so you add the while statement (again, syntactic sugar such as for, and other control structures such as if, can be added at leisure).

The while element has two children: a condition and a body. Much in the C tradition, the contents of the condition are ultimately evaluated to a value: if the value is 0, the condition is taken to be false; otherwise, the condition is true. Listing 4 shows the final version of the language.

Listing 4: convertScript_1_0.dtd

<!ELEMENT convertScript  (while|assign|setOutputFormat|treat)*>

<!ENTITY % operator    "equal|plus|greaterThan">
<!ELEMENT variable     EMPTY>
<!ATTLIST variable
     name       CDATA   #REQUIRED>

<!ELEMENT assign     (variable,value)>
<!ELEMENT value      (#PCDATA|variable|%operator;)*>
<!ELEMENT equal      (operand,operand)>
<!ELEMENT plus      (operand,operand)>
<!ELEMENT greaterThan   (operand,operand)>

<!ELEMENT while      (condition,body)>
<!ELEMENT condition    (#PCDATA|variable|%operator;)*>
<!ELEMENT body      (while|assign|setOutputFormat|treat)*>

<!ELEMENT operand     (#PCDATA|variable)*>

<!ELEMENT setOutputFormat EMPTY>
<!ATTLIST setOutputFormat 
     to       (JPEG|TIFF|GIF)   #REQUIRED>
<!ELEMENT treat      (crop?,annotate?)>
<!ATTLIST treat 
     file      CDATA        #REQUIRED>

<!ELEMENT crop      (x,y,xf,yf)>
<!ELEMENT x        (#PCDATA|variable|%operator;)*>
<!ELEMENT y        (#PCDATA|variable|%operator;)*>
<!ELEMENT xf       (#PCDATA|variable|%operator;)*>
<!ELEMENT yf       (#PCDATA|variable|%operator;)*>

<!ELEMENT annotate    (#PCDATA|variable|%operator;)*>

Listing 5 shows a script that uses all the constructs defined so far to generate multiple crops of different sizes from a single file.

Listing 5: script_1_0.xml

<?xml version="1.0"?>
<!DOCTYPE convertScript SYSTEM "convertScript_1_0.dtd">
<!-- the following script would be equivalent to the following c++ pseudo-code:
   setOutputFormat(JPEG)
   j = 0;
   while(300 > j)
   {
     f = treatFile("c:\\temp\\face.gif");
     f.crop(0,0,j,j);
     j = j + 50;
   }   
-->
<convertScript>
<setOutputFormat to="JPEG"/>
<assign>
<variable name="j"/>
<value>0</value>
</assign>
<while>
<condition>
<greaterThan>
<operand>300</operand>
<operand><variable name="j"/></operand>
</greaterThan>
</condition>
<body>
<treat file="c:\temp\face.gif">
<crop>
<x>0</x>
<y>0</y>
<xf><variable name="j"/></xf>
<yf><variable name="j"/></yf>
</crop>
</treat>
<assign>
<variable name="j"/>
<value>
<plus>
<operand><variable name="j"/></operand>
<operand>50</operand>
</plus>
</value>
</assign>
</body>
</while>
</convertScript>

This language, as you can see, can already produce somewhat interesting results.

Creating the Object Structure

In order to interpret the programs above, you will construct object hierarchies in which every object exposes an eval() method that implements the logic associated with it (for example, the eval() method of Plus returns the sum of its two operands).

The complete script will be represented by a tree of Terms (the base class for Treat, Crop, and all other possible members of the tree). When the eval() method in the root is called, it will recursively call eval() on its children (and so on) finally evaluating the whole program. The following subsections show the implementation of the different types of terms.
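The article does not list the Term base class or the operator classes, so the following is only a sketch of how they might look. The shape of Term, the helper classes Constant and BinaryOperator, and the choice of non-owning raw pointers are all assumptions made for illustration. Comparison operators return 1 or 0, matching the C-style truth values that the while condition relies on.

```cpp
#include <cassert>

// Assumed shape of the base class; the article names Term but does not show it.
class Term {
 public:
  virtual ~Term() {}
  virtual int eval() = 0;
};

// Hypothetical helper: a literal integer value.
class Constant : public Term {
 public:
  explicit Constant(int v) : value(v) {}
  int eval() { return value; }
 private:
  int value;
};

// Hypothetical common base for two-operand operators.
// It does not own its operands.
class BinaryOperator : public Term {
 public:
  BinaryOperator(Term* l, Term* r) : left(l), right(r) { assert(l && r); }
 protected:
  Term* left;
  Term* right;
};

class Plus : public BinaryOperator {
 public:
  Plus(Term* l, Term* r) : BinaryOperator(l, r) {}
  int eval() { return left->eval() + right->eval(); }
};

class GreaterThan : public BinaryOperator {
 public:
  GreaterThan(Term* l, Term* r) : BinaryOperator(l, r) {}
  // 1 when the first operand is greater than the second, 0 otherwise
  int eval() { return left->eval() > right->eval() ? 1 : 0; }
};

class Equal : public BinaryOperator {
 public:
  Equal(Term* l, Term* r) : BinaryOperator(l, r) {}
  // 1 when both operands evaluate to the same value, 0 otherwise
  int eval() { return left->eval() == right->eval() ? 1 : 0; }
};
```

Because every operator is itself a Term, operators can be nested to any depth, and evaluating the outermost one evaluates the whole expression.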

Modeling the Primitives

The Treat, Crop, setOutputFormat, and Annotate primitives are modeled by making calls to a global CImage variable, as illustrated in Listing 6. Because you are allowing the one-time setting of the output format via the setOutputFormat element, but the Image object is getting updated all the time, you must also keep track of the format value so that every time Treat is called, it can set the Image format to the correct value.

Listing 6: Crop Implementation

class Crop : public Term {
 public:

  Crop(int nx,int ny,int nxf,int nyf) : x(nx), y(ny), xf(nxf), yf(nyf){ }

  // crops the global image to the rectangle given by (x,y) and (xf,yf)
  int eval() 
  {  
    globalImage.crop(x,y,xf,yf); 
    return 1;
  }

 private:
  int x,y,xf,yf;
};
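The format tracking described before Listing 6 could be sketched as follows. The article does not show the setOutputFormat class or the type of the global format variable, so the class name SetOutputFormat, the use of std::string, and the initial value "GIF" are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>

// Assumed shape of the base class; the article names Term but does not show it.
class Term {
 public:
  virtual ~Term() {}
  virtual int eval() = 0;
};

// Stand-in for the global format variable that Treat reads.
// Its real type and default are not given in the article.
static std::string format = "GIF";

// Evaluating this term only records the requested format.
// Every Treat evaluated afterward picks the value up from the global.
class SetOutputFormat : public Term {
 public:
  explicit SetOutputFormat(const std::string& f) : to(f) {
    assert(!f.empty());
  }
  int eval() {
    format = to;
    return 1;
  }
 private:
  std::string to;
};
```

Keeping the format in one shared variable is what gives the element its "from this point on" meaning: the term changes state once, and later terms consult that state when they run.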

Listing 7 shows the implementation of the Treat object, which reads the file, applies all the children underneath it, and then writes the result back, appending a random number to the filename (an imperfect way to avoid overwriting earlier output when a loop forces the same file to be treated more than once).

Listing 7: Treat Implementation

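#include <cstdio>   // snprintf
#include <cstdlib>  // rand
#include <cstring>  // strncpy
#include <vector>

using namespace std;

class Treat : public Term {
 public:

  void setName(const char* newName) 
  {
     // bounded copy so that a long filename cannot overflow name
     strncpy(name, newName, sizeof(name) - 1);
     name[sizeof(name) - 1] = '\0';
  }

  void addTerm(Term *t)
  {
     children.push_back(t);
  }

  int eval() 
  { 
     globalImage.read(name);
     // set the output format; format is a global variable
     // set by the eval method of setOutputFormat
     globalImage.Format(format);
     // now, process all the children, thus modifying the image
     for(size_t i = 0; i < children.size(); i++)
      children[i]->eval();
     // the modifications are complete; save under the original name plus
     // a random postfix, built in a local buffer so that name itself stays
     // unchanged for the next evaluation inside a loop
     char outName[100];
     snprintf(outName, sizeof(outName), "%s%d", name, rand() % 100);
     globalImage.save(outName);
     return 1;
  }

 private:
   vector<Term*> children;
   char name[90];
};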

Modeling the Control Structures

The implementation of while objects also falls compactly into the term-based scheme we have been using. Listing 8 shows the eval() method for this class.

Listing 8: While::eval()

  int eval()
  { 
   while(condition->eval() != 0)
     for(int i = 0; i < children.size();i++)
      children[i]->eval();
   return 1;
  }

Finally, in order to keep the value of variables, maintain a map of names versus values (for simplicity, all variables will be integers):

static map<string,int> vars;

The eval() method of the Variable object fetches values from this table, while the eval() of Assign updates it.
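A minimal sketch of these two classes might look like the following. The constructors, the Constant helper, and the policy of reading an unassigned variable as 0 are assumptions for illustration; the article does not show this code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Assumed shape of the base class; the article names Term but does not show it.
class Term {
 public:
  virtual ~Term() {}
  virtual int eval() = 0;
};

// Table of variable names versus values (all variables are integers).
static std::map<std::string, int> vars;

// Hypothetical helper: a literal integer value.
class Constant : public Term {
 public:
  explicit Constant(int v) : value(v) {}
  int eval() { return value; }
 private:
  int value;
};

class Variable : public Term {
 public:
  explicit Variable(const std::string& n) : name(n) { assert(!n.empty()); }
  // operator[] inserts 0 for an unknown name, so a variable that was
  // never assigned reads as 0 (an assumed policy, not the article's).
  int eval() { return vars[name]; }
 private:
  std::string name;
};

class Assign : public Term {
 public:
  // Does not own the value term.
  Assign(const std::string& target, Term* v) : name(target), value(v) {
    assert(!target.empty() && v);
  }
  // Evaluates the right-hand side first, then stores and returns it.
  int eval() {
    int result = value->eval();
    vars[name] = result;
    return result;
  }
 private:
  std::string name;
  Term* value;
};
```

Because the table is shared, an Assign evaluated inside a while body changes what the loop's condition sees on its next evaluation, which is what lets the loop in Listing 5 terminate.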

Constructing the Term Tree

Using the XMLableFR framework (which is included in the book), you can easily write the SAX2 skeleton class that will act as your object hierarchy builder. Filling it out is a matter of creating instances of Crop, Treat, and so on according to an element name. For space reasons, the builder code for Term trees is omitted from the article (it can be found on http://www.cppxml.com).
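Although the builder itself is omitted, the general technique can be sketched without committing to any particular parser API. The following fragment is not the XMLableFR or SAX2 interface: TreeBuilder, its startElement, characters, and endElement methods, and the simplified Sequence, Operand, and Plus classes are all hypothetical. A real handler would forward the parser's callbacks to methods like these. The sketch keeps a stack of open elements, attaches each new term to the term on top of the stack, and pops on every end tag.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for the article's Term classes (assumed shapes).
class Term {
 public:
  virtual ~Term() = default;
  virtual int eval() = 0;
  virtual void setText(const std::string&) {}  // text is ignored by default
  void addChild(std::unique_ptr<Term> c) { children.push_back(std::move(c)); }
 protected:
  std::vector<std::unique_ptr<Term>> children;
};

// Generic container: evaluates children in order, returns the last result.
class Sequence : public Term {
 public:
  int eval() override {
    int last = 1;
    for (auto& c : children) last = c->eval();
    return last;
  }
};

// Literal operand; assumes the text, if it has digits, is a valid integer.
class Operand : public Term {
 public:
  void setText(const std::string& t) override {
    if (t.find_first_of("0123456789") != std::string::npos)
      value = std::stoi(t);
  }
  int eval() override { return value; }
 private:
  int value = 0;
};

class Plus : public Term {
 public:
  int eval() override {
    int sum = 0;
    for (auto& c : children) sum += c->eval();
    return sum;
  }
};

// Hypothetical builder driven by SAX-style events.
class TreeBuilder {
 public:
  void startElement(const std::string& name) {
    std::unique_ptr<Term> node = create(name);
    Term* raw = node.get();
    if (open.empty()) rootTerm = std::move(node);    // first element is the root
    else open.back()->addChild(std::move(node));     // attach to current parent
    open.push_back(raw);
  }
  void characters(const std::string& text) {
    assert(!open.empty());
    open.back()->setText(text);
  }
  void endElement() {
    assert(!open.empty());
    open.pop_back();
  }
  Term* root() const { return rootTerm.get(); }
 private:
  // Chooses the term class from the element name.
  static std::unique_ptr<Term> create(const std::string& name) {
    if (name == "plus") return std::make_unique<Plus>();
    if (name == "operand") return std::make_unique<Operand>();
    return std::make_unique<Sequence>();
  }
  std::vector<Term*> open;          // elements that are started but not ended
  std::unique_ptr<Term> rootTerm;   // owns the whole tree
};
```

Extending the sketch to the full language would mean adding one branch to create() per element name (Crop, Treat, While, and so on) and passing attributes such as file and name through startElement.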
