
Cutting-Edge Applications in PHP 4.0

Learn everything you need to know about cutting-edge PHP applications, including common data formats and open standards for data exchange, remote procedure calling, platform-independent data storage, how to design, create, and set up a knowledge repository, and how to store and retrieve data.

      If you realize that all things change, 
there is nothing you will try to hold on to. 
              If you aren't afraid of dying, 
         there is nothing you can't achieve.

In this chapter, we're going to delve further into modern Web application topics.

In the first section, "Knowledge Repositories," we create a tip repository featuring user ratings, a hit counter, and unlimited nested categories. You'll learn about tree structures and put into practice the knowledge gained in Chapter 2, "Advanced Syntax."

XML (Extensible Markup Language) is rapidly becoming the most widely used standard for data exchange. Nonetheless, overly generalized explanations ("XML is HTML allowing you to create your own tags") often make it hard to understand the real concepts behind it. We try to explain it in detail and give you a thorough introduction to XML parsing with Expat, the Document Object Model interface (DOM), and LibXML.

WDDX (Web Distributed Data eXchange) provides a means to exchange programming language structures (objects, classes, arrays, and so on) across the Internet. We'll show why this is useful and how to use it in your own applications.

Knowledge Repositories

In the corporate environment, a clear trend emerged during the last few years: away from product-based planning and toward customer-focused strategy. With this trend, a new technology gained widespread publicity and success: knowledge management.

For a company that wants to have a strategic advantage over its competitors, it's necessary to organize corporate knowledge in a way that makes it easily accessible by anyone, all the time. With the dawn of enterprise intranets, this topic was made more current than ever.

In traditional intranets, information is often hard to find because it's spread to many different pages and coming from many different sources. The information that's actually there is often quite useless because it's not indexed and not broken down into smaller logical units, making it hard to search.

What can a company do to solve these problems efficiently? The key lies in proper knowledge management. A lot of companies offer sophisticated solutions, but you may also consider developing your own tools—simpler and therefore easier to use than commercial solutions, or better fitting your company's strategy.

We have a starting point for you. On the CD-ROM to this book, you'll find the full source code for a knowledge database, which could easily be transformed into a support repository or a corporate link directory. The system was originally developed for Zend Technologies, but they have been kind enough to let us distribute it with our book.

The application has a wide range of features: full-text search, an unlimited number of categories and subcategories, a report showing all tips by a specific author and authors with the most submissions, user rating of entries, user submission entries, and more.

The system was realized using the PHPLib for database abstraction and HTML templates. Therefore, the following walkthrough will also give you a thorough overview of application development with the PHPLib. Figure 7.1 shows the application's start screen.

Figure 7.1. The knowledge repository's start screen.

Requirement List

As discussed in Chapter 3, "Application Design: A Real-Life Example," a project starts with compiling the requirements. Usually this is an iterative process in close collaboration with the customer, and is often handled by a systems analyst, project manager, or consultant, often also by programmers. Analyzing the problem domain and writing a requirement list is one of the most important phases in software development, which will in substantial part determine the success of the project. This project was started with a requirement list provided by Zend Technologies.

A software system should be developed to organize PHP facts, tips, and hints, in an easily browseable and searchable manner. The first page of the application should show the available categories below the root category and a list of newly added entries to the database. By clicking on one of the categories, the user can browse the entries below this category. By clicking on an entry, the user gets to a page showing the details of an entry: the title of the entry, the full text, the author's name, the date the entry was added, and the current rating. This page should also make it possible to rate the entry on a scale from one to five, with one being the highest rating.

The software should have a full-text search feature, using AND as default concatenation operator: If the user enters "imap connect" the system should return all entries in the database having "imap" and "connect" in their title or body. Search should not be case sensitive.
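To pin down what the default AND concatenation means, here is a minimal sketch in plain PHP. It is not the application's search code (which works against the database); the function name kb_demo_matches_keywords() is our own invention for this illustration, and treating keywords as plain substrings is an assumption.

```php
<?php
// Hypothetical helper: returns true if every whitespace-separated keyword
// occurs (case-insensitively, as a substring) in the title or the body.
function kb_demo_matches_keywords($query, $title, $body)
{
    $haystack = strtolower($title . "\n" . $body);
    $keywords = preg_split('/\s+/', strtolower(trim($query)), -1, PREG_SPLIT_NO_EMPTY);

    if (count($keywords) == 0) {
        return false; // assumption: an empty query matches nothing
    }

    foreach ($keywords as $keyword) {
        if (strpos($haystack, $keyword) === false) {
            return false; // AND semantics: one missing keyword rejects the entry
        }
    }
    return true;
}
```

With this sketch, the query "imap connect" matches an entry titled "IMAP tips" whose body mentions "Connect", but not an entry that contains only one of the two words.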

It should be possible to retrieve all entries submitted by a certain author. Three additional reports should be available, showing the authors with the most entries in the database, the entries with the highest ratings, and the entries accessed most often.

Only registered users should be able to submit new entries. Submitted entries shouldn't be visible, but should be inserted into the database with a flag indicating that they need to be approved. The administrator should be notified when a new entry is submitted.

On the Zend.com site, the PHPLib is already in use. The system should therefore use the PHPLib for session management, database access, and templates. PHPLib's Template class should be used to separate code from layout. The system should expose a clean API, as it would be maintained later by different developers at Zend Technologies.

It's not typical that such a detailed requirement list is provided by the customer. Often, customers won't know how business problems may translate into software applications. The customer is not an expert in software development, but he or she knows about the problem domain. During the first discussions with the customer, the analyst usually compiles a requirement list from the problem domain—"What is the application for?" and "What should the application do for the user?" are typical questions in this stage. It's then the analyst's task to help the customer to express the problem in terms appropriate to software solutions. During the analysis phase, the analyst learns more about the problem and can put it into concrete and documentable terms.


The requirement list gives you a general understanding of the problem. Once you have that, it's time to create guidelines for the actual implementation: Write a specification. The first step for this is to explore the data structures needed.

Try to break up the complex problem into smaller structures. By analyzing the requirement list and the problem domain, it becomes clear that there are three important data structures; the rest of the application is built on them. The most important structure is an entry in the knowledge repository. What forms an entry? From the requirement list, we know that an entry has associated properties: a title, the body, author, category, ratings, and logs. Because we have solved similar problems before, we see a design pattern in this data structure: It's a simple container. But we know that we'll need a way to reference this container. (Drawing from past experience is very important and can distinguish a really good programmer from a mediocre one. The toughest problem is easy to solve if you have already solved it earlier.)

Creating the data structure for the category follows a similar procedure, but initially all we know about it is that the structure should have an associated "name" property. The requirement list says that we need nested categories, so this structure needs a unique identifier as well (two categories in different branches could have the same name). Unlimited nesting of categories is also a requirement, but we'll skip this for the moment because it's a separate problem.

The third data structure is already predefined, as we use the PHPLib. It simply maps to PHPLib's Auth class.

This approach is different from traditional top-down engineering and functional decomposition. Functional decomposition identifies the functions of a system being built (in our example, "organize facts in categories," "provide reports for most accessed entries," etc.) and breaks them into smaller subfunctions until the functions are atomic and can be mapped directly to program functions. At this time, we make no attempt at doing this, as we have no idea how to define subfunctions as yet. Functional decomposition sounds great—until you try it. It can lead you in a totally wrong direction, and once you're on the way it's practically impossible to correct decisions because you can only divide the function again and again—you'd have to start over completely.

Instead, we try to break the whole problem into design patterns: We try to recognize problems that we have already solved once. Already knowing the solution to a similar problem is the best method for problem solving. For example, we don't need to do any sort of functional decomposition on the authentication problem—we know that we can simply use parts of the PHPLib for this.

The entry and category structures can already be mapped to code. Our application keeps the entry and category structures in classes:

class category
{
    var $cat_id, $cat_name, $parent_id;
}

class entry extends category
{
    var $entry_id, $title, $body, $t_stamp, $author, $views, $votes, $rating;
}

Using classes was a design decision and is not implied by the requirement list. Our previous experience shows that using classes leads to cleaner code because you can have multiple separate instances.

You can see these data structures as different "domains" of the software. They form logical units but interact with each other. The goal of the specification document is to cover all domains in an application. How the domains are represented in code is not important at this point; the classes above serve purely to illustrate the structures.

Translating these data structures to a relational model is rather simple. The application uses database tables to store the entries in the knowledge repository, the available categories, entry ratings, and access logs. The two main tables are entries and categories, with links to the sub-tables ratings and logs. Of course, because the MySQL setup used here doesn't enforce foreign keys, these ties need to be handled in application space. For example, if the administrator wants to delete an entry, he also needs to delete the corresponding rows (referenced by the same entry_id) in the tables ratings and logs. Figure 7.2 shows an entity relationship diagram for the table structure.

Figure 7.2. Entity relationship diagram (ERD) for the knowledge repository application.
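As a rough sketch of handling these ties in application space, the function below builds the DELETE statements needed to remove one entry together with its dependent rows. The table names and the entry_id column are taken from the description above; the function name is hypothetical, and actually executing the statements (for example, through PHPLib's database class) is left out.

```php
<?php
// Hypothetical helper: returns the statements that remove an entry and
// everything that refers to it. Dependent rows are deleted first.
function kb_demo_delete_entry_statements($entry_id)
{
    // Cast to integer so that the ID can be embedded in SQL safely
    $id = (int) $entry_id;

    return array(
        "DELETE FROM ratings WHERE entry_id = $id",
        "DELETE FROM logs WHERE entry_id = $id",
        "DELETE FROM entries WHERE entry_id = $id",
    );
}
```

Keeping all three statements in one place means that callers cannot forget the sub-tables when they delete an entry.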

Now that you know the application's basic data structures, the next question is what happens with them. From the requirement list, we know a series of actions that the application should allow. It's now our task to cleanly separate these tasks.

Usually, actions are grouped around the data structures defined earlier. Let's first focus on actions dealing with entries in the database:

  • Retrieve a specific entry. This action needs to know the identifier of the entry to be retrieved. The action can fail if there's no entry matching the identifier in the database. It can also fail if system failures occur; for example, if the database system is inaccessible. In case of success, the action returns a structure for an entry.

  • Retrieve entries in a specific category. This action needs to know the name of the category for which entries should be retrieved. It can fail in case of an error, or return a list of valid entries. A list of entries? Wait, we have no such data structure defined yet. Time to return to the specification and add a new structure for lists.

  • Retrieve the ten most-recently-added entries.

  • Retrieve top-rated entries.

  • Retrieve the most-accessed entries.

  • Retrieve all entries submitted by a certain author.

And so forth, for all outlined actions and domains. This will result in a comprehensive list of needed structures and actions, documenting the whole project.
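The first two actions can be sketched as follows. This is only an illustration of the success and failure cases described above: the entries live in an in-memory array instead of the database, all function names are made up, and the category is identified by its ID rather than its name (an assumption, because names need not be unique).

```php
<?php
// Hypothetical in-memory stand-in for the entries table, keyed by entry_id.
function kb_demo_entries()
{
    return array(
        1 => array("entry_id" => 1, "cat_id" => 2, "title" => "Connecting to IMAP"),
        2 => array("entry_id" => 2, "cat_id" => 2, "title" => "Sorting arrays"),
        3 => array("entry_id" => 3, "cat_id" => 3, "title" => "Sending mail"),
    );
}

// Retrieve a specific entry; returns false if no entry matches the identifier.
function kb_demo_get_entry($entries, $entry_id)
{
    return isset($entries[$entry_id]) ? $entries[$entry_id] : false;
}

// Retrieve the entries in a category; returns a plain list, possibly empty.
function kb_demo_get_entries_by_category($entries, $cat_id)
{
    $list = array();
    foreach ($entries as $entry) {
        if ($entry["cat_id"] == $cat_id) {
            $list[] = $entry;
        }
    }
    return $list;
}
```

Here a list of entries is simply a numerically indexed array of entry structures, which is one possible answer to the missing list structure noted above.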

To summarize: We've created a requirement list, describing the problem domain and the features the application should have. Putting the requirement list into more concrete terms resulted in a specification. The specification describes data structures and behavior of the application.

After that, it's time to look at details of the application's implementation. We pick two interesting points here, namely the use of templates and the implementation of nested categories in SQL.

The Template Class

As shown in Chapter 6, "Database Access with PHP," the PHPLib offers a solution for many problems common to Web applications. In our case, the Zend developers already used the PHPLib for parts of their site, so we standardized on it for session management, database abstractions, and HTML templates.

PHPLib's Template class allows separation of code and layout, similar to the EasyTemplate class we developed in Chapter 5, "Basic Web Application Strategies." This class has a richer feature set than our class; for example, it can contain blocks, which mark sections to be replaced more than one time (useful for rows in tables, for example), and it can open multiple files in one instance and combine them easily. The drawback is that it's less intuitive to use than EasyTemplate.

The Template class is completely separate from the rest of the PHPLib, and you can use it without using any other PHPLib features. In case you're interested in looking at the source code for it, you can find it in the file template.inc in the PHPLib distribution.

Just like EasyTemplate, PHPLib's template class keeps HTML in separate files, using placeholders for data that should be substituted dynamically by PHP. "Scalar" placeholders, which will get replaced by ordinary strings, have the same format as the ones in EasyTemplate.

The code in Listing 7.1 processes this simple template.

Listing 7.1. Basic example of the Template class.

// Create a template instance
$tpl = new Template();

// Load file, assign an identifier to it
$tpl->set_file("page", "basic_template.inc.html");

// Assign contents to the placeholders
$tpl->set_var(array("TITLE" => "This is a Template test",
                    "CONTENTS" => "Hello World!"));

// Parse into a temporary variable (identifier)
$tpl->parse("out", "page");

// Output the parsed template
$tpl->p("out");

The first line creates an instance of the Template class. The constructor of the class takes two optional arguments. The first optional argument specifies a base directory where your template files reside (the default is the current directory, ./); the second argument defines how to handle placeholders that aren't assigned a value in your script. This can be one of keep, comment, or remove, the default value being remove. If set to keep, placeholders are retained: if our example didn't assign a value to the {TITLE} placeholder, the placeholder would show up as is in the parsed template. Setting the argument to comment would produce the following output in our example if the {TITLE} placeholder weren't assigned a value:

 <title><!-- Template : Variable TITLE undefined --></title>
  Hello World!

Setting the variable to remove (the default) would silently delete unassigned placeholders from the template.
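The three policies can be modeled in a few lines of plain PHP. This is a toy model for illustration, not PHPLib's own code; the function name demo_finish() and the assumption that a placeholder is any brace-enclosed name without whitespace are ours.

```php
<?php
// Toy model of the keep/comment/remove policies for unassigned placeholders.
function demo_finish($text, $policy = "remove")
{
    // Assumption: a placeholder is {NAME}, with no whitespace inside the braces
    $pattern = '/\{([^ \t\r\n}]+)\}/';

    switch ($policy) {
        case "keep":
            // Leave unassigned placeholders in the output as they are
            return $text;
        case "comment":
            // Turn each unassigned placeholder into an HTML comment
            return preg_replace($pattern, '<!-- Template : Variable $1 undefined -->', $text);
        default:
            // "remove": silently delete unassigned placeholders
            return preg_replace($pattern, '', $text);
    }
}
```

Calling demo_finish() on a title line that still contains {TITLE} shows the difference between the three settings at a glance.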

The next line in the example uses set_file() to assign a template file to the class. This function takes as first argument a handle under which the template file will be referenced in later functions. The second argument is the filename; the file will be searched in the path specified in the constructor (in our example, the current directory). Alternatively, you can pass an associative array to the set_file() function, assigning multiple files at once. In that case, the keys of the array are the handles, and the elements define the actual filenames.

After this, strings are assigned to the template's placeholders using set_var(). Again, you can pass a single key/value pair to this function, or an associative array for batch processing. The remaining part of the example invokes the parsing function (parse()) and prints the result (p()).

The example shows the most basic usage of the Template class; it works with only one template file and replaces each placeholder with one string variable. You could have used EasyTemplate for this as well, and it would probably have been faster and more intuitive than the PHPLib approach. However, the Template class shows its full strength when used in more complex scenarios.

One of the more advanced features is that the Template class can handle multiple template files and combine them into one output file. The knowledge repository application uses this extensively: One page template defines the general look and feel, the Cascading Style Sheets, and the page header and footer, and many smaller templates form the contents within the "parent" template. Look at these excerpts from the application's main page, index.php:

$tpl->set_file(array(
     "page" => "page.inc.html",
     "table" => "table.inc.html",
     "entry_summary" => "entry_summary.inc.html"
));

// [rest of code, assignments, etc]

$tpl->parse("CONTENTS", "table", true);
$tpl->parse("CONTENTS", "entries", true);
$tpl->parse("CONTENTS", "page");

The main template is referenced with the identifier page. This is basically an HTML framework, containing one important placeholder, {CONTENTS}, for the actual contents of the page. This placeholder will be replaced by another, separate template file, referenced as table. This works because the Template class allows you to append the results of a round of parsing to a placeholder. The script first parses the template file referenced by table and then appends it to the main template.
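To make the idea of appending parsed output to a placeholder more concrete, here is a toy model in plain PHP. DemoTemplate is a made-up class that only imitates the behavior described above; it is not PHPLib's Template class, and the template strings are invented for the example.

```php
<?php
// Toy model: every handle (file, placeholder, or parse result) is just
// a named piece of text.
class DemoTemplate
{
    var $vars = array();   // handle => text

    function set_var($handle, $text)
    {
        $this->vars[$handle] = $text;
    }

    // Replace every known {HANDLE} in the text stored under $handle
    function subst($handle)
    {
        $pairs = array();
        foreach ($this->vars as $name => $value) {
            $pairs["{" . $name . "}"] = $value;
        }
        return strtr($this->vars[$handle], $pairs);
    }

    // Store the substituted text under $target, appending if requested
    function parse($target, $handle, $append = false)
    {
        $result = $this->subst($handle);
        if ($append && isset($this->vars[$target])) {
            $result = $this->vars[$target] . $result;
        }
        $this->vars[$target] = $result;
        return $result;
    }
}

$tpl = new DemoTemplate();
$tpl->set_var("page", "<body>{CONTENTS}</body>");
$tpl->set_var("table", "<table>{ROWS}</table>");
$tpl->set_var("ROWS", "<tr><td>tip</td></tr>");

$tpl->parse("CONTENTS", "table", true);  // child first, into the parent's placeholder
$html = $tpl->parse("out", "page");      // then the parent itself
```

The order matters: the child template has to be parsed into CONTENTS before the parent is parsed, because the parent only sees whatever text CONTENTS holds at that moment.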

Looking at another template file of the application, entry_summary.inc.html, you'll see another advanced feature of the Template class: blocks. Dynamic blocks are used for parts of a template that are repeated, with each repetition filled in separately. In our case, this is used to display the last five entries in the knowledge base. The entry_summary.inc.html template contains a block, which gets repeated to produce five entry summaries. A block is defined in the template using a comment syntax:

<!-- BEGIN blockname -->
<!-- END blockname -->

In code, the block is accessed using the set_block() function. The first argument to this function is the parent reference, usually a reference to the template file. The second argument is the block's name. The optional third argument is the name of a new reference; if omitted, it's assumed to be the same as the block's name. For our example, the set_block() call would look like this:

$tpl->set_block("table", "blockname");

The resulting reference (blockname) can then be handled like references produced with set_file(), and parsed regularly. Admittedly, this is confusing when you first hear it. Let's see how the Template class works conceptually. The central concept of the Template class is the handle. Handles are similar to link identifiers (resource IDs): They point to a certain dataset and can be used in various functions as a reference to this dataset. You can create handles using one of three methods:

  • set_file() creates a handle for a template file

  • set_var() creates a handle for a placeholder inside a template

  • set_block() creates a handle for a block inside a template

With each of these functions, you can specify the handle that should be created. In set_file() and set_var(), the handle to create is the first argument (or the keys of the array, if an associative array is passed as argument). In set_block(), the handle is the second argument. In functions like parse(), subst(), or get_undefined(), you use the previously created handles to reference the dataset the functions should process. For these functions, it doesn't matter how the handle was created; they work on whole template files as well as on placeholders or dynamic blocks. Let's look at a simpler example again. Say you've got one template file with one placeholder and one dynamic block:

{PLACEHOLDER}
<!-- BEGIN block -->
{BLOCK_PLACEHOLDER}
<!-- END block -->
To parse this, you first define a handle for the whole file using set_file(). The normal placeholder can be treated normally, just as we've shown earlier. Then you define a block handle. This block can now be treated the same way as you would treat the file handle itself—it's an equally important, independent division inside the file. Therefore, you can also combine the two handles as we've done earlier with the two separate files. In code, this would look like the following:

$tpl->set_file("page", "page.inc.html");

// Assign value to scalar placeholder
$tpl->set_var("PLACEHOLDER", "This is just a test.");

// Create block handle "block"; its place in the page is taken by {block_handle}
$tpl->set_block("page", "block", "block_handle");

// Create three block instances
for($i=0; $i<3; $i++)
{
    // Replace placeholder for this loop iteration
    $tpl->set_var("BLOCK_PLACEHOLDER", "Loop #$i");

    // Parse block, append the result to block_handle
    $tpl->parse("block_handle", "block", true);
}

// Parse and output page
$tpl->parse("out", "page");
$tpl->p("out");
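If you'd like to see what happens to a block conceptually, the following plain-PHP sketch cuts a block out of a template string and repeats it once per value. It is a simplified model under our own assumptions (one placeholder per block, the made-up function demo_repeat_block()); PHPLib's set_block() and parse() are more general.

```php
<?php
// Toy model of a dynamic block: find the block, repeat its body once per
// value, and put the repeated rows where the block used to be.
function demo_repeat_block($template, $block, $placeholder, $values)
{
    $name    = preg_quote($block, '/');
    $pattern = '/<!-- BEGIN ' . $name . ' -->(.*?)<!-- END ' . $name . ' -->/s';

    if (!preg_match($pattern, $template, $match)) {
        return $template; // no such block: leave the template untouched
    }

    $rows = "";
    foreach ($values as $value) {
        // One instance of the block body per value
        $rows .= str_replace("{" . $placeholder . "}", $value, $match[1]);
    }

    // Replace the whole block, markers included, with the repeated rows
    return str_replace($match[0], $rows, $template);
}

$template = "<ul><!-- BEGIN block --><li>{BLOCK_PLACEHOLDER}</li><!-- END block --></ul>";
$html = demo_repeat_block($template, "block", "BLOCK_PLACEHOLDER",
                          array("Loop #0", "Loop #1", "Loop #2"));
```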

This gives the designer the possibility to define row templates without having to deal with any PHP code. While this adds flexibility to the designer's task, there are still certain scenarios in which you have no other way than to mix code and layout again. An example for this is our application's search results page. In the code for this page, you'll find this section:

$entries = kb_get_entries_by_keyword($keywords);
// Any entries found?
if(!empty($entries))
{
    $tpl->set_block("tip_summary", "tip", "entries");
    kb_entries_to_template($entries, $tpl);

    $tpl->set_var(array(
            "RESULTS_TITLE" => sprintf(count($entries). " %s found:", count($entries) > 1 ? "entries" : "entry"),
            "KEYWORDS" => $keywords
    ));
}
else
{
    $tpl->set_var("MESSAGE",  '<div align="center"><i>No entries found.</i></div>');
    $tpl->parse("entries", "tip", true);
}

The code checks whether entries matching the search term have been found in the database, and displays either a message stating that no entries have been found, or the listing of found entries. In the listing, the code also formats the message according to whether more than one entry is shown ("1 entry found" versus "x entries found"). This is clearly a layout issue, though—the number of found entries doesn't influence the application logic at all. Therefore, in an ideal world, the designer would be able to provide these messages. Maybe the designer would want to format the "No entries found" message as bold red, and the number of found entries as large and bold. In our case, the designer would have to kindly ask the programmer to implement it—after the third change, this becomes pretty frustrating for both designer and programmer.

One approach to solving this problem is to give the template some control back, and let the designer decide on template logic. Templates would then contain a simple meta scripting language, looking like this:

{{if ENTRIES_FOUND > 1}}
    {{ENTRIES_FOUND}} entries found:
    One entry found:
    No entries found for your search!

Of course, it's a fine line between separation of code and layout and mixing them again. Do you prefer layout in the code or code in the layout? It's a chicken-and-egg problem. At the time of this writing, first efforts were underway to create a template API for the standard PHP distribution. The meta script example above is taken from Andrei Zmievski's draft for a template language. Andrei (proponent of the template API and PHP core developer) plans to implement a number of other features; for example, standard predefined variables #ODD or #EVEN inside dynamic blocks. This would make it possible to implement the popular color changes in repeated table rows—which otherwise would need to be handled by the programmer again. Andrei plans to integrate the template API directly into PHP, which would offer a number of advantages over current template solutions, like PHPLib's. First, it would be standard, and software developers could depend on it. Second, as it would be tightly integrated into PHP's core engine, it could be a major performance boost. Parsed templates could be cached in memory, for example.

Recursion with SQL

Our application allows unlimited nesting of categories. We have chosen the most basic and most easy-to-implement solution for nesting categories. The categories table is defined as follows:

CREATE TABLE categories (
   cat_id bigint(21) DEFAULT '0' NOT NULL auto_increment,
   name varchar(32) NOT NULL,
   parent_id bigint(21) DEFAULT '0' NOT NULL,
   PRIMARY KEY (cat_id),
   KEY parent_id (parent_id)
);

The field responsible for the nesting is of course parent_id; it contains the cat_id value of the category one level above. Actually, this is the most basic tree implementation possible: Each node has exactly one property referencing the parent node. There are a number of drawbacks with this approach, though, the most important being that it's impossible to get all parent nodes for a node with one SQL query. Instead, to get all parents, you need to issue multiple SQL queries; to be exact, you need n-1 single queries for a depth of n levels.

We have chosen a recursive implementation to get the parent nodes in the function kb_cat_get_parents(). It retrieves category nodes from the database until it reaches the root category. This can best be visualized with an example. Let's assume there are three nested categories:

INSERT INTO categories VALUES (1, 'Main Category', 0);
INSERT INTO categories VALUES (2, 'Sub Category I', 1);
INSERT INTO categories VALUES (3, 'Sub Category II', 2);

When called with an initial cat_id value of 3, the kb_cat_get_parents() function first retrieves the parent ID for this node (which is 2 in our example). Then it calls itself with this ID, forming a recursive function. The terminator of the recursive function is the condition parent_id == 0—this is the root category, and no nodes can be above the root category. The function will call itself recursively as long as this condition is not met.
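The recursion can be sketched without a database. In the following illustration, an array stands in for the categories table (its rows mirror the three INSERT statements above), and each array lookup stands for one SQL query. The function names are made up; this is not the source of kb_cat_get_parents().

```php
<?php
// In-memory stand-in for the categories table: cat_id => row.
function demo_categories()
{
    return array(
        1 => array("name" => "Main Category",   "parent_id" => 0),
        2 => array("name" => "Sub Category I",  "parent_id" => 1),
        3 => array("name" => "Sub Category II", "parent_id" => 2),
    );
}

// Hypothetical recursive lookup: returns the cat_ids above $cat_id,
// nearest parent first.
function demo_cat_get_parents($categories, $cat_id)
{
    if (!isset($categories[$cat_id])) {
        return array(); // unknown category: nothing to report
    }

    $parent_id = $categories[$cat_id]["parent_id"];
    if ($parent_id == 0) {
        return array(); // terminator: the root category has no parent
    }

    // Record the parent, then let the function climb one level higher
    return array_merge(array($parent_id),
                       demo_cat_get_parents($categories, $parent_id));
}
```

Called with a cat_id of 3, the sketch returns the IDs 2 and 1, in that order.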


The requirements for the application include that only registered users should be allowed to submit new entries. Thanks to the PHPLib, adding authentication to the system is a matter of adding a page_open() call to the script you want to protect—in our case, submit.php. This way, the user is only able to access the page contents after having been authenticated.

In the entries table, we store the unique user ID provided by the PHPLib. As we've shown earlier in this chapter, you can access this ID using the $auth object: It's stored in $auth->auth["uid"], and $auth->auth["uname"] contains the username.

Our application doesn't deal with user management. Somewhere else on your Web site, you have to provide means to register as a user, edit registrations, send forgotten passwords, etc. Because it's up to you to implement this, we also have no chance to get the full name of a user; all we have is a unique user ID and the username. Therefore, there's a function called real_user_name(), which takes a user ID as parameter and should return a full name for this user. By default, the function simply returns the user ID again; you should extend the function to look up the user's full name in your database and return it.
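One way to extend the function is sketched below. The lookup table, the user ID, and the name in it are invented for the example; in a real installation, demo_user_directory() would be replaced by a query against your own user database.

```php
<?php
// Hypothetical stand-in for your own user database: user ID => full name.
function demo_user_directory()
{
    return array("u42" => "Jane Doe");
}

// Returns the full name for a user ID, or the ID itself if no name is
// known (the fallback matches the default behavior described above).
function real_user_name($user_id)
{
    $users = demo_user_directory();
    return isset($users[$user_id]) ? $users[$user_id] : $user_id;
}
```

Falling back to the ID keeps the reports usable even for users whose full name cannot be found.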

The Finished Product

The real_user_name() function and all other API functions are held in one central file, lib.inc.php. As all functions have a basic syntax documentation in the source, it's easy to compile an API overview automatically. All we need is a simple grep command:

grep '^[\\\/ ]*\*' lib.inc.php

This is no replacement for complete technical documentation, however, and should serve only as a quick reference. After having defined the API, the rest of the application deals mostly with invoking the API functions and printing the results.
