In my last article, "Real-Time Sports Scores as a Service," we constructed a functional, if unremarkable, way to distribute sports scores to wayward employees throughout a company. Although our solution met the requirements that we set, it runs on a single point of failure, its data is insecure, and it suffers from limitations that would quickly keep it from scaling.
The one thing that this approach does not suffer from is extensibility. Let's pretend that the bookmaking quickly takes off in the profitable way that only Web businesses seem to manage by accident. The creators (that's us) quit their jobs and move to the Bahamas with a plan to expand the gambling empire and thus the infrastructure of their application. What would be the initial targets for improvement?
The most glaring weakness is the method of data transfer. Query strings are fine for returning entries in your remote address book that begin with W and not the ones that begin with X (who was Xander, anyway?), but they fall short on performance and extensibility. Any new piece of data to be handled would require distributing the correct query string parameter to anyone who wanted to provide information. If only there were a way to publish our data format and allow providers to expand their offerings as their capabilities increased. Now there is. With all the talk of XML as the solution to our ills, I would be remiss if I did not establish a standard score-reporting XML format (not guaranteed to comply with any standard but its own).
XML is a technology that is developing at the perfect time to capitalize on the distributed nature of the Web and associated services. Falling under the heading of "You've come a long way, flat file," XML provides an ever-increasing array of tools for creating, validating, and manipulating data in a nonproprietary, human-readable format. For the purpose of expanding our profitable message-passing business, XML makes a perfect package, for a couple of very important reasons.
First, let's do a generic survey of some of the more familiar uses. The weakness of HTML, especially as an interface standard for something as important as the Internet, is that, without compilation, developers are free to do it as badly as they choose, with minimal consequences. What a browser does not understand, a browser tends to ignore. With the addition of a rigorous data form (XML) creating XHTML, HTML can be verified as correct.
Extending this a bit further is something called an XSL style sheet. For those of you familiar with cascading style sheets (CSS), which apply branding to a Web site in the form of font sizes and background colors, XSL takes this to the next logical level and applies a sort of structural style to data. Among other benefits, this brings code reuse. Now adding the side navigation bar and the top navigation bar to another page is a matter of applying the appropriate transformations. Letting the user leave out the bar on the left, or place it on the right instead, is a matter of applying a different transformation. In addition, similar style sheets can be used to limit data to the capabilities of the consumer. For example, pagers receive only text, while the Jumbotron can receive signals to display fireworks when a home run is hit.
XML as Data Formatting
Our needs are less interface-related and more strictly data-related. The weakness of all binary formats, including Java serialization, is their proprietary basis. Whatever packs the data must also unpack the data. This operation must be done in the right order, with the correct parameter byte width, the correct software version, and so on. The weakness of all previous text-based data formats, other than size and speed, is that text must be used to delimit textual data. Not much progress has been made since comma-delimited files, and everyone has their own strategy for dealing with names: "[First name], [Middle Name/No middle name], [Last Name], [Suffix/No Suffix]." With XML, it becomes a simple matter of this:
<NAME>
  <FIRST_NAME>Gordon</FIRST_NAME>
  <LAST_NAME>Lightfoot</LAST_NAME>
  <SUFFIX_NAME>III</SUFFIX_NAME>
</NAME>
Adding a middle name is only a matter of another element named <MIDDLE_NAME>. Handling a different combination becomes simple:
<NAME>
  <FIRST_NAME>Prince</FIRST_NAME>
</NAME>
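To see why the optional elements are painless for the receiving side, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The function name parse_name and the NAME_PARTS tuple are my own inventions for illustration; only the element names come from the examples above.

```python
import xml.etree.ElementTree as ET

# Element names follow the article's examples; MIDDLE_NAME is the optional
# element mentioned in the text. The tuple itself is an illustrative assumption.
NAME_PARTS = ("FIRST_NAME", "MIDDLE_NAME", "LAST_NAME", "SUFFIX_NAME")


def parse_name(xml_text):
    """Return a dict holding only the name parts present in the document."""
    root = ET.fromstring(xml_text)
    parts = {}
    for tag in NAME_PARTS:
        value = root.findtext(tag)  # None when the element is absent
        if value is not None:
            parts[tag] = value.strip()
    return parts


full = parse_name(
    "<NAME> <FIRST_NAME>Gordon</FIRST_NAME> <LAST_NAME>Lightfoot</LAST_NAME>"
    " <SUFFIX_NAME>III</SUFFIX_NAME> </NAME>"
)
single = parse_name("<NAME> <FIRST_NAME>Prince</FIRST_NAME> </NAME>")

print(full)
print(single)
```

Missing elements simply do not show up in the result, so no placeholder such as "No middle name" is ever needed.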
Again, the scoring example intrudes. Sure, as long as we stick to major league baseball, data is relatively obvious:
<NEW_GAME start_time="8:00 PST">
  <TEAM location="Seattle" mascot="Mariners"/>
  <TEAM location="Texas" mascot="Rangers"/>
</NEW_GAME>
A query string is relatively inflexible and very ugly at any length. XML improves upon this greatly for messaging and formatting, but XML offers the additional benefit of structured storage. We can put updates in a form roughly as follows:
<UPDATE>
  <TEAM location="Seattle" mascot="Mariners"/>
  <TEAM location="Texas" mascot="Rangers"/>
  <POSITION>5th Inning</POSITION>
  <DESCRIPTION>Alex Rodriguez hit by line drive.</DESCRIPTION>
</UPDATE>
Then those updates can simply be inserted between the <NEW_GAME> tags shown earlier, with the duplicated <TEAM> information removed. No matter how many separate sources submit their take on an event, they can all be incorporated into a single game-recap file.
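The merge just described might be sketched as follows in Python with xml.etree.ElementTree. The helper merge_update is hypothetical, and treating two <TEAM> elements as duplicates when their location and mascot attributes match is my assumption, not a rule of the format.

```python
import xml.etree.ElementTree as ET


def merge_update(game, update):
    """Append an UPDATE to a NEW_GAME element, dropping TEAM children
    that repeat a team the game already lists."""
    known = {(t.get("location"), t.get("mascot")) for t in game.findall("TEAM")}
    # findall returns a list, so removing children while looping is safe.
    for team in update.findall("TEAM"):
        if (team.get("location"), team.get("mascot")) in known:
            update.remove(team)
    game.append(update)
    return game


game = ET.fromstring(
    '<NEW_GAME start_time="8:00 PST">'
    '<TEAM location="Seattle" mascot="Mariners"/>'
    '<TEAM location="Texas" mascot="Rangers"/>'
    '</NEW_GAME>'
)
update = ET.fromstring(
    '<UPDATE>'
    '<TEAM location="Seattle" mascot="Mariners"/>'
    '<TEAM location="Texas" mascot="Rangers"/>'
    '<POSITION>5th Inning</POSITION>'
    '<DESCRIPTION>Alex Rodriguez hit by line drive.</DESCRIPTION>'
    '</UPDATE>'
)

merge_update(game, update)
print(ET.tostring(game, encoding="unicode"))
```

Calling merge_update once per incoming report, whatever its source, leaves a single recap document with the teams listed once at the top.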
Without getting overly excited about this business that I now plan to start, consider a few of the many openings that this format provides. Wrestlemania now becomes a trivial addition to the repertoire, even though only two of the participants are known at the outset. Each time a new wrestler enters the ring, this can be the package:
<UPDATE>
  <TEAM>Stone Cold</TEAM>
</UPDATE>
Now men's versus women's college sports can be indicated with the sex (or gender) attribute. The game recap can include an end_time attribute as well, even though it did not start with one.
Where does it all lead? To the Document Type Definition (DTD). If the business truly will expand beyond the water cooler, a variety of sources must be capable of submitting valid data without having to ask how. By publishing a complete DTD, we allow the general public to view the instructions for formatting. Any updates are controlled by version numbering of a single document. Even if wrestling grows to require an entirely new data format to cover the plot twists and sordid back stories, the update that we showed earlier can still be accepted by a less savvy provider if it's validated against the older DTD. There's no need for database table changes or additions to handle unforeseen customer demands. Yet, XML allows for basic queries, and if the archiving grows too large, a partial database solution could be added to enhance performance.
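The idea of accepting older submissions alongside newer ones can be illustrated with a rough sketch. Python's standard xml.etree.ElementTree does not validate against a DTD, so the code below swaps in a hand-written table of allowed child elements per version; it is a stand-in for real DTD validation, not an implementation of it. The version labels, the RULES table, the accepted_by function, and the PLOT_TWIST element are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule tables standing in for two published DTD versions.
# Each maps a root element name to the child elements it may contain.
RULES = {
    "1.0": {"UPDATE": {"TEAM", "POSITION", "DESCRIPTION"}},
    "2.0": {"UPDATE": {"TEAM", "POSITION", "DESCRIPTION", "PLOT_TWIST"}},
}


def accepted_by(xml_text, version):
    """Return True if every child of the root is allowed by that version."""
    root = ET.fromstring(xml_text)
    allowed = RULES[version].get(root.tag)
    if allowed is None:
        return False  # root element unknown to this version
    return all(child.tag in allowed for child in root)


old_update = "<UPDATE> <TEAM>Stone Cold</TEAM> </UPDATE>"
new_update = (
    "<UPDATE> <TEAM>Stone Cold</TEAM>"
    " <PLOT_TWIST>Surprise entrance</PLOT_TWIST> </UPDATE>"
)

for version in ("1.0", "2.0"):
    print(version, accepted_by(old_update, version), accepted_by(new_update, version))
```

In this sketch the simple wrestler update passes under both versions, while the richer message passes only under the newer one. A production system would hand the published DTD to a validating parser instead.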
We also can slap on security features to maintain an official recap from official sources such as ESPN and CNNSi, and an unofficial version supplied by any fans and foes who have submitted their email addresses. Now, color commentary and the words of pork rind-eating brothers-in-arms can scroll in concert across the game and the sport. We also can add a little box next to each to allow replies. We then finish with a link to the official team Webcast in either audio or video, depending on the bandwidth. Who could ask for more?