Web Applications

Web applications use enabling technologies to make their content dynamic and to allow users of the system to affect business logic on the server. The distinction between Web sites and Web applications is subtle and relies on the ability of a user to affect the state of the business logic on the server. Certainly, if no business logic exists on a server, the system should not be termed a Web application. For those systems on which the Web server—or an application server that uses a Web server for user input—allows business logic to be affected via Web browsers, the system is considered a Web application. For all but the simplest Web applications, the user needs to impart more than just navigational request information; typically, Web application users enter a varied range of input data: simple text, check box selections, or even binary and file information.

The distinction becomes even more subtle in the case of search engines, on which users do enter relatively sophisticated search criteria. Search engines that are Web sites simply accept this information, use it in some form of database SELECT statement, and return the results. When the user finishes using the system, there is no noticeable change in the state of the search engine, except, of course, in the usage logs and hit counters. This is contrasted with Web applications that, for example, accept online registration information. A Web application that accepts course registration information from a user has a different state when the user finishes using the application.

The architecture for a Web application is straightforward. It contains the same principal components as a Web site: a Web server, a network connection, and client browsers. Web applications also include an application server. The addition of the application server enables the system to manage business logic and state. A more detailed discussion of Web application architectures is given in Chapter 7, Defining the Architecture.

Client State Management

One common challenge of Web applications is managing client state on the server. Owing to the connectionless nature of client and server communications, a server doesn't have an easy way to keep track of each client request and to associate it with the previous request, since each and every Web page request establishes and breaks a completely new set of connections.

Managing state is important for many applications; a single use case scenario often involves navigating through a number of Web pages. Without a state management mechanism, you would have to continually supply all previous information entered for each new Web page. Even for the simplest applications, this can get tedious. Imagine having to reenter the contents of your shopping cart every time you visit it or to enter your user name and password for each and every screen you visit while checking your Web-based e-mail.

To address this common problem, the W3C has proposed an HTTP state management mechanism. This mechanism, more commonly known as "cookies," has received quite a bit of attention from privacy advocates in the past few years and will most likely continue to do so as more and more interesting uses of this mechanism are found. This book isn't about privacy concerns but rather is focused on the technology around Web applications, so I'll focus on describing the technology and leave the philosophy to you.

Cookies

A cookie is a piece of data that a Web server can ask a Web browser to hold on to, and to return every time the browser makes a subsequent request for an HTTP resource to that server. Typically, the size of the data is small, between 100 and 1K bytes; however, the official limit is around 4K. Initially, a cookie is sent from a server to a browser by adding a line to the HTTP headers:

Content-type: text/html
Set-Cookie: sessionid=12345; path=/; expires=Mon, 09-Dec-2002 11:21:00 GMT; secure

If the browser is configured to accept cookies, the line is accepted and stored somewhere on the client's machine, depending on the browser vendor. After that, the browser sends the values of these cookies back with each and every HTTP request to that server.

When it is sent to a client, a cookie can have up to six parameters passed with it:

  • Name (required)

  • Value (required)

  • Expiration date

  • Path

  • Domain

  • Requires a secure connection

The Set-Cookie header is a string of parameters separated by semicolons. The name and value parameters are required and must not contain white space, commas, or semicolons. The expiration date tells the browser how long to keep this information. The path and the domain are a way of determining which servers, or domains, to send the cookies back to. If the domain is not set explicitly, it defaults to the full domain of the document creating the cookie. The path helps organize cookies within a domain. Only when a resource is requested in the domain under the path will the cookie be sent back to the server.
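As a minimal sketch of how these parameters fit together, the following Python fragment assembles a Set-Cookie value with the standard library's http.cookies module. The helper name build_set_cookie is hypothetical and exists only for illustration; a real application would normally let its development environment produce this header.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name, value, path="/", expires=None, domain=None, secure=False):
    """Return the text that follows 'Set-Cookie: ' for a single cookie."""
    cookie = SimpleCookie()
    cookie[name] = value              # name and value are the two required parameters
    morsel = cookie[name]
    morsel["path"] = path
    if expires is not None:
        morsel["expires"] = expires   # date string telling the browser how long to keep it
    if domain is not None:
        morsel["domain"] = domain     # left unset, the browser uses the sending server
    if secure:
        morsel["secure"] = True       # only return the cookie over a secure connection
    return morsel.OutputString()


header = build_set_cookie(
    "sessionid", "12345",
    expires="Mon, 09-Dec-2002 11:21:00 GMT",
    secure=True,
)
print("Set-Cookie: " + header)
```

The library chooses the order and capitalization of the optional parameters, so the printed line may not match the earlier example character for character, but it carries the same name, value, path, expiration date, and secure flag.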

The server sending the cookie must be a member of the domain that is specified. Thus, a server in the domain http://www.myserver.com cannot set a cookie for the domain http://www.otherserver.com. If it could, one company would be able to set cookies in another company's domain.

The server can send multiple Set-Cookie headers with an HTTP response. When the browser responds with the Cookie header, all cookies for the domain and the path are returned. For example, the server could have included the following Set-Cookie headers:

Set-Cookie: sessionid=12345; path=/; expires=Mon, 09-Dec-2002 11:21:00 GMT
Set-Cookie: colorPref=Blue; path=/; expires=Mon, 09-Dec-2002 11:21:00 GMT

When it requests a URL in the path / on this same server, the client sends with its HTTP request the following:

Cookie: sessionid=12345; colorPref=Blue

When the response is returned, the server might set another cookie with:

Set-Cookie: rateCode=B; path=/order 

When it requests a URL in path /order, a client sends:

Cookie: sessionid=12345; colorPref=Blue; rateCode=B

Note that all three cookies are sent with the request because the first two are in a higher path and are "inherited" in the mapping.
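The path "inheritance" described here can be sketched in a few lines of Python. This is a simplified model, not a browser's actual algorithm: it ignores domains, expiration dates, and the secure flag, and the names StoredCookie, path_matches, and cookie_header are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StoredCookie:
    name: str
    value: str
    path: str = "/"


def path_matches(cookie_path: str, request_path: str) -> bool:
    """True when the requested path is at or below the cookie's path."""
    if cookie_path == "/" or request_path == cookie_path:
        return True
    return request_path.startswith(cookie_path.rstrip("/") + "/")


def cookie_header(jar, request_path):
    """Build the Cookie header a client would send for request_path."""
    pairs = [f"{c.name}={c.value}" for c in jar if path_matches(c.path, request_path)]
    return "Cookie: " + "; ".join(pairs)


# The three cookies set in the example above.
jar = [
    StoredCookie("sessionid", "12345", "/"),
    StoredCookie("colorPref", "Blue", "/"),
    StoredCookie("rateCode", "B", "/order"),
]

print(cookie_header(jar, "/index.html"))     # only the two cookies stored under /
print(cookie_header(jar, "/order/confirm"))  # all three cookies
```

A request outside /order carries only the two cookies stored under /, whereas a request under /order carries all three.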

In addition to the server's being able to set a cookie value, so too can JavaScript. Chapter 3, Dynamic Clients, describes the capabilities of client-side scripting in more detail; here, however, it is sufficient to say that cookies can be set and obtained in multiple ways. The specific mechanisms for setting and accessing cookies are typically provided by the development environment and architecture, by a single function call to an accessible object.

This mechanism is not without faults. Privacy advocates point to cookies as the primary mechanism supporting the tracking of unknowing users across multiple Web sites. In fact, while writing this chapter, I wanted to look at some sample cookies on my machine. When I scanned the list, I was surprised to find a few cookies from domains that I know I had never visited. Of course, this piqued my curiosity, and as I looked at the data in the cookies, I noticed name/value pairs that included URLs from sites I do remember visiting. Investigating a little further, I found out that these cookies were placed on my machine through the use of banner ads that appeared in the sites that I did visit.

The reality here is that the images in most banner ads are not hosted by the sites that referenced them. Rather, companies specializing in banner ads sell a service to Web sites. When someone visits those sites, the companies provide most of the content of the Web page, as well as a reference to an image stored on the advertisement company's server. Because the image is obtained with a standard HTTP request, the exchange of cookies also happens with this "other" server. So when you visit a Web page that has banner ads in it, they most likely are coming from another company's server and are being collected and managed by that company. After a while, you will visit enough Web sites using the same advertiser's server that the banner ad company can start to build a profile of the sites you visit most and begin to target more appropriate advertisements at you.

Using cookies in this way is very controversial and has led to the heated debate on the use of cookies, privacy, and the Internet. But we won't focus on that type of usage here. Instead, we'll look at how cookies were intended to be used: to manage client state in the context of a single use case or set of use cases.

Sessions

A session represents a single cohesive use of the system. A session usually involves many executable Web pages and a lot of interaction with the business logic on the application server. Because achieving a use case goal often requires the successful execution of a number of executable Web pages, it is often useful to keep track of a client's progress throughout the use case session.

The most common example of keeping client state on the server can be found on the Internet at any e-commerce site. The use of virtual shopping carts is a nice feature of an online store. A shopping cart contains all the items an online customer has selected from the store's catalog. In most sites, the shopper can check the contents of the cart at any time during the session. This feature requires that the server be capable of maintaining some state about the client across a series of Web page requests.

Session state in a Web application can be maintained in four common ways, two of which require the use of cookies:

  1. Place all state values in cookies.

  2. Place a unique key in the cookie and use with a server-managed dictionary or map.

  3. Include all state values as parameters in every URL of the system.

  4. Include a unique key as a parameter in every URL of the system and use with a server-managed dictionary or map.

When you place all state values in cookies, you are limited first by size (4K) and then by number (at most 20 cookies per domain). All state data must be encoded into simple text: no white space, semicolons, and so on. You can't directly use higher-level objects in the session state. The real limitation, however, is that many clients' security settings don't allow the automatic storing of cookies. If the application is an Internet application targeting the consumer market, you don't want to automatically turn away a significant number of potential customers without a good reason.

When a unique key is used in a cookie and then used on the server as a key into a dictionary or a map, any type of server-side object can be part of the session state. This is the default mechanism used by most Web application–enabling environments, such as ASP and JSP. It is very effective and flexible; however, like any cookie-based method, it depends on the willingness of clients to accept cookies.
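The key-plus-dictionary approach might look something like the following Python sketch. It is an illustration under stated assumptions, not how ASP or JSP implement sessions: the cookie name sessionid, the module-level SESSIONS dictionary, and the function load_session are all hypothetical.

```python
import secrets
from http.cookies import SimpleCookie

# Server-managed dictionary: session key -> arbitrary server-side objects.
SESSIONS = {}


def load_session(cookie_header):
    """Return (key, state, is_new) for the value of a request's Cookie header."""
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    morsel = jar.get("sessionid")
    if morsel is not None and morsel.value in SESSIONS:
        return morsel.value, SESSIONS[morsel.value], False
    key = secrets.token_hex(16)   # unique, hard-to-guess key
    SESSIONS[key] = {}
    return key, SESSIONS[key], True


# First request: no cookie yet, so a new session is created.
key, state, is_new = load_session("")
state["cart"] = ["book", "pen"]   # any object can be stored, not just simple text
# The response would carry: Set-Cookie: sessionid=<key>; path=/

# Later request: the browser returns the key and the same state is found.
_, same_state, is_new_again = load_session(f"sessionid={key}; colorPref=Blue")
print(same_state["cart"], is_new_again)
```

Only the key travels between browser and server; the cart itself never leaves the server, which is why its contents are not restricted to cookie-safe text.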

URL redirection is the other class of session management. In this mechanism, all URLs in the system are dynamically constructed to include parameters that contain either the entire session state or only one key into a server-side dictionary.
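A sketch of the key-in-URL variant, using Python's standard urllib.parse module, could look like this. The parameter name sid and the helper add_session_key are assumptions made for the example; every link the application generates would have to pass through such a helper.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_session_key(url, session_key, param="sid"):
    """Return url with the session key added as a query parameter."""
    parts = urlsplit(url)
    # Keep the existing parameters, dropping any stale copy of the key.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    query.append((param, session_key))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_session_key("/catalog/item?id=42", "abc123"))
print(add_session_key("/cart", "abc123"))
```

Because the key is visible in every URL, it ends up in bookmarks and copied links, which is one of the tradeoffs of this mechanism compared with cookies.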

Each mechanism has tradeoffs. Keeping a dictionary in memory for every user of the system could be very expensive if it never expired. For practical reasons, most session dictionaries are removed when the Web application user either finishes the process or stops using the system for a set period of time. A session timeout value of 15 minutes is typical. No matter what technique is used, the management of client state on the server is almost always an issue in Web applications.
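To make the expiration idea concrete, here is a minimal Python sketch of a session dictionary that discards entries after a period of inactivity. The class SessionStore and its method names are hypothetical, and the clock is passed in as a parameter only so that the example can simulate the passage of time.

```python
import time

SESSION_TIMEOUT_SECONDS = 15 * 60  # the typical 15-minute timeout


class SessionStore:
    def __init__(self, timeout=SESSION_TIMEOUT_SECONDS, clock=time.monotonic):
        self._timeout = timeout
        self._clock = clock
        self._entries = {}  # session key -> (time of last access, state dictionary)

    def touch(self, key):
        """Return the state for key, creating it if needed, and restart its timer."""
        self.purge_expired()
        _, state = self._entries.get(key, (None, {}))
        self._entries[key] = (self._clock(), state)
        return state

    def end(self, key):
        """Remove a session whose user has finished the process."""
        self._entries.pop(key, None)

    def purge_expired(self):
        """Drop every session that has been idle longer than the timeout."""
        now = self._clock()
        expired = [k for k, (last, _) in self._entries.items() if now - last > self._timeout]
        for k in expired:
            del self._entries[k]

    def __contains__(self, key):
        return key in self._entries


# Simulated clock, so the example does not have to wait 15 minutes.
now = [0.0]
store = SessionStore(clock=lambda: now[0])
store.touch("abc")["cart"] = ["book"]

now[0] = 10 * 60                 # 10 minutes later: still active, timer restarts
cart_after_10_min = store.touch("abc")["cart"]

now[0] = 26 * 60                 # 16 idle minutes after the last access
store.purge_expired()
print(cart_after_10_min, "abc" in store)
```

Purging on each access keeps the sketch short; a busy server would more likely run the purge on a schedule.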

Enabling Technologies

The enabling technologies for Web applications are varied and differentiated principally by the vendor. Enabling technologies are, in part, the mechanism by which Web pages become dynamic and respond to user input. Of the several approaches to enabling a Web application, the earliest involved the execution of a separate module by a Web server. Instead of requesting an HTML-formatted page from the file system, the browsers would request the module, which the Web server interpreted as a request to load and to run the module. The module's output is usually a properly formatted HTML page but could be image, audio, video, or other data.

The original mechanism for processing user input in a Web system is the Common Gateway Interface (CGI), a standard way to allow Web users to execute applications on the server. Because letting users run applications on your Web server might not be the safest thing in the world, most CGI-enabled Web servers require CGI modules to reside in a special directory, typically named cgi-bin. CGI modules can be written in any language and can even be scripted. In fact, the most common language for small-scale CGI modules is Perl (practical extraction and reporting language), which is interpreted each time it is executed.

Even though HTML documents are the most common output of CGI modules, they can return any number of document types. They can send to the client an image, plaintext—an ASCII document with no special formatting—audio, or even a video clip. They can also return references to other documents. In order for it to interpret the information properly, the browser must know what kind of document it is receiving. In order for the browser to know this, the CGI module must tell the server what type of document it is returning.

In order to tell the server what kind of document is being sent back—a full document or a reference to one—CGI requires a short header on the output. This header is ASCII text, consisting of separate lines followed by a single blank line. For HTML documents, the line would be

Content-type: text/html

If it does not build the returning HTML Web page, the CGI module can redirect the Web server to another Web page on the server or even another CGI module. To accomplish this, the CGI module simply outputs a header similar to

Location: /responses/default.html

In this example, the Web server is told to return the page default.html from the responses directory.
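Both kinds of CGI output can be sketched in Python as follows. The function names are hypothetical, and the output is written to an in-memory buffer so that the example can be run without a Web server; a real CGI module would write to standard output, which is the default here.

```python
import io
import sys


def write_html_response(body, out=sys.stdout):
    """Emit a full document: header line, one blank line, then the HTML."""
    out.write("Content-type: text/html\n")
    out.write("\n")                       # the blank line ends the header
    out.write(body)


def write_redirect(location, out=sys.stdout):
    """Emit a reference to another document instead of a document."""
    out.write(f"Location: {location}\n")
    out.write("\n")


page = io.StringIO()
write_html_response("<html><body>Registration received.</body></html>\n", out=page)
print(page.getvalue(), end="")

redirect = io.StringIO()
write_redirect("/responses/default.html", out=redirect)
print(redirect.getvalue(), end="")
```

In both cases the blank line is what tells the server that the header is complete.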

The two biggest problems with CGI are that it doesn't automatically provide session management services and that every execution of the CGI module requires a new and separate process on the application/Web server. Creating a lot of processes can be expensive on the server.

All the available solutions overcome the multiprocess problems of CGI by adding plug-ins to the Web server. The plug-ins allow the Web server to concentrate on servicing standard HTTP requests and to defer executable pages to another, already running process. Some solutions, such as Microsoft's Active Server Pages, can even be configured to run in the same process and address space as the Web server itself, although this is not recommended.

Two major approaches to Web application–enabling technologies are used today: compiled modules and interpreted scripts. Compiled-module solutions are CGI-like modules that are compiled loadable binaries executed by the Web server. These modules have access to APIs that provide the information submitted by the request, including the values and names of all the fields in the form and the parameters on the URL. These modules produce HTML output that is sent to the requesting browser. Some popular implementations of this approach are Microsoft's Internet Server API (ISAPI), Netscape Server API (NSAPI), and Java servlets.

ISAPI and NSAPI server extensions can also be used to manage user authentication, authorization, and error logging. These extensions to the Web server are essentially a filter placed in front of the normal Web server's processing.

Compiled modules are an efficient, suitable solution for high-volume applications. The biggest drawbacks are related to development and maintenance. These modules usually combine business logic with HTML page construction. The modules often contain many print lines of HTML tags and values, which can be confusing and difficult for a programmer to read.
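The readability problem is easy to see even in a toy example. The following Python sketch stands in for a compiled module; it is not ISAPI, NSAPI, or servlet code. The function handle_order_request, the form fields item and quantity, and the unit price are all hypothetical.

```python
import html
from urllib.parse import parse_qs

UNIT_PRICE_CENTS = 1250  # hypothetical price of one item


def handle_order_request(query_string):
    """Read the submitted fields, apply business logic, and build the HTML page."""
    fields = parse_qs(query_string)
    item = fields.get("item", ["unknown"])[0]
    quantity = int(fields.get("quantity", ["1"])[0])

    # Business logic ...
    total_cents = quantity * UNIT_PRICE_CENTS

    # ... interleaved with page construction, one line of HTML at a time.
    lines = []
    lines.append("<html><body>")
    lines.append("<table>")
    lines.append("<tr><th>Item</th><th>Quantity</th><th>Total</th></tr>")
    lines.append("<tr>")
    lines.append(f"<td>{html.escape(item)}</td>")
    lines.append(f"<td>{quantity}</td>")
    lines.append(f"<td>{total_cents / 100:.2f}</td>")
    lines.append("</tr>")
    lines.append("</table>")
    lines.append("</body></html>")
    return "\n".join(lines)


order_page = handle_order_request("item=Pen&quantity=3")
print(order_page)
```

Changing the layout of the table means editing and redeploying program code, even though no business rule has changed.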

The other problem is that each time the module needs to be updated, or fixed, the Web application has to be shut down and the module unloaded. For most mission-critical applications, this is not much of a problem; the rate of change in the application should be small. Also, it's likely that a significant effort would have been made by the QA/test team to ensure that the delivered application was free of bugs. For smaller, internal intranet applications, however, the rate of change might be significant. For example, the application might provide sets of financial or administrative reports. The logic in these reports might change over time, or additional reports might be requested.

The other category of solutions is scripted pages. Whereas the compiled-module solution looks like a business logic program that happens to output HTML, the scripted-page solution looks like an HTML page that happens to process business logic. A scripted page, a file in the Web server's file system, contains scripts to be interpreted by the server; the scripts interact with objects on the server and ultimately produce HTML output. The page is centered on a standard HTML Web page but includes special tags, or tokens, that are interpreted by an application server. Typically, the file name's extension tells the Web server which application server or filter should be used to preprocess the page. Some popular vendor offerings in this category are JavaServer Pages, Microsoft's Active Server Pages, and PHP.

Figure 2-5 shows the relationship between components of the enabling technology and the Web server. The database in the figure, of course, could be any server-side resource, including external systems and other applications. This figure shows how the compiled-module solution almost intercepts the Web page requests from the Web server and in a sense acts as its own Web server. In reality, the compiled module must be registered with the Web server before it can function. Nonetheless, the Web server plays only a small role in the fulfillment of these requests.

FIGURE 2-5 Web server–enabling technologies

The scripted-page solution, however, is invoked by the Web server only after it has determined that the page does indeed have scripts to interpret. Typically, this is indicated by the file name extension: .aspx, .jsp, .php. When it receives a request for one of these pages, the Web server first locates the page in the specified directory and then hands that page over to the appropriate application server engine, or filter. The application server preprocesses the page, interpreting any server-side scripts in the page and interacting with server-side resources, if necessary. The results are a properly formatted HTML page that is sent to the requesting client browser.
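The hand-off can be illustrated with a toy preprocessor in Python. This is a sketch only: the token syntax is borrowed loosely from the expression tags of scripted-page products and implements none of them, and the names serve, render_scripted_page, and SCRIPTED_EXTENSIONS are hypothetical.

```python
import html
import re
from pathlib import PurePosixPath

# Extensions that mark a page as containing server-side scripts.
SCRIPTED_EXTENSIONS = {".aspx", ".jsp", ".php"}

# A token such as <%= user %> is replaced by a server-side value.
TOKEN = re.compile(r"<%=\s*(\w+)\s*%>")


def render_scripted_page(source, values):
    """Interpret the tokens in the page and return plain HTML."""
    return TOKEN.sub(lambda m: html.escape(str(values[m.group(1)])), source)


def serve(path, source, values):
    """Hand scripted pages to the preprocessor; return other pages unchanged."""
    if PurePosixPath(path).suffix in SCRIPTED_EXTENSIONS:
        return render_scripted_page(source, values)
    return source


page_source = "<p>Hello, <%= user %>! Your cart holds <%= count %> items.</p>"
server_values = {"user": "Ann & Bob", "count": 3}

print(serve("/cart.jsp", page_source, server_values))
print(serve("/about.html", page_source, server_values))
```

The same file content is processed or passed through untouched depending only on the extension of the requested path, which is the decision the Web server makes before involving the application server.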

Even though JavaServer Pages are scripted, they get compiled and loaded as a servlet the first time they are invoked. As long as the server page doesn't change, the Web server will continue to use the already compiled server page/servlet. This gives JavaServer Pages some performance benefits over the other scripted-page offerings.

The real appeal of scripted pages, however, is not their speed of execution but their ease of development and deployment. Typically, scripted pages don't contain most of the application's business logic, which instead is often found in compiled business objects that are accessed by the pages. Scripted pages are used mostly as the glue that connects the HTML user interface aspects of the system with the business logic components.

In any Web application, the choice of technologies depends on the nature of the application, the organization, and even the development team itself. On the server, a wealth of technologies and approaches may be used, many of them together. Regardless of the choices, they need to be expressed in the larger model of the system. The central theme in this book is that all the architecturally significant components of a Web application need to be present in the system's models. Servers, browsers, Web pages, and enabling technologies are architecturally significant elements and must be part of the model.
