Commerce Server Planning
In this Chapter
Designing Your Site
Common Site Designs
Designing Your Site
Commerce Server 2002 is not something you simply install and start using. Well, you could, but you'd regret it. Any e-commerce site is a complex collection of technologies, and they require a significant amount of planning and design before you begin implementing them. In this chapter, we'll discuss some of the primary design concepts behind Web sites built with Commerce Server 2002. We'll use our fictional sample company, HipThreads.Net, as a template for site design, and we'll discuss other business needs that may drive site design decisions in your environment.
Unfortunately, no Web-site design decisions are hard and fast. There are certainly best practices, and we'll introduce you to them in this chapter. In the end, though, Web-site design depends entirely on your business needs, your management decisions, your staff's expertise, and your budget.
So, rather than give you rules for Web-site design, we'll give you guidelines. We'll discuss various common business needs, and outline the design decisions to which those needs usually lead. Carefully review our design guidelines with your own business needs in mind, and make the appropriate decisions for your e-commerce Web site.
Designing a Web site requires considerable effort, and unfortunately it's an effort many developers and site architects don't bother taking. The various versions of Commerce Server have often garnered a poor reputation for performance and scalability, simply because developers didn't understand the complex technologies that Commerce Server is built upon, nor did they take the time to design a site that took advantage of those technologies' strengths and avoided their weaknesses.
You won't have that problem, though, because in the next few sections we're going to introduce you to the design goals and behind-the-scenes technologies that make Commerce Server work.
We'll start with Web-page architecture, which covers the design of Web pages for maximum performance and scalability. We'll move on to site architecture, where we'll introduce you to distributed Commerce Server site design. We'll finish with network design, which is an introduction to the ways that a well-designed network can help Commerce Server achieve maximum performance. In each of these sections, we'll cover the design challenges presented by each technology. Then, in our "Tips from the Pros" section, we'll cover real-world design tips that show you how all of these technologies fit together to meet your business goals.
Web-Page Architecture
You can program your site's Web pages in a number of different ways. The development technique that's best for your site, though, depends on your business needs and on the performance and maintainability you want to achieve. In the next few sections, we'll show you how Microsoft ASP technology works inside, so that you can start thinking about ways to improve Web page performance. Later, in our "Tips from the Pros" section, we'll show you ways to work around, or at least minimize, ASP's performance problems when programming Web pages for your site.
ASP Under the Hood
Understanding how ASP (and ASP.NET) operate is the key to creating Web pages that perform well on a Web site. Many of the worst-performing Web sites we've seen were simply the result of a Web developer who didn't understand how ASP worked, and fell into pretty much every performance trap the technology offers.
For a brief tutorial on ASP, see "Active Server Pages."
For a brief tutorial on ASP.NET, see "ASP.NET."
When a user's Web browser requests an ASP page from a Web server, IIS reads the physical ASP file from the server's hard disk into memory. IIS then reads through the entire file from beginning to end. All regular HTML is sent directly to the user. All ASP program code is processed, line by line, and the results are sent to the user.
Simply naming an HTML file with an .asp file extension places a slight performance overhead on IIS, even if the file doesn't contain a single line of program code. That's because IIS has to scan through the entire file looking for the code. With a regular HTML file (with an .htm or .html file extension), IIS simply transmits the file's contents to the user without doing any kind of scan for ASP code.
Only give files an .asp filename extension if the files actually contain ASP program code that IIS needs to process.
ASP's biggest performance problem is that ASP program code is interpreted. In other words, ASP program code, in the form of VBScript or JScript statements, is stored in plain text within the ASP file. IIS has to translate each line of script into something that the computer can execute, one line at a time. When a second user requests the same file, IIS has to go through the whole process a second time. The process of interpreting the script into computer-executable instructions can be time-consuming. When thousands of users request the same file, the overhead is very noticeable. For example, a powerful server running IIS and delivering only static HTML pages could easily support a thousand users at once. That same server might support only a third of those users if they were requesting ASP pages with complex scripts.
ASP's performance slows even more when scripts begin accessing COM and COM+ objects, such as the ActiveX Data Objects (ADO) libraries used to provide database access. IIS must perform a significant amount of work to load the appropriate library into memory, feed it interpreted instructions from the ASP script, and manage the object's execution in memory.
How can you avoid ASP's performance problems? Don't use it. ASP.NET offers improved performance by using a radically different means of processing Web pages (which we'll look at in a bit). If you aren't ready to make the jump to ASP.NET, minimize your use of ASP pages as much as possible. We'll discuss tips for doing so later in this chapter.
One of the biggest challenges of Web-page development has been overcoming the problem of maintaining state. State is all the data associated with a particular user, such as the contents of their shopping basket, their username, and so forth. Web servers are a bit like fast-food counters: You go up to the counter, place your order, and get your food. When you go back to the counter later for more ketchup, the person behind the counter acts as if they've never seen you before in your life. They don't remember what you ordered, how you paid for it, or even that you've been in the store before. In other words, they don't maintain any state information about you.
Retail shops such as Nordstrom's do maintain state information. Your personal shopper remembers the last outfit you were looking at, and makes recommendations for matching accessories. When you return to the store the following day, the personal shopper remembers your waist size, your favorite colors, and even your name. Maintaining state is useful in any shopping scenario, because it allows you to provide more personalized service to your customers. On a Web server, state is absolutely necessary, because it enables the server to remember what customers put into their shopping basket, which in turn makes it possible for the customer to check out and purchase those products.
Web developers have always struggled with different ways to maintain state. Prior to the introduction of ASP, back-end databases provided the best solution. Web developers would use a process similar to the following:
When a new customer started using the Web site, the site would realize that the customer didn't have an ID number. So, the Web site would create an ID number and assign it to the customer. The customer's Web browser would be given a cookie, which is a small piece of information passed between a Web browser and server. The cookie would contain the customer's ID number. The Web server would also create a matching record in a back-end database.
Every time the customer requested a new page from the Web server, the customer's browser would send along the cookie. The cookie would enable the Web server to identify the customer and update the customer's information in the database. For example, if the customer added a product to their shopping basket, the Web server would make the appropriate entry in the database, attaching the customer's ID.
At some point, the database server would clean up state information. Any information pertaining to a completed order could be deleted, because the order information would be copied to other tables in the database. Any customers who hadn't been heard from in a day or two could be cleared out, too, because they had obviously stopped using the site for the time being.
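The three steps above can be sketched in a few lines. The following is only an illustration, written in Python with an in-memory SQLite database; the table names, column names, and function names are our own assumptions and don't come from any particular product. A plain dictionary stands in for the browser's cookies.

```python
import sqlite3
import uuid

TWO_DAYS = 2 * 24 * 3600  # assumed idle limit, in seconds


def open_state_db():
    """Create the (hypothetical) state tables in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (customer_id TEXT PRIMARY KEY, last_seen REAL)")
    db.execute("CREATE TABLE basket (customer_id TEXT, sku TEXT, qty INTEGER)")
    return db


def identify(db, cookies, now):
    """Return the customer's ID, issuing a new ID and cookie if none exists."""
    customer_id = cookies.get("customer_id")
    if customer_id is None:
        customer_id = uuid.uuid4().hex
        cookies["customer_id"] = customer_id  # stands in for a Set-Cookie header
    # Record (or refresh) the time this customer was last heard from.
    db.execute("INSERT OR REPLACE INTO customers VALUES (?, ?)", (customer_id, now))
    return customer_id


def add_to_basket(db, customer_id, sku, qty=1):
    """Attach a basket line to the customer's ID."""
    db.execute("INSERT INTO basket VALUES (?, ?, ?)", (customer_id, sku, qty))


def basket(db, customer_id):
    """Return the customer's basket as a list of (sku, qty) tuples."""
    rows = db.execute(
        "SELECT sku, qty FROM basket WHERE customer_id = ? ORDER BY rowid",
        (customer_id,),
    )
    return rows.fetchall()


def clean_up(db, now, max_idle_seconds=TWO_DAYS):
    """Delete state for customers who haven't been heard from recently."""
    cutoff = now - max_idle_seconds
    db.execute(
        "DELETE FROM basket WHERE customer_id IN "
        "(SELECT customer_id FROM customers WHERE last_seen < ?)",
        (cutoff,),
    )
    db.execute("DELETE FROM customers WHERE last_seen < ?", (cutoff,))
```

Every page request would call identify() first, then read or update the basket using the returned ID. A scheduled job would call clean_up() from time to time.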
The program code required to implement that state maintenance functionality could be pretty complex. That's why ASP was such a hit when Microsoft introduced it: ASP contained built-in functionality to help Web developers maintain state information. Basically, ASP followed the same process that programmers had been using for years. It sent a cookie to each user's Web browser with an ID number and used that ID to match users to their state information. Rather than storing state information in a database, though, ASP stored the information in the server's memory, where it was easily accessible. Programmers accessed state information in special variables called session variables. For example, programmers might write a line of ASP code such as the following:
<%= Session("UserName") %>
This line of code outputs the contents of the "UserName" session variable. ASP maintained a different "UserName" value for each user accessing the Web site, and used their cookie-based ID number to match users up to the appropriate set of session variable values.
Although session variables made state maintenance very easy for developers, they created some significant problems, too:
Because session variables are stored in the Web server's memory, they place an additional burden on the Web server. Web servers making heavy use of session variables can support fewer users than Web servers that don't use session variables.
If a Web server crashes, all the information in session variables is lost.
In a Web farm, users are load balanced across several Web servers. Each time the user requests a new Web page, it might be delivered from a different server. Unfortunately, session variables can't be shared between Web servers, effectively defeating the load-balancing capabilities of a Web farm.
Some users disable cookie support on their browsers. Those users can't use a site that utilizes session variables, because session variables rely on a cookie to match users to their data.
Problems with session variables sometimes lead to ASP "forgetting" them, making it seem to users as if all their data, such as the contents of their shopping basket, has suddenly vanished.
The solution is to ignore session variables, and return to the tried-and-true technique of using a database to store session state. Session variables are nice for a small, intranet Web site, but unsuited for large, multiserver Internet Web sites.
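To see why in-memory session variables and Web farms don't mix, consider the following toy simulation. It is a sketch in Python, not ASP; the WebServer class and the round-robin balancer are hypothetical stand-ins for real servers and a real load balancer. A shared dictionary plays the role of the back-end database.

```python
from itertools import cycle


class WebServer:
    """A hypothetical Web server holding session data for its users."""

    def __init__(self, name, shared_store=None):
        self.name = name
        # With no shared store, each server keeps a private dictionary in its
        # own memory, which is how ASP session variables behave.
        self.sessions = shared_store if shared_store is not None else {}

    def add_to_basket(self, session_id, sku):
        self.sessions.setdefault(session_id, []).append(sku)

    def basket(self, session_id):
        return list(self.sessions.get(session_id, []))


def simulate(farm):
    """Send three requests from one customer round-robin across the farm."""
    balancer = cycle(farm)
    next(balancer).add_to_basket("cust-1", "jeans")
    next(balancer).add_to_basket("cust-1", "belt")
    # The third request views the basket on whichever server comes up next.
    return next(balancer).basket("cust-1")
```

With two servers that each keep private session data, simulate() returns only ["jeans"], because the belt was recorded on the other server. When both servers are constructed with the same shared store, simulate() returns ["jeans", "belt"].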
ASP.NET was created to solve a number of problems with the older ASP technology. As we've already discussed, ASP works by sending along regular HTML, and processing ASP code. That results in Web pages that can be difficult to maintain, because they are a hodge-podge of HTML and program code. ASP.NET uses a completely different programming model, where the HTML itself forms part of the program code. The result is Web pages that are easier to maintain, although more server processing is theoretically required to produce the HTML page that the Web server sends to the user.
To get a better idea of how ASP and ASP.NET differ, see "ASP.NET."
Fortunately, ASP.NET still manages to provide better performance than ASP, because it compiles Web pages. You're probably familiar with compiling in a programming language such as Visual Basic: The compiler goes through the entire program, translating each line and producing code that the computer can execute (often called native code). When ASP.NET executes a Web page for the first time, it performs a similar process, creating a compiled version of the page that executes very quickly. The result is a slight delay the first time the page is requested by a user, and faster execution from then on. ASP.NET monitors the original page, and recompiles it whenever the page is changed by a programmer.
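The compile-once, recompile-on-change idea can be illustrated with a small cache. This is only a sketch in Python and does not reflect how ASP.NET is implemented internally. Python's built-in compile() produces bytecode rather than native code. The PageCache class is hypothetical, and its stamp parameter is our stand-in for a file's last-modified time. Each "page" here is a single Python expression.

```python
class PageCache:
    """Compile each page once and reuse the result until the page changes."""

    def __init__(self):
        self._compiled = {}     # page name -> (stamp, compiled code object)
        self.compile_count = 0  # how many times we paid the compile cost

    def render(self, name, source, stamp, variables):
        cached = self._compiled.get(name)
        if cached is None or cached[0] != stamp:
            # First request, or the page was edited: compile and remember it.
            code = compile(source, name, "eval")
            self.compile_count += 1
            self._compiled[name] = (stamp, code)
        else:
            # Later requests skip compilation and reuse the cached code.
            code = cached[1]
        return eval(code, {"__builtins__": {}}, dict(variables))
```

Two requests for the same unchanged page compile it only once. Passing a new stamp, as would happen after a programmer edits the page, triggers exactly one more compilation.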
ASP.NET also makes it possible to replace and extend portions of ASP.NET itself. For example, although ASP.NET includes the same session variables that ASP supports, ASP.NET provides a mechanism for a developer to replace the way session variables work. Using ASP.NET, you could write your own "session variables" that use a database back-end. Doing so would give you the easy programming of session variables, with the performance and scalability of tried-and-true database-based state maintenance.
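As a rough picture of what a database-backed "session variable" might look like to a programmer, here is a sketch in Python. It is not ASP.NET's extension mechanism or API. The DatabaseSession class, the session_vars table, and the JSON encoding of values are all assumptions made for illustration. The point is that page code keeps the simple session["UserName"] style while the data lives in a database that every server can reach.

```python
import json
import sqlite3
from collections.abc import MutableMapping


class DatabaseSession(MutableMapping):
    """Dictionary-style session variables stored in a database table."""

    def __init__(self, db, session_id):
        self._db = db
        self._id = session_id
        db.execute(
            "CREATE TABLE IF NOT EXISTS session_vars ("
            "session_id TEXT, name TEXT, value TEXT, "
            "PRIMARY KEY (session_id, name))"
        )

    def __getitem__(self, name):
        row = self._db.execute(
            "SELECT value FROM session_vars WHERE session_id = ? AND name = ?",
            (self._id, name),
        ).fetchone()
        if row is None:
            raise KeyError(name)
        return json.loads(row[0])

    def __setitem__(self, name, value):
        self._db.execute(
            "INSERT OR REPLACE INTO session_vars VALUES (?, ?, ?)",
            (self._id, name, json.dumps(value)),
        )

    def __delitem__(self, name):
        cursor = self._db.execute(
            "DELETE FROM session_vars WHERE session_id = ? AND name = ?",
            (self._id, name),
        )
        if cursor.rowcount == 0:
            raise KeyError(name)

    def __iter__(self):
        rows = self._db.execute(
            "SELECT name FROM session_vars WHERE session_id = ? ORDER BY name",
            (self._id,),
        ).fetchall()
        return iter([row[0] for row in rows])

    def __len__(self):
        return self._db.execute(
            "SELECT COUNT(*) FROM session_vars WHERE session_id = ?",
            (self._id,),
        ).fetchone()[0]
```

Two DatabaseSession objects created with the same session ID, even on different Web servers sharing one database, see the same values.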
Site Architecture
Site architecture is the way the various components of your site fit together. For example, you've probably already figured out that your Web server and your SQL Server computer will be different machines, because one machine can't realistically handle both tasks in a production environment. You can increase the scalability of your site by breaking out other components onto dedicated machines, too.
Commerce Server's Subsystems
Commerce Server consists of five core subsystems:
Business Analytics, which includes a data warehouse and reporting functionality.
The profiling system, which tracks user information.
The targeting system, which allows Commerce Server to select Web site content based on the data contained within customer profiles.
The catalog system, which includes product catalogs, and other product-related information and tasks.
Processing pipelines, which are responsible for enforcing rules and business processes in a variety of situations, such as customer checkout.
Each of these subsystems can run on a single Commerce Server computer, or they can be distributed across multiple computers. Distributing the subsystems allows each server in your site to focus on a single set of related tasks, and allows you to fine-tune each server to achieve the best performance for those tasks.
Understanding what these various subsystems are used for will help you design them into your site. We cover each of these subsystems in detail in Part III of this book, with a chapter dedicated to each major Commerce Server component.
Creating a Distributed Architecture
Most Web-site architects start with the assumption that each Commerce Server computer will be equal, performing the same tasks as the other servers in the site. Don't make that assumption. Instead, start with the assumption that each server will have a single purpose. Budgetary or management issues may require you to give some servers multiple roles, which is fine, but always start with an architecture that uses single-purpose servers.
It's relatively easy to picture a set of single-purpose back-end database servers. Figure 3.1 shows three separate SQL Server computers, each responsible for a specific portion of the overall Commerce Server data. This form of distributed architecture spreads the database workload across three servers instead of just one.
Figure 3.1 Distributing Commerce Server's database not only allows you to improve performance, but makes it easier for you to grow your site's back end to accommodate increased traffic.
You can also design your Web server tier with a distributed architecture. How you do that is not as readily apparent, because there aren't many clues within Commerce Server itself to help you determine where to make the split. A common tactic is to dedicate one or more Web servers to the major tasks customers perform on your site. Figure 3.2 shows an example, with a group of servers dedicated to providing search functionality, another dedicated to serving product detail pages, and a third dedicated to handling the checkout process and customers' shopping baskets.
In this example, each set of Web servers connects to all the back-end database servers, so they can access all the data they need to function. You might include additional servers to handle tasks such as sending email (which can be very resource intensive), serving banner ads, and so forth. Consider all the functionality your site will deliver, and think about how that functionality can be broken down and distributed.
Figure 3.2 Distributed Web sites dedicate a Web server (or a set of them) to specific tasks. Each task-specific set of servers can then be expanded individually to accommodate site traffic.
Just because a Web page comes from one server doesn't mean the content of that Web page has to come from the same server. For example, it's possible to have one Web server deliver a product detail page. Within that page, banner ads might be pulled from a second server, product graphics from a third, and personalized suggestions for other products from a fourth. To the customer, it'll look like everything came from one server.
Distributing the information across several servers lets them work together to provide the necessary information to customers. Each server can then serve more customers than it could if it had to deliver all of that information itself.
Make sure each group of specialized servers contains at least two servers. That way, if one server crashes, your site won't be without the functionality that the group provides.
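A task-based Web tier can be pictured as a table that maps each task to its own group of servers. The sketch below is in Python and is purely illustrative. The ServerGroup class, the route() function, the server names, and the idea of choosing a group from the first segment of the URL path are all our own assumptions. It enforces the two-server minimum and skips servers that are marked as down.

```python
class ServerGroup:
    """A set of Web servers dedicated to one task, with simple failover."""

    def __init__(self, task, servers):
        if len(servers) < 2:
            raise ValueError(f"group '{task}' needs at least two servers")
        self.task = task
        self.servers = list(servers)
        self.down = set()  # names of servers currently out of service
        self._next = 0     # round-robin position

    def pick(self):
        """Return the next healthy server in round-robin order."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server not in self.down:
                return server
        raise RuntimeError(f"no healthy servers left in group '{self.task}'")


def route(groups, path):
    """Choose a server group from the first path segment, then pick a server."""
    task = path.strip("/").split("/")[0]
    return groups[task].pick()
```

A request for /search/jeans goes to the search group and a request for /checkout/basket goes to the checkout group. If one search server crashes, the remaining one continues to answer search requests.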
Network Architecture
Many Web site architects think that network architecture simply means installing the fastest possible network and letting 'er rip. Not so! There are a number of factors to consider when designing a network for a busy Web site:
Every single component in the network represents a possible point of failure. A well-designed network will contain redundancies at every level, so that the site can continue to function somewhat normally even when a component fails. This technique includes providing redundant connections to the Internet through redundant routers, redundant network adapters in each server, and so on.
Most networks are built around the Ethernet protocol, which is a shared access networking system. Shared access means that each participant in the network has to work together to use the network. Imagine a network as a large room full of people. Only one person is allowed to talk at once, because if everyone starts talking at the same time, nobody will be able to understand anything. Installing a faster network is like moving everyone to a bigger room: More people can fit in, but only one person can talk at a time. True, that person can talk faster, so he can quit hogging the room faster, but it's still only one person talking at a time.
Creating high-performance networks, then, involves many smaller networks instead of one big one. Network switches can help break a large network into many smaller ones, which are individually more efficient than the big network. You should also provide dedicated (nonshared) network connections between your Web servers and the servers they rely on, such as back-end database servers.
Networking is one area of technology where you definitely get what you pay for. Many networks can receive a huge performance benefit simply by installing higher-quality network adapters, switches, routers, and other devices. Although you may be tempted to skimp on things such as network adapters, they're not all the same. Use high-quality, name-brand hardware to achieve consistently high performance.
Network Design Goals
Network design goals are relatively straightforward:
Networks must be scalable, which means they must easily expand to accommodate more use. That includes every aspect of the network, including the Internet connection, which is usually the most difficult to expand because of the cost of Internet connectivity.
Networks must be maintainable, which means they must be easy to modify, and easy to troubleshoot in the event of a problem. Ensuring network maintainability is usually as easy as documenting the network: writing down network addresses, which computers connect to which hubs and switches, and so forth.
Networks must be available, which means they must be able to continue functioning after a hardware or network failure. You can achieve various levels of availability by installing and configuring redundant network hardware and redundant network connections.
Network design itself is less straightforward than these goals, though. Network design decisions are also impacted by your budget, your staff's skill level, and the availability of the correct network hardware and connections.
Network Design Examples
We'll provide more specific examples later in this chapter, in our "Tips from the Pros" section. However, here are some general examples of how network design can increase your site's performance:
Ethernet networks can usually maintain a maximum sustained utilization of about 50%. That means a 100Mbps network will usually have a maximum sustained throughput of about 50Mbps. Windows is capable of transmitting a lot more data than that, so using multiple network adapters in a server will take advantage of Windows' maximum throughput.
Dedicated network connectivity improves throughput. When only two computers are connected to a network, they can run at a much higher sustained utilization, usually 80% or more. By connecting each network device in a site to a switch, instead of a hub, you effectively create many individual, dedicated networks.
Using a deterministic networking system, such as Fiber Distributed Data Interface (FDDI), provides better throughput than shared systems such as Ethernet. That's because FDDI includes built-in mechanisms to determine when a particular computer is allowed to use the network. Systems such as FDDI often provide close to 90% sustained utilization, although the systems' hardware components are often more expensive than Ethernet.
Using high-end routers will allow you to use two Internet connections from different service providers. Under normal conditions, your site will use both connections, improving the site's performance. If one connection fails, both routers can use the remaining connection, ensuring that your site remains available (although with less performance than normal).
Web servers should have at least two network adapters. One should provide communication with users (whether over the Internet or a private intranet), and the other should provide communications with the site's back-end servers. This technique takes advantage of Windows' network throughput, and ensures that back-end communications don't have to compete for bandwidth with user communications.
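The utilization figures above translate into simple arithmetic. The following Python sketch uses the rough percentages quoted in this section (50% for shared Ethernet, 80% for dedicated or switched connections, 90% for FDDI). The function name is hypothetical, and the assumption that throughput scales linearly with the number of adapters is ours. Real networks will vary.

```python
# Approximate sustained utilization, in percent, as quoted in the text.
SUSTAINED_UTILIZATION_PERCENT = {
    "shared_ethernet": 50,
    "switched_ethernet": 80,
    "fddi": 90,
}


def sustained_throughput_mbps(link_mbps, kind, adapters=1):
    """Estimate sustained throughput for a server's network connections.

    Assumes each adapter sits on its own network segment and that the
    server can drive all of them at once (a simplifying assumption).
    """
    percent = SUSTAINED_UTILIZATION_PERCENT[kind]
    return link_mbps * percent / 100 * adapters
```

For example, one adapter on a shared 100Mbps Ethernet network yields roughly 50Mbps, whereas the same adapter on a switched connection yields roughly 80Mbps. Two adapters on separate shared networks yield roughly 100Mbps in total under these assumptions.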
Our "Tips from the Pros" section, later in this chapter, will provide more specific examples and network diagrams that utilize these basic techniques.
Network Architecture and Hosted Web Sites
You can't always do what you want when it comes to network design, especially if your site is being hosted at someone else's facility. Many Web hosting companies have specific requirements for networking, and might not allow you to create a complex, multitier network like you want.
Unfortunately, if your Web host says, "No," there's not much you can do. When shopping for a Web hosting company (if you plan to use one), discuss the kind of network you'd like to build, and how the company might accommodate you. Because at least some of your site's network (mainly the Internet connection and the associated routers) will be controlled by the hosting company, find out what kind of redundancy it has. Is it using multiple Internet connections, preferably from multiple vendors? Is it using routers that can get traffic to whichever connection is working at the time? Are the hosting company's Internet service providers well-connected to the Internet themselves? Is it a certified Microsoft Hosting and Application Services provider?
Asking as many questions as possible will help you determine which Web-hosting company is best for you. The hosting company's network is the most valuable resource it provides you. Sure, a nice data center with fire suppression and diesel electrical generators is great to have, but almost any reputable company can provide those amenities. In the end, Web-hosting companies are distinguished by the quality of their networks.