mySAP: Determining a Suitable Stress-Test Business Process
- Overview: What's a Mix?
- Real-World Low-Level Technology Stack Test Input and Mixes
- Testing and Tuning for Daily System Loads
- Testing and Tuning for Business Peaks
- Identifying Key Transactions and Business Processes
- Real-World Access Method Limitations
- Best Practices for Assembling Test Packages
- SAP Component and Other Cross-Application Test Mix Challenges
- Tools and Approaches
Finally, we are in a position to give attention to one of the core matters surrounding testing and tuning: determining a suitable stress-test business process or other "input data" test mix. After all, it is the mix of activities, transactions, and processes executed under your guidance that ultimately simulates your financials' month-end close or helps you understand the load borne by your SAP customer-facing systems during the holidays or other seasonal peaks. And it's the test mix that brings together master data, transactional data, customer-specific data, and other input necessary to fuel a business process from beginning to end. You've certainly heard the phrase "garbage in, garbage out." It applies without question here, because poor test data will never allow you to achieve your testing and follow-on tuning goals.
Beyond SAP application-level data, though, input data can also consist of the scripts, batch files, configuration files, and so on required by lower-level testing tools, like those associated with testing the performance of your disk subsystem, network infrastructure, and so on. As you know by now, sound testing of your SAP Technology Stack encompasses much more than strictly business process testing.
The goal of this chapter is therefore to help walk you through the challenges surrounding data: how to select appropriate data, what to look for, and what to avoid. In this way, you'll be that much closer to conducting stress-test runs that not only "work" but also truly simulate the load planned for your production environment.
9.1 Overview: What's a Mix?
What exactly is a test mix? In true consulting fashion, the right answer is "it depends." The next couple of pages provide a high-level overview of a test mix, followed by more detailed discussions on this subject. For beginners, note that all test mixes must include a way to control timing, or the time it takes for a test run (or subtasks within a run) to actually execute. Sometimes this factor is controllable via the test mix itself or a test-tool configuration file, whereas other times it's left up to test-tool controller software or the team executing and monitoring the test runs.
Timing is not everything, though. Test mixes also typically allow the number of OS or other technology stack-based threads or processes to be controlled, another key input factor relevant to multiuser stress testing. In this way, the very load placed on a system may be controlled, making it possible to vary the workload rates of individual users or processes without actually changing the absolute number of users or processes.
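These two fundamental controls, a time budget and a worker count, can be sketched as a tiny test harness. The configuration keys and structure below are purely illustrative (no specific test tool defines them); the point is simply how duration and concurrency together determine the load a run generates.

```python
import threading
import time

# Hypothetical test-run configuration; the key names are illustrative,
# not taken from any particular test tool.
CONFIG = {
    "duration_seconds": 5,   # timing control: how long the run executes
    "worker_threads": 4,     # load control: number of concurrent workers
}

def worker(stop_time, results, lock):
    """Perform dummy work units until the run's time budget expires."""
    done = 0
    while time.monotonic() < stop_time:
        done += 1  # stand-in for one transaction or I/O operation
    with lock:
        results.append(done)

def run_test(config):
    """Launch the configured number of workers and return total work done."""
    stop_time = time.monotonic() + config["duration_seconds"]
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(stop_time, results, lock))
               for _ in range(config["worker_threads"])]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

Raising `worker_threads` while holding `duration_seconds` fixed increases the offered load without changing the run's length, which mirrors the point above: the test mix lets you vary load independently of timing.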
Unfortunately, we must also keep in mind that the assorted tools and approaches we have examined thus far differ greatly in terms of their fundamental execution and goals, and therefore their specific data-input-mix requirements. On the flip side, test mixes tend to adhere to a set of general rules of thumb, too. But rather than trying to lay down a dictionary-style definition, let's instead look at test mixes by way of example, working our way up the technology stack, as follows:
For network infrastructure testing, a test mix consists of factors like the number of data transfers or other network operations performed, the size of those transfers/operations, the use of configuration files that define the two end points necessary for testing (i.e., server-to-server testing), and even the protocols that may be tested (although for SAP testing, there's usually little reason to test anything other than TCP/IP-based RFC, CPIC, and ALE network activity).
For disk subsystem testing, a test mix defines the number of reads and writes to be executed (either as a ratio or an absolute number), the types of operations to be executed (sequential versus direct/random, or inserts versus appends), data block sizes, and even the number of iterations (rather than leveraging timing criteria, a test mix might instead specify the number of I/O operations each thread, process, or test run will execute, such as 1,000). And the number of data files or partitions against which a test is executed is also often configurable, as is the size of each data file. For instance, I often execute tests against six different data files spread across six different disk partitions, each 5GB in size, thus simulating a small but realistic 30GB "database."
For server testing, a test mix might include the number of operations that specifically stress a particular subsystem or component of the server (e.g., the processor complex, RAM subsystem, system bus). Server testing often is intertwined with network infrastructure and disk subsystem testing, too, and therefore may leverage the same types of data input as previously discussed.
At a database level, a test mix must reflect operations understood and executable by the explicit DBMS being tested. Thus, SQL Server test mixes may differ from Oracle in terms of syntax, execution, and so forth. But, in all cases, a database-specific test will reflect a certain type of operation (read, write, join) executed against a certain database, which itself supports a host of database-specific tuning and configuration parameters.
A single R/3 or mySAP component's test mix will often reflect discrete transactions (or multiple transactions sequentially executed to form a business process) that are executed against only the system being tested. I call these "simple" or "single-component" tests because they do not require external systems; CPIC-based communications, external program or event-driven factors, and interface issues therefore stay conveniently out of scope. (Note that CPIC, SAP's Common Programming Interface Communication protocol, allows for program-to-program communication.) The input necessary to drive these simple tests is therefore transaction-specific and relatively easy to identify: all input data that must be keyed into the SAPGUI (anything from data ranges to quantities to unique transaction-specific values like PO numbers, and more) and any master, transactional, or other client-specific data must be known, available, and plentiful. User accounts configured with the appropriate authorizations and other security considerations also act as input. The right mix essentially becomes a matter of balancing the quantity of high-quality input data against the amount of time a test run needs to execute to prove viable for performance tuning.
Complex R/3 or mySAP component test mixes, on the other hand, necessitate multiple components or third-party applications to support complex business process testing that spans more than a single system. Even so, the bulk of the input data may still be quite straightforward, possibly originating in a single core system. But like any complete business process, the output from one transaction typically becomes the input to a subsequent transaction. Cross-component business processes therefore require more detailed attention to input data than their simpler counterparts, because the originating system (along with other supporting data) must be identified as well.
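The disk-subsystem example above names the concrete knobs a test mix defines: read/write ratio, block size, iteration count, and the number and size of data files. A minimal sketch of such a mix might look like the following; every parameter name here is a hypothetical stand-in, not the syntax of any real benchmarking tool, and the file sizes are kept tiny purely for demonstration.

```python
import os
import random
import tempfile

# Illustrative disk-test mix; parameter names are hypothetical.
MIX = {
    "read_ratio": 0.7,       # 70% reads, 30% writes
    "block_size": 4096,      # bytes per I/O operation
    "file_count": 2,         # number of data files (ideally on separate partitions)
    "file_size": 64 * 1024,  # bytes per data file (tiny, for demonstration only)
    "iterations": 100,       # fixed operation count rather than a time budget
}

def run_io_mix(mix, directory):
    """Execute a fixed number of random-offset reads and writes."""
    rng = random.Random(42)  # seeded, so the access pattern is repeatable
    paths = []
    for i in range(mix["file_count"]):
        path = os.path.join(directory, f"data{i}.bin")
        with open(path, "wb") as f:
            f.write(b"\0" * mix["file_size"])  # pre-create the data file
        paths.append(path)
    reads = writes = 0
    block = b"x" * mix["block_size"]
    for _ in range(mix["iterations"]):
        path = rng.choice(paths)
        offset = rng.randrange(0, mix["file_size"] - mix["block_size"])
        if rng.random() < mix["read_ratio"]:
            with open(path, "rb") as f:
                f.seek(offset)
                f.read(mix["block_size"])
            reads += 1
        else:
            with open(path, "r+b") as f:
                f.seek(offset)
                f.write(block)
            writes += 1
    return reads, writes
```

Note how the mix specifies a fixed iteration count instead of timing criteria, exactly as described above, and how the number and size of data files simulate a small "database" spread across partitions.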
Many of these examples are discussed in more detail later in this chapter. Suffice it to say, though, that a single TU or stress-test run can quickly grow complex from an input data perspective if several technology stack layers or systems are involved, especially in light of the many stack-specific test tools that might be used.
9.1.1 Change-Driven Testing
It's important to remember one of the fundamental reasons behind testing: to quantify the delta in performance that a particular change or configuration creates. I refer to this generically as change-driven testing, and believe that most if not all performance-oriented testing falls into this big bucket. But it's impossible to quantify and characterize performance without a load behind the change. In other words, things don't tend to "break" until they're exercised. The strength and robustness of a solution remain unproven, just like the maximum load a weight-trainer can lift, until it (or he or she) successfully supports or lifts it. The "it" is the workload, in essence the manipulation or processing of input data. Thus, it only follows that a good performance test depends on a good mix of data, data that has been identified and characterized as to its appropriateness in helping you meet your simulation and load-testing goals. Not surprisingly, stress tests executed without the benefit of adequate test-mix analysis, or workload characterization, often fall short of achieving their goals.
9.1.2 Characterizing Workloads
Given that the workload a test run is to process is central to the success of stress testing, characterizing or describing that workload is paramount as well. Workloads vary as much as the real world varies, unfortunately, so there's no easy answer. Key workload considerations as they pertain to the SAP application layer include the following:
The mix or ratio of online users to batch jobs, reports, and other similar processes, reflecting a particular business scenario.
The mix of functional areas, such that the resulting mix reflects a sought-after condition (e.g., month-end closing for the entire business) or state (e.g., an average daily load).
The pure number of online users, batch processes, report generators, and so on leveraged to create a representative load on an SAP system.
The predictability of a workload, so that apples-to-apples comparisons may be made against subsequent test runs. In other words, avoiding true randomization when it comes to input data is important. True randomization needs to be replaced instead with pseudorandom algorithms that are at once "random" and repeatable.
The quantity of data. Low quantities result in higher cache hit rates because more of the data can be stuffed into hardware- and software-based caches, exercising the disk subsystem less than a true production workload would.
The quality of data. Only unique combinations of customers, materials, plants, and storage locations "work" to create a sales order, for example. Other combinations result in error messages in the best of cases, and more often in failed or simply incompletely executed business processes.
The configuration of the system itself in relation to performance also represents "input" to some extent as well, though I prefer to keep this separate, under the guise of configuration or current-state documentation as discussed in the previous chapter.
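Two of the considerations above, repeatability and data quality, can be addressed together by drawing input from a pool of known-good master data with a seeded pseudorandom generator. The sketch below is illustrative; the hard-coded customer, material, and plant values are hypothetical placeholders, where a real test would pull valid combinations from the system under test.

```python
import random

# Hypothetical pools of known-good master data; a real test mix would draw
# these from the system under test, not from hard-coded lists.
CUSTOMERS = ["C001", "C002", "C003"]
MATERIALS = ["M100", "M200", "M300", "M400"]
PLANTS = ["P01", "P02"]

def generate_order_inputs(count, seed=1234):
    """Produce repeatable "pseudorandom" sales-order inputs.

    The same seed always yields the same sequence, so back-to-back test
    runs process identical input and remain apples-to-apples comparable.
    """
    rng = random.Random(seed)
    return [
        {
            "customer": rng.choice(CUSTOMERS),
            "material": rng.choice(MATERIALS),
            "plant": rng.choice(PLANTS),
            "quantity": rng.randint(1, 50),
        }
        for _ in range(count)
    ]
```

Because every value is drawn from a pool of combinations already proven to "work" in the system, the generated inputs avoid the failed or incompletely executed business processes that bad data combinations produce, while the fixed seed keeps successive runs comparable.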
But how do you determine what a representative workload or data mix looks like? In the past, I've spent most of my time speaking with various SAP team members to answer this question. Functional leads are your best bet, because they know the business as well as anyone else on the SAP technical team. But SAP power users representing each of the core functional areas are also valuable sources of workload information, as are management representatives of the various functional organizations found in most companies.
Outside of people resources/team members, you can also determine the "Top 40" online and batch processes through CCMS's ST03 or ST03N, both of which display detailed transaction data over defined periods of time. That is, you can drill down into the days preceding or following month-end to identify precisely the transactions that are weighing down the system most in terms of CPU, database, network, and other loads, along with relative response-time metrics. Particularly heavy hours, like 9 a.m. to 11 a.m., or 1 p.m. to 4 p.m., can be scrutinized closely as well. These "top transactions" can be sorted in any number of ways, too: by the total number executed, the peak hour executed, and even by greatest to least impact (highlighting the transactions that beat up the database or CPU or network the worst, for example). And, because CCMS since Basis release 4.6C allows you to globally view ST03 data across an entire SAP system, the once time-consuming job of reviewing individual application servers for your production BW system, for example, is no longer necessary.
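The "Top 40" analysis described above boils down to sorting per-transaction load statistics by different metrics. A small sketch makes the idea concrete; the record fields below are hypothetical stand-ins for the kinds of figures ST03/ST03N reports, not its actual output format.

```python
# Illustrative per-transaction statistics, of the kind ST03/ST03N displays;
# the transaction codes and numbers here are made-up examples.
records = [
    {"tcode": "VA01", "executions": 5200, "avg_cpu_ms": 310, "avg_db_ms": 420},
    {"tcode": "MB1A", "executions": 800,  "avg_cpu_ms": 95,  "avg_db_ms": 160},
    {"tcode": "FB01", "executions": 3100, "avg_cpu_ms": 120, "avg_db_ms": 540},
]

def top_transactions(rows, key, n=40):
    """Return the "Top N" transactions sorted by the chosen load metric."""
    return sorted(rows, key=lambda r: r[key], reverse=True)[:n]
```

Sorting the same records by `executions`, `avg_cpu_ms`, or `avg_db_ms` surfaces different "heaviest" transactions, which is exactly why the chapter suggests viewing the top transactions from several angles before building a workload around them.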
Once you understand the particular transactions and business processes representative of a particular workload or regular event/condition, you must then consider how you will represent this workload in a stress test. To make the process of workload characterization as flexible and manageable as possible, I like to create "packages" of work. Within a package, the work is typically similar in nature: online transactions that focus on a particular business process or functional area, batch processes that execute against a particular set of data or for a similar period of time, and so on. Besides maintaining a certain functional consistency, I also chop up the packages into manageable user loads. For example, if I need to test 600 CRM users executing a standard suite of business activities, I'll not only create a set of business scripts to reflect those activities, but I'll also create perhaps six identical packages of 100 virtual users each, so as to easily control and manage the execution of a stress-test run. If the workload needs to be more granular and represent five core business activities, I might instead divide the packages into these five areas (where each package then reflects unique business process scripts), and work with the business or drill down into the CCMS to determine how many users need to be "behind" or associated with each package. Even then, if one of these granular and functionally focused packages represents many users, I would still be inclined to chop it up further, as just mentioned and as shown in Figure 9-1. More details regarding the creation and divvying up of test packages are discussed later in this chapter.
Figure 9-1 Once your workload is characterized and sorted by tasks, functional areas, or business processes, it makes sense to further divide the workload into smaller and more manageable packages, each reflecting a representative user mix.
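The packaging scheme just described, 600 CRM users split into six identical packages of 100 virtual users, is simple enough to sketch. The helper below is illustrative only; its names and dictionary layout are hypothetical, not the configuration format of any particular load-testing tool.

```python
# Illustrative helper: split a characterized workload into equally sized
# virtual-user packages. Names and structure are hypothetical examples.
def build_packages(total_users, users_per_package, script_name):
    """Divide total_users into packages of users_per_package each."""
    packages = []
    full, remainder = divmod(total_users, users_per_package)
    for i in range(full):
        packages.append({"name": f"{script_name}-pkg{i + 1}",
                         "script": script_name,
                         "virtual_users": users_per_package})
    if remainder:  # leftover users go into one final, smaller package
        packages.append({"name": f"{script_name}-pkg{full + 1}",
                         "script": script_name,
                         "virtual_users": remainder})
    return packages
```

Calling `build_packages(600, 100, "crm_core")` yields six identical 100-user packages, matching the example above; for the more granular case, you would call it once per business-activity script, with the user count the business or CCMS data assigns to that activity.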
9.1.3 Don't Forget to Baseline!
Beyond baselining and documenting the configuration to be stress tested, each test mix also needs to be characterized by way of a baseline. The baseline serves as documentation relevant to a particular state (the configuration of the system in relation to the workload that it can support) so that future changes to either the technology stack or workload are not only easily identified but also specifically quantified in terms of performance. In my experience, these initial performance baselines tend to multiply rather quickly. For instance, I'll often begin initial load testing by working through a number of specific configuration alternatives, testing the same test mix against each different configuration in the name of pretuning or to help start off a stress-test project on the best foot possible. Each alternative rates a documented baseline, but only the most promising end results act as a launch pad for further testing and pretuning iterations.
Once a particular configuration seems to work best, I then move into the next phase of baselining, called workload baselining. This phase is very much test mix-centric, as opposed to the first phase's configuration-centric approach. The idea during these workload characterization and baselining processes is to maintain a static system configuration (i.e., refrain from tuning SAP profiles, disk subsystems, etc.), so as to focus instead on monitoring the performance deltas achieved through executing different workloads. All of this is performed by way of single-unit testing, of course, to save time and energy. The goal is quite simple, too: to prove that a particular workload indeed seems to represent the load described by the business, technical teams, or CCMS data. The work of scripting a true multiuser load followed by real-world performance tuning comes later, then, after you've solidified your workloads and executed various stress-test scenarios as depicted in the next few chapters.
Baseline testing is useful from a number of different perspectives. For instance, it's useful even when it comes to testing a non-Production platform, like your Development or Test/QA systems deployed prior to building, configuring, and initially optimizing the Production platform. That is, you can still gain a fair bit of performance-related knowledge testing against a non-Production system, both in terms of single-unit and multiuser stress testing. And this knowledge can pay off simply by helping you create a faster development experience for your expensive ABAP and Java developers or by creating a functional testing environment for a company's unique end-user base. This is especially true prior to Go-Live, when the production environment is still being built out, but the need for initial single-unit or "contained" load testing is growing ever more critical. The same goes for any change-driven testing as well, however, such as that associated with pending functional upgrades, technical refreshes, and so on.