SCSI-2 Reservation Issues

Because failures during crucial operations can be drastic, we need to understand how to reduce the likelihood of SCSI Reservation conflicts. We can eliminate these issues by changing our operational behaviors to allow for the possibility of failure. Although these practices are generally simple, they are difficult to implement unless all the operators and administrators know how to tell whether an operation is occurring, and whether a new operation would cause a SCSI Reservation conflict if it were started. This is where monitoring tools make the biggest impact. SCSI Reservation conflicts will be avoided if a simple rule is followed: Verify that any other operation on a given file, LUN, or set of LUNs has completed before proceeding with the next operation.

To verify whether any operation is in progress, we need to perform all these operations through a single management interface so that there is only one place to check. Virtual Center or HPSIM with VMM will assist because each gives you one place to check for any operation that could cause a conflict. Verify in your management tool that all operations on a given LUN or set of LUNs have completed before proceeding with the next operation. In other words, serialize your actions per LUN or set of LUNs. In addition to checking your management tools, check the state of your backups and whether any open service console operations have also completed. If a VMDK backup is running, let it take precedence and proceed with the next operation after the backup has completed. To check whether a backup is running, you can quickly look at the VMFS via the MUI or VIC to determine whether there are any REDO files on the LUNs in question. If there are, either a backup is running or a backup has left a REDO file on the file system, which means the backup most likely failed for some reason. To check whether service console operations that could affect a LUN or set of LUNs have completed, judicious use of sudo is recommended. Sudo can log all your operations to a file that you can peruse, and you, as the administrator, can then check the process lists on all servers. No single user interface combines backups, VMotion, and service console actions.
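
The REDO-file check described above can also be scripted from the service console. The following is a minimal sketch, not a VMware-supplied tool: the function names are made up for illustration, and the `*.REDO` file-name pattern and the `/vmfs` default are assumptions about how REDO logs appear on your VMFS volumes.

```shell
# List REDO logs under a VMFS mount point (default: /vmfs).
# Assumption: REDO logs are regular files whose names end in ".REDO".
find_redo_files() {
    local root="${1:-/vmfs}"
    find "$root" -type f -name '*.REDO' 2>/dev/null | sort
}

# Succeed (exit status 0) only when no REDO file is found under the path.
lun_is_clear() {
    [ -z "$(find_redo_files "$1")" ]
}
```

For example, `lun_is_clear /vmfs/somevolume || echo "backup running or stale REDO file"` (with a hypothetical volume name) gives a quick yes-or-no answer before starting the next operation.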

As an example, let’s look at a system of three ESX Servers with five identical LUNs presented to the servers via XP12000 storage. Because each of the three servers shares each LUN, we should limit our LUN activity to one operation per LUN at any given time. In this case, we could perform five operations simultaneously as long as those operations were LUN specific. Once LUN boundaries are crossed, the number of simultaneous operations drops. To illustrate this second case, consider a VM with two disk files, one for the C: drive and one for the D: drive. Normally in ESX, we would place the C: and D: drives on separate LUNs to improve performance, among other things. Because the C: and D: drives live on separate LUNs, manipulating this VM, say with VMotion, reduces the limit to four simultaneous VM operations: the one operation on this VM occupies two LUNs, which leaves three LUNs for three other operations. Therefore, five LUN operations could equate to fewer VM operations. This is the most careful of methods. However, in many of the suggestions that follow we can substitute FILE for LUN, except where we are changing the metadata.
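
The counting in this example can be checked mechanically. The sketch below is only an illustration: the function name and the LUN labels (`lun1` to `lun5`) are hypothetical. Each argument is the comma-separated set of LUNs that one planned operation touches, and the check passes only when no LUN is claimed by two operations.

```shell
# Each argument is the comma-separated set of LUNs one operation touches.
# Succeeds only if no LUN appears in more than one operation.
# (A LUN listed twice within a single argument is also reported as a clash.)
ops_are_disjoint() {
    local dup
    dup=$(printf '%s\n' "$@" | tr ',' '\n' | sort | uniq -d)
    [ -z "$dup" ]
}
```

With five LUNs, `ops_are_disjoint lun1,lun2 lun3 lun4 lun5` succeeds: four operations fit, one of which spans two LUNs. Adding a fifth operation on `lun2` makes the check fail.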

Using the preceding example as a basis, the suggested operational behaviors are as follows:

  • Simplify deployments so that a VM does not span more than one LUN. In this way, operations on a VM are operations on a single LUN.
  • Determine whether any operation is happening on the LUN you want to operate on. If your VM spans multiple LUNs, check the full set of LUNs by visiting the management tools in use and making sure that no other operation is happening on the LUN in question.
  • Verify that there is no current backup operation happening and that the VM is not in REDO mode.
  • Choose one ESX Server as your deployment server. In this way, it is easy to limit deployment operations, imports, or template creations to only one host and therefore one LUN at a time.
  • Use a naming convention for VMs that also tells what LUN or LUNs are in use for the VM. This way it is easy to tell what LUN could be affected by a VM operation. This is an idealistic solution, but at a minimum, label the VMs that span LUNs.
  • Inside VC or any other management tool, limit access to the administrative operations so that only those who know the process can actually enact an operation. In the case of VC only the administrative users should have any form of administrative privileges. All others should only have VM user or read-only privileges.
  • Only administrators should be allowed to power a VM on or off. For reboots required by patch application, schedule each reboot so that there is only one reboot per LUN at any given time. A power-off and a power-on are considered separate operations. However, SCSI Reservations are not the only concern here. For example, if you have 80 VMs across 4 hosts, rebooting all 80 at the same time would create a performance issue, and some of the VMs could fail to boot. The standard boot process for an ESX Server is to boot the next VM only after VMware Tools has started in the previous one, which avoids an initial performance issue. The lock for a power-on or power-off operation is held for less than 7 microseconds, so many can be done in the span of a minute. However, this is not recommended because the increase in load on ESX could adversely affect your other VMs. Limiting this is a wise move from a performance viewpoint.
  • Use care when scheduling VMDK-level backups. It is best to have one host schedule all backups and to have one script start backups on all other hosts. In this way, backups can be serialized per LUN. For ESX version 3, this problem is solved by using the VMware Consolidated Backup tool. However, for ESX versions 2.5.x and earlier, use the built-in ESX tool vmsnap_all to start backups and to serialize all activities per VM and LUN. This is discussed further in Chapter 12, “Disaster Recovery and Backup.” The following pseudo-code may assist with backups, assuming ssh-keygen was used to set up key-based SSH logins so that no password has to be entered. By having one and only one ESX Server run the following, you are guaranteed a serialized action for each backup regardless of the number of LUNs in use. In addition, third-party tools such as ESXRanger can serialize backups:
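    # Run from one ESX Server only; $hosts and $vms are whitespace-separated lists.
    for x in $hosts
    do
      for y in $vms
      do
        # No trailing "&": each backup must finish before the next one starts.
        ssh "$x" vmsnap.pl "$y"
      done
    done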
    

    For ESX version 2.5.x and earlier releases, this pseudo-code can also be changed so that more than one backup occurs simultaneously, as long as each VM in the list of VMs in $vms has all its disks on a single, separate LUN. If you go with this approach, it is better for performance reasons to have each ESX Server doing backups on a different LUN at any given time. For example, our three machines can each do a backup using a separate LUN. Even so, the activity is still controlled by only one host so that there is no mix-up or issue with timing. Let the backup process limit itself and tell you what it is doing. Find tools that will:

    • Never start another backup on a LUN while another is still running.
    • Signal the administrators that backups have finished either via e-mail, message board, or pager(s). This way there is less to check per operation.
  • Limit VMotion (hot migrations), fast migrates, and cold migrations to one per LUN, even if you must do a huge number of VMotion migrations at the same time. With our example there are five LUNs, so there would be the possibility of five simultaneous VMotions, each on its own LUN, at any time. This assumes the VMs do not cross LUN boundaries. VMotion needs to be fast, and the more VMotions you attempt at the same time, the slower all will become. There is a chance that the OS inside the VM will start to complain if this time lag is too great. Using VMotion on ten VMs at the same time could be a serious issue for the performance and health of the VMs regardless of SCSI Reservations. Make sure the VM has no REDO logs before invoking VMotion.
  • Only use the persistent VM disk modes. The other modes create lots of files on the LUNs that will require locking. In ESX version 3, however, using persistent disk modes means you cannot take snapshots or use the consolidated backup tools. These limitations make this item a lower priority from an operational point of view.
  • Do not suspend VMs as this also creates a file and therefore requires a SCSI Reservation.
  • Do not run vm-support requests unless all other operations have completed.
  • Do not use the vdf tool when any other modification operation is being performed.
  • Do not rescan storage subsystems unless all other operations have completed.
  • Limit use of vmkmultipath, vmkfstools, and other VMware-specific COS commands until all other operations have completed.
  • Create, modify, or delete a VMFS only when all other operations have completed.
  • Be sure no third-party agents are accessing your storage subsystem via vdf or by direct access to the /vmfs directory. Although vdf does not normally force a reservation, it could encounter one if another host has locked the LUN because of a metadata modification.
  • Do not run scripts that modify VMFS ownership, permissions, access times, or modification times from more than one host. Localize such scripts to a single host. It is suggested that you use the deployment server as the host for such scripts.
  • Stagger any scripts or agents that affect a LUN so that they run from a management node that can control when actions can occur.
  • Stagger the running of disk-intensive tools within a VM, such as virus scans. The extra load on your SAN could cause symptoms similar to those of SCSI Reservation conflicts, although the underlying errors are not reservation errors but queue-full or unavailable-target errors.
  • Use only one file system per LUN.
  • Do not mix file systems on the same LUN.
  • Do not store a VM’s VMX configuration files on shared ext3 partitions on a SAN LUN. In ESX 3.0, you can place a VMX configuration of virtual machines on VMFS volumes (locally or on the SAN).
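
Several of the behaviors above amount to "check that the LUN is idle, then run one operation on it." When all scripted operations start from a single management node, that check can be sketched as an advisory lock per LUN. This is a sketch under stated assumptions only: the lock directory, the function names, and the LUN label format are hypothetical, ESX itself knows nothing about these locks, and they protect only operations that go through this wrapper.

```shell
# Advisory per-LUN locks kept on one management node.
# LOCK_ROOT and the LUN label format are hypothetical conventions.
LOCK_ROOT="${LOCK_ROOT:-/var/tmp/lun-locks}"

# Map a LUN label to a lock directory name (":" and "/" become "_").
lun_lock_path() {
    printf '%s/%s.lock' "$LOCK_ROOT" "$(printf '%s' "$1" | tr ':/' '__')"
}

# Take the lock for one LUN; fails immediately if it is already held.
# mkdir is used because it fails atomically when the directory exists.
lun_acquire() {
    mkdir -p "$LOCK_ROOT" || return 1
    mkdir "$(lun_lock_path "$1")" 2>/dev/null
}

lun_release() {
    rmdir "$(lun_lock_path "$1")" 2>/dev/null
}

# Run a command while holding the lock for a LUN; refuse if the LUN is busy.
with_lun() {
    local lun="$1" rc=0
    shift
    lun_acquire "$lun" || { echo "LUN $lun is busy" >&2; return 1; }
    "$@" || rc=$?
    lun_release "$lun"
    return "$rc"
}
```

A backup or migration script would then be started as `with_lun vmhba1:0:1 some_command`, where both the label and the command are placeholders. For a VM that spans LUNs, the calls can be nested so that every LUN is held before the operation starts.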

What this all boils down to is ensuring that any operation that could somehow affect a LUN is limited to one operation per LUN at any given time. The biggest offenders are automated power operations, backups, VMotion, and deployments. A little careful monitoring and some changes to operational procedures can limit the possibility of SCSI Reservation conflicts and the resulting failures of various operations. A case in point follows. One company under review due to constant, debilitating SCSI Reservation conflicts went through the list of 23 items and fixed one or two possible items but missed the most critical one. This customer had an automated tool that ran on all hosts at the same time to modify the owner and group of every file on every VMFS attached to the host. The resultant metadata updates caused hundreds of SCSI-2 reservations to occur. The solution was to run this script from a single ESX Server for all LUNs. Once the script was limited to a single host, all the reservation conflicts disappeared, because no two hosts were attempting to manipulate the file systems at the same time, and the single host, in effect, serialized the actions.
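
One way to keep such a script confined to a single host, even if it is accidentally deployed everywhere, is a guard at the top of the script. The following is a minimal sketch: the variable names, the function name, and the host name `esx01` are hypothetical, and `THIS_HOST` exists only so the guard can be exercised without changing the real host name.

```shell
# Run a command only on the designated host; do nothing elsewhere.
# DESIGNATED_HOST and its default value "esx01" are hypothetical.
DESIGNATED_HOST="${DESIGNATED_HOST:-esx01}"

run_if_designated() {
    # THIS_HOST overrides the detected host name (for testing only).
    local this_host="${THIS_HOST:-$(hostname)}"
    # Compare short host names, ignoring any domain suffix.
    if [ "${this_host%%.*}" != "$DESIGNATED_HOST" ]; then
        echo "skipping on $this_host: only $DESIGNATED_HOST may run this" >&2
        return 0
    fi
    "$@"
}
```

The guard returns success when it skips, so a scheduled job on the other hosts finishes quietly instead of reporting a failure.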

Hot and cold migrations of VMs can change the behavior of automatic boot methodologies. The built-in options, setting a dependency on another VM or a time delay for a boot, apply to a single ESX Server, where you can start VMs at boot of ESX, after VMware Tools start in the previous VM, after a certain amount of time, or not at all. This gets much more difficult with more than one ESX Server, so a new method has to be used. Although starting a VM after a certain amount of time is extremely useful, what happens when three VMs start almost simultaneously on the same LUN? Remember that we want to limit operations to just one per LUN at any time. We have a few options:

  • Stagger the boot or reboot of your ESX Server and ensure that your VMs only start after the previous VMs’ VMware Tools start, to ensure that all the disk activity associated with the boot sequence finishes before the next VM boots, thereby helping with boot performance and eliminating conflicts. VM boots are naturally staggered by ESX when it reboots anyway.
  • Similar to doing backups, have one ESX Server that controls the boot of all VMs, guaranteeing that you can boot multiple VMs but only one VM per LUN at any time. So, if you have multiple ESX Servers, more than one VM can start at any time, as long as each is on a separate LUN. In essence, we use the VMware PERL API to gather information about each VM from each ESX Server, correlate the VMs to LUNs, and create a list of VMs that can start simultaneously; that is, each VM in the list starts on a separate LUN. Then we wait a bit of time before starting the next batch of VMs.
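
The batching step in the second option can be sketched as follows. This is not the VMware PERL API: it assumes the VM-to-LUN mapping has already been gathered into lines of the form `vm lun`, that each VM lives on a single LUN, and the function name and sample names are made up for illustration.

```shell
# Read "vm lun" pairs on stdin and print "batch vm lun".
# The n-th VM seen on a given LUN lands in batch n, so no batch
# contains two VMs from the same LUN. Input order is kept within a batch.
plan_boot_batches() {
    awk 'NF >= 2 { n[$2]++; printf "%d %d %s %s\n", n[$2], NR, $1, $2 }' |
        sort -k1,1n -k2,2n |
        cut -d' ' -f1,3,4
}
```

The controlling host would start every VM in batch 1, wait, then move on to batch 2, and so on. The number of batches equals the largest number of VMs on any one LUN.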

All the listed operational changes will limit the number of SCSI subsystem errors that will be experienced. Although it is possible to run more than one operation per LUN at any given time, we cannot guarantee success with more than one operation. This depends on the type of operation, the SAN, its settings, and most of all, the timing of the operations.

There are several other considerations, too. Most people want to perform multiple operations simultaneously, and this is possible as long as the operations are on separate LUNs. To increase the number of simultaneous operations, increase the number of LUNs available. Table 6.1 shows, for various arrays, the number of simultaneous operations per LUN at each risk level, by the number of hosts connected to the LUN. The table is broken into categories of risk based on the number of operations per LUN and the SCSI conflict retry count. In this table, gaps exist between the listed numbers of hosts per LUN and of operations per LUN; assume that if you are above a listed number, you are in the next-highest category.

Table 6.1. Risk Associated with Number of Operations per LUN

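| Array Type | # of Host(s) | Low Risk (0% to 10% failure) | Medium Risk (30% to 60% failure) | High Risk (> 60% failure) | SCSI Conflict Retry Count |
|---|---|---|---|---|---|
| Entry level - MSA | 1 | 4 | 8 | 10 | 20 |
| Entry level - MSA | 2 | 2 | 4 | 5 | 20 |
| Entry level - MSA | 4 | 2 | 3 | 4 | 20 |
| Entry level - MSA | 8 | 1 | 2 | 3 | 20 |
| Enterprise – EVA, Symmetrics | 1 | 8 | 12 | 16 | 8 |
| Enterprise – EVA, Symmetrics | 2 | 2 | 4 | 6 | 8 |
| Enterprise – EVA, Symmetrics | 4 | 2 | 3 | 4 | 8 |
| Enterprise – EVA, Symmetrics | 8 | 1 | 2 | 3 | 8 |
| Hitachi/HDS | 1 | 6 | 10 | 12 | 20 |
| Hitachi/HDS | 2 | 2 | 4 | 6 | 20 |
| Hitachi/HDS | 4 | 2 | 3 | 4 | 20 |
| Hitachi/HDS | 8 | 1 | 2 | 3 | 20 |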

Note that no more than eight hosts should be attached to any one given LUN at a time. Also note that as firmware is modified, these values can change to be higher or lower.
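
Table 6.1 can be turned into a quick lookup for planning scripts. The sketch below encodes one reading of the table: the Low Risk and Medium Risk columns are treated as upper bounds, anything above the Medium Risk value counts as high risk, and a host count that falls between rows is rounded up to the next listed row. The function name and the array class keywords (`entry`, `enterprise`, `hitachi`) are made up for this example, and the values should be rechecked whenever array firmware changes.

```shell
# Print the Table 6.1 risk band (low, medium, or high) for
#   $1 = array class (entry | enterprise | hitachi; hypothetical keywords)
#   $2 = number of hosts attached to the LUN
#   $3 = number of simultaneous operations on the LUN
lun_risk() {
    local array="$1" hosts="$2" ops="$3" low med
    case "$array" in
        entry|enterprise|hitachi) ;;
        *) echo "unknown array class: $array" >&2; return 1 ;;
    esac
    # Round the host count up to the next row listed in the table.
    if   [ "$hosts" -le 1 ]; then hosts=1
    elif [ "$hosts" -le 2 ]; then hosts=2
    elif [ "$hosts" -le 4 ]; then hosts=4
    elif [ "$hosts" -le 8 ]; then hosts=8
    else echo "more than eight hosts per LUN is not recommended" >&2; return 1
    fi
    # Upper bounds for the low and medium bands, taken from Table 6.1.
    case "$array:$hosts" in
        entry:1)      low=4; med=8  ;;
        enterprise:1) low=8; med=12 ;;
        hitachi:1)    low=6; med=10 ;;
        *:2)          low=2; med=4  ;;
        *:4)          low=2; med=3  ;;
        *:8)          low=1; med=2  ;;
    esac
    if   [ "$ops" -le "$low" ]; then echo low
    elif [ "$ops" -le "$med" ]; then echo medium
    else echo high
    fi
}
```

For instance, `lun_risk hitachi 3 3` uses the four-host row and prints `medium`, whereas `lun_risk entry 1 4` prints `low`.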
