Home > Articles > Programming > Windows Programming

The Cost of GUIDs as Primary Keys

Mar 8, 2002

📄 Contents

␡

⎙ Print

< Back Page 6 of 9 Next >

Like this article? We recommend 

.NET Enterprise Design with Visual Basic .NET and SQL Server 2000

Learn More Buy

The Results

Now, I'm sure you are curious about the results. I promise to show them, but first I'd like to say a few things about test results in general and the environment that I used.

Disclaimer

It's extremely important to be skeptical when reading test results. To come to the right conclusion, you have to translate the tests to your environment and to your situation, and then re-execute the tests.

Environment

The equipment that I used for the tests is not real production equipment; I used only a development machine (1.7GHz, 512MB RAM). The test machine was a Windows XP machine with SP2 for SQL Server 2000.

In a new production environment, you will typically have RAID 1 (mirroring) for the log and RAID 10 (mirroring and striping) with at least four disks for the data. You will also have more CPU cache than 256KB, probably more than one CPU, and also at least 1GB of memory.

With that out of the way, let's take a look at the test results.

Test A: `INSERT` of Orders

In Table 1, you can find the results of the first test, in which I INSERT orders and details for those orders. The time values are recalculated so that the first result has a value of 1 (a baseline value) and all other time values shows a percentage compared to the first test result value.

As you can see in Table 1, the overhead for the GUID solution is actually huge. What is especially worrisome is the sign of bad scalability that I saw when I compared results from executions when I had different numbers of orders.

Table 1 Result of Test A: `INSERT` of 500 Orders

Test A	Time (with 1,000,000 Orders)
`INTEGER`	1
GUID	10

This big difference occurred with "only" 1,000,000 orders (and 10,000,000 detail rows). The difference was very small with a small amount of orders, so this is probably a really bad problem with 100,000,000 orders!

Something is wrong—very wrong!

Test A: `INSERT` of Orders, Revised

I did quick tests before comparing GUIDs and IDENTITY+INT without seeing the huge overhead that I saw when I scaled up the test a bit. (I also did the tests on NT4, and the problem is probably smaller there, as you soon will see.) This certainly worried me a lot! I tried different tricks, different indexing strategies, different fill factors, a clustered IDENTITY+INT, and so on—everything without luck. My guess was that the reason for the problem was the lack of order for the GUIDs generated at Windows 2000 (and beyond). In Listing 3, you can see a series of GUIDs generated with a couple of calls to NEWID() on SQL Server 7 on NT4. Compare that series with the one in Listing 4, which is generated with a couple of calls to NEWID() on SQL Server 2000 on Windows 2000.

Listing 3: Some GUIDs Generated with SQL Server 7 on NT4

B3BFC6B1-05A2-11D6-9FBA-00C04FF317DF
B3BFC6B2-05A2-11D6-9FBA-00C04FF317DF
B3BFC6B3-05A2-11D6-9FBA-00C04FF317DF
B3BFC6B4-05A2-11D6-9FBA-00C04FF317DF
B3BFC6B5-05A2-11D6-9FBA-00C04FF317DF

Listing 4: Some GUIDs Generated with SQL Server 2000 on Windows 2000

C87FC84A-EE47-47EE-842C-29E969AC5131
2A734AE4-E0EF-4D77-9F84-51A8365AC5A0
70E2E8DE-500E-4630-B3CB-166131D35C21
15ED815C-921C-4011-8667-7158982951EA
56B3BF2D-BDB0-4AFE-A26B-C8F59C4B5103

As you saw in Listing 3, only the eighth half byte is changed between calls. On the other hand, in Listing 4, only the 13th half byte is constant between calls. In Windows 2000, the MAC address isn't used any longer for when GUIDs are generated. Instead, the GUID is only a 16-byte random number. Well, that isn't totally true. The 13th half byte is constant, so only 15.5 bytes are random. Adam Nathan at Microsoft explained to me that the 13th half byte is the value that will point out the source of the GUID, and 4 means Microsoft. Thanks, Adam!

The guess that I had about the problem with the INSERT overhead for GUIDs was that the lack of order in the Windows 2000–generated GUIDs was giving SQL Server a hard time administering indexes under massive INSERT periods. Therefore, I tried to create an order for the GUIDs instead. I tried to CAST the current DATETIME to a BINARY(8) and put that first in the GUID. Unfortunately, that had no effect. When I investigated it further, I found that when I had a BINARY(16) value and CASTed it to a UNIQUEIDENTIFIER, some bytes were scrambled. What to do? You guessed it. I tried to compensate for the scrambling to see if that had any positive effect on throughput, but no effect occurred. Then I found out that it wasn't the first (high) byte that was important for the new ordering, but the last (low) bytes. I also learned that I didn't have to use BINARY(8) for the current DATETIME. BINARY(6) is enough for the next 77 years, so I decided to occupy only the last (low) 6 bytes with the current DATETIME. In Listing 5, you can see the result of PRINT CAST(GETDATE() AS BINARY(8)). The first (high) bytes are 0.

Listing 5: `GETDATE()` as `BINARY(8)`

0x0000919001862D20

The algorithm that I used to generate my new GUIDs is shown in Listing 6; GUIDs of that type (generated in SQL Server 2000 on Windows 2000) are shown in Listing 7. I call this type of GUIDs COMBs (for COMBined, abbreviated).

Listing 6: An Algorithm to Generate Ordered GUIDs (COMBs)

DECLARE @aGuid UNIQUEIDENTIFIER

SET @aGuid = CAST(CAST(NEWID() AS BINARY(10)) 
+ CAST(GETDATE() AS BINARY(6)) AS UNIQUEIDENTIFIER)

Listing 7: A Set of GUIDs (COMBs) Generated with the New Algorithm

E25AFE33-DB2D-4502-9BF0-919001862D20
83E689D3-8549-4094-B223-919001862D20
CC22A56D-0CD5-43C5-990E-919001862D20
D5149998-1718-468C-B1AD-919001862D20
CBD0182D-4A0E-40AC-9A4C-919001862D20

As you saw in Listing 7, those GUIDs (COMBs) have the same last 6 bytes. Those represent the DATETIME of when those five GUIDs were generated. A DATETIME has the "precision" of 1/300th of a second, and it's no problem for SQL Server to generate five GUIDs in 3ms with this algorithm, even though it is slower than using only NEWID(). As you can see, ordering the GUIDs isn't perfect because I will have several GUIDs being generated with the same DATETIME—but that doesn't really matter here.

Finally, I found something that helped with my INSERT overhead. When I used this new solution, I got the throughput table shown in Table 2.

Table 2 Result of Test A: `INSERT` of 500 Orders, Revised

Test A	Time (with 1,000,000 Orders)
`INTEGER`	1
GUID (COMB)	1.05

This was great news, and the overhead of the GUIDs was actually less than expected. Perhaps you wonder whether my algorithm to generate GUIDs in itself isn't very slow. Often when you do conversions, you see CPU cycles fly away. On the other hand, compared to writing to a database, those conversions are pretty cheap.

You also might think that this algorithm must be risky because there is no guarantee of unique GUIDs. That's correct, but the built-in algorithm makes no guarantee, either. It's just a random number. (Not even the NT4 algorithm was guaranteed, as I said before.) Okay, a 15.5-byte random number has low predictability of being duplicated. (Most of us think that the chance of winning millions in the lottery is almost zero. Still, the chance is a number that is less than 4 bytes!)

With my algorithm, the random number's size decreases from 15.5 bytes to 9.5 bytes because the last 6 bytes are occupied with the current DATETIME. Of course, the risk of generating duplicated random numbers with my algorithm is larger than with NEWID(), but it's not a problem if the numbers don't occur within the same 1/300 of a second. In my opinion, the risk of duplicates is small enough that it's acceptable. And when you have UNIQUE CONSTRAINTs, it's okay even if you happen to see two duplicated GUIDs being generated once a decennium or so.

NOTE

To judge the likelihood of generating two duplicate COMBs, I ran a quick test. I used 6 static bytes for the low bytes and then created a million COMBs without duplicates. It won't be possible for you to INSERT close to that many rows in 1/300 of a second.

I also discussed the risk with a friend who is a mathematician. He pointed out that we don't know the algorithm being used for the random number. Is it "perfect"? He also pointed out that it's very dangerous to make assumptions and conclusions about likelihood and random numbers. Even so, he said that there seemed to be a very low risk of creating duplicate COMBs.

To add some more fuel to the fire, I'd like to point out that the differences in real-life apps are probably bigger than the one I showed in Table 1 when ordinary GUIDs are used. This is because I skipped test results when CHECKPOINTing was going on—CHECKPOINTing was much more meaningful to the GUID test. When I used my custom (COMB) algorithm, CHECKPOINTing was much less frequent and was used for short intervals. That is obvious when I compare the time for INSERTing 500,000 orders when there are 500,000 orders present at the start. You can see the throughput shown in Table 3.

Table 3 Result of Test A: `INSERT` of 500,000 Orders

Test A	Time (with 500,000 Orders)
`INTEGER`	1
GUID	30
GUID (COMB)	1.1

When did you last get a wage increase of 30 times your salary? Think about it, and you'll get a feeling for what a huge difference this is!

Finally, the memory consumption of the tests was very different. The IDENTITY+INT and my new GUID (COMB) algorithm grabbed perhaps 1MB of RAM for each 500 orders INSERTed. The ordinary GUID algorithm grabbed all memory given to SQL Server (350MB in my tests) when 500 or 1,000 orders were INSERTed!

NOTE

One property of ordinary GUIDs isn't maintained with COMBs: The order of creation is not known with GUIDs. With COMBs, it is known. For some applications, this can be a problem that keeps COMBs from being used.

Is it common to see this type of raw INSERTs only? No, not that common. Most applications also need some reading of data. Now on to the second test.

Test B: Fetch of Single Order

In the second test, I fetch the customer ID from some of the orders, one by one. To get a correct set of IDs to work with, I first ran a script that loops through the table to grab 100 random IDs and PRINTs a complete stored procedure call to the stored procedure a_Order_Fetch, which is shown in GUID version in Listing 8.

Listing 8: GUID Version of `a_Order_Fetch`

CREATE PROCEDURE a_OrderGUID_Fetch
(@orderId UNIQUEIDENTIFIER
, @customerId UNIQUEIDENTIFIER OUTPUT)
AS
 DECLARE @anError INT

 SET @aCustomerId = 
 (SELECT om.customer_id
 FROM orderGUIDmaster om
 WHERE om.id = @orderId)

 SET @anError = @@ERROR

 RETURN @anError

Then I copied the 1,000 PRINTed rows and pasted them to another script that EXECUTEs the calls. To skew the test results as little as possible here, I first executed a DBCC DROPCLEANBUFFERS in the beginning of the script that calls the stored procedure.

Let's take a look at the results for the second test. As usual, the first value is recalculated to 1, to be used as a baseline.

Table 4 Result of Test B: `SELECT` of Orders

Test B	Time for Fetching 100 Orders
`INTEGER`	1 (baseline)
GUID (COMB)	1.1

This is fine, isn't it? But what about the dreaded 16-byte JOIN?

Test C: A Small `JOIN`

The most common argument for not using GUIDs for primary keys is that it's so expensive for JOINs. Perhaps you have heard this: "Use a 16-byte column to join? Are you crazy?"

I reused Test B but changed the stored procedures slightly so that the orderdetail table was also used in a JOIN, as shown in Listing 9.

Listing 9: COMB Version of `a_Order_FetchWithDetail`

CREATE PROCEDURE a_OrderCOMB_Fetch
(@orderId UNIQUEIDENTIFIER, @noOfRows INT OUTPUT)
AS
 DECLARE @anError INT

 SET @noOfRows = 
 (SELECT COUNT(*)
 FROM orderCOMBmaster om
 INNER JOIN orderCOMBdetail od
 ON om.id = od.ordermaster_id
 WHERE om.id = @orderId)

 SET @anError = @@ERROR

 RETURN @anError

The results of the third test case are shown in Table 5.

Table 5 Result of Test C: `SELECT` of orders and `JOIN` with Details

Test C	Time for Fetching 100 Orders
`INTEGER`	1 (baseline)
GUID (COMB)	1.1

Worth mentioning is that the foreign keys are clustered both in the INTEGER case and in the COMB case. As a matter of fact, it was faster not using clustered foreign keys for this specific test, but I thought that it was most natural (and probably more effective in other tests) to use clustered foreign keys.

< Back Page 6 of 9 Next >

🔖 Save To Your Account

InformIT Promotional Mailings & Special Offers

I would like to receive exclusive offers and hear about products from InformIT and its family of brands. I can unsubscribe at any time.

Privacy Notice

Overview

Pearson Education, Inc., 221 River Street, Hoboken, New Jersey 07030, (Pearson) presents this site to provide information about products and services that can be purchased through this site.

This privacy notice provides an overview of our commitment to privacy and describes how we collect, protect, use and share personal information collected through this site. Please note that other Pearson websites and online products and services have their own separate privacy policies.

Collection and Use of Information

To conduct business and deliver products and services, Pearson collects and uses personal information in several ways in connection with this site, including:

Questions and Inquiries

For inquiries and questions, we collect the inquiry or question, together with name, contact details (email address, phone number and mailing address) and any other additional information voluntarily submitted to us through a Contact Us form or an email. We use this information to address the inquiry and respond to the question.

Online Store

For orders and purchases placed through our online store on this site, we collect order details, name, institution name and address (if applicable), email address, phone number, shipping and billing addresses, credit/debit card information, shipping options and any instructions. We use this information to complete transactions, fulfill orders, communicate with individuals placing orders or visiting the online store, and for related purposes.

Surveys

Pearson may offer opportunities to provide feedback or participate in surveys, including surveys evaluating Pearson products, services or sites. Participation is voluntary. Pearson collects information requested in the survey questions and uses the information to evaluate, support, maintain and improve products, services or sites, develop new products and services, conduct educational research and for other purposes specified in the survey.

Contests and Drawings

Occasionally, we may sponsor a contest or drawing. Participation is optional. Pearson collects name, contact information and other information specified on the entry form for the contest or drawing to conduct the contest or drawing. Pearson may collect additional personal information from the winners of a contest or drawing in order to award the prize and for tax reporting purposes, as required by law.

Newsletters

If you have elected to receive email newsletters or promotional mailings and special offers but want to unsubscribe, simply email information@informit.com.

Service Announcements

On rare occasions it is necessary to send out a strictly service related announcement. For instance, if our service is temporarily suspended for maintenance we might send users an email. Generally, users may not opt-out of these communications, though they can deactivate their account information. However, these communications are not promotional in nature.

Customer Service

We communicate with users on a regular basis to provide requested services and in regard to issues relating to their account we reply via email or phone in accordance with the users' wishes when a user submits their information through our Contact Us form.

Other Collection and Use of Information

Application and System Logs

Pearson automatically collects log data to help ensure the delivery, availability and security of this site. Log data may include technical information about how a user or visitor connected to this site, such as browser type, type of computer/device, operating system, internet service provider and IP address. We use this information for support purposes and to monitor the health of the site, identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents and appropriately scale computing resources.

Web Analytics

Pearson may use third party web trend analytical services, including Google Analytics, to collect visitor information, such as IP addresses, browser types, referring pages, pages visited and time spent on a particular site. While these analytical services collect and report information on an anonymous basis, they may use cookies to gather web trend information. The information gathered may enable Pearson (but not the third party web trend services) to link information with application and system log data. Pearson uses this information for system administration and to identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents, appropriately scale computing resources and otherwise support and deliver this site and its services.

Cookies and Related Technologies

This site uses cookies and similar technologies to personalize content, measure traffic patterns, control security, track use and access of information on this site, and provide interest-based messages and advertising. Users can manage and block the use of cookies through their browser. Disabling or blocking certain cookies may limit the functionality of this site.

Do Not Track

This site currently does not respond to Do Not Track signals.

Security

Pearson uses appropriate physical, administrative and technical security measures to protect personal information from unauthorized access, use and disclosure.

Children

This site is not directed to children under the age of 13.

Marketing

Pearson may send or direct marketing communications to users, provided that

Pearson will not use personal information collected or processed as a K-12 school service provider for the purpose of directed or targeted advertising.
Such marketing is consistent with applicable law and Pearson's legal obligations.
Pearson will not knowingly direct or send marketing communications to an individual who has expressed a preference not to receive marketing.
Where required by applicable law, express or implied consent to marketing exists and has not been withdrawn.

Pearson may provide personal information to a third party service provider on a restricted basis to provide marketing solely on behalf of Pearson or an affiliate or customer for whom Pearson is a service provider. Marketing preferences may be changed at any time.

Correcting/Updating Personal Information

If a user's personally identifiable information changes (such as your postal address or email address), we provide a way to correct or update that user's personal data provided to us. This can be done on the Account page. If a user no longer desires our service and desires to delete his or her account, please contact us at customer-service@informit.com and we will process the deletion of a user's account.

Choice/Opt-out

Users can always make an informed choice as to whether they should proceed with certain services offered by InformIT. If you choose to remove yourself from our mailing list(s) simply visit the following page and uncheck any communication you no longer want to receive: www.informit.com/u.aspx.

Sale of Personal Information

Pearson does not rent or sell personal information in exchange for any payment of money.

While Pearson does not sell personal information, as defined in Nevada law, Nevada residents may email a request for no sale of their personal information to NevadaDesignatedRequest@pearson.com.

Supplemental Privacy Statement for California Residents

California residents should read our Supplemental privacy statement for California residents in conjunction with this Privacy Notice. The Supplemental privacy statement for California residents explains Pearson's commitment to comply with California law and applies to personal information of California residents collected in connection with this site and the Services.

Sharing and Disclosure

Pearson may disclose personal information, as follows:

As required by law.
With the consent of the individual (or their parent, if the individual is a minor)
In response to a subpoena, court order or legal process, to the extent permitted or required by law
To protect the security and safety of individuals, data, assets and systems, consistent with applicable law
In connection the sale, joint venture or other transfer of some or all of its company or assets, subject to the provisions of this Privacy Notice
To investigate or address actual or suspected fraud or other illegal activities
To exercise its legal rights, including enforcement of the Terms of Use for this site or another contract
To affiliated Pearson companies and other companies and organizations who perform work for Pearson and are obligated to protect the privacy of personal information consistent with this Privacy Notice
To a school, organization, company or government agency, where Pearson collects or processes the personal information in a school setting or on behalf of such organization, company or government agency.

Links

This web site contains links to other sites. Please be aware that we are not responsible for the privacy practices of such other sites. We encourage our users to be aware when they leave our site and to read the privacy statements of each and every web site that collects Personal Information. This privacy statement applies solely to information collected by this web site.

Requests and Contact

Please contact us about this Privacy Notice or if you have any requests or questions relating to the privacy of your personal information.

Changes to this Privacy Notice

We may revise this Privacy Notice through an updated posting. We will identify the effective date of the revision in the posting. Often, updates are made to provide greater clarity or to comply with changes in regulatory requirements. If the updates involve material changes to the collection, protection, use or disclosure of Personal Information, Pearson will provide notice of the change through a conspicuous notice on this site or other appropriate way. Continued use of the site after the effective date of a posted revision evidences acceptance. Please contact us if you have questions or concerns about the Privacy Notice or any objection to any revisions.

Last Update: November 17, 2020

Email Address