Home > Articles > Software Development & Management

How Not To Optimize

David Chisnall
  • PrintPrint
  • Share ThisShare This
  • DiscussDiscuss
Close WindowDavid Chisnall

David Chisnall

Learn more…

Inside Modern X11 Programming
Sep 18, 2009
Making JavaScript Fast, Part 2
Sep 15, 2009
Security in Your Pocket: OpenBSD on ARM
Sep 11, 2009
Making JavaScript Fast, Part 1
Sep 8, 2009
The Failure of the GPL
Aug 31, 2009
How Not To Optimize
Aug 21, 2009
A Half-Way Step to Apple’s Source Code: An Interview with David Chisnall
Jun 5, 2009
Advanced Flow Control for Objective-C
Jun 5, 2009
Erica Sadun on the iPhone SDK, OS X, and the Computing Landscape
Jun 5, 2009
From NeXTSTEP to Cocoa: Erik Buck on the Development of Cocoa and Objective-C
Jun 5, 2009
Fun with the Objective-C Runtime
Jun 5, 2009
Marcus Zarra and Matt Long on Core Animation
Jun 5, 2009
Steve Kochan on the Evolution of Objective-C
Jun 5, 2009
The Technology NeXT Gave the World
Jun 5, 2009
Where the Web and the Desktop Meet: An Interview with Lee Barney
Jun 5, 2009
Pandora: An Open Console
Jun 2, 2009
The Future of Wireless Networking
May 15, 2009
GNU or Linux?
May 11, 2009
Debugging C-Family Languages
Mar 27, 2009
How Small Is Your PC? The Rise of Netbooks and Other Small Form-Factor PCs
Mar 23, 2009
David Chisnall's CPU Feature Wishlist
Mar 13, 2009
The Dynamic Languages Renaissance
Jan 30, 2009
Robert Seacord on the CERT C Secure Coding Standard
Dec 15, 2008
Objective-C for C++ Programmers, Part 3
Nov 21, 2008
Objective-C for C++ Programmers, Part 2
Nov 14, 2008
Objective-C for C++ Programmers, Part 1
Nov 7, 2008
Writing Insecure C, Part 3
Oct 24, 2008
Writing Insecure C, Part 2
Oct 17, 2008
Writing Insecure C, Part 1
Oct 10, 2008
iRex iLiad e-Reader: Linux's Answer to the Kindle?
Aug 29, 2008
How It Works: Filesystems
Jun 13, 2008
How the LLVM Compiler Infrastructure Works
May 23, 2008
How It Works: Virtual Memory
May 21, 2008
What Is C For?
May 16, 2008
The Future of eBooks
Apr 25, 2008
Imagining an Open Network
Apr 18, 2008
Understanding How Xen Approaches Device Drivers
Mar 21, 2008
Examining the Legendary HURD Kernel
Mar 14, 2008
Competition Among Open Source Compilers
Feb 1, 2008
Inside Your OS: What is a Process Scheduler, and How Does it Work?
Jan 25, 2008
Bad UI of the Week: Read This (OK/Cancel)
Jan 18, 2008
The End of the Desktop Era
Jan 11, 2008
The What and Why of Open IM
Dec 28, 2007
A Look at the Modern X Server
Dec 21, 2007
The Future of Digital Media
Dec 14, 2007
The Future of Identity
Dec 7, 2007
Bad UI of the Week: Ask Forgiveness, Not Permission
Nov 21, 2007
Copyright Versus Free Software
Nov 16, 2007
Is Computer Science Dying?
Nov 9, 2007
A Brief History of Programming, Part 2
Nov 2, 2007
A Brief History of Programming, Part 1
Oct 26, 2007
The 700MHz Question: Will the Wireless Spectrum Auction Lead to Innovation or More of the Same?
Sep 28, 2007
Bad UI of the Week: The Menu Bar
Aug 24, 2007
The Dark Corners of x86
Aug 17, 2007
Bad UI of the Week: The Cross-Platform User Interface
Aug 17, 2007
Bad UI of the Week: The Mythical "is Like" Operator
Aug 10, 2007
Bad UI of the Week: Don't Make Me Tell You Twice...
Aug 3, 2007
Bad UI of the Week: Kettles and Washing Machines
Jul 27, 2007
The BBC iPlayer Controversy Explained
Jul 20, 2007
Bad UI of the Week: The Mitten Mouse
Jul 20, 2007
Bad User Interface of the Week: File It Under “Bad”
Jul 13, 2007
Bad User Interface of the Week: The DVD
Jul 6, 2007
A Roundup of Free Operating Systems
Jun 22, 2007
DragonFly BSD: UNIX for Clusters?
Jun 15, 2007
CPU Wars, Part 3: Put Your Left ARM In
May 18, 2007
CPU Wars, Part 2: POWER to the People
May 11, 2007
CPU Wars, Part 1: When the Chips Are Down
May 4, 2007
ZFS Uncovered
Apr 6, 2007
Vector Programming with GCC
Mar 30, 2007
Free Software Versus Open Source Software
Mar 16, 2007
What Programming Languages Should You Know?
Mar 9, 2007
Standardizing UNIX
Feb 2, 2007
Prolog: Logic Programming for Rapid Development
Jan 26, 2007
POSIX Parallel Programming, Part 3: Threads
Jan 19, 2007
POSIX Parallel Programming, Part 2: Message Passing
Jan 12, 2007
POSIX Parallel Programming, Part 1
Jan 5, 2007
The Nokia 770 Revisited
Dec 29, 2006
The Open Source Desktop Myth
Dec 22, 2006
Separating Style and Content: LaTeX and Typesetting
Dec 1, 2006
GNUstep: A Free Software alternative to OpenStep
Nov 10, 2006
Behind the Scenes of Objective-C 2.0
Nov 3, 2006
The Future of CPUs: What's After Multi-Core?
Oct 27, 2006
What Makes a Good Programming Language?
Oct 20, 2006
Emulation: Role-Playing for Computers
Oct 13, 2006
NetBSD: Not Just for Toasters
Oct 6, 2006
POSIX Asynchronous I/O
Sep 22, 2006
Breaking Down GPL Version 3
Aug 18, 2006
The Role of Binary Drivers in a Free OS
Aug 4, 2006
Security Is a UI Problem
Jul 28, 2006
Debunking the Myth of High-level Languages
Jul 14, 2006
A Taste of Erlang, a Dynamic, Asynchronous Message-Passing Language
Jun 30, 2006
Alternatives to LAMP
Jun 2, 2006
BSD Packaging Systems
May 26, 2006
DRM: Digital Rights or Digital Restrictions?
May 4, 2006
Introducing OpenBSD 3.9
Apr 28, 2006
The Need for Virtualization and Xen
Mar 31, 2006
Making Effective Software TCO Calculations
Mar 24, 2006
10 Things I Hate About U(NIX) Revisited: Readers Speak
Mar 17, 2006
Comparing Open Source Licenses: GPL vs. BSDL
Feb 3, 2006
BSD: The Other Free UNIX Family
Jan 20, 2006
Measuring the Effectiveness of Application Security Policies
Jan 13, 2006
The Cost of Free Software
Dec 9, 2005
Nokia 770 Internet Tablet Week-long Test Drive
Nov 18, 2005
10 Things I Hate About (U)NIX
Nov 4, 2005
The Lure of Open Source Software: Why Consider It for Your Business?
Oct 14, 2005

Sorry, this author hasn't posted any blogs.

David Chisnall takes a look at a number of optimization techniques that have been recommended over the years, discussing why they no longer work on modern architectures.

It's often said that there are two rules to follow when optimizing code:

  1. Don't optimize.
  2. Don't optimize yet (experts only).

Although this thinking is slightly facetious, there's a good reason why the second rule is marked as "experts only." Optimizing code means either making it run faster or making it take up less space (ideally both, but in most cases it's a tradeoff).

Generally speaking, you can do two kinds of optimizations:

  • Algorithmic improvements. If you go from using a bubble sort to using a merge sort, for example, your code will be faster. A quick sort usually will make it even faster, although this result depends somewhat on the input data.
  • Choosing good algorithms is generally the best way of making your code go quickly. This practice has the nice advantage that it works independently of the programming language you use or the architecture of the underlying system. Sometimes, however, even the best algorithm you can design is still too slow.

  • Optimizations specific to a language or implementation. This type of optimization requires a thorough understanding of how your compiler and CPU work. Unfortunately, over the years many of these techniques have been incorporated into rules that made sense for some architectures in the past, but no longer are practical.

This article examines some of these popular (but now outdated) rules for optimization.

A Load of Shift

On older CPUs, multiplication was very slow; in particular, multiplying by a constant was no faster than multiplying by a variable. One common trick was to decompose multiplications into sequences of add and shift operations. If you were compiling for an early RISC CPU, this would be done for you, because the CPU had no "multiply" hardware. On other machines, it was done for speed.

Multiplication by a power of two can be translated trivially into a left-shift operation. Essentially, any multiplication can be rewritten in terms of add and shift operations. For example, multiplying by 5 is equivalent to multiplying by 4 and adding the result once. The following two statements are equivalent:

// Multiply directly
a *= 5;

// Multiply using shift-and-add
a = a + (a << 2);

You can rewrite any multiplication operation in this way, and on a lot of systems this technique used to be much faster for multiplying by a constant. This was such a good trick that some compilers—including GCC—used to do this translation automatically.

Then hardware multipliers got better. CPUs like the AMD Athlon could do integer multiplication quickly. Because most programs contained a lot of multiplications and not many shifts, AMD put two shift units and one multiply unit in the Athlon, which meant that it could do two multiplications at a time, but only one shift. Because most multiplication by a constant requires several add-and-shift operations, suddenly the code was faster (not to mention clearer) if you wrote the multiplication instead of the add-and-shift.

Most modern CPUs are superscalar, meaning that they can execute two or more adjacent instructions simultaneously if the operations aren't dependent on each other and the CPU has enough free execution units. The independence requirement is important here; while two multiplications can often be done in parallel, each shift and add needed to simulate the multiplication depends on the result of the previous one, so this practice can cause pipeline stalls on in-order architectures (and even on out-of-order architectures) when other instructions are pushed too far back in the instruction stream to be seen. Modern CPUs get a lot of their speed from pipelining. Many instructions on something like a Core 2 take 10–20 cycles to execute, but because they're decomposed into a large number of independent steps, you can start a second operation only one or two cycles after the first one has started.

This practice is even more pronounced on modern systems. To demonstrate this principle, I wrote two simple programs for the Core 2 Duo. Both loop 1,000,000,000 times, multiplying a running counter by 12345. Program A uses a multiply instruction; program B uses a sequence of shifts and adds (10 in total). Program A takes 3.7 seconds to complete, whereas the shift-and-add version in program B takes 21.3 seconds. Thus, the "naïve" version is almost 7 times faster than the "optimized" version. This result isn't just because the multiplication is faster. A lot of the extra time is taken up with loads and stores, and program B's add-and-shift version involves a lot of moving of operands between hidden registers inside the execution units, and shuffling data from the end of the pipeline back to the start. Another version, with just a single shift, ran marginally faster than the program A multiplication; even turning a "multiply by 2" into a "left-shift by 1" gives almost no performance improvement (less than 1%) on a Core 2.

The important point here is that just because something used to be slow on old CPUs doesn't mean that it's still slow. You should always time your optimizations and make sure that they really offer an improvement in speed. When I started programming in C, most processors didn't have floating-point units. Any floating-point calculations were implemented entirely in software, and typically were two orders of magnitude slower than integer operations. These days, floating-point operations are done in hardware and are often within a factor of 2 of the speed of integer operations. Floating-point has gone from being something that should be used only when absolutely necessary to something to be avoided only in performance-critical code, or in code intended to run on embedded systems with no floating-point unit (FPU).

  • Share ThisShare This
  • Your Account

Discussions

Re: Typo?
Posted Oct 17, 2009 02:11 PM by David Chisnall
0 Replies
Typo?
Posted Oct 7, 2009 03:22 PM by rdpate
1 Replies
how not to optimize...
Posted Sep 1, 2009 09:28 AM by ranjix
0 Replies

Make a New Comment

You must log in in order to post a comment.

Related Resources

Danny KalevMinutes from the October 2009 Meeting
By Danny Kalev on November 19, 2009 No Comments

The minutes from the Santa Cruz (October 2009) meeting are available here. Even if you're not a language layer at heart, I encourage you to read them.

Danny KalevA Reader's Opinion on Attributes
By Danny Kalev on October 20, 2009 No Comments

In August I dedicated a series to the debate about C++0x attributes. I believe that it covered the subject in a balanced and detailed way, but I keep getting complaints from C++ users who don't like attributes for various reasons. Here's a recent email I received from a Polish C++ programmer. While it  doesn't represent my opinion about attributes -- I'm rather neutral about this feature and consider it a "solution waiting for a problem" -- but it suggests that attributes are still a highly controversial issue that will haunt C++ for a long time. The email is quoted here with minor edits that and as usual, with all private details removed.

Danny KalevFollowup: The Web 2.0 Guy I Ain't
By Danny Kalev on October 16, 2009 1 Comment

Almost a year ago, I posted here The Web 2.0 Guy I Ain't. People wonder whether I still resist all those Web 2.0 features and technologies at the end of 2009.

See All Related Blogs

Informit Network