- Table of Contents
- .NET Book Recommendations
- Getting Started with .NET
- The Microsoft .NET Framework
- The Common Language Runtime (CLR), the Common Type System (CTS), and the Common Language Specification (CLS)
- .NET Framework Class Library
- Visual Studio .NET
- .NET Enterprise Servers and .NET My Services
- .NET Compliant Languages
- C#
- Visual Basic .NET (VB .NET)
- ASP.NET
- XML Web Services
- ADO.NET
- XML.NET
- Windows Forms
- Why .NET?
- Displaying Errors with the Error Provider
- COM Interoperability
- Comparing Java and .NET
- Calling Unmanaged Code
- .NET Application Security
- Code Access Security
- .NET Standards Support
- Numeric Types in the .NET Framework
- Working with Strings
- Formatting Strings
- Trimming Character Strings
- Comparing Strings in .NET 2.0
- Arrays and Collections
- Arrays as Class Members
- Sorting a Multi-Dimensional Array
- Sorting a Multi-Dimensional Array with LINQ
- File I/O (System.IO)
- Working with File Names
- Using the File System
- Working with Files and Directories
- Monitoring the File System
- Working with Streams
- Working with Text Encodings
- Working with Date and Time
- Extending the DateTime Class
- Using DateTimeOffset
- Fun with Dates
- Exceptions
- Delegates
- Events
- Asynchronous Programming
- Asynchronous File I/O
- Timers
- Random Numbers
- Cryptographically Secure Random Numbers
- Serialization
- MultiThreading (System.Threading)
- Multi-Threading Overview
- The Managed Thread Pool
- Managed Threading
- Thread Synchronization
- Synchronizing Data Access
- Trace Debugging
- Tracing in .NET 2.0
- ASP.NET Trace
- Validating User Input in ASP.NET Web Pages
- Event Logging
- Monitoring Application Performance
- Accessing the Registry
- Accessing Environment Information
- Environment Variables in .NET 2.0
- Managing Windows Forms Applications
- Working with Email
- Working with Graphics
- Animating a Background
- Working with Images
- Drawing Cycloid Curves
- Simulating the Spirograph
- Building International Web Applications
- .NET Compact Framework
- Mobile Web Development with ASP.NET
- Speech Technologies
- Microsoft MapPoint Web Service
- Working with Typed DataSets
- Using Relationships in DataSets
- DataColumn Expressions
- Playing Simple Sounds
- Playing Sounds with .NET 2.0
- Returning an Image in a Web Page
- RSS
- Best Practices Project Structure
- Best Practices Application Blocks
- The Data Access Application Block
- The Exception Management Application Block
- Best Practices — Performance
- Best Practices — Performance and Scalability
- Best Practices - Testing
- Reading the Tea Leaves, 2005
- Predictions: A Look Back at 2005, and a Look Ahead to 2006
- .NET Downloads
- Application Deployment Overview
- Application Deployment — Versioning
- Application Deployment — Version Policy
- Application Deployment — Packaging and Distribution
- .NET Remoting Overview
- A Remoting Demonstration
- Remoting Configuration
- Remoting: Lifetimes and Leases
- Remoting: Other Issues
- Attributes
- Writing Custom Attributes
- Accessing Attributes in Code
- Reflection
- Class Design: Inheritance, Interface, or Composition?
- The TriTryst Game
- Console Applications in .NET 2.0
- New File I/O Methods in .NET 2.0
- Building Projects with MSBuild
- Unmanaged Callbacks in .NET 2.0
- Timer Troubles
- Non-Rectangular Windows Forms
- Windows Forms Transparency
- 10 Things I Hate About Visual Basic
- 10 Things I Hate About C#
- Background Processing with Idle Time
- Scaling Windows Forms
- Reading and Writing Binary Data
- New Memory Management Functions in .NET 2.0
- Compatibility Between .NET 1.1 and .NET 2.0
- Managed Debugging Assistants in .NET 2.0
- XDir: A Program for Viewing Directory Sizes
- The Microsoft.VisualBasic Namespace
- Operator Overloading
- Working with GPS Data
- Hidden Visual Studio Tools
- .NET 3.0
- The .NET 2.0 Stopwatch Class
- Nullable Types
- Drawing Rotated Text
- Unsafe Code
- Other .NET Languages
- Compiler Directives
- Safe Handles
- Predictions, 2007 Edition
- New Features in C# 3.0
- Generics
- Network Client Programming
- On the Misuse of Exceptions
- Maximum Object Size in .NET
- More on Maximum Object Sizes
- Keyed Collection Memory Limitations
- Matching String Endings
- Allocating Small Data Structures
- Grumbling About Limitations
- Some Thoughts on the Nature of What We Do
- Working with Predicates in Collections
- Working with DataReaders
- Outputting XML with XmlWriter
- Writing XML Data
- Working with Compression
- Another Look at Compressed Streams
- Compressing a Very Large File
- Canonical URIs
- Constructing URIs
- Using OneWayAttribute for Remote Calls
- Selecting a Garbage Collector
- Linked List
- Linked List Application - The MRU List
- Auto-implemented Properties in C#
- The HashSet Collection
- Looking Ahead: 2018
- An Experiment in Optimization
- A Larger Integer
- Extension Methods
- Language Integrated Query (LINQ)
- Variable Length Parameter Lists
- The ReaderWriterLockSlim Synchronization Primitive
- Sorting a Text File
- Sorting a Large Text File
- Using ListView with Large Data Sets
- LINQ One-Liners
- Regular Expression Optimization
- Random File I/O
- Computing the Size of a Structure
- More on Computing Structure Sizes
- UnmanagedMemoryStream
- Dynamically Loading Code
- Building a String Table
- Delegates Versus Function Pointers
- Visual Studio Editor Features
- A Simple Profile Timer
- New Features in C# 4.0
- IEnumerator or IList?
- New Features in .NET 4.0
- Set Operations with IEnumerable and HashSet
- Using File Locks
- Extending Object Functionality
- Clearing a HashSet
- When Hash Codes Matter
- Parsing Command Line Options
- Creating a Single-Instance Program
- Asynchronous Windows Forms Events
- The BackgroundWorker Component
- Fixing a Dumb Mistake
- Thinking About Multi-Threaded Programs
- JavaScript Object Notation
- Better JSON Processing with JSON.Net
- Useful .NET-related Sites
- Markov Models
- Building an Order 0 Markov Model
- Higher Order Markov Models
- Webmaster's Guide to robots.txt
- An Overview of the Parallel Extensions to .NET
- Parallel Extensions Synchronization Objects
- Thread Safe Collections
- A Bug and a Conundrum
- Another Bug and an Answer
- Task Parallel Library
- Good and Bad Ideas in C#
- Parallel LINQ
- Copying Large Files
- Replacing File.Copy
- Learning from Our Mistakes
- Symbolic Links
- There Is No Easy Fix
- Tracking Hurricanes
- Examining Hurricane Data
- Searching for Multiple Strings
- Simple JSON Processing
- Aho-Corasick String Searching
- Writing a Web Crawler
- Web Crawler Politeness
- Source Control Management
- Subversion
- Communicating with Datagrams
- Fun with Actions and Funcs
- The Future of Media
- The Importance of Metadata
- Of Comparison and IComparer
- IComparer, Comparer, IComparable, Oh My!
- Comparing Generic Types
- A Simple HTTP Server
- Quantizing DateTime Fields
- More Fun with the Garbage Collector
- Refactor, Don't Rewrite
- A Generic BinaryHeap Class
- A Generic File Sorter
- Birthdays, Random Numbers, and Hash Keys
- Random Selection from Large Groups
- Command Line Tools for Windows
- Reading and Writing, Bit by Bit
- Selecting the Top N Items from a Group
- Determining Website Content Encoding
- Benefits and Drawbacks of Syndication
- Pubsubhubbub
- Memory Use Misconceptions
- Risk, Lost Opportunity, and Other Hidden Upgrade Costs
- Culture Shock: from .NET to JavaScript
- Using .NET for a Startup
- Tracking Wikipedia Changes with IRC
- Browser Applications and the Same Origin Policy
- Handling the Unexpected
- Dealing with Growth
- Deleting the Oldest File
- Where Do I Put Stuff?
- .NET Timer Resolution
- Exploring Options for Better Timers
- Using the Windows Timer Queue API
- Locks Aren't Slow
- Alternatives to Locks
- Lock Free Concurrent Collections
- The BlockingCollection Class
- Customizing BlockingCollection
- What Time Is It? Daylight Saving Time and Computers
- Using enums to Save Memory
- New File Operations in .NET 4.0
- Building a Hierarchy of Rectangles
- A Faster File Copy
- Constants Are Forever
- The Dangers of Floating Point
- Goto is Not Inherently Evil
- The Weakest Link
- Reducing Memory Required for Strings
- Grouping with LINQ
- HttpListener "Gotchas"
- Extension Methods Are Evil
- Finding the Registered Domain in a URL
- Drawing Text
- Obfuscating Sequential Keys
- Properties of Obfuscated Keys
- Finding Changes Between Two Lists
- Using the ConcurrentBag Collection
- Never Sleep!
- Shuffling and Sorting
- Viewing Large Text Files
- Use the Right Tool
- Why GetHashCode Matters
- Optimization Guidelines
- Timer Differences
- The Mutex
- Modifying a Working System
- Building a New Type of Stream
- More Large File Problems
- A Better File.Copy Replacement
- Throwing the Wrong Exception
- Approximate Counters
- Monitoring a Timer
- Combining Consoles and Forms
- Embedding a Text Resource
- Handling Concurrent Downloads
- The Importance of Domain Knowledge
- Stupid Programmer Tricks
- Aho-Corasick Revisited
- Expressiveness is the Soul of Brevity
- Fun with Anonymous Types
- Simplifying a Multi-Threaded Application
- Work Smarter
- The Skip List Data Structure
- A More Memory-Efficient Skip List
- Selection Revisited
- Why Async?
- What the Future Holds
- The "Roslyn" CTP
- Where We've Been
- Informit Reference Library
Multiplying Large Integers
Last updated Feb 1, 2008.
At its core, multiplication is simply repeated addition. Multiplying 2 by 3, for example, is the same as adding 2+2+2, or 3+3. So the simplest multiply operator we could write is:
public static Int128 operator *(Int128 a, Int128 b)
{
Int128 rslt = 0;
while (a > 0)
{
rslt += b;
--a;
}
return rslt;
}
That’s simple enough, but it will only work if <tt>a</tt> is positive. You’d have to write conditional code to handle negative numbers. Still, it wouldn’t be especially difficult. But it would be incredibly slow. Even if you added logic to use the absolute value of the smaller of the two numbers as the loop control variable, such a method would take a very long time to multiply even two 64 bit numbers. Imagine trying to multiply 2,718,281,828,459,045 by 31,415,926,535,897,932,384. Even on a 4 GHz machine, if it could do one addition per clock cycle, it would still take over 600,000 seconds, or something more than 7 days.
Fortunately, there’s a much faster way.
There are several different ways to multiply multi-precision numbers. Perhaps the easiest is conceptually similar to the way that you learned to multiply large numbers in grade school. For example, if you want to multiply 15 by 23, you multiply the individual digits to create partial products, which you then sum to arrive at the final result.
23 x 15 ------- 115 (5 * 23) + 230 (10 * 23) ------- 345
But even that is something of a shortcut. With experience, we learned to combine into a single line the partial products that result from multiplying each of digits in the multiplicand. There are actually four partial products, not just two:
23 x 15 ------- 15 (5 * 3) 100 (5 * 20) 30 (10 * 3) + 200 (10 * 20) ------- 345
You might be surprised to find that we can make a computer do much the same thing: multiply pieces of a very large number to obtain partial products, which are then summed to obtain the result.
As an example, trying to multiply two 8-bit numbers to obtain a 16-bit result, but your computer doesn’t have a multiply instruction that will do it. You do, however, have an instruction that will multiply two 4-bit numbers to get an 8-bit result. You can use that instruction to obtain partial products, like this:
2F x 1A ------ 96 (A * F) 140 (A * 20) F0 (10 * F) 200 (10 * 20) ------ 4C6
With a little bit twiddling, you can extend this to any arbitrary precision. Want to multiply 128-bit numbers, four bits at a time? No problem. Since each of the partial products fits in 8 bits, you can multiply the individual digits, shift them left as required, and add them together. It would require a lot of operations, though: in general, 32 * 32, or 1,024 4-bit multiplies. There are ways to optimize that, of course, to avoid multiplying when you don’t have to. Still, 1,024 multiplies in the worst case is much better than trillions of additions.
But we can do better still. The method above works for any number of bits. That is, if you have an 8x8 multiply that gives a 16-bit result, you can use 16-bit partial products. Since 8 goes into 128 16 times, that makes your worst case 16 * 16, or 256 multiplies. In C#, we can multiply two 32-bit numbers to obtain a 64-bit result. So our worse case is just 16 multiplies, and we can eliminate some of those.
I mentioned above that we can multiply two 32-bit numbers to obtain a 64-bit result. C# will allow us to multiply two 64-bit numbers, but the result is only 64 bits, which means that we could potentially lose information. That’s why the algorithm will work with 32-bit multiplies.
When you multiply two n-bit numbers, the result is, at most, a number that contains 2*n bits. For example, the decimal number 15 occupies 4 bits. 15 * 15 is 225, which requires 8 bits to represent. In our case of multiplying two 128-bit numbers, the result could occupy 256 bits. Granted, we don’t care about the high 128 bits of the result, but we’ll consider them here while describing the algorithm.
The algorithm splits each of the 128-bit operands into four 32-bit chunks. The multiplier chunks are labeled A, B, C, and D. The multiplicand’s chunks are E, F, G, and H. The result is a 256-bit number that is split into eight 32-bit "registers", which we’ll call R0 through R7, with R0 being the least significant portion.
We then proceed to multiply the parts and place the partial products into the proper parts of the result. So the 64-bit product of H*D would go into registers R1 and R0. H*C would go into registers R2 and R1, etc. Since the partial products are 64 bits wide and our registers are 32 bits wide, we have to split the partial products into high and low portions. We’ll refer to a 64-bit partial product as HD, for example, and add a suffix to indicate the high or low part: HDl for the low part, and HDh for the high part.
The result of doing all of the multiplies is a table that looks like this:
|
|
R7 |
R6 |
R5 |
R4 |
R3 |
R2 |
R1 |
R0 |
|
H*D |
|
|
|
|
|
|
HDh |
HDl |
|
H*C |
|
|
|
|
|
HCh |
HCl |
|
|
H*B |
|
|
|
|
HBh |
HBl |
|
|
|
H*A |
|
|
|
HAh |
HAl |
|
|
|
|
G*D |
|
|
|
|
|
GDh |
GDl |
|
|
G*C |
|
|
|
|
GCl |
GCl |
|
|
|
G*B |
|
|
|
GBh |
GBl |
|
|
|
|
G*A |
|
|
GAh |
GAl |
|
|
|
|
|
F*D |
|
|
|
|
FDh |
FDl |
|
|
|
F*C |
|
|
|
FCh |
FCl |
|
|
|
|
F*B |
|
|
FBh |
FBl |
|
|
|
|
|
F*A |
|
FAh |
FAl |
|
|
|
|
|
|
E*D |
|
|
|
EDh |
EDl |
|
|
|
|
E*C |
|
|
ECh |
ECl |
|
|
|
|
|
E*B |
|
EBh |
EBl |
|
|
|
|
|
|
E*A |
EAh |
EAl |
|
|
|
|
|
|
After computing the partial products, we just have to add them up to get our result.
The table above shows a 256-bit result. Since we’re only interested in the first 128 bits, anything in registers R4 through R7 is irrelevant and we can eliminate those. So we can eliminate any multiplies that contribute only to the upper 128 bits. In this case, we can eliminate six of the sixteen multiplies: GA, FB, FA, EC, EB, and EA. In addition, the high order parts of four multiplies will not affect the result: HAh, GBh, FCh, and EDh.
That leaves us with the sum of 10 partial products:
HD + (HC << 32) + (HB << 64) + (HAl << 96) + (GD << 32) + (GC << 64) + (GBl << 96) + (FD << 64) + (FCl << 96) + (EDl << 96)
With that explanation, I present the code for our 128-bit multiply:
public static Int128 operator *(Int128 a, Int128 b)
{
// Get the individual values.
int A = (int)(a._hi >> 32);
int B = (int)(a._hi & 0xFFFFFFFF);
int C = (int)(a._lo >> 32);
int D = (int)(a._lo & 0xFFFFFFFF);
long E = b._hi >> 32;
long F = b._hi & 0xFFFFFFFF;
long G = (long)(b._lo >> 32);
long H = (long)(b._lo & 0xFFFFFFFF);
// Compute the partial products.
long temp;
// HD
Int128 HD = H * D;
// HC << 32
Int128 HC = H * C;
HC <<= 32;
// HB << 64
Int128 HB = new Int128(0, H * B);
// HAl << 96
temp = H * A;
Int128 HA = new Int128(0, temp << 32);
// GD << 32
Int128 GD = G * D;
GD <<= 32;
// GC << 64
Int128 GC = new Int128(0, G * C);
// GBl << 96
temp = G * B;
Int128 GB = new Int128(0, temp << 32);
// FC << 64
Int128 FD = new Int128(0, F * D);
// FCl << 96
temp = F * C;
Int128 FC = new Int128(0, temp << 32);
// EDl << 96
temp = E * D;
Int128 ED = new Int128(0, temp << 32);
// Now add the partial products.
Int128 result = HD + HC + HB + HA + GD + GC + GB + FD + FC + ED;
return result;
}



