Home > Articles > Programming > C#

.NET Reference Guide

Hosted by

Toggle Open Guide Table of ContentsGuide Contents

Close Table of ContentsGuide Contents

Close Table of Contents

Multiplying Large Integers

Last updated Mar 14, 2003.

At its core, multiplication is simply repeated addition. Multiplying 2 by 3, for example, is the same as adding 2+2+2, or 3+3. So the simplest multiply operator we could write is:

public static Int128 operator *(Int128 a, Int128 b)
{
  Int128 rslt = 0;
  while (a > 0)
  {
    rslt += b;
    --a;
  }
  return rslt;
}

That’s simple enough, but it will only work if <tt>a</tt> is positive. You’d have to write conditional code to handle negative numbers. Still, it wouldn’t be especially difficult. But it would be incredibly slow. Even if you added logic to use the absolute value of the smaller of the two numbers as the loop control variable, such a method would take a very long time to multiply even two 64 bit numbers. Imagine trying to multiply 2,718,281,828,459,045 by 31,415,926,535,897,932,384. Even on a 4 GHz machine, if it could do one addition per clock cycle, it would still take over 600,000 seconds, or something more than 7 days.

Fortunately, there’s a much faster way.

There are several different ways to multiply multi-precision numbers. Perhaps the easiest is conceptually similar to the way that you learned to multiply large numbers in grade school. For example, if you want to multiply 15 by 23, you multiply the individual digits to create partial products, which you then sum to arrive at the final result.

   23
  x 15
-------
  115 (5 * 23)
 + 230 (10 * 23)
-------
  345

But even that is something of a shortcut. With experience, we learned to combine into a single line the partial products that result from multiplying each of digits in the multiplicand. There are actually four partial products, not just two:

   23
  x 15
-------
   15 (5 * 3)
  100 (5 * 20)
   30 (10 * 3)
 + 200 (10 * 20)
-------
  345

You might be surprised to find that we can make a computer do much the same thing: multiply pieces of a very large number to obtain partial products, which are then summed to obtain the result.

As an example, trying to multiply two 8-bit numbers to obtain a 16-bit result, but your computer doesn’t have a multiply instruction that will do it. You do, however, have an instruction that will multiply two 4-bit numbers to get an 8-bit result. You can use that instruction to obtain partial products, like this:

  2F
 x 1A
------
  96 (A * F)
  140 (A * 20)
  F0 (10 * F)
  200 (10 * 20)
------
  4C6

With a little bit twiddling, you can extend this to any arbitrary precision. Want to multiply 128-bit numbers, four bits at a time? No problem. Since each of the partial products fits in 8 bits, you can multiply the individual digits, shift them left as required, and add them together. It would require a lot of operations, though: in general, 32 * 32, or 1,024 4-bit multiplies. There are ways to optimize that, of course, to avoid multiplying when you don’t have to. Still, 1,024 multiplies in the worst case is much better than trillions of additions.

But we can do better still. The method above works for any number of bits. That is, if you have an 8x8 multiply that gives a 16-bit result, you can use 16-bit partial products. Since 8 goes into 128 16 times, that makes your worst case 16 * 16, or 256 multiplies. In C#, we can multiply two 32-bit numbers to obtain a 64-bit result. So our worse case is just 16 multiplies, and we can eliminate some of those.

I mentioned above that we can multiply two 32-bit numbers to obtain a 64-bit result. C# will allow us to multiply two 64-bit numbers, but the result is only 64 bits, which means that we could potentially lose information. That’s why the algorithm will work with 32-bit multiplies.

When you multiply two n-bit numbers, the result is, at most, a number that contains 2*n bits. For example, the decimal number 15 occupies 4 bits. 15 * 15 is 225, which requires 8 bits to represent. In our case of multiplying two 128-bit numbers, the result could occupy 256 bits. Granted, we don’t care about the high 128 bits of the result, but we’ll consider them here while describing the algorithm.

The algorithm splits each of the 128-bit operands into four 32-bit chunks. The multiplier chunks are labeled A, B, C, and D. The multiplicand’s chunks are E, F, G, and H. The result is a 256-bit number that is split into eight 32-bit "registers", which we’ll call R0 through R7, with R0 being the least significant portion.

We then proceed to multiply the parts and place the partial products into the proper parts of the result. So the 64-bit product of H*D would go into registers R1 and R0. H*C would go into registers R2 and R1, etc. Since the partial products are 64 bits wide and our registers are 32 bits wide, we have to split the partial products into high and low portions. We’ll refer to a 64-bit partial product as HD, for example, and add a suffix to indicate the high or low part: HDl for the low part, and HDh for the high part.

The result of doing all of the multiplies is a table that looks like this:

 

R7

R6

R5

R4

R3

R2

R1

R0

H*D

 

 

 

 

 

 

HDh

HDl

H*C

 

 

 

 

 

HCh

HCl

 

H*B

 

 

 

 

HBh

HBl

 

 

H*A

 

 

 

HAh

HAl

 

 

 

G*D

 

 

 

 

 

GDh

GDl

 

G*C

 

 

 

 

GCl

GCl

 

 

G*B

 

 

 

GBh

GBl

 

 

 

G*A

 

 

GAh

GAl

 

 

 

 

F*D

 

 

 

 

FDh

FDl

 

 

F*C

 

 

 

FCh

FCl

 

 

 

F*B

 

 

FBh

FBl

 

 

 

 

F*A

 

FAh

FAl

 

 

 

 

 

E*D

 

 

 

EDh

EDl

 

 

 

E*C

 

 

ECh

ECl

 

 

 

 

E*B

 

EBh

EBl

 

 

 

 

 

E*A

EAh

EAl

 

 

 

 

 

 

After computing the partial products, we just have to add them up to get our result.

The table above shows a 256-bit result. Since we’re only interested in the first 128 bits, anything in registers R4 through R7 is irrelevant and we can eliminate those. So we can eliminate any multiplies that contribute only to the upper 128 bits. In this case, we can eliminate six of the sixteen multiplies: GA, FB, FA, EC, EB, and EA. In addition, the high order parts of four multiplies will not affect the result: HAh, GBh, FCh, and EDh.

That leaves us with the sum of 10 partial products:

HD + (HC << 32) + (HB << 64) + (HAl << 96) + (GD << 32) + (GC << 64) + (GBl << 96) + (FD << 64) + (FCl << 96) + (EDl << 96)

With that explanation, I present the code for our 128-bit multiply:

public static Int128 operator *(Int128 a, Int128 b)
{
  // Get the individual values.
  int A = (int)(a._hi >> 32);
  int B = (int)(a._hi & 0xFFFFFFFF);
  int C = (int)(a._lo >> 32);
  int D = (int)(a._lo & 0xFFFFFFFF);
  long E = b._hi >> 32;
  long F = b._hi & 0xFFFFFFFF;
  long G = (long)(b._lo >> 32);
  long H = (long)(b._lo & 0xFFFFFFFF);

  // Compute the partial products.
  long temp;

  // HD
  Int128 HD = H * D;

  // HC << 32
  Int128 HC = H * C;
  HC <<= 32;

  // HB << 64
  Int128 HB = new Int128(0, H * B);

  // HAl << 96
  temp = H * A;
  Int128 HA = new Int128(0, temp << 32);

  // GD << 32
  Int128 GD = G * D;
  GD <<= 32;

  // GC << 64
  Int128 GC = new Int128(0, G * C);

  // GBl << 96
  temp = G * B;
  Int128 GB = new Int128(0, temp << 32);

  // FC << 64
  Int128 FD = new Int128(0, F * D);

  // FCl << 96
  temp = F * C;
  Int128 FC = new Int128(0, temp << 32);

  // EDl << 96
  temp = E * D;
  Int128 ED = new Int128(0, temp << 32);

  // Now add the partial products.
  Int128 result = HD + HC + HB + HA + GD + GC + GB + FD + FC + ED;

  return result;
}