.NET Reference Guide

How Much Memory Does That String Take?

Last updated Mar 14, 2003.

Computing the amount of available memory is somewhat involved, because it depends not only on how much physical memory is in the system, but also on how much memory all running processes are currently using. I chose not to compute available memory at all, and instead let the calling program tell the sort method how much memory to use. The caller can then make the tradeoff between chunk size and memory usage.

In order to keep track of how much memory we’re using, we need to know not only how much memory the characters in each string occupy, but also whatever overhead is involved in creating a string.

Computing the memory used by the characters in each string is simple. As you probably know, .NET strings are composed of 16-bit Unicode "characters." So the amount of memory occupied by a string’s characters is two times the string length. But the runtime adds some overhead to every string object it allocates, too. How much is that overhead? It’s not documented anywhere that I know of, but it’s possible to determine it empirically.

The program below allocates one million strings, each 128 characters long, and adds each one to a list. Before any strings are created, though, the program pre-allocates the list and determines how much memory is allocated. After generating the strings, it’s a simple matter to subtract the starting memory used value from the ending value, and from there compute the per-string overhead.

using System;
using System.Collections.Generic;
using System.Text;

class Program
{
  const int NUM_STRINGS = 1000000;
  const int STRING_LENGTH = 128;

  // test to see how much space string allocations take
  static void Main(string[] args)
  {
    List<string> strings = new List<string>(NUM_STRINGS);
    StringBuilder sb = new StringBuilder(STRING_LENGTH);
    Random rnd = new Random();

    // get starting memory
    long startMem = GC.GetTotalMemory(true);

    // Generate the strings
    for (int i = 0; i < NUM_STRINGS; ++i)
    {
      sb.Length = 0;
      for (int j = 0; j < STRING_LENGTH; ++j)
      {
        // The upper bound of Random.Next is exclusive, so this yields 'A' through 'Z'.
        char c = (char)rnd.Next('A', 'Z' + 1);
        sb.Append(c);
      }
      strings.Add(sb.ToString());
    }
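    // get ending memory
    long stopMem = GC.GetTotalMemory(true);

    // Keep the list reachable until after the measurement, so the forced
    // collection above cannot reclaim the strings in an optimized build.
    GC.KeepAlive(strings);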

    // memUsed is the total amount of memory used by the strings
    long memUsed = stopMem - startMem;

    Console.WriteLine("Memory used = {0:N0}", memUsed);

    // rawStringMem is the amount of memory used by the string’s characters.
    long rawStringMem = (long)2 * NUM_STRINGS * STRING_LENGTH;

    Console.WriteLine("Memory used by strings = {0:N0}", rawStringMem);

    // Any excess memory used is overhead
    long overhead = memUsed - rawStringMem;
    Console.WriteLine("Allocation overhead = {0:N0} bytes", overhead);
    Console.WriteLine("Overhead per string = {0:N2} bytes", (double)overhead / NUM_STRINGS);
    Console.Write("Press Enter:");
    Console.ReadLine();
  }
}

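If you execute this program in the 32-bit runtime, you’ll see that the per-string allocation overhead is 20 bytes. In the 64-bit runtime, the per-string allocation overhead is 32 bytes. These are measured values rather than documented ones, so they may differ on other versions of the runtime.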
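Because the two figures differ, it helps to confirm which runtime the process is actually using before comparing results. The following is a minimal sketch that relies only on IntPtr.Size; the class name BitnessCheck is just for illustration.

```csharp
using System;

class BitnessCheck
{
  static void Main()
  {
    // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
    Console.WriteLine("Running as a {0}-bit process", IntPtr.Size * 8);
  }
}
```

Using the figures measured above, a 128-character string would occupy about (2 * 128) + 20 = 276 bytes in the 32-bit runtime, and (2 * 128) + 32 = 288 bytes in the 64-bit runtime.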

There is one more wrinkle. In the program above, I pre-allocated the list before taking the starting measurement, so the list’s own storage isn’t included in the per-string overhead. But we won’t be able to pre-allocate our list, because we don’t know how many strings will fit into whatever memory is available. We could make a guess, but it might not be very reliable. In this case it will be better to let the List<string> allocate memory dynamically. That’s going to cost us another pointer per string, for the list’s reference to it: 4 bytes on 32-bit systems, and 8 bytes on 64-bit systems. So our per-string overhead values are 24 and 40 bytes, respectively.
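To see the dynamic allocation at work, a small sketch like the following reports each time the list replaces its internal array with a larger one. The class name CapacityDemo and the count of 1,000 items are arbitrary choices for illustration, and the exact growth pattern is an implementation detail of the runtime, so no particular output is assumed here.

```csharp
using System;
using System.Collections.Generic;

class CapacityDemo
{
  static void Main()
  {
    // No initial capacity: the list must grow as items are added.
    List<string> list = new List<string>();
    int lastCapacity = list.Capacity;

    for (int i = 0; i < 1000; ++i)
    {
      list.Add("x");
      if (list.Capacity != lastCapacity)
      {
        // The internal array was reallocated to a larger size.
        Console.WriteLine("Count = {0,4}, Capacity = {1,4}", list.Count, list.Capacity);
        lastCapacity = list.Capacity;
      }
    }
  }
}
```

Each slot in that internal array holds one reference, which is IntPtr.Size bytes, whether or not the slot is currently in use.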

So, the total memory required by a .NET string is:

(2 * Length) + AllocationOverhead

A function to compute that, taking into account the difference in allocation overhead between the two runtimes, is:

int ComputeStringMemory(string s)
{
  return (2 * s.Length) + (IntPtr.Size == 4 ? 24 : 40);
}

This won’t be exact, because the list grows its internal array in steps, so it will usually have some memory allocated for string references that aren’t actually used. But it should be sufficient for our purposes.
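As a rough illustration of how this estimate might be used to keep a running total against a caller-supplied limit, here is a minimal sketch. The names ChunkBudgetSketch and TakeChunk, the sample strings, and the 100-byte budget are all hypothetical and chosen only to keep the example short; this is not the sort method itself.

```csharp
using System;
using System.Collections.Generic;

class ChunkBudgetSketch
{
  // Same estimate as the function shown in the text.
  static int ComputeStringMemory(string s)
  {
    return (2 * s.Length) + (IntPtr.Size == 4 ? 24 : 40);
  }

  // Hypothetical helper: reads strings until the estimated memory use
  // reaches maxBytes (assumed positive). The last string taken may push
  // the total slightly over the budget.
  static List<string> TakeChunk(IEnumerator<string> source, long maxBytes)
  {
    List<string> chunk = new List<string>();
    long used = 0;

    // The budget test comes first, so no string is consumed once it is spent.
    while (used < maxBytes && source.MoveNext())
    {
      chunk.Add(source.Current);
      used += ComputeStringMemory(source.Current);
    }
    return chunk;
  }

  static void Main()
  {
    string[] lines = { "alpha", "beta", "gamma", "delta", "epsilon" };
    IEnumerator<string> e = ((IEnumerable<string>)lines).GetEnumerator();
    long budget = 100; // deliberately tiny so the input splits into chunks

    int chunkNumber = 0;
    while (true)
    {
      List<string> chunk = TakeChunk(e, budget);
      if (chunk.Count == 0) break; // input exhausted
      Console.WriteLine("Chunk {0}: {1} strings", ++chunkNumber, chunk.Count);
    }
  }
}
```

Checking the budget before reading the next string means a chunk can overshoot by at most one string's worth of memory, which is in keeping with treating the estimate as approximate.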