User-Level Memory Management in Linux Programming

In this chapter

  • 3.1 Linux/Unix Address Space

  • 3.2 Memory Allocation

  • 3.3 Summary

  • Exercises

Without memory for storing data, it's impossible for a program to get any work done. (Or rather, it's impossible to get any useful work done.) Real-world programs can't afford to rely on fixed-size buffers or arrays of data structures. They have to be able to handle inputs of varying sizes, from small to large. This in turn leads to the use of dynamically allocated memory—memory allocated at runtime instead of at compile time. This is how the GNU "no arbitrary limits" principle is put into action.

Because dynamically allocated memory is such a basic building block for real-world programs, we cover it early, before looking at everything else there is to do. Our discussion focuses exclusively on the user-level view of the process and its memory; it has nothing to do with CPU architecture.

3.1 Linux/Unix Address Space

For a working definition, we've said that a process is a running program. This means that the operating system has loaded the executable file for the program into memory, has arranged for it to have access to its command-line arguments and environment variables, and has started it running. A process has five conceptually different areas of memory allocated to it:

Code

    Often referred to as the text segment, this is the area in which the executable instructions reside. Linux and Unix arrange things so that multiple running instances of the same program share their code if possible; only one copy of the instructions for the same program resides in memory at any time. (This is transparent to the running programs.) The portion of the executable file containing the text segment is the text section.

Initialized data

    Statically allocated and global data that are initialized with nonzero values live in the data segment. Each process running the same program has its own data segment. The portion of the executable file containing the data segment is the data section.

Zero-initialized data

    Global and statically allocated data that are initialized to zero by default are kept in what is colloquially called the BSS area of the process.1 Each process running the same program has its own BSS area. When running, the BSS data are placed in the data segment. In the executable file, they are stored in the BSS section.

    The format of a Linux/Unix executable is such that only variables that are initialized to a nonzero value occupy space in the executable's disk file. Thus, a large array declared 'static char somebuf[2048];', which is automatically zero-filled, does not take up 2 KB worth of disk space. (Some compilers have options that let you place zero-initialized data into the data segment.)

Heap

    The heap is where dynamic memory (obtained by malloc() and friends) comes from. As memory is allocated on the heap, the process's address space grows, as you can see by watching a running program with the ps command.

    Although it is possible to give memory back to the system and shrink a process's address space, this is almost never done. (We distinguish between releasing no-longer-needed dynamic memory and shrinking the address space; this is discussed in more detail later in this chapter.)

    It is typical for the heap to "grow upward." This means that successive items that are added to the heap are added at addresses that are numerically greater than previous items. It is also typical for the heap to start immediately after the BSS area of the data segment.

Stack

    The stack segment is where local variables are allocated. Local variables are all variables declared inside the opening left brace of a function body (or other left brace) that aren't defined as static.

    On most architectures, function parameters are also placed on the stack, as well as "invisible" bookkeeping information generated by the compiler, such as room for a function return value and storage for the return address representing the return from a function to its caller. (Some architectures do all this with registers.)

    It is the use of a stack for function parameters and return values that makes it convenient to write recursive functions (functions that call themselves).

    Variables stored on the stack "disappear" when the function containing them returns; the space on the stack is reused for subsequent function calls.

    On most modern architectures, the stack "grows downward," meaning that items deeper in the call chain are at numerically lower addresses.

When a program is running, the initialized data, BSS, and heap areas are usually placed into a single contiguous area: the data segment. The stack segment and code segment are separate from the data segment and from each other. This is illustrated in Figure 3.1.

Figure 3.1 Linux/Unix process address space

Although it's theoretically possible for the stack and heap to grow into each other, the operating system prevents that event, and any program that tries to make it happen is asking for trouble. This is particularly true on modern systems, on which process address spaces are large and the gap between the top of the stack and the end of the heap is a big one.

The different memory areas can have different hardware memory protection assigned to them. For example, the text segment might be marked "execute only," whereas the data and stack segments would have execute permission disabled. This practice can prevent certain kinds of security attacks. The details, of course, are hardware and operating-system specific and likely to change over time. Of note is that both Standard C and C++ allow const items to be placed in read-only memory.

The relationship among the different segments is summarized in Table 3.1.

Table 3.1 Executable program segments and their locations

Program memory      Address space segment    Executable file section
Code                Text                     Text
Initialized data    Data                     Data
BSS                 Data                     BSS
Heap                Data
Stack               Stack

The size program prints out the size in bytes of each of the text, data, and BSS sections, along with the total size in decimal and hexadecimal. (The ch03-memaddr.c program is shown later in this chapter; see Section 3.2.5, "Address Space Examination," page 78.)

$ cc -O ch03-memaddr.c -o ch03-memaddr           Compile the program
$ ls -l ch03-memaddr                             Show total size
-rwxr-xr-x    1 arnold   devel       12320 Nov 24 16:45 ch03-memaddr
$ size ch03-memaddr                              Show component sizes
   text    data     bss     dec     hex filename
   1458     276       8    1742     6ce ch03-memaddr
$ strip ch03-memaddr                             Remove symbols
$ ls -l ch03-memaddr                             Show total size again
-rwxr-xr-x    1 arnold   devel        3480 Nov 24 16:45 ch03-memaddr
$ size ch03-memaddr                              Component sizes haven't changed
   text    data     bss     dec     hex filename
   1458     276       8    1742     6ce ch03-memaddr

The total size of what gets loaded into memory is only 1742 bytes, in a file that is 12,320 bytes long. Most of that space is occupied by the symbols, a list of the program's variables and function names. (The symbols are not loaded into memory when the program runs.) The strip program removes the symbols from the object file. This can save significant disk space for a large program, at the cost of making it impossible to debug a core dump2 should one occur. (On modern systems this isn't worth the trouble; don't use strip.) Even after removing the symbols, the file is still larger than what gets loaded into memory since the object file format maintains additional data about the program, such as what shared libraries it may use, if any. 3

Finally, we'll mention that threads are multiple flows of execution within a single address space. Typically, each thread has its own stack and a way to get thread-local data, that is, dynamically allocated data for private use by the thread. We don't otherwise cover threads in this book, since they are an advanced topic.
