*C++ by Example: "UnderC" Learning Edition*, by Steve Donovan.

So far we have discussed the core arithmetic and logical operations of C++, as well as features such as strings and input/output that are part of the standard library. In the case study in Chapter 2, "Functions and Control Statements," we hit some limitations of basic C++. To pass a number of integers as a parameter, we were forced to write them as a string. There are two problems with this; first, there is no guarantee that these are all valid integers, and second, this would be very inefficient for really large lists of integers. That case study also depends heavily on having everything like ID lists in the correct order, but we cannot control the world, and often things happen in an order other than what we plan, so we cannot insist on sorted lists. C++ provides a basic built-in way to set up tables of numbers or objects, called arrays. The standard library also provides more sophisticated ways to group items together, called the standard containers.

Algorithms are standard procedures (or recipes) for doing operations such as searching and sorting. I will compare two ways to search tables of values (binary and linear search), and I will introduce the very powerful library of general algorithms in the standard C++ library.

In this chapter you will learn

- How to declare, initialize, and use arrays
- How to search for particular values and how to sort arrays
- How to use the standard containers, including vectors, lists, and maps

## Arrays

Arrays contain a linear table of values, which must all be of the same type, called the base type. These values are called the array's elements, and any element can be specified by its position in the array, called its index. In this section we will discuss C++'s built-in arrays, together with typical array operations like copying and searching. The next section will then look at the standard containers, which are "intelligent" arrays.

### Arrays as Tables of Values

A C++ array declaration reserves space for a number of elements. For instance, the following declarations create an array of `int`s and an array of `char`s:

    ;> int arr1[10];
    ;> char arr2[10];
    ;> sizeof(arr1);
    (int) 40
    ;> sizeof(arr2);
    (int) 10

You can see in this example that in general, the space occupied by an array is the number of elements multiplied by the size of each element. The number of elements in an array is usually called its dimension, and it can be any positive constant. You can declare more than one array at a time; in the following code, notice that the dimension follows each new array name. The dimension of `a3` is the expression `M*N`, which is acceptable because it contains no variables. You can mix regular variables in as well, although this is not considered a stylish thing to do. The variable `x` as declared here is just a plain `double`.

    ;> const int N = 20, M = 30;
    ;> double a1[6], a2[N], a3[M*N], x;

You declare two-dimensional arrays (which are used for representing tables or mathematical matrices) by giving each dimension its own pair of brackets, rather than a comma-separated pair inside one set of brackets, as in the following example:

    ;> int table[10][10];   // NOT int table[10,10]!!
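
To see the two subscripts at work, here is a minimal sketch that fills a small table and reads one entry back. The names `ROWS`, `COLS`, and `times_table_entry()` are made up for this illustration, and the table is kept local to the function so that nothing beyond the declaration syntax above is needed.

```cpp
const int ROWS = 3, COLS = 4;

// Builds a ROWS x COLS multiplication table and returns one entry.
// row must be in 0..ROWS-1 and col in 0..COLS-1; nothing checks this.
int times_table_entry(int row, int col) {
    int table[ROWS][COLS];            // ROWS rows, each an array of COLS ints
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            table[r][c] = (r + 1) * (c + 1);
    return table[row][col];           // two subscripts, one per dimension
}
```

The first subscript picks the row and the second picks the column within that row, so `times_table_entry(2,3)` reads the last entry of the last row.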

### Initializing Arrays

You can initialize arrays by using a list of values separated by commas enclosed in braces, as in the following declaration of `arr`. The declaration of `names` shows how you can optionally leave out the constant in the brackets and let the system work out the dimension from the size of the list. For global declarations, the values in the list must be constants.

    ;> int arr[4] = {10,5,6,1};
    ;> string names[] = {"Peter","Alice","James"};
    ;> double interest_rates[] = { 5.1, 5.6, 6.0, 7.0 };
    ;> int numbers[] = {1,2,3,4,5,6,7,8};
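
When the system works out the dimension for you, you can get it back by dividing the size of the whole array by the size of one element. The following is a sketch of that idiom; `N_RATES` and `average_rate()` are names invented for the example, and the trick only works where the array declaration itself is visible.

```cpp
#include <cstddef>

double interest_rates[] = { 5.1, 5.6, 6.0, 7.0 };

// Number of elements = total bytes / bytes per element.
const std::size_t N_RATES = sizeof(interest_rates) / sizeof(interest_rates[0]);

// Average of the rates, using the computed element count.
double average_rate() {
    double sum = 0.0;
    for (std::size_t i = 0; i < N_RATES; i++)
        sum += interest_rates[i];
    return sum / N_RATES;
}
```

If you later add a value to the initializer list, `N_RATES` follows along without any other change to the code.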

You cannot assign arrays in the following fashion because C++ arrays are not variables. You also cannot use `a1 = a2`, where both `a1` and `a2` are arrays.

    ;> arr = {1,2,3,5};
    CON 10:parse error
    CON 10:error in expression

You can, however, treat each element of the array as if it is a variable; this is called indexing the array, and the index is often also called a subscript. To access the i-th element of `arr` you say `arr[i]`, where the index `i` is put in square brackets (`[]`) like in other languages such as Pascal. This expression can be used wherever a variable can be used. The array index goes from 0 to one less than the size of the array. Using the definitions of `arr` and `names` from the preceding examples, you can say the following:

    ;> arr[0];
    (int&) 10
    ;> arr[3];
    (int&) 1
    ;> arr[0] + 2*arr[3];
    (int) 12
    ;> arr[2] = 2;
    (int&) 2
    ;> names[2] + " Brown";
    (string) 'James Brown'

You can make constant arrays that cannot be written to afterward:

    ;> const int AC[] = {10,2,4};
    ;> AC[2] = 2;
    CON 71:Can't assign to a const type

There are two basic rules to keep in mind with C++ arrays: arrays go from 0 to `n-1` (not 1 to `n`), and there is no bounds checking whatsoever. Therefore, if you write past the end of an array, the value will probably land on top of some other variable. In the following case, we have changed the value of `var` (which was an innocent bystander) by forgetting both rules:

    ;> int a1[2];
    ;> int var;
    ;> a1[2] = 5;
    (int&) 5
    ;> var;
    (int) var = 5

**NOTE**

The fact that C++ arrays always begin at 0 is a benefit. In some other languages, you can change the start value by a global setting.

**NOTE**

Running over the end of an array is particularly bad if the array is declared local to a function, and on most systems, this will crash the program badly. The first C program I ever wrote did this, and my DOS machine went down in flames. Up to then I had trained on Pascal, which slaps you on the wrist if you exceed array bounds, but it doesn't amputate the whole hand. You may regard this as dumb and dangerous behavior, and many would agree with you, but C was designed to be fast and somewhat reckless.
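
If you would rather have the Pascal-style slap on the wrist, you can route array accesses through a small checking function. What follows is only a sketch under some assumptions: `checked_at()` is a hypothetical helper, not part of C++ or UnderC, and it reports the problem by throwing the standard `std::out_of_range` exception.

```cpp
#include <stdexcept>

// Hypothetical helper: returns a reference to arr[i], but throws
// if i is outside the valid range 0..n-1.
int& checked_at(int arr[], int n, int i) {
    if (i < 0 || i >= n)
        throw std::out_of_range("array index out of range");
    return arr[i];
}
```

With this in place, `checked_at(a1,2,2) = 5` stops with an error instead of quietly overwriting a neighbor. The price is an extra comparison on every access, which is exactly the cost that C chose not to pay.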

You initialize most arrays by using a `for` loop. `for` loops naturally go from 0 to `n-1`, as required. This code is equivalent to the initialization of the array `numbers` from an earlier example in this chapter:

    ;> for(int i = 0; i < 8; i++) numbers[i] = i+1;

Arrays make possible code that otherwise would require big ugly `switch` statements. Consider the problem of starting with a date in International Organization for Standardization (ISO) form, and working out the day number (counting from January 1). One solution begins like this and would require quite a bit more tedious coding:

    int days = day(date);
    switch(month(date)) {            // runs from 1 to 12
    case 2: days += 31; break;
    case 3: days += 59; break;       // 31 + 28, ignoring leap years
    ....
    }

Another solution would be to initialize an array with the lengths of the months and then generate a running sum. Thereafter, you could use just one line. Here's how you would implement this solution:

    ;> int days_per_month[] = {0,31,28,31,30,31,30,31,31,30,31,30,31};
    ;> int days_upto_month[13];
    ;> days_upto_month[0] = 0;
    ;> for(int i = 1; i <= 12; i++) days_upto_month[i] = days_upto_month[i-1] + days_per_month[i-1];
    ;> int days = days_upto_month[month(date)] + day(date);
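
The running-sum idea can be packaged as a pair of small functions. This is a sketch under stated assumptions: it ignores leap years, it takes the month and day as plain integers rather than extracting them from a date, and the names `init_days_upto_month()` and `day_number()` are invented for the example.

```cpp
// Lengths of the months in a non-leap year. Index 0 is a dummy entry so
// that month numbers 1..12 can be used directly as subscripts.
const int days_per_month[13] = {0,31,28,31,30,31,30,31,31,30,31,30,31};

// days_upto_month[m] = number of days in the year before month m starts.
int days_upto_month[13];

// Fill in the running sums; call this once before using day_number().
void init_days_upto_month() {
    days_upto_month[0] = 0;
    for (int m = 1; m <= 12; m++)
        days_upto_month[m] = days_upto_month[m - 1] + days_per_month[m - 1];
}

// Day number within a non-leap year (January 1 is day 1).
int day_number(int month, int day) {
    return days_upto_month[month] + day;
}
```

After the table has been set up, `day_number(3,1)` gives 60, because January and February together contribute 59 days.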

### Passing Arrays to Functions

You can pass an array to a function by declaring an array parameter with an empty dimension. You usually have to pass the size as another argument because C/C++ arrays carry no size information. The advantage of this is that the following function can be passed an array of `int`s of any size:

    void show_arr(int arr[], int n)
    {
      for(int i = 0; i < n; i++) cout << arr[i] << ' ';
      cout << endl;
    }
    ;> show_arr(numbers,8);
    1 2 3 4 5 6 7 8

However, you cannot pass this function an array of `double`s. The following error message means that the compiler failed to match the actual argument (`double[]`) with the formal argument (which was `int[]`):

    ;> double dd[] = {1.2, 3.4, 1.2};
    ;> show_arr(dd,3);
    CON 28:Could not match void show_arr(double[],const int);

A `double` value is organized completely differently from an `int` type. If the compiler allowed you to pass an array of `double`s, you would get very odd integers displayed. On the other hand, simply passing a `double` value is not much trouble in C++; the compiler sensibly converts the `double` to an integer and gives you a mild warning that this will involve a loss of precision. This difference between passing a `double` value and passing an array of `double`s exists because in C++, ordinary variables (that is, scalars) are passed by value; they are actually copied into the formal argument. Arrays, on the other hand, are passed by reference. This difference is similar to the difference between mailing someone his or her own copy of a document and telling the person where to find it on a network. An interesting result of being able to pass arrays by reference is that you can change an array's elements within a function. (If you altered the document on the network, this would change that document for all readers.) The following small function modifies the first element of its array argument:

    ;> void modify_array(int arr[]) { arr[0] = 777; }
    ;> modify_array(numbers);
    ;> show_arr(numbers,8);
    777 2 3 4 5 6 7 8
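
Putting the two kinds of parameter side by side makes the difference easier to see. In this small sketch, `modify_scalar()` is a made-up counterpart to `modify_array()`; it tries the same trick on a plain `int`.

```cpp
// A scalar parameter is a copy: changing it has no effect on the caller.
void modify_scalar(int x) { x = 777; }

// An array parameter refers to the caller's array: the change is visible.
void modify_array(int arr[]) { arr[0] = 777; }
```

After `int v = 1; modify_scalar(v);` the variable `v` is still 1, but after `modify_array(numbers);` the first element of `numbers` really is 777.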

What if you didn't want the array to be (accidentally) modified? This was always an issue with C, but C++ has a number of solutions. One solution is to use the standard library's vector class, which we will discuss later in this chapter. Another good solution is to make the array parameter `const`, as in the following example, so that it is impossible to modify the parameter within the function:

    void just_looking(const int arr[])
    {
      cout << arr[0] << endl;
    }

A `const` parameter is a promise from the function to the rest of the program: "Pass me data, and I shall not modify it in any form." It's as if I tell you where to find the document on the server, but make it read-only.
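
A function that only reads its array is the natural place for such a promise. Here is a sketch; `sum_arr()` is a name chosen for the example, and the commented-out line shows the kind of statement the compiler would refuse.

```cpp
// Adds up the first n elements; const promises the array is only read.
int sum_arr(const int arr[], int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += arr[i];
    // arr[0] = 0;   // would not compile: the elements are const here
    return sum;
}
```

A useful side effect is that `sum_arr()` accepts both ordinary arrays and arrays that were themselves declared `const`, whereas a function taking plain `int arr[]` would reject the latter.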

### Reading Arrays

Data arrays are commonly read from files. Because arrays are passed by reference, it is easy to write a function to do this job. The following example uses a few new shortcuts:

    int read_arr(string file, int arr[], int max_n)
    {
      ifstream in(file.c_str());
      if (in.fail()) return 0;       // could not open the file
      int i = 0;
      while (in >> arr[i]) {
        ++i;
        if (i == max_n) return i;    // array is full; stop reading
      }
      return i;
    }

In this example, if you initialize the `ifstream` object `in` with a filename, that file will be automatically opened. The file is automatically closed when `in` goes out of scope. Also, the `fail()` method tells you if the file could not be opened. There is no simple way to work out how many numbers are in any given file, so there's always the danger that you might overrun an array. In functions that modify arrays, it is common for the actual allocated array size to be passed. This function reads up to `max_n` numbers and will exit the file read loop if there are more numbers; it will return the number of values read, which is one plus the last index.

Writing out arrays to a file is straightforward. You can use the remainder operator (`%`) to control the number of values written out per line, as in the following example. `(i+1) % 6` is zero for i = 5,11,17,...—that is, the condition is true for every sixth value. (I added 1 to `i` so that it would not put out a new line for `i == 0`.)

    void write_arr(string file, int arr[], int n)
    {
      ofstream out(file.c_str());
      for(int i = 0; i < n; i++) {
        out << arr[i];
        if ((i+1) % 6 == 0) out << endl;
        else out << ' ';
      }
      out << endl;
    }
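
If you want to experiment with the reading loop without creating files, one option is to read from any input stream rather than from a named file. The following is a sketch of that variation, not the chapter's own function: `read_stream()` and `read_string()` are hypothetical names, and the size check has been moved into the loop condition so that nothing is written when `max_n` is zero.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Same idea as read_arr(), but reading from any input stream.
// Stops at the first non-number, at end of input, or when arr is full.
int read_stream(std::istream& in, int arr[], int max_n) {
    int i = 0;
    while (i < max_n && in >> arr[i])
        ++i;
    return i;                        // number of values actually read
}

// Convenience wrapper for experiments: parse numbers out of a string.
int read_string(const std::string& text, int arr[], int max_n) {
    std::istringstream in(text);
    return read_stream(in, arr, max_n);
}
```

Because an `ifstream` is also an input stream, the same `read_stream()` could be handed an open file.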

### Searching

You will find that you often need to search for a value in a table of numbers. You want to know the index of that value within that table, or even simply whether the value is present. The simplest method is to run along until you find the value, or key. If you don't find the key, you just return `-1` to indicate that the key is not present in the array, as in the following example, where the function `linear_search()` is defined:

    int linear_search(int arr[], int n, int val)
    {
      for(int i = 0; i < n; i++)
        if (arr[i] == val) return i;
      return -1;
    }
    ;> linear_search(numbers,8,4);
    (int) 3
    ;> linear_search(numbers,8,42);
    (int) -1

This method is fast enough for small tables, but it isn't adequate for large tables that need to be processed quickly. A much better method is to use a binary search, but the elements of that table must be in ascending order. To perform a binary search, you first divide the table in two; the key must be either in the first half or the second half. You compare the key to the value in the middle; if the key is less than the middle value, you choose the first half, and if the key is greater than the middle value, you choose the second half. Then you repeat the process, dividing the chosen half into halves and comparing the key, until either the key is equal to the value or until you cannot divide the table any further. Here is a C++ example of performing a binary search:

    int bin_search(int arr[], int n, int val)
    {
      int low = 0, high = n-1;            // initially pick the whole range
      while (low <= high) {
        int mid = (low+high)/2;           // average value...
        if (val == arr[mid]) return mid;  // found the key
        if (val < arr[mid]) high = mid-1; // pick the first half
        else low = mid+1;                 // pick the second half
      }
      return -1;                          // did not find the key
    }
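
One way to get a feel for the two methods is to count how many elements each one has to look at. The functions below are a sketch for that experiment only; `linear_probes()` and `binary_probes()` are invented names, and they return the number of elements examined rather than the position of the key.

```cpp
// Like a linear search, but returns how many elements were examined.
int linear_probes(const int arr[], int n, int val) {
    int probes = 0;
    for (int i = 0; i < n; i++) {
        ++probes;
        if (arr[i] == val) break;
    }
    return probes;
}

// Like a binary search, but returns how many elements were examined.
// The array must be sorted in ascending order.
int binary_probes(const int arr[], int n, int val) {
    int low = 0, high = n - 1, probes = 0;
    while (low <= high) {
        int mid = low + (high - low) / 2;   // midpoint of the current range
        ++probes;
        if (val == arr[mid]) break;         // found the key
        if (val < arr[mid]) high = mid - 1; // continue in the first half
        else low = mid + 1;                 // continue in the second half
    }
    return probes;
}
```

Trying both on a sorted table of 1,000 values, with keys near the start, the middle, and the end, shows how far apart the two methods are.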

How much faster is a binary search than a linear search? Well, if there are 1,000 entries in a table, a linear search needs about 500 comparisons on average to find the key, whereas a binary search needs at most 10, because each comparison halves the remaining range and 2 to the power of 10 is 1,024. The catch here is that the table must be sorted. If it's already sorted, then it seems a good idea to insert any new value in its proper place. Finding the place is easy: You run along and stop when the key is less than the value. The code for finding the position is similar to the code for a linear search, but `-1` now means that the key was larger than all the existing values and can simply be appended to the table:

    int linear_pos(int arr[], int n, int val)
    {
      for(int i = 0; i < n; i++)
        if (arr[i] > val) return i;
      return -1;
    }

### Inserting

It takes a surprising amount of effort to insert a new value into an array. To make space for the new value means moving the rest of the array along. For example, consider inserting 4 into the sequence `1 3 9 11 15`; `linear_pos()` gives an index of `2` (because 9 is greater than 4). Everything from that position on must be shifted up to make room; the top line shows the array subscripts:

`0 1 2 3 4 5` subscripts

`1 3| 9 11 15` before shifting up by one

`1 3 xxx 9 11 15` after shifting up by one

Here I've shown the sequence before and after the shift. We have to put 4 before index 2 (that is, the third position). So everything from the third position up (`9 11 15`) has got to move up one. That is, the shift involves `A[5] = A[4]`, `A[4] = A[3]`, and `A[3] = A[2]`. We can then set `A[2] = 4`. In general:

    void insert_at(int arr[], int n, int idx, int val)
    {
      if (idx == -1) arr[n] = val;   // append
      else {
        for(int i = n; i > idx; i--)
          arr[i] = arr[i-1];
        arr[idx] = val;
      }
    }
    ;> int n = 5;
    ;> int pos = linear_pos(arr,n,4);
    ;> insert_at(arr,n,pos,4);
    ;> show_arr(arr,n+1);
    1 3 4 9 11 15

Once `insert_at()` is defined, then it is straightforward to put a new element into an array so that it remains in order. The special case when `idx` is -1 just involves putting the value at the end. But generally there will be a lot of shuffling, making insertion much slower than array access. Note that this routine breaks the first rule for dealing with arrays: It goes beyond the end of the array and writes to `arr[n]`. For this to work, the array needs to have a dimension of at least `n+1`.

This is a good example of a `for` loop doing something different from what you've seen `for` loops do before: in this case, the `for` loop is going down from `n` to `idx+1`. In tricky cases like this, I encourage you to experiment (that's what UnderC is for). But nothing beats sitting down with a piece of paper and a pencil.

As you can see from the examples in this section, it isn't straightforward to insert a value into an array in order. The algorithm for deleting an item is similar to `insert_at()` and involves shifting to the left; I leave this as an exercise for a rainy afternoon.
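
If the rainy afternoon never comes, here is one possible sketch of the deletion routine. The name `remove_at()` and the decision to return the new element count are my assumptions; the shifting is just `insert_at()` run in the opposite direction.

```cpp
// Removes the element at index idx from the first n elements of arr by
// shifting everything above it down by one. Returns the new element
// count, or n unchanged if idx is not a valid index.
int remove_at(int arr[], int n, int idx) {
    if (idx < 0 || idx >= n) return n;   // nothing to remove
    for (int i = idx; i < n - 1; i++)    // this loop runs upward
        arr[i] = arr[i + 1];
    return n - 1;
}
```

Notice that this loop runs upward from `idx`, whereas the insertion loop ran downward from `n`. In both cases the direction is chosen so that no element is overwritten before it has been copied.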

### Sorting

Sorting a sequence that is not in any order also involves moving a lot of data around. The following algorithm is called a bubble sort because the largest numbers "bubble" up to the top. It includes calls to `show_arr()` so that you can see how the larger numbers move up and the smaller numbers move down:

    void bsort(int arr[], int n)
    {
      int i,j;
      for(i = 0; i < n; i++)
        for(j = i+1; j < n; j++)
          if (arr[i] > arr[j]) {   // swap arr[i] and arr[j]
            show_arr(arr,n);       // print out array before swap
            int tmp = arr[i];
            arr[i] = arr[j];
            arr[j] = tmp;
          }
      show_arr(arr,n);             // show sorted array
    }
    ;> int b[] = {55,10,2,3,6};
    ;> bsort(b,5);
    55 10 2 3 6
    10 55 2 3 6
    2 55 10 3 6
    2 10 55 3 6
    2 3 55 10 6
    2 3 10 55 6
    2 3 6 55 10
    2 3 6 10 55
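
For comparison, here is a sketch of the variant that compares only neighboring elements, which is the form most often meant by the name bubble sort. It is not the routine above; `bubble_sort()` is a separate, hypothetical function with the printing left out.

```cpp
// Sorts the first n elements of arr into ascending order by repeatedly
// swapping neighbors that are out of order. After each pass, the largest
// remaining value has moved to the end of the unsorted part.
void bubble_sort(int arr[], int n) {
    for (int pass = 0; pass < n - 1; pass++)
        for (int j = 0; j < n - 1 - pass; j++)
            if (arr[j] > arr[j + 1]) {   // neighbors out of order: swap
                int tmp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = tmp;
            }
}
```

Both versions need roughly `n*n/2` comparisons, so neither is a good choice for large arrays.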