On Unix, if you're not working on or with files, you're probably working with processes. A process, simply put, is a program running on the system. More precisely, it's an instance of a running program; that is, each time you or someone else using the system runs that program, another process is created.
In many ways, processes are similar to files: Every process is owned by a user, every process has a name, and every process has a number. Processes are owned by the user who ran the program, although setuid (suid) programs run as the owner of the program file instead. The name of a process is always the name of the command that the system is executing. Processes also have a size, although this is space in memory, whereas files occupy space on disk.
The system provides the process number. The first process, which controls the execution of all other processes, is init. The process ID (also known as the PID) for init is always 1.
Process numbers don't just go up and up forever; they're generally of a fixed length, usually 15 bits. (A 15-bit number is a two-byte number with one of those bits reserved to indicate whether that number is positive or negative.) Some systems have PIDs of 16 bits or more, but there is always a fixed number of possible process IDs. The reason for this is simply that when a program is written, a fixed amount of space must be allocated for process numbers, and that amount must be the same throughout any given system.
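The arithmetic behind that limit is easy to check. As a minimal sketch (the function name max_signed_value is mine, purely for illustration), here is the largest number that fits when 15 of a two-byte number's 16 bits hold the value:

```python
def max_signed_value(value_bits: int) -> int:
    """Largest value that fits in `value_bits` bits (sign bit not counted)."""
    return 2 ** value_bits - 1


# A two-byte (16-bit) signed number keeps one bit for the sign,
# leaving 15 bits for the value itself.
print(max_signed_value(15))  # 32767
```

On a system with 15-bit PIDs, then, no process number can exceed 32767.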
Process IDs are doled out sequentially: after process 15321 is created, the next process is always 15322, even if process 15320 is no longer running. After the top process number has been used, the system rolls back to the bottom and then starts handing out process numbers all over again. If a given process is still running, the system skips that one and moves on to the next, not returning again until it has reached the top.
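The allocation scheme just described can be sketched in a few lines. This is a toy model, not any real kernel's code: the function name next_pid, the default ceiling of 32767, and the choice to restart at 2 after wrapping are all assumptions made for illustration.

```python
def next_pid(last_pid: int, in_use: set, max_pid: int = 32767, min_pid: int = 2) -> int:
    """Return the next free PID after `last_pid`, wrapping around at `max_pid`.

    `in_use` is the set of PIDs that still belong to running processes.
    """
    candidate = last_pid
    # Visit every possible PID at most once before giving up.
    for _ in range(max_pid - min_pid + 1):
        candidate = candidate + 1 if candidate < max_pid else min_pid
        if candidate not in in_use:
            return candidate
    raise RuntimeError("no free process IDs")


# 15320 has exited, but the counter still moves forward to 15322.
print(next_pid(15321, in_use={1, 15321}))
# At the top of the range the counter wraps and skips PIDs still in use.
print(next_pid(32767, in_use={1, 2, 3}))
```

The first call prints 15322 and the second prints 4, because 2 and 3 are still running and get skipped.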
Each process has a parent process, much as each file is within a directory. The consequence of this is that the list of processes on a system can be thought of much like a directory structure. init creates several other processes. Each of these can parent many more processes, each of which can in turn also create any number of processes.
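To make the directory analogy concrete, here is a small sketch that turns a flat list of processes into an indented tree. The sample rows are invented for illustration (they are not taken from my system), and the helper names build_tree and render are my own.

```python
from collections import defaultdict


def build_tree(processes):
    """Map each parent PID to a sorted list of its children's PIDs."""
    children = defaultdict(list)
    for pid, ppid, _name in processes:
        children[ppid].append(pid)
    return {parent: sorted(kids) for parent, kids in children.items()}


def render(processes, root=1):
    """Return indented lines for the tree rooted at `root`."""
    names = {pid: name for pid, _ppid, name in processes}
    tree = build_tree(processes)
    lines = []

    def walk(pid, depth):
        lines.append("  " * depth + f"{pid} {names[pid]}")
        for child in tree.get(pid, []):
            walk(child, depth + 1)

    walk(root, 0)
    return lines


# Hypothetical (pid, parent pid, name) rows, invented for this example.
SAMPLE = [
    (1, 0, "init"),
    (400, 1, "login"),
    (410, 400, "bash"),
    (425, 410, "top"),
    (278, 1, "syslogd"),
]
print("\n".join(render(SAMPLE)))
```

Each level of indentation is one generation further from init, just as each level of a directory listing is one step further from the root.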
A particular process starts when you log on to a system. This process is the shell, the program that lets you run other programs and that interprets your command line. It's a process like any other, and when your shell dies, you are logged out of the system. There are different shells that operate somewhat differently.
So, what processes are running on your system? Let's take a look at my system and see what's running there:
[ jon@frogbog jon ]$ top
  3:04am  up  2:14,  5 users,  load average: 0.11, 0.04, 0.06
68 processes: 66 sleeping, 1 running, 1 zombie, 0 stopped
CPU states:  1.7% user,  2.3% system,  0.0% nice, 95.8% idle
Mem:   191124K av,   86976K used,  104148K free,       0K shrd,    4284K buff
Swap:  204080K av,       0K used,  204080K free                   52648K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
  530 root       8   0 10844  10M  2068 S       0  1.3  5.6   0:59 X
    2 root      15   0     0    0     0 SW      0  1.1  0.0   1:18 kapmd
 1251 jon       11   0  1176 1176   964 R       0  0.9  0.6   0:00 top
  900 jon        2   0  3640 3640  1500 S       0  0.3  1.9   0:01 xterm
  543 jon        1   0  2472 2472  1644 S       0  0.1  1.2   0:03 wmaker
    1 root       0   0   460  460   388 S       0  0.0  0.2   0:03 init
    3 root       0   0     0    0     0 SW      0  0.0  0.0   0:00 kswapd
    4 root       0   0     0    0     0 SW      0  0.0  0.0   0:00 kflushd
    5 root       0   0     0    0     0 SW      0  0.0  0.0   0:00 kupdate
    6 root       0   0     0    0     0 SW      0  0.0  0.0   0:00 khubd
    7 root       0   0     0    0     0 SW      0  0.0  0.0   0:00 uhci-control
    8 root       0   0     0    0     0 SW      0  0.0  0.0   0:00 acpi
  225 bin        0   0   404  404   316 S       0  0.0  0.2   0:00 portmap
  278 root       0   0   528  528   428 S       0  0.0  0.2   0:00 syslogd
  289 root       0   0   928  928   384 S       0  0.0  0.4   0:00 klogd
  305 daemon     0   0   484  484   404 S       0  0.0  0.2   0:00 atd
  321 root       0   0   600  600   504 S       0  0.0  0.3   0:00 crond
  331 root       0   0   556  556   416 S       0  0.0  0.2   0:00 cardmgr
Note that top is an interactive program: It doesn't simply produce output and exit but instead produces output at regular intervals until quit. To quit top, just type q at any time. top's output might look different by default on your system, but the program is very configurable and you can adjust it to your liking.
Our habit, as good students, should be to dissect the information presented to us so that we can properly interpret it. Let's start at the top:
3:04am up 2:14, 5 users, load average: 0.11, 0.04, 0.06
The top line of the screen gives output similar to the uptime command, giving the current time, the amount of time since the computer was started or rebooted, the number of users logged in at the moment, and three load averages.
A load average is a simple measure of how much work the computer is doing. Although we generally think of Unix as a multitasking operating system, in reality a given CPU can do only one thing at a time. (Unix does handle multiple CPUs very well, but most systems tend to be single CPU.) Because each CPU can do only one thing at a time, there's a list of programs waiting for CPU time. The higher the number of such processes over a given period of time, the higher the load average. A load average of one, on most systems, indicates the capacity of one processor. On a four-processor system, a load average of four would indicate capacity. In reality, CPU power isn't the limiting factor for most applications, and a load of two to four times the number of processors is reasonable.
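If you'd like to relate a load average to the number of processors yourself, the following sketch divides one by the other. The helper load_per_cpu is my own name; os.getloadavg() and os.cpu_count() are standard Python calls, though getloadavg is available only on Unix-like systems, so the live part is wrapped defensively.

```python
import os


def load_per_cpu(load_average: float, cpu_count: int) -> float:
    """Load average divided by processor count; about 1.0 means 'at capacity'."""
    return load_average / cpu_count


# A load of four on a four-processor machine is the same as a load of
# one on a single-processor machine.
print(load_per_cpu(4.0, 4))

try:
    one, five, fifteen = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"1-minute load per CPU on this machine: {load_per_cpu(one, cpus):.2f}")
except (AttributeError, OSError):
    # getloadavg is missing on some platforms and can fail on others.
    print("load average not available here")
```

By the rule of thumb above, a per-CPU figure of up to two to four is still reasonable for most workloads.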
The load averages that uptime and top display are for one, five, and fifteen minutes. From these numbers, it's clear that my system's CPU is almost completely idle. The third line of top's display confirms this:
CPU states: 1.7% user, 2.3% system, 0.0% nice, 95.8% idle
User time is that being used by normal processes on the system. System time is that used to write to disks, time spent managing low-level hardware details, and so on. These numbers, together with the idle time, should add up to 100%. The nice time can actually be subtracted from this total because it indicates processes that have been given a substantially lower priority by either the user or the system administrator. These processes run only in time not used by other programs.
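As a quick check on that arithmetic, here is a sketch that pulls the percentages out of the CPU states line from my system and adds them up. The function name parse_cpu_states is mine, and the pattern assumes the line looks exactly like the one shown.

```python
import re


def parse_cpu_states(line: str) -> dict:
    """Turn '1.7% user, 2.3% system, ...' into {'user': 1.7, 'system': 2.3, ...}."""
    return {name: float(value) for value, name in re.findall(r"([\d.]+)%\s+(\w+)", line)}


LINE = "CPU states: 1.7% user, 2.3% system, 0.0% nice, 95.8% idle"
states = parse_cpu_states(LINE)
total = round(sum(states.values()), 1)
print(states)
print(total)
```

The total here comes to 99.8 rather than exactly 100, presumably because each figure is rounded to one decimal place before it is displayed.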
The next two lines give an overview of the memory usage on the system:
Mem:   191124K av,   86976K used,  104148K free,       0K shrd,    4284K buff
Swap:  204080K av,       0K used,  204080K free                   52648K cached
Mem: represents memory installed on the system. The machine has 192MB of RAM, of which just under half is used. Most of that RAM, however, is used for buffers and cache, used to speed apparent disk performance. The system also has 200MB of swap space, disk partitions used by the system as though they were RAM. None of the swap space is currently used, which indicates that a shortage of memory isn't lowering our performance.
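Those proportions can be worked out directly from the two lines. The sketch below assumes the layout shown on my system, where the cached figure is printed on the Swap: line even though it describes RAM; the function name parse_kilobytes is my own.

```python
import re


def parse_kilobytes(line: str) -> dict:
    """Turn '191124K av, 86976K used, ...' into {'av': 191124, 'used': 86976, ...}."""
    return {label: int(kb) for kb, label in re.findall(r"(\d+)K\s+(\w+)", line)}


mem = parse_kilobytes("Mem: 191124K av, 86976K used, 104148K free, 0K shrd, 4284K buff")
swap = parse_kilobytes("Swap: 204080K av, 0K used, 204080K free 52648K cached")

# Fraction of installed RAM in use.
used_fraction = mem["used"] / mem["av"]
# Fraction of the used RAM that is really just buffers and cache.
cache_fraction = (mem["buff"] + swap["cached"]) / mem["used"]

print(f"used: {used_fraction:.1%}, of which buffers and cache: {cache_fraction:.1%}")
```

That works out to roughly 45% of RAM in use, with about two thirds of that used portion going to buffers and cache.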
Finally, we approach the list of running processes. Generally, dozens, hundreds, or even thousands of processes run simultaneously on a given machine. top displays only those processes using the greatest percentage of the CPU. (We can use the ps command, which I will soon discuss, to see all processes on the system.) With the process listing header, here are the first several lines of top's output:
  PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
  530 root       8   0 10844  10M  2068 S       0  1.3  5.6   0:59 X
    2 root      15   0     0    0     0 SW      0  1.1  0.0   1:18 kapmd
 1251 jon       11   0  1176 1176   964 R       0  0.9  0.6   0:00 top
  900 jon        2   0  3640 3640  1500 S       0  0.3  1.9   0:01 xterm
The first field is the process ID, the only guarantee that the process you're looking at is the one you think you're looking at. This is followed by the process owner, and then a number representing the priority of the process. A higher number means that more CPU time should be allocated to that process by the scheduler, the portion of the system that controls this. The next value, NI, represents the "niceness" of the process: a positive number means that the process is willing to give up CPU time to more important processes, whereas a negative number means that it has been given a higher priority than normal.
The STAT field is one of the most important because it gives us the status of the process: is it actively Running, is it Sleeping as it waits to be run, has it been sTopped by a user, or is it a Zombie?
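If you're reading STAT values in a script rather than by eye, a lookup table does the job. This sketch covers only the four states named above and treats any additional letters (such as the W in SW) as flags to be ignored; the names STATE_NAMES and describe_stat are my own.

```python
# Only the four states discussed in the text; anything else is "unknown".
STATE_NAMES = {
    "R": "running",
    "S": "sleeping",
    "T": "stopped",
    "Z": "zombie",
}


def describe_stat(stat: str) -> str:
    """Translate the first letter of a STAT value; extra flag letters are ignored."""
    return STATE_NAMES.get(stat[:1], "unknown")


for value in ("S", "SW", "R", "Z"):
    print(value, "->", describe_stat(value))
```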
What's a zombie? Programmers, like mad scientists, make mistakes. Sometimes a process exits ("dies"), but its parent process doesn't clean up afterward. These processes remain in the land between the living and the dead until either such clean-up occurs or the parent itself dies, after which the system cleans up. Zombies are unsightly and take up resources but are rarely actually harmful, though they tend to stink up the room...metaphorically, of course.
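You can raise a zombie of your own, briefly and safely. The sketch below is Unix-only (it relies on os.fork) and the function name make_and_reap_zombie is my own: the child exits at once, the parent dawdles for a moment, and only then performs the clean-up that a careless parent would forget.

```python
import os
import time


def make_and_reap_zombie(delay: float = 0.1):
    """Fork a child that exits at once, leave it unreaped briefly, then reap it."""
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately without running any clean-up handlers.
        os._exit(0)
    # Parent: during this pause the child is dead but not yet cleaned up,
    # which is exactly the zombie state.
    time.sleep(delay)
    # waitpid collects the child's exit status and lets the system free its slot.
    reaped_pid, status = os.waitpid(pid, 0)
    return pid, reaped_pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    child, reaped, exit_code = make_and_reap_zombie()
    print(f"child {child} reaped as {reaped} with exit code {exit_code}")
```

If you lengthen the delay and look at the process list from another terminal while the parent sleeps, the child should show up with a Z status.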
On my system, the man page for top says that the LIB field doesn't work. If it did, it would indicate the number of library pages used by the application. If you don't know what this means, you probably don't care, even if LIB works just fine on your system.
You probably do care about the %CPU and %MEM fields: They indicate the percentage of the CPU being used by the process as well as the percentage of physical memory. If you want to know what's slowing you down process-wise, these are good indicators. On a system with acceptable performance, it's all right if a process takes up nearly 100% of the CPU because it might only be taking up time that would otherwise be idle. If, by contrast, performance is lousy, see what's hogging memory and the CPU. If you have a multiprocessor system, these values often add up to some multiple of 100% because many systems handle each CPU separately.
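Hunting for hogs is just a matter of sorting on those two columns. Here is a sketch that does so for the first few rows of my listing; it assumes whitespace-separated columns in the order shown in the header, and the names parse_rows and busiest are my own.

```python
HEADER = "PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND".split()

ROWS = """\
530 root 8 0 10844 10M 2068 S 0 1.3 5.6 0:59 X
2 root 15 0 0 0 0 SW 0 1.1 0.0 1:18 kapmd
1251 jon 11 0 1176 1176 964 R 0 0.9 0.6 0:00 top
900 jon 2 0 3640 3640 1500 S 0 0.3 1.9 0:01 xterm
"""


def parse_rows(text):
    """Split each line into a dict keyed by the column names in HEADER."""
    # Limit the number of splits so a command containing spaces stays in one piece.
    return [
        dict(zip(HEADER, line.split(None, len(HEADER) - 1)))
        for line in text.splitlines()
        if line.strip()
    ]


def busiest(rows, field, count=2):
    """Return the commands with the highest values in a percentage column."""
    ranked = sorted(rows, key=lambda row: float(row[field]), reverse=True)
    return [row["COMMAND"] for row in ranked[:count]]


rows = parse_rows(ROWS)
print("CPU:", busiest(rows, "%CPU"))
print("MEM:", busiest(rows, "%MEM"))
```

On these rows, X leads both lists, followed by kapmd for CPU and xterm for memory.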
TIME shows the total amount of CPU "time" that the process has taken up over the course of its life. This is a rough measure of how much work the CPU has dedicated to handling that process. Long-lived processes often have large numbers in the TIME field even if they're well behaved, because even small increments add up over days, weeks, or even months.
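To put a TIME value in perspective, you can compare it with how long the machine has been up. The sketch below uses kapmd's TIME of 1:18 and the uptime of 2:14 from my display. It assumes that TIME is shown as minutes:seconds and uptime as hours:minutes, which is how I read this output but may differ on your system; the function name clock_to_units is my own.

```python
def clock_to_units(text: str) -> int:
    """Convert 'A:BB' into A * 60 + BB (minutes to seconds, or hours to minutes)."""
    major, minor = text.split(":")
    return int(major) * 60 + int(minor)


cpu_seconds = clock_to_units("1:18")          # kapmd's TIME, assumed minutes:seconds
uptime_seconds = clock_to_units("2:14") * 60  # uptime, assumed hours:minutes
share = cpu_seconds / uptime_seconds

print(f"{cpu_seconds}s of CPU over {uptime_seconds}s of uptime = {share:.2%}")
```

Under those assumptions, kapmd has averaged just under 1% of the CPU since boot, which is in line with the 1.1% that top reports for it at the moment.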
The last field is simply the name of the command being run. This is useful if you haven't memorized what processes are running at what PID. (Yes, that's a joke. Nobody does that, though most admins know that init is always PID 1.)
So how's my system doing? Well, it's almost totally idle, it's got lots of free memory, and no process is currently hogging the CPU. The worst offender on my system appears to be the X Window System (the Unix GUI), which is taking up a full 10MB of memory and using something less than 2% of the CPU. Something called kapmd is taking up just over 1% of the CPU. This process is handling power management on my laptop, and it's been running since my machine last booted. If you look closely, you'll note that its PID is 2. kapmd is really part of the Linux kernel, the part of the OS that controls access to all system resources. It starts up something that looks like a process to handle power management. kapmd has taken up a fair amount of CPU time total, but it would be hard to argue that it's affected my performance.
Next comes top, which is taking up almost nothing, as are xterm (a terminal window, to type commands to the system from X) and wmaker, my window manager. Everything else on the system is taking up entirely negligible resources.