Home > Articles > Operating Systems, Server

  • Print
  • + Share This
This chapter is from the book

This chapter is from the book

Scheduler-Related System Calls

Linux provides a family of system calls for the management of scheduler parameters. These system calls allow manipulation of process priority, scheduling policy, and processor affinity, as well as provide an explicit mechanism to yield the processor to other tasks.

Various books—and your friendly system man pages—provide reference to these system calls (which are all implemented in the C library without much wrapper—they just invoke the system call). Table 3.3 lists the system calls and provides a brief description. How system calls are implemented in the kernel is discussed in Chapter 4, "System Calls."

Table 3.3 Scheduler-Related System Calls

System Call



Set a process's nice value


Set a process's scheduling policy


Get a process's scheduling policy


Set a process's real-time priority


Get a process's real-time priority


Get the maximum real-time priority


Get the minimum real-time priority


Get a process's timeslice value


Get a process's processor affinity


Set a process's processor affinity


Temporarily yield the processor

Scheduling Policy and Priority-Related System Calls

The sched_setscheduler() and sched_getscheduler() system calls set and get a given process's scheduling policy and real-time priority, respectively. Their implementation, like most system calls, involves a lot of argument checking, setup, and cleanup. The important work, however, is merely to read or write the policy and rt_priority values in the process's task_struct.

The sched_setparam() and sched_getparam() system calls set and get a rocess's real-time priority. This call merely returns rt_priority encoded in a special sched_param structure. The calls sched_get_priority_max() and sched_get_ priority_min() return the maximum and minimum priorities, respectively, for a given scheduling policy. The maximum priority for the real-time policies is MAX_USER_RT_PRIO minus one; the minimum is one.

For normal tasks, the nice() function increments the given process's static priority by the given amount. Only root can provide a negative value, thereby lowering the nice value and increasing the priority. The nice() function calls the kernel's set_user_nice() function, which sets the static_prio and prio values in the task's task_struct, as appropriate.

Processor Affinity System Calls

The Linux scheduler enforces hard processor affinity. That is, although it tries to provide soft or natural affinity by attempting to keep processes on the same processor, the scheduler also enables a user to say "this task must remain on this subset of the available processors no matter what." This hard affinity is stored as a bitmask in the task's task_struct as cpus_allowed. The bitmask contains one bit per possible processor on the system. By default, all bits are set and, therefore, a process is potentially runnable on any processor. The user, however, via sched_setaffinity(), can provide a different bitmask of any combination of one or more bits. Likewise, the call sched_getaffinity() will return the current cpus_allowed bitmask.

The kernel enforces hard affinity in a very simple manner. First, when a process is first created, it inherits its parent's affinity mask. Because the parent is running on an allowed processor, the child thus runs on an allowed processor. Second, when the affinity of a processor is changed, the kernel uses the migration threads to push the task onto a legal processor. Finally, the load balancer only pulls tasks to an allowed processor. Therefore, a process only ever runs on a processor whose bit is set in the cpus_allowed field of its process descriptor.

Yielding Processor Time

Linux provides the sched_yield() system call as a mechanism for a process to explicitly yield the processor to other waiting processes. It works by removing the process from the active array (where it currently is, because it is running) and inserting it into the expired array. This has the effect of not only preempting the process and putting it at the end of its priority list, but putting it on the expired list—guaranteeing it will not run for a while. Because real-time tasks never expire, they are a special case. Therefore, they are merely moved to the end of their priority list (and not inserted into the expired array). In earlier versions of Linux, the semantics of the sched_yield() call were quite different; at best, the task was only moved to the end of their priority list. The yielding was often not for a very long time. Nowadays, applications and even kernel code should be certain they truly want to give up the processor before calling sched_yield().

Kernel code, as a convenience, can call yield(), which ensures the task's state is TASK_RUNNING, and then calls sched_yield(). User-space applications use the sched_yield()system call.

  • + Share This
  • 🔖 Save To Your Account