This document presents how to use signals in userspace Linux applications, showing several kinds of program patterns in C and C++ with them. Alternative (and better) patterns to signals are presented, too.

Signals are positioned in the threaded world, i.e., programs are assumed to be multi-threaded. Even a single-threaded program could call library functions that are multi-threaded, without knowing it, and thus become multi-threaded itself.

Sections in the document present first functionalities, and the code pattern to use to implement them.

Some dedicated sections are devoted to digging into the hows and whys of the proposed solutions. They have a colored background like this. Readers who are not interested, or have no time or will to spend to know more, can skip them.

Copying

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.

Copyright owner waives GNU Free Documentation License's obligation to include a copy of the licence text if redistributing the covered work or derivatives thereof.

Waiver: this document contain program examples. They are displayed in monospace font. Permission is granted to copy and modify them for inclusion in programs.

Disclaimer: the same disclaimers of warranty and liability stated in sections 15, 16, and 17 of the GNU General Public License apply to the contents of this document.

Overview

Signals were introduced when there was no threading, and had two purposes: to interrupt the flow of execution when it was impossible to continue it (synchronous signals), to control it (stop, continue, kill), and to carry on activities in parallel with the man ones (e.g., asynchronous I/O).

With the introduction of threading, the last one, and partly also the second one, can be done better using threads.

Signals, while in principle a simple mechanism, in practice have a number of drawbacks and dark corners that make their use rather difficult. One of the drawbacks is that they interrupt the execution of a number of system calls that must then be restarted by the program. Another is that they interrupt execution of normal code asynchronously (instead of doing it at some defined places), and a further one is that their handlers can do a rather restricted set of actions.

This document shows how to perform the activities that traditionally have been done with signals, using signals when necessary, and using threads otherwise.

Semantics

Signals notify a process of the occurrence of some event: the synchronous ones notify about violation of some condition regarding the instructions being executed; the asynchronous ones about some software or hardware event. Typically, applications handle signals in the following way:

Basically, there are three kinds of semantics for signals: the one to abort a sequence of actions, the one to continue seamlessly it, and the one to support a parallel thread of actions. The handling is then done either interrupting a course of actions, or using a parallel course of actions. The latter can be done either using signal handlers (when the operations to be done are very simple), or dedicated threads. Signals have been introduced in Unix long before threads, and allowed some form of threading inside a same process to handle asynchronous events.

Using a thread dedicated to handle signals is in most cases the same as using signal handlers, but without the restriction to run only asynch-signal-safe functions, and the nuisance of having system calls interrupted. There are few cases, however, in which handlers are needed. One is when synchronous signals occur. E.g., when accessing a memory mapped region at an address that is not mapped, we get a SIGSEGV or a SIGBUS, and in the handler we can map it, and return, thus making the code that accesses that memory completely unaware of this. We cannot use another thread because the signal is sent to the offending thread, and, even if we could, we could not restart the offending instruction.

Bear in mind that, while interrupts really interrupt the current process (unless it is running with interrupts off), a signal might not interrupt a process with the same promptness. Therefore, we cannot be sure that a process is scheduled so promptly as to get all signals sent to it. Realtime signals are queued, and then at least are not lost.

Ideally, the signals that need little time to be served could be handled immediately, and the ones needing a long time (i.e., a time greater than the time between two arrivals of a same signal) could be queued by a thread and handled by another. However, signals are not meant to notify processes so frequently as to require such front-end/back-end architecture.

Note that signals are somehow more asynchronous than are interrupts with respect to the program that expects them. The program that uses interrupts to perform I/O has enabled a device to assert them. They occur at a point in time that lies in the interval from the enabling instant onwards (and often within some defined time). A kill signal is more unexpected, even if a program that wants to treat it has defined a handler for it. An event that makes a program terminate abruptly (e.g., a power failure) is even worse because it cannot be handled and can lead to leave data (e.g., disk files) inconsistent. To handle the latter, a stronger form of atomicity must be used, e.g., the writing of a block, that is likely to occur or not to occur even in case of failure. Operations that are made atomic by using locks can be interrupted by a failure.

Signal are simple means to communicating events without creating persistent objects. They have been used for this before threads were introduced in Unix.

Note that, in accordance with the table above, most of these cases are handled with threads.

Since the disposition of most signals is to terminate the receiving process, possibly leaving persistent objects in an inconsistent state, signals need be handled. Take into account that a process can send all signals to another, even the ones that are meant to be generated only internally.

The examples

All uses of signals in this documents are shown with working examples. In them, frequently occurring actions are represented as functions that are hyperlinked. In actual programs, they need not be kept as functions; they can instead be inlined.

Error paths are indicated, but no error action is provided. Simple programs could handle them issuing an error message and then terminating; production-quality programs would provide appropriate error reporting and recovery.

Signals basic

This chapter describes how signals behave, and the basic techniques to deal with them.

Overview

Signals are sent either from processes to other processes, or by the system as a result of the violation of some condition (e.g., illegal instructions), or the occurrence of some event (e.g., the expiry of a timer). Sending signals shares some behaviour with interrupts:

The kernel very frequently (sometimes immediately, otherwise each time it switches from kernel mode to user mode, and at least almost every timer interrupt) checks if there are signals to deliver to running processes. If there are pending signals, the kernel takes one and consumes it:

The execution of the normal flow of control can be interrupted by several signals at the same time. It is up to handlers to block or ignore them so as to make the behaviour of a process manageable. E.g., a kill task signal should make a process ignore further signals of the same kind until it has handled the current one (not only in the handler, but in the process normal code also).

Lifetime

When pending signals are given the target process and an action taken, they are delivered.

The lifetime of a signal is the interval of time between its creation and its delivery (possibly consisting in ignoring or discarding the signal).

Initial state of signals

The disposition of signals when a process makes an exec() is the default one for all signals that are caught by the creator, and the same as the creator's for the others. In particular, signals that are ignored by the creator are initially ignored by the created process. This allows a shell that does not support job control to set the disposition of the interrupt signal to ignore when it creates background processes so as not to interrupt them when interrupting the foreground process. However, the standard shells of Linux do support job control, and do not send interrupt signals to background jobs.

When the initial disposition (provided by the creator) is to ignore a signal, and a process wants to honor the creator's disposition to ignore signals, this is the scheme to be used:

Ignoring signals that are set to be ignored by the creator applies only to the job control ones, or to signals that are agreed upon between the creator and the child. Actually, it should be (and it is) up to a shell to decide what processes to kill when the user interrupts the foreground process, and not to the background processes to take care of it. The suggestion is then to use this only when there is a real need for it.

A process that is created with fork() inherits the disposition of the creator. A process created by clone() can optionally inherit the disposition of the creator. A process created by pthread_swap() can choose to inherit the ignoring of signals from the parent.

It is possible to create a process with all signals blocked. (The mask is inherited in execv() and fork().) A process that is created with all signals unblocked can be killed before it decides to handle them. Since the signals that are ignored in the father are also ignored in the child process, it is also possible to create a process with all signals ignored, such a process can then register handlers as it likes without being terminated by a signal before doing it. For processes created by the shell, there is no way. The shell creates them as it likes. For ones created by a user process, the solution is to fork, then block all signals, call exec, and unblock them in the child. The process then has the possibility of ignoring the ones it wants to (thus discarding any such signals possibly pending). It would be fine to ignore all of them at the beginning, so as to clear the pending ones, but it is unlikely that there are any. Note that the pending signals of the creator are not pending signals for the created one.

Delivering signals

When a signal is sent to a process, a thread that does not block it or a thread that is waiting for it is chosen randomly, and the signal delivered to it. Signals that are directed to specific threads are delivered to them.

When several signal are pending, one is chosen to be delivered with this priority ordering:

Sending signals

Signals can be sent using one of the system calls enlisted below. The API is not uniform:

system call	target	data
`kill()`	process (thread group), process group, all processes	no
`pthread_kill()`	thread	no
`sigqueue()`	process	yes

All other system calls that send signals (raise(), killpg() and tgkill()) are wrappers to these.

Note that there is no way for a thread to send a signal with data to another thread: The one system call that sends signals with data is sigqueue(), that sends signals to processes only. However, if there is only one thread in the process that has registered a signal handler for the signal, that thread will receive the signal and its data.

sigqueue() can send a pointer as value: The pointer is an address in the user space of the sender. It can have a meaning in that of the receiver only if it is the process itself, or it has been created with fork() not followed by an exec(), or it is a thread of the same process as the sender. It does not in all other cases, and likely it causes a SIGSEGV if de-referenced.

sigqueue() allows sending signals with a value attached, not only for realtime signals, but for the others, too. In Linux, all signals carry the sender's pid and the data.

Note that there are no built-in means for a process to check that a process to which it is sending a signal can actually handle it. It would be nice if sending a signal could atomically return an indication telling if it succeeded (e.g., if the receiver has not set the disposition of the signal to ignore). This can be achieved by having the receiver send back another signal to acknowledge the handling.

After sending a signal to a process, always check its return status, since the receiver process could have terminated. Both kill() and sigqueue() return success when sending a signal to a zombie process (that cannot obviously handle it). Do the same when sending signals to threads. pthread_kill() returns ESRCH when sending a signal to a thread that has terminated but not yet joined.

A process can send a signal to another only if its real or effective user ID is the same as the real or saved set-user-ID of the other. The calls above return EPERM when the process does not have the privilege to send the signal.

Some signals are meant to be generated only internally (e.g., SIGSEGV). However, they can also be sent by other processes. The means to fend off them is described here.

Received signals

Note that a process does not receive any signal when the computer is suspended, hibernated, or the system shut down (shutdown).

Blocking signals

Signals can be blocked (i.e., left pending, not causing handlers to run until unblocked). Note that blocking signals does not mean preventing a process to handle them: a process can have threads that are waiting for them.

Signals can get blocked because explicitly set in the process signal mask, or because the process is executing a handler registered with sigaction(). When a signal is delivered, the kernel adds the signals specified in sigaction() to the signal mask of the thread to which the signal is delivered, and restores it when the handler terminates.

Note that the thread signal mask is counter-intuitive: bits are on for the signals that are blocked.

If a thread creates a child process or another thread, it makes it inherit its set of blocked signals.

Pending signals

Signals that are not handled because blocked are pending. Actually, all signals between generation and delivery are pending. There is one set of pending signals, that is made of the process directed signals, and there is a set of pending signals for each thread, which is made of the signals directed to the thread.

sigpending() delivers the set of signals pending for the process united with the ones pending for the current thread. It does not remove pending signals.

To clear a pending signal, sigtimedwait() can be used, and also setting its disposition to ignore (and then resetting it back again). They have a different effect.

A solution is to use sigtimedwait(), that returns immediately if the timeout argument is zero, clearing any pending signals, if any. It removes a thread-specific pending signal, if any, and, if none exists, a process-specific one. If there are both, it removes only the thread-specific one:

Another solution is to set the disposition of the signal to ignore and then back to what it was before. This removes the thread-directed pending signals of all threads and the process-directed ones:

Note that a thread has no way to remove just a signal that is directed to itself: the first solution removes a process-directed signal if no thread-directed one exists, and the second removes all.

Note that a signal whose disposition is SIG_IGN and that is blocked, is delivered and kept pending. If its disposition is set again to SIG_IGN, it is removed from the pending signals.

Newly created processes by fork() have no pending signals; but exec() makes the process inherit the signals that have been raised between fork() and exec().

Synchronizing sending with receipt

A thread needs to delimit the intervals of time in which it receives signals, and handles them. It depends on the paradigm:

Delimiting can either be done by unblocking/blocking signals, registering/de-registering handlers, and enabling handlers using per-handler flags. The first two have a similar cost; the latter has almost no cost.

There is a need to make a distinction between internal and external signals. The internal ones are the ones that can occur as a result of some operation made by a thread. E.g., timers, memory accesses, completion of I/O operations. They are used in supervised blocks, and thus they must only be handled as long as a supervised block lasts. The external ones are requests made by other threads and processes. The remaining part of this section deals only with the external ones.

Sending a signal to a receiver thread is a two-step action, and each step has its own caveats:

Hooking the target means ensuring that a process to interact with is there and it is the right one (i.e., not an homonym). This means to wait, or be notified when it starts, and to be notified when it ends. This is not a problem for related processes since fathers are notified about the doings of their children. On the contrary, unrelated processes must communicate their creation. This can be done in several ways, as described below. Making a sender aware of the termination of the receiver is more difficult, because termination can occur unexpectedly. Some solutions are described below.

Some of these means require a sender process to poll for an object to come up; they differ in the time spent in polling. Polling for a process with a given name (i.e., scanning repeatedly /proc) is done until the process is created. Polling for a queue is done until the queue is created, creation that can be done by any of the processes that participate to the moot, or even by some startup script; thereafter a sender can wait for a message to come up (telling a receiver has started) without polling. The choice depends on how the application is made, but some means are better than others. E.g., the ones that allow multiple installations are better than the ones that work on a single one; the ones that do not require manual configuration are better than the manual ones.

Message queues appear as files in the /dev/mqueue directory (to be mounted with mount -t mqueue none /dev/mqueue if it is not already mounted).

This scheme can be adapted to look for processes that match some given attributes.

Note that using inotify() to be informed about process termination is not perfect: there is a race because the time from process termination and the waking up of read() is not null, and then the sender could have sent a signal while the receiver is dead. However, this should be not a big problem because this time is much lower than the pid recycle time.

Sending a signal can only be done when the sender is sure that the receiver (pid) is the right one, and the receiver is there to accept the signal (and not to lose it). Since the sender has no cheap means to know that the receiver is there, signals can only be used when receivers are always ready to accept them.

Let's tackle the sending.

Threads and related processes know each other, and then can easily ensure that they exist when sending a signal: related processes exist until their fathers wait for them, threads exist until joined.

When processes are instead unrelated, a sender needs to know the pid of the receiver in order to send any signal. Even so, the pid could have been recycled (and the same pid be used by an homonym process). Then, there are two alternatives to prevent sending signals to homonyms: the first is to monitor the existence of the receiver, and invalidate the receiver's pid held by a sender so as not to send a signal when it no longer exists. The second is to use the data attached to signals to disambiguate between homonyms: the receiver process picks up a value that is likely to be unique (e.g., the time in seconds), and sends it along with its pid to senders. Senders attach it to the signals sent as a token. A receiver would handle the signal only if the token is the expected one. Note that this solution requires collaboration from receivers. However, even if the right receiver has been so instrumented, the recycled pid could have been taken by any other process in the system, that is not instrumented, and that reacts to the (unwanted) signal (possibly aborting). Therefore, it is not correct for a sender to send a signal without knowing that the receiver was the right one: the sender must make sure of it in advance. How to monitor receivers is detailed below.

Even having a sure target, senders should take into account that receivers present windows of time in which they accept signals. They depend on what kind of synchronization protocol one wants to put in place: if one wants a persistent one, like, e.g., semaphores, then it can block signals and use the realtime ones, which are queued and caught when the process decides to wait for them; if one wants a volatile one, like, e.g., condition variables, it can set up a handler or a thread that waits for them and handles them only when the process decides to have a look, and otherwise discard them. They differ in what happens when signals are sent and the thread is not waiting for them. The non persistent one is not easy for senders, because they can hardly know when a receiver is accepting signals. Accepting a signal sent from another process without making known when it can be accepted is like providing an aleatory service: the sender sends and, if lucky, it can obtain the desired response. This forces senders to poll sending signals until they obtain what they want (supposing that they can check it). Synchronization between processes is better done with means that are persistent. I.e., a sender sends or unlocks something, and a receiver comes to a waiting (or lock acquisition) point with its own time, and if it comes after, nothing is lost.

Note: Consider what happens with semaphores. They are persistent, and then the sender does not need to know that the receiver is already there. Named semaphores also can live for a span of time, and addressing them when they are no longer alive returns an error. Semaphores can be created also before processes that use them are there. Thus, one difference is that they have fewer problems with existence, and more with persistence. In other words, they are more persistent than processes, and thus are there when needed, but also when not needed (more than processes). A criterion is that when the lifetimes of the processes that communicate are such that when the sender sends a signal, the receiver is mostly there, signals are better; otherwise, semaphores are better. If the lifetimes do not overlap, the only solution is semaphores. There is a paradigm in which two processes need to interact, which requires them to be alive at the same time (at least as long as the intercourse has to be done, which means that one has to wait for the other to be created if it is not there), and another paradigm in which one sends messages to another, which might not even exist at that time.

To eliminate the race that occurs when using inotify() to detect process termination, there would be a need for a dedicated system call that does the same as what is done between fathers and children: keeps a receiver zombie until all senders have waited for them, or removed the watch. I have the impression that there is a deficiency in *nix: processes needing to send a signal to another should not be obliged to watch for the receiver to be alive, they should just send the signal and get an error reply if the other is not there, with virtually no problem of homonyms. If pids would recycle in a very long time (which is the case of 64-bit kernels), this problem would not exist. A solution is to have a process that creates the receiver, and that keeps it zombie until there are senders that need it. It would keep a list of senders, that is updated when a sender registers to it to receive notifications.

Interrupted system calls

Slow system calls are interrupted by signals. When a system call returns interruption, this means that it has not (entirely) fulfilled its task, because something happened which is not an error in the system call. Then either it is restarted, or it is abandoned because the overall operation has to be killed, or it is ignored. Ignoring is appropriate in cleanup only (e.g., as cleanup of file operations, a file can be closed without checking the result because nothing can further be done if the close fails). Restarting should be the default.

A number of I/O, file-locking, semaphore, etc. system calls (after having been silently interrupted) are restarted if the handler that interrupts them has been registered with the SA_RESTART option. Some 28 slow system calls are never restarted: the ones with timeouts, the ones for signals, and a number of others (see man 7 signal).

If while a process is blocked in a system call, a signal handler runs and executes a longjmp() that makes the process restart its execution from another point, the system call is never restarted. When a signal arrives, and the process is waiting in a slow system call, the system call is aborted, and possibly restarted if and when the handler returns normally. Handlers that are registered with SA_RESTART make a system call restarted when the handler returns normally. Upon jumping, errno is not set to EINTR. However, do take into account that a longjmp() is not asynch-signal-safe, and therefore it is plausible that the Linux documentation does not specify what happens when one is executed in a signal handler.

Some 15 slow system calls are interrupted when the process is stopped with a stop signal and then continued with a SIGCONT (see man 7 signal) making them return with EINTR when the process is continued (i.e., it receives SIGCONT), except for sleep() that returns a nonzero value. If there are several threads that are suspended on such a call, all of them are interrupted. N.B. the entries in man 7 signal that are tagged: "Linux 2.xxx" enlist calls that are interrupted only in that version of the kernel. Since SIGSTOP cannot be blocked, ignored, or handled, and SIGCONT continues anyway the process, EINTR (or interruption) must be tested after system calls.

It is not possible to know what signal interrupted a system call (except for the ones that wait for signals, like, e.g., sigwait()). However, something can be done by setting flags in handlers. Flags must be cleared before executing a system call, and tested after:

    flag1 = 0; flag2 = 0;  ...          (1)
    ret = syscall(...);                 (2)
    if (ret == -1 && errno == EINTR){   (3)
       if (flag1) ...                   (4)
       if (flag2) ...
    }                                   (5)

If a signal whose handler sets a flag occurs between (1) and (2), it sets a flag, but it does not interrupt the system call. If the handler runs within (2), then the test at (4) is correct. If it runs between (2) and (3), then it has not interrupted the system call, and (4) is not executed. The same happens if it runs between (3) and (4). In all cases, several handlers could have run, setting their flags. E.g., a handler could have run between (1) and (2) and another within (2). Flags are then not a reliable means to tell what signal interrupted a system call, but flags whose meaning is to kill the overall operation can still be used reliably. When a kill flag is true, little matters if it has been raised before, during or after the system call. It should be tested before testing flags whose meaning is to restart the system call. (The opposite would make system calls restarted when they have completed successfully, and handlers run before or after them.) Kill flags could be tested irrespectively on the system call returning EINTR (i.e., right after (2)). However, this does not improve the program much: the kill request will anyway be honored the next time it is tested. Moreover, the program would handle a system call that completed successfully as if it had failed. N.B.: handlers could set an integer variable telling what signal they caught, but they would need to set it only if it is not yet set by more priority signals, taking then upon themselves the burden to manage priorities.

Note that a system call can have been interrupted by several signals.

Restarting system calls

The normal way to restart an interrupted system call is simply to execute it again. However, there can be special cases.

In some cases, an interrupted system call has partly fulfilled its task (e.g., sleep(), that returns the remaining time); when it has to be restarted, likely it should be requested to do only the remaining part. This is also the case of system calls with timeouts. Some system calls have a timeout argument that is the absolute time, and some others the relative (elapsed) one. They both denote a point in time beyond which the system call does not wait. If the process is suspended, time still goes on, and, when resumed, if time has passed that point, the system call must fail. Absolute timeouts are simpler to restart, but in theory they could expire after they have been computed and the system call to which they are passed is executed.

All system calls that have a timeout argument, except for select(), clock_nanosleep(), nanosleep() and sleep() do not return the remaining time when interrupted. System calls that require an absolute timeout can simply be restarted. For the ones that requires a relative one, the absolute time must be saved at the time the system call is executed, then the absolute time checked again when it is interrupted, and the difference made to check if the defined time has expired. If it has not, then the remaining timeout is passed, otherwise the system call is aborted.

It is possible to implement a library of functions that wrap the 28 system calls that restart them when interrupted. However, it is not advisable to use the same function names as the interrupted ones (and link the library before the standard libraries), because it would impair the implementation of task kill with signals, because kill requests must be tested before restarting.

Non-event-driven signal handling

Blocking signals and testing if they are pending allows to use signals in a non-event-driven way.

Waiting for signals

Most signals can be treated having a thread that processes them. This allows performing I/O and all other operations that are not allowed in a signal handler. Realtime signals are queued by the kernel. (There is a limit to the signals queued, that can be changed with setrtlimit().) Concerning the others, queuing is needed only when they have low inter-arrival times, that sometimes are lower than processing times, which is seldom the case. Otherwise, there is a need to use a paradigm that is similar to that of drivers: having a top-half and a bottom half. By far, the simplest way to implement it is to have a thread that catches signals and queues them, and another that processes them. If the speed of the former is not sufficient, queuing must be done in a signal handler, but synchronization with the consumer thread is quite difficult because there are no system calls that can be used to do it. Luckily, in practice there is no need for it.

The codes of realtime signals are between SIGRTMIN+3 and SIGRTMAX (the first three are used by Linux). Their default disposition is to terminate the process.

To wait for a signal, sigwait() is the best; to wait and get the attached data, the best is to make a loop with sigwaitinfo(), cycling when it returns EINTR.

sigwait() does not run handlers, and does not need a handler registered for the signals it is waiting on. It is as if it changed the disposition of such signals to be accepted by it. It overrides also the "ignore" (SIG_IGN) disposition. However, this seems to be a borderline case, and thus it is better not to have the disposition set to "ignore" for signals to be waited on.

N.B. sigwait() is meant to be called with signals blocked, and returns also with signals blocked: it allows to use them like events. If a signal that is not in the set passed to a sigwait() call occurs while the thread is suspended in it, the signal is handled according to its disposition.

siguspend() is similar to sigwait(), but it is a bit twisted: its argument denotes the complement of the signals to wait for. Moreover, the signals to wait for must have an handler. Otherwise, the default disposition is used, which is mostly to terminate the process.

This snippet waits for a signal to come. While waiting for the signal, the other signals are served.

To devote a thread to handle a signal, a handler can be registered that re-sends the signal to the designated thread if it is caught by the wrong one:

This solution has a drawback, though. The thread that catches the signal, which could be any, could be executing a slow system call. Some slow system calls are never restarted when interrupted by a handler. They must then be restarted by the thread code. N.B. some pthreads functions are called in the handler. They are not in the list of the asynch-signal-safe ones, but are actually so in the Linux implementation.

There is another solution, in which the signal is blocked by all threads, except the one devoted to handle it so that the signal is delivered to it. The easiest way to do it is to block signals in the main thread before creating the others. But in so doing, any thread that creates a child with a fork() creates a child with that signal blocked, unless it unblocks it in the child.

We shall see that there is no way to overcome this drawback by attempting some other solutions.

A solution could be to register fork handlers, but that ought to be done in all threads that are created.
Another solution could be to block the signal for the process in the signal handler, to awake the dedicated thread, and to unblock signals in it after having got the data of the received signal. However, a thread can only block/unblock signals for itself and not for the entire process or for another thread.

None of these alternatives is a solution. The fact that the thread to which a process signal is dispatched is chosen between the ones that do not block the signal is a problem. This means that a library that creates threads to make its job faster can have such threads receive signals that other threads do not want to handle. A program could have a main thread that does not block a signal, a dedicated thread that blocks it, and possibly other threads that (e.g., being created from within library functions) do not block it either. They could get the signal instead of the main thread. The scheme to have only one thread catch a signal is to block it in the main thread, which makes all threads generated afterwords and not changing their signal mask have it blocked, and have one dedicated thread wait for the signal. This has the drawback that also the forked processes would have it blocked. Now, since it is undefined what thread can get a signal, if we want to devote one to catch it, the signals must be blocked for all threads. This means also that fork() must unblock signals if it wants to use the default disposition. The conclusion is that although it is possible to catch signals in the main thread and dispatch them to some other dedicated threads (synchronizing them with semaphores, Peterson's algorithm, wait free data structures), this does not avail because we must still block the signals to handle in all other threads (otherwise we are not sure that the main catches them). We can then as well use dedicated threads and catch them directly with sigwait().

N.B.: By the way, this drawback is not so bad, because processes must block signals anyway when started, and unblock the ones that they choose to handle.

signalfd() creates a file descriptor on which signals can be received; without a need for handlers. It can help when one wants to wait for an operation on a file descriptor (e.g., an I/O operation) and a signal with a select(). There are no races: the signal is not lost since there is an atomic operation that allows to wait and to detect the signal at the same time. This supersedes the practice to register a handler that sends a byte to a pipe so as to get it with a select() or read(). This works if the process waits for some input, and wants to interrupt the wait with a signal (but make sure that there are no races when cancelling the input request when a signal occurs and is got with a select()). The running of a signal handler, instead, works with all the slow primitives. This is a means to make a signal persist until it is handled: that signal, which must be blocked before calling select()or read() is not lost if it occurs when the thread is not suspended, and it is handled at the first select() or read() executed. pselect() provides the same functionality: to wait on file descriptors and also on signals. signalfd() provides a unique interface (i.e., a file descriptor) towards events, which is more flexible than sigwait() because it can be used in select(). The scheme is:

Threads

The use of signals must be placed in the multi-threaded context: programs that are single-threaded must be implemented as if they were multi-threaded. A single-threaded process might call a library function that internally creates threads, and thus becomes multi-threaded itself (possibly without the programmer knowing it).

There are process-directed signals and thread-directed signals. SIGSEGV, SIGFPE, SIGBUS, SIGILL, SIGSYS, and the ones generated with pthread_kill() are directed to specific threads; the others to the process.

Each thread can block incoming signals on a per-signal basis: each thread (including the main one) has its own signal mask. Blocking signals on a per-thread basis is also the way to tell what threads get what signals: a process-directed signal is delivered to a thread chosen between the ones that do not block it or are waiting for it, if any. E.g., if two threads call sigwait() for the same signal, an unspecified one is chosen. Moreover, each thread has its own pending signals.

The functions pthread_sigmask() and sigprocmask() apply to processes, threads, and signal handlers and deliver the same results (even if the documentation states that sigprocmask() has an unspecified behaviour on multi-threaded processes). When a handler runs, the signal mask is the one set by sigaction(), or-ed with that of the thread to which the signal has been delivered.

All threads share the same signal dispositions. E.g., sending SIGKILL to a thread (i.e., from a thread to another) kills the process.

Thread-directed signals can be handled in a thread specific way by threads that wait for them, whereas signals that are handled by signal handlers are treated in the same way even when they are caught by different threads (unless handlers distinguish among interrupted threads).

Signal handlers are a sort of shared resource. This is a problem only for signals that need a handler, and not for the ones that are accepted because we can have several threads waiting for the same signal, and handling them differently. Then, lay out a process-wide plan for using signals, defining what threads handle what process directed signals, and what signals are served with handlers. A thread that calls a library that sets a handler can disrupt other threads that relied on the handlers that were in force before. Then, in libraries, never register handlers.

Realtime signals have no predefined meaning, and thus can be used freely. However, in general they cannot be used as resources that are assigned to threads (e.g., for timers): there are processes that create a variable number of tasks, and with them it would be easy to run out of signals. They must be assigned a process-wide office, or none.

A solution to provide per-thread signal handlers is to use a thread private variable to hold the per-thread function pointer to the handler:

Note that the (process-wide) handler is set only once, and all threads share its registration flags, and also the set of signals that are blocked in it (and thus in all per-thread handlers). This is not much of a restriction since these settings are quite common. Threads can also have nested sections in which they save and restore and set new handlers. Note also that the per-thread handler runs in the context of the thread that caught it originally (e.g., pthread_self() denotes the original thread, as well as thread private variables). This solution then applies to thread directed signals that need a thread specific signal handler, and not to process directed signals.

Note also that this does not work with existing libraries that set handlers (that would not work anyway since they conflict with other uses of the same signals, unless we redefine sigaction() with a library that is linked before libc.)

While in theory it would be preferable to let each thread have its own handlers, in practice not having this is not a big restriction since handlers are not that much used.

Threads execute simultaneously with other threads, also when they have caught a signal, and are thus executing a signal handler. Therefore, several simultaneous executions of a same signal handler can exist even if the handler has been registered with no SA_NODEFER flag.

Signal Handlers

Signal handlers may run at any time (asynchronously) when signals are unblocked in a piece of program. Some system calls (e.g., pselect(), ppool(), sigwait()) are called with signals blocked; they unblock signals only when waiting. In this case, handlers are not run asynchronously since they can be invoked only in well defined places in the program. When signals are delivered, if more than one signal is pending, the system can (and usually does) run all the handlers of such signals before returning to the execution of the normal code.

Signal handlers must be registered with sigaction(). The sigaction struct argument can be reused: it conveys only data to a sigaction() call, which does not use the struct afterwords. There is a need to initialize all fields (e.g., the mask of blocked signals) before passing it. Example:

It is possible to set a handler for several signals and then use a switch in it to define the actions for all signals handled. It is also possible in it to change the actions for a signal by testing a flag that is set in the program (but care must be paid to avoid races: see this example).

Since it is not stated whether the standard string functions are re-entrant, you should assume that they are not. If you plan to make handlers do something more than barely raising a flag, you may need to build a library at least to manipulate strings to debug the handlers.

Handlers can be told to use an alternate stack when running. This is not much useful in normal cases. (It allows handlers to run also when a stack overflow has occurred, and a SIGSECV or SIGBUS generated.)

What can be done in handlers

Handlers can access global variables declared static volatile sig_atomic_t, but, when they access other data (e.g., doubles, composite data (structs), etc.) they may find them in an inconsistent state. When such data need to be accessed in handlers, they must be updated by threads in critical sections with signals blocked. static volatile sig_atomic_t name is the way to declare global variables that can be accessed from within signal handlers and outside them without a need to block signals. It ensures that reads and writes cannot be interrupted in the middle.

Sequential consistency is not guaranteed between a thread and a signal handler that interrupts it (except when the handler is executed because of abort(), kill() or raise()). Actually, a signal handler interrupts a sequence of steps that could have been reordered during compilation or execution. The only guarantee concerns sig_atomic_t variables, whose accesses are never interrupted in the middle. (I.e., loads and stores, when initiated, are carried out to completion and the thread interrupted at the end of an access, but not in the middle of it, which can be done for other variables whose loads and stores involve several memory references.) Consider, e.g., a thread that updates two variables in a known program order, the second one being a flag that tells that the first has been changed. From within a handler, you cannot count on them be updated in this order unless both have been declared volatile.

Most of the problems of handlers accessing global data are solved by having, instead, a thread that waits for signals, which means that once it has resumed, it can handle them in a thread context (in which it can use mutexes, for example).

The safest way is to make handlers do as little as possible, and then do what has to be done outside handlers.

Handlers must not call system calls that are not asynch-signal-safe, or functions that are not so. There is a difference between thread-safe (aka MT-safe) and asynch-signal-safe. A function that is not re-entrant can be made so by protecting it with a mutex, providing that it does not call itself. (However, a truly re-entrant function can be recursive, while one protected with a mutex cannot.) Of course, a recursive mutex can be used (but a function that uses a mutex probably does so because it uses global data, which makes it intrinsically non-recursive; to allow to call itself, it must be changed to use automatic data). Functions that have a state, such as malloc(), are difficult to make re-entrant. They could be made so by temporarily blocking signals inside them. A truly re-entrant function is also asynch-signal-safe, while one that has been made so with an internal mutex is not asynch-signal-safe. The difference, then, between a thread-safe and an asynch-signal-safe function is that the former could access global variables and protect them with a mutex (threads will sequentialize in accessing that), while the second cannot. Such a function cannot be asynch-signal safe, because, if it is interrupted, and a signal handler runs, it will deadlock (besides waiting on a mutex not belonging to the list of functions that can be called in handlers). Moreover, the latter could protect accesses to global variables blocking signals, while the former cannot. (Blocking signals does not prevent it from being executed contemporaneously by different threads.)

Note that a function (and thus a handler) that sets a global flag is probably not considered re-entrant by canonical definitions, but actually it is.

There is an exception to the rule above that forbids handlers to call asynch-signal-unsafe functions: a handler that does not interrupt one such functions can call them. This can occur, e.g., with some synchronous signals. This holds between a thread and a signal handler that interrupts it. That handler can call one such function even if the very same function is called by other threads. I.e., the restriction holds only between a thread and a handler that interrupts it.

In practice, most system calls are actually asynch-signal-safe except for the assignment to errno, which can easily be handled in signal handlers. (The list of system calls is contained in syscall.h.)

pthread_sigmask() is the same as sigprocmask() (with some checks added), and therefore it is actually asynch-signal-safe. pthread_kill() executes a tkill(), which is a system call, and thus is actually asynch-signal-safe.

Handlers may be empty, in which case they serve only to interrupt slow system calls, or to ignore signals, or contain some code that accesses global variables, in which case they are almost always non-reentrant.

When a handler is in force in a program piece in which no asynch-signal-unsafe functions are used, such functions can be used in the handler because they will not interrupt themselves. However, this is a dangerous programming practice, because when the program undergoes some maintenance change it is easy to forget this hack, and add some asynch-signal-unsafe function calls to the program piece, thus breaking the program.

There is no need to do I/O in signal handlers, except for debugging and for making another I/O request in signal-driven I/O.

In a signal handler, write() should not be used: it is better to defer it to some thread. However, it could be used for debugging, and it is better than printf(), which could make the program abort (when the signal interrupts a printf()). A thread that is executing a write() and is interrupted by a signal handler that executes itself a write() may produce a partial output followed by that of the handler. Blocking signals in handlers spares at least to test EINTR after write()s. Linux guarantees atomicity of writes to pipes only (and only when the amount of data is lower than a configured limit). A thread can test if the data to be written have been actually emitted, and retry when that is not so. However, this means that output can be intermixed, and there is little that can be done, because locks cannot be used in handlers.

There is no built-in way for a function to tell if it is being executed in a handler or in the mainstream code of a thread.

A slow system call, called in a handler, returns with EINTR when another handler interrupts it.

Handlers should not interrupt each other: it only makes things more complicated, and adds little. Since handlers must not perform long operations, there is no point in interrupting each other.

Handlers and threads

When a signal is delivered to a process, a thread is chosen among the ones that have not blocked the signal or are executing a sigwait() for it. If one such thread exists, then it is interrupted, and the signal handler (if any) runs. When the handler runs, it blocks the signals specified when registered by ORing them to the suspended thread signal mask. Note that while the handler is executed, the thread is considered to be executing, too. (It is just in another place, out of the main road.) Actually, a thread can cancel another without bothering, if the latter is executing its mainstream code or if it is in a signal handler.

There is some knowledge of the interrupted thread from within a handler. Inside the handler, pthread_self() and gettid() deliver values that are the same as the ones delivered from within the interrupted thread (but this is told nowhere, and these functions are not asynch-signal-safe). However, not all thread functions behave the same when called from within a thread and a handler. E.g., a handler cannot change the thread signal mask permanently.

Handlers interrupting different threads can run in parallel. Moreover, a handler runs in parallel, with threads other than the interrupted one. This means that a handler can safely update data that are accessed also by the interrupted thread (including operations that need to perform multiple accesses, since a handler is atomic with respect to the interrupted thread), but not data that are accessed by other threads.

Libraries

Library functions that internally need to handle some signal must restore the signal handling when they return. A library function:

In libraries, refrain from setting a handler, because it could disrupt signal handling in other threads.

Unblockable and stray signals

The default disposition of many signals is to terminate the process. A process should then register handlers or create threads for all these signals, or block them if it does not want to be terminated from the outside by them. There is a protection mechanism that makes processes receive signals only from other processes with the same real or effective UID. However, there can exist processes that need a stronger protection (e.g., processes that update important data, whose integrity must be guaranteed). This, unfortunately, cannot be fully achieved, because SIGKILL can never be blocked, but it can be to some extent. Let's call stray signals the ones that are sent by a process to another, but were instead meant to be generated only internally to it.

SIGSEGV, SIGBUS, SIGFPE, and SIGILL must not be blocked because the program behaviour is otherwise undefined, unless they are generated with kill(). Even when they are generated by kill(), there is a chance that they are generated also internally. Therefore, they must never be blocked, which means that either they have the default disposition, which is to terminate the process, or they are caught by a handler. In either cases, another process can send them.

Stop signals always interrupt 15 system calls. SIGSTOP cannot be blocked, and SIGTSTP should normally not be blocked.

Normal code here means statements that are not in blocks supervised by signals, i.e., ordinary thread code, including cleanup handlers. In cleanup handlers, these system calls should not occur, but should they do so, restart them.

One such library can be used in normal code and also in supervised blocks implemented with threads.

Discussion

Signals that are meant to be internal could also be sent by another process (stray signals). A handler, or a thread that waits for a signal, can tell if the signal was originated from within the same process or was sent by another process by executing:

    if ((siginfo->si_code == SI_USER || siginfo->si_code == SI_QUEUE) &&
        siginfo->si_pid != getpid()){
        ... external signal
    }

where siginfo is the argument of the handler or of sigwaitinfo(). The expression evaluates to true if the signal comes from another process. However, if the signal is sent by another process, it has still the effect of interrupting a slow system call if the thread that catches it is suspended in one such call.

N.B.: A process can know if a signal has been sent to it by another, but a thread cannot know what other thread has sent it a signal. (The signals that carry along the pid have the process pid.)

Unfortunately, a handler executed while the interrupted thread was suspended in a slow system call cannot decide to restart or interrupt the call. (First, it does not even know that it has interrupted a system call, and second, registering again the handler with SA_RESTART form within a handler has no effect.) This would allow restarting the call when the signal was external, and interrupting it otherwise, relieving the caller of deciding it. But even so, there are system calls that are never restarted automatically.

There are 28 system calls that are never restarted automatically. They all return with an error value and errno equal to EINTR except sleep() that returns the time left. After a system call, except sigwait() (and the like), it is not possible to test if the signal that interrupted it was external. It is not even possible to know what signal interrupted it, unless some (possibly misleading) indication is left by a handler that has run.

To cater to stray signals, there are the following alternatives:

the handler sets a restart flag when it receives such an external signal, and interruption is checked after each system call to restarting it when the flag is set.
the handler sets a kill flag when the signal must interrupt the actions performed by the thread, and does not set it otherwise (in particular, when the signal is stray). The flag is checked after each system call, and acted upon. When it is not set, and the system call returns with interruption, the call is restarted. This alternative is similar to the previous one.
check interruption after each system call, and restart it when occurred.
accept that processes with the same or real UID can send stray signals, and terminate the process, while the ones that do not match these UIDs cannot send signals (unless privileged).
use the main thread to catch the external signals that we want to discard. The main thread registers a handler for them, then it creates a thread on which all the real processing is done. The handler just returns when the signal is external, and executes the desired actions when it is internal. This relies on the kernel preferring the main thread over the other threads when a signal can be delivered to several ones. It can be implemented swapping the control thread with the main one, i.e., making the main one handle signals and the other do the rest. However, this preference is not documented, much less guaranteed.
much the same as a system call that is interrupted by a stray signal, a library function may fail for the same reason, and its calls must be restarted. This is not practical because it adds to the number of calls to restart (which is already a nuisance).

Threads that wait for signals can discard the stray ones. There are 15 system calls that are interrupted by stop signals (a subset of the 28 ones). These calls must always be restarted. The additional 13 that are always interrupted are: pause(), sigsuspend(), poll(), ppoll(), select(), pselect(), msgrcv(), msgsnd(), clock_nanosleep(), nanosleep(), usleep(), io_getevents() and sleep(). SIGSEGV, etc. can occur at any place in the code, and thus there is a need to decide what to do with these 13 calls.

Solutions for stray SIGSEGV, etc. signals in normal code:

they kill the process: restart unconditionally 15 calls.
they are discarded: restart unconditionally 28 calls, i.e., 13 more.

Signals that are not blocked and that have a handler that terminates the process have no influence on the solutions. In non-supervised blocks, no signal can occur except for SIGSEGV and the like, because the others are blocked (or have no influence). Since these are thread-directed signals, there is no thread that is waiting for them. The first solution needs a handler that kills the process, both in the case of normal code and of supervised blocks.

Let's see what alternatives we have for supervised blocks:

pselect(), etc.: stray signals kill: nothing need be done (stop signals do not interrupt them); stray signals discarded: restart these 4 calls if no kill request is pending.
polling: restart all interrupted calls if no kill request is pending. Note that 15 system calls must be restarted anyway, and the others must be checked for interruption. If stray signals kill the process, all system calls can be restarted here if no kill request is pending, and if they are discarded, the same.
threads: as for normal code.

If stray signals are to be discarded, system calls must also be restarted in normal code so as to be consistent making the program discard them always. In supervised blocks (implemented with signals), interruption must be tested on 28+20 calls (20 are the additional slow ones that are interrupted not setting SA_RESTART in their handler).

For supervised blocks implemented with a thread, in which the supervised event is a signal, the signal handler should normally cancel the thread. The signal must be got by the thread that creates the supervised one, e.g., waiting for it. Then, it can be discarded when stray. Keep in mind that threads that unblock signals can receive stray signals. Thus, either stray signals kill the process, or the thread that unblocks them must handle them.

Thus the decision is between: stray SIGSEGV and the like kill the process = 15 calls to be restarted in normal code, or are discarded = 13 calls more to be restarted. Since killing the process is the normal thing to do with SIGSEGV and the like, then it is pointless to spend the effort to restart 13 calls more. After all, there are already a number of kill-process signals, and four more should not be a problem.

Restarting all system calls that are interrupted (almost all that return EINTR) could be simpler to remember than only some 15 ones. They are many, though (20+28). However, if one does not remember what to restart and what not, it could restart all the system calls that return with interruption.

Let's then face the problem of the other stray signals. Stray signals that are not SIGSEGV and the like can occur in supervised blocks, and they can be only the ones that are expected. In normal code they are blocked, and thus cannot occur, behaving as if they were discarded. Threads that wait for such signals can (and should) easily discard them. In supervised blocks they must then be discarded too:

pselect(), etc.: restart these 4 calls if no kill request is pending, and have handlers that set a kill request when signals are not stray).
polling with signals blocked: we cannot tell if signals are stray because here they are pending, so this solution cannot be used; with signals unblocked, restart all interrupted calls if no kill request is pending (and have handlers as above).
threads: as for normal code.

This problem does not happen with signals that are meant to come also from other processes throughout the whole program.

Note that it is not possible in general to register handlers for these signals that kill the process when the signals are stray because the process behaviour would not be consistent: sometimes it would kill when a stray signal comes, and some other times not. There are then two alternatives:

such signals have a process-wide assignment for what concerns their source: either they can come from anywhere, or they can come only internally from the process. In the first case they are never stray, and therefore nothing need be done in supervised blocks in addition to what is specified above. In the second case, their handlers must set a kill flag when signals are not stray, and system calls restarted when the kill flag is not set.
such signals in some places can come from anywhere, and in some others they can only be internal. Their handlers must be controlled with a flag: when the flag is off, they set the kill flag always, when it is on they set the kill flag only when signals are stray.

Note that technically speaking, there would be no need to test the kill flag when a system call has been interrupted so as to decide to restart it or not. E.g., a system call that is not interrupted by stop signals, in a supervised block for external signals is interrupted only by those signals, and can then be restarted always when interrupted. However, testing the kill flag after interruption on all system calls makes no harm, and it is a simpler rule.

Implementing a new library:

There is no built-in way to detect what signal interrupted a system call (if there would one, then we could test that the interruption was caused by a stop signal and retry, and we could expect existing libraries to have used it, and we could write wrappers for such system calls that make them restart so to be sure that all the calls in the whole program restart. The analysis done before tells to restart always the 15 system calls interrupted by stop signals. Moreover, a library can restart also the 13 additional ones. Alternatively, we could assume that system calls and libraries fail when receiving a stop signal, but this is not nice. A programmer, when implementing a new library, could instrument it, thinking that it could be used in the following contexts:

in normal thread code, with almost all signals blocked: restart unconditionally the 15 (even better, 28) system calls when interrupted (and refrain from using signals altogether).
in ppoll supervised blocks: abort when kill request, otherwise restart.
in polling supervised blocks with signals blocked: abort when a kill request is pending, otherwise restart. With signals unblocked: abort when a kill request is pending, otherwise restart.

In each case, we should provide cleanup handlers. Implementing a library that can be used in all these contexts is cumbersome, as it needs either to define the name of a kill flag, or to pass the codes of the signals that can abort the library function. Normally, only the first is to be followed. General libraries are not instrumented to support supervised blocks implemented with signals.

A general library cannot know that some signals are handled with SA_RESTART, and some not. Thus, it should restart all slow system calls, but this defeats the very purpose of SA_RESTART. I think that it is acceptable for a library to restart only the 28 ones. SA_RESTART allows to test only 28 calls upon stray SIGSEGV instead of 28+20. This is consistent with what the user code should do. Let's say that a library that restarts the 28 ones can also be used in an application that wants to ignore stray signals (and that consistently restarts the 28 calls itself), as well as in applications that want stray signals to abort.

Cost of signal functions

This is the approximate cost of signal functions, measured on an Athlon 64 X2 4200, 2.2 Ghz, both in terms of absolute execution time, and number of units, where a unit is the time taken by a memory-to-memory copy:

signal(), sigset(), sigvec(), sigpause(), siginterrupt() (which has the same purpose of SA_RESTART), etc., are old functions, not to be used any more.

Debugging aids

To debug signals, psignal() can be used to display a signal. Standard I/O can be used, but at the risk to making the process abort.

The main program

A main program (the main() function of a process) that does not change the default disposition of signals can be terminated by many signals that are sent by other processes. When this is not the desired behaviour, signals must be blocked, ignored, or handled.

This is the suggested scheme to cope with it. Basically, most signals are blocked throughout the program, and unblocked only when necessary, or are handled by dedicated threads.

Killing applications, processes, threads, and tasks

There are a number of cases in which a course of actions need be prematurely terminated (killed):

It has often been said that killing processes is not a good practice, but when there is something wrong with a process, the only one thing to do is to kill it, hoping that it has damaged the other processes and data as little as possible. After killing, a verify/repair program can be run to mend damage. Not killing does not solve the problem: the only alternative to kill a process is to reboot the system, unless the process keeps quiet and does no harm.

Graceful killing and forced killing are two opposite concepts. However, both are needed. The former allows stopping something, keeping all data and other resources consistent, and therefore is the preferred one. The latter is the emergency one, which may require some repair to be done, but is anyway better than rebooting.

In Linux, forced killing is fully supported only on processes. I.e., a process can kill another unconditionally, or, even better, can request another to perform graceful termination, and, if it does not, forcibly kill it. Thread killing instead is either collaborative or imperative. There is no way to try the soft one first, and then the hard one. Moreover, a thread can disable cancellation. This means that, when there are parallel activities in a system that can execute a possibly unreliable program, they should be implemented as processes. E.g., some Web browsers open new tabs or windows using processes, so as to be able to kill them, should they get stuck in some plugin or applet. Threads can also run code that is provided dynamically by using dynamic libraries, and would then be fault compartments if they could be forcibly cancelled. Unfortunately, it is not so, and one reason could be that threads share data. Thus, forcibly killing one of them is likely to place the process in an inconsistent state.

Killing implies a temporal relationship between the killer and its victim. With long-standing victims, this is not a problem. E.g., a process that enters an endless loop and stops producing results can be detected by measuring the production rate (and seeing it zero), and then killing. With endless applications such as servers (daemons), there is no problem, either. In other cases, there could be a need to provide some means to make sure that killing at least does not kill something else.

Promptness of killing could be termed kill latency. This is the maximum time elapsed between a kill request and when when a request is honored, like, e.g., the time between you push the brake pedal and when the car stops. I have never seen strong requirements for very low kill latencies (e.g., lower than a second). Perhaps the lowest latency is needed when SIGPWR occurs, which could be due to switching over power supply to a UPS, or the battery of a laptop running low.

The killer of a process or an application is a process (possibly an interactive shell). The killer of a thread or a task is (possibly another thread of) its process. Process control is done by having a process that starts, monitors, and kills or restarts other processes.

When killing, some error conditions might be encountered during cleanup. It is much better to continue it rather than terminating the process, because, in so doing, there is a chance to restore properly the application, process, thread, or task state.

Applications, processes, threads, and tasks are objects ranging from the largest to the smallest, in increasing level of granularity. The highest is the task (shortest code), but the thread one is appropriate for most of the cases, except perhaps for time supervision. Very seldom does a short sequence of actions need to be killed.

Needless to say, applications, processes, etc., need be designed, taking killing into account, in order to be killed.

Inconsistent states

When a process terminates, Linux performs some cleanup on the system objects accessed by the process. Note that a process terminates because it aborts spontaneously or is forcibly killed (there being no way to perform graceful kill when processes contain endless loops), or calls exit().

Note that when a process that holds a semaphore lock aborts (or is forcibly killed), the lock is not released. Aside from the inconsistent states due to system objects, there are also ones due to application objects.

Killing an application

When the application is made of a father process that creates all the others, killing the application is the same as killing the father process. When it is made by a set of unrelated processes, there is a need to tally all of them and kill them. The kill request is sent to a process in the application that shuts down all the others in an orderly fashion. The order in which processes are killed depends on the application. In general, there are processes that can be killed independently from others (and thus simultaneously), and others that must be killed before others. This makes up the shutdown graph. The main constraint on it is to maintain consistency of what is external to the application, and not to cause any problem in the application itself. E.g., an application that is made of a pipeline of processes should shut down by killing first the input stage of the pipeline rather than the output stage.

Killing a process

Process killing depends on what the process does. In any case, a control thread has to be provided that waits for a kill signal:

Resetting the interprocess state

This applies when there is a known and safe (from races) way to reset the interprocess state. The scheme is:

General process kill

The process to kill can be either a child or an unrelated process. Processes that were created (as children) can be re-parented, becoming then unrelated.

To kill a process, first, graceful kill must be attempted, and then forced kill.

The processes that provide graceful kill must be killed using the means they provide, which could be sending a signal, or a message, or any other means of interprocess communication. The scheme described here makes use of a SIGINT signal (but any other signal can be used, instead). If the means to kill a process is not documented, you can try sending it a signal such as SIGINT, SIGTERM, SIGQUIT, SIGABRT, or at worst SIGKILL, or even all of them in sequence.

Graceful process kill is done by cancelling its main thread, which in turn undoes what can be undone, and in particular cancels the threads that are alive at that point (and that have something to clean up), and the created processes, too. However, a process kill function that retrieves all threads and cancels them would be handy to spare to reckon them. It could be used in simple programs as a convenience function, when threads can be cancelled in any order. But this needs to retrieve all the threads of a process, which is not possible. Moreover, if threads can be killed in any order, then probably they have little to clean up, and in such a case there is no need to kill them: they disappear at process exit. A similar reasoning applies to child processes.

Scheme of killer

In this example, the victim allows itself to be killed with a signal sig. Moreover, it is a related process, and thus must be waited for. Waiting for its completion is done with waitpid(). This also allows detection of its termination. If it were an unrelated process, it could still be killed by sending it a signal, but its termination must be detected, either probing it with kill(pid,0) or waiting for some reply. When several children have been created, they can be killed simultaneously. See here.

Scheme of victim

If the victim has several children to kill, it can kill all of them and then wait. Suppose there are two children: a quick and a slow one. If it waits for the quick first, and then for the slow, it will lose no time, and the same if it waits for the slow first. However, if the waited one does not terminate, all will be blocked. The technique is then to make a non-blocking waitpid() call for each process and loop over all the processes to kill, noting the ones that have been killed, and allowing a few spare runs, after which forcibly kill.

When a process receives from another (i.e., a shell script) a request to kill, and that process has children, the forced killing of children could be left to the overall killing of the victim. However, forcibly killing a child in place (i.e., in the cleanup handler) allows a more accurate control over the time to wait for its graceful killing with respect to an overall timeout. Moreover, we have to poll its termination anyway so as not to block cleanup. This means that forcibly killing it locally is not an extra cost. Additionally, it allows graceful kill to proceed.

To honor a kill request with cancellation, a thread must be created at the beginning of process execution, that waits for the kill signal (or any other interprocess communication means). Cancellation could also be started from within a signal handler that the main thread registers for it, and that after executing a pthread_cancel() executes a pthread_testcancel(), but this does not allow to forcibly kill the process when the cancellation fails. Note that a process cannot issue a cancellation request to another: it can send a signal, or a message. It is possible to register an handler for the main thread that executes a pthread_kill(), but a handler has all too many restrictions.

When forking off a new process, unblock SIGINT in the child if a process is being created that does not adhere to this scheme (i.e., it does not have a thread that handles SIGINT). Note that when a process forks another, it cannot be sure about what signals are blocked at the time the fork is executed. Therefore, it should initialize the signal mask in the child (it may execute an executable that does not initialize the mask when started).

If a pthread_cleanup_push() is executed, and then a fork() (which is the case here, for example), the child inherits the cleanup handler stack, but has no means to pop the cleanup handler. This can only be solved by calling an exec() after the fork(). Do not mix cancellation and creation of clones. The separation of process creation between forking and exec-ing serves only to allow to set a few things in between, such as unblocking signals, redirecting files, etc. A fork() must almost always be followed by an exec(). The program stretch in between is also a minefield much the same as a signal handler.

When a process is killed, and it knows that it has sent messages to queues that support purging of pending messages (which is a nice feature), the process can purge them as cleanup action. But not all that is done can be undone. Even though, a victim can send messages to the processes it is interworking with to inform them that it is quitting. Such processes can then cleanup the pending transactions they have with the victim. Moreover, as a general rule, processes that are interworking with others must be prepared to detect the disappearance of their partners, and to perform the necessary cleanup of pending transactions, such as discarding messages that have been sent by a process that is quitting. In order to do that, heartbeat messages must be passed between interworking processes (or some process monitor used).

Discussion

Process groups

If we forcibly kill a process, and it has children that are still running, they become zombies. Better to forcibly kill them. There are no built-in means to send a signal to all of them. Note that also nephews must be killed, which means that we must reconstruct the whole process tree, unless we state that a process that wants to be forcibly killed must put all its children in some dedicated group (e.g., a new process group). It is possible for a process to place all its children into a same process group. This could be a solution, but it would not be general. E.g., a library that is called could create processes in another group. A process group is not the same as the group of children: processes can change their process group, and shells put all processes created with a pipe command in the same group. Scanning the /proc filesystem allows to find all the descendants of a process. Note that this is needed only when forcibly killing because with graceful killing a process knows what direct children are alive, and then kills them (and they in turn kill their children). Now, we have two solutions: sending a signal (i.e., kill(0,sig)) to the process group (which is not perfect because the process group could contain processes other than its descendants, and some descendants could not belong to the process group of the ancestor), and killing all descendants (which is not perfect either because new children can be created in the meantime, unless perhaps we scan the /proc until no descendants exist). But process groups had been invented with the purpose to allow to control processes, i.e., to send signals to groups of them. The problem is not much of shells putting processes created in pipes in a same group (the programmer knows that and if it does not like it can use other means than pipes), it is children that can change their group. A process can set its process group (or set the one of its descendants) to one that does not exist (practically, to its pid), or to one that is in the same session. This is mainly used by shells, and can be used by processes too to set up groups of processes that can be killed (or stopped) with one single operation. It should then be the canonical way to kill children. Sending a signal to a process group is also more "atomic" than sending a signal to each descendant. Processes can decide to put children in a dedicated group with the purpose to kill all of them sending a signal to them with a single operation. Note also that the control thread sets the disposition of SIGCHLD to SIG_IGN so as not to have zombies. It is indeed impossible for it to wait for the completion of processes in the same group that are not direct children (waitpid() waits only for the direct ones).

In theory, when a child has been killed, the process or other children could create further children, making killing never end. However, this is very unlikely to happen, unless the process monitors its children and recreates them when they disappear. But then such a process should not do that in its cleanup handlers.

Fork

The main thread creates first the control thread. Later, it could execute a fork(), not immediately followed by an exec(). This would make a copy of the process address space when the control thread is running, and possibly at a point in time in which the values in it are inconsistent. In particular, a printf() could have filled a buffer and not yet flushed it. In order to avoid races, there is then a need for the main thread to wait until the control thread is in a safe state. Note that atfork() handlers help only in avoiding deadlocks, but not in other kinds of races. Note also that when it is not possible to make threads reach a safe state, only asynch-signal-safe functions can be called between fork() and exec(). Moreover, ensuring that the control thread reached the point in which it waits for signals guarantees that the kill signal is accepted and served when the main thread starts to execute its actions.

Exit

When we want to kill a process, what is important is the state of the objects that the process shares with other related, and unrelated processes. Its internal data and its threads are not much important. They become important if they access such objects. If exit() first stopped all threads and then called the exit handlers, then there should be no need to cancel threads: each module would register an exit handler that performs interprocess cleanup. Alternatively, the cleanup handlers could perform it. Note that since threads are not stopped at exit() before the exit handlers are called, then they need be stopped in the exit handlers so as to prevent them to change the interprocess state. But this means to cancel them. The exit_group() system call terminates all threads, and is executed when an exit() is done. The exit handlers are executed before the threads are terminated (they are executed before all other actions such as file closing, flushing, etc.). Killing threads from within exit handlers is not simple: it would require to know the threads of the process, which is not possible unless the process reckons them. Exit handlers have also the restriction that they can be registered, but not de-registered. Again, this means that when there is a simple, known way to restore the interprocess state, and threads do not interfere with such restoration this can be done in the control thread or in an exit handler, otherwise, graceful killing must be done (which could be more lengthy because it would terminate threads and restore the intraprocess state, that in some cases has little impact on the interprocess state).

The killer has to time supervise killing if the victim does not provide a guaranteed termination, which means always since it can seldom be sure about the victim. However, a process that has guaranteed termination can be called with kill -SIGINT, while one that does not needs to be killed using a program, or with kill -SIGINT and then kill -9. A process that kills another might need to get a reply to know that the other quit. Many scripts wait a long time not knowing for sure that the victim quit, polling its termination, not knowing how much to wait, or fearing to wait insufficiently.

Reply

To synchronize killers with victims termination many means can be used. It is possible to implement an IPC with signals, better the realtime ones, that allow to carry also some data. Signals have the advantage that they need not create a kernel object (and to destroy it, so as not to leave it in the system). Unnamed semaphores can be used only between related processes, and need a shared memory object. Signals can be used as semaphores among processes: multiple posts are allowed (with a limit: the current number of signals that sigqueue() can queue is 16K per user, but can be changed with setrtlimit(RLIMIT_SIGPENDING)), wait is supported (also with time supervision, and waiting on several signals), and probing the presence of messages (sigpending()), and they allow also to carry data. Of course, you can have as many semaphores as you want (but their names are system-wide), while there are a fixed number of signals, but they are process-wide. Among threads there are many synchronization means: semaphores, mutexes, barriers, conditions, etc. so that there is no need to use signals. In the scheme above, there is no need to define handlers for those signals because the main thread blocks them at the process level, and a thread (the control one) waits for them. There is also no need for a loop to retry sigwaitinfo() if we use a thread to wait for these signals (which there is in a single threaded program that accepts several signals. Thus, the killer thread can send a reply to the process that sent the SIGINT (or whatever other kill signal).

Abnormal termination

When a process terminates abnormally (e.g., because of a synchronous signal), the exit handlers are not executed. This is a pity because when a process terminates normally it has a chance to cleanup the interprocess state, and thus it does not need to register exit handlers (although they are handy to use also there). Exit handlers would be much more handy in abnormal exit. Abnormal termination is done raising signals. It is then sufficient to register signal handlers for them. The signals are: SIGBUS, SIGILL, SIGEMT, SIGIOT, SIGSEGV, SIGSTKFLT, SIGSYS, SIGTRAP. These signals do not interrupt system calls and make it return with an error code. In general, a program can detect the arrival of a signal catching EINTR error returns, but these signals do not make calls return that way. Their signal handler raises a signal for the control thread (so as to kill gracefully the process) and then pauses (so as to prevent the offending thread to continue possibly causing other signals to be generated or damaging the process state). SIGABRT by default terminates abnormally the process, but it is not caused by the violation of a condition (it is caused by abort()) and it is delivered to the process, while the others are delivered to the offending thread. It is then caught by the control thread. The (synchronous) signals above can also be sent by a process to another (with kill()), which is not nice. Thus, the signal handler discards the signal when sent by another process.

One such signals can be generated from within a cleanup handler. Is it possible to know that we are inside a cleanup handler so as not to call again cancellation and redo the very same actions that generated the signal. When one such signal occurs in the non-cleanup code of a thread, it triggers graceful kill (since it is not possible to recover locally). The signal handler sets a flag to note that one such signals has already been caught (and process kill initiated). This means that the signal has been generated from cleanup code. The next time one of these signals is caught by the signal handler, it triggers forced process kill.

Note that process kill does not make use of exit handlers (the ones registered with atexit()). They are more oriented to normal execution rather than killing. They spare programmers from remembering to perform actions at every exit(), which could be called also within some library, thus giving users no chances to call such actions (when an error occurs some libraries make an exit()). Moreover, they cannot be de-registered.

Naive killing

A process could be killed registering a handler that executes a pthread_cancel() followed by a pthread_testcancel(). They are not in the list of asynch-signal-safe functions, but can be called all the same from within a signal handler. However, this is quite naive because it does not allow to perform forced cancellation when the graceful one does not succeed.

Killing a thread

Killing a thread can occur during process killing, or as part of the normal operation of a process (e.g., a process that starts a number of threads to make faster an operation using parallelism, like, e.g., a search, can kill threads once the operation has completed, or, e.g., an application with an user interface can use a thread to execute some user requests, that the user could kill). In the latter case, the program should be prepared to cope with killed threads.

A thread can (try to) kill another by executing a pthread_cancel(). The victim:

The victim can enable, disable and change its cancellation type dynamically. It could disable cancellation during initialization, enable it as deferred thereafter, and enable it as asynchronous in long stretches of computation. A thread that performs only computation and does not change shared data can terminate gracefully with asynchronous cancellation.

The difference between asynchronous and deferred cancellations is that the programmer is sure that with the deferred one, a thread is not interrupted between calls, and thus it can update data without protecting the operation.

Scheme of victims

When a thread changes the system or program state (e.g., it allocates a resource) just before a cancellation point, and should restore that state before the thread is cancelled, it must push a cleanup handler before the cancellation point. When a thread has restored the changed state, it must pop the cleanup handler. This is the paradigm:

Changes to the system or program that must be undone are the ones that take the system or program in an inconsistent state, like, e.g., allocation of resources such as files or memory (that cause leaks if not released), partial updates of data, open communications with other programs, etc. Remember that the whole purpose of handling properly killing is to keep the system consistent.

It is difficult for a programmer to remember what functions contain cancellation points. This means that once the state has changed (e.g., a resource has been acquired), and then the program calls system or user functions, the programmer (not knowing if the called functions contain cancellation points), has to set up a cleanup handler. He could also remember the state change (e.g., resource acquired) in some data structure so as to make the cleanup handler release it parametrically. Note also that the called functions can evolve during time, and in a later version contain a cancellation point. This means that functions are potential cancellation points and must be considered as such. Enclosing calls in cleanup handlers push/pop pairs ensures that proper cleanup will be done, should the called functions contain cancellation points or otherwise. Cancellation works if at least all system calls that contain blocking points are cancellation points. This is actually the case (except some few exceptions, see below).

E.g., suppose there is a process that locks a semaphore, does some operations, and then unlocks it:

To kill it, nothing must be done if the cancellation request occurs before or during sem_wait(), but the semaphore must be released if it occurs after it and before sem_post(). Of course, in this simple example this does not happen, but if one of the actions contains a cancellation point this can occur. The solution is:

The last two statements can be also swapped. As soon as the state is restored (e.g., resource released), the cleanup handler must be popped. Popping and releasing can be done in any order, unless releasing is blocking, in which case it must be done before popping.

Note that when a system call is interrupted by a signal that lets the program continue, the state does not change, and no cleanup handler must be pushed. This can be detected by testing the EINTR error.

System calls can also be interrupted because they blocked the thread and a cancellation request occurred. As far as I know, there are no means in a cleanup handler to detect if it has been called because of the interruption of a system call or because of a pthread_testcancel().

When cancellability is enabled, pay attention when inserting statements that may contain cancellation points, such as trace statements. A solution could be to disable cancellation in the program parts in which they are used. Another is disabling cancellation in them.

On the other hand, it is convenient to exploit the fact that deferred cancellation does not interrupt a function anywhere: there are parts that need not be protected (either setting a cleanup handler or disabling cancellation).

In order to make a thread responsive to cancellation requests, make sure that in long program stretches there are cancellation points. Some system functions are cancellation points, and some others might be. When in such stretches only the latter are called, insert calls to pthread_testcancel().

Pay attention to the scope of variables: do take into account that cleanup push and pop are a block. Variables that are (re)defined in them are not visible outside. E.g.,

Cancellation of interleaved state changes

the simplest way is to enclose all this in a cleanup handler that releases what resources are actually allocated, reckoning them in some data structure.

Cancellation of condition variables

Calls to wait for condition variables, pthread_cond_wait() and pthread_cond_timedwait(), lock the mutex when they are cancelled. There is a need to register a handler that unlocks it if the application needs to use that mutex again. E.g.,:

Cancellation of mutexes

Cancellation of barriers

POSIX barriers cannot be cancelled. When there is a need to cancel a thread that could wait on a barrier, the barrier must be implemented with semaphores. This is an implementation:

Cancellation of thread join

A thread that has registered a cleanup handler and is cancelled when joining another, e.g.,:

needs to join the victim in the cleanup handler in order to terminate it properly, and not to leave a zombie thread around.

It is true that detaching it would make its cancellation faster when also the creator is being cancelled, but normally cancellation is fast, and thus joining the cancelled thread is not so a big loss of time. Detaching has a problem: when the main thread exits, all its threads are terminated at once. But then, when one detaches a thread it must be quite sure that the thread terminates well before the main one.

Cancellation of thread creation

Cancellation of process creation

To cancel a child process, a signal must be sent to it, and then its termination waited for. If that does not happen within a defined amount of time, the process must be forcibly terminated. The amount of time to wait depends on what the process is appointed to do.

Suicide

A thread kills itself executing pthread_exit(), which runs all the cleanup handlers. If it has to kill other threads, it must do it before suiciding, and if it wants to hand its cancellation to another thread, it must then pause() instead of exiting.

Cancellation in C++

A thread that receives a cancellation request (from itself or from another thread) can handle it using exception handling:

When a cancellation request is sent to the thread, and a catch-all handler is present, the try block is exited, and the destructors of the objects in its scope called (e.g., MyObject m above); then the handler is executed. The handler must end throwing again the exception. Then the cleanup handlers are executed, if any. I.e. it is as if exception handlers were cleanup handlers. Note that cancellation exceptions terminate threads, i.e., there is no way to recover them and keep threads going on.

Calling pthread_exit() has the same effect as cancellation (except for the value returned by the thread).

Protecting from cancellation

When deferred cancellation is enabled, code that executes only computation, including calls to functions that do not contain cancellation points, is not interrupted by cancellation requests. Remembering what system calls and also system library calls are cancellation points is not simple. The OpenGroup Base Specification enlists a number of calls which are cancellation points (at least in some cases), and a number of others that can be. All of them must be taken as cancellation points. This standard states that implementations must not introduce cancellation points in any other function specified in the standard. To stay on the safe side, it is better to check each function used reading its man pages.

Some synchronization functions, like, e.g., pthread_mutex_lock(), are not cancellation points. They can be used to perform operations that are not interrupted by other threads and also by cancellation (i.e., critical sections). When cancellable calls should be used inside one such critical section, the whole section must be protected disabling cancellation. Note that deferred cancellation leaves to threads the responsibility to honor cancellation requests, and threads should do it as soon as they can, avoiding getting blocked indefinitely or even for long amounts of time. The possibility for threads to use critical sections that are not cancelled in the middle allows them to update safely data (or to perform entirely some sequence of actions) thus preserving consistency of the program state.

To protect a piece of code that contains calls that are cancellation points (or are suspected to be), cancellation can be disabled:

When a function is provided that wraps a library one (e.g., a malloc that performs some additional checks), pay attention to honor the specification of the wrapped one for what concerns cancellability.

This allows to block killing, then do whatever operation we want without bothering to be killed, and then restore the previous state. Note, however, that in stretches of code in which calls that are not cancellation points are used, there is no need to use protection. But sometimes one does not know if a library function contains cancellation points. Suppose you want to implement a sort function and use threads to parallelize it, and use some synchronization calls that are basically blocking, but that used in such a place it would block very little. To make it safe, you could either handle cancellation, or disable it.

pthread_setcancelstate(state,oldstate) sets the new state and atomically returns the previous one. However, atomicity seems not needed here. If it were not atomic, then a thread could first read the current state, then set the new. Ideally, it could be interrupted between the two. If asynchronous cancellation is enabled it can be interrupted, but then the thread is terminated (running cleanup handlers, if any). It seems more aimed to have a unique point in which cancellation is disabled, and another in which it is restored. It could be just an issue of convenience: an additional parameter to a set function is less costly then a get function, and it is also easier to use since the getting of the value followed by the setting of a new one is frequent. Atomicity is needed when something could occur between reading the existing value and setting a new one, like, e.g., the implementation of mutexes with fetch-and-add atomic operations.

Cleanup handlers

During the execution of cleanup handlers, cancellation is disabled, and cannot be enabled (thus pay attention not to enter endless waiting). However, consider that killer and victim threads are cooperating, and then a killer should not issue two cancellation requests to the same thread (the second has no effect, and denotes poor coordination). Once a thread has started executing cleanup handlers, it becomes unresponsive to cancellation requests.

Cleanup handlers are not executed when exit() and _exit() are executed (and neither are per-thread data destructors). This is so because they are devoted to terminate processes. Objects that a process uses to communicate with other processes can be cleaned up with atexit().

Notes

Scheme of killers

The choices for a creator thread to synchronize with a created one to get its results are:

Cancellation does not change the detachable or joinable status of threads (unless a thread changes that itself), and therefore the killer has to wait for the victims as it does normally when it does not cancel them. The best alternative when cancellation is done depends then on what the thread does.

A killer and a victim proceed in parallel, and a victim could terminate independently from a killer. However, there are no races when killing is done because a thread that creates another kills it before joining or synchronizing, or at any time if it runs forever. There must not be threads that end spontaneously without notice.

After thread cancellation, victims continue to run, possibly executing some code before they reach a cancellation point, or executing cleanup handlers. But the killer wanted to kill them for some purpose. How can it know when its purpose has been achieved? The purpose could be to make the victims stop changing shared data, or consuming CPU, or memory:

Treads are closely related objects. Often, killing a created thread does not only require to send it a cancellation request to it, but also to adjust the rest of the program to do without it. E.g., if there are 4 threads that must execute a parallel search for some result, and then wait on a barrier, killing one requires to adjust the parameter of the barrier.

Thread cancellation is done without error checking because the thread could have terminated in the meantime, and cancelling a terminated thread returns an error. The registration of the cleanup handler is done before the creation of the thread because the thread can execute and even cancel its creator. If the registration of the cleanup handler were done after creation, there would be no cleanup handler to execute.

When there are several threads that must be cancelled, the killer could cancel them all, and then wait for all of them. When a program piece creates a variable number of threads and wants to cancel them, it can have a cleanup handler that receives an array of tid's as argument and cancels the ones pointed to by it, and then waits for them all with pthread_join() (that would not sequentialize cancellation and waiting for each thread). The program piece can reckon the thread tid's in an array. The assignment of the return value of a pthread_create() call is done unless the function returns an error (it is not a cancellation point). This means that it is safe to reckon tid's. The scheme for killing threads in parallel and joining them is:

Note that thread cancellation is done without checking errors. This is so because when the cleanup handler is called, some of the threads might have terminated. Cancelling a terminated threads returns an error. However, attempting a cancellation is simpler than reckoning what threads are alive.

Killing a task

A task is supervised block. Supervised blocks are blocks of code in which a signal can occur, that must be handled terminating the block and cleaning up the program state. In languages such as C++ and Java, supervised blocks are bracketed statements that are made of a body, and a number of sections (exception handlers) to which control is implicitly (and sometimes explicitly) transferred when an event occurs. After executing them, the block is terminated. In C there is no such construct, and therefore, events are handled by checking their occurrence after any statement on which they can occur, or jumping at the end of the block, or cancelling the current thread when it coincides with the task.

A supervised block is meaningful when the supervised event can be generated during the execution of some actions occurring in the block, not with a signal that can occur at any time, even before or after the block, except for the case in which we want to protect the block so as to ensure that it performs a transaction (in which case when it is killed and it cannot complete the transaction, it rolls back). A means to ensure that a block is executed entirely is to block signals or to disable cancellation. A process that wants to support graceful kill must enclose all the code in a supervised block (which is done by default, if cancellability is used). As an example, consider time supervision, in which the supervised block starts exactly when a timer is started, and ends when it is stopped.

This is the scheme of a supervised block with some calls. E.g., a sequence of resources is allocated and the same released in reverse ordering, releasing that acts also as cleanup code (when a kill request is detected, control is transferred to the proper place in the cleanup code). Cleanup is protected against interruption, be it executed as the last part of normal operation, or because of killing:

Solution 1 can be used only when there are no slow system calls other than the four ones above. Solution 2 implies polling, solution 3 is not general and should be avoided, and solution 4 is expensive. When solutions 1, 2 and 3, cannot be used, solution 4 is the only applicable one.

Solutions 1 and 2 can be used only for time supervision, and other internal signals, and not for synchronous signals (e.g., stack overflow) since they cannot be handled testing them at the next slow syscall aborted, but only terminating the block (e.g., cancelling the thread). Solution 3 can be used for all signals, and solution 4 for any kind of events.

Cleaning up means releasing allocated resources, and undoing what else can be undone. There is then a need to reckon what resources have been allocated at any point in time during the execution of a supervised block. Knowing what resources a process has at any one point in time is in general a good idea: it allows it to tell how much it is using and what. This can be done e.g., keeping a list of open files, etc. Note that in /proc/xxx files there is a list of some resources, but not all. E.g., there is no list of locked semaphores, and of course there are no resources enlisted that are not Linux objects.

When supervised blocks that need a signal handler (albeit empty) are used, make sure that the handler is agreed among all threads that need to handle the very same signal, i.e., that there are no two threads needing a different handler.

When supervised blocks are implemented with signals, and the originator of the signal sends process directed signals (e.g., a timer), make sure that there is only one thread at a time in one such supervised block, or that different signals are used. E.g., if there are several threads that need to do something when a signal occurs, they all cannot catch it because only one will get it. Of course, it is possible to have a single thread that accepts that signal and re-sends it to all other threads (but getting all their tid's is not simple, so, there would be a need for a sort of registration for the threads that want to receive it). Note that this needs to use another signal because it is not possible to bounce back the same one.

In supervised blocks that are implemented with signals that are generated internally (e.g., by timers), those signals must be removed if any is pending when the supervision starts (so as not to kill the block should one such signal be pending while there should be none), and also when it terminates (so as not to cause problems to what follows, like, e.g., interrupt system calls should the signal be generated when exiting the block). For signals that are not generated internally, there is no problem because such signals can come at any time, and it makes no difference if they occur immediately before the supervision, or just after.

To allow killers to identify uniquely the instances of tasks to kill, tasks can be numbered (and task numbers recycled with a long recycle time), and kill requests accompanied by task numbers. This allows to protect against killing tasks implemented with threads, that have reused the tid of terminated threads.

Pselect(), ppoll(), etc.

Having a signal handler is essential because ppoll() unblocks signals while waiting, and if a signal has the default disposition, it can terminate the process. In the code above, pselect() can be used with the same scheme.

If the convention to keep signals blocked (almost all) in non-supervised (normal) code is followed, there is no need to block the signals and restore the mask upon entry and exit of blocks. However, doing is does no harm.

Polling

We must test the kill flag often enough to have a responsive kill. This means testing when polling, and if there are long computations, testing it in them too.

N.B. blocking signals in cleanup code does not cause any race with a signal that occurs right before blocking it: cleanup is reached because one such signal has already been caught.

Unblocking is needed because the code outside supervised blocks has normally signals blocked.

Task as thread

Here is the scheme of supervised block in which a task is implemented as a thread, and the supervision is a time supervision:

The thread that defers the task uses a semaphore as timer. This has the advantage that it can easily be created and destroyed, and it can be stopped by the task when it has ended before the time has expired.

This scheme contains a convenience function that can be used all the times there is a need to time supervise a sequence of actions: they can simply be put in a function, that is called. A similar technique can be used when the event to supervise is a different one. This example supports also exception handling.

If the supervision is not a time supervision, after having created the task thread, the creator has to wait for whatever event it must, and when it occurs, cancel the task thread. Note that the creator must use a system call that waits for two events to occur: the supervision one and the cancellation of the supervision made by the task thread.

The time supervision is started here by a thread that is not the one that executes the actions. This could be less precise than starting it from within the latter. However, that is not simple, and it is not precise anyway.

Preventing killing

To implement properly killing, either we instrument all the places in which a resource is allocated so as to treat killing, or we block killing where we do not want to treat it.

In library functions (and in object methods), killability could at most be disabled, but never enabled: the caller could want not to be killable.

General cleanup function

The use of a general cleanup function is possible for process, thread and task kill. It can be used when killing consists only in simple operations like, e.g., releasing resources. A table of allocated resources can be kept. They are system resources, but could also be process-wide resources contained in some shared segment. The table would indicate for each entry the kind of resource, and for the user ones, it would also contain a function pointer to the release function. A thread would register into the table any new resource acquired, and remove from it any released. The cleanup code would then scan the table and release all the resources in it. This can be used globally (i.e., for an entire thread or process), and also locally (for a task).

However, there are cases in which simply releasing is not enough (e.g., to close some connection there would be a need to send messages). This can be solved by making the code that executes actions that need to be undone provide a function to do it, that needs to be registered too. This allows also to preserve information hiding (which would not be the case if the general cleanup function accessed the internals of all modules that need cleanup).

With all kill paradigms it is possible to reckon resources allocated, and treat kill at the end of a block, Note that when the program detects a kill request, it must not continue, but jump to the cleanup code. Continuing could lead to unpredictable results, like, e.g., blocking (which should not happen with persistent events, but it could with simulated persistent signals: to kill a sequence of blocking system calls a sequence of signals must be sent until an acknowledge comes). Note that continuing does not occur when cancellation is used because when a cancellation request is acted, the thread jumps to cleanup handlers as soon as it encounters the first cancellation point, and does not continue any more with the code that the thread was executing before cancellation.

Discussion

Killing is an action that terminates an application, a process, a thread, a task, usually performed by another process or thread, and sometimes also by the same one. It has to be performed in such a way as to preserve system integrity, i.e., not to cause memory leaks, cluttering of system tables, generating unavailability of resources, etc. Since it is difficult, if not impossible, to guarantee this while aborting processes and threads at a random point in time during their execution, killing is usually performed with the cooperation of the victims (agreed resignation, graceful killing).

The basic properties of any killing scheme are:

that it allows the cleanup code to know what to do, either because of the place in the program from which cleanup has been called, or because some data can tell it
to support nested sections of code that need cleanup
to support non-killable sections
not to break any programming rule (such as asynch-signal-safety)
to put no or little penalty on the normal execution

These are the basic paradigms that are used to implement graceful killing of a sequence of operations:

testing often a kill flag, and when seeing it true performing the appropriate cleanup actions. In general, this can be used only in OSs that provide signals that are persistent (i.e., that when occurred raise a flag, that is cleared when they are handled), and that have system calls that terminate both either when a signal occurred upon entry, or occurs when they are executing. This can be used also in OSs that do not support persistent signals, providing that time supervised or non-blocking calls are always used instead of their blocking variants. Scheme:
```
        
        { // doit
            { // cleanup
                ... operation
                if (killed) goto cleanup;
                ... operation
                if (killed) goto cleanup;
                break doit;
            } // cleanup
            cleanup:
        } // doit
        doit:
```
Note that cleanup code can then fall through, in which case killing is restricted to the operations contained in the block, or it can return to the enclosing block, possibly unwinding up to the main one of the process. The sending of the kill request can be done in several ways (raising the kill flag, generating signals, etc.).
enclosing statements in a block in which supervision is enabled. This has to be done all the times specific cleanup actions need be performed. In blocks that contain long computations without calls to system functions, tests of kill requests must be inserted. Pseudocode:
```
        
        begin_kill
            ... operation
            ...
            ... operation
        on_kill
            ... cleanup
        end
```
The difference with the previous one is that it has no explicit statements to check if a kill has been requested (except when there are long computations). Depending on the implementation, dedicated syntactic constructs or system calls are used. The former have the advantage that cleanup actions are close to the ones to cleanup; and do not need to be put in some dedicated place or function. This kind of supervised blocks are provided by the longjmp scheme (that cannot be used in general with signals, however), by thread cancellation (that performs unwinding), and by the try blocks of C++ (that cannot be used with asynchronous signals).
when there are several operations that need to be cleaned up, and the cleanup of each of them consists in a simple undo, the program can be simplified by reckoning what operations have been done (e.g., resources acquired), enclosing all of them in a block and use a general (parametric) cleanup function. Note that with the first paradigm (the kill flag), cleanup can be done "inline" at each test, or at the end of the block.

These paradigms can be implemented using:

self killing, which requires to use only time supervised or non-blocking system calls, and to test kill requests. A signal can be used, whose signal handler raises a flag, the victim tests it and performs cleanup. This is not a general solution.
a thread that simulates persistent signals (the shooter, see below), which requests to test kill requests.
thread cancellation (which is a form of supervised block, implemented with a kill flag).

Pros/cons of these solutions:

to kill a process: graceful and forced killing is possible. It can be requested with a signal. Except for self killing, there is a need for a thread that requests the main one to kill (either to send it signals or a cancel request) because cleanup cannot be performed in a signal handler.
- self killing: see above
- shooter: blocks on pthread_join(), needs a thread and a timer
- cancellation: needs a thread
to kill a thread, graceful killing possible, forced killing impossible:
- self killing: see above
- shooter: kills less than cancellation and is complex
- cancellation: the best
to kill a task, graceful killing possible, forced killing impossible:
- self killing: see above
- shooter: kills less than cancellation, is complex and can misfire, but does not require to put the task in a dedicated thread
- cancellation: requires to create a thread for the task, reusing it, or creating it for long tasks only (i.e., no killing of quick tasks), but it is the best all the same

Cancellation kills some system calls such as pthread_join() that are not killed by signals.

Deferred cancellation allows to implement (graceful) killing because it does not interrupt the flow of instructions at any place, but only at some points in which it is safe to do it, and when cancellation is acted, it calls a cleanup handler that restores properly the state.

The only solution that can be used in general libraries is the cancellation one. Libraries that use the others can be used only in places that set up handlers, and therefore are special purpose libraries. Cancellation is the best solution, except perhaps for some limited cases in which other forms of supervised blocks can be used.

Linux implements cancellation with SIGRTMIN.

Implementation of supervised blocks

Let's tackle the implementation of supervised blocks by considering first the races that occur when they are implemented naively by registering a handler that interrupts slow system calls and sets a flag, and testing before and/or after system calls the value of the flag.

All races fall in the category: signals lost, lack of atomic unblock+wait. In books some others are reported, but are either related to the use of old system calls (like, e.g., signal()) or the use of functions that are not allowed in handlers (like, e.g., longjmp()). The race is:

    pthread_sigmask(SIG_UNBLOCK,...);    // unblock signal
    ...
    if (killFlag) exit block   (1)
    systemcall(...);           (2)

    void handler(int signo){
        killFlag = 1;
    }

The race is that a signal can occur immediately after (1) and before (2), then loosing the signal.

Some races can be cured, though:

alarm() followed by pause(): the alarm can go off before pausing. Use a longer time, or in this special case use siglongjmp() (and suspend all other signals while in the handler), or block SIGALRM and then call alarm() and sigwait(). The same applies when alarm() is used to supervise any other system call that is asynch-signal-safe. Time supervision of I/O can be better done with select() and poll(). When slow system calls have also a non-blocking version, use the poll solution.
unblocking signals and then pause(): use sigwait() (a signal can occur between blocking and pause(), and be lost). Unblocking and waiting must be atomic, and so it is in pselect(), ppoll(), epoll_wait(), etc. but not in other blocking system calls.

Races are defined in Wikipedia as results that are time dependent, while they should not be. However, in Wikipedia they seem restricted to the ones occurring on data when several processes read or write them, which is normally solved with the notion of atomicity. They are present also in other contexts, such as signals when we implement supervised blocks. In this case we want that another thread of execution (the signal handler) meddles with the supervised one, and want to avoid races that occur in doing it (as the ones reported in this document). With supervised blocks, the tools we have are atomically blocking+waiting/unblocking+resuming signals.

The purpose of handlers aborting slow system calls like pause(), pselect(), ppoll(), sigwait(), and sigsuspend() is to support supervised blocks. For the other slow system calls it is of little use (except for timeouted system calls): it cannot be used to kill an operation since it leads to a race. Apparently, since the process has done something (i.e., executed the handler), it should not continue with a slow call, but this seems a weak reason. However, it is possible to re-suspend it at the exit of the handler, which is achieved by registering the handler with SA_RESTART. Synchronous signals by definition abort the operation that raised them. Aborting seems to support the semantics of signals that serve to kill a sequence of actions.

A general fact is that a thread of execution (e.g., a thread, or a signal handler) has no built-in means to know where the other threads are in their execution, or what they are doing (like, e.g., if they are inside a system call or not). Knowing their program counter would not help either. They must use some explicit means to inform each other. Doing something and informing another are usually two separate actions, and when they are, they need to be performed atomically, much the same as updating several data, to avoid another thread to run between them and read inconsistent data. This can be achieved by using some means to mutually exclude the execution of threads. In the realm of signal handlers, the only means to do it is blocking signals, much the same as with interrupt handlers (that disable the served interrupt line). However, Interrupt handlers can use spin locks to mutually exclude when accessing data that can be accessed also by their tasklet, but this because they can run in parallel with them giving then tasklets a chance to release them. On the contrary, signal handlers do not run truly in parallel with the thread they interrupt, and therefore if they spin for the interrupted thread to do anything, they cause a deadlock. It is to be noted that blocking signals and performing some actions need not be atomic in most cases. When signals are blocked, handlers do not run, and then a thread can update data safely. However, if in doing this a thread needs to suspend itself, handlers would not run until the thread resumes and unblocks signals. If there is a need to let handlers run while a thread is suspended, then signals must be unblocked atomically while suspending and blocked again atomically while resuming. Atomicity is needed to avoid to have windows of time in which handlers can run while the thread is not suspended. Atomicity can only be achieved having the kernel perform these operation. The kernel does it when running handlers, and when executing few system calls.

That said, let's see then if a signal handler can perform cleanup. Cleanup must be done when a thread has executed a system call or any function that needs something to be undone, such as releasing a resource (e.g., a lock). A signal handler then needs to know if the thread has completed one such system call, or is before or inside it, and can do it only by testing some flag set by the thread since it has no built-in means to do it (e.g., errno in them is not set to EINTR when handlers abort a system call). The thread could block signals, clear a flag, do the system call, set the flag and unblock signals. The problem occurs when the system call suspends it. In such a case the handler has no chance to run, and if its purpose were to kill the operation, it cannot fulfill it. There would be a need to unblock signals during suspension. This is what pselect(), ppoll(), sigwait(), etc. do. But they do not reserve any resource, and therefore do not need any cleanup. No other system call do it. If they did, then handlers could test the flag above and from its value know if a system call terminated successfully, and only in such a case perform cleanup. The conclusion is then that cleanup cannot be done in signal handlers. Note, however, that if system calls behaved like that, not only handlers could perform cleanup, but also the calling threads by testing EINTR after system calls. See here for more information.

Reckoning resources is not difficult if the program does not use such information from within handlers, but, e.g., in cleanup handlers, or if the program is not aborted asynchronously, but only in some known places.

Blocking signals while executing slow system calls, besides allowing to reckon resources, has also another purpose, which is unrelated to maintaining consistency in updating data, and that is to defer the running of handlers until a slow system call is executed (one that unblocks signals while waiting). This avoids to loose signals. We want to block signals beforehand because we want it be unblocked when waiting, or in other words, we want to be interrupted only when waiting, which is like registering handles when waiting, which is what drivers do when suspending for interrupts. Like drivers, when suspending, signals need be unblocked (which enables signals to be caught by handlers), when handlers are run, signals are blocked again, and when handlers terminate signals must remain blocked, and the thread resumed. In supervised blocks we want not to loose events (signals occurring) that occur before suspension points (in addition to the ones that occur during suspension). The kernel handles signals much the same as interrupts: when interrupts are not enabled, and a datum becomes available at the interface, the datum is kept, and an interrupt immediately asserted when the interrupts are enabled. Likewise, when signals are blocked, and a signal arrives, it is kept, to run a handler as soon as signals are unblocked. To use this paradigm, there is a need to keep signals blocked, to be unblocked when slow system calls suspend processes. Note that the kernel handles all slow system calls in a similar way when deferred cancellation requests are acted: they do not produce an immediate effect, but interrupt slow system calls when they suspend the process.

Having renounced to perform cleanup in handlers, there is no strict need to return from slow system calls with signals blocked. Re-blocking in pselect() is there in order to avoid to have a window in which signals are lost after such a call and before another. But this can be overcome: suppose pselect() returned with signals unblocked, and the program blocked them again before a subsequent call. A signal can come in between, its handler run, which can set a flag. After blocking signals, the program can test the flag and if set, kill. There would be no signals lost. After all, a signal with a handler that sets a flag resembles a persistent signal. Re-blocking signals would, however, spare all this (and make programming less risky). But there is another reason to return from an interrupted call with signals blocked, and it is that we want then to perform cleanup, and in doing it we do not want cleanup code to be killed by a signal (and system calls in it aborted). It is true that in cleanup code slow system calls should be seldom used, but it is also true that we want to perform all the cleanup code. Therefore, if we execute it with signal unblocked, at least we need to set up handlers that do almost nothing, and restart system calls (but there are some that are never restarted). If signals are not blocked, a solution in cleanup code could be to test EINTR and explicitly restart system calls. Another is to block signals explicitly at the beginning of the cleanup code providing that empty or harmless handlers are in force (otherwise the default disposition would kill the process, and no cleanup done). A signal occurring just before blocking signals in cleanup code would request again killing, which we are already doing.

A rule is that a program that registers a handler that can be invoked asynchronously by the system (e.g., an interrupt handler or a signal handler), must start the execution of such a handler with interrupts disabled or signals blocked (or whatever makes the handler run disabled) so as not to begin a new execution of the handler while the previous one has not yet completed. The reason is that inside the handler, data are accessed, and two parallel executions can mess this up. Of course, if the handler is re-entrant, there is no danger in having two parallel executions, but this is seldom the case. This is exactly what happens in signals handlers: signals (at least the one kind that invokes them) are blocked in them (unless this default overridden). A similar condition avoids races also in programs that call slow system calls: if calls were resumed with signals blocked, we can safely handle their result (successful or aborted) without bothering about being interrupted again by the same signal handler, which could mess up things. However, this is not strictly necessary: it depends on what the handler does. E.g., handlers that only raise a flag cause no races. A handler that executes a longjmp() normally sets a flag to bypass it the next time.

Here it is shown that longjmp() in signal handlers is unsafe. This technique can be applied only in some few cases, but it is generally to be avoided. The only one case in which longjmp really avails is when the block contains slow asynch-signal-safe calls (accept(), connect(), open(), pause(), poll(), pselect(), readlink(), recv(), recvfrom(), recvmsg(), select(), send(), sendmsg(), sendto(), setsockopt(), sigsuspend(), wait() and write()), but it is dangerous with respect to program changes. Moreover, terminating abruptly a sequence of instructions can leave data in an inconsistent state. When a supervised block has only pure computation in it, signals can be blocked and upon detecting any pending ones, the block terminated.

In the first solution of supervised blocks (pselect(), etc.), a (possibly empty) signal handler is needed because the default disposition is either to ignore or to terminate the process, unless a thread has been created and dedicated to handle kill signals. System calls return the EINTR error to let a program know that they aborted. If there is a need to restrict killing to some specific signals, all others can be blocked beforehand, and the wanted ones unblocked while waiting.

The solution to avoid to loose a signal in a supervised block is that of pselect(): signals are blocked, then in the block pselect() is called, which atomically suspends the process and unblocks signals (to block them again when resumed). It is then possible to test EINTR. Also ppoll() and others share this behaviour. This is a way to make signals persistent. Unfortunately, this feature is present only in few system calls. E.g., there is no equivalent for a sem_wait(). Note that blocking signals and testing EINTR is exactly making the victim suicide.

There is no way to handle killing safely when done asynchronously (i.e. without the help of the victim).

Some operations are atomic for signals, like, e.g., the normal system calls because they are executed by the kernel in a context in which signals have no effect on them. Note, however, that many system calls are wrapped by library functions, and the ones that are not need anyway the execution of several instructions before switching to the kernel context, and some after returning from it. If a signal handler interrupts this and does not resume the process where it was interrupted it leaves it in an inconsistent state. See the discussion of calling longjmp() in signal handlers.

When setting a supervision on a block statement there is a need to cater for an outer supervision. I.e. supervised blocks can be nested, and use the same signals for supervision. There is a need to save/restore things so as to make it work. With alarms it is even more difficult because we must compute the time remaining.

The kill flag

Programs that use time supervised or non-blocking system calls (instead of indefinite waits), can implement kill using a signal handler that sets a kill flag. They have no problem of "loosing" the kill:

    if kill flag ...           (1)
    for (;;)
        timed wait             (2)
        if not timeout break;
        if kill flag ...       (3)
    }

If the signal arrives after (1) and before (2), the timed wait is executed, but then the kill flag is tested again after it, and the effect of the signal is achieved.

This solution works because:

all the blocking system calls have a time controlled version, or a non-blocking versions. However, a usleep(1000000) makes a process sleep for one second, which means that to achieve shorter response time to kill, a loop of smaller ones must be used.
polling does not cause other races:
```
    
    if (killFlag){
        (0)
        ... block signals
        ... cleanup, unwind
        ... unblock signals
    }
    (1)
    for (;;){
        (2) ... acquire resource with timeout
        if not timeout break
        (3)
        if (killFlag){
            (4)
            if (resource acquired){
                (5) ... release resource
            }
            (6) ... unwind
        }
        (7)
    }
```
If the signal arrives at:

0: cleanup can protect itself blocking signals so as not to be killed. A signal here is just a harmless duplicate

1: the process will execute a needless acquisition and after it serve the kill request

2: the acquisition will be aborted, and thus the resource not released

3: correctly enter the if below

4: this is a duplicated signal, with no effect

5: it could interrupt cleanup: to avoid this, signals should be blocked

6: same as (5)

7: no problem

Note that retrying must be done at each resource acquisition.

The solution then exists, but it implies polling. All the more, it requires that polling is used in all inner pieces of software, including libraries. Note that if polling is not used, the risk of this solution is to block the process loosing the signal sent to kill it. The solution of the supervised block (if it had no races) instead would work even when the inner software is not instrumented.

With the kill flag technique there is no need to provide some means to support nesting of supervised blocks: there is a need to enclose in a supervised block only blocking system calls. For code that executes without waiting there is always a chance to detect the kill request. Even when the kill flag is raised immediately after having tested it (and found it false), it does its job: it will be detected at the next test point. Logically, nested supervised blocks do exist. They can be simulated with explicit tests.

Note that only library functions that test kill requests and honor them can be called here.

This solution needs a per-thread kill flag, and needs that all functions, including the library ones use non-blocking system calls instead of the blocking ones. Since this is seldom the case, this solution can only be used in some particular context, in which library functions that contain blocking calls are not used.

Supervised blocks polling solution

The polling solution can be done with signals blocked, or with signals unblocked:

signals blocked: for time supervised system calls the kill condition is errno = ETIMEDOUT (or any other timeout returns), or interruption, and a kill signal(s) pending. For non-blocking system calls the condition is errno = EAGAIN, or interruption, and a kill signal(s) pending. Interruption occurs with signals that cannot be blocked, like, e.g., stop signals. In these cases polling ends and the system call does not succeed, otherwise it returns with success. When a system call has a non-blocking variant and a time supervised one, the non-blocking one should be preferred because it allows to detect immediately a pending kill request, while the other does it after the first timeout has expired. Cleanup code is better not be aborted by kill requests, and here it is so since signals are blocked.
signals unblocked: for time supervised system calls the kill condition is errno = ETIMEDOUT (or any other timeout returns), or interruption, and a kill flag set. For non-blocking system calls the condition is errno = EAGAIN, or interruption, and a kill flag set. It is also possible to test that the sleep in the polling loop is aborted by a kill signal. In these cases polling ends and the system call does not succeed, otherwise it returns with success. This solution is more responsive because it can abort the sleep in the poll loop. It is also less expensive to test a kill flag or an aborted system call than a pending signal. Cleanup code is better not to be aborted by kill requests, and to have it here, it must be protected by blocking signals in it.

These solutions differ mostly in the interruption of system calls that is done in the latter and not in the former. Note that errno reflects an event that occurred during the execution of a system call, while a kill flag or signals pending reflect something that can have occurred at any time, before, during or after a system call. Therefore, it may not be related to what occurred in the system call executed immediately before. Polling allows to detect kill signals that occur before the first poll or between a poll and the next. It is not much important that a signal occurred before, during or after a system call: it means killing and we act it when detected. Note that testing a flag or a pending signal and performing an action is not atomic: a signal can occur in between. This, however, does not cause a race: the event will only be noted at the next iteration of polling.

When a system call returns successfully, there is no need to test immediately the occurrence of a kill signal, when it returns unsuccessfully, the kill signal is tested because we need to decide whether to stop polling. In such a case cleanup is void (unless something need be undone to cope with a failed system call) . Sometimes there is a need to kill also sequences of actions that do not contain slow system calls: to do it, a kill flag or pending signals must be tested. Here, again, we detect an event that occurred before.

In the case of polling with signals blocked, the only signals that can interrupt a call are the stop ones, the others are blocked, or kill the process. Therefore, in this case of polling, the test of interruption need be done only on the 15 system calls interrupted by stop signals.

The kill request can be tested (and acted upon) before testing timeouts, hangs, errors and interruptions. It would kill when the signal arrived immediately after the system call. However, it is cleaner to test it and exit the block when the system call terminates prematurely. What is important is that in this case we do not restart the call before testing the kill request, otherwise we can have system calls that have been interrupted by a kill signal and are restarted.

In a process, all signals must be handled, otherwise there is no point in treating only one in a supervised block bothering to cleanup while all the others would kill the process without cleaning anything. The same applies to a time supervised block. However, we likely want killing to be performed by only few signals. With the signals unblocked solution we need to unblock only these few signals, and provide a handler that raises a flag. With the blocked signals solution we need to test only these few signals pending, and not all. Let's say that blocking or ignoring signals that must have no effect on a program is something that we do at the beginning of a process. It is better to block or ignore all signals, and then to let the program pieces that need to handle some signal enable it.

The kill request (pending signals or flags) must be cleared at the beginning of supervised blocks so as to discard a kill signal, that arrived at the wrong moment. Note that in the case of unblocked signals, if a signal arrived before the block, unblocking it makes its handler run, and then set the flag, and eventually kill the block. So, the block is responsive to those signals. In the blocked signals solution, there is a need to clear pending signals after the block so as not to cause problems to another block executed later. In the unblocked solution there is no such a need since handlers are run only in the block.

The signal blocked solution cannot handle stray signals. It could then be used only when signals can come from anywhere. Since it has this drawback and it is also less responsive than the one with signals blocked, is not suggested. However, here it is.

Polling with signals blocked

This is the scheme of polling with signals blocked:

    // test if kill has been requested
    int killRequested(){
        sigset_t sigpend;
        sigemptyset(&sigpend);                // sigpending does not clear it
        sigpending(&sigpend);
        if (sigismember(&sigpend,SIGxxx)){
            int sig;
            sigwait(&sigpend,&sig);           // clear kill request
            errno = EINTR;                    // report abortion
            return 1;
        }
        return 0;
    }

    // example of wrapping a system call using a time supervised one (with absolute time)
    int psem_twait(sem_t* sem){
        if (killRequested()) return -1;       // not to loose the time for one poll
        for (;;){
            struct timespec ts;
            struct timespec tick = {1,0};     // poll period 1 s
            timeend(&tick,&ts);
            if (sem_timedwait(sem,&ts) == 0) return 0;
            if (errno == ETIMEDOUT){
                if (killRequested()) return -1;
            } else {                          // some error
                return -1;
            }
        }
    }

    // example of wrapping a system call using a non-blocking one
    int psem_wait(sem_t* sem){
        if (killRequested()) return -1;       // not to loose the time for one poll
        for (;;){
            if (sem_trywait(sem) == 0) return 0;
            if (errno == EAGAIN){
                if (killRequested()) return -1;
            } else {                          // some error
                return -1;
            }
            struct timespec ts = {1,0};
            nanosleep(&ts,NULL);
        }
    }

    // example of wrapping a system call that is interrupted by stop signals
    int pepoll_wait(int epfd, struct epoll_event* events,
        int maxevents, int timeout){
        ... see the one for polling with signals unblocked
    }

    // supervised block
    ...
    {                                         // supervised block
        clearKill();                          // kill pending signal, if generated internally
        ...
        int res = psem_wait(&sem);            // slow system call
        if (res == 0){
            ... success
        } else if (errno == EINTR){
            ... killed
            ... cleanup. N.B. signals blocked
            goto endblock;
        } else {
            ... error
        }
        ... other actions
        clearKill();                          // kill pending signal, if generated internally
    } endblock:;

System calls that have a timeout (which is not the one used to implement polling) need be supervised only when the timeout is sufficiently long, in which case polling must honor that timeout, but system calls that are interrupted by stop signals need be always supervised. To make polling honor the timeout, a test must be done at each polling iteration to check if the deadline has passed.

Supervised blocks kill longjmp solution

Signal handling is often shown in textbooks and man pages using supervised blocks implemented with longjmp(). This technique has many problems, as will be shown here. However, since it has long been in use, it deserves studying it.

The canonical way to write a supervised block is:

    if (setjmp(cxt) != 0){
        handle exception
    } else {
        try block
    }

When the if statement is executed the first time, the context is set and the try block executed. If during the execution of the try block a longjmp() is done, execution resumes at the setjmp(), but making it return a value different from zero, thus entering the exception handling branch.

Here is the implementation of graceful killing done using a supervised block:

    handler
       no need to block permanently signals: siglongjmp does it
       siglongjmp(context,1);

    // leave the signal unblocked at the beginning: it must kill
    save signal mask
    block SIGxxx
    register handler for SIGxxx
    if (sigsetjmp(context,1) != 0){
        cleanup
    } else {
        unblock SIGxxx
        supervised block
    }
    deregister handler
    restore signal mask

The signal, that is temporarily blocked in the handler, remains permanently blocked when the handler returns with siglongjmp(), so that the code that handles the signal (the "bottom" half cleaner) can perform the cleanup actions without the risk to get interrupted again by the same signal. Since the signal is blocked at the beginning, it cannot occur while sigsetjmp() writes its jump buffer, or before if: siglongjmp() would not be done, and nobody would complain if its jump buffer is inconsistent. If the signal occurs when the supervised block has completed, but before the handler is de-registered, then the cleanup would be done, but the cleanup code must be prepared for it since the signal can occur at any time in the supervised block. We have shown before that reckoning acquired resources is not possible when a handler interrupts asynchronously a course of actions, and here the cleanup code is reached with a jump from a handler and thus it is executed as if it were in the handler. But let's forget for a moment this issue.
If the signal occurs during the cleanup, but before the handler is de-registered, nothing happens since it is blocked. It the signal occurs after de-registering the handler, then it does nothing if cleanup has been executed, otherwise it executes the previous handler. It would not restore the old mask, but this would leave a mask different only if the signal was blocked in the old mask. Let's try to avoid to execute cleanup twice. Let's see what happens if we swap the de-registration and the restoring:

    restore signal mask
    deregister

signal occurs after cleanup, before restore: it is blocked, and if restore unblocks it, the cleanup is executed again
signal occurs after supervised block, before restore: cleanup executed
signal occurs after restore: cleanup again

This is no improvement. Let's then do the following:

    if (setjmp(context) != 0){
       cleanup
       (1)
       deregister handler
       (2)
       restore signal mask
    } else {
       unblock SIGxxx
       (3)
       supervised block
       (4)
       restore signal mask
       (5)
       deregister handler
    }

Let's see what happens when the signal arrives at the indicated points:

1: nothing, signal blocked
2: nothing, signal blocked
3: cleanup
4: cleanup
5: if signal unblocked, cleanup, otherwise nothing

The problem is to distinguish between 3 and 4. If the signal occurs at 3, the change to the program state has not occurred, and the cleanup code must do nothing; if occurs while executing the supervised block, some cleanup need be done; if it occurs at 4, then the cleanup code must restore the previous state. Note that the signal can occur just between the last instruction of the supervised block (which could be the return from a system call, and the next one, which can be one of the function that wraps the system call) and the next one. Unfortunately, there is no way to distinguish these cases. This race is one of the biggest failures (besides others) in the longjmp() implementation of supervised blocks.

Note that it is possible to nest supervision since it saves and restores the handler in effect. This solution has the defect to register and deregister handlers, which is not fast. Moreover, setting up a supervised block and closing it requires many statements. and it is not easy to put them in a couple of functions or macros.

As a side comment, there is an implementation of sleep() that uses alarm() and that does not use longjmp(). It blocks SIGALRM, then it issues an alarm() (and after it, the handler cannot interrupt since the signal is blocked), and then suspends atomically unblocking SIGALARM (see Advanced Programming in the Unix environment, page 318). This can be done here because there is a function that atomically waits and sets the signal mask.

Since setting a handler is costly, a solution is to register it only once at the beginning and play with the jump buffer using a siglongjmp(*ptr) to jump to the cleanup code of the current supervised block. ptr can be set to NULL at beginning, and a test made in the handler to use it when not null. Here is the solution with the handler registered only once:

    static volatile __thread sigjmp_buf* jmp;
    static volatile sigjmp_buf nokill;

    handler
       sigjmp_buf* tmp;           // save current
       tmp = (sigjmp_buf*)jmp;
       jmp = NULL;                // make it NULL: prevent further jumping after handler jump
       if (tmp == NULL){
           pthread_exit(NULL);
       } else if (tmp == (sigjmp_buf*)&nokill){
           return;
       }
       siglongjmp(*tmp,1);
    }

    sigaction with SA_RESTART handler, all signals blocked
    ...
    sigjmp_buf buf;
    sigjmp_buf* prev = (sigjmp_buf*)jmp;     // save current
    if (sigsetjmp(buf,1) != 0){
        ... cleanup
    } else {
        jmp = (volatile sigjmp_buf*)&buf;    // (1)
        ... block of statements
        sigjmp_buf* save = (sigjmp_buf*)jmp; // save: start non-killable block
        jmp = &nokill;
        ... actions not killed
        jmp = save;                          // restore: end non killable block
    }
    jmp = prev;                              // restore previous supervision

At the beginning the jump buffer is null, which means that the signal is caught and it kills the process or thread. The pointer to the jump buffer is a per-thread variable so as to make the handler jump to the appropriate place for each thread.

Note that setjmp() does not accept a volatile jump buffer argument. However, volatile prevents optimizations and reordering, that can hardly be done when passing arguments to calls. The solution above, however, uses a (volatile) pointer to a jump buffer, and therefore the only caveat that it needs is to make sure that no reordering occurs between the setting of the pointer in (1) and the statements in the block below.

Another solution is to use a flag to inform the handler when it can jump (and otherwise do nothing). The flag is cleared before registering the handler, and before returning from it, and it is set at the beginning of the supervised actions. Since the handler returns with the flag cleared, it does not jump again when run again.

To support nesting, there are two alternatives:

each cleanup restores the previous supervision and then calls the enclosing cleanup: it makes killing propagate until it reaches the top-level and kills the thread.
it could return or break with an error code, and let the enclosing block handle it.

Semantically, the former seems more correct: if an outer block wants to catch kill, it should catch it when a kill signal is sent even when there is an inner one that does the same.

Macros can be defined to make the supervised block opening, closing, etc. more readable.

Jump from signal handlers unsafe

The problem with longjmp() in signal handlers is that the handler could have interrupted a call that is not asynch-signal-safe and that has left some global data in an inconsistent state. Think, e.g., to a printf() that upon exit clears some global data, and that upon entry uses such data, assuming that they are properly initialized, or left so by the previous execution. Interrupting it would mean that it cannot be called anymore in the program. longjmp() in a signal handler is not safe. If a handler interrupts a malloc(), that uses static variables to reckon the status of memory, then a longjmp() would restore registers, stack and program counter, but will leave the malloc() in an inconsistent state. See the text of CERT SIG32-C: "Invoking longjmp() from a signal handler causes the code after the setjmp() to be called under the same conditions as the signal handler itself. That is, the code must be called while global data may be in an inconsistent state and must be able to interrupt itself (in case it is itself interrupted by a second signal). So the risks in calling longjmp() from a signal handler are the same as the risks in a signal handler calling asynch-signal-unsafe functions." I.e. the code after a longjmp() could fail calling a function that is not asynch-signal-safe and that finds its data corrupted because the handler has interrupted a previous execution of the same function.

Note that a handler that does not terminate because it executes a longjmp(), and that instead terminates normally does not have this problem because when it returns, the interrupted function or system call is resumed, and then it can proceed and make the data consistent. The program, when resuming after a longjmp in a signal handler, is in a very critical state: it has no way to make an interrupted asynch-signal-unsafe function recover (so as to clean its data), allowing it to be used again. All the more, it does not even know if the handler has interrupted such a function. This means that supervised blocks cannot be implemented. This means also that siglongjmp() can be used only when supervised blocks do not contain asynch-signal-unsafe calls.

Note that since a signal can interrupt a thread at any time, if a handler would be allowed to make a siglongjmp(), then any update of data would need to be made atomic, or surrounded in a supervised block. This would make the code very involved. It is a lot better to let the thread code go on, update its local data without making this atomic (i.e., protecting it by blocking signals). Moreover, such a handler could interrupt a thread also between a resource acquisition and its reckoning, thus making difficult, if not impossible, to tell if it has to be released in the cleanup code, which is not in the handler (but in the thread).

THEREFORE, longjmp() and siglongjmp() in general must not be used in signal handlers. Supervised blocks cannot be made using them.

Reckoning resources unsafe with signals handlers performing cleanup

This section is concerned with programs that want to perform cleanup in signal handlers. Suppose a process acquires a lock, and receives a signal after the lock acquisition and before the process reckons this acquisition somewhere in its process data. If a signal handler has been registered, it is called (at that point), but it is not able to release the lock since it is not sure that it has been acquired because there is no atomic operation that acquires a lock and remembers it. What is atomic is the getting of a lock and the advancing of the program counter, but the signal handler cannot test the program counter. In some cases we can record in a list an object with a null pointer to a resource (e.g., an open file), that is filled by the operation that allocates the resource. But there can be cases in which the allocation of a resource is not done by a system call, and therefore is not atomic. The allocator of such resources should have a supervised block in it so as to return either with a resource (a single pointer) or with none. A single pointer does not need registration: when it is null it means no resource, when non-null a resource. However, returning a resource handle in a variable is often done by wrapper functions that perform it with instructions instead of being done by system calls before they return to the process space (when they are still atomic). A means to make resource acquiring and reckoning atomic is to disable signals when doing it. As with monitors, when there is a need to wait for a resource, the "lock" must be released before waiting, which here means to unblock signals before waiting. E.g.:

    block signals
    acquire lock
    register lock acquired
    unblock signals

However, this does not work: it would keep signals blocked while waiting. We could swap acquiring and blocking:

    acquire lock
    block signals
    register lock acquired
    unblock signals

but this has a race between acquiring the lock and blocking signals. There would be a need for a primitive that acquires a lock, while waiting unblocks signals (more or less like sigwait()), and blocks again signals before returning. Four such primitives exist: pselect(), ppoll(), epoll_wait() and read() from a signalfd() but all the others do not do it. We could place it in a supervised block:

    BEGIN_KILL
        acquire lock
    ON_KILL
        release lock
    END_KILL

But in so doing, there is no reckoning. All the more, a signal can occur immediately after the acquiring as well as during it, which means that in the cleanup code we do not know if we should release the lock or not. Likewise, the signal can occur immediately before the acquiring, making the cleanup code also unable to tell if it has to release the lock. Unfortunately, there are no means for a handler to know if it had interrupted a system call or not, otherwise it could let the cleanup code know it, and release the lock in such a case. When a system call is interrupted, errno in the handler is 0, and also in the siginfo struct passed to it. But since the block could be interrupted also before the system call, knowing in the cleanup code that the signal aborted a system call would not avail (it still does not know if it had interrupted the block before acquiring or after it).

Note that when a process receives a signal, it can be interrupted inside a system call, in which case it aborts it, but it can be interrupted also immediately after a system call resumes the process. Suppose a process is blocked on a semaphore. The process is placed in the ready queue. Later, another unblocks the semaphore. But this does not mean that the process becomes running immediately (there can be ready processes with higher priority). When in such a state, a signal can be sent to it. There is no guarantee that the process, when resumed executes at least one instruction placed immediately after the system call: it is instead likely that it will execute the signal handler. Moreover, the function that acquires a resource could be wrapped by a library function, that executes a few instruction after the return of the system call in it. This means that the process can be interrupted after the system call and before it has a chance to record somewhere that it got a resource.

This is only possible if the resource acquisition returns something observable atomically, like, e.g., a pointer. This is seldom the case, e.g., a sem_wait() returns nothing.

This means that the only one safe method to perform killing when a process acquires resources is to inform the victim and let it do it.

The shooter

Basically, to kill a process, a signal could be sent to it. Its handler sets a flag, and the process tests it often, cleanups and exits. The only problem is when the process is inside a blocking system call or in an endless loop. In the former case, the signal interrupts the call, and the process can test the flag. In the second one, the handler, besides setting the flag, could also start a timer, and the handler of the timer terminate the process. However, this has a race: if the signal is caught between testing the flag and entering a blocking system call, the effect of the signals is lost, and the process blocks in the call. If this race were not present (i.e., if some form of persistent signals were supported), the user code would be something like this:

    { // doit
        { // cleanup
            operation
            if (killFlag) break cleanup;
            system call
            if (errno == EINTR) break cleanup;
            break doit;
        }
        cleanup:
        perform cleanup here
    }
    doit:

Note that a signal handler here would just do nothing besides setting the flag, and return (and as a side effect, make blocking system calls return).

In order to eliminate this race, these solutions exist:

use a supervised block
have a thread (shooter) that sends a sequence of signals to the victim

Supervised blocks can provide graceful killing, except the ones implemented with siglongjmp() because it makes impossible to tell if a resource has been acquired: when the signal comes right after a successful acquisition, the handler cannot tell if acquisition has aborted or not. When a process is blocked in a slow system call, and a signal occurs, and is caught by a handler, there is no way for the handler to detect that the process was in a system call (errno is not set to EINTR). Only when a handler returns normally, errno is set to EINTR. Besides that, it would be quite a costly solution if it used siglongjmp() because the code that would be executed normally (i.e., all the times, even when there is no kill) would contain a sigsetjmp(), that costs as much as 110 memory-to-memory copies.

Note that cleanup cannot be performed directly in a signal handler because inside it there would be a need to call a generic cleanup function that closes files, releases locks, etc., but such a function needs to know what resources have been allocated, which is not possible to know reliably in a program that is interrupted by asynchronous signals. They interrupt the acquisition of resources with their reckoning in some data structure. See here why. Moreover, many system calls that are needed to perform cleanup cannot be called in signal handlers.

An alternative solution to cancellation (the shooter) is to send a sequence of signals to the victim so as to solve the problem of lost signals, i.e., to unblock it, should it be blocked in a system call.

Unfortunately, it is not possible to create the killer thread when the signal comes: it must be already there.

The shooter can be used in the following contexts:

to kill a thread: the shooter is worse than cancellation (there are more blocking system calls that cancel than the ones that interrupt). Moreover, the thread could get de-scheduled before blocking, in which case signals would misfire. Shooting, thus, has a race in the case of thread kill.
to kill a number of threads: a shooter can be implemented that kills several threads in parallel. It has the same weaknesses as the shooter to kill a thread, and it is more complex. Cancellation is better.
to kill a process letting it do cleanup and termination. The shooter, which is called by a dedicated thread, shoots the main one. The main thread shoots the other threads, if any alive at that time. This requires to have a shooter that can shoot a number of threads in parallel (if we do not want the main thread to sequentialize the killing of all its threads). The dedicated thread starts a time supervision to forcibly kill the process if it does not terminate within a given time. However, the dedicated thread can kill the other using cancellation, which is better (see above).
to kill a process letting it do cleanup, but performing termination automatically. This is more difficult than letting the process terminate itself (i.e., executing an exit()) because the shooter would have to rendezvous with it at the end of cleanup.
to kill a process and all its threads: the shooter has the same weaknesses as in the previous cases. Moreover, there is no way to get the ID of all threads of a process. A thread that cancels all the others, terminating the process if a timeout expires is then a better solution.
to kill a task: here acknowledging from the victim is needed, otherwise the shooter could go on and kill some other action that the process performs after having killed a task. There are cases in which the shooter is unable to kill a task (endless loops, misfire). There is no way to forcibly kill a task. Note that a task that is implemented with time supervised or non-blocking systems calls can also be killed with a kill flag. A shooter that kills just only one specific task, done by only one thread is simpler than one that kills several tasks in parallel (it does not need to keep track of tids ). A solution that can be used in multi-threaded programs needs a per-thread kill flag. Note that since the threads that are shot manage directly the suicide, they can decide to terminate entirely, or also to kill the current task. Supposing that the problems of the shooter would be solved:
- acknowledge
- use of timer or periodic timer
We would have a solution that has more non-killable points than cancellation, and can misfire. Overall, it kills less than cancellation. The choice is then between these weaknesses and the nuisance to create a thread for long tasks and cancel it, or to reuse a thread,

There are a number of blocking system calls, like, e.g., pthread_mutex_lock() that are not interrupted by signals. There are more of them than the ones that are not interrupted by a cancellation request. This means that with signals it is possible to kill less than with cancellation.

However, for sake of completeness, the implementation of the shooter solution is described in the following sections.

Synchronization between the shooter and the victim(s)

When killing a thread, the shooter must synchronize with the victim so as to stop sending signals, otherwise it could kill another thread with the same thread ID if the victim quit in the meantime.

When killing a single-threaded process: The killer thread has no means to detect that the victim reached the cleanup, or that it was blocked and then it is no longer so. If it could detect it, then it can stop sending signals. To acknowledge, an unnamed semaphore can be used (it is implicitly destroyed when the process terminates, even abnormally). However, acknowledging is not strictly needed: the cleanup code either does not contain blocking system calls, or it does, but in such a case it is executed with signals blocked, and therefore the sequence of signals has no effect. Moreover, if the termination of the main thread terminates the killer too, there is no delay in termination. Likewise, if the sequence of signals is fast, there is no delay in exiting. In conclusion, there is no need to send an acknowledge in this case. Take into account that when the killer has sent a kill signal, and the other does not acknowledge, the killer does not know where the other is. Then it sends another kill, but it is like shooting a thread that becomes vulnerable only when it is blocked. If you shoot when it is not vulnerable, nothing happens, you only misfire. When it is vulnerable, it proceeds and immediately it ceases to be vulnerable.

When killing a task, and the task has detected a kill request (either because it has tested the kill flag, or because it has detected an interrupted system call), it starts to perform cleanup, and in it it can use system calls, or it has ended cleanup and has started some other task, and that also may contain blocking system calls. Such system calls risk to be killed by stray shooter signals (if the killer gun is not closed). There is thus a need to synchronize the shooter with the task.

Acknowledging can be done with a semaphore. It can be done with a flag that is tested by the shooter before posting to the semaphore. The shooter could be de-scheduled between testing and sending, and in the meantime the victim raise the flag, but when setting the flag, the victim can also block signals, thus preventing any interruptions. But then to prevent the signals to show up when unblocked, there is a need to flush the pending signals before unblocking them (e.g., setting the disposition to ignore, and then restoring the previous one and unblocking).

Scheme of victims

A victim has to test the kill flag, as if signals were persistent. It is up to the victim to cleanup: another thread cannot do it reliably because it would execute in parallel with the victim, and the latter change the set of resources immediately after the killer has got it, and without the killer being informed:

the victim tests often a kill flag, and when it sees it true it acknowledges (except when killing a process).
the killer raises the kill flag when a kill signal has been caught, then waits for the victim to acknowledge it (except when killing a process), with a timeout. If the timeout expires, then it is likely that the thread is inside a long system call. The killer then send a kill signal to the victim, and waits for a reply. If it does not come, then the thread is likely to be in an endless loop, and then (in the case of killing a process) the shooter kills the process unconditionally.
this has the advantage to keep simple the code of the thread to be killed.
however, the timeout can expire also when the victim is de-scheduled, which would cause a forced kill (only in when killing a process).
this requires also a thread to control another, or possibly a thread to control many.

Scheme of the shooter

For the shooter, killing a task or a thread is basically the same: in both cases it has to send signals. What is different is killing a process because in such a case it can forcibly kill it.

Requests to kill a thread are sent to the shooter. It could be done sending it a signal (a realtime one, that is queued). The shooter must be quick to get these requests, and must handle all of them in parallel, i.e., keep track of what threads are under killing, and how many signals have been sent to them. It could be possible to keep track of the tid's only, and send signals to all the threads under killing (and that did not have disappeared), up to a retry limit until there is at least one to kick. The idea is that the shooter is waiting for a signal, that can be the one sent from another process to kill the process, a request of a thread to kill another, or a timer expiring to retry a kick. It is acceptable to synchronize all the kicks so that when a new request to kill a thread has come, it is put in the list, the thread kicked, and if the timer is not running, it is started. When the timer sends a signal, the list is scanned, and each thread in it kicked. The ones that are terminated, are removed from the list. The ones that have completed their kicks are removed too. If the list becomes empty, the timer is stopped. Pay attention to the timer because the risk is to have some unexpected shoots. Since the shooter is using sigwait(), it should not be disturbed by signals when it is executing its code. The problem is to distinguish all the events in it. It is not possible to distinguish because there is no additional info because sigwait() does not execute the handler. This can be overcome using sigsuspend(), that executes the handler. We could then use SIGTERM both for process kill, and for a thread kill request, in which we pass the tid, providing that in the case of the process, the associated info is guaranteed to be zero. A timer could also send SIGTERM, and it can sends also a value, e.g., -1 to distinguish it from the other cases.

Another means to implement a shooter is to use a realtime timer, and make it periodic. It sends a sequence of signals. There is a need to stop the timer when no more shoots need be done. Of course, this is convenient when there is only one thread to kill since it needs to create a timer, and use a dedicated signal. The signal could be the same for all threads to kill: its purpose is mainly to abort blocking system calls, but if there are several threads to kill, all of them must receive the signal, which is not simple. In such a case there is a need for a more complex shooter, that serves the timer, and kills all the threads that need be killed.

The shooter must set a thread-private kill flag. It should be accessed both by the killer and the victim, which means that victims must pass a pointer to their flag to the shooter.

The shooter for a multi-threaded process

A special shooter can be used to kill a process. It is a thread that waits for a kill signal and kills the others repeatedly sending them a number of kill signals, sleeping for some time and eventually killing unconditionally the entire process. The threads can perform graceful termination in that timespan. If they succeed, then the process terminates and with it all its threads. If they do not, they are forcibly terminated by the process termination. It would kill unconditionally the process only when it does not react to a sequence of kill signals occurring in the interval between a blocking system call and the last test of the kill flag done before it. This interval can be made very short by testing the kill flag before any such call. The likeness that the process does not react (i.e., that it receives all the kill signals in that interval) is very small, and in such a case the process will be forcibly killed. In such an interval the process is not acquiring resources, and therefore it should be possible for it to reckon exactly the allocated resources, and thus allow the forced kill to perform cleanup. The victim, when testing the kill flag and finding it true should first acknowledge it to the killer, and then perform cleanup and eventually tell the killer it has finished (but the killer could wait with timeout for it to finish). Acknowledging allows the killer to avoid sending unnecessary kill signals, and to terminate the process as soon as the threads terminate if that occurs before the timeout.

The shooter could kill some threads that in their cleanup handlers kill others, and that join them. This is not a problem because the shooter cancels only the ones that are alive. But the shooter could have killed one that is also killed by another. With such a shooter, threads should not kill others, but they could kill others as part of their normal operation. To avoid to wait more than needed, the shooter could monitor the number of threads alive, and when it comes to 1, exit immediately.

Note, however, that this procedure for killing threads can only be applied when threads can be killed in any order. A more general procedure is to perform cancellation on the main thread (which would cancel the others created by it) and time supervise the operation.

Linux does not provide any means to send a kill signal to all the threads in a group with a single system call. The /proc virtual filesystem allows to retrieve the tid's of all threads in a process looking to the /proc/<pid>/task directory. However, Linux does not provide any means to get task IDs (pthread_t) from tid's, which are needed to invoke pthread_kill() and pthread_cancel(). This means that there is a need for the program to reckon task IDs (which cannot be done for the ones that are not explicitly created by the program, but by some library functions that are called).

Note that while reading /proc, new threads could be created or terminated. To cater for this, either the shooter can loop until it finds no changes, which could be forever, or it can read it, then send signals to the threads in it, then reads it again and send signals to the thread again, retrying a few times. The shooter would kill the threads that generate others, and then likely converge. Note that this eliminates also the need to acknowledge: at each retry the threads that have tested the kill flag and entered cleanup and completed it disappear.

When there are many threads, there is a need to have only one receive the kill signal and kill the others. To kill the others it needs to send them another signal.

Of course, there can be threads that block SIGTERM, and that will not receive it. They would behave as if they entered an endless loop.

Cancellation

Asynchronous cancellation could be implemented with signals using some signal that is always blocked and that is raised and unblocked on pthread_cancel(), having its handler run the registered cleanup functions (but cleanup handlers do not have the restrictions of signal handlers). The deferred one could be implemented testing a kill flag in system calls. Actually, Linux implements cancellation with SIGRTMIN.

Cancellation for threads does not have the races that the signals have, i.e. its requests are not lost. It allows graceful termination. A cancellation request is detected at the next system call (one of the many that detect it) that the thread executes. So, if the cancellation request is made after having tested it, and before making a blocking system call, it is not lost.

Cancellation requests are really permanent: when one arrives immediately before a thread has disabled cancellation, or while it is disabling it, it is honored at the first cancellation point after the thread has re-enabled cancellation.

Thread cancellation has one important restriction: it does not allow to cancel forcefully another thread (the victim can disable cancellability, and none of the others can enable it). Even if a signal handler or a cleanup handler would change the cancellation type to asynchronous it would not be possible to forcibly kill a thread because the tread could disable cancellation anyway. This means that cancellation is meant to be used on cooperating threads, that are responsive to cancellation. Threads that can execute code linked dynamically are not responsive if that code is not so.

Cancellation points

The OpenGroup documentation provides a list of system calls that are (or can be) cancellation points. There are many calls that POSIX specifies to be cancellation points and that actually are, but their man pages say nothing about it. There is no guarantee that there are no blocking calls that are not cancellation points (actually, e.g., pthread_mutex_lock() is not a cancellation point). But cancellation points are more than the system calls that return EINTR, thus cancellation is the best that can be used to kill. The Open Group documentation does not ensure that system calls that return EINTR are also cancellation points.

This is a table of the major blocking system calls, showing if they can be interrupted by signals or cancellation requests:

system call	?EINTR?	?cancellation point?
`pthread_mutex_lock()`	no	no
`flockfile()`	no	no
`sem_wait()`	yes	yes
`sem_timedwait()`	yes	yes
`pthread_cond_wait()`	no	yes
`pthread_cond_timedwait()`	no	no
`pthread_barrier_wait()`	no	no
`pthread_spin_lock()`	no	no
`sigwait()`	no	yes
`sigwaitinfo()`	yes	yes
`sigtimedwait()`	yes	yes
`pthread_join()`	no	yes

Some system calls detect cancellation only when they are blocking. The man page of sem_wait() states nothing about the behaviour of cancellation: whether the lock is not got when cancellation occurs, but is says what happens when it is interrupted. It is likely that cancellation occurs when it could be interrupted. Thus, sem_wait() on a semaphore > 0 would not be a cancellation point. Actually, it is not. The POSIX standard seems to allow a sem_wait()on a green semaphore not to honor a pending cancellation request ("However, if the thread is suspended at a cancellation point and the event for which it is waiting occurs before the cancellation request is acted upon, it is unspecified whether the cancellation request is acted upon or whether the cancellation request remains pending and the thread resumes normal execution."). I.e. it allows an implementation to honor a cancellation request only on blocking points, which is when a system call returns with EINTR. What is important is to define clearly what happens when a system call is executed and there is a pending cancellation request, or such a request occurs during its execution:

either the system call returns successfully, or the cancellation is honored (in which case we know that the system call aborted, and cleanup handlers are called)
same, but when the cancellation is honored, the system call can have performed its task entirely, or partly, or not at all, and there is a means to know it (see, e.g., the case of condition variables).

Some functions, like, e.g., pthread_mutex_lock() is not a cancellation point. Therefore, if a thread is deadlocked on a mutex, pthread_cancel() does nothing to unblock it. There is no much documentation explaining why some blocking functions are not cancellation points. The rationale, as much as I got it is:

to allow atomic changes to data even when cancellation occurs: if a pthread_mutex_lock() were a cancellation point and a thread needed to make an atomic update of data, it would had to disable cancellation in critical regions (note that pthread_mutex_lock()is also not interrupted by signals). Mutexes are meant to be used to implement short critical sections only. E.g., a mutex lock can only be released by the same thread that owns the lock (as a rule this is checked only when the mutex is created with the error checking attribute, and otherwise it is not checked, but considered an error). Moreover, cleanup handlers are code that is running as part of the cancelled thread, and that has then to run concurrently with other threads. It needs then some mechanism to protect critical sections, and this is mutexes. In order to keep the state consistent, threads using mutexes would have to disable cancellation when using them (if mutexes were cancellation points), which would make the code more lengthy.
many library calls perform atomic changes using mutexes. If pthread_mutex_lock()were a cancellation point, they would become so as well. Having too many cancellation points makes programming difficult because of the need to provide cleanup at too many places.
the documentation of MKS states that making it a cancellation point "would make writing correct cleanup handlers difficult, if not impossible." However, cleanup handlers run with cancellation disabled, so, they would not be cancelled anyway.
the OpenGroup rationale states that it would be difficult to keep the internal data of barriers consistent should they be cancellation points. When barriers are implemented with semaphores, it is possible (and easy) to kill a task waiting on a barrier: it simply disappears and decrements the barriers counter, thus restoring the barrier (data) exactly as it was before the task waited on it. Barriers, however, could be implemented with other techniques, that make difficult to restore them. Thus, when there is a need to kill threads waiting on barriers, the POSIX one must not be used, and the semaphore ones used instead. If all we are waiting for someone to come that is dead, we would better know it and stop waiting. N.B. pthread_barrier_wait() is interrupted by signals, but it is resumed automatically when the handler returns. Killing threads when they are going to wait on a barrier seems meaningless if we look to the single thread, but it is not if we think to all threads that are going to wait on a barrier. E.g., a program could call a library function that uses internally some threads and that at the end wait on a barrier, and the caller could need to kill itself, and thus also the library function and all what is in it.

Cancellation scope

A restriction of cancellation is that it is not possible to delimit its scope: it applies to whole threads only. An application that needs to restart afresh (i.e., to recycle) must kill all the inner operations, and not terminate the main. To implement recycling, the main should create a thread, and if it is cancelled, then it must recreate it.

Another restriction is that it does not allow to treat locally the interruption (and the cleanup). E.g., in a snippet of code:

    open(file1);
    open(file2);
    open(file3);

with the cancellation API, since all three are cancellation points, if we want to close the files that are opened, we must set three cleanup handlers, that contain the cleanup code:

    open(file1);
    pthread_cleanup_push(cl1);
    open(file2);
    pthread_cleanup_push(cl2);
    open(file3);
    pthread_cleanup_push(cl3);

(or reckon what files are opened, and use only one cleanup) while with persistent signals one can remember the opened files, which in this case is automatic, and then perform the appropriate closing:

    {
        open(file1);
        if (killed) goto doit:
        open(file2);
        if (killed) goto doit:
        open(file3);
        if (killed) goto doit:
    } //
    doit:
    if (killed){
        if (file1 != null) close(file1);
        if (file2 != null) close(file2);
        if (file3 != null) close(file2);
    }

On the other hand, there is a need to unwind the stack and to execute all the cleanup code of the nested functions. However, reckoning is also possible with cancellation. The real difference is that it is somehow more difficult to pass the handles of the files to close to the cleanup handler than to close them in the same piece of code that has opened them.

Cancellable operations

Many (probably most) operations can be undone. Some cannot. Think, e.g., sending a message to a system logger that does not support a remove operation. There is no way in such a case to undo an operation (except sending another message telling that the former is not valid). Another example is formatting a disk.

Registration of cleanup handlers

The registration of cleanup handlers is provided by functions that are implemented in Linux as macros that enforce scoping. This has the advantage of avoiding to forget pairing, but has some disadvantages:

variables that are assigned within a push - pop block are not visible after the block. Thus, they must be moved before the block itself.
it is not possible to implement a library that provides a function that starts some functionality that requires cleaning up, and another function that stops it. E.g., if setting up contains the opening of a file, there is a need for a cleanup handler to close it, should the thread be cancelled.

In order to spare callers the burden of setting up a cleanup handler and then calling a start function, and doing the opposite when stopping it, libraries should provide these functionalities as macros. E.g.

    void openservice(){
        ... code to open or start this service
    }
    void cleanup(void* arg){
        ... code to closer or stop this service
    }
    #define OPENSERVICE() pthread_cleanup_push(cleanup,NULL); openservice()
    #define CLOSESERVICE() pthread_cleanup_pop(1)

    // example of library use
    OPENSERVICE();
    ... other actions
    CLOSESERVICE();

Forced cancellation

There is no way to forcibly kill a thread. This would be done after a thread has been cancelled, and the thread did not terminate in a defined amount of time. At that point the thread is executing its cleanup handlers, or it is in an endless loop. In the second case a signal must be sent to it so as to interrupt it and cancel (or exit) it. In order to make this cancellation act from within a handler, the handler would set the cancellability type to enabled and asynchronous. Strangely, when a handler interrupts a system call, the cancellability becomes asynchronous (if in the handler it was deferred), while when it interrupts normal code, the cancellability remains deferred. This caters for endless loops in the normal code of a thread. However, if there is an endless loop in a cleanup handler, there is no way to kill the thread. The only thing a handler can do to kill the thread is to cancel it, but at that point cancellation has no effect (the handler cannot do a pthread_exit() since this is forbidden from cleanup handlers).

This means that threads are parallel functions inside a same process, that must cooperate also on killing. There is no possibility to repair always. If nothing can be done on a thread once its cleanup handlers are in execution, there is no way to forcibly kill it. Threads are tightly coupled parts of a process, and thus if they do not terminate, this is a program error that cannot be recovered.

Knowing then that a thread has no way to forcibly kill another, let's see what it can do to kill it.

Joinable threads must be joined. When a join is executed, it would be nice to time it. But this would be useful if the killer had a way to forcibly kill it. Not only, timing join would allow the killer to proceed, and then two things can happen: if the killer is quicker and terminates the process, the victim is killed immediately; if the killer is slower, it terminates placing the process in the defunct state. A killer thread can join a victim reliably only by blocking. A timed join could be implemented with a supervised block. Alternatively, we could synchronize with the normal or abnormal completion of a thread with some other means (e.g., a semaphore) that supports timeouts (and using detached threads). However, as said above, a thread that does not terminate is a programming error that cannot be recovered anyway.

With detached threads no join is done (and the killer must know what threads are detached because it cannot not test if a thread is such), and then there are two alternatives: to use a signal to make the victim tell the killer when it is exiting, or the killer polls it. The first is less safe: what if the victim forgets it? Forcing the detached state from the killer on the victim is not possible (and it would not be safe either because the victim could change it), and the same applies to getting it. We can detect that a thread terminated sending the signal 0 to it (but then it would be better to use a joinable threads). Detached threads have the same problem as the joinable ones: if the main thread terminates before, it remains defunct. This means that when a thread creates another, and that other is detached, it has to use some means to synchronize with its termination so as not to leave it around.

Killing requests

Killing a request is the killing of a sequence of operations that are being done in a reply to a request. Think, e.g., to a web server process that receives requests to deliver pages. To implement it, a control thread must be started, that is told what request to kill, and that it finds what thread is serving it, and then it cancels the thread, possibly not waiting so as to serve quickly other kill requests. If there are several threads executing requests, there should be some map between requests ids and thread ids; moreover, there should be no window between the time a control thread seeks a request, and the time it cancels the thread. I.e. the control thread must lock the entry in the map and cancel the thread using a critical region so as to prevent the thread to remove the entry and terminate, starting processing another request.

Signals as IPC

All other IPC means, like, e.g., messages, semaphores, etc. require to create a permanent object, and to destroy it when it is no longer needed, while signals can be sent to any process without a need for system objects. This can be seen as a disadvantage, but it can be an advantage too: it allows to communicate with processes that are not yet created.

Semaphores, mailboxes, pipes, etc, need to be created, and even worse, need to be removed when processes terminate (even when they terminate because killed, with the risk to litter the system with no longer useful objects). Signals are also served before the process to which they are sent does anything else (providing it has registered a handler). Moreover, the idea of interrupting a sequence of statement is something basic that cannot be achieved with messages sent to threads waiting for them. Of course, the request made by a process to another to kill itself or the actions that it is executing can be implemented sending a message to it (having in it a thread receiving the message), but that thread needs anyway a means to interrupt the actions done by its process.

Signals can be used as semaphores or queues: waiting for a semaphore to become green or a queue to have an element to get is done by using sigwait(), or sigwaitinfo(); posting to a semaphore or a queue by sigqueue(), pthread_kill() or kill(). Synchronization among threads can better be done using one of the IPC means available (locks of various kinds, message queues, etc.). Therefore, signals as IPCs are restricted to synchronizing processes. Note that related processes know each other pids, while unrelated processes do not know them, and thus they must communicate (or find) their pids in order to synchronize with signals. A drawback is then that this exchange of data must be done, and another is that pids can denote processes that do not exist any more, or worse that are recycled and denote homonyms. This kind of synchronization is mostly used to issue requests to servers, or demons, i.e., processes that run permanently.

Another difference is that signals can carry data, i.e., are similar to messages, while semaphores do not.

This is the scheme of the server process (the thread in it that is devoted to handle signals):

Time supervision

Time supervision, although it is in general incompatible with the fact that processes and threads progress with the time that the scheduler gives them, is a need that occurs in applications. E.g., there are cases in realtime systems in which a sequence of operations must be done in a given time, otherwise something else need be done. There are several means to ensure that applications that have time constraints meet them. In these cases it is meaningful to supervise progress.

Time supervision can be applied to a single operation, usually a slow system call, or to a sequence of operations. Many slow system calls support time supervision. This works fine because supervision is started at the same time a process suspends, and therefore it supervises exactly the waiting time. Some system calls, though, require absolute time, which means that the current time need be got, and the deadline computed, and the process can get de-scheduled in between, thus making the deadline unreliable to some extent. Usually, however, the timeout is much bigger than the error caused by this. The same applies to the system calls that have no time supervision.

To time supervise a system call, like, e.g., a read(), we could run a thread that makes the read(), and another that starts the timer, and wait for one to end, but this also would not be perfect (the one that reads might get de-scheduled, letting the other come first). Upon time expiry a signal is can be sent. A problem is that once the handler has been set, and a signal unblocked, it can be delivered, and this can occur before the system call or function to time supervise is executed. If this occurs, we must not execute the call or function (otherwise it would not be supervised any more). This in practice is not a big problem because the chance that it occurs are few. However, the code must not contain the race all the same. A bigger problem is the expiry of the timer due to de-scheduling. This can be mended by using a long timer, or by using timed operations, when available.

Time supervision is a supervised block in which the kill signal is generated internally, and thus can be implemented as such:

Cancelling a thread when time expires can be done by starting an ancillary thread that waits for the given time and then cancels the other. This means to have two threads for each time supervised task.

Another solution is to start a timer that delivers a signal when the time expires. However, that signal is process directed, and not aimed to the thread to be cancelled. Timers deliver signals as specified in the sigevent struct, which has a way to specify thread directed signals, but timers do not use it. An alternative it to use a different realtime signal for each time supervised thread, but they are too few, and when sufficient, may need a monitor to be reserved and released.

Another solution would be a (timer) thread that accepts supervision requests from threads, and when an appointed signal occurs (generated by a timer) sends a per-thread one to the thread that is waiting for it. It can also discard external signals, but the threads that want to be time supervised must unblock a signal anyway, and thus have the same hole. The timer thread has to wait for requests and signals, it could do that with ppoll() or also with sigwait(). Requests can be made sending a signal that carries a malloc'ed struct containing the requester thread-id. A function can be provided that creates a timer, sends its id and the thread-id to the timer thread, waits for a reply and then arms the timer. It is precise because if the timer is armed instead by the timer thread, the other could be de-scheduled, and some time pass before it is scheduled again. When a timer sends a signal, is sends also a siginfo that contains the timer id. When the timer generates a signal, the timer thread gets it, searches the request that has that timer-id in it, removes it, and sends a given signal to the thread whose tid is in it. That signal can carry along the timer id, allowing the receiving thread to disarm it immediately. This eliminates the problem of stray timer signals because supervised threads can at the end disarm unconditionally the timer, and request the timer thread to remove the request. This solution is complex, and has the drawback that in the supervised thread there could be calls to library functions that do not test kill flags (and restart interrupted system calls unconditionally). After all, the purpose of placing supervised code in a thread is to avoid to use signals to kill it.

Another solution is to ask timers to run a function that makes a pthread_cancel(). To that function, a thread-id must be passed, cast to a pointer. This spares the implementation of the timer thread. There is a need to synchronize thread termination and timer termination. When the thread terminates normally, the timer must be disarmed because otherwise it can cancel a thread that no longer exists (which is not a problem), or worse, whose tid has been recycled. When it terminates because cancelled, the timer should be disarmed for the same reason. No temporal windows must exist in which killing or cancellation is done on a thread that does not exist. To achieve this, the timer must be armed at the beginning of the thread and disarmed at the end, before termination. Killing a thread by having a timer create a thread to do it is a bit expensive, but timeouts should occur seldom in well designed programs. Note that such a thread is created only when time supervision expires. Timers, however, when told to run a function, create upfront an additional thread (and another one when running the function), that is not terminated when they are deleted. This is a bug, and has been filed. Anyway, here it is this solution:

    typedef struct task_t {
        int timeout;                    // max execution time of task, in msec
        void* (*function) (void* arg);  // actual task function
        void* arg;                      // function argument
        void* retvalue;                 // function return value
        timer_t timer_id;               // id of the timer
    } task_t;

    // timeout function that cancels the thread
    void threadcancel(sigval_t sigval){
        pthread_cancel((pthread_t)sigval.sival_ptr);
    }

    // cleanup handler, that cancels the timer
    void cleanup(void* arg){
        task_t* task = (task_t*)arg;
        if (task->timer_id == NULL) return;          // argument not yet set
        timer_delete(task->timer_id);                // disarm and delete timer
    }

    // thread that starts/stops time supervision and executes the task
    void* thread(void* data){
        task_t* task = (task_t*)data;
        task->timer_id = NULL;                       // initialize task data
        task->retvalue = NULL;
        pthread_cleanup_push(cleanup,task);          // register timer cancellation

        struct sigevent event;
        event.sigev_notify = SIGEV_THREAD;
        event.sigev_notify_function = threadcancel;
        event.sigev_notify_attributes = NULL;
        event.sigev_value.sival_ptr = (void*)pthread_self();
        timer_t timer_id;
        if (timer_create(CLOCK_REALTIME,&event,&timer_id) < 0){
            pthread_exit(PTHREAD_CANCELED);
        }
        task->timer_id = timer_id;                   // remember timer it and attributes

        struct itimerspec itime;                     // arm timer
        itime.it_value.tv_sec = task->timeout/1000;
        itime.it_value.tv_nsec = (task->timeout % 1000) * 1000000;
        itime.it_interval.tv_sec = 0;
        itime.it_interval.tv_nsec = 0; 
        if (timer_settime(timer_id,0,&itime,NULL) < 0){
            pthread_exit(PTHREAD_CANCELED);
        }

        task->retvalue = task->function(task->arg);  // call actual task function
        pthread_cleanup_pop(1);                      // cancel timer
        return NULL;
    }

    // cleanup for convenience wrapper function
    static void cleanupt(void* arg){
        pthread_t thr = *((pthread_t*)arg);
        pthread_cancel(thr);                         // n.b. no error checking
        pthread_join(thr,NULL);                      // no errors can occur
    }

    // convenience wrapper function, that executes user function with timeout
    int timedTask(void* (*function) (void* arg), void* arg, int timeout){
        task_t task;
        task.timeout = timeout;                      // fill in time and function
        task.function = function;
        task.arg = arg;
        pthread_t th;                                // create thread to execute function
        pthread_cleanup_push(cleanupt,&th);
        int res = pthread_create(&th,NULL,&thread,&task);
        if (res != 0){
            return res;
        }
        void* status;                                // wait for its termination
        res = pthread_join(th,&status);
        if (res != 0){
            return res;
        }
        if (task.retvalue != NULL){                  // return value of function
            status = task.retvalue;
        }
        pthread_cleanup_pop(0);
        return (int)status;
    }

    void* funct(void* data){
        ... actual task
    }

    // example of use
    int res = timedTask(funct,argument,timeout);
    if (res){
        ... error
    }

A better solution is to place the actions to be supervised in a thread and to make the one that creates it control the passing of time. This creates only one thread per supervised block.

Waiting for a child process

This scrap of code can be placed in any thread, including one dedicated to wait for children. If the SIG_IGN handler has been set for SICGCHLD, there is no need to wait to recover the status of children, and children do not become zombies. Note that this is not the same as the default disposition (even if it looks similar).

A process must wait for its children even when they have been killed sending SIGKILL, unless it sets the disposition of SIGCHLD to SIG_IGN.

Waiting for children must be done when they are designed to terminate, and also when they terminate unexpectedly, while they where supposed not to do.

Waiting for children can also be done in a handler registered for SIGCHLD, but this is not convenient because it would interrupt a number of system calls even when registered with the SA_RESTART option. When there is no need to get the children exit status, the disposition of SIGCHLD can be set to SIG_IGN, and nothing else done; otherwise, children can be waited either by the thread that created them or by any other (e.g., a dedicated one). Moreover, waitpid() resumes only when a child terminates, and not when SIGCHLD is sent by any process, while a handler is run when SIGCHLD occurs, also when sent by non-children processes.

Setting SIGCHLD to SIG_IGN or to an handler with SA_NOCLDWAIT prevents children to become zombies. However, init() in Linux clears quickly orphaned processes (i.e., children whose parent died before them), and also zombies. There is thus no need to set SIGCHLD to SIG_IGN to avoid zombies.

system call	returns	what	delivers	blocks
`wait()`	pid	any child	status	yes
`waitpid()`	pid	specific child/any/group, and kind of state change	status	optional
`waitid()`	pid	specific child/any/group, and more kinds of state change	status, real user id	optional

waitpid() can wait for any child that has a specific process group. Note that there can exist processes that belong to the same group, but that are not children, and thus that are not waited for.

To wait for children, and collect their exit status:

a thread (the one that created children, or a dedicated one) can call one of these system calls (see above).
a thread (as above) can wait for the SIGCHLD signal, and then call one of these system calls.
a handler can be registered for SIGCHLD.

Solution 1 is the cleanest and simplest. It makes the executing thread wait until all the children terminate. If there is a need to wait for a specific child, it can be done. It is in general cleaner to make the threads that create children to take care of their breed.

Solutions 2 applies when a dedicated thread is used. Solution 3 requires a loop in which waitpid() is repeatedly called to get the status of all children that have terminated so as to compensate for the loss of some SIGCHLD signal. We can have a thread that does it, but in this simple case a handler spares a thread. Note that handling child termination with a parallel thread of execution (be it a handler or a thread) requires to keep track of the status of processes in some data structure, like, e.g., a list), unless it can be dealt with on the spot. Creating a process and appending an element to the list must be atomic with respect to signals for solution 3, i.e., it must be done with this signal blocked. Moreover, if process creation is done by several threads, it must also be protected with a mutex. Removing an element needs to change the list, and thus must be protected too. This bookkeeping can be placed in an atfork handler. Atfork handlers are not guaranteed to be mutually exclusive (although they are): they must protect if they want to.

Solution 3 does not allow to wait until all children have terminated. Moreover, waiting for a specific child to complete using SIGCHLD is not easy: the handler could receive signals for other children, thus preventing other parts of the program to use this signal to wait for them. Moreover, signals might be discarded because some previous ones are still pending.

For an example of solution 3, see: http://www.gnu.org/software/libtool/manual/libc/Merged-Signals.html#Merged-Signals.

Handling SIGSEGV and SIGBUS

SIGSEGV is generated when a memory address that does not belong to the process address space is accessed, or when an attempt is made to write into a read-only memory address. E.g., null pointers, page faults or stack overflow.

SIGBUS is generated when accessing a memory mapped file at an address that does not correspond to a position in the file.

Both these signals, after executing a handler, restart the very same instruction that caused them. Therefore, either the handler is able to cure the cause of the signal (e.g., map properly memory), or the handler must terminate the program or the affected thread.

SIGSEGV and SIGBUS are normally programming errors or failures, that can be handled with cancellation, reporting the exception (and therefore, when they occur in system calls, they must not make them return with an error, but terminate the thread). When they denote a programming error, little can be done but killing the process. When they are failures, and there is a means to recover them, the thread can be cancelled, the problem cured, and the thread recreated. In the latter case, the solution is then to put the code that might generate one of these signals in a thread, and cancel it when the signal occurs.

Do simply register/deregister a cleanup handler around statements that can cause one such signal. To tell the signal handler not to kill the process, but only the thread, a flag is set. This is the scheme:

The latter is meant to be used on some specific statement or system call that can produce a SIGSEGV or SIGBUS when executed, and not on a large section of code, that could generate them also because of some programming error.

Note that the handler cancels the interrupted thread using pthread_exit(), which is not asynch-signal-safe. However this handler catches a synchronous signal, that cannot occur while executing this function, and thus it does not interrupt an execution of it.

When they are not programming errors or failures, they are used to remap memory, and not to terminate processes or threads (or supervised blocks). However, when they are generated from within user code, their handlers are executed, and when returning they restart silently the code, but when they occur in system calls (e.g., because a return argument points to unmapped memory), the behaviour is dependent on the system call, and may range from error returns to partial execution of the system call. A program that wants to remap silently memory accesses can then provide a third case to the handler above, supplying it a user function to call instead of killing the process. That program, however, must either be sure not to execute system calls that can address unmapped memory, or to check any error that they return and act accordingly. This latter case is not treated further here.

When there is a need to perform some memory remapping under the hood, in the handler, system calls such as mprotect() and mmap() can be safely called provided that errno be saved and restored (they cannot cause a SIGSEGV to occur while they are executing, and therefore there are never two executions of them at the same time).

Stack overflow

Stack overflow can occur at any function call, and then likely in library functions that are not prepared to handle it, and that do not even register cleanup handlers.

The stack should be dimensioned to support the deepest nesting, without reckoning recursion. Detection of stack overflow should be done only in functions that use much stack, and this occurs typically in recursive functions. It is better to control the depth of recursion than to rely on the system detecting stack overflow. Recovering from stack overflow is difficult: it could be done only, e.g., when there are alternative, less expensive ways to perform some task. E.g., a function needing a large amount of temporary storage could allocate it on the stack declaring an automatic variable. If it fails because of stack overflow, it could be called again telling it to allocate on the heap (which is more costly).

When a function is called, and there is no room in the stack for it, its stack frame is not placed in the stack, SIGSEGV is generated, its handler executed (if any), and if the handler terminates the thread, its cleanup handlers are run on the normal thread stack. The first cleanup handler executed must then use less stack than the offending function call, otherwise the process is immediately terminated with a segmentation fault (i.e., the default SIGSEGV disposition). E.g., suppose a function opens a file, but being close to the stack limit, it causes a stack overflow, and that the close function requires more stack than the open one: the process is terminated immediately. This means that the functions devoted to undo some operation must be thrifty with the stack. Since cleanup handlers unwind the stack, the subsequent ones have more stack at their disposal. N.B. if a thread has no alternate stack, and the handler is registered with SA_ONSTACK, the normal stack is used. Note also that there is no way to terminate a thread bypassing its cleanup handlers, and neither there is a way to extend a bit the stack of a thread (so as to have room to run its cleanup handlers).

Threads that want to terminate gracefully when a stack overflow occurs, must create an alternate stack. Consequently, the SIGSEGV handler must be registered with the SA_ONSTACK flag:

An alternate stack is set for the thread that executes the operations that can cause the signal to occur so as to have room for the handler to execute (which may not be the case when a stack overflow occurs). Note that sigaltstack() sets an alternate stack for the signal handlers of the calling thread (and not the calling process, as said in the man pages) that are registered with SA_ONSTACK.

Discussion

SIGSEGV occurring while executing a system call and having a handler that cures the problem makes some system calls proceed instead of interrupting, including the ones that always interrupt when a signal occurs in them, and some others return with EFAULT (i.e., it is not transparent). This seems to occur when system calls, after having performed the requested operation try to copy the results in the user space (e.g., read()). It is documented nowhere, though. Note that in such a case, the error cannot be recovered: restarting them, another operation is executed. E.g., sigwaitinfo() when receives a signal it is waiting for (and has an argument in unmapped memory) returns EFAULT and does not run the handler; and when it receives a stray SIGSEGV it runs the handler and returns EINTR. In both cases it has received a signal, and so it cannot even be restarted. If sent by kill() it behaves as a normal signal. Another example is read() with a buffer in unmapped memory: it terminates with EFAULT, but it reads the data, and therefore it cannot be restarted.

SIGBUS also is not transparent (e.g., sigwaitinfo() returns with EFAULT even when there is a handler that repairs the problem, but for some other system calls, like, e.g., sem_wait(), it is transparent). It is either a programming error (misaligned data) or it occurs when a memory mapped file is accessed at a position that does not correspond to data in the file. N.B. It is not possible to assess the correctness of accesses to memory mapped files before executing an I/O operation because immediately after the test, the file could have been truncated by another process. This, though, can be avoided by locking the file. System calls that return with an error when a SIGBUS handler has repaired the problem could be restarted once. It is true that right after the handler has cured the problem, the problem could occur again (e.g., a file truncated again), which would need another restart. However, restarting indefinitely could make the thread enter an endless loop. Moreover, it is not possible to determine that the handler cured the problem and it was executed within a system call.

There is then a need for the programmer to check exactly what each system call (that might address unmapped memory) returns in order to restart it properly (if possible). This could then be done only on a case-by-case basis. However, do take into account that the behaviour of system calls is not specified in such a case, and thus it can change in the future. A better solution is to make sure that the data accessed by system calls are properly mapped in accessible memory, and to make it so when that is not the case. When accessing a memory mapped file, the file should be locked so as not to be truncated by other processes.

A handler that makes a siglongjmp() allows the thread to continue, although there is little that can be done there. Either the handler cures the problem, and then returns, or it kills the process or the thread, in which case it does not need to jump. However, for sake of completeness, here it is. Such a handler can be enabled only in a stretch of code that executes only asynch-signal safe operations. In practice, it is advisable to apply it only when there are no function calls in that stretch of code. E.g.:

    static volatile  __thread sigjmp_buf* jmp;
    void handler(int sig, siginfo_t* siginfo, void* context){
        if ((siginfo->si_code == SI_USER || siginfo->si_code == SI_QUEUE) &&
            siginfo->si_pid != getpid()){   // external signal
            ... process kill       // or return, if stray signals discarded
        } else {
        sigjmp_buf* tmp;           // save current
        tmp = (sigjmp_buf*)jmp;
        jmp = NULL;                // make it NULL: prevent further jumping after handler jump
        if (tmp == NULL){
            ... process kill
        }
        siglongjmp(*tmp,1);
    }

    ... handler registration
    struct sigaction sa;
    sa.sa_flags = SA_RESTART | SA_SIGINFO;
    sa.sa_sigaction = handler;
    sigsetmost(&sa.sa_mask);
    sigaction(SIGSEGV,&sa,NULL);

    ... supervised block
    sigjmp_buf buf;
    if (sigsetjmp(buf,1) != 0){
        ... cleanup
    } else {
        jmp = (volatile sigjmp_buf*)&buf;
        ... actions that may generate the signal
        jmp = NULL;
    }

For SIGSEGV, the handler can determine if the signal is internal also by testing:

    if (info->si_code == SEGV_ACCERR || info->si_code == SEGV_MAPERR){
        ... internal signal
    }

Do remember, however, that a handler must be agreed among all threads because all threads can receive these signals.

Exception Handling

In this chapter, error reporting for threads that use cleanup handlers and signal handlers is presented. Cleanup handlers perform exception handling, i.e. they execute any action that is deemed appropriate to cater for failures or other premature termination (e.g., killing). This applies to all signals that denote errors, and to the explicit checking of dynamic conditions whose violations denote errors.

Exception handling means basically detecting failures, cleaning up the process state up to a point at which something can be done to recover (which can be retrying, trying alternatives, or doing less) or terminating the process if nothing can be done.

In the first two cases we are not interested in what the process (and all its threads) was exactly doing, in the others we need a detailed error reporting to tell what went wrong:

SIGSEGV cannot be left to its default disposition (barely no error reporting), and stray SIGSEGVs cannot produce an error (but, e.g., a message such as "process killed on user request", as all other kill requests). SIGSEGV can occur at any place in the code (e.g., at memory accesses made with pointers), and normally denotes a programming error. SIGBUS, in some specific circumstances, can denote a failure (e.g., a system call that accesses a memory mapped file that is truncated), that the program wants to handle.

The only one other signal that denotes the violation of a dynamic condition is a time supervision one. However with it, reporting the circumstances of the error is not difficult. Almost all system calls that detect the violation of a dynamic condition return with an error and not with a signal.

All these cases can be handled performing graceful kill and reporting an exception to higher levels, being the last the one that circumstantiates the error with respect to the external process interface. This can be done recording the error data in cleanup handlers. When an internal signal denotes a failure, we should catch it close to the point (setting a cleanup handler that reports an error). For system calls that do not generate this signal, but return an error or interruption, pthread_exit() can be called, reporting an error. When an error occurs, most of the times we need to perform cleanup much the same as we do when killing. There could be cases in which at some level of nesting a specific recovery can be done, and then the thread not terminated. This can be handled telling the handler not to cancel the thread (but in many cases no recovery can be done). It is then appropriate that cancellation be used both to serve an external request and to handle errors.

When an internal signal denotes a programming error, it can be reported by telling at least the thread or topmost function being executed (with a cleanup handler). Printing the call stack would be quite useful, but unfortunately, there are no means to do it (unless a coredump is generated, and analyzed). It would be nice to tell the source line in error, or at least the program (instruction) counter. This seems quite a difficult thing in C: I have never found a linker listing map that tells where the modules are placed in virtual memory, and neither a C compiler listing telling the relative addresses of instructions in the module (compilation unit). The process could have other threads extant at that point, that must be cancelled without reporting any error.

Discussion

Threads are terminated either by cancellation (when performing graceful kill) or by exiting. In both cases, cleanup handlers are executed. Since cancellation can occur both because of an external request (in which case no error reporting is wanted), and because of an unrecoverable error, a per-thread flag (failing) can be used to control error reporting in them. It is set by handlers before cancellation or killing, and by treads before suiciding.

When a thread is terminated because of an error, some object is used to document it, and each cleanup handler can add chained exceptions to it. Cleanup handlers do not return any data, and thus they must leave exception data in some variable. In some cases it could be appropriate to pre-allocate a number of exception descriptors so as to have room to store error data (and not to generate out of memory errors trying to allocate them there). They would be allocated for each thread. An alternative is to pass an exception object to a thread when created. To easily chain exception objects, further exception objects can be linked in a circular list built on the one that is passed to a thread. The list can be made linear to ease visiting it when displaying exceptions. When there is no heap to allocate additional (chained) exception objects, chaining is simply skipped.

Note that passing up the call stack (or block nesting) the information that the operations (thread, function, block) failed, requires the transmission of some transient data, i.e., values that are passed on the fly (e.g., from pthread_exit() to pthread_join(), etc.) in some predefined container (e.g., a return value from a function) that does not need allocation and freeing. These values occupy that space for only some time. However, usually such containers are rather small, and passing values is thus not simple. E.g., the argument to the function executed by a thread when created is only a pointer, which means that some other container need be created when there is a need to pass several values.

Threads that are cancelled return PTHREAD_CANCELED (and unfortunately not a user value, which could have been the exception), and this happens when they are cancelled with pthread_cancel(); instead, when they are terminated with pthread_exit(), its argument is returned. However, to tell easily a failing thread from a successful one, PTHREAD_CANCELED should be returned in all cases a thread did not complete its task. When a thread failed, it is possible to tell if it did so because of an error, or otherwise: in the first case the passed exception object is part of a circular list.

Note that when a thread is created, and an exception object passed to it, that object must be initialized, either before passing it, or immediately after (in the created thread). It is pretty the same because thread creation is not a cancellation point.

There is no built-in way to tell if a thread is exiting or not, or in other words, if a function has been called by a cleanup handler or not. Unfortunately, it is also difficult to do it using a flag because it would have to be set when executing a pthread_cancel() (which is difficult to remember). Consequently, it is not possible to have only one function to throw an exception because if it is called from a thread it must do a pthread_exit(), while when it is called from within a cleanup handler it must not.

Each process must be able to decide how to manifest it is being killed, like, e.g., displaying a message on the controlling terminal, or simply replying with a signal. It can be done registering a cleanup handler.

Handling SIGFPE

SIGFPE is difficult to handle: it should not be ignored (otherwise the program behaviour is undefined), it should not be handled by the default disposition (which is to abort the process), and then it needs to be caught. However, in the handler the only alternative is to terminate the process. It is also conceivable to try to skip the offending instruction, but this entails a code that is dependent on the instruction set of the processor. Moreover, the signal could occur many instructions later, as Tydeman states. Although it is possible for a compiler to generate the appropriate instructions to prevent this, there is no guarantee that this is done. This makes this signal even more difficult to handle.

To handle the floating point exceptions, the functions in fenv.h can instead be used:

Handling SIGXCPU

SIGXCPU is sent to a process to notify it that it is reaching its limit of CPU consumption, and that soon after a SIGKILL will terminate it. The process has thus the chance to save or cleanup its important data. The CPU time limits are set as follows:

Handling timers

A thread can be used to catch the signals generated by timers. A timer can be told to create a thread each time it tics. This is quite costly, and thus can be done for timers that tic only once:

The created thread behaves as if it was detachable: it ceases to exist when its function returns. However, timers when told to run a function, create a thread for it, but create also another thread that is not terminated when timers are destroyed. This causes a leak of threads.

For timers that tic periodically, a thread can be created in advance, to wait for signals sent by the interval timers (without drift):

Handling SIGSTOP, SIGTSTP and SIGCONT

These signals are sent to processes to stop (i.e., suspend) them and to continue (resume) them. SIGTSTP is sent when typing ^Z, SIGCONT when typing fg at a shell prompt when a program has been run by that shell. All of them can also be sent with kill(). SIGSTOP cannot be caught. SIGCONT can be caught, but eventually resumes the process anyway. One use of catching these signals is to set, or reset, the terminal to the desired operating mode, or to redraw it, or redisplay a prompt. If something need be done when both a process is interactively stopped (SIGTSTP) and when it is continued, and the actions are very simple, this is the scheme:

If there is a need to do some (very simple) actions only when SIGCONT is received, then a handler for it can be set. Otherwise, a thread can be dedicated to them:

For signals whose disposition is to stop the process (i.e., SIGSTOP always, and SIGTSTP, SIGTTIN and SIGTTOU with the default disposition), the interruption occurs when SIGCONT is received. SIGTSTP, SIGTTIN and SIGTTOU can be caught, and when they do, they do not stop the process (and thus do not need a SIGCONT), but they interrupt the same system calls as SIGSTOP. To make them stop the process, their handling must raise a SIGSTOP. Unfortunately, in Linux, there are 15 system calls that are interrupted when a process gets stopped (see man 7 signal), and this applies to all the calls that are are being executed (in different threads).

Blocking SIGCONT has no effect, all threads are resumed all the same, and all 15 system calls that had been stopped are interrupted anyway. However, if a handler is in place, only one handler is run when the signal is sent. N.B. there is no way to stop and continue individual threads.

The handling of these signals is process-wide (i.e., only one thread gets the signals, as usual). Unfortunately, each thread could have its own actions to be done to react to them. This could be achieved by re-sending the signals to the threads that need to perform some specific actions when stopped or continued. A single threaded process that is stopped and then continued and has registered a handler for SIGCONT can execute some code before anything is done by the main thread, like, e.g., set properly the terminal. In so doing, the main thread could almost forget about stop/continue events. A multi-threaded one instead resumes all threads, and at most one executes a signal handler for SIGCONT before resuming its operation. This needs a different organization, like, e.g., making all threads that want to display something on the terminal to send requests to a dedicated thread, that handles also stop/resume events.

There are no handlers for SIGSTOP, but the caller can know that the interruption was due to it (actually, to a stop signal) registering a handler for SIGCONT.

In any code (supervised blocks or otherwise), there is a need to restart the 15 system calls that are interrupted by stop signals. It is not acceptable that a program fails because it has been stopped and then continued. The problem is that there is no built-in way to detect that they have been interrupted by such signals, and not by others.

The solution to restart system calls does not change if a handler is provided for stop (and/or SIGCONT) signals.

Rotating logs

Signals can be used to ask processes (or threads) to change the course of some action that they are doing repeatedly. This can be done informing them about the event with a flag. This is the case of processes that repeatedly execute a loop, or that perform frequently some actions. An example is processes that record log messages in some file. When there is a need to get the messages logged so far, the process can be told to close the current log file, and open a new one. Another example is processes that read a configuration file, and that can be told to reload it. If they execute repeatedly a loop, they can check if such a request has come.

This could be implemented registering a handler that sets a flag. However, that would interrupt many system calls that are not restarted automatically. A better solution is to use a thread to handle that signal. It can be handled by the control thread. This is an example of log rotation:

N.B. since the flag is shared between two threads, and since there is no guarantee that accesses are not reordered, the flag is implemented with a semaphore. The control thread posts to it to let the other know that there is a request pending, and the other tries to get it, rotating the log when it got a request, and proceeding normally otherwise.

Power failure, suspension and hibernation

In Linux there is no way for a userland application to be notified about the occurrence of power failure, computer suspension or hibernation (and also resume), with signals or otherwise.

Applications could need to perform some operations when these events occur, like, e.g., closing a dial-up Internet connection.

Parallel execution

Parallel execution in the user space occurs when a process:

creates a child process
creates a thread
executes a signal handler

Identification

Processes are identified by pid's, that are recycled. A test on an Athlon 64 X2 4200+ has shown that a process creating other processes in a row creates 32312 processes in 12 seconds before a pid is recycled. Of course, this is somehow an upper limit because these processes do nothing and terminate immediately. The time a pid takes to be reused is then fairly large, and yet not too large to prevent to kill a homonym. Thread-id's (tid's) are instead reused immediately. I.e. when a thread terminates, and another is created, the tid of the first is reused. With a 32 bits kernels and processes created every 300 μs the recycle time is 10^-4 ? (2¹⁵ - 1) = 10 seconds, and with a 64 system it is 5 minutes. On 64 bits kernels, the maximum pid is 4194303, and in 32 bits kernels it is 32768. It is configured in /proc/sys/kernel/pid_max. However, a system is never creating processes in a row like that, and thus pids are never recycled before a few minutes.

Creation

When a process is creating too many threads, it receives an EPERM as a result of threads creation. Concerning processes, after having created 24854 processes, the system practically hangs (it allows to kill processes, though).

A child, when created, does not have the threads of the father; it has only one thread and that is the one that forked it. If the father had several threads, with locks held by some of them, the locks will be duplicated in the child, but not the threads that hold them. There is then a need to be very careful in touching locks that are held by threads that do not exist in the child. Moreover, when a thread executes a fork(), others could be in the middle of updating shared data.

Mutexes and rwlocks have an holder, while semaphores do not. A mutex knows what thread locked it, while a semaphore does not (this allows deadlock detection). There are no means to make condition variables fork-safe, and therefore they must not be used after fork() (they can be implemented using locks).

To avoid races, no asynch-signal-unsafe function can be called before exec-ing. However, some libraries, like, e.g., the ones that perform I/O redirection are likely to be called after fork(). Some implementations of them are multi-threaded and use locks (e.g., dup(), fcntl()). Extant non multi-threaded programs may not follow the restriction to call only asynch-signal-safe functions. In order to avoid to provide a special variant of these functions to be used only in such programs, a means have been introduced to allow to use locks in libraries, and safely call them after fork(): atfork handlers. They should be considered something special for a special case, though: new programs must obey the rule to call only asynch-signal-safe functions after fork() (e.g., they cannot use locks). Programs must also avoid to use data that can be in an inconsistent state in a child.

Functions that use locks must also register atfork handlers in order to become fork-safe. This must be done either in an initialization function placed in the package in which they lie, or in the functions that use mutexes in such a way as to be executed only once. Remember that atfork handlers can be registered, but not un-registered. The prepare handler acquires locks, the father handler and the child handler release it. This is quite burdensome. However, it has been introduced for a special case. This is rather coarse because it makes every fork() acquire locks that might not be used in children.

The problem occurs when a thread executes a fork() while the state of the data is inconsistent, or when a thread holds a mutex, and the thread has no counterpart in the child (the mutex in the child will be locked by a nonexistent thread). A multi-threaded process has better to make an exec() soon after a fork().

A function that uses some global variables could be made thread-safe by protecting the access to such data with a mutex. This makes another thread call the same function wait a bit. However, here the picture is worse because a fork() does not wait for any mutex, much the same as a signal handler. They both run without paying any attention to mutexes. Or it is even worse because you can protect critical sections with mutexes in the realm of threads, and blocking signals in that of signal handlers, but you can do nothing when a fork is done.

Since a program can be made by several modules, and can call also several library functions, to preserve encapsulation, each module or library can make fork() reserve/release its own locks by installing its own handlers.

Moreover also all data in the father that contain pid's and tid's may be meaningless in the child.

The man pages do not say anything regarding what functions can be called from within atfork handlers. I guess that the asynch-signal-unsafe ones are allowed since the ones that release locks are so.

Concerning per-thread keys, threads have the same tid's in fathers and children. Thus, the main thread in a child has the same per-thread values as the creator thread in the father. What is not said is if the associations present in the father for other threads are deleted. E.g., it is likely that keys that associate non-existent threads with malloc-ed memory produce a memory leak. If a thread in the father creates locally a key, that key could be deleted with a child atfork handler. However, since between fork() and exec() little should be done, unused memory is not much of a problem there.

Suppose a process is inside a slow system call, and having caught a signal, it is executing a signal handler, and the handler makes a fork(). The child is inside the system call and the handler too, and when the handlers executions return, both system calls are aborted or are restarted.

Process and thread relationships

Let's consider a process with children and also with threads: the children are sons of the main thread. All threads share the same pid and father.

The difference between the main thread and the others is that when the main one falls off the bottom, an implicit exit() is done, while when any other thread falls off the bottom, a pthread_exit() is done. An exit() terminates at once all threads (it executes an exit_group()). Therefore, when the main thread falls off the bottom, all threads are terminated.

A process that creates a child and terminates leaves the child running, but re-parented to 1. Such child is said to be orphaned.

Let's have a process that creates a child that terminates before the father terminates. During the timespan between the child termination and the father waiting for it, the child is a zombie (and marked as <defunct> in the ps output). A zombie ceases altogether to exist when the father terminates because at that time it becomes orphaned and it is re-parented to init, which promptly waits for it.

Memory

Threads and signal handlers share the memory with the process that created them or was interrupted by them.

gcc supports __thread, that declares thread-local variables. This is much more handy than the use of thread keys. E.g.

    static __thread int var;  // declares a variable var, private for each thread
    ... var ...               // from within a thread addresses its private one

Children are created with a copy of the memory of their father.

Occurrences of errno refer to the thread specific one, controlled by the _posix_source feature definition, which is the default one. Actually, errno is always a macro that calls a function that delivers a thread private location.

Signal handlers address the errno of the thread to which the signal has been delivered, and that they interrupted.

Passing data between threads

Since all threads of a same process share the same address space, they can pass data using shared (static) variables, providing that they protect accesses (e.g., with mutexes), or do that when absolutely sure that no simultaneous accesses occur (and no reordering of accesses too).

Another means to exchange data is through arguments:

pthread_create() has a void* parameter. The creator can pass a scalar value (casting it) or a pointer to:
- static, or malloc'ed data
- automatic variables, but only when it uses joinable threads and joins them before terminating (otherwise the creator could have terminated before the created thread, and the variables have disappeared)
pthread_exit() has a void* parameter. The thread can pass a scalar value (casting it) or a pointer to:
- static, or malloc'ed data
- automatic variables of the creator (under the same condition as above)
the value is got by the thread that executes a pthread_join() passing the address of a return variable to it. Note that when the joined thread has been cancelled, the PTHREAD_CANCELED value is returned. When there is a need to pass data from a thread that might be cancelled, the easiest means is to pass a variable to it when it is created.

When a thread terminates because it is cancelled, all the data that it has malloc'ed (and that must not be returned, which is the normal case), must be freed. This can be done in its cleanup handlers. A thread can detect some condition that prevents it from continuing. In such a case it could terminate executing a pthread_exit(). This makes its cleanup handlers be executed. If there is a need to return malloc'ed data, they must not be freed. However, cleanup handlers have no built-in way to tell that they are being executed because of cancellation from being executed because of thread exiting. To distinguish these two cases, a per-thread flag can be set before calling pthread_exit(). Do also take into account that cleanup handlers have no means to change the value that is returned by pthread_join().

Note that a thread that has called a pthread_exit() and is executing its cleanup handlers is insensitive to cancellation. Therefore, when cancelling a thread, there is no guarantee that it terminates having freed all its malloc'ed data.

N.B. Terminate the main function of a joinable thread always with a return or a pthread_exit() because otherwise the value returned by pthread_join() is undefined.

Detaching a thread that is inside its cleanup handlers is allowed, it succeeds both either when that is due to exiting and cancelling. pthread_join() returns EINVAL on detached threads.

Thread termination and process termination

When a thread cancels the main thread, the process does not terminate. The main thread, after having executed the cleanup handlers terminates, but not the whole process. This is because cancellation performs a pthread_exit() at the end.

When the main thread returns, or falls through the end or executes an exit(), the process is terminated immediately, including all its threads (their cleanup handler, if any, are not executed). When the main thread executes a pthread_exit() it terminates, but the process becomes defunct and terminates only when all threads terminate (or one executes an exit()). It is possible to detach the main thread if it does not need to join the other threads. When any thread executes an exit(), the process is terminated immediately.

Improvements

This section contains a number of possible improvements to the current semantics of signals and cancellation in Linux. To my knowledge no one of these have been submitted to the Linux community.

We know that it is not possible to implement killing by simply acting a program counter transfer (except for few cases): there is a need to release all resources that were allocated when the transfer occurred. This cannot be done automatically in general. Therefore, killing works only if all the code to be supervised contains tests on kill requests and handles them, much the same as with cancellation. Thus, it is not possible to just take an existing piece of code (with library calls in it) and enclose it in some try block. Solutions:

a per-thread flag that tells the kernel to abort system calls when a signal is pending. In order to support non-killable sections, there would be a need to clear/restore the flag.
a per-thread (kill) flag that tells the kernel to abort slow system calls when it is set on entry or becomes so during suspension. It is similar to the interrupted status of Java threads. In order to support non-killable section (e.g., cleanup code), the effect of this flag need be enabled/disabled. A system call would be needed to test and another to clear it.
a new signal, that has no handler, and is kept pending until explicitly consumed (persistent signal). Slow system calls would abort when such a signal is pending upon entry, or occurs during suspension.
cancellation could be extended so as to delimit the scope of the activation of handlers. Currently, cancellation does not define a point in which cancellation can stop, but rather points in which cleanup handlers can be registered and removed. There would be a need for another type of cancellation other than asynchronous and deferred: inline. When this new kind is set, no handlers are registered, slow system calls are aborted when cancellation acted, and the program can test if cancellation has been done, and whether slow system calls have been aborted because of it. Reusing cancellation is a good idea because the code to detect cancellation requests is already in place in slow system calls. Library functions might have cleanup handlers in place. This means that they support traditional cancellation, but not the new one. If there were a means to reuse this, little changes would be needed to support killing. A special cleanup handler could allow to stop the execution of cleanup handlers, returning instead from the function that has registered it. This solution would leverage code that is already in place in slow system calls (to test cancellation requests), and would be consistent with the Open Group Base Specification not to introduce a new signal.
extend the feature of pselect() to the other slow system calls. This is costly.

They would allow to kill a single operation, while the minimum granularity (for the general case) now is the thread.

All these solutions remove the races described in the section of supervised blocks.

Note that not all blocking functions seem to be interrupted by signals (e.g., pthread_cond_wait()). Thus, if a signal were used to kill, it should interrupt them.

Aborted system calls would return with EINTR in order to allow the caller to know if they succeeded or not. This allows to know if resources have been acquired, and thus must be released when cleaning up.

A solution is persistent signals: a slow system call is aborted when a signal comes while the process is suspended in it, and also when, before blocking, a persistent signal has occurred. Alternatively, it could abort when a special kill flag is true upon entrance. It would have the same effect of blocking all signals, testing the flag, and if false execute the call and atomically unblock signals. Persistent signals would then not make handlers executed, but only abort slow system calls. This would be the paradigm:

    if (killFlag) ... kill
    (1)
    sem_wait(sem);
    (2)
    if (killFlag){
        (3)
        if (errno == EINTR){
            (4)
            kill
        }
        (5)
        sem_post(sem);
    }

Signal arriving at:

1: ok, it makes sem_wait() abort, and also if it occurs inside sem_wait()2: ok, it raises the killFlag, and thread can release sem
3: ok, duplicated signal
4: same
5: same

In order to protect cleanup code, signals can be blocked in it.

Persistent signals and kill flags allow to register resource allocation:

    (1)
    resourceAllocated = false;
    (2)
    sem_wait(sem);
    (3)
    if (killFlag){
        (4)
        if (errno == EINTR){
            (5)
            kill, but do not release sem
        } else {
            (6)
            kill and release sem
        }
    }
    resourceAllocated = true;

Signal arriving at:

1: ok, makes sem_wait() abort
2: ok, makes sem_wait() abort, and also if it occurs inside sem_wait()
3: ok, nothing
4: same
5: same
6: same

Basically, persistent signals interrupt the code, but since the handler does nothing, it lets the code proceed. Such signals make only the slow system calls abort. There is a need in the system calls to clear the signal. But clearing could also be done outside. Note that when there is a sequence of actions, and in them there are several cancellation points, there is a need to remember what calls have been interrupted to undo the ones that had not been interrupted.

The correct scheme after a system call is to test the kill flag because we can then handle the case in which the kill has arrived during or just after the system call. However, we can test if the system call has been interrupted, in which case we probably do nothing (except exiting), and let the test made at the next kill point to release the resource.

Here is a list of other possible improvements:

To have timers execute a pthread_cancel() (otherwise there is no way for a thread to time-supervise a sequence of actions and to cancel)
To have timers deliver thread-directed signals, and also for all the system calls that have a sigevent argument.
To allow pthread_cancel() to be called from within a signal handler (this is already possible, even if not documented).
To allow a process to tell what signals it wants to be generated only internally, so as that sending them from another process is forbidden, and doing so results in an error.
To know what signals interrupted a system call. Since several could have done it running their handlers while a system call was suspended, a signal set must be returned.
To tell in sigaction the thread to choose to get process-directed signals. An attribute can tell that the current thread gets them.
To be able to remove pending signals selectively, i.e., thread directed or process directed ones. Currently, when a pending signal is removed, and there are no thread directed ones, but there is a process directed one, that one is removed, while the removal of a thread directed one should remove none if there are none.
Per-thread signal handlers.
Not to interrupt system calls when a stop signal occurs.
To send SIGPWR to all processes when the computer is suspended (or hibernated), and SIGCONT when it is resumed. Also, suitable signals when a process is killed for any reason (run from a terminal that is closed, user logged off, system shut down, etc.).
To make the behaviour of system calls consistent when SIGSEGV and SIGBUS occur. E.g., they should always return an error without executing any operation (they could check they can write into return arguments before doing the operation).
To let a cleanup handler decide what to return when the thread is joined. It would allow to return exceptions when there are any.

Currently, signals can safely be used only in very few contexts (especially in a MT-process) because of the lack of these features.

Java

Java does not have a notion of asynchronously interrupting threads. It has a notion of interrupting them at well defined points during execution. When a thread is interrupted, an exception is thrown. This makes interruption blend seamlessly with the exception handling mechanism, allowing thus threads to set up exception handlers to catch them, deciding then to terminate or to repair and recover.

A thread can post an interruption request to another, placing it into an interrupted status that makes it resume with an exception at the next execution of one of a set of (blocking) methods (belonging to several classes). The same occurs if the thread is already blocked in one such methods.

Java does not have a forced thread kill. Thread.destroy() is not implemented. As a consequence, there is no way to cure an application that has a wild thread (as with POSIX threads).

Java programs can register shutdown hooks, which are threads that are executed when the program (actually, the virtual machine) receives a signal.

SIGINT, SIGTERM, SIGHUP make the shutdown hooks run. SIGQUIT produces a dump of the current threads and garbage collection statistics. The other signals abort the program, some with an error message.

signal	meaning	handler	thread
SIGHUP	recycle		yes
SIGINT, SIGQUIT, SIGABRT, SIGIOT, SIGTERM	graceful+forced kill		yes
SIGPWR, SIGXCPU	fast kill		yes
SIGSEGV, SIGBUS	memory access	yes
SIGILL, SIGFPE	illegal instruction, fpu exception	yes
SIGPIPE (same as EPIPE from `write()`)	pipe broken	ignore
SIGPOLL, SIGIO, SIGURG, SIGWINCH	I/O events		yes
SIGLOST	file lock lost		yes
SIGXFSZ	file size overflow	ignore
SIGALRM, SIGVTALRM, SIGPROF	time supervision	yes	yes
SIGUSR1, SIGUSR2	user-defined	yes	yes
SIGCHLD	child terminated		yes
SIGCONT, SIGTSTP, SIGTTIN, SIGTTOU	job control		yes
SIGTRAP, SIGEMT, SIGUNUSED, SIGSTKFLT, SIGSYS	debugger, or unused	ignore
SIGRTMIN+3 .. SIGRTMAX	user-defined		yes

system call	waits for	handler of waited signal	handler of non waited signals	returns
`sigwait()`	signals in argument	not executed	executed, wait not interrupted	signal
`sigwaitinfo()`	signals in argument	not executed	executed, wait interrupted	signal, `siginfo_t`
`sigtimedinfo()`	signals in argument, time	not executed	executed, wait interrupted	signal, `siginfo_t`
`sigsuspend()`	signals not in argument	executed	executed, wait interrupted	signal
`signalfd()`	signals in argument	not executed	executed, wait interrupted	`signalfd_siginfo`

system call	execution time	units
`pthread_sigmask()`	0.159088 μs	110.3 units
`sigemptyset()`	0.015451 μs	10.7 units
`sigaddset()`	0.006815 μs	4.7 units
`sigaction()`	0.153278 μs	106.3 units
`setjmp()`	0.008640 μs	6.0 units
`sigsetjmp()`	0.113223 μs	79.1 units
`longjmp()`	0.016844 μs	11.8 units
`siglongjmp()`	0.158201 μs	110.5 units
`pthread_testcancel()`	0.004869 μs	3.4 units
`pthread_setcancelstate enable/disable()`	0.034077 μs	23.5 units

system object	persistent (survives process end)	at process abort
file	yes	closed
file locks	no	released
temporary file	no	removed
named pipe	yes	close
unnamed pipe	no (when no references)	close
named semaphore	until reboot	nothing
unnamed thread semaphore	no	nothing
unnamed process semaphore	yes	nothing
message queue	until reboot	close
mutex, condition, barrier, rwlock, spinlock	no (shared by threads only)	nothing
socket	no	background close
shared memory object	yes	close
streams	depends	flushed and closed
child process	yes	defunct (re-parented to 1)
thread	no	terminated
timer	no	removed