Angelo Borsotti, November 2009
This document presents how to use signals in userspace Linux applications, showing several kinds of program patterns in C and C++ that employ them. Alternative (and better) patterns than signals are presented, too.
Signals are positioned in the threaded world, i.e., programs are assumed to be multi-threaded. Even a single-threaded program could call library functions that are multi-threaded without knowing it, and thus become multi-threaded itself.
Each section of the document first presents a functionality, and then the code pattern to use to implement it.
Some dedicated sections are devoted to digging into the hows and whys of the proposed solutions. They have a colored background like this. Readers who are not interested, or who have no time to spend on the details, can skip them.
Copyright (C) 2009 Angelo Borsotti.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
Copyright owner waives GNU Free Documentation License's obligation to include a copy of the licence text if redistributing the covered work or derivatives thereof.
Waiver: this document contains program examples, displayed in monospace font. Permission is granted to copy and modify them for inclusion in programs.
Disclaimer: the same disclaimers of warranty and liability stated in sections 15, 16, and 17 of the GNU General Public License apply to the contents of this document.
Signals were introduced when there was no threading, and had three purposes: to interrupt the flow of execution when it was impossible to continue it (synchronous signals), to control it (stop, continue, kill), and to carry on activities in parallel with the main ones (e.g., asynchronous I/O).
With the introduction of threading, the third purpose, and partly also the second, can be served better using threads.
Signals, while in principle a simple mechanism, in practice have a number of drawbacks and dark corners that make their use rather difficult. One of the drawbacks is that they interrupt the execution of a number of system calls that must then be restarted by the program. Another is that they interrupt execution of normal code asynchronously (instead of doing it at some defined places), and a further one is that their handlers can do a rather restricted set of actions.
This document shows how to perform the activities that traditionally have been done with signals, using signals when necessary, and using threads otherwise.
Signals notify a process of the occurrence of some event: the synchronous ones notify about violation of some condition regarding the instructions being executed; the asynchronous ones about some software or hardware event. Typically, applications handle signals in the following way:
Basically, there are three kinds of semantics for signals: to abort a sequence of actions, to continue it seamlessly, and to support a parallel thread of actions. The handling is then done either by interrupting a course of actions, or by using a parallel course of actions. The latter can be done either with signal handlers (when the operations to be done are very simple) or with dedicated threads. Signals were introduced in Unix long before threads, and allowed some form of threading within a single process to handle asynchronous events.
Signals need to be handled with signal handlers only in a few cases, e.g., when a thread waits for I/O events and signals together with pselect(), ppoll() or epoll_pwait().
Using a thread dedicated to handling signals is in most cases the same as using signal handlers, but without the restriction to run only async-signal-safe functions, and without the nuisance of having system calls interrupted. There are a few cases, however, in which handlers are needed. One is when synchronous signals occur. E.g., when accessing a memory-mapped region at an address that is not mapped, we get a SIGSEGV or a SIGBUS, and in the handler we can map it and return, thus keeping the code that accesses that memory completely unaware of this. We cannot use another thread, because the signal is sent to the offending thread, and, even if we could, we could not restart the offending instruction.
This table reports the suggested handling of signals:
signal | meaning | handler | thread |
SIGHUP | recycle | yes | |
SIGINT, SIGQUIT, SIGABRT, SIGIOT, SIGTERM | graceful+forced kill | yes | |
SIGPWR, SIGXCPU | fast kill | yes | |
SIGSEGV, SIGBUS | memory access | yes | |
SIGILL, SIGFPE | illegal instruction, fpu exception | yes | |
SIGPIPE (same as EPIPE from write()) | pipe broken | ignore | |
SIGPOLL, SIGIO, SIGURG, SIGWINCH | I/O events | yes | |
SIGLOST | file lock lost | yes | |
SIGXFSZ | file size overflow | ignore | |
SIGALRM, SIGVTALRM, SIGPROF | time supervision | yes | yes |
SIGUSR1, SIGUSR2 | user-defined | yes | yes |
SIGCHLD | child terminated | yes | |
SIGCONT, SIGTSTP, SIGTTIN, SIGTTOU | job control | yes | |
SIGTRAP, SIGEMT, SIGUNUSED, SIGSTKFLT, SIGSYS | debugger, or unused | ignore | |
SIGRTMIN+3 .. SIGRTMAX | user-defined | yes | |
Bear in mind that, while interrupts really interrupt the current process (unless it is running with interrupts off), a signal might not interrupt a process with the same promptness. Therefore, we cannot be sure that a process is scheduled promptly enough to get all the signals sent to it. Realtime signals, however, are queued, and thus at least are not lost.
Ideally, the signals that take little time to be served could be handled immediately, and the ones needing a long time (i.e., a time greater than the time between two arrivals of the same signal) could be queued by one thread and handled by another. However, signals are not meant to notify processes so frequently as to require such a front-end/back-end architecture.
Note that signals are somehow more asynchronous than interrupts with respect to the program that expects them. A program that uses interrupts to perform I/O has enabled a device to assert them: they occur at a point in time that lies in the interval from the enabling instant onwards (and often within some defined time). A kill signal is more unexpected, even if the program that wants to treat it has defined a handler for it. An event that makes a program terminate abruptly (e.g., a power failure) is even worse, because it cannot be handled and can leave data (e.g., disk files) inconsistent. To handle the latter, a stronger form of atomicity must be used, e.g., the writing of a block, which is likely either to occur entirely or not at all even in case of failure. Operations that are made atomic by using locks can be interrupted by a failure.
Signals are a simple means of communicating events without creating persistent objects. They were used for this before threads were introduced in Unix.
This is a list of program patterns in which signals come into play:
Note that, in accordance with the table above, most of these cases are handled with threads.
Since the disposition of most signals is to terminate the receiving process, possibly leaving persistent objects in an inconsistent state, signals need to be handled. Take into account that a process can send any signal to another, even the ones that are meant to be generated only internally.
All uses of signals in this document are shown with working examples. In them, frequently occurring actions are represented as functions that are hyperlinked. In actual programs, they need not be kept as functions; they can instead be inlined.
Error paths are indicated, but no error action is provided. Simple programs could handle them issuing an error message and then terminating; production-quality programs would provide appropriate error reporting and recovery.
This chapter describes how signals behave, and the basic techniques to deal with them.
Signals are sent either from processes to other processes, or by the system as a result of the violation of some condition (e.g., illegal instructions), or the occurrence of some event (e.g., the expiry of a timer). Sending signals shares some behaviour with interrupts:
The kernel very frequently (sometimes immediately, otherwise each time it switches from kernel mode to user mode, and at least at almost every timer interrupt) checks whether there are signals to deliver to running processes. If there are pending signals, the kernel takes one and consumes it:
The execution of the normal flow of control can be interrupted by several signals at the same time. It is up to handlers to block or ignore them so as to make the behaviour of a process manageable. E.g., a kill signal should make a process ignore further signals of the same kind until it has handled the current one (not only in the handler, but also in the process's normal code).
Signals are created:
Once created, a signal is pending.
When a pending signal is given to the target process and an action is taken, it is delivered.
If delivering makes a signal handler run, the signal is caught.
Signals that are handled without the use of a handler are accepted.
In this document, handled means caught or accepted.
The lifetime of a signal is the interval of time between its creation and its delivery (possibly consisting in ignoring or discarding the signal).
The disposition of signals when a process makes an exec() is the default one for all signals that are caught by the creator, and the same as the creator's for the others. In particular, signals that are ignored by the creator are initially ignored by the created process. This allows a shell that does not support job control to set the disposition of the interrupt signal to ignore when it creates background processes, so as not to interrupt them when interrupting the foreground process. However, the standard shells of Linux do support job control, and do not send interrupt signals to background jobs.
When the initial disposition (provided by the creator) is to ignore a signal, and a process wants to honor the creator's disposition to ignore signals, this is the scheme to be used:
struct sigaction sa;
sa.sa_flags = 0;
sigsetmost(&sa.sa_mask);                 // block most signals
sigaction(SIGxxx,NULL,&sa);              // read the current disposition
if (sa.sa_handler == SIG_DFL){
    sa.sa_handler = sig_int_handler;
    sigaction(SIGxxx,&sa,NULL);
}
Ignoring signals that are set to be ignored by the creator applies only to the job control ones, or to signals that are agreed upon between the creator and the child. Actually, it should be (and it is) up to a shell to decide what processes to kill when the user interrupts the foreground process, and not to the background processes to take care of it. The suggestion is then to use this only when there is a real need for it.
A process that is created with fork() inherits the disposition of the creator. A process created by clone() can optionally inherit the disposition of the creator. A process created by posix_spawn() can choose whether to inherit the ignoring of signals from the parent.
It is possible to create a process with all signals blocked (the mask is inherited across execv() and fork()). A process that is created with all signals unblocked can be killed before it decides to handle them. Since the signals that are ignored in the parent are also ignored in the child process, it is also possible to create a process with all signals ignored; such a process can then register handlers as it likes without being terminated by a signal before doing it. For processes created by the shell, there is no way: the shell creates them as it likes. For ones created by a user process, the solution is to fork, then block all signals, call exec, and unblock them in the child. The process then has the possibility of ignoring the ones it wants to (thus discarding any such signals possibly pending). It would be fine to ignore all of them at the beginning, so as to clear the pending ones, but it is unlikely that there are any. Note that the pending signals of the creator are not pending signals for the created one.
When a signal is sent to a process, a thread that does not block it or a thread that is waiting for it is chosen randomly, and the signal delivered to it. Signals that are directed to specific threads are delivered to them.
When several signals are pending, one is chosen to be delivered, with this priority ordering:
Signals can be sent using one of the system calls listed below. The API is not uniform:
system call | target | data |
kill() | process (thread group), process group, all processes | no |
pthread_kill() | thread | no |
sigqueue() | process | yes |
All other system calls that send signals (raise(), killpg() and tgkill()) are wrappers to these.
Note that there is no way for a thread to send a signal with data to another thread: the one system call that sends signals with data is sigqueue(), which sends signals to processes only. However, if there is only one thread in the process that has registered a signal handler for the signal, that thread will receive the signal and its data.
sigqueue() can send a pointer as the value. The pointer is an address in the user space of the sender. It can have a meaning in that of the receiver only if the receiver is the sending process itself, or has been created with fork() not followed by an exec(), or is a thread of the same process as the sender. In all other cases it does not, and it will likely cause a SIGSEGV if dereferenced.
sigqueue() allows sending signals with a value attached, not only for realtime signals, but for the others, too. In Linux, all signals carry the sender's pid and the data.
sigqueue() does not interpret 0 as the pid of the current process.
Note that there are no built-in means for a process to check that a process to which it is sending a signal can actually handle it. It would be nice if sending a signal could atomically return an indication telling if it succeeded (e.g., if the receiver has not set the disposition of the signal to ignore). This can be achieved by having the receiver send back another signal to acknowledge the handling.
After sending a signal to a process, always check the return status, since the receiver process could have terminated. Both kill() and sigqueue() return success when sending a signal to a zombie process (which obviously cannot handle it). Do the same when sending signals to threads: pthread_kill() returns ESRCH when sending a signal to a thread that has terminated but has not yet been joined.
A process can send a signal to another only if its real or effective user ID is the same as the real or saved set-user-ID of the other. The calls above return EPERM when the process does not have the privilege to send the signal.
Some signals are meant to be generated only internally (e.g., SIGSEGV). However, they can also be sent by other processes. The means to fend them off are described here.
A process receives signals when:
Note that a process does not receive any signal when the computer is suspended or hibernated, or the system is shut down (shutdown).
Signals can be blocked (i.e., left pending, not causing handlers to run until unblocked). Note that blocking signals does not prevent a process from handling them: a process can have threads that are waiting for them.
Signals can get blocked either because they are explicitly set in the process signal mask, or because the process is executing a handler registered with sigaction(). When a signal is delivered, the kernel adds the signals specified in sigaction() to the signal mask of the thread to which the signal is delivered, and restores the mask when the handler terminates.
To block signals:
sigset_t newmask, oldmask;
sigsetmost(&newmask);
sigaddset(&newmask,SIGxxx);    // a signal to block
sigaddset(&newmask,SIGyyy);    // another signal to block
pthread_sigmask(SIG_BLOCK,&newmask,&oldmask);
...
// to restore the previous state:
pthread_sigmask(SIG_SETMASK,&oldmask,NULL);
Note that the thread signal mask is counter-intuitive: bits are on for the signals that are blocked.
Blocking all signals except for SIGBUS, SIGFPE, SIGILL and SIGSEGV is achieved with:
void sigsetmost(sigset_t* set){
    sigfillset(set);
    sigdelset(set,SIGBUS);
    sigdelset(set,SIGFPE);
    sigdelset(set,SIGILL);
    sigdelset(set,SIGSEGV);
}
If a thread creates a child process or another thread, it makes it inherit its set of blocked signals.
Signals that are not handled because they are blocked are pending. Actually, all signals between generation and delivery are pending. There is one set of pending signals for the process, made of the process-directed signals, and one set of pending signals for each thread, made of the signals directed to that thread.
sigpending() returns the union of the set of signals pending for the process and the set pending for the current thread. It does not remove pending signals.
To clear a pending signal, sigtimedwait() can be used; alternatively, the signal's disposition can be set to ignore (and then reset back again). The two have different effects.
One solution is to use sigtimedwait(), which returns immediately if the timeout argument is zero, clearing a pending signal, if any. It removes a thread-directed pending signal, if any, and, if none exists, a process-directed one. If there are both, it removes only the thread-directed one:
sigset_t sigpend;
sigemptyset(&sigpend);
sigaddset(&sigpend,SIGxxx);
struct timespec ts = {0,0};          // do not wait
siginfo_t siginfo;
while (sigtimedwait(&sigpend,&siginfo,&ts) == -1 && errno == EINTR);
Another solution is to set the disposition of the signal to ignore and then back to what it was before. This removes the thread-directed pending signals of all threads and the process-directed ones:
struct sigaction sa, oldsa;
sa.sa_flags = 0;
sa.sa_handler = SIG_IGN;
sigfillset(&sa.sa_mask);
sigaction(SIGxxx,&sa,&oldsa);
sigaction(SIGxxx,&oldsa,&sa);
Note that a thread has no way to remove just a signal that is directed to itself: the first solution removes a process-directed signal if no thread-directed one exists, and the second removes all.
Note that a signal that is blocked and whose disposition is SIG_IGN remains pending. If its disposition is set to SIG_IGN again, it is removed from the pending signals.
Processes newly created by fork() have no pending signals; but exec() makes the process keep the signals that have been raised between fork() and exec().
A thread needs to delimit the intervals of time in which it receives signals, and handles them. It depends on the paradigm:
Delimiting can be done by unblocking/blocking signals, by registering/de-registering handlers, or by enabling handlers using per-handler flags. The first two have a similar cost; the third has almost no cost.
There is a need to make a distinction between internal and external signals. The internal ones are the ones that can occur as a result of some operation made by a thread. E.g., timers, memory accesses, completion of I/O operations. They are used in supervised blocks, and thus they must only be handled as long as a supervised block lasts. The external ones are requests made by other threads and processes. The remaining part of this section deals only with the external ones.
Sending a signal to a receiver thread is a two-step action, and each step has its own caveats:
Hooking the target means ensuring that the process to interact with is there and is the right one (i.e., not a homonym). This means waiting for it, or being notified when it starts, and being notified when it ends. This is not a problem for related processes, since parents are notified about the doings of their children. On the contrary, unrelated processes must communicate their creation. This can be done in several ways, as described below. Making a sender aware of the termination of the receiver is more difficult, because termination can occur unexpectedly. Some solutions are described below.
Communicating creation/existence:
- polling /proc for a process with a given name
- using inotify to wait for a message queue to be created. Message queues have fairly sized names (NAME_MAX: 255), and thus can support multiple installations.
- using inotify to wait for a file created in some known directory. It supports multiple installations.
Some of these means require a sender process to poll for an object to come up; they differ in the time spent polling. Polling for a process with a given name (i.e., scanning /proc repeatedly) is done until the process is created. Polling for a queue is done until the queue is created; the creation can be done by any of the participating processes, or even by some startup script; thereafter a sender can wait, without polling, for a message to come up telling that a receiver has started. The choice depends on how the application is made, but some means are better than others. E.g., the ones that allow multiple installations are better than the ones that work on a single one; the ones that do not require manual configuration are better than the manual ones.
This is the scheme for waiting for a message queue or a file to be created:
int fd = inotify_init();                         // create inotify file descriptor
if (fd == -1){ ... error }
int wd = inotify_add_watch(fd,path,IN_CREATE);   // path is the name of the directory
if (wd == -1){ ... error }
struct inotify_event* ev;
int evlen = sizeof(struct inotify_event) + pathconf("/tmp",_PC_NAME_MAX) + 1;
ev = malloc(evlen);
if (ev == NULL){ ... error }
for (;;){
    ssize_t siz = read(fd,ev,evlen);             // wait for notification
    if (siz == -1){
        if (errno == EINTR) continue;
        ... error
    }
    if (siz == 0) break;                         // end of file
    if ((IN_CREATE & ev->mask) != 0){
        if (strcmp(ev->name,filename) == 0){     // filename is the name of the file to watch
            break;
        }
    }
}
free(ev);
int res = close(fd);
if (res == -1){ ... error }
Message queues appear as files in the /dev/mqueue directory (to be mounted with mount -t mqueue none /dev/mqueue if it is not already mounted).
This is the scheme for finding a process with a given name:
int ret = 0;
DIR* dir = opendir("/proc");             // open directory of processes
if (dir == NULL){ ... error }
int len = offsetof(struct dirent,d_name) + pathconf("/proc",_PC_NAME_MAX) + 1;
struct dirent* entryp;
entryp = malloc(len);
struct dirent* res;
for (;;){                                // scan it
    if (readdir_r(dir,entryp,&res) != 0){   // get an entry
        ... error
    }
    if (res == NULL) break;
    if (entryp->d_name[0] == '.') continue;     // skip the ones that are not processes
    if (!isdigit(entryp->d_name[0])) continue;
    char buf[256];
    sprintf(buf,"/proc/%s/stat",entryp->d_name);
    FILE* fil = fopen(buf,"r");          // open stat file of this process
    if (fil == NULL){ ... error }
    int pid;
    char* comm = buf;                    // get the command that created it
    fscanf(fil,"%d %s ",&pid,comm);
    int len = strlen(comm);
    comm[len-1] = 0;                     // remove )
    comm++;                              // remove (
    fclose(fil);
    if (strcmp(comm,name) == 0){         // name is the name of the desired process
        ret = pid;
        break;
    }
}
free(entryp);
if (closedir(dir) < 0){ ... error }
This scheme can be adapted to look for processes that match some given attributes.
Communicating termination:
- wait(), waitpid() (for related processes)
- the process events connector (see /usr/include/linux/cn_proc)
- polling with kill() (e.g., with signal 0, which only probes for existence)
- watching /proc with inotify() to know when a process dies. Currently, this works only when the watcher and the watched are related processes; this is a Linux bug.
Note that using inotify() to be informed about process termination is not perfect: there is a race, because the time between process termination and the waking up of read() is not null, and so the sender could send a signal when the receiver is already dead. However, this should not be a big problem, because this time is much lower than the pid recycle time.
Sending a signal can only be done when the sender is sure that the receiver (pid) is the right one, and the receiver is there to accept the signal (and not to lose it). Since the sender has no cheap means to know that the receiver is there, signals can only be used when receivers are always ready to accept them.
Let's tackle the sending.
Threads and related processes know each other, and then can easily ensure that they exist when sending a signal: related processes exist until their fathers wait for them, threads exist until joined.
When processes are instead unrelated, a sender needs to know the pid of the receiver in order to send any signal. Even so, the pid could have been recycled (and the same pid be used by a homonym process). There are two alternatives to prevent sending signals to homonyms. The first is to monitor the existence of the receiver, and invalidate the receiver's pid held by a sender so as not to send a signal when the receiver no longer exists. The second is to use the data attached to signals to disambiguate between homonyms: the receiver process picks a value that is likely to be unique (e.g., the time in seconds), and sends it along with its pid to senders. Senders attach it to the signals they send as a token, and a receiver handles a signal only if the token is the expected one. Note that this solution requires collaboration from receivers. However, even if the right receiver has been so instrumented, the recycled pid could have been taken by any other process in the system, which is not instrumented, and which reacts to the (unwanted) signal (possibly aborting). Therefore, it is not correct for a sender to send a signal without knowing that the receiver is the right one: the sender must make sure of it in advance. How to monitor receivers is detailed below.
Even having a sure target, senders should take into account that receivers present windows of time in which they accept signals. These depend on what kind of synchronization protocol one wants to put in place: for a persistent one (like, e.g., semaphores), the receiver can block signals and use the realtime ones, which are queued and caught when the process decides to wait for them; for a volatile one (like, e.g., condition variables), it can set up a handler or a thread that waits for signals and handles them only when the process decides to have a look, and otherwise discards them. The two differ in what happens when signals are sent and the thread is not waiting for them.
The non-persistent one is not easy for senders, because they can hardly know when a receiver is accepting signals. Accepting a signal sent from another process without making known when it can be accepted is like providing an aleatory service: the sender sends and, if lucky, obtains the desired response. This forces senders to poll, sending signals until they obtain what they want (supposing that they can check it). Synchronization between processes is better done with means that are persistent, i.e., a sender sends or unlocks something, and a receiver comes to a waiting (or lock-acquisition) point in its own time, and if it comes after, nothing is lost.
Note: consider what happens with semaphores. They are persistent, and so the sender does not need to know that the receiver is already there. Named semaphores also live for a span of time, and addressing them when they are no longer alive returns an error. Semaphores can also be created before the processes that use them are there. Thus, one difference is that they have fewer problems with existence, and more with persistence. In other words, they are more persistent than processes, and thus are there when needed, but also when not needed (more than processes). A criterion: when the lifetimes of the communicating processes are such that, when the sender sends a signal, the receiver is mostly there, signals are better; otherwise, semaphores are better. If the lifetimes do not overlap, the only solution is semaphores. There is a paradigm in which two processes need to interact, which requires them to be alive at the same time (at least as long as the interaction lasts, which means that one has to wait for the other to be created if it is not there), and another paradigm in which one sends messages to another, which might not even exist at that time.
To eliminate the race that occurs when using inotify() to detect process termination, there would be a need for a dedicated system call that does the same as what is done between parents and children: keep a receiver zombie until all senders have waited for it, or removed the watch. I have the impression that there is a deficiency in *nix here: processes needing to send a signal to another should not be obliged to watch for the receiver to be alive; they should just send the signal and get an error reply if the other is not there, with virtually no problem of homonyms. If pids were recycled over a very long time (which is the case for 64-bit kernels), this problem would not exist. A solution is to have a process that creates the receiver and keeps it zombie as long as there are senders that need it. It would keep a list of senders, updated when a sender registers with it to receive notifications.
Slow system calls are interrupted by signals. When a system call returns EINTR, this means that it has not (entirely) fulfilled its task, because something happened which is not an error in the system call. Then either it is restarted, or it is abandoned because the overall operation has to be killed, or the interruption is ignored. Ignoring is appropriate only in cleanup (e.g., as cleanup of file operations, a file can be closed without checking the result, because nothing further can be done if the close fails). Restarting should be the default.
A number of I/O, file-locking, semaphore, etc. system calls (after having been silently interrupted) are restarted if the handler that interrupts them has been registered with the SA_RESTART option. Some 28 slow system calls are never restarted: the ones with timeouts, the ones for signals, and a number of others (see man 7 signal).
If, while a process is blocked in a system call, a signal handler runs and executes a longjmp() that makes the process restart its execution from another point, the system call is never restarted. When a signal arrives and the process is waiting in a slow system call, the system call is aborted, and possibly restarted if and when the handler returns normally. Handlers that are registered with SA_RESTART cause a system call to be restarted when the handler returns normally. Upon jumping, errno is not set to EINTR. However, do take into account that longjmp() is not async-signal-safe, and therefore it is plausible that the Linux documentation does not specify what happens when one is executed in a signal handler.
Some 15 slow system calls are interrupted when the process is stopped with a stop signal and then continued with a SIGCONT (see man 7 signal), making them return with EINTR when the process is continued (i.e., when it receives SIGCONT), except for sleep(), which returns a nonzero value. If several threads are suspended on such a call, all of them are interrupted. N.B.: the entries in man 7 signal that are tagged "Linux 2.xxx" list calls that are interrupted only in that version of the kernel. Since SIGSTOP cannot be blocked, ignored, or handled, and SIGCONT continues the process anyway, EINTR (i.e., interruption) must be tested after system calls.
It is not possible to know which signal interrupted a system call (except for the ones that wait for signals, like, e.g., sigwait()). However, something can be done by setting flags in handlers. Flags must be cleared before executing a system call, and tested after:
flag1 = 0; flag2 = 0;
...                                   // (1)
ret = syscall(...);                   // (2)
if (ret == -1 && errno == EINTR){     // (3)
    if (flag1) ...                    // (4)
    if (flag2) ...
}                                     // (5)
If a signal whose handler sets a flag occurs between (1) and (2), it sets the flag, but it does not interrupt the system call. If the handler runs within (2), then the test at (4) is correct. If it runs between (2) and (3), then it has not interrupted the system call, and (4) is not executed. The same happens if it runs between (3) and (4). In all cases, several handlers could have run, each setting its flag. E.g., a handler could have run between (1) and (2) and another within (2).
Flags are then not a reliable means to tell which signal interrupted a system call, but flags whose meaning is to kill the overall operation can still be used reliably. When a kill flag is true, it matters little whether it was raised before, during or after the system call. It should be tested before testing flags whose meaning is to restart the system call. (The opposite would make system calls restarted when they have completed successfully, and handlers have run before or after them.) Kill flags could be tested regardless of the system call returning EINTR (i.e., right after (2)). However, this does not improve the program much: the kill request will anyway be honored the next time it is tested. Moreover, the program would handle a system call that completed successfully as if it had failed.
N.B.: handlers could set an integer variable telling what signal they caught, but they would need to set it only if it has not yet been set by higher-priority signals, thus taking upon themselves the burden of managing priorities.
Note that a system call can have been interrupted by several signals.
The normal way to restart an interrupted system call is simply to execute it again. However, there can be special cases.
In some cases, an interrupted system call has partly fulfilled its task
(e.g., sleep(),
that returns the remaining time); when it has to be
restarted, likely it should be requested to do only the remaining part. This is
also the case of system calls with timeouts. Some system calls have a timeout
argument that is the absolute time, and some others the relative (elapsed) one.
They both denote a point in time beyond which the system call does not wait. If
the process is suspended, time still goes on and, when it is resumed, if time
has passed that point, the system call must fail. Absolute timeouts are simpler
to restart, but in theory they could expire after they have been computed and
before the system call to which they are passed is executed.
All system calls that have a timeout argument, except for select(),
clock_nanosleep(), nanosleep(), and sleep(), do not return the remaining time
when interrupted.
System calls that require an absolute timeout can simply be restarted. For the
ones that require a relative one, the absolute end time must be saved when the
system call is executed, the current time checked again when it is
interrupted, and the difference computed to check whether the allowed time has
expired. If it has not, the remaining timeout is passed; otherwise the system
call is aborted.
Restarting with the remaining time is done as follows:
// deliver in tend the current time plus ts
void timeend(struct timespec* ts, struct timespec* tend){
    clock_gettime(CLOCK_REALTIME,tend);
    tend->tv_sec += ts->tv_sec;
    tend->tv_nsec += ts->tv_nsec;
    if (tend->tv_nsec >= 1000000000){     // normalize
        tend->tv_sec++;
        tend->tv_nsec -= 1000000000;
    }
}

// deliver in ts the difference between tend and the current time,
// return -1 if negative, 0 if null, and > 0 if positive
int timediff(struct timespec* ts, struct timespec* tend){
    ts->tv_sec = tend->tv_sec;
    ts->tv_nsec = tend->tv_nsec;
    struct timespec tnow;
    clock_gettime(CLOCK_REALTIME,&tnow);
    ts->tv_sec -= tnow.tv_sec;
    ts->tv_nsec -= tnow.tv_nsec;
    if (ts->tv_nsec < 0){                 // normalize
        ts->tv_sec--;
        ts->tv_nsec = 1000000000 + ts->tv_nsec;
    }
    if (ts->tv_sec < 0){
        return -1;                        // time expired
    }
    return ts->tv_sec > 0 || ts->tv_nsec > 0;
}

... before a system call
struct timespec ts = max waiting time
struct timespec tend;
timeend(&ts,&tend);
...
ret = syscall(...);
if (ret == -1 && errno == EINTR){
    if (timediff(&ts,&tend) > 0) restart the syscall
}
It is possible to implement a library of functions that wrap the 28 system calls and restart them when interrupted. However, it is not advisable to give the wrappers the same names as the wrapped functions (and link the library before the standard libraries): that would impair the implementation of task kill with signals, because kill requests must be tested before restarting.
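Such a wrapper can be sketched as follows for read(). This is a minimal sketch: the names read_restart and kill_flag are invented here, and kill_flag is assumed to be a process-wide flag set by a signal handler elsewhere when the whole operation must be aborted.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* hypothetical process-wide kill flag, set by a handler elsewhere */
static volatile sig_atomic_t kill_flag;

/* restart read() when it is interrupted by a signal, unless a kill
   request is pending */
ssize_t read_restart(int fd, void* buf, size_t count){
    ssize_t res;
    for (;;){
        res = read(fd, buf, count);
        if (res >= 0 || errno != EINTR) return res;   // done, or a real error
        if (kill_flag){                               // honor the kill request
            errno = EINTR;
            return -1;
        }
        // otherwise loop and restart the call
    }
}
```

Note that the kill flag is tested before restarting, as required by the discussion above.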
Blocking signals and testing if they are pending allows to use signals in a non-event-driven way.
This is one of the solutions of the kill problem.
Most signals can be treated by having a thread that processes them. This allows
performing I/O and all other operations that are not allowed in a signal
handler. Realtime signals are queued by the kernel. (There is a limit to the
number of signals queued, which can be changed with setrlimit().) Concerning
the others, queuing is needed only when their inter-arrival times are lower
than the processing times, which is seldom the case. Otherwise, there is a need
to use a paradigm that is similar to that of drivers: having a top half and a
bottom half. By far the simplest way to implement it is to have a thread that
catches signals and queues them, and another that processes them. If the speed
of the former is not sufficient, queuing must be done in a signal handler, but
synchronization with the consumer thread is then quite difficult because there
are no system calls that can be used to do it. Luckily, in practice there is no
need for it.
The codes of realtime signals are between SIGRTMIN+3 and SIGRTMAX (the first three are used by Linux). Their default disposition is to terminate the process.
To wait for a signal:
system call    | waits for                 | handler of waited signal | handler of non-waited signals  | returns
sigwait()      | signals in argument       | not executed             | executed, wait not interrupted | signal
sigwaitinfo()  | signals in argument       | not executed             | executed, wait interrupted     | signal, siginfo_t
sigtimedwait() | signals in argument, time | not executed             | executed, wait interrupted     | signal, siginfo_t
sigsuspend()   | signals not in argument   | executed                 | executed, wait interrupted     | signal
signalfd()     | signals in argument       | not executed             | executed, wait interrupted     | signalfd_siginfo
To wait for a signal, sigwait()
is the best; to wait and get
the attached data, the best is to make a loop with sigwaitinfo()
,
cycling when it returns EINTR.
sigwait() does not run handlers, and does not need a handler
registered for the signals it is waiting on. It is as if it changed the
disposition of such signals to be accepted by it. It also overrides the
"ignore" (SIG_IGN) disposition. However, this seems to be a borderline case,
and thus it is better not to have the disposition set to "ignore" for signals
to be waited on.
N.B. sigwait() is meant to be called with signals blocked, and
returns with signals still blocked: this allows using them like events. If a
signal that is not in the set passed to a sigwait() call occurs
while the thread is suspended in it, the signal is handled according to its
disposition.
sigsuspend() is similar to sigwait(), but it is a
bit twisted: its argument denotes the complement of the signals to wait for.
Moreover, the signals to wait for must have a handler; otherwise, the default
disposition is used, which is mostly to terminate the process.
This snippet waits for a signal to come. While waiting for the signal, the other signals are served.
sigset_t mask, oldmask;
sigemptyset(&mask);
sigaddset(&mask,SIGxxx);                  // signal(s) to wait for
sigprocmask(SIG_BLOCK,&mask,&oldmask);
int sig;                                  // the signal that makes sigwait return
sigwait(&mask,&sig);
sigprocmask(SIG_SETMASK,&oldmask,NULL);
... process resumed
To devote a thread to handle a signal, a handler can be registered that re-sends the signal to the designated thread if it is caught by the wrong one:
void* thread(void* data){
    increasePriority();                   // increase thread priority
    sigset_t mask;
    sigsetmost(&mask);                    // block most signals for this thread
    pthread_sigmask(SIG_BLOCK,&mask,NULL);
    sigemptyset(&mask);
    sigaddset(&mask,SIGxxx);              // signal(s) to wait for
    int sig;
    sigwait(&mask,&sig);                  // accept the signal
    ...
    return NULL;
}

static pthread_t th;

static void handler(int sig){
    if (!pthread_equal(pthread_self(),th)){   // it has been caught by the wrong thread
        pthread_kill(th,SIGxxx);              // redirect it
    }
}

int main(int argc, char* argv[]){
    ...
    struct sigaction sa;                  // register the handler
    sa.sa_flags = SA_RESTART;
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGxxx,&sa,NULL);
    int res = pthread_create(&th,NULL,&thread,NULL);  // create the signal thread
    if (res != 0){
        ... error
    }
    ... here the signal is re-sent to the signal thread
}
This solution has a drawback, though. The thread that catches the signal, which could be any, could be executing a slow system call. Some slow system calls are never restarted when interrupted by a handler. They must then be restarted by the thread code. N.B. some pthreads functions are called in the handler. They are not in the list of the asynch-signal-safe ones, but are actually so in the Linux implementation.
There is another solution, in which the signal is blocked by all threads, except the one devoted to handle it so that the signal is delivered to it. The easiest way to do it is to block signals in the main thread before creating the others. But in so doing, any thread that creates a child with a fork() creates a child with that signal blocked, unless it unblocks it in the child.
void* thread(void* data){
    increasePriority();                   // increase thread priority
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask,SIGxxx);
    int sig;
    sigwait(&mask,&sig);
    ...
    return NULL;
}

int main(int argc, char* argv[]){
    sigset_t mask;
    sigsetmost(&mask);                    // block most signals for all threads
    pthread_sigmask(SIG_BLOCK,&mask,NULL);
    pthread_t th;
    int res = pthread_create(&th,NULL,&thread,NULL);
    if (res != 0){
        ... error
    }
    ... a fork here or in a thread creates a child with the signal blocked
}
We shall see that there is no way to overcome this drawback by attempting some other solutions.
None of these alternatives is a solution. The fact that the thread to which
a process signal is dispatched is chosen among the ones that do not block the
signal is a problem: a library that creates threads to make its job faster can
have such threads receive signals that other threads do not want to handle. A
program could have a main thread that does not block a signal, a dedicated
thread that blocks it, and possibly other threads that (e.g., being created
from within library functions) do not block it either. They could get the
signal instead of the main thread. The scheme to have only one thread catch a
signal is to block it in the main thread, which makes all threads created
afterwards (and not changing their signal mask) have it blocked, and to have
one dedicated thread wait for the signal. This has the drawback that the forked
processes would also have it blocked. Now, since it is undefined what thread
can get a signal, if we want to devote one to catch it, the signal must be
blocked in all threads. This also means that a forked child must unblock
signals if it wants to use the default dispositions. The conclusion is that
although it is possible to catch signals in the main thread and dispatch them
to some other dedicated threads (synchronizing with semaphores, Peterson's
algorithm, or wait-free data structures), this does not help, because we must
still block the signals to handle in all other threads (otherwise we are not
sure that the main thread catches them). We can then just as well use dedicated
threads and catch the signals directly with sigwait().
N.B.: By the way, this drawback is not so bad, because processes must block signals anyway when started, and unblock the ones that they choose to handle.
When dedicating a thread to handle signals, increase its priority:
void increasePriority(){
    int error;
    struct sched_param param;
    int policy;
    // get the current policy and priority of this thread
    if ((error = pthread_getschedparam(pthread_self(),&policy,&param)) != 0){
        ... error
    }
    if (param.sched_priority < sched_get_priority_max(policy)){
        param.sched_priority++;
        if ((error = pthread_setschedparam(pthread_self(),policy,&param)) != 0){
            ... error
        }
    } else {
        ... cannot increase priority of thread
    }
}
signalfd() creates a file descriptor on which signals can be
received, without a need for handlers. It can help when one wants to wait, with
a select(), for an operation on a file descriptor (e.g., an I/O operation) and
for a signal at the same time. There are no races: the signal is not lost,
since there is an atomic operation that allows waiting for and detecting the
signal at the same time. This supersedes the practice of registering a handler
that sends a byte to a pipe so as to get it with a select() or read(). That
practice works if the process waits for some input and wants to interrupt the
wait with a signal (but make sure that there are no races when cancelling the
input request after a signal occurs and is got with a select()). The running of
a signal handler, instead, works with all the slow primitives. This is a means
to make a signal persist until it is handled: that signal, which must be
blocked before calling select() or read(), is not lost if it occurs when the
thread is not suspended, and it is handled at the first select() or read()
executed.
pselect()
provides the same functionality: to wait on file
descriptors and also on signals. signalfd()
provides a unique
interface (i.e., a file descriptor) towards events, which is more flexible than
sigwait()
because it can be used in select()
. The
scheme is:
sigset_t mask, oldmask;
sigemptyset(&mask);
sigaddset(&mask,SIGxxx);                    // signals to accept
sigaddset(&mask, ... );
pthread_sigmask(SIG_BLOCK,&mask,&oldmask);  // n.b. not needed if already blocked
int sfd = signalfd(-1,&mask,0);
if (sfd < 0){
    ... error
}
struct signalfd_siginfo si;                 // returned info on the signal
ssize_t res;
res = read(sfd,&si,sizeof(si));
if (res < 0){
    ... error
}
if (res != sizeof(si)){
    ... error
}
if (si.ssi_signo == SIGxxx){
    ... handle signal
} else if (si.ssi_signo == ...){
    ... handle signal
}
close(sfd);
... clear any pending signals
pthread_sigmask(SIG_SETMASK,&oldmask,NULL); // restore the previous mask
The use of signals must be placed in the multi-threaded context: programs that are single-threaded must be implemented as if they were multi-threaded. A single-threaded process might call a library function that internally creates threads, and thus becomes multi-threaded itself (possibly without the programmer knowing it).
There are process-directed signals and thread-directed
signals. SIGSEGV, SIGFPE, SIGBUS, SIGILL, SIGSYS, and the ones generated with
pthread_kill()
are directed to specific threads; the others to the
process.
Each thread can block incoming signals on a per-signal basis: each thread
(including the main one) has its own signal mask. Blocking signals on a
per-thread basis is also the way to tell what threads get what signals: a
process-directed signal is delivered to a thread chosen between the ones that
do not block it or are waiting for it, if any. E.g., if two threads call
sigwait()
for the same signal, an unspecified one is chosen.
Moreover, each thread has its own pending signals.
The functions pthread_sigmask()
and sigprocmask()
apply to processes, threads, and signal handlers and deliver the same results
(even if the documentation states that sigprocmask()
has an
unspecified behaviour on multi-threaded processes). When a handler runs, the
signal mask is the one set by sigaction()
, or-ed with that of the
thread to which the signal has been delivered.
All threads share the same signal dispositions. E.g., sending SIGKILL to a thread (i.e., from a thread to another) kills the process.
Thread-directed signals can be handled in a thread specific way by threads that wait for them, whereas signals that are handled by signal handlers are treated in the same way even when they are caught by different threads (unless handlers distinguish among interrupted threads).
Signal handlers are a sort of shared resource. This is a problem only for signals that need a handler, and not for the ones that are accepted because we can have several threads waiting for the same signal, and handling them differently. Then, lay out a process-wide plan for using signals, defining what threads handle what process directed signals, and what signals are served with handlers. A thread that calls a library that sets a handler can disrupt other threads that relied on the handlers that were in force before. Then, in libraries, never register handlers.
Realtime signals have no predefined meaning, and thus can be used freely. However, in general they cannot be used as resources assigned to individual threads (e.g., for timers): some processes create a variable number of tasks, and with them it would be easy to run out of signals. Realtime signals must be assigned a process-wide role, or none.
A solution to provide per-thread signal handlers is to use a thread private variable to hold the per-thread function pointer to the handler:
typedef void (*funct_t) (int sig); static volatile __thread funct_t function; // per-thread function pointer void handler(int sig){ // unique, per-process handler if (function == NULL) return; // actual handler not present function(sig); // call actual handler } ... handler setting, e.g., in main()
struct sigaction sa; sa.sa_flags = SA_RESTART; sa.sa_handler = handler; sigsetmost(&sa.sa_mask); sigaction(SIGxxx,&sa,NULL); ... in a thread function = myhandler; // set the handler ... some action function = NULL; // reset it
Note that the (process-wide) handler is set only once, and all threads share
its registration flags, and also the set of signals that are blocked in it (and
thus in all per-thread handlers). This is not much of a restriction since these
settings are quite common. Threads can also have nested sections in which they
save, set new, and restore handlers. Note also that the per-thread handler
runs in the context of the thread that originally caught the signal (e.g.,
pthread_self() denotes the original thread, as do thread
private variables). This solution then applies to thread-directed signals that
need a thread-specific handler, and not to process-directed signals.
Note also that this does not work with existing libraries that set handlers
(that would not work anyway since they conflict with other uses of the same
signals, unless we redefine sigaction()
with a library that is
linked before libc
.)
While in theory it would be preferable to let each thread have its own handlers, in practice not having this is not a big restriction since handlers are not that much used.
Threads execute simultaneously with other threads, also when they have caught a signal, and are thus executing a signal handler. Therefore, several simultaneous executions of a same signal handler can exist even if the handler has been registered with no SA_NODEFER flag.
Signal handlers may run at any time (asynchronously) when signals are
unblocked in a piece of program. Some system calls (e.g.,
pselect(), ppoll(), sigwait()) are
called with signals blocked; they unblock signals only when waiting. In this
case, handlers are not run asynchronously since they can be invoked only in
well defined places in the program. When signals are delivered, if more than
one signal is pending, the system can (and usually does) run all the handlers
of such signals before returning to the execution of the normal code.
Signal handlers must be registered with sigaction()
. The
sigaction struct argument can be reused: it conveys only data to a
sigaction()
call, which does not use the struct afterwards. There
is a need to initialize all fields (e.g., the mask of blocked signals) before
passing it. Example:
void handler(int signo){ ... }

struct sigaction sa;
sa.sa_flags = ...;                 // any combination of flags
sa.sa_handler = handler;
sigemptyset(&sa.sa_mask);
sigaddset(&sa.sa_mask,SIGxxx);     // signals to block in handler
sigaction(SIGxxx,&sa,NULL);
Or:
void handler(int signo, siginfo_t* si, void* context){ ... }

struct sigaction sa;
sa.sa_flags = SA_SIGINFO | ...;    // or any combination of flags
sa.sa_sigaction = handler;
sigemptyset(&sa.sa_mask);
sigaddset(&sa.sa_mask,SIGxxx);     // signals to block in handler
sigaction(SIGxxx,&sa,NULL);
It is possible to set a handler for several signals and then use a switch in it to define the actions for all the signals handled. It is also possible to change the action for a signal by testing, in the handler, a flag that is set by the program (but care must be taken to avoid races).
In handlers:
- save and restore errno when system functions (or any other functions that change it) are called; even better, always save and restore it;
- declare flags shared with the normal code static volatile sig_atomic_t (however, any integer or pointer type can also be used);
- declare shared variables volatile; e.g., when a variable is initialized and a flag is set to tell it, both must be declared volatile (accesses to volatile and non-volatile variables can be reordered); this pattern works properly also when the variable's type is not sig_atomic_t, if the handler accesses it only after testing that it has been initialized; this constraint holds in the thread code only, since handlers do not interrupt themselves (unless they are explicitly told to do so);
- declare per-thread data __thread (e.g., jump buffers that are set by a thread and used by its handlers);
- for atomic updates, use the __sync* gcc built-ins, or similar libraries, e.g., libatomic_ops (or atomics in the upcoming C++0x standard when gcc will support it, i.e., some version beyond 4.5);
- declare per-thread flags static volatile __thread sig_atomic_t.
Since it is not stated whether the standard string functions are re-entrant, you should assume that they are not. If you plan to make handlers do something more than barely raising a flag, you may need to build a library at least to manipulate strings to debug the handlers.
Handlers can be told to use an alternate stack when running. This is not very useful in normal cases. (It allows handlers to run also when a stack overflow has occurred and a SIGSEGV or SIGBUS has been generated.)
Handlers can access global variables declared static volatile
sig_atomic_t, but, when they access other data (e.g., doubles, composite
data (structs), etc.), they may find them in an inconsistent state. When such
data need to be accessed in handlers, they must be updated by threads in
critical sections with signals blocked. Declaring a global variable
static volatile sig_atomic_t is the way to make it accessible
from within signal handlers and outside them without a need to block signals:
it ensures that reads and writes cannot be interrupted in the middle.
Sequential consistency is not guaranteed between a thread and a signal
handler that interrupts it (except when the handler is executed because of
abort()
, kill()
or raise()
). Actually, a
signal handler interrupts a sequence of steps that could have been reordered
during compilation or execution. The only guarantee concerns
sig_atomic_t
variables, whose accesses are never interrupted in
the middle. (I.e., loads and stores, when initiated, are carried out to
completion and the thread interrupted at the end of an access, but not in the
middle of it, which can be done for other variables whose loads and stores
involve several memory references.) Consider, e.g., a thread that updates two
variables in a known program order, the second one being a flag that tells that
the first has been changed. From within a handler, you cannot count on them
being updated in this order unless both have been declared volatile.
Most of the problems of handlers accessing global data are solved by having, instead, a thread that waits for signals, which means that once it has resumed, it can handle them in a thread context (in which it can use mutexes, for example).
The safest way is to make handlers do as little as possible, and then do what has to be done outside handlers.
Handlers must not call system calls that are not asynch-signal-safe, or
functions that are not so. There is a difference between thread-safe (aka
MT-safe) and asynch-signal-safe. A function that is not re-entrant can be made
so by protecting it with a mutex, provided that it does not call itself.
(However, a truly re-entrant function can be recursive, while one protected with
a mutex cannot.) Of course, a recursive mutex can be used (but a function that
uses a mutex probably does so because it uses global data, which makes it
intrinsically non-recursive; to allow it to call itself, it must be changed to use
automatic data). Functions that have a state, such as malloc()
, are
difficult to make re-entrant. They could be made so by temporarily blocking
signals inside them. A truly re-entrant function is also asynch-signal-safe,
while one that has been made so with an internal mutex is not
asynch-signal-safe. The difference, then, between a thread-safe and an
asynch-signal-safe function is that the former could access global variables
and protect them with a mutex (threads will sequentialize in accessing that),
while the second cannot. Such a function cannot be asynch-signal safe, because,
if it is interrupted, and a signal handler runs, it will deadlock (besides
waiting on a mutex not belonging to the list of functions that can be called in
handlers). Moreover, the latter could protect accesses to global variables
blocking signals, while the former cannot. (Blocking signals does not prevent it
from being executed contemporaneously by different threads.)
Note that a function (and thus a handler) that sets a global flag is probably not considered re-entrant by canonical definitions, but actually it is.
There is an exception to the rule above that forbids handlers to call asynch-signal-unsafe functions: a handler that does not interrupt one such function can call it. This can occur, e.g., with some synchronous signals. This holds between a thread and a signal handler that interrupts it: the handler can call one such function even if the very same function is called by other threads. I.e., the restriction holds only between a thread and a handler that interrupts it.
In practice, most system calls are actually asynch-signal-safe except for
the assignment to errno
, which can easily be handled in signal
handlers. (The list of system calls is contained in syscall.h
.)
pthread_sigmask()
is the same as sigprocmask()
(with some checks added), and therefore it is actually asynch-signal-safe.
pthread_kill()
executes a tkill()
, which is a system
call, and thus is actually asynch-signal-safe.
Handlers may be empty, in which case they serve only to interrupt slow system calls, or to ignore signals, or contain some code that accesses global variables, in which case they are almost always non-reentrant.
When a handler is in force in a program piece in which no asynch-signal-unsafe functions are used, such functions can be used in the handler because they will not interrupt themselves. However, this is a dangerous programming practice, because when the program undergoes some maintenance change it is easy to forget this hack, and add some asynch-signal-unsafe function calls to the program piece, thus breaking the program.
There is no need to do I/O in signal handlers, except for debugging and for making another I/O request in signal-driven I/O.
In a signal handler, write()
should not be used: it is better
to defer it to some thread. However, it could be used for debugging, and it is
better than printf()
, which could make the program abort (when the
signal interrupts a printf()
). A thread that is executing a
write()
and is interrupted by a signal handler that executes
itself a write()
may produce a partial output followed by that of
the handler. Blocking signals in handlers at least spares testing EINTR after
write()s. Linux guarantees atomicity of writes to pipes only (and
only when the amount of data is lower than a configured limit). A thread can
test if the data to be written have been actually emitted, and retry when that
is not so. However, this means that output can be intermixed, and there is
little that can be done, because locks cannot be used in handlers.
There is no built-in way for a function to tell if it is being executed in a handler or in the mainstream code of a thread.
A slow system call, called in a handler, returns with EINTR when another handler interrupts it.
Handlers should not interrupt each other: it only makes things more complicated, and adds little. Since handlers must not perform long operations, there is no point in interrupting each other.
When a signal is delivered to a process, a thread is chosen among the ones
that have not blocked the signal or are executing a sigwait()
for
it. If one such thread exists, then it is interrupted, and the signal handler
(if any) runs. When the handler runs, it blocks the signals specified when
registered by ORing them to the suspended thread signal mask. Note that while
the handler is executed, the thread is considered to be executing, too. (It is
just in another place, off the main road.) Actually, a thread can cancel
another without caring whether the latter is executing its mainstream code or
is in a signal handler.
There is some knowledge of the interrupted thread from within a handler.
Inside the handler, pthread_self()
and gettid()
deliver values that are the same as the ones delivered from within the
interrupted thread (but this is told nowhere, and these functions are not
asynch-signal-safe). However, not all thread functions behave the same when
called from within a thread and a handler. E.g., a handler cannot change the
thread signal mask permanently.
Handlers interrupting different threads can run in parallel. Moreover, a handler runs in parallel, with threads other than the interrupted one. This means that a handler can safely update data that are accessed also by the interrupted thread (including operations that need to perform multiple accesses, since a handler is atomic with respect to the interrupted thread), but not data that are accessed by other threads.
Library functions that internally need to handle some signal must restore the signal handling when they return. A library function:
In libraries, refrain from setting a handler, because it could disrupt signal handling in other threads.
Libraries must:
The default disposition of many signals is to terminate the process. A process should then register handlers or create threads for all these signals, or block them if it does not want to be terminated from the outside by them. There is a protection mechanism that makes processes receive signals only from other processes with the same real or effective UID. However, there can exist processes that need a stronger protection (e.g., processes that update important data, whose integrity must be guaranteed). This, unfortunately, cannot be fully achieved, because SIGKILL can never be blocked, but it can be to some extent. Let's call stray signals the ones that are sent by a process to another, but were instead meant to be generated only internally to it.
SIGSEGV, SIGBUS, SIGFPE, and SIGILL must not be blocked because the program
behaviour is otherwise undefined, unless they are generated with
kill()
. Even when they are generated by kill()
, there
is a chance that they are generated also internally. Therefore, they must never
be blocked, which means that either they have the default disposition, which is
to terminate the process, or they are caught by a handler. In either case,
another process can send them.
Stop signals always interrupt 15 system calls. SIGSTOP cannot be blocked, and SIGTSTP should normally not be blocked.
The suggested solution to cater for all this is:
Normal code here means statements that are not in blocks supervised by signals, i.e., ordinary thread code, including cleanup handlers. In cleanup handlers, these system calls should not occur, but should they do so, restart them.
Use of libraries:
Implementing a new library:
One such library can be used in normal code and also in supervised blocks implemented with threads.
Signals that are meant to be internal could also be sent by another process (stray signals). A handler, or a thread that waits for a signal, can tell if the signal was originated from within the same process or was sent by another process by executing:
if ((siginfo->si_code == SI_USER || siginfo->si_code == SI_QUEUE)
        && siginfo->si_pid != getpid()){
    ... external signal
}
where siginfo
is the argument of the handler or of
sigwaitinfo()
. The expression evaluates to true if the signal
comes from another process. However, if the signal is sent by another process,
it still has the effect of interrupting a slow system call if the thread that
catches it is suspended in one such call.
N.B.: A process can know if a signal has been sent to it by another, but a thread cannot know what other thread has sent it a signal. (The signals that carry along the pid have the process pid.)
Unfortunately, a handler executed while the interrupted thread was suspended in a slow system call cannot decide to restart or interrupt the call. (First, it does not even know that it has interrupted a system call, and second, registering the handler again with SA_RESTART from within a handler has no effect.) This would allow restarting the call when the signal was external, and interrupting it otherwise, relieving the caller of deciding it. But even so, there are system calls that are never restarted automatically.
There are 28 system calls that are never restarted automatically. They all return with an error value and errno equal to EINTR, except sleep(), which returns the time left. After a system call other than sigwait() (and the like), it is not possible to test whether the signal that interrupted it was external. It is not even possible to know what signal interrupted it, unless some (possibly misleading) indication is left by a handler that has run.
To cater to stray signals, there are the following alternatives:
Threads that wait for signals can discard the stray ones. There are 15 system calls that are interrupted by stop signals (a subset of the 28 ones). These calls must always be restarted. The additional 13 that are always interrupted are: pause(), sigsuspend(), poll(), ppoll(), select(), pselect(), msgrcv(), msgsnd(), clock_nanosleep(), nanosleep(), usleep(), io_getevents() and sleep(). SIGSEGV, etc. can occur at any place in the code, and thus there is a need to decide what to do with these 13 calls.
Solutions for stray SIGSEGV, etc. signals in normal code:
Signals that are not blocked and that have a handler that terminates the process have no influence on the solutions. In non-supervised blocks, no signal can occur except for SIGSEGV and the like, because the others are blocked (or have no influence). Since these are thread-directed signals, there is no thread that is waiting for them. The first solution needs a handler that kills the process, both in the case of normal code and of supervised blocks.
Let's see what alternatives we have for supervised blocks:
pselect(), etc.: if stray signals kill, nothing need be done (stop signals do not interrupt them); if stray signals are discarded, restart these 4 calls when no kill request is pending. If stray signals are to be discarded, system calls must also be restarted in normal code so as to be consistent, making the program discard them always. In supervised blocks (implemented with signals), interruption must be tested on 28+20 calls (the 20 being the additional slow ones that are interrupted when SA_RESTART is not set in their handler).
For supervised blocks implemented with a thread, in which the supervised event is a signal, the signal handler should normally cancel the thread. The signal must be got by the thread that creates the supervised one, e.g., waiting for it. Then, it can be discarded when stray. Keep in mind that threads that unblock signals can receive stray signals. Thus, either stray signals kill the process, or the thread that unblocks them must handle them.
Thus the decision is between: stray SIGSEGV and the like kill the process (15 calls to be restarted in normal code), or they are discarded (13 more calls to be restarted). Since killing the process is the normal thing to do with SIGSEGV and the like, it is pointless to spend the effort to restart 13 more calls. After all, there are already a number of kill-process signals, and four more should not be a problem.
Restarting all the system calls that are interrupted (almost all those that return EINTR) could be simpler to remember than restarting only some 15 of them, although they are many (20+28). If one does not remember which to restart and which not, one can simply restart every system call that returns with an interruption.
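The restart idiom itself is a short loop on EINTR; here it is sketched as a wrapper around read() (the wrapper name retry_read() is ours):

```c
#include <errno.h>
#include <unistd.h>

// restart read() while it is interrupted by a signal
ssize_t retry_read(int fd, void* buf, size_t n){
    ssize_t r;
    do {
        r = read(fd,buf,n);
    } while (r == -1 && errno == EINTR);   // interrupted: restart
    return r;
}
```

The same shape applies to the other restartable calls; sleep-like calls need the remaining time to be recomputed instead.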
Let's then face the problem of the other stray signals. Stray signals that are not SIGSEGV and the like can occur in supervised blocks, and they can be only the ones that are expected. In normal code they are blocked, and thus cannot occur, behaving as if they were discarded. Threads that wait for such signals can (and should) easily discard them. In supervised blocks they must then be discarded too:
pselect(), etc.: restart these 4 calls if no kill request is pending, and have handlers that set a kill request when signals are not stray. This problem does not happen with signals that are meant to come also from other processes throughout the whole program.
Note that it is not possible in general to register handlers for these signals that kill the process when the signals are stray because the process behaviour would not be consistent: sometimes it would kill when a stray signal comes, and some other times not. There are then two alternatives:
Note that, technically speaking, there would be no need to test the kill flag when a system call has been interrupted so as to decide whether to restart it or not. E.g., a system call that is not interrupted by stop signals, in a supervised block for external signals, is interrupted only by those signals, and can then always be restarted when interrupted. However, testing the kill flag after interruption on all system calls does no harm, and is a simpler rule.
Implementing a new library:
There is no built-in way to detect what signal interrupted a system call (if there were one, we could test whether the interruption was caused by a stop signal and retry, we could expect existing libraries to have used it, and we could write wrappers for such system calls that make them restart, so as to be sure that all the calls in the whole program restart). The analysis done before says to always restart the 15 system calls interrupted by stop signals. Moreover, a library can also restart the 13 additional ones. Alternatively, we could assume that system calls and libraries fail when receiving a stop signal, but this is not nice. A programmer, when implementing a new library, could instrument it, considering that it could be used in the following contexts:
In each case, we should provide cleanup handlers. Implementing a library that can be used in all these contexts is cumbersome, as it needs either to define the name of a kill flag, or to pass the codes of the signals that can abort the library function. Normally, only the first is to be followed. General libraries are not instrumented to support supervised blocks implemented with signals.
A general library cannot know that some signals are handled with SA_RESTART, and some not. Thus, it should restart all slow system calls, but this defeats the very purpose of SA_RESTART. I think that it is acceptable for a library to restart only the 28 ones. SA_RESTART allows testing only 28 calls upon stray SIGSEGV instead of 28+20. This is consistent with what the user code should do. Let's say that a library that restarts the 28 ones can also be used in an application that wants to ignore stray signals (and that consistently restarts the 28 calls itself), as well as in applications that want stray signals to abort.
This is the approximate cost of signal functions, measured on an Athlon 64 X2 4200 at 2.2 GHz, both in terms of absolute execution time and of number of units, where a unit is the time taken by a memory-to-memory copy:
system call | execution time | units
pthread_sigmask() | 0.159088 μs | 110.3 units
sigemptyset() | 0.015451 μs | 10.7 units
sigaddset() | 0.006815 μs | 4.7 units
sigaction() | 0.153278 μs | 106.3 units
setjmp() | 0.008640 μs | 6.0 units
sigsetjmp() | 0.113223 μs | 79.1 units
longjmp() | 0.016844 μs | 11.8 units
siglongjmp() | 0.158201 μs | 110.5 units
pthread_testcancel() | 0.004869 μs | 3.4 units
pthread_setcancelstate() enable/disable | 0.034077 μs | 23.5 units
signal(), sigset(), sigvec(), sigpause(), siginterrupt() (which has the same purpose as SA_RESTART), etc., are old functions, not to be used any more.
To debug signals, psignal() can be used to display a signal. Standard I/O can be used, but at the risk of making the process abort.
A main program (the main()
function of a process) that does not
change the default disposition of signals can be terminated by many signals
that are sent by other processes. When this is not the desired behaviour,
signals must be blocked, ignored, or handled.
This is the suggested scheme to cope with it. Basically, most signals are blocked throughout the program, and unblocked only when necessary, or are handled by dedicated threads.
The initial state of signals when a process starts is described here.
There are a number of cases in which a course of actions need be prematurely terminated (killed):
It has often been said that killing processes is not a good practice, but when there is something wrong with a process, the only thing to do is to kill it, hoping that it has damaged the other processes and data as little as possible. After killing, a verify/repair program can be run to mend damage. Not killing does not solve the problem: the only alternative to killing a process is to reboot the system, unless the process keeps quiet and does no harm.
Graceful killing and forced killing are two opposite concepts. However, both are needed. The former allows stopping something, keeping all data and other resources consistent, and therefore is the preferred one. The latter is the emergency one, which may require some repair to be done, but is anyway better than rebooting.
In Linux, forced killing is fully supported only on processes. I.e., a process can kill another unconditionally, or, even better, can request another to perform graceful termination, and, if it does not, forcibly kill it. Thread killing instead is either collaborative or imperative. There is no way to try the soft one first, and then the hard one. Moreover, a thread can disable cancellation. This means that, when there are parallel activities in a system that can execute a possibly unreliable program, they should be implemented as processes. E.g., some Web browsers open new tabs or windows using processes, so as to be able to kill them, should they get stuck in some plugin or applet. Threads can also run code that is provided dynamically by using dynamic libraries, and would then be fault compartments if they could be forcibly cancelled. Unfortunately, it is not so, and one reason could be that threads share data. Thus, forcibly killing one of them is likely to place the process in an inconsistent state.
Killing implies a temporal relationship between the killer and its victim. With long-standing victims, this is not a problem. E.g., a process that enters an endless loop and stops producing results can be detected by measuring the production rate (and seeing it zero), and then killing. With endless applications such as servers (daemons), there is no problem, either. In other cases, there could be a need to provide some means to make sure that killing at least does not kill something else.
Promptness of killing could be termed kill latency. This is the maximum time elapsed between a kill request and the moment the request is honored, like, e.g., the time between when you push the brake pedal and when the car stops. I have never seen strong requirements for very low kill latencies (e.g., lower than a second). Perhaps the lowest latency is needed when SIGPWR occurs, which could be due to switching over power supply to a UPS, or the battery of a laptop running low.
The killer of a process or an application is a process (possibly an interactive shell). The killer of a thread or a task is (possibly another thread of) its process. Process control is done by having a process that starts, monitors, and kills or restarts other processes.
When killing, some error conditions might be encountered during cleanup. It is much better to continue it rather than terminating the process, because, in so doing, there is a chance to restore properly the application, process, thread, or task state.
Applications, processes, threads, and tasks are objects ranging from the largest to the smallest, i.e., in increasing level of granularity. The finest is the task (shortest code), but the thread is the appropriate level for most cases, except perhaps for time supervision. Very seldom does a short sequence of actions need to be killed.
Needless to say, applications, processes, etc., need be designed, taking killing into account, in order to be killed.
When a process terminates, Linux performs some cleanup on the system objects
accessed by the process. Note that a process terminates because it aborts
spontaneously or is forcibly killed (there being no way to perform graceful kill
when processes contain endless loops), or calls exit()
.
system object | persistent (survives process end) | at process abort |
file | yes | closed |
file locks | no | released |
temporary file | no | removed |
named pipe | yes | closed
unnamed pipe | no (when no references) | closed
named semaphore | until reboot | nothing
unnamed thread semaphore | no | nothing
unnamed process semaphore | yes | nothing
message queue | until reboot | closed
mutex, condition, barrier, rwlock, spinlock | no (shared by threads only) | nothing
socket | no | closed in background
shared memory object | yes | closed
streams | depends | flushed and closed |
child process | yes | defunct (re-parented to 1) |
thread | no | terminated |
timer | no | removed |
Note that when a process that holds a semaphore lock aborts (or is forcibly killed), the lock is not released. Aside from the inconsistent states due to system objects, there are also ones due to application objects.
When the application is made of a father process that creates all the others, killing the application is the same as killing the father process. When it is made by a set of unrelated processes, there is a need to tally all of them and kill them. The kill request is sent to a process in the application that shuts down all the others in an orderly fashion. The order in which processes are killed depends on the application. In general, there are processes that can be killed independently from others (and thus simultaneously), and others that must be killed before others. This makes up the shutdown graph. The main constraint on it is to maintain consistency of what is external to the application, and not to cause any problem in the application itself. E.g., an application that is made of a pipeline of processes should shut down by killing first the input stage of the pipeline rather than the output stage.
Process killing depends on what the process does. In any case, a control thread has to be provided that waits for a kill signal:
- Perform the cleanup and call exit(), or make the main thread register an atexit() handler and the control thread execute only an exit(). This can be done when neither the main thread nor any other thread has races in changing the state concurrently with the control thread.
- Cancel the main thread with pthread_cancel() (which in turn kills all its threads and processes), wait for a given time, and then forcibly exit the process. This is the general solution.
The first alternative applies when there is a known and safe (from races) way to reset the interprocess state. The scheme is:
void* controlthread(void* data){
    increasePriority();                 // increase thread priority
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask,SIGINT);
    int sig;
    sigwait(&mask,&sig);
    ... cleanup
    exit(EXIT_FAILURE);
}

int main(int argc, char* argv[]){
    sigset_t mask;
    sigsetmost(&mask);                  // block most signals
    pthread_sigmask(SIG_BLOCK,&mask,NULL);
    pthread_t cth;
    pthread_create(&cth,NULL,&controlthread,NULL);
    pthread_detach(cth);
    ... do the job
}
The process to kill can be either a child or an unrelated process. Processes that were created (as children) can be re-parented, becoming then unrelated.
To kill a process, first, graceful kill must be attempted, and then forced kill.
The processes that provide graceful kill must be killed using the means they provide, which could be sending a signal, or a message, or any other means of interprocess communication. The scheme described here makes use of a SIGINT signal (but any other signal can be used, instead). If the means to kill a process is not documented, you can try sending it a signal such as SIGINT, SIGTERM, SIGQUIT, SIGABRT, or at worst SIGKILL, or even all of them in sequence.
Graceful process kill is done by cancelling its main thread, which in turn undoes what can be undone, and in particular cancels the threads that are alive at that point (and that have something to clean up), and the created processes, too. A process kill function that retrieves all the threads and cancels them would be handy, sparing the effort of keeping track of them. It could be used in simple programs as a convenience function, when threads can be cancelled in any order. But this requires retrieving all the threads of a process, which is not possible. Moreover, if threads can be killed in any order, then probably they have little to clean up, and in such a case there is no need to kill them: they disappear at process exit. A similar reasoning applies to child processes.
The snippet of code that kills a child process is:
{
    if (kill(pid,sig) == -1){           // pid: the one of the child to kill
        ... error
    }
    int i;
    for (i = 0; i < 3; i++){            // wait for the child to terminate
        pid_t w = waitpid(pid,NULL,WNOHANG);
        if (w == -1){
            ... error
        }
        if (w != 0) goto killed;
        struct timespec ts = {1,0};
        if (nanosleep(&ts,NULL)) break;
    }
    kill(pid,SIGKILL);                  // forcibly kill
    waitpid(pid,NULL,0);
}
killed: ;
In this example, the victim allows itself to be killed with a signal
sig
. Moreover, it is a related process, and thus must be waited
for. Waiting for its completion is done with waitpid()
. This
also allows detection of its termination. If it were an unrelated process, it
could still be killed by sending it a signal, but its termination must be detected,
either probing it with kill(pid,0)
or waiting for some reply. When
several children have been created, they can be killed simultaneously. See here.
The scheme is:
static pthread_t mainth;                    // tid of main thread
static sem_t contrReady;                    // to wait until control thread ready
static pid_t killer;                        // pid of killer process
static pthread_t cth;                       // tid of control thread
volatile sig_atomic_t kill_requested = 0;   // tell cleanup handlers external request
volatile __thread void* kill_thread = NULL; // tell synchhandler to kill thread only
volatile __thread sig_atomic_t failing = 0; // tell cleanup handlers thread is in error
static __thread failure_t* failure = NULL;  // pointer to exception object

// kill the process and all others in its group
void forceKill(){
    kill(killer,SIGCHLD);                   // send reply before suicide
    struct sigaction sa;                    // do not generate zombies
    sa.sa_flags = 0;
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD,&sa,NULL);
    kill(0,SIGKILL);                        // kill all the process group
}

// control thread to handle kill signals
void* controlthread(void* data){
    increasePriority();                     // increase thread priority
    sigset_t mask;
    sigsetmost(&mask);                      // block most signals for this thread
    pthread_sigmask(SIG_SETMASK,&mask,NULL);
    sigemptyset(&mask);                     // wait for kill signals
    sigaddset(&mask,SIGINT);
    sigaddset(&mask,SIGTERM);
    sigaddset(&mask,SIGQUIT);
    sigaddset(&mask,SIGABRT);
    sigaddset(&mask,SIGPWR);
    sigaddset(&mask,SIGXCPU);               // possibly others
    int sig = 0;
    sem_post(&contrReady);                  // reached the fork-safe point
    int beingkilled = 0;                    // to discard kill signals during killing
    for (;;){
        siginfo_t siginfo;
        sig = sigwaitinfo(&mask,&siginfo);
        if (sig == -1 && errno == EINTR){   // stop signals interrupt it
            continue;
        }
        switch (sig){
        case SIGINT: case SIGTERM: case SIGQUIT:
        case SIGABRT: case SIGPWR: case SIGXCPU:
            if (beingkilled) continue;
            beingkilled = 1;
            killer = siginfo.si_pid;        // all these signals carry the sender
            if ((siginfo.si_code == SI_USER || siginfo.si_code == SI_QUEUE)
                && siginfo.si_pid != getpid()){   // external
                kill_requested = 1;         // let cleanup handlers know it
            }
            pthread_cancel(mainth);         // cancel gracefully the main thread
            struct timespec ts = {(long)data,0};
            nanosleep(&ts,NULL);            // wait for graceful kill to happen
            forceKill();                    // kill the whole process group
        }
    }
}

// kill the process because of an error
void abortprocess(){
    failing = 1;
    pthread_kill(cth,SIGINT);               // request the control thread to kill
    pause();                                // no return possible here, wait to be cancelled
}

// catch synchronous signals
static int synchsiggot = 0;                 // to reckon that a signal occurred
static void synchhandler(int signo, siginfo_t* info, void* context){
    int errno_save = errno;                 // save errno
    if ((info->si_code != SI_USER && info->si_code != SI_QUEUE)
        || info->si_pid == getpid()){       // internal signal
        if (kill_thread != 0){              // thread wants to handle exception
            pthread_exit(PTHREAD_CANCELED);
        } else {
            if (synchsiggot){               // already got one: signal in cleanup
                forceKill();
            }
            synchsiggot = 1;
            sigset_t sigset;                // unblock signal for cleanup handlers
            sigemptyset(&sigset);
            sigaddset(&sigset,signo);
            pthread_sigmask(SIG_UNBLOCK,&sigset,NULL);
            abortprocess();                 // kill the process in error
        }
    } else {                                // external signal
        kill_requested = 1;
        pthread_kill(cth,SIGINT);           // kill process, then return
        errno = errno_save;                 // restore errno
    }
}

// first cleanup handler
void sigcleanup(void* arg){
    kill(killer,SIGCHLD);                   // send reply to killer
    exit(EXIT_FAILURE);
}

// kill processes in the specified array
void cleanupchildren(void* arg){
    pid_t* children = (pid_t*)arg;
    int nc;
    for (nc = 0; children[nc] != -1; nc++); // determine nr. of children
    int j;
    int k;
    for (j = 0; j < 6; j++){
        int sig;
        switch (j){
        case 0: sig = SIGINT;  break;
        case 1: sig = SIGQUIT; break;
        case 2: sig = SIGTERM; break;
        case 3: sig = SIGABRT; break;
        case 4: sig = SIGPWR;  break;
        case 5: sig = SIGXCPU; break;
        }
        for (k = 0; children[k] != -1; k++){    // kill all processes in array
            if (children[k] == 0) continue;     // done
            pid_t pid = children[k];
            if (pid < 0) pid = -pid;
            kill(pid,sig);
        }
        int i;
        for (i = 0; i < 3; i++){                // try, say, 3 times
            for (k = 0; children[k] != -1; k++){
                if (children[k] == 0) continue;        // done
                if (children[k] > 0){                  // child
                    pid_t pid = waitpid(children[k],NULL,WNOHANG);
                    if (pid > 0){                      // killed
                        children[k] = 0;               // clear its pid in array
                        if (--nc == 0) goto killed;    // last one
                    }
                } else {                               // unrelated process
                    if (kill(-children[k],0) == -1){
                        if (errno == ESRCH){           // killed
                            children[k] = 0;
                            if (--nc == 0) goto killed;
                        }
                    }
                }
            }
            struct timespec ts = {0,100000000};        // sleep 100ms
            nanosleep(&ts,NULL);
        }
    }
    struct sigaction sa;                    // do not generate zombies
    sa.sa_flags = 0;
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD,&sa,NULL);
    for (k = 0; children[k] != -1; k++){    // forcibly kill the survivors
        if (children[k] == 0) continue;     // done
        pid_t pid = children[k];
        if (pid < 0) pid = -pid;
        kill(pid,SIGKILL);
    }
killed: ;
}

// initialize the handling of signals
void siginit(){
    mainth = pthread_self();
    sigset_t mask, oldmask;                 // block most signals
    sigsetmost(&mask);
    sigdelset(&mask,SIGTSTP);               // leave stop signals unblocked
    sigdelset(&mask,SIGTTIN);
    sigdelset(&mask,SIGTTOU);
    pthread_sigmask(SIG_BLOCK,&mask,&oldmask);
    struct sigaction sa;                    // register handlers for synchronous abort signals
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sa.sa_sigaction = synchhandler;
    sigsetmost(&sa.sa_mask);
    int synchsigs[] = {SIGBUS,SIGILL,SIGFPE};
    int i;
    for (i = 0; i < sizeof(synchsigs)/sizeof(int); i++){
        sigaction(synchsigs[i],&sa,NULL);
    }
    sa.sa_flags |= SA_ONSTACK;              // support threads that catch stack overflow
    sigaction(SIGSEGV,&sa,NULL);
}

// create the control thread to handle signals
void sigsetup(){
    sem_init(&contrReady,0,0);
    pthread_create(&cth,NULL,&controlthread,(void*)10);   // 10: seconds to wait
    pthread_detach(cth);
    sem_wait(&contrReady);                  // wait for it to reach a fork-safe point
}

int main(int argc, char* argv[]){
    siginit();
    pthread_cleanup_push(sigcleanup,NULL);  // register first cleanup handler
    sigsetup();
    ... place here the specific actions of the program
    pthread_cleanup_pop(0);
}
Notes:
- The control thread, upon receiving a kill signal, executes a pthread_cancel() of the main thread, sleeps a bit, and then sends a reply signal (e.g., SIGCHLD) and a SIGKILL to the process group (which kills the current process and all its children). It is possible for a process to use SIGCHLD to notify termination also for non-related processes (providing that the processes that receive them do not attempt to wait for the senders). It could be possible, after having seen that graceful kill does not kill, to try sending a kill signal to all children and wait again before forcibly killing all of them immediately. It could provide some more chances to properly reset the interprocess state, but needs to retrieve the pids of all children, which is not easy, and likely does not pay off the effort. The sleep time could be passed as an argument when the control thread is created. The victim, after having created the control thread, waits for it to reach the point at which it is ready to accept signals.
- The first cleanup handler of the main thread sends the reply to the killer and executes exit(EXIT_FAILURE). If this is quicker than the control thread, it terminates the process, otherwise the control thread terminates it.
- The cleanup handler that kills the children sends them a graceful kill signal and polls for their termination (wait() does not support time supervision), and, if that does not occur, sends SIGKILL to them. I.e., the killer scheme above. If the victim has created several children, it can kill all of them at once, and wait for each of them. This step also kills children that have been re-parented on purpose. Note that it is up to the process to keep track of threads and processes, and then to kill the ones that are alive.
- A thread that wants to kill its own process because of an error executes a pthread_kill(controlthread,SIGINT).
- The handler for synchronous signals records the first occurrence (synchsiggot) so as to forcibly kill the process, should one occur again, because that would occur in a cleanup handler. Then the process is killed by sending a request to the control thread, and the current thread paused, waiting to be cancelled. This is better than cancelling the main thread on the spot because the control thread forcibly kills the process if it does not cancel quickly. The current thread cannot be exited or cancelled on the spot, also, because that could cause a race with process kill: when thread exiting is faster than process kill, the main thread could exit before being killed, and, when process kill is faster, thread exiting could not be executed. Since the thread is pausing inside a handler, the signal needs to be unblocked explicitly before pausing; otherwise its cleanup handler would run with the signal blocked, causing immediate process abort, should the signal occur again.
If the victim has several children to kill, it can kill all of them and then wait. Suppose there are two children: a quick and a slow one. If it waits for the quick one first, and then for the slow one, it will lose no time, and the same if it waits for the slow one first. However, if the waited-for one does not terminate, all will be blocked. The technique is then to make a non-blocking waitpid() call for each process and loop over all the processes to kill, noting the ones that have been killed, and allowing a few spare runs, after which forcibly kill.
When a process receives from another (e.g., a shell) a request to kill, and that process has children, the forced killing of the children could be left to the overall killing of the victim. However, forcibly killing a child in place (i.e., in the cleanup handler) allows more accurate control over the time to wait for its graceful killing with respect to an overall timeout. Moreover, we have to poll its termination anyway so as not to block cleanup. This means that forcibly killing it locally is not an extra cost. Additionally, it allows graceful kill to proceed.
To honor a kill request with cancellation, a thread must be created at the beginning of process execution, that waits for the kill signal (or any other interprocess communication means). Cancellation could also be started from within a signal handler that the main thread registers for it, and that after executing a pthread_cancel() executes a pthread_testcancel(), but this does not allow forcibly killing the process when the cancellation fails. Note that a process cannot issue a cancellation request to another: it can send a signal, or a message. It is possible to register a handler for the main thread that executes a pthread_kill(), but a handler has all too many restrictions.
When forking off a new process, unblock SIGINT in the child if a process is being created that does not adhere to this scheme (i.e., it does not have a thread that handles SIGINT). Note that when a process forks another, it cannot be sure about what signals are blocked at the time the fork is executed. Therefore, it should initialize the signal mask in the child (it may execute an executable that does not initialize the mask when started).
If a pthread_cleanup_push() is executed, and then a fork() (which is the case here, for example), the child inherits the cleanup handler stack, but has no means to pop the cleanup handler. This can only be solved by calling an exec() after the fork(). Do not mix cancellation and creation of clones. The separation of process creation between forking and exec-ing serves only to allow setting a few things in between, such as unblocking signals, redirecting files, etc. A fork() must almost always be followed by an exec(). The program stretch in between is also a minefield, much the same as a signal handler.
When a process is killed, and it knows that it has sent messages to queues that support purging of pending messages (which is a nice feature), the process can purge them as a cleanup action. But not all that is done can be undone. Even so, a victim can send messages to the processes it is interworking with to inform them that it is quitting. Such processes can then clean up the pending transactions they have with the victim. Moreover, as a general rule, processes that are interworking with others must be prepared to detect the disappearance of their partners, and to perform the necessary cleanup of pending transactions, such as discarding messages that have been sent by a process that is quitting. In order to do that, heartbeat messages must be passed between interworking processes (or some process monitor used).
Scheme of creation of a child process:
pid_t child_pid = fork();
if (child_pid == (pid_t)-1){
    ... error
}
if (child_pid == (pid_t)0){                 // child process
    pthread_sigmask(SIG_SETMASK,&oldmask,NULL);   // initial mask of father
    ... file redirection, etc.
    ... only async-signal-safe functions
    if (execl(exe-path,exe-name,arg,NULL) == -1){ // one of the exec functions
        ... error
    }
}
pthread_cleanup_push(cleanup,(void*)child_pid);
...
child_pid = waitpid(child_pid,NULL,0);
pthread_cleanup_pop(0);
Scheme of creation of several children:
pid_t children[n];                          // n: number of children + 1
memset(children,-1,sizeof(pid_t)*n);
pthread_cleanup_push(cleanup,(void*)children);
pid_t child_pid = fork();                   // create child
... exec as above
children[0] = child_pid;                    // record pid; if re-parented store -child_pid
... create other children
// wait for the created processes to complete
int k;
for (k = 0; children[k] != -1; k++){
    if (children[k] > 0){                   // child
        pid_t pid = waitpid(children[k],NULL,0);
        if (pid == -1) ... error
    } else {                                // unrelated process
        for (;;){
            if (kill(-children[k],0) == -1){
                if (errno == ESRCH) break;  // terminated
            }
            struct timespec ts = {1,0};
            nanosleep(&ts,NULL);
        }
    }
}
pthread_cleanup_pop(0);
If we forcibly kill a process, and it has children that are still running, they become zombies. Better to forcibly kill them. There are no built-in means to send a signal to all of them. Note that nephews must also be killed, which means that we must reconstruct the whole process tree, unless we state that a process that wants to be forcibly killed must put all its children in some dedicated group (e.g., a new process group). It is possible for a process to place all its children into the same process group. This could be a solution, but it would not be general. E.g., a library that is called could create processes in another group. A process group is not the same as the group of children: processes can change their process group, and shells put all processes created with a pipe command in the same group. Scanning the /proc filesystem makes it possible to find all the descendants of a process. Note that this is needed only when forcibly killing, because with graceful killing a process knows what direct children are alive, and then kills them (and they in turn kill their children).
Now, we have two solutions: sending a signal (i.e., kill(0,sig)) to the process group (which is not perfect because the process group could contain processes other than its descendants, and some descendants could not belong to the process group of the ancestor), and killing all descendants (which is not perfect either because new children can be created in the meantime, unless perhaps we scan /proc until no descendants exist). But process groups were invented precisely to control processes, i.e., to send signals to groups of them. The problem is not so much that shells put processes created in pipes in the same group (the programmer knows that, and can use other means than pipes); it is that children can change their group. A process can set its process group (or the one of its descendants) to one that does not exist (practically, to its pid), or to one that is in the same session. This is mainly used by shells, and can be used by processes too, to set up groups of processes that can be killed (or stopped) with one single operation. It should then be the canonical way to kill children. Sending a signal to a process group is also more "atomic" than sending a signal to each descendant. Processes can decide to put children in a dedicated group with the purpose of killing all of them by sending a signal with a single operation.
Note also that the control thread sets the disposition of SIGCHLD to SIG_IGN so as not to have zombies. It is indeed impossible for it to wait for the completion of processes in the same group that are not direct children (waitpid() waits only for the direct ones).
In theory, when a child has been killed, the process or other children could create further children, making killing never end. However, this is very unlikely to happen, unless the process monitors its children and recreates them when they disappear. But then such a process should not do that in its cleanup handlers.
The main thread creates the control thread first. Later, it could execute a
fork() not immediately followed by an exec(). This would make a copy of the
process address space while the control thread is running, possibly at a point
in time in which the values in it are inconsistent. In particular, a printf()
could have filled a buffer and not yet flushed it. In order to avoid races, the
main thread then needs to wait until the control thread is in a safe state.
Note that pthread_atfork() handlers help only in avoiding deadlocks, not other
kinds of races. Note also that when it is not possible to make threads reach a
safe state, only asynch-signal-safe functions can be called between fork() and
exec(). Moreover, ensuring that the control thread has reached the point in
which it waits for signals guarantees that the kill signal is accepted and
served when the main thread starts to execute its actions.
When we want to kill a process, what is important is the state of the objects
that the process shares with other related, and unrelated, processes. Its
internal data and its threads are not that important; they become important
only if they access such objects. If exit() first stopped all threads and then
called the exit handlers, there would be no need to cancel threads: each module
would register an exit handler that performs interprocess cleanup.
Alternatively, the cleanup handlers could perform it. Since threads are not
stopped at exit() before the exit handlers are called, they need to be stopped
in the exit handlers so as to prevent them from changing the interprocess
state. But this means cancelling them. The exit_group() system call terminates
all threads, and is executed when an exit() is done. The exit handlers are
executed before the threads are terminated (they are executed before all other
actions such as file closing, flushing, etc.). Killing threads from within exit
handlers is not simple: it would require knowing the threads of the process,
which is not possible unless the process reckons them. Exit handlers also have
the restriction that they can be registered, but not de-registered. Again, this
means that when there is a simple, known way to restore the interprocess state,
and threads do not interfere with such restoration, this can be done in the
control thread or in an exit handler; otherwise, graceful killing must be done
(which could be lengthier because it also terminates threads and restores the
intraprocess state, which in some cases has little impact on the interprocess
state).
The killer has to supervise the killing with a timeout if the victim does not
provide a guaranteed termination — which in practice means always, since it can
seldom be sure about the victim. A process that has guaranteed termination can
be killed with kill -SIGINT, while one that does not needs to be killed using a
program, or with kill -SIGINT and then kill -9. A process that kills another
might need to get a reply to know that the other quit. Many scripts wait a long
time not knowing for sure that the victim quit, polling its termination, not
knowing how long to wait, or fearing to wait too little.
Many means can be used to synchronize killers with the victims' termination. It
is possible to implement an IPC with signals, better the realtime ones, which
can also carry some data. Signals have the advantage that no kernel object need
be created (and then destroyed, so as not to leave it in the system). Unnamed
semaphores can be used only between related processes, and need a shared memory
object. Signals can be used as semaphores among processes: multiple posts are
allowed (with a limit: the current number of signals that sigqueue() can queue
is 16K per user, but it can be changed with setrlimit(RLIMIT_SIGPENDING));
waiting is supported (also with time supervision, and waiting on several
signals), as is probing the presence of messages (sigpending()); and they can
also carry data. Of course, you can have as many semaphores as you want (but
their names are system-wide), while there is a fixed number of signals, but
they are process-wide. Among threads there are many synchronization means:
semaphores, mutexes, barriers, conditions, etc., so there is no need to use
signals. In the scheme above, there is no need to define handlers for those
signals, because the main thread blocks them at the process level and a thread
(the control one) waits for them. There is also no need for a loop to retry
sigwaitinfo() if we use a thread to wait for these signals (which there is in a
single-threaded program that accepts several signals). Thus, the killer thread
can send a reply to the process that sent the SIGINT (or whatever other kill
signal).
Abnormal termination
When a process terminates abnormally (e.g., because of a synchronous signal),
the exit handlers are not executed. This is a pity, because when a process
terminates normally it has a chance to clean up the interprocess state, and
thus it does not need exit handlers (although they are handy there too); exit
handlers would be much handier in abnormal exits. Abnormal termination is done
by raising signals. It is then sufficient to register signal handlers for them.
The signals are: SIGBUS, SIGILL, SIGEMT, SIGIOT, SIGSEGV, SIGSTKFLT, SIGSYS,
SIGTRAP. These signals do not interrupt system calls and make them return with
an error code. In general, a program can detect the arrival of a signal by
catching EINTR error returns, but these signals do not make calls return that
way. Their signal handler raises a signal for the control thread (so as to kill
the process gracefully) and then pauses (so as to prevent the offending thread
from continuing, possibly causing other signals to be generated or damaging the
process state). SIGABRT by default terminates the process abnormally, but it is
not caused by the violation of a condition (it is caused by abort()) and it is
delivered to the process, while the others are delivered to the offending
thread. It is then caught by the control thread. The (synchronous) signals
above can also be sent by one process to another (with kill()), which is not
nice. Thus, the signal handler discards the signal when it was sent by another
process.
One of these signals can be generated from within a cleanup handler. It must then be possible to know that we are inside a cleanup handler, so as not to invoke cancellation again and redo the very same actions that generated the signal. When one such signal occurs in the non-cleanup code of a thread, it triggers a graceful kill (since it is not possible to recover locally). The signal handler sets a flag to note that one of these signals has already been caught (and process kill initiated). If the flag is already set, the signal has been generated from cleanup code: in that case, when one of these signals is caught again by the signal handler, it triggers a forced process kill.
Note that process kill does not make use of exit handlers (the ones registered
with atexit()). They are more oriented to normal execution than to killing.
They spare programmers from remembering to perform actions at every exit(),
which could also be called within some library, giving users no chance to
perform such actions themselves (when an error occurs, some libraries call
exit()). Moreover, they cannot be de-registered.
Naive killing
A process could be killed by registering a handler that executes a
pthread_cancel() followed by a pthread_testcancel(). They are not in the list
of asynch-signal-safe functions, but can be called all the same from within a
signal handler. However, this is quite naive, because it does not allow
performing forced cancellation when the graceful one does not succeed.
Example cases in which thread kill is used:
Killing a thread can occur during process killing, or as part of the normal operation of a process. E.g., a process that starts a number of threads to speed up an operation using parallelism (like a search) can kill the threads once the operation has completed; or an application with a user interface can use a thread to execute some user requests, which the user could kill. In the latter case, the program should be prepared to cope with killed threads.
A thread can (try to) kill another by executing a pthread_cancel(). The victim:
- could honor the request at its next cancellation point (or explicitly, with pthread_testcancel()),
- could have already terminated, making pthread_cancel() return ESRCH,
- could be already exiting, making pthread_cancel() return successfully, but making pthread_join() return the value specified in pthread_exit() if the cleanup handlers were initiated by pthread_exit(), otherwise PTHREAD_CANCELED.
The victim can enable, disable and change its cancellation type dynamically. It could disable cancellation during initialization, enable it as deferred thereafter, and enable it as asynchronous in long stretches of computation. A thread that performs only computation and does not change shared data can terminate gracefully with asynchronous cancellation.
The difference between asynchronous and deferred cancellations is that the programmer is sure that with the deferred one, a thread is not interrupted between calls, and thus it can update data without protecting the operation.
When a thread changes the system or program state (e.g., it allocates a resource) just before a cancellation point, and should restore that state before the thread is cancelled, it must push a cleanup handler before the cancellation point. When a thread has restored the changed state, it must pop the cleanup handler. This is the paradigm:
allocate resource
push cleanup handler
...
call any function
release resource
pop cleanup handler
Changes to the system or program that must be undone are the ones that take the system or program in an inconsistent state, like, e.g., allocation of resources such as files or memory (that cause leaks if not released), partial updates of data, open communications with other programs, etc. Remember that the whole purpose of handling properly killing is to keep the system consistent.
It is difficult for a programmer to remember which functions contain cancellation points. This means that once the state has changed (e.g., a resource has been acquired), and the program then calls system or user functions, the programmer (not knowing if the called functions contain cancellation points) has to set up a cleanup handler. He could also record the state change (e.g., resource acquired) in some data structure, so as to make the cleanup handler release it parametrically. Note also that the called functions can evolve over time, and a later version could contain a cancellation point. This means that functions are potential cancellation points and must be considered as such. Enclosing calls in cleanup handler push/pop pairs ensures that proper cleanup will be done, whether or not the called functions contain cancellation points. Cancellation works if at least all system calls that contain blocking points are cancellation points. This is actually the case (with a few exceptions, see below).
E.g., suppose there is a process that locks a semaphore, does some operations, and then unlocks it:
sem_wait(sem);
...
... actions
...
sem_post(sem);
To kill it, nothing must be done if the cancellation request occurs before or
during sem_wait(), but the semaphore must be released if it occurs after it and
before sem_post(). Of course, in this simple example this cannot happen, but if
one of the actions contains a cancellation point it can. The solution is:
sem_wait(sem);
pthread_cleanup_push(fn,args);
...
... actions
...
pthread_cleanup_pop(0);
sem_post(sem);
The last two statements can be also swapped. As soon as the state is restored (e.g., resource released), the cleanup handler must be popped. Popping and releasing can be done in any order, unless releasing is blocking, in which case it must be done before popping.
Note that when a system call is interrupted by a signal that lets the program
continue, the state does not change, and no cleanup handler must be pushed.
This can be detected by testing for the EINTR error.
System calls can also be interrupted because they blocked the thread and a
cancellation request occurred. As far as I know, there is no means in a cleanup
handler to detect whether it has been called because of the interruption of a
system call or because of a pthread_testcancel().
When cancellability is enabled, pay attention when inserting statements that may contain cancellation points, such as trace statements. A solution could be to disable cancellation in the program parts in which they are used; another is to disable cancellation inside the trace statements themselves.
On the other hand, it is convenient to exploit the fact that deferred cancellation does not interrupt a function anywhere: there are parts that need not be protected (either setting a cleanup handler or disabling cancellation).
In order to make a thread responsive to cancellation requests, make sure that
long program stretches contain cancellation points. Some system functions are
cancellation points, and some others might be. When in such stretches only the
latter are called, insert calls to pthread_testcancel().
Pay attention to the scope of variables: take into account that a cleanup push/pop pair forms a block. Variables that are (re)defined inside it are not visible outside. E.g.,
pthread_cleanup_push(fn,args);
int res;
...
pthread_cleanup_pop(0);
if (res) ...    // error
When a program makes a sequence of interleaved state changes, e.g.,:
acquire resource 1
acquire resource 2
release resource 1
release resource 2
the simplest way is to enclose all this in a cleanup handler that releases what resources are actually allocated, reckoning them in some data structure.
Calls that wait on condition variables, pthread_cond_wait() and
pthread_cond_timedwait(), lock the mutex when they are cancelled. A handler
that unlocks it must be registered if the application needs to use that mutex
again. E.g.:
pthread_mutex_lock(&mutex);
pthread_cleanup_push(cleanup,(void*)&mutex);    // see below
...
pthread_cond_wait(&cv,&mutex);
pthread_cleanup_pop(0);

void cleanup(void* arg){
    pthread_mutex_unlock((pthread_mutex_t*)arg);
}
This is the suggested paradigm for cleanup handlers with mutexes:
pthread_cleanup_push_defer_np(pthread_mutex_unlock,(void *)&mut);
pthread_mutex_lock(&mut);
...
pthread_cleanup_pop_restore_np(1);
It works also with asynchronous cancellation.
POSIX barriers cannot be cancelled. When there is a need to cancel a thread that could wait on a barrier, the barrier must be implemented with semaphores. This is an implementation:
initialization:
    sem_t arrival;          // initial value 1
    sem_t departure;        // initial value 0
    int counter = 0;
    int n = number of threads that must come to the barrier

void cleanup(void* arg){
    int par = (int)arg;
    if (par) counter--;
}

void barrier_wait(int n){
    sem_wait(&arrival);
    counter++;
    if (counter < n){
        sem_post(&arrival);
    } else {
        sem_post(&departure);
    }
    pthread_cleanup_push(cleanup,(void*)(counter<n?1:0));
    sem_wait(&departure);
    pthread_cleanup_pop(0);
    counter--;
    if (counter > 0){
        sem_post(&departure);
    } else {
        sem_post(&arrival);
    }
}
A thread that has registered a cleanup handler and is cancelled when joining another, e.g.,:
pthread_cleanup_push(...);
pthread_join(th,NULL);
needs to join the victim in the cleanup handler in order to terminate it properly, and not to leave a zombie thread around.
It is true that detaching it would make its cancellation faster when the creator is also being cancelled, but normally cancellation is fast, and thus joining the cancelled thread is not such a big loss of time. Detaching has a problem: when the main thread exits, all its threads are terminated at once. So, when one detaches a thread, one must be quite sure that the thread terminates well before the main one.
See the scheme of killers.
To cancel a child process, a signal must be sent to it, and then its termination waited for. If that does not happen within a defined amount of time, the process must be forcibly terminated. The amount of time to wait depends on what the process is appointed to do.
See process kill.
A thread kills itself by executing pthread_exit(), which runs all the cleanup
handlers. If it has to kill other threads, it must do so before exiting, and if
it wants to hand its own cancellation over to another thread, it must then
pause() instead of exiting.
A thread that receives a cancellation request (from itself or from another thread) can handle it using exception handling:
extern "C" void* thread(void* arg){
    pthread_cleanup_push(cleanup,NULL);
    try {
        MyObject m;
        ....
    } catch (std::exception&){
        ... handle the standard exception
    } catch (...){
        ... handle cancellation exception
        throw;    // mandatory
    }
    pthread_cleanup_pop(0);
    return NULL;
}
When a cancellation request is sent to the thread, and a catch-all handler is present, the try block is exited, and the destructors of the objects in its scope called (e.g., MyObject m above); then the handler is executed. The handler must end throwing again the exception. Then the cleanup handlers are executed, if any. I.e. it is as if exception handlers were cleanup handlers. Note that cancellation exceptions terminate threads, i.e., there is no way to recover them and keep threads going on.
Calling pthread_exit() has the same effect as cancellation (except for the
value returned by the thread).
When deferred cancellation is enabled, code that executes only computation, including calls to functions that do not contain cancellation points, is not interrupted by cancellation requests. Remembering which system calls and system library calls are cancellation points is not simple. The OpenGroup Base Specification lists a number of calls that are cancellation points (at least in some cases), and a number of others that can be. All of them must be taken as cancellation points. The standard states that implementations must not introduce cancellation points in any other function it specifies. To stay on the safe side, it is better to check each function used by reading its man page.
Some synchronization functions, like, e.g., pthread_mutex_lock(), are not
cancellation points. They can be
used to perform operations that are not interrupted by other threads and also
by cancellation (i.e., critical sections). When cancellable calls should be used
inside one such critical section, the whole section must be protected disabling
cancellation. Note that deferred cancellation leaves to threads the
responsibility to honor cancellation requests, and threads should do it as soon
as they can, avoiding getting blocked indefinitely or even for long amounts of
time. The possibility for threads to use critical sections that are not
cancelled in the middle allows them to update safely data (or to perform
entirely some sequence of actions) thus preserving consistency of the program
state.
To protect a piece of code that contains calls that are cancellation points (or are suspected to be), cancellation can be disabled:
int oldstate;
pthread_setcancelstate(PTHREAD_CANCEL_DISABLE,&oldstate);
... non-cancellable section
pthread_setcancelstate(oldstate,NULL);
When a function is provided that wraps a library one (e.g., a malloc that performs some additional checks), pay attention to honoring the specification of the wrapped one as far as cancellability is concerned.
This allows us to block killing, do whatever operation we want without fear of being killed, and then restore the previous state. Note, however, that in stretches of code in which only calls that are not cancellation points are used, there is no need for protection. But sometimes one does not know whether a library function contains cancellation points. Suppose you want to implement a sort function, use threads to parallelize it, and use some synchronization calls that are basically blocking, but that, used in such a place, would block very little. To make it safe, you could either handle cancellation, or disable it.
pthread_setcancelstate(state,oldstate) sets the new state and atomically
returns the previous one. However, atomicity seems not to be needed here. If it
were not atomic, a thread could first read the current state, then set the new
one, and in principle be interrupted between the two. If asynchronous
cancellation is enabled it can indeed be interrupted, but then the thread is
terminated (running cleanup handlers, if any). The interface seems rather aimed
at having a unique point in which cancellation is disabled, and another in
which it is restored. It could be just a matter of convenience: an additional
parameter to a set function is less costly than a get function, and it is also
easier to use, since getting the value and then setting a new one is frequent.
Atomicity is needed when something could occur between reading the existing
value and setting a new one, as in, e.g., the implementation of mutexes with
fetch-and-add atomic operations.
During the execution of cleanup handlers, cancellation is disabled, and cannot be enabled (thus pay attention not to enter endless waiting). However, consider that killer and victim threads are cooperating, and then a killer should not issue two cancellation requests to the same thread (the second has no effect, and denotes poor coordination). Once a thread has started executing cleanup handlers, it becomes unresponsive to cancellation requests.
Cleanup handlers are executed when a thread is cancelled, and when it
terminates itself with pthread_exit(); the terminated thread can then be reaped
with pthread_join(). Cleanup handlers are not executed when exit() and _exit()
are executed (and neither are per-thread data destructors). This is so because
those calls are devoted to terminating processes. Objects that a process uses
to communicate with other processes can be cleaned up with atexit(). Leaving a
cleanup handler with longjmp has undefined behavior.
pthread_cleanup_push and pthread_cleanup_pop are asynch-cancel-safe: hence they
can be used when cancellability is asynchronous. Asynch-cancel-safe functions
are functions that can be called by a thread (including its cleanup handlers)
that has asynchronous cancellation enabled. There is no list of them, but
pthread_cancel(), pthread_setcancelstate() and pthread_setcanceltype() are
asynch-cancel-safe. Asynch-cancel-safe means that a function can be called when
asynchronous cancellation is enabled without causing anything to be corrupted.
Functions that are not guaranteed to be asynch-cancel-safe could leave the
process in an inconsistent state when cancellation is executed. Therefore,
enable asynchronous cancellation only when a thread is doing pure computation,
and let it honor requests elsewhere at cancellation points (or with
pthread_testcancel()).
pthread_setcancelstate() is not a cancellation point: this means that a
function that does not want to be a cancellation point can really avoid being
one. A thread can terminate itself with pthread_exit(), which executes all the
cleanup handlers on the stack. pthread_cleanup_push_defer_np() and
pthread_cleanup_pop_restore_np() push/pop cleanup handlers and also set/restore
deferred cancellability. sigwait() is a cancellation point. pthread_cancel()
returns EPERM when issued to a thread that has completed its cleanup handlers
and was joinable but not yet joined, and also when it was detached and has
terminated; i.e., it returns an error when it cannot cancel the thread. The
choices for a creator thread to synchronize with a created one to get its
results are:
joining it, if it is joinable; otherwise, if it has been detached
(pthread_detach()), another synchronization can be done, letting the creator
know when it can reuse shared data. Note that the latter is more flexible than
joining, because it allows waiting for several threads in parallel.
Cancellation does not change the detached or joinable status of threads (unless a thread changes that itself), and therefore the killer has to wait for the victims as it does normally when it does not cancel them. The best alternative when cancellation is done depends on what the thread does.
A killer and a victim proceed in parallel, and a victim could terminate independently from a killer. However, there are no races when killing is done because a thread that creates another kills it before joining or synchronizing, or at any time if it runs forever. There must not be threads that end spontaneously without notice.
After thread cancellation, victims continue to run, possibly executing some code before they reach a cancellation point, or executing cleanup handlers. But the killer wanted to kill them for some purpose. How can it know when its purpose has been achieved? The purpose could be to make the victims stop changing shared data, or stop consuming CPU, or memory.
Threads are closely related objects. Often, killing a created thread does not only require sending a cancellation request to it, but also adjusting the rest of the program to do without it. E.g., if there are 4 threads that must execute a parallel search for some result and then wait on a barrier, killing one requires adjusting the parameter of the barrier.
The scheme to cancel a single thread and wait for it is:
void cleanup(void* arg){
    pthread_t thr = *(pthread_t*)arg;
    pthread_cancel(thr);              // n.b. no error checking
    pthread_join(thr,NULL);           // no errors can occur
}

... piece of program that creates a thread
pthread_t th;
pthread_cleanup_push(cleanup,&th);    // N.B. no cancellation points before creation
pthread_create(&th,NULL,&thread,NULL);
... do something
pthread_cleanup_pop(0);
Thread cancellation is done without error checking because the thread could have terminated in the meantime, and cancelling a terminated thread returns an error. The registration of the cleanup handler is done before the creation of the thread because the thread can execute and even cancel its creator. If the registration of the cleanup handler were done after creation, there would be no cleanup handler to execute.
When there are several threads that must be cancelled, the killer can cancel
them all, and then wait for all of them. When a program piece creates a
variable number of threads and wants to cancel them, it can have a cleanup
handler that receives an array of tids as argument, cancels the ones pointed to
by it, and then waits for them all with pthread_join() (so as not to
sequentialize cancellation and waiting for each thread). The program piece can
reckon the thread tids in an array. The assignment of the return value of a
pthread_create() call is done unless the function returns an error (it is not a
cancellation point). This means that it is safe to reckon tids. The scheme for
cancelling threads in parallel and joining them is:
typedef struct threads_t {
    int len;                // number of elements
    pthread_t tids[0];      // their tids
} threads_t;

void cleanup(void* arg){
    threads_t* thr = (threads_t*)arg;
    int i;
    for (i = 0; i < thr->len; i++){
        pthread_cancel(thr->tids[i]);     // n.b. no error checking
    }
    for (i = 0; i < thr->len; i++){
        pthread_join(thr->tids[i],NULL);  // no errors can occur
    }
    free(thr);
}

... piece of program that creates several threads
threads_t* thr = malloc(sizeof(threads_t)+sizeof(pthread_t)*n);   // n = number of threads
thr->len = 0;
pthread_cleanup_push(cleanup,(void*)thr);
...
pthread_create(&thr->tids[thr->len++],NULL,thread,NULL);
... other threads created
pthread_cleanup_pop(0);
Note that thread cancellation is done without checking errors. This is so because, when the cleanup handler is called, some of the threads might have terminated, and cancelling a terminated thread returns an error. However, attempting the cancellation is simpler than reckoning which threads are alive.
A task is a supervised block. Supervised blocks are blocks of code in which a signal can occur that must be handled by terminating the block and cleaning up the program state. In languages such as C++ and Java, supervised blocks are bracketed statements made of a body and a number of sections (exception handlers) to which control is implicitly (and sometimes explicitly) transferred when an event occurs; after executing them, the block is terminated. In C there is no such construct, and therefore events are handled by checking their occurrence after any statement in which they can occur, or by jumping to the end of the block, or by cancelling the current thread when it coincides with the task.
A supervised block is meaningful when the supervised event can be generated during the execution of some actions occurring in the block, not with a signal that can occur at any time, even before or after the block, except for the case in which we want to protect the block so as to ensure that it performs a transaction (in which case when it is killed and it cannot complete the transaction, it rolls back). A means to ensure that a block is executed entirely is to block signals or to disable cancellation. A process that wants to support graceful kill must enclose all the code in a supervised block (which is done by default, if cancellability is used). As an example, consider time supervision, in which the supervised block starts exactly when a timer is started, and ends when it is stopped.
This is the scheme of a supervised block with some calls. E.g., a sequence of resources is allocated and then released in reverse order; the releasing also acts as cleanup code (when a kill request is detected, control is transferred to the proper place in the cleanup code). Cleanup is protected against interruption, whether it is executed as the last part of normal operation, or because of killing:
begin block
    ...
    allocate resource 1
    if killed goto r1
    ...
    allocate resource 2
    if killed goto r2
    ...
    disable killing
    // cleanup code
r2: release resource 2
r1: release resource 1
    enable killing
end block
Supervised blocks can be implemented by:
1. aborting slow system calls: pselect(), ppoll(), epoll_pwait() and read() on a signalfd(). A signal handler is also needed.
2. checking the occurrence of the event (e.g., a flag set by a signal handler) after each statement in which it can occur.
3. a transfer of control performed with siglongjmp(), when the block contains only asynch-signal-safe function calls that do not have any cleanup to be done (i.e., that do not allocate resources, and do not leave inconsistent data if aborted). Note that these conditions are similar to the ones of asynchronous cancellation. Note also that using this ad-hoc solution is risky, because the program could be modified in the future, forgetting these conditions. Enclosing a big bunch of code in a supervised block in which a signal performs a transfer of control to the end of the block is not a good idea because, even if non-re-entrant functions could be safely interrupted (which they cannot), no cleanup could be done, leaving the program in a bad state. Cooperative killing is instead the solution.
4. cancelling the thread that executes the block.
Solution 1 can be used only when there are no slow system calls other than the four ones above. Solution 2 implies polling, solution 3 is not general and should be avoided, and solution 4 is expensive. When solutions 1, 2 and 3 cannot be used, solution 4 is the only applicable one.
Solutions 1 and 2 can be used only for time supervision and other internal signals, and not for synchronous signals (e.g., stack overflow), since these cannot be handled by testing them at the next aborted slow syscall, but only by terminating the block (e.g., cancelling the thread). Solution 3 can be used for all signals, and solution 4 for any kind of event.
Cleaning up means releasing allocated resources, and undoing whatever else can be undone. There is then a need to reckon what resources have been allocated at any point in time during the execution of a supervised block. Knowing what resources a process holds at any point in time is in general a good idea: it allows the process to tell how much it is using, and of what. This can be done, e.g., by keeping a list of open files, etc. Note that the /proc/xxx files list some resources, but not all: e.g., there is no list of locked semaphores, and of course no resources are listed that are not Linux objects.
When supervised blocks that need a signal handler (albeit empty) are used, make sure that the handler is agreed among all threads that need to handle the very same signal, i.e., that there are no two threads needing a different handler.
When supervised blocks are implemented with signals, and the originator of the signal sends process-directed signals (e.g., a timer), make sure that there is only one thread at a time in one such supervised block, or that different signals are used. E.g., if there are several threads that need to do something when a signal occurs, they cannot all catch it, because only one will get it. Of course, it is possible to have a single thread that accepts that signal and re-sends it to all other threads (but getting all their tids is not simple, so there would be a need for a sort of registration of the threads that want to receive it). Note that this needs another signal because it is not possible to bounce back the same one.
In supervised blocks that are implemented with signals that are generated internally (e.g., by timers), those signals must be removed if any is pending when the supervision starts (so as not to kill the block should one such signal be pending while there should be none), and also when it terminates (so as not to cause problems to what follows, like, e.g., interrupt system calls should the signal be generated when exiting the block). For signals that are not generated internally, there is no problem because such signals can come at any time, and it makes no difference if they occur immediately before the supervision, or just after.
In supervised blocks that are implemented with signals, the handler:
In the following, only the first two cases are presented.
To allow killers to identify uniquely the instances of tasks to kill, tasks can be numbered (and task numbers recycled with a long recycle time), and kill requests accompanied by task numbers. This makes it possible to protect against killing tasks, implemented with threads, that have reused the tid of a terminated thread.
See here an in-depth discussion of the implementation of supervised blocks.
Here is the scheme:
static volatile __thread sig_atomic_t killFlag = 0;

static void handler(int sig){
    killFlag = 1;
}

struct sigaction act;          // register handler
act.sa_handler = handler;
act.sa_flags = 0;
sigemptyset(&act.sa_mask);
sigaction(SIGxxx,&act,NULL);

// supervised block
sigset_t mask, oldmask;
sigemptyset(&mask);
sigaddset(&mask,SIGxxx);       // block signal. N.B. not needed if already blocked
pthread_sigmask(SIG_BLOCK,&mask,&oldmask);
pthread_sigmask(0,NULL,&mask); // get the current mask with this signal ..
sigdelset(&mask,SIGxxx);       // .. unblocked
clearKill();                   // clear pending signals, only if generated in the block
...
struct timespec ts = {timeout,0};
for (;;){
    int res = ppoll(&fds,1,&ts,&mask);
    if (res == -1){
        if (errno == EINTR){
            if (killFlag) ... kill the block
            continue;
        } else {
            ... error
        }
    } else if (res == 0){
        ... timedout
    } else {
        ... an event occurred
    }
    break;
}
clearKill();                   // clear pending signals, only if generated in the block
pthread_sigmask(SIG_SETMASK,&oldmask,NULL); // restore previous mask

// clear pending signals
void clearKill(){
    sigset_t sigpend;
    sigemptyset(&sigpend);
    sigaddset(&sigpend,SIGxxx);
    struct timespec ts = {0,0}; // make sigtimedwait return immediately
    siginfo_t siginfo;          // remove pending signals, if any
    while (sigtimedwait(&sigpend,&siginfo,&ts) == -1 && errno == EINTR);
}
Having a signal handler is essential because ppoll() unblocks signals while waiting, and if a signal has the default disposition, it can terminate the process. In the code above, pselect() can be used with the same scheme.
If the convention is followed to keep (almost all) signals blocked in non-supervised (normal) code, there is no need to block the signals and restore the mask upon entry and exit of blocks. However, doing so does no harm.
To supervise a block of statements with polling, replace blocking system calls with timeouted or non-blocking variants, and test at each iteration whether the call timed out (errno = ETIMEDOUT or other timeout returns), did not succeed (errno = EAGAIN), or was interrupted: if the kill flag is set, exit the block, otherwise restart the system call. N.B. the poll loop must sleep for a while at each iteration.
We must test the kill flag often enough to have a responsive kill. This means testing when polling, and if there are long computations, testing it in them too.
This is the scheme of polling:
static volatile __thread sig_atomic_t killFlag = 0;

static void handler(int signo, siginfo_t* info, void* context){
    // if signal is stray, do not set the flag
    killFlag = 1;
}

// clear a pending kill request
void clearKill(){
    killFlag = 0;
}

// test if kill has been requested
int killRequested(){
    if (killFlag){
        errno = EINTR;
        killFlag = 0;
        return -1;
    }
    return 0;
}
// example of wrapping a timeouted system call (with absolute time)
int psem_twait(sem_t* sem){
    if (killRequested()) return -1;    // not to lose the time of one poll
    struct timespec tsn = {1,0};
    for (;;){
        struct timespec ts;
        struct timespec tick = {1,0};
        timeend(&tick,&ts);            // compute absolute deadline: now + tick
        if (sem_timedwait(sem,&ts) == 0) return 0;
        if (errno == ETIMEDOUT || errno == EINTR){
            if (killRequested()) return -1;
        } else {                       // some error
            return -1;
        }
        if (nanosleep(&tsn,NULL) == -1 && errno == EINTR && killFlag){
            errno = EINTR;
            killFlag = 0;              // clear kill request
            return -1;
        }
    }
}
// example of wrapping a non-blocking system call
int psem_wait(sem_t* sem){
    struct timespec ts = {1,0};
    for (;;){
        if (sem_trywait(sem) == 0) return 0;
        if (errno == EAGAIN){
            if (killRequested()) return -1;
        } else {                       // some error
            return -1;
        }
        if (nanosleep(&ts,NULL) == -1 && errno == EINTR && killFlag){
            errno = EINTR;
            killFlag = 0;              // clear kill request
            return -1;
        }
    }
}
// example of wrapping a system call that is interrupted by stop signals
int pepoll_wait(int epfd, struct epoll_event* events,
        int maxevents, int timeout){
    if (killRequested()) return -1;    // not to lose the time of one poll
    struct timespec ts = {timeout/1000,(timeout%1000)*1000000};
    struct timespec tend;
    timeend(&ts,&tend);                // compute deadline
    int nfds;
    int i;
    timeout /= 10;                     // poll 10 times within the timeout
    for (i = 0; i < 10; i++){
        nfds = epoll_wait(epfd,events,maxevents,timeout);
        if (nfds > 0) break;
        if (nfds == 0 || errno == EINTR){ // timeout or interruption
            if (killRequested()) return -1;
        } else {
            return -1;
        }
        if (timediff(&ts,&tend) <= 0) return 0; // deadline reached
    }
    return nfds;
}
... before a supervised block:

struct sigaction act;              // register handler
act.sa_sigaction = handler;
act.sa_flags = SA_SIGINFO;         // N.B. do not restart slow system calls
sigemptyset(&act.sa_mask);
sigaction(SIGxxx,&act,NULL);

// supervised block
...
{ // supervised block
    sigset_t mask, oldmask;
    sigemptyset(&mask);
    sigaddset(&mask,SIGxxx);       // unblock signal
    pthread_sigmask(SIG_UNBLOCK,&mask,&oldmask);
    clearKill();                   // clear pending request, if generated internally
    ...
    int res = psem_wait(&sem);     // slow system call
    if (res == 0){
        ... success
    } else if (errno == EINTR){
        ... killed
        ... block signal
        ... cleanup
        ... unblock signal
        goto endblock;
    } else {
        ... error
    }
    ... other actions
    pthread_sigmask(SIG_SETMASK,&oldmask,NULL); // restore previous mask
} endblock:;
N.B. blocking signals in cleanup code does not cause any race with a signal that occurs right before blocking it: cleanup is reached because one such signal has already been caught.
Unblocking is needed because the code outside supervised blocks normally has signals blocked.
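The polling wrappers above can be exercised in isolation. This is a condensed, self-contained variant of the psem_wait() wrapper (the kill flag is set directly here, instead of by a signal handler, only to make the behaviour observable):

```c
#include <signal.h>
#include <semaphore.h>
#include <errno.h>
#include <time.h>

/* Sketch of the polling wrapper: sem_trywait() is polled, and a kill flag
   (normally raised by a signal handler) is tested at each iteration. */
static volatile sig_atomic_t killFlag;

int psem_wait_poll(sem_t* sem){
    struct timespec ts = {0,1000000};  /* 1 ms poll period for the demo */
    for (;;){
        if (sem_trywait(sem) == 0) return 0;
        if (errno != EAGAIN) return -1;/* some error */
        if (killFlag){                 /* kill requested: abort the wait */
            killFlag = 0;
            errno = EINTR;
            return -1;
        }
        nanosleep(&ts,NULL);
    }
}
```

When the semaphore has a token the call succeeds immediately; when it is empty and the kill flag is raised, the wrapper returns -1 with errno = EINTR, mimicking an interrupted slow system call.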
Here is the scheme of supervised block in which a task is implemented as a thread, and the supervision is a time supervision:
typedef struct task_t {
    void* (*function) (void* arg);     // actual task function
    void* arg;                         // function argument
    void* retvalue;                    // function return value
    pthread_t th;                      // thread that executes the supervised actions
    sem_t sem;                         // synchronization semaphore with thread
    failure_t* fail;                   // pointer to failure object
} task_t;

// cleanup handler for thread
void cleanup(void* arg){
    task_t* task = (task_t*)arg;
    sem_post(&task->sem);              // tell task terminated
}

// thread for executing the supervised actions
void* thread(void* data){
    pthread_cleanup_push(cleanup,data);
    task_t* task = (task_t*)data;
    failure = task->fail;
    task->retvalue = task->function(task->arg); // call actual task function
    pthread_cleanup_pop(1);
    return NULL;
}

// cleanup handler for convenience function
static void cleanupt(void* arg){
    task_t* task = (task_t*)arg;
    pthread_cancel(task->th);          // n.b. no error checking
    pthread_join(task->th,NULL);       // no errors can occur
    sem_destroy(&task->sem);
}

// convenience wrapper function, that executes user function with timeout
int timedTask(void* (*function) (void* arg), void* arg, void** ret, int timeout){
    task_t task;
    task.function = function;          // store function pointer
    task.arg = arg;                    // store function argument
    task.retvalue = NULL;              // clear return value
    sem_init(&task.sem,0,0);           // initialize task termination semaphore
    task.fail = failure;               // store exception object pointer
    int res = 0;
    pthread_cleanup_push(cleanupt,&task);
    res = pthread_create(&task.th,NULL,&thread,&task);
    if (res != 0){
        return res;
    }
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME,&ts); // start supervision
    ts.tv_sec += timeout/1000;         // timeout in millis
    ts.tv_nsec += (timeout % 1000) * 1000000;
    if (ts.tv_nsec >= 1000000000){     // normalize
        ts.tv_sec++;
        ts.tv_nsec -= 1000000000;
    }
    res = sem_timedwait(&task.sem,&ts); // wait until task completed or timeout
    if (res == -1 && errno == ETIMEDOUT){
        res = 0;
        pthread_cancel(task.th);
    }
    void* status;
    pthread_join(task.th,&status);
    if (status == PTHREAD_CANCELED){
        res = ETIMEDOUT;
    }
    sem_destroy(&task.sem);
    pthread_cleanup_pop(0);
    *ret = task.retvalue;              // return task return value
    return res;
}

// example function that contains the actions to supervise
void* funct(void* data){
    ... actions
}

... code that defers the task to another thread and supervises it
void* ret;
int res = timedTask(funct,argument,&ret,timeout);
if (res){
    ... task timed out, or creation error
}
The thread that defers the task uses a semaphore as timer. This has the advantage that it can easily be created and destroyed, and it can be stopped by the task when it has ended before the time has expired.
This scheme contains a convenience function that can be used whenever there is a need to time-supervise a sequence of actions: they can simply be put in a function, which is then called. A similar technique can be used when the event to supervise is a different one. This example also supports exception handling.
If the supervision is not a time supervision, after having created the task thread, the creator has to wait for whatever event it must, and when it occurs, cancel the task thread. Note that the creator must use a system call that waits for two events to occur: the supervision one and the cancellation of the supervision made by the task thread.
The time supervision is started here by a thread that is not the one that executes the actions. This could be less precise than starting it from within the latter. However, that is not simple, and it is not precise anyway.
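The deadline computation inside timedTask() (a timeout in milliseconds added to an absolute timespec) can be factored into a small helper; note that the normalization test should be >= so that tv_nsec never reaches one full second:

```c
#include <time.h>

/* The deadline computation used by the timedTask() scheme, as a standalone
   helper: adds a timeout in milliseconds to a timespec and normalizes
   the nanosecond field. */
void add_millis(struct timespec* ts, int timeout){
    ts->tv_sec += timeout/1000;
    ts->tv_nsec += (timeout % 1000) * 1000000L;
    if (ts->tv_nsec >= 1000000000L){   /* normalize */
        ts->tv_sec++;
        ts->tv_nsec -= 1000000000L;
    }
}
```

It can be applied to the value returned by clock_gettime(CLOCK_REALTIME, ...) before calling sem_timedwait().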
To implement killing properly, either we instrument all the places in which a resource is allocated so as to treat killing, or we block killing where we do not want to treat it.
In library functions (and in object methods), killability could at most be disabled, but never enabled: the caller might want not to be killable.
The use of a general cleanup function is possible for process, thread and task kill. It can be used when killing consists only in simple operations like, e.g., releasing resources. A table of allocated resources can be kept. They are system resources, but could also be process-wide resources contained in some shared segment. The table would indicate for each entry the kind of resource, and for the user ones, it would also contain a function pointer to the release function. A thread would register into the table any new resource acquired, and remove from it any released. The cleanup code would then scan the table and release all the resources in it. This can be used globally (i.e., for an entire thread or process), and also locally (for a task).
However, there are cases in which simply releasing is not enough (e.g., to close some connection there would be a need to send messages). This can be solved by making the code that executes actions that need to be undone provide a function to do it, that needs to be registered too. This allows also to preserve information hiding (which would not be the case if the general cleanup function accessed the internals of all modules that need cleanup).
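A sketch of such a resource table, with a release-function pointer per entry (all names here are illustrative, not an established API):

```c
#include <stddef.h>

/* Sketch of the resource table described above: each allocated resource is
   registered with a pointer to its release function, and the cleanup code
   releases everything still registered. */
#define MAX_RESOURCES 32

typedef struct {
    void (*release)(void* arg);        /* release function */
    void* arg;                         /* the resource itself */
} resource_t;

static resource_t table[MAX_RESOURCES];
static int nresources;

int resource_register(void (*release)(void*), void* arg){
    if (nresources == MAX_RESOURCES) return -1;
    table[nresources].release = release;
    table[nresources].arg = arg;
    nresources++;
    return 0;
}

void resource_unregister(void* arg){   /* remove a resource released normally */
    int i;
    for (i = 0; i < nresources; i++){
        if (table[i].arg == arg){
            table[i] = table[--nresources];
            return;
        }
    }
}

void cleanup_all(void){                /* called when killed */
    while (nresources > 0){
        nresources--;
        table[nresources].release(table[nresources].arg);
    }
}

/* demo release function counting invocations */
static int released;
static void count_release(void* arg){ released++; }
```

In a multi-threaded program the table (or one table per thread) would have to be protected against concurrent access; that is omitted here.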
With all kill paradigms it is possible to reckon the resources allocated, and treat kill at the end of a block. Note that when the program detects a kill request, it must not continue, but jump to the cleanup code. Continuing could lead to unpredictable results, like, e.g., blocking (which should not happen with persistent events, but could with simulated persistent signals: to kill a sequence of blocking system calls, a sequence of signals must be sent until an acknowledge comes). Note that continuing does not occur when cancellation is used, because when a cancellation request is acted, the thread jumps to the cleanup handlers as soon as it encounters the first cancellation point, and does not continue any more with the code it was executing before cancellation.
Killing is an action that terminates an application, a process, a thread, a task, usually performed by another process or thread, and sometimes also by the same one. It has to be performed in such a way as to preserve system integrity, i.e., not to cause memory leaks, cluttering of system tables, generating unavailability of resources, etc. Since it is difficult, if not impossible, to guarantee this while aborting processes and threads at a random point in time during their execution, killing is usually performed with the cooperation of the victims (agreed resignation, graceful killing).
The basic properties of any killing scheme are:
These are the basic paradigms that are used to implement graceful killing of a sequence of operations:
{ // doit
    { // cleanup
        ... operation
        if (killed) goto cleanup;
        ... operation
        if (killed) goto cleanup;
        break doit;
    } // cleanup
cleanup:
} // doit
doit:
Note that cleanup code can then fall through, in which case killing is restricted to the operations contained in the block, or it can return to the enclosing block, possibly unwinding up to the main one of the process. The sending of the kill request can be done in several ways (raising the kill flag, generating signals, etc.).
begin_kill
    ... operation
    ...
    ... operation
on_kill
    ... cleanup
end
The difference with the previous one is that it has no explicit statements to check if a kill has been requested (except when there are long computations). Depending on the implementation, dedicated syntactic constructs or system calls are used. The former have the advantage that cleanup actions are close to the operations they clean up, and do not need to be put in some dedicated place or function. This kind of supervised block is provided by the longjmp scheme (which, however, cannot be used in general with signals), by thread cancellation (which performs unwinding), and by the try blocks of C++ (which cannot be used with asynchronous signals).
These paradigms can be implemented using:
Pros/cons of these solutions: pthread_join() needs a thread and a timer. Cancellation kills some system calls, such as pthread_join(), that are not killed by signals.
Deferred cancellation makes it possible to implement (graceful) killing because it does not interrupt the flow of instructions at any place, but only at some points at which it is safe to do so, and when cancellation is acted, it calls a cleanup handler that properly restores the state.
The only solution that can be used in general libraries is the cancellation one. Libraries that use the others can be used only in places that set up handlers, and therefore are special purpose libraries. Cancellation is the best solution, except perhaps for some limited cases in which other forms of supervised blocks can be used.
Linux implements cancellation with SIGRTMIN.
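Deferred cancellation with a cleanup handler can be demonstrated in a few lines: the victim thread blocks in a cancellation point (nanosleep()), the canceller requests cancellation, and the handler runs before the thread terminates with status PTHREAD_CANCELED:

```c
#include <pthread.h>
#include <time.h>

/* Sketch of deferred cancellation with a cleanup handler. */
static volatile int cleaned;

static void cleanup_handler(void* arg){
    cleaned = 1;                       /* resources would be released here */
}

static void* victim(void* arg){
    pthread_cleanup_push(cleanup_handler,NULL);
    for (;;){                          /* wait forever in a cancellation point */
        struct timespec ts = {1,0};
        nanosleep(&ts,NULL);
    }
    pthread_cleanup_pop(0);            /* not reached; pairs with the push */
    return NULL;
}

int cancel_demo(void){
    pthread_t th;
    if (pthread_create(&th,NULL,victim,NULL) != 0) return -1;
    pthread_cancel(th);                /* request deferred cancellation */
    void* status;
    pthread_join(th,&status);
    return (status == PTHREAD_CANCELED) && cleaned;
}
```

Note that the cancellation request is acted only when the victim reaches the cancellation point, so the cleanup handler pushed before the loop is guaranteed to be in place.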
Let's tackle the implementation of supervised blocks by considering first the races that occur when they are implemented naively by registering a handler that interrupts slow system calls and sets a flag, and testing before and/or after system calls the value of the flag.
All races fall in the category: signals lost, lack of atomic unblock+wait.
In books some others are reported, but are either related to the use of old
system calls (like, e.g., signal()
) or the use of functions that
are not allowed in handlers (like, e.g., longjmp()
). The race
is:
pthread_sigmask(SIG_UNBLOCK,...);  // unblock signal
...
if (killFlag) exit block           (1)
systemcall(...);                   (2)

void handler(int signo){
    killFlag = 1;
}
The race is that a signal can occur immediately after (1) and before (2), thus losing the signal.
Some races can be cured, though:

- alarm() followed by pause(): the alarm can go off before pausing. Use a longer time, or in this special case use siglongjmp() (and suspend all other signals while in the handler), or block SIGALRM and then call alarm() and sigwait(). The same applies when alarm() is used to supervise any other system call that is asynch-signal-safe. Time supervision of I/O can be better done with select() and poll(). When slow system calls have also a non-blocking version, use the poll solution.
- pause(): use sigwait() instead (a signal can occur between blocking and pause(), and be lost). Unblocking and waiting must be atomic, and so it is in pselect(), ppoll(), epoll_pwait(), etc., but not in other blocking system calls.

Races are defined in Wikipedia as results that are time dependent, while they should not be. However, in Wikipedia they seem restricted to the ones occurring on data when several processes read or write them, which is normally solved with the notion of atomicity. They are present also in other contexts, such as signals when we implement supervised blocks. In this case we want another thread of execution (the signal handler) to meddle with the supervised one, and want to avoid the races that occur in doing it (like the ones reported in this document). With supervised blocks, the tools we have are atomically blocking+waiting and unblocking+resuming signals.
The purpose of handlers aborting slow system calls like
pause()
, pselect()
, ppoll()
,
sigwait()
, and sigsuspend()
is to support supervised
blocks. For the other slow system calls it is of little use (except for
timeouted system calls): it cannot be used to kill an operation since it leads
to a race. Apparently, since the process has done something (i.e., executed the
handler), it should not continue with a slow call, but this seems a weak
reason. However, it is possible to re-suspend it at the exit of the handler,
which is achieved by registering the handler with SA_RESTART. Synchronous
signals by definition abort the operation that raised them. Aborting seems to
support the semantics of signals that serve to kill a sequence of actions.
A general fact is that a thread of execution (e.g., a thread, or a signal handler) has no built-in means to know where the other threads are in their execution, or what they are doing (like, e.g., whether they are inside a system call or not). Knowing their program counter would not help either. They must use some explicit means to inform each other. Doing something and informing another are usually two separate actions, and when they are, they need to be performed atomically, much the same as updating several data, to prevent another thread from running between them and reading inconsistent data. This can be achieved by using some means to mutually exclude the execution of threads. In the realm of signal handlers, the only means to do it is blocking signals, much the same as with interrupt handlers (which disable the served interrupt line). However, interrupt handlers can use spin locks to mutually exclude when accessing data that can also be accessed by their tasklets, but only because they can run truly in parallel with them, giving the tasklets a chance to release the locks. On the contrary, signal handlers do not run truly in parallel with the thread they interrupt, and therefore if they spin waiting for the interrupted thread to do anything, they cause a deadlock.

It is to be noted that blocking signals and performing some actions need not be atomic in most cases. When signals are blocked, handlers do not run, and then a thread can update data safely. However, if in doing this a thread needs to suspend itself, handlers would not run until the thread resumes and unblocks signals. If there is a need to let handlers run while a thread is suspended, then signals must be unblocked atomically while suspending and blocked again atomically while resuming. Atomicity is needed to avoid windows of time in which handlers can run while the thread is not suspended. Atomicity can only be achieved by having the kernel perform these operations.
The kernel does it when running handlers, and when executing few system calls.
That said, let's see then if a signal handler can perform cleanup. Cleanup
must be done when a thread has executed a system call or any function that
needs something to be undone, such as releasing a resource (e.g., a lock). A
signal handler then needs to know if the thread has completed one such system
call, or is before or inside it, and can do it only by testing some flag set by
the thread since it has no built-in means to do it (e.g., errno
in
them is not set to EINTR when handlers abort a system call). The thread could
block signals, clear a flag, do the system call, set the flag and unblock
signals. The problem occurs when the system call suspends it. In such a case
the handler has no chance to run, and if its purpose were to kill the
operation, it cannot fulfill it. There would be a need to unblock signals
during suspension. This is what pselect()
, ppoll()
,
sigwait()
, etc. do. But they do not reserve any resource, and
therefore do not need any cleanup. No other system call does it. If they did,
then handlers could test the flag above and from its value know if a system
call terminated successfully, and only in such a case perform cleanup. The
conclusion is then that cleanup cannot be done in signal handlers. Note,
however, that if system calls behaved like that, not only handlers could
perform cleanup, but also the calling threads by testing EINTR after system
calls. See here for more information.
Reckoning resources is not difficult if the program does not use such information from within handlers, but, e.g., in cleanup handlers, or if the program is not aborted asynchronously, but only in some known places.
Blocking signals while executing slow system calls, besides allowing resources to be reckoned, has also another purpose, unrelated to maintaining consistency in updating data: to defer the running of handlers until a slow system call is executed (one that unblocks signals while waiting). This avoids losing signals. We want to block signals beforehand because we want them unblocked only when waiting, or in other words, we want to be interrupted only when waiting, which is like registering handlers when waiting, which is what drivers do when suspending for interrupts. Like drivers, when suspending, signals need to be unblocked (which enables signals to be caught by handlers); when handlers run, signals are blocked again; and when handlers terminate, signals must remain blocked, and the thread resumed. In supervised blocks we want not to lose events (signals) that occur before suspension points (in addition to the ones that occur during suspension). The kernel handles signals much the same as interrupts: when interrupts are not enabled and a datum becomes available at the interface, the datum is kept, and an interrupt is asserted immediately when interrupts are enabled. Likewise, when signals are blocked and a signal arrives, it is kept, to run a handler as soon as signals are unblocked. To use this paradigm, there is a need to keep signals blocked, to be unblocked when slow system calls suspend the process. Note that the kernel handles all slow system calls in a similar way when deferred cancellation requests are acted: they do not produce an immediate effect, but interrupt slow system calls when they suspend the process.
Having renounced to perform cleanup in handlers, there is no strict need to
return from slow system calls with signals blocked. Re-blocking in
pselect()
is there in order to avoid to have a window in which
signals are lost after such a call and before another. But this can be
overcome: suppose pselect()
returned with signals unblocked, and
the program blocked them again before a subsequent call. A signal can come in
between, its handler run, which can set a flag. After blocking signals, the
program can test the flag and if set, kill. There would be no signals lost.
After all, a signal with a handler that sets a flag resembles a persistent
signal. Re-blocking signals would, however, spare all this (and make
programming less risky). But there is another reason to return from an
interrupted call with signals blocked, and it is that we want then to perform
cleanup, and in doing it we do not want cleanup code to be killed by a signal
(and system calls in it aborted). It is true that in cleanup code slow system
calls should be seldom used, but it is also true that we want to perform all
the cleanup code. Therefore, if we execute it with signals unblocked, at least
we need to set up handlers that do almost nothing, and restart system calls
(but there are some that are never restarted). If signals are not blocked, a
solution in cleanup code could be to test EINTR and explicitly restart system
calls. Another is to block signals explicitly at the beginning of the cleanup
code providing that empty or harmless handlers are in force (otherwise the
default disposition would kill the process, and no cleanup done). A signal
occurring just before blocking signals in cleanup code would request again
killing, which we are already doing.
A rule is that a program that registers a handler that can be invoked
asynchronously by the system (e.g., an interrupt handler or a signal handler),
must start the execution of such a handler with interrupts disabled or signals
blocked (or whatever makes the handler run disabled) so as not to begin a new
execution of the handler while the previous one has not yet completed. The
reason is that inside the handler, data are accessed, and two parallel
executions can mess this up. Of course, if the handler is re-entrant, there is
no danger in having two parallel executions, but this is seldom the case. This
is exactly what happens in signal handlers: signals (at least the kind
that invokes them) are blocked in them (unless this default is overridden). A
similar condition avoids races also in programs that call slow system calls: if
calls were resumed with signals blocked, we can safely handle their result
(successful or aborted) without bothering about being interrupted again by the
same signal handler, which could mess up things. However, this is not strictly
necessary: it depends on what the handler does. E.g., handlers that only raise a
flag cause no races. A handler that executes a longjmp()
normally
sets a flag to bypass it the next time.
Here it is shown that longjmp()
in signal
handlers is unsafe. This technique can be applied only in some few cases, but
it is generally to be avoided. The only case in which longjmp really helps
is when the block contains slow asynch-signal-safe calls
(accept()
, connect()
, open()
,
pause()
, poll()
, pselect()
,
readlink()
, recv()
, recvfrom()
,
recvmsg()
, select()
, send()
,
sendmsg()
, sendto()
, setsockopt()
,
sigsuspend()
, wait()
and write()
), but
it is dangerous with respect to program changes. Moreover, terminating abruptly
a sequence of instructions can leave data in an inconsistent state. When a
supervised block has only pure computation in it, signals can be blocked and
upon detecting any pending ones, the block terminated.
In the first solution of supervised blocks (pselect()
, etc.), a
(possibly empty) signal handler is needed because the default disposition is
either to ignore or to terminate the process, unless a thread has been created
and dedicated to handle kill signals. System calls return the EINTR error to
let a program know that they aborted. If there is a need to restrict killing to
some specific signals, all others can be blocked beforehand, and the wanted
ones unblocked while waiting.
The solution to avoid losing a signal in a supervised block is that of
pselect()
: signals are blocked, then in the block
pselect()
is called, which atomically suspends the process and
unblocks signals (to block them again when resumed). It is then possible to
test EINTR. Also ppoll()
and others share this behaviour. This is
a way to make signals persistent. Unfortunately, this feature is present only
in few system calls. E.g., there is no equivalent for a sem_wait()
.
Note that blocking signals and testing EINTR is exactly making the victim
commit suicide.
There is no way to handle killing safely when done asynchronously (i.e. without the help of the victim).
Some operations are atomic for signals, like, e.g., the normal system calls
because they are executed by the kernel in a context in which signals have no
effect on them. Note, however, that many system calls are wrapped by library
functions, and the ones that are not need anyway the execution of several
instructions before switching to the kernel context, and some after returning
from it. If a signal handler interrupts this and does not resume the process
where it was interrupted it leaves it in an inconsistent state. See the
discussion of calling longjmp()
in signal handlers.
When setting a supervision on a block statement there is a need to cater for an outer supervision. I.e. supervised blocks can be nested, and use the same signals for supervision. There is a need to save/restore things so as to make it work. With alarms it is even more difficult because we must compute the time remaining.
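With alarm(), nesting can exploit the fact that alarm() returns the number of seconds remaining of the previously set alarm. A sketch, assuming whole-second granularity is acceptable (the function name and structure are illustrative):

```c
#include <unistd.h>
#include <time.h>

/* Sketch of nesting time supervisions with alarm(): the seconds remaining
   of the outer alarm, returned by alarm(), are saved and restored (minus
   the time consumed by the inner block). Returns the seconds restored to
   the outer supervision, or 0 if there was none. */
unsigned nested_alarm(unsigned inner_timeout){
    time_t start = time(NULL);
    unsigned outer = alarm(inner_timeout); /* start inner, save outer remainder */
    /* ... inner supervised block ... */
    alarm(0);                              /* stop the inner alarm */
    if (outer == 0) return 0;              /* there was no outer supervision */
    time_t elapsed = time(NULL) - start;
    unsigned left = (outer > elapsed) ? outer - (unsigned)elapsed : 1;
    alarm(left);                           /* restart the outer alarm */
    return left;
}
```

The computation is approximate: up to a second can be lost at each nesting level, which is one of the reasons alarm() nesting is awkward compared with per-block timers.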
Programs that use time supervised or non-blocking system calls (instead of indefinite waits) can implement kill using a signal handler that sets a kill flag. They have no problem of "losing" the kill:
if kill flag ...          (1)
for (;;){
    timed wait            (2)
    if not timeout break;
    if kill flag ...      (3)
}
If the signal arrives after (1) and before (2), the timed wait is executed, but then the kill flag is tested again after it, and the effect of the signal is achieved.
This solution works because:
if (killFlag){                     (0)
    ... block signals
    ... cleanup, unwind
    ... unblock signals
}                                  (1)
for (;;){                          (2)
    ... acquire resource with timeout
    if not timeout break           (3)
    if (killFlag){                 (4)
        if (resource acquired){    (5)
            ... release resource
        }                          (6)
        ... unwind
    }                              (7)
}
If the signal arrives at:
Note that retrying must be done at each resource acquisition.
The solution then exists, but it implies polling. Moreover, it requires that polling be used in all inner pieces of software, including libraries. Note that if polling is not used, the risk of this solution is that the process blocks, losing the signal sent to kill it. The supervised-block solution (if it had no races) would instead work even when the inner software is not instrumented.
With the kill flag technique there is no need to provide some means to support nesting of supervised blocks: there is a need to enclose in a supervised block only blocking system calls. For code that executes without waiting there is always a chance to detect the kill request. Even when the kill flag is raised immediately after having tested it (and found it false), it does its job: it will be detected at the next test point. Logically, nested supervised blocks do exist. They can be simulated with explicit tests.
Note that only library functions that test kill requests and honor them can be called here.
This solution needs a per-thread kill flag, and requires that all functions, including the library ones, use non-blocking system calls instead of blocking ones. Since this is seldom the case, this solution can only be used in particular contexts, in which library functions that contain blocking calls are not used.
The polling solution can be done with signals blocked, or with signals unblocked:

With signals blocked, the condition that ends polling is errno = ETIMEDOUT (or any other timeout return), or an interruption, with a kill signal pending. For non-blocking system calls the condition is errno = EAGAIN, or an interruption, with a kill signal pending. Interruption occurs with signals that cannot be blocked, like, e.g., stop signals. In these cases polling ends and the system call does not succeed; otherwise it returns with success. When a system call has both a non-blocking variant and a time-supervised one, the non-blocking one should be preferred because it allows detecting a pending kill request immediately, while the other does it only after the first timeout has expired. Cleanup code is better not aborted by kill requests, and here it is not, since signals are blocked.

With signals unblocked, the condition is errno = ETIMEDOUT (or any other timeout return), or an interruption, with the kill flag set. For non-blocking system calls the condition is errno = EAGAIN, or an interruption, with the kill flag set. It is also possible to test that the sleep in the polling loop was aborted by a kill signal. In these cases polling ends and the system call does not succeed; otherwise it returns with success. This solution is more responsive because it can abort the sleep in the poll loop. It is also less expensive to test a kill flag or an aborted system call than a pending signal. Cleanup code is better not aborted by kill requests, and to achieve that here, it must be protected by blocking signals in it.

These solutions differ mostly in the interruption of system calls that is
done in the latter and not in the former. Note that errno reflects an event that occurred during the execution of a system call, while a kill flag or a pending signal reflects something that can have occurred at any time: before, during, or after a system call. Therefore, it may not be related to what occurred in the system call executed immediately before. Polling allows detecting kill signals that occur before the first poll or between one poll and the next. It does not matter much whether a signal occurred before, during, or after a system call: it means killing, and we act on it when detected. Note that testing a flag or a pending signal and performing an action is not atomic: a signal can occur in between. This, however, does not cause a race: the event will simply be noted at the next iteration of polling.
When a system call returns successfully, there is no need to test immediately for the occurrence of a kill signal; when it returns unsuccessfully, the kill signal is tested because we need to decide whether to stop polling. In such a case cleanup is void (unless something needs to be undone to cope with the failed system call). Sometimes there is a need to kill also sequences of actions that do not contain slow system calls: to do it, a kill flag or pending signals must be tested. Here, again, we detect an event that occurred before.
In the case of polling with signals blocked, the only signals that can interrupt a call are the stop ones, the others are blocked, or kill the process. Therefore, in this case of polling, the test of interruption need be done only on the 15 system calls interrupted by stop signals.
The kill request can be tested (and acted upon) before testing timeouts, hangs, errors and interruptions. It would kill when the signal arrived immediately after the system call. However, it is cleaner to test it and exit the block when the system call terminates prematurely. What is important is that in this case we do not restart the call before testing the kill request, otherwise we can have system calls that have been interrupted by a kill signal and are restarted.
In a process, all signals must be handled, otherwise there is no point in treating only one in a supervised block bothering to cleanup while all the others would kill the process without cleaning anything. The same applies to a time supervised block. However, we likely want killing to be performed by only few signals. With the signals unblocked solution we need to unblock only these few signals, and provide a handler that raises a flag. With the blocked signals solution we need to test only these few signals pending, and not all. Let's say that blocking or ignoring signals that must have no effect on a program is something that we do at the beginning of a process. It is better to block or ignore all signals, and then to let the program pieces that need to handle some signal enable it.
The kill request (pending signals or flags) must be cleared at the beginning of supervised blocks so as to discard a kill signal, that arrived at the wrong moment. Note that in the case of unblocked signals, if a signal arrived before the block, unblocking it makes its handler run, and then set the flag, and eventually kill the block. So, the block is responsive to those signals. In the blocked signals solution, there is a need to clear pending signals after the block so as not to cause problems to another block executed later. In the unblocked solution there is no such a need since handlers are run only in the block.
The signals-blocked solution cannot handle stray signals, and can therefore be used only when no stray signals can arrive. Since it has this drawback, and it is also less responsive than the one with signals unblocked, it is not suggested. However, here it is.
Polling with signals blocked
This is the scheme of polling with signals blocked:
    // test if kill has been requested
    int killRequested(){
        sigset_t sigpend;
        sigemptyset(&sigpend);              // sigpending does not clear it
        sigpending(&sigpend);
        if (sigismember(&sigpend,SIGxxx)){
            int sig;
            sigwait(&sigpend,&sig);         // clear kill request
            errno = EINTR;                  // report abortion
            return 1;
        }
        return 0;
    }

    // example of wrapping a system call using a time supervised one (with absolute time)
    int psem_twait(sem_t* sem){
        if (killRequested()) return -1;     // not to lose the time for one poll
        for (;;){
            struct timespec ts;
            struct timespec tick = {1,0};   // poll period 1 s
            timeend(&tick,&ts);
            if (sem_timedwait(sem,&ts) == 0) return 0;
            if (errno == ETIMEDOUT){
                if (killRequested()) return -1;
            } else {
                return -1;                  // some error
            }
        }
    }

    // example of wrapping a system call using a non-blocking one
    int psem_wait(sem_t* sem){
        if (killRequested()) return -1;     // not to lose the time for one poll
        for (;;){
            if (sem_trywait(sem) == 0) return 0;
            if (errno == EAGAIN){
                if (killRequested()) return -1;
            } else {
                return -1;                  // some error
            }
            struct timespec ts = {1,0};
            nanosleep(&ts,NULL);
        }
    }

    // example of wrapping a system call that is interrupted by stop signals
    int pepoll_wait(int epfd, struct epoll_event* events, int maxevents, int timeout){
        ... see the one for polling with signals unblocked
    }

    // supervised block
    ...
    {                                       // supervised block
        clearKill();                        // clear kill pending signal, if generated internally
        ...
        int res = psem_wait(&sem);          // slow system call
        if (res == 0){
            ... success
        } else if (errno == EINTR){
            ... killed
            ... cleanup. N.B. signals blocked
            goto endblock;
        } else {
            ... error
        }
        ... other actions
        clearKill();                        // clear kill pending signal, if generated internally
    }
    endblock:;
System calls that have a timeout (which is not the one used to implement polling) need be supervised only when the timeout is sufficiently long, in which case polling must honor that timeout, but system calls that are interrupted by stop signals need be always supervised. To make polling honor the timeout, a test must be done at each polling iteration to check if the deadline has passed.
Signal handling is often shown in textbooks and man pages using supervised blocks implemented with longjmp(). This technique has many problems, as will be shown here. However, since it has long been in use, it deserves study.
The canonical way to write a supervised block is:
    if (setjmp(cxt) != 0){
        handle exception
    } else {
        try block
    }
When the if statement is executed the first time, the context is set and the
try block executed. If during the execution of the try block a
longjmp()
is done, execution resumes at the setjmp()
,
but making it return a value different from zero, thus entering the exception
handling branch.
Here is the implementation of graceful killing done using a supervised block:
    handler:
        // no need to block signals permanently: siglongjmp does it
        siglongjmp(context,1);

    // leave the signal unblocked at the beginning: it must kill
    save signal mask
    block SIGxxx
    register handler for SIGxxx
    if (sigsetjmp(context,1) != 0){
        cleanup
    } else {
        unblock SIGxxx
        supervised block
    }
    deregister handler
    restore signal mask
The signal, which is temporarily blocked in the handler, remains permanently blocked when the handler returns with siglongjmp(), so that the code that handles the signal (the "bottom half" cleaner) can perform the cleanup actions without the risk of being interrupted again by the same signal. Since the signal is blocked at the beginning, it cannot occur while sigsetjmp() writes its jump buffer, or before it: no siglongjmp() would be done, and nobody would complain if the jump buffer were inconsistent. If the signal occurs when the supervised block has
completed, but before the handler is de-registered, then the cleanup would be
done, but the cleanup code must be prepared for it since the signal can occur
at any time in the supervised block. We have shown before
that reckoning acquired resources is not possible when a handler interrupts
asynchronously a course of actions, and here the cleanup code is reached with a
jump from a handler and thus it is executed as if it were in the handler. But
let's forget for a moment this issue.
If the signal occurs during the cleanup, but before the handler is de-registered, nothing happens since it is blocked. If the signal occurs after de-registering the handler, then it does nothing if cleanup has been executed; otherwise it executes the previous handler. It would not restore the old mask, but this would leave a different mask only if the signal was blocked in the old mask. Let's try to avoid executing cleanup twice. Let's see what happens if we swap the de-registration and the restoring:
    restore signal mask
    deregister handler
This is no improvement. Let's then do the following:
    if (setjmp(context) != 0){
        cleanup                    (1)
        deregister handler         (2)
        restore signal mask
    } else {
        unblock SIGxxx             (3)
        supervised block           (4)
        restore signal mask        (5)
        deregister handler
    }
Let's see what happens when the signal arrives at the indicated points:
1: nothing, signal blocked
2: nothing, signal blocked
3: cleanup
4: cleanup
5: if signal unblocked, cleanup, otherwise nothing
The problem is to distinguish between 3 and 4. If the signal occurs at 3, the change to the program state has not occurred, and the cleanup code must do nothing; if it occurs while executing the supervised block, some cleanup needs to be done; if it occurs at 4, then the cleanup code must restore the previous state. Note that the signal can occur just between the last instruction of the supervised block (which could be the return from a system call) and the next one (which could be an instruction of the function that wraps the system call). Unfortunately, there is no way to distinguish these cases. This race is one of the biggest failures (besides others) of the longjmp() implementation of supervised blocks.
Note that it is possible to nest supervision since it saves and restores the handler in effect. This solution has the defect of registering and deregistering handlers, which is not fast. Moreover, setting up a supervised block and closing it requires many statements, and it is not easy to put them in a couple of functions or macros.
As a side comment, there is an implementation of sleep() that uses alarm() and that does not use longjmp(). It blocks SIGALRM, then it issues an alarm() (after which the handler cannot interrupt, since the signal is blocked), and then suspends, atomically unblocking SIGALRM (see Advanced Programming in the Unix Environment, page 318). This can be done here because there is a function that atomically waits and sets the signal mask.
Since setting a handler is costly, a solution is to register it only once at
the beginning and play with the jump buffer using a
siglongjmp(*ptr)
to jump to the cleanup code of the current
supervised block. ptr
can be set to NULL
at
beginning, and a test made in the handler to use it when not null. Here is the
solution with the handler registered only once:
    static volatile __thread sigjmp_buf* jmp;
    static volatile sigjmp_buf nokill;

    handler {
        sigjmp_buf* tmp;
        tmp = (sigjmp_buf*)jmp;                 // save current
        jmp = NULL;                             // make it NULL: prevent further jumping after handler jump
        if (tmp == NULL){
            pthread_exit(NULL);
        } else if (tmp == (sigjmp_buf*)&nokill){
            return;
        }
        siglongjmp(*tmp,1);
    }

    sigaction with SA_RESTART handler, all signals blocked
    ...
    sigjmp_buf buf;
    sigjmp_buf* prev = (sigjmp_buf*)jmp;        // save current
    if (sigsetjmp(buf,1) != 0){
        ... cleanup
    } else {
        jmp = (volatile sigjmp_buf*)&buf;       // (1)
        ... block of statements
        sigjmp_buf* save = (sigjmp_buf*)jmp;    // save: start non-killable block
        jmp = &nokill;
        ... actions not killed
        jmp = save;                             // restore: end non-killable block
    }
    jmp = prev;                                 // restore previous supervision
At the beginning the jump buffer is null, which means that the signal is caught and it kills the process or thread. The pointer to the jump buffer is a per-thread variable so as to make the handler jump to the appropriate place for each thread.
Note that setjmp()
does not accept a volatile
jump
buffer argument. However, volatile
prevents optimizations and
reordering, that can hardly be done when passing arguments to calls. The
solution above, however, uses a (volatile
) pointer to a jump
buffer, and therefore the only caveat that it needs is to make sure that no
reordering occurs between the setting of the pointer in (1) and the statements
in the block below.
Another solution is to use a flag to inform the handler when it can jump (and otherwise do nothing). The flag is cleared before registering the handler, and before returning from it, and it is set at the beginning of the supervised actions. Since the handler returns with the flag cleared, it does not jump again when run again.
To support nesting, there are two alternatives:
Semantically, the former seems more correct: if an outer block wants to catch kill, it should catch it when a kill signal is sent even when there is an inner one that does the same.
Macros can be defined to make the supervised block opening, closing, etc. more readable.
The problem with longjmp() in signal handlers is that the handler could have interrupted a call that is not asynch-signal-safe and that has left some global data in an inconsistent state. Think, e.g., of a printf() that upon exit clears some global data, and that upon entry uses such data, assuming that they are properly initialized, or left so by the previous execution. Interrupting it would mean that it cannot be called anymore in the program. longjmp() in a signal handler is not safe.
If a handler interrupts a malloc()
, that uses static variables to
reckon the status of memory, then a longjmp()
would restore
registers, stack and program counter, but will leave the malloc()
in an inconsistent state. See the text of CERT SIG32-C: "Invoking
longjmp()
from a signal handler causes the code after the
setjmp()
to be called under the same conditions as the signal
handler itself. That is, the code must be called while global data may be in an
inconsistent state and must be able to interrupt itself (in case it is itself
interrupted by a second signal). So the risks in calling longjmp()
from a signal handler are the same as the risks in a signal handler calling
asynch-signal-unsafe functions." I.e. the code after a
longjmp()
could fail calling a function that is not
asynch-signal-safe and that finds its data corrupted because the handler has
interrupted a previous execution of the same function.
Note that a handler that does not terminate with a longjmp(), but instead terminates normally, does not have this problem: when it returns, the interrupted function or system call is resumed, and can then proceed and make the data consistent. The program, when resuming after a longjmp() in a signal handler, is in a very critical state: it has no way to make an interrupted asynch-signal-unsafe function recover (so as to clean its data), allowing it to be used again. Moreover, it does not even know if the handler has interrupted such a function. This means that supervised blocks cannot be implemented this way. It means also that siglongjmp() can be used only when supervised blocks do not contain asynch-signal-unsafe calls.
Note that since a signal can interrupt a thread at any time, if a handler were allowed to make a siglongjmp(), then any update of data would need to be made atomic, or surrounded by a supervised block. This would make the code very involved. It is a lot better to let the thread code go on, updating its local data without making this atomic (i.e., without protecting it by blocking signals). Moreover, such a handler could interrupt a thread also between a resource acquisition and its reckoning, thus making it difficult, if not impossible, to tell whether the resource has to be released in the cleanup code, which is not in the handler (but in the thread).
THEREFORE, longjmp()
and siglongjmp()
in general
must not be used in signal handlers. Supervised blocks cannot be made using
them.
This section is concerned with programs that want to perform cleanup in signal handlers. Suppose a process acquires a lock, and receives a signal after the lock acquisition and before the process reckons this acquisition somewhere in its process data. If a signal handler has been registered, it is called (at that point), but it is not able to release the lock since it is not sure that it has been acquired because there is no atomic operation that acquires a lock and remembers it. What is atomic is the getting of a lock and the advancing of the program counter, but the signal handler cannot test the program counter. In some cases we can record in a list an object with a null pointer to a resource (e.g., an open file), that is filled by the operation that allocates the resource. But there can be cases in which the allocation of a resource is not done by a system call, and therefore is not atomic. The allocator of such resources should have a supervised block in it so as to return either with a resource (a single pointer) or with none. A single pointer does not need registration: when it is null it means no resource, when non-null a resource. However, returning a resource handle in a variable is often done by wrapper functions that perform it with instructions instead of being done by system calls before they return to the process space (when they are still atomic). A means to make resource acquiring and reckoning atomic is to disable signals when doing it. As with monitors, when there is a need to wait for a resource, the "lock" must be released before waiting, which here means to unblock signals before waiting. E.g.:
    block signals
    acquire lock
    register lock acquired
    unblock signals
However, this does not work: it would keep signals blocked while waiting. We could swap acquiring and blocking:
    acquire lock
    block signals
    register lock acquired
    unblock signals
but this has a race between acquiring the lock and blocking signals. There would be a need for a primitive that acquires a lock, unblocks signals while waiting (more or less like sigwait()), and blocks signals again before returning. Four such primitives exist: pselect(), ppoll(), epoll_pwait() and read() from a signalfd(), but all the others do not do it. We could place it in a supervised block:
    BEGIN_KILL
        acquire lock
    ON_KILL
        release lock
    END_KILL
But in so doing, there is no reckoning. Moreover, a signal can occur immediately after the acquiring as well as during it, which means that in the cleanup code we do not know whether we should release the lock or not. Likewise, the signal can occur immediately before the acquiring, making the cleanup code equally unable to tell whether it has to release the lock. Unfortunately, there are no means for a handler to know whether it has interrupted a system call or not; otherwise it could let the cleanup code know it, and release the lock in such a case. When a system call is interrupted, errno in the handler is 0, and so is the one in the siginfo struct passed to it. But since the block could be interrupted also before the system call, knowing in the cleanup code that the signal aborted a system call would not help (it still does not know whether it interrupted the block before the acquisition or after it).
Note that when a process receives a signal, it can be interrupted inside a system call, in which case it aborts it, but it can be interrupted also immediately after a system call resumes the process. Suppose a process is blocked on a semaphore. The process is placed in the ready queue. Later, another unblocks the semaphore. But this does not mean that the process becomes running immediately (there can be ready processes with higher priority). When in such a state, a signal can be sent to it. There is no guarantee that the process, when resumed executes at least one instruction placed immediately after the system call: it is instead likely that it will execute the signal handler. Moreover, the function that acquires a resource could be wrapped by a library function, that executes a few instruction after the return of the system call in it. This means that the process can be interrupted after the system call and before it has a chance to record somewhere that it got a resource.
This is only possible if the resource acquisition returns something
observable atomically, like, e.g., a pointer. This is seldom the
case, e.g., a
sem_wait()
returns nothing.
This means that the only safe method to perform killing when a process acquires resources is to inform the victim and let it do it.
Basically, to kill a process, a signal could be sent to it. Its handler sets a flag, and the process tests it often, cleans up and exits. The only problem is when the process is inside a blocking system call or in an endless loop. In the former case, the signal interrupts the call, and the process can test the flag. In the second one, the handler, besides setting the flag, could also start a timer, and the handler of the timer could terminate the process. However, this has a race: if the signal is caught between testing the flag and entering a blocking system call, the effect of the signal is lost, and the process blocks in the call. If this race were not present (i.e., if some form of persistent signals were supported), the user code would be something like this:
    {                                  // doit
        {                              // cleanup
            operation
            if (killFlag) break cleanup;
            system call
            if (errno == EINTR) break cleanup;
            break doit;
        } cleanup:
        perform cleanup here
    } doit:
Note that a signal handler here would just do nothing besides setting the flag, and return (and as a side effect, make blocking system calls return).
In order to eliminate this race, these solutions exist:
Supervised blocks can provide graceful killing, except the ones implemented with siglongjmp(), because it makes it impossible to tell if a resource has been acquired: when the signal comes right after a successful acquisition, the handler cannot tell whether the acquisition has aborted or not. When a process is blocked in a slow system call, and a signal occurs and is caught by a handler, there is no way for the handler to detect that the process was in a system call (errno is not set to EINTR). Only when a handler returns normally is errno set to EINTR.
Besides that, it would be quite a costly solution if it used
siglongjmp
() because the code that would be executed normally
(i.e., all the times, even when there is no kill) would contain a
sigsetjmp()
, that costs as much as 110 memory-to-memory copies.
Note that cleanup cannot be performed directly in a signal handler because inside it there would be a need to call a generic cleanup function that closes files, releases locks, etc., but such a function needs to know what resources have been allocated, which is not possible to know reliably in a program that is interrupted by asynchronous signals. They interrupt the acquisition of resources with their reckoning in some data structure. See here why. Moreover, many system calls that are needed to perform cleanup cannot be called in signal handlers.
An alternative solution to cancellation (the shooter) is to send a sequence of signals to the victim so as to solve the problem of lost signals, i.e., to unblock it, should it be blocked in a system call.
Unfortunately, it is not possible to create the killer thread when the signal comes: it must be already there.
The shooter can be used in the following contexts:
exit()) because the shooter would have to rendezvous with it at the end of cleanup.
We would have a solution that has more non-killable points than cancellation, and can misfire. Overall, it kills less than cancellation. The choice is then between these weaknesses and the nuisance of creating a thread for long tasks and cancelling it, or of reusing a thread.
There are a number of blocking system calls, like,
e.g., pthread_mutex_lock()
that are not interrupted by signals.
There are more of them than the ones that are not interrupted by a cancellation
request. This means that with signals it is possible to kill less than with
cancellation.
However, for sake of completeness, the implementation of the shooter solution is described in the following sections.
When killing a thread, the shooter must synchronize with the victim so as to stop sending signals, otherwise it could kill another thread with the same thread ID if the victim quit in the meantime.
When killing a single-threaded process: The killer thread has no means to detect that the victim reached the cleanup, or that it was blocked and then it is no longer so. If it could detect it, then it can stop sending signals. To acknowledge, an unnamed semaphore can be used (it is implicitly destroyed when the process terminates, even abnormally). However, acknowledging is not strictly needed: the cleanup code either does not contain blocking system calls, or it does, but in such a case it is executed with signals blocked, and therefore the sequence of signals has no effect. Moreover, if the termination of the main thread terminates the killer too, there is no delay in termination. Likewise, if the sequence of signals is fast, there is no delay in exiting. In conclusion, there is no need to send an acknowledge in this case. Take into account that when the killer has sent a kill signal, and the other does not acknowledge, the killer does not know where the other is. Then it sends another kill, but it is like shooting a thread that becomes vulnerable only when it is blocked. If you shoot when it is not vulnerable, nothing happens, you only misfire. When it is vulnerable, it proceeds and immediately it ceases to be vulnerable.
When killing a task, and the task has detected a kill request (either because it has tested the kill flag, or because it has detected an interrupted system call), it starts to perform cleanup, in which it can use system calls; or it has ended cleanup and has started some other task, which also may contain blocking system calls. Such system calls risk being aborted by stray shooter signals (if the killer gun is not closed). There is thus a need to synchronize the shooter with the task.
Acknowledging can be done with a semaphore, together with a flag that is tested by the shooter before posting to the semaphore. The shooter could be de-scheduled between testing and sending, and in the meantime the victim could raise the flag; but when setting the flag, the victim can also block signals, thus preventing any interruption. Then, to prevent the signals from showing up when unblocked, there is a need to flush the pending signals before unblocking them (e.g., setting the disposition to ignore, and then restoring the previous one and unblocking).
A victim has to test the kill flag, as if signals were persistent. It is up to the victim to clean up: another thread cannot do it reliably, because it would execute in parallel with the victim, and the latter could change the set of resources immediately after the killer has read it, without the killer being informed.
For the shooter, killing a task or a thread is basically the same: in both cases it has to send signals. What is different is killing a process because in such a case it can forcibly kill it.
Requests to kill a thread are sent to the shooter. This could be done by sending it a signal (a realtime one, which is queued). The shooter must be quick to get these requests, and must handle all of them in parallel, i.e., keep track of which threads are under killing, and how many signals have been sent to them. It could be possible to keep track of the tids only, and send signals to all the threads under killing (that have not disappeared), up to a retry limit, as long as there is at least one to kick. The idea is that the shooter is waiting for a signal, which can be the one sent from another process to kill the process, a request of a thread to kill another, or a timer expiring to retry a kick. It is acceptable to synchronize all the kicks so that when a new request to kill a thread has come, it is put in the list, the thread kicked, and if the timer is not running, it is started. When the timer sends a signal, the list is scanned, and each thread in it kicked. The ones that have terminated are removed from the list. The ones that have completed their kicks are removed too. If the list becomes empty, the timer is stopped. Pay attention to the timer, because the risk is to have some unexpected shots. Since the shooter is using sigwait(), it is not disturbed by signals while executing its code. The problem is to distinguish all the events in it. This is not possible with sigwait() alone, because it does not execute the handler and thus provides no additional info. It can be overcome using sigsuspend(), which executes the handler. We could then use SIGTERM both for a process kill and for a thread kill request, in which we pass the tid, provided that in the case of the process the associated info is guaranteed to be zero. A timer could also send SIGTERM, and it can also send a value, e.g., -1, to distinguish it from the other cases.
Another means to implement a shooter is to use a realtime timer, and make it periodic: it sends a sequence of signals. There is a need to stop the timer when no more shots need be done. Of course, this is convenient when there is only one thread to kill, since it needs to create a timer and use a dedicated signal. The signal could be the same for all threads to kill: its purpose is mainly to abort blocking system calls, but if there are several threads to kill, all of them must receive the signal, which is not simple. In such a case there is a need for a more complex shooter, that serves the timer and kills all the threads that need be killed.
The shooter must set a thread-private kill flag, which is accessed both by the killer and the victim; this means that victims must pass a pointer to their flag to the shooter.
A special shooter can be used to kill a process. It is a thread that waits for a kill signal and kills the others by repeatedly sending them a number of kill signals, sleeping for some time, and eventually killing the entire process unconditionally. The threads can perform graceful termination in that timespan. If they succeed, the process terminates, and with it all its threads. If they do not, they are forcibly terminated by the process termination. The shooter kills the process unconditionally only when the process does not react to a sequence of kill signals, i.e., when every signal arrives in the interval between the last test of the kill flag and the blocking system call that follows it. This interval can be made very short by testing the kill flag right before any such call. The likelihood that the process does not react (i.e., that it receives all the kill signals in that interval) is very small, and in that case the process is forcibly killed. In such an interval the process is not acquiring resources, and therefore it should be possible for it to reckon exactly the allocated resources, thus allowing the forced kill to perform cleanup. The victim, when testing the kill flag and finding it true, should first acknowledge it to the killer, and then perform cleanup and eventually tell the killer it has finished (though the killer could wait with a timeout for it to finish). Acknowledging allows the killer to avoid sending unnecessary kill signals, and to terminate the process as soon as the threads terminate, if that occurs before the timeout.
The shooter could kill some threads that in their cleanup handlers kill others and join them. This is not a problem, because the shooter cancels only the ones that are alive. But the shooter could have killed one that is also killed by another. With such a shooter, threads should not kill others on shutdown, although they may kill others as part of their normal operation. To avoid waiting more than needed, the shooter could monitor the number of threads alive, and exit immediately when it drops to 1.
Note, however, that this procedure for killing threads can only be applied when threads can be killed in any order. A more general procedure is to perform cancellation on the main thread (which would cancel the others created by it) and time supervise the operation.
Linux does not provide any means to send a kill signal to all the threads in a group with a single system call. The /proc virtual filesystem allows retrieving the tids of all the threads in a process by looking at the /proc/<pid>/task directory. However, Linux does not provide any means to get thread IDs (pthread_t) from tids, which are needed to invoke pthread_kill() and pthread_cancel(). This means that the program must reckon thread IDs itself (which cannot be done for the threads that are not explicitly created by the program, but by some library functions that it calls).
Note that while /proc is being read, new threads could be created or terminated. To cater for this, the shooter can either loop until it finds no changes, which could take forever, or read it, send signals to the threads found, then read it again and send signals again, retrying a few times. The shooter would kill the threads that generate others, and then likely converge. Note that this also eliminates the need to acknowledge: at each retry, the threads that have tested the kill flag, entered cleanup and completed it disappear.
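Listing the tids can be sketched as below. This is a Linux-only sketch; the function name list_tids is mine. Note that, as said above, these are kernel tids, not pthread_t values, so they cannot be passed to pthread_kill().

```c
#include <dirent.h>
#include <stdlib.h>
#include <sys/types.h>

/* Fill tids[] (up to max entries) with the tids of the current process,
 * read from /proc/self/task; returns their number, or -1 on error. */
int list_tids(pid_t tids[], int max){
    DIR* dir = opendir("/proc/self/task");
    if (dir == NULL) return -1;
    int n = 0;
    struct dirent* e;
    while ((e = readdir(dir)) != NULL && n < max){
        if (e->d_name[0] == '.') continue;   /* skip "." and ".." */
        tids[n++] = (pid_t)atoi(e->d_name);
    }
    closedir(dir);
    return n;
}
```

A shooter would re-run this function on each retry, as described above, to see which threads have appeared or disappeared.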
When there are many threads, there is a need to have only one of them receive the kill signal and kill the others. To kill the others, it needs to send them another signal.
Of course, there can be threads that block SIGTERM, and that thus never receive it. They would behave as if they had entered an endless loop.
Asynchronous cancellation could be implemented with signals, using some signal that is always blocked and that is raised and unblocked on pthread_cancel(), having its handler run the registered cleanup functions (but cleanup handlers do not have the restrictions of signal handlers). The deferred one could be implemented by testing a kill flag in system calls. Actually, Linux implements cancellation with SIGRTMIN.
Cancellation for threads does not have the races that the signals have, i.e. its requests are not lost. It allows graceful termination. A cancellation request is detected at the next system call (one of the many that detect it) that the thread executes. So, if the cancellation request is made after having tested it, and before making a blocking system call, it is not lost.
Cancellation requests are really permanent: when one arrives immediately before a thread has disabled cancellation, or while it is disabling it, it is honored at the first cancellation point after the thread has re-enabled cancellation.
Thread cancellation has one important restriction: it does not allow forcibly cancelling another thread (the victim can disable cancellability, and none of the others can enable it). Even if a signal handler or a cleanup handler changed the cancellation type to asynchronous, it would not be possible to forcibly kill a thread, because the thread could disable cancellation anyway. This means that cancellation is meant to be used on cooperating threads, ones that are responsive to cancellation. Threads that execute dynamically linked code are not responsive if that code is not.
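Cooperation typically looks like the sketch below: the victim itself brackets its critical sections with pthread_setcancelstate(), so a request arriving in the middle stays pending and is honored at the next cancellation point. The function name is mine.

```c
#include <pthread.h>

/* Perform an update that must not be torn by cancellation.
 * Returns 0 on success, -1 if the cancel state could not be changed. */
int critical_update(void){
    int old;
    if (pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old) != 0)
        return -1;
    /* ... update shared state: a cancellation request arriving now
       remains pending instead of being acted upon ... */
    if (pthread_setcancelstate(old, NULL) != 0)
        return -1;
    pthread_testcancel();   /* honor a request that arrived meanwhile */
    return 0;
}
```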
The OpenGroup documentation provides a list of system calls that are (or may be) cancellation points. There are many calls that POSIX specifies to be cancellation points, and that actually are, but whose man pages say nothing about it. There is no guarantee that all blocking calls are cancellation points (e.g., pthread_mutex_lock() is not one). But the cancellation points are more numerous than the system calls that return EINTR; thus cancellation is the best means available to kill. The Open Group documentation does not ensure that system calls that return EINTR are also cancellation points.
This is a table of the major blocking system calls, showing whether they can be interrupted by signals or by cancellation requests:
system call              | EINTR? | cancellation point?
pthread_mutex_lock()     | no     | no
flockfile()              | no     | no
sem_wait()               | yes    | yes
sem_timedwait()          | yes    | yes
pthread_cond_wait()      | no     | yes
pthread_cond_timedwait() | no     | yes
pthread_barrier_wait()   | no     | no
pthread_spin_lock()      | no     | no
sigwait()                | no     | yes
sigwaitinfo()            | yes    | yes
sigtimedwait()           | yes    | yes
pthread_join()           | no     | yes
Some system calls detect cancellation only when they are blocking. The man page of sem_wait() states nothing about the behaviour of cancellation, i.e., whether the lock is not taken when cancellation occurs, but it says what happens when the call is interrupted. It is likely that cancellation occurs when the call could be interrupted. Thus, sem_wait() on a semaphore > 0 would not be a cancellation point. Actually, it is not. The POSIX standard seems to allow a sem_wait() on a green semaphore not to honor a pending cancellation request ("However, if the thread is suspended at a cancellation point and the event for which it is waiting occurs before the cancellation request is acted upon, it is unspecified whether the cancellation request is acted upon or whether the cancellation request remains pending and the thread resumes normal execution."). I.e., it allows an implementation to honor a cancellation request only at blocking points, which is when a system call would return with EINTR. What is important is to define clearly what happens when a system call is executed while there is a pending cancellation request, or when such a request occurs during its execution.
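The blocking case, at least, behaves predictably on Linux: a thread blocked in sem_wait() on a red semaphore is cancelled, and its join status is PTHREAD_CANCELED. A minimal sketch (the helper name and the one-second sleep are mine; the sleep is a crude way to let the waiter block first):

```c
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static sem_t sem;

static void* waiter(void* arg){
    (void)arg;
    sem_wait(&sem);              /* blocks forever: nobody posts */
    return NULL;
}

/* Returns 1 when the blocked waiter was cancelled, 0 otherwise. */
int cancel_blocked_waiter(void){
    sem_init(&sem, 0, 0);        /* value 0: sem_wait() will block */
    pthread_t t;
    if (pthread_create(&t, NULL, waiter, NULL) != 0) return 0;
    sleep(1);                    /* crude: let the waiter reach sem_wait() */
    pthread_cancel(t);
    void* status;
    pthread_join(t, &status);
    sem_destroy(&sem);
    return status == PTHREAD_CANCELED;
}
```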
Some functions, like, e.g., pthread_mutex_lock(), are not cancellation points. Therefore, if a thread is deadlocked on a mutex, pthread_cancel() does nothing to unblock it. There is not much documentation explaining why some blocking functions are not cancellation points. The rationale, as far as I understand it, is:

- If pthread_mutex_lock() were a cancellation point and a thread needed to make an atomic update of data, it would have to disable cancellation in critical regions (note that pthread_mutex_lock() is also not interrupted by signals). Mutexes are meant to implement short critical sections only. E.g., a mutex lock can only be released by the same thread that owns the lock (as a rule this is checked only when the mutex is created with the error-checking attribute; otherwise it is not checked, but it is still an error). Moreover, cleanup handlers are code that runs as part of the cancelled thread, and that then has to run concurrently with the other threads. It therefore needs some mechanism to protect critical sections, and this is mutexes. In order to keep the state consistent, threads using mutexes would have to disable cancellation when using them (if mutexes were cancellation points), which would make the code lengthier.

- If pthread_mutex_lock() were a cancellation point, the functions that use it would become cancellation points as well. Having too many cancellation points makes programming difficult because of the need to provide cleanup at too many places.

- pthread_barrier_wait() is interrupted by signals, but it is resumed automatically when the handler returns. Killing threads when they are about to wait on a barrier seems meaningless if we look at the single thread, but it is not if we think of all the threads that are going to wait on the barrier. E.g., a program could call a library function that internally uses some threads that at the end wait on a barrier, and the caller could need to kill itself, and thus also the library function and all that is in it.

A restriction of cancellation is that it is not possible to delimit its scope: it applies to whole threads only. An application that needs to restart afresh (i.e., to recycle) must kill all the inner operations, and not terminate the main thread. To implement recycling, the main thread should create a thread, and recreate it if it is cancelled.
Another restriction is that it does not allow treating the interruption (and the cleanup) locally. E.g., in a snippet of code:
open(file1); open(file2); open(file3);
with the cancellation API, since all three are cancellation points, if we want to close the files already opened, we must set three cleanup handlers that contain the cleanup code:
open(file1);
pthread_cleanup_push(cl1);
open(file2);
pthread_cleanup_push(cl2);
open(file3);
pthread_cleanup_push(cl3);
(or reckon what files are opened, and use only one cleanup handler), while with persistent signals one can remember the opened files, which in this case is automatic, and then perform the appropriate closing:
{
    open(file1);
    if (killed) goto doit;
    open(file2);
    if (killed) goto doit;
    open(file3);
    if (killed) goto doit;
}
// ...
doit:
if (killed){
    if (file1 != null) close(file1);
    if (file2 != null) close(file2);
    if (file3 != null) close(file3);
}
On the other hand, there is a need to unwind the stack and to execute all the cleanup code of the nested functions. However, reckoning is also possible with cancellation. The real difference is that it is somewhat more difficult to pass the handles of the files to close to the cleanup handler than to close them in the same piece of code that opened them.
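The "reckon what is opened and use only one cleanup" variant mentioned above can be sketched as follows; the struct and function names are mine, and /dev/null and /dev/zero stand in for real files.

```c
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

typedef struct { int fd[3]; int n; } open_files_t;

/* Single cleanup handler: closes whatever was opened so far. */
static void close_all(void* arg){
    open_files_t* of = (open_files_t*)arg;
    for (int i = 0; i < of->n; i++) close(of->fd[i]);
}

void* worker(void* unused){
    (void)unused;
    open_files_t of = { {0}, 0 };
    pthread_cleanup_push(close_all, &of);   /* one handler for all files */
    of.fd[of.n] = open("/dev/null", O_RDONLY);
    if (of.fd[of.n] >= 0) of.n++;
    /* a cancellation point here would run close_all() with n == 1 */
    of.fd[of.n] = open("/dev/zero", O_RDONLY);
    if (of.fd[of.n] >= 0) of.n++;
    pthread_cleanup_pop(1);                 /* run the handler on normal exit too */
    return NULL;
}
```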
Many (probably most) operations can be undone. Some cannot. Think, e.g., of sending a message to a system logger that does not support a remove operation. There is no way in such a case to undo the operation (except sending another message telling that the former is not valid). Another example is formatting a disk.
The registration of cleanup handlers is provided by functions that are implemented in Linux as macros that enforce scoping. This has the advantage that pairing cannot be forgotten, but it also has some disadvantages.
In order to spare callers the burden of setting up a cleanup handler and then calling a start function, and doing the opposite when stopping it, libraries should provide these functionalities as macros. E.g.:
void openservice(){
    ... code to open or start this service
}
void cleanup(void* arg){
    ... code to close or stop this service
}
#define OPENSERVICE() pthread_cleanup_push(cleanup,NULL); openservice()
#define CLOSESERVICE() pthread_cleanup_pop(1)

// example of library use
OPENSERVICE();
... other actions
CLOSESERVICE();
There is no way to forcibly kill a thread. This would be done after a thread has been cancelled and did not terminate in a defined amount of time. At that point the thread is executing its cleanup handlers, or it is in an endless loop. In the second case a signal must be sent to it so as to interrupt it and cancel (or exit) it. In order to make this cancellation act from within a handler, the handler would set the cancellability to enabled and asynchronous. Strangely, when a handler interrupts a system call, the cancellability becomes asynchronous (if in the handler it was deferred), while when it interrupts normal code, the cancellability remains deferred. This caters for endless loops in the normal code of a thread. However, if there is an endless loop in a cleanup handler, there is no way to kill the thread. The only thing a handler can do to kill the thread is to cancel it, but at that point cancellation has no effect (the handler cannot do a pthread_exit(), since this is forbidden from cleanup handlers).
This means that threads are parallel functions inside the same process, which must cooperate also on killing. It is not always possible to repair: if nothing can be done on a thread once its cleanup handlers are in execution, there is no way to forcibly kill it. Threads are tightly coupled parts of a process, and thus if they do not terminate, this is a program error that cannot be recovered.
Knowing then that a thread has no way to forcibly kill another, let's see what it can do to kill it.
Joinable threads must be joined. When a join is executed, it would be nice to time-limit it. But this would be useful only if the killer had a way to forcibly kill the victim. Moreover, timing the join would allow the killer to proceed, and then two things can happen: if the killer is quicker and terminates the process, the victim is killed immediately; if the killer is slower, the victim terminates, placing the process in the defunct state. A killer thread can join a victim reliably only by blocking. A timed join could be implemented with a supervised block. Alternatively, we could synchronize with the normal or abnormal completion of a thread with some other means (e.g., a semaphore) that supports timeouts (and use detached threads). However, as said above, a thread that does not terminate is a programming error that cannot be recovered anyway.
With detached threads no join is done (and the killer must know which threads are detached, because it cannot test whether a thread is such); then there are two alternatives: use a signal to make the victim tell the killer when it is exiting, or have the killer poll it. The first is less safe: what if the victim forgets it? Forcing the detached state on the victim from the killer is not possible (and it would not be safe either, because the victim could change it), and the same applies to getting it. We can detect that a thread has terminated by sending it the signal 0 (but then it would be better to use joinable threads). Detached threads have the same problem as the joinable ones: if the main thread terminates before them, the process remains defunct. This means that when a thread creates another, and that other is detached, it has to use some means to synchronize with its termination, so as not to leave it around.
Killing a request means killing the sequence of operations that are being done in reply to a request. Think, e.g., of a web server process that receives requests to deliver pages. To implement it, a control thread must be started, which is told what request to kill, finds what thread is serving it, and then cancels that thread, possibly without waiting, so as to serve other kill requests quickly. If there are several threads executing requests, there should be some map between request ids and thread ids; moreover, there should be no window between the time the control thread looks up a request and the time it cancels the thread. I.e., the control thread must lock the entry in the map and cancel the thread inside a critical region, so as to prevent the thread from removing the entry, terminating, and starting to process another request.
Signals can be used as inter-process communication means (IPC).
All other IPC means, like, e.g., messages, semaphores, etc., require creating a permanent object and destroying it when it is no longer needed, while signals can be sent to any process without a need for system objects. This can be seen as a disadvantage, but it can be an advantage too: it allows communicating with processes that are not yet created.
Semaphores, mailboxes, pipes, etc., need to be created and, even worse, need to be removed when processes terminate (even when they terminate because they were killed, with the risk of littering the system with no-longer-useful objects). Signals are also served before the process to which they are sent does anything else (provided it has registered a handler). Moreover, the idea of interrupting a sequence of statements is something basic that cannot be achieved with messages sent to threads waiting for them. Of course, the request made by a process to another to kill itself, or the actions that it is executing, can be implemented by sending a message to it (having in it a thread receiving the message), but that thread needs anyway a means to interrupt the actions done by its process.
Signals can be used as semaphores or queues: waiting for a semaphore to become green or for a queue to have an element is done with sigwait() or sigwaitinfo(); posting to a semaphore or a queue with sigqueue(), pthread_kill() or kill(). Synchronization among threads is better done using one of the IPC means available (locks of various kinds, message queues, etc.). Therefore, signals as IPC are restricted to synchronizing processes. Note that related processes know each other's pids, while unrelated processes do not, and thus they must communicate (or find) their pids in order to synchronize with signals. A drawback is then that this exchange of data must be done, and another is that pids can denote processes that no longer exist, or worse, that have been recycled and denote homonyms. This kind of synchronization is mostly used to issue requests to servers, or daemons, i.e., processes that run permanently.
Another difference is that signals can carry data, i.e., they are similar to messages, while semaphores cannot.
This is the scheme of the server process (the thread in it that is devoted to handle signals):
#define SIGCTRL (SIGRTMIN+6)                     // signal used as IPC
sigemptyset(&mask);
sigaddset(&mask,SIGCTRL);                        // use a realtime signal, that is queued
int sig = 0;
int val = 0;
for (;;){
    siginfo_t siginfo;
    sig = sigwaitinfo(&mask,&siginfo);           // wait request
    if (sig == -1){
        if (errno == EINTR) continue;
        ... error
    }
    val = siginfo.si_value.sival_int;            // extract associated value
    sigval_t sval;
    sval.sival_int = ...;                        // send reply with this value
    if (sigqueue(siginfo.si_pid,SIGCTRL,sval) < 0){  // send reply
        ... error
    }
}
This is the scheme of the process that makes a request:
sigval_t sval;
sval.sival_int = val;                            // send request with this value
if (sigqueue(pid,SIGCTRL,sval) < 0){
    ... error
}
siginfo_t siginfo;
while ((sig = sigwaitinfo(&mask,&siginfo)) == -1 && errno == EINTR);  // wait reply
if (sig == -1){
    ... error
}
val = siginfo.si_value.sival_int;                // get value from reply
Time supervision, although it is in general incompatible with the fact that processes and threads progress with the time that the scheduler gives them, is a need that occurs in applications. E.g., there are cases in realtime systems in which a sequence of operations must be done in a given time, otherwise something else need be done. There are several means to ensure that applications that have time constraints meet them. In these cases it is meaningful to supervise progress.
Time supervision can be applied to a single operation, usually a slow system call, or to a sequence of operations. Many slow system calls support time supervision. This works fine because supervision is started at the same time a process suspends, and therefore it supervises exactly the waiting time. Some system calls, though, require absolute time, which means that the current time need be got, and the deadline computed, and the process can get de-scheduled in between, thus making the deadline unreliable to some extent. Usually, however, the timeout is much bigger than the error caused by this. The same applies to the system calls that have no time supervision.
To time supervise a system call, like, e.g., a read(), we could run a thread that makes the read() and another that starts the timer, and wait for one of them to end, but this too would not be perfect (the one that reads might get de-scheduled, letting the other come first). Upon time expiry a signal can be sent. A problem is that once the handler has been set and the signal unblocked, it can be delivered, and this can occur before the system call or function to time supervise is executed. If this occurs, we must not execute the call or function (otherwise it would not be supervised any more). In practice this is not a big problem, because the chances that it occurs are small. However, the code must not contain the race all the same. A bigger problem is the expiry of the timer due to de-scheduling. This can be mended by using a long timer, or by using timed operations, when available.
Time supervision on I/O is provided by poll(). E.g.:
struct pollfd fds;
fds.fd = fileno;
fds.events = kind of event to poll
... other file descriptors to watch, be their number n
int res = poll(&fds,n,timeout-in-millis);
if (res == -1){
    ... error
}
if (res == 0){
    ... timeout
} else {
    ... events, in fds.revents
}
Time supervision is a supervised block in which the kill signal is generated internally, and thus can be implemented as such: block signals, start a timer with a signal, and abort the block when calls like pselect(), ppoll(), etc. are interrupted.

Cancelling a thread when time expires can be done by starting an ancillary thread that waits for the given time and then cancels the other. This means having two threads for each time supervised task.
Another solution is to start a timer that delivers a signal when the time expires. However, that signal is process directed, and not aimed at the thread to be cancelled. Timers deliver signals as specified in the sigevent struct, which has a way to specify thread-directed signals, but timers do not use it. An alternative is to use a different realtime signal for each time supervised thread, but they are too few, and even when sufficient, they may need a monitor to be reserved and released.
Another solution would be a (timer) thread that accepts supervision requests from threads, and when an appointed signal occurs (generated by a timer) sends a per-thread one to the thread that is waiting for it. It can also discard external signals, but the threads that want to be time supervised must unblock a signal anyway, and thus have the same hole. The timer thread has to wait for requests and signals; it could do that with ppoll() or also with sigwait(). Requests can be made by sending a signal that carries a malloc'ed struct containing the requester's thread-id. A function can be provided that creates a timer, sends its id and the thread-id to the timer thread, waits for a reply, and then arms the timer. This is precise because if the timer were armed instead by the timer thread, the requester could be de-scheduled, and some time could pass before it is scheduled again. When a timer sends a signal, it also sends a siginfo that contains the timer id. When the timer generates a signal, the timer thread gets it, searches for the request that has that timer-id in it, removes it, and sends a given signal to the thread whose tid is in it. That signal can carry along the timer id, allowing the receiving thread to disarm it immediately. This eliminates the problem of stray timer signals, because supervised threads can at the end disarm the timer unconditionally and request the timer thread to remove the request. This solution is complex, and has the drawback that in the supervised thread there could be calls to library functions that do not test kill flags (and restart interrupted system calls unconditionally). After all, the purpose of placing supervised code in a thread is to avoid using signals to kill it.
Another solution is to ask timers to run a function that makes a pthread_cancel(). To that function a thread-id must be passed, cast to a pointer. This spares the implementation of the timer thread. There is a need to synchronize thread termination and timer termination. When the thread terminates normally, the timer must be disarmed, because otherwise it could cancel a thread that no longer exists (which is not a problem), or worse, whose tid has been recycled. When it terminates because it was cancelled, the timer should be disarmed for the same reason. No temporal windows must exist in which killing or cancellation is done on a thread that does not exist. To achieve this, the timer must be armed at the beginning of the thread and disarmed at the end, before termination. Killing a thread by having a timer create a thread to do it is a bit expensive, but timeouts should occur seldom in well designed programs. Note that such a thread is created only when time supervision expires. Timers, however, when told to run a function, create upfront an additional thread (and another one when running the function) that is not terminated when they are deleted. This is a bug, and has been filed. Anyway, here is this solution:
typedef struct task_t {
    int timeout;                      // max execution time of task, in msec
    void* (*function) (void* arg);    // actual task function
    void* arg;                        // function argument
    void* retvalue;                   // function return value
    timer_t timer_id;                 // id of the timer
} task_t;

// timeout function that cancels the thread
void threadcancel(sigval_t sigval){
    pthread_cancel((pthread_t)sigval.sival_ptr);
}

// cleanup handler, that cancels the timer
void cleanup(void* arg){
    task_t* task = (task_t*)arg;
    if (task->timer_id == NULL) return;   // argument not yet set
    timer_delete(task->timer_id);         // disarm and delete timer
}

// thread that starts/stops time supervision and executes the task
void* thread(void* data){
    task_t* task = (task_t*)data;
    task->timer_id = NULL;                // initialize task data
    task->retvalue = NULL;
    pthread_cleanup_push(cleanup,task);   // register timer cancellation
    struct sigevent event;
    event.sigev_notify = SIGEV_THREAD;
    event.sigev_notify_function = threadcancel;
    event.sigev_notify_attributes = NULL;
    event.sigev_value.sival_ptr = (void*)pthread_self();
    timer_t timer_id;
    if (timer_create(CLOCK_REALTIME,&event,&timer_id) < 0){
        pthread_exit(PTHREAD_CANCELED);
    }
    task->timer_id = timer_id;            // remember timer id and attributes
    struct itimerspec itime;              // arm timer
    itime.it_value.tv_sec = task->timeout/1000;
    itime.it_value.tv_nsec = (task->timeout % 1000) * 1000000;
    itime.it_interval.tv_sec = 0;
    itime.it_interval.tv_nsec = 0;
    if (timer_settime(timer_id,0,&itime,NULL) < 0){
        pthread_exit(PTHREAD_CANCELED);
    }
    task->retvalue = task->function(task->arg);  // call actual task function
    pthread_cleanup_pop(1);               // cancel timer
    return NULL;
}

// cleanup for convenience wrapper function
static void cleanupt(void* arg){
    pthread_t thr = *((pthread_t*)arg);
    pthread_cancel(thr);                  // n.b. no error checking
    pthread_join(thr,NULL);               // no errors can occur
}
// convenience wrapper function, that executes user function with timeout
int timedTask(void* (*function) (void* arg), void* arg, int timeout){
    task_t task;
    task.timeout = timeout;               // fill in time and function
    task.function = function;
    task.arg = arg;
    pthread_t th;                         // create thread to execute function
    pthread_cleanup_push(cleanupt,&th);
    int res = pthread_create(&th,NULL,&thread,&task);
    if (res != 0){
        return res;
    }
    void* status;                         // wait for its termination
    res = pthread_join(th,&status);
    if (res != 0){
        return res;
    }
    if (task.retvalue != NULL){           // return value of function
        status = task.retvalue;
    }
    pthread_cleanup_pop(0);
    return (int)status;
}

void* funct(void* data){
    ... actual task
}

// example of use
int res = timedTask(funct,argument,timeout);
if (res){
    ... error
}
A better solution is to place the actions to be supervised in a thread and to make the one that creates it control the passing of time. This creates only one thread per supervised block.
Waiting for children to terminate can be done with:
for (;;){
    int status;
    pid_t pid = waitpid(-1,&status,0);
    if (pid == (pid_t)-1){
        if (errno == ECHILD) break;
        ... an error occurred
        break;
    }
}
This snippet of code can be placed in any thread, including one dedicated to waiting for children. If the SIG_IGN handler has been set for SIGCHLD, there is no need to wait to recover the status of children, and children do not become zombies. Note that this is not the same as the default disposition (even if it looks similar).
A process must wait for its children even when they have been killed sending SIGKILL, unless it sets the disposition of SIGCHLD to SIG_IGN.
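The SIG_IGN behaviour can be verified with a small sketch (the function name is mine): after setting the disposition, a terminated child is reaped automatically, and a later waitpid() reports ECHILD instead of delivering a status.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 when the dead child was auto-reaped (no zombie left). */
int no_zombie(void){
    signal(SIGCHLD, SIG_IGN);    /* children will not become zombies */
    pid_t pid = fork();
    if (pid == 0) _exit(0);      /* child terminates immediately */
    sleep(1);                    /* crude: let the child die and be reaped */
    int status;
    /* nothing to collect: the kernel has already discarded the status */
    return waitpid(pid, &status, 0) == -1 && errno == ECHILD;
}
```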
Waiting for children must be done when they are designed to terminate, and also when they terminate unexpectedly, while they were supposed not to.
Waiting for children can also be done in a handler registered for SIGCHLD, but this is not convenient, because it would interrupt a number of system calls even when registered with the SA_RESTART option. When there is no need to get the children's exit status, the disposition of SIGCHLD can be set to SIG_IGN and nothing else done; otherwise, children can be waited for either by the thread that created them or by any other (e.g., a dedicated one). Moreover, waitpid() resumes only when a child terminates, and not when SIGCHLD is sent by any process, while a handler is run whenever SIGCHLD occurs, also when sent by non-children processes.
Setting SIGCHLD to SIG_IGN, or to a handler with SA_NOCLDWAIT, prevents children from becoming zombies. However, init in Linux quickly clears orphaned processes (i.e., children whose parent died before them), and also zombies. There is thus no need to set SIGCHLD to SIG_IGN to avoid zombies.
system call | returns | what                                                     | delivers             | blocks
wait()      | pid     | any child                                                | status               | yes
waitpid()   | pid     | specific child/any/group, and kind of state change       | status               | optional
waitid()    | pid     | specific child/any/group, and more kinds of state change | status, real user id | optional
waitpid() can wait for any child that belongs to a specific process group. Note that there can exist processes that belong to the same group but that are not children, and thus are not waited for.
To wait for children, and collect their exit status:
Solution 1 is the cleanest and simplest. It makes the executing thread wait until all the children terminate. If there is a need to wait for a specific child, that can be done. It is in general cleaner to make the threads that create children take care of their offspring.
Solution 2 applies when a dedicated thread is used. Solution 3 requires a loop in which waitpid() is repeatedly called to get the status of all the children that have terminated, so as to compensate for the loss of some SIGCHLD signals. We could have a thread do it, but in this simple case a handler spares a thread. Note that handling child termination with a parallel thread of execution (be it a handler or a thread) requires keeping track of the status of processes in some data structure (like, e.g., a list), unless it can be dealt with on the spot. Creating a process and appending an element to the list must be atomic with respect to signals for solution 3, i.e., it must be done with this signal blocked. Moreover, if process creation is done by several threads, it must also be protected with a mutex. Removing an element changes the list, and thus must be protected too. This bookkeeping can be placed in an atfork handler. Atfork handlers are not guaranteed to be mutually exclusive (although in practice they are): they must protect themselves if they want to be.
Solution 3 does not allow waiting until all children have terminated. Moreover, waiting for a specific child to complete using SIGCHLD is not easy: the handler could receive signals for other children, preventing other parts of the program from using this signal to wait for them. Moreover, signals might be discarded because some previous ones are still pending.
For an example of solution 3, see: http://www.gnu.org/software/libtool/manual/libc/Merged-Signals.html#Merged-Signals.
SIGSEGV is generated when a memory address that does not belong to the process address space is accessed, or when an attempt is made to write to a read-only memory address: e.g., null pointer dereferences, accesses to unmapped pages, or stack overflow.
SIGBUS is generated when accessing a memory mapped file at an address that does not correspond to a position in the file.
Both these signals, after executing a handler, restart the very same instruction that caused them. Therefore, either the handler is able to cure the cause of the signal (e.g., map memory properly), or it must terminate the program or the affected thread.
SIGSEGV and SIGBUS normally denote programming errors or failures, which can be handled with cancellation, reporting the exception (and therefore, when they occur in system calls, they must not make the calls return with an error, but terminate the thread). When they denote a programming error, little can be done but killing the process. When they are failures for which there is a means of recovery, the thread can be cancelled, the problem cured, and the thread recreated. In the latter case, the solution is to put the code that might generate one of these signals in a thread, and cancel it when the signal occurs.
Simply register and deregister a cleanup handler around statements that can cause one of these signals. To tell the signal handler not to kill the process, but only the thread, a flag is set. This is the scheme:
static volatile __thread sig_atomic_t kill_thread = 0;

void cleanup(void* arg){
    ... report an exception
}

static void handler(int sig, siginfo_t* siginfo, void* context){
    if ((siginfo->si_code == SI_USER || siginfo->si_code == SI_QUEUE)
            && siginfo->si_pid != getpid()){    // external signal
        ... process kill
    } else {
        if (kill_thread != 0){                  // thread handles exception, kill it
            pthread_exit(PTHREAD_CANCELED);
        } else {                                // programming error
            ... process kill
        }
    }
}

pthread_cleanup_push(cleanup,NULL);
... code that can generate a programming error
pthread_cleanup_pop(0);

pthread_cleanup_push(cleanup,NULL);
kill_thread = 1;                                // enable handling
... code that can generate a failure
kill_thread = 0;                                // disable it
pthread_cleanup_pop(0);
The latter is meant to be used around some specific statement or system call that can produce a SIGSEGV or SIGBUS when executed, and not around a large section of code, which could also generate them because of a programming error.
Note that the handler cancels the interrupted thread using
pthread_exit()
, which is not asynch-signal-safe. However, this handler catches a synchronous
signal, which cannot occur while pthread_exit() itself is executing, and thus
the handler never interrupts an execution of it.
When these signals do not denote programming errors or failures, they are used to remap memory, and not to terminate processes or threads (or supervised blocks). When they are generated from within user code, their handlers are executed and, on return, the code restarts silently. When they occur in system calls (e.g., because a return argument points to unmapped memory), the behaviour depends on the system call, and may range from error returns to partial execution of the system call. A program that wants to remap memory silently can then add a third case to the handler above, supplying it a user function to call instead of killing the process. That program, however, must either be sure not to execute system calls that can address unmapped memory, or check any error that they return and act accordingly. This latter case is not treated further here.
When there is a need to perform some memory remapping under the hood, in the
handler, system calls such as mprotect()
and mmap()
can be safely called provided that errno
be saved and restored
(they cannot cause a SIGSEGV to occur while they are executing, and therefore
there are never two executions of them at the same time).
Stack overflow can occur at any function call, and thus likely in library functions that are not prepared to handle it and do not even register cleanup handlers.
The stack should be dimensioned to support the deepest nesting, not counting recursion. Detection of stack overflow should be done only in functions that use much stack, which typically means recursive functions. It is better to control the depth of recursion than to rely on the system detecting stack overflow. Recovering from stack overflow is difficult: it can be done only when, e.g., there are alternative, less expensive ways to perform some task. E.g., a function needing a large amount of temporary storage could allocate it on the stack by declaring an automatic variable; if that fails because of stack overflow, it could be called again and told to allocate on the heap (which is more costly).
When a function is called and there is no room in the stack for it, its stack frame is not placed in the stack, SIGSEGV is generated, its handler is executed (if any), and if the handler terminates the thread, the thread's cleanup handlers are run on the normal thread stack. The first cleanup handler executed must then use less stack than the offending function call, otherwise the process is immediately terminated with a segmentation fault (i.e., the default SIGSEGV disposition). E.g., suppose a function, close to the stack limit, causes a stack overflow while opening a file, and that the close function requires more stack than the open one: the process is terminated immediately. This means that the functions devoted to undoing some operation must be thrifty with the stack. Since cleanup handlers unwind the stack, the subsequent ones have more stack at their disposal. N.B. if a thread has no alternate stack and the handler is registered with SA_ONSTACK, the normal stack is used. Note also that there is no way to terminate a thread bypassing its cleanup handlers, nor is there a way to extend the stack of a thread a little (so as to have room to run its cleanup handlers).
Threads that want to terminate gracefully when a stack overflow occurs must create an alternate stack. Consequently, the SIGSEGV handler must be registered with the SA_ONSTACK flag:
void* thread(void* data){
    stack_t altstack;                   // set up an alternate stack
    if ((altstack.ss_sp = malloc(SIGSTKSZ)) == NULL){
        ... error, no memory
    }
    altstack.ss_size = SIGSTKSZ;
    altstack.ss_flags = 0;              // 0 when setting (SS_ONSTACK is a status flag)
    if (sigaltstack(&altstack,NULL) < 0){
        ... error
    }
    pthread_cleanup_push(cleanup,NULL);
    ... actions that can cause SIGSEGV because of stack overflow
    pthread_cleanup_pop(0);
    return NULL;
}

... handler registration
struct sigaction sa;
sa.sa_flags = SA_SIGINFO | SA_RESTART | SA_ONSTACK;
sa.sa_sigaction = handler;
sigsetmost(&sa.sa_mask);
sigaction(SIGSEGV,&sa,NULL);
An alternate stack is set for the thread that executes the operations that
can cause the signal to occur, so as to have room for the handler to execute
(which may not be the case when a stack overflow occurs). Note that
sigaltstack()
sets an alternate stack for the signal handlers of the calling thread (and not
of the calling process, as the man pages say) that are registered with
SA_ONSTACK.
A SIGSEGV occurring while a system call executes, with a handler that cures
the problem, makes some system calls proceed instead of being interrupted,
including ones that are always interrupted when a signal occurs in them, while
some others return with EFAULT (i.e., it is not transparent). This seems to
occur when system calls, after having performed the requested operation, try to
copy the results into user space (e.g., read()
). It is documented nowhere, though. Note that in such a case the error cannot
be recovered: restarting the call executes another operation. E.g.,
sigwaitinfo()
, when it receives a signal it is waiting for (and has an argument in unmapped
memory), returns EFAULT and does not run the handler; when it receives a stray
SIGSEGV it runs the handler and returns EINTR. In both cases it has received a
signal, and so it cannot even be restarted. If the signal is sent by
kill()
, it behaves as a normal signal. Another example is
read()
with a buffer in unmapped memory: it terminates with EFAULT, but it has read
the data, and therefore it cannot be restarted.
SIGBUS is also not transparent (e.g., sigwaitinfo()
returns EFAULT even when there is a handler that repairs the problem; for some
other system calls, like, e.g., sem_wait()
, it is transparent). It denotes either a programming error (misaligned data)
or occurs when a memory mapped file is accessed at a position that does not
correspond to data in the file. N.B. it is not possible to assess the
correctness of accesses to memory mapped files before executing an I/O
operation, because immediately after the test the file could be truncated by
another process. This, though, can be avoided by locking the file. System calls
that return with an error when a SIGBUS handler has repaired the problem could
be restarted once. It is true that right after the handler has cured the
problem, the problem could occur again (e.g., the file truncated again), which
would need another restart; however, restarting indefinitely could make the
thread enter an endless loop. Moreover, it is not possible to determine both
that the handler cured the problem and that it was executed within a system
call.
There is then a need for the programmer to check exactly what each system call (that might address unmapped memory) returns in order to restart it properly (if possible). This could then be done only on a case-by-case basis. However, do take into account that the behaviour of system calls is not specified in such a case, and thus it can change in the future. A better solution is to make sure that the data accessed by system calls are properly mapped in accessible memory, and to make it so when that is not the case. When accessing a memory mapped file, the file should be locked so as not to be truncated by other processes.
A handler that performs a siglongjmp()
allows the thread to continue, although there is little that can be done with
it: either the handler cures the problem and then returns, or it kills the
process or the thread, in which case it does not need to jump. However, for the
sake of completeness, here it is. Such a handler can be enabled only in a
stretch of code that executes only asynch-signal-safe operations. In practice,
it is advisable to apply it only when there are no function calls in that
stretch of code. E.g.:
static volatile __thread sigjmp_buf* jmp;

void handler(int sig, siginfo_t* siginfo, void* context){
    if ((siginfo->si_code == SI_USER || siginfo->si_code == SI_QUEUE)
            && siginfo->si_pid != getpid()){    // external signal
        ... process kill  // or return, if stray signals discarded
    } else {
        sigjmp_buf* tmp;                        // save current
        tmp = (sigjmp_buf*)jmp;
        jmp = NULL;     // make it NULL: prevent further jumping after handler jump
        if (tmp == NULL){
            ... process kill
        }
        siglongjmp(*tmp,1);
    }
}

... handler registration
struct sigaction sa;
sa.sa_flags = SA_RESTART | SA_SIGINFO;
sa.sa_sigaction = handler;
sigsetmost(&sa.sa_mask);
sigaction(SIGSEGV,&sa,NULL);

... supervised block
sigjmp_buf buf;
if (sigsetjmp(buf,1) != 0){
    ... cleanup
} else {
    jmp = (volatile sigjmp_buf*)&buf;
    ... actions that may generate the signal
    jmp = NULL;
}
For SIGSEGV, the handler can determine if the signal is internal also by testing:
if (info->si_code == SEGV_ACCERR || info->si_code == SEGV_MAPERR){
    ... internal signal
}
Do remember, however, that the handler must be agreed upon among all threads, because all threads can receive these signals.
In this chapter, error reporting for threads that use cleanup handlers and signal handlers is presented. Cleanup handlers perform exception handling, i.e. they execute any action that is deemed appropriate to cater for failures or other premature termination (e.g., killing). This applies to all signals that denote errors, and to the explicit checking of dynamic conditions whose violations denote errors.
Exception handling means basically detecting failures, cleaning up the process state up to a point at which something can be done to recover (which can be retrying, trying alternatives, or doing less) or terminating the process if nothing can be done.
Premature termination occurs when:
In the first two cases we are not interested in what the process (and all its threads) was exactly doing, in the others we need a detailed error reporting to tell what went wrong:
SIGSEGV cannot be left to its default disposition (almost no error reporting), and stray SIGSEGVs must not produce an error (but, e.g., a message such as "process killed on user request", as for all other kill requests). SIGSEGV can occur at any place in the code (e.g., at memory accesses made with pointers), and normally denotes a programming error. SIGBUS, in some specific circumstances, can denote a failure (e.g., a system call that accesses a memory mapped file that has been truncated), which the program wants to handle.
The only other signal that denotes the violation of a dynamic condition is a time-supervision one. With it, however, reporting the circumstances of the error is not difficult. Almost all system calls that detect the violation of a dynamic condition return with an error rather than a signal.
All these cases can be handled by performing a graceful kill and reporting an
exception to higher levels, the last being the one that circumstantiates the
error with respect to the external process interface. This can be done by
recording the error data in cleanup handlers. When an internal signal denotes a
failure, we should catch it close to its point of origin (setting a cleanup
handler that reports an error). For system calls that do not generate this
signal, but return an error or interruption, pthread_exit()
can be called, reporting an error. When an error occurs, most of the time we
need to perform cleanup much the same as we do when killing. There could be
cases in which, at some level of nesting, a specific recovery can be done, and
the thread then not terminated. This can be handled by telling the handler not
to cancel the thread (but in many cases no recovery can be done). It is then
appropriate that cancellation be used both to serve an external request and to
handle errors.
When an internal signal denotes a programming error, it can be reported by telling at least the thread or topmost function being executed (with a cleanup handler). Printing the call stack would be quite useful; glibc offers backtrace(), but it is not asynch-signal-safe (otherwise a coredump must be generated and analyzed). It would be nice to tell the source line in error, or at least the program (instruction) counter. This seems quite a difficult thing in C: I have never found a linker map that tells where the modules are placed in virtual memory, nor a C compiler listing telling the relative addresses of instructions in the module (compilation unit). The process could have other threads extant at that point, which must be cancelled without reporting any error.
This is the scheme:
void cleanup(void* arg){                // cleanup for function
    ... cleanup actions
    exceptionrethrow(..);
}

void function(){                        // any function that can encounter errors
    ...
    pthread_cleanup_push(cleanup,arg);
    ...
    abortprocess();                     // if unrecoverable error
    - or -
    statement                           // that may cause a signal
    ...
    pthread_cleanup_pop(0);
    - or -
    exceptionthrow(...);                // if recoverable error (thread kill)
}

void cleanup0(void* arg){
    if (failing){                       // cleanup because of error
        exceptionrethrow(...);
    }
}

void function0(){                       // caller that wants to wrap exceptions
    pthread_cleanup_push(cleanup0,NULL);
    ...
    function();
    ...
    pthread_cleanup_pop(0);
}

typedef struct failure_t {
    struct failure_t* next;             // chained object
    ... any other error data
} failure_t;

// throw an exception from a cleanup handler
void exceptionrethrow(...){
    failure_t* el;
    if (failure->next == NULL){         // use object passed by creator
        el = failure;
    } else {
        el = malloc(sizeof(failure_t)); // allocate a new object
        if (el == NULL) return;
        memset(el,0,sizeof(failure_t)); // initialize
    }
    el->next = failure->next;           // add to chain
    failure->next = el;
    ... record error data in failure object
    failing = 1;                        // note thread is failing
}

// throw an exception from a thread
void exceptionthrow(char* sub){
    failure->next = failure;            // initialize chain
    ... record error data in failure object
    failing = 1;
    pthread_exit(PTHREAD_CANCELED);
}

// deliver the list of exceptions (and make it linear)
struct failure_t* getexceptions(void* p){
    if (p == NULL) return NULL;
    failure_t* f = (failure_t*)p;
    if (f->next == NULL) return NULL;   // only one, and not filled
    f = f->next;
    ((failure_t*)p)->next = NULL;       // open the cycle
    return f;
}

... creation and joining of thread
failure_t* fail = malloc(sizeof(failure_t));
if (fail != NULL) memset(fail,0,sizeof(failure_t));
pthread_cleanup_push(cleanupm,data);
pthread_t th;                           // create thread to execute function
res = pthread_create(&th,NULL,&thread,fail);
...
res = pthread_join(th,&status);         // wait for it
pthread_cleanup_pop(0);
failure_t* f = getexceptions(fail);     // get list of exceptions
if (f != NULL){
    ... cope with exceptions
}
free(fail);
Threads are terminated either by cancellation (when performing a graceful
kill) or by exiting. In both cases, cleanup handlers are executed. Since
cancellation can occur both because of an external request (in which case no
error reporting is wanted) and because of an unrecoverable error, a per-thread
flag (failing
) can be used to control error reporting in them. It is set by handlers before
cancellation or killing, and by threads before exiting voluntarily.
When a thread is terminated because of an error, some object is used to document it, and each cleanup handler can add chained exceptions to it. Cleanup handlers do not return any data, and thus they must leave exception data in some variable. In some cases it could be appropriate to pre-allocate a number of exception descriptors so as to have room to store error data (and not to generate out of memory errors trying to allocate them there). They would be allocated for each thread. An alternative is to pass an exception object to a thread when created. To easily chain exception objects, further exception objects can be linked in a circular list built on the one that is passed to a thread. The list can be made linear to ease visiting it when displaying exceptions. When there is no heap to allocate additional (chained) exception objects, chaining is simply skipped.
Note that passing up the call stack (or block nesting) the information that
an operation (thread, function, block) failed requires the transmission of
some transient data, i.e., values that are passed on the fly (e.g., from
pthread_exit()
to pthread_join()
, etc.) in some predefined container (e.g., a return value from a function)
that does not need allocation and freeing. These values occupy that space for
only some time. However, such containers are usually rather small, and passing
several values is thus not simple. E.g., the argument to the function executed
by a thread when created is only a pointer, which means that some other
container needs to be created when there is a need to pass several values.
Threads that are cancelled return PTHREAD_CANCELED (and unfortunately not a
user value, which could have been the exception); this happens when they are
cancelled with pthread_cancel()
. Instead, when they are terminated with pthread_exit()
, its argument is returned. However, to easily tell a failing thread from a
successful one, PTHREAD_CANCELED should be returned in all cases in which a
thread did not complete its task. When a thread failed, it is possible to tell
whether it did so because of an error or otherwise: in the first case the
passed exception object is part of a circular list.
Note that when a thread is created and an exception object is passed to it, that object must be initialized, either before passing it or immediately after (in the created thread). It makes little difference which, because thread creation is not a cancellation point.
There is no built-in way to tell if a thread is exiting or not, or in other
words, if a function has been called by a cleanup handler or not.
Unfortunately, it is also difficult to do it using a flag because it would have
to be set when executing a pthread_cancel()
(which is difficult to
remember). Consequently, it is not possible to have only one function to throw
an exception because if it is called from a thread it must do a
pthread_exit()
, while when it is called from within a cleanup
handler it must not.
Each process must be able to decide how to manifest it is being killed, like, e.g., displaying a message on the controlling terminal, or simply replying with a signal. It can be done registering a cleanup handler.
SIGFPE is difficult to handle: it should not be ignored (otherwise the program behaviour is undefined), it should not be handled by the default disposition (which is to abort the process), and then it needs to be caught. However, in the handler the only alternative is to terminate the process. It is also conceivable to try to skip the offending instruction, but this entails a code that is dependent on the instruction set of the processor. Moreover, the signal could occur many instructions later, as Tydeman states. Although it is possible for a compiler to generate the appropriate instructions to prevent this, there is no guarantee that this is done. This makes this signal even more difficult to handle.
To handle the floating point exceptions, the functions in
fenv.h
can instead be used:
#include <fenv.h>
...
fenv_t envp;
feholdexcept(&envp);        // disable generation of SIGFPE
...
a = 1.0 / 0.0;              // any floating point operation that raises an exception
if (fetestexcept(FE_DIVBYZERO)) ...  // true if exception occurred
SIGXCPU is sent to a process to notify it that it is reaching its limit of CPU consumption, and that soon after a SIGKILL will terminate it. The process thus has the chance to save or clean up its important data. The CPU time limits are set as follows:
struct rlimit rlim;
rlim.rlim_cur = 1;          // soft limit, nr. of seconds
rlim.rlim_max = 2;          // hard limit
if (setrlimit(RLIMIT_CPU,&rlim) == -1){
    ... error
}
A thread can be used to catch the signals generated by timers. A timer can be told to create a thread each time it ticks. This is quite costly, and thus should be done only for timers that tick just once:
void thread(sigval_t sigval){
    ...
}

... scrap of code that uses the timer
struct sigevent event;
timer_t timer_id;
event.sigev_notify = SIGEV_THREAD;
event.sigev_notify_function = thread;
event.sigev_notify_attributes = NULL;
if (timer_create(CLOCK_REALTIME,&event,&timer_id) < 0){
    ... error
}
struct itimerspec itime;
itime.it_value.tv_sec = 1;
itime.it_value.tv_nsec = 0;
itime.it_interval.tv_sec = 0;
itime.it_interval.tv_nsec = 0;
timer_settime(timer_id,0,&itime,NULL);
... when done:
timer_delete(timer_id);
The created thread behaves as if it were detached: it ceases to exist when its function returns. However, timers told to run a function not only create a thread for it, but also create another thread that is not terminated when the timer is destroyed. This causes a leak of threads.
For timers that tick periodically, a thread can be created in advance, to wait for signals sent by the interval timers (without drift):
void* thread(void* data){
    increasePriority();                     // increase thread priority
    sigset_t mask;
    sigsetmost(&mask);                      // block most signals
    pthread_sigmask(SIG_BLOCK,&mask,NULL);  // block all signals for this thread
    sigemptyset(&mask);
    sigaddset(&mask,SIGUSR1);               // all signals carry the value
    for (;;){
        int sig;
        sigwait(&mask,&sig);
    }
}

... scrap of code that uses the timer. N.B. SIGUSR1 must be blocked here
pthread_t th;
pthread_create(&th,NULL,&thread,NULL);
struct sigevent event;
timer_t timer_id;
event.sigev_notify = SIGEV_SIGNAL;
event.sigev_signo = SIGUSR1;
if (timer_create(CLOCK_REALTIME,&event,&timer_id) < 0){
    ... error
}
struct itimerspec itime;
itime.it_value.tv_sec = 1;
itime.it_value.tv_nsec = 0;
itime.it_interval.tv_sec = 1;
itime.it_interval.tv_nsec = 0;
timer_settime(timer_id,0,&itime,NULL);
... when done:
timer_delete(timer_id);
pthread_cancel(th);
pthread_join(th,NULL);
These signals are sent to processes to stop (i.e., suspend) them and to
continue (resume) them. SIGTSTP is sent when typing ^Z
, SIGCONT when typing fg
at a shell prompt when a program has been run by that shell. All of them can
also be sent with kill()
. SIGSTOP cannot be caught. SIGCONT can be caught, but resumes the process
anyway. One use of catching these signals is to set, or reset, the terminal to
the desired operating mode, or to redraw it, or to redisplay a prompt. If
something needs to be done both when a process is interactively stopped
(SIGTSTP) and when it is continued, and the actions are very simple, this is
the scheme:
static void handle_sigtstp(int signo){
    ... do what actions are needed before stopping
    raise(SIGSTOP);
    ... do what actions are needed before continuing
}

... to register the handler
struct sigaction sa;
sigsetmost(&sa.sa_mask);
sa.sa_handler = handle_sigtstp;
sa.sa_flags = SA_RESTART;
sigaction(SIGTSTP,&sa,NULL);
If there is a need to do some (very simple) actions only when SIGCONT is received, then a handler for it can be set. Otherwise, a thread can be dedicated to them:
void* thread(void* data){
    increasePriority();                     // increase thread priority
    sigset_t mask;
    sigsetmost(&mask);
    pthread_sigmask(SIG_BLOCK,&mask,NULL);  // block all signals for this thread
    sigemptyset(&mask);
    sigaddset(&mask,SIGTSTP);
    sigaddset(&mask,SIGCONT);
    for (;;){
        int sig;
        sigwait(&mask,&sig);
        if (sig == SIGTSTP){
            ... do what actions are needed before stopping
            raise(SIGSTOP);
        } else {
            ... do what actions are needed before continuing
        }
    }
}

... to create the thread (and destroy it).
... N.B. these signals blocked in all other threads
pthread_t th;
pthread_create(&th,NULL,&thread,NULL);
...
pthread_cancel(th);
pthread_join(th,NULL);
For signals whose disposition is to stop the process (i.e., SIGSTOP always, and SIGTSTP, SIGTTIN and SIGTTOU with the default disposition), the interruption occurs when SIGCONT is received. SIGTSTP, SIGTTIN and SIGTTOU can be caught, and when they are, they do not stop the process (and thus no SIGCONT is needed), but they interrupt the same system calls as SIGSTOP. To make them stop the process, their handlers must raise a SIGSTOP. Unfortunately, in Linux there are 15 system calls that are interrupted when a process gets stopped (see man 7 signal), and this applies to all such calls that are being executed (in different threads).
Blocking SIGCONT has no effect, all threads are resumed all the same, and all 15 system calls that had been stopped are interrupted anyway. However, if a handler is in place, only one handler is run when the signal is sent. N.B. there is no way to stop and continue individual threads.
The handling of these signals is process-wide (i.e., only one thread gets the signals, as usual). Unfortunately, each thread could have its own actions to perform in reaction to them. This could be achieved by re-sending the signals to the threads that need to perform some specific actions when stopped or continued. A single-threaded process that is stopped and then continued, and has registered a handler for SIGCONT, can execute some code before anything is done by the main thread, like, e.g., setting the terminal properly. In so doing, the main thread can almost forget about stop/continue events. A multi-threaded one instead resumes all threads, and at most one executes a signal handler for SIGCONT before resuming its operation. This needs a different organization, like, e.g., making all threads that want to display something on the terminal send requests to a dedicated thread, which also handles stop/resume events.
There are no handlers for SIGSTOP, but the program can know that the interruption was due to it (actually, to a stop signal) by registering a handler for SIGCONT.
In any code (supervised blocks or otherwise), there is a need to restart the 15 system calls that are interrupted by stop signals. It is not acceptable that a program fails because it has been stopped and then continued. The problem is that there is no built-in way to detect that they have been interrupted by such signals, and not by others.
The solution to restart system calls does not change if a handler is provided for stop (and/or SIGCONT) signals.
Signals can be used to ask processes (or threads) to change the course of some action that they are doing repeatedly. This can be done informing them about the event with a flag. This is the case of processes that repeatedly execute a loop, or that perform frequently some actions. An example is processes that record log messages in some file. When there is a need to get the messages logged so far, the process can be told to close the current log file, and open a new one. Another example is processes that read a configuration file, and that can be told to reload it. If they execute repeatedly a loop, they can check if such a request has come.
This could be implemented registering a handler that sets a flag. However, that would interrupt many system calls that are not restarted automatically. A better solution is to use a thread to handle that signal. It can be handled by the control thread. This is an example of log rotation:
static sem_t rotation_requested;

void* controlthread(void* data){
    ...
    case SIGxxx:                            // signal devoted to rotate logs
        sem_post(&rotation_requested);
        break;
}

... code that initializes the log file
FILE* logfile;                              // log file
int fileno = 0;                             // number of log file
char filename[80];                          // string to build the name of the log file
int fileopened = 0;                         // tell if log file is open
sem_init(&rotation_requested,0,0);          // no rotation pending initially

... code that writes a message in the log file
if (!sem_trywait(&rotation_requested)){     // a request to rotate has been made
    if (fileopened){                        // the log file is currently open
        fclose(logfile);                    // close it
        fileopened = 0;
        fileno++;                           // use a new number next time
    }
}
if (!fileopened){   // log file not open, build a new name and open it
    sprintf(filename,"tmp.log%d",fileno);
    logfile = fopen(filename,"w");
    fileopened = 1;
}
fprintf(logfile,"%s",message);              // message is not a format string

... code that closes the log file
if (fileopened){
    fclose(logfile);
}
sem_destroy(&rotation_requested);
N.B. since the flag is shared between two threads, and since there is no guarantee that accesses are not reordered, the flag is implemented with a semaphore. The control thread posts to it to let the other know that there is a request pending, and the other tries to get it, rotating the log when it got a request, and proceeding normally otherwise.
In Linux there is no way for a userland application to be notified about the occurrence of power failure, computer suspension or hibernation (and also resume), with signals or otherwise.
Applications could need to perform some operations when these events occur, like, e.g., closing a dial-up Internet connection.
Parallel execution in the user space occurs when a process:
Processes are identified by pids, which are recycled. A test on an Athlon 64 X2 4200+ has shown that a process creating other processes in a row creates 32312 processes in 12 seconds before a pid is recycled. Of course, this is somewhat of an upper limit because these processes do nothing and terminate immediately. The time a pid takes to be reused is then fairly large, and yet not too large to prevent killing a homonym. Thread ids (tids) are instead reused immediately, i.e., when a thread terminates and another is created, the tid of the first is reused. With a 32-bit kernel and processes created every 300 μs, the recycle time is 3·10⁻⁴ × (2¹⁵ − 1) ≈ 10 seconds, and with a 64-bit one it is about 5 minutes. On 64-bit kernels the maximum pid is 4194303, and on 32-bit kernels it is 32768. It is configured in /proc/sys/kernel/pid_max. However, a system never creates processes in a row like that, and thus pids are never recycled before a few minutes.
When a process creates too many threads, it receives an EPERM as a result of thread creation. Concerning processes, after 24854 processes have been created, the system practically hangs (it still allows killing processes, though).
A child, when created, does not have the threads of the father; it has only
one thread and that is the one that forked it. If the father had several
threads, with locks held by some of them, the locks will be duplicated in the
child, but not the threads that hold them. There is then a need to be very
careful in touching locks that are held by threads that do not exist in the
child. Moreover, when a thread executes a fork(), others could be in the middle of updating shared data.
Mutexes and rwlocks have a holder, while semaphores do not: a mutex knows which thread locked it (which is what makes deadlock detection possible), while a semaphore does not. There are no means to make condition variables fork-safe, and therefore they must not be used after fork() (they can be implemented using locks).
To avoid races, no asynch-signal-unsafe function can be called before exec-ing. However, some library functions, like, e.g., the ones that perform I/O redirection, are likely to be called after fork(). Some implementations of them are multi-threaded and use locks (e.g., dup(), fcntl()). Extant non multi-threaded programs may not follow the restriction to call only asynch-signal-safe functions. In order to avoid providing a special variant of these functions to be used only in such programs, a means has been introduced to allow libraries to use locks and still be safely callable after fork(): atfork handlers. They should be considered something special for a special case, though: new programs must obey the rule to call only asynch-signal-safe functions after fork() (e.g., they cannot use locks). Programs must also avoid using data that can be in an inconsistent state in a child.
Functions that use locks must also register atfork handlers in order to become fork-safe. This must be done either in an initialization function placed in the package in which they lie, or in the functions that use mutexes, in such a way as to be executed only once. Remember that atfork handlers can be registered, but not un-registered. The prepare handler acquires the locks, and the father handler and the child handler release them. This is quite burdensome, but it has been introduced for a special case. It is also rather coarse, because it makes every fork() acquire locks that might not be used in the children.
The problem occurs when a thread executes a fork() while the state of the data is inconsistent, or when a thread that holds a mutex has no counterpart in the child (the mutex in the child will be locked by a nonexistent thread). A multi-threaded process had better make an exec() soon after a fork().
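As a sketch of this advice (the helper name spawn_command and the choice of execvp()/waitpid() are illustrative, not taken from the text above), a multi-threaded father can confine the child to async-signal-safe calls by exec-ing at once:

```c
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical helper: fork and exec at once, so that the child calls
// only async-signal-safe functions (execvp(), _exit()) between fork()
// and exec(), touching no locks and no shared state.
int spawn_command(const char* path, char* const argv[]){
    pid_t pid = fork();
    if (pid < 0)
        return -1;                  // fork failed in the father
    if (pid == 0){                  // child: exec immediately
        execvp(path, argv);         // async-signal-safe
        _exit(127);                 // exec failed: leave without running atexit handlers
    }
    int status;                     // father: reap the child
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child does nothing between fork() and exec() that could touch a mutex duplicated from the father, which is exactly what the paragraph above recommends.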
A function that uses some global variables can be made thread-safe by protecting the access to such data with a mutex. This makes another thread that calls the same function wait a bit. Here, however, the picture is worse, because a fork() does not wait for any mutex, much the same as a signal handler: they both run without paying any attention to mutexes. It is actually even worse: critical sections can be protected with mutexes in the realm of threads, and by blocking signals in that of signal handlers, but nothing can be done when a fork() occurs.
Since a program can be made of several modules, and can also call several library functions, to preserve encapsulation each module or library can make fork() reserve/release its own locks by installing its own handlers.
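A minimal sketch of such a module (the names module_lock, module_init, and install are hypothetical) could register its handlers once with pthread_atfork():

```c
#include <pthread.h>

// Hypothetical module with an internal mutex: the prepare handler
// acquires it before fork(), and both the father and the child
// handlers release it afterwards, so the lock is never left held by a
// thread that does not exist in the child.
static pthread_mutex_t module_lock = PTHREAD_MUTEX_INITIALIZER;

static void prepare(void){ pthread_mutex_lock(&module_lock); }    // runs before fork()
static void parent(void) { pthread_mutex_unlock(&module_lock); }  // runs after fork(), in the father
static void child(void)  { pthread_mutex_unlock(&module_lock); }  // runs after fork(), in the child

static pthread_once_t module_once = PTHREAD_ONCE_INIT;
static void install(void){ pthread_atfork(prepare, parent, child); }

// Called from every entry point of the module: the handlers are
// registered exactly once (remember they cannot be un-registered).
void module_init(void){ pthread_once(&module_once, install); }
```

Wrapping the registration in pthread_once() satisfies the "executed only once" requirement stated above without needing a dedicated initialization call order.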
Moreover, all data in the father that contain pid's and tid's may be meaningless in the child.
The man pages do not say anything regarding which functions can be called from within atfork handlers. I guess that the asynch-signal-unsafe ones are allowed, since the handlers that release locks call such functions themselves.
Concerning per-thread keys, threads have the same tid's in fathers and children. Thus, the main thread in a child has the same per-thread values as the creator thread in the father. What is not said is whether the associations present in the father for other threads are deleted. E.g., it is likely that keys that associate non-existent threads with malloc-ed memory produce a memory leak. If a thread in the father locally creates a key, that key could be deleted with a child atfork handler. However, since between fork() and exec() little should be done, unused memory is not much of a problem there.
Suppose a process is inside a slow system call and, having caught a signal, is executing a signal handler, and the handler makes a fork(). The child is inside the system call and the handler too, and when the handler executions return, both system calls are aborted or restarted.
Let's consider a process with children and also with threads: the children are sons of the main thread. All threads share the same pid and father.
The difference between the main thread and the others is that when the main one falls off the bottom, an implicit exit() is done, while when any other thread falls off the bottom, a pthread_exit() is done. An exit() terminates all threads at once (it executes an exit_group()). Therefore, when the main thread falls off the bottom, all threads are terminated.
A process that creates a child and terminates leaves the child running, but re-parented to process 1. Such a child is said to be orphaned.
Let's have a process that creates a child that terminates before the father does. During the timespan between the child's termination and the father's waiting for it, the child is a zombie (and marked as <defunct> in the ps output). A zombie ceases altogether to exist when the father terminates, because at that time it becomes orphaned and is re-parented to init, which promptly waits for it.
Threads and signal handlers share the memory with the process that created them or was interrupted by them.
gcc supports __thread, which declares thread-local variables. This is much handier than the use of thread keys. E.g.
    static __thread int var;    // declares a variable var, private for each thread
    ... var ...                 // from within a thread addresses its private one
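A small sketch (the names worker and tls_demo are illustrative) showing that each thread addresses its own copy of a __thread variable:

```c
#include <pthread.h>

static __thread int var;                // one instance of var per thread

// Each worker writes its own copy of var and returns what it sees.
static void* worker(void* arg){
    var = *(int*)arg;
    return (void*)(long)var;
}

// Runs two workers with different values; returns 1 when each thread
// saw only its own copy and the main thread's copy stayed untouched.
int tls_demo(void){
    pthread_t t1, t2;
    int a = 1, b = 2;
    void *r1, *r2;
    var = 0;                            // the main thread's private copy
    pthread_create(&t1, NULL, worker, &a);
    pthread_create(&t2, NULL, worker, &b);
    pthread_join(t1, &r1);
    pthread_join(t2, &r2);
    return (long)r1 == 1 && (long)r2 == 2 && var == 0;
}
```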
Children are created with a copy of the memory of their father.
Occurrences of errno refer to the thread-specific one, controlled by the _posix_source feature definition, which is the default. Actually, errno is always a macro that calls a function that delivers a thread-private location. Signal handlers address the errno of the thread to which the signal has been delivered, and that they interrupted.
Since all threads of a same process share the same address space, they can pass data using shared (static) variables, providing that they protect accesses (e.g., with mutexes), or do that when absolutely sure that no simultaneous accesses occur (and no reordering of accesses too).
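As a sketch of the mutex-protected pattern (the names counter, adder, and count_with_two_threads are illustrative), two threads can increment a shared static variable without losing updates:

```c
#include <pthread.h>

// Two threads share a static variable; a mutex protects every access
// so that increments are not lost.
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int counter = 0;                 // shared (static) data

static void* adder(void* unused){
    (void)unused;
    for (int i = 0; i < 100000; i++){
        pthread_mutex_lock(&lock);      // protect the read-modify-write
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

// Runs two adders concurrently and returns the final count.
int count_with_two_threads(void){
    pthread_t t1, t2;
    counter = 0;
    pthread_create(&t1, NULL, adder, NULL);
    pthread_create(&t2, NULL, adder, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;                     // 200000 with the mutex; possibly less without
}
```

Without the mutex, the read-modify-write of counter++ can interleave between the threads and updates can be lost, which is exactly the race the paragraph above warns about.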
Another means to exchange data is through arguments: pthread_create() has a void* parameter; the creator can pass a scalar value (casting it) or a pointer to data that remains valid while the thread uses it (e.g., malloc'ed or static data). Likewise, pthread_exit() has a void* parameter; the thread can pass a scalar value (casting it) or a pointer to data that survives its termination (e.g., malloc'ed data, but not its own stack variables). The value is got by the thread that executes a pthread_join(), passing the address of a return variable to it. Note that when the joined thread has been cancelled, the PTHREAD_CANCELED value is returned. When there is a need to pass data from a thread that might be cancelled, the easiest means is to pass a variable to it when it is created.
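The two directions can be sketched together (the names square and square_in_thread are illustrative): the creator passes a pointer through pthread_create(), and the thread returns malloc'ed data through pthread_exit():

```c
#include <pthread.h>
#include <stdlib.h>

// The thread receives a pointer to its input through pthread_create()
// and returns a malloc'ed result through pthread_exit().
static void* square(void* arg){
    int* result = malloc(sizeof *result);   // must survive the thread's stack
    *result = *(int*)arg * *(int*)arg;
    pthread_exit(result);                   // equivalent to: return result;
}

// Creator side: passes &n, joins, and takes ownership of the result.
int square_in_thread(int n){
    pthread_t t;
    void* ret;
    pthread_create(&t, NULL, square, &n);   // n outlives the thread: ok
    pthread_join(t, &ret);                  // receives the pointer given to pthread_exit()
    int value = *(int*)ret;
    free(ret);                              // the joiner frees the returned data
    return value;
}
```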
When a thread terminates because it is cancelled, all the data that it has malloc'ed (and that must not be returned, which is the normal case) must be freed. This can be done in its cleanup handlers. A thread can detect some condition that prevents it from continuing; in such a case it could terminate executing a pthread_exit(). This makes its cleanup handlers be executed. If there is a need to return malloc'ed data, they must not be freed. However, cleanup handlers have no built-in way to tell whether they are being executed because of cancellation or because of thread exiting. To distinguish these two cases, a per-thread flag can be set before calling pthread_exit(). Do also take into account that cleanup handlers have no means to change the value that is returned by pthread_join().
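A sketch of this flag technique (the names exiting, cleanup, worker, and run_worker are illustrative; a __thread variable is used here as the per-thread flag):

```c
#include <pthread.h>
#include <stdlib.h>

static __thread int exiting = 0;  // set before a voluntary pthread_exit()

// Cleanup handler: frees the buffer only when the thread is being
// cancelled; on a voluntary exit the buffer is returned to the joiner.
static void cleanup(void* buf){
    if (!exiting)
        free(buf);
}

static void* worker(void* unused){
    (void)unused;
    char* buf = malloc(64);
    pthread_cleanup_push(cleanup, buf);
    buf[0] = 'x';                 // ... work that might be cancelled ...
    exiting = 1;                  // voluntary exit: the joiner will free buf
    pthread_exit(buf);            // runs cleanup(), which sees exiting == 1
    pthread_cleanup_pop(0);       // not reached; balances the push syntactically
    return NULL;
}

// Joins the worker and returns the malloc'ed buffer; the caller frees it.
char* run_worker(void){
    pthread_t t;
    void* ret;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, &ret);
    return ret;
}
```

Had the worker been cancelled instead, exiting would still be 0 and the same handler would free the buffer, avoiding the leak.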
Note that a thread that has called pthread_exit() and is executing its cleanup handlers is insensitive to cancellation. Therefore, when cancelling a thread, there is no guarantee that it terminates having freed all its malloc'ed data.
N.B. Always terminate the main function of a joinable thread with a return or a pthread_exit(), because otherwise the value returned by pthread_join() is undefined.
Detaching a thread that is inside its cleanup handlers is allowed; it succeeds both when that is due to exiting and when it is due to cancelling. pthread_join() returns EINVAL on detached threads.
When a thread cancels the main thread, the process does not terminate: the main thread terminates after having executed its cleanup handlers, but not the whole process. This is because cancellation performs a pthread_exit() at the end.
When the main thread returns, falls off the bottom, or executes an exit(), the process is terminated immediately, including all its threads (their cleanup handlers, if any, are not executed). When the main thread executes a pthread_exit() it terminates, but the process becomes defunct and terminates only when all threads terminate (or one executes an exit()). It is possible to detach the main thread if it does not need to join the other threads. When any thread executes an exit(), the process is terminated immediately.
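This behaviour can be sketched as follows (the function names are illustrative; a child process is used so that the effect can be observed from outside): the main thread of the child does a pthread_exit(), and the child survives until its worker thread calls exit():

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

// Worker thread of the child process: it outlives the child's main
// thread and eventually ends the whole child process with exit(42).
static void* worker(void* unused){
    (void)unused;
    usleep(50000);                  // the child's main thread is gone by now
    exit(42);                       // exit() from any thread ends the process
}

// Forks a child whose main thread does pthread_exit(): the child
// survives until the worker calls exit(), and the father reads the
// exit status the worker passed.
int demo_main_pthread_exit(void){
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0){                  // child
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        pthread_exit(NULL);         // main thread ends; the process does not
    }
    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The father observing exit status 42 shows that the worker kept running after the child's main thread had terminated with pthread_exit().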
This section contains a number of possible improvements to the current semantics of signals and cancellation in Linux. To my knowledge, none of these has been submitted to the Linux community.
We know that it is not possible to implement killing by simply performing a program-counter transfer (except for a few cases): there is a need to release all the resources that were allocated when the transfer occurred, and this cannot be done automatically in general. Therefore, killing works only if all the code to be supervised contains tests on kill requests and handles them, much the same as with cancellation. Thus, it is not possible to just take an existing piece of code (with library calls in it) and enclose it in some try block. Solutions:
- extend the approach of pselect() to the other slow system calls. This is costly. Such calls would allow killing a single operation, while the minimum granularity (for the general case) now is the thread.
All these solutions remove the races described in the section on supervised blocks.
Note that not all blocking functions seem to be interrupted by signals (e.g., pthread_cond_wait()). Thus, if a signal were used to kill, it should interrupt them.
Aborted system calls would return with EINTR in order to allow the caller to know whether they succeeded or not. This allows knowing whether resources have been acquired, and thus must be released when cleaning up.
A solution is persistent signals: a slow system call is aborted when a signal comes while the process is suspended in it, and also when a persistent signal has occurred before blocking. Alternatively, it could abort when a special kill flag is true upon entrance. This would have the same effect as blocking all signals, testing the flag, and, if it is false, executing the call and atomically unblocking signals. Persistent signals would then not make handlers execute, but only abort slow system calls. This would be the paradigm:
    if (killFlag) ... kill          (1)
    sem_wait(sem);                  (2)
    if (killFlag){                  (3)
        if (errno == EINTR){        (4)
            kill
        }                           (5)
        sem_post(sem);
    }
Signal arriving at:
1: ok, it makes sem_wait() abort, and also if it occurs inside sem_wait()
2: ok, it raises the killFlag, and the thread can release sem
3: ok, duplicated signal
4: same
5: same
In order to protect cleanup code, signals can be blocked in it.
Persistent signals and kill flags allow registering resource allocation:
    (1) resourceAllocated = false;
    (2) sem_wait(sem);
    (3) if (killFlag){
    (4)     if (errno == EINTR){
    (5)         kill, but do not release sem
            } else {
    (6)         kill and release sem
            }
        }
        resourceAllocated = true;
Signal arriving at:
1: ok, makes sem_wait() abort
2: ok, makes sem_wait() abort, and also if it occurs inside sem_wait()
3: ok, nothing
4: same
5: same
6: same
Basically, persistent signals interrupt the code, but since the handler does nothing, they let the code proceed. Such signals only make slow system calls abort. The system calls need to clear the signal, but clearing could also be done outside. Note that when there is a sequence of actions containing several cancellation points, there is a need to remember which calls have been interrupted, so as to undo the ones that had not been interrupted.
The correct scheme after a system call is to test the kill flag, because we can then handle the case in which the kill has arrived during or just after the system call. However, we can test whether the system call has been interrupted, in which case we probably do nothing (except exiting), and let the test made at the next kill point release the resource.
Here is a list of other possible improvements:
- a means to make a pthread_cancel() occur on a timeout (otherwise there is no way for a thread to time-supervise a sequence of actions and to cancel)
- allowing pthread_cancel() to be called from within a signal handler (this is already possible, even if not documented).
Currently, signals can safely be used only in very few contexts (especially in a MT-process) because of the lack of these features.
Java does not have a notion of asynchronously interrupting threads. It has a notion of interrupting them at well defined points during execution. When a thread is interrupted, an exception is thrown. This makes interruption blend seamlessly with the exception handling mechanism, allowing thus threads to set up exception handlers to catch them, deciding then to terminate or to repair and recover.
A thread can post an interruption request to another, placing it into an interrupted status that makes it resume with an exception at the next execution of one of a set of (blocking) methods (belonging to several classes). The same occurs if the thread is already blocked in one of these methods.
Java does not have a forced thread kill: Thread.destroy() is not implemented. As a consequence, there is no way to cure an application that has a wild thread (as with POSIX threads).
Java programs can register shutdown hooks, which are threads that are executed when the program (actually, the virtual machine) receives a signal.
SIGINT, SIGTERM, SIGHUP make the shutdown hooks run. SIGQUIT produces a dump of the current threads and garbage collection statistics. The other signals abort the program, some with an error message.