
libcontainer: map pre-exec PID 1 signals to 128+signal#5197

Open
lifubang wants to merge 1 commit into opencontainers:main from lifubang:carry-5189-pid1-exec2


@lifubang

Test #5189

There is a narrow pre-exec window spanning the exec.fifo handshake and the final execve in which the Go-based runc init helper is still the container's PID 1.

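If SIGTERM, SIGINT, or SIGHUP arrives in that window, Linux does not apply the default terminating action, because the kernel does not apply default signal actions to the init process of a PID namespace. The Go runtime handles these terminating signals by calling dieFromSignal. That function resets the signal to its default action and re-raises it, on the assumption that the kernel will then terminate the process. When that does not happen, it falls back to exit(2). See: https://github.com/golang/go/blob/c60392da/src/runtime/signal_unix.go#L993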

For runc's PID 1 helper, that mismatch leaks Go's internal exit status 2 instead of the usual shell-style 128+signal.

Install a narrow pre-exec signal handler for those signals while the helper is PID 1, and translate them to 128+signal until execve replaces the helper with the container payload.

Add libcontainer integration coverage for the regression. The test uses a StartContainer hook to hold the process in the post-fifo, pre-exec window, signals init through the libcontainer API, and verifies the resulting exit status for SIGTERM, SIGINT, and SIGHUP.

lifubang force-pushed the carry-5189-pid1-exec2 branch from 7af6e21 to 948957a on March 25, 2026 01:45
lifubang force-pushed the carry-5189-pid1-exec2 branch 2 times, most recently from 69adecd to 6eb7f1c on March 25, 2026 02:59
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
Signed-off-by: lifubang <lifubang@acmcoder.com>

Copilot AI left a comment


Pull request overview

This PR addresses a PID 1 pre-exec edge case where termination signals (SIGTERM/SIGINT/SIGHUP) can yield Go’s internal exit status instead of the conventional 128+signal, by installing a narrow signal handler in the init helper and adding an integration test that reproduces the race window.

Changes:

  • Add a pre-exec PID 1 signal handler in libcontainer init to exit with 128+signo for SIGTERM/SIGINT/SIGHUP.
  • Add an integration test that blocks in a StartContainer hook, signals init via the libcontainer API, and asserts the mapped exit codes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Files reviewed:

  • libcontainer/init_linux.go: Installs a pre-exec signal handler (PID 1 only) to translate termination signals into conventional 128+signal exit codes.
  • libcontainer/integration/preexec_signal_test.go: Adds integration coverage that holds init in the post-fifo, pre-exec window and validates exit status mapping for SIGTERM/SIGINT/SIGHUP.


Comment on lines +142 to +146
	fd, err := unix.Open(path, unix.O_WRONLY|unix.O_NONBLOCK|unix.O_CLOEXEC, 0)
	if errors.Is(err, unix.ENXIO) || errors.Is(err, os.ErrNotExist) {
		return
	}
	ok(t, err)

Copilot AI Mar 25, 2026


releaseHook treats ENXIO from opening the FIFO as a no-op and returns. ENXIO can also occur when the hook has not yet opened the FIFO for reading (a race that remains even after hookReady is created), which can leave the StartContainer hook blocked and make this test flaky or cause it to hang. Consider retrying on ENXIO until a short deadline, or change the hook command so that it opens the FIFO before writing the ready file.

Suggested change
-	fd, err := unix.Open(path, unix.O_WRONLY|unix.O_NONBLOCK|unix.O_CLOEXEC, 0)
-	if errors.Is(err, unix.ENXIO) || errors.Is(err, os.ErrNotExist) {
-		return
-	}
-	ok(t, err)
+	openDeadline := time.Now().Add(500 * time.Millisecond)
+	var (
+		fd  int
+		err error
+	)
+	for {
+		fd, err = unix.Open(path, unix.O_WRONLY|unix.O_NONBLOCK|unix.O_CLOEXEC, 0)
+		if errors.Is(err, os.ErrNotExist) {
+			// FIFO does not exist (any more); nothing to release.
+			return
+		}
+		if errors.Is(err, unix.ENXIO) {
+			// Reader (hook) has not opened the FIFO yet; retry for a short time.
+			if time.Now().After(openDeadline) {
+				t.Fatalf("timed out waiting to open %s for writing: %v", path, err)
+			}
+			time.Sleep(10 * time.Millisecond)
+			continue
+		}
+		ok(t, err)
+		break
+	}

Copilot uses AI. Check for mistakes.
Comment on lines +132 to +136
	case <-time.After(500 * time.Millisecond):
		releaseHook(t, hookRelease)
		t.Fatal("process did not exit while startContainer hook was still blocking")
		return 0
	}

Copilot AI Mar 25, 2026


The 500ms timeout here (and the 500ms deadline used inside releaseHook) seems quite tight for an integration test and may cause flakes under CI load. Consider increasing these timeouts (e.g., to a few seconds, consistent with other integration tests) while still asserting that init exits before the hook is released.
