rootfs: make pivot_root not use a temporary directory #1125
cyphar merged 1 commit into opencontainers:master from cyphar:pivot_root-without-tmpdir
cool, testing on RHEL.

@rhvgoyal could you also try this out?

okay, this messed up the mounts on my host while testing.

After running the tests on the host, this is all that remained.

Ah, fun. Yeah, this is kinda what I was worried about. I'll need to play around with this more -- the code for

@mrunalp Does this patch help?

diff --git a/libcontainer/rootfs_linux.go b/libcontainer/rootfs_linux.go
index 4f21416ce233..0014cb82e6e6 100644
--- a/libcontainer/rootfs_linux.go
+++ b/libcontainer/rootfs_linux.go
@@ -10,6 +10,7 @@ import (
 	"os/exec"
 	"path"
 	"path/filepath"
+	"runtime"
 	"strings"
 	"syscall"
 	"time"
@@ -566,6 +567,13 @@ func setupPtmx(config *configs.Config, console *linuxConsole) error {
 // pivotRoot will call pivot_root such that rootfs becomes the new root
 // filesystem, and everything else is cleaned up.
 func pivotRoot(rootfs string) error {
+	// We change the cwd during this function. Because Go is multithreaded and
+	// it doesn't appear to correctly implement POSIX thread semantics, we have
+	// to ensure we don't switch threads here. Otherwise we start napalming the
+	// host's filesystem.
+	runtime.LockOSThread()
+	defer runtime.UnlockOSThread()
+
 	// While the documentation may claim otherwise, pivot_root(".", ".") is
 	// actually valid. What this results in is / being the new root but
 	// /proc/self/cwd being the old root. Since we can play around with the cwd

@cyphar Nope, that patch doesn't help :/
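One plausible reason the LockOSThread patch made no difference: on Linux the working directory and root are per-process state, and the Go runtime creates its threads with CLONE_FS, so every OS thread in a Go program shares a single cwd. Pinning a goroutine to a thread therefore cannot isolate a chdir. The sketch below checks this with two goroutines locked to different OS threads; the helper name cwdSharedAcrossThreads is hypothetical, and the program assumes Linux (other Unix systems should behave the same, since POSIX makes the cwd process-wide).

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"syscall"
)

// cwdSharedAcrossThreads changes the working directory from one locked OS
// thread and checks whether a goroutine pinned to a different OS thread sees
// the change. The original cwd is restored before returning.
func cwdSharedAcrossThreads() (bool, error) {
	// Pin this goroutine so the helper goroutine below must run on
	// another OS thread (a locked thread only runs its own goroutine).
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	orig, err := os.Getwd()
	if err != nil {
		return false, err
	}
	dir, err := os.MkdirTemp("", "cwd-demo")
	if err != nil {
		return false, err
	}
	defer os.Remove(dir)
	defer os.Chdir(orig) // runs before os.Remove: defers are LIFO

	errc := make(chan error)
	go func() {
		// Same pattern as the proposed patch: lock, then change the cwd.
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		errc <- syscall.Chdir(dir)
	}()
	if err := <-errc; err != nil {
		return false, err
	}

	// Compare by inode rather than by path string.
	want, err := os.Stat(dir)
	if err != nil {
		return false, err
	}
	got, err := os.Stat(".")
	if err != nil {
		return false, err
	}
	return os.SameFile(want, got), nil
}

func main() {
	shared, err := cwdSharedAcrossThreads()
	if err != nil {
		panic(err)
	}
	fmt.Println("cwd change visible on another OS thread:", shared)
}
```

If this reports true, locking the thread only changes which thread issues the syscalls; the side effects on the cwd (and on mounts) are the same either way.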
// Make pivotDir rprivate to make sure any of the unmounts don't
// propagate to parent.
if err := syscall.Mount("", pivotDir, "", syscall.MS_PRIVATE|syscall.MS_REC, ""); err != nil {
if err := syscall.Unmount(".", syscall.MNT_DETACH); err != nil {
You have to mark all old mounts as private before unmounting them:
syscall.Mount("", ".", "", syscall.MS_PRIVATE|syscall.MS_REC, "")
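Why the propagation type matters: if the old root tree consists of shared mounts (systemd-based hosts typically make / recursively shared at boot), unmounting it from the container's mount namespace can propagate to the peer mounts in the host namespace, which would be consistent with the host mounts disappearing during testing. Marking the tree MS_PRIVATE|MS_REC first, as suggested here, cuts that link. A mount's propagation is visible in the optional fields of /proc/self/mountinfo (documented in proc(5)): shared:N, master:N, propagate_from:N or unbindable, and none at all for a private mount. A minimal parser sketch, assuming that format; mountPropagation is a hypothetical helper, and octal escapes such as \040 in mount points are not decoded.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// mountPropagation parses one line of /proc/self/mountinfo and returns the
// mount point together with its optional propagation fields, such as
// "shared:1" or "master:2". No fields means the mount is private.
func mountPropagation(line string) (mountPoint string, tags []string, err error) {
	fields := strings.Fields(line)
	// Fields 0-5: mount ID, parent ID, major:minor, root, mount point, options.
	if len(fields) < 7 {
		return "", nil, fmt.Errorf("short mountinfo line: %q", line)
	}
	mountPoint = fields[4]
	for _, f := range fields[6:] {
		if f == "-" { // separator before fstype, source and super options
			return mountPoint, tags, nil
		}
		tags = append(tags, f)
	}
	return "", nil, fmt.Errorf("missing separator in mountinfo line: %q", line)
}

func main() {
	f, err := os.Open("/proc/self/mountinfo")
	if err != nil {
		fmt.Println("cannot read mountinfo:", err)
		return
	}
	defer f.Close()

	s := bufio.NewScanner(f)
	for s.Scan() {
		mp, tags, err := mountPropagation(s.Text())
		if err != nil {
			fmt.Println(err)
			continue
		}
		if len(tags) == 0 {
			fmt.Printf("%s: private\n", mp)
		} else {
			fmt.Printf("%s: %s\n", mp, strings.Join(tags, " "))
		}
	}
}
```

Running this inside the container's mount namespace before and after the MS_PRIVATE|MS_REC call should show the shared:N tags disappearing from the old root's mounts.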
@avagin Yes, we were making the old mounts private before but it got removed in this PR.
Heh, whoops. I'll fix this.
I've applied this fix. PTAL.

This looks like a cool idea. Will try tomorrow morning.
libcontainer/rootfs_linux.go
if err := syscall.Unmount(".", syscall.MNT_DETACH); err != nil {
	return err
}
syscall.Close(oldroot)
You already have a defer syscall.Close(oldroot). I am wondering why this call is required.
My thinking was to make it clear that we didn't need that fd after that point, but I've removed it.
libcontainer/rootfs_linux.go
// We cannot umount(".") because currently our . is newroot. So we need to
// switch back to oldroot before doing a MNT_DETACH (since we still have an
// open fd to it).
So it looks like you are changing the cwd of the process, while the root of the process continues to be rootfs. In other words, "." being the new root is not the problem. The problem is probably that "." is the cwd of the process and keeps the mounts busy. Hence you are changing the cwd to oldroot so that the unmount succeeds.
If this is correct, it might be a good idea to modify the comment a bit to reflect that.
MNT_DETACH should allow us to unmount it without changing directories. From my reading of pivot_root, it does a lot of tomfoolery that I believe might change the cwd, but I'll double check.
I'm confused -- I just removed this line and the code still works. So I think you're right (that "." is still oldroot) but you're wrong that MNT_DETACH will fail if we unmount without changing directories.

code looks good. Is it safe to test on my machine? I'm scared...

I tested it on my machine and it worked for me.
@rhvgoyal While the
if err := syscall.Chdir("/"); err != nil {
	return fmt.Errorf("chdir / %s", err)

// Currently our "." is oldroot (according to the current kernel code).
I am not sure what this comment means. I suspect you are saying that the cwd of the process has not changed and oldroot continues to be the cwd of the process?
If so, the old root is "/", and that was not necessarily the cwd of the process at the time of the call to pivot_root().
Maybe say something like: "pivot_root() does not guarantee what will happen to the cwd of the calling process. It is possible that pivot_root() changed the cwd to the new root, in which case the following umount() will fail. So, to be safe, change directory to oldroot."
I don't think you're right (I thought about it some more). pivot_root(a, b) will make "a" the new root and mount the old root at "b". What we're doing here is unmounting the old root, because the old root is also on "b". Since "b" is ".", it matters whether or not pivot_root has changed your cwd. It turns out that it does change your cwd in the current Linux implementation, but I don't want to depend on that (the mount logic is very dodgy in the kernel).
Namely, use an undocumented feature of pivot_root(2) where
pivot_root(".", ".") is actually valid and allows you to tie the
old_root to your /proc/self/cwd in a way that makes unmounting
easy. Thanks a lot to the LXC developers who came up with this idea
first.
This is the first of many steps toward allowing runC to work with a
completely read-only rootfs.
Fixes #1122.
Signed-off-by: Aleksa Sarai <asarai@suse.de>
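Putting the pieces of this thread together, a minimal sketch of the whole sequence the commit message describes: keep fds to the old and new roots, fchdir into the new root, call pivot_root(".", "."), fchdir back to the old root instead of trusting where pivot_root(2) left the cwd, make the old tree private, detach it with MNT_DETACH, and finally chdir("/"). It uses only Go's standard syscall package and reuses the oldroot/newroot names from the review snippets, but it is an illustration, not the merged runC code. It assumes Linux, CAP_SYS_ADMIN, that it already runs inside a fresh mount namespace, and that rootfs is a mount point (pivot_root(2) requires this); never run it in the host's mount namespace. Current pivot_root(2) man pages describe this same chdir, pivot_root(".", "."), umount2(".", MNT_DETACH) sequence.

```go
//go:build linux

package main

import (
	"fmt"
	"os"
	"syscall"
)

// pivotRoot makes rootfs the new root without a temporary put_old directory.
// Assumes CAP_SYS_ADMIN, a fresh mount namespace, and rootfs being a mount point.
func pivotRoot(rootfs string) error {
	oldroot, err := syscall.Open("/", syscall.O_DIRECTORY|syscall.O_RDONLY, 0)
	if err != nil {
		return err
	}
	defer syscall.Close(oldroot)

	newroot, err := syscall.Open(rootfs, syscall.O_DIRECTORY|syscall.O_RDONLY, 0)
	if err != nil {
		return err
	}
	defer syscall.Close(newroot)

	// Make rootfs the cwd so that pivot_root(".", ".") acts on it.
	if err := syscall.Fchdir(newroot); err != nil {
		return err
	}
	// new_root and put_old are the same directory: the old root ends up
	// stacked on top of the new root at the same mount point.
	if err := syscall.PivotRoot(".", "."); err != nil {
		return fmt.Errorf("pivot_root: %v", err)
	}
	// Do not rely on where pivot_root(2) left the cwd; go back to the old
	// root explicitly through the fd opened before pivoting.
	if err := syscall.Fchdir(oldroot); err != nil {
		return err
	}
	// Stop the unmount below from propagating to peer mounts on the host.
	if err := syscall.Mount("", ".", "", syscall.MS_PRIVATE|syscall.MS_REC, ""); err != nil {
		return err
	}
	// Lazily detach the old root (our cwd) and everything mounted under it.
	if err := syscall.Unmount(".", syscall.MNT_DETACH); err != nil {
		return err
	}
	// Move onto the new root.
	if err := syscall.Chdir("/"); err != nil {
		return fmt.Errorf("chdir /: %v", err)
	}
	return nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: pivot ROOTFS (only inside a new mount namespace)")
		return
	}
	if err := pivotRoot(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

Because both fds are opened before anything is changed, a bad rootfs path fails early and leaves the process's cwd and mounts untouched. The explicit Fchdir(oldroot) follows rhvgoyal's point that pivot_root(2) makes no promise about the caller's cwd; it costs one syscall and removes that dependency on kernel behaviour.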