I observed that userspace tracing no longer worked when an
instrumented application (linked against liblttng-ust) was launched
before the session daemon.
While investigating this, I noticed that the shm_open() of
'/lttng-ust-wait-8' failed with EACCES. As the permissions on the
'/dev/shm' directory and the file itself should have allowed the
session daemon to open the shm, this pointed to a change in kernel
behaviour.
Moreover, I could only reproduce this on my own system (running
Arch Linux), not on other systems.
It turns out that Linux 4.19 introduces a new protected_regular sysctl
to allow the mitigation of a class of TOCTOU security issues related
to the creation of files and FIFOs in sticky directories.
When this sysctl is not set to '0', it specifically blocks the way the
session daemon attempts to open the app notification shm that an
application has already created.
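To check whether a given system is affected, the current value of the sysctl can be read from procfs. The following is a minimal sketch; the helper name read_protected_regular() is hypothetical and is not part of this patch. It assumes that /proc is mounted and treats an unreadable file (for instance, on a kernel older than 4.19) as "unknown".

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of lttng-tools: return the value of
 * fs.protected_regular, or -1 if it cannot be read (for instance on a
 * kernel that does not provide this sysctl).
 */
static int read_protected_regular(void)
{
	FILE *f = fopen("/proc/sys/fs/protected_regular", "r");
	int value = -1;

	if (!f) {
		return -1;
	}
	if (fscanf(f, "%d", &value) != 1) {
		value = -1;
	}
	fclose(f);
	return value;
}
```

Any value other than 0 means the restriction described above is active.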
To quote a comment added in Linux's fs/namei.c as part of
30aba6656f:
```
Block an O_CREAT open of a FIFO (or a regular file) when:
- sysctl_protected_fifos (or sysctl_protected_regular) is enabled
- the file already exists
- we are in a sticky directory
- we don't own the file
- the owner of the directory doesn't own the file
- the directory is world writable
```
While the concerns that led to the inclusion of this patch are valid,
the risks that are being mitigated do not apply to the session
daemon's and instrumented application's use of this shm. This shm is
only used to wake up applications and get them to attempt to connect
to the session daemon's application socket. The application socket is
the security-sensitive part. At worst, an attacker controlling
this shm could wake up the UST thread in applications, which would then
attempt to connect to the session daemon.
Unfortunately (for us, at least), systemd v241+ sets the
protected_regular sysctl to 1 by default (see systemd commit
27325875), causing the open of the shm by the session daemon to fail.
Introduce a fall-back to attempt a shm_open without the O_CREAT flag
when opening it with 'O_RDWR | O_CREAT' fails. The comments detail the
reason why those attempts are made in that specific order.
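Reduced to its essentials, the fall-back looks like the sketch below. It is a standalone illustration, not the patched function: the name open_wait_shm() is hypothetical, and logging as well as the subsequent ftruncate() are left out.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>

/*
 * Hypothetical standalone version of the fall-back: try
 * create-or-open first, then retry without O_CREAT on EACCES.
 * Returns a file descriptor, or -1 with errno set.
 */
static int open_wait_shm(const char *shm_path, mode_t mode)
{
	int fd = shm_open(shm_path, O_RDWR | O_CREAT, mode);

	if (fd < 0 && errno == EACCES) {
		/*
		 * Possibly blocked by fs.protected_regular: the shm
		 * already exists, so open it without O_CREAT. The mode
		 * argument is ignored when O_CREAT is not set.
		 */
		fd = shm_open(shm_path, O_RDWR, 0);
	}
	return fd;
}
```

Trying O_CREAT first keeps the common case, where the shm does not exist yet, to a single call and avoids racing with an application that creates the shm between the two attempts.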
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
/*
* Try creating shm (or get rw access). We don't do an exclusive open,
* because we allow other processes to create+ftruncate it concurrently.
+ *
+ * A sysctl, fs.protected_regular, may prevent the session daemon from
+ * opening a previously created shm when the O_CREAT flag is provided.
+ * Systemd enables this ABI-breaking change by default since v241.
+ *
+ * First, attempt to use the create-or-open semantic that is
+ * desired here. If this fails with EACCES, work around this broken
+ * behaviour and attempt to open the shm without the O_CREAT flag.
+ *
+ * The two attempts are made in this order since applications are
+ * expected to race with the session daemon to create this shm.
+ * Attempting an shm_open() without the O_CREAT flag first could fail
+ * because the file doesn't exist. It could then be created by an
+ * application, which would cause a second try with the O_CREAT flag to
+ * fail with EACCES.
+ *
+ * Note that this introduces a new failure mode where a user could
+ * launch an application (creating the shm) and unlink the shm while
+ * the session daemon is launching, causing the second attempt
+ * to fail. No recovery is attempted since unlinking the shm will
+ * prevent userspace tracing from succeeding anyhow: the sessiond would
+ * use a now-unlinked shm, while the next application would create
+ * a new named shm.
*/
wait_shm_fd = shm_open(shm_path, O_RDWR | O_CREAT, mode);
if (wait_shm_fd < 0) {
- PERROR("Failed to open wait shm at %s", shm_path);
- goto error;
+ if (errno == EACCES) {
+ /* Work around sysctl fs.protected_regular. */
+ DBG("shm_open of %s returned EACCES, this may be caused "
+ "by the fs.protected_regular sysctl. "
+ "Attempting to open the shm without "
+ "creating it.", shm_path);
+ wait_shm_fd = shm_open(shm_path, O_RDWR, mode);
+ }
+ if (wait_shm_fd < 0) {
+ PERROR("Failed to open wait shm at %s", shm_path);
+ goto error;
+ }
}
ret = ftruncate(wait_shm_fd, mmap_size);