LSM stacking and the future

By Jake Edge
November 20, 2019

The idea of stacking (or chaining) Linux security modules (LSMs) goes back 15 years (at least) at this point; progress has definitely been made along the way, especially in the last decade or so. It has been possible to stack "minor" LSMs with one major LSM (e.g. SELinux, Smack, or AppArmor) for some time, but mixing, say, SELinux and AppArmor in the same system has not been possible. Combining major security solutions may not seem like a truly important feature, but there is a use case where it is pretty clearly needed: containers. Longtime LSM stacker (and Smack maintainer) Casey Schaufler gave a presentation at the 2019 Linux Security Summit Europe to report on the status and plans for allowing arbitrary LSM stacking.

LSMs allow adding more restrictions to Linux than those afforded by the traditional security policies. For the most part, those policies reflect the existing mechanisms, such as permissions bits on files. But there are also other security concerns, such as binding to a network socket, that are outside of the usual permissions, so mechanisms to restrict access to them have been added to the LSM interface.

Prior to the advent of the Yama LSM, only one security module could be active in a running kernel; Yama was originally manually stacked, which "didn't really sit very well". To support adding the Yama restrictions on top of other LSMs in a dynamic fashion, lists of modules were added to the kernel, which would allow multiple LSMs to be active. But there was a problem for the "bigger" LSMs that need security "blobs"—context data associated with various kernel objects—in which to store their state. There is only one pointer available to use, so only one blob-using LSM could be active, though multiple minor LSMs that did not need the blobs could be stacked with it.
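The list-of-modules idea can be modeled in a few lines. The following is only a sketch of the general "walk the list, stop at the first denial" pattern, not the kernel's code; the module names ptrace_guard and shadow_guard and the operation strings are hypothetical.

```python
from typing import Callable, List, Tuple

EACCES = 13  # errno value for "permission denied" on Linux

# A hook receives an operation name and returns 0 (no objection)
# or a negative errno-style value (deny).
Hook = Callable[[str], int]


def ptrace_guard(operation: str) -> int:
    """Hypothetical minor module that only objects to ptrace."""
    return -EACCES if operation == "ptrace" else 0


def shadow_guard(operation: str) -> int:
    """Hypothetical module that only objects to one file write."""
    return -EACCES if operation == "write:/etc/shadow" else 0


def call_hooks(hooks: List[Tuple[str, Hook]], operation: str) -> int:
    """Ask every registered module in order; the first denial wins.

    Modules can only add restrictions here: no module can turn
    another module's denial back into an allow.
    """
    for _name, hook in hooks:
        result = hook(operation)
        if result != 0:
            return result
    return 0


stack = [("ptrace_guard", ptrace_guard), ("shadow_guard", shadow_guard)]
```

With this stack, call_hooks(stack, "ptrace") is denied by the first module, while an operation that neither module cares about falls through and returns 0.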

At this point, an LSM attribute has been added to tag LSMs: LSM_FLAG_EXCLUSIVE. The "exclusive" tag is applied to the blob-using LSMs: SELinux, Smack, and AppArmor. The idea is to remove that tag from those LSMs over time.

There can only be one exclusive LSM active in a running kernel. "That's bad", Schaufler said, but for a long time that was not seen as a serious problem. That was before containers became so widespread, however. Now there are some people who run, for example, Ubuntu in their data centers (with AppArmor) and who want to run Android (SELinux) containers on top. So the goal of the work he and others have been doing is to get rid of the exclusive bit for "as many modules as we possibly can".

The 5.1 kernel added "infrastructure-managed blobs" for a number of different kernel objects: tasks, credentials, files, inodes, and the System V interprocess-communication mechanisms (semaphores, shared memory, and message queues). An LSM will tell the kernel how much space it needs to store its information and the kernel will take care of allocating, managing, and freeing the blob. So, any LSM that only uses blobs on those object types can be marked as non-exclusive at this point.
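One way to picture infrastructure-managed blobs is as a single allocation per object in which each LSM owns a slice. The sketch below is a simplified model of that bookkeeping; the BlobLayout class, the 8-byte alignment, and the sizes used are assumptions made for illustration and are not taken from the kernel.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BlobLayout:
    """Tracks where each LSM's data lives inside one shared blob."""

    total: int = 0
    offsets: Dict[str, int] = field(default_factory=dict)

    def register(self, lsm: str, size: int, align: int = 8) -> int:
        """Reserve `size` bytes for `lsm` and return its offset.

        The alignment rule is an assumption of this sketch: the
        current end of the blob is rounded up to a multiple of
        `align` before the new slice is placed.
        """
        offset = (self.total + align - 1) // align * align
        self.offsets[lsm] = offset
        self.total = offset + size
        return offset


# Hypothetical sizes: each module states its need once, up front.
inode_blob = BlobLayout()
inode_blob.register("selinux", 20)
inode_blob.register("apparmor", 8)
```

After both registrations, the framework knows the total size to allocate for every inode, and each module finds its own data at its recorded offset, so neither needs the single shared pointer to itself.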

That means a variety of LSMs can be used alongside SELinux, so "the IT people are really happy" since SELinux does not have to be turned off to get the protections afforded by some other module that only uses those blobs. There are also a number of smaller LSMs that are headed toward the mainline that could benefit from this. Those, or some custom module, can be run with one of the exclusive LSMs, mostly without interference; so "everybody's happy", he said.

Next up

"But not everybody's happy", he continued, because there are still limitations, which leads to the plans for an upcoming kernel, possibly 5.5. The code to remove the exclusive flag for AppArmor is basically ready. AppArmor is different than Smack and SELinux, "in that it is path-name-based-ish", though it is less so now than it used to be. It has a different fundamental security model; Smack and SELinux are both based on subjects and objects, while AppArmor mostly focuses on path names. The use cases for AppArmor are also different than those of the others, so it makes sense to start with it.

In order to make non-exclusive stacking work for AppArmor, kernel socket-object security blobs have to join the other infrastructure-managed blobs so that multiple LSMs can have them. That is fairly easily done, since it already has been done for the other objects. There are also more difficult pieces; when you get to those, that's where people start to bikeshed, he said.

The first problem is sharing /proc/PID/attr/current, which is used by AppArmor and other major LSMs to report the security context for the process identified by PID. So SELinux and AppArmor would both want to put their contexts in that file, but that is impossible. Similarly, the SO_PEERSEC socket option to retrieve the security context of the other endpoint of a Unix socket also cannot be shared. The solution for both is to introduce a new interface, so that the existing interfaces stay backward compatible.

A number of different ideas for the format of /proc/PID/attr/context and SO_PEERCONTEXT (the new interfaces) were proposed along the way, but the developers "finally did the intelligent thing" and asked the user community, the D-Bus developers in particular. They suggested a simple string with pairs of null-terminated strings of the form "LSM-name\0value\0"; the full length of the string will be known, so pulling out the individual LSM contexts will be straightforward. There is something of a lesson there, Schaufler said: instead of debating something like this, ask the people who will be using the information.
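A consumer of that format might split it along the following lines. This is a minimal sketch that assumes only the "LSM-name\0value\0" layout described above; the function name and the sample contexts are made up for illustration.

```python
from typing import Dict


def parse_compound_context(raw: bytes) -> Dict[str, str]:
    """Split b"name\\0value\\0name\\0value\\0" into {name: value}."""
    fields = raw.split(b"\0")
    # The final terminator leaves one empty element at the end.
    if fields and fields[-1] == b"":
        fields.pop()
    if len(fields) % 2 != 0:
        raise ValueError("LSM name without a matching value")
    return {
        fields[i].decode(): fields[i + 1].decode()
        for i in range(0, len(fields), 2)
    }


# Hypothetical contents of the new interface on a stacked system.
sample = b"selinux\0system_u:system_r:httpd_t:s0\0apparmor\0unconfined\0"
contexts = parse_compound_context(sample)
```

Because the overall length is known, the caller never has to guess where the data ends; the only check needed is that names and values come in pairs.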

But adding interfaces doesn't really solve the problem, since there are numerous system utilities that will use the existing interfaces—and for a long time to come. So there is a new /proc/self/attr/display setting that can be used to determine which LSM's context information is reported via the existing interfaces. An SELinux container could set its display to ensure that the container sees the SELinux information even if it is running on a kernel with AppArmor active as well; the rest of the system could set the display to AppArmor so that it would look like that was the only LSM active.
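The effect of the display setting can be illustrated with a small model. Nothing here touches /proc; the Task class is hypothetical, and defaulting the display to the first LSM in the mapping is an assumption of this sketch.

```python
from typing import Dict, Optional


class Task:
    """Models one process with a context per active LSM."""

    def __init__(self, contexts: Dict[str, str], display: Optional[str] = None):
        self.contexts = dict(contexts)
        # Assumption: with no explicit choice, the first LSM is shown.
        self.display = display if display is not None else next(iter(self.contexts))

    def set_display(self, lsm: str) -> None:
        """Choose which LSM the legacy interfaces report."""
        if lsm not in self.contexts:
            raise ValueError(f"LSM {lsm!r} is not active")
        self.display = lsm

    def attr_current(self) -> str:
        """What a legacy reader of attr/current would see."""
        return self.contexts[self.display]


host_task = Task({"apparmor": "unconfined", "selinux": "u:r:untrusted_app:s0"})
container_task = Task(host_task.contexts)
container_task.set_display("selinux")
```

Both tasks carry the same two contexts, but host_task.attr_current() reports the AppArmor one while container_task.attr_current() reports the SELinux one, which is the behavior existing utilities would need.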

The permissions required to change the display attribute also needed to be worked out. He thought there should be no checks on switching the value, but SELinux developer Stephen Smalley came up with some problems with that approach. So Schaufler suggested requiring the CAP_MAC_ADMIN capability, but it turns out that SELinux developers do not want to rely on capabilities, they want SELinux to be able to weigh in on the choice. So there is now a hook for display changes; SELinux and AppArmor have added ways to set a policy for changing display, while Smack just says "sure, go ahead".

It turns out that Android's binder security mechanism also uses the contexts, so the code needed to ensure that the processes at both ends of the bind see the same context; it doesn't matter which it is, he said, but it needs to be the same. There was also a need to add new audit fields to support subject contexts on a per-LSM basis, while still maintaining the "subj=" entries for backward compatibility. The same thing can be done with object contexts (i.e. "obj=") if that is needed down the road.

Before too long

The next major step is to remove the exclusive tag entirely, by getting rid of it for Smack and SELinux, so that you can use any set of arbitrary LSMs in the same running kernel. That is targeted for the 5.8 kernel or so. It is more challenging, in part because the two LSMs do a lot of the same things; in particular, both interact with the networking subsystem extensively, he said.

Two more kernel objects, for keys and superblocks, need to be added to the infrastructure-managed list. Part of the reason that superblocks need blobs for Smack and SELinux is that both process mount options, which is a bit messy to do. Instead of simply handing the options to a single LSM, they will need to be sent to a series of LSMs; each LSM needs to only deal with the options it knows about, ignoring those it doesn't, but then any options that are not handled by any LSM need special treatment.
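The dispatching problem can be sketched as follows. The option names are real SELinux and Smack mount options, but the function and the way leftovers are reported are assumptions of this illustration rather than the planned kernel behavior.

```python
from typing import Dict, Set, Tuple

# Subset of the mount options each module understands.
KNOWN_OPTIONS: Dict[str, Set[str]] = {
    "selinux": {"context", "fscontext", "defcontext", "rootcontext"},
    "smack": {"smackfsdef", "smackfsroot", "smackfshat", "smackfsfloor"},
}


def dispatch_mount_options(
    options: Dict[str, str], lsms: Dict[str, Set[str]]
) -> Tuple[Dict[str, Dict[str, str]], Dict[str, str]]:
    """Hand each option to every LSM that knows it.

    Returns (claimed, unclaimed): what each LSM accepted, plus the
    options nobody recognized, which need separate handling.
    """
    claimed: Dict[str, Dict[str, str]] = {name: {} for name in lsms}
    unclaimed: Dict[str, str] = {}
    for key, value in options.items():
        owners = [name for name, known in lsms.items() if key in known]
        if not owners:
            unclaimed[key] = value
        for name in owners:
            claimed[name][key] = value
    return claimed, unclaimed


claimed, unclaimed = dispatch_mount_options(
    {"context": "system_u:object_r:tmp_t:s0", "smackfsroot": "_", "bogus": "1"},
    KNOWN_OPTIONS,
)
```

Here SELinux ends up with context, Smack with smackfsroot, and bogus is left in unclaimed, the case that the article says needs special treatment.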

"The networking stuff has a wonderful set of challenges", Schaufler said. The NetLabel interface is useful to allow an LSM to put CIPSO or CALIPSO labels on packets, but two LSMs cannot put different labels on the same packets. After much "gnashing of teeth", it was decided that unless all of the relevant LSMs could agree on the labels, packet sending would fail. It may be a bit harsh, but it makes sense: "If you can't get people to agree, you probably shouldn't send it".

The label is set when the socket is created, so that is the operation that should fail, even though it doesn't really matter until a packet is actually sent. Making that work requires some changes in NetLabel and SELinux, but more in Smack. NetLabel is used differently by Smack, which "makes things more complicated", he said.
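The agree-or-fail rule amounts to a unanimity check at socket creation. In this sketch, a module that returns None is treated as having no opinion; that convention, the exception type, and the label strings are assumptions for illustration.

```python
from typing import Dict, Optional


class LabelConflict(Exception):
    """Raised when active LSMs want different packet labels."""


def agree_on_label(proposals: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the single label every opinionated LSM asked for.

    `proposals` maps an LSM name to the label it wants on the
    socket's packets, or None if it does not care.
    """
    wanted = {label for label in proposals.values() if label is not None}
    if len(wanted) > 1:
        raise LabelConflict(f"conflicting labels: {sorted(wanted)}")
    return wanted.pop() if wanted else None


def create_socket(proposals: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Fail at creation time, before any packet could be sent."""
    return {"label": agree_on_label(proposals)}
```

Calling create_socket() with two different labels raises LabelConflict immediately, mirroring the decision to fail when the label is set rather than when the first packet goes out.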

The secmark facility allows associating a 32-bit number with a packet; it is added to the socket buffer (sk_buff or SKB) object by nftables. However, 32 bits is not enough to be able to handle two, three, or even more LSMs that want to use secmarks. It is not clear what to do about that, yet. A hash-table mapping might work or only allowing a single LSM to use the facility is another option, though "it's kind of a cop-out". Another possibility is an SKB extension, but he is a bit leery of going that route because he anticipates some opposition from the networking developers.
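The hash-table option could look roughly like this: one shared 32-bit mark stands in for a tuple of per-LSM values. The class below is purely a sketch of that idea; reserving 0 for "no mark" and the sample IDs are assumptions.

```python
from typing import Dict, Tuple

MAX_SECMARK = 0xFFFFFFFF  # a secmark is a 32-bit number


class SecmarkTable:
    """Maps a tuple of per-LSM IDs to one shared 32-bit mark."""

    def __init__(self) -> None:
        self._mark_by_ids: Dict[Tuple[int, ...], int] = {}
        self._ids_by_mark: Dict[int, Tuple[int, ...]] = {}
        self._next_mark = 1  # assumption: 0 means "no mark"

    def mark_for(self, ids: Tuple[int, ...]) -> int:
        """Return the mark for this combination, allocating if new."""
        mark = self._mark_by_ids.get(ids)
        if mark is None:
            if self._next_mark > MAX_SECMARK:
                raise OverflowError("out of 32-bit secmark values")
            mark = self._next_mark
            self._next_mark += 1
            self._mark_by_ids[ids] = mark
            self._ids_by_mark[mark] = ids
        return mark

    def ids_for(self, mark: int) -> Tuple[int, ...]:
        """Recover every LSM's ID from the mark on a packet."""
        return self._ids_by_mark[mark]


table = SecmarkTable()
mark = table.mark_for((1234, 77))  # hypothetical (SELinux, Smack) IDs
```

The packet still carries a single 32-bit value, and each LSM looks up its own component through ids_for(), at the cost of maintaining the table.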

Labeled NFSv4 presents an "interesting conundrum", he said. It was defined with a format for the label data that is passed back and forth, which "Linux very carefully ignores". The Linux implementation doesn't add the labels or read them; it just assumes that any data that is there is reasonable for whatever is actually going to use it. The NFS developers are looking into that at this point.

Schaufler wrapped up by reiterating that the first set of changes for AppArmor are targeting Linux 5.5. The second set needs more work and there are some solutions to be found, but it will hopefully make its way into the mainline in 5.8 or thereabouts. Interested readers can view his slides [PDF] and the YouTube video of the talk.

[I would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to attend the Linux Security Summit Europe in Lyon, France.]

LSM stacking and the future

Posted Nov 20, 2019 22:19 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

LSM people need to rethink their approach to security. Right now it's simply unusable crap that fails dangerously if turned off.

Instead of only limiting access it needs to be able to widen it. This way most files can just have very restrictive permissions (600 with root owner) and an LSM can be used to grant additional ones as needed.

Mind you, I don't propose removing the ability to narrow permissions. It is still useful. For example, to confine a browser so it won't be able to access arbitrary files.

LSM stacking and the future

Posted Nov 21, 2019 14:38 UTC (Thu) by sruffell (subscriber, #42212) [Link]

I do not agree that the LSM framework is unusable crap.

I believe your suggestion would *require* a user to understand the configured LSM / Mandatory Access Control implementation to do anything on their system, which for most users is much more difficult than understanding the classic UNIX file permission model. If you do that, the kernel might as well pick an implementation and make it mandatory.

The way it is now, you can have a usable system, and an administrator is able to remove permissions from a file by adjusting the classic file permission bits. If everything was 600 with root owner by default, the file permission bits would, in effect, be meaningless.

So perhaps what you're suggesting is a kernel config option to disable discretionary access control?

LSM stacking and the future

Posted Nov 21, 2019 18:48 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

> I do not agree that the LSM framework is unusable crap.
Yes, it is.

SELinux works badly, it's not modular, not composable. So even in Fedora it requires serious hacks to be usable. For example, Apache's CGI scripts are controlled by a global boolean switch.

Then there's the whole "labeling" nonsense. A file already has a label - its path. Yet SELinux still barely supports it.

> I believe your suggestion would *require* a user to understand the configured LSM / Mandatory Access Control implementation to do anything on their system
Indeed. This would make it at least effective and force LSM developers to cut the crap and actually provide a usable system.

LSM stacking and the future

Posted Nov 22, 2019 14:51 UTC (Fri) by sruffell (subscriber, #42212) [Link]

> SELinux works badly, it's not modular, not composable. So even in Fedora it requires serious hacks to be usable. For example, Apache's CGI scripts are controlled by a global boolean switch.

So it sounds to me like you have a problem with SELinux and not the LSM framework?

> Then there's the whole "labeling" nonsense. A file already has a label - its path. Yet SELinux still barely supports it.

This sounds like another complaint against SELinux specifically since labeling is an SELinux (and Smack) requirement. TOMOYO and AppArmor use paths, not file system attributes, to specify domain/type information on files.

> > I believe your suggestion would *require* a user to understand the configured LSM / Mandatory Access Control implementation to do anything on their system
>
> Indeed. This would make it at least effective and force LSM developers to cut the crap and actually provide a usable system.

I admit, thinking about what it would look like to completely disable DAC is an interesting thought exercise. My gut feeling is that this would be better than allowing LSMs to expand permissions on a case-by-case basis. I think it would be better to ensure that the user is not under any misconceptions that the classic permission bits mean anything if you went this route.

LSM stacking and the future

Posted Nov 22, 2019 20:08 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

> So it sounds to me like you have a problem with SELinux and not the LSM framework?
Partially. I'm not familiar with TOMOYO - it has essentially ZERO documentation and Smack has always looked stupid to me (it's "SELinux but different").

I like the _idea_ of AppArmor, but so far it's not gaining any real traction. It also was not usable by unprivileged users the last time I checked.

> I admit, thinking about what it would look like to completely disable DAC is an interesting thought exercise. My gut feeling is that this would be better than allowing LSMs to expand permissions on a case-by-case basis.
MAC-only access control would resemble capability-based security a bit. It would also be different from the current state of the art.

> I think it would be better to ensure that the user is not under any misconceptions that the classic permission bits mean anything if you went this route.
Indeed.

LSM stacking and the future

Posted Nov 23, 2019 20:53 UTC (Sat) by jamielinux (guest, #82303) [Link]

> I'm not familiar with TOMOYO - it has essentially ZERO documentation

For anybody else reading, there is quite thorough documentation on the main project website:
http://tomoyo.osdn.jp/documentation.html.en
http://tomoyo.osdn.jp/2.6/index.html.en

LSM stacking and the future

Posted Nov 24, 2019 0:09 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

Well, I did look at it at a time when the documentation looked like this: http://tomoyo.osdn.jp/2.2/index.html.en

The current one is a bit better, but it's still not nearly good enough. For example, there are pretty much no sample policies anywhere on the Web.

LSM stacking and the future

Posted Nov 28, 2019 8:41 UTC (Thu) by jamielinux (guest, #82303) [Link]

It sounds like “zero” was a slightly unfair exaggeration ;-) But I think we’re not really in disagreement! I agree the documentation could be better, and there could be more, and there isn’t really much of a community. Which is a shame because I think it’s a neat project. (I haven’t used tomoyo in a long time in favour of SELinux.)

Disclaimer: I rewrote tomoyo’s docs and redesigned the website about 8 years ago as my first contribution to free software ... it was also the first time I’d written docs or designed a website at all, so be gentle ;-)

LSM stacking and the future

Posted Nov 29, 2019 8:23 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

Thank you!

Documentation can also be really improved simply by adding a handful of sample policies.

LSM stacking and the future

Posted Dec 6, 2019 22:25 UTC (Fri) by nix (subscriber, #2304) [Link]

Well, they've got me going both "those permission-validation penguin cartoons are incredibly cute" and "this whole design looks fascinating", and got me playing with it in a VM where before I'd never touched it for complete lack of understanding of how it worked and what it was for: so I'd say they're pretty good docs.

They have quite a few examples, too -- enough to be getting started with.

LSM stacking and the future

Posted Nov 22, 2019 15:33 UTC (Fri) by vadim (subscriber, #35271) [Link]

> SELinux works badly, it's not modular, not composable. So even in Fedora it requires serious hacks to be usable. For example, Apache's CGI scripts are controlled by a global boolean switch.

You mean the setsebool stuff? That's quite intentional.

SELinux is intended to sandbox applications. The problem is that Apache can do many things, not all of which are desirable in all situations. For instance if I don't run any CGI scripts, then making it so that Apache can't possibly run any even if it tries ensures nobody can sneak a CGI in and exploit the system that way. The booleans are quite useful for things like that.

> Then there's the whole "labeling" nonsense. A file already has a label - its path. Yet SELinux still barely supports it.

It's part of the design. Personally, I want to control objects, not paths. It should be impossible to work around security by accessing a file through another path. Also, applying labels means I can place appropriate labelled files anywhere, and things will work without having to adjust the SELinux configuration, just like under the old permissions system a program can read anything with read permissions, no matter where it is.

LSM stacking and the future

Posted Nov 22, 2019 20:02 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

> You mean the setsebool stuff? That's quite intentional.
Yep.

> SELinux is intended to sandbox applications. The problem is that Apache can do many things, not all of which are desirable in all situations.
Then it should be a setting within Apache. Not essentially a system-global non-namespaced variable.

> It's part of the design. Personally, I want to control objects, not paths. It should be impossible to work around security by accessing a file through another path.
Yep. And I'm arguing that this whole design is flawed. It looks OK in academic papers but not in real life.

All Fedora/RHEL users that I know of have SELinux off. I can't find any recent reliable statistics, though. It would be nice if Fedora could gather it.

LSM stacking and the future

Posted Nov 22, 2019 20:49 UTC (Fri) by vadim (subscriber, #35271) [Link]

> Then it should be a setting within Apache. Not essentially a system-global non-namespaced variable.

That makes no sense. The point of SELinux is that it imposes my (the administrator's) policy upon the system. It can't be in Apache's config because Apache doesn't get to have an opinion. The rule is set and enforced before it even gets to parse its config file.

This means that if I say Apache will not run CGIs then it won't, no matter the config, internal defaults or even if somebody finds an exploit and runs their own code.

> Yep. And I'm arguing that this whole design is flawed. It looks OK in academic papers but not in real life.

Works for me. How is it flawed?

> All Fedora/RHEL users that I know of have SELinux off. I can't find any recent reliable statistics, though. It would be nice if Fedora could gather it.

Hi. Been running with it on since Fedora 18. On my desktop, laptop, work laptop, server, firewall, and multiple VMs.

LSM stacking and the future

Posted Nov 23, 2019 15:18 UTC (Sat) by zlynx (subscriber, #2285) [Link]

I run with SELinux on always. I learned the booleans and even write my own policies when necessary. This stuff isn't that hard.

LSM stacking and the future

Posted Apr 6, 2020 23:02 UTC (Mon) by indolering (guest, #102865) [Link]

> All Fedora/RHEL users that I know of have SELinux off. I can't find any recent reliable statistics, though. It would be nice if Fedora could gather it.

I don't really understand SELinux but I never had much of a problem with it, the SELinux troubleshooter fixed most problems with minimal fuss. I also tend to use Flatpak and other isolation mechanisms, however, so maybe I just never ran across it very much.

LSM stacking and the future

Posted Nov 22, 2019 8:55 UTC (Fri) by maxfragg (guest, #122266) [Link]

what exactly do you hope to gain from this?
LSMs are intended to be used as a layer on top of unix permissions, what besides a lot of potential information leaks would you hope to gain from your proposal?
Currently, if DAC checks fail, MAC will never be invoked, thus also no audit logs, how should this work, if MAC would be able to widen access?

LSM stacking and the future

Posted Nov 22, 2019 19:56 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

> LSMs are intended to be used as a layer on top of unix permissions, what besides a lot of potential information leaks would you hope to gain from your proposal?
Force people to either make LSMs usable or abandon them altogether. Either one works.

> Currently, if DAC checks fail, MAC will never be invoked, thus also no audit logs, how should this work, if MAC would be able to widen access?
Just like that - an optional ability to override the DAC.

LSM stacking and the future

Posted Nov 23, 2019 23:10 UTC (Sat) by vadim (subscriber, #35271) [Link]

> Force people to either make LSMs usable or abandon them altogether. Either one works.

What do you mean by 'make LSMs usable'? Security is inherently complicated.

> Just like that - an optional ability to override the DAC.

You could just chmod 777 everything. I remember there used to be a demo server somewhere that demonstrated SELinux by allowing you to log in as root, but kept you confined enough that you still couldn't break anything.

That said, IIRC there's a central point in the kernel for permissions check, so this is probably not hard to implement.

LSM stacking and the future

Posted Nov 24, 2019 1:00 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

> What do you mean by 'make LSMs usable'?
Just that.

> Security is inherently complicated.
It's academy-inspired uber-complicated all-powerful monsters that are inherently complicated.

It's not like we haven't seen alternatives. OpenBSD has very practical and extremely useful unveil()/pledge() support, for example. Which is STILL impossible to express completely in Linux even with an unholy brew of eBPF and SELinux.

> You could just chmod 777 everything. I remember there used to be a demo server somewhere that demonstrated SELinux by allowing you to log in as root, but kept you confined enough that you still couldn't break anything.
No. My point would be to set permissions to 600 (or even 000) and then use LSMs to grant additional access. If one then turns off LSM they lose access.

LSM stacking and the future

Posted Nov 25, 2019 22:14 UTC (Mon) by vadim (subscriber, #35271) [Link]

> It's academy-inspired uber-complicated all-powerful monsters that are inherently complicated.

Because interactions are complicated. Eg, I want Apache to listen on port 80, but not on port 22 under any circumstance. I want Apache to serve files from my home directory, but not my GPG keys.

Then you have messes like PAM, which are pretty tricky to secure.

> It's not like we haven't seen alternatives. OpenBSD has very practical and extremely useful unveil()/pledge() support, for example. Which is STILL impossible to express completely in Linux even with an unholy brew of eBPF and SELinux.

Linux has seccomp, and while helpful it's a blunt and problematic instrument. For instance, trouble comes when somebody makes a new version of open(), and now there's a new syscall that's not in the allow list, yet is being used by glibc. Things like that.

But more importantly, this completely misses the point. The point of something like SELinux isn't that Apache politely declares what it will do and won't, but that I, being the sysadmin, am the one authority on the system, and Apache doesn't get any say in anything.

> No. My point would be to set permissions to 600 (or even 000) and then use LSMs to grant additional access. If one then turns off LSM they lose access.

What's the point? You can already make it impossible to turn a LSM off, since they're controlled by things like files and syscalls, which can be disabled.

LSM stacking and the future

Posted Nov 25, 2019 23:01 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

> Because interactions are complicated. Eg, I want Apache to listen on port 80, but not on port 22 under any circumstance. I want Apache to serve files from my home directory, but not my GPG keys.
pledge()/unveil() do both just fine in practice.

> Linux has seccomp, and while helpful it's a blunt and problematic instrument. For instance, trouble comes when somebody makes a new version of open(), and now there's a new syscall that's not in the allow list, yet is being used by glibc. Things like that.
The problem is, even with all its brokenness, seccomp still can not express full pledge()/unveil() semantics.

This is an entirely self-inflicted issue. A simple targeted security subsystem that would just do what pledge() does would help immensely. It won't be uber-flexible NSA-Flask-compatible, and it would require extensions on case-by-case basis, sure. But it also would be much more usable.

> But more importantly, this completely misses the point. The point of something like SELinux isn't that Apache politely declares what it will do and won't, but that I, being the sysadmin, am the one authority on the system, and Apache doesn't get any say in anything.
In reality this doesn't matter much, since you're likely using Apache from the distro-provided package with a distro-provided policy. So putting the permissions inside Apache "namespace" doesn't really matter.

> What's the point? You can already make it impossible to turn a LSM off, since they're controlled by things like files and syscalls, which can be disabled.
You're missing the point. If users or application developers see SELinux interfering with their work, they simply turn SELinux off instead of fixing whatever is wrong. There's no downside to doing this as LSMs fail open.

The ONLY way to fix this in the long term is to make LSMs mandatory.

LSM stacking and the future

Posted Nov 26, 2019 0:12 UTC (Tue) by vadim (subscriber, #35271) [Link]

> pledge()/unveil() do both just fine in practice.

I'm not a user of *BSD, how do you implement those policies with pledge/unveil?

> This is an entirely self-inflicted issue. A simple targeted security subsystem that would just do what pledge() does would help immensely.

Sure, improvements can be made.

> In reality this doesn't matter much, since you're likely using Apache from the distro-provided package with a distro-provided policy. So putting the permissions inside Apache "namespace" doesn't really matter.

It matters because:

1. I can modify the policy without touching the source code.
2. If something sets its own policy, the possibility exists of subverting security before the policy can be applied because there is a point before the policy is set.
3. An application's own author isn't necessarily the best person to be in charge of knowing what it should or not be doing.
4. Tools like 'sandbox' that sandbox arbitrary applications.

> The ONLY way to fix this in the long term is to make LSMs mandatory.

Ohh. I finally get it.

That's a pointless waste of time. You can't fix willful stupidity by technical measures, it never worked and never will. If somebody wants to disable security, they will do so. People will disable it, not compile it, patch the kernel, choose another distribution, run everything as root, whatever.

LSM stacking and the future

Posted Nov 26, 2019 0:26 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

> I'm not a user of *BSD, how do you implement those policies with pledge/unveil?
I don't use OpenBSD but I installed it in a VM just to check this.

You do:
1. unveil() directories that you want to be readable, this will automatically make everything else closed off.
2. Open port 80 as a superuser, pass the socket to Apache. This actually can be done by systemd without any SELinux.

> 1. I can modify the policy without touching the source code.
How often does this actually happen? Fedora should try to gather stats. I haven't seen it done once in my experience.

> 2. If something sets its own policy, the possibility exists of subverting security before the policy can be applied because there is a point before the policy is set.
We have systemd for that. The wrapper code to set policy fits with it perfectly. Heck, it's already being used to allow rootless daemons listening on <1024 ports.

> 3. An application's own author isn't necessarily the best person to be in charge of knowing what it should or not be doing.
Realistically neither is the policy writer.

> 4. Tools like 'sandbox' that sandbox arbitrary applications.
unveil()/pledge() them from wrapper scripts. Add more pledges as needed on case-by-case basis.

rootless <1024

Posted Nov 26, 2019 8:04 UTC (Tue) by zdzichu (subscriber, #17118) [Link]

By the way, is the rule of requiring root for lower ports even sensible today? Seems like cargo culting.

rootless <1024

Posted Nov 26, 2019 8:08 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

Correct. It makes no sense at all in the modern world. Up until recently (before the advent of systemd) it had actual _negative_ security as it forced all kinds of programs to be launched as root only to listen on a privileged port.

These days it also can be worked around using ambient caps acquired in a helper wrapper (regular caps are lost on exec).

rootless <1024

Posted Nov 26, 2019 14:27 UTC (Tue) by jem (subscriber, #24231) [Link]

Isn't the point of requiring root for ports less than 1024 that they can be trusted to some degree? So you can say ssh some-well-known-host, and rely on the fact that some random joker with an ordinary account on the host hasn't discovered that port 22 is free and started his or her own password stealer.

rootless <1024

Posted Nov 26, 2019 14:53 UTC (Tue) by vadim (subscriber, #35271) [Link]

That's not very much security, since most any random joker can run a VM with a sshd on port 22, and then get people to connect there by say, messing with DNS, registering domain names that are off by one character, and such things.

Also due to said VMs the scenario of people being given shell accounts is becoming rarer by the day, anyway.

Also there's plenty of important stuff on ports > 1024, such as administrative consoles like Cockpit on port 9090. So if you've got user access, there's nothing much preventing you from putting up a fake Cockpit page of your own.

rootless <1024

Posted Nov 26, 2019 19:57 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

That was the idea way back then, but in practice it's not really relevant anymore. Moreover, it resulted in daemons like MySQL or Postgres actually standardizing on ports >1024 to avoid doing the open-the-socket-then-drop-privs dance.

Had there been something like systemd from the start (heck, even a better designed inetd) then this might have turned out differently.

Even for ports <1024 you shouldn't really trust them implicitly.

rootless <1024

Posted Nov 26, 2019 21:24 UTC (Tue) by rodgerd (guest, #58896) [Link]

Maybe in 1982, when you could still count the number of Unix machines in the world, and have a reasonable chance of knowing who was configuring them.

Since Unix became popular in, oh, the mid-nineties, it's been a toxic heritage that causes more harm than good, leading programs to run as root simply because they wanted a well-known port, while providing absolutely no security benefit whatsoever.

This is a classic example where mindless adherence to "Unix tradition" has caused more harm than good, all for a lack of critical thinking.

rootless <1024

Posted Nov 28, 2019 2:49 UTC (Thu) by flussence (subscriber, #85566) [Link]

Its security value over the internet is nil, especially in the face of things like win32, but in a closed system (loopback or authenticated VPN) it still provides some assurance that the other end is what you think it is.

LSM stacking and the future

Posted Nov 26, 2019 8:24 UTC (Tue) by vadim (subscriber, #35271) [Link]

> 1. unveil() directories that you want to be readable, this will automatically make everything else closed off.

I see. Well, that's far worse than SELinux.

One big problem I see is that you can only sandbox once. Unveil requires blocking off further changes at the end. So you can neither further confine something in an already confined environment, nor expand the confinement.

The first is problematic because now your rule set encompasses anything that could possibly be called by the main process. Eg, you can confine Apache to /home/user/public_html, but what if you call a CGI that reads something in /home/user/.cgi? Now you need to allow that, and you can't give the permission to that particular CGI because once Apache or its wrapper finished with the pledge stuff, it's set in stone. So you give that permission to Apache, adding it to a heap of stuff that Apache can do, because something that it calls needs that. Hardly pretty, very manageable, or very secure.

The second is problematic because you make things that operate with above normal permissions impossible. Eg, think about tools like ping that execute with more privileges than their caller, but that are coded in a way that their usage is safe. Think for instance of a CGI calling scp. You must now allow Apache access to your ssh keys, which makes it able to serve them to anyone who succeeds in tricking Apache into doing it.

Also this would seem not to allow for new users to be created, unless one can unveil("/home/*/foo").

> 2. Open port 80 as a superuser, pass the socket to Apache. This actually can be done by systemd without any SELinux.

Sure, if you have cooperation from the program, in that it can work on a socket passed on stdin at all. And if you need more than one of those, now you need systemd support in that program. And what about port 8080?
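For illustration, this is roughly what "systemd support in that program" amounts to on the receiving side. It is a minimal sketch of the socket-passing convention, assuming the service manager sets LISTEN_PID and LISTEN_FDS and hands over descriptors starting at 3; the helper names are hypothetical and this is not Apache's actual code.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # first inherited descriptor in the systemd convention


def inherited_fd_numbers(environ, pid):
    """Return the descriptor numbers passed by the service manager, or []."""
    try:
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []  # the variables were meant for another process
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []  # variables missing or malformed: nothing was passed
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + max(count, 0)))


def inherited_sockets():
    """Wrap each inherited descriptor in a socket object; no bind() is needed."""
    fds = inherited_fd_numbers(os.environ, os.getpid())
    return [socket.socket(fileno=fd) for fd in fds]
```

A daemon that wants both 80 and 8080 would expect two descriptors here, which is why the program itself has to understand the convention.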

Let's see what we're up to by now:

1. Write a wrapper that will forbid Apache from making listen() calls, and unveil() anything needed.
2. Write a systemd service that will listen on 80 and 8080, and pass those sockets to Apache
3. Ensure Apache is happy with not being able to listen to anything but what is passed to it from systemd
4. Ensure Apache can get multiple sockets from systemd
5. Ensure that neither Apache nor anything it calls will ever try to unveil anything, because that won't work.
6. Ensure that either anything Apache calls is fine with the pledge() being made, or that it's okay for the pledge() to be rescinded on exec (there goes our listen() security!)
7. Accept that adding new users will require completely shutting down and restarting Apache

I don't know, this doesn't look particularly elegant to me. Lots of potential trouble already, and we've not even done much yet!

> How often this actually happens? Fedora should try to gather stats. I haven't seen it done once in my experience.

Anybody using setroubleshoot is effectively doing it.

> We have systemd for that. The wrapper code to set policy fits with it perfectly. Heck, it's already being used to allow rootless daemons listening on <1024 ports.

Again, the point is confinement, not allowing formerly root-only things safely. I don't see why a thing should be able to open ports >= 1024 without my permission.

> Realistically neither is the policy writer.

There's no perfection for sure, but at least the policy's writer is ideally an uninvolved third party who will ask useful questions like "Why does it want to do that?". Because if the developer of a thing is up to no good, or just not concerned about security, then clearly we benefit from an outside opinion.

LSM stacking and the future

Posted Nov 26, 2019 20:53 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

> The first is problematic because now your rule set encompasses anything that could possibly be called by the main process.
Nope. You can have multiple wrappers that run multiple Apache copies. As I said, I'm interested in reliable _practical_ solutions that just work.

Now let's see what you need to do in SELinux to do the same: Apache listening on port 80 and serving the ~/public directory while denying access to everything else. It's a simple task, right?

First, you need to create a label. Let's call it apache_file_t. And add it to ~/public. This will have an unfortunate side-effect of disabling user_home_t label on it, so if you have policies targeted for user_home_t then they might need an adjustment. For example, your backup utility might _lose_ access to ~/public if its policy just says "allow user_home_t read".

OK. From here on, files created in ~/public will have the apache_file_t label. However, since "file is an object blah-blah" if you move a file into ~/public it will NOT be automatically accessible. You need to remember to relabel it. The reverse is also true, if you move a file from ~/public it will still retain its labels and remain accessible.

But wait, there's more! SELinux can only take away rights. Typically home directories are set to 770 mode, so that they are accessible only for their users and user groups. So you need to make sure Apache is in the same group as yourself.

But OK, let's move on to listening on port 80. SELinux can... do nothing! It's only used to restrict access, not to grant it. So you have to start Apache as root and then let it drop privs. SELinux does allow taking away most of root's capabilities, so that's fine.

Now suppose that SELinux is turned off. Suddenly your home directory becomes accessible for Apache, which is in the same user group as your home directory. Whoops. And Apache is also started as root.

Let's compare with unveil(). You need to add access for ~/public, so you write a helper wrapper that does unveil() for that directory. Nothing else is affected, you don't need to modify your backup utility's policy. And unveil() can't be turned off, it's a core kernel feature.

LSM stacking and the future

Posted Nov 26, 2019 22:03 UTC (Tue) by vadim (subscriber, #35271) [Link]

> Nope. You can have multiple wrappers that run multiple Apache copies.

No, I'm not talking about multiple Apache copies. I'm talking about Apache calling other binaries. That is, a situation where you have:

Wrapper -> Apache -> CGI_1
                  -> CGI_2
                  -> CGI_3

What I'm saying is that you have several problems there:

1. The Wrapper makes it impossible for any of its children (Apache, or Apache's children) to pledge/unveil anything, because pledge/unveil work by listing what you will do, and closing off the rest, after which the functionality is closed off to any children. This means that if anything wants to drop privileges further, now it can't. If it thinks that's an error, it won't run. Otherwise it'll run with more privileges than it needs, and the Wrapper is actually compromising the security of it.
2. This system means that you need to pledge/unveil everything Apache or any of its children might ever want, and grant that access to that Apache instance and every child. Which means Wrapper must pledge/unveil everything Apache, CGI_1, CGI_2, and CGI_3 at once. You can't allow things for Apache and deny them to the CGIs, or lockdown each CGI in its own particular way... unless you skip on locking down Apache of course.
3. For pledge() specifically you can drop the lockdown on exec, but of course that now means the CGIs are free to do whatever they want.
4. It's also an inflexible system in that it requires a full restart to change what you unveil. You must either unveil a subdirectory under which anything will be accessible regardless of what it is, or if you are selective, you only get to do it once in Wrapper, after which it's set in stone and requires a full restart of the Apache instance.

> As I said, I'm interested in reliable _practical_ solutions that just work.

And I'm explaining why it's not very practical in practice.

That's not a bug, that's a feature. I mean that 100% seriously. SELinux doesn't work on paths, and isn't supposed to. This is exactly the behavior I want my system to have.

Of course it can be turned off, what nonsense is that? "Core" nothing. It didn't exist once upon a time, so just install an older kernel. Or just hack it up. This looks like a promising place for a "return 0". Or perhaps here. Took me about 10 minutes and I never even touched BSD.

Besides which, look at that lovely BYPASSUNVEIL constant. And oh dear, there's a hardcoded list of bypassed rules right in the kernel source.

LSM stacking and the future

Posted Nov 26, 2019 22:47 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

> 1. This means that if anything wants to drop privileges further, now it can't. If it thinks that's an error, it won't run. Otherwise it'll run with more privileges than it needs, and the Wrapper is actually compromising the security of it.
Nothing stops you from making unveil() nestable. Each successful invocation can further reduce the access. I think that's how pledge() works as well.

> 2. This system means that you need to pledge/unveil everything Apache or any of its children might ever want, and grant that access to that Apache instance and every child.
Sure. So does SELinux. Just at the labeling phase and the policy creation phase. I'm assuming that Apache simply runs the CGI scripts.

> 3. For pledge() specifically you can drop the lockdown on exec, but of course that now means the CGIs are free to do whatever they want.
Uh? Nope. pledge() is inherited across exec() calls.

> 4. It's also an inflexible system in that it requires a full restart to change what you unveil.
So does SELinux. You can't change labels of a running process.

> And I'm explaining why it's not very practical in practice
Well, no you have not.

> That's not a bug, that's a feature. I mean that 100% seriously. SELinux doesn't work on paths, and isn't supposed to.
And that's why it's dumb and is turned off in most cases.

> Of course it can be turned off, what nonsense is that? "Core" nothing. It didn't exist once upon a time, so just install an older kernel.
Nope. unveil() can't be turned off. You need to replace the kernel and reboot the system. Running unveil() on an older kernel also results in -ENOSYS.

Meanwhile, SELinux can be turned off with one command.

Want to convince me? Show me a simple script that does what you're proposing: creates a public directory and runs Apache with access to it. No need for CGIs. I'll show the corresponding unveil/pledge based wrapper.

LSM stacking and the future

Posted Nov 27, 2019 1:19 UTC (Wed) by vadim (subscriber, #35271) [Link]

> Nothing stops you from making unveil() nestable. Each successful invocation can further reduce the access.

Sure does: the interface. What unveil() does is first to forbid everything, then allow whatever you pass to unveil.

This means that if you don't block off unveil after making your list of exceptions, a child process or an exploit could just unveil("/") and unblock everything.

> I think that's how pledge() works as well.

pledge() has two modes:

1. Pass on the restrictions to the child. Great, unless your child can't work with those. So if you block something major, you're going to have a hard time exec()ing much after that.
2. Remove all restrictions from the child. Which means you restricted yourself, but your child can do whatever it wants.
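The two modes can be written down as a tiny model. This is only a sketch of the behaviour as described in this comment, not kernel code: the function name is hypothetical, and the only detail borrowed from pledge() itself is that a process needs to have kept the "exec" promise to be allowed to exec at all.

```python
def promises_after_exec(promises, execpromises):
    """Model which promises an exec'd program starts with.

    promises: set of promise names in force for the calling process.
    execpromises: set to impose on the new program (mode 1),
                  or None to impose nothing (mode 2).
    Returns a frozenset for a restricted child, or None if unrestricted.
    """
    if "exec" not in promises:
        # The model raises; it does not try to reproduce what a real
        # kernel does to a process that violates its pledge.
        raise PermissionError("process did not keep the 'exec' promise")
    if execpromises is None:
        return None  # mode 2: the child can do whatever it wants
    return frozenset(execpromises)  # mode 1: the child starts restricted
```

In mode 1 the child has to be able to live with whatever set it is handed, which is the difficulty described in item 1.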

> Sure. So does SELinux. Just at the labeling phase and the policy creation phase. I'm assuming that Apache simply runs the CGI scripts.

Nope! See, SELinux has the concept of transition rules: https://danwalsh.livejournal.com/23944.html

Which means, I can do this:

1. Confine apache, so that it can only do apache things.
2. Confine CGI, so that it can only do CGI things.
3. Write an apache -> CGI transition rule. Which means CGI rules don't pollute my Apache rules, and the CGI doesn't get to listen on ports.

This means I can have a setup where every piece is locked down to be able to do no more than it's supposed to.

> So does SELinux. You can't change labels of a running process.

But you can change the labels of files on disk, which means for instance I can take a running libvirt, and give it a disk image on a removable drive. All I need to do is to label it, and it works. I don't need to bring libvirt down and all my VMs with it, so that it can have /mnt/external added to its allowed paths list.

> Meanwhile, SELinux can be turned off with one command.

Which can be disabled with SELinux itself, if you want to. After that, reboot time.

> Want to convince me? Show me a simple script that does what you're proposing: creates a public directory and runs Apache with access to it. No need for CGIs. I'll show the corresponding unveil/pledge based wrapper.

setsebool -P httpd_enable_homedirs 1
chcon -R -t httpd_sys_content_t ~user/public_html

LSM stacking and the future

Posted Nov 27, 2019 1:23 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

> This means that if you don't block off unveil after making your list of exceptions, a child process or an exploit could just unveil("/") and unblock everything.
Uhh, no? unveil("/") will simply return -EPERM. So for example, you can only call unveil("~/public/www") if the parent unveiled("~/public").
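As a sketch of the rule being claimed here (a request is accepted only if it falls under something already unveiled), assuming a purely lexical path comparison that ignores symlinks; the function name is hypothetical and this has not been checked against the OpenBSD implementation:

```python
import posixpath


def may_unveil(requested, already_unveiled):
    """True if `requested` is inside one of the `already_unveiled` directories."""
    req = posixpath.normpath(requested)
    for allowed in already_unveiled:
        base = posixpath.normpath(allowed)
        # Match the directory itself or anything below it, but not a
        # sibling that merely shares a name prefix (/public vs /publicity).
        if req == base or req.startswith(base.rstrip("/") + "/"):
            return True
    return False
```

Under this rule a request for "/" is rejected once only a subdirectory has been unveiled, which would correspond to the -EPERM mentioned above.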

LSM stacking and the future

Posted Nov 27, 2019 10:49 UTC (Wed) by vadim (subscriber, #35271) [Link]

Hmm, interesting. Are you saying unveil works differently after a fork()? Eg, as I understand it:

// the whole filesystem is available at the start

unveil("/tmp", "r"); // now only /tmp is visible
unveil("/var", "r"); // now I can see both /tmp and /var

Are you saying the second statement will fail if I insert a fork() (perhaps with an exec) in the middle?

LSM stacking and the future

Posted Nov 27, 2019 11:48 UTC (Wed) by johill (subscriber, #25196) [Link]

I think you have to call unveil(NULL, NULL) to "stop" the ability to unveil more, but typically you would of course do that since it's otherwise useless?

LSM stacking and the future

Posted Nov 27, 2019 12:11 UTC (Wed) by vadim (subscriber, #35271) [Link]

And that's exactly the point I'm making:

unveil is a nice, handy mechanism. But it doesn't nest well. Since unveil builds a list of what you want to allow, you need to lock it up with unveil(NULL, NULL). Once you do so, any further unveil(), whether under a currently unveiled directory or not, fails.

This means it's not a good thing for things that could nest. Sample scenario:

We have a "convert_image" program that does some conversion. We secure it with unveil to ensure it doesn't touch anything it's not supposed to, if say, libjpeg happens to have an exploit. Great. It works the way it should from the commandline.

Now that we have a well protected tool, we can call it from Apache and not worry much. Wonderful!

But, let's suppose that since it's so awesome, we've now applied unveil to apache too, which calls convert_image through a CGI. apache calls unveil(NULL, NULL) as it should, and eventually runs convert_image. At that point, one of two things happens:

A. convert_image notices it can't secure itself and refuses to work
B. convert_image ignores the failure and plows ahead, allowing an exploit to work within what Apache is allowed to do.

So, while an interesting tool, it's a limited one, with gotchas like the above.
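The scenario can be replayed with a toy model. This is a sketch of the semantics as described in this thread, not OpenBSD code: the class, the paths, and the assumption that a child inherits the parent's locked state are all illustrative.

```python
import errno


class UnveilModel:
    """Toy model: a list of allowed paths plus a one-way lock."""

    def __init__(self):
        self.paths = None    # None: nothing unveiled yet, everything visible
        self.locked = False

    def unveil(self, path=None, perms=None):
        if self.locked:
            raise PermissionError(errno.EPERM, "unveil is locked")
        if path is None and perms is None:
            self.locked = True  # models unveil(NULL, NULL)
            return
        if self.paths is None:
            self.paths = {}     # first call hides everything else
        self.paths[path] = perms

    def spawn_child(self):
        """Assumption: the child starts with a copy of the parent's state."""
        child = UnveilModel()
        child.paths = None if self.paths is None else dict(self.paths)
        child.locked = self.locked
        return child


def convert_image(state, strict):
    """Hypothetical tool that tries to confine itself before doing any work."""
    try:
        state.unveil("/tmp/images", "r")
        state.unveil()
    except PermissionError:
        if strict:
            return "refused"                  # outcome A
        return "ran with parent's view"       # outcome B
    return "ran confined"
```

Run from a fresh process the tool confines itself; run under a parent that has already locked its list, it ends up in outcome A or B depending on how it treats the failure.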

LSM stacking and the future

Posted Apr 7, 2020 0:00 UTC (Tue) by indolering (guest, #102865) [Link]

I agree with you that LSMs are not composable and would love to discuss cool alternatives ... but your comment comes across as trollish and it's hijacked the discussion : (
