2014-12-15 13:01:42 +01:00
/*-
 * Copyright (c) 2014 John Baldwin
 * Copyright (c) 2014, 2016 The FreeBSD Foundation
 *
 * Portions of this software were developed by Konstantin Belousov
 * under sponsorship from the FreeBSD Foundation.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

#include <sys/cdefs.h>
__FBSDID("$FreeBSD$");

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/capsicum.h>
#include <sys/lock.h>
#include <sys/mman.h>
#include <sys/mutex.h>
#include <sys/priv.h>
#include <sys/proc.h>
#include <sys/procctl.h>
#include <sys/sx.h>
#include <sys/syscallsubr.h>
#include <sys/sysproto.h>
#include <sys/wait.h>
Implement Address Space Layout Randomization (ASLR)

With this change, randomization can be enabled for all non-fixed
mappings. It means that the base address for the mapping is selected
with a guaranteed amount of entropy (bits). If the mapping was
requested to be superpage aligned, the randomization honours the
superpage attributes.

Although the value of ASLR is diminishing over time as exploit authors
work out simple ASLR bypass techniques, it eliminates the trivial
exploitation of certain vulnerabilities, at least in theory. This
implementation is relatively small and happens at the correct
architectural level. Also, it is not expected to introduce
regressions in existing cases when turned off (default for now), or
cause any significant maintenance burden.

The randomization is done on a best-effort basis - that is, the
allocator falls back to a first fit strategy if fragmentation prevents
entropy injection. It is trivial to implement a strong mode where
failure to guarantee the requested amount of entropy results in
mapping request failure, but I do not consider that to be usable.

I have not fine-tuned the amount of entropy injected right now. It is
only a quantitative change that will not change the implementation. The
current amount is controlled by aslr_pages_rnd.

To not spoil coalescing optimizations, to reduce the page table
fragmentation inherent to ASLR, and to keep the transient superpage
promotion for the malloced memory, locality clustering is implemented
for anonymous private mappings, which are automatically grouped until
fragmentation kicks in. The initial location for the anon group range
is, of course, randomized. This is controlled by vm.cluster_anon,
enabled by default.

The default mode keeps the sbrk area unpopulated by other mappings,
but this can be turned off, which gives much more breathing bits on
architectures with small address space, such as i386. This is tied
with the question of following an application's hint about the mmap(2)
base address. Testing shows that ignoring the hint does not affect the
function of common applications, but I would expect more demanding
code could break. By default sbrk is preserved and mmap hints are
satisfied, which can be changed by using the
kern.elf{32,64}.aslr.honor_sbrk sysctl.

ASLR is enabled on a per-ABI basis, and currently it is only allowed on
FreeBSD native i386 and amd64 (including compat 32bit) ABIs. Support
for additional architectures will be added after further testing.

Both per-process and per-image controls are implemented:
- procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS;
- NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it possible
  to force ASLR off for the given binary. (A tool to edit the feature
  control note is in development.)

Global controls are:
- kern.elf{32,64}.aslr.enable - for non-fixed mappings done by mmap(2);
- kern.elf{32,64}.aslr.pie_enable - for PIE image activation mappings;
- kern.elf{32,64}.aslr.honor_sbrk - controls whether mmap(2) may use the
  sbrk area;
- vm.cluster_anon - enables anon mapping clustering.

PR: 208580 (exp runs)
Exp-runs done by: antoine
Reviewed by: markj (previous version)
Discussed with: emaste
Tested by: pho
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D5603
2019-02-10 18:19:45 +01:00
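The per-process control named in the commit message can be exercised from userland roughly as follows. This is a minimal sketch, not part of the kernel file: `demo_force_aslr` is a hypothetical helper name, the call targets the calling process only, and on systems whose headers do not define `PROC_ASLR_CTL` the helper simply fails with `ENOSYS`. Per procctl(2), the setting applies to images subsequently activated by execve(2).

```c
#include <sys/types.h>
#include <errno.h>
#include <unistd.h>
#ifdef __FreeBSD__
#include <sys/procctl.h>
#include <sys/wait.h>		/* P_PID */
#endif

/*
 * Hypothetical helper: ask the kernel to force ASLR on (enable != 0)
 * or off (enable == 0) for images later exec'ed by this process.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
demo_force_aslr(int enable)
{
#if defined(__FreeBSD__) && defined(PROC_ASLR_CTL)
	int arg;

	arg = enable ? PROC_ASLR_FORCE_ENABLE : PROC_ASLR_FORCE_DISABLE;
	return (procctl(P_PID, getpid(), PROC_ASLR_CTL, &arg));
#else
	/* No ASLR procctl on this platform; report it as unsupported. */
	(void)enable;
	errno = ENOSYS;
	return (-1);
#endif
}
```

A caller would typically invoke `demo_force_aslr(0)` just before execve(2) to run one program without randomization while leaving the global sysctls untouched.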
#include <vm/vm.h>
#include <vm/pmap.h>
#include <vm/vm_map.h>
#include <vm/vm_extern.h>

static int
protect_setchild(struct thread *td, struct proc *p, int flags)
{

	PROC_LOCK_ASSERT(p, MA_OWNED);
	if (p->p_flag & P_SYSTEM || p_cansched(td, p) != 0)
		return (0);
	if (flags & PPROT_SET) {
		p->p_flag |= P_PROTECTED;
		if (flags & PPROT_INHERIT)
			p->p_flag2 |= P2_INHERIT_PROTECTED;
	} else {
		p->p_flag &= ~P_PROTECTED;
		p->p_flag2 &= ~P2_INHERIT_PROTECTED;
	}
	return (1);
}

static int
protect_setchildren(struct thread *td, struct proc *top, int flags)
{
	struct proc *p;
	int ret;

	p = top;
	ret = 0;
	sx_assert(&proctree_lock, SX_LOCKED);
	for (;;) {
		ret |= protect_setchild(td, p, flags);
		PROC_UNLOCK(p);
		/*
		 * If this process has children, descend to them next,
		 * otherwise do any siblings, and if done with this level,
		 * follow back up the tree (but not past top).
		 */
		if (!LIST_EMPTY(&p->p_children))
			p = LIST_FIRST(&p->p_children);
		else for (;;) {
			if (p == top) {
				PROC_LOCK(p);
				return (ret);
			}
			if (LIST_NEXT(p, p_sibling)) {
				p = LIST_NEXT(p, p_sibling);
				break;
			}
			p = p->p_pptr;
		}
		PROC_LOCK(p);
	}
}
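
The descend, then sibling, then climb walk used by protect_setchildren() can be tried outside the kernel. The sketch below is only an illustration: `struct demo_node` and `demo_walk` are hypothetical names, plain pointers stand in for the `p_children`, `p_sibling` and `p_pptr` links, and all locking is left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct proc's tree linkage. */
struct demo_node {
	struct demo_node *parent;
	struct demo_node *first_child;
	struct demo_node *next_sibling;
	int visited;
};

/*
 * Visit top and every descendant without recursion or an explicit
 * stack.  Returns the number of nodes visited.
 */
static int
demo_walk(struct demo_node *top)
{
	struct demo_node *n;
	int count;

	n = top;
	count = 0;
	for (;;) {
		n->visited = 1;
		count++;
		/* Children first. */
		if (n->first_child != NULL) {
			n = n->first_child;
			continue;
		}
		/* Otherwise the next sibling, climbing until one exists. */
		for (;;) {
			if (n == top)
				return (count);	/* never walk past top */
			if (n->next_sibling != NULL) {
				n = n->next_sibling;
				break;
			}
			n = n->parent;
		}
	}
}
```

Checking `n == top` before looking at `next_sibling` is what keeps the walk from escaping into the siblings of the starting node.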

static int
protect_set(struct thread *td, struct proc *p, int flags)
{
	int error, ret;

	switch (PPROT_OP(flags)) {
	case PPROT_SET:
	case PPROT_CLEAR:
		break;
	default:
		return (EINVAL);
	}

	if ((PPROT_FLAGS(flags) & ~(PPROT_DESCEND | PPROT_INHERIT)) != 0)
		return (EINVAL);

	error = priv_check(td, PRIV_VM_MADV_PROTECT);
	if (error)
		return (error);

	if (flags & PPROT_DESCEND)
		ret = protect_setchildren(td, p, flags);
	else
		ret = protect_setchild(td, p, flags);
	if (ret == 0)
		return (EPERM);
	return (0);
}

static int
reap_acquire(struct thread *td, struct proc *p)
{

	sx_assert(&proctree_lock, SX_XLOCKED);
	if (p != curproc)
		return (EPERM);
	if ((p->p_treeflag & P_TREE_REAPER) != 0)
		return (EBUSY);
	p->p_treeflag |= P_TREE_REAPER;
	/*
	 * We do not reattach existing children and the whole tree
	 * under them to us, since p->p_reaper has already seen them.
	 */
	return (0);
}

static int
reap_release(struct thread *td, struct proc *p)
{

	sx_assert(&proctree_lock, SX_XLOCKED);
	if (p != curproc)
		return (EPERM);
	if (p == initproc)
		return (EINVAL);
	if ((p->p_treeflag & P_TREE_REAPER) == 0)
		return (EINVAL);
	reaper_abandon_children(p, false);
	return (0);
}

static int
reap_status(struct thread *td, struct proc *p,
    struct procctl_reaper_status *rs)
{
	struct proc *reap, *p2, *first_p;

	sx_assert(&proctree_lock, SX_LOCKED);
	bzero(rs, sizeof(*rs));
	if ((p->p_treeflag & P_TREE_REAPER) == 0) {
		reap = p->p_reaper;
	} else {
		reap = p;
		rs->rs_flags |= REAPER_STATUS_OWNED;
	}
	if (reap == initproc)
		rs->rs_flags |= REAPER_STATUS_REALINIT;
	rs->rs_reaper = reap->p_pid;
	rs->rs_descendants = 0;
	rs->rs_children = 0;
	if (!LIST_EMPTY(&reap->p_reaplist)) {
		first_p = LIST_FIRST(&reap->p_children);
		if (first_p == NULL)
			first_p = LIST_FIRST(&reap->p_reaplist);
		rs->rs_pid = first_p->p_pid;
		LIST_FOREACH(p2, &reap->p_reaplist, p_reapsibling) {
			if (proc_realparent(p2) == reap)
				rs->rs_children++;
			rs->rs_descendants++;
		}
	} else {
		rs->rs_pid = -1;
	}
	return (0);
}

static int
reap_getpids(struct thread *td, struct proc *p, struct procctl_reaper_pids *rp)
{
	struct proc *reap, *p2;
	struct procctl_reaper_pidinfo *pi, *pip;
	u_int i, n;
	int error;

	sx_assert(&proctree_lock, SX_LOCKED);
	PROC_UNLOCK(p);
	reap = (p->p_treeflag & P_TREE_REAPER) == 0 ? p->p_reaper : p;
	n = i = 0;
	error = 0;
	LIST_FOREACH(p2, &reap->p_reaplist, p_reapsibling)
		n++;
	sx_unlock(&proctree_lock);
	if (rp->rp_count < n)
		n = rp->rp_count;
	pi = malloc(n * sizeof(*pi), M_TEMP, M_WAITOK);
	sx_slock(&proctree_lock);
	LIST_FOREACH(p2, &reap->p_reaplist, p_reapsibling) {
		if (i == n)
			break;
		pip = &pi[i];
		bzero(pip, sizeof(*pip));
		pip->pi_pid = p2->p_pid;
		pip->pi_subtree = p2->p_reapsubtree;
		pip->pi_flags = REAPER_PIDINFO_VALID;
		if (proc_realparent(p2) == reap)
			pip->pi_flags |= REAPER_PIDINFO_CHILD;
		if ((p2->p_treeflag & P_TREE_REAPER) != 0)
			pip->pi_flags |= REAPER_PIDINFO_REAPER;
		i++;
	}
	sx_sunlock(&proctree_lock);
	error = copyout(pi, rp->rp_pids, i * sizeof(*pi));
	free(pi, M_TEMP);
	sx_slock(&proctree_lock);
	PROC_LOCK(p);
	return (error);
}

static void
reap_kill_proc(struct thread *td, struct proc *p2, ksiginfo_t *ksi,
    struct procctl_reaper_kill *rk, int *error)
{
	int error1;

	PROC_LOCK(p2);
	error1 = p_cansignal(td, p2, rk->rk_sig);
	if (error1 == 0) {
		pksignal(p2, rk->rk_sig, ksi);
		rk->rk_killed++;
		*error = error1;
	} else if (*error == ESRCH) {
		rk->rk_fpid = p2->p_pid;
		*error = error1;
	}
	PROC_UNLOCK(p2);
}

struct reap_kill_tracker {
	struct proc *parent;
	TAILQ_ENTRY(reap_kill_tracker) link;
};

TAILQ_HEAD(reap_kill_tracker_head, reap_kill_tracker);

static void
reap_kill_sched(struct reap_kill_tracker_head *tracker, struct proc *p2)
{
	struct reap_kill_tracker *t;

	t = malloc(sizeof(struct reap_kill_tracker), M_TEMP, M_WAITOK);
	t->parent = p2;
	TAILQ_INSERT_TAIL(tracker, t, link);
}

static int
reap_kill(struct thread *td, struct proc *p, struct procctl_reaper_kill *rk)
{
	struct proc *reap, *p2;
	ksiginfo_t ksi;
	struct reap_kill_tracker_head tracker;
	struct reap_kill_tracker *t;
	int error;

	sx_assert(&proctree_lock, SX_LOCKED);
	if (IN_CAPABILITY_MODE(td))
		return (ECAPMODE);
	if (rk->rk_sig <= 0 || rk->rk_sig > _SIG_MAXSIG ||
	    (rk->rk_flags & ~(REAPER_KILL_CHILDREN |
	    REAPER_KILL_SUBTREE)) != 0 || (rk->rk_flags &
	    (REAPER_KILL_CHILDREN | REAPER_KILL_SUBTREE)) ==
	    (REAPER_KILL_CHILDREN | REAPER_KILL_SUBTREE))
		return (EINVAL);
	PROC_UNLOCK(p);
	reap = (p->p_treeflag & P_TREE_REAPER) == 0 ? p->p_reaper : p;
	ksiginfo_init(&ksi);
	ksi.ksi_signo = rk->rk_sig;
	ksi.ksi_code = SI_USER;
	ksi.ksi_pid = td->td_proc->p_pid;
	ksi.ksi_uid = td->td_ucred->cr_ruid;
	error = ESRCH;
	rk->rk_killed = 0;
	rk->rk_fpid = -1;
	if ((rk->rk_flags & REAPER_KILL_CHILDREN) != 0) {
		for (p2 = LIST_FIRST(&reap->p_children); p2 != NULL;
		    p2 = LIST_NEXT(p2, p_sibling)) {
			reap_kill_proc(td, p2, &ksi, rk, &error);
			/*
			 * Do not end the loop on error, signal
			 * everything we can.
			 */
		}
	} else {
		TAILQ_INIT(&tracker);
		reap_kill_sched(&tracker, reap);
		while ((t = TAILQ_FIRST(&tracker)) != NULL) {
			MPASS((t->parent->p_treeflag & P_TREE_REAPER) != 0);
			TAILQ_REMOVE(&tracker, t, link);
			for (p2 = LIST_FIRST(&t->parent->p_reaplist); p2 != NULL;
			    p2 = LIST_NEXT(p2, p_reapsibling)) {
				if (t->parent == reap &&
				    (rk->rk_flags & REAPER_KILL_SUBTREE) != 0 &&
				    p2->p_reapsubtree != rk->rk_subtree)
					continue;
				if ((p2->p_treeflag & P_TREE_REAPER) != 0)
					reap_kill_sched(&tracker, p2);
				reap_kill_proc(td, p2, &ksi, rk, &error);
			}
			free(t, M_TEMP);
		}
	}
	PROC_LOCK(p);
	return (error);
}

static int
trace_ctl(struct thread *td, struct proc *p, int state)
{

	PROC_LOCK_ASSERT(p, MA_OWNED);

	/*
	 * Ktrace changes p_traceflag from or to zero under the
	 * process lock, so the test does not need to acquire the
	 * ktrace mutex.
	 */
	if ((p->p_flag & P_TRACED) != 0 || p->p_traceflag != 0)
		return (EBUSY);

	switch (state) {
	case PROC_TRACE_CTL_ENABLE:
		if (td->td_proc != p)
			return (EPERM);
		p->p_flag2 &= ~(P2_NOTRACE | P2_NOTRACE_EXEC);
		break;
	case PROC_TRACE_CTL_DISABLE_EXEC:
		p->p_flag2 |= P2_NOTRACE_EXEC | P2_NOTRACE;
		break;
	case PROC_TRACE_CTL_DISABLE:
		if ((p->p_flag2 & P2_NOTRACE_EXEC) != 0) {
			KASSERT((p->p_flag2 & P2_NOTRACE) != 0,
			    ("dangling P2_NOTRACE_EXEC"));
			if (td->td_proc != p)
				return (EPERM);
			p->p_flag2 &= ~P2_NOTRACE_EXEC;
		} else {
			p->p_flag2 |= P2_NOTRACE;
		}
		break;
	default:
		return (EINVAL);
	}
	return (0);
}

static int
trace_status(struct thread *td, struct proc *p, int *data)
{

	if ((p->p_flag2 & P2_NOTRACE) != 0) {
		KASSERT((p->p_flag & P_TRACED) == 0,
		    ("%d traced but tracing disabled", p->p_pid));
		*data = -1;
	} else if ((p->p_flag & P_TRACED) != 0) {
		*data = p->p_pptr->p_pid;
	} else {
		*data = 0;
	}
	return (0);
}

static int
trapcap_ctl(struct thread *td, struct proc *p, int state)
{

	PROC_LOCK_ASSERT(p, MA_OWNED);

	switch (state) {
	case PROC_TRAPCAP_CTL_ENABLE:
		p->p_flag2 |= P2_TRAPCAP;
		break;
	case PROC_TRAPCAP_CTL_DISABLE:
		p->p_flag2 &= ~P2_TRAPCAP;
		break;
	default:
		return (EINVAL);
	}
	return (0);
}

static int
trapcap_status(struct thread *td, struct proc *p, int *data)
{

	*data = (p->p_flag2 & P2_TRAPCAP) != 0 ? PROC_TRAPCAP_CTL_ENABLE :
	    PROC_TRAPCAP_CTL_DISABLE;
	return (0);
}

procctl(2): add PROC_NO_NEW_PRIVS_CTL, PROC_NO_NEW_PRIVS_STATUS

This introduces a new, per-process flag, "NO_NEW_PRIVS", which
is inherited, preserved on exec, and cannot be cleared. The flag,
when set, makes subsequent execs ignore any SUID and SGID bits,
instead executing those binaries as if they were not set.

The main purpose of the flag is implementation of Linux
PROC_SET_NO_NEW_PRIVS prctl(2), and possibly also unprivileged
chroot.

Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30939
2021-07-01 10:11:11 +02:00
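The semantics described in this commit message (one-way switch, inherited, SUID ignored on exec) can be modelled in a few lines of portable C. This is a toy model under stated assumptions, not kernel code: `struct demo_proc`, `demo_no_new_privs_ctl`, `demo_fork` and `demo_exec_euid` are hypothetical names, and only the user-ID side of SUID handling is shown.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced process: a uid and the one-way flag. */
struct demo_proc {
	unsigned uid;
	bool no_new_privs;
};

/*
 * Only "enable" is accepted; a request to clear the flag is rejected
 * (the kernel handler returns EINVAL for anything but ENABLE).
 */
static int
demo_no_new_privs_ctl(struct demo_proc *p, bool enable)
{
	if (!enable)
		return (-1);
	p->no_new_privs = true;
	return (0);
}

/* A child starts as a copy of its parent, so the flag is inherited. */
static struct demo_proc
demo_fork(const struct demo_proc *parent)
{
	return (*parent);
}

/*
 * Effective uid after exec'ing a file owned by file_uid.  With the
 * flag set, the SUID bit is treated as if it were not set.
 */
static unsigned
demo_exec_euid(const struct demo_proc *p, unsigned file_uid, bool suid)
{
	if (suid && !p->no_new_privs)
		return (file_uid);
	return (p->uid);
}
```

In this model, a process with uid 1001 that execs a root-owned SUID binary gets euid 0 before the flag is set and keeps 1001 afterwards, and the same holds for any child created by `demo_fork`.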
static int
no_new_privs_ctl(struct thread *td, struct proc *p, int state)
{

	PROC_LOCK_ASSERT(p, MA_OWNED);

	if (state != PROC_NO_NEW_PRIVS_ENABLE)
		return (EINVAL);
	p->p_flag2 |= P2_NO_NEW_PRIVS;
	return (0);
}

static int
no_new_privs_status(struct thread *td, struct proc *p, int *data)
{

	*data = (p->p_flag2 & P2_NO_NEW_PRIVS) != 0 ?
	    PROC_NO_NEW_PRIVS_ENABLE : PROC_NO_NEW_PRIVS_DISABLE;
	return (0);
}

static int
protmax_ctl(struct thread *td, struct proc *p, int state)
{
	PROC_LOCK_ASSERT(p, MA_OWNED);

	switch (state) {
	case PROC_PROTMAX_FORCE_ENABLE:
		p->p_flag2 &= ~P2_PROTMAX_DISABLE;
		p->p_flag2 |= P2_PROTMAX_ENABLE;
		break;
	case PROC_PROTMAX_FORCE_DISABLE:
		p->p_flag2 |= P2_PROTMAX_DISABLE;
		p->p_flag2 &= ~P2_PROTMAX_ENABLE;
		break;
	case PROC_PROTMAX_NOFORCE:
		p->p_flag2 &= ~(P2_PROTMAX_ENABLE | P2_PROTMAX_DISABLE);
		break;
	default:
		return (EINVAL);
	}
	return (0);
}

static int
protmax_status(struct thread *td, struct proc *p, int *data)
{
	int d;

	switch (p->p_flag2 & (P2_PROTMAX_ENABLE | P2_PROTMAX_DISABLE)) {
	case 0:
		d = PROC_PROTMAX_NOFORCE;
		break;
	case P2_PROTMAX_ENABLE:
		d = PROC_PROTMAX_FORCE_ENABLE;
		break;
	case P2_PROTMAX_DISABLE:
		d = PROC_PROTMAX_FORCE_DISABLE;
		break;
	}
	if (kern_mmap_maxprot(p, PROT_READ) == PROT_READ)
		d |= PROC_PROTMAX_ACTIVE;
	*data = d;
	return (0);
}

static int
aslr_ctl(struct thread *td, struct proc *p, int state)
{

	PROC_LOCK_ASSERT(p, MA_OWNED);

	switch (state) {
	case PROC_ASLR_FORCE_ENABLE:
		p->p_flag2 &= ~P2_ASLR_DISABLE;
		p->p_flag2 |= P2_ASLR_ENABLE;
		break;
	case PROC_ASLR_FORCE_DISABLE:
		p->p_flag2 |= P2_ASLR_DISABLE;
		p->p_flag2 &= ~P2_ASLR_ENABLE;
		break;
	case PROC_ASLR_NOFORCE:
		p->p_flag2 &= ~(P2_ASLR_ENABLE | P2_ASLR_DISABLE);
		break;
	default:
		return (EINVAL);
	}
	return (0);
}

static int
aslr_status(struct thread *td, struct proc *p, int *data)
{
	struct vmspace *vm;
	int d;

	switch (p->p_flag2 & (P2_ASLR_ENABLE | P2_ASLR_DISABLE)) {
	case 0:
		d = PROC_ASLR_NOFORCE;
		break;
	case P2_ASLR_ENABLE:
		d = PROC_ASLR_FORCE_ENABLE;
		break;
	case P2_ASLR_DISABLE:
		d = PROC_ASLR_FORCE_DISABLE;
		break;
	}
	if ((p->p_flag & P_WEXIT) == 0) {
		_PHOLD(p);
		PROC_UNLOCK(p);
		vm = vmspace_acquire_ref(p);
		if (vm != NULL) {
			if ((vm->vm_map.flags & MAP_ASLR) != 0)
				d |= PROC_ASLR_ACTIVE;
			vmspace_free(vm);
		}
		PROC_LOCK(p);
		_PRELE(p);
	}
	*data = d;
	return (0);
}

static int
stackgap_ctl(struct thread *td, struct proc *p, int state)
{
	PROC_LOCK_ASSERT(p, MA_OWNED);

	if ((state & ~(PROC_STACKGAP_ENABLE | PROC_STACKGAP_DISABLE |
	    PROC_STACKGAP_ENABLE_EXEC | PROC_STACKGAP_DISABLE_EXEC)) != 0)
		return (EINVAL);
	switch (state & (PROC_STACKGAP_ENABLE | PROC_STACKGAP_DISABLE)) {
	case PROC_STACKGAP_ENABLE:
		if ((p->p_flag2 & P2_STKGAP_DISABLE) != 0)
			return (EINVAL);
		break;
	case PROC_STACKGAP_DISABLE:
		p->p_flag2 |= P2_STKGAP_DISABLE;
		break;
	case 0:
		break;
	default:
		return (EINVAL);
	}
	switch (state & (PROC_STACKGAP_ENABLE_EXEC |
	    PROC_STACKGAP_DISABLE_EXEC)) {
	case PROC_STACKGAP_ENABLE_EXEC:
		p->p_flag2 &= ~P2_STKGAP_DISABLE_EXEC;
		break;
	case PROC_STACKGAP_DISABLE_EXEC:
		p->p_flag2 |= P2_STKGAP_DISABLE_EXEC;
		break;
	case 0:
		break;
	default:
		return (EINVAL);
	}
	return (0);
}

static int
stackgap_status(struct thread *td, struct proc *p, int *data)
{
	PROC_LOCK_ASSERT(p, MA_OWNED);

	*data = (p->p_flag2 & P2_STKGAP_DISABLE) != 0 ? PROC_STACKGAP_DISABLE :
	    PROC_STACKGAP_ENABLE;
	*data |= (p->p_flag2 & P2_STKGAP_DISABLE_EXEC) != 0 ?
	    PROC_STACKGAP_DISABLE_EXEC : PROC_STACKGAP_ENABLE_EXEC;
	return (0);
}

#ifndef _SYS_SYSPROTO_H_
struct procctl_args {
	idtype_t idtype;
	id_t	id;
	int	com;
	void	*data;
};
#endif
/* ARGSUSED */
int
sys_procctl(struct thread *td, struct procctl_args *uap)
{
	void *data;
	union {
		struct procctl_reaper_status rs;
		struct procctl_reaper_pids rp;
		struct procctl_reaper_kill rk;
	} x;
	int error, error1, flags, signum;

	if (uap->com >= PROC_PROCCTL_MD_MIN)
		return (cpu_procctl(td, uap->idtype, uap->id,
		    uap->com, uap->data));

	switch (uap->com) {
Implement Address Space Layout Randomization (ASLR)
With this change, randomization can be enabled for all non-fixed
mappings. It means that the base address for the mapping is selected
with a guaranteed amount of entropy (bits). If the mapping was
requested to be superpage aligned, the randomization honours the
superpage attributes.
Although the value of ASLR is diminshing over time as exploit authors
work out simple ASLR bypass techniques, it elimintates the trivial
exploitation of certain vulnerabilities, at least in theory. This
implementation is relatively small and happens at the correct
architectural level. Also, it is not expected to introduce
regressions in existing cases when turned off (default for now), or
cause any significant maintaince burden.
The randomization is done on a best-effort basis - that is, the
allocator falls back to a first fit strategy if fragmentation prevents
entropy injection. It is trivial to implement a strong mode where
failure to guarantee the requested amount of entropy results in
mapping request failure, but I do not consider that to be usable.
I have not fine-tuned the amount of entropy injected right now. It is
only a quantitive change that will not change the implementation. The
current amount is controlled by aslr_pages_rnd.
To not spoil coalescing optimizations, to reduce the page table
fragmentation inherent to ASLR, and to keep the transient superpage
promotion for the malloced memory, locality clustering is implemented
for anonymous private mappings, which are automatically grouped until
fragmentation kicks in. The initial location for the anon group range
is, of course, randomized. This is controlled by vm.cluster_anon,
enabled by default.
The default mode keeps the sbrk area unpopulated by other mappings,
but this can be turned off, which gives much more breathing bits on
architectures with small address space, such as i386. This is tied
with the question of following an application's hint about the mmap(2)
base address. Testing shows that ignoring the hint does not affect the
function of common applications, but I would expect more demanding
code could break. By default sbrk is preserved and mmap hints are
satisfied, which can be changed by using the
kern.elf{32,64}.aslr.honor_sbrk sysctl.
ASLR is enabled on a per-ABI basis, and currently it is only allowed on
FreeBSD native i386 and amd64 (including compat 32bit) ABIs. Support
for additional architectures will be added after further testing.
Both per-process and per-image controls are implemented:
- procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS;
- NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it possible
to force ASLR off for the given binary. (A tool to edit the feature
control note is in development.)
Global controls are:
- kern.elf{32,64}.aslr.enable - for non-fixed mappings done by mmap(2);
- kern.elf{32,64}.aslr.pie_enable - for PIE image activation mappings;
- kern.elf{32,64}.aslr.honor_sbrk - allow to use sbrk area for mmap(2);
- vm.cluster_anon - enables anon mapping clustering.
PR: 208580 (exp runs)
Exp-runs done by: antoine
Reviewed by: markj (previous version)
Discussed with: emaste
Tested by: pho
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D5603
2019-02-10 18:19:45 +01:00
|
|
|
case PROC_ASLR_CTL:
|
2019-07-02 21:07:17 +02:00
|
|
|
case PROC_PROTMAX_CTL:
|
2014-12-15 13:01:42 +01:00
|
|
|
case PROC_SPROTECT:
|
2019-09-03 20:56:25 +02:00
|
|
|
case PROC_STACKGAP_CTL:
|
2015-01-18 16:13:11 +01:00
|
|
|
case PROC_TRACE_CTL:
|
2016-09-21 10:23:33 +02:00
|
|
|
case PROC_TRAPCAP_CTL:
|
procctl(2): add PROC_NO_NEW_PRIVS_CTL, PROC_NO_NEW_PRIVS_STATUS
This introduces a new, per-process flag, "NO_NEW_PRIVS", which
is inherited, preserved on exec, and cannot be cleared. The flag,
when set, makes subsequent execs ignore any SUID and SGID bits,
instead executing those binaries as if they were not set.
The main purpose of the flag is implementation of Linux
PR_SET_NO_NEW_PRIVS prctl(2), and possibly also unprivileged
chroot.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30939
2021-07-01 10:11:11 +02:00
|
|
|
case PROC_NO_NEW_PRIVS_CTL:
|
2014-12-15 13:01:42 +01:00
|
|
|
error = copyin(uap->data, &flags, sizeof(flags));
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
data = &flags;
|
|
|
|
break;
|
|
|
|
case PROC_REAP_ACQUIRE:
|
|
|
|
case PROC_REAP_RELEASE:
|
|
|
|
if (uap->data != NULL)
|
|
|
|
return (EINVAL);
|
|
|
|
data = NULL;
|
|
|
|
break;
|
|
|
|
case PROC_REAP_STATUS:
|
|
|
|
data = &x.rs;
|
|
|
|
break;
|
|
|
|
case PROC_REAP_GETPIDS:
|
|
|
|
error = copyin(uap->data, &x.rp, sizeof(x.rp));
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
data = &x.rp;
|
|
|
|
break;
|
|
|
|
case PROC_REAP_KILL:
|
|
|
|
error = copyin(uap->data, &x.rk, sizeof(x.rk));
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
data = &x.rk;
|
|
|
|
break;
|
2019-02-10 18:19:45 +01:00
|
|
|
case PROC_ASLR_STATUS:
|
2019-07-02 21:07:17 +02:00
|
|
|
case PROC_PROTMAX_STATUS:
|
2019-09-03 20:56:25 +02:00
|
|
|
case PROC_STACKGAP_STATUS:
|
2015-01-18 16:13:11 +01:00
|
|
|
case PROC_TRACE_STATUS:
|
2016-09-21 10:23:33 +02:00
|
|
|
case PROC_TRAPCAP_STATUS:
|
2021-07-01 10:11:11 +02:00
|
|
|
case PROC_NO_NEW_PRIVS_STATUS:
|
2015-01-18 16:13:11 +01:00
|
|
|
data = &flags;
|
|
|
|
break;
|
2018-04-20 17:19:27 +02:00
|
|
|
case PROC_PDEATHSIG_CTL:
|
2018-04-18 23:31:13 +02:00
|
|
|
error = copyin(uap->data, &signum, sizeof(signum));
|
|
|
|
if (error != 0)
|
|
|
|
return (error);
|
|
|
|
data = &signum;
|
|
|
|
break;
|
2018-04-20 17:19:27 +02:00
|
|
|
case PROC_PDEATHSIG_STATUS:
|
2018-04-18 23:31:13 +02:00
|
|
|
data = &signum;
|
|
|
|
break;
|
2014-12-15 13:01:42 +01:00
|
|
|
default:
|
|
|
|
return (EINVAL);
|
|
|
|
}
|
|
|
|
error = kern_procctl(td, uap->idtype, uap->id, uap->com, data);
|
|
|
|
switch (uap->com) {
|
|
|
|
case PROC_REAP_STATUS:
|
|
|
|
if (error == 0)
|
|
|
|
error = copyout(&x.rs, uap->data, sizeof(x.rs));
|
2014-12-16 10:49:07 +01:00
|
|
|
break;
|
2014-12-15 13:01:42 +01:00
|
|
|
case PROC_REAP_KILL:
|
|
|
|
error1 = copyout(&x.rk, uap->data, sizeof(x.rk));
|
|
|
|
if (error == 0)
|
|
|
|
error = error1;
|
|
|
|
break;
|
2019-02-10 18:19:45 +01:00
|
|
|
case PROC_ASLR_STATUS:
|
2019-07-02 21:07:17 +02:00
|
|
|
case PROC_PROTMAX_STATUS:
|
2019-09-03 20:56:25 +02:00
|
|
|
case PROC_STACKGAP_STATUS:
|
2015-01-18 16:13:11 +01:00
|
|
|
case PROC_TRACE_STATUS:
|
2016-09-21 10:23:33 +02:00
|
|
|
case PROC_TRAPCAP_STATUS:
|
2021-07-01 10:11:11 +02:00
|
|
|
case PROC_NO_NEW_PRIVS_STATUS:
|
2015-01-18 16:13:11 +01:00
|
|
|
if (error == 0)
|
|
|
|
error = copyout(&flags, uap->data, sizeof(flags));
|
|
|
|
break;
|
2018-04-20 17:19:27 +02:00
|
|
|
case PROC_PDEATHSIG_STATUS:
|
2018-04-18 23:31:13 +02:00
|
|
|
if (error == 0)
|
|
|
|
error = copyout(&signum, uap->data, sizeof(signum));
|
|
|
|
break;
|
2014-12-15 13:01:42 +01:00
|
|
|
}
|
|
|
|
return (error);
|
|
|
|
}
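
The system call wrapper that ends above follows a three-phase shape: copy the argument in for commands that take input, dispatch, then copy the result out for status commands. The following is a minimal userland sketch of that shape, not the kernel code. The `model_*` names, the two command numbers, and the NULL-means-EFAULT stand-ins for copyin/copyout are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical command numbers, not the real PROC_* values. */
#define MODEL_FLAG_CTL		1
#define MODEL_FLAG_STATUS	2

/* Stands in for a per-process flag kept by the kernel. */
static int model_proc_flag;

/* Stand-in for copyin(): a NULL "user" pointer is treated as a fault. */
static int
model_copyin(const void *uaddr, void *kaddr, size_t len)
{
	if (uaddr == NULL)
		return (EFAULT);
	memcpy(kaddr, uaddr, len);
	return (0);
}

/* Stand-in for copyout(), with the same NULL convention. */
static int
model_copyout(const void *kaddr, void *uaddr, size_t len)
{
	if (uaddr == NULL)
		return (EFAULT);
	memcpy(uaddr, kaddr, len);
	return (0);
}

/* Plays the role of the kernel-side handler working on kernel memory. */
static int
model_dispatch(int com, void *data)
{
	assert(data != NULL);
	switch (com) {
	case MODEL_FLAG_CTL:
		model_proc_flag = *(int *)data;
		return (0);
	case MODEL_FLAG_STATUS:
		*(int *)data = model_proc_flag;
		return (0);
	default:
		return (EINVAL);
	}
}

int
model_procctl(int com, void *udata)
{
	int error, flags;
	void *data;

	flags = 0;
	/* Phase 1: validate the command and copy input in. */
	switch (com) {
	case MODEL_FLAG_CTL:
		error = model_copyin(udata, &flags, sizeof(flags));
		if (error != 0)
			return (error);
		data = &flags;
		break;
	case MODEL_FLAG_STATUS:
		data = &flags;
		break;
	default:
		return (EINVAL);
	}
	/* Phase 2: run the handler on the kernel-side copy. */
	error = model_dispatch(com, data);
	/* Phase 3: copy results out only for status commands that succeeded. */
	switch (com) {
	case MODEL_FLAG_STATUS:
		if (error == 0)
			error = model_copyout(&flags, udata, sizeof(flags));
		break;
	}
	return (error);
}
```

Keeping the user-memory accesses in the wrapper means the handler only ever sees kernel pointers, which is why an in-kernel caller can invoke the dispatch function directly with its own buffer.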
|
|
|
|
|
|
|
|
static int
|
|
|
|
kern_procctl_single(struct thread *td, struct proc *p, int com, void *data)
|
|
|
|
{
|
|
|
|
|
|
|
|
PROC_LOCK_ASSERT(p, MA_OWNED);
|
|
|
|
switch (com) {
|
2019-02-10 18:19:45 +01:00
|
|
|
case PROC_ASLR_CTL:
|
|
|
|
return (aslr_ctl(td, p, *(int *)data));
|
|
|
|
case PROC_ASLR_STATUS:
|
|
|
|
return (aslr_status(td, p, data));
|
2014-12-15 13:01:42 +01:00
|
|
|
case PROC_SPROTECT:
|
|
|
|
return (protect_set(td, p, *(int *)data));
|
2019-07-02 21:07:17 +02:00
|
|
|
case PROC_PROTMAX_CTL:
|
|
|
|
return (protmax_ctl(td, p, *(int *)data));
|
|
|
|
case PROC_PROTMAX_STATUS:
|
|
|
|
return (protmax_status(td, p, data));
|
2019-09-03 20:56:25 +02:00
|
|
|
case PROC_STACKGAP_CTL:
|
|
|
|
return (stackgap_ctl(td, p, *(int *)data));
|
|
|
|
case PROC_STACKGAP_STATUS:
|
|
|
|
return (stackgap_status(td, p, data));
|
2014-12-15 13:01:42 +01:00
|
|
|
case PROC_REAP_ACQUIRE:
|
|
|
|
return (reap_acquire(td, p));
|
|
|
|
case PROC_REAP_RELEASE:
|
|
|
|
return (reap_release(td, p));
|
|
|
|
case PROC_REAP_STATUS:
|
|
|
|
return (reap_status(td, p, data));
|
|
|
|
case PROC_REAP_GETPIDS:
|
|
|
|
return (reap_getpids(td, p, data));
|
|
|
|
case PROC_REAP_KILL:
|
|
|
|
return (reap_kill(td, p, data));
|
2015-01-18 16:13:11 +01:00
|
|
|
case PROC_TRACE_CTL:
|
|
|
|
return (trace_ctl(td, p, *(int *)data));
|
|
|
|
case PROC_TRACE_STATUS:
|
|
|
|
return (trace_status(td, p, data));
|
2016-09-21 10:23:33 +02:00
|
|
|
case PROC_TRAPCAP_CTL:
|
|
|
|
return (trapcap_ctl(td, p, *(int *)data));
|
|
|
|
case PROC_TRAPCAP_STATUS:
|
|
|
|
return (trapcap_status(td, p, data));
|
2021-07-01 10:11:11 +02:00
|
|
|
case PROC_NO_NEW_PRIVS_CTL:
|
|
|
|
return (no_new_privs_ctl(td, p, *(int *)data));
|
|
|
|
case PROC_NO_NEW_PRIVS_STATUS:
|
|
|
|
return (no_new_privs_status(td, p, data));
|
2014-12-15 13:01:42 +01:00
|
|
|
default:
|
|
|
|
return (EINVAL);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
int
|
|
|
|
kern_procctl(struct thread *td, idtype_t idtype, id_t id, int com, void *data)
|
|
|
|
{
|
|
|
|
struct pgrp *pg;
|
|
|
|
struct proc *p;
|
|
|
|
int error, first_error, ok;
|
2018-04-18 23:31:13 +02:00
|
|
|
int signum;
|
2015-01-18 16:13:11 +01:00
|
|
|
bool tree_locked;
|
2014-12-15 13:01:42 +01:00
|
|
|
|
|
|
|
switch (com) {
|
2019-02-10 18:19:45 +01:00
|
|
|
case PROC_ASLR_CTL:
|
|
|
|
case PROC_ASLR_STATUS:
|
2019-07-02 21:07:17 +02:00
|
|
|
case PROC_PROTMAX_CTL:
|
|
|
|
case PROC_PROTMAX_STATUS:
|
2014-12-15 13:01:42 +01:00
|
|
|
case PROC_REAP_ACQUIRE:
|
|
|
|
case PROC_REAP_RELEASE:
|
|
|
|
case PROC_REAP_STATUS:
|
|
|
|
case PROC_REAP_GETPIDS:
|
|
|
|
case PROC_REAP_KILL:
|
2019-09-03 20:56:25 +02:00
|
|
|
case PROC_STACKGAP_CTL:
|
|
|
|
case PROC_STACKGAP_STATUS:
|
2015-01-18 16:13:11 +01:00
|
|
|
case PROC_TRACE_STATUS:
|
2016-09-21 10:23:33 +02:00
|
|
|
case PROC_TRAPCAP_STATUS:
|
2018-04-20 17:19:27 +02:00
|
|
|
case PROC_PDEATHSIG_CTL:
|
|
|
|
case PROC_PDEATHSIG_STATUS:
|
2021-07-01 10:11:11 +02:00
|
|
|
case PROC_NO_NEW_PRIVS_CTL:
|
|
|
|
case PROC_NO_NEW_PRIVS_STATUS:
|
2014-12-15 13:01:42 +01:00
|
|
|
if (idtype != P_PID)
|
|
|
|
return (EINVAL);
|
|
|
|
}
|
|
|
|
|
2018-04-18 23:31:13 +02:00
|
|
|
switch (com) {
|
2018-04-20 17:19:27 +02:00
|
|
|
case PROC_PDEATHSIG_CTL:
|
2018-04-18 23:31:13 +02:00
|
|
|
signum = *(int *)data;
|
|
|
|
p = td->td_proc;
|
|
|
|
if ((id != 0 && id != p->p_pid) ||
|
|
|
|
(signum != 0 && !_SIG_VALID(signum)))
|
|
|
|
return (EINVAL);
|
|
|
|
PROC_LOCK(p);
|
|
|
|
p->p_pdeathsig = signum;
|
|
|
|
PROC_UNLOCK(p);
|
|
|
|
return (0);
|
2018-04-20 17:19:27 +02:00
|
|
|
case PROC_PDEATHSIG_STATUS:
|
2018-04-18 23:31:13 +02:00
|
|
|
p = td->td_proc;
|
|
|
|
if (id != 0 && id != p->p_pid)
|
|
|
|
return (EINVAL);
|
|
|
|
PROC_LOCK(p);
|
|
|
|
*(int *)data = p->p_pdeathsig;
|
|
|
|
PROC_UNLOCK(p);
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
|
2014-12-15 13:01:42 +01:00
|
|
|
switch (com) {
|
|
|
|
case PROC_SPROTECT:
|
|
|
|
case PROC_REAP_STATUS:
|
|
|
|
case PROC_REAP_GETPIDS:
|
|
|
|
case PROC_REAP_KILL:
|
2015-01-18 16:13:11 +01:00
|
|
|
case PROC_TRACE_CTL:
|
2016-09-21 10:23:33 +02:00
|
|
|
case PROC_TRAPCAP_CTL:
|
2021-07-01 10:11:11 +02:00
|
|
|
case PROC_NO_NEW_PRIVS_CTL:
|
2014-12-15 13:01:42 +01:00
|
|
|
sx_slock(&proctree_lock);
|
2015-01-18 16:13:11 +01:00
|
|
|
tree_locked = true;
|
2014-12-15 13:01:42 +01:00
|
|
|
break;
|
|
|
|
case PROC_REAP_ACQUIRE:
|
|
|
|
case PROC_REAP_RELEASE:
|
|
|
|
sx_xlock(&proctree_lock);
|
2015-01-18 16:13:11 +01:00
|
|
|
tree_locked = true;
|
|
|
|
break;
|
2019-02-10 18:19:45 +01:00
|
|
|
case PROC_ASLR_CTL:
|
|
|
|
case PROC_ASLR_STATUS:
|
2019-07-02 21:07:17 +02:00
|
|
|
case PROC_PROTMAX_CTL:
|
|
|
|
case PROC_PROTMAX_STATUS:
|
2019-09-03 20:56:25 +02:00
|
|
|
case PROC_STACKGAP_CTL:
|
|
|
|
case PROC_STACKGAP_STATUS:
|
2015-01-18 16:13:11 +01:00
|
|
|
case PROC_TRACE_STATUS:
|
2016-09-21 10:23:33 +02:00
|
|
|
case PROC_TRAPCAP_STATUS:
|
2021-07-01 10:11:11 +02:00
|
|
|
case PROC_NO_NEW_PRIVS_STATUS:
|
2015-01-18 16:13:11 +01:00
|
|
|
tree_locked = false;
|
2014-12-15 13:01:42 +01:00
|
|
|
break;
|
|
|
|
default:
|
|
|
|
return (EINVAL);
|
|
|
|
}
|
|
|
|
|
|
|
|
switch (idtype) {
|
|
|
|
case P_PID:
|
|
|
|
p = pfind(id);
|
|
|
|
if (p == NULL) {
|
|
|
|
error = ESRCH;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
error = p_cansee(td, p);
|
|
|
|
if (error == 0)
|
|
|
|
error = kern_procctl_single(td, p, com, data);
|
|
|
|
PROC_UNLOCK(p);
|
|
|
|
break;
|
|
|
|
case P_PGID:
|
|
|
|
/*
|
|
|
|
* Attempt to apply the operation to all members of the
|
|
|
|
* group. Ignore processes in the group that can't be
|
|
|
|
* seen. Ignore errors so long as at least one process is
|
|
|
|
* able to complete the request successfully.
|
|
|
|
*/
|
|
|
|
pg = pgfind(id);
|
|
|
|
if (pg == NULL) {
|
|
|
|
error = ESRCH;
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
PGRP_UNLOCK(pg);
|
|
|
|
ok = 0;
|
|
|
|
first_error = 0;
|
|
|
|
LIST_FOREACH(p, &pg->pg_members, p_pglist) {
|
|
|
|
PROC_LOCK(p);
|
|
|
|
if (p->p_state == PRS_NEW || p_cansee(td, p) != 0) {
|
|
|
|
PROC_UNLOCK(p);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
error = kern_procctl_single(td, p, com, data);
|
|
|
|
PROC_UNLOCK(p);
|
|
|
|
if (error == 0)
|
|
|
|
ok = 1;
|
|
|
|
else if (first_error == 0)
|
|
|
|
first_error = error;
|
|
|
|
}
|
|
|
|
if (ok)
|
|
|
|
error = 0;
|
|
|
|
else if (first_error != 0)
|
|
|
|
error = first_error;
|
|
|
|
else
|
|
|
|
/*
|
|
|
|
* Was not able to see any processes in the
|
|
|
|
* process group.
|
|
|
|
*/
|
|
|
|
error = ESRCH;
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
error = EINVAL;
|
|
|
|
break;
|
|
|
|
}
|
2015-01-18 16:13:11 +01:00
|
|
|
if (tree_locked)
|
|
|
|
sx_unlock(&proctree_lock);
|
2014-12-15 13:01:42 +01:00
|
|
|
return (error);
|
|
|
|
}
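
The P_PGID branch above folds many per-process results into one return value: success if any member succeeded, otherwise the first error seen, otherwise ESRCH when no member was visible. A small standalone sketch of just that folding rule may help when reading the loop. The function name `model_pgrp_result` and the `MODEL_SKIPPED` marker are hypothetical and exist only in this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical marker for a group member that was skipped (not visible). */
#define MODEL_SKIPPED	(-1)

/*
 * Fold per-process results into a single error code.
 * Each entry is 0, an errno value, or MODEL_SKIPPED.
 */
int
model_pgrp_result(const int *results, size_t n)
{
	int first_error, ok;
	size_t i;

	assert(results != NULL || n == 0);
	ok = 0;
	first_error = 0;
	for (i = 0; i < n; i++) {
		if (results[i] == MODEL_SKIPPED)
			continue;
		if (results[i] == 0)
			ok = 1;
		else if (first_error == 0)
			first_error = results[i];
	}
	if (ok)
		return (0);
	if (first_error != 0)
		return (first_error);
	/* Nothing in the group could be seen. */
	return (ESRCH);
}
```

Under this rule a caller cannot tell from the return value alone which members failed, only that at least one succeeded.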
|