| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements syz_kvm_setup_cpu for the riscv64 architecture.
The pseudo-syscall accepts VM fd, vCPU fd, host memory, and guest code
as parameters. Additional parameters (ntext, flags, opts, nopt) are
included for interface consistency with other architectures but are
currently unused on riscv64.
Implementation:
- Set up guest memory via KVM_SET_USER_MEMORY_REGION
- Copy guest code to guest memory
- Initialize guest registers to enable code execution in S-mode
- Return 0 on success, -1 on failure
Testing:
A test file syz_kvm_setup_cpu_riscv64 is included in sys/linux/test/
to verify basic functionality.
Known limitations:
- ifuzz is not yet compatible with riscv64. Temporary workaround: set
text[riscv64] to TextTarget and return nil in createTargetIfuzzConfig
for riscv64 to ensure generateText and mutateText work correctly.
This patch also adds support for KVM_GET_ONE_REG ioctl.
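The first two implementation steps above (register guest memory, then copy guest code) can be sketched as follows. This is only an illustration, not the code from the patch: make_region and install_guest_code are hypothetical helper names, the slot number and guest physical address are chosen by the caller, and register setup for S-mode is omitted.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

// Hypothetical helper: describe one guest memory slot backed by host memory.
struct kvm_userspace_memory_region make_region(uint32_t slot, uint64_t guest_phys,
					       void* host_mem, uint64_t size)
{
	struct kvm_userspace_memory_region r;
	memset(&r, 0, sizeof(r));
	r.slot = slot;
	r.guest_phys_addr = guest_phys;
	r.memory_size = size;
	r.userspace_addr = (uint64_t)(uintptr_t)host_mem;
	return r;
}

// Hypothetical helper: install the region, then copy the guest code into it.
// Returns 0 on success and -1 on failure, mirroring the convention above.
int install_guest_code(int vmfd, void* host_mem, uint64_t mem_size,
		       uint64_t guest_phys, const void* text, size_t text_size)
{
	if (text_size > mem_size)
		return -1;
	struct kvm_userspace_memory_region r = make_region(0, guest_phys, host_mem, mem_size);
	if (ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &r) < 0)
		return -1;
	memcpy(host_mem, text, text_size);
	return 0;
}
```

The copy happens only after the ioctl succeeds, so a failed setup leaves the host buffer untouched.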
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I observed that on machines with many CPUs (480 on my setup), fuzzing
with a handful of procs (8 on my setup) would consistently fail to start
because syz-executors would fail to respond within the default handshake
timeout of 1 minute. Reducing procs to 4 would fix it but sounds
ridiculous on such a powerful machine.
As part of the default sandbox policy, a syz-executor creates a large
number of virtual network interfaces (16 on my kernel config, probably
more on other kernels). This step vastly dominates the executor startup
time and was clearly responsible for the timeout I observed that
prevented me from fuzzing.
When fuzzing or reproducing with procs > 1, all executors run their
sandbox setup in parallel. Creating network interfaces is done by socket
operations to the RTNL (routing netlink) subsystem. Unfortunately, all
RTNL operations in the kernel are serialized by a "rtnl_mutex" mega lock
so instead of parallelizing the creation of the 8*16 interfaces, they
effectively get serialized and the time it takes to set up the default
sandbox for one executor scales linearly with the number of executors
started "in parallel". This is currently inherent to the rtnl_mutex in
the kernel and as far as I can tell there's nothing we can do about it.
However, it makes it very important that each critical section guarded
by "rtnl_mutex" stays short and snappy, to avoid long waits on the lock.
Unfortunately, the default behavior of a virtual network interface
creation is to create one RX and one TX queue per CPU. Each queue is
associated with a sysfs file whose creation is quite slow and goes
through various sanitized paths that take a long time. This means that
each critical section scales linearly with the number of CPUs on the host.
For example, in my setup, starting fuzzing needs 2 minutes 25 seconds. I found
that I could bring this down to 10 seconds (15x faster startup time!) by
limiting the number of RX and TX queues created per virtual interface to
2 using the IFLA_NUM_*X_QUEUES RTNL attributes. I opportunistically
chose 2 to try and keep coverage of the code that exercises multiple
queues but I don't have evidence that choosing 1 here would actually
reduce the code coverage.
As far as I can tell, reducing the number of queues would be problematic
in a high performance networking scenario but doesn't matter for fuzzing
in a namespace with only one process so this seems like a fair trade-off
to me. Ultimately, this lets me start a lot more parallel executors and
take better advantage of my beefy machine.
Technical detail for review: creating a veth device actually creates two
interfaces, one for each side of the virtual ethernet link, so both sides
need to be configured with a low number of queues.
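As a rough illustration of the queue limit described above, the sketch below builds an RTM_NEWLINK request that carries IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES. It only fills a buffer and does not talk to the kernel. put_u32_attr and build_link_request are hypothetical names, not the executor's helpers, and the interface name, link kind and the nested veth peer attributes (which would need the same two attributes) are left out.

```c
#include <linux/if_link.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: append one u32 attribute to the message held in a
// buffer of cap bytes. Returns 0 on success, -1 if it does not fit.
int put_u32_attr(struct nlmsghdr* nlh, size_t cap, unsigned short type, uint32_t value)
{
	size_t off = NLMSG_ALIGN(nlh->nlmsg_len);
	size_t len = RTA_LENGTH(sizeof(value));
	if (off + RTA_ALIGN(len) > cap)
		return -1;
	struct rtattr* rta = (struct rtattr*)((char*)nlh + off);
	rta->rta_type = type;
	rta->rta_len = (unsigned short)len;
	memcpy(RTA_DATA(rta), &value, sizeof(value));
	nlh->nlmsg_len = (uint32_t)(off + RTA_ALIGN(len));
	return 0;
}

// Hypothetical helper: build a link-creation request that limits the number
// of TX and RX queues. buf must be suitably aligned for struct nlmsghdr.
// Returns the message length, or -1 on error.
int build_link_request(void* buf, size_t cap, uint32_t queues)
{
	if (cap < NLMSG_LENGTH(sizeof(struct ifinfomsg)))
		return -1;
	memset(buf, 0, cap);
	struct nlmsghdr* nlh = (struct nlmsghdr*)buf;
	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
	nlh->nlmsg_type = RTM_NEWLINK;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL | NLM_F_ACK;
	if (put_u32_attr(nlh, cap, IFLA_NUM_TX_QUEUES, queues) ||
	    put_u32_attr(nlh, cap, IFLA_NUM_RX_QUEUES, queues))
		return -1;
	return (int)nlh->nlmsg_len;
}
```

With two queues per direction, the per-interface work done under rtnl_mutex no longer depends on the number of host CPUs.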
|
| |
|
|
|
|
|
|
|
|
| |
I can't think of a valid reason to create nodes under /dev/ if they
don't already exist.
On systems where /dev/ isn't backed by a virtual/temp file system,
O_CREAT lets syzkaller create persistent files on disk and may
unnecessarily clutter or fill the disk with files that have nothing to
do with the intended syscall descriptions.
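One way to express this restriction is to drop O_CREAT from the open flags whenever the target path is under /dev/. The sketch below is an assumption about the shape of such a check, not the code from the commit; sanitize_dev_open_flags is a hypothetical name.

```c
#include <fcntl.h>
#include <string.h>

// Hypothetical helper: never allow node creation under /dev/.
// Flags for paths outside /dev/ are returned unchanged.
int sanitize_dev_open_flags(const char* path, int flags)
{
	if (strncmp(path, "/dev/", 5) == 0)
		flags &= ~O_CREAT;
	return flags;
}
```

Opening an existing device node behaves as before; only the creation of missing nodes is suppressed.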
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the generated KFuzzTest programs were reusing the address of
the top-level input struct. A problem could arise when the encoded blob
is large and overflows into another allocated region - this certainly
happens in the case where the input struct points to some large char
buffer, for example.
While this wasn't directly a problem, it could lead to racy behavior
when running KFuzzTest targets concurrently.
To fix this, we now introduce an additional buffer parameter into
syz_kfuzztest_run that is as big as the maximum accepted input size in
the KFuzzTest kernel code. When this buffer is allocated, we ensure that
we have some allocated space in the program that can hold the entire
encoded input.
This works in practice, but has not been tested with concurrent
KFuzzTest executions yet.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Add syz_kfuzztest_run pseudo-syscall, KFuzzTest attribute, and encoding
logic.
KFuzzTest targets, which are invoked in the executor with the new
syz_kfuzztest_run pseudo-syscall, require specialized encoding. To
differentiate KFuzzTest calls from standard syzkaller calls, we
introduce a new attribute called KFuzzTest or "kfuzz_test" in syzkaller
descriptions that can be used to annotate calls.
Signed-off-by: Ethan Graham <ethangraham@google.com>
|
| |
|
|
| |
Implement a pseudo-syscall to check the value of kvm_run.exit_reason
|
| |
|
|
|
|
|
| |
Append errors=withdraw to the mount options so that gfs2 withdrawals
don't lead to kernel panics.
Closes #6189.
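Appending an option to a comma-separated mount option string can be sketched as below. append_mount_option is a hypothetical helper, not the function used in the commit; it only shows the string handling.

```c
#include <string.h>

// Hypothetical helper: append extra to the comma-separated list in opts.
// cap is the size of the opts buffer. Returns 0 on success, -1 if the
// result would not fit.
int append_mount_option(char* opts, size_t cap, const char* extra)
{
	size_t len = strlen(opts);
	size_t need = len + (len ? 1 : 0) + strlen(extra) + 1;
	if (need > cap)
		return -1;
	if (len)
		opts[len++] = ',';
	strcpy(opts + len, extra);
	return 0;
}
```

For gfs2 the appended string would be "errors=withdraw", as named in the message above.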
|
| |
|
|
|
|
| |
The logic in that branch of the code relies on replacing # characters
with numbers. There's a comment in the code which shows a clarifying
example but it misses the # which I found mildly confusing.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We noticed that syzkaller left some files with fairly unusual file names
under /dev. Eg:
---------- 1 root root 0 May 30 14:42 vcs-
---------- 1 root root 0 May 30 14:48 vcs.
---------- 1 root root 136317631 May 30 14:42 vcs'
---------- 1 root root 0 May 30 14:48 vcs(
---------- 1 root root 0 May 30 14:43 vcs)
---------- 1 root root 0 May 30 14:43 vcs*
---------- 1 root root 136317633 May 30 14:46 vcs+
Funnily enough the characters after "vcs" are always within the '0'-10
to '0' ASCII range. We noticed that the syz_open_dev logic uses a modulo
10 on a signed number (the volatile long a1 argument) and in C the
modulo of a negative number stays negative, so the result of this
operation is in the '0'-10 to '0'+10 range. This is in turn cast to a
char which is also signed and doesn't fix the glitch.
By casting a1 to an unsigned long first, the result of the modulo
operation stays non-negative and therefore the virtual file name suffix
is always a digit.
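The difference between the two variants can be reproduced in isolation. The sketch below is not the syz_open_dev code; suffix_signed and suffix_unsigned are hypothetical names that only model the suffix computation.

```c
// Old behavior: in C, the remainder of a negative value is negative (or
// zero), so the resulting character can fall below '0'.
char suffix_signed(long a1)
{
	return (char)('0' + (char)(a1 % 10));
}

// Fixed behavior: the cast to unsigned long happens before the modulo, so
// the remainder is always in 0..9 and the suffix is always a digit.
char suffix_unsigned(long a1)
{
	return (char)('0' + (char)((unsigned long)a1 % 10));
}
```

For a1 = -3 the signed variant yields '0' - 3, which is the '-' character seen in the "vcs-" file name above.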
|
| |
|
|
| |
The tests began to fail after pushing the new env container.
|
| |
|
|
|
|
| |
This commit adds the actual SyzOS fuzzer for x86-64 and a small test. It
also updates some necessary parts of the ARM version and adds some glue
for i386.
|
| |
|
|
|
|
|
|
| |
Syzkaller allows users to specify file path arguments in syscalls via globs.
However, on Linux, you are effectively limited to some /sys and /dev paths due to sandboxing.
With this change, users can supply their custom fuzzing artifacts under /syz-inputs and use them in globs.
They are mounted read-only to increase reproducibility.
|
| |
|
|
| |
Add a pseudo-syscall to assert on register values.
|
| |
|
|
| |
We can reach it at least with automatic descriptions.
|
| |
|
|
|
| |
The new pseudo-syscall will serve as a test assertion, checking the uexit
return value. This is going to help us validate SyzOS code.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is done to solve a particular test failure running:
$ tools/syz-env go test ./prog -run TestSpecialStructs
, which failed on PPC64, because prog/rand.go instantiated a call to
syz_kvm_setup_syzos_vm(), which requested too much memory (1024 pages)
from the allocator (PPC64 uses 64k pages, so the number of available pages
is lower).
On the other hand, factoring out syzos-related descriptions is probably
a nice thing to do anyway.
|
| |
|
|
|
|
|
| |
Pass 1024 pages of memory to both syz_kvm_setup_syzos_vm() and
syz_kvm_setup_cpu$arm64() to make sure that:
- there is enough memory for guest allocations (e.g. ITS pages)
- host can tamper with that memory, provoking more bugs
|
| |
|
|
| |
This helps to avoid leaking processes when killing races with PR_SET_PDEATHSIG.
|
| |
|
|
|
|
|
|
| |
It's unclear why we need a new session.
Sessions group process groups, but we don't use that.
Setsid also creates a new process group,
but we don't kill this process group,
so also unclear why this is needed.
|
| |
|
|
|
|
|
|
| |
All these broke when we started mounting new tmpfs for sandbox=root.
Some are not mounted at all, some are mounted in the outer root
and are not accessible from the new root.
Mount them inside of the new root tmpfs.
Other file systems (binderfs, cgroups) seem to be ok.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
syz_kvm_add_vcpu
The old syz_kvm_setup_cpu() API mixed together VM and VCPU setup, making it
harder to create and fuzz two VCPUs in the same VM.
Introduce two new pseudo-syscalls, syz_kvm_setup_syzos_vm() and syz_kvm_add_vcpu(),
that will simplify this task.
syz_kvm_setup_syzos_vm() takes a VM file descriptor, performs VM setup
(allocates guest memory and installs SYZOS code into it) and returns a
new kvm_syz_vm resource, which is in fact a pointer to `struct kvm_syz_vm`
encapsulating VM-specific data in the C code.
syz_kvm_add_vcpu() takes the VM ID denoted by kvm_syz_vm and creates a
new VCPU within that VM with a proper CPU number. It then stores the
fuzzer-supplied SYZOS API sequence into the corresponding part (indexed by
CPU number) of the VM memory slot, and sets up the CPU registers to interpret
that sequence.
The new pseudo-syscalls let the fuzzer create independent CPUs that run different
code sequences without interfering with each other.
|
| |
|
|
|
|
|
|
| |
syz_create_resource allows turning any value into a resource.
Use it to improve binfmt descriptions:
we need to pass the same file name to write syscalls and execve.
|
| | |
|
| |
|
|
|
| |
The new pseudo-syscall sets up VGICv3 IRQ controller on the host.
That still requires guest setup code, which will be submitted separately.
|
| |
|
|
|
|
|
| |
Protect KCOV regions with pkeys if they are available.
Protect output region with pkeys in snapshot mode.
Snapshot mode is especially sensitive to output buffer corruption
since its location is not randomized.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Running the vusb_ath9k runtest (with [1] and [2] applied) produces ~100k of
extra coverage, which is somewhat close to the current 256k limit. A more
complicated program might produce more extra coverage and overflow the
coverage buffer.
Increase kExtraCoverSize to 1024k.
As the extra coverage buffer is maintained per-executor and not per-thread,
the total increase of the coverage mapping is ~9%, which is not too bad.
[1] https://lore.kernel.org/all/eaf54b8634970b73552dcd38bf9be6ef55238c10.1718092070.git.dvyukov@google.com/
[2] https://lore.kernel.org/all/20240722223726.194658-1-andrey.konovalov@linux.dev/T/#u
|
| |
|
|
|
|
|
| |
cad_pid must not point to a persistent runner process,
b/c it will be killed on ctrl+alt+del.
Fixes #5027
|
| |
|
|
|
|
|
|
|
|
| |
mount() in gVisor returns EFAULT if source is NULL. It is a gVisor issue
and we will fix it. Let's explicitly set a string source for the proc
mount to unblock gVisor jobs. The source string will additionally be
useful for troubleshooting mount-related problems in the future, because
it is shown in /proc/pid/mountinfo.
Signed-off-by: Andrei Vagin <avagin@google.com>
|
| |
|
|
|
|
|
| |
Android sets fs.mount-max to 100, making it impossible to create new chroots.
Relax the limit, setting it to a value used on desktops.
Tracking bug: https://github.com/google/syzkaller/issues/4972
|
| |
|
|
|
|
|
|
|
|
|
| |
To prevent the executor from accidentally making the whole root file system
immutable (which breaks fuzzing), modify sandbox=none to create a tmpfs mount
and chroot into it before executing programs in a process.
According to `syz-manager -mode=smoke-test`, the number of enabled syscalls on
x86 doesn't change with this patch.
Fixes #4939, #2933, #971.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
My gcc-10 in a testing VM complains during the reproducer [0] build with
the following error:
rep.c: In function ‘remove_dir’:
rep.c:662:3: error: a label can only be part of a statement and a declaration is not a statement
662 | const int umount_flags = MNT_FORCE | UMOUNT_NOFOLLOW;
| ^~~~~
A label followed by a declaration is a C23 extension, so only new compilers
support it.
Fix it by moving the declaration above the `retry` label and adding the unused
attribute to suppress a possible warning.
[0] https://syzkaller.appspot.com/bug?extid=dcc068159182a4c31ca3
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
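The pattern behind this fix can be shown on a small stand-alone function. The sketch below is not the remove_dir code; retry_until and step are hypothetical names. The declaration sits above the label, so the label is followed by a statement and pre-C23 compilers accept it.

```c
// Counts attempts until limit is reached, retrying via goto.
int retry_until(int limit)
{
	// Declared before the label: "retry:" must be followed by a statement.
	// The unused attribute (a GCC/Clang extension) silences a warning in
	// builds where the variable ends up unreferenced.
	const int step __attribute__((unused)) = 1;
	int attempts = 0;
retry:
	attempts += step;
	if (attempts < limit)
		goto retry;
	return attempts;
}
```

Placing `const int step = 1;` directly after `retry:` instead would trigger the error quoted above on older compilers.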
|
| |
|
|
|
| |
There were some cases where the return value was not checked, allowing
errors to propagate. This fixes them to return early with a message.
|
| |
|
|
| |
strconst["foo"] was replaced by ptr[in, string["foo"]].
|
| |
|
|
| |
Close_range is faster.
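A minimal sketch of the idea, assuming a Linux system: try the close_range system call first and fall back to closing descriptors one by one when it is unavailable. close_fd_range is a hypothetical name and this is not the executor's code.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper: close all descriptors in [first, last].
// One close_range call replaces a loop of close calls when the kernel
// supports it (Linux 5.9 and later).
int close_fd_range(unsigned int first, unsigned int last)
{
#ifdef __NR_close_range
	if (syscall(__NR_close_range, first, last, 0u) == 0)
		return 0;
#endif
	// Fallback: clamp the range to the descriptor limit and close each fd.
	long max = sysconf(_SC_OPEN_MAX);
	if (max <= 0)
		max = 1024;
	if (last >= (unsigned int)max)
		last = (unsigned int)max - 1;
	for (unsigned int fd = first; fd <= last; fd++)
		close((int)fd);
	return 0;
}
```

The fallback ignores errors from close, since most descriptors in the range are expected not to be open.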
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Return failure reason from setup functions rather than crash.
This will provide better error messages, but also allow setup
w/o creating subprocesses which will be needed when we combine
fuzzer and executor.
Also close all resources created during setup.
This is also useful for in-process setup, but also should improve
chances of reproducing a bug with C reproducer. Currently leaked
file descriptors may disturb repro execution (e.g. it may act
on a wrong fd).
|
| |
|
|
|
|
|
| |
gVisor doesn't implement binfmt file system.
Fixes: 229488b413d4 ("executor: consistently fail on feature setup")
Signed-off-by: Andrei Vagin <avagin@google.com>
|
| |
|
|
|
|
| |
Currently we fail in some cases, but ignore errors in other cases.
Consistently fail when feature setup fails.
This will be required for relying on setup failure to detect feature presence.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Fuzzer managed to do:
executing program 0:
...
close_range(r5, 0xffffffffffffffff, 0x0)
...
SYZFATAL: executor 0 failed 11 times: executor 0: exit status 67
SYZFAIL: tun read failed
(errno 9: Bad file descriptor)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fuzzer managed to do:
executing program 4:
...
prlimit64(0x0, 0x7, &(0x7f0000000000), 0x0)
...
syz_usbip_server_init(0x3)
...
SYZFATAL: executor 4 failed 11 times: executor 4: exit status 67
SYZFAIL: syz_usbip_server_init: socketpair failed
(errno 24: Too many open files)
|
| |
|
|
|
|
|
|
|
|
| |
Starting from v6.9, we can no longer reuse a loop device while some
filesystem is mounted on it. It conflicts with the MNT_DETACH approach
we were previously using.
Let's umount synchronously instead, but also with a MNT_FORCE flag to
abort potentially long graceful cleanup operations. We don't need them
for the filesystems mounted only for fuzzing purposes.
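The synchronous approach can be sketched as a loop around umount2 with MNT_FORCE. This is an illustration under assumptions, not the executor's code: force_umount_all is a hypothetical name and the cap of 100 iterations is arbitrary.

```c
#include <sys/mount.h>

// Hypothetical helper: synchronously unmount everything stacked on dir.
// MNT_FORCE aborts long graceful cleanup; UMOUNT_NOFOLLOW avoids following
// a symlink at the final path component. Returns the number of mounts removed.
int force_umount_all(const char* dir)
{
	const int umount_flags = MNT_FORCE | UMOUNT_NOFOLLOW;
	int removed = 0;
	// Several mounts may be stacked on the same directory, so repeat
	// until umount2 fails (or the arbitrary safety cap is hit).
	while (removed < 100 && umount2(dir, umount_flags) == 0)
		removed++;
	return removed;
}
```

Unlike MNT_DETACH, each successful call returns only after the filesystem is actually unmounted, so the loop device is free for reuse afterwards.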
|
| |
|
|
|
|
| |
Don't treat ENOENT from socket call as fatal.
Fuzzer manages to make all socket calls for a particular
protocol fail using NLBL_MGMT_C_REMOVE netlink function.
|
| |
|
|
|
|
|
|
|
|
| |
IORING_SETUP_CQE32 and IORING_SETUP_SQE128 may lead to incorrect
assumptions about the ring buffer size, causing the kernel to write
outside of the mapped memory, smashing whatever follows it.
This is a hotfix for https://github.com/google/syzkaller/issues/4531
that will stop the ci-upstream-gce-arm64 from generating random
coverage.
|
| |
|
|
|
|
|
| |
The fd may be closed by an async close() call, it's not a reason to
report a failure.
Reported-by: Andrei Vagin <avagin@google.com>
|
| |
|
|
|
|
| |
When BLK_DEV_WRITE_MOUNTED is enabled, the kernel treats the loopfd
reference as a writer and does not let us issue mount() calls over the
same block device.
|
| |
|
|
|
| |
This should never be happening during fuzzing. Otherwise we let
syz-executor silently crash and restart an insane number of times.
|
| |
|
|
|
|
| |
During fuzzing, it's expected that certain operations might return
errors. Don't abort the whole syz-executor process in this case, this is
too expensive.
|
| |
|
|
|
|
|
|
|
|
|
| |
This kernel interface provides access to fds of other processes, which
is readily abused by the fuzzer to mangle parent syz-executor fds.
Pid=1 is the parent syz-executor process when PID namespace is created.
Sanitize it in the new syz_pidfd_open() pseudo-syscall.
We could not patch the argument in sys/linux/init.go because the first
argument is a resource.
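The sanitization can be sketched as a thin wrapper that rewrites pid 1 before issuing the real system call. The names below are hypothetical and the replacement value 0 is an assumption; pidfd_open rejects it, so the call simply fails instead of handing out a pidfd for the parent.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper: never let the fuzzer target pid 1, which is the
// parent syz-executor inside a new PID namespace.
long sanitize_pidfd_target(long pid)
{
	return pid == 1 ? 0 : pid;
}

// Hypothetical wrapper around the raw pidfd_open system call.
long sanitized_pidfd_open(long pid, long flags)
{
#ifdef __NR_pidfd_open
	return syscall(__NR_pidfd_open, sanitize_pidfd_target(pid), flags);
#else
	(void)flags;
	(void)pid;
	errno = ENOSYS;
	return -1;
#endif
}
```

Doing the check in C covers the case mentioned above where the first argument is a resource and cannot be patched on the Go side.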
|
| |
|
|
|
| |
Add new pseudo-syscall for creating a socket in init netns and connecting to
NVMe-oF/TCP server on 127.0.0.1:4420. Also add descriptions for NVMe-oF/TCP.
|