Enhance the debugging capabilities of C reproducers by passing the VCPU
file descriptor to the syz_kvm_assert_syzos_uexit function. With access to
the VCPU fd, the function can now dump the VCPU's register state upon
assertion failure, providing critical context for debugging guest execution
issues.
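A minimal sketch of the kind of dump this enables, using the standard
KVM_GET_REGS ioctl; the helper name and output format are illustrative,
not the actual reproducer code:

```c
#include <linux/kvm.h>
#include <stdio.h>
#include <sys/ioctl.h>

/* Illustrative helper: dump a few general-purpose registers on
 * assertion failure. The real function may print more state. */
static void dump_vcpu_regs(int vcpu_fd)
{
    struct kvm_regs regs;
    if (ioctl(vcpu_fd, KVM_GET_REGS, &regs) < 0)
        return;
    fprintf(stderr, "vcpu: rip=0x%llx rsp=0x%llx rax=0x%llx rflags=0x%llx\n",
            regs.rip, regs.rsp, regs.rax, regs.rflags);
}
```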
|
Implement SYZOS_API_NESTED_AMD_INJECT_EVENT to allow the L1 guest to
inject events (interrupts, NMIs, exceptions) into L2 via the VMCB EVENTINJ
field.
This primitive abstracts the VMCB bit-packing logic (vector, type, valid,
error code) into a high-level API, enabling the fuzzer to semantically
mutate event injection parameters.
This targets KVM's nested event merging logic, specifically where L0 must
reconcile L1-injected events with host-pending events.
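For reference, the EVENTINJ encoding being abstracted (per the AMD APM,
vol. 2: vector in bits 7:0, type in bits 10:8, error-code-valid in bit 11,
valid in bit 31, error code in bits 63:32); a sketch, not the actual
syzkaller implementation:

```c
#include <stdint.h>

#define EVENTINJ_TYPE_INTR      0u  /* external interrupt */
#define EVENTINJ_TYPE_NMI       2u
#define EVENTINJ_TYPE_EXCEPTION 3u
#define EVENTINJ_EV    (1u << 11)   /* error code valid */
#define EVENTINJ_VALID (1u << 31)

/* Pack an event into the 64-bit EVENTINJ format. */
static uint64_t pack_eventinj(uint8_t vector, uint32_t type,
                              int has_errcode, uint32_t errcode)
{
    uint64_t ev = vector | ((type & 7) << 8) | EVENTINJ_VALID;
    if (has_errcode)
        ev |= EVENTINJ_EV | ((uint64_t)errcode << 32);
    return ev;
}
```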
|
Implement the SYZOS_API_NESTED_AMD_STGI and SYZOS_API_NESTED_AMD_CLGI
primitives to toggle the Global Interrupt Flag (GIF). These commands
execute the stgi and clgi instructions, respectively, and require no
arguments.
Also add a test checking that CLGI correctly masks NMI injection from L0.
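On the guest side these commands reduce to single instructions; a sketch,
assuming L1 runs at CPL0 with SVM enabled in EFER:

```c
/* Set / clear the Global Interrupt Flag from L1. */
static inline void guest_stgi(void)
{
    asm volatile("stgi" ::: "memory");
}

static inline void guest_clgi(void)
{
    asm volatile("clgi" ::: "memory");
}
```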
|
Implement the SYZOS_API_NESTED_AMD_INVLPGA primitive to execute the
INVLPGA instruction in the L1 guest.
This allows the fuzzer to target KVM's Shadow MMU and Nested Paging (NPT)
logic by invalidating TLB entries for specific ASIDs.
Also add a simple syzlang seed/regression test.
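INVLPGA takes the linear address in rAX and the ASID in ECX; a sketch of
the instruction the primitive executes in L1 (helper name hypothetical):

```c
#include <stdint.h>

static inline void guest_invlpga(uint64_t linear_addr, uint32_t asid)
{
    asm volatile("invlpga %0, %1" :: "a"(linear_addr), "c"(asid) : "memory");
}
```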
|
flatbuffers changed some function signatures. Update executor code to match.
|
Update flatbuffers to v23.5.26, which matches the compiler version in the new env container.
|
Florent Revest reported ThinLTO builds failing with the following error:
<inline asm>:2:1: error: symbol 'after_vmentry_label' is already defined
after_vmentry_label:
^
error: cannot compile inline asm
This turned out to be caused by the compiler not respecting `noinline`.
Adding __attribute__((optnone)) (or optimize("O0") on GCC) fixes the issue.
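A sketch of the shape of the fix (macro and function names hypothetical);
Clang's optnone and GCC's optimize("O0") are both real attributes:

```c
/* Keep the function containing the named asm label out of inlining and
 * optimization so ThinLTO cannot duplicate its body (and the label). */
#if defined(__clang__)
#define __opt_none __attribute__((optnone))
#else
#define __opt_none __attribute__((optimize("O0")))
#endif

__attribute__((noinline)) __opt_none static void run_vmentry(void)
{
    /* ... vmentry sequence ... */
    asm volatile("after_vmentry_label:");
}
```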
|
I observed that on machines with many CPUs (480 on my setup), fuzzing
with a handful of procs (8 on my setup) would consistently fail to start
because syz-executors would fail to respond within the default handshake
timeout of 1 minute. Reducing procs to 4 would fix it, but that sounds
ridiculous on such a powerful machine.
As part of the default sandbox policy, a syz-executor creates a large
number of virtual network interfaces (16 on my kernel config, probably
more on other kernels). This step vastly dominates the executor startup
time and was clearly responsible for the timeout I observed that
prevented me from fuzzing.
When fuzzing or reproducing with procs > 1, all executors run their
sandbox setup in parallel. Creating network interfaces is done by socket
operations to the RTNL (routing netlink) subsystem. Unfortunately, all
RTNL operations in the kernel are serialized by a "rtnl_mutex" mega lock
so instead of parallelizing the creation of the 8*16 interfaces, they
effectively get serialized, and the time it takes to set up the default
sandbox for one executor scales linearly with the number of executors
started "in parallel". This is currently inherent to the rtnl_mutex in
the kernel and as far as I can tell there's nothing we can do about it.
However, it makes it very important that each critical section guarded
by "rtnl_mutex" stays short and snappy, to avoid long waits on the lock.
Unfortunately, the default behavior of a virtual network interface
creation is to create one RX and one TX queue per CPU. Each queue is
associated with a sysfs file whose creation is quite slow and goes
through various sanitized paths that take a long time. This means that
each critical section scales linearly to the number of CPUs on the host.
For example, in my setup, starting fuzzing took 2 minutes 25 seconds. I found
that I could bring this down to 10 seconds (15x faster startup time!) by
limiting the number of RX and TX queues created per virtual interface to
2 using the IFLA_NUM_*X_QUEUES RTNL attributes. I opportunistically
chose 2 to try and keep coverage of the code that exercises multiple
queues, but I don't have evidence that choosing 1 here would actually
reduce the code coverage.
As far as I can tell, reducing the number of queues would be problematic
in a high-performance networking scenario, but it doesn't matter for fuzzing
in a namespace with only one process, so this seems like a fair trade-off
to me. Ultimately, this lets me start a lot more parallel executors and
take better advantage of my beefy machine.
Technical detail for review: a veth device actually creates two
interfaces, one for each side of the virtual ethernet link, so both
sides need to be configured with a low number of queues.
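A sketch of how the queue counts can be capped when building the
RTM_NEWLINK request; the helpers are simplified stand-ins for the
executor's netlink plumbing, not the actual code:

```c
#include <linux/if_link.h>     /* IFLA_NUM_TX_QUEUES, IFLA_NUM_RX_QUEUES */
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <stdint.h>
#include <string.h>

/* Append a u32 rtnetlink attribute to an RTM_NEWLINK message. */
static void put_u32_attr(struct nlmsghdr* nlh, uint16_t type, uint32_t value)
{
    struct rtattr* rta =
        (struct rtattr*)((char*)nlh + NLMSG_ALIGN(nlh->nlmsg_len));
    rta->rta_type = type;
    rta->rta_len = RTA_LENGTH(sizeof(value));
    memcpy(RTA_DATA(rta), &value, sizeof(value));
    nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + RTA_ALIGN(rta->rta_len);
}

/* Cap both queue counts at 2 for each new virtual interface
 * (and for both halves of a veth pair). */
static void cap_queue_counts(struct nlmsghdr* nlh)
{
    put_u32_attr(nlh, IFLA_NUM_TX_QUEUES, 2);
    put_u32_attr(nlh, IFLA_NUM_RX_QUEUES, 2);
}
```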
|
The new command allows mutation of the AMD VMCB block with plain 64-bit writes.
In addition to VM ID and VMCB offset, @nested_amd_vmcb_write_mask takes
three 64-bit numbers: the set mask, the unset mask, and the flip mask.
This allows making bitwise modifications to the VMCB without disturbing
the execution too much.
Also add sys/linux/test/amd64-syz_kvm_nested_amd_vmcb_write_mask to test the
new command behavior.
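The three masks compose into a single read-modify-write; one plausible
application order (a sketch; the implementation's actual precedence may
differ):

```c
#include <stdint.h>

/* Apply set/unset/flip masks to a 64-bit VMCB word. */
static uint64_t apply_write_mask(uint64_t val, uint64_t set_mask,
                                 uint64_t unset_mask, uint64_t flip_mask)
{
    return ((val | set_mask) & ~unset_mask) ^ flip_mask;
}
```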
|
The new command allows mutation of Intel VMCS fields with the help
of the vmwrite instruction.
In addition to VM ID and field ID, @nested_intel_vmwrite_mask takes
three 64-bit numbers: the set mask, the unset mask, and the flip mask.
This allows making bitwise modifications to the VMCS without disturbing
the execution too much.
Also add sys/linux/test/amd64-syz_kvm_nested_vmwrite_mask to test the
new command behavior.
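On the guest side this boils down to a vmread/modify/vmwrite sequence;
a sketch using the same mask order as the VMCB variant above, assuming
L1 is in VMX root operation with a current VMCS:

```c
#include <stdint.h>

static inline uint64_t vmread(uint64_t field)
{
    uint64_t value;
    asm volatile("vmread %1, %0" : "=rm"(value) : "r"(field) : "cc");
    return value;
}

static inline void vmwrite(uint64_t field, uint64_t value)
{
    asm volatile("vmwrite %1, %0" :: "r"(field), "rm"(value) : "cc");
}

static void vmwrite_mask(uint64_t field, uint64_t set_mask,
                         uint64_t unset_mask, uint64_t flip_mask)
{
    uint64_t v = vmread(field);
    vmwrite(field, ((v | set_mask) & ~unset_mask) ^ flip_mask);
}
```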
|
Enable basic RDTSCP handling. Ensure that Intel hosts exit on RDTSCP
in L2, and that both Intel and AMD can handle RDTSCP exits.
Add amd64-syz_kvm_nested_vmresume-rdtscp to test that.
|
While at it, fix a bug in rdmsr() that apparently lost the top 32 bits.
Also fix a bug in Intel's Secondary Processor-based Controls:
we were incorrectly using the top 32 bits of
X86_MSR_IA32_VMX_PROCBASED_CTLS2 to enable all the available controls
without additional setup. This only worked because rdmsr() zeroed out
those top bits.
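For context, the fixed rdmsr() shape: EDX:EAX must be read as two 32-bit
halves and widened before combining, otherwise the high half is silently
dropped. A sketch, assuming ring-0 guest code:

```c
#include <stdint.h>

static inline uint64_t guest_rdmsr(uint32_t msr)
{
    uint32_t lo, hi;
    asm volatile("rdmsr" : "=a"(lo), "=d"(hi) : "c"(msr));
    return ((uint64_t)hi << 32) | lo;
}
```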
|
Enable basic RDTSC handling. Ensure that Intel hosts exit on RDTSC
in L2, and that both Intel and AMD can handle RDTSC exits.
Add amd64-syz_kvm_nested_vmresume-rdtsc to test that.
|
Ensure L2 correctly exits to L1 on CPUID and resumes properly.
Add a test.
|
Provide the SYZOS API command to resume L2 execution after a VM exit,
using VMRESUME on Intel and VMRUN on AMD.
For testing purposes, implement basic handling of the INVD instruction:
- enable INVD interception on AMD (set all bits in VMCB 00Ch);
- map EXIT_REASON_INVD and VMEXIT_INVD into SYZOS_NESTED_EXIT_REASON_INVD;
- advance L2 RIP to skip to the next instruction.
While at it, perform minor refactorings of L2 exit reason handling.
sys/linux/test/amd64-syz_kvm_nested_vmresume tests the new command by
executing two instructions, INVD and HLT, in the nested VM.
|
It was useful initially for vendor-agnostic tests, but given that we
have guest_uexit_l2() right before it, we can save an extra L2-L1 exit.
Perhaps this should increase the probability of executing more complex
payloads (fewer KVM_RUN calls to reach the same point in L2 code).
|
Provide a SYZOS API command to launch the L2 VM using the
VMLAUNCH (Intel) or VMRUN (AMD) instruction.
For testing purposes, each L2->L1 exit is followed by a guest_uexit_l2()
returning the exit code to L0. Common exit reasons (like HLT) will be
mapped into a common exit code space (0xe2e20000 | reason), so that
a single test can be used for both Intel and AMD.
Vendor-specific exit codes will be returned using the 0xe2110000 mask
for Intel and 0xe2aa0000 for AMD.
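A sketch of the resulting mapping; the constants come from this message,
while the helper name and parameters are hypothetical:

```c
#include <stdint.h>

#define SYZOS_EXIT_COMMON 0xe2e20000u /* vendor-agnostic reasons, e.g. HLT */
#define SYZOS_EXIT_INTEL  0xe2110000u
#define SYZOS_EXIT_AMD    0xe2aa0000u

/* Map a raw L2 exit reason into the shared uexit code space. */
static uint32_t map_l2_exit_code(int is_intel, int is_common, uint32_t reason)
{
    if (is_common)
        return SYZOS_EXIT_COMMON | reason;
    return (is_intel ? SYZOS_EXIT_INTEL : SYZOS_EXIT_AMD) | reason;
}
```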
|
The new command loads an instruction blob into the specified L2 VM.
|
Now that we are using volatiles in guest_main(), there is no
particular need to base the numbers on primes (this didn't work well
with Clang anyway).
Instead, group the commands logically and leave some space between the
groups for future updates.
|
Provide basic setup for registers, page tables, and segments to create
Intel/AMD-based nested virtual machines.
Note that the machines do not get started yet.
|
Add vendor-specific code to turn on nested virtualization on Intel
and AMD. Also provide get_cpu_vendor() to pick the correct
implementation.
|
Set up the L1 guest's 64-bit Task State Segment (TSS), a prerequisite for VMX/SVM.
|
This patch lays the groundwork for nested virtualization by rearranging
the KVM guest's memory map.
Key changes include:
- Introducing a dedicated per-VCPU memory region for L2 VMs.
- Updating `executor/kvm.h` with:
  - Adjusted stack addresses for the L1 guest.
  - Detailed memory layout macros for L2 VM structures.
|
I can't think of a valid reason to create nodes under /dev/ if they
don't already exist.
On systems where /dev/ isn't backed by a virtual/temp file system,
O_CREAT lets syzkaller create persistent files on disk and may
unnecessarily clutter or fill the disk with files that have nothing to
do with the intended syscall descriptions.
|
This commit enables the periodic execution of a leak checker within the
executor. The leak checker will now run every 2 * num_procs executions,
but only after the corpus has been triaged and all executor processes
are in an idle state.
|
Not having these results in three copies of every KVM-related #define
in each reproducer.
|
Add #if checks to define executor_fn_guest_addr() for
__NR_syz_kvm_setup_cpu and __NR_syz_kvm_setup_syzos_vm.
This fixes a compilation error spotted by csource_test.go.
|
struct kvm_ppc_mmuv3_cfg seems to be defined in
/usr/powerpc64le-linux-gnu/include/asm/kvm.h; remove the duplicate
definition.
|
Fix a compilation error spotted by csource_test.go
|
Apply __addrspace_guest to every guest function and use a C++ template
to statically validate that host functions are not passed to
executor_fn_guest_addr().
This only works in Clang builds of syz-executor, because GCC does not
support address spaces, and C reproducers cannot use templates.
The static check allows us to drop the dynamic checks in DEFINE_GUEST_FN_TO_GPA_FN().
While at it, replace DEFINE_GUEST_FN_TO_GPA_FN() with explicit declarations of
host_fn_guest_addr() and guest_fn_guest_addr().
|
Use SYZOS_ADDR_EXECUTOR_CODE instead of both. Also put platform-specific
definitions under #if GOARCH_xxx.
|
Somehow Clang still manages to emit a jump table for it.
|
Make sure setup_cpuid() is only declared together with install_user_code().
|
The new API call allows initializing the handler with one of the
three possible values:
- NULL (should cause a page fault)
- dummy_null_handler (should call iret)
- uexit_irq_handler (should perform guest_uexit(UEXIT_IRQ))
Also add a test for uexit_irq_handler().
|
Use a pool of 32 pages to allocate PT and PE entries for the guest
page tables.
This eliminates the need for manually assigned page table entries
that are brittle and may break when someone changes the memory
layout.
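A sketch of the allocation scheme (names and pool placement hypothetical):

```c
#include <stdint.h>

#define PT_POOL_PAGES 32
#define PAGE_SIZE_4K 0x1000

/* Bump allocator over a fixed pool of guest-physical pages reserved for
 * page table levels; returns 0 when the pool is exhausted. */
static uint64_t pt_pool_base; /* guest-physical base of the 32-page pool */
static uint32_t pt_pool_used;

static uint64_t alloc_pt_page(void)
{
    if (pt_pool_used >= PT_POOL_PAGES)
        return 0;
    return pt_pool_base + (uint64_t)PAGE_SIZE_4K * pt_pool_used++;
}
```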
|
Pass around struct kvm_syzos_vm instead of one-off pointers to
various guest memory ranges.
|
Untangle SYZOS GDT setup from the legacy one.
Drop LDT and TSS for now.
|
Per https://wiki.osdev.org/Task_State_Segment#Long_Mode,
io_bitmap and reserved3 should be 16-bit.
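With 16-bit reserved3 and io_bitmap fields the structure comes out to the
architectural 0x68 bytes; a sketch of the corrected layout (struct name
hypothetical):

```c
#include <stdint.h>

struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];    /* RSP0..RSP2 */
    uint64_t reserved1;
    uint64_t ist[7];    /* IST1..IST7 */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t io_bitmap; /* I/O map base address */
} __attribute__((packed));
```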
|
Instead of open-coding every memory region in several places,
use a single array to configure their creation.
|
Provide map_4k_region() to ease page table creation for different
regions.
While at it, also move the stack from 0x0 to 0x90000.
|
DEFINE_GUEST_FN_TO_GPA_FN() allows defining helper functions to
calculate guest addresses in the host/guest code.
|
To distinguish SYZOS addresses from other x86 definitions, change them
to start with the X86_SYZOS_ADDR_ prefix.
No functional change.
|
Previously, the generated KFuzzTest programs were reusing the address of
the top-level input struct. A problem could arise when the encoded blob
is large and overflows into another allocated region; this certainly
happens, for example, when the input struct points to some large char
buffer.
While this wasn't directly a problem, it could lead to racy behavior
when running KFuzzTest targets concurrently.
To fix this, we now introduce an additional buffer parameter into
syz_kfuzztest_run that is as big as the maximum accepted input size in
the KFuzzTest kernel code. When this buffer is allocated, we ensure that
we have some allocated space in the program that can hold the entire
encoded input.
This works in practice, but has not been tested with concurrent
KFuzzTest executions yet.
|
Add syz_kfuzztest_run pseudo-syscall, KFuzzTest attribute, and encoding
logic.
KFuzzTest targets, which are invoked in the executor with the new
syz_kfuzztest_run pseudo-syscall, require specialized encoding. To
differentiate KFuzzTest calls from standard syzkaller calls, we
introduce a new attribute called KFuzzTest or "kfuzz_test" in syzkaller
descriptions that can be used to annotate calls.
Signed-off-by: Ethan Graham <ethangraham@google.com>
|
Add SYZOS calls that correspond to the IN and OUT x86 instructions
that perform port I/O.
These instructions have different variants; for now we just implement
the one that takes the port number from DX instead of encoding it in
the opcode.
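A sketch of the DX-operand byte variants the calls execute; the "d"
constraint forces the port into DX (helper names hypothetical):

```c
#include <stdint.h>

static inline void guest_outb(uint16_t port, uint8_t value)
{
    asm volatile("outb %0, %1" :: "a"(value), "d"(port));
}

static inline uint8_t guest_inb(uint16_t port)
{
    uint8_t value;
    asm volatile("inb %1, %0" : "=a"(value) : "d"(port));
    return value;
}
```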
|
Add a SYZOS call to write to one of the debug registers
(DR0-DR7).
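Each write is a privileged mov to the debug register; a sketch for DR0
(the real call presumably dispatches on the register index):

```c
#include <stdint.h>

static inline void guest_write_dr0(uint64_t value)
{
    asm volatile("mov %0, %%dr0" :: "r"(value));
}
```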
|
| |
|
|
| |
Implement a pseudo-syscall to check the value of kvm_run.exit_reason
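A sketch of the host-side check; it assumes the shared kvm_run region is
mapped from the vcpu fd, with error handling elided and the helper name
hypothetical:

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

/* Map the shared kvm_run structure and compare its exit_reason. */
static int vcpu_exit_reason_is(int kvm_fd, int vcpu_fd, uint32_t expected)
{
    int size = ioctl(kvm_fd, KVM_GET_VCPU_MMAP_SIZE, 0);
    struct kvm_run* run = (struct kvm_run*)mmap(
        NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, vcpu_fd, 0);
    if (run == MAP_FAILED)
        return 0;
    return run->exit_reason == expected;
}
```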
|
When compiling the executor in syz-env-old, -fstack-protector may
kick in and introduce global accesses that tools/check-syzos.sh reports.
To prevent this, introduce the __no_stack_protector macro attribute that
disables stack protection for the function in question, and use it for
guest code.
While at it, factor out some common definitions into common_kvm_syzos.h
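A sketch of the macro; Clang and recent GCC accept the no_stack_protector
function attribute, while older GCC needs the optimize() form (the exact
version check is an assumption):

```c
#if defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 11)
#define __no_stack_protector __attribute__((no_stack_protector))
#else
#define __no_stack_protector __attribute__((optimize("no-stack-protector")))
#endif

/* Guest functions must not touch host globals, including the stack-canary
 * storage that -fstack-protector would reference. */
__no_stack_protector static void guest_main(void)
{
    /* ... guest code ... */
}
```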
|