| Commit message | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
| |
Close file descriptors in read_btf_vmlinux, find_vf_interface and
syz_usbip_server_init.
Otherwise syzkaller fails when the executor exceeds the limit on available fds.
Signed-off-by: Pavel Nikulshin <p.nikulshin@ispras.ru>
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We get various reports like "SYZFAIL: tun: can't open /dev/net/tun":
https://syzkaller.appspot.com/bug?extid=17b76f12c4893fc7b67b
or like "SYZFAIL: tun: ioctl(TUNSETIFF) failed":
https://syzkaller.appspot.com/bug?extid=49461b4cd5aa62f553fc
These look a lot like syzkaller manages to delete or replace
/dev/net/tun and from then on consistently fails to open the device and
complains with these SYZFAILs.
This is an attempt to fix up /dev/net/tun before we open it, so that we
know we are opening the right file.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch enables syzos for riscv64 and implements
the corresponding pseudo syscalls.
Pseudo syscalls:
- syz_kvm_setup_syzos_vm
- syz_kvm_add_vcpu
- syz_kvm_assert_syzos_uexit
Syzos guest support:
- guest_uexit
- guest_execute_code
- guest_handle_csrr and guest_handle_csrw
Test seeds:
- riscv64-syz_kvm_setup_syzos_vm
- riscv64-syz_kvm_setup_syzos_vm-csrr
- riscv64-syz_kvm_setup_syzos_vm-csrw
|
| |
|
|
|
| |
On Linux, verify that makedumpfile and the second kernel are present,
then set up a kernel to be used on panic.
|
| |
|
|
| |
Follow-up fix for https://github.com/google/syzkaller/pull/6820
|
| |
|
|
|
|
|
|
|
|
| |
L1 guest memory is non-contiguous, but previously host setup assumed
the opposite, using L1 guest addresses as offsets in the host memory
block. This led to subtle bugs in IRQ handling (and possibly elsewhere).
Fix this by using gpa_to_hva() to translate guest physical addresses to
host virtual addresses.
This function is cold, so we can afford O(SYZOS_REGION_COUNT) complexity.
|
| |
|
|
|
| |
Somehow one of the previous patches made dummy_null_handler() behave
like uexit_irq_handler(). Restore the original handler behavior.
|
| |
|
|
|
|
| |
The implementation of these pseudo syscalls was not previously tested
in pkg/csource, so let's first fix the bugs before enabling tests
for them.
|
| |
|
|
|
|
|
|
|
|
| |
Moving setup_pg_table() before setup_gdt_64() prevents the page table
initialization from accidentally erasing the newly created Global
Descriptor Table (GDT).
If the GDT is zeroed out, the CPU hardware cannot fetch the necessary
code segment descriptors to deliver interrupts or exceptions, leading
to unhandled #GP or #DF crashes.
|
| |
|
|
| |
No functional change for syz-executor.
|
| |
|
|
|
| |
Use UEXIT_END to indicate normal guest termination, and UEXIT_INVALID_MAIN
to indicate malformed guest program.
|
| |
|
|
| |
This is only needed for tests generated by prog2c.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Executor code relies on uint32/uint64 types rather than uint*_t.
Using uint64_t causes type mismatches in generated C reproducers
for programs.
Switch uint64_t to uint64 to keep executor headers consistent.
No functional changes.
Signed-off-by: 6eanut <jiakaiPeanut@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit corrects the GDT setup for the data and TSS segments in L1.
Previously, the data segment was incorrectly using the TSS base address,
and the TSS base address was not properly set.
The data segment base is now set to 0, as it should be for a flat 64-bit
model. The TSS segment descriptor in the GDT now correctly points to
X86_SYZOS_ADDR_VAR_TSS and uses the full 64-bit address.
The attributes are also updated to mark the TSS as busy.
Additionally, the TSS region is now explicitly copied from L1 to L2 to
ensure the L2 environment has a valid TSS.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit introduces the `SYZOS_API_NESTED_LOAD_SYZOS` command to
enable running full SYZOS programs within a nested L2 guest, enhancing
fuzzing capabilities for nested virtualization.
Key changes include:
- Nested SYZOS Execution: The new command loads a SYZOS program into an
L2 VM, setting up its execution environment.
- ABI Refinement: Program size is now passed via the shared `syzos_globals`
memory region instead of registers, standardizing the ABI for L1 and L2.
- L2 State Management: Improved saving and restoring of L2 guest GPRs
across VM-exits using inline assembly wrappers for Intel and AMD.
- Nested UEXIT Propagation: Intercepts EPT/NPT faults on the exit page to
capture the L2 exit code from saved registers and forward it to L0 with
an incremented nesting level.
- L2 Memory Management: Updates to L2 page table setup, including skipping
NO_HOST_MEM regions to force exits, and a new `l2_gpa_to_pa` helper.
|
| |
|
|
|
| |
Make sure executor_fn_guest_addr() is defined when
__NR_syz_kvm_assert_syzos_uexit is.
|
| |
|
|
|
| |
When setting up L1 guest, execute CPUID and enable X86_EFER_SVME for
AMD CPUs.
|
| |
|
|
|
| |
X86_CR4_OSFXSR is 1<<9 according to
https://wiki.osdev.org/CPU_Registers_x86
|
| |
|
|
|
| |
It turned out that executor_fn_guest_addr() was not inlined when building
the reproducers with -O0, so the guest code crashed.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Refactor the SYZOS L1 guest to construct L2 page tables dynamically by
mirroring its own memory layout (provided via boot arguments) instead
of using a static 2MB identity map.
This change introduces l2_map_page to allocate unique backing memory
for most regions, while mapping X86_SYZOS_ADDR_USER_CODE and
X86_SYZOS_ADDR_STACK_BOTTOM to specific per-VM buffers reserved in L1.
This allows L1 to inject code and stack content into backing buffers
while the L2 guest executes them from standard virtual addresses.
Additionally, MEM_REGION_FLAG_* definitions are moved to the guest
header to support this logic.
|
| |
|
|
|
|
|
|
|
| |
Reorder include directives in SYZOS headers to follow the project's
include ordering rules.
https://google.github.io/styleguide/cppguide.html#Names_and_Order_of_Includes
Signed-off-by: 6eanut <jiakaiPeanut@gmail.com>
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enable the SYZOS guest (L1) to dynamically allocate memory
for nested L2 page tables, replacing the previous rigid static layout.
Move the mem_region and syzos_boot_args struct definitions to the guest
header (common_kvm_amd64_syzos.h) to allow the guest to parse the memory
map injected by the host.
Introduce a bump allocator, guest_alloc_page(), which targets the
X86_SYZOS_ADDR_UNUSED heap. This allocator relies on a new struct
syzos_globals located at X86_SYZOS_ADDR_GLOBALS to track the allocation
offset.
Refactor setup_l2_page_tables() to allocate intermediate paging levels
(PDPT, PD, PT) via guest_alloc_page() instead of using fixed contiguous
offsets relative to the PML4. This allows for disjoint memory usage and
supports future recursion requirements.
|
| |
|
|
|
|
|
|
|
|
| |
Reserve a dedicated 4KB page at X86_SYZOS_ADDR_GLOBALS (0x17F000) to
store global state shared across the SYZOS L1 guest environment.
This region is required to store the state of the guest-side memory
allocator (specifically the allocation offset and total size of the
unused heap), enabling thread-safe dynamic memory allocation for nested
L2 page tables.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce a dedicated page at X86_SYZOS_ADDR_BOOT_ARGS to pass
configuration data from the executor to the SYZOS guest. This will allow
dynamic adjustments to the guest environment, such as specifying memory
region sizes.
- Added `MEM_REGION_FLAG_REMAINING` to flag the last memory region, which
will consume the rest of the available guest memory.
- Defined `struct syzos_boot_args` to pass the memory region layout to the
guest.
- Modified `syzos_mem_regions`:
- Reduced X86_SYZOS_ADDR_VAR_IDT size to 10 pages.
- Inserted the new X86_SYZOS_ADDR_BOOT_ARGS region.
- Added a final region with MEM_REGION_FLAG_REMAINING.
- Updated `setup_vm` to:
- Calculate the size of the REMAINING region.
- Populate the `syzos_boot_args` structure in the boot args page.
- Updated `setup_pg_table` to use the REMAINING flag to map the last region.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Refactor the SYZOS guest memory layout to decouple the dynamic page table
allocator from the fixed system data structures (GDT, IDT, initial PML4).
Previously, the page table pool was located at 0x5000, tightly packed with
the initial system pages. This rigid structure made it difficult to expand
the pool or inject configuration data without shifting fixed offsets.
Move X86_SYZOS_ADDR_PT_POOL to 0x180000, creating a distinct high-memory
region well above the L2 VCPU data, and increase the pool size to 64 pages
(256KB) to support deeper nested hierarchies.
Update the syz_kvm_setup_syzos_vm logic to handle non-contiguous
Guest-to-Host address translation via a new get_host_pte_ptr() helper.
This is necessary because the executor's host memory allocation remains
strictly linear while the guest physical address space now contains
significant gaps.
This layout change is a prerequisite for enabling "SYZOS inside SYZOS"
(L2 nesting), allowing the future injection of boot arguments into the
gap created between fixed data and dynamic regions.
|
| |
|
|
|
| |
Provide visibility into expected vs. actual KVM exit reasons during
assertion failures.
|
| |
|
|
|
|
|
| |
- Enables syz_kvm_assert_reg for riscv64.
- Updates kvm_one_reg according to the latest definition in
https://github.com/torvalds/linux/blob/master/arch/riscv/include/uapi/asm/kvm.h.
- Adds a test case: riscv64-kvm-reg.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces SYZOS_API_NESTED_AMD_VMLOAD and
SYZOS_API_NESTED_AMD_VMSAVE.
These primitives allow the L1 guest to execute the VMLOAD and VMSAVE
instructions, which load/store additional guest state (FS, GS, TR, LDTR,
etc.) to/from the VMCB specified by the 'vm_id' argument.
This stresses the KVM L0 instruction emulator, which must validate the
L1-provided physical address in RAX and perform the state transfer.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch implements syz_kvm_setup_cpu for riscv64 architecture.
The pseudo-syscall accepts VM fd, vCPU fd, host memory, and guest code
as parameters. Additional parameters (ntext, flags, opts, nopt) are
included for interface consistency with other architectures but are
currently unused on riscv64.
Implementation:
- Set up guest memory via KVM_SET_USER_MEMORY_REGION
- Copy guest code to guest memory
- Initialize guest registers to enable code execution in S-mode
- Return 0 on success, -1 on failure
Testing:
A test file syz_kvm_setup_cpu_riscv64 is included in sys/linux/test/
to verify basic functionality.
Known limitations:
- ifuzz is not yet compatible with riscv64. Temporary workaround: set
text[riscv64] to TextTarget and return nil in createTargetIfuzzConfig
for riscv64 to ensure generateText and mutateText work correctly.
This patch also adds support for KVM_GET_ONE_REG ioctl.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces SYZOS_API_NESTED_AMD_SET_INTERCEPT to SYZOS.
This primitive enables the fuzzer to surgically modify intercept vectors
in the AMD VMCB (Virtual Machine Control Block) Control Area.
It implements a read-modify-write operation on 32-bit VMCB offsets,
allowing the L1 hypervisor (SYZOS) to deterministically set or clear
specific intercept bits (e.g., for RDTSC, HLT, or exceptions) for the L2
guest.
This capability allows syzkaller to systematically explore KVM's nested
SVM emulation logic by toggling intercepts on and off, rather than
relying on static defaults or random memory corruption.
|
| |
|
|
|
|
|
|
| |
Enhance the debugging capabilities of C reproducers by passing the VCPU
file descriptor to the syz_kvm_assert_syzos_uexit function. With access to
the VCPU fd, the function can now dump the VCPU's register state upon
assertion failure, providing critical context for debugging guest execution
issues.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Implement SYZOS_API_NESTED_AMD_INJECT_EVENT to allow the L1 guest to
inject events (Interrupts, NMIs, Exceptions) into L2 via the VMCB EVENTINJ
field.
This primitive abstracts the VMCB bit-packing logic
(Vector, Type, Valid, Error Code) into a high-level API, enabling the fuzzer
to semantically mutate event injection parameters.
This targets KVM's nested event merging logic, specifically where L0 must
reconcile L1-injected events with Host-pending events.
|
| |
|
|
|
|
|
|
|
| |
Implement the SYZOS_API_NESTED_AMD_STGI and SYZOS_API_NESTED_AMD_CLGI
primitives to toggle the Global Interrupt Flag (GIF). These commands
execute the stgi and clgi instructions respectively and require no
arguments.
Also add a test checking that CLGI correctly masks NMI injection from L0.
|
| |
|
|
|
|
|
|
|
|
| |
Implement the SYZOS_API_NESTED_AMD_INVLPGA primitive to execute the
INVLPGA instruction in the L1 guest.
This allows the fuzzer to target KVM's Shadow MMU and Nested Paging (NPT)
logic by invalidating TLB entries for specific ASIDs.
Also add a simple syzlang seed/regression test.
|
| | |
|
| | |
|
| |
|
|
| |
flatbuffers changed some function signatures. Update executor code to match.
|
| |
|
|
| |
Update flatbuffers to v23.5.26, which matches the compiler version in the new env container.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Florent Revest reported ThinLTO builds failing with the following error:
<inline asm>:2:1: error: symbol 'after_vmentry_label' is already defined
after_vmentry_label:
^
error: cannot compile inline asm
This turned out to be caused by the compiler not respecting `noinline`.
Adding __attribute__((optnone)) (or optimize("O0") on GCC) fixes the issue.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I observed that on machines with many CPUs (480 on my setup), fuzzing
with a handful of procs (8 on my setup) would consistently fail to start
because syz-executors would fail to respond within the default handshake
timeout of 1 minute. Reducing procs to 4 would fix it but sounds
ridiculous on such a powerful machine.
As part of the default sandbox policy, a syz-executor creates a large
number of virtual network interfaces (16 on my kernel config, probably
more on other kernels). This step vastly dominates the executor startup
time and was clearly responsible for the timeout I observed that
prevented me from fuzzing.
When fuzzing or reproducing with procs > 1, all executors run their
sandbox setup in parallel. Creating network interfaces is done by socket
operations to the RTNL (routing netlink) subsystem. Unfortunately, all
RTNL operations in the kernel are serialized by a "rtnl_mutex" mega lock,
so instead of parallelizing the creation of the 8*16 interfaces, they
effectively get serialized, and the time it takes to set up the default
sandbox for one executor scales linearly with the number of executors
started "in parallel". This is currently inherent to the rtnl_mutex in
the kernel and as far as I can tell there's nothing we can do about it.
However, it makes it very important that each critical section guarded
by "rtnl_mutex" stays short and snappy, to avoid long waits on the lock.
Unfortunately, the default behavior of a virtual network interface
creation is to create one RX and one TX queue per CPU. Each queue is
associated with a sysfs file whose creation is quite slow and goes
through various sanitized paths that take a long time. This means that
each critical section scales linearly to the number of CPUs on the host.
For example, in my setup, starting fuzzing takes 2 minutes 25 seconds. I found
that I could bring this down to 10 seconds (15x faster startup time!) by
limiting the number of RX and TX queues created per virtual interface to
2 using the IFLA_NUM_*X_QUEUES RTNL attributes. I opportunistically
chose 2 to try and keep coverage of the code that exercises multiple
queues, but I don't have evidence that choosing 1 here would actually
reduce code coverage.
As far as I can tell, reducing the number of queues would be problematic
in a high performance networking scenario but doesn't matter for fuzzing
in a namespace with only one process so this seems like a fair trade-off
to me. Ultimately, this lets me start a lot more parallel executors and
take better advantage of my beefy machine.
Technical detail for review: veth actually creates two interfaces, one
for each side of the virtual ethernet link, so both sides need to be
configured with a low number of queues.
|
| |
|
|
|
|
|
|
|
|
|
| |
The new command allows mutation of the AMD VMCB with plain 64-bit writes.
In addition to VM ID and VMCB offset, @nested_amd_vmcb_write_mask takes
three 64-bit numbers: the set mask, the unset mask, and the flip mask.
This makes it possible to apply bitwise modifications to the VMCB without
disturbing the execution too much.
Also add sys/linux/test/amd64-syz_kvm_nested_amd_vmcb_write_mask to test the
new command behavior.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The new command allows mutation of Intel VMCS fields with the help
of vmwrite instruction.
In addition to VM ID and field ID, @nested_intel_vmwrite_mask takes
three 64-bit numbers: the set mask, the unset mask, and the flip mask.
This makes it possible to apply bitwise modifications to the VMCS without
disturbing the execution too much.
Also add sys/linux/test/amd64-syz_kvm_nested_vmwrite_mask to test the
new command behavior.
|
| |
|
|
|
|
|
| |
Enable basic RDTSCP handling. Ensure that Intel hosts exit on RDTSCP
in L2, and that both Intel and AMD can handle RDTSCP exits.
Add amd64-syz_kvm_nested_vmresume-rdtscp to test that.
|
| |
|
|
|
|
|
|
|
|
| |
While at it, fix a bug in rdmsr() that apparently lost the top 32 bits.
Also fix a bug in Intel's Secondary Processor-based Controls:
we were incorrectly using the top 32 bits of
X86_MSR_IA32_VMX_PROCBASED_CTLS2 to enable all the available controls
without additional setup. This only worked because rdmsr() zeroed out
those top bits.
|
| |
|
|
|
|
|
| |
Enable basic RDTSC handling. Ensure that Intel hosts exit on RDTSC
in L2, and that both Intel and AMD can handle RDTSC exits.
Add amd64-syz_kvm_nested_vmresume-rdtsc to test that.
|
| |
|
|
|
| |
Ensure L2 correctly exits to L1 on CPUID and resumes properly.
Add a test.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Provide the SYZOS API command to resume L2 execution after a VM exit,
using VMRESUME on Intel and VMRUN on AMD.
For testing purposes, implement basic handling of the INVD instruction:
- enable INVD interception on AMD (set all bits in VMCB 00Ch);
- map EXIT_REASON_INVD and VMEXIT_INVD into SYZOS_NESTED_EXIT_REASON_INVD;
- advance L2 RIP to skip to the next instruction.
While at it, perform minor refactorings of L2 exit reason handling.
sys/linux/test/amd64-syz_kvm_nested_vmresume tests the new command by
executing two instructions, INVD and HLT, in the nested VM.
|
| |
|
|
|
|
|
| |
It was useful initially for vendor-agnostic tests, but given that we
have guest_uexit_l2() right before it, we can save an extra L2-L1 exit.
Perhaps this should increase the probability of executing more complex
payloads (fewer KVM_RUN calls to reach the same point in L2 code).
|