| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
| |
To be useful, the stat should have a different name depending on the VM
pool name.
|
| |
|
|
|
| |
This will let us see executor restart statistics per VM pool (relevant
for diff fuzzing).
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As we figured out in #5805, syz-manager treats random incoming RPC
connections as trusted, and will crash if a non-executor client sends
an invalid packet to it.
To address this issue, we introduce another stage of handshake, which
includes a cookie exchange:
- upon connection from an executor, the manager sends a ConnectHello RPC
message to it, which contains a random 64-bit cookie;
- the executor calculates a hash of that cookie and includes it in
its ConnectRequest together with the other information;
- before checking the validity of ConnectRequest, the manager ensures
client sanity (the passed ID didn't change, the hashed cookie has the
expected value).
We deliberately pick a random cookie instead of a magic number: if the
fuzzer somehow learns to send packets to the manager, we don't want it to
crash multiple managers on the same machine.
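The handshake above could be sketched roughly as follows. This is only an illustration: `newCookie`, `hashCookie` and `checkClient` are hypothetical names, and SHA-256 is an assumed stand-in for whatever hash the manager and the executor actually share.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// newCookie returns a fresh random 64-bit cookie for one connection.
func newCookie() (uint64, error) {
	var buf [8]byte
	if _, err := rand.Read(buf[:]); err != nil {
		return 0, err
	}
	return binary.LittleEndian.Uint64(buf[:]), nil
}

// hashCookie is a hypothetical stand-in for the hash both sides agree on.
func hashCookie(cookie uint64) uint64 {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], cookie)
	sum := sha256.Sum256(buf[:])
	return binary.LittleEndian.Uint64(sum[:8])
}

// checkClient performs the sanity checks that run before the rest of the
// connect request is validated: the ID must not have changed and the
// hashed cookie must match the cookie the manager sent.
func checkClient(sentCookie, gotHash uint64, wantID, gotID int) error {
	if gotID != wantID {
		return fmt.Errorf("client ID changed: want %d, got %d", wantID, gotID)
	}
	if gotHash != hashCookie(sentCookie) {
		return fmt.Errorf("unexpected cookie hash")
	}
	return nil
}

func main() {
	cookie, err := newCookie()
	if err != nil {
		panic(err)
	}
	// A well-behaved executor echoes back the hash of the cookie.
	fmt.Println(checkClient(cookie, hashCookie(cookie), 0, 0))
	// A random client that does not know the scheme is rejected.
	fmt.Println(checkClient(cookie, cookie, 0, 0) != nil)
}
```

Because the cookie is random per connection, a client that replays a packet captured from one manager fails the check on another.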
|
| |
|
|
|
| |
Dump the whole flatrpc.ConnectRequest to the logs, so that we can
better understand the cause of #5805
|
| |
|
|
|
| |
Running it from the VM context causes its cancellation each time the VM
crashes or the connection is aborted.
|
| |
|
|
|
| |
On context abortion, return a special error.
On the pkg/rpcserver side, recognize and process it.
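One common Go pattern for this is a sentinel error that the caller recognizes with `errors.Is`. The sketch below assumes that pattern; `errAborted`, `recvReply` and `classify` are hypothetical names, not the identifiers used in the actual packages.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// errAborted is a hypothetical sentinel returned when the context is done.
var errAborted = errors.New("aborted by context")

// recvReply waits for a reply, but gives up with errAborted on cancellation.
func recvReply(ctx context.Context, replies <-chan string) (string, error) {
	select {
	case <-ctx.Done():
		return "", fmt.Errorf("%w: %v", errAborted, ctx.Err())
	case r, ok := <-replies:
		if !ok {
			return "", errors.New("connection closed")
		}
		return r, nil
	}
}

// classify shows how the server side can tell an abort from a real failure.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, errAborted):
		return "aborted"
	default:
		return "failed"
	}
}

// demoAborted runs recvReply with an already cancelled context.
func demoAborted() string {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, err := recvReply(ctx, make(chan string))
	return classify(err)
}

// demoClosed runs recvReply on a closed connection with a live context.
func demoClosed() string {
	ch := make(chan string)
	close(ch)
	_, err := recvReply(context.Background(), ch)
	return classify(err)
}

func main() {
	fmt.Println(demoAborted(), demoClosed())
}
```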
|
| |
|
|
|
| |
If an instance crashed during machine check, that should not normally
abort all RPCServer operation.
|
| |
|
|
| |
Apply necessary changes to pkg/flatrpc and pkg/manager as well.
|
| |
|
|
|
|
|
| |
The context is assumed to be passed into the function doing the actual
processing. Refactor vminfo to follow this approach.
This will help refactor pkg/rpcserver later.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We query globs for 2 reasons:
1. Expand glob types in syscall descriptions.
2. Dynamic file probing for automatic descriptions generation.
In both of these contexts we are interested in files
that will be present during test program execution
(rather than normal unsandboxed execution).
For example, some files may not be accessible to test programs
after pivot root. On the other hand, we create and link
some additional files for the test program that don't
normally exist.
Add a new request type for querying of globs that are
executed in the test program context.
|
| |
|
|
|
|
|
|
|
|
|
| |
A few assorted changes to reduce future diffs:
- add rpcserver.RemoteConfig similar to LocalConfig
(there are too many parameters)
- add CheckGlobs for requesting additional globs from VMs
- pass whole InfoRequest to the MachineChecked callback
so that it's possible to read globs information
- add per-mode config checking in the manager
- add Manager.saveJson helper
|
| |
|
|
|
| |
We have a /modules link in the manager, but it's not exposed anywhere.
Add a stat with this link.
|
| | |
|
| |
|
|
|
| |
It will enable collecting statistics for several simultaneous RPCServer
objects.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we kill hung processes and consider the corresponding test finished.
We don't kill/wait for the actual test subprocess (we don't know its pid to kill,
and waiting will presumably hang). This has 2 problems:
1. If the hung process causes a "task hung" report, we can't reproduce it,
since the test finished too long ago (the manager thinks it's finished and
discards the request).
2. The test process still consumes per-pid resources.
Explicitly detect and handle such cases:
the manager keeps these hung tests forever,
and we assign a new proc id for future processes
(don't reuse the hung one).
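A minimal sketch of the "never reuse a hung proc id" idea, assuming a fixed number of executor slots. The `procIDs` type and its fields are hypothetical and only illustrate the bookkeeping, not the real executor code.

```go
package main

import "fmt"

// procIDs tracks which proc ID each executor slot currently uses.
type procIDs struct {
	slots []int        // current proc ID of each slot
	next  int          // next never-used proc ID
	hung  map[int]bool // IDs retired because their test hung
}

// newProcIDs gives slot i the initial proc ID i.
func newProcIDs(n int) *procIDs {
	p := &procIDs{hung: map[int]bool{}}
	for i := 0; i < n; i++ {
		p.slots = append(p.slots, p.next)
		p.next++
	}
	return p
}

// markHung retires the slot's current ID forever and assigns a fresh one,
// so the still-running hung test keeps its per-pid resources to itself.
func (p *procIDs) markHung(slot int) int {
	p.hung[p.slots[slot]] = true
	p.slots[slot] = p.next
	p.next++
	return p.slots[slot]
}

func main() {
	p := newProcIDs(2)
	fmt.Println(p.markHung(0)) // slot 0 moves from ID 0 to a new ID
	fmt.Println(p.slots, p.hung)
}
```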
|
| |
|
|
|
|
|
|
|
| |
It does sometimes happen that the kernel crashes so fast that
syz-manager is not notified that the syz-executor has started running
the faulty input.
In cases when the exact program is known from Comm, let's make sure it's
always present in the log of the last executed programs.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Added more test coverage for the package and created an rpcserver
interface that syz-manager can use as a dependency.
Also tried to cover the private method handleConn() with tests.
However, it calls handleRunnerConn, which has separate logic in
Handshake() that should have been mocked within the handleConn() unit test.
This will require refactoring `runners map[int]*Runner` and
runner.go in general, with a separate interface that we can mock as
well.
The general idea is to have interfaces for Server (rpc), Runner, etc., and to
mock compound logic like Handshake when unit-testing a separate public method
(or a private one, if it has callable, if-else logic).
|
| |
|
|
|
|
| |
Using actual VM indices for VM identification allows matching these indices to VMs in the pool,
allows using dense arrays to store information about runners (e.g. in queue.Distributor),
and removes string names as unnecessary additional entities.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we force restart in rpcserver, but this has 2 problems:
1. It does not know the proc where the request will land.
2. It does not take into account if the proc has already restarted
recently for other reasons.
Restart procs in executor only if they haven't restarted recently.
Also make it deterministic. Given all other randomness we have,
there does not seem to be a reason to use randomized restarts
and restart after fewer/more runs.
Also restart only after corpus triage.
Corpus triage is slow already and there does not seem to be enough
benefit to restart during corpus triage.
Also restart at most 1 proc at a time,
since there is a lot of serial work in the kernel.
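The resulting policy can be summarized in a few lines. This is a hedged sketch, not the executor's code (which is not written in Go): `pickRestart`, its parameters and the threshold are hypothetical.

```go
package main

import "fmt"

// pickRestart returns the index of the single proc that should restart now,
// or -1 if none should. execsSinceRestart[i] counts the programs proc i has
// run since its last restart (for any reason).
func pickRestart(execsSinceRestart []int, restartEvery int, triageDone, restartInProgress bool) int {
	// No forced restarts during corpus triage, and at most one at a time.
	if !triageDone || restartInProgress {
		return -1
	}
	// Deterministic: the first proc past the fixed threshold is chosen.
	for i, n := range execsSinceRestart {
		if n >= restartEvery {
			return i
		}
	}
	return -1
}

func main() {
	execs := []int{10, 600, 700}
	fmt.Println(pickRestart(execs, 500, true, false))
	fmt.Println(pickRestart(execs, 500, false, false))
}
```

A proc that restarted recently for another reason has a low counter, so it is skipped without any extra bookkeeping.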
|
| |
|
|
| |
Distribute triage requests to different VMs.
|
| | |
|
| | |
|
| |
|
|
|
| |
These stats will be needed for snapshot mode that does not use rpcserver.
Move them from pkg/rpcserver to pkg/fuzzer/queue.
|
| |
|
|
|
|
|
|
| |
Go package names should generally be singular form:
https://go.dev/blog/package-names
https://rakyll.org/style-packages
https://groups.google.com/g/golang-nuts/c/buBwLar1gNw
|
| |
|
|
|
|
| |
New is a more idiomatic name and is shorter
(lines where stats.Create is used are usually long,
so making them a bit shorter is good).
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
For local rpcserver runs, we do not reboot the executor in case of
errors. Moreover, if the error did not lead to the executor process
exit, we may never detect that something went wrong.
Return an error channel from CreateInstance() to be able to act on
connection loop errors.
Explicitly register the instance during local executions and exit from
RunLocal() in case of connection problems.
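The error-channel approach might look roughly like this. The lowercase `createInstance` and `runLocal` here are simplified, hypothetical stand-ins with made-up signatures; they only show how a connection-loop error can be propagated to the caller.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// createInstance starts the connection loop in the background and returns
// a channel that receives the loop's final error (nil on clean exit).
func createInstance(serve func() error) <-chan error {
	errc := make(chan error, 1) // buffered so the goroutine never blocks
	go func() { errc <- serve() }()
	return errc
}

// runLocal exits as soon as the connection loop fails or ctx is cancelled.
func runLocal(ctx context.Context, serve func() error) error {
	errc := createInstance(serve)
	select {
	case err := <-errc:
		return err
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demoRunLocal simulates a connection loop that fails immediately.
func demoRunLocal() string {
	err := runLocal(context.Background(), func() error {
		return errors.New("connection lost")
	})
	return err.Error()
}

func main() {
	fmt.Println(demoRunLocal())
}
```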
|
| |
|
|
|
|
|
|
| |
In some cases, the executor seems to be mysteriously silent while we are
awaiting a reply.
During pkg/runtest tests, give it 1 minute to prepare a reply, then try
to request the current state and abort the connection.
|
| |
|
|
|
| |
Rely on instance.Pool to perform fuzzing and do bug reproductions.
Extract the reproduction queue logic into a separate testable class.
|
| |
|
|
|
|
|
| |
The pool operates on a low level and assumes that there's one default
activity (=fuzzing) that is performed by the VMs and that there are
also occasional non-default activities that must be performed by some
VMs (=bug reproduction).
|
| | |
|
| |
|
|
| |
The object enables a graceful shutdown of machine checks.
|
| |
|
|
|
| |
For fuzzing, we don't strictly need the kernel directory or the kernel
object file. We just need a disk/kernel image.
|
| |
|
|
|
| |
Make it explicit which methods of Runner refer to its implementation and
which are supposed to be invoked by its users.
|
| | |
|
| |
|
|
| |
This allows for a cleaner interface between RPCServer and Runner.
|
| | |
|
| | |
|
| |
|
|
|
|
|
| |
Signal rotation is intended to make the fuzzer re-discover flaky coverage
in a non-flaky way. However, taking into account that we get effectively
the same effect after each manager restart, and that the fuzzer is overloaded
with triage/smash jobs, it does not look to be worth it.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We see some errors of the form:
SYZFAIL: coverage filter is full
pc=0x80007000c0008 regions=[0xffffffffbfffffff 0x243fffffff 0x143fffffff 0xc3fffffff] alloc=156
Executor shouldn't send non-kernel addresses in signal,
but somehow it does. It can happen if the VM memory is corrupted,
or if the test program does something very nasty (e.g. discovers
the output region and writes to it).
It's not possible to reliably filter signal in the tested VM.
Move all of the filtering logic to the host.
Fixes #4942
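As an illustration of host-side filtering, the sketch below drops any PC outside an assumed kernel address range before it is used. `filterSignal` and the range constants are hypothetical. The bounds are chosen only for the example and are not taken from the actual code.

```go
package main

import "fmt"

// Assumed kernel address range for the example only.
const (
	kernelStart uint64 = 0xffffffff80000000
	kernelEnd   uint64 = 0xffffffffffffffff
)

// filterSignal keeps only PCs inside [start, end). It runs on the host,
// so a corrupted or malicious VM cannot bypass it.
func filterSignal(pcs []uint64, start, end uint64) []uint64 {
	var kept []uint64
	for _, pc := range pcs {
		if pc >= start && pc < end {
			kept = append(kept, pc)
		}
	}
	return kept
}

func main() {
	// The first PC is the bogus value from the error report above.
	pcs := []uint64{0x80007000c0008, 0xffffffff81000000}
	for _, pc := range filterSignal(pcs, kernelStart, kernelEnd) {
		fmt.Printf("%#x\n", pc)
	}
}
```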
|
| | |
|
| |
|
|
|
|
| |
It's a more general name that says what happened
rather than a detail of what the executor should do.
We can use this notification for other things as well.
|
| |
|
|
|
|
|
| |
Split out most of the Runner functionality into a separate file.
This should make it easier to reason about what rpcserver.go does and
it also makes further pkg/rpcserver refactoring simpler.
|
| | |
|
|
|
Move all syz-fuzzer logic into syz-executor and remove syz-fuzzer.
Also restore syz-runtest functionality in the manager.
Update #4917 (sets most signal handlers to SIG_IGN)
|