| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
| |
This commit enables the periodic execution of a leak checker within the executor. The leak checker will now run every
2 * num_procs executions, but only after the corpus has been triaged and all executor processes are in an idle state.
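The gating condition described above can be sketched roughly as follows. This is only an illustration: the type and method names (leakScheduler, noteExec, shouldRun, markRun) are hypothetical and are not taken from the executor sources.

```go
package main

import "fmt"

// leakScheduler is a hypothetical helper that decides when a periodic
// leak check may run. It is not the executor's actual data structure.
type leakScheduler struct {
	numProcs      int // number of executor processes
	execsSinceRun int // executions finished since the last leak check
}

// noteExec records one finished program execution.
func (s *leakScheduler) noteExec() { s.execsSinceRun++ }

// shouldRun reports whether a leak check may run now: the corpus must be
// triaged, no proc may be busy, and at least 2*numProcs executions must
// have finished since the previous check.
func (s *leakScheduler) shouldRun(corpusTriaged bool, busyProcs int) bool {
	if !corpusTriaged || busyProcs != 0 {
		return false
	}
	return s.execsSinceRun >= 2*s.numProcs
}

// markRun resets the execution counter after a leak check has run.
func (s *leakScheduler) markRun() { s.execsSinceRun = 0 }

func main() {
	s := &leakScheduler{numProcs: 4}
	for i := 0; i < 8; i++ {
		s.noteExec()
	}
	fmt.Println(s.shouldRun(false, 0)) // corpus not triaged yet
	fmt.Println(s.shouldRun(true, 1))  // one proc is still busy
	fmt.Println(s.shouldRun(true, 0))  // all conditions hold
}
```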
|
| |
|
|
| |
This will reduce code duplication and simplify adding new fields.
|
| |
As we figured out in #5805, syz-manager treats random incoming RPC
connections as trusted, and will crash if a non-executor client sends
an invalid packet to it.
To address this issue, we introduce another stage of handshake, which
includes a cookie exchange:
- upon connection from an executor, the manager sends a ConnectHello RPC
message to it, which contains a random 64-bit cookie;
- the executor calculates a hash of that cookie and includes it in
its ConnectRequest together with the other information;
- before checking the validity of ConnectRequest, the manager ensures
client sanity (the passed ID didn't change, the hashed cookie has the
expected value).
We deliberately pick a random cookie instead of a magic number: if the
fuzzer somehow learns to send packets to the manager, we don't want it to
crash multiple managers on the same machine.
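A minimal sketch of the cookie exchange described above. The message does not say which hash is used, so SHA-256 truncated to 64 bits is only a stand-in here, and connectRequest, hashCookie and checkClient are hypothetical names, not the real RPC types.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// newCookie returns a random 64-bit cookie (manager side).
func newCookie() (uint64, error) {
	var buf [8]byte
	if _, err := rand.Read(buf[:]); err != nil {
		return 0, err
	}
	return binary.LittleEndian.Uint64(buf[:]), nil
}

// hashCookie is a stand-in for the hash both sides compute.
// Assumption: SHA-256 truncated to 64 bits; the real hash may differ.
func hashCookie(cookie uint64) uint64 {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], cookie)
	sum := sha256.Sum256(buf[:])
	return binary.LittleEndian.Uint64(sum[:8])
}

// connectRequest is a hypothetical, reduced version of the executor's reply.
type connectRequest struct {
	ID     int
	Cookie uint64 // hash of the cookie received in the hello message
}

// checkClient performs the sanity checks before the request is validated.
func checkClient(expectedID int, cookie uint64, req connectRequest) error {
	if req.ID != expectedID {
		return fmt.Errorf("client ID changed: got %d, want %d", req.ID, expectedID)
	}
	if req.Cookie != hashCookie(cookie) {
		return fmt.Errorf("unexpected cookie hash")
	}
	return nil
}

func main() {
	cookie, err := newCookie()
	if err != nil {
		panic(err)
	}
	good := connectRequest{ID: 3, Cookie: hashCookie(cookie)}
	bad := connectRequest{ID: 3, Cookie: hashCookie(cookie) + 1} // wrong hash
	fmt.Println(checkClient(3, cookie, good))
	fmt.Println(checkClient(3, cookie, bad))
}
```

Because the cookie is generated per connection, a client that replays a value accepted by one manager gains nothing against another.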
|
| |
We query globs for 2 reasons:
1. Expand glob types in syscall descriptions.
2. Dynamic file probing for automatic descriptions generation.
In both of these contexts we are interested in files
that will be present during test program execution
(rather than normal unsandboxed execution).
For example, some files may not be accessible to test programs
after pivot root. On the other hand, we create and link
some additional files for the test program that don't
normally exist.
Add a new request type for querying of globs that are
executed in the test program context.
|
| |
After 9fc8fe026baa ("executor: better handling for hanged test
processes"), syz-executor's responses may reference procids outside of
the [0;procs] range.
If procids are no longer dense on the syz-executor side, we cannot rely
on this check in pkg/rpcserver:
```
if avoid == (uint64(1)<<runner.procs)-1 {
	avoid = 0
}
```
Signed-off-by: Andrei Vagin <avagin@google.com>
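To illustrate why the dense check stops working, here is a small sketch. It assumes avoid is a bitmask with one bit per procid, as the quoted snippet suggests. allAvoidedLive is a hypothetical alternative shown only for comparison; it is not necessarily what this commit does.

```go
package main

import "fmt"

// allAvoidedDense mirrors the quoted check: it assumes the proc ids in use
// are exactly 0..procs-1.
func allAvoidedDense(avoid uint64, procs int) bool {
	return avoid == (uint64(1)<<procs)-1
}

// allAvoidedLive is a hypothetical alternative that compares against the
// set of ids actually in use instead of assuming a dense range.
func allAvoidedLive(avoid, live uint64) bool {
	return live != 0 && avoid&live == live
}

func main() {
	const procs = 2
	// Proc 1 hung and its replacement received id 2, so live ids are {0, 2}.
	live := uint64(1)<<0 | uint64(1)<<2
	avoid := live // the request asks to avoid every live proc

	fmt.Println(allAvoidedDense(avoid, procs)) // false: 0b101 != 0b011
	fmt.Println(allAvoidedLive(avoid, live))   // true
}
```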
|
| |
Currently we kill hanged processes and consider the corresponding test finished.
We don't kill/wait for the actual test subprocess (we don't know its pid to kill,
and waiting will presumably hang). This has 2 problems:
1. If the hanged process causes a "task hung" report, we can't reproduce it,
since the test finished too long ago (the manager thinks it's finished and
discards the request).
2. The test process still consumes per-pid resources.
Explicitly detect and handle such cases:
Manager keeps these hanged tests forever,
and we assign a new proc id for future processes
(don't reuse the hanged one).
|
| |
|
|
|
| |
Replace just the SYZFAIL part instead of the whole message.
This makes debugging of things easier.
|
| |
|
|
|
|
| |
Using actual VM indices for VM identification lets us match these indices to VMs in the pool,
lets us use dense arrays to store information about runners (e.g. in queue.Distributor),
and removes string names as unnecessary additional entities.
|
| |
Currently we force restart in rpcserver, but this has 2 problems:
1. It does not know the proc where the requests will land.
2. It does not take into account if the proc has already restarted
recently for other reasons.
Restart procs in executor only if they haven't restarted recently.
Also make it deterministic. Given all the other randomness we have,
there does not seem to be a reason to use randomized restarts
and restart after fewer/more runs.
Also restart only after corpus triage.
Corpus triage is slow already and there does not seem to be enough
benefit to restart during corpus triage.
Also restart at most 1 proc at a time,
since there is a lot of serial work in the kernel.
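The policy above can be sketched as a single selection function. The names (proc, pickRestart) and the threshold parameter are hypothetical; the actual restart interval is not stated in this message.

```go
package main

import "fmt"

// proc is a hypothetical view of one executor proc.
type proc struct {
	runsSinceRestart int  // programs run since the last restart, for any reason
	restarting       bool // a restart is currently in progress
}

// pickRestart returns the index of the proc to restart, or -1 if none.
// It is deterministic, does nothing before corpus triage is done, and never
// starts a restart while another one is in progress.
func pickRestart(procs []proc, triaged bool, threshold int) int {
	if !triaged {
		return -1
	}
	for _, p := range procs {
		if p.restarting {
			return -1 // at most one restart at a time
		}
	}
	for i, p := range procs {
		if p.runsSinceRestart >= threshold {
			return i
		}
	}
	return -1
}

func main() {
	procs := []proc{{runsSinceRestart: 10}, {runsSinceRestart: 120}, {runsSinceRestart: 130}}
	fmt.Println(pickRestart(procs, false, 100)) // still triaging
	fmt.Println(pickRestart(procs, true, 100))  // first proc over the threshold
	procs[1].restarting = true
	fmt.Println(pickRestart(procs, true, 100)) // a restart is already running
}
```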
|
| |
|
|
| |
Distribute triage requests to different VMs.
|
| | |
|
| |
|
|
|
| |
There are also synchronous fatal signals that can happen due to bugs
in executor code. So handle them as SIGSEGV.
|
| |
|
|
|
| |
This will allow reusing the finish_output function for snapshot mode as well.
NFC
|
| | |
|
| |
|
|
|
| |
Don't print SYZFAIL messages during machine check.
Otherwise each of them is detected as a bug.
|
| |
|
|
|
|
|
| |
Signal rotation is intended to make the fuzzer re-discover flaky coverage
in a non-flaky way. However, taking into account that we get effectively
the same effect after each manager restart, and that the fuzzer is overloaded
with triage/smash jobs, it does not look to be worth it.
|
| |
We see some errors of the form:
SYZFAIL: coverage filter is full
pc=0x80007000c0008 regions=[0xffffffffbfffffff 0x243fffffff 0x143fffffff 0xc3fffffff] alloc=156
Executor shouldn't send non-kernel addresses in signal,
but somehow it does. It can happen if the VM memory is corrupted,
or if the test program does something very nasty (e.g. discovers
the output region and writes to it).
It's not possible to reliably filter signal in the tested VM.
Move all of the filtering logic to the host.
Fixes #4942
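As a rough illustration of host-side filtering, the sketch below drops any PC outside an expected address range before it reaches further processing. The function name and the range bounds are assumptions made for the example; the real filtering logic and address layout are not described in this message.

```go
package main

import "fmt"

// filterSignal keeps only PCs inside [start, end). Both bounds are
// hypothetical parameters standing in for the expected kernel range.
func filterSignal(pcs []uint64, start, end uint64) []uint64 {
	var kept []uint64
	for _, pc := range pcs {
		if pc >= start && pc < end {
			kept = append(kept, pc)
		}
	}
	return kept
}

func main() {
	// Assumed range for illustration only.
	const start, end = uint64(0xffffffff80000000), uint64(0xffffffffc0000000)
	pcs := []uint64{
		0xffffffff81000010, // inside the assumed range
		0x80007000c0008,    // the bogus pc from the error above
		0xffffffff81abcdef, // inside the assumed range
	}
	for _, pc := range filterSignal(pcs, start, end) {
		fmt.Printf("0x%x\n", pc)
	}
}
```

Doing this on the host means a corrupted or hostile VM can no longer make the filter itself fail.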
|
| |
|
|
|
|
| |
SIGBUS means OOM on Linux.
Most of the crashes that happen during fuzzing are SIGBUS,
so separate them from SIGSEGV and suppress.
|
| |
|
|
|
|
| |
It's a more general name that says what happened
rather than a detail of what executor should do.
We can use this notification for other things as well.
|
| |
|
|
|
|
|
|
|
|
|
| |
There is a quirk related to posix_spawn_file_actions_adddup2:
it just executes the specified dup's in order in the child process.
In our case we do dups as follows:
20 -> 4 (output region)
4 -> 5 (max signal)
So we dup the output region onto 4 first, and then dup the same output region
(fd 4 becomes the output region) onto 5 (max signal).
So we have output region as both output region and max signal.
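The effect of the ordering can be reproduced with a small simulation of an fd table. This models only the in-order semantics described above and does not call posix_spawn. The reordered action list is just one hypothetical way to avoid the clobbering and is not necessarily the fix applied here.

```go
package main

import "fmt"

// dup describes one adddup2-style action: make fd `to` refer to whatever
// fd `from` refers to at the moment the action is executed.
type dup struct{ from, to int }

// applyDups simulates executing the actions in order on a copy of fds.
func applyDups(fds map[int]string, actions []dup) map[int]string {
	out := make(map[int]string, len(fds))
	for fd, what := range fds {
		out[fd] = what
	}
	for _, a := range actions {
		out[a.to] = out[a.from]
	}
	return out
}

func main() {
	parent := map[int]string{4: "max signal", 20: "output region"}

	// Order from the description: fd 4 is overwritten before it is copied to 5.
	buggy := applyDups(parent, []dup{{20, 4}, {4, 5}})
	fmt.Println(buggy[4], "|", buggy[5])

	// Hypothetical reordering: copy fd 4 away first, then overwrite it.
	fixed := applyDups(parent, []dup{{4, 5}, {20, 4}})
	fmt.Println(fixed[4], "|", fixed[5])
}
```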
|
| |
|
|
|
|
|
| |
Coverage setup fails with exitf if not supported.
Currently we treat it as a transient error that needs to be retried.
As a result we reach 20 attempts and crash the VM.
Return an error in that case instead.
|
| |
|
|
|
|
|
|
| |
OpenBSD says:
executor/executor_runner.h:750:51: error: no member named 'uc_mcontext' in 'sigcontext'
auto& mctx = static_cast<ucontext_t*>(ucontext)->uc_mcontext;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
|
| | |
| |
This reverts commit 215eef4ad85fb6124af70d1e5c9729b69554a32b.
The gvisor "stdin" address still crashes in executor
Connection::Connect on atoi(ports) with ports == NULL.
The gvisor "stdin" address is not tested, so it's better to make it less
special rather than add more special cases in manager, executor,
and now also in Connection to handle it.
It may still crash in the future after some changes.
|
| |
|
|
| |
It is returned from vm/gvisor.
|
|
|
Move all syz-fuzzer logic into syz-executor and remove syz-fuzzer.
Also restore syz-runtest functionality in the manager.
Update #4917 (sets most signal handlers to SIG_IGN)
|