| Commit message | Author | Age | Files | Lines |
| | |
|
| |
|
|
| |
It will help us catch broken seeds right in TestParse().
|
| |
|
|
|
| |
Decouple it from syz-manager.
Remove a lot of no longer necessary mutex calls.
|
| |
|
|
|
| |
These no longer make any sense since we only send programs after the
corpus triage.
|
| |
|
|
|
|
|
|
|
|
|
| |
We used to send corpus updates (added/removed elements) to the hub in each sync.
But that produced too much churn since the hub algorithm is O(N^2) (distributing everything
to everybody), and lots of new inputs are later removed (either we can't reproduce coverage
after restart, or the inputs are removed during corpus minimization). So now we don't send new inputs
in each sync; instead we aim at sending the corpus once after initial triage. This solves
the problem with non-reproducible/removed inputs. Typical instance life-time on syzbot is <24h,
for such instances we send the corpus once. If an instance somehow lives for longer, then we
re-connect and re-send once in a while (e.g. a local long-running instance).
|
| |
|
|
|
|
| |
If a manager uses fake coverage, don't send its corpus to the hub.
It should be lower quality than coverage-guided corpus.
However still send repros and accept new inputs.
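The rule above can be sketched as a small decision function. This is only an illustration: the hubPolicy type, its fields, and policyFor are hypothetical names, not taken from syz-manager.

```go
package main

import "fmt"

// hubPolicy lists what a manager exchanges with the hub.
// All names here are hypothetical and used only for illustration.
type hubPolicy struct {
	SendCorpus   bool
	SendRepros   bool
	AcceptInputs bool
}

// policyFor returns the exchange policy for a manager.
// With fake coverage the corpus is withheld, but repros are still
// sent and new inputs are still accepted.
func policyFor(fakeCoverage bool) hubPolicy {
	return hubPolicy{
		SendCorpus:   !fakeCoverage,
		SendRepros:   true,
		AcceptInputs: true,
	}
}

func main() {
	fmt.Printf("fake coverage: %+v\n", policyFor(true))
	fmt.Printf("real coverage: %+v\n", policyFor(false))
}
```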
|
| | |
|
| |
|
|
| |
This is a potentially reusable piece of functionality.
|
| |
|
|
|
|
|
| |
Let it be equal to 15 calls for now.
Don't reminimize corpus programs that have fewer calls.
Always reminimize hub programs that have no fewer calls.
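A minimal sketch of the cutoff rule, assuming a three-way answer. The decision type and reminimizeDecision are hypothetical names. The message does not say what happens to long corpus programs or short hub programs, so the sketch leaves those cases to whatever the default behavior is.

```go
package main

import "fmt"

// reminimizeCutoff is the 15-call threshold named in the commit message.
const reminimizeCutoff = 15

// decision is a hypothetical three-way answer.
type decision int

const (
	useDefault      decision = iota // case not covered by the message
	skipReminimize                  // corpus program below the cutoff
	forceReminimize                 // hub program at or above the cutoff
)

// reminimizeDecision applies the two rules stated in the message.
func reminimizeDecision(fromHub bool, calls int) decision {
	switch {
	case !fromHub && calls < reminimizeCutoff:
		return skipReminimize
	case fromHub && calls >= reminimizeCutoff:
		return forceReminimize
	default:
		return useDefault
	}
}

func main() {
	fmt.Println(reminimizeDecision(false, 10) == skipReminimize)
	fmt.Println(reminimizeDecision(true, 15) == forceReminimize)
	fmt.Println(reminimizeDecision(true, 3) == useDefault)
}
```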
|
| |
|
|
| |
Reduce the size of syz-manager.
|
| |
|
|
|
|
|
|
| |
Go package names should generally be singular form:
https://go.dev/blog/package-names
https://rakyll.org/style-packages
https://groups.google.com/g/golang-nuts/c/buBwLar1gNw
|
| |
|
|
|
|
| |
New is a more idiomatic name and is shorter
(lines where stats.Create is used are usually long,
so making them a bit shorter is good).
|
| |
|
|
|
|
|
|
|
| |
There has been some mess and duplication around the Crash/ReproResult data
structures. As a result, we've been attempting to upload repro failure
logs to the dashboard for bugs that did not originate from the
dashboard. This litters the syz-manager logs.
Refactor the code.
|
| |
|
|
|
|
|
|
|
|
|
| |
It does happen that we see a long tail of "candidate triage jobs"
during a big influx of syz-hub programs. This is bad because in 10
minutes we'll query another batch, which will further stretch the triage
process.
By delaying the process more and more we're putting off the start of bug
reproduction, so let's control the hub sync process more carefully:
only perform the next query after the previous batch has completed.
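The gating described above can be sketched with a counter of candidates that are still waiting for triage. The hubSyncGate type and its methods are hypothetical and exist only to show the idea.

```go
package main

import "fmt"

// hubSyncGate tracks candidates from the last hub batch that are still
// waiting for triage. The type and its methods are hypothetical.
type hubSyncGate struct {
	pending int
}

// batchReceived records n new candidates obtained from the hub.
func (g *hubSyncGate) batchReceived(n int) { g.pending += n }

// candidateTriaged records that one candidate finished triage.
func (g *hubSyncGate) candidateTriaged() {
	if g.pending > 0 {
		g.pending--
	}
}

// canQuery reports whether the previous batch has fully completed,
// which is the condition for asking the hub for the next batch.
func (g *hubSyncGate) canQuery() bool { return g.pending == 0 }

func main() {
	var g hubSyncGate
	g.batchReceived(2)
	fmt.Println("can query right after a batch:", g.canQuery())
	g.candidateTriaged()
	g.candidateTriaged()
	fmt.Println("can query after triage:", g.canQuery())
}
```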
|
| |
|
|
| |
Only request them if there's nothing else to reproduce.
|
| |
|
|
|
| |
Rely on instance.Pool to perform fuzzing and do bug reproductions.
Extract the reproduction queue logic into a separate testable class.
|
| |
|
|
|
|
|
|
| |
We only need serialized representation on some rare operations
(some web UI pages, and first hub connect). Don't keep them in memory.
In my instance this saves 503MB (15.5%) of heap,
which reduces RSS by 1GB (2x due to GC).
|
| |
|
|
|
|
|
| |
Move all syz-fuzzer logic into syz-executor and remove syz-fuzzer.
Also restore syz-runtest functionality in the manager.
Update #4917 (sets most signal handlers to SIG_IGN)
|
| |
|
|
|
|
|
|
|
| |
The next commit will add another Candidate flag.
Candidate flags duplicate progTypes enum, so to avoid conversions
of one to another use progTypes in Candidate struct directly.
Rename progTypes to progFlags since multiple can be set,
so this is effectively flags rather than a single type.
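To illustrate why flags fit better than a single type, here is a sketch of a bit-set enum used directly in a candidate struct. The individual flag names and the candidate fields are assumptions made for the example.

```go
package main

import "fmt"

// progFlags is a bit set, so several properties can hold at once.
type progFlags int

// The individual flag names below are hypothetical.
const (
	progCandidate progFlags = 1 << iota
	progMinimized
	progSmashed
)

// candidate carries the flags directly, so no conversion between a
// separate set of booleans and the enum is needed.
type candidate struct {
	Prog  string
	Flags progFlags
}

// has reports whether flag is set in f.
func (f progFlags) has(flag progFlags) bool { return f&flag != 0 }

func main() {
	c := candidate{Prog: "example", Flags: progCandidate | progMinimized}
	fmt.Println("minimized:", c.Flags.has(progMinimized))
	fmt.Println("smashed:", c.Flags.has(progSmashed))
}
```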
|
| |
|
|
|
|
|
| |
Start switching from host.Features to flatrpc.Features.
This change is supposed to be a no-op,
just to reduce future diffs that will change
how we obtain features.
|
| |
|
|
|
|
| |
Remove things that are only needed for target VM communication:
conditional compression, timeout scaling, traffic stats.
To minimize diffs when we switch target VM communication to flatrpc.
|
| |
| |
The fuzzer doesn't much need timeouts for the RPC connection:
if it does not receive new programs, we will kill it
due to "no output" anyway.
But timeouts are problematic when we do parallel calls (Exchange),
e.g. one call can cancel the timeout of an existing call.
They will be even more problematic if we also send
notifications about programs the fuzzer started executing in parallel.
And they also marginally slow things down.
Disable timeouts in the fuzzer.
|
| |
| |
Currently we store whole CheckResult in the manager and send
it back to fuzzers. It is somewhat large for both storing
in memory and sending each time to fuzzers.
We already clear DisabledCalls in CheckResult before storing it
to save space. But we also don't need to store/send EnabledSyscalls.
Currently we use CheckResult.GlobFiles in the fuzzer to update
prog package, but we don't need to do it. That's a leftover from
"fuzzer in the VM" times. We don't generate programs in the fuzzer
anymore.
The only bit we really need is CheckResult.Features, so store/send
just that.
|
| |
|
|
|
|
|
| |
Add ability for each package to create and export own stats.
Each stat is self-contained, describes how it should be presented,
and there is no need to copy them from one package to another.
Stats also keep historical data and allow building graphs over time.
|
| |
|
|
|
|
|
|
| |
RPC compression takes up to 10% of CPU time in profiles,
but it's unlikely to be beneficial for local VM runs
(we are mostly copying memory in this case).
Enable RPC compression based on the VM type
(local VMs don't use it, remote machines use it).
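A sketch of choosing compression by VM type. The message does not list which VM types count as local, so the localVMTypes set and the useRPCCompression helper below are assumptions for illustration only.

```go
package main

import "fmt"

// localVMTypes is a hypothetical set of VM types that run on the same
// machine as the manager. The real list is not given in the message.
var localVMTypes = map[string]bool{
	"qemu": true,
}

// useRPCCompression reports whether RPC traffic should be compressed:
// local VMs skip it, anything else is treated as a remote machine.
func useRPCCompression(vmType string) bool {
	return !localVMTypes[vmType]
}

func main() {
	for _, typ := range []string{"qemu", "gce"} {
		fmt.Printf("%s: compression=%v\n", typ, useRPCCompression(typ))
	}
}
```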
|
| |
| |
Instead of doing fuzzing in parallel in the running VMs, make all decisions
in the host syz-manager process.
Instantiate and keep a fuzzer.Fuzzer object in syz-manager and update
the RPC between syz-manager and syz-fuzzer to exchange exact programs to
execute and their resulting signal and coverage.
To optimize the networking traffic, exchange mostly only the difference
between the known max signal and the detected signal.
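The traffic optimization can be sketched as a set difference: only elements that are not already in the known max signal need to be sent. The signal representation and the function names here are assumptions, not the actual types used by syz-manager.

```go
package main

import "fmt"

// signal is a set of signal elements. This representation is an
// assumption made for the example.
type signal map[uint32]struct{}

// newSignal returns the detected elements that are missing from
// maxSignal, preserving their order in detected.
func newSignal(maxSignal signal, detected []uint32) []uint32 {
	var diff []uint32
	for _, elem := range detected {
		if _, ok := maxSignal[elem]; !ok {
			diff = append(diff, elem)
		}
	}
	return diff
}

// merge adds elems to the set, as the receiver of a diff would do.
func (s signal) merge(elems []uint32) {
	for _, elem := range elems {
		s[elem] = struct{}{}
	}
}

func main() {
	maxSig := signal{1: {}, 2: {}}
	diff := newSignal(maxSig, []uint32{2, 3, 4})
	fmt.Println("elements to send:", diff)
	maxSig.merge(diff)
	fmt.Println("max signal size:", len(maxSig))
}
```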
|
| |
|
|
|
|
|
|
| |
pkg/fuzzer and syz-manager have common corpus functionality that can
well be unified.
Create a separate pkg/corpus package that would be used by both of them.
It will simplify further work of moving pkg/fuzzer to the host.
|
| |
|
|
|
| |
In case of non-squashed programs we can leverage our descriptions in a
much better way than just blind mutations of binary blobs.
|
| |
|
|
|
|
|
| |
From the dashboard we receive logs; from syz-hub, ready reproducers.
If we failed to find a repro from the log, report a failure back
to the dashboard. If we succeeded, prepend the options.
|
| |
|
|
|
|
|
|
|
|
|
| |
There are cases when syz-manager is killed before it could finish bug
reproduction. If the bug is frequent, it's not a problem - we might have
more luck next time. However, if the bug happened only once, we risk
never finding a repro.
Let syz-managers periodically query the dashboard for crash logs to
reproduce. Later we can reuse the same API to move repro sharing
functionality out from syz-hub.
|
| |
|
|
|
|
|
| |
If we received an invalid program from the fuzzer, log it as an error.
This should never happen under normal conditions.
Include the exact error text in log messages.
|
| |
|
|
|
| |
This will help avoid a circular dependency pkg/vcs -> pkg/report ->
pkg/vcs.
|
| |
|
|
| |
Amend oops and oopsFormat to contain report type.
|
| |
|
|
|
|
|
|
|
| |
At times, syz-hub gets broken and no syz-manager instance can connect to
it for quite a while. This basically prevents corpus rotations and
reproducer generation from happening.
If syz-hub is still unreachable after 3 connection attempts,
give up and jump to phaseTriagedHub unconditionally.
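A sketch of the bounded retry, assuming the connection step is passed in as a function. The names connectOrGiveUp, maxHubAttempts, and errHubDown are hypothetical. A false result is where the caller would move on to the next phase without the hub.

```go
package main

import (
	"errors"
	"fmt"
)

// maxHubAttempts mirrors the 3 attempts named in the message.
const maxHubAttempts = 3

// errHubDown is a hypothetical error used by the example.
var errHubDown = errors.New("hub unreachable")

// connectOrGiveUp calls connect up to maxHubAttempts times. It returns
// whether a connection succeeded and how many attempts were made.
func connectOrGiveUp(connect func() error) (bool, int) {
	for attempt := 1; attempt <= maxHubAttempts; attempt++ {
		if err := connect(); err == nil {
			return true, attempt
		}
	}
	return false, maxHubAttempts
}

func main() {
	ok, attempts := connectOrGiveUp(func() error { return errHubDown })
	fmt.Printf("connected=%v after %d attempts\n", ok, attempts)
}
```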
|
| |
|
| |
There is no need to use RPC prefix. It is already a part of the element path.
|
| |
|
|
| |
Permit empty hub_key to indicate oauth.
|
| |
|
|
|
|
| |
Add sys/targets.Timeouts struct that parametrizes timeouts throughout the system.
The struct allows controlling syscall/program/no output timeouts for OS/arch/VM/etc.
See comment on the struct for more details.
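As a rough illustration of a single struct that parametrizes timeouts, here is a sketch with one field per timeout kind named in the message. The field names, the scaled method, and the concrete durations are assumptions and may differ from sys/targets.Timeouts.

```go
package main

import (
	"fmt"
	"time"
)

// timeouts groups the three timeout kinds named in the message.
// Field names and the scaling helper are assumptions for this sketch.
type timeouts struct {
	Syscall  time.Duration
	Program  time.Duration
	NoOutput time.Duration
}

// scaled returns a copy with every timeout multiplied by factor,
// e.g. for a slower OS/arch/VM combination.
func (t timeouts) scaled(factor int) timeouts {
	f := time.Duration(factor)
	return timeouts{
		Syscall:  t.Syscall * f,
		Program:  t.Program * f,
		NoOutput: t.NoOutput * f,
	}
}

func main() {
	// Illustrative values only.
	base := timeouts{
		Syscall:  50 * time.Millisecond,
		Program:  5 * time.Second,
		NoOutput: 5 * time.Minute,
	}
	fmt.Printf("base: %+v\n", base)
	fmt.Printf("slow: %+v\n", base.scaled(3))
}
```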
|
| |
|
|
|
| |
r.Progs is not filled anymore (for legacy managers).
Use r.Inputs instead of r.Progs everywhere.
|
| |
|
|
|
|
| |
Actually send domain to the hub...
Update #2095
|
| |
| |
Hub input domain identifier (optional).
The domain is used to avoid duplicate work (input minimization, smashing)
across multiple managers testing similar kernels and connected to the same hub.
If two managers are in the same domain, they will not do input minimization after each other.
If additionally they are in the same smashing sub-domain, they will also not do smashing
after each other.
By default (empty domain) all managers testing the same OS are placed into the same domain,
this is a reasonable setting if managers test roughly the same kernel. In this case they
will not do minimization nor smashing after each other.
The setting can be either a single identifier (e.g. "foo") which will affect both minimization
and smashing; or two identifiers separated with '/' (e.g. "foo/bar"), in this case the first
identifier affects minimization and both affect smashing.
For example, if managers test different Linux kernel versions with different tools,
a reasonable use of domains on these managers can be:
- "upstream/kasan"
- "upstream/kmsan"
- "upstream/kcsan"
- "5.4/kasan"
- "5.4/kcsan"
- "4.19/kasan"
Fixes #2095
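The domain format described above can be sketched as follows. The helper names are hypothetical, and comparing the full "foo/bar" string for smashing is one reading of "both affect smashing", not a confirmed implementation detail.

```go
package main

import (
	"fmt"
	"strings"
)

// splitDomain returns the minimization domain and the smashing domain
// for a hub domain setting. For "foo" both are "foo"; for "foo/bar" the
// minimization domain is "foo" and the smashing domain is "foo/bar".
func splitDomain(domain string) (minimize, smash string) {
	if i := strings.IndexByte(domain, '/'); i >= 0 {
		return domain[:i], domain
	}
	return domain, domain
}

// skipMinimization reports whether managers with domains a and b
// would not redo input minimization after each other.
func skipMinimization(a, b string) bool {
	ma, _ := splitDomain(a)
	mb, _ := splitDomain(b)
	return ma == mb
}

// skipSmashing reports whether they would also not redo smashing.
func skipSmashing(a, b string) bool {
	_, sa := splitDomain(a)
	_, sb := splitDomain(b)
	return sa == sb
}

func main() {
	a, b := "upstream/kasan", "upstream/kmsan"
	fmt.Println("skip minimization:", skipMinimization(a, b))
	fmt.Println("skip smashing:", skipSmashing(a, b))
}
```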
|
| |
|
|
|
|
|
|
|
| |
We have program "validity" check duplicated 4 times
(initially it was just "does it deserialize?").
Then we added program length and disabled syscall checks.
But some of the sites have only a subset of checks.
Factor out program checking procedure into a separate function
and use it at all sites.
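A sketch of a single checking function that bundles the three checks mentioned above (deserialization, length, disabled syscalls). The toy parse function, the error values, and checkProgram itself are hypothetical stand-ins, since the real program format is not shown here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Hypothetical error values for the three checks.
var (
	errParse    = errors.New("program does not deserialize")
	errTooLong  = errors.New("program is too long")
	errDisabled = errors.New("program uses a disabled syscall")
)

// parse is a toy deserializer: one whitespace-separated call name per
// token. It stands in for the real program parser.
func parse(data string) ([]string, error) {
	calls := strings.Fields(data)
	if len(calls) == 0 {
		return nil, errParse
	}
	return calls, nil
}

// checkProgram runs every check in one place so that all call sites
// apply the same set of rules.
func checkProgram(data string, maxCalls int, disabled map[string]bool) error {
	calls, err := parse(data)
	if err != nil {
		return err
	}
	if len(calls) > maxCalls {
		return errTooLong
	}
	for _, call := range calls {
		if disabled[call] {
			return errDisabled
		}
	}
	return nil
}

func main() {
	disabled := map[string]bool{"reboot": true}
	fmt.Println(checkProgram("open read close", 3, disabled))
	fmt.Println(checkProgram("open reboot", 3, disabled))
}
```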
|
| |
| |
We have _some_ limits on program length, but they are really soft.
When we ask to generate a program with 10 calls, sometimes we get
100-150 calls. There are also no checks when we accept external
programs from corpus/hub. Issue #1630 contains an example where
this crashes the VM (the executor limit of 1000 resources is
violated). Larger programs also harm the process overall (slower,
consume more memory, lead to monster reproducers, etc).
Add a set of measures for hard control over program length.
Ensure that generated/mutated programs are not too long;
drop too long programs coming from corpus/hub in manager;
drop too long programs in hub.
As a bonus, ensure that mutation doesn't produce programs with
0 calls (which is currently possible and happens).
Fixes #1630
|
| |
|
|
|
|
| |
Never send more than 100K, this is never healthy but happens episodically
due to various reasons: problems with fallback coverage, bugs in kcov,
fuzzer exploiting our infrastructure, etc.
|
| |
|
|
|
|
|
|
|
| |
Use a random subset of syscalls/corpus/coverage for each individual VM run.
The hypothesis is that this should allow the fuzzer to get more coverage
and find more bugs in a saturated state (stuck in a local optimum).
See the issue and comments for details.
Update #1348
|
| |
|
|
|
| |
Rename some features in preparation for subsequent changes
which will align names across the code base.
|
| |
|
|
|
|
| |
pkg/repro only enables leak checking when report type is MemoryLeak.
Since repros from hub always have Unknown type, repro won't reproduce leaks.
Always set report type to MemoryLeak on leak instances.
|
| |
|
|
|
|
|
|
|
|
|
| |
Over time we relaxed parsing to handle all kinds of invalid programs
(excessive/missing args, wrong types, etc).
This is useful when reading old programs from corpus.
But this is harmful for e.g. reading test inputs, as they can become arbitrarily outdated.
For runtests this creates the additional problem of executing not
what is actually written in the test (or at least what the author meant).
Add strict parsing mode that does not tolerate any errors.
For now it just checks excessive syscall arguments.
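The difference between the two modes can be sketched for the one check named above, excessive syscall arguments. The parseMode type and trimArgs are hypothetical names. In the relaxed mode extra arguments are dropped, in the strict mode they are an error.

```go
package main

import "fmt"

// parseMode selects how tolerant parsing is. Names are hypothetical.
type parseMode int

const (
	nonStrict parseMode = iota
	strict
)

// trimArgs enforces the declared argument count for one call.
// Non-strict mode silently drops extra arguments; strict mode
// reports them as an error.
func trimArgs(call string, args []string, want int, mode parseMode) ([]string, error) {
	if len(args) <= want {
		return args, nil
	}
	if mode == strict {
		return nil, fmt.Errorf("%s: excessive syscall arguments: got %d, want %d",
			call, len(args), want)
	}
	return args[:want], nil
}

func main() {
	args := []string{"r0", "0x1"}
	got, err := trimArgs("close", args, 1, nonStrict)
	fmt.Println(got, err)
	got, err = trimArgs("close", args, 1, strict)
	fmt.Println(got, err)
}
```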
|
| |
|
|
|
|
|
|
|
|
|
| |
The tool is run as:
$ syz-runtest -config manager.config
This runs all programs from sys/*/test/* in different modes
on actual VMs and checks results.
Fixes #603
|
|
|
Move work with hub into a separate file and fully separate
its state from the rest of the manager state.
First step towards splitting manager into manageable parts.
This also required reworking stats as they are used throughout the code.
Update #538
Update #605
|