| Commit message | Author | Age | Files | Lines |
| |
Support for:
- polling for AI jobs
- handling completion of AI jobs
- submitting job trajectory logs
- basic visualization for AI jobs
|
| |
`any` is now preferred over `interface{}` in Go.
|
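For reference, `any` has been a built-in alias for `interface{}` since Go 1.18, so the change is purely cosmetic; a minimal sketch:

```go
package main

import "fmt"

// describe accepts any value; `any` is an alias for `interface{}`,
// so both spellings name exactly the same type.
func describe(v any) string {
	return fmt.Sprintf("%T: %v", v, v)
}

func main() {
	var x any = 42
	var y interface{} = x // assignable both ways: the types are identical
	fmt.Println(describe(y))
}
```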
| |
Remove duplicated request-deserialization code by using middleware.
|
| |
#6070 explains the problem of data propagation.
1. Add a weekly /cron/update_coverdb_subsystems job.
2. Stop updating subsystems from the coverage receiver API.
|
| |
Don't specify the subsystem revision in the dashboard config and instead
let it be nested in the registered subsystems. This reduces the amount
of manual work needed to switch syzbot to a newer subsystem list.
|
| |
For some reason, the default ReplyTo field is converted to a Reply-To
header that's not well understood by the mailing lists.
Set an In-Reply-To header explicitly.
|
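Setting the header explicitly might look roughly like this; buildHeaders is a hypothetical helper for illustration, not the dashboard's actual code:

```go
package main

import (
	"fmt"
	"net/textproto"
)

// buildHeaders sketches the fix described above: instead of relying on
// a ReplyTo field that gets rendered as a Reply-To header (which some
// mailing lists mishandle), set In-Reply-To explicitly so list software
// threads the message under the original one.
func buildHeaders(from, to, parentMsgID string) textproto.MIMEHeader {
	h := textproto.MIMEHeader{}
	h.Set("From", from)
	h.Set("To", to)
	h.Set("In-Reply-To", parentMsgID) // controls threading, not reply routing
	return h
}

func main() {
	h := buildHeaders("syzbot@example.com", "list@example.com", "<parent@mail>")
	fmt.Println(h.Get("In-Reply-To"))
}
```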
| |
The API method can be used to send a raw email on behalf of the GAE
instance.
|
| |
getSpannerClient returns the prod client by default.
|
| |
1. Init the coveragedb client once and propagate it through the context to enable mocking.
2. Always init the coverage handlers; it simplifies testing.
3. Read webGit and the coveragedb client from the ctx to make them mockable.
4. Use int for file line numbers and int64 for merged coverage.
5. Add tests.
|
| |
They are shorter, more readable, and don't require temp vars.
|
| |
All API handlers expect the input stream to be ungzipped.
This handler shouldn't be an exception.
|
| |
Bug: the payload is serialized JSON, not a string.
|
| |
create_upload_url: failed to unmarshal response: json: cannot unmarshal number into Go value of type string
|
| |
1. Make interface testable.
2. Add Spanner interfaces.
3. Generate mocks for proxy interfaces.
4. Test SaveMergeResult.
5. Test MergeCSVWriteJSONL and coveragedb.SaveMergeResult integration.
|
| |
The previous implementation stored only a summary of the processed records.
The summary was <1 GB, so a single processing node was able to manipulate the data.
The current implementation stores all the details about the records read, to make post-processing more flexible.
This change was needed to get access to the source manager name, and it will help to analyze other details.
The new implementation requires 20 GB of memory to process a single day of records.
A CSV log interning experiment allowed merging with 10 GB.
Aggregating a quarter of data will cost ~100 times more.
The alternative is stream processing: we can process the data kernel-file-by-file.
That divides memory consumption by ~15,000.
This approach is implemented here.
We batch coverage signals by file and store the per-file results in a GCS JSONL file.
See https://jsonlines.org/ to learn about JSONL.
|
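The per-file batching can be sketched roughly like this; the record schema and in-memory writer are illustrative, since the real code uses syzkaller's coveragedb schema and writes to a GCS object:

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
)

// fileCoverage is an illustrative record, not the actual schema.
type fileCoverage struct {
	FilePath string `json:"file_path"`
	Line     int    `json:"line"`
	HitCount int64  `json:"hit_count"`
}

// writeJSONL emits one JSON object per line (https://jsonlines.org/),
// so a consumer can stream records file-by-file instead of loading
// the whole aggregate into memory.
func writeJSONL(recs []fileCoverage) (string, error) {
	var buf bytes.Buffer
	w := bufio.NewWriter(&buf)
	enc := json.NewEncoder(w) // Encode() appends '\n' after each object
	for _, r := range recs {
		if err := enc.Encode(r); err != nil {
			return "", err
		}
	}
	w.Flush()
	return buf.String(), nil
}

func main() {
	out, _ := writeJSONL([]fileCoverage{
		{FilePath: "fs/ext4/inode.c", Line: 10, HitCount: 3},
		{FilePath: "fs/ext4/inode.c", Line: 11, HitCount: 1},
	})
	fmt.Print(out)
}
```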
| |
I don't see any visible problems, but the records are not created in the DB.
Let's report the number of records created at the end of the batch step,
and also log the names of the managers.
|
| |
Syscall attributes are extended with a fsck command field, which lets
file system mount definitions specify a fsck-like command to run. This
is required because every file system has its own fsck command
invocation style.
When uploading a compressed image asset to the dashboard, syz-manager
also runs the fsck command and logs its output over the dashapi.
The dashboard stores these fsck logs in the database.
This has been requested by fs maintainer Ted Tso, who would like to
quickly understand whether a filesystem is corrupted before
looking at a reproducer in more detail. Ultimately, this could be used
as an early triage signal to determine whether a bug is obviously
critical.
|
| |
Use a common number of attempts (10) everywhere in dashboard/app
for db.RunInTransaction() via the custom wrapper runInTransaction().
This should fix the issues where ErrConcurrentTransaction occurs
after the default number of attempts (3). If the maximum (10) is not
enough, that hints at a problem in the transaction function itself
and warrants further debugging.
For the most valuable transactions, in createBugForCrash() and
reportCrash(), let's keep 30 attempts as is.
Fixes: https://github.com/google/syzkaller/issues/5441
|
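The retry pattern can be sketched as follows; runInTransaction here is a simplified stand-in for the dashboard's actual wrapper around db.RunInTransaction(), and errConcurrent stands in for datastore's ErrConcurrentTransaction:

```go
package main

import (
	"errors"
	"fmt"
)

// errConcurrent simulates datastore contention for this sketch.
var errConcurrent = errors.New("concurrent transaction")

// runInTransaction retries the transaction body up to attempts times,
// but only on contention; any other outcome is returned immediately.
func runInTransaction(attempts int, body func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = body(); !errors.Is(err, errConcurrent) {
			return err // success or a non-retryable error
		}
	}
	return err // contention persisted through all attempts
}

func main() {
	calls := 0
	err := runInTransaction(10, func() error {
		calls++
		if calls < 4 {
			return errConcurrent // simulate contention on the first 3 tries
		}
		return nil
	})
	fmt.Println(calls, err)
}
```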
| |
Instead of iterating over all possible seq values, first query the
maximum existing value outside of the transaction and then only consider
the higher numbers inside the transaction.
That should avoid "operating on too many entity groups in a single
transaction" errors that we observe on syzbot.
Closes #5440.
|
| |
Make it return a single slice.
|
| |
For some kinds of logs (e.g. crash logs), it's preferable to keep the
tail, because it contains the more important information.
Since for other kinds of logs we don't expect any cuts to happen at
all, let's just always cut this way.
Refactor the putText() function and its usages in the dashboard.
Closes #5250.
|
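The tail-preserving cut can be sketched like this; the function name and marker are illustrative, not the actual putText() code:

```go
package main

import "fmt"

// cutKeepingTail truncates text to at most max trailing bytes, keeping
// the end of the log: for crash logs, the report and stack trace at the
// tail matter more than the boot output at the head.
func cutKeepingTail(text string, max int) string {
	if len(text) <= max {
		return text
	}
	return "<truncated>\n" + text[len(text)-max:]
}

func main() {
	log := "boot output...\nKASAN: use-after-free in foo()"
	fmt.Println(cutKeepingTail(log, 30))
}
```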
| |
If there are non-revoked reproducers, we always prefer them to revoked
ones, and we do update the priority once we have revoked a reproducer.
But if the only reproducer was revoked, we still give that crash a
higher priority than other crashes that never had a reproducer attached.
If "repro revoked" should have the same priority as "no repro",
we just need to update Crash.UpdateReportingPriority.
Fixes: https://github.com/google/syzkaller/issues/4992.
|
| |
Let's resist timing attacks.
|
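In Go, the standard way to make a secret comparison timing-safe is crypto/subtle; a minimal sketch (checkKey is an illustrative name):

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// checkKey compares a presented secret against the expected one in
// constant time, so the comparison latency does not leak how many
// leading bytes matched (the classic timing side channel of ==).
func checkKey(got, want string) bool {
	return subtle.ConstantTimeCompare([]byte(got), []byte(want)) == 1
}

func main() {
	fmt.Println(checkKey("s3cret", "s3cret"), checkKey("guess!", "s3cret"))
}
```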
| |
We're currently receiving only "unauthorized".
|
| |
dashboard/app now knows more about subsystems.
|
| |
Filter out the obviously unnecessary bugs.
|
| |
As it is problematic to set up automatic bidirectional sharing of
reproducer files between namespaces, let's support the ability to
manually request a reproduction attempt on a specific syz-manager
instance. That should help for the time being.
|
| |
This will help us understand exactly how we have arrived at the
reproducer.
|
| |
We're periodically seeing "failed to report build error" errors on the
CI, but the error message is missing the necessary details.
|
| |
Poll missing backport commits and, once a missing backport is found to
be present in the fuzzed trees, don't display it on the web dashboard
and send an email with the suggestion to close the bug.
Closes #4425.
|
| |
In some cases, we derail during bug reproduction and end up finding and
reporting a reproducer for another issue. It causes no problems since
it's bucketed using the new title, but it's difficult to trace such
situations - on the original bug page, there are no failed reproduction
logs and on the new bug page it looks as if we intended to find a
reproducer for the bug with the new title.
Let's record bug reproduction logs in this case, so that we'd also see
a failed bug reproduction attempt on the original bug page.
|
| |
There are cases when syz-manager is killed before it could finish bug
reproduction. If the bug is frequent, it's not a problem - we might have
more luck next time. However, if the bug happened only once, we risk
never finding a repro.
Let syz-managers periodically query the dashboard for crash logs to
reproduce. Later we can reuse the same API to move the repro sharing
functionality out of syz-hub.
|
| |
These statistics allow us to better estimate the amount of coverage
that is lost every time a syzbot instance is restarted.
|
| |
Add an emergency stop button that can be used by any admin. After it's
clicked twice, syzbot stops all reporting and recording of new bugs.
It's assumed that the stop mode is revoked by manually deleting an entry
from the database.
|
| |
The query may take up to 100ms in some cases, while the result changes
on quite rare occasions.
Let's use a cached version of the data when rendering UI pages.
We don't need extra tests because it's already exercised by existing
tests that trigger web endpoints.
|
| |
Now that we mock the config as a whole and not parts of it, these
functions have boiled down to one-liners. We don't need them anymore.
|
| |
In many cases we just want to access the namespace's config.
Introduce a special helper function to keep the code shorter and more concise.
|
| |
We used to have a single global `config` variable and access it
throughout the whole dashboard application.
However, this approach has made test writing more and more complicated
-- sometimes we want the config to be only slightly different, so that
it's not worth adding new namespaces, and sometimes we have to test how
the dashboard handles config changes over time.
This has already led to a number of hacky contextWithXXX methods that
mocked various parts of the global variable. The rest of the code still
had to sometimes use `config` directly and sometimes invoke getXXX(c)
methods, which is inconsistent and error-prone.
With more and more situations where we need to patch the config
(see #4118), let's refactor the application to always access the
config via the getConfig(c) method. This allows us to uniformly patch
the config and be sure that the non-patched copy is not accessible from
anywhere else.
|
| |
If there's no config record for the manager, activeManager would return
a nil pointer to ConfigManager. Assume the priority to be 0 in this
case.
Update the admin.go function that recalculates priorities.
|