path: root/pkg/repro
* all: use any instead of interface{} (Dmitry Vyukov, 2025-12-22, 1 file, -4/+4)
  any is now the preferred spelling of interface{} in Go.
* pkg/repro: validate the result (Aleksandr Nogikh, 2025-06-26, 2 files, -11/+139)
  The whole pkg/repro algorithm is very sensitive to random kernel crashes,
  yet all other parts of the system rely on pkg/repro reproducers being
  reliable enough to draw meaningful conclusions from running them. A single
  unrelated kernel crash during repro extraction may divert the whole
  process, since all the checks we do along the way (e.g. during
  minimization or when we drop prog opts) assume that if the kernel didn't
  crash, it was because the removed part was essential for reproduction, not
  because our reproducer was already broken.
  Since such a problem may happen at any moment, let's do a single
  validation check at the very end of repro generation. Overall, these cases
  are not very frequent, so it's not worth re-checking every step. Calculate
  the reliability score of the reproducer and use a 15% default cut-off for
  flaky results.
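  The final validation step described above can be sketched roughly as
  follows. The function and constant names are illustrative, not the actual
  pkg/repro API; only the 15% cut-off comes from the commit message.

  ```go
  package main

  import "fmt"

  // flakyCutoff is the default reliability threshold from the commit
  // message: reproducers that crash in fewer than 15% of replay runs are
  // treated as flaky.
  const flakyCutoff = 0.15

  // reliabilityScore is the fraction of replay runs that reproduced the crash.
  func reliabilityScore(crashes, runs int) float64 {
  	if runs == 0 {
  		return 0
  	}
  	return float64(crashes) / float64(runs)
  }

  // isReliable applies the cut-off to decide whether to keep the reproducer.
  func isReliable(crashes, runs int) bool {
  	return reliabilityScore(crashes, runs) >= flakyCutoff
  }

  func main() {
  	fmt.Println(isReliable(1, 10)) // 10% < 15%: treated as flaky
  	fmt.Println(isReliable(3, 10)) // 30%: accepted
  }
  ```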
* pkg/repro: add logging to repro tests (Aleksandr Nogikh, 2025-06-26, 2 files, -9/+17)
  This will aid in debugging the tests that failed.
* syz-cluster: report reproducers for findings (Aleksandr Nogikh, 2025-06-23, 1 file, -0/+12)
  Move C repro generation from syz-manager to pkg/repro to avoid code
  duplication.
* pkg/repro: abort prog minimization on error (Aleksandr Nogikh, 2025-04-24, 1 file, -1/+16)
  If an error happens during prog minimization, abort it instead of trying
  to proceed further.
* vm/dispatcher: make pool.Run cancellable (Aleksandr Nogikh, 2025-04-23, 2 files, -3/+8)
  Make the pool.Run() function take a context.Context so that the callback
  passed to it can be aborted, or its scheduling aborted if it's not yet
  running. Otherwise, if the callback has not yet started and the pool's
  Loop is aborted, we risk waiting in pool.Run() forever, which prevents
  the normal shutdown of repro.Run() and, consequently, of the DiffFuzzer
  functionality.
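  The shape of the fix can be sketched with a toy pool; the real
  vm/dispatcher API differs, but the point is that Run selects on
  ctx.Done() both while waiting to be scheduled and while waiting for the
  callback to finish, so a dead pool loop can no longer block it forever.

  ```go
  package main

  import (
  	"context"
  	"errors"
  	"fmt"
  )

  // pool is a toy stand-in for a dispatcher pool: jobs are consumed by a
  // separate loop (not shown), which may be shut down at any time.
  type pool struct {
  	jobs chan func()
  }

  var errCanceled = errors.New("pool: run canceled")

  // Run schedules job and waits for it, but gives up as soon as ctx is done.
  func (p *pool) Run(ctx context.Context, job func()) error {
  	done := make(chan struct{})
  	wrapped := func() { job(); close(done) }
  	select {
  	case p.jobs <- wrapped: // scheduled; now wait for completion or cancellation
  	case <-ctx.Done():
  		return errCanceled
  	}
  	select {
  	case <-done:
  		return nil
  	case <-ctx.Done():
  		return errCanceled
  	}
  }

  func main() {
  	p := &pool{jobs: make(chan func())} // nothing consumes jobs: loop is "dead"
  	ctx, cancel := context.WithCancel(context.Background())
  	cancel() // simulate the pool loop being aborted before the job starts
  	fmt.Println(p.Run(ctx, func() {}))
  }
  ```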
* tools/syz-execprog: support running unsafe programs (Dmitry Vyukov, 2024-11-26, 1 file, -1/+1)
* pkg/repro: accept a cancellable context (Aleksandr Nogikh, 2024-11-13, 3 files, -83/+80)
  Refactor pkg/repro to accept a context.Context object. This makes it
  look more similar to other package interfaces and will eventually let us
  abort currently running repro jobs without having to shut down the whole
  application. Simplify the code by factoring out the parameters common to
  both RunSyzRepro() and RunCRepro().
* pkg/manager: set more http fields before calling Serve (Dmitry Vyukov, 2024-11-07, 2 files, -4/+3)
  Pools and ReproLoop are always created on start, so there is no need to
  support lazy setting for them; it only complicates the code and makes it
  harder to reason about. Also introduce vm.Dispatcher as an alias for
  dispatcher.Pool, as it's the only specialization we use in the project.
* pkg/repro: adjust log levels (Aleksandr Nogikh, 2024-10-25, 1 file, -4/+4)
  Some of the levels were just too high, especially considering that the
  messages are printed via log.Logf().
* pkg/repro: add a fast mode (Aleksandr Nogikh, 2024-10-25, 1 file, -23/+51)
  To be used when a quick reproducer is necessary: it omits C repro
  generation and a number of option simplifications.
* executor: better handling of hung test processes (Dmitry Vyukov, 2024-10-24, 1 file, -0/+2)
  Currently we kill hung processes and consider the corresponding test
  finished. We don't kill/wait for the actual test subprocess (we don't
  know its pid to kill, and waiting would presumably hang). This has two
  problems:
  1. If the hung process causes a "task hung" report, we can't reproduce
     it, since the test finished too long ago (the manager thinks it's
     finished and discards the request).
  2. The test process still consumes per-pid resources.
  Explicitly detect and handle such cases: the manager keeps these hung
  tests forever, and we assign a new proc id for future processes (don't
  reuse the hung one).
* pkg/repro: be strict about titles during opt simplifications (Aleksandr Nogikh, 2024-08-27, 1 file, -38/+48)
  Ideally, we should be mindful of this during the whole repro process,
  but there's always a chance that different titles are manifestations of
  the same problem. So let's stay tolerant of different titles during prog
  extraction and minimization, but carefully check them during opt
  simplifications and C repro extraction.
* pkg/repro: don't exaggerate timeouts (Aleksandr Nogikh, 2024-08-27, 1 file, -28/+34)
  Our largest timeout is 6 minutes, so anything between 1.5 and 6 minutes
  ended up with a 9-minute timeout. That's too much. Consider the time it
  actually took to crash the kernel.
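  The idea reads roughly like the sketch below: derive the testing timeout
  from how long the crash actually took, clamped to sane bounds, instead of
  jumping straight to the largest timeout. The multiplier and bounds here
  are assumptions for illustration, not the real pkg/repro constants.

  ```go
  package main

  import (
  	"fmt"
  	"time"
  )

  // testTimeout picks a timeout proportional to the observed time-to-crash,
  // clamped between an assumed floor and the 6-minute ceiling mentioned above.
  func testTimeout(crashedAfter time.Duration) time.Duration {
  	const (
  		minTimeout = 30 * time.Second
  		maxTimeout = 6 * time.Minute
  	)
  	t := 3 * crashedAfter // leave headroom over the observed crash time
  	if t < minTimeout {
  		return minTimeout
  	}
  	if t > maxTimeout {
  		return maxTimeout
  	}
  	return t
  }

  func main() {
  	// A crash that took 100s gets a 5m timeout rather than the 9m it
  	// would have gotten under the old scheme.
  	fmt.Println(testTimeout(100 * time.Second))
  }
  ```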
* pkg/repro: increase the minimum testing timeouts (Aleksandr Nogikh, 2024-08-27, 1 file, -2/+2)
  15 seconds is an unreasonably small timeout. Let's do at least 30
  seconds first, then at least 100 seconds.
* pkg/repro: consider the executor info from the crash report (Aleksandr Nogikh, 2024-08-27, 1 file, -32/+63)
  1) If we know the tentative reproducer, try only it before the
     bisection; it's the best single candidate program.
  2) During bisection, never drop that program.
* prog: replace MinimizeParams with MinimizeMode (Dmitry Vyukov, 2024-08-07, 1 file, -17/+14)
  Callers shouldn't control lots of internal details of minimization (more
  params just means more variations to test, and in the end the params are
  just a more convoluted way to say whether we minimize for the corpus or
  for a crash). Two bools can express 4 options, but only 3 of them make
  sense, and when I see MinimizeParams{} in the code, it's unclear what it
  means. Replace the params with a mode. Also, "crash" minimization is not
  "light", it's just different: e.g. we can simplify int arguments for
  reproducers (esp. in snapshot mode), but we don't need that for the
  corpus.
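  The bools-to-mode refactoring pattern looks roughly like this. The
  constant names below are illustrative, not the actual prog package API;
  they only mirror the three meaningful options the message mentions.

  ```go
  package main

  import "fmt"

  // MinimizeMode replaces a struct of boolean knobs with one named mode,
  // so call sites state intent instead of a bag of flags.
  type MinimizeMode int

  const (
  	MinimizeCorpus    MinimizeMode = iota // minimize a program for the corpus
  	MinimizeCrash                         // minimize a crash reproducer
  	MinimizeCallsOnly                     // only remove calls, keep arguments
  )

  func (m MinimizeMode) String() string {
  	switch m {
  	case MinimizeCorpus:
  		return "corpus"
  	case MinimizeCrash:
  		return "crash"
  	case MinimizeCallsOnly:
  		return "calls-only"
  	}
  	return "unknown"
  }

  func main() {
  	// Unlike MinimizeParams{}, the mode is self-describing at the call site.
  	fmt.Println(MinimizeCrash)
  }
  ```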
* syz-manager: move fullReproLog() to pkg/repro (Aleksandr Nogikh, 2024-08-06, 1 file, -0/+10)
* syz-manager: still ignore log parse problems (Aleksandr Nogikh, 2024-07-17, 1 file, -1/+3)
  It seems that this error may come up in absolutely valid and reasonable
  cases. Restore the special casing.
* syz-manager: refactor empty crash log errors (Aleksandr Nogikh, 2024-07-17, 1 file, -3/+1)
  Now that we do not take the programs from the SSH-based logs, the error
  does look surprising, so let's print it with log.Errorf().
* pkg/repro: don't minimize to 0 calls (Aleksandr Nogikh, 2024-07-15, 1 file, -0/+5)
  Minimizing to 0 calls leads to an empty execution log, which causes an
  immediate exit of tools/syz-execprog, which in turn is recognized as
  "lost connection to machine".
* all: transition to instance.Pool (Aleksandr Nogikh, 2024-07-11, 3 files, -162/+118)
  Rely on instance.Pool to perform fuzzing and bug reproductions. Extract
  the reproduction queue logic into a separate testable class.
* pkg/repro: avoid hitting the prog.MaxCalls limit (Aleksandr Nogikh, 2024-05-27, 2 files, -15/+94)
  When we combine the progs found during prog bisection, there's a chance
  we may exceed the prog.MaxCalls limit. In that case, we get a SYZFATAL
  error and proceed as if it were the target crash, which is absolutely
  wrong. Let's first minimize each individual program before concatenating
  them; that should work in almost all cases.
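  A minimal sketch of the guard, with a toy program representation: each
  bisected program is minimized first, and the combined result is checked
  against the call limit. The types, the minimization stub, and the limit
  value are all illustrative; only prog.MaxCalls as a concept comes from
  the commit message.

  ```go
  package main

  import "fmt"

  const maxCalls = 40 // stand-in for prog.MaxCalls

  // prog models a program as a flat list of calls, for illustration only.
  type prog []string

  // minimize is a stub: pretend minimization roughly halves the program.
  func minimize(p prog) prog {
  	if len(p) <= 1 {
  		return p
  	}
  	return p[:len(p)/2+1]
  }

  // combine minimizes each program before concatenating, then enforces the
  // call limit instead of letting an oversized program hit a fatal error later.
  func combine(progs []prog) (prog, error) {
  	var out prog
  	for _, p := range progs {
  		out = append(out, minimize(p)...)
  	}
  	if len(out) > maxCalls {
  		return nil, fmt.Errorf("combined prog has %d calls, limit is %d", len(out), maxCalls)
  	}
  	return out, nil
  }

  func main() {
  	// Two 30-call programs would blow past the limit if concatenated
  	// as-is; minimizing each one first keeps the result within bounds.
  	got, err := combine([]prog{make(prog, 30), make(prog, 30)})
  	fmt.Println(len(got), err)
  }
  ```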
* prog: make minimization parameters explicit (Aleksandr Nogikh, 2024-05-27, 1 file, -1/+1)
  Add an explicit parameter to only run call removal.
* pkg/instance: use execprog to do basic instance testing (Dmitry Vyukov, 2024-05-27, 3 files, -5/+8)
  When we accept new kernels for fuzzing, we need more extensive testing,
  but syz-ci switched to using syz-manager for this purpose. Now instance
  testing is used only for bisection and patch testing, which do not need
  such extensive image testing (it may even be harmful). So just run a
  simple program as the test. It also uses the same features as the target
  reproducer, so e.g. if the reproducer does not use wifi, we won't test
  it, which reduces the chances of hitting unrelated kernel bugs.
* pkg/csource: remove the Repro option (Aleksandr Nogikh, 2024-05-17, 1 file, -1/+0)
  Enable it unconditionally.
* pkg/repro: don't clear the Repro flag (Aleksandr Nogikh, 2024-05-17, 1 file, -5/+0)
  If C reproducers keep printing "executing program" lines, it will be
  easier to re-use them during repro and patch testing.
* pkg/repro, pkg/ipc: use flatrpc.Feature (Dmitry Vyukov, 2024-05-06, 2 files, -33/+32)
  Start switching from host.Features to flatrpc.Features. This change is
  supposed to be a no-op, just to reduce future diffs that will change how
  we obtain features.
* vm: combine Run and MonitorExecution (Dmitry Vyukov, 2024-04-11, 1 file, -1/+1)
  All callers of Run always call MonitorExecution right after it.
  Combining these two methods allows hiding some implementation details
  and simplifies users of the vm package.
* pkg/repro: check reproducibility before bisecting (Aleksandr Nogikh, 2024-02-22, 1 file, -1/+11)
  In many cases bisection does not seem to bring any results, but it takes
  quite a while to run. Let's save some time by running the whole log
  before the process.
* pkg/repro: abort after 6 chunks (Aleksandr Nogikh, 2024-02-05, 2 files, -2/+2)
  8 feels like too high a number; let's stop reproduction earlier.
* pkg/repro: fix VM creation (Dmitry Vyukov, 2024-01-09, 1 file, -0/+1)
* pkg/repro: fix potential deadlock (Dmitry Vyukov, 2024-01-08, 1 file, -13/+5)
  If an instance fails to boot 3 times in a row, we drop it on the floor.
  If we drop all instances this way, the repro process deadlocks forever.
  Retry indefinitely instead; there is nothing better we can do in this
  case. If the instance becomes alive again, repro will resume.
* pkg/repro: make instance failure log less confusing (Aleksandr Nogikh, 2023-09-11, 1 file, -1/+2)
  Now it looks like a failure of the whole reproduction process. Adjust
  the message to reduce confusion.
* all: use special placeholder for errors (Taras Madan, 2023-07-24, 1 file, -2/+2)
* pkg/repro: tolerate two consecutive run errors (Aleksandr Nogikh, 2023-07-20, 2 files, -4/+10)
  Retrying once has greatly reduced the number of "failed to copy prog to
  VM" errors, but they still periodically pop up. The underlying problem
  is still not 100% known. Supposedly, if a booted VM with an instrumented
  kernel has to wait too long, it can just hang or crash by itself, at
  least on some problematic revisions. Investigating would be quite
  time-consuming -- we'd need a complicated refactoring in order to also
  capture serial output for Copy() failures, and so far that does not seem
  worth it. Let's do 3 runOnInstance() attempts. If the problem still
  persists after that, there's no point in doing more runs -- we'd have to
  determine the exact root cause.
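  The retry policy above amounts to a small bounded-retry loop; a hedged
  sketch, with names and error text chosen for illustration (the real
  runOnInstance() signature differs):

  ```go
  package main

  import (
  	"errors"
  	"fmt"
  )

  // maxAttempts mirrors the "3 runOnInstance() attempts" policy: past that,
  // more retries would not reveal the root cause anyway.
  const maxAttempts = 3

  // runWithRetry calls run up to maxAttempts times and returns the last
  // error if every attempt failed.
  func runWithRetry(run func() error) error {
  	var err error
  	for i := 0; i < maxAttempts; i++ {
  		if err = run(); err == nil {
  			return nil
  		}
  	}
  	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
  }

  func main() {
  	// A transient error that clears on the third try succeeds overall.
  	fails := 0
  	err := runWithRetry(func() error {
  		fails++
  		if fails < 3 {
  			return errors.New("failed to copy prog to VM")
  		}
  		return nil
  	})
  	fmt.Println(err, fails)
  }
  ```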
* pkg/report: request VMs outside of createInstances() (Aleksandr Nogikh, 2023-07-12, 1 file, -26/+28)
  In the current code, there's a possibility that we write to
  ctx.bootRequests after it was quickly closed. That could happen when we
  immediately abort the reproduction process after it has started. To
  avoid this, don't send elements over the bootRequests channel in the
  createInstances() function. Hopefully closes #4016.
* pkg/repro: use the generic minimization algorithm (Aleksandr Nogikh, 2023-07-07, 2 files, -109/+25)
* pkg/report: move report.Type to pkg/report/crash (Aleksandr Nogikh, 2023-07-05, 1 file, -7/+9)
  This will help avoid a circular dependency pkg/vcs -> pkg/report ->
  pkg/vcs.
* pkg/report: extract more report types for Linux (Aleksandr Nogikh, 2023-07-05, 1 file, -1/+1)
  Amend oops and oopsFormat to contain the report type.
* all: support swap feature on Linux (Aleksandr Nogikh, 2023-06-15, 1 file, -0/+11)
  If the feature is supported on the device, allocate a 128MB swap file
  after VM boot and activate it.
* syz-manager: don't report all pkg/repro errors on shutdown (Aleksandr Nogikh, 2023-06-09, 1 file, -1/+3)
  Otherwise we get "repro failed: all VMs failed to boot" pkg/repro errors
  when a syzkaller instance is shutting down.
* pkg/repro: retry on ExecProgInstance errors (Aleksandr Nogikh, 2023-05-25, 2 files, -15/+77)
  Most of those errors seem to be transient, so there's no sense in
  failing the whole C repro generation process. Give it one more chance
  and only fail after that.
* pkg/report: restructure the shutdown process (Aleksandr Nogikh, 2023-05-25, 2 files, -8/+14)
  Only use ctx.bootRequests to indicate that no further VMs are needed. Do
  not return from Run() until we have fully stopped the VM creation loop,
  as there's a risk it might interfere with fuzzing.
* pkg/repro: add a smoke test (Aleksandr Nogikh, 2023-05-25, 2 files, -6/+117)
  This is a sanity test for the overall pkg/repro machinery. It does not
  focus on minor corner cases.
* pkg/repro: factor out an interface (Aleksandr Nogikh, 2023-05-25, 1 file, -8/+15)
  Interact with a syz-execprog instance via an additional interface. This
  will simplify testing.
* pkg/repro: refactor the Run() method (Aleksandr Nogikh, 2023-05-25, 1 file, -45/+64)
  Split Run() into several functions to facilitate testing. This commit
  does not introduce any functional changes.
* pkg/report: don't record an error for the empty repro log case (Aleksandr Nogikh, 2023-05-25, 1 file, -1/+4)
  It's not entirely normal, but it can still happen, and it's not a big
  problem by itself. Let's not pollute our error logs.
* pkg/testutil: add RandSource helper (Dmitry Vyukov, 2022-11-23, 1 file, -9/+2)
  The code to set up a rand source is duplicated in several packages. Move
  it to the testutil package.
* executor: add NIC PCI pass-through VF support (George Kennedy, 2022-09-21, 1 file, -0/+11)
  Add support for moving a NIC PCI pass-through VF into syzkaller's
  network namespace so that it will be tested. As DEVLINK support is
  triggered by setting the pass-through device to "addr=0x10", NIC PCI
  pass-through VF support will be triggered by setting the device to
  "addr=0x11". If a NIC PCI pass-through VF is detected in do_sandbox, set
  up a staging namespace before the fork() and transfer the NIC VF
  interface to it. After the fork(), in the child, transfer the NIC VF
  interface to syzkaller's network namespace and rename the interface to
  netpci0 so that it will be tested.
  Signed-off-by: George Kennedy <george.kennedy@oracle.com>