| author | Aleksandr Nogikh <nogikh@google.com> | 2025-06-23 15:27:39 +0200 |
|---|---|---|
| committer | Aleksandr Nogikh <nogikh@google.com> | 2025-06-26 12:49:47 +0000 |
| commit | c3f8bb06ed4d6ad6211729efcd0d5aa4a26d5c4d (patch) | |
| tree | f6f96131b5eb73e696dd87cf812b5756e556d106 /pkg/repro/repro.go | |
| parent | 5fd74b4cc1dd3fa8867f53dbbb67c950188250de (diff) | |
pkg/repro: validate the result
The whole pkg/repro algorithm is very sensitive to random kernel
crashes, yet all other parts of the system rely on pkg/repro reproducers
being reliable enough to draw meaningful conclusions from running them.
A single unrelated kernel crash during repro extraction may divert the
whole process: every check we perform along the way (e.g. during
minimization or when we drop prog opts) assumes that if the kernel
didn't crash, it was because the removed part was essential for
reproduction, and not because our reproducer was already broken.
Since such a problem may happen at any moment, let's do a single
validation check at the very end of repro generation. Overall, these
cases are not very frequent, so it's not worth re-checking every
step.
Calculate the reliability score of the reproducer and use a 15% default
cut-off for flaky results.
Diffstat (limited to 'pkg/repro/repro.go')
| -rw-r--r-- | pkg/repro/repro.go | 43 |
1 files changed, 43 insertions, 0 deletions
```diff
diff --git a/pkg/repro/repro.go b/pkg/repro/repro.go
index 25bfbc3dd..b10d29784 100644
--- a/pkg/repro/repro.go
+++ b/pkg/repro/repro.go
@@ -33,6 +33,9 @@ type Result struct {
 	// Information about the final (non-symbolized) crash that we reproduced.
 	// Can be different from what we started reproducing.
 	Report *report.Report
+	// A very rough estimate of the probability with which the resulting syz
+	// reproducer crashes the kernel.
+	Reliability float64
 }
 
 type Stats struct {
@@ -263,9 +266,49 @@ func (ctx *reproContext) repro() (*Result, error) {
 			}
 		}
 	}
+	// Validate the resulting reproducer - a random rare kernel crash might have diverted the process.
+	res.Reliability, err = calculateReliability(func() (bool, error) {
+		ret, err := ctx.testProg(res.Prog, res.Duration, res.Opts, false)
+		if err != nil {
+			return false, err
+		}
+		ctx.reproLogf(2, "validation run: crashed=%v", ret.Crashed)
+		return ret.Crashed, nil
+	})
+	if err != nil {
+		ctx.reproLogf(2, "could not calculate reliability, err=%v", err)
+		return nil, err
+	}
+
+	const minReliability = 0.15
+	if res.Reliability < minReliability {
+		ctx.reproLogf(1, "reproducer is too unreliable: %.2f", res.Reliability)
+		return nil, err
+	}
+
 	return res, nil
 }
 
+func calculateReliability(cb func() (bool, error)) (float64, error) {
+	const (
+		maxRuns  = 10
+		enoughOK = 3
+	)
+	total := 0
+	okCount := 0
+	for i := 0; i < maxRuns && okCount < enoughOK; i++ {
+		total++
+		ok, err := cb()
+		if err != nil {
+			return 0, err
+		}
+		if ok {
+			okCount++
+		}
+	}
+	return float64(okCount) / float64(total), nil
+}
+
 func (ctx *reproContext) extractProg(entries []*prog.LogEntry) (*Result, error) {
 	ctx.reproLogf(2, "extracting reproducer from %v programs", len(entries))
 	start := time.Now()
```
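The scoring loop added in the diff above (up to 10 runs, stopping early once 3 crashes have been observed, then reporting the observed crash ratio) can be sketched as a standalone program. The names `estimateReliability` and the callbacks below are hypothetical illustrations, not the syzkaller API:

```go
package main

import "fmt"

// estimateReliability is a sketch of the early-stopping scheme from the
// commit: invoke the check up to maxRuns times, stop as soon as it has
// succeeded (crashed the kernel) enoughOK times, and return the ratio of
// successes to the runs actually performed.
func estimateReliability(run func() (bool, error)) (float64, error) {
	const (
		maxRuns  = 10 // never spend more than 10 validation runs
		enoughOK = 3  // 3 crashes are enough evidence of reliability
	)
	total := 0
	okCount := 0
	for i := 0; i < maxRuns && okCount < enoughOK; i++ {
		total++
		ok, err := run()
		if err != nil {
			return 0, err
		}
		if ok {
			okCount++
		}
	}
	return float64(okCount) / float64(total), nil
}

func main() {
	// A reproducer that crashes every run: 3 crashes after 3 runs -> 1.0.
	always := func() (bool, error) { return true, nil }
	// A reproducer that never crashes: 0 crashes after 10 runs -> 0.0.
	never := func() (bool, error) { return false, nil }

	r1, _ := estimateReliability(always)
	r2, _ := estimateReliability(never)
	fmt.Println(r1, r2) // 1 0

	// The commit's default 15% cut-off would keep the first and drop the second.
	const minReliability = 0.15
	fmt.Println(r1 >= minReliability, r2 >= minReliability) // true false
}
```

Note that because of the early exit, a perfectly reliable reproducer costs only 3 runs, while the worst case (a reproducer that never crashes) costs the full 10.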
