We previously added custom code to the kernel build action,
and a few others, to expose verbose errors from executed binaries (notably make).
But lots of other binary executions are missing this logic;
e.g. for a git failure we currently see the unhelpful:
failed to run ["git" "fetch" "--force" "--tags" exit status 128
Instead of adding more and more custom code to do the same,
remove the custom code and always produce verbose output
in syz-agent and tools/syz-aflow.
Requesting that the program be returned as one of the agent's outputs
enforces its structure and prevents the LLM from using garbage formatting.
If we have duplicate names, then only one of the duplicates will be used, picked at random.
Add a check that we don't have duplicate names.
Currently the only duplicate is "crash-reproducer" (both an action and a tool).
Also ignore the "set-results" tool and all tools created in tests.
There is no point in using Provide more than once,
or anywhere besides the first action of a flow.
So it's not really an action, but more of a flow property.
Add a Flow.Consts field to handle this case better.
Also provide slightly less verbose syntax by using a map
instead of a struct, and add tests.
The LLM seems to have some knowledge of syzkaller program syntax,
but presumably it's still useful to give it all the details of the syntax.
Update #6878
It's useful to be able to look at the kernel source code
when creating a reproducer for a bug. So give the agent
codesearch tools.
Also slightly refine prompt wording.
Update #6878
Teach the repro flow about the `read-description` tool.
Update #6878
When we combine tool sets for agents, there is always a potential
problem of aliasing existing slices and introducing subtle bugs.
Add a Tools function that can append tools/tool sets without the aliasing problem.
Update #6878
Update #6878
It is not used.
This allows auto-upstreaming of actionable bugs.
Fixes #6779
Now it's compiled into the syz-agent binary itself.
Update #6578
Add a tool that executes git grep with the given expression.
It can handle the long tail of cases that the codesearcher can't,
while still producing less output than reading whole files.
We need to run git log in the master git repo because our KernelSrc/KernelScratchSrc
are shallow checkouts that don't have history.
Introduce an abstract "task type" for LLM agents instead of specifying
the temperature explicitly for each agent. This has two advantages:
- we don't hardcode it everywhere, and can change it centrally
as our understanding of the right temperature evolves
- we can control other LLM parameters (topn/topk) using the task type as well
Update #6576
Give the LLM the recent commit subjects when it generates a description,
so that it can use the same style.
Add infrastructure for writing end-to-end action tests in order to test this.
Make the instructions slightly more concrete,
and add details about some bug types.
Move it so that it can be accessed by the dashboard as well.
Add the kernel branch to the output (it's needed for gerrit),
and provide the actual kernel commit hash instead of the tag name.
This adds a flow feature (and creates a new flow using it) called
"sliding window summary".
It works by asking the AI to always summarize its latest knowledge,
and then tossing old messages once they fall outside the sliding
context window.
Provide base kernel repo/commit and recipients (to/cc) for patches.
Lots of workflows may want special handling for particular bug types.
Provide common helpers that make it easy to act on the bug type.
Make codeeditor error on nop changes that don't actually change the code.
Make patch testing error on an empty patch.
Perhaps we need a notion of "mandatory" tools that must be called
successfully at least once... not sure yet.
Add DoWhile.MaxIterations and make it mandatory.
I think it's useful to make the workflow implementer think
explicitly about a reasonable cap on the number of iterations.
Add code editing tool, and patch testing action to the workflow.
Add a loop that asks to fix/regenerate patch on test errors.
The action checks out a temp dir for code edits and patch testing.
| |
I added a NewPipeline constructor for slightly nicer syntax,
but failed to use it in the actual workflows.
Unexport Pipeline and rename NewPipeline to Pipeline.
This slightly improves the workflow definition syntax.
It can answer complex questions about the kernel,
and provide a concise answer to other LLMs.
For situations where the user wants to reproduce bugs against a
different repository than mainline.
Having an LLM model per agent is even more flexible than per flow.
We can run the more complex tasks during patch generation with the most elaborate model,
but also some simpler ones with less elaborate models.
We may want to use a weaker model for some workflows.
Allow using different models for different workflows.
Add a race:harmful/benign label.
Set it automatically based on confirmed AI jobs.
Add a workflow that can be used for moderation of UAF bugs (consistent/actionable reports).
Such UAF bugs can be upstreamed automatically, even if they happened only once
and don't have a reproducer.
Rephrase the prompt to be only about KCSAN;
currently it has some leftovers from a more generic assessment prompt
that covered KASAN bugs as well (actionability).
Also add a Confident bool output.
We may want to act on both benign/non-benign verdicts,
so we need to know when the LLM wasn't actually sure either way.
This should also be useful for manual verification/statistics.
If the LLM is not confident and can admit that, it's much better
than giving a wrong answer. But we will likely want to track
the percentage of non-confident answers.