| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Currently we have custom code in the kernel build action
and a few others to expose verbose errors from executed binaries (notably make).
But lots of other binary executions are missing this logic,
e.g. for a git failure we currently see the unhelpful:
failed to run ["git" "fetch" "--force" "--tags" exit status 128
Instead of adding more and more custom code to do the same,
remove the custom code and always add verbose output
in syz-agent and tools/syz-aflow.
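
A minimal sketch of the idea, not the actual syzkaller code: a single helper that attaches the tail of the command output to every execution error. The names `runVerbose`, `verboseError` and the `maxOutput` limit are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// maxOutput bounds how much output is attached to an error (assumed limit).
const maxOutput = 4 << 10

// runVerbose is a hypothetical helper: it runs a binary and, on failure,
// returns an error that includes the combined stdout/stderr of the command.
func runVerbose(bin string, args ...string) ([]byte, error) {
	out, err := exec.Command(bin, args...).CombinedOutput()
	if err != nil {
		return out, verboseError(bin, args, out, err)
	}
	return out, nil
}

// verboseError wraps err and appends the tail of the command output.
func verboseError(bin string, args []string, out []byte, err error) error {
	out = bytes.TrimSpace(out)
	if len(out) > maxOutput {
		out = out[len(out)-maxOutput:]
	}
	return fmt.Errorf("failed to run %q %q: %w\n%s", bin, args, err, out)
}
```

With a helper like this, callers get the underlying tool message next to the exit status without adding per-action code.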
|
| |
|
|
|
|
|
|
|
| |
The tests fail on OpenBSD with:
expected: "bad expression: fatal: command line, 'bad expression (': Unmatched ( or \\("
actual : "bad expression: fatal: command line, 'bad expression (': parentheses not balanced"
Disable the tests on non-linux for now.
|
| |
|
|
|
| |
Requesting to return the program as one of the agent's outputs
enforces its structure and prevents LLM from using garbage formatting.
|
| |
|
|
|
|
|
| |
If we have duplicate names, then only one of the duplicates will be used at random.
Add a check that we don't have duplicate names.
Currently it's only "crash-reproducer" (both action and a tool).
Also ignore "set-results" tool, and all tools created in tests.
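
A rough sketch of such a check, assuming names are collected into a plain slice. The function name `checkUniqueNames` and the `ignore` set are illustrative, not the real registration code.

```go
package main

import "fmt"

// checkUniqueNames returns an error for the first name that appears more
// than once. Names present in ignore are skipped (e.g. "set-results").
func checkUniqueNames(names []string, ignore map[string]bool) error {
	seen := make(map[string]bool)
	for _, name := range names {
		if ignore[name] {
			continue
		}
		if seen[name] {
			return fmt.Errorf("duplicate name %q", name)
		}
		seen[name] = true
	}
	return nil
}
```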
|
| |
|
|
| |
Fixes #6897
|
| |
|
|
|
|
|
|
|
| |
There is no point in using Provide more than once,
or anywhere besides the first action of a flow.
So it's not really an action, but more of a flow property.
Add Flow.Consts field to handle this case better.
Also provide slightly less verbose syntax by using a map
instead of a struct, and add tests.
|
| |
|
|
|
|
|
| |
LLM seems to have some knowledge about syzkaller program syntax,
but presumably it's still useful to give it all details about syntax.
Update #6878
|
| |
|
|
|
|
|
|
|
|
| |
It's useful to be able to look at the kernel source code
when creating a reproducer for a bug. So give the agent
codesearch tools.
Also slightly refine prompt wording.
Update #6878
|
| |
|
|
|
|
|
| |
Provide some instructions on how tools should be named, implemented
and registered.
Update #6878
|
| |
|
|
|
|
| |
Teach the repro flow about the `read-description` tool.
Update #6878
|
| |
|
|
|
|
|
|
|
|
|
| |
Adds a tool that allows an agent to read the content of syzlang
description files (e.g., `sys.txt`, `socket.txt`).
Providing the ability to fetch exact system call definitions helps
reasoning models generate correct and compiling programs from crash
reports.
Update #6878
|
| | |
|
| |
|
|
|
|
|
| |
Collect code coverage for test programs.
This is likely to be needed for #6878 and the seed generation workflow.
For now it's not wired into any workflow/tool and is not tested.
But this should provide most of the plumbing to wire it up.
|
| | |
|
| |
|
|
|
|
| |
When we combine tool sets for agents, there is always a potential
problem with aliasing existing slices and introducing subtle bugs.
Add Tools function that can append tool/tool sets w/o aliasing problem.
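
To illustrate the aliasing problem and one way around it, here is a sketch with a stand-in `Tool` type and a hypothetical `combineTools` function. It always copies into a freshly allocated slice, so two results built from the same base never share a backing array.

```go
package main

// Tool is a stand-in for the real tool type.
type Tool struct {
	Name string
}

// combineTools concatenates tool sets into a newly allocated slice.
// Unlike append(base, extra...), it never writes into the spare
// capacity of an input slice, so the inputs are never aliased.
func combineTools(sets ...[]Tool) []Tool {
	n := 0
	for _, set := range sets {
		n += len(set)
	}
	res := make([]Tool, 0, n)
	for _, set := range sets {
		res = append(res, set...)
	}
	return res
}
```

With plain `append` on a base slice that has spare capacity, a second append would overwrite the element added by the first one. Copying avoids that at the cost of one allocation per call.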
|
| |
|
|
| |
Update #6878
|
| |
|
|
| |
Update #6878
|
| | |
|
| |
|
|
| |
It is not used.
|
| |
|
|
|
| |
There's no workflow implementation, but having the const there will let
us implement the dashboard side in parallel.
|
| |
|
|
|
|
| |
This allows auto-upstreaming of actionable bugs.
Fixes #6779
|
| |
|
|
|
|
|
| |
Currently we crash on nil deref, if LLM specifies explicit 'nil'
for an optional (pointer) argument. Handle such cases properly.
Fixes #6811
|
| |
|
|
| |
Update #6811
|
| |
|
|
|
|
| |
The MCP server exports all aflow tools (and actions as tools) we have.
Fixes #6763
|
| |
|
|
| |
This will be needed for an MCP server.
|
| |
|
|
|
| |
Linter started complaining about too high cyclomatic complexity.
Split the chat function.
|
| |
|
|
|
| |
In some cases there may be no final text reply, only some structured outputs
(e.g. some bool). Don't require a final reply if structured outputs are specified.
|
| |
|
|
|
|
| |
If LLM calls set-results tool to set structured results,
and then calls another unrelated tool, currently we lose structured results
(overwrite with nil). Don't do that, keep structured results.
|
| |
|
|
| |
We don't need a separate var for the agent, nor a Pipeline for 1 agent.
|
| |
|
|
|
| |
If LLM searches for "->", grep considered it as a flag and failed.
Add "--" before the expression to fix such cases.
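
A small sketch of the fix, assuming the tool builds an argument list for `git grep`. The helper name `grepArgs` and the `-n` flag are illustrative choices, not taken from the actual tool.

```go
package main

// grepArgs builds arguments for a `git grep` invocation.
// The "--" marks the end of options, so an expression that starts
// with a dash (such as "->") is not parsed as a flag.
func grepArgs(expr string) []string {
	return []string{"grep", "-n", "--", expr}
}
```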
|
| |
|
|
| |
Now it's compiled into the syz-agent binary itself.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Compiled clang tools into Go binaries using cgo.
This significantly simplifies building and deployment.
This also enables unit testing of clang tools.
Now a plain `go test` for clang tools will build them, run them,
and verify the output.
Each clang tool is still started as a subprocess.
I've experimented with running them in-process,
but this makes stdout/stderr interception extremely complicated,
and it seems that clang tools still use unsynchronized global state,
which breaks when invoked multiple times.
Subprocesses also make it safer in the face of potential memory leaks,
or memory corruptions in clang tools.
Fixes #6645
|
| |
|
|
| |
Provide better error messages on boot errors.
|
| |
|
|
| |
Update #6578
|
| |
|
|
|
|
| |
Add a tool that executes git grep with the given expression.
It can handle the long tail of cases that codesearcher can't handle,
while still providing less output than reading whole files.
|
| |
|
|
| |
Fixes #6671
|
| |
|
|
|
| |
We need to run git log in the master git repo because our KernelSrc/KernelScratchSrc
are shallow checkouts that don't have history.
|
| |
|
|
|
|
|
|
|
|
| |
Introduce abstract "task type" for LLM agents instead of specifying
temperature explicitly for each agent. This has 2 advantages:
- we don't hardcode it everywhere, and can change centrally
as our understanding of the right temperature evolves
- we can control other LLM parameters (topn/topk) using task type as well
Update #6576
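
One possible shape for this, as a sketch only: the type names, the two task types and all numeric values below are placeholders, not the values used by the real code.

```go
package main

// TaskType is an abstract description of what an agent is doing.
type TaskType int

const (
	// TaskFormal is for tasks that need precise, repeatable output.
	TaskFormal TaskType = iota
	// TaskCreative is for tasks that benefit from more varied output.
	TaskCreative
)

// genParams groups LLM sampling parameters (hypothetical struct).
type genParams struct {
	Temperature float32
	TopK        int
}

// paramsFor is the single place that maps a task type to parameters.
// The numbers are placeholders chosen for illustration.
func paramsFor(t TaskType) genParams {
	switch t {
	case TaskCreative:
		return genParams{Temperature: 1.0, TopK: 64}
	default:
		return genParams{Temperature: 0.1, TopK: 16}
	}
}
```

Agents then declare only a task type, and tuning happens in one function.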
|
| |
|
|
|
|
|
| |
Give LLM the recent commit subjects when it generates description,
so that it can use the same style.
Add infrastructure to write end-to-end action tests to test it.
|
| |
|
|
| |
This will cause dashboard to log errors.
|
| | |
|
| | |
|
| |
|
|
|
|
|
|
|
| |
It's very inconvenient to hardcode exact LLM replies in this test,
because it's hard to understand when exactly it will be asked to summarize.
It's easy to introduce a bug in the test and provide a summary reply when it wasn't asked for.
Instead support providing a full generateContent callback,
and just model what an LLM would do: provide a summary only when it's asked to.
|
| |
|
|
| |
Don't memorize repeated request configs.
|
| |
|
|
|
| |
Make instructions slightly more concrete,
and add details about some bug types.
|
| |
|
|
|
|
| |
Move it so that it can be accessed by the dashboard as well.
Add kernel branch to output (it's needed for gerrit),
provide actual kernel commit hash instead of tag name.
|
| |
|
|
|
|
|
|
|
| |
This adds a flow feature (and creates a new flow using it) called
"sliding window summary".
It works by asking the AI to always summarize the latest knowledge,
and then we toss the old messages if they fall outside the context
sliding window.
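
The trimming step could look roughly like the sketch below. The `Message` type, the `trimToWindow` name and the idea of measuring the window in messages are assumptions made for illustration; the real flow may measure the window differently.

```go
package main

// Message is a stand-in for a chat message.
type Message struct {
	Text string
}

// trimToWindow keeps the latest summary followed by at most window of
// the most recent messages. Older messages are dropped, on the
// assumption that the summary already captures what they contained.
func trimToWindow(summary string, history []Message, window int) []Message {
	if len(history) > window {
		history = history[len(history)-window:]
	}
	res := make([]Message, 0, len(history)+1)
	if summary != "" {
		res = append(res, Message{Text: "Summary so far: " + summary})
	}
	return append(res, history...)
}
```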
|
| |
|
|
|
|
|
|
|
| |
This caching is very handy when testing some dashboard features
related to starting jobs, or handling job completion,
or testing changes in the last steps of the patching workflow.
Without caching each test run takes 10 mins,
with caching the whole workflow completes almost immediately.
|
| |
|
|
| |
Provide base kernel repo/commit and recipients (to/cc) for patches.
|
| |
|
|
|
| |
Sometimes LLM requests just hang dead for tens of minutes,
abort them after 10 minutes and retry.
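
A sketch of the abort-and-retry pattern using `context.WithTimeout`. The function names, the attempt limit and the `hang` stand-in for a stuck request are hypothetical; the real code would pass 10 minutes as the timeout.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// retryOnTimeout runs call with a per-attempt deadline and retries only
// when the attempt was aborted by that deadline.
func retryOnTimeout(timeout time.Duration, attempts int,
	call func(context.Context) error) error {
	var err error
	for i := 0; i < attempts; i++ {
		ctx, cancel := context.WithTimeout(context.Background(), timeout)
		err = call(ctx)
		cancel()
		if err == nil || !errors.Is(err, context.DeadlineExceeded) {
			return err // success, or an error that a retry will not fix
		}
	}
	return err
}

// hang simulates a request that never completes on its own.
func hang(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}
```

For example, `retryOnTimeout(10*time.Minute, 3, request)` would give each attempt ten minutes before starting over.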
|