| Commit message | Author | Age | Files | Lines |
| |
|
|
|
|
|
| |
If we have duplicate names, then only one of the duplicates will be used,
and which one is effectively random.
Add a check that we don't have duplicate names.
Currently the only duplicate is "crash-reproducer" (both an action and a tool).
Also ignore the "set-results" tool, and all tools created in tests.
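
A minimal sketch of such a duplicate check, assuming the registered action and tool names are available as a flat list. The function name `findDuplicates`, the `ignore` set, and the sample names other than "crash-reproducer" and "set-results" are hypothetical and not taken from the actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// findDuplicates returns every name that occurs more than once, sorted.
// Names present in the ignore set are exempt from the check.
func findDuplicates(names []string, ignore map[string]bool) []string {
	count := make(map[string]int)
	for _, name := range names {
		if ignore[name] {
			continue
		}
		count[name]++
	}
	var dups []string
	for name, n := range count {
		if n > 1 {
			dups = append(dups, name)
		}
	}
	sort.Strings(dups) // map iteration order is random, so sort for stable output
	return dups
}

func main() {
	// Hypothetical registry contents: an action and a tool share one name.
	names := []string{"crash-reproducer", "codesearch", "crash-reproducer", "set-results", "set-results"}
	ignore := map[string]bool{"set-results": true}
	fmt.Println(findDuplicates(names, ignore))
}
```

Sorting the result keeps the error message stable across runs, which matters if the check is part of a test.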
|
| |
|
|
|
|
|
|
|
| |
There is no point in using Provide more than once,
or anywhere besides the first action of a flow.
So it's not really an action, but more of a flow property.
Add Flow.Consts field to handle this case better.
Also provide slightly less verbose syntax by using a map
instead of a struct, and add tests.
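
To illustrate the idea of constants as a flow property, here is a rough sketch. Only the `Flow.Consts` field name comes from the message above; the `Flow` type shown here, the `initialState` method, and the constant names are simplified assumptions.

```go
package main

import "fmt"

// Flow is a simplified stand-in for the real flow type.
type Flow struct {
	Name   string
	Consts map[string]any // constant values available to every action
}

// initialState seeds the flow state with the constants before any action runs.
// It copies the map so that actions cannot modify the flow definition.
func (f *Flow) initialState() map[string]any {
	state := make(map[string]any, len(f.Consts))
	for k, v := range f.Consts {
		state[k] = v
	}
	return state
}

func main() {
	flow := &Flow{
		Name: "example",
		Consts: map[string]any{
			"MaxAttempts": 3,       // hypothetical constant
			"Subsystem":   "net", // hypothetical constant
		},
	}
	fmt.Println(flow.initialState())
}
```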
|
| |
|
|
|
|
|
|
|
|
| |
Introduce abstract "task type" for LLM agents instead of specifying
temperature explicitly for each agent. This has 2 advantages:
- we don't hardcode it everywhere, and can change centrally
as our understanding of the right temperature evolves
- we can control other LLM parameters (topn/topk) using task type as well
Update #6576
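
A sketch of how a task type could map to LLM parameters in one central place. The type names, the three task kinds, and all numeric values are assumptions chosen for illustration, not the values used by the actual code.

```go
package main

import "fmt"

// TaskType describes what kind of work an agent does (hypothetical names).
type TaskType int

const (
	TaskFormal   TaskType = iota // precise, repeatable output
	TaskBalanced                 // general-purpose reasoning
	TaskCreative                 // exploratory output
)

// Params groups the sampling parameters derived from a task type.
type Params struct {
	Temperature float32
	TopK        int
}

// taskParams is the single place to tune parameters for all agents.
var taskParams = map[TaskType]Params{
	TaskFormal:   {Temperature: 0.1, TopK: 10},
	TaskBalanced: {Temperature: 0.5, TopK: 40},
	TaskCreative: {Temperature: 1.0, TopK: 100},
}

// paramsFor looks up the parameters for a task type.
func paramsFor(t TaskType) (Params, error) {
	p, ok := taskParams[t]
	if !ok {
		return Params{}, fmt.Errorf("unknown task type %d", t)
	}
	return p, nil
}

func main() {
	p, err := paramsFor(TaskFormal)
	fmt.Println(p, err)
}
```

With this shape an agent definition only names its task type, and changing a temperature touches one table entry rather than every agent.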
|
| | |
|
| |
|
|
|
|
| |
If LLMAgent.Temperature is assigned an untyped float const (0.5)
it will be typed as float64 rather than float32. So convert such values to float32.
Cap Temperature at model's supported MaxTemperature.
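
The typing issue can be reproduced in isolation. The sketch below assumes the temperature is stored in an interface-typed field (which is what makes the default type of the constant matter); the helper name `temperature` and its `limit` parameter are hypothetical.

```go
package main

import "fmt"

// temperature converts a value stored in an interface to float32
// and caps it at limit (the model's maximum supported temperature).
func temperature(v any, limit float32) (float32, error) {
	var t float32
	switch x := v.(type) {
	case float32:
		t = x
	case float64: // default type of an untyped float constant such as 0.5
		t = float32(x)
	case int: // default type of an untyped integer constant such as 1
		t = float32(x)
	default:
		return 0, fmt.Errorf("unsupported temperature type %T", v)
	}
	if t > limit {
		t = limit
	}
	return t, nil
}

func main() {
	var v any = 0.5       // untyped constant assigned to an interface
	fmt.Printf("%T\n", v) // prints "float64"
	t, err := temperature(v, 1.0)
	fmt.Println(t, err)
}
```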
|
| |
|
|
|
|
|
|
| |
A bunch of NFC refactorings:
- split action verification into 2 phases (inputs/outputs)
- change how LLMTool is verified
- remove some unused fields/parameters
- improve error messages a bit
|
| |
|
|
|
|
|
| |
I've added NewPipeline constructor for a bit nicer syntax,
but failed to use it in actual workflows.
Unexport Pipeline and rename NewPipeline to Pipeline.
This slightly improves workflow definition syntax.
|
| | |
|
| |
|
|
|
|
| |
Add helper function that executes test workflows,
compares results (trajectory, LLM requests) against golden files,
and if requested updates these golden files.
|
| |
|
|
|
|
| |
Using cached replies is faster, cheaper, and more reliable.
Especially handy during development when the same workflows
are retried lots of times with some changes.
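
One possible shape of such a cache, shown as an in-memory sketch. Keying on a hash of the model name and the request text is an assumption, as are the `replyCache` type and the fake model function; a persistent cache would store the replies on disk instead.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// replyCache maps a hash of (model, request) to a previously seen reply.
type replyCache struct {
	replies map[string]string
}

// cacheKey derives a stable key; the NUL byte separates the two fields.
func cacheKey(model, request string) string {
	sum := sha256.Sum256([]byte(model + "\x00" + request))
	return hex.EncodeToString(sum[:])
}

// query returns the cached reply if there is one,
// otherwise it calls the model and stores the reply.
func (c *replyCache) query(model, request string,
	call func(model, request string) (string, error)) (string, error) {
	key := cacheKey(model, request)
	if reply, ok := c.replies[key]; ok {
		return reply, nil
	}
	reply, err := call(model, request)
	if err != nil {
		return "", err // errors are not cached
	}
	c.replies[key] = reply
	return reply, nil
}

func main() {
	calls := 0
	fake := func(model, request string) (string, error) {
		calls++
		return "reply to: " + request, nil
	}
	c := &replyCache{replies: map[string]string{}}
	for i := 0; i < 3; i++ {
		if _, err := c.query("some-model", "explain this crash", fake); err != nil {
			panic(err)
		}
	}
	fmt.Println("model calls:", calls) // prints "model calls: 1"
}
```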
|
| | |
|
| |
|
|
|
|
|
|
|
|
| |
Currently we handle several errors in LLMAgent (wrong tool name,
wrong tool arguments), and return the error to LLM,
but nothing is injected into the trajectory wrt what happened.
This makes the trajectory incomplete and confusing:
one just sees repeated LLM calls w/o understanding what caused them.
Inject these tool failures into the trace, so that it's clear
what happened.
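
A simplified sketch of recording failed tool calls alongside successful ones. The `span` and `trajectory` types and the error wording are hypothetical; the point is only that the failure is appended to the trace before the error is returned to the LLM.

```go
package main

import "fmt"

// span is one recorded tool invocation, successful or not.
type span struct {
	Tool  string
	Args  map[string]any
	Reply string
	Error string
}

type trajectory struct{ spans []span }

type tool func(args map[string]any) (string, error)

// callTool executes one tool call requested by the LLM. Every call,
// including a failed one, is appended to the trajectory.
// The returned string is what gets sent back to the LLM.
func (t *trajectory) callTool(tools map[string]tool, name string, args map[string]any) string {
	s := span{Tool: name, Args: args}
	fn, ok := tools[name]
	if !ok {
		s.Error = fmt.Sprintf("tool %q does not exist", name)
	} else if reply, err := fn(args); err != nil {
		s.Error = err.Error()
	} else {
		s.Reply = reply
	}
	t.spans = append(t.spans, s)
	if s.Error != "" {
		return "error: " + s.Error
	}
	return s.Reply
}

func main() {
	tools := map[string]tool{
		"echo": func(args map[string]any) (string, error) { return "ok", nil },
	}
	tr := &trajectory{}
	fmt.Println(tr.callTool(tools, "no-such-tool", nil))
	fmt.Println(tr.callTool(tools, "echo", nil))
	fmt.Println("recorded spans:", len(tr.spans))
}
```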
|
| | |
|
| |
|
|
| |
This seems to help a bit with number of round-trips.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Gracefully handle (reply to LLM with error):
- incorrect tool name
- incorrect tool arg type
- missing tool arg
Silently handle:
- more than one call to set-results
- excessive tool args
Fixes #6604
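
The argument checks listed above could look roughly like this. The `toolSpec` representation (argument name to "string" or "number") and the tool and argument names are assumptions; handling of repeated set-results calls is not covered by this sketch.

```go
package main

import "fmt"

// toolSpec maps argument names to the expected kind: "string" or "number".
type toolSpec map[string]string

// validateCall checks a tool call requested by the LLM. A non-nil error is
// meant to be sent back to the LLM so that it can retry with a fixed call.
func validateCall(tools map[string]toolSpec, name string, args map[string]any) (map[string]any, error) {
	spec, ok := tools[name]
	if !ok {
		return nil, fmt.Errorf("tool %q does not exist", name)
	}
	clean := make(map[string]any, len(spec))
	for arg, kind := range spec {
		val, ok := args[arg]
		if !ok {
			return nil, fmt.Errorf("tool %q: missing argument %q", name, arg)
		}
		switch val.(type) {
		case string:
			ok = kind == "string"
		case float64: // encoding/json decodes JSON numbers into float64
			ok = kind == "number"
		default:
			ok = false
		}
		if !ok {
			return nil, fmt.Errorf("tool %q: argument %q has type %T, want %s", name, arg, val, kind)
		}
		clean[arg] = val
	}
	return clean, nil // arguments not in the spec are dropped silently
}

func main() {
	tools := map[string]toolSpec{"read-file": {"path": "string"}}
	fmt.Println(validateCall(tools, "read-file", map[string]any{"path": "a.c", "extra": 1.0}))
	fmt.Println(validateCall(tools, "read-file", map[string]any{"path": 42.0}))
	fmt.Println(validateCall(tools, "write-file", nil))
}
```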
|
| |
|
|
|
|
|
|
| |
Detect model quota violations (assumed to be RPD).
Make syz-agent not request jobs that use the model
until the next quota reset time.
Fixes #6573
|
| |
|
|
|
|
| |
Having an LLM model per agent is even more flexible than per flow.
We can have some more complex tasks during patch generation with the most elaborate model,
but also some simpler ones with less elaborate models.
|
| |
|
|
|
|
|
|
|
| |
Add LLMAgent.Candidates parameter.
If set to a value N>1, then the agent is invoked N times,
and all outputs become slices.
The results can be later aggregated by another agent,
as shown in the test.
|
| |
|
|
|
| |
We may want to use a weaker model for some workflows.
Allow using different models for different workflows.
|
| |
|