| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
Update #6811
|
| |
|
|
|
|
|
|
|
|
|
| |
Contributes to #6469.
To handle global variables:
* Add EntityKindGlobalVariable
* Modify TraverseVarDecl() function logic
* Add a check to ensure StartLine and EndLine are in the same file
* Fix missing #include <cstdint> in json.h
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Compiled clang tools into Go binaries using cgo.
This significantly simplifies building and deployment.
This also enables unit testing of clang tools.
Now a plain go test for clang tools builds them, runs them,
and verifies the output.
Each clang tool is still started as a subprocess.
I've experimented with running them in-process,
but this makes stdout/stderr interception extremely complicated,
and it seems that clang tools still use unsynchronized global state,
which breaks when invoked multiple times.
Subprocesses also make it safer in the face of potential memory leaks,
or memory corruptions in clang tools.
Fixes #6645
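A minimal sketch of the subprocess approach described above, assuming nothing about the real syzkaller code: the helper name runTool is hypothetical, and "go version" stands in for an actual clang tool invocation. It shows why a subprocess makes stdout/stderr capture simple: each stream is just a buffer attached to the command.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// runTool starts the binary as a subprocess and returns its stdout and
// stderr as separate strings. The name and signature are hypothetical;
// they are not taken from the syzkaller sources.
func runTool(path string, args ...string) (stdout, stderr string, err error) {
	var outBuf, errBuf bytes.Buffer
	cmd := exec.Command(path, args...)
	cmd.Stdout = &outBuf
	cmd.Stderr = &errBuf
	// Run waits for the process to exit. A crash or a leak in the tool
	// stays confined to the child process.
	err = cmd.Run()
	return outBuf.String(), errBuf.String(), err
}

func main() {
	// "go version" is only a stand-in for a real tool binary.
	out, errOut, err := runTool("go", "version")
	if err != nil {
		fmt.Printf("tool failed: %v\nstderr: %s\n", err, errOut)
		return
	}
	fmt.Print(out)
}
```

Because every invocation is a fresh process, global state inside the tool is reset between runs, which is the property the commit message relies on.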
|
| |
|
|
| |
Also fixes a lint error.
|
| |
|
|
|
|
|
|
|
|
| |
- Extract struct field offsets and sizes in the C++ codesearch indexer.
- Add 'fields' to the JSON definition output.
- Update pkg/codesearch to parse and expose the new field information.
- Add 'struct-layout' command to syz-codesearch for debugging.
- Add 'codesearch-struct-layout' tool to pkg/aflow/tool/codesearcher/
to allow LLM agents to query struct memory layout and map byte offsets to fields.
- Support pointer marshaling for optional JSON values (e.g. *uint)
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
When preparing a codesearch index, I encountered errors which I narrowed
down to lines like the following in the json output of codesearch:
"type": "void (void __attribute__((btf_type_tag("user")))*, const void *, size_t, size_t)",
After this change, the line gets formatted like this:
"type": "void (void __attribute__((btf_type_tag(\"user\")))*, const void *, size_t, size_t)",
This fixes the errors I encountered.
|
| |
|
|
|
| |
Use uint32 instead of int for line numbers (2G lines should be enough for everyone).
Reorder fields to remove unnecessary paddings.
|
| |
|
|
|
| |
Use uint8 enums instead of strings to store entity/reference kind.
String is 16 bytes and is slower to work with.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With all references in the index, it becomes quite big.
Merge and dedup the resulting index on the fly.
Also intern all strings b/c there are tons of duplicates.
This also removes unnecessary duplicates (effectively ODR violations in the kernel)
due to use of BUILD_BUG_ON. The macro produces different function calls
in different translation units, so the same function may contain
__compiletime_assert_N1 call in one TU and __compiletime_assert_N2 in another.
Overall, this reduces resource consumption of index building from:
time:296.11s user:16993.71s sys:6661.03s memory:82707MB
to:
time:194.28s user:16860.01s sys:6647.01s memory: 3243MB
25x reduction in memory consumption.
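The two techniques named above, string interning and on-the-fly deduplication, can be sketched as follows. The types are hypothetical and much simpler than a real index; the sketch only shows the mechanism of keeping one canonical copy of each string and dropping records that were already seen.

```go
package main

import "fmt"

// interner keeps one canonical copy of every distinct string, so
// records that repeat a name share the same backing memory.
type interner map[string]string

func (in interner) intern(s string) string {
	if canonical, ok := in[s]; ok {
		return canonical
	}
	in[s] = s
	return s
}

// ref is a hypothetical reference record; it is comparable, so it can
// be used directly as a map key for deduplication.
type ref struct {
	Name string
	Line uint32
}

// merger accumulates records from many translation units and drops
// exact duplicates as they arrive instead of after the fact.
type merger struct {
	names interner
	seen  map[ref]struct{}
	refs  []ref
}

func newMerger() *merger {
	return &merger{names: interner{}, seen: map[ref]struct{}{}}
}

func (m *merger) add(r ref) {
	r.Name = m.names.intern(r.Name)
	if _, dup := m.seen[r]; dup {
		return
	}
	m.seen[r] = struct{}{}
	m.refs = append(m.refs, r)
}

func main() {
	m := newMerger()
	// The same header-defined function is seen from two TUs.
	m.add(ref{Name: "kmalloc", Line: 10})
	m.add(ref{Name: "kmalloc", Line: 10})
	m.add(ref{Name: "kfree", Line: 20})
	fmt.Println(len(m.refs), "unique references,", len(m.names), "interned strings")
}
```

Merging incrementally means the full set of per-TU outputs never has to be resident at once, which is consistent with the memory drop reported above.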
|
| |
|
|
| |
Update #6469
|
| |
|
|
| |
Update #6469
|
| |
|
|
| |
Update #6469
|
| |
|
|
|
|
| |
If the format of the codesearch DB file changes,
we need to create a new DB rather than use the old cached one.
Add DB format hash to cache signature.
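One way to fold a format identifier into a cache key is sketched below. How the real format hash is computed is not stated here, so the format description string, the function name, and the choice of SHA-256 are all assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheSignature derives a cache key from the inputs that determine
// the DB contents. Both parameters are hypothetical: sourceID stands
// for whatever identifies the indexed tree, formatDesc for a
// description (or hash) of the on-disk DB format.
func cacheSignature(sourceID, formatDesc string) string {
	h := sha256.New()
	h.Write([]byte(sourceID))
	// A zero byte separates the parts so that ("ab", "c") and
	// ("a", "bc") cannot produce the same signature.
	h.Write([]byte{0})
	h.Write([]byte(formatDesc))
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	oldSig := cacheSignature("kernel-commit-1", "format-v1")
	newSig := cacheSignature("kernel-commit-1", "format-v2")
	// A format change alone yields a different key, so the stale
	// cached DB is not reused.
	fmt.Println(oldSig != newSig)
}
```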
|
| |
|
|
|
|
|
|
|
| |
Extend codesearch clang tool to export info about function references
(calls, takes-address-of).
Add pkg/codesearch command find-references.
Export find-references in pkg/aflow/tools/codesearcher to LLMs.
Update #6469
|
| |
|
|
|
| |
We currently duplicate the list of source extensions in the build action
and the codesearch tool. Unify the lists.
|
| |
|
|
|
|
|
|
|
|
| |
The error allows tools to communicate that an error is not an infrastructure error
that must fail the whole workflow, but rather a bad tool invocation by an LLM
(e.g. asking for the contents of a non-existent file).
Previously in the codesearcher tool we used a separate Missing bool
to communicate that. With the error everything just becomes cleaner and nicer.
The error also allows all other tools to communicate any errors to the LLM
when the normal results cannot be provided and don't make sense.
|
| |
|
|
| |
Just provides the full file contents as a last resort.
|
| |
|
|
|
| |
dir-index provides a list of subdirectories and files in the given
directory in the source tree.
|
|
|
Add a clang tool that is used for code indexing (tools/clang/codesearch/).
It follows conventions and build procedure of the declextract tool.
Add pkg/codesearch package that aggregates the info exposed by the clang tools,
and allows doing simple queries:
- show source code of an entity (function, struct, etc)
- show entity comment
- show all entities defined in a source file
Add tools/syz-codesearch wrapper tool that allows creating an index for a kernel build,
and then run code queries on it.
|