path: root/pkg/codesearch
Commit message | Author | Age | Files | Lines
* pkg/aflow/tool/codesearcher: add end-to-end tests | Dmitry Vyukov | 2026-02-19 | 2 | -2/+8
  Update #6811
* tools/clang/codesearch: improve codesearch to handle global variables | Artem Metla | 2026-02-17 | 5 | -9/+90
  Contributes to #6469. To handle global variables:
  * Add EntityKindGlobalVariable
  * Modify TraverseVarDecl() function logic
  * Add a check to ensure StartLine and EndLine are in the same file
  * Fix missing #include <cstdint> in json.h
* pkg/codesearch: test that compile_commands.json is loaded | Dmitry Vyukov | 2026-02-13 | 1 | -0/+7
* tools/clang: compile clang tools into the binary | Dmitry Vyukov | 2026-02-06 | 1 | -1/+2
  Compiled clang tools into Go binaries using cgo. This significantly simplifies building and deployment.
  This also enables unit testing of clang tools. Now a raw go test for clang tools will build them, run them, and verify the output.
  Each clang tool is still started as a subprocess. I've experimented with running them in-process, but this makes stdout/stderr interception extremely complicated, and it seems that clang tools still use unsynchronized global state, which breaks when invoked multiple times. Subprocesses also make it safer in the face of potential memory leaks or memory corruptions in clang tools.
  Fixes #6645
* pkg/codesearch: remove check for invalid C which is not expected at this point | Tamas Koczka | 2026-01-28 | 1 | -1/+1
  Also fixes a lint error.
* pkg/codesearch: expose struct layout in codesearch | Tamas Koczka | 2026-01-28 | 6 | -7/+138
  - Extract struct field offsets and sizes in the C++ codesearch indexer.
  - Add 'fields' to the JSON definition output.
  - Update pkg/codesearch to parse and expose the new field information.
  - Add 'struct-layout' command to syz-codesearch for debugging.
  - Add 'codesearch-struct-layout' tool to pkg/aflow/tool/codesearcher/ to allow LLM agents to query struct memory layout and map byte offsets to fields.
  - Support pointer marshaling for optional JSON values (e.g. *uint)
* pkg/codesearch: support finding field reads/writes | Dmitry Vyukov | 2026-01-26 | 14 | -17/+179
* tools/clang/json: escape strings properly | Florent Revest | 2026-01-25 | 4 | -20/+43
  When preparing a codesearch index, I encountered errors which I narrowed down to lines like the following in the json output of codesearch:
    "type": "void (void __attribute__((btf_type_tag("user")))*, const void *, size_t, size_t)",
  After this change, the line gets formatted like this:
    "type": "void (void __attribute__((btf_type_tag(\"user\")))*, const void *, size_t, size_t)",
  This fixes the errors I encountered.
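The fix itself lives in the C++ JSON writer under tools/clang/json; the Go sketch below only illustrates the same escaping rule. escapeJSON is a hypothetical helper written for this example, and the round trip through encoding/json is there to check that the escaped line parses back to the original type string.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// escapeJSON is a hypothetical helper: it escapes the characters that may
// not appear raw inside a JSON string literal.
func escapeJSON(s string) string {
	var b strings.Builder
	for _, r := range s {
		switch {
		case r == '"':
			b.WriteString(`\"`)
		case r == '\\':
			b.WriteString(`\\`)
		case r == '\n':
			b.WriteString(`\n`)
		case r == '\t':
			b.WriteString(`\t`)
		case r < 0x20:
			// Remaining control characters use the \uXXXX form.
			fmt.Fprintf(&b, `\u%04x`, r)
		default:
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	typ := `void (void __attribute__((btf_type_tag("user")))*, const void *, size_t, size_t)`
	line := `{"type": "` + escapeJSON(typ) + `"}`

	var out struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal([]byte(line), &out); err != nil {
		panic(err)
	}
	fmt.Println(out.Type == typ)
}
```

Without the escaping step, the embedded quotes around user would terminate the JSON string early and the line would fail to parse.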
* pkg/codesearch: reduce memory consumption a bit more | Dmitry Vyukov | 2026-01-22 | 7 | -55/+56
  Use uint32 instead of int for line numbers (4G lines should be enough for everyone). Reorder fields to remove unnecessary padding.
* pkg/codesearch: reduce memory consumption more | Dmitry Vyukov | 2026-01-22 | 2 | -12/+104
  Use uint8 enums instead of strings to store entity/reference kind. A string header alone is 16 bytes on 64-bit platforms, and strings are slower to work with.
* pkg/codesearch: reduce memory consumption when building index | Dmitry Vyukov | 2026-01-22 | 1 | -8/+51
  With all references in the index, it becomes quite big. Merge and dedup the resulting index on the fly. Also intern all strings because there are tons of duplicates.
  This also removes unnecessary duplicates (effectively ODR violations in the kernel) due to use of BUILD_BUG_ON. The macro produces different function calls in different translation units, so the same function may contain a __compiletime_assert_N1 call in one TU and __compiletime_assert_N2 in another.
  Overall this reduces resource consumption of index building from:
    time:296.11s user:16993.71s sys:6661.03s memory:82707MB
  to:
    time:194.28s user:16860.01s sys:6647.01s memory: 3243MB
  A 25x reduction in memory consumption.
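Merging on the fly with string interning can be sketched as follows. The interner, ref, and merger types are hypothetical and much simpler than a real index; the point is that each translation unit's output is folded into the shared state as it arrives, so duplicates never accumulate in memory.

```go
package main

import "fmt"

// interner returns one shared copy for each distinct string value.
type interner map[string]string

func (in interner) intern(s string) string {
	if v, ok := in[s]; ok {
		return v
	}
	in[s] = s
	return s
}

// ref is a hypothetical reference record. It is comparable, so it can be
// used directly as a map key for deduplication.
type ref struct {
	Entity string
	File   string
	Line   uint32
}

// merger accumulates references from many translation units and drops
// duplicates as they arrive.
type merger struct {
	strings interner
	seen    map[ref]struct{}
	refs    []ref
}

func newMerger() *merger {
	return &merger{strings: interner{}, seen: map[ref]struct{}{}}
}

// add folds the references of one translation unit into the merged index.
func (m *merger) add(tu []ref) {
	for _, r := range tu {
		r.Entity = m.strings.intern(r.Entity)
		r.File = m.strings.intern(r.File)
		if _, dup := m.seen[r]; dup {
			continue
		}
		m.seen[r] = struct{}{}
		m.refs = append(m.refs, r)
	}
}

func main() {
	m := newMerger()
	// Made-up data: two TUs that both include the same header.
	m.add([]ref{{"foo", "a.h", 10}, {"bar", "b.c", 20}})
	m.add([]ref{{"foo", "a.h", 10}})
	fmt.Println(len(m.refs))
}
```

Headers are compiled into many translation units, so the same reference tends to show up repeatedly; deduplicating at insertion time avoids holding all of those copies until the end.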
* tools/clang/codesearch: index struct references | Dmitry Vyukov | 2026-01-22 | 5 | -1/+60
  Update #6469
* pkg/codesearch: do indexing of struct/union/enum | Dmitry Vyukov | 2026-01-21 | 18 | -26/+245
  Update #6469
* pkg/codesearch: fix resolving of static functions declared in headers | Dmitry Vyukov | 2026-01-21 | 4 | -9/+41
  Update #6469
* pkg/aflow/tool/codesearcher: take into account DB format when caching | Dmitry Vyukov | 2026-01-21 | 1 | -0/+16
  If the format of the codesearch DB file changes, we need to create a new DB rather than use the old cached one. Add a DB format hash to the cache signature.
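One way to fold a format hash into a cache key is sketched below. How the real code derives the hash is not stated in this log; formatHash, cacheSignature, and the textual format descriptions are assumptions made for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// formatHash (hypothetical) derives a short hash from a textual description
// of the DB layout. Any change to the description changes the hash.
func formatHash(formatDesc string) string {
	sum := sha256.Sum256([]byte(formatDesc))
	return hex.EncodeToString(sum[:8])
}

// cacheSignature (hypothetical) combines the identity of the indexed source
// with the format hash, so a DB written in an older format is never reused.
func cacheSignature(kernelCommit, formatDesc string) string {
	return kernelCommit + "-" + formatHash(formatDesc)
}

func main() {
	// Made-up commit ID and format descriptions.
	v1 := cacheSignature("abc123", "Definition{Name,Kind,File}")
	v2 := cacheSignature("abc123", "Definition{Name,Kind,File,Fields}")
	fmt.Println(v1 != v2)
}
```

With the hash in the signature, a format change simply results in a cache miss and a fresh index, without any explicit invalidation step.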
* pkg/codesearch: support searching for references | Dmitry Vyukov | 2026-01-21 | 22 | -28/+482
  Extend codesearch clang tool to export info about function references (calls, takes-address-of). Add pkg/codesearch command find-references. Export find-references in pkg/aflow/tool/codesearcher to LLMs.
  Update #6469
* pkg/aflow/action/kernel: keep build files that codesearch will need | Dmitry Vyukov | 2026-01-20 | 1 | -2/+13
  We currently duplicate the list of source extensions in the build action and the codesearch tool. Unify the lists.
* pkg/aflow: add BadCallError | Dmitry Vyukov | 2026-01-20 | 10 | -41/+45
  The error allows tools to communicate that an error is not an infrastructure error that must fail the whole workflow, but rather a bad tool invocation by an LLM (e.g. asking for the contents of a non-existent file). Previously in the codesearcher tool we used a separate Missing bool to communicate that. With the error everything just becomes cleaner and nicer. The error also allows all other tools to communicate any errors to the LLM when the normal results cannot be provided and don't make sense.
* pkg/codesearch: add read-file command | Dmitry Vyukov | 2026-01-20 | 6 | -0/+43
  Just provides full file contents as a last resort.
* pkg/codesearch: add dir-index command | Dmitry Vyukov | 2026-01-20 | 11 | -0/+116
  dir-index provides a list of subdirectories and files in the given directory in the source tree.
* pkg/codesearch: add skeleton for code searching tool | Dmitry Vyukov | 2025-11-20 | 21 | -0/+502
  Add a clang tool that is used for code indexing (tools/clang/codesearch/). It follows conventions and build procedure of the declextract tool.
  Add pkg/codesearch package that aggregates the info exposed by the clang tools, and allows doing simple queries:
  - show source code of an entity (function, struct, etc)
  - show entity comment
  - show all entities defined in a source file
  Add tools/syz-codesearch wrapper tool that allows creating an index for a kernel build, and then running code queries on it.