| Commit message (Collapse) | Author | Age | Files | Lines |
| |
Quarter-long aggregation means thousands of gzip files.
Opening all the files in parallel causes two problems:
1. Memory overhead.
2. GCS API errors. It can't read Attrs for 1500+ files.
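The commit message does not say how the number of open files is reduced. One common way to avoid both problems is to cap the number of files handled at once. A minimal sketch using a buffered channel as a counting semaphore follows; `processAll`, the limit of 8 and the file names are hypothetical and only illustrate the idea.

```go
package main

import (
	"fmt"
	"sync"
)

// processAll calls process for every name, but never runs more than limit
// calls at the same time. It returns the peak number of concurrent calls,
// which is only tracked here to make the bound observable.
func processAll(names []string, limit int, process func(string)) int {
	sem := make(chan struct{}, limit) // counting semaphore
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		running int
		peak    int
	)
	for _, name := range names {
		sem <- struct{}{} // blocks while limit calls are in flight
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			mu.Lock()
			running++
			if running > peak {
				peak = running
			}
			mu.Unlock()

			process(name) // e.g. open, read and close one gzip file

			mu.Lock()
			running--
			mu.Unlock()
		}(name)
	}
	wg.Wait()
	return peak
}

func main() {
	// Hypothetical file names standing in for the real GCS objects.
	names := make([]string, 2000)
	for i := range names {
		names[i] = fmt.Sprintf("file-%04d.gz", i)
	}
	peak := processAll(names, 8, func(string) {})
	fmt.Println("peak concurrency within limit:", peak <= 8)
}
```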
|
New code is limited to at most 7 function parameters.
|
1. Init coveragedb client once and propagate it through context to enable mocking.
2. Always init coverage handlers. It simplifies testing.
3. Read webGit and coveragedb client from ctx to make it mockable.
4. Use int for file line number and int64 for merged coverage.
5. Add tests.
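Passing a client through the context so that tests can swap in a fake could look roughly like the sketch below. The `CoverageClient` interface, `WithClient`, `ClientFrom` and `fakeClient` are hypothetical names; the real coveragedb client API is not shown in this log.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// CoverageClient is a hypothetical, minimal stand-in for the real client.
type CoverageClient interface {
	LinesCovered(file string) (int64, error)
}

// clientKey is an unexported key type, so other packages cannot collide with it.
type clientKey struct{}

// WithClient returns a context that carries the client.
func WithClient(ctx context.Context, c CoverageClient) context.Context {
	return context.WithValue(ctx, clientKey{}, c)
}

// ClientFrom extracts the client or reports that it was never set.
func ClientFrom(ctx context.Context) (CoverageClient, error) {
	c, ok := ctx.Value(clientKey{}).(CoverageClient)
	if !ok {
		return nil, errors.New("coverage client is not set in context")
	}
	return c, nil
}

// fakeClient is a test double backed by a map.
type fakeClient map[string]int64

func (f fakeClient) LinesCovered(file string) (int64, error) { return f[file], nil }

// handle is a hypothetical handler that reads its dependency from ctx.
func handle(ctx context.Context, file string) (string, error) {
	c, err := ClientFrom(ctx)
	if err != nil {
		return "", err
	}
	n, err := c.LinesCovered(file)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s: %d", file, n), nil
}

// demo runs the handler against the fake client.
func demo() (string, error) {
	ctx := WithClient(context.Background(), fakeClient{"kernel/fork.c": 42})
	return handle(ctx, "kernel/fork.c")
}

// demoMissing runs the handler without any client in the context.
func demoMissing() error {
	_, err := handle(context.Background(), "kernel/fork.c")
	return err
}

func main() {
	fmt.Println(demo())
	fmt.Println(demoMissing())
}
```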
|
1. Make heatmap testable, move out the spanner client instantiation.
2. Generate spannerdb.ReadOnlyTransaction mocks.
3. Generate spannerdb.RowIterator mocks.
4. Generate spannerdb.Row mocks.
5. Prepare spannerdb fixture.
6. Fix HTML control name and value.
7. Add multiple tests.
8. Show line coverage from selected manager.
9. Propagate coverage url params to file coverage url.
|
The problem is a deadlock that happens on a GCS storage error.
The GCS client establishes the connection only when it has enough data to write,
which is approximately 16MB. So the error surfaces on io.Writer access in the middle of the merge.
This error terminates one errgroup goroutine, while the other goroutines remain blocked.
This commit propagates the termination signal to the other goroutines.
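The failure mode and the fix can be sketched with the standard library alone. Producers block on an unbuffered channel; once the consumer fails, they unblock only if they also watch a cancellation signal. The `merge` function and `errWrite` are hypothetical. With golang.org/x/sync/errgroup, `errgroup.WithContext` provides the same cancellation when any goroutine returns an error.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

var errWrite = errors.New("storage write failed")

// merge starts several producers feeding an unbuffered channel and consumes
// from it until a simulated write error occurs after failAfter items.
func merge(producers, failAfter int) error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	items := make(chan int)
	var wg sync.WaitGroup
	for p := 0; p < producers; p++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; ; i++ {
				select {
				case items <- i:
				case <-ctx.Done():
					// The consumer is gone: stop instead of blocking forever.
					return
				}
			}
		}()
	}

	var err error
	for n := 0; ; n++ {
		<-items
		if n == failAfter {
			err = errWrite // simulated io.Writer failure mid-merge
			cancel()       // propagate the termination signal
			break
		}
	}
	// Without the ctx.Done() case above, this Wait would never return.
	wg.Wait()
	return err
}

func main() {
	fmt.Println(merge(4, 10))
}
```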
|
The previous implementation stored only a summary of the processed records.
The summary was <1GB, so a single processing node was able to manipulate the data.
The current implementation stores all the details about the records read to make post-processing more flexible.
This change was needed to get access to the source manager name and will help to analyze other details.
The new implementation requires 20GB of memory to process a single day of records.
A CSV log interning experiment allowed the merge to run in 10GB.
Quarter data aggregation will cost ~100 times more.
The alternative is stream processing: we can process the data one kernel file at a time.
This divides memory consumption by ~15000.
This approach is implemented here.
We batch coverage signals by file and store the per-file results in a GCS JSONL file.
See https://jsonlines.org/ to learn about JSONL.
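A rough sketch of the per-file streaming idea: each record is encoded as one JSON line as soon as it is ready, so only one file's result is held in memory at a time. The `FileCoverage` schema and the function names are assumptions, and a `bytes.Buffer` stands in for the GCS writer.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// FileCoverage is a hypothetical per-file record; the real schema may differ.
type FileCoverage struct {
	FilePath string `json:"file_path"`
	Covered  int64  `json:"covered"`
}

// writeJSONL streams records to w, one JSON object per line.
// json.Encoder.Encode appends a newline after every value.
func writeJSONL(w io.Writer, records <-chan FileCoverage) error {
	enc := json.NewEncoder(w)
	for rec := range records {
		if err := enc.Encode(rec); err != nil {
			return err
		}
	}
	return nil
}

// readJSONL decodes records one by one and passes each to handle.
func readJSONL(r io.Reader, handle func(FileCoverage) error) error {
	dec := json.NewDecoder(r)
	for {
		var rec FileCoverage
		err := dec.Decode(&rec)
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
		if err := handle(rec); err != nil {
			return err
		}
	}
}

// demo writes two records, then reads them back and counts them.
func demo() (string, int, error) {
	records := make(chan FileCoverage, 2)
	records <- FileCoverage{FilePath: "kernel/fork.c", Covered: 10}
	records <- FileCoverage{FilePath: "mm/mmap.c", Covered: 7}
	close(records)

	var buf bytes.Buffer // stands in for the GCS object writer
	if err := writeJSONL(&buf, records); err != nil {
		return "", 0, err
	}
	out := buf.String()
	n := 0
	err := readJSONL(&buf, func(FileCoverage) error { n++; return nil })
	return out, n, err
}

func main() {
	out, n, err := demo()
	fmt.Print(out)
	fmt.Println(n, err)
}
```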
|
By storing all the details about the coverage data source, we can better explain the origin of the coverage.
This origin data is currently used to get the "manager" name.
|
We currently merge BigQuery data for every line coverage request.
Let's read the cached line coverage data from Spanner instead.
This lets us fetch only one file version from git and skip the data merge step.
|
It directly uses the coverage signals from BigQuery.
There is no need to wait for the coverage_batch cron jobs.
This is useful for debugging.
Limitations:
1. It is slow. I know how to speed it up but want to stabilize the UI first.
2. It is expensive because of the direct BQ requests, so it is limited to admins only.
3. It merges only the commits reachable on GitHub because of the gitweb throttling.
After the UI stabilizes I'll save all the required artifacts to Spanner and make this page publicly available.
To merge all the commits, not only the ones reachable on GitHub, an HTTP git caching instance is needed.
|
If no coverage is available for a file, we panic (nil dereference).
The new code doesn't panic.
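The class of bug can be illustrated as follows. This is not the actual fix; `FileCov` and `coveredLines` are hypothetical names for a lookup that returns a nil pointer when a file has no coverage.

```go
package main

import "fmt"

// FileCov is a hypothetical per-file coverage record.
type FileCov struct {
	Lines map[int]int64 // line number -> hit count
}

// coveredLines returns the number of covered lines for path.
// A missing map entry yields a nil *FileCov, so it is checked
// before any field access to avoid a nil dereference.
func coveredLines(byFile map[string]*FileCov, path string) int {
	cov := byFile[path]
	if cov == nil {
		return 0 // no coverage for this file
	}
	return len(cov.Lines)
}

func main() {
	byFile := map[string]*FileCov{
		"kernel/fork.c": {Lines: map[int]int64{10: 3, 11: 1}},
	}
	fmt.Println(coveredLines(byFile, "kernel/fork.c"))
	fmt.Println(coveredLines(byFile, "mm/mmap.c"))
}
```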
|
The first versions of this code used branches for git checkout.
Later the git repos were merged and the use of branch info was reduced.
The latest changes switched the code from branch checkout to commit checkout.
There is no need for branch info anymore and I don't see any use cases for it.
|