diff options
| author | Dmitry Vyukov <dvyukov@google.com> | 2024-10-21 11:53:44 +0200 |
|---|---|---|
| committer | Dmitry Vyukov <dvyukov@google.com> | 2024-10-24 09:34:38 +0000 |
| commit | 9fc8fe026baab9959459256f2d47f4bbf21d405a (patch) | |
| tree | 6d97a7ac2b8e69f5fa7a92a4b3824b1ad9e571c7 /pkg/rpcserver/rpcserver.go | |
| parent | a85e9d5032fdf305457a6400bd3af4a8df6c45c4 (diff) | |
executor: better handling for hanged test processes
Currently we kill hanged processes and consider the corresponding test finished.
We don't kill/wait for the actual test subprocess (we don't know its pid to kill,
and waiting will presumably hang). This has 2 problems:
1. If the hanged process causes "task hung" report, we can't reproduce it,
since the test finished too long ago (manager thinks its finished and
discards the request).
2. The test process still consumed per-pid resources.
Explicitly detect and handle such cases:
Manager keeps these hanged tests forever,
and we assign a new proc id for future processes
(don't reuse the hanged one).
Diffstat (limited to 'pkg/rpcserver/rpcserver.go')
| -rw-r--r-- | pkg/rpcserver/rpcserver.go | 12 |
1 files changed, 7 insertions, 5 deletions
diff --git a/pkg/rpcserver/rpcserver.go b/pkg/rpcserver/rpcserver.go index 4a0587c53..53181dd2b 100644 --- a/pkg/rpcserver/rpcserver.go +++ b/pkg/rpcserver/rpcserver.go @@ -416,11 +416,13 @@ func (serv *server) CreateInstance(id int, injectExec chan<- bool, updInfo dispa infoc: make(chan chan []byte), requests: make(map[int64]*queue.Request), executing: make(map[int64]bool), - lastExec: MakeLastExecuting(serv.cfg.Procs, 6), - stats: serv.runnerStats, - procs: serv.cfg.Procs, - updInfo: updInfo, - resultCh: make(chan error, 1), + hanged: make(map[int64]bool), + // Executor may report proc IDs that are larger than serv.cfg.Procs. + lastExec: MakeLastExecuting(prog.MaxPids, 6), + stats: serv.runnerStats, + procs: serv.cfg.Procs, + updInfo: updInfo, + resultCh: make(chan error, 1), } serv.mu.Lock() defer serv.mu.Unlock() |
