Durable Session PR #5 (Icon Flyover, More Bug Fixes, Corner Cases)#2825
Durable Session PR #5 (Icon Flyover, More Bug Fixes, Corner Cases)#2825
Conversation
sawka
commented
Feb 5, 2026
- Track wave version when job was created (for future backward compat)
- New Flyover Menu off of "shell durability" icon. Help + Actions + UX
- Bug with conn typeahead not closing when clicking disconnect
- Auto-reconnect jobs that have been disconnected (check for "gone" state)
- Disable tab indicator stuff (only indicates tab not block)
- Fix bug with dev logging path on connserver
- Fix bugs with restarting blockcontrollers on startup
- Fix bugs with getting HasExited status from job manager
- Fix startup files, quoting, tilde, etc in start job flow
- ...
WalkthroughThis PR introduces durable session management improvements and connection handling refactoring. It adds a new DurableSessionFlyover component for displaying durable SSH session status with multiple state-specific UI elements. The backend adds auto-reconnect capability to the job controller when connections drop, requires connection names during controller instantiation across shell/durable shell/tsunami controllers, and propagates startup errors and wave version information through the job status. Configuration defaults shift the bell indicator from enabled to disabled, shell execution paths use remote home directories instead of hardcoded tildes, and connserver log paths become user-specific. Telemetry recognition expands to include durable session actions. Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cmd/wsh/cmd/wshcmd-connserver.go (1)
375-390:⚠️ Potential issue | 🟡 MinorRestrict dev log file permissions to owner-only.
0644makes dev logs world-readable; these logs can include connection details or tokens. Prefer0600to limit access to the current user.🔒 Proposed fix
- logFile, err = os.OpenFile(logFilePath, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644) + logFile, err = os.OpenFile(logFilePath, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0600)
🤖 Fix all issues with AI agents
In `@frontend/app/block/durable-session-flyover.tsx`:
- Around line 201-203: The file defines user-facing strings titleText and
descriptionText (and flag showRestartButton) in durable-session-flyover.tsx with
placeholder copy; replace the placeholder values so no placeholder text ships:
update titleText to the final UX copy, replace descriptionText with the real
descriptive message from product copy, and set showRestartButton according to
the intended UX (true/false) or wire it to the component state/props if dynamic;
ensure you update any related rendering logic that references
titleText/descriptionText/showRestartButton so the new copy displays correctly.
- Around line 270-301: In getContentToRender, the ConnStatus property is
accessed as connStatus?.connected which doesn't exist; update the
DurableAwaitingStart prop to compute connected from connStatus.status (e.g.,
connected={connStatus?.status === "connected"}) so it uses the canonical .status
field; adjust the call site in getContentToRender (and any similar usages) and
ensure DurableAwaitingStart still receives a boolean.
In `@pkg/jobcontroller/jobcontroller.go`:
- Around line 99-100: The lastAutoReconnectAttempt sync map (created via
MakeSyncMap[int64] and keyed by job ID) accumulates stale timestamps because
entries are never removed when jobs are removed; update the DeleteJob function
to delete the corresponding entry from lastAutoReconnectAttempt for the deleted
job ID (e.g., call lastAutoReconnectAttempt.Delete(jobID) or equivalent) so that
map entries are cleaned up when a job is deleted. Ensure you reference the same
sync map variable name (lastAutoReconnectAttempt) and perform the removal after
successful job deletion to avoid races.
- Around line 105-130: The EventSubCommand calls in InitJobController are
missing error handling; update the init to capture the returned error from each
wshclient.EventSubCommand(rpcClient, wps.SubscriptionRequest{...}, nil), check
for non-nil error, and log or handle it (e.g., via processLogger or a dedicated
logger) and optionally retry or cancel initialization; also ensure the
EventListener.On registrations (rpcClient.EventListener.On with
handleRouteUpEvent, handleRouteDownEvent, handleConnChangeEvent,
handleBlockCloseEvent) remain before subscriptions so failures can be correlated
with the registered handlers.
In `@pkg/shellexec/shellexec.go`:
- Around line 375-386: The code builds integration file paths using
remoteInfo.HomeDir which can be empty for older/partial remotes; update
StartRemoteShellProc and StartRemoteShellJob to guard against an empty
remoteInfo.HomeDir by substituting a safe default (e.g., "~") before
constructing bashPath, waveFishPath, pwshPath and any other paths; ensure the
check happens once after obtaining remoteInfo and use the sanitized homeDir
variable in all fmt.Sprintf calls and option construction so pwsh -File and
other shell flags receive valid paths.
- Around line 382-384: The fish `-C` source command is built with an unquoted
path causing failures on paths with spaces; update the construction of carg to
call shellutil.HardQuoteFish(waveFishPath) and embed that quoted value in the
`"source %s"` string before appending to shellOpts (replace the existing
waveFishPath usage). Apply this change in all three locations where carg and
waveFishPath are created: the interactive remote block (where shellOpts is
appended with "-C", carg), the non-interactive remote block, and the local shell
block so that carg uses shellutil.HardQuoteFish and shellOpts remains unchanged
otherwise.
🧹 Nitpick comments (1)
frontend/app/block/durable-session-flyover.tsx (1)
160-165: Remove the duplicateDurableEndedContentPropsdefinition.
The duplication is confusing and easy to drift.♻️ Suggested cleanup
-interface DurableEndedContentProps { - doneReason: string; - viewModel: TermViewModel; - onClose: () => void; -}Also applies to: 183-188
| let titleText = "Durable Session (Ended)"; | ||
| let descriptionText = "[Placeholder text - will be updated]"; | ||
| let showRestartButton = false; |
There was a problem hiding this comment.
Replace placeholder copy before release.
User-facing placeholder text shouldn’t ship.
✏️ Suggested copy
- let descriptionText = "[Placeholder text - will be updated]";
+ let descriptionText = "The durable session ended for an unknown reason.";📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let titleText = "Durable Session (Ended)"; | |
| let descriptionText = "[Placeholder text - will be updated]"; | |
| let showRestartButton = false; | |
| let titleText = "Durable Session (Ended)"; | |
| let descriptionText = "The durable session ended for an unknown reason."; | |
| let showRestartButton = false; |
🤖 Prompt for AI Agents
In `@frontend/app/block/durable-session-flyover.tsx` around lines 201 - 203, The
file defines user-facing strings titleText and descriptionText (and flag
showRestartButton) in durable-session-flyover.tsx with placeholder copy; replace
the placeholder values so no placeholder text ships: update titleText to the
final UX copy, replace descriptionText with the real descriptive message from
product copy, and set showRestartButton according to the intended UX
(true/false) or wire it to the component state/props if dynamic; ensure you
update any related rendering logic that references
titleText/descriptionText/showRestartButton so the new copy displays correctly.
| function getContentToRender( | ||
| viewModel: TermViewModel, | ||
| onClose: () => void, | ||
| jobStatus: BlockJobStatusData, | ||
| connStatus: ConnStatus, | ||
| isConfigedDurable?: boolean | null | ||
| ): string | React.ReactNode { | ||
| if (isConfigedDurable === false) { | ||
| return <StandardSessionContent viewModel={viewModel} onClose={onClose} />; | ||
| } | ||
|
|
||
| const status = jobStatus?.status; | ||
| if (status === "connected") { | ||
| return <DurableAttachedContent onClose={onClose} />; | ||
| } else if (status === "disconnected") { | ||
| return <DurableDetachedContent onClose={onClose} />; | ||
| } else if (status === "init") { | ||
| return <DurableStartingContent onClose={onClose} />; | ||
| } else if (status === "done") { | ||
| const doneReason = jobStatus?.donereason; | ||
| const startupError = jobStatus?.startuperror; | ||
| return ( | ||
| <DurableEndedContent | ||
| doneReason={doneReason} | ||
| startupError={startupError} | ||
| viewModel={viewModel} | ||
| onClose={onClose} | ||
| /> | ||
| ); | ||
| } else if (status == null) { | ||
| return <DurableAwaitingStart connected={!!connStatus?.connected} viewModel={viewModel} onClose={onClose} />; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
rg -n --type=ts --type=tsx 'interface ConnStatus|type ConnStatus' -C2Repository: wavetermdev/waveterm
Length of output: 91
🏁 Script executed:
rg -n --type=ts --type=tsx 'connStatus\.(status|connected)' -C2Repository: wavetermdev/waveterm
Length of output: 91
🏁 Script executed:
rg -n 'interface ConnStatus|type ConnStatus' -C2 --type=tsRepository: wavetermdev/waveterm
Length of output: 618
🏁 Script executed:
rg -n 'connStatus\.(status|connected)' -C2Repository: wavetermdev/waveterm
Length of output: 3706
🏁 Script executed:
fd -e ts -e tsx | head -20Repository: wavetermdev/waveterm
Length of output: 578
🏁 Script executed:
sed -n '270,301p' frontend/app/block/durable-session-flyover.tsxRepository: wavetermdev/waveterm
Length of output: 1271
Fix the ConnStatus property access to use .status instead of .connected.
Line 290 accesses connStatus?.connected, which does not exist on the ConnStatus type. The canonical property is .status. Change to connected={connStatus?.status === "connected"} to align with the rest of the codebase.
🤖 Prompt for AI Agents
In `@frontend/app/block/durable-session-flyover.tsx` around lines 270 - 301, In
getContentToRender, the ConnStatus property is accessed as connStatus?.connected
which doesn't exist; update the DurableAwaitingStart prop to compute connected
from connStatus.status (e.g., connected={connStatus?.status === "connected"}) so
it uses the canonical .status field; adjust the call site in getContentToRender
(and any similar usages) and ensure DurableAwaitingStart still receives a
boolean.
| lastAutoReconnectAttempt = ds.MakeSyncMap[int64]() | ||
|
|
There was a problem hiding this comment.
Memory leak: lastAutoReconnectAttempt entries are not cleaned up.
The lastAutoReconnectAttempt map stores timestamps per job ID, but entries are never removed when jobs are deleted. Over time, this could accumulate stale entries.
🛡️ Proposed fix to clean up in DeleteJob
Add cleanup in the DeleteJob function (around line 1321):
func DeleteJob(ctx context.Context, jobId string) error {
SetJobConnStatus(jobId, JobConnStatus_Disconnected)
jobTerminationMessageWritten.Delete(jobId)
+ lastAutoReconnectAttempt.Delete(jobId)
err := filestore.WFS.DeleteZone(ctx, jobId)
if err != nil {
log.Printf("[job:%s] warning: error deleting WaveFS zone: %v", jobId, err)
}
return wstore.DBDelete(ctx, waveobj.OType_Job, jobId)
}🤖 Prompt for AI Agents
In `@pkg/jobcontroller/jobcontroller.go` around lines 99 - 100, The
lastAutoReconnectAttempt sync map (created via MakeSyncMap[int64] and keyed by
job ID) accumulates stale timestamps because entries are never removed when jobs
are removed; update the DeleteJob function to delete the corresponding entry
from lastAutoReconnectAttempt for the deleted job ID (e.g., call
lastAutoReconnectAttempt.Delete(jobID) or equivalent) so that map entries are
cleaned up when a job is deleted. Ensure you reference the same sync map
variable name (lastAutoReconnectAttempt) and perform the removal after
successful job deletion to avoid races.
| func InitJobController() { | ||
| go connReconcileWorker() | ||
| go jobPruningWorker() | ||
|
|
||
| rpcClient := wshclient.GetBareRpcClient() | ||
| rpcClient.EventListener.On(wps.Event_RouteUp, handleRouteUpEvent) | ||
| rpcClient.EventListener.On(wps.Event_RouteDown, handleRouteDownEvent) | ||
| rpcClient.EventListener.On(wps.Event_ConnChange, handleConnChangeEvent) | ||
| rpcClient.EventListener.On(wps.Event_BlockClose, handleBlockCloseEvent) | ||
| wshclient.EventSubCommand(rpcClient, wps.SubscriptionRequest{ | ||
| Event: wps.Event_RouteUp, | ||
| AllScopes: true, | ||
| }, nil) | ||
| wshclient.EventSubCommand(rpcClient, wps.SubscriptionRequest{ | ||
| Event: wps.Event_RouteDown, | ||
| AllScopes: true, | ||
| }, nil) | ||
| wshclient.EventSubCommand(rpcClient, wps.SubscriptionRequest{ | ||
| Event: wps.Event_ConnChange, | ||
| AllScopes: true, | ||
| }, nil) | ||
| wshclient.EventSubCommand(rpcClient, wps.SubscriptionRequest{ | ||
| Event: wps.Event_BlockClose, | ||
| AllScopes: true, | ||
| }, nil) | ||
| } |
There was a problem hiding this comment.
Error handling for event subscriptions is missing.
The EventSubCommand calls don't check for errors. While failures might be logged elsewhere, silent subscription failures could cause the auto-reconnect feature to not work as expected.
🛡️ Proposed fix to add error handling
func InitJobController() {
go connReconcileWorker()
go jobPruningWorker()
rpcClient := wshclient.GetBareRpcClient()
rpcClient.EventListener.On(wps.Event_RouteUp, handleRouteUpEvent)
rpcClient.EventListener.On(wps.Event_RouteDown, handleRouteDownEvent)
rpcClient.EventListener.On(wps.Event_ConnChange, handleConnChangeEvent)
rpcClient.EventListener.On(wps.Event_BlockClose, handleBlockCloseEvent)
- wshclient.EventSubCommand(rpcClient, wps.SubscriptionRequest{
+ err := wshclient.EventSubCommand(rpcClient, wps.SubscriptionRequest{
Event: wps.Event_RouteUp,
AllScopes: true,
}, nil)
- wshclient.EventSubCommand(rpcClient, wps.SubscriptionRequest{
+ if err != nil {
+ log.Printf("[jobcontroller] error subscribing to RouteUp events: %v", err)
+ }
+ err = wshclient.EventSubCommand(rpcClient, wps.SubscriptionRequest{
Event: wps.Event_RouteDown,
AllScopes: true,
}, nil)
- wshclient.EventSubCommand(rpcClient, wps.SubscriptionRequest{
+ if err != nil {
+ log.Printf("[jobcontroller] error subscribing to RouteDown events: %v", err)
+ }
+ err = wshclient.EventSubCommand(rpcClient, wps.SubscriptionRequest{
Event: wps.Event_ConnChange,
AllScopes: true,
}, nil)
- wshclient.EventSubCommand(rpcClient, wps.SubscriptionRequest{
+ if err != nil {
+ log.Printf("[jobcontroller] error subscribing to ConnChange events: %v", err)
+ }
+ err = wshclient.EventSubCommand(rpcClient, wps.SubscriptionRequest{
Event: wps.Event_BlockClose,
AllScopes: true,
}, nil)
+ if err != nil {
+ log.Printf("[jobcontroller] error subscribing to BlockClose events: %v", err)
+ }
}🤖 Prompt for AI Agents
In `@pkg/jobcontroller/jobcontroller.go` around lines 105 - 130, The
EventSubCommand calls in InitJobController are missing error handling; update
the init to capture the returned error from each
wshclient.EventSubCommand(rpcClient, wps.SubscriptionRequest{...}, nil), check
for non-nil error, and log or handle it (e.g., via processLogger or a dedicated
logger) and optionally retry or cancel initialization; also ensure the
EventListener.On registrations (rpcClient.EventListener.On with
handleRouteUpEvent, handleRouteDownEvent, handleConnChangeEvent,
handleBlockCloseEvent) remain before subscriptions so failures can be correlated
with the registered handlers.
| bashPath := fmt.Sprintf("%s/.waveterm/%s/.bashrc", remoteInfo.HomeDir, shellutil.BashIntegrationDir) | ||
| shellOpts = append(shellOpts, "--rcfile", bashPath) | ||
| } else if shellType == shellutil.ShellType_fish { | ||
| if cmdOpts.Login { | ||
| shellOpts = append(shellOpts, "-l") | ||
| } | ||
| // source the wave.fish file | ||
| waveFishPath := fmt.Sprintf("~/.waveterm/%s/wave.fish", shellutil.FishIntegrationDir) | ||
| waveFishPath := fmt.Sprintf("%s/.waveterm/%s/wave.fish", remoteInfo.HomeDir, shellutil.FishIntegrationDir) | ||
| carg := fmt.Sprintf(`"source %s"`, waveFishPath) | ||
| shellOpts = append(shellOpts, "-C", carg) | ||
| } else if shellType == shellutil.ShellType_pwsh { | ||
| pwshPath := fmt.Sprintf("~/.waveterm/%s/wavepwsh.ps1", shellutil.PwshIntegrationDir) | ||
| pwshPath := fmt.Sprintf("%s/.waveterm/%s/wavepwsh.ps1", remoteInfo.HomeDir, shellutil.PwshIntegrationDir) |
There was a problem hiding this comment.
Guard against empty remoteInfo.HomeDir to preserve compatibility.
If remoteInfo.HomeDir is empty (older remote or partial info), the constructed paths become /.waveterm/... and pwsh -File can fail. Consider falling back to "~" (or another safe default) before path construction in both functions.
🛠️ Suggested fix (apply in both StartRemoteShellProc and StartRemoteShellJob)
- if shellType == shellutil.ShellType_bash {
- bashPath := fmt.Sprintf("%s/.waveterm/%s/.bashrc", remoteInfo.HomeDir, shellutil.BashIntegrationDir)
+ homeDir := remoteInfo.HomeDir
+ if homeDir == "" {
+ homeDir = "~"
+ }
+ if shellType == shellutil.ShellType_bash {
+ bashPath := fmt.Sprintf("%s/.waveterm/%s/.bashrc", homeDir, shellutil.BashIntegrationDir)
shellOpts = append(shellOpts, "--rcfile", bashPath)
} else if shellType == shellutil.ShellType_fish {
...
- waveFishPath := fmt.Sprintf("%s/.waveterm/%s/wave.fish", remoteInfo.HomeDir, shellutil.FishIntegrationDir)
+ waveFishPath := fmt.Sprintf("%s/.waveterm/%s/wave.fish", homeDir, shellutil.FishIntegrationDir)
...
} else if shellType == shellutil.ShellType_pwsh {
- pwshPath := fmt.Sprintf("%s/.waveterm/%s/wavepwsh.ps1", remoteInfo.HomeDir, shellutil.PwshIntegrationDir)
+ pwshPath := fmt.Sprintf("%s/.waveterm/%s/wavepwsh.ps1", homeDir, shellutil.PwshIntegrationDir)
...
}Also applies to: 498-509
🤖 Prompt for AI Agents
In `@pkg/shellexec/shellexec.go` around lines 375 - 386, The code builds
integration file paths using remoteInfo.HomeDir which can be empty for
older/partial remotes; update StartRemoteShellProc and StartRemoteShellJob to
guard against an empty remoteInfo.HomeDir by substituting a safe default (e.g.,
"~") before constructing bashPath, waveFishPath, pwshPath and any other paths;
ensure the check happens once after obtaining remoteInfo and use the sanitized
homeDir variable in all fmt.Sprintf calls and option construction so pwsh -File
and other shell flags receive valid paths.
| waveFishPath := fmt.Sprintf("%s/.waveterm/%s/wave.fish", remoteInfo.HomeDir, shellutil.FishIntegrationDir) | ||
| carg := fmt.Sprintf(`"source %s"`, waveFishPath) | ||
| shellOpts = append(shellOpts, "-C", carg) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the specific lines mentioned in the review
head -n 384 pkg/shellexec/shellexec.go | tail -n 10Repository: wavetermdev/waveterm
Length of output: 580
🏁 Script executed:
# Also check lines around 504-506
sed -n '500,510p' pkg/shellexec/shellexec.goRepository: wavetermdev/waveterm
Length of output: 652
🏁 Script executed:
# Search for the HardQuoteFish function mentioned in the suggested fix
rg "HardQuoteFish" --type goRepository: wavetermdev/waveterm
Length of output: 332
🏁 Script executed:
# Check the HardQuoteFish implementation
cat -n pkg/util/shellutil/shellquote.go | grep -A 15 "func HardQuoteFish"Repository: wavetermdev/waveterm
Length of output: 395
🏁 Script executed:
# Get broader context around line 382
sed -n '370,390p' pkg/shellexec/shellexec.go | cat -nRepository: wavetermdev/waveterm
Length of output: 1295
🏁 Script executed:
# Get broader context around line 504
sed -n '495,515p' pkg/shellexec/shellexec.go | cat -nRepository: wavetermdev/waveterm
Length of output: 1158
🏁 Script executed:
# Search for all occurrences of waveFishPath in the file
grep -n "waveFishPath" pkg/shellexec/shellexec.goRepository: wavetermdev/waveterm
Length of output: 689
🏁 Script executed:
# Get context for line 230
sed -n '220,235p' pkg/shellexec/shellexec.go | cat -nRepository: wavetermdev/waveterm
Length of output: 980
🏁 Script executed:
# Get the full HardQuoteFish implementation
sed -n '54,80p' pkg/util/shellutil/shellquote.go | cat -nRepository: wavetermdev/waveterm
Length of output: 665
Quote waveFishPath with HardQuoteFish to handle spaces in paths.
The fish shell -C argument constructs a source command with an unquoted path. If the path contains spaces (e.g., remote home directory with spaces), fish will split the argument and fail. Use shellutil.HardQuoteFish() to safely escape the path.
This applies at three locations:
- Lines 382-384 (interactive remote)
- Lines 504-506 (non-interactive remote)
- Lines 230-232 (local shell)
🛠️ Suggested fixes
Lines 382-384:
- carg := fmt.Sprintf(`"source %s"`, waveFishPath)
+ carg := fmt.Sprintf(`"source %s"`, shellutil.HardQuoteFish(waveFishPath))
shellOpts = append(shellOpts, "-C", carg)Lines 504-506:
- carg := fmt.Sprintf(`source %s`, waveFishPath)
+ carg := fmt.Sprintf(`source %s`, shellutil.HardQuoteFish(waveFishPath))
shellOpts = append(shellOpts, "-C", carg)Lines 230-232:
- carg := fmt.Sprintf(`"source %s"`, waveFishPath)
+ carg := fmt.Sprintf(`"source %s"`, shellutil.HardQuoteFish(waveFishPath))
shellOpts = append(shellOpts, "-C", carg)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| waveFishPath := fmt.Sprintf("%s/.waveterm/%s/wave.fish", remoteInfo.HomeDir, shellutil.FishIntegrationDir) | |
| carg := fmt.Sprintf(`"source %s"`, waveFishPath) | |
| shellOpts = append(shellOpts, "-C", carg) | |
| waveFishPath := fmt.Sprintf("%s/.waveterm/%s/wave.fish", remoteInfo.HomeDir, shellutil.FishIntegrationDir) | |
| carg := fmt.Sprintf(`"source %s"`, shellutil.HardQuoteFish(waveFishPath)) | |
| shellOpts = append(shellOpts, "-C", carg) |
🤖 Prompt for AI Agents
In `@pkg/shellexec/shellexec.go` around lines 382 - 384, The fish `-C` source
command is built with an unquoted path causing failures on paths with spaces;
update the construction of carg to call shellutil.HardQuoteFish(waveFishPath)
and embed that quoted value in the `"source %s"` string before appending to
shellOpts (replace the existing waveFishPath usage). Apply this change in all
three locations where carg and waveFishPath are created: the interactive remote
block (where shellOpts is appended with "-C", carg), the non-interactive remote
block, and the local shell block so that carg uses shellutil.HardQuoteFish and
shellOpts remains unchanged otherwise.