fix(gateway): wrap Anthropic stream EOF as failover error before client output

Anthropic streaming path (gateway_service.go) returned a plain error on
upstream SSE read failure, so the handler-level UpstreamFailoverError check
never fired and the client received a bare `stream_read_error` event,
breaking long-running tasks even when no bytes had been written yet.

The most common trigger is HTTP/2 GOAWAY from api.anthropic.com edge
backends doing graceful rotation: Go's http.Transport surfaces this as
`unexpected EOF` and never auto-retries.

Mirror what the OpenAI and antigravity gateways already do: when the read
error happens before any byte has reached the client (`!c.Writer.Written()`),
return `*UpstreamFailoverError{StatusCode: 502, RetryableOnSameAccount: true}`
so the handler can retry on the same or another account. After client
output has begun, SSE has no resume protocol — keep the existing passthrough
behavior.

Tests cover both branches via streamReadCloser-based fixtures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
alfadb
2026-04-28 19:12:48 +08:00
parent b0a2252ed1
commit 6327573534
2 changed files with 84 additions and 0 deletions

View File

@@ -7041,6 +7041,20 @@ func (s *GatewayService) handleStreamingResponse(ctx context.Context, resp *http
sendErrorEvent("response_too_large")
return &streamingResult{usage: usage, firstTokenMs: firstTokenMs}, ev.err
}
// 上游中途读错误unexpected EOF / connection reset 等,常见于 HTTP/2 GOAWAY
// 若尚未向客户端写过任何字节,包成 UpstreamFailoverError 让 handler 层走 failover/重试。
// 已经开始写流时 SSE 协议无 resume只能透传错误事件给客户端。
if !c.Writer.Written() {
logger.LegacyPrintf("service.gateway", "Upstream stream read error before any client output (account=%d), failing over: %v", account.ID, ev.err)
body, _ := json.Marshal(map[string]string{
"error": fmt.Sprintf("upstream stream disconnected: %s", ev.err),
})
return nil, &UpstreamFailoverError{
StatusCode: http.StatusBadGateway,
ResponseBody: body,
RetryableOnSameAccount: true,
}
}
sendErrorEvent("stream_read_error")
return &streamingResult{usage: usage, firstTokenMs: firstTokenMs}, fmt.Errorf("stream read error: %w", ev.err)
}