1573 Commits

Author SHA1 Message Date
liuxiongfeng
57a778dccf chore: bump version to 0.1.79.1 2026-02-10 23:48:23 +08:00
liuxiongfeng
f702c66659 fix: resolve gofmt alignment issue in failover_loop_test.go
Move inline comments to separate lines to avoid gofmt
consecutive-line comment alignment requirements.
2026-02-10 23:40:37 +08:00
liuxiongfeng
a095468850 fix: correct import path in failover_loop.go and failover_loop_test.go
Use github.com/Wei-Shaw/sub2api/internal/service instead of
sub2api/internal/service to match the module path in go.mod.
2026-02-10 23:35:21 +08:00
liuxiongfeng
f9b6a20995 Merge tag 'v0.1.79' into develop
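Enhances error handling and retry logic: adds fixed-interval same-account retries for MODEL_CAPACITY_EXHAUSTED, makes same-account retries for transient errors take priority over failover, and substantially optimizes error-matching performance.

- MODEL_CAPACITY_EXHAUSTED (503) is retried at a fixed 1s interval up to 60 times, without switching accounts
- Transient errors (Google 400, empty stream response) are retried 2 times on the same account before failover is triggered
- An empty stream response triggers failover and retries on another account instead of returning 502 directly
- The Google "Invalid project resource name" 400 error triggers failover and temp-unschedules the account for 1 hour
- Error passthrough rules gain a skip_monitoring option; matching errors are not recorded in the ops monitoring log
- Antigravity forwarding supports daily/prod single-URL switching

- Error-matching performance: defer and cap body ToLower, precompute rule keywords and platform sets
- Deduplicate MODEL_CAPACITY_EXHAUSTED globally to avoid duplicate retries from concurrent requests
- Lower the 503 retry body read limit from 2MB to 8KB
- Replace time.After with time.NewTimer to prevent timer leaks when the context is cancelled
- Shorten the temp-unschedule cooldown from 30 minutes to 1 minute (after same-account retries are exhausted)

- Fix the error passthrough rule option skip_monitoring not taking effect
- Fix CI check failures (gofmt, errcheck, staticcheck)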

# Conflicts:
#	backend/internal/service/error_passthrough_runtime_test.go
2026-02-10 23:25:51 +08:00
liuxiongfeng
f2770da880 refactor: extract failover error handling into shared HandleFailoverError
- Extract duplicated failover error handling from gateway_handler.go (Gemini-compat & Claude paths) and gemini_v1beta_handler.go into shared failover_loop.go
- Introduce TempUnscheduler interface for testability (GatewayService implicitly satisfies it)
- Add comprehensive unit tests for HandleFailoverError (32 test cases covering all paths)
- Fix golangci-lint issues: errcheck in test type assertion, staticcheck QF1003 if/else→switch
2026-02-10 23:13:37 +08:00
Wesley Liddick
ae6fed15cc Merge pull request #548 from Edric-Li/main
feat: error handling enhancements, retry optimization, and performance improvements
v0.1.79
2026-02-10 22:46:58 +08:00
Edric Li
378e476e48 fix: resolve CI check failures
- gofmt: fix formatting in error_passthrough_service.go
- errcheck: fix an unchecked type assertion in error_passthrough_runtime_test.go
- staticcheck: convert if-else to switch (gateway_service.go)
- test: fix two test cases that wrongly used MODEL_CAPACITY_EXHAUSTED and therefore took the wrong path
2026-02-10 22:08:49 +08:00
Edric Li
2a1067c82b Merge remote-tracking branch 'upstream/main' 2026-02-10 21:52:33 +08:00
Edric Li
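a54b81cf74 perf: optimize error handling performance
- MatchRule defers and caps body ToLower: short-circuit on statusCode first, convert only when keyword matching is needed, and limit the conversion to 8KB
- Precompute each rule's lowercase keywords/platforms and error code set, eliminating repeated runtime ToLower calls and linear scans
- Deduplicate MODEL_CAPACITY_EXHAUSTED globally so concurrent requests do not repeatedly retry the same model
- Lower the 503 retry body read limit from 2MB to 8KB
- Replace time.After with time.NewTimer to prevent timer leaks when the context is cancelled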
2026-02-10 21:40:31 +08:00
liuxiongfeng
d269659e61 chore: bump version to 0.1.78.2 2026-02-10 21:28:52 +08:00
liuxiongfeng
c4d6715443 chore: squash merge customizations from develop-old-0.1.77
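- Custom docs: CLAUDE.md, AGENTS.md
- UI customizations: WeChat customer-service button, homepage redesign, removal of GitHub links
- Deployment and ops: docker-compose.yml, load-testing scripts
- Minor CI/gitignore changes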
2026-02-10 20:59:54 +08:00
Edric Li
2d4236f76e fix: error passthrough rule option skip_monitoring not taking effect
- ops_error_logger: add an OpsSkipPassthroughKey check to the status < 400 branch
- ops_upstream_context: add checkSkipMonitoringForUpstreamEvent so intermediate retry/failover events can also set the skip flag
- gateway_handler/openai_gateway_handler/gemini_v1beta_handler: handleFailoverExhausted sets OpsSkipPassthroughKey after matching a rule
- antigravity_gateway_service: writeMappedClaudeError now calls applyErrorPassthroughRule
2026-02-10 20:56:01 +08:00
Wesley Liddick
84ced1c497 Merge pull request #543 from slovx2/upstream_main
feat(antigravity): forwarding and tests support daily/prod single-URL switching
2026-02-10 14:57:46 +08:00
song
b161312183 test(antigravity): update retry assertions under the single-URL strategy 2026-02-10 14:36:09 +08:00
song
1f647b120a feat(antigravity): forwarding and tests support daily/prod single-URL switching 2026-02-10 13:51:29 +08:00
Edric Li
7d0a30fa8f merge: sync upstream main (antigravity single-account 503 retry)
Merge upstream's new Antigravity single-account 503 backoff retry mechanism and
resolve the conflict with the local MODEL_CAPACITY_EXHAUSTED logic so that both coexist.
2026-02-10 12:00:21 +08:00
Edric Li
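d95e04fd1f feat: error passthrough rules support skip_monitoring to skip ops monitoring records
Add a skip_monitoring option to each error passthrough rule. When enabled, errors matching the rule
are not recorded in ops_error_logs, which reduces monitoring noise. Disabled by default, so existing rules are unaffected.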
2026-02-10 11:42:39 +08:00
erio
fc4a1c5433 Merge branch 'release/custom-0.1.78'
# Conflicts:
#	backend/cmd/server/VERSION
2026-02-10 11:41:29 +08:00
erio
6bdd580b3f chore: bump version to 0.1.78.1 2026-02-10 11:40:36 +08:00
erio
9cf4882f4c Merge tag 'v0.1.78' into develop
Resolve conflict in antigravity_gateway_service.go by keeping both
retry strategies:
- MODEL_CAPACITY_EXHAUSTED: handleModelCapacityExhaustedRetry (ours)
- Single-account 503 long delay: handleSingleAccountRetryInPlace (upstream)

Update tests to reflect that MODEL_CAPACITY_EXHAUSTED always goes
through capacity retry regardless of single-account mode.
2026-02-10 11:32:56 +08:00
erio
406dad998d chore: bump version to 0.1.77.2 2026-02-10 10:59:34 +08:00
erio
8b0db22c18 Merge branch 'develop' into release/custom-0.1.77 2026-02-10 10:58:52 +08:00
shaw
5dd83d3cf2 fix: remove specific system content to work around the cache-invalidation bug in the new cc client v0.1.78 2026-02-10 10:28:34 +08:00
Wesley Liddick
14e1aac9b5 Merge pull request #533 from GuangYiDing/feat/antigravity-single-account-503-retry
feat: 503 backoff retry mechanism for Antigravity single-account groups
2026-02-10 09:59:48 +08:00
erio
f06048eccf fix: simplify MODEL_CAPACITY_EXHAUSTED to single retry for all cases
Both short (<20s) and long (>=20s/missing) retryDelay now retry once:
- Short: wait actual retryDelay, retry once
- Long/missing: wait 20s, retry once
- Still capacity exhausted: switch account
- Different error: let upper layer handle
2026-02-10 04:05:20 +08:00
erio
05f5a8b61d fix: use switch statement for staticcheck QF1003 compliance 2026-02-10 03:59:39 +08:00
erio
662625a091 feat: optimize MODEL_CAPACITY_EXHAUSTED retry and remove extra failover retries
- MODEL_CAPACITY_EXHAUSTED now uses independent retry strategy:
  - retryDelay < 20s: wait actual retryDelay then retry once
  - retryDelay >= 20s or missing: retry up to 5 times at 20s intervals
  - Still capacity exhausted after retries: switch account (failover)
  - Different error during retry (e.g. 429): handle by actual error code
  - No model rate limit set (capacity != rate limit)

- Remove Antigravity extra failover retries feature:
  Same-account retry mechanism (cherry-picked) makes it redundant.
  Removed: antigravityExtraRetries config, sleepFixedDelay, skip-non-antigravity logic.
2026-02-10 03:47:40 +08:00
Edric Li
6328e69441 feat: same-account retry before failover for transient errors
For retryable transient errors (Google 400 "invalid project resource name"
and empty stream responses), retry on the same account up to 2 times
(with 500ms delay) before switching to another account.

- Add RetryableOnSameAccount field to UpstreamFailoverError
- Add same-account retry loop in both Gemini and Claude/OpenAI handler paths
- Move temp-unschedule from service layer to handler layer (only after
  all same-account retries exhausted)
- Reduce temp-unschedule cooldown from 30 minutes to 1 minute
2026-02-10 03:27:19 +08:00
Edric Li
425dfb80d9 feat: failover and temp-unschedule on empty stream response
- Empty stream responses now return UpstreamFailoverError instead of
  plain 502, triggering automatic account switching (up to 10 retries)
- Add tempUnscheduleEmptyResponse: accounts returning empty responses
  are temp-unscheduled for 30 minutes
- Apply to both Claude and Gemini non-streaming paths
- Align googleConfigErrorCooldown from 60m to 30m for consistency
2026-02-10 03:27:02 +08:00
Edric Li
4c1fd570f0 feat: failover and temp-unschedule on Google "Invalid project resource name" 400
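The Google backend intermittently returns a 400 "Invalid project resource name" error.
Previously this error was passed straight through to the client without triggering an account switch, so the request failed.

- On all forwarding paths of both the Antigravity and Gemini platforms,
  match this error message exactly, then trigger failover to retry on another account
- On a match, temp-unschedule the account for 1 hour to avoid repeatedly scheduling the same faulty account
- Extract shared functions isGoogleProjectConfigError / tempUnscheduleGoogleConfigError
  to eliminate code duplication across services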
2026-02-10 03:26:51 +08:00
Edric Li
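6114f69cca feat: retry MODEL_CAPACITY_EXHAUSTED 60 times at a fixed 1s interval without switching accounts
MODEL_CAPACITY_EXHAUSTED (503) means the model is out of capacity. All accounts share the same capacity pool,
so switching accounts is pointless. Retry instead at a fixed 1s interval up to 60 times, and return the upstream error directly once retries are exhausted.

- Add antigravityModelCapacityRetryMaxAttempts=60 and antigravityModelCapacityRetryWait=1s
- shouldTriggerAntigravitySmartRetry gains an isModelCapacityExhausted return value
- handleSmartRetry uses a separate retry strategy for MODEL_CAPACITY_EXHAUSTED
- handleModelRateLimit only marks MODEL_CAPACITY_EXHAUSTED as Handled and sets no rate limit
- After retries are exhausted, no model rate limit is set, the sticky session is not cleared, and the account is not switched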
2026-02-10 02:03:06 +08:00
Edric Li
d6c2921f2b feat: same-account retry before failover for transient errors
For retryable transient errors (Google 400 "invalid project resource name"
and empty stream responses), retry on the same account up to 2 times
(with 500ms delay) before switching to another account.

- Add RetryableOnSameAccount field to UpstreamFailoverError
- Add same-account retry loop in both Gemini and Claude/OpenAI handler paths
- Move temp-unschedule from service layer to handler layer (only after
  all same-account retries exhausted)
- Reduce temp-unschedule cooldown from 30 minutes to 1 minute
2026-02-10 00:53:54 +08:00
Edric Li
61c73287dc feat: failover and temp-unschedule on empty stream response
- Empty stream responses now return UpstreamFailoverError instead of
  plain 502, triggering automatic account switching (up to 10 retries)
- Add tempUnscheduleEmptyResponse: accounts returning empty responses
  are temp-unscheduled for 30 minutes
- Apply to both Claude and Gemini non-streaming paths
- Align googleConfigErrorCooldown from 60m to 30m for consistency
2026-02-09 23:25:30 +08:00
Edric Li
89905ec43d feat: failover and temp-unschedule on Google "Invalid project resource name" 400
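The Google backend intermittently returns a 400 "Invalid project resource name" error.
Previously this error was passed straight through to the client without triggering an account switch, so the request failed.

- On all forwarding paths of both the Antigravity and Gemini platforms,
  match this error message exactly, then trigger failover to retry on another account
- On a match, temp-unschedule the account for 1 hour to avoid repeatedly scheduling the same faulty account
- Extract shared functions isGoogleProjectConfigError / tempUnscheduleGoogleConfigError
  to eliminate code duplication across services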
2026-02-09 22:48:32 +08:00
erio
345f853b5d chore: bump version to 0.1.77.1 2026-02-09 22:27:47 +08:00
erio
100a70f87c Merge remote-tracking branch 'upstream/main' into develop 2026-02-09 22:27:07 +08:00
shaw
aa4b102108 fix: remove the extra form for Antigravity apikey accounts v0.1.77 2026-02-09 22:15:14 +08:00
erio
18b591bc3b feat: Antigravity extra failover retries after default retries exhausted
When default failover retries are exhausted, continue retrying with
Antigravity accounts only (up to 10 times, configurable via
GATEWAY_ANTIGRAVITY_EXTRA_RETRIES). Each extra retry uses a fixed
500ms delay. Non-Antigravity accounts are skipped during the extra
retry phase. Applied to all three endpoints: Gemini compat, Claude,
and Gemini native API paths.
2026-02-09 22:13:44 +08:00
Rose Ding
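e4bc35151f test: add unit tests for the single-account 503 backoff retry mechanism
Covers all new logic in the Service and Handler layers:
- isSingleAccountRetry context flag check
- The 503 + SingleAccountRetry branch in handleSmartRetry
- handleSingleAccountRetryInPlace in-place retry logic
- antigravityRetryLoop pre-check that skips rate limiting
- sleepAntigravitySingleAccountBackoff fixed-delay backoff
- End-to-end integration scenario verification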

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-09 22:06:06 +08:00
Wesley Liddick
56da498b7e Merge pull request #532 from touwaeriol/fix/clear-model-rate-limits
fix: support clearing model-level rate limits from action menu and temp-unsched reset
2026-02-09 20:52:44 +08:00
Wesley Liddick
1bba1a62b1 Merge pull request #531 from touwaeriol/fix/gemini-error-policy-before-retry
fix: Gemini error policy check should precede retry logic
2026-02-09 20:52:32 +08:00
erio
4a84ca9a02 fix: support clearing model-level rate limits from action menu and temp-unsched reset 2026-02-09 20:37:30 +08:00
erio
6a52b24369 Merge branch 'develop'
# Conflicts:
#	backend/cmd/server/VERSION
2026-02-09 20:13:54 +08:00
erio
228aca9523 Merge branch 'fix/gemini-error-policy-before-retry' into develop
# Conflicts:
#	backend/cmd/server/VERSION
2026-02-09 20:08:31 +08:00
erio
7e4637cd70 fix: support clearing model-level rate limits from action menu and temp-unsched reset 2026-02-09 20:08:00 +08:00
erio
a70d37a676 fix: Gemini error policy check should precede retry logic 2026-02-09 19:55:17 +08:00
erio
6892e84ad2 fix: skip rate limiting when custom error codes don't match upstream status
Add ShouldHandleErrorCode guard at the entry of handleGeminiUpstreamError
and AntigravityGatewayService.handleUpstreamError so that accounts with
custom error codes (e.g. [599]) are not rate-limited when the upstream
returns a non-matching status (e.g. 429).
2026-02-09 19:55:05 +08:00
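The guard described in this commit can be sketched as follows. ShouldHandleErrorCode and handleUpstreamError are named in the commit message, but the Account fields, both signatures, and the setRateLimited callback are assumptions for illustration, and the real handler is a method on AntigravityGatewayService rather than a free function.

```go
package main

// Account is a hypothetical stand-in holding only the custom error code policy.
type Account struct {
	CustomErrorCodesEnabled bool
	CustomErrorCodes        []int
}

// ShouldHandleErrorCode reports whether the account's policy covers statusCode.
// Assumed here: with custom codes disabled, every upstream error is handled.
func (a *Account) ShouldHandleErrorCode(statusCode int) bool {
	if !a.CustomErrorCodesEnabled {
		return true
	}
	for _, code := range a.CustomErrorCodes {
		if code == statusCode {
			return true
		}
	}
	return false
}

// handleUpstreamError returns true when the account was rate-limited.
func handleUpstreamError(a *Account, statusCode int, setRateLimited func()) bool {
	// Guard at the entry: a non-matching status must not touch account state.
	if !a.ShouldHandleErrorCode(statusCode) {
		return false
	}
	if statusCode == 429 {
		setRateLimited()
		return true
	}
	return false
}
```

With CustomErrorCodes set to [599], an upstream 429 falls out at the guard and the account is left untouched, which is the case the commit message calls out.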
erio
73f455745c feat: ErrorPolicySkipped returns 500 instead of upstream status code
When custom error codes are enabled and the upstream error code is NOT
in the configured list, return HTTP 500 to the client instead of
transparently forwarding the original status code.

Also adds integration test TestCustomErrorCode599 verifying that 429,
500, 503, 401, 403 all return 500 without triggering SetRateLimited
or SetError.
2026-02-09 19:54:54 +08:00
erio
3e3c015efa fix: Gemini error policy check should precede retry logic 2026-02-09 19:22:32 +08:00
erio
30c30b1712 fix: skip rate limiting when custom error codes don't match upstream status
Add ShouldHandleErrorCode guard at the entry of handleGeminiUpstreamError
and AntigravityGatewayService.handleUpstreamError so that accounts with
custom error codes (e.g. [599]) are not rate-limited when the upstream
returns a non-matching status (e.g. 429).
2026-02-09 18:53:52 +08:00