Gemini 2.5 Pro/Flash thinking models return thoughtsTokenCount separately
from candidatesTokenCount in usageMetadata, but this field was not parsed
or included in billing calculations, causing thinking tokens to be
unbilled.
- Add ThoughtsTokenCount field to GeminiUsageMetadata struct
- Include thoughtsTokenCount in OutputTokens across all 3 Gemini usage
parsing paths (non-streaming, streaming, compat layer)
- Add tests covering thinking token scenarios
Closes#554
When client disconnects during upstream request, the error was
incorrectly reported as "Upstream request failed after retries".
Now checks context cancellation first and returns
"Client disconnected before upstream response" instead.
- Replace heavy RefreshAccountToken with lightweight tryFillProjectID
(loadCodeAssist → onboardUser → fallback), consistent with
Antigravity-Manager's behavior
- Add sync.Map cooldown/dedup (60s) to prevent repeated fill attempts
- Add fallback project_id "bamboo-precept-lgxtn" matching AM
- Extract mergeCredentials helper to eliminate duplication
- Use slog structured logging instead of log.Printf
- Fix time.Sleep in OnboardUser to context-aware select
- Fix strings.NewReader(string(bodyBytes)) → bytes.NewReader(bodyBytes)
- Remove redundant tc := tc in test (Go 1.22+)
- Add nil guard in persistProjectID for test safety
- Extract duplicated failover error handling from gateway_handler.go (Gemini-compat & Claude paths) and gemini_v1beta_handler.go into shared failover_loop.go
- Introduce TempUnscheduler interface for testability (GatewayService implicitly satisfies it)
- Add comprehensive unit tests for HandleFailoverError (32 test cases covering all paths)
- Fix golangci-lint issues: errcheck in test type assertion, staticcheck QF1003 if/else→switch
Resolve conflict in antigravity_gateway_service.go by keeping both
retry strategies:
- MODEL_CAPACITY_EXHAUSTED: handleModelCapacityExhaustedRetry (ours)
- Single-account 503 long delay: handleSingleAccountRetryInPlace (upstream)
Update tests to reflect that MODEL_CAPACITY_EXHAUSTED always goes
through capacity retry regardless of single-account mode.
Both short (<20s) and long (>=20s/missing) retryDelay now retry once:
- Short: wait actual retryDelay, retry once
- Long/missing: wait 20s, retry once
- Still capacity exhausted: switch account
- Different error: let upper layer handle
- MODEL_CAPACITY_EXHAUSTED now uses independent retry strategy:
- retryDelay < 20s: wait actual retryDelay then retry once
- retryDelay >= 20s or missing: retry up to 5 times at 20s intervals
- Still capacity exhausted after retries: switch account (failover)
- Different error during retry (e.g. 429): handle by actual error code
- No model rate limit set (capacity != rate limit)
- Remove Antigravity extra failover retries feature:
Same-account retry mechanism (cherry-picked) makes it redundant.
Removed: antigravityExtraRetries config, sleepFixedDelay, skip-non-antigravity logic.
For retryable transient errors (Google 400 "invalid project resource name"
and empty stream responses), retry on the same account up to 2 times
(with 500ms delay) before switching to another account.
- Add RetryableOnSameAccount field to UpstreamFailoverError
- Add same-account retry loop in both Gemini and Claude/OpenAI handler paths
- Move temp-unschedule from service layer to handler layer (only after
all same-account retries exhausted)
- Reduce temp-unschedule cooldown from 30 minutes to 1 minute
- Empty stream responses now return UpstreamFailoverError instead of
plain 502, triggering automatic account switching (up to 10 retries)
- Add tempUnscheduleEmptyResponse: accounts returning empty responses
are temp-unscheduled for 30 minutes
- Apply to both Claude and Gemini non-streaming paths
- Align googleConfigErrorCooldown from 60m to 30m for consistency
For retryable transient errors (Google 400 "invalid project resource name"
and empty stream responses), retry on the same account up to 2 times
(with 500ms delay) before switching to another account.
- Add RetryableOnSameAccount field to UpstreamFailoverError
- Add same-account retry loop in both Gemini and Claude/OpenAI handler paths
- Move temp-unschedule from service layer to handler layer (only after
all same-account retries exhausted)
- Reduce temp-unschedule cooldown from 30 minutes to 1 minute
- Empty stream responses now return UpstreamFailoverError instead of
plain 502, triggering automatic account switching (up to 10 retries)
- Add tempUnscheduleEmptyResponse: accounts returning empty responses
are temp-unscheduled for 30 minutes
- Apply to both Claude and Gemini non-streaming paths
- Align googleConfigErrorCooldown from 60m to 30m for consistency