diff --git a/.github/workflows/backend-ci.yml b/.github/workflows/backend-ci.yml index 2596a18c..84575a96 100644 --- a/.github/workflows/backend-ci.yml +++ b/.github/workflows/backend-ci.yml @@ -17,6 +17,7 @@ jobs: go-version-file: backend/go.mod check-latest: false cache: true + cache-dependency-path: backend/go.sum - name: Verify Go version run: | go version | grep -q 'go1.25.7' @@ -36,6 +37,7 @@ jobs: go-version-file: backend/go.mod check-latest: false cache: true + cache-dependency-path: backend/go.sum - name: Verify Go version run: | go version | grep -q 'go1.25.7' diff --git a/.gitignore b/.gitignore index 48172982..925912fa 100644 --- a/.gitignore +++ b/.gitignore @@ -78,6 +78,7 @@ Desktop.ini # =================== tmp/ temp/ +logs/ *.tmp *.temp *.log @@ -129,4 +130,12 @@ deploy/docker-compose.override.yml .gocache/ vite.config.js docs/* -.serena/ \ No newline at end of file +.serena/ + +# =================== +# 压测工具 +# =================== +tools/loadtest/ +# Antigravity Manager +Antigravity-Manager/ +antigravity_projectid_fix.patch diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 00000000..0202e94f --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,1210 @@ +# Sub2API 开发说明 + +## 版本管理策略 + +### 版本号规则 + +我们在官方版本号后面添加自己的小版本号: + +- 官方版本:`v0.1.68` +- 我们的版本:`v0.1.68.1`、`v0.1.68.2`(递增) + +### 分支策略 + +| 分支 | 说明 | +|------|------| +| `main` | 我们的主分支,包含所有定制功能 | +| `release/custom-X.Y.Z` | 基于官方 `vX.Y.Z` 的发布分支 | +| `upstream/main` | 上游官方仓库 | + +--- + +## 发布流程(基于新官方版本) + +当官方发布新版本(如 `v0.1.69`)时: + +### 1. 同步上游并创建发布分支 + +```bash +# 获取上游最新代码 +git fetch upstream --tags + +# 基于官方标签创建新的发布分支 +git checkout v0.1.69 -b release/custom-0.1.69 + +# 合并我们的 main 分支(包含所有定制功能) +git merge main --no-edit + +# 解决可能的冲突后继续 +``` + +### 2. 更新版本号并打标签 + +```bash +# 更新版本号文件 +echo "0.1.69.1" > backend/cmd/server/VERSION +git add backend/cmd/server/VERSION +git commit -m "chore: bump version to 0.1.69.1" + +# 打上我们自己的标签 +git tag v0.1.69.1 + +# 推送分支和标签 +git push origin release/custom-0.1.69 +git push origin v0.1.69.1 +``` + +### 3. 更新 main 分支 + +```bash +# 将发布分支合并回 main,保持 main 包含最新定制功能 +git checkout main +git merge release/custom-0.1.69 +git push origin main +``` + +--- + +## 热修复发布(在现有版本上修复) + +当需要在当前版本上发布修复时: + +```bash +# 在当前发布分支上修复 +git checkout release/custom-0.1.68 +# ... 进行修复 ... +git commit -m "fix: 修复描述" + +# 递增小版本号 +echo "0.1.68.2" > backend/cmd/server/VERSION +git add backend/cmd/server/VERSION +git commit -m "chore: bump version to 0.1.68.2" + +# 打标签并推送 +git tag v0.1.68.2 +git push origin release/custom-0.1.68 +git push origin v0.1.68.2 + +# 同步修复到 main +git checkout main +git cherry-pick +git push origin main +``` + +--- + +## 服务器部署流程 + +### 前置条件 + +- 本地已配置 SSH 别名 `clicodeplus` 连接到生产服务器(运行服务) +- 本地已配置 SSH 别名 `us-asaki-root` 连接到构建服务器(拉取代码、构建镜像) +- 生产服务器部署目录:`/root/sub2api`(正式)、`/root/sub2api-beta`(测试) +- 生产服务器使用 Docker Compose 部署 +- **镜像统一在构建服务器上构建**,避免生产服务器因编译占用 CPU/内存影响线上服务 + +### 服务器角色说明 + +| 服务器 | SSH 别名 | 职责 | +|--------|----------|------| +| 构建服务器 | `us-asaki-root` | 拉取代码、`docker build` 构建镜像 | +| 生产服务器 | `clicodeplus` | 加载镜像、运行服务、部署验证 | + +### 部署环境说明 + +| 环境 | 目录(生产服务器) | 端口 | 数据库 | 容器名 | +|------|------|------|--------|--------| +| 正式 | `/root/sub2api` | 8080 | `sub2api` | `sub2api` | +| Beta | `/root/sub2api-beta` | 8084 | `beta` | `sub2api-beta` | + +### 外部数据库 + +正式和 Beta 环境**共用外部 PostgreSQL 数据库**(非容器内数据库),配置在 `.env` 文件中: +- `DATABASE_HOST`:外部数据库地址 +- `DATABASE_SSLMODE`:SSL 模式(通常为 `require`) +- `POSTGRES_USER` / `POSTGRES_DB`:用户名和数据库名 + +#### 数据库操作命令 + +通过 SSH 在服务器上执行数据库操作: + +```bash +# 正式环境 - 查询迁移记录 +ssh clicodeplus "source /root/sub2api/deploy/.env && PGPASSWORD=\"\$POSTGRES_PASSWORD\" psql -h \$DATABASE_HOST -U \$POSTGRES_USER -d \$POSTGRES_DB -c 'SELECT * FROM schema_migrations ORDER BY applied_at DESC LIMIT 5;'" + +# Beta 环境 - 查询迁移记录 +ssh clicodeplus "source /root/sub2api-beta/deploy/.env && PGPASSWORD=\"\$POSTGRES_PASSWORD\" psql -h \$DATABASE_HOST -U \$POSTGRES_USER -d \$POSTGRES_DB -c 'SELECT * FROM schema_migrations ORDER BY applied_at DESC LIMIT 5;'" + +# Beta 环境 - 清除指定迁移记录(重新执行迁移) +ssh clicodeplus "source /root/sub2api-beta/deploy/.env && PGPASSWORD=\"\$POSTGRES_PASSWORD\" psql -h \$DATABASE_HOST -U \$POSTGRES_USER -d \$POSTGRES_DB -c \"DELETE FROM schema_migrations WHERE filename LIKE '%049%';\"" + +# Beta 环境 - 更新账号数据 +ssh clicodeplus "source /root/sub2api-beta/deploy/.env && PGPASSWORD=\"\$POSTGRES_PASSWORD\" psql -h \$DATABASE_HOST -U \$POSTGRES_USER -d \$POSTGRES_DB -c \"UPDATE accounts SET credentials = credentials - 'model_mapping' WHERE platform = 'antigravity';\"" +``` + +> **注意**:使用 `source .env` 加载环境变量,避免在命令行中暴露密码。 + +### 部署步骤 + +**重要:每次部署都必须递增版本号!** + +#### 0. 递增版本号并推送(本地操作) + +每次部署前,先在本地递增小版本号并确保推送成功: + +```bash +# 查看当前版本号 +cat backend/cmd/server/VERSION +# 假设当前是 0.1.69.1 + +# 递增版本号 +echo "0.1.69.2" > backend/cmd/server/VERSION +git add backend/cmd/server/VERSION +git commit -m "chore: bump version to 0.1.69.2" +git push origin release/custom-0.1.69 + +# ⚠️ 确认推送成功(必须看到分支更新输出,不能有 rejected 错误) +``` + +> **检查点**:如果有其他未提交的改动,应先 commit 并 push,确保 release 分支上的所有代码都已推送到远程。 + +#### 1. 构建服务器拉取代码 + +```bash +# 拉取最新代码并切换分支 +ssh us-asaki-root "cd /root/sub2api && git fetch origin && git checkout -B release/custom-0.1.69 origin/release/custom-0.1.69" + +# ⚠️ 验证版本号与步骤 0 一致 +ssh us-asaki-root "cat /root/sub2api/backend/cmd/server/VERSION" +``` + +> **首次使用构建服务器?** 需要先初始化仓库,参见下方「构建服务器首次初始化」章节。 + +#### 2. 构建服务器构建镜像 + +```bash +ssh us-asaki-root "cd /root/sub2api && docker build --no-cache -t sub2api:latest -f Dockerfile ." + +# ⚠️ 必须看到构建成功输出,如果失败需要先排查问题 +``` + +> **常见构建问题**: +> - `buildx` 版本过旧导致 API 版本不兼容 → 更新 buildx:`curl -fsSL "https://github.com/docker/buildx/releases/latest/download/buildx-$(curl -fsSL https://api.github.com/repos/docker/buildx/releases/latest | grep tag_name | cut -d'"' -f4).linux-amd64" -o ~/.docker/cli-plugins/docker-buildx && chmod +x ~/.docker/cli-plugins/docker-buildx` +> - 磁盘空间不足 → `docker system prune -f` 清理无用镜像 + +#### 3. 传输镜像到生产服务器并加载 + +```bash +# 导出镜像 → 通过管道传输 → 生产服务器加载 +ssh us-asaki-root "docker save sub2api:latest" | ssh clicodeplus "docker load" + +# ⚠️ 必须看到 "Loaded image: sub2api:latest" 输出 +``` + +#### 4. 生产服务器同步代码、更新标签并重启 + +```bash +# 同步代码(用于版本号确认和 deploy 配置) +ssh clicodeplus "cd /root/sub2api && git fetch fork && git checkout -B release/custom-0.1.69 fork/release/custom-0.1.69" + +# 更新镜像标签并重启 +ssh clicodeplus "docker tag sub2api:latest weishaw/sub2api:latest" +ssh clicodeplus "cd /root/sub2api/deploy && docker compose up -d --force-recreate sub2api" +``` + +#### 5. 验证部署 + +```bash +# 查看启动日志 +ssh clicodeplus "docker logs sub2api --tail 20" + +# 确认版本号(必须与步骤 0 中设置的版本号一致) +ssh clicodeplus "cat /root/sub2api/backend/cmd/server/VERSION" + +# 检查容器状态(必须显示 healthy) +ssh clicodeplus "docker ps | grep sub2api" +``` + +--- + +### 构建服务器首次初始化 + +首次使用 `us-asaki-root` 作为构建服务器时,需要执行以下一次性操作: + +```bash +ssh us-asaki-root + +# 1) 克隆仓库 +cd /root +git clone https://github.com/touwaeriol/sub2api.git sub2api +cd sub2api + +# 2) 验证 Docker 和 buildx 版本 +docker version +docker buildx version +# 如果 buildx 版本过旧(< v0.14),执行更新: +# LATEST=$(curl -fsSL https://api.github.com/repos/docker/buildx/releases/latest | grep tag_name | cut -d'"' -f4) +# curl -fsSL "https://github.com/docker/buildx/releases/download/${LATEST}/buildx-${LATEST}.linux-amd64" -o ~/.docker/cli-plugins/docker-buildx +# chmod +x ~/.docker/cli-plugins/docker-buildx + +# 3) 验证构建能力 +docker build --no-cache -t sub2api:test -f Dockerfile . +docker rmi sub2api:test +``` + +--- + +## Beta 并行部署(不影响现网) + +目标:在同一台服务器上并行启动一个 beta 实例(例如端口 `8084`),**严禁改动/重启**现网实例(默认目录 `/root/sub2api`)。 + +### 设计原则 + +- **新目录**:beta 使用独立目录,例如 `/root/sub2api-beta`。 +- **敏感信息只放 `.env`**:beta 的数据库密码、JWT_SECRET 等只写入 `/root/sub2api-beta/deploy/.env`,不要提交到 git。 +- **独立 Compose Project**:通过 `docker compose -p sub2api-beta ...` 启动,确保 network/volume 隔离。 +- **独立端口**:通过 `.env` 的 `SERVER_PORT` 映射宿主机端口(例如 `8084:8080`)。 + +### 前置检查 + +```bash +# 1) 确保 8084 未被占用 +ssh clicodeplus "ss -ltnp | grep :8084 || echo '8084 is free'" + +# 2) 确认现网容器还在(只读检查) +ssh clicodeplus "docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Ports}}' | sed -n '1,200p'" +``` + +### 首次部署步骤 + +> **构建服务器说明**:正式和 beta 共用构建服务器上的 `/root/sub2api` 仓库,通过不同的镜像标签区分(`sub2api:latest` 用于正式,`sub2api:beta` 用于测试)。 + +```bash +# 1) 构建服务器构建 beta 镜像(共用 /root/sub2api 仓库,切到目标分支后打 beta 标签) +ssh us-asaki-root "cd /root/sub2api && git fetch origin && git checkout -B release/custom-0.1.71 origin/release/custom-0.1.71" +ssh us-asaki-root "cd /root/sub2api && docker build --no-cache -t sub2api:beta -f Dockerfile ." + +# ⚠️ 构建完成后如需恢复正式分支: +# ssh us-asaki-root "cd /root/sub2api && git checkout release/custom-<正式版本>" + +# 2) 传输镜像到生产服务器 +ssh us-asaki-root "docker save sub2api:beta" | ssh clicodeplus "docker load" +# ⚠️ 必须看到 "Loaded image: sub2api:beta" 输出 + +# 3) 在生产服务器上准备 beta 环境 +ssh clicodeplus + +# 克隆代码(仅用于 deploy 配置和版本号确认,不在此构建) +cd /root +git clone https://github.com/touwaeriol/sub2api.git sub2api-beta +cd /root/sub2api-beta +git checkout release/custom-0.1.71 + +# 4) 准备 beta 的 .env(敏感信息只写这里) +cd /root/sub2api-beta/deploy + +# 推荐:从现网 .env 复制,保证除 DB 名/用户/端口外完全一致 +cp -f /root/sub2api/deploy/.env ./.env + +# 仅修改以下三项(其他保持不变) +perl -pi -e 's/^SERVER_PORT=.*/SERVER_PORT=8084/' ./.env +perl -pi -e 's/^POSTGRES_USER=.*/POSTGRES_USER=beta/' ./.env +perl -pi -e 's/^POSTGRES_DB=.*/POSTGRES_DB=beta/' ./.env + +# 5) 写 compose override(避免与现网容器名冲突,镜像使用构建服务器传输的 sub2api:beta) +cat > docker-compose.override.yml <<'YAML' +services: + sub2api: + image: sub2api:beta + container_name: sub2api-beta + redis: + container_name: sub2api-beta-redis +YAML + +# 6) 启动 beta(独立 project,确保不影响现网) +cd /root/sub2api-beta/deploy +docker compose -p sub2api-beta --env-file .env -f docker-compose.yml -f docker-compose.override.yml up -d + +# 7) 验证 beta +curl -fsS http://127.0.0.1:8084/health +docker logs sub2api-beta --tail 50 +``` + +### 数据库配置约定(beta) + +- 数据库地址/SSL/密码:与现网一致(从现网 `.env` 复制即可)。 +- 仅修改: + - `POSTGRES_USER=beta` + - `POSTGRES_DB=beta` + +注意:需要数据库侧已存在 `beta` 用户与 `beta` 数据库,并授予权限;否则容器会启动失败并不断重启。 + +### 更新 beta(构建服务器构建 + 传输 + 仅重启 beta 容器) + +```bash +# 1) 构建服务器拉取代码并构建镜像(共用 /root/sub2api 仓库) +ssh us-asaki-root "cd /root/sub2api && git fetch origin && git checkout -B release/custom-0.1.71 origin/release/custom-0.1.71" +ssh us-asaki-root "cd /root/sub2api && docker build --no-cache -t sub2api:beta -f Dockerfile ." +# ⚠️ 必须看到构建成功输出 + +# 2) 传输镜像到生产服务器 +ssh us-asaki-root "docker save sub2api:beta" | ssh clicodeplus "docker load" +# ⚠️ 必须看到 "Loaded image: sub2api:beta" 输出 + +# 3) 生产服务器同步代码(用于版本号确认和 deploy 配置) +ssh clicodeplus "set -e; cd /root/sub2api-beta && git fetch --all --tags && git checkout -f release/custom-0.1.71 && git reset --hard origin/release/custom-0.1.71" + +# 4) 重启 beta 容器并验证 +ssh clicodeplus "cd /root/sub2api-beta/deploy && docker compose -p sub2api-beta --env-file .env -f docker-compose.yml -f docker-compose.override.yml up -d --no-deps --force-recreate sub2api" +ssh clicodeplus "sleep 5 && curl -fsS http://127.0.0.1:8084/health" +ssh clicodeplus "cat /root/sub2api-beta/backend/cmd/server/VERSION" +``` + +### 停止/回滚 beta(只影响 beta) + +```bash +ssh clicodeplus "cd /root/sub2api-beta/deploy && docker compose -p sub2api-beta -f docker-compose.yml -f docker-compose.override.yml down" +``` + +--- + +## 服务器首次部署 + +### 1. 构建服务器:克隆代码并配置远程仓库 + +```bash +ssh us-asaki-root +cd /root +git clone https://github.com/Wei-Shaw/sub2api.git +cd sub2api + +# 添加 fork 仓库 +git remote add fork https://github.com/touwaeriol/sub2api.git +``` + +### 2. 构建服务器:切换到定制分支并构建镜像 + +```bash +git fetch fork +git checkout -B release/custom-0.1.69 fork/release/custom-0.1.69 + +cd /root/sub2api +docker build -t sub2api:latest -f Dockerfile . +exit +``` + +### 3. 传输镜像到生产服务器 + +```bash +ssh us-asaki-root "docker save sub2api:latest" | ssh clicodeplus "docker load" +``` + +### 4. 生产服务器:克隆代码并配置环境 + +```bash +ssh clicodeplus +cd /root +git clone https://github.com/Wei-Shaw/sub2api.git +cd sub2api + +# 添加 fork 仓库 +git remote add fork https://github.com/touwaeriol/sub2api.git +git fetch fork +git checkout -B release/custom-0.1.69 fork/release/custom-0.1.69 + +# 配置环境变量 +cd deploy +cp .env.example .env +vim .env # 配置 DATABASE_URL, REDIS_URL, JWT_SECRET 等 +``` + +### 5. 生产服务器:更新镜像标签并启动服务 + +```bash +docker tag sub2api:latest weishaw/sub2api:latest +cd /root/sub2api/deploy && docker compose up -d +``` + +### 6. 验证部署 + +```bash +# 查看应用日志 +docker logs sub2api --tail 50 + +# 检查健康状态 +curl http://localhost:8080/health + +# 确认版本号 +cat /root/sub2api/backend/cmd/server/VERSION +``` + +### 7. 常用运维命令 + +```bash +# 查看实时日志 +docker logs -f sub2api + +# 重启服务 +docker compose restart sub2api + +# 停止所有服务 +docker compose down + +# 停止并删除数据卷(慎用!会删除数据库数据) +docker compose down -v + +# 查看资源使用情况 +docker stats sub2api +``` + +--- + +## 定制功能说明 + +当前定制分支包含以下功能(相对于官方版本): + +### UI/UX 定制 + +| 功能 | 说明 | +|------|------| +| 首页优化 | 面向用户的价值主张设计 | +| 移除 GitHub 链接 | 用户菜单中不显示 GitHub 导航 | +| 微信客服按钮 | 首页悬浮微信客服入口 | +| 限流时间精确显示 | 账号限流时间显示精确到秒 | + +### Antigravity 平台增强 + +| 功能 | 说明 | +|------|------| +| Scope 级别限流 | 按配额域(claude/gemini_text/gemini_image)独立限流,避免整个账号被锁定 | +| 模型级别限流 | 按具体模型(如 claude-opus-4-5)独立限流,更精细的限流控制 | +| 限流预检查 | 调度时预检查账号/模型限流状态,避免选中已限流账号 | +| 秒级冷却时间 | 支持 429 响应的秒级精确冷却时间 | +| 身份注入优化 | 模型身份信息注入 + 静默边界防止身份泄露 | +| thoughtSignature 修复 | Gemini 3 函数调用 400 错误修复 | +| max_tokens 自动修正 | 自动修正 max_tokens <= budget_tokens 导致的 400 错误 | + +### 调度算法优化 + +| 功能 | 说明 | +|------|------| +| 分层过滤选择 | 调度算法从全排序改为分层过滤,提升性能 | +| LRU 随机选择 | 相同 LRU 时间时随机选择,避免账号集中 | +| 限流等待阈值配置化 | 可配置的限流等待阈值 | + +### 运维增强 + +| 功能 | 说明 | +|------|------| +| Scope 限流统计 | 运维界面展示 Antigravity 账号 scope 级别限流统计 | +| 账号限流状态显示 | 账号列表显示 scope 和模型级别限流状态 | +| 清除限流按钮增强 | 有 scope/模型限流时也显示清除限流按钮 | + +### 其他修复 + +| 功能 | 说明 | +|------|------| +| .gitattributes | 确保迁移文件使用 LF 换行符(解决 Windows 下 SQL 摘要不一致) | +| 部署配置优化 | DATABASE_HOST 和 DATABASE_SSLMODE 可通过 .env 配置 | + +--- + +## Admin API 接口文档 + +### 认证方式 + +所有 Admin API 通过 `x-api-key` 请求头传递 Admin API Key 认证。 + +``` +x-api-key: admin-xxx +``` + +> **使用说明**:用户提供 admin token 后,直接将其作为 `x-api-key` 的值使用。Token 格式为 `admin-` + 64 位十六进制字符,在管理后台 `设置 > Admin API Key` 中生成。**请勿将实际 token 写入文档或代码中。** + +### 环境地址 + +| 环境 | 基础地址 | 说明 | +|------|----------|------| +| 正式 | `https://clicodeplus.com` | 生产环境 | +| Beta | `http://<服务器IP>:8084` | 仅内网访问 | +| OpenAI | `http://<服务器IP>:8083` | 仅内网访问 | + +> 以下接口文档中,`${BASE}` 代表环境基础地址,`${KEY}` 代表用户提供的 admin token。 + +--- + +### 1. 账号管理 + +#### 1.1 获取账号列表 + +``` +GET /api/v1/admin/accounts +``` + +**查询参数**: + +| 参数 | 类型 | 必填 | 说明 | +|------|------|------|------| +| `platform` | string | 否 | 平台筛选:`antigravity` / `anthropic` / `openai` / `gemini` | +| `type` | string | 否 | 账号类型:`oauth` / `api_key` / `cookie` | +| `status` | string | 否 | 状态:`active` / `disabled` / `error` | +| `search` | string | 否 | 搜索关键词(名称、备注) | +| `page` | int | 否 | 页码,默认 1 | +| `page_size` | int | 否 | 每页数量,默认 20 | + +```bash +curl -s "${BASE}/api/v1/admin/accounts?platform=antigravity&page=1&page_size=100" \ + -H "x-api-key: ${KEY}" +``` + +**响应**: +```json +{ + "code": 0, + "message": "success", + "data": { + "items": [{"id": 1, "name": "xxx@gmail.com", "platform": "antigravity", "status": "active", ...}], + "total": 66 + } +} +``` + +#### 1.2 获取账号详情 + +``` +GET /api/v1/admin/accounts/:id +``` + +```bash +curl -s "${BASE}/api/v1/admin/accounts/1" -H "x-api-key: ${KEY}" +``` + +#### 1.3 测试账号连接 + +``` +POST /api/v1/admin/accounts/:id/test +``` + +**请求体**(JSON,可选): + +| 字段 | 类型 | 必填 | 说明 | +|------|------|------|------| +| `model_id` | string | 否 | 指定测试模型,如 `claude-opus-4-6`;不传则使用默认模型 | + +**响应格式**:SSE(Server-Sent Events)流 + +```bash +curl -N -X POST "${BASE}/api/v1/admin/accounts/1/test" \ + -H "x-api-key: ${KEY}" \ + -H "Content-Type: application/json" \ + -d '{"model_id": "claude-opus-4-6"}' +``` + +**SSE 事件类型**: + +| type | 字段 | 说明 | +|------|------|------| +| `test_start` | `model` | 测试开始,返回测试模型名 | +| `content` | `text` | 模型响应内容(流式文本片段) | +| `test_end` | `success`, `error` | 测试结束,`success=true` 表示成功 | +| `error` | `text` | 错误信息 | + +#### 1.4 清除账号限流 + +``` +POST /api/v1/admin/accounts/:id/clear-rate-limit +``` + +```bash +curl -X POST "${BASE}/api/v1/admin/accounts/1/clear-rate-limit" \ + -H "x-api-key: ${KEY}" +``` + +#### 1.5 清除账号错误状态 + +``` +POST /api/v1/admin/accounts/:id/clear-error +``` + +```bash +curl -X POST "${BASE}/api/v1/admin/accounts/1/clear-error" \ + -H "x-api-key: ${KEY}" +``` + +#### 1.6 获取账号可用模型 + +``` +GET /api/v1/admin/accounts/:id/models +``` + +```bash +curl -s "${BASE}/api/v1/admin/accounts/1/models" -H "x-api-key: ${KEY}" +``` + +#### 1.7 刷新 OAuth Token + +``` +POST /api/v1/admin/accounts/:id/refresh +``` + +```bash +curl -X POST "${BASE}/api/v1/admin/accounts/1/refresh" -H "x-api-key: ${KEY}" +``` + +#### 1.8 刷新账号等级 + +``` +POST /api/v1/admin/accounts/:id/refresh-tier +``` + +```bash +curl -X POST "${BASE}/api/v1/admin/accounts/1/refresh-tier" -H "x-api-key: ${KEY}" +``` + +#### 1.9 获取账号统计 + +``` +GET /api/v1/admin/accounts/:id/stats +``` + +```bash +curl -s "${BASE}/api/v1/admin/accounts/1/stats" -H "x-api-key: ${KEY}" +``` + +#### 1.10 获取账号用量 + +``` +GET /api/v1/admin/accounts/:id/usage +``` + +```bash +curl -s "${BASE}/api/v1/admin/accounts/1/usage" -H "x-api-key: ${KEY}" +``` + +#### 1.11 批量测试账号(脚本) + +批量测试指定平台所有账号的指定模型连通性: + +```bash +# 用户需提供:BASE(环境地址)、KEY(admin token)、MODEL(测试模型) +ACCOUNT_IDS=$(curl -s "${BASE}/api/v1/admin/accounts?platform=antigravity&page=1&page_size=100" \ + -H "x-api-key: ${KEY}" | python3 -c " +import json, sys +data = json.load(sys.stdin) +for item in data['data']['items']: + print(f\"{item['id']}|{item['name']}\") +") + +while IFS='|' read -r ID NAME; do + echo "测试账号 ID=${ID} (${NAME})..." + RESPONSE=$(curl -s --max-time 60 -N \ + -X POST "${BASE}/api/v1/admin/accounts/${ID}/test" \ + -H "x-api-key: ${KEY}" \ + -H "Content-Type: application/json" \ + -d "{\"model_id\": \"${MODEL}\"}" 2>&1) + if echo "$RESPONSE" | grep -q '"success":true'; then + echo " ✅ 成功" + elif echo "$RESPONSE" | grep -q '"type":"content"'; then + echo " ✅ 成功(有内容响应)" + else + ERROR_MSG=$(echo "$RESPONSE" | grep -o '"error":"[^"]*"' | tail -1) + echo " ❌ 失败: ${ERROR_MSG}" + fi +done <<< "$ACCOUNT_IDS" +``` + +--- + +### 2. 运维监控 + +#### 2.1 并发统计 + +``` +GET /api/v1/admin/ops/concurrency +``` + +```bash +curl -s "${BASE}/api/v1/admin/ops/concurrency" -H "x-api-key: ${KEY}" +``` + +#### 2.2 账号可用性 + +``` +GET /api/v1/admin/ops/account-availability +``` + +```bash +curl -s "${BASE}/api/v1/admin/ops/account-availability" -H "x-api-key: ${KEY}" +``` + +#### 2.3 实时流量摘要 + +``` +GET /api/v1/admin/ops/realtime-traffic +``` + +```bash +curl -s "${BASE}/api/v1/admin/ops/realtime-traffic" -H "x-api-key: ${KEY}" +``` + +#### 2.4 请求错误列表 + +``` +GET /api/v1/admin/ops/request-errors +``` + +**查询参数**:`page`、`page_size` + +```bash +curl -s "${BASE}/api/v1/admin/ops/request-errors?page=1&page_size=50" \ + -H "x-api-key: ${KEY}" +``` + +#### 2.5 上游错误列表 + +``` +GET /api/v1/admin/ops/upstream-errors +``` + +```bash +curl -s "${BASE}/api/v1/admin/ops/upstream-errors?page=1&page_size=50" \ + -H "x-api-key: ${KEY}" +``` + +#### 2.6 仪表板概览 + +``` +GET /api/v1/admin/ops/dashboard/overview +``` + +```bash +curl -s "${BASE}/api/v1/admin/ops/dashboard/overview" -H "x-api-key: ${KEY}" +``` + +--- + +### 3. 系统设置 + +#### 3.1 获取系统设置 + +``` +GET /api/v1/admin/settings +``` + +```bash +curl -s "${BASE}/api/v1/admin/settings" -H "x-api-key: ${KEY}" +``` + +#### 3.2 更新系统设置 + +``` +PUT /api/v1/admin/settings +``` + +```bash +curl -X PUT "${BASE}/api/v1/admin/settings" \ + -H "x-api-key: ${KEY}" \ + -H "Content-Type: application/json" \ + -d '{ ... }' +``` + +#### 3.3 Admin API Key 状态(脱敏) + +``` +GET /api/v1/admin/settings/admin-api-key +``` + +```bash +curl -s "${BASE}/api/v1/admin/settings/admin-api-key" -H "x-api-key: ${KEY}" +``` + +--- + +### 4. 用户管理 + +#### 4.1 用户列表 + +``` +GET /api/v1/admin/users +``` + +```bash +curl -s "${BASE}/api/v1/admin/users?page=1&page_size=20" -H "x-api-key: ${KEY}" +``` + +#### 4.2 用户详情 + +``` +GET /api/v1/admin/users/:id +``` + +```bash +curl -s "${BASE}/api/v1/admin/users/1" -H "x-api-key: ${KEY}" +``` + +#### 4.3 更新用户余额 + +``` +POST /api/v1/admin/users/:id/balance +``` + +```bash +curl -X POST "${BASE}/api/v1/admin/users/1/balance" \ + -H "x-api-key: ${KEY}" \ + -H "Content-Type: application/json" \ + -d '{"amount": 100, "reason": "充值"}' +``` + +--- + +### 5. 分组管理 + +#### 5.1 分组列表 + +``` +GET /api/v1/admin/groups +``` + +```bash +curl -s "${BASE}/api/v1/admin/groups" -H "x-api-key: ${KEY}" +``` + +#### 5.2 所有分组(不分页) + +``` +GET /api/v1/admin/groups/all +``` + +```bash +curl -s "${BASE}/api/v1/admin/groups/all" -H "x-api-key: ${KEY}" +``` + +--- + +## 注意事项 + +1. **前端必须打包进镜像**:使用 `docker build` 在构建服务器(`us-asaki-root`)上构建,Dockerfile 会自动编译前端并 embed 到后端二进制中,构建完成后通过 `docker save | docker load` 传输到生产服务器(`clicodeplus`) + +2. **镜像标签**:docker-compose.yml 使用 `weishaw/sub2api:latest`,本地构建后需要 `docker tag` 覆盖 + +3. **Windows 换行符问题**:已通过 `.gitattributes` 解决,确保 `*.sql` 文件始终使用 LF + +4. **版本号管理**:每次发布必须更新 `backend/cmd/server/VERSION` 并打标签 + +5. **合并冲突**:合并上游新版本时,重点关注以下文件可能的冲突: + - `backend/internal/service/antigravity_gateway_service.go` + - `backend/internal/service/gateway_service.go` + - `backend/internal/pkg/antigravity/request_transformer.go` + +--- + +## Go 代码规范 + +### 1. 函数设计 + +#### 单一职责原则 +- **函数行数**:单个函数常规不应超过 **30 行**,超过时应拆分为子函数。若某段逻辑确实不可拆分(如复杂的状态机、协议解析等),可以例外,但需添加注释说明原因 +- **嵌套层级**:避免超过 3 层嵌套,使用 early return 减少嵌套 + +```go +// ❌ 不推荐:深层嵌套 +func process(data []Item) { + for _, item := range data { + if item.Valid { + if item.Type == "A" { + if item.Status == "active" { + // 业务逻辑... + } + } + } + } +} + +// ✅ 推荐:early return +func process(data []Item) { + for _, item := range data { + if !item.Valid { + continue + } + if item.Type != "A" { + continue + } + if item.Status != "active" { + continue + } + // 业务逻辑... + } +} +``` + +#### 复杂逻辑提取 +将复杂的条件判断或处理逻辑提取为独立函数: + +```go +// ❌ 不推荐:内联复杂逻辑 +if resp.StatusCode == 429 || resp.StatusCode == 503 { + // 80+ 行处理逻辑... +} + +// ✅ 推荐:提取为独立函数 +result := handleRateLimitResponse(resp, params) +switch result.action { +case actionRetry: + continue +case actionBreak: + return result.resp, nil +} +``` + +### 2. 重复代码消除 + +#### 配置获取模式 +将重复的配置获取逻辑提取为方法: + +```go +// ❌ 不推荐:重复代码 +logBody := s.settingService != nil && s.settingService.cfg != nil && s.settingService.cfg.Gateway.LogUpstreamErrorBody +maxBytes := 2048 +if s.settingService != nil && s.settingService.cfg != nil && s.settingService.cfg.Gateway.LogUpstreamErrorBodyMaxBytes > 0 { + maxBytes = s.settingService.cfg.Gateway.LogUpstreamErrorBodyMaxBytes +} + +// ✅ 推荐:提取为方法 +func (s *Service) getLogConfig() (logBody bool, maxBytes int) { + maxBytes = 2048 + if s.settingService == nil || s.settingService.cfg == nil { + return false, maxBytes + } + cfg := s.settingService.cfg.Gateway + if cfg.LogUpstreamErrorBodyMaxBytes > 0 { + maxBytes = cfg.LogUpstreamErrorBodyMaxBytes + } + return cfg.LogUpstreamErrorBody, maxBytes +} +``` + +### 3. 常量管理 + +#### 避免魔法数字 +所有硬编码的数值都应定义为常量: + +```go +// ❌ 不推荐 +if retryDelay >= 10*time.Second { + resetAt := time.Now().Add(30 * time.Second) +} + +// ✅ 推荐 +const ( + rateLimitThreshold = 10 * time.Second + defaultRateLimitDuration = 30 * time.Second +) + +if retryDelay >= rateLimitThreshold { + resetAt := time.Now().Add(defaultRateLimitDuration) +} +``` + +#### 注释引用常量名 +在注释中引用常量名而非硬编码值: + +```go +// ❌ 不推荐 +// < 10s: 等待后重试 + +// ✅ 推荐 +// < rateLimitThreshold: 等待后重试 +``` + +### 4. 错误处理 + +#### 使用结构化日志 +优先使用 `slog` 进行结构化日志记录: + +```go +// ❌ 不推荐 +log.Printf("%s status=%d model_rate_limit_failed model=%s error=%v", prefix, statusCode, modelName, err) + +// ✅ 推荐 +slog.Error("failed to set model rate limit", + "prefix", prefix, + "status_code", statusCode, + "model", modelName, + "error", err, +) +``` + +### 5. 测试规范 + +#### Mock 函数签名同步 +修改函数签名时,必须同步更新所有测试中的 mock 函数: + +```go +// 如果修改了 handleError 签名 +handleError func(..., groupID int64, sessionHash string) *Result + +// 必须同步更新测试中的 mock +handleError: func(..., groupID int64, sessionHash string) *Result { + return nil +}, +``` + +#### 测试构建标签 +统一使用测试构建标签: + +```go +//go:build unit + +package service +``` + +### 6. 时间格式解析 + +#### 使用标准库 +优先使用 `time.ParseDuration`,支持所有 Go duration 格式: + +```go +// ❌ 不推荐:手动限制格式 +if !strings.HasSuffix(delay, "s") || strings.Contains(delay, "m") { + continue +} + +// ✅ 推荐:使用标准库 +dur, err := time.ParseDuration(delay) // 支持 "0.5s", "4m50s", "1h30m" 等 +``` + +### 7. 接口设计 + +#### 接口隔离原则 +定义最小化接口,只包含必需的方法: + +```go +// ❌ 不推荐:使用过于宽泛的接口 +type AccountRepository interface { + // 20+ 个方法... +} + +// ✅ 推荐:定义最小化接口 +type ModelRateLimiter interface { + SetModelRateLimit(ctx context.Context, id int64, modelKey string, resetAt time.Time) error +} +``` + +### 8. 并发安全 + +#### 共享数据保护 +访问可能被并发修改的数据时,确保线程安全: + +```go +// 如果 Account.Extra 可能被并发修改 +// 需要使用互斥锁或原子操作保护读取 +func (a *Account) GetRateLimitRemainingTime(model string) time.Duration { + a.mu.RLock() + defer a.mu.RUnlock() + // 读取 Extra 字段... +} +``` + +### 9. 命名规范 + +#### 一致的命名风格 +- 常量使用 camelCase:`rateLimitThreshold` +- 类型使用 PascalCase:`AntigravityQuotaScope` +- 同一概念使用统一命名:`Threshold` 或 `Limit`,不要混用 + +```go +// ❌ 不推荐:命名不一致 +antigravitySmartRetryMinWait // 使用 Min +antigravityRateLimitThreshold // 使用 Threshold + +// ✅ 推荐:统一风格 +antigravityMinRetryWait +antigravityRateLimitThreshold +``` + +### 10. 代码审查清单 + +在提交代码前,检查以下项目: + +- [ ] 函数是否超过 30 行?(不可拆分的逻辑除外,需注释说明) +- [ ] 嵌套是否超过 3 层? +- [ ] 是否有重复代码可以提取? +- [ ] 是否使用了魔法数字? +- [ ] Mock 函数签名是否与实际函数一致? +- [ ] 测试是否覆盖了新增逻辑? +- [ ] 日志是否包含足够的上下文信息? +- [ ] 是否考虑了并发安全? + +--- + +## CI 检查与发布门禁 + +### GitHub Actions 检查项 + +本项目有 4 个 CI 任务,**任何代码推送或发布前都必须全部通过**: + +| Workflow | Job | 说明 | 本地验证命令 | +|----------|-----|------|-------------| +| CI | `test` | 单元测试 + 集成测试 | `cd backend && make test-unit && make test-integration` | +| CI | `golangci-lint` | Go 代码静态检查(golangci-lint v2.7) | `cd backend && golangci-lint run --timeout=5m` | +| Security Scan | `backend-security` | govulncheck + gosec 安全扫描 | `cd backend && govulncheck ./... && gosec -severity high -confidence high ./...` | +| Security Scan | `frontend-security` | pnpm audit 前端依赖安全检查 | `cd frontend && pnpm audit --prod --audit-level=high` | + +### 向上游提交 PR + +PR 目标是上游官方仓库,**只包含通用功能改动**(bug fix、新功能、性能优化等)。 + +**以下文件禁止出现在 PR 中**(属于我们 fork 的定制化内容): +- `CLAUDE.md`、`AGENTS.md` — 我们的开发文档 +- `backend/cmd/server/VERSION` — 我们的版本号文件 +- UI 定制改动(GitHub 链接移除、微信客服按钮、首页定制等) +- 部署配置(`deploy/` 目录下的定制修改) + +**PR 流程**: +1. 从 `develop` 创建功能分支,只包含要提交给上游的改动 +2. 推送分支后,**等待 4 个 CI job 全部通过** +3. 确认通过后再创建 PR +4. 使用 `gh run list --repo touwaeriol/sub2api --branch ` 检查状态 + +### 自有分支推送(develop / main) + +推送到我们自己的 `develop` 或 `main` 分支时,包含所有改动(定制化 + 通用功能)。 + +**推送前必须在本地执行全部 CI 检查**(不要等 GitHub Actions): + +```bash +# 确保 Go 工具链可用(macOS homebrew) +export PATH="/opt/homebrew/bin:$HOME/go/bin:$PATH" + +# 1. 单元测试(必须) +cd backend && make test-unit + +# 2. 集成测试(推荐,需要 Docker) +make test-integration + +# 3. golangci-lint 静态检查(必须) +golangci-lint run --timeout=5m + +# 4. gofmt 格式检查(必须) +gofmt -l ./... +# 如果有输出,运行 gofmt -w 修复 +``` + +**推送后确认**: +1. 使用 `gh run list --repo touwaeriol/sub2api --branch ` 检查 GitHub Actions 状态 +2. 确认 CI 和 Security Scan 两个 workflow 的 4 个 job 全部绿色 ✅ +3. 任何 job 失败必须立即修复,**禁止在 CI 未通过的状态下继续后续操作** + +### 发布版本 + +1. 本地执行上述全部 CI 检查通过 +2. 递增 `backend/cmd/server/VERSION`,提交并推送 +3. 推送后确认 GitHub Actions 的 4 个 CI job 全部通过 +4. **CI 未通过时禁止部署** — 必须先修复问题 +5. 使用 `gh run list --repo touwaeriol/sub2api --limit 10` 确认状态 + +### 常见 CI 失败原因及修复 +- **gofmt**:struct 字段对齐不一致 → 运行 `gofmt -w ` 修复 +- **golangci-lint**:未使用的变量/导入 → 删除或使用 `_` 忽略 +- **test 失败**:mock 函数签名不一致 → 同步更新 mock +- **gosec**:安全漏洞 → 根据提示修复或添加例外 diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 00000000..737cdf19 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,1251 @@ +# Sub2API 开发说明 + +## 版本管理策略 + +### 版本号规则 + +我们在官方版本号后面添加自己的小版本号: + +- 官方版本:`v0.1.68` +- 我们的版本:`v0.1.68.1`、`v0.1.68.2`(递增) + +### 分支策略 + +| 分支 | 说明 | +|------|------| +| `main` | 我们的主分支,包含所有定制功能 | +| `release/custom-X.Y.Z` | 基于官方 `vX.Y.Z` 的发布分支 | +| `upstream/main` | 上游官方仓库 | + +--- + +## 发布流程(基于新官方版本) + +当官方发布新版本(如 `v0.1.69`)时: + +### 1. 同步上游并创建发布分支 + +```bash +# 获取上游最新代码 +git fetch upstream --tags + +# 基于官方标签创建新的发布分支 +git checkout v0.1.69 -b release/custom-0.1.69 + +# 合并我们的 main 分支(包含所有定制功能) +git merge main --no-edit + +# 解决可能的冲突后继续 +``` + +### 2. 更新版本号并打标签 + +```bash +# 更新版本号文件 +echo "0.1.69.1" > backend/cmd/server/VERSION +git add backend/cmd/server/VERSION +git commit -m "chore: bump version to 0.1.69.1" + +# 打上我们自己的标签 +git tag v0.1.69.1 + +# 推送分支和标签 +git push origin release/custom-0.1.69 +git push origin v0.1.69.1 +``` + +### 3. 更新 main 分支 + +```bash +# 将发布分支合并回 main,保持 main 包含最新定制功能 +git checkout main +git merge release/custom-0.1.69 +git push origin main +``` + +--- + +## 热修复发布(在现有版本上修复) + +当需要在当前版本上发布修复时: + +```bash +# 在当前发布分支上修复 +git checkout release/custom-0.1.68 +# ... 进行修复 ... +git commit -m "fix: 修复描述" + +# 递增小版本号 +echo "0.1.68.2" > backend/cmd/server/VERSION +git add backend/cmd/server/VERSION +git commit -m "chore: bump version to 0.1.68.2" + +# 打标签并推送 +git tag v0.1.68.2 +git push origin release/custom-0.1.68 +git push origin v0.1.68.2 + +# 同步修复到 main +git checkout main +git cherry-pick +git push origin main +``` + +--- + +## 服务器部署流程 + +### 前置条件 + +- 本地已配置 SSH 别名 `clicodeplus` 连接到生产服务器(运行服务) +- 本地已配置 SSH 别名 `us-asaki-root` 连接到构建服务器(拉取代码、构建镜像) +- 生产服务器部署目录:`/root/sub2api`(正式)、`/root/sub2api-beta`(测试) +- 生产服务器使用 Docker Compose 部署 +- **镜像统一在构建服务器上构建**,避免生产服务器因编译占用 CPU/内存影响线上服务 + +### 服务器角色说明 + +| 服务器 | SSH 别名 | 职责 | +|--------|----------|------| +| 构建服务器 | `us-asaki-root` | 拉取代码、`docker build` 构建镜像 | +| 生产服务器 | `clicodeplus` | 加载镜像、运行服务、部署验证 | +| 数据库服务器 | `db-clicodeplus` | PostgreSQL 16 + Redis 7,所有环境共用 | + +> 数据库服务器运维手册:`db-clicodeplus:/root/README.md` + +### 部署环境说明 + +| 环境 | 目录(生产服务器) | 端口 | 数据库 | Redis DB | 容器名 | +|------|------|------|--------|----------|--------| +| 正式 | `/root/sub2api` | 8080 | `sub2api` | 0 | `sub2api` | +| Beta | `/root/sub2api-beta` | 8084 | `beta` | 2 | `sub2api-beta` | +| OpenAI | `/root/sub2api-openai` | 8083 | `openai` | 3 | `sub2api-openai` | + +### 外部数据库与 Redis + +所有环境(正式、Beta、OpenAI)共用 `db.clicodeplus.com` 上的 **PostgreSQL 16** 和 **Redis 7**,不使用容器内数据库或 Redis。 + +**PostgreSQL**(端口 5432,TLS 加密,scram-sha-256 认证): + +| 环境 | 用户名 | 数据库 | +|------|--------|--------| +| 正式 | `sub2api` | `sub2api` | +| Beta | `beta` | `beta` | +| OpenAI | `openai` | `openai` | + +**Redis**(端口 6379,密码认证): + +| 环境 | DB | +|------|-----| +| 正式 | 0 | +| Beta | 2 | +| OpenAI | 3 | + +**配置方式**: +- 数据库通过 `.env` 中的 `DATABASE_HOST`、`DATABASE_SSLMODE`、`POSTGRES_USER`、`POSTGRES_PASSWORD`、`POSTGRES_DB` 配置 +- Redis 通过 `docker-compose.override.yml` 覆盖 `REDIS_HOST`(因主 compose 文件硬编码为 `redis`),密码通过 `.env` 中的 `REDIS_PASSWORD` 配置 +- 各环境的 `docker-compose.override.yml` 已通过 `depends_on: !reset {}` 和 `redis: profiles: [disabled]` 去掉了对容器 Redis 的依赖 + +#### 数据库操作命令 + +通过 SSH 在服务器上执行数据库操作: + +```bash +# 正式环境 - 查询迁移记录 +ssh clicodeplus "source /root/sub2api/deploy/.env && PGPASSWORD=\"\$POSTGRES_PASSWORD\" psql -h \$DATABASE_HOST -U \$POSTGRES_USER -d \$POSTGRES_DB -c 'SELECT * FROM schema_migrations ORDER BY applied_at DESC LIMIT 5;'" + +# Beta 环境 - 查询迁移记录 +ssh clicodeplus "source /root/sub2api-beta/deploy/.env && PGPASSWORD=\"\$POSTGRES_PASSWORD\" psql -h \$DATABASE_HOST -U \$POSTGRES_USER -d \$POSTGRES_DB -c 'SELECT * FROM schema_migrations ORDER BY applied_at DESC LIMIT 5;'" + +# Beta 环境 - 清除指定迁移记录(重新执行迁移) +ssh clicodeplus "source /root/sub2api-beta/deploy/.env && PGPASSWORD=\"\$POSTGRES_PASSWORD\" psql -h \$DATABASE_HOST -U \$POSTGRES_USER -d \$POSTGRES_DB -c \"DELETE FROM schema_migrations WHERE filename LIKE '%049%';\"" + +# Beta 环境 - 更新账号数据 +ssh clicodeplus "source /root/sub2api-beta/deploy/.env && PGPASSWORD=\"\$POSTGRES_PASSWORD\" psql -h \$DATABASE_HOST -U \$POSTGRES_USER -d \$POSTGRES_DB -c \"UPDATE accounts SET credentials = credentials - 'model_mapping' WHERE platform = 'antigravity';\"" +``` + +> **注意**:使用 `source .env` 加载环境变量,避免在命令行中暴露密码。 + +### 部署步骤 + +**重要:每次部署都必须递增版本号!** + +#### 0. 递增版本号并推送(本地操作) + +每次部署前,先在本地递增小版本号并确保推送成功: + +```bash +# 查看当前版本号 +cat backend/cmd/server/VERSION +# 假设当前是 0.1.69.1 + +# 递增版本号 +echo "0.1.69.2" > backend/cmd/server/VERSION +git add backend/cmd/server/VERSION +git commit -m "chore: bump version to 0.1.69.2" +git push origin release/custom-0.1.69 + +# ⚠️ 确认推送成功(必须看到分支更新输出,不能有 rejected 错误) +``` + +> **检查点**:如果有其他未提交的改动,应先 commit 并 push,确保 release 分支上的所有代码都已推送到远程。 + +#### 1. 构建服务器拉取代码 + +```bash +# 拉取最新代码并切换分支 +ssh us-asaki-root "cd /root/sub2api && git fetch origin && git checkout -B release/custom-0.1.69 origin/release/custom-0.1.69" + +# ⚠️ 验证版本号与步骤 0 一致 +ssh us-asaki-root "cat /root/sub2api/backend/cmd/server/VERSION" +``` + +> **首次使用构建服务器?** 需要先初始化仓库,参见下方「构建服务器首次初始化」章节。 + +#### 2. 构建服务器构建镜像 + +```bash +ssh us-asaki-root "cd /root/sub2api && docker build --no-cache -t sub2api:latest -f Dockerfile ." + +# ⚠️ 必须看到构建成功输出,如果失败需要先排查问题 +``` + +> **常见构建问题**: +> - `buildx` 版本过旧导致 API 版本不兼容 → 更新 buildx:`curl -fsSL "https://github.com/docker/buildx/releases/latest/download/buildx-$(curl -fsSL https://api.github.com/repos/docker/buildx/releases/latest | grep tag_name | cut -d'"' -f4).linux-amd64" -o ~/.docker/cli-plugins/docker-buildx && chmod +x ~/.docker/cli-plugins/docker-buildx` +> - 磁盘空间不足 → `docker system prune -f` 清理无用镜像 + +#### 3. 传输镜像到生产服务器并加载 + +```bash +# 导出镜像 → 通过管道传输 → 生产服务器加载 +ssh us-asaki-root "docker save sub2api:latest" | ssh clicodeplus "docker load" + +# ⚠️ 必须看到 "Loaded image: sub2api:latest" 输出 +``` + +#### 4. 生产服务器同步代码、更新标签并重启 + +```bash +# 同步代码(用于版本号确认和 deploy 配置) +ssh clicodeplus "cd /root/sub2api && git fetch fork && git checkout -B release/custom-0.1.69 fork/release/custom-0.1.69" + +# 更新镜像标签并重启 +ssh clicodeplus "docker tag sub2api:latest weishaw/sub2api:latest" +ssh clicodeplus "cd /root/sub2api/deploy && docker compose up -d --force-recreate sub2api" +``` + +#### 5. 验证部署 + +```bash +# 查看启动日志 +ssh clicodeplus "docker logs sub2api --tail 20" + +# 确认版本号(必须与步骤 0 中设置的版本号一致) +ssh clicodeplus "cat /root/sub2api/backend/cmd/server/VERSION" + +# 检查容器状态(必须显示 healthy) +ssh clicodeplus "docker ps | grep sub2api" +``` + +--- + +### 构建服务器首次初始化 + +首次使用 `us-asaki-root` 作为构建服务器时,需要执行以下一次性操作: + +```bash +ssh us-asaki-root + +# 1) 克隆仓库 +cd /root +git clone https://github.com/touwaeriol/sub2api.git sub2api +cd sub2api + +# 2) 验证 Docker 和 buildx 版本 +docker version +docker buildx version +# 如果 buildx 版本过旧(< v0.14),执行更新: +# LATEST=$(curl -fsSL https://api.github.com/repos/docker/buildx/releases/latest | grep tag_name | cut -d'"' -f4) +# curl -fsSL "https://github.com/docker/buildx/releases/download/${LATEST}/buildx-${LATEST}.linux-amd64" -o ~/.docker/cli-plugins/docker-buildx +# chmod +x ~/.docker/cli-plugins/docker-buildx + +# 3) 验证构建能力 +docker build --no-cache -t sub2api:test -f Dockerfile . +docker rmi sub2api:test +``` + +--- + +## Beta 并行部署(不影响现网) + +目标:在同一台服务器上并行启动一个 beta 实例(例如端口 `8084`),**严禁改动/重启**现网实例(默认目录 `/root/sub2api`)。 + +### 设计原则 + +- **新目录**:beta 使用独立目录,例如 `/root/sub2api-beta`。 +- **敏感信息只放 `.env`**:beta 的数据库密码、JWT_SECRET 等只写入 `/root/sub2api-beta/deploy/.env`,不要提交到 git。 +- **独立 Compose Project**:通过 `docker compose -p sub2api-beta ...` 启动,确保 network/volume 隔离。 +- **独立端口**:通过 `.env` 的 `SERVER_PORT` 映射宿主机端口(例如 `8084:8080`)。 + +### 前置检查 + +```bash +# 1) 确保 8084 未被占用 +ssh clicodeplus "ss -ltnp | grep :8084 || echo '8084 is free'" + +# 2) 确认现网容器还在(只读检查) +ssh clicodeplus "docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Ports}}' | sed -n '1,200p'" +``` + +### 首次部署步骤 + +> **构建服务器说明**:正式和 beta 共用构建服务器上的 `/root/sub2api` 仓库,通过不同的镜像标签区分(`sub2api:latest` 用于正式,`sub2api:beta` 用于测试)。 + +```bash +# 1) 构建服务器构建 beta 镜像(共用 /root/sub2api 仓库,切到目标分支后打 beta 标签) +ssh us-asaki-root "cd /root/sub2api && git fetch origin && git checkout -B release/custom-0.1.71 origin/release/custom-0.1.71" +ssh us-asaki-root "cd /root/sub2api && docker build --no-cache -t sub2api:beta -f Dockerfile ." + +# ⚠️ 构建完成后如需恢复正式分支: +# ssh us-asaki-root "cd /root/sub2api && git checkout release/custom-<正式版本>" + +# 2) 传输镜像到生产服务器 +ssh us-asaki-root "docker save sub2api:beta" | ssh clicodeplus "docker load" +# ⚠️ 必须看到 "Loaded image: sub2api:beta" 输出 + +# 3) 在生产服务器上准备 beta 环境 +ssh clicodeplus + +# 克隆代码(仅用于 deploy 配置和版本号确认,不在此构建) +cd /root +git clone https://github.com/touwaeriol/sub2api.git sub2api-beta +cd /root/sub2api-beta +git checkout release/custom-0.1.71 + +# 4) 准备 beta 的 .env(敏感信息只写这里) +cd /root/sub2api-beta/deploy + +# 推荐:从现网 .env 复制,保证除 DB 名/用户/端口外完全一致 +cp -f /root/sub2api/deploy/.env ./.env + +# 仅修改以下三项(其他保持不变) +perl -pi -e 's/^SERVER_PORT=.*/SERVER_PORT=8084/' ./.env +perl -pi -e 's/^POSTGRES_USER=.*/POSTGRES_USER=beta/' ./.env +perl -pi -e 's/^POSTGRES_DB=.*/POSTGRES_DB=beta/' ./.env + +# 5) 写 compose override(避免与现网容器名冲突,镜像使用构建服务器传输的 sub2api:beta,Redis 使用外部服务) +cat > docker-compose.override.yml <<'YAML' +services: + sub2api: + image: sub2api:beta + container_name: sub2api-beta + environment: + - DATABASE_HOST=${DATABASE_HOST:-postgres} + - DATABASE_SSLMODE=${DATABASE_SSLMODE:-disable} + - REDIS_HOST=db.clicodeplus.com + depends_on: !reset {} + redis: + profiles: + - disabled +YAML + +# 6) 启动 beta(独立 project,确保不影响现网) +cd /root/sub2api-beta/deploy +docker compose -p sub2api-beta --env-file .env -f docker-compose.yml -f docker-compose.override.yml up -d + +# 7) 验证 beta +curl -fsS http://127.0.0.1:8084/health +docker logs sub2api-beta --tail 50 +``` + +### 数据库配置约定(beta) + +- 数据库地址/SSL/密码:与现网一致(从现网 `.env` 复制即可),均指向 `db.clicodeplus.com`。 +- 仅修改: + - `POSTGRES_USER=beta` + - `POSTGRES_DB=beta` + - `REDIS_DB=2` + +注意:需要数据库侧已存在 `beta` 用户与 `beta` 数据库,并授予权限;否则容器会启动失败并不断重启。 + +### 更新 beta(构建服务器构建 + 传输 + 仅重启 beta 容器) + +```bash +# 1) 构建服务器拉取代码并构建镜像(共用 /root/sub2api 仓库) +ssh us-asaki-root "cd /root/sub2api && git fetch origin && git checkout -B release/custom-0.1.71 origin/release/custom-0.1.71" +ssh us-asaki-root "cd /root/sub2api && docker build --no-cache -t sub2api:beta -f Dockerfile ." +# ⚠️ 必须看到构建成功输出 + +# 2) 传输镜像到生产服务器 +ssh us-asaki-root "docker save sub2api:beta" | ssh clicodeplus "docker load" +# ⚠️ 必须看到 "Loaded image: sub2api:beta" 输出 + +# 3) 生产服务器同步代码(用于版本号确认和 deploy 配置) +ssh clicodeplus "set -e; cd /root/sub2api-beta && git fetch --all --tags && git checkout -f release/custom-0.1.71 && git reset --hard origin/release/custom-0.1.71" + +# 4) 重启 beta 容器并验证 +ssh clicodeplus "cd /root/sub2api-beta/deploy && docker compose -p sub2api-beta --env-file .env -f docker-compose.yml -f docker-compose.override.yml up -d --no-deps --force-recreate sub2api" +ssh clicodeplus "sleep 5 && curl -fsS http://127.0.0.1:8084/health" +ssh clicodeplus "cat /root/sub2api-beta/backend/cmd/server/VERSION" +``` + +### 停止/回滚 beta(只影响 beta) + +```bash +ssh clicodeplus "cd /root/sub2api-beta/deploy && docker compose -p sub2api-beta -f docker-compose.yml -f docker-compose.override.yml down" +``` + +--- + +## 服务器首次部署 + +### 1. 构建服务器:克隆代码并配置远程仓库 + +```bash +ssh us-asaki-root +cd /root +git clone https://github.com/Wei-Shaw/sub2api.git +cd sub2api + +# 添加 fork 仓库 +git remote add fork https://github.com/touwaeriol/sub2api.git +``` + +### 2. 构建服务器:切换到定制分支并构建镜像 + +```bash +git fetch fork +git checkout -B release/custom-0.1.69 fork/release/custom-0.1.69 + +cd /root/sub2api +docker build -t sub2api:latest -f Dockerfile . +exit +``` + +### 3. 传输镜像到生产服务器 + +```bash +ssh us-asaki-root "docker save sub2api:latest" | ssh clicodeplus "docker load" +``` + +### 4. 生产服务器:克隆代码并配置环境 + +```bash +ssh clicodeplus +cd /root +git clone https://github.com/Wei-Shaw/sub2api.git +cd sub2api + +# 添加 fork 仓库 +git remote add fork https://github.com/touwaeriol/sub2api.git +git fetch fork +git checkout -B release/custom-0.1.69 fork/release/custom-0.1.69 + +# 配置环境变量 +cd deploy +cp .env.example .env +vim .env # 配置 DATABASE_HOST=db.clicodeplus.com, POSTGRES_PASSWORD, REDIS_PASSWORD, JWT_SECRET 等 + +# 创建 override 文件(Redis 指向外部服务,去掉容器 Redis 依赖) +cat > docker-compose.override.yml <<'YAML' +services: + sub2api: + environment: + - REDIS_HOST=db.clicodeplus.com + depends_on: !reset {} + redis: + profiles: + - disabled +YAML +``` + +### 5. 生产服务器:更新镜像标签并启动服务 + +```bash +docker tag sub2api:latest weishaw/sub2api:latest +cd /root/sub2api/deploy && docker compose up -d +``` + +### 6. 验证部署 + +```bash +# 查看应用日志 +docker logs sub2api --tail 50 + +# 检查健康状态 +curl http://localhost:8080/health + +# 确认版本号 +cat /root/sub2api/backend/cmd/server/VERSION +``` + +### 7. 常用运维命令 + +```bash +# 查看实时日志 +docker logs -f sub2api + +# 重启服务 +docker compose restart sub2api + +# 停止所有服务 +docker compose down + +# 停止并删除数据卷(慎用!会删除数据库数据) +docker compose down -v + +# 查看资源使用情况 +docker stats sub2api +``` + +--- + +## 定制功能说明 + +当前定制分支包含以下功能(相对于官方版本): + +### UI/UX 定制 + +| 功能 | 说明 | +|------|------| +| 首页优化 | 面向用户的价值主张设计 | +| 移除 GitHub 链接 | 用户菜单中不显示 GitHub 导航 | +| 微信客服按钮 | 首页悬浮微信客服入口 | +| 限流时间精确显示 | 账号限流时间显示精确到秒 | + +### Antigravity 平台增强 + +| 功能 | 说明 | +|------|------| +| Scope 级别限流 | 按配额域(claude/gemini_text/gemini_image)独立限流,避免整个账号被锁定 | +| 模型级别限流 | 按具体模型(如 claude-opus-4-5)独立限流,更精细的限流控制 | +| 限流预检查 | 调度时预检查账号/模型限流状态,避免选中已限流账号 | +| 秒级冷却时间 | 支持 429 响应的秒级精确冷却时间 | +| 身份注入优化 | 模型身份信息注入 + 静默边界防止身份泄露 | +| thoughtSignature 修复 | Gemini 3 函数调用 400 错误修复 | +| max_tokens 自动修正 | 自动修正 max_tokens <= budget_tokens 导致的 400 错误 | + +### 调度算法优化 + +| 功能 | 说明 | +|------|------| +| 分层过滤选择 | 调度算法从全排序改为分层过滤,提升性能 | +| LRU 随机选择 | 相同 LRU 时间时随机选择,避免账号集中 | +| 限流等待阈值配置化 | 可配置的限流等待阈值 | + +### 运维增强 + +| 功能 | 说明 | +|------|------| +| Scope 限流统计 | 运维界面展示 Antigravity 账号 scope 级别限流统计 | +| 账号限流状态显示 | 账号列表显示 scope 和模型级别限流状态 | +| 清除限流按钮增强 | 有 scope/模型限流时也显示清除限流按钮 | + +### 其他修复 + +| 功能 | 说明 | +|------|------| +| .gitattributes | 确保迁移文件使用 LF 换行符(解决 Windows 下 SQL 摘要不一致) | +| 部署配置优化 | DATABASE_HOST 和 DATABASE_SSLMODE 可通过 .env 配置 | + +--- + +## Admin API 接口文档 + +### 认证方式 + +所有 Admin API 通过 `x-api-key` 请求头传递 Admin API Key 认证。 + +``` +x-api-key: admin-xxx +``` + +> **使用说明**:用户提供 admin token 后,直接将其作为 `x-api-key` 的值使用。Token 格式为 `admin-` + 64 位十六进制字符,在管理后台 `设置 > Admin API Key` 中生成。**请勿将实际 token 写入文档或代码中。** + +### 环境地址 + +| 环境 | 基础地址 | 说明 | +|------|----------|------| +| 正式 | `https://clicodeplus.com` | 生产环境 | +| Beta | `http://<服务器IP>:8084` | 仅内网访问 | +| OpenAI | `http://<服务器IP>:8083` | 仅内网访问 | + +> 以下接口文档中,`${BASE}` 代表环境基础地址,`${KEY}` 代表用户提供的 admin token。 + +--- + +### 1. 账号管理 + +#### 1.1 获取账号列表 + +``` +GET /api/v1/admin/accounts +``` + +**查询参数**: + +| 参数 | 类型 | 必填 | 说明 | +|------|------|------|------| +| `platform` | string | 否 | 平台筛选:`antigravity` / `anthropic` / `openai` / `gemini` | +| `type` | string | 否 | 账号类型:`oauth` / `api_key` / `cookie` | +| `status` | string | 否 | 状态:`active` / `disabled` / `error` | +| `search` | string | 否 | 搜索关键词(名称、备注) | +| `page` | int | 否 | 页码,默认 1 | +| `page_size` | int | 否 | 每页数量,默认 20 | + +```bash +curl -s "${BASE}/api/v1/admin/accounts?platform=antigravity&page=1&page_size=100" \ + -H "x-api-key: ${KEY}" +``` + +**响应**: +```json +{ + "code": 0, + "message": "success", + "data": { + "items": [{"id": 1, "name": "xxx@gmail.com", "platform": "antigravity", "status": "active", ...}], + "total": 66 + } +} +``` + +#### 1.2 获取账号详情 + +``` +GET /api/v1/admin/accounts/:id +``` + +```bash +curl -s "${BASE}/api/v1/admin/accounts/1" -H "x-api-key: ${KEY}" +``` + +#### 1.3 测试账号连接 + +``` +POST /api/v1/admin/accounts/:id/test +``` + +**请求体**(JSON,可选): + +| 字段 | 类型 | 必填 | 说明 | +|------|------|------|------| +| `model_id` | string | 否 | 指定测试模型,如 `claude-opus-4-6`;不传则使用默认模型 | + +**响应格式**:SSE(Server-Sent Events)流 + +```bash +curl -N -X POST "${BASE}/api/v1/admin/accounts/1/test" \ + -H "x-api-key: ${KEY}" \ + -H "Content-Type: application/json" \ + -d '{"model_id": "claude-opus-4-6"}' +``` + +**SSE 事件类型**: + +| type | 字段 | 说明 | +|------|------|------| +| `test_start` | `model` | 测试开始,返回测试模型名 | +| `content` | `text` | 模型响应内容(流式文本片段) | +| `test_end` | `success`, `error` | 测试结束,`success=true` 表示成功 | +| `error` | `text` | 错误信息 | + +#### 1.4 清除账号限流 + +``` +POST /api/v1/admin/accounts/:id/clear-rate-limit +``` + +```bash +curl -X POST "${BASE}/api/v1/admin/accounts/1/clear-rate-limit" \ + -H "x-api-key: ${KEY}" +``` + +#### 1.5 清除账号错误状态 + +``` +POST /api/v1/admin/accounts/:id/clear-error +``` + +```bash +curl -X POST "${BASE}/api/v1/admin/accounts/1/clear-error" \ + -H "x-api-key: ${KEY}" +``` + +#### 1.6 获取账号可用模型 + +``` +GET /api/v1/admin/accounts/:id/models +``` + +```bash +curl -s "${BASE}/api/v1/admin/accounts/1/models" -H "x-api-key: ${KEY}" +``` + +#### 1.7 刷新 OAuth Token + +``` +POST /api/v1/admin/accounts/:id/refresh +``` + +```bash +curl -X POST "${BASE}/api/v1/admin/accounts/1/refresh" -H "x-api-key: ${KEY}" +``` + +#### 1.8 刷新账号等级 + +``` +POST /api/v1/admin/accounts/:id/refresh-tier +``` + +```bash +curl -X POST "${BASE}/api/v1/admin/accounts/1/refresh-tier" -H "x-api-key: ${KEY}" +``` + +#### 1.9 获取账号统计 + +``` +GET /api/v1/admin/accounts/:id/stats +``` + +```bash +curl -s "${BASE}/api/v1/admin/accounts/1/stats" -H "x-api-key: ${KEY}" +``` + +#### 1.10 获取账号用量 + +``` +GET /api/v1/admin/accounts/:id/usage +``` + +```bash +curl -s "${BASE}/api/v1/admin/accounts/1/usage" -H "x-api-key: ${KEY}" +``` + +#### 1.11 批量测试账号(脚本) + +批量测试指定平台所有账号的指定模型连通性: + +```bash +# 用户需提供:BASE(环境地址)、KEY(admin token)、MODEL(测试模型) +ACCOUNT_IDS=$(curl -s "${BASE}/api/v1/admin/accounts?platform=antigravity&page=1&page_size=100" \ + -H "x-api-key: ${KEY}" | python3 -c " +import json, sys +data = json.load(sys.stdin) +for item in data['data']['items']: + print(f\"{item['id']}|{item['name']}\") +") + +while IFS='|' read -r ID NAME; do + echo "测试账号 ID=${ID} (${NAME})..." + RESPONSE=$(curl -s --max-time 60 -N \ + -X POST "${BASE}/api/v1/admin/accounts/${ID}/test" \ + -H "x-api-key: ${KEY}" \ + -H "Content-Type: application/json" \ + -d "{\"model_id\": \"${MODEL}\"}" 2>&1) + if echo "$RESPONSE" | grep -q '"success":true'; then + echo " ✅ 成功" + elif echo "$RESPONSE" | grep -q '"type":"content"'; then + echo " ✅ 成功(有内容响应)" + else + ERROR_MSG=$(echo "$RESPONSE" | grep -o '"error":"[^"]*"' | tail -1) + echo " ❌ 失败: ${ERROR_MSG}" + fi +done <<< "$ACCOUNT_IDS" +``` + +--- + +### 2. 运维监控 + +#### 2.1 并发统计 + +``` +GET /api/v1/admin/ops/concurrency +``` + +```bash +curl -s "${BASE}/api/v1/admin/ops/concurrency" -H "x-api-key: ${KEY}" +``` + +#### 2.2 账号可用性 + +``` +GET /api/v1/admin/ops/account-availability +``` + +```bash +curl -s "${BASE}/api/v1/admin/ops/account-availability" -H "x-api-key: ${KEY}" +``` + +#### 2.3 实时流量摘要 + +``` +GET /api/v1/admin/ops/realtime-traffic +``` + +```bash +curl -s "${BASE}/api/v1/admin/ops/realtime-traffic" -H "x-api-key: ${KEY}" +``` + +#### 2.4 请求错误列表 + +``` +GET /api/v1/admin/ops/request-errors +``` + +**查询参数**:`page`、`page_size` + +```bash +curl -s "${BASE}/api/v1/admin/ops/request-errors?page=1&page_size=50" \ + -H "x-api-key: ${KEY}" +``` + +#### 2.5 上游错误列表 + +``` +GET /api/v1/admin/ops/upstream-errors +``` + +```bash +curl -s "${BASE}/api/v1/admin/ops/upstream-errors?page=1&page_size=50" \ + -H "x-api-key: ${KEY}" +``` + +#### 2.6 仪表板概览 + +``` +GET /api/v1/admin/ops/dashboard/overview +``` + +```bash +curl -s "${BASE}/api/v1/admin/ops/dashboard/overview" -H "x-api-key: ${KEY}" +``` + +--- + +### 3. 系统设置 + +#### 3.1 获取系统设置 + +``` +GET /api/v1/admin/settings +``` + +```bash +curl -s "${BASE}/api/v1/admin/settings" -H "x-api-key: ${KEY}" +``` + +#### 3.2 更新系统设置 + +``` +PUT /api/v1/admin/settings +``` + +```bash +curl -X PUT "${BASE}/api/v1/admin/settings" \ + -H "x-api-key: ${KEY}" \ + -H "Content-Type: application/json" \ + -d '{ ... }' +``` + +#### 3.3 Admin API Key 状态(脱敏) + +``` +GET /api/v1/admin/settings/admin-api-key +``` + +```bash +curl -s "${BASE}/api/v1/admin/settings/admin-api-key" -H "x-api-key: ${KEY}" +``` + +--- + +### 4. 用户管理 + +#### 4.1 用户列表 + +``` +GET /api/v1/admin/users +``` + +```bash +curl -s "${BASE}/api/v1/admin/users?page=1&page_size=20" -H "x-api-key: ${KEY}" +``` + +#### 4.2 用户详情 + +``` +GET /api/v1/admin/users/:id +``` + +```bash +curl -s "${BASE}/api/v1/admin/users/1" -H "x-api-key: ${KEY}" +``` + +#### 4.3 更新用户余额 + +``` +POST /api/v1/admin/users/:id/balance +``` + +```bash +curl -X POST "${BASE}/api/v1/admin/users/1/balance" \ + -H "x-api-key: ${KEY}" \ + -H "Content-Type: application/json" \ + -d '{"amount": 100, "reason": "充值"}' +``` + +--- + +### 5. 分组管理 + +#### 5.1 分组列表 + +``` +GET /api/v1/admin/groups +``` + +```bash +curl -s "${BASE}/api/v1/admin/groups" -H "x-api-key: ${KEY}" +``` + +#### 5.2 所有分组(不分页) + +``` +GET /api/v1/admin/groups/all +``` + +```bash +curl -s "${BASE}/api/v1/admin/groups/all" -H "x-api-key: ${KEY}" +``` + +--- + +## 注意事项 + +1. **前端必须打包进镜像**:使用 `docker build` 在构建服务器(`us-asaki-root`)上构建,Dockerfile 会自动编译前端并 embed 到后端二进制中,构建完成后通过 `docker save | docker load` 传输到生产服务器(`clicodeplus`) + +2. **镜像标签**:docker-compose.yml 使用 `weishaw/sub2api:latest`,本地构建后需要 `docker tag` 覆盖 + +3. **Windows 换行符问题**:已通过 `.gitattributes` 解决,确保 `*.sql` 文件始终使用 LF + +4. **版本号管理**:每次发布必须更新 `backend/cmd/server/VERSION` 并打标签 + +5. **合并冲突**:合并上游新版本时,重点关注以下文件可能的冲突: + - `backend/internal/service/antigravity_gateway_service.go` + - `backend/internal/service/gateway_service.go` + - `backend/internal/pkg/antigravity/request_transformer.go` + +--- + +## Go 代码规范 + +### 1. 函数设计 + +#### 单一职责原则 +- **函数行数**:单个函数常规不应超过 **30 行**,超过时应拆分为子函数。若某段逻辑确实不可拆分(如复杂的状态机、协议解析等),可以例外,但需添加注释说明原因 +- **嵌套层级**:避免超过 3 层嵌套,使用 early return 减少嵌套 + +```go +// ❌ 不推荐:深层嵌套 +func process(data []Item) { + for _, item := range data { + if item.Valid { + if item.Type == "A" { + if item.Status == "active" { + // 业务逻辑... + } + } + } + } +} + +// ✅ 推荐:early return +func process(data []Item) { + for _, item := range data { + if !item.Valid { + continue + } + if item.Type != "A" { + continue + } + if item.Status != "active" { + continue + } + // 业务逻辑... + } +} +``` + +#### 复杂逻辑提取 +将复杂的条件判断或处理逻辑提取为独立函数: + +```go +// ❌ 不推荐:内联复杂逻辑 +if resp.StatusCode == 429 || resp.StatusCode == 503 { + // 80+ 行处理逻辑... +} + +// ✅ 推荐:提取为独立函数 +result := handleRateLimitResponse(resp, params) +switch result.action { +case actionRetry: + continue +case actionBreak: + return result.resp, nil +} +``` + +### 2. 重复代码消除 + +#### 配置获取模式 +将重复的配置获取逻辑提取为方法: + +```go +// ❌ 不推荐:重复代码 +logBody := s.settingService != nil && s.settingService.cfg != nil && s.settingService.cfg.Gateway.LogUpstreamErrorBody +maxBytes := 2048 +if s.settingService != nil && s.settingService.cfg != nil && s.settingService.cfg.Gateway.LogUpstreamErrorBodyMaxBytes > 0 { + maxBytes = s.settingService.cfg.Gateway.LogUpstreamErrorBodyMaxBytes +} + +// ✅ 推荐:提取为方法 +func (s *Service) getLogConfig() (logBody bool, maxBytes int) { + maxBytes = 2048 + if s.settingService == nil || s.settingService.cfg == nil { + return false, maxBytes + } + cfg := s.settingService.cfg.Gateway + if cfg.LogUpstreamErrorBodyMaxBytes > 0 { + maxBytes = cfg.LogUpstreamErrorBodyMaxBytes + } + return cfg.LogUpstreamErrorBody, maxBytes +} +``` + +### 3. 常量管理 + +#### 避免魔法数字 +所有硬编码的数值都应定义为常量: + +```go +// ❌ 不推荐 +if retryDelay >= 10*time.Second { + resetAt := time.Now().Add(30 * time.Second) +} + +// ✅ 推荐 +const ( + rateLimitThreshold = 10 * time.Second + defaultRateLimitDuration = 30 * time.Second +) + +if retryDelay >= rateLimitThreshold { + resetAt := time.Now().Add(defaultRateLimitDuration) +} +``` + +#### 注释引用常量名 +在注释中引用常量名而非硬编码值: + +```go +// ❌ 不推荐 +// < 10s: 等待后重试 + +// ✅ 推荐 +// < rateLimitThreshold: 等待后重试 +``` + +### 4. 错误处理 + +#### 使用结构化日志 +优先使用 `slog` 进行结构化日志记录: + +```go +// ❌ 不推荐 +log.Printf("%s status=%d model_rate_limit_failed model=%s error=%v", prefix, statusCode, modelName, err) + +// ✅ 推荐 +slog.Error("failed to set model rate limit", + "prefix", prefix, + "status_code", statusCode, + "model", modelName, + "error", err, +) +``` + +### 5. 测试规范 + +#### Mock 函数签名同步 +修改函数签名时,必须同步更新所有测试中的 mock 函数: + +```go +// 如果修改了 handleError 签名 +handleError func(..., groupID int64, sessionHash string) *Result + +// 必须同步更新测试中的 mock +handleError: func(..., groupID int64, sessionHash string) *Result { + return nil +}, +``` + +#### 测试构建标签 +统一使用测试构建标签: + +```go +//go:build unit + +package service +``` + +### 6. 时间格式解析 + +#### 使用标准库 +优先使用 `time.ParseDuration`,支持所有 Go duration 格式: + +```go +// ❌ 不推荐:手动限制格式 +if !strings.HasSuffix(delay, "s") || strings.Contains(delay, "m") { + continue +} + +// ✅ 推荐:使用标准库 +dur, err := time.ParseDuration(delay) // 支持 "0.5s", "4m50s", "1h30m" 等 +``` + +### 7. 接口设计 + +#### 接口隔离原则 +定义最小化接口,只包含必需的方法: + +```go +// ❌ 不推荐:使用过于宽泛的接口 +type AccountRepository interface { + // 20+ 个方法... +} + +// ✅ 推荐:定义最小化接口 +type ModelRateLimiter interface { + SetModelRateLimit(ctx context.Context, id int64, modelKey string, resetAt time.Time) error +} +``` + +### 8. 并发安全 + +#### 共享数据保护 +访问可能被并发修改的数据时,确保线程安全: + +```go +// 如果 Account.Extra 可能被并发修改 +// 需要使用互斥锁或原子操作保护读取 +func (a *Account) GetRateLimitRemainingTime(model string) time.Duration { + a.mu.RLock() + defer a.mu.RUnlock() + // 读取 Extra 字段... +} +``` + +### 9. 命名规范 + +#### 一致的命名风格 +- 常量使用 camelCase:`rateLimitThreshold` +- 类型使用 PascalCase:`AntigravityQuotaScope` +- 同一概念使用统一命名:`Threshold` 或 `Limit`,不要混用 + +```go +// ❌ 不推荐:命名不一致 +antigravitySmartRetryMinWait // 使用 Min +antigravityRateLimitThreshold // 使用 Threshold + +// ✅ 推荐:统一风格 +antigravityMinRetryWait +antigravityRateLimitThreshold +``` + +### 10. 代码审查清单 + +在提交代码前,检查以下项目: + +- [ ] 函数是否超过 30 行?(不可拆分的逻辑除外,需注释说明) +- [ ] 嵌套是否超过 3 层? +- [ ] 是否有重复代码可以提取? +- [ ] 是否使用了魔法数字? +- [ ] Mock 函数签名是否与实际函数一致? +- [ ] 测试是否覆盖了新增逻辑? +- [ ] 日志是否包含足够的上下文信息? +- [ ] 是否考虑了并发安全? + +--- + +## CI 检查与发布门禁 + +### GitHub Actions 检查项 + +本项目有 4 个 CI 任务,**任何代码推送或发布前都必须全部通过**: + +| Workflow | Job | 说明 | 本地验证命令 | +|----------|-----|------|-------------| +| CI | `test` | 单元测试 + 集成测试 | `cd backend && make test-unit && make test-integration` | +| CI | `golangci-lint` | Go 代码静态检查(golangci-lint v2.7) | `cd backend && golangci-lint run --timeout=5m` | +| Security Scan | `backend-security` | govulncheck + gosec 安全扫描 | `cd backend && govulncheck ./... && gosec -severity high -confidence high ./...` | +| Security Scan | `frontend-security` | pnpm audit 前端依赖安全检查 | `cd frontend && pnpm audit --prod --audit-level=high` | + +### 向上游提交 PR + +PR 目标是上游官方仓库,**只包含通用功能改动**(bug fix、新功能、性能优化等)。 + +**以下文件禁止出现在 PR 中**(属于我们 fork 的定制化内容): +- `CLAUDE.md`、`AGENTS.md` — 我们的开发文档 +- `backend/cmd/server/VERSION` — 我们的版本号文件 +- UI 定制改动(GitHub 链接移除、微信客服按钮、首页定制等) +- 部署配置(`deploy/` 目录下的定制修改) + +**PR 流程**: +1. 从 `develop` 创建功能分支,只包含要提交给上游的改动 +2. 推送分支后,**等待 4 个 CI job 全部通过** +3. 确认通过后再创建 PR +4. 使用 `gh run list --repo touwaeriol/sub2api --branch ` 检查状态 + +### 自有分支推送(develop / main) + +推送到我们自己的 `develop` 或 `main` 分支时,包含所有改动(定制化 + 通用功能)。 + +**推送前必须在本地执行全部 CI 检查**(不要等 GitHub Actions): + +```bash +# 确保 Go 工具链可用(macOS homebrew) +export PATH="/opt/homebrew/bin:$HOME/go/bin:$PATH" + +# 1. 单元测试(必须) +cd backend && make test-unit + +# 2. 集成测试(推荐,需要 Docker) +make test-integration + +# 3. golangci-lint 静态检查(必须) +golangci-lint run --timeout=5m + +# 4. gofmt 格式检查(必须) +gofmt -l ./... +# 如果有输出,运行 gofmt -w 修复 +``` + +**推送后确认**: +1. 使用 `gh run list --repo touwaeriol/sub2api --branch ` 检查 GitHub Actions 状态 +2. 确认 CI 和 Security Scan 两个 workflow 的 4 个 job 全部绿色 ✅ +3. 任何 job 失败必须立即修复,**禁止在 CI 未通过的状态下继续后续操作** + +### 发布版本 + +1. 本地执行上述全部 CI 检查通过 +2. 递增 `backend/cmd/server/VERSION`,提交并推送 +3. 推送后确认 GitHub Actions 的 4 个 CI job 全部通过 +4. **CI 未通过时禁止部署** — 必须先修复问题 +5. 使用 `gh run list --repo touwaeriol/sub2api --limit 10` 确认状态 + +### 常见 CI 失败原因及修复 +- **gofmt**:struct 字段对齐不一致 → 运行 `gofmt -w ` 修复 +- **golangci-lint**:未使用的变量/导入 → 删除或使用 `_` 忽略 +- **test 失败**:mock 函数签名不一致 → 同步更新 mock +- **gosec**:安全漏洞 → 根据提示修复或添加例外 diff --git a/backend/cmd/server/VERSION b/backend/cmd/server/VERSION index 5087e794..0aa59ad9 100644 --- a/backend/cmd/server/VERSION +++ b/backend/cmd/server/VERSION @@ -1 +1 @@ -0.1.76 \ No newline at end of file +0.1.79.7 diff --git a/backend/internal/config/config.go b/backend/internal/config/config.go index 91437ba8..7b6b4a37 100644 --- a/backend/internal/config/config.go +++ b/backend/internal/config/config.go @@ -883,6 +883,7 @@ func setDefaults() { viper.SetDefault("gateway.max_account_switches", 10) viper.SetDefault("gateway.max_account_switches_gemini", 3) viper.SetDefault("gateway.antigravity_fallback_cooldown_minutes", 1) + viper.SetDefault("gateway.antigravity_extra_retries", 10) viper.SetDefault("gateway.max_body_size", int64(100*1024*1024)) viper.SetDefault("gateway.connection_pool_isolation", ConnectionPoolIsolationAccountProxy) // HTTP 上游连接池配置(针对 5000+ 并发用户优化) diff --git a/backend/internal/handler/admin/ops_realtime_handler.go b/backend/internal/handler/admin/ops_realtime_handler.go index c175dcd0..2d3cce4b 100644 --- a/backend/internal/handler/admin/ops_realtime_handler.go +++ b/backend/internal/handler/admin/ops_realtime_handler.go @@ -65,6 +65,10 @@ func (h *OpsHandler) GetConcurrencyStats(c *gin.Context) { // GetUserConcurrencyStats returns real-time concurrency usage for all active users. // GET /api/v1/admin/ops/user-concurrency +// +// Query params: +// - platform: optional, filter users by allowed platform +// - group_id: optional, filter users by allowed group func (h *OpsHandler) GetUserConcurrencyStats(c *gin.Context) { if h.opsService == nil { response.Error(c, http.StatusServiceUnavailable, "Ops service not available") @@ -84,7 +88,18 @@ func (h *OpsHandler) GetUserConcurrencyStats(c *gin.Context) { return } - users, collectedAt, err := h.opsService.GetUserConcurrencyStats(c.Request.Context()) + platformFilter := strings.TrimSpace(c.Query("platform")) + var groupID *int64 + if v := strings.TrimSpace(c.Query("group_id")); v != "" { + id, err := strconv.ParseInt(v, 10, 64) + if err != nil || id <= 0 { + response.BadRequest(c, "Invalid group_id") + return + } + groupID = &id + } + + users, collectedAt, err := h.opsService.GetUserConcurrencyStats(c.Request.Context(), platformFilter, groupID) if err != nil { response.ErrorFrom(c, err) return diff --git a/backend/internal/handler/failover_loop.go b/backend/internal/handler/failover_loop.go new file mode 100644 index 00000000..1f8a7e9a --- /dev/null +++ b/backend/internal/handler/failover_loop.go @@ -0,0 +1,160 @@ +package handler + +import ( + "context" + "log" + "net/http" + "time" + + "github.com/Wei-Shaw/sub2api/internal/service" +) + +// TempUnscheduler 用于 HandleFailoverError 中同账号重试耗尽后的临时封禁。 +// GatewayService 隐式实现此接口。 +type TempUnscheduler interface { + TempUnscheduleRetryableError(ctx context.Context, accountID int64, failoverErr *service.UpstreamFailoverError) +} + +// FailoverAction 表示 failover 错误处理后的下一步动作 +type FailoverAction int + +const ( + // FailoverContinue 继续循环(同账号重试或切换账号,调用方统一 continue) + FailoverContinue FailoverAction = iota + // FailoverExhausted 切换次数耗尽(调用方应返回错误响应) + FailoverExhausted + // FailoverCanceled context 已取消(调用方应直接 return) + FailoverCanceled +) + +const ( + // maxSameAccountRetries 同账号重试次数上限(针对 RetryableOnSameAccount 错误) + maxSameAccountRetries = 2 + // sameAccountRetryDelay 同账号重试间隔 + sameAccountRetryDelay = 500 * time.Millisecond + // singleAccountBackoffDelay 单账号分组 503 退避重试固定延时。 + // Service 层在 SingleAccountRetry 模式下已做充分原地重试(最多 3 次、总等待 30s), + // Handler 层只需短暂间隔后重新进入 Service 层即可。 + singleAccountBackoffDelay = 2 * time.Second +) + +// FailoverState 跨循环迭代共享的 failover 状态 +type FailoverState struct { + SwitchCount int + MaxSwitches int + FailedAccountIDs map[int64]struct{} + SameAccountRetryCount map[int64]int + LastFailoverErr *service.UpstreamFailoverError + ForceCacheBilling bool + hasBoundSession bool +} + +// NewFailoverState 创建 failover 状态 +func NewFailoverState(maxSwitches int, hasBoundSession bool) *FailoverState { + return &FailoverState{ + MaxSwitches: maxSwitches, + FailedAccountIDs: make(map[int64]struct{}), + SameAccountRetryCount: make(map[int64]int), + hasBoundSession: hasBoundSession, + } +} + +// HandleFailoverError 处理 UpstreamFailoverError,返回下一步动作。 +// 包含:缓存计费判断、同账号重试、临时封禁、切换计数、Antigravity 延时。 +func (s *FailoverState) HandleFailoverError( + ctx context.Context, + gatewayService TempUnscheduler, + accountID int64, + platform string, + failoverErr *service.UpstreamFailoverError, +) FailoverAction { + s.LastFailoverErr = failoverErr + + // 缓存计费判断 + if needForceCacheBilling(s.hasBoundSession, failoverErr) { + s.ForceCacheBilling = true + } + + // 同账号重试:对 RetryableOnSameAccount 的临时性错误,先在同一账号上重试 + if failoverErr.RetryableOnSameAccount && s.SameAccountRetryCount[accountID] < maxSameAccountRetries { + s.SameAccountRetryCount[accountID]++ + log.Printf("Account %d: retryable error %d, same-account retry %d/%d", + accountID, failoverErr.StatusCode, s.SameAccountRetryCount[accountID], maxSameAccountRetries) + if !sleepWithContext(ctx, sameAccountRetryDelay) { + return FailoverCanceled + } + return FailoverContinue + } + + // 同账号重试用尽,执行临时封禁 + if failoverErr.RetryableOnSameAccount { + gatewayService.TempUnscheduleRetryableError(ctx, accountID, failoverErr) + } + + // 加入失败列表 + s.FailedAccountIDs[accountID] = struct{}{} + + // 检查是否耗尽 + if s.SwitchCount >= s.MaxSwitches { + return FailoverExhausted + } + + // 递增切换计数 + s.SwitchCount++ + log.Printf("Account %d: upstream error %d, switching account %d/%d", + accountID, failoverErr.StatusCode, s.SwitchCount, s.MaxSwitches) + + // Antigravity 平台换号线性递增延时 + if platform == service.PlatformAntigravity { + delay := time.Duration(s.SwitchCount-1) * time.Second + if !sleepWithContext(ctx, delay) { + return FailoverCanceled + } + } + + return FailoverContinue +} + +// HandleSelectionExhausted 处理选号失败(所有候选账号都在排除列表中)时的退避重试决策。 +// 针对 Antigravity 单账号分组的 503 (MODEL_CAPACITY_EXHAUSTED) 场景: +// 清除排除列表、等待退避后重新选号。 +// +// 返回 FailoverContinue 时,调用方应设置 SingleAccountRetry context 并 continue。 +// 返回 FailoverExhausted 时,调用方应返回错误响应。 +// 返回 FailoverCanceled 时,调用方应直接 return。 +func (s *FailoverState) HandleSelectionExhausted(ctx context.Context) FailoverAction { + if s.LastFailoverErr != nil && + s.LastFailoverErr.StatusCode == http.StatusServiceUnavailable && + s.SwitchCount <= s.MaxSwitches { + + log.Printf("Antigravity single-account 503 backoff: waiting %v before retry (attempt %d)", + singleAccountBackoffDelay, s.SwitchCount) + if !sleepWithContext(ctx, singleAccountBackoffDelay) { + return FailoverCanceled + } + log.Printf("Antigravity single-account 503 retry: clearing failed accounts, retry %d/%d", + s.SwitchCount, s.MaxSwitches) + s.FailedAccountIDs = make(map[int64]struct{}) + return FailoverContinue + } + return FailoverExhausted +} + +// needForceCacheBilling 判断 failover 时是否需要强制缓存计费。 +// 粘性会话切换账号、或上游明确标记时,将 input_tokens 转为 cache_read 计费。 +func needForceCacheBilling(hasBoundSession bool, failoverErr *service.UpstreamFailoverError) bool { + return hasBoundSession || (failoverErr != nil && failoverErr.ForceCacheBilling) +} + +// sleepWithContext 等待指定时长,返回 false 表示 context 已取消。 +func sleepWithContext(ctx context.Context, d time.Duration) bool { + if d <= 0 { + return true + } + select { + case <-ctx.Done(): + return false + case <-time.After(d): + return true + } +} diff --git a/backend/internal/handler/failover_loop_test.go b/backend/internal/handler/failover_loop_test.go new file mode 100644 index 00000000..5a41b2dd --- /dev/null +++ b/backend/internal/handler/failover_loop_test.go @@ -0,0 +1,732 @@ +package handler + +import ( + "context" + "testing" + "time" + + "github.com/Wei-Shaw/sub2api/internal/service" + + "github.com/stretchr/testify/require" +) + +// --------------------------------------------------------------------------- +// Mock +// --------------------------------------------------------------------------- + +// mockTempUnscheduler 记录 TempUnscheduleRetryableError 的调用信息。 +type mockTempUnscheduler struct { + calls []tempUnscheduleCall +} + +type tempUnscheduleCall struct { + accountID int64 + failoverErr *service.UpstreamFailoverError +} + +func (m *mockTempUnscheduler) TempUnscheduleRetryableError(_ context.Context, accountID int64, failoverErr *service.UpstreamFailoverError) { + m.calls = append(m.calls, tempUnscheduleCall{accountID: accountID, failoverErr: failoverErr}) +} + +// --------------------------------------------------------------------------- +// Helper +// --------------------------------------------------------------------------- + +func newTestFailoverErr(statusCode int, retryable, forceBilling bool) *service.UpstreamFailoverError { + return &service.UpstreamFailoverError{ + StatusCode: statusCode, + RetryableOnSameAccount: retryable, + ForceCacheBilling: forceBilling, + } +} + +// --------------------------------------------------------------------------- +// NewFailoverState 测试 +// --------------------------------------------------------------------------- + +func TestNewFailoverState(t *testing.T) { + t.Run("初始化字段正确", func(t *testing.T) { + fs := NewFailoverState(5, true) + require.Equal(t, 5, fs.MaxSwitches) + require.Equal(t, 0, fs.SwitchCount) + require.NotNil(t, fs.FailedAccountIDs) + require.Empty(t, fs.FailedAccountIDs) + require.NotNil(t, fs.SameAccountRetryCount) + require.Empty(t, fs.SameAccountRetryCount) + require.Nil(t, fs.LastFailoverErr) + require.False(t, fs.ForceCacheBilling) + require.True(t, fs.hasBoundSession) + }) + + t.Run("无绑定会话", func(t *testing.T) { + fs := NewFailoverState(3, false) + require.Equal(t, 3, fs.MaxSwitches) + require.False(t, fs.hasBoundSession) + }) + + t.Run("零最大切换次数", func(t *testing.T) { + fs := NewFailoverState(0, false) + require.Equal(t, 0, fs.MaxSwitches) + }) +} + +// --------------------------------------------------------------------------- +// sleepWithContext 测试 +// --------------------------------------------------------------------------- + +func TestSleepWithContext(t *testing.T) { + t.Run("零时长立即返回true", func(t *testing.T) { + start := time.Now() + ok := sleepWithContext(context.Background(), 0) + require.True(t, ok) + require.Less(t, time.Since(start), 50*time.Millisecond) + }) + + t.Run("负时长立即返回true", func(t *testing.T) { + start := time.Now() + ok := sleepWithContext(context.Background(), -1*time.Second) + require.True(t, ok) + require.Less(t, time.Since(start), 50*time.Millisecond) + }) + + t.Run("正常等待后返回true", func(t *testing.T) { + start := time.Now() + ok := sleepWithContext(context.Background(), 50*time.Millisecond) + elapsed := time.Since(start) + require.True(t, ok) + require.GreaterOrEqual(t, elapsed, 40*time.Millisecond) + require.Less(t, elapsed, 500*time.Millisecond) + }) + + t.Run("已取消context立即返回false", func(t *testing.T) { + ctx, cancel := context.WithCancel(context.Background()) + cancel() + + start := time.Now() + ok := sleepWithContext(ctx, 5*time.Second) + require.False(t, ok) + require.Less(t, time.Since(start), 50*time.Millisecond) + }) + + t.Run("等待期间context取消返回false", func(t *testing.T) { + ctx, cancel := context.WithCancel(context.Background()) + go func() { + time.Sleep(30 * time.Millisecond) + cancel() + }() + + start := time.Now() + ok := sleepWithContext(ctx, 5*time.Second) + elapsed := time.Since(start) + require.False(t, ok) + require.Less(t, elapsed, 500*time.Millisecond) + }) +} + +// --------------------------------------------------------------------------- +// HandleFailoverError — 基本切换流程 +// --------------------------------------------------------------------------- + +func TestHandleFailoverError_BasicSwitch(t *testing.T) { + t.Run("非重试错误_非Antigravity_直接切换", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + err := newTestFailoverErr(500, false, false) + + action := fs.HandleFailoverError(context.Background(), mock, 100, "openai", err) + + require.Equal(t, FailoverContinue, action) + require.Equal(t, 1, fs.SwitchCount) + require.Contains(t, fs.FailedAccountIDs, int64(100)) + require.Equal(t, err, fs.LastFailoverErr) + require.False(t, fs.ForceCacheBilling) + require.Empty(t, mock.calls, "不应调用 TempUnschedule") + }) + + t.Run("非重试错误_Antigravity_第一次切换无延迟", func(t *testing.T) { + // switchCount 从 0→1 时,sleepFailoverDelay(ctx, 1) 的延时 = (1-1)*1s = 0 + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + err := newTestFailoverErr(500, false, false) + + start := time.Now() + action := fs.HandleFailoverError(context.Background(), mock, 100, service.PlatformAntigravity, err) + elapsed := time.Since(start) + + require.Equal(t, FailoverContinue, action) + require.Equal(t, 1, fs.SwitchCount) + require.Less(t, elapsed, 200*time.Millisecond, "第一次切换延迟应为 0") + }) + + t.Run("非重试错误_Antigravity_第二次切换有1秒延迟", func(t *testing.T) { + // switchCount 从 1→2 时,sleepFailoverDelay(ctx, 2) 的延时 = (2-1)*1s = 1s + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + fs.SwitchCount = 1 // 模拟已切换一次 + + err := newTestFailoverErr(500, false, false) + start := time.Now() + action := fs.HandleFailoverError(context.Background(), mock, 200, service.PlatformAntigravity, err) + elapsed := time.Since(start) + + require.Equal(t, FailoverContinue, action) + require.Equal(t, 2, fs.SwitchCount) + require.GreaterOrEqual(t, elapsed, 800*time.Millisecond, "第二次切换延迟应约 1s") + require.Less(t, elapsed, 3*time.Second) + }) + + t.Run("连续切换直到耗尽", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(2, false) + + // 第一次切换:0→1 + err1 := newTestFailoverErr(500, false, false) + action := fs.HandleFailoverError(context.Background(), mock, 100, "openai", err1) + require.Equal(t, FailoverContinue, action) + require.Equal(t, 1, fs.SwitchCount) + + // 第二次切换:1→2 + err2 := newTestFailoverErr(502, false, false) + action = fs.HandleFailoverError(context.Background(), mock, 200, "openai", err2) + require.Equal(t, FailoverContinue, action) + require.Equal(t, 2, fs.SwitchCount) + + // 第三次已耗尽:SwitchCount(2) >= MaxSwitches(2) + err3 := newTestFailoverErr(503, false, false) + action = fs.HandleFailoverError(context.Background(), mock, 300, "openai", err3) + require.Equal(t, FailoverExhausted, action) + require.Equal(t, 2, fs.SwitchCount, "耗尽时不应继续递增") + + // 验证失败账号列表 + require.Len(t, fs.FailedAccountIDs, 3) + require.Contains(t, fs.FailedAccountIDs, int64(100)) + require.Contains(t, fs.FailedAccountIDs, int64(200)) + require.Contains(t, fs.FailedAccountIDs, int64(300)) + + // LastFailoverErr 应为最后一次的错误 + require.Equal(t, err3, fs.LastFailoverErr) + }) + + t.Run("MaxSwitches为0时首次即耗尽", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(0, false) + err := newTestFailoverErr(500, false, false) + + action := fs.HandleFailoverError(context.Background(), mock, 100, "openai", err) + require.Equal(t, FailoverExhausted, action) + require.Equal(t, 0, fs.SwitchCount) + require.Contains(t, fs.FailedAccountIDs, int64(100)) + }) +} + +// --------------------------------------------------------------------------- +// HandleFailoverError — 缓存计费 (ForceCacheBilling) +// --------------------------------------------------------------------------- + +func TestHandleFailoverError_CacheBilling(t *testing.T) { + t.Run("hasBoundSession为true时设置ForceCacheBilling", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, true) // hasBoundSession=true + err := newTestFailoverErr(500, false, false) + + fs.HandleFailoverError(context.Background(), mock, 100, "openai", err) + require.True(t, fs.ForceCacheBilling) + }) + + t.Run("failoverErr.ForceCacheBilling为true时设置", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + err := newTestFailoverErr(500, false, true) // ForceCacheBilling=true + + fs.HandleFailoverError(context.Background(), mock, 100, "openai", err) + require.True(t, fs.ForceCacheBilling) + }) + + t.Run("两者均为false时不设置", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + err := newTestFailoverErr(500, false, false) + + fs.HandleFailoverError(context.Background(), mock, 100, "openai", err) + require.False(t, fs.ForceCacheBilling) + }) + + t.Run("一旦设置不会被后续错误重置", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + + // 第一次:ForceCacheBilling=true → 设置 + err1 := newTestFailoverErr(500, false, true) + fs.HandleFailoverError(context.Background(), mock, 100, "openai", err1) + require.True(t, fs.ForceCacheBilling) + + // 第二次:ForceCacheBilling=false → 仍然保持 true + err2 := newTestFailoverErr(502, false, false) + fs.HandleFailoverError(context.Background(), mock, 200, "openai", err2) + require.True(t, fs.ForceCacheBilling, "ForceCacheBilling 一旦设置不应被重置") + }) +} + +// --------------------------------------------------------------------------- +// HandleFailoverError — 同账号重试 (RetryableOnSameAccount) +// --------------------------------------------------------------------------- + +func TestHandleFailoverError_SameAccountRetry(t *testing.T) { + t.Run("第一次重试返回FailoverContinue", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + err := newTestFailoverErr(400, true, false) + + start := time.Now() + action := fs.HandleFailoverError(context.Background(), mock, 100, "openai", err) + elapsed := time.Since(start) + + require.Equal(t, FailoverContinue, action) + require.Equal(t, 1, fs.SameAccountRetryCount[100]) + require.Equal(t, 0, fs.SwitchCount, "同账号重试不应增加切换计数") + require.NotContains(t, fs.FailedAccountIDs, int64(100), "同账号重试不应加入失败列表") + require.Empty(t, mock.calls, "同账号重试期间不应调用 TempUnschedule") + // 验证等待了 sameAccountRetryDelay (500ms) + require.GreaterOrEqual(t, elapsed, 400*time.Millisecond) + require.Less(t, elapsed, 2*time.Second) + }) + + t.Run("第二次重试仍返回FailoverContinue", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + err := newTestFailoverErr(400, true, false) + + // 第一次 + action := fs.HandleFailoverError(context.Background(), mock, 100, "openai", err) + require.Equal(t, FailoverContinue, action) + require.Equal(t, 1, fs.SameAccountRetryCount[100]) + + // 第二次 + action = fs.HandleFailoverError(context.Background(), mock, 100, "openai", err) + require.Equal(t, FailoverContinue, action) + require.Equal(t, 2, fs.SameAccountRetryCount[100]) + + require.Empty(t, mock.calls, "两次重试期间均不应调用 TempUnschedule") + }) + + t.Run("第三次重试耗尽_触发TempUnschedule并切换", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + err := newTestFailoverErr(400, true, false) + + // 第一次、第二次重试 + fs.HandleFailoverError(context.Background(), mock, 100, "openai", err) + fs.HandleFailoverError(context.Background(), mock, 100, "openai", err) + require.Equal(t, 2, fs.SameAccountRetryCount[100]) + + // 第三次:重试已达到 maxSameAccountRetries(2),应切换账号 + action := fs.HandleFailoverError(context.Background(), mock, 100, "openai", err) + require.Equal(t, FailoverContinue, action) + require.Equal(t, 1, fs.SwitchCount) + require.Contains(t, fs.FailedAccountIDs, int64(100)) + + // 验证 TempUnschedule 被调用 + require.Len(t, mock.calls, 1) + require.Equal(t, int64(100), mock.calls[0].accountID) + require.Equal(t, err, mock.calls[0].failoverErr) + }) + + t.Run("不同账号独立跟踪重试次数", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(5, false) + err := newTestFailoverErr(400, true, false) + + // 账号 100 第一次重试 + action := fs.HandleFailoverError(context.Background(), mock, 100, "openai", err) + require.Equal(t, FailoverContinue, action) + require.Equal(t, 1, fs.SameAccountRetryCount[100]) + + // 账号 200 第一次重试(独立计数) + action = fs.HandleFailoverError(context.Background(), mock, 200, "openai", err) + require.Equal(t, FailoverContinue, action) + require.Equal(t, 1, fs.SameAccountRetryCount[200]) + require.Equal(t, 1, fs.SameAccountRetryCount[100], "账号 100 的计数不应受影响") + }) + + t.Run("重试耗尽后再次遇到同账号_直接切换", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(5, false) + err := newTestFailoverErr(400, true, false) + + // 耗尽账号 100 的重试 + fs.HandleFailoverError(context.Background(), mock, 100, "openai", err) + fs.HandleFailoverError(context.Background(), mock, 100, "openai", err) + // 第三次: 重试耗尽 → 切换 + action := fs.HandleFailoverError(context.Background(), mock, 100, "openai", err) + require.Equal(t, FailoverContinue, action) + + // 再次遇到账号 100,计数仍为 2,条件不满足 → 直接切换 + action = fs.HandleFailoverError(context.Background(), mock, 100, "openai", err) + require.Equal(t, FailoverContinue, action) + require.Len(t, mock.calls, 2, "第二次耗尽也应调用 TempUnschedule") + }) +} + +// --------------------------------------------------------------------------- +// HandleFailoverError — TempUnschedule 调用验证 +// --------------------------------------------------------------------------- + +func TestHandleFailoverError_TempUnschedule(t *testing.T) { + t.Run("非重试错误不调用TempUnschedule", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + err := newTestFailoverErr(500, false, false) // RetryableOnSameAccount=false + + fs.HandleFailoverError(context.Background(), mock, 100, "openai", err) + require.Empty(t, mock.calls) + }) + + t.Run("重试错误耗尽后调用TempUnschedule_传入正确参数", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + err := newTestFailoverErr(502, true, false) + + // 耗尽重试 + fs.HandleFailoverError(context.Background(), mock, 42, "openai", err) + fs.HandleFailoverError(context.Background(), mock, 42, "openai", err) + fs.HandleFailoverError(context.Background(), mock, 42, "openai", err) + + require.Len(t, mock.calls, 1) + require.Equal(t, int64(42), mock.calls[0].accountID) + require.Equal(t, 502, mock.calls[0].failoverErr.StatusCode) + require.True(t, mock.calls[0].failoverErr.RetryableOnSameAccount) + }) +} + +// --------------------------------------------------------------------------- +// HandleFailoverError — Context 取消 +// --------------------------------------------------------------------------- + +func TestHandleFailoverError_ContextCanceled(t *testing.T) { + t.Run("同账号重试sleep期间context取消", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + err := newTestFailoverErr(400, true, false) + + ctx, cancel := context.WithCancel(context.Background()) + cancel() // 立即取消 + + start := time.Now() + action := fs.HandleFailoverError(ctx, mock, 100, "openai", err) + elapsed := time.Since(start) + + require.Equal(t, FailoverCanceled, action) + require.Less(t, elapsed, 100*time.Millisecond, "应立即返回") + // 重试计数仍应递增 + require.Equal(t, 1, fs.SameAccountRetryCount[100]) + }) + + t.Run("Antigravity延迟期间context取消", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + fs.SwitchCount = 1 // 下一次 switchCount=2 → delay = 1s + err := newTestFailoverErr(500, false, false) + + ctx, cancel := context.WithCancel(context.Background()) + cancel() // 立即取消 + + start := time.Now() + action := fs.HandleFailoverError(ctx, mock, 100, service.PlatformAntigravity, err) + elapsed := time.Since(start) + + require.Equal(t, FailoverCanceled, action) + require.Less(t, elapsed, 100*time.Millisecond, "应立即返回而非等待 1s") + }) +} + +// --------------------------------------------------------------------------- +// HandleFailoverError — FailedAccountIDs 跟踪 +// --------------------------------------------------------------------------- + +func TestHandleFailoverError_FailedAccountIDs(t *testing.T) { + t.Run("切换时添加到失败列表", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + + fs.HandleFailoverError(context.Background(), mock, 100, "openai", newTestFailoverErr(500, false, false)) + require.Contains(t, fs.FailedAccountIDs, int64(100)) + + fs.HandleFailoverError(context.Background(), mock, 200, "openai", newTestFailoverErr(502, false, false)) + require.Contains(t, fs.FailedAccountIDs, int64(200)) + require.Len(t, fs.FailedAccountIDs, 2) + }) + + t.Run("耗尽时也添加到失败列表", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(0, false) + + action := fs.HandleFailoverError(context.Background(), mock, 100, "openai", newTestFailoverErr(500, false, false)) + require.Equal(t, FailoverExhausted, action) + require.Contains(t, fs.FailedAccountIDs, int64(100)) + }) + + t.Run("同账号重试期间不添加到失败列表", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + + action := fs.HandleFailoverError(context.Background(), mock, 100, "openai", newTestFailoverErr(400, true, false)) + require.Equal(t, FailoverContinue, action) + require.NotContains(t, fs.FailedAccountIDs, int64(100)) + }) + + t.Run("同一账号多次切换不重复添加", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(5, false) + + fs.HandleFailoverError(context.Background(), mock, 100, "openai", newTestFailoverErr(500, false, false)) + fs.HandleFailoverError(context.Background(), mock, 100, "openai", newTestFailoverErr(500, false, false)) + require.Len(t, fs.FailedAccountIDs, 1, "map 天然去重") + }) +} + +// --------------------------------------------------------------------------- +// HandleFailoverError — LastFailoverErr 更新 +// --------------------------------------------------------------------------- + +func TestHandleFailoverError_LastFailoverErr(t *testing.T) { + t.Run("每次调用都更新LastFailoverErr", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + + err1 := newTestFailoverErr(500, false, false) + fs.HandleFailoverError(context.Background(), mock, 100, "openai", err1) + require.Equal(t, err1, fs.LastFailoverErr) + + err2 := newTestFailoverErr(502, false, false) + fs.HandleFailoverError(context.Background(), mock, 200, "openai", err2) + require.Equal(t, err2, fs.LastFailoverErr) + }) + + t.Run("同账号重试时也更新LastFailoverErr", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + + err := newTestFailoverErr(400, true, false) + fs.HandleFailoverError(context.Background(), mock, 100, "openai", err) + require.Equal(t, err, fs.LastFailoverErr) + }) +} + +// --------------------------------------------------------------------------- +// HandleFailoverError — 综合集成场景 +// --------------------------------------------------------------------------- + +func TestHandleFailoverError_IntegrationScenario(t *testing.T) { + t.Run("模拟完整failover流程_多账号混合重试与切换", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, true) // hasBoundSession=true + + // 1. 账号 100 遇到可重试错误,同账号重试 2 次 + retryErr := newTestFailoverErr(400, true, false) + action := fs.HandleFailoverError(context.Background(), mock, 100, "openai", retryErr) + require.Equal(t, FailoverContinue, action) + require.True(t, fs.ForceCacheBilling, "hasBoundSession=true 应设置 ForceCacheBilling") + + action = fs.HandleFailoverError(context.Background(), mock, 100, "openai", retryErr) + require.Equal(t, FailoverContinue, action) + + // 2. 账号 100 重试耗尽 → TempUnschedule + 切换 + action = fs.HandleFailoverError(context.Background(), mock, 100, "openai", retryErr) + require.Equal(t, FailoverContinue, action) + require.Equal(t, 1, fs.SwitchCount) + require.Len(t, mock.calls, 1) + + // 3. 账号 200 遇到不可重试错误 → 直接切换 + switchErr := newTestFailoverErr(500, false, false) + action = fs.HandleFailoverError(context.Background(), mock, 200, "openai", switchErr) + require.Equal(t, FailoverContinue, action) + require.Equal(t, 2, fs.SwitchCount) + + // 4. 账号 300 遇到不可重试错误 → 再切换 + action = fs.HandleFailoverError(context.Background(), mock, 300, "openai", switchErr) + require.Equal(t, FailoverContinue, action) + require.Equal(t, 3, fs.SwitchCount) + + // 5. 账号 400 → 已耗尽 (SwitchCount=3 >= MaxSwitches=3) + action = fs.HandleFailoverError(context.Background(), mock, 400, "openai", switchErr) + require.Equal(t, FailoverExhausted, action) + + // 最终状态验证 + require.Equal(t, 3, fs.SwitchCount, "耗尽时不再递增") + require.Len(t, fs.FailedAccountIDs, 4, "4个不同账号都在失败列表中") + require.True(t, fs.ForceCacheBilling) + require.Len(t, mock.calls, 1, "只有账号 100 触发了 TempUnschedule") + }) + + t.Run("模拟Antigravity平台完整流程", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(2, false) + + err := newTestFailoverErr(500, false, false) + + // 第一次切换:delay = 0s + start := time.Now() + action := fs.HandleFailoverError(context.Background(), mock, 100, service.PlatformAntigravity, err) + elapsed := time.Since(start) + require.Equal(t, FailoverContinue, action) + require.Less(t, elapsed, 200*time.Millisecond, "第一次切换延迟为 0") + + // 第二次切换:delay = 1s + start = time.Now() + action = fs.HandleFailoverError(context.Background(), mock, 200, service.PlatformAntigravity, err) + elapsed = time.Since(start) + require.Equal(t, FailoverContinue, action) + require.GreaterOrEqual(t, elapsed, 800*time.Millisecond, "第二次切换延迟约 1s") + + // 第三次:耗尽(无延迟,因为在检查延迟之前就返回了) + start = time.Now() + action = fs.HandleFailoverError(context.Background(), mock, 300, service.PlatformAntigravity, err) + elapsed = time.Since(start) + require.Equal(t, FailoverExhausted, action) + require.Less(t, elapsed, 200*time.Millisecond, "耗尽时不应有延迟") + }) + + t.Run("ForceCacheBilling通过错误标志设置", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) // hasBoundSession=false + + // 第一次:ForceCacheBilling=false + err1 := newTestFailoverErr(500, false, false) + fs.HandleFailoverError(context.Background(), mock, 100, "openai", err1) + require.False(t, fs.ForceCacheBilling) + + // 第二次:ForceCacheBilling=true(Antigravity 粘性会话切换) + err2 := newTestFailoverErr(500, false, true) + fs.HandleFailoverError(context.Background(), mock, 200, "openai", err2) + require.True(t, fs.ForceCacheBilling, "错误标志应触发 ForceCacheBilling") + + // 第三次:ForceCacheBilling=false,但状态仍保持 true + err3 := newTestFailoverErr(500, false, false) + fs.HandleFailoverError(context.Background(), mock, 300, "openai", err3) + require.True(t, fs.ForceCacheBilling, "不应重置") + }) +} + +// --------------------------------------------------------------------------- +// HandleFailoverError — 边界条件 +// --------------------------------------------------------------------------- + +func TestHandleFailoverError_EdgeCases(t *testing.T) { + t.Run("StatusCode为0的错误也能正常处理", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + err := newTestFailoverErr(0, false, false) + + action := fs.HandleFailoverError(context.Background(), mock, 100, "openai", err) + require.Equal(t, FailoverContinue, action) + }) + + t.Run("AccountID为0也能正常跟踪", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + err := newTestFailoverErr(500, true, false) + + action := fs.HandleFailoverError(context.Background(), mock, 0, "openai", err) + require.Equal(t, FailoverContinue, action) + require.Equal(t, 1, fs.SameAccountRetryCount[0]) + }) + + t.Run("负AccountID也能正常跟踪", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + err := newTestFailoverErr(500, true, false) + + action := fs.HandleFailoverError(context.Background(), mock, -1, "openai", err) + require.Equal(t, FailoverContinue, action) + require.Equal(t, 1, fs.SameAccountRetryCount[-1]) + }) + + t.Run("空平台名称不触发Antigravity延迟", func(t *testing.T) { + mock := &mockTempUnscheduler{} + fs := NewFailoverState(3, false) + fs.SwitchCount = 1 + err := newTestFailoverErr(500, false, false) + + start := time.Now() + action := fs.HandleFailoverError(context.Background(), mock, 100, "", err) + elapsed := time.Since(start) + + require.Equal(t, FailoverContinue, action) + require.Less(t, elapsed, 200*time.Millisecond, "空平台不应触发 Antigravity 延迟") + }) +} + +// --------------------------------------------------------------------------- +// HandleSelectionExhausted 测试 +// --------------------------------------------------------------------------- + +func TestHandleSelectionExhausted(t *testing.T) { + t.Run("无LastFailoverErr时返回Exhausted", func(t *testing.T) { + fs := NewFailoverState(3, false) + // LastFailoverErr 为 nil + + action := fs.HandleSelectionExhausted(context.Background()) + require.Equal(t, FailoverExhausted, action) + }) + + t.Run("非503错误返回Exhausted", func(t *testing.T) { + fs := NewFailoverState(3, false) + fs.LastFailoverErr = newTestFailoverErr(500, false, false) + + action := fs.HandleSelectionExhausted(context.Background()) + require.Equal(t, FailoverExhausted, action) + }) + + t.Run("503且未耗尽_等待后返回Continue并清除失败列表", func(t *testing.T) { + fs := NewFailoverState(3, false) + fs.LastFailoverErr = newTestFailoverErr(503, false, false) + fs.FailedAccountIDs[100] = struct{}{} + fs.SwitchCount = 1 + + start := time.Now() + action := fs.HandleSelectionExhausted(context.Background()) + elapsed := time.Since(start) + + require.Equal(t, FailoverContinue, action) + require.Empty(t, fs.FailedAccountIDs, "应清除失败账号列表") + require.GreaterOrEqual(t, elapsed, 1500*time.Millisecond, "应等待约 2s") + require.Less(t, elapsed, 5*time.Second) + }) + + t.Run("503但SwitchCount已超过MaxSwitches_返回Exhausted", func(t *testing.T) { + fs := NewFailoverState(2, false) + fs.LastFailoverErr = newTestFailoverErr(503, false, false) + fs.SwitchCount = 3 // > MaxSwitches(2) + + start := time.Now() + action := fs.HandleSelectionExhausted(context.Background()) + elapsed := time.Since(start) + + require.Equal(t, FailoverExhausted, action) + require.Less(t, elapsed, 100*time.Millisecond, "不应等待") + }) + + t.Run("503但context已取消_返回Canceled", func(t *testing.T) { + fs := NewFailoverState(3, false) + fs.LastFailoverErr = newTestFailoverErr(503, false, false) + + ctx, cancel := context.WithCancel(context.Background()) + cancel() + + start := time.Now() + action := fs.HandleSelectionExhausted(ctx) + elapsed := time.Since(start) + + require.Equal(t, FailoverCanceled, action) + require.Less(t, elapsed, 100*time.Millisecond, "应立即返回") + }) + + t.Run("503且SwitchCount等于MaxSwitches_仍可重试", func(t *testing.T) { + fs := NewFailoverState(2, false) + fs.LastFailoverErr = newTestFailoverErr(503, false, false) + fs.SwitchCount = 2 // == MaxSwitches,条件是 <=,仍可重试 + + action := fs.HandleSelectionExhausted(context.Background()) + require.Equal(t, FailoverContinue, action) + }) +} diff --git a/backend/internal/handler/gateway_handler.go b/backend/internal/handler/gateway_handler.go index c2b6bf09..0cc86bb4 100644 --- a/backend/internal/handler/gateway_handler.go +++ b/backend/internal/handler/gateway_handler.go @@ -232,12 +232,7 @@ func (h *GatewayHandler) Messages(c *gin.Context) { hasBoundSession := sessionKey != "" && sessionBoundAccountID > 0 if platform == service.PlatformGemini { - maxAccountSwitches := h.maxAccountSwitchesGemini - switchCount := 0 - failedAccountIDs := make(map[int64]struct{}) - sameAccountRetryCount := make(map[int64]int) // 同账号重试计数 - var lastFailoverErr *service.UpstreamFailoverError - var forceCacheBilling bool // 粘性会话切换时的缓存计费标记 + fs := NewFailoverState(h.maxAccountSwitchesGemini, hasBoundSession) // 单账号分组提前设置 SingleAccountRetry 标记,让 Service 层首次 503 就不设模型限流标记。 // 避免单账号分组收到 503 (MODEL_CAPACITY_EXHAUSTED) 时设 29s 限流,导致后续请求连续快速失败。 @@ -247,31 +242,28 @@ func (h *GatewayHandler) Messages(c *gin.Context) { } for { - selection, err := h.gatewayService.SelectAccountWithLoadAwareness(c.Request.Context(), apiKey.GroupID, sessionKey, reqModel, failedAccountIDs, "") // Gemini 不使用会话限制 + selection, err := h.gatewayService.SelectAccountWithLoadAwareness(c.Request.Context(), apiKey.GroupID, sessionKey, reqModel, fs.FailedAccountIDs, "") // Gemini 不使用会话限制 if err != nil { - if len(failedAccountIDs) == 0 { + if len(fs.FailedAccountIDs) == 0 { h.handleStreamingAwareError(c, http.StatusServiceUnavailable, "api_error", "No available accounts: "+err.Error(), streamStarted) return } - // Antigravity 单账号退避重试:分组内没有其他可用账号时, - // 对 503 错误不直接返回,而是清除排除列表、等待退避后重试同一个账号。 - // 谷歌上游 503 (MODEL_CAPACITY_EXHAUSTED) 通常是暂时性的,等几秒就能恢复。 - if lastFailoverErr != nil && lastFailoverErr.StatusCode == http.StatusServiceUnavailable && switchCount <= maxAccountSwitches { - if sleepAntigravitySingleAccountBackoff(c.Request.Context(), switchCount) { - log.Printf("Antigravity single-account 503 retry: clearing failed accounts, retry %d/%d", switchCount, maxAccountSwitches) - failedAccountIDs = make(map[int64]struct{}) - // 设置 context 标记,让 Service 层预检查等待限流过期而非直接切换 - ctx := context.WithValue(c.Request.Context(), ctxkey.SingleAccountRetry, true) - c.Request = c.Request.WithContext(ctx) - continue + action := fs.HandleSelectionExhausted(c.Request.Context()) + switch action { + case FailoverContinue: + ctx := context.WithValue(c.Request.Context(), ctxkey.SingleAccountRetry, true) + c.Request = c.Request.WithContext(ctx) + continue + case FailoverCanceled: + return + default: // FailoverExhausted + if fs.LastFailoverErr != nil { + h.handleFailoverExhausted(c, fs.LastFailoverErr, service.PlatformGemini, streamStarted) + } else { + h.handleFailoverExhaustedSimple(c, 502, streamStarted) } + return } - if lastFailoverErr != nil { - h.handleFailoverExhausted(c, lastFailoverErr, service.PlatformGemini, streamStarted) - } else { - h.handleFailoverExhaustedSimple(c, 502, streamStarted) - } - return } account := selection.Account setOpsSelectedAccount(c, account.ID) @@ -346,8 +338,8 @@ func (h *GatewayHandler) Messages(c *gin.Context) { // 转发请求 - 根据账号平台分流 var result *service.ForwardResult requestCtx := c.Request.Context() - if switchCount > 0 { - requestCtx = context.WithValue(requestCtx, ctxkey.AccountSwitchCount, switchCount) + if fs.SwitchCount > 0 { + requestCtx = context.WithValue(requestCtx, ctxkey.AccountSwitchCount, fs.SwitchCount) } if account.Platform == service.PlatformAntigravity { result, err = h.antigravityGatewayService.ForwardGemini(requestCtx, c, account, reqModel, "generateContent", reqStream, body, hasBoundSession) @@ -360,40 +352,16 @@ func (h *GatewayHandler) Messages(c *gin.Context) { if err != nil { var failoverErr *service.UpstreamFailoverError if errors.As(err, &failoverErr) { - lastFailoverErr = failoverErr - if needForceCacheBilling(hasBoundSession, failoverErr) { - forceCacheBilling = true - } - - // 同账号重试:对 RetryableOnSameAccount 的临时性错误,先在同一账号上重试 - if failoverErr.RetryableOnSameAccount && sameAccountRetryCount[account.ID] < maxSameAccountRetries { - sameAccountRetryCount[account.ID]++ - log.Printf("Account %d: retryable error %d, same-account retry %d/%d", - account.ID, failoverErr.StatusCode, sameAccountRetryCount[account.ID], maxSameAccountRetries) - if !sleepSameAccountRetryDelay(c.Request.Context()) { - return - } + action := fs.HandleFailoverError(c.Request.Context(), h.gatewayService, account.ID, account.Platform, failoverErr) + switch action { + case FailoverContinue: continue - } - - // 同账号重试用尽,执行临时封禁并切换账号 - if failoverErr.RetryableOnSameAccount { - h.gatewayService.TempUnscheduleRetryableError(c.Request.Context(), account.ID, failoverErr) - } - - failedAccountIDs[account.ID] = struct{}{} - if switchCount >= maxAccountSwitches { - h.handleFailoverExhausted(c, failoverErr, service.PlatformGemini, streamStarted) + case FailoverExhausted: + h.handleFailoverExhausted(c, fs.LastFailoverErr, service.PlatformGemini, streamStarted) + return + case FailoverCanceled: return } - switchCount++ - log.Printf("Account %d: upstream error %d, switching account %d/%d", account.ID, failoverErr.StatusCode, switchCount, maxAccountSwitches) - if account.Platform == service.PlatformAntigravity { - if !sleepFailoverDelay(c.Request.Context(), switchCount) { - return - } - } - continue } // 错误响应已在Forward中处理,这里只记录日志 log.Printf("Forward request failed: %v", err) @@ -421,7 +389,7 @@ func (h *GatewayHandler) Messages(c *gin.Context) { }); err != nil { log.Printf("Record usage failed: %v", err) } - }(result, account, userAgent, clientIP, forceCacheBilling) + }(result, account, userAgent, clientIP, fs.ForceCacheBilling) return } } @@ -442,41 +410,33 @@ func (h *GatewayHandler) Messages(c *gin.Context) { } for { - maxAccountSwitches := h.maxAccountSwitches - switchCount := 0 - failedAccountIDs := make(map[int64]struct{}) - sameAccountRetryCount := make(map[int64]int) // 同账号重试计数 - var lastFailoverErr *service.UpstreamFailoverError + fs := NewFailoverState(h.maxAccountSwitches, hasBoundSession) retryWithFallback := false - var forceCacheBilling bool // 粘性会话切换时的缓存计费标记 for { // 选择支持该模型的账号 - selection, err := h.gatewayService.SelectAccountWithLoadAwareness(c.Request.Context(), currentAPIKey.GroupID, sessionKey, reqModel, failedAccountIDs, parsedReq.MetadataUserID) + selection, err := h.gatewayService.SelectAccountWithLoadAwareness(c.Request.Context(), currentAPIKey.GroupID, sessionKey, reqModel, fs.FailedAccountIDs, parsedReq.MetadataUserID) if err != nil { - if len(failedAccountIDs) == 0 { + if len(fs.FailedAccountIDs) == 0 { h.handleStreamingAwareError(c, http.StatusServiceUnavailable, "api_error", "No available accounts: "+err.Error(), streamStarted) return } - // Antigravity 单账号退避重试:分组内没有其他可用账号时, - // 对 503 错误不直接返回,而是清除排除列表、等待退避后重试同一个账号。 - // 谷歌上游 503 (MODEL_CAPACITY_EXHAUSTED) 通常是暂时性的,等几秒就能恢复。 - if lastFailoverErr != nil && lastFailoverErr.StatusCode == http.StatusServiceUnavailable && switchCount <= maxAccountSwitches { - if sleepAntigravitySingleAccountBackoff(c.Request.Context(), switchCount) { - log.Printf("Antigravity single-account 503 retry: clearing failed accounts, retry %d/%d", switchCount, maxAccountSwitches) - failedAccountIDs = make(map[int64]struct{}) - // 设置 context 标记,让 Service 层预检查等待限流过期而非直接切换 - ctx := context.WithValue(c.Request.Context(), ctxkey.SingleAccountRetry, true) - c.Request = c.Request.WithContext(ctx) - continue + action := fs.HandleSelectionExhausted(c.Request.Context()) + switch action { + case FailoverContinue: + ctx := context.WithValue(c.Request.Context(), ctxkey.SingleAccountRetry, true) + c.Request = c.Request.WithContext(ctx) + continue + case FailoverCanceled: + return + default: // FailoverExhausted + if fs.LastFailoverErr != nil { + h.handleFailoverExhausted(c, fs.LastFailoverErr, platform, streamStarted) + } else { + h.handleFailoverExhaustedSimple(c, 502, streamStarted) } + return } - if lastFailoverErr != nil { - h.handleFailoverExhausted(c, lastFailoverErr, platform, streamStarted) - } else { - h.handleFailoverExhaustedSimple(c, 502, streamStarted) - } - return } account := selection.Account setOpsSelectedAccount(c, account.ID) @@ -549,8 +509,8 @@ func (h *GatewayHandler) Messages(c *gin.Context) { // 转发请求 - 根据账号平台分流 var result *service.ForwardResult requestCtx := c.Request.Context() - if switchCount > 0 { - requestCtx = context.WithValue(requestCtx, ctxkey.AccountSwitchCount, switchCount) + if fs.SwitchCount > 0 { + requestCtx = context.WithValue(requestCtx, ctxkey.AccountSwitchCount, fs.SwitchCount) } if account.Platform == service.PlatformAntigravity && account.Type != service.AccountTypeAPIKey { result, err = h.antigravityGatewayService.Forward(requestCtx, c, account, body, hasBoundSession) @@ -598,40 +558,16 @@ func (h *GatewayHandler) Messages(c *gin.Context) { } var failoverErr *service.UpstreamFailoverError if errors.As(err, &failoverErr) { - lastFailoverErr = failoverErr - if needForceCacheBilling(hasBoundSession, failoverErr) { - forceCacheBilling = true - } - - // 同账号重试:对 RetryableOnSameAccount 的临时性错误,先在同一账号上重试 - if failoverErr.RetryableOnSameAccount && sameAccountRetryCount[account.ID] < maxSameAccountRetries { - sameAccountRetryCount[account.ID]++ - log.Printf("Account %d: retryable error %d, same-account retry %d/%d", - account.ID, failoverErr.StatusCode, sameAccountRetryCount[account.ID], maxSameAccountRetries) - if !sleepSameAccountRetryDelay(c.Request.Context()) { - return - } + action := fs.HandleFailoverError(c.Request.Context(), h.gatewayService, account.ID, account.Platform, failoverErr) + switch action { + case FailoverContinue: continue - } - - // 同账号重试用尽,执行临时封禁并切换账号 - if failoverErr.RetryableOnSameAccount { - h.gatewayService.TempUnscheduleRetryableError(c.Request.Context(), account.ID, failoverErr) - } - - failedAccountIDs[account.ID] = struct{}{} - if switchCount >= maxAccountSwitches { - h.handleFailoverExhausted(c, failoverErr, account.Platform, streamStarted) + case FailoverExhausted: + h.handleFailoverExhausted(c, fs.LastFailoverErr, account.Platform, streamStarted) + return + case FailoverCanceled: return } - switchCount++ - log.Printf("Account %d: upstream error %d, switching account %d/%d", account.ID, failoverErr.StatusCode, switchCount, maxAccountSwitches) - if account.Platform == service.PlatformAntigravity { - if !sleepFailoverDelay(c.Request.Context(), switchCount) { - return - } - } - continue } // 错误响应已在Forward中处理,这里只记录日志 log.Printf("Account %d: Forward request failed: %v", account.ID, err) @@ -659,7 +595,7 @@ func (h *GatewayHandler) Messages(c *gin.Context) { }); err != nil { log.Printf("Record usage failed: %v", err) } - }(result, account, userAgent, clientIP, forceCacheBilling) + }(result, account, userAgent, clientIP, fs.ForceCacheBilling) return } if !retryWithFallback { @@ -893,65 +829,6 @@ func (h *GatewayHandler) handleConcurrencyError(c *gin.Context, err error, slotT fmt.Sprintf("Concurrency limit exceeded for %s, please retry later", slotType), streamStarted) } -// needForceCacheBilling 判断 failover 时是否需要强制缓存计费 -// 粘性会话切换账号、或上游明确标记时,将 input_tokens 转为 cache_read 计费 -func needForceCacheBilling(hasBoundSession bool, failoverErr *service.UpstreamFailoverError) bool { - return hasBoundSession || (failoverErr != nil && failoverErr.ForceCacheBilling) -} - -const ( - // maxSameAccountRetries 同账号重试次数上限(针对 RetryableOnSameAccount 错误) - maxSameAccountRetries = 2 - // sameAccountRetryDelay 同账号重试间隔 - sameAccountRetryDelay = 500 * time.Millisecond -) - -// sleepSameAccountRetryDelay 同账号重试固定延时,返回 false 表示 context 已取消。 -func sleepSameAccountRetryDelay(ctx context.Context) bool { - select { - case <-ctx.Done(): - return false - case <-time.After(sameAccountRetryDelay): - return true - } -} - -// sleepFailoverDelay 账号切换线性递增延时:第1次0s、第2次1s、第3次2s… -// 返回 false 表示 context 已取消。 -func sleepFailoverDelay(ctx context.Context, switchCount int) bool { - delay := time.Duration(switchCount-1) * time.Second - if delay <= 0 { - return true - } - select { - case <-ctx.Done(): - return false - case <-time.After(delay): - return true - } -} - -// sleepAntigravitySingleAccountBackoff Antigravity 平台单账号分组的 503 退避重试延时。 -// 当分组内只有一个可用账号且上游返回 503(MODEL_CAPACITY_EXHAUSTED)时使用, -// 采用短固定延时策略。Service 层在 SingleAccountRetry 模式下已经做了充分的原地重试 -// (最多 3 次、总等待 30s),所以 Handler 层的退避只需短暂等待即可。 -// 返回 false 表示 context 已取消。 -func sleepAntigravitySingleAccountBackoff(ctx context.Context, retryCount int) bool { - // 固定短延时:2s - // Service 层已经在原地等待了足够长的时间(retryDelay × 重试次数), - // Handler 层只需短暂间隔后重新进入 Service 层即可。 - const delay = 2 * time.Second - - log.Printf("Antigravity single-account 503 backoff: waiting %v before retry (attempt %d)", delay, retryCount) - - select { - case <-ctx.Done(): - return false - case <-time.After(delay): - return true - } -} - func (h *GatewayHandler) handleFailoverExhausted(c *gin.Context, failoverErr *service.UpstreamFailoverError, platform string, streamStarted bool) { statusCode := failoverErr.StatusCode responseBody := failoverErr.ResponseBody diff --git a/backend/internal/handler/gateway_handler_single_account_retry_test.go b/backend/internal/handler/gateway_handler_single_account_retry_test.go deleted file mode 100644 index 96aa14c6..00000000 --- a/backend/internal/handler/gateway_handler_single_account_retry_test.go +++ /dev/null @@ -1,51 +0,0 @@ -package handler - -import ( - "context" - "testing" - "time" - - "github.com/stretchr/testify/require" -) - -// --------------------------------------------------------------------------- -// sleepAntigravitySingleAccountBackoff 测试 -// --------------------------------------------------------------------------- - -func TestSleepAntigravitySingleAccountBackoff_ReturnsTrue(t *testing.T) { - ctx := context.Background() - start := time.Now() - ok := sleepAntigravitySingleAccountBackoff(ctx, 1) - elapsed := time.Since(start) - - require.True(t, ok, "should return true when context is not canceled") - // 固定延迟 2s - require.GreaterOrEqual(t, elapsed, 1500*time.Millisecond, "should wait approximately 2s") - require.Less(t, elapsed, 5*time.Second, "should not wait too long") -} - -func TestSleepAntigravitySingleAccountBackoff_ContextCanceled(t *testing.T) { - ctx, cancel := context.WithCancel(context.Background()) - cancel() // 立即取消 - - start := time.Now() - ok := sleepAntigravitySingleAccountBackoff(ctx, 1) - elapsed := time.Since(start) - - require.False(t, ok, "should return false when context is canceled") - require.Less(t, elapsed, 500*time.Millisecond, "should return immediately on cancel") -} - -func TestSleepAntigravitySingleAccountBackoff_FixedDelay(t *testing.T) { - // 验证不同 retryCount 都使用固定 2s 延迟 - ctx := context.Background() - - start := time.Now() - ok := sleepAntigravitySingleAccountBackoff(ctx, 5) - elapsed := time.Since(start) - - require.True(t, ok) - // 即使 retryCount=5,延迟仍然是固定的 2s - require.GreaterOrEqual(t, elapsed, 1500*time.Millisecond) - require.Less(t, elapsed, 5*time.Second) -} diff --git a/backend/internal/handler/gemini_v1beta_handler.go b/backend/internal/handler/gemini_v1beta_handler.go index 3d25505b..51b77037 100644 --- a/backend/internal/handler/gemini_v1beta_handler.go +++ b/backend/internal/handler/gemini_v1beta_handler.go @@ -321,11 +321,7 @@ func (h *GatewayHandler) GeminiV1BetaModels(c *gin.Context) { hasBoundSession := sessionKey != "" && sessionBoundAccountID > 0 cleanedForUnknownBinding := false - maxAccountSwitches := h.maxAccountSwitchesGemini - switchCount := 0 - failedAccountIDs := make(map[int64]struct{}) - var lastFailoverErr *service.UpstreamFailoverError - var forceCacheBilling bool // 粘性会话切换时的缓存计费标记 + fs := NewFailoverState(h.maxAccountSwitchesGemini, hasBoundSession) // 单账号分组提前设置 SingleAccountRetry 标记,让 Service 层首次 503 就不设模型限流标记。 // 避免单账号分组收到 503 (MODEL_CAPACITY_EXHAUSTED) 时设 29s 限流,导致后续请求连续快速失败。 @@ -335,27 +331,24 @@ func (h *GatewayHandler) GeminiV1BetaModels(c *gin.Context) { } for { - selection, err := h.gatewayService.SelectAccountWithLoadAwareness(c.Request.Context(), apiKey.GroupID, sessionKey, modelName, failedAccountIDs, "") // Gemini 不使用会话限制 + selection, err := h.gatewayService.SelectAccountWithLoadAwareness(c.Request.Context(), apiKey.GroupID, sessionKey, modelName, fs.FailedAccountIDs, "") // Gemini 不使用会话限制 if err != nil { - if len(failedAccountIDs) == 0 { + if len(fs.FailedAccountIDs) == 0 { googleError(c, http.StatusServiceUnavailable, "No available Gemini accounts: "+err.Error()) return } - // Antigravity 单账号退避重试:分组内没有其他可用账号时, - // 对 503 错误不直接返回,而是清除排除列表、等待退避后重试同一个账号。 - // 谷歌上游 503 (MODEL_CAPACITY_EXHAUSTED) 通常是暂时性的,等几秒就能恢复。 - if lastFailoverErr != nil && lastFailoverErr.StatusCode == http.StatusServiceUnavailable && switchCount <= maxAccountSwitches { - if sleepAntigravitySingleAccountBackoff(c.Request.Context(), switchCount) { - log.Printf("Antigravity single-account 503 retry: clearing failed accounts, retry %d/%d", switchCount, maxAccountSwitches) - failedAccountIDs = make(map[int64]struct{}) - // 设置 context 标记,让 Service 层预检查等待限流过期而非直接切换 - ctx := context.WithValue(c.Request.Context(), ctxkey.SingleAccountRetry, true) - c.Request = c.Request.WithContext(ctx) - continue - } + action := fs.HandleSelectionExhausted(c.Request.Context()) + switch action { + case FailoverContinue: + ctx := context.WithValue(c.Request.Context(), ctxkey.SingleAccountRetry, true) + c.Request = c.Request.WithContext(ctx) + continue + case FailoverCanceled: + return + default: // FailoverExhausted + h.handleGeminiFailoverExhausted(c, fs.LastFailoverErr) + return } - h.handleGeminiFailoverExhausted(c, lastFailoverErr) - return } account := selection.Account setOpsSelectedAccount(c, account.ID) @@ -429,8 +422,8 @@ func (h *GatewayHandler) GeminiV1BetaModels(c *gin.Context) { // 5) forward (根据平台分流) var result *service.ForwardResult requestCtx := c.Request.Context() - if switchCount > 0 { - requestCtx = context.WithValue(requestCtx, ctxkey.AccountSwitchCount, switchCount) + if fs.SwitchCount > 0 { + requestCtx = context.WithValue(requestCtx, ctxkey.AccountSwitchCount, fs.SwitchCount) } if account.Platform == service.PlatformAntigravity && account.Type != service.AccountTypeAPIKey { result, err = h.antigravityGatewayService.ForwardGemini(requestCtx, c, account, modelName, action, stream, body, hasBoundSession) @@ -443,24 +436,16 @@ func (h *GatewayHandler) GeminiV1BetaModels(c *gin.Context) { if err != nil { var failoverErr *service.UpstreamFailoverError if errors.As(err, &failoverErr) { - failedAccountIDs[account.ID] = struct{}{} - if needForceCacheBilling(hasBoundSession, failoverErr) { - forceCacheBilling = true - } - if switchCount >= maxAccountSwitches { - lastFailoverErr = failoverErr - h.handleGeminiFailoverExhausted(c, lastFailoverErr) + action := fs.HandleFailoverError(c.Request.Context(), h.gatewayService, account.ID, account.Platform, failoverErr) + switch action { + case FailoverContinue: + continue + case FailoverExhausted: + h.handleGeminiFailoverExhausted(c, fs.LastFailoverErr) + return + case FailoverCanceled: return } - lastFailoverErr = failoverErr - switchCount++ - log.Printf("Gemini account %d: upstream error %d, switching account %d/%d", account.ID, failoverErr.StatusCode, switchCount, maxAccountSwitches) - if account.Platform == service.PlatformAntigravity { - if !sleepFailoverDelay(c.Request.Context(), switchCount) { - return - } - } - continue } // ForwardNative already wrote the response log.Printf("Gemini native forward failed: %v", err) @@ -506,7 +491,7 @@ func (h *GatewayHandler) GeminiV1BetaModels(c *gin.Context) { }); err != nil { log.Printf("Record usage failed: %v", err) } - }(result, account, userAgent, clientIP, forceCacheBilling) + }(result, account, userAgent, clientIP, fs.ForceCacheBilling) return } } diff --git a/backend/internal/service/antigravity_gateway_service.go b/backend/internal/service/antigravity_gateway_service.go index 0d054c49..5ca7b3f3 100644 --- a/backend/internal/service/antigravity_gateway_service.go +++ b/backend/internal/service/antigravity_gateway_service.go @@ -1372,6 +1372,10 @@ func (s *AntigravityGatewayService) Forward(ctx context.Context, c *gin.Context, ForceCacheBilling: switchErr.IsStickySession, } } + // 区分客户端取消和真正的上游失败,返回更准确的错误消息 + if c.Request.Context().Err() != nil { + return nil, s.writeClaudeError(c, http.StatusBadGateway, "client_disconnected", "Client disconnected before upstream response") + } return nil, s.writeClaudeError(c, http.StatusBadGateway, "upstream_error", "Upstream request failed after retries") } resp := result.resp @@ -2044,6 +2048,10 @@ func (s *AntigravityGatewayService) ForwardGemini(ctx context.Context, c *gin.Co ForceCacheBilling: switchErr.IsStickySession, } } + // 区分客户端取消和真正的上游失败,返回更准确的错误消息 + if c.Request.Context().Err() != nil { + return nil, s.writeGoogleError(c, http.StatusBadGateway, "Client disconnected before upstream response") + } return nil, s.writeGoogleError(c, http.StatusBadGateway, "Upstream request failed after retries") } resp := result.resp diff --git a/backend/internal/service/error_passthrough_runtime_test.go b/backend/internal/service/error_passthrough_runtime_test.go index 0a45e57a..4a4309f9 100644 --- a/backend/internal/service/error_passthrough_runtime_test.go +++ b/backend/internal/service/error_passthrough_runtime_test.go @@ -220,7 +220,7 @@ func TestApplyErrorPassthroughRule_SkipMonitoringSetsContextKey(t *testing.T) { v, exists := c.Get(OpsSkipPassthroughKey) assert.True(t, exists, "OpsSkipPassthroughKey should be set when skip_monitoring=true") boolVal, ok := v.(bool) - assert.True(t, ok, "value should be bool") + assert.True(t, ok, "value should be a bool") assert.True(t, boolVal) } diff --git a/backend/internal/service/ops_concurrency.go b/backend/internal/service/ops_concurrency.go index f6541d08..faac2d5b 100644 --- a/backend/internal/service/ops_concurrency.go +++ b/backend/internal/service/ops_concurrency.go @@ -344,8 +344,16 @@ func (s *OpsService) getUsersLoadMapBestEffort(ctx context.Context, users []User return out } -// GetUserConcurrencyStats returns real-time concurrency usage for all active users. -func (s *OpsService) GetUserConcurrencyStats(ctx context.Context) (map[int64]*UserConcurrencyInfo, *time.Time, error) { +// GetUserConcurrencyStats returns real-time concurrency usage for active users. +// +// Optional filters: +// - platformFilter: only include users who have access to groups belonging to that platform +// - groupIDFilter: only include users who have access to that specific group +func (s *OpsService) GetUserConcurrencyStats( + ctx context.Context, + platformFilter string, + groupIDFilter *int64, +) (map[int64]*UserConcurrencyInfo, *time.Time, error) { if err := s.RequireMonitoringEnabled(ctx); err != nil { return nil, nil, err } @@ -355,6 +363,15 @@ func (s *OpsService) GetUserConcurrencyStats(ctx context.Context) (map[int64]*Us return nil, nil, err } + // Build a set of allowed group IDs when filtering is requested. + var allowedGroupIDs map[int64]struct{} + if platformFilter != "" || (groupIDFilter != nil && *groupIDFilter > 0) { + allowedGroupIDs, err = s.buildAllowedGroupIDsForFilter(ctx, platformFilter, groupIDFilter) + if err != nil { + return nil, nil, err + } + } + collectedAt := time.Now() loadMap := s.getUsersLoadMapBestEffort(ctx, users) @@ -365,6 +382,12 @@ func (s *OpsService) GetUserConcurrencyStats(ctx context.Context) (map[int64]*Us continue } + // Apply group/platform filter: skip users whose AllowedGroups + // have no intersection with the matching group IDs. + if allowedGroupIDs != nil && !userMatchesGroupFilter(u.AllowedGroups, allowedGroupIDs) { + continue + } + load := loadMap[u.ID] currentInUse := int64(0) waiting := int64(0) @@ -394,3 +417,46 @@ func (s *OpsService) GetUserConcurrencyStats(ctx context.Context) (map[int64]*Us return result, &collectedAt, nil } + +// buildAllowedGroupIDsForFilter returns the set of group IDs that match the given +// platform and/or group ID filter. It reuses listAllAccountsForOps (which already +// supports platform filtering at the DB level) to collect group IDs from accounts. +func (s *OpsService) buildAllowedGroupIDsForFilter(ctx context.Context, platformFilter string, groupIDFilter *int64) (map[int64]struct{}, error) { + // Fast path: only group ID filter, no platform filter needed. + if platformFilter == "" && groupIDFilter != nil && *groupIDFilter > 0 { + return map[int64]struct{}{*groupIDFilter: {}}, nil + } + + // Use the same account-based approach as GetConcurrencyStats to collect group IDs. + accounts, err := s.listAllAccountsForOps(ctx, platformFilter) + if err != nil { + return nil, err + } + + groupIDs := make(map[int64]struct{}) + for _, acc := range accounts { + for _, grp := range acc.Groups { + if grp == nil || grp.ID <= 0 { + continue + } + // If groupIDFilter is set, only include that specific group. + if groupIDFilter != nil && *groupIDFilter > 0 && grp.ID != *groupIDFilter { + continue + } + groupIDs[grp.ID] = struct{}{} + } + } + + return groupIDs, nil +} + +// userMatchesGroupFilter returns true if the user's AllowedGroups contains +// at least one group ID in the allowed set. +func userMatchesGroupFilter(userGroups []int64, allowedGroupIDs map[int64]struct{}) bool { + for _, gid := range userGroups { + if _, ok := allowedGroupIDs[gid]; ok { + return true + } + } + return false +} diff --git a/deploy/docker-compose.yml b/deploy/docker-compose.yml index 033731ac..f1d19f84 100644 --- a/deploy/docker-compose.yml +++ b/deploy/docker-compose.yml @@ -47,13 +47,15 @@ services: # ======================================================================= # Database Configuration (PostgreSQL) + # Default: uses local postgres container + # External DB: set DATABASE_HOST and DATABASE_SSLMODE in .env # ======================================================================= - - DATABASE_HOST=postgres - - DATABASE_PORT=5432 + - DATABASE_HOST=${DATABASE_HOST:-postgres} + - DATABASE_PORT=${DATABASE_PORT:-5432} - DATABASE_USER=${POSTGRES_USER:-sub2api} - DATABASE_PASSWORD=${POSTGRES_PASSWORD:?POSTGRES_PASSWORD is required} - DATABASE_DBNAME=${POSTGRES_DB:-sub2api} - - DATABASE_SSLMODE=disable + - DATABASE_SSLMODE=${DATABASE_SSLMODE:-disable} # ======================================================================= # Redis Configuration @@ -128,8 +130,6 @@ services: # Examples: http://host:port, socks5://host:port - UPDATE_PROXY_URL=${UPDATE_PROXY_URL:-} depends_on: - postgres: - condition: service_healthy redis: condition: service_healthy networks: @@ -141,35 +141,6 @@ services: retries: 3 start_period: 30s - # =========================================================================== - # PostgreSQL Database - # =========================================================================== - postgres: - image: postgres:18-alpine - container_name: sub2api-postgres - restart: unless-stopped - ulimits: - nofile: - soft: 100000 - hard: 100000 - volumes: - - postgres_data:/var/lib/postgresql/data - environment: - - POSTGRES_USER=${POSTGRES_USER:-sub2api} - - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:?POSTGRES_PASSWORD is required} - - POSTGRES_DB=${POSTGRES_DB:-sub2api} - - TZ=${TZ:-Asia/Shanghai} - networks: - - sub2api-network - healthcheck: - test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-sub2api} -d ${POSTGRES_DB:-sub2api}"] - interval: 10s - timeout: 5s - retries: 5 - start_period: 10s - # 注意:不暴露端口到宿主机,应用通过内部网络连接 - # 如需调试,可临时添加:ports: ["127.0.0.1:5433:5432"] - # =========================================================================== # Redis Cache # =========================================================================== @@ -209,8 +180,6 @@ services: volumes: sub2api_data: driver: local - postgres_data: - driver: local redis_data: driver: local diff --git a/frontend/public/wechat-qr.jpg b/frontend/public/wechat-qr.jpg new file mode 100644 index 00000000..659068d8 Binary files /dev/null and b/frontend/public/wechat-qr.jpg differ diff --git a/frontend/src/api/admin/ops.ts b/frontend/src/api/admin/ops.ts index 9f980a12..523fbd00 100644 --- a/frontend/src/api/admin/ops.ts +++ b/frontend/src/api/admin/ops.ts @@ -366,8 +366,16 @@ export async function getConcurrencyStats(platform?: string, groupId?: number | return data } -export async function getUserConcurrencyStats(): Promise { - const { data } = await apiClient.get('/admin/ops/user-concurrency') +export async function getUserConcurrencyStats(platform?: string, groupId?: number | null): Promise { + const params: Record = {} + if (platform) { + params.platform = platform + } + if (typeof groupId === 'number' && groupId > 0) { + params.group_id = groupId + } + + const { data } = await apiClient.get('/admin/ops/user-concurrency', { params }) return data } diff --git a/frontend/src/components/common/WechatServiceButton.vue b/frontend/src/components/common/WechatServiceButton.vue new file mode 100644 index 00000000..9ee8d3d5 --- /dev/null +++ b/frontend/src/components/common/WechatServiceButton.vue @@ -0,0 +1,104 @@ + + + + + diff --git a/frontend/src/components/layout/AppHeader.vue b/frontend/src/components/layout/AppHeader.vue index a6b4030f..53a0c01e 100644 --- a/frontend/src/components/layout/AppHeader.vue +++ b/frontend/src/components/layout/AppHeader.vue @@ -121,23 +121,6 @@ {{ t('nav.apiKeys') }} - - - - - - {{ t('nav.github') }} - diff --git a/frontend/src/views/HomeView.vue b/frontend/src/views/HomeView.vue index 6a3753f1..babcf046 100644 --- a/frontend/src/views/HomeView.vue +++ b/frontend/src/views/HomeView.vue @@ -122,8 +122,11 @@ > {{ siteName }} -

- {{ siteSubtitle }} +

+ {{ t('home.heroSubtitle') }} +

+

+ {{ t('home.heroDescription') }}

@@ -177,7 +180,7 @@ -
+
@@ -204,6 +207,63 @@
+ +
+

+ {{ t('home.painPoints.title') }} +

+
+ +
+
+ + + +
+

{{ t('home.painPoints.items.expensive.title') }}

+

{{ t('home.painPoints.items.expensive.desc') }}

+
+ +
+
+ + + +
+

{{ t('home.painPoints.items.complex.title') }}

+

{{ t('home.painPoints.items.complex.desc') }}

+
+ +
+
+ + + +
+

{{ t('home.painPoints.items.unstable.title') }}

+

{{ t('home.painPoints.items.unstable.desc') }}

+
+ +
+
+ + + +
+

{{ t('home.painPoints.items.noControl.title') }}

+

{{ t('home.painPoints.items.noControl.desc') }}

+
+
+
+ + +
+

+ {{ t('home.solutions.title') }} +

+

{{ t('home.solutions.subtitle') }}

+
+
@@ -369,6 +429,77 @@ >
+ + +
+

+ {{ t('home.comparison.title') }} +

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
{{ t('home.comparison.headers.feature') }}{{ t('home.comparison.headers.official') }}{{ t('home.comparison.headers.us') }}
{{ t('home.comparison.items.pricing.feature') }}{{ t('home.comparison.items.pricing.official') }}{{ t('home.comparison.items.pricing.us') }}
{{ t('home.comparison.items.models.feature') }}{{ t('home.comparison.items.models.official') }}{{ t('home.comparison.items.models.us') }}
{{ t('home.comparison.items.management.feature') }}{{ t('home.comparison.items.management.official') }}{{ t('home.comparison.items.management.us') }}
{{ t('home.comparison.items.stability.feature') }}{{ t('home.comparison.items.stability.official') }}{{ t('home.comparison.items.stability.us') }}
{{ t('home.comparison.items.control.feature') }}{{ t('home.comparison.items.control.official') }}{{ t('home.comparison.items.control.us') }}
+
+
+ + +
+

+ {{ t('home.cta.title') }} +

+

+ {{ t('home.cta.description') }} +

+ + {{ t('home.cta.button') }} + + + + {{ t('home.goToDashboard') }} + + +
@@ -380,27 +511,20 @@

© {{ currentYear }} {{ siteName }}. {{ t('home.footer.allRightsReserved') }}

- + + {{ t('home.docs') }} + + + + @@ -410,6 +534,7 @@ import { useI18n } from 'vue-i18n' import { useAuthStore, useAppStore } from '@/stores' import LocaleSwitcher from '@/components/common/LocaleSwitcher.vue' import Icon from '@/components/icons/Icon.vue' +import WechatServiceButton from '@/components/common/WechatServiceButton.vue' const { t } = useI18n() @@ -419,7 +544,6 @@ const appStore = useAppStore() // Site settings - directly from appStore (already initialized from injected config) const siteName = computed(() => appStore.cachedPublicSettings?.site_name || appStore.siteName || 'Sub2API') const siteLogo = computed(() => appStore.cachedPublicSettings?.site_logo || appStore.siteLogo || '') -const siteSubtitle = computed(() => appStore.cachedPublicSettings?.site_subtitle || 'AI API Gateway Platform') const docUrl = computed(() => appStore.cachedPublicSettings?.doc_url || appStore.docUrl || '') const homeContent = computed(() => appStore.cachedPublicSettings?.home_content || '') @@ -432,9 +556,6 @@ const isHomeContentUrl = computed(() => { // Theme const isDark = ref(document.documentElement.classList.contains('dark')) -// GitHub URL -const githubUrl = 'https://github.com/Wei-Shaw/sub2api' - // Auth state const isAuthenticated = computed(() => authStore.isAuthenticated) const isAdmin = computed(() => authStore.isAdmin) diff --git a/frontend/src/views/admin/ops/components/OpsConcurrencyCard.vue b/frontend/src/views/admin/ops/components/OpsConcurrencyCard.vue index c7370ab5..0956caa5 100644 --- a/frontend/src/views/admin/ops/components/OpsConcurrencyCard.vue +++ b/frontend/src/views/admin/ops/components/OpsConcurrencyCard.vue @@ -122,6 +122,7 @@ const platformRows = computed((): SummaryRow[] => { available_accounts: availableAccounts, rate_limited_accounts: safeNumber(avail.rate_limit_count), + error_accounts: safeNumber(avail.error_count), total_concurrency: totalConcurrency, used_concurrency: usedConcurrency, @@ -161,7 +162,6 @@ const groupRows = computed((): SummaryRow[] => { total_accounts: totalAccounts, available_accounts: availableAccounts, rate_limited_accounts: safeNumber(avail.rate_limit_count), - error_accounts: safeNumber(avail.error_count), total_concurrency: totalConcurrency, used_concurrency: usedConcurrency, @@ -265,7 +265,7 @@ async function loadData() { try { if (showByUser.value) { // 用户视图模式只加载用户并发数据 - const userData = await opsAPI.getUserConcurrencyStats() + const userData = await opsAPI.getUserConcurrencyStats(props.platformFilter, props.groupIdFilter) userConcurrency.value = userData } else { // 常规模式加载账号/平台/分组数据 @@ -301,6 +301,14 @@ watch( } ) +// 过滤条件变化时重新加载数据 +watch( + [() => props.platformFilter, () => props.groupIdFilter], + () => { + loadData() + } +) + function getLoadBarClass(loadPct: number): string { if (loadPct >= 90) return 'bg-red-500 dark:bg-red-600' if (loadPct >= 70) return 'bg-orange-500 dark:bg-orange-600' @@ -329,6 +337,7 @@ function formatDuration(seconds: number): string { } + watch( () => realtimeEnabled.value, async (enabled) => { diff --git a/stress_test_gemini_session.sh b/stress_test_gemini_session.sh new file mode 100644 index 00000000..1f2aca57 --- /dev/null +++ b/stress_test_gemini_session.sh @@ -0,0 +1,127 @@ +#!/bin/bash + +# Gemini 粘性会话压力测试脚本 +# 测试目标:验证不同会话分配不同账号,同一会话保持同一账号 + +BASE_URL="http://host.clicodeplus.com:8080" +API_KEY="sk-32ad0a3197e528c840ea84f0dc6b2056dd3fead03526b5c605a60709bd408f7e" +MODEL="gemini-2.5-flash" + +# 创建临时目录存放结果 +RESULT_DIR="/tmp/gemini_stress_test_$(date +%s)" +mkdir -p "$RESULT_DIR" + +echo "==========================================" +echo "Gemini 粘性会话压力测试" +echo "结果目录: $RESULT_DIR" +echo "==========================================" + +# 函数:发送请求并记录 +send_request() { + local session_id=$1 + local round=$2 + local system_prompt=$3 + local contents=$4 + local output_file="$RESULT_DIR/session_${session_id}_round_${round}.json" + + local request_body=$(cat < "$output_file" 2>&1 + + echo "[Session $session_id Round $round] 完成" +} + +# 会话1:数学计算器(累加序列) +run_session_1() { + local sys_prompt="你是一个数学计算器,只返回计算结果数字,不要任何解释" + + # Round 1: 1+1=? + send_request 1 1 "$sys_prompt" '[{"role":"user","parts":[{"text":"1+1=?"}]}]' + + # Round 2: 继续 2+2=?(累加历史) + send_request 1 2 "$sys_prompt" '[{"role":"user","parts":[{"text":"1+1=?"}]},{"role":"model","parts":[{"text":"2"}]},{"role":"user","parts":[{"text":"2+2=?"}]}]' + + # Round 3: 继续 3+3=? + send_request 1 3 "$sys_prompt" '[{"role":"user","parts":[{"text":"1+1=?"}]},{"role":"model","parts":[{"text":"2"}]},{"role":"user","parts":[{"text":"2+2=?"}]},{"role":"model","parts":[{"text":"4"}]},{"role":"user","parts":[{"text":"3+3=?"}]}]' + + # Round 4: 批量计算 10+10, 20+20, 30+30 + send_request 1 4 "$sys_prompt" '[{"role":"user","parts":[{"text":"1+1=?"}]},{"role":"model","parts":[{"text":"2"}]},{"role":"user","parts":[{"text":"2+2=?"}]},{"role":"model","parts":[{"text":"4"}]},{"role":"user","parts":[{"text":"3+3=?"}]},{"role":"model","parts":[{"text":"6"}]},{"role":"user","parts":[{"text":"计算: 10+10=? 20+20=? 30+30=?"}]}]' +} + +# 会话2:英文翻译器(不同系统提示词 = 不同会话) +run_session_2() { + local sys_prompt="你是一个英文翻译器,将中文翻译成英文,只返回翻译结果" + + send_request 2 1 "$sys_prompt" '[{"role":"user","parts":[{"text":"你好"}]}]' + send_request 2 2 "$sys_prompt" '[{"role":"user","parts":[{"text":"你好"}]},{"role":"model","parts":[{"text":"Hello"}]},{"role":"user","parts":[{"text":"世界"}]}]' + send_request 2 3 "$sys_prompt" '[{"role":"user","parts":[{"text":"你好"}]},{"role":"model","parts":[{"text":"Hello"}]},{"role":"user","parts":[{"text":"世界"}]},{"role":"model","parts":[{"text":"World"}]},{"role":"user","parts":[{"text":"早上好"}]}]' +} + +# 会话3:日文翻译器 +run_session_3() { + local sys_prompt="你是一个日文翻译器,将中文翻译成日文,只返回翻译结果" + + send_request 3 1 "$sys_prompt" '[{"role":"user","parts":[{"text":"你好"}]}]' + send_request 3 2 "$sys_prompt" '[{"role":"user","parts":[{"text":"你好"}]},{"role":"model","parts":[{"text":"こんにちは"}]},{"role":"user","parts":[{"text":"谢谢"}]}]' + send_request 3 3 "$sys_prompt" '[{"role":"user","parts":[{"text":"你好"}]},{"role":"model","parts":[{"text":"こんにちは"}]},{"role":"user","parts":[{"text":"谢谢"}]},{"role":"model","parts":[{"text":"ありがとう"}]},{"role":"user","parts":[{"text":"再见"}]}]' +} + +# 会话4:乘法计算器(另一个数学会话,但系统提示词不同) +run_session_4() { + local sys_prompt="你是一个乘法专用计算器,只计算乘法,返回数字结果" + + send_request 4 1 "$sys_prompt" '[{"role":"user","parts":[{"text":"2*3=?"}]}]' + send_request 4 2 "$sys_prompt" '[{"role":"user","parts":[{"text":"2*3=?"}]},{"role":"model","parts":[{"text":"6"}]},{"role":"user","parts":[{"text":"4*5=?"}]}]' + send_request 4 3 "$sys_prompt" '[{"role":"user","parts":[{"text":"2*3=?"}]},{"role":"model","parts":[{"text":"6"}]},{"role":"user","parts":[{"text":"4*5=?"}]},{"role":"model","parts":[{"text":"20"}]},{"role":"user","parts":[{"text":"计算: 10*10=? 20*20=?"}]}]' +} + +# 会话5:诗人(完全不同的角色) +run_session_5() { + local sys_prompt="你是一位诗人,用简短的诗句回应每个话题,每次只写一句诗" + + send_request 5 1 "$sys_prompt" '[{"role":"user","parts":[{"text":"春天"}]}]' + send_request 5 2 "$sys_prompt" '[{"role":"user","parts":[{"text":"春天"}]},{"role":"model","parts":[{"text":"春风拂面花满枝"}]},{"role":"user","parts":[{"text":"夏天"}]}]' + send_request 5 3 "$sys_prompt" '[{"role":"user","parts":[{"text":"春天"}]},{"role":"model","parts":[{"text":"春风拂面花满枝"}]},{"role":"user","parts":[{"text":"夏天"}]},{"role":"model","parts":[{"text":"蝉鸣蛙声伴荷香"}]},{"role":"user","parts":[{"text":"秋天"}]}]' +} + +echo "" +echo "开始并发测试 5 个独立会话..." +echo "" + +# 并发运行所有会话 +run_session_1 & +run_session_2 & +run_session_3 & +run_session_4 & +run_session_5 & + +# 等待所有后台任务完成 +wait + +echo "" +echo "==========================================" +echo "所有请求完成,结果保存在: $RESULT_DIR" +echo "==========================================" + +# 显示结果摘要 +echo "" +echo "响应摘要:" +for f in "$RESULT_DIR"/*.json; do + filename=$(basename "$f") + response=$(cat "$f" | head -c 200) + echo "[$filename]: ${response}..." +done + +echo "" +echo "请检查服务器日志确认账号分配情况"