diff --git a/.github/workflows/backend-ci.yml b/.github/workflows/backend-ci.yml index d21d0684..2c1e9413 100644 --- a/.github/workflows/backend-ci.yml +++ b/.github/workflows/backend-ci.yml @@ -17,6 +17,7 @@ jobs: go-version-file: backend/go.mod check-latest: false cache: true + cache-dependency-path: backend/go.sum - name: Verify Go version run: | go version | grep -q 'go1.25.7' @@ -36,6 +37,7 @@ jobs: go-version-file: backend/go.mod check-latest: false cache: true + cache-dependency-path: backend/go.sum - name: Verify Go version run: | go version | grep -q 'go1.25.7' diff --git a/.gitignore b/.gitignore index 297c1d6f..da112576 100644 --- a/.gitignore +++ b/.gitignore @@ -78,6 +78,7 @@ Desktop.ini # =================== tmp/ temp/ +logs/ *.tmp *.temp *.log @@ -128,8 +129,15 @@ deploy/docker-compose.override.yml vite.config.js docs/* .serena/ + +# =================== +# Load-testing tools +# =================== +tools/loadtest/ +# Antigravity Manager +Antigravity-Manager/ +antigravity_projectid_fix.patch .codex/ frontend/coverage/ aicodex output/ - diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 00000000..8edfa58b --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,1392 @@ +# Sub2API Development Notes + +## Version Management Strategy + +### Version Numbering + +We append our own patch number after the official version number: + +- Official version: `v0.1.68` +- Our versions: `v0.1.68.1`, `v0.1.68.2` (incrementing) + +### Branch Strategy + +| Branch | Description | +|------|------| +| `main` | Our main branch, containing all custom features | +| `release/custom-X.Y.Z` | Release branch based on the official `vX.Y.Z` | +| `upstream/main` | The upstream official repository | + +--- + +## Release Process (for a New Official Version) + +When a new official version (e.g. `v0.1.69`) is released: + +### 1. Sync upstream and create a release branch + +```bash +# Fetch the latest upstream code +git fetch upstream --tags + +# Create a new release branch from the official tag +git checkout v0.1.69 -b release/custom-0.1.69 + +# Merge our main branch (contains all custom features) +git merge main --no-edit + +# Resolve any conflicts, then continue +``` + +### 2. 
Bump the version number and tag + +```bash +# Update the version file +echo "0.1.69.1" > backend/cmd/server/VERSION +git add backend/cmd/server/VERSION +git commit -m "chore: bump version to 0.1.69.1" + +# Create our own tag +git tag v0.1.69.1 + +# Push the branch and the tag +git push origin release/custom-0.1.69 +git push origin v0.1.69.1 +``` + +### 3. Update the main branch + +```bash +# Merge the release branch back into main so main keeps the latest custom features +git checkout main +git merge release/custom-0.1.69 +git push origin main +``` + +--- + +## Hotfix Release (fixing on an existing version) + +When a fix must be released on the current version: + +```bash +# Fix on the current release branch +git checkout release/custom-0.1.68 +# ... make the fix ... +git commit -m "fix: description of the fix" + +# Increment the patch number +echo "0.1.68.2" > backend/cmd/server/VERSION +git add backend/cmd/server/VERSION +git commit -m "chore: bump version to 0.1.68.2" + +# Tag and push +git tag v0.1.68.2 +git push origin release/custom-0.1.68 +git push origin v0.1.68.2 + +# Sync the fix to main +git checkout main +git cherry-pick <commit-hash> +git push origin main +``` + +--- + +## Server Deployment Process + +### Prerequisites + +- SSH alias `clicodeplus` is configured locally for the production server (runs the services) +- SSH alias `us-asaki-root` is configured locally for the build server (pulls code, builds images) +- Production deployment directories: `/root/sub2api` (production), `/root/sub2api-beta` (beta), `/root/sub2api-star` (Star) +- The production server is deployed with Docker Compose +- **All images are built on the build server**, so compilation never consumes CPU/memory on the production server and disrupts live services + +### Server Roles + +| Server | SSH alias | Responsibilities | +|--------|----------|------| +| Build server | `us-asaki-root` | Pull code, build images with `docker build` | +| Production server | `clicodeplus` | Load images, run services, verify deployments | +| Database server | `db-clicodeplus` | PostgreSQL 16 + Redis 7, shared by all environments | + +> Database server operations manual: `db-clicodeplus:/root/README.md` + +### Deployment Environments + +| Environment | Directory (production server) | Port | Database | Redis DB | Container name | +|------|------|------|--------|----------|--------| +| Production | `/root/sub2api` | 8080 | `sub2api` | 0 | `sub2api` | +| Beta | `/root/sub2api-beta` | 8084 | `beta` | 2 | `sub2api-beta` | +| OpenAI | `/root/sub2api-openai` | 8083 | `openai` | 3 | `sub2api-openai` | +| Star | `/root/sub2api-star` | 8086 | `star` | 4 | `sub2api-star` | + +### External Database and Redis + +All environments (production, Beta, OpenAI, Star) share the **PostgreSQL 16** and **Redis 7** instances on `db.clicodeplus.com`; no in-container database or Redis is used. + 
+**PostgreSQL** (port 5432, TLS encrypted, scram-sha-256 authentication): + +| Environment | User | Database | +|------|--------|--------| +| Production | `sub2api` | `sub2api` | +| Beta | `beta` | `beta` | +| OpenAI | `openai` | `openai` | +| Star | `star` | `star` | + +**Redis** (port 6379, password authentication): + +| Environment | DB | +|------|-----| +| Production | 0 | +| Beta | 2 | +| OpenAI | 3 | +| Star | 4 | + +**Configuration**: +- The database is configured via `DATABASE_HOST`, `DATABASE_SSLMODE`, `POSTGRES_USER`, `POSTGRES_PASSWORD` and `POSTGRES_DB` in `.env` +- Redis is configured by overriding `REDIS_HOST` in `docker-compose.override.yml` (the main compose file hardcodes it to `redis`); the password comes from `REDIS_PASSWORD` in `.env` +- Each environment's `docker-compose.override.yml` drops the dependency on the containerized Redis via `depends_on: !reset {}` and `redis: profiles: [disabled]` + +#### Database Commands + +Run database operations on the server over SSH: + +```bash +# Production - query migration records +ssh clicodeplus "source /root/sub2api/deploy/.env && PGPASSWORD=\"\$POSTGRES_PASSWORD\" psql -h \$DATABASE_HOST -U \$POSTGRES_USER -d \$POSTGRES_DB -c 'SELECT * FROM schema_migrations ORDER BY applied_at DESC LIMIT 5;'" + +# Beta - query migration records +ssh clicodeplus "source /root/sub2api-beta/deploy/.env && PGPASSWORD=\"\$POSTGRES_PASSWORD\" psql -h \$DATABASE_HOST -U \$POSTGRES_USER -d \$POSTGRES_DB -c 'SELECT * FROM schema_migrations ORDER BY applied_at DESC LIMIT 5;'" + +# Beta - delete a specific migration record (to re-run that migration) +ssh clicodeplus "source /root/sub2api-beta/deploy/.env && PGPASSWORD=\"\$POSTGRES_PASSWORD\" psql -h \$DATABASE_HOST -U \$POSTGRES_USER -d \$POSTGRES_DB -c \"DELETE FROM schema_migrations WHERE filename LIKE '%049%';\"" + +# Beta - update account data +ssh clicodeplus "source /root/sub2api-beta/deploy/.env && PGPASSWORD=\"\$POSTGRES_PASSWORD\" psql -h \$DATABASE_HOST -U \$POSTGRES_USER -d \$POSTGRES_DB -c \"UPDATE accounts SET credentials = credentials - 'model_mapping' WHERE platform = 'antigravity';\"" +``` + +> **Note**: load environment variables with `source .env` so passwords are not exposed on the command line. + +### Deployment Steps + +**Important: bump the version number on every deployment!** + +#### 0. 
Bump the version and push (local step) + +Before every deployment, increment the patch number locally and confirm the push succeeds: + +```bash +# Check the current version +cat backend/cmd/server/VERSION +# Suppose it is 0.1.69.1 + +# Bump the version +echo "0.1.69.2" > backend/cmd/server/VERSION +git add backend/cmd/server/VERSION +git commit -m "chore: bump version to 0.1.69.2" +git push origin release/custom-0.1.69 + +# ⚠️ Confirm the push succeeded (you must see the branch-update output, with no "rejected" errors) +``` + +> **Checkpoint**: if there are other uncommitted changes, commit and push them first so that everything on the release branch is on the remote. + +#### 1. Build server pulls the code + +```bash +# Pull the latest code and switch branches +ssh us-asaki-root "cd /root/sub2api && git fetch origin && git checkout -B release/custom-0.1.69 origin/release/custom-0.1.69" + +# ⚠️ Verify the version matches step 0 +ssh us-asaki-root "cat /root/sub2api/backend/cmd/server/VERSION" +``` + +> **First time using the build server?** Initialize the repository first; see the "Build server first-time setup" section below. + +#### 2. Build server builds the image + +```bash +ssh us-asaki-root "cd /root/sub2api && docker build --no-cache -t sub2api:latest -f Dockerfile ." + +# ⚠️ The build must finish successfully; investigate any failure before proceeding +``` + +> **Common build problems**: +> - An outdated `buildx` causes API version incompatibility → update buildx: `curl -fsSL "https://github.com/docker/buildx/releases/latest/download/buildx-$(curl -fsSL https://api.github.com/repos/docker/buildx/releases/latest | grep tag_name | cut -d'"' -f4).linux-amd64" -o ~/.docker/cli-plugins/docker-buildx && chmod +x ~/.docker/cli-plugins/docker-buildx` +> - Out of disk space → clean unused images with `docker system prune -f` + +#### 3. Transfer the image to the production server and load it + +```bash +# Export the image → stream it over a pipe → load it on the production server +ssh us-asaki-root "docker save sub2api:latest" | ssh clicodeplus "docker load" + +# ⚠️ You must see the "Loaded image: sub2api:latest" output +``` + +#### 4. Production server syncs code, retags, and restarts + +```bash +# Sync code (for version confirmation and the deploy config) +ssh clicodeplus "cd /root/sub2api && git fetch fork && git checkout -B release/custom-0.1.69 fork/release/custom-0.1.69" + +# Retag the image and restart +ssh clicodeplus "docker tag sub2api:latest weishaw/sub2api:latest" +ssh clicodeplus "cd /root/sub2api/deploy && docker compose up -d --force-recreate sub2api" +``` + +#### 5. 
Verify the deployment + +```bash +# Check startup logs +ssh clicodeplus "docker logs sub2api --tail 20" + +# Confirm the version (must match the version set in step 0) +ssh clicodeplus "cat /root/sub2api/backend/cmd/server/VERSION" + +# Check container status (must show healthy) +ssh clicodeplus "docker ps | grep sub2api" +``` + +--- + +### Build server first-time setup + +The first time `us-asaki-root` is used as the build server, run these one-time steps: + +```bash +ssh us-asaki-root + +# 1) Clone the repository +cd /root +git clone https://github.com/touwaeriol/sub2api.git sub2api +cd sub2api + +# 2) Verify Docker and buildx versions +docker version +docker buildx version +# If buildx is too old (< v0.14), update it: +# LATEST=$(curl -fsSL https://api.github.com/repos/docker/buildx/releases/latest | grep tag_name | cut -d'"' -f4) +# curl -fsSL "https://github.com/docker/buildx/releases/download/${LATEST}/buildx-${LATEST}.linux-amd64" -o ~/.docker/cli-plugins/docker-buildx +# chmod +x ~/.docker/cli-plugins/docker-buildx + +# 3) Verify the build works +docker build --no-cache -t sub2api:test -f Dockerfile . +docker rmi sub2api:test +``` + +--- + +## Beta Parallel Deployment (without touching production) + +Goal: run a beta instance in parallel on the same server (e.g. on port `8084`). **Never modify or restart** the production instance (default directory `/root/sub2api`). + +### Design Principles + +- **Separate directory**: beta uses its own directory, e.g. `/root/sub2api-beta`. +- **Secrets live only in `.env`**: beta's database password, JWT_SECRET, etc. go only into `/root/sub2api-beta/deploy/.env` and are never committed to git. +- **Separate Compose project**: start with `docker compose -p sub2api-beta ...` so networks/volumes stay isolated. +- **Separate port**: map the host port via `SERVER_PORT` in `.env` (e.g. `8084:8080`). + +### Pre-checks + +```bash +# 1) Make sure 8084 is free +ssh clicodeplus "ss -ltnp | grep :8084 || echo '8084 is free'" + +# 2) Confirm the production container is still up (read-only check) +ssh clicodeplus "docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Ports}}' | sed -n '1,200p'" +``` + +### First Deployment + +> **Build server note**: production and beta share the `/root/sub2api` repository on the build server, distinguished by image tags (`sub2api:latest` for production, `sub2api:beta` for beta). + +```bash +# 1) Build the beta image on the build server (shared /root/sub2api repo; switch to the target branch, then tag as beta) +ssh us-asaki-root "cd /root/sub2api && git fetch origin && git checkout -B release/custom-0.1.71 origin/release/custom-0.1.71" +ssh us-asaki-root "cd /root/sub2api && docker build --no-cache -t 
sub2api:beta -f Dockerfile ." + +# ⚠️ To restore the production branch after the build: +# ssh us-asaki-root "cd /root/sub2api && git checkout release/custom-<production version>" + +# 2) Transfer the image to the production server +ssh us-asaki-root "docker save sub2api:beta" | ssh clicodeplus "docker load" +# ⚠️ You must see the "Loaded image: sub2api:beta" output + +# 3) Prepare the beta environment on the production server +ssh clicodeplus + +# Clone the code (only for the deploy config and version confirmation; nothing is built here) +cd /root +git clone https://github.com/touwaeriol/sub2api.git sub2api-beta +cd /root/sub2api-beta +git checkout release/custom-0.1.71 + +# 4) Prepare beta's .env (secrets go only here) +cd /root/sub2api-beta/deploy + +# Recommended: copy the production .env so everything except the DB name/user/port stays identical +cp -f /root/sub2api/deploy/.env ./.env + +# Change only these three values (leave the rest unchanged) +perl -pi -e 's/^SERVER_PORT=.*/SERVER_PORT=8084/' ./.env +perl -pi -e 's/^POSTGRES_USER=.*/POSTGRES_USER=beta/' ./.env +perl -pi -e 's/^POSTGRES_DB=.*/POSTGRES_DB=beta/' ./.env + +# 5) Write the compose override (avoids clashing with the production container name; the image is the sub2api:beta transferred from the build server; Redis is the external service) +cat > docker-compose.override.yml <<'YAML' +services: + sub2api: + image: sub2api:beta + container_name: sub2api-beta + environment: + - DATABASE_HOST=${DATABASE_HOST:-postgres} + - DATABASE_SSLMODE=${DATABASE_SSLMODE:-disable} + - REDIS_HOST=db.clicodeplus.com + depends_on: !reset {} + redis: + profiles: + - disabled +YAML + +# 6) Start beta (separate project, so production is unaffected) +cd /root/sub2api-beta/deploy +docker compose -p sub2api-beta --env-file .env -f docker-compose.yml -f docker-compose.override.yml up -d + +# 7) Verify beta +curl -fsS http://127.0.0.1:8084/health +docker logs sub2api-beta --tail 50 +``` + +### Database conventions (beta) + +- Database address/SSL/password: same as production (copy from the production `.env`); everything points at `db.clicodeplus.com`. +- Only change: + - `POSTGRES_USER=beta` + - `POSTGRES_DB=beta` + - `REDIS_DB=2` + +Note: the `beta` user and `beta` database must already exist on the database side with permissions granted; otherwise the container will fail to start and keep restarting. + +### Updating beta (build on the build server + transfer + restart only the beta container) + +```bash +# 1) Build server pulls the code and builds the image (shared /root/sub2api repo) +ssh us-asaki-root "cd /root/sub2api && git fetch origin && git checkout -B release/custom-0.1.71 origin/release/custom-0.1.71" +ssh us-asaki-root "cd /root/sub2api && docker 
build --no-cache -t sub2api:beta -f Dockerfile ." +# ⚠️ The build must finish successfully + +# 2) Transfer the image to the production server +ssh us-asaki-root "docker save sub2api:beta" | ssh clicodeplus "docker load" +# ⚠️ You must see the "Loaded image: sub2api:beta" output + +# 3) Production server syncs code (for version confirmation and the deploy config) +ssh clicodeplus "set -e; cd /root/sub2api-beta && git fetch --all --tags && git checkout -f release/custom-0.1.71 && git reset --hard origin/release/custom-0.1.71" + +# 4) Restart the beta container and verify +ssh clicodeplus "cd /root/sub2api-beta/deploy && docker compose -p sub2api-beta --env-file .env -f docker-compose.yml -f docker-compose.override.yml up -d --no-deps --force-recreate sub2api" +ssh clicodeplus "sleep 5 && curl -fsS http://127.0.0.1:8084/health" +ssh clicodeplus "cat /root/sub2api-beta/backend/cmd/server/VERSION" +``` + +### Stopping/rolling back beta (affects beta only) + +```bash +ssh clicodeplus "cd /root/sub2api-beta/deploy && docker compose -p sub2api-beta -f docker-compose.yml -f docker-compose.override.yml down" +``` + +--- + +## First-Time Server Deployment + +### 1. Build server: clone the code and configure remotes + +```bash +ssh us-asaki-root +cd /root +git clone https://github.com/Wei-Shaw/sub2api.git +cd sub2api + +# Add the fork repository +git remote add fork https://github.com/touwaeriol/sub2api.git +``` + +### 2. Build server: switch to the custom branch and build the image + +```bash +git fetch fork +git checkout -B release/custom-0.1.69 fork/release/custom-0.1.69 + +cd /root/sub2api +docker build -t sub2api:latest -f Dockerfile . +exit +``` + +### 3. Transfer the image to the production server + +```bash +ssh us-asaki-root "docker save sub2api:latest" | ssh clicodeplus "docker load" +``` + +### 4. 
Production server: clone the code and configure the environment + +```bash +ssh clicodeplus +cd /root +git clone https://github.com/Wei-Shaw/sub2api.git +cd sub2api + +# Add the fork repository +git remote add fork https://github.com/touwaeriol/sub2api.git +git fetch fork +git checkout -B release/custom-0.1.69 fork/release/custom-0.1.69 + +# Configure environment variables +cd deploy +cp .env.example .env +vim .env # set DATABASE_HOST=db.clicodeplus.com, POSTGRES_PASSWORD, REDIS_PASSWORD, JWT_SECRET, etc. + +# Create the override file (point Redis at the external service, drop the containerized Redis dependency) +cat > docker-compose.override.yml <<'YAML' +services: + sub2api: + environment: + - REDIS_HOST=db.clicodeplus.com + depends_on: !reset {} + redis: + profiles: + - disabled +YAML +``` + +### 5. Production server: retag the image and start the services + +```bash +docker tag sub2api:latest weishaw/sub2api:latest +cd /root/sub2api/deploy && docker compose up -d +``` + +### 6. Verify the deployment + +```bash +# Check application logs +docker logs sub2api --tail 50 + +# Check health +curl http://localhost:8080/health + +# Confirm the version +cat /root/sub2api/backend/cmd/server/VERSION +``` + +### 7. Common operations commands + +```bash +# Tail live logs +docker logs -f sub2api + +# Restart the service +docker compose restart sub2api + +# Stop all services +docker compose down + +# Stop and delete volumes (dangerous! deletes database data) +docker compose down -v + +# Check resource usage +docker stats sub2api +``` + +--- + +## Custom Features + +The current custom branch includes the following features (relative to the official version): + +### UI/UX customizations + +| Feature | Description | +|------|------| +| Homepage revamp | User-facing value-proposition design | +| GitHub link removed | No GitHub navigation in the user menu | +| WeChat support button | Floating WeChat support entry on the homepage | +| Precise rate-limit times | Account rate-limit times shown to the second | + +### Antigravity platform enhancements + +| Feature | Description | +|------|------| +| Scope-level rate limiting | Independent limits per quota scope (claude/gemini_text/gemini_image) so a whole account is never locked out | +| Model-level rate limiting | Independent limits per model (e.g. claude-opus-4-5) for finer-grained control | +| Rate-limit pre-check | Scheduling pre-checks account/model rate-limit state so limited accounts are never selected | +| Second-precision cooldowns | Supports second-precision cooldowns from 429 responses | +| Identity-injection hardening | Model identity injection plus a silent boundary to prevent identity leaks | +| thoughtSignature fix | Fix for Gemini 3 function-call 400 errors | +| max_tokens auto-correction | Automatically corrects 400 errors caused by max_tokens <= budget_tokens | + +### Scheduling optimizations + +| Feature | Description | +|------|------| +| Layered filter selection | Scheduler switched from full sorting to layered filtering for better performance | +| Random LRU tie-breaking | Random pick among equal LRU timestamps to avoid piling onto one account | +| Configurable wait threshold | The rate-limit wait threshold is configurable | + +### Operations enhancements + +| Feature | Description | 
+|------|------| +| Scope rate-limit stats | Ops dashboard shows scope-level rate-limit statistics for Antigravity accounts | +| Account rate-limit display | Account list shows scope- and model-level rate-limit state | +| Clear-rate-limit button | The clear button also appears when scope/model rate limits are active | + +### Other fixes + +| Feature | Description | +|------|------| +| .gitattributes | Ensures migration files use LF line endings (fixes inconsistent SQL digests on Windows) | +| Deploy config | DATABASE_HOST and DATABASE_SSLMODE are configurable via .env | + +--- + +## Admin API Reference + +### ⚠️ API Workflow Rules + +When a request requires operating the production Web UI and the corresponding API is not yet documented here, **follow this process**: + +1. **Explore**: search the codebase for route definitions (`backend/internal/server/routes/`), handlers (`backend/internal/handler/admin/`), and request structs to determine the correct endpoint, method, and request body format +2. **Document**: add the newly discovered endpoint to this Admin API section, including the endpoint, parameter descriptions, and a curl example +3. **Execute**: complete the request using the endpoint as now documented + +> **Purpose**: avoid re-exploring the codebase for recurring needs, keep the API docs complete, and let future operations run straight from the docs. + +--- + +### Authentication + +All Admin APIs authenticate with an Admin API Key passed in the `x-api-key` header. + +``` +x-api-key: admin-xxx +``` + +> **Usage**: the Admin API Key lives in the `ADMIN_API_KEY` variable in the project root `.env` (excluded by `.gitignore`, never committed). Read the key from `.env` before operating; if the key is invalid (401), ask the user for a new key and update `.env`. Tokens have the form `admin-` + 64 hexadecimal characters and are generated in the admin console under `Settings > Admin API Key`. **Never write real tokens into docs or code.** + +### Environments + +| Environment | Base URL | Notes | +|------|----------|------| +| Production | `https://clicodeplus.com` | Production environment | +| Beta | `http://<server IP>:8084` | Internal network only | +| OpenAI | `http://<server IP>:8083` | Internal network only | +| Star | `https://hyntoken.com` | Standalone environment | + +> In the endpoint docs below, `${BASE}` is the environment base URL and `${KEY}` is `ADMIN_API_KEY` from `.env`. Run `source .env` or `export KEY=$ADMIN_API_KEY` first. + +--- + +### 1. 
Account Management + +#### 1.1 List accounts + +``` +GET /api/v1/admin/accounts +``` + +**Query parameters**: + +| Parameter | Type | Required | Description | +|------|------|------|------| +| `platform` | string | No | Platform filter: `antigravity` / `anthropic` / `openai` / `gemini` | +| `type` | string | No | Account type: `oauth` / `api_key` / `cookie` | +| `status` | string | No | Status: `active` / `disabled` / `error` | +| `search` | string | No | Search keyword (name, notes) | +| `page` | int | No | Page number, default 1 | +| `page_size` | int | No | Page size, default 20 | + +```bash +curl -s "${BASE}/api/v1/admin/accounts?platform=antigravity&page=1&page_size=100" \ + -H "x-api-key: ${KEY}" +``` + +**Response**: +```json +{ + "code": 0, + "message": "success", + "data": { + "items": [{"id": 1, "name": "xxx@gmail.com", "platform": "antigravity", "status": "active", ...}], + "total": 66 + } +} +``` + +#### 1.2 Get account details + +``` +GET /api/v1/admin/accounts/:id +``` + +```bash +curl -s "${BASE}/api/v1/admin/accounts/1" -H "x-api-key: ${KEY}" +``` + +#### 1.3 Test account connectivity + +``` +POST /api/v1/admin/accounts/:id/test +``` + +**Request body** (JSON, optional): + +| Field | Type | Required | Description | +|------|------|------|------| +| `model_id` | string | No | Model to test, e.g. `claude-opus-4-6`; defaults to the default model if omitted | + +**Response format**: SSE (Server-Sent Events) stream + +```bash +curl -N -X POST "${BASE}/api/v1/admin/accounts/1/test" \ + -H "x-api-key: ${KEY}" \ + -H "Content-Type: application/json" \ + -d '{"model_id": "claude-opus-4-6"}' +``` + +**SSE event types**: + +| type | Fields | Description | +|------|------|------| +| `test_start` | `model` | Test started; returns the model under test | +| `content` | `text` | Model response content (streamed text chunks) | +| `test_end` | `success`, `error` | Test finished; `success=true` means success | +| `error` | `text` | Error message | + +#### 1.4 Clear account rate limit + +``` +POST /api/v1/admin/accounts/:id/clear-rate-limit +``` + +```bash +curl -X POST "${BASE}/api/v1/admin/accounts/1/clear-rate-limit" \ + -H "x-api-key: ${KEY}" +``` + +#### 1.5 Clear account error state + +``` +POST /api/v1/admin/accounts/:id/clear-error +``` + +```bash +curl -X POST "${BASE}/api/v1/admin/accounts/1/clear-error" \ + -H "x-api-key: ${KEY}" +``` + +#### 1.6 Get account models + +``` 
+GET /api/v1/admin/accounts/:id/models +``` + +```bash +curl -s "${BASE}/api/v1/admin/accounts/1/models" -H "x-api-key: ${KEY}" +``` + +#### 1.7 Refresh OAuth token + +``` +POST /api/v1/admin/accounts/:id/refresh +``` + +```bash +curl -X POST "${BASE}/api/v1/admin/accounts/1/refresh" -H "x-api-key: ${KEY}" +``` + +#### 1.8 Refresh account tier + +``` +POST /api/v1/admin/accounts/:id/refresh-tier +``` + +```bash +curl -X POST "${BASE}/api/v1/admin/accounts/1/refresh-tier" -H "x-api-key: ${KEY}" +``` + +#### 1.9 Get account stats + +``` +GET /api/v1/admin/accounts/:id/stats +``` + +```bash +curl -s "${BASE}/api/v1/admin/accounts/1/stats" -H "x-api-key: ${KEY}" +``` + +#### 1.10 Get account usage + +``` +GET /api/v1/admin/accounts/:id/usage +``` + +```bash +curl -s "${BASE}/api/v1/admin/accounts/1/usage" -H "x-api-key: ${KEY}" +``` + +#### 1.11 Update a single account + +``` +PUT /api/v1/admin/accounts/:id +``` + +**Request body** (JSON; all fields optional, send only the fields to update): + +| Field | Type | Description | +|------|------|------| +| `name` | string | Account name | +| `notes` | *string | Notes | +| `type` | string | Type: `oauth` / `setup-token` / `apikey` / `upstream` | +| `credentials` | object | Credential data | +| `extra` | object | Extra configuration | +| `proxy_id` | *int64 | Proxy ID | +| `concurrency` | *int | Concurrency | +| `priority` | *int | Priority (default 50) | +| `rate_multiplier` | *float64 | Rate multiplier | +| `status` | string | Status: `active` / `inactive` | +| `group_ids` | *[]int64 | Group ID list | +| `expires_at` | *int64 | Expiry timestamp | +| `auto_pause_on_expired` | *bool | Auto-pause after expiry | + +> Pointer-typed (`*`) fields distinguish "not provided" from "set to the zero value". + +```bash +# Example: set the account priority to 100 +curl -X PUT "${BASE}/api/v1/admin/accounts/1" \ + -H "x-api-key: ${KEY}" \ + -H "Content-Type: application/json" \ + -d '{"priority": 100}' +``` + +#### 1.12 Bulk-update accounts + +``` +POST /api/v1/admin/accounts/bulk-update +``` + +**Request body** (JSON): + +| Field | Type | Required | Description | +|------|------|------|------| +| `account_ids` | []int64 | **Yes** | IDs of the accounts to update | +| `priority` | *int | No | Priority | +| `concurrency` | *int | No | Concurrency | +| `rate_multiplier` | *float64 | No | Rate multiplier | +| `status` | string 
| No | Status: `active` / `inactive` / `error` | +| `schedulable` | *bool | No | Whether schedulable | +| `group_ids` | *[]int64 | No | Group ID list | +| `proxy_id` | *int64 | No | Proxy ID | +| `credentials` | object | No | Credential data (bulk overwrite) | +| `extra` | object | No | Extra configuration (bulk overwrite) | + +```bash +# Example: set priority 100 on several accounts at once +curl -X POST "${BASE}/api/v1/admin/accounts/bulk-update" \ + -H "x-api-key: ${KEY}" \ + -H "Content-Type: application/json" \ + -d '{"account_ids": [1, 2, 3], "priority": 100}' +``` + +#### 1.13 Bulk-test accounts (script) + +Bulk-test connectivity of a given model across all accounts of a platform: + +```bash +# The user must supply: BASE (environment URL), KEY (admin token), MODEL (model to test) +ACCOUNT_IDS=$(curl -s "${BASE}/api/v1/admin/accounts?platform=antigravity&page=1&page_size=100" \ + -H "x-api-key: ${KEY}" | python3 -c " +import json, sys +data = json.load(sys.stdin) +for item in data['data']['items']: + print(f\"{item['id']}|{item['name']}\") +") + +while IFS='|' read -r ID NAME; do + echo "Testing account ID=${ID} (${NAME})..." + RESPONSE=$(curl -s --max-time 60 -N \ + -X POST "${BASE}/api/v1/admin/accounts/${ID}/test" \ + -H "x-api-key: ${KEY}" \ + -H "Content-Type: application/json" \ + -d "{\"model_id\": \"${MODEL}\"}" 2>&1) + if echo "$RESPONSE" | grep -q '"success":true'; then + echo " ✅ OK" + elif echo "$RESPONSE" | grep -q '"type":"content"'; then + echo " ✅ OK (content received)" + else + ERROR_MSG=$(echo "$RESPONSE" | grep -o '"error":"[^"]*"' | tail -1) + echo " ❌ Failed: ${ERROR_MSG}" + fi +done <<< "$ACCOUNT_IDS" +``` + +--- + +### 2. 
Operations Monitoring + +#### 2.1 Concurrency stats + +``` +GET /api/v1/admin/ops/concurrency +``` + +```bash +curl -s "${BASE}/api/v1/admin/ops/concurrency" -H "x-api-key: ${KEY}" +``` + +#### 2.2 Account availability + +``` +GET /api/v1/admin/ops/account-availability +``` + +```bash +curl -s "${BASE}/api/v1/admin/ops/account-availability" -H "x-api-key: ${KEY}" +``` + +#### 2.3 Real-time traffic summary + +``` +GET /api/v1/admin/ops/realtime-traffic +``` + +```bash +curl -s "${BASE}/api/v1/admin/ops/realtime-traffic" -H "x-api-key: ${KEY}" +``` + +#### 2.4 Request error list + +``` +GET /api/v1/admin/ops/request-errors +``` + +**Query parameters**: `page`, `page_size` + +```bash +curl -s "${BASE}/api/v1/admin/ops/request-errors?page=1&page_size=50" \ + -H "x-api-key: ${KEY}" +``` + +#### 2.5 Upstream error list + +``` +GET /api/v1/admin/ops/upstream-errors +``` + +```bash +curl -s "${BASE}/api/v1/admin/ops/upstream-errors?page=1&page_size=50" \ + -H "x-api-key: ${KEY}" +``` + +#### 2.6 Dashboard overview + +``` +GET /api/v1/admin/ops/dashboard/overview +``` + +```bash +curl -s "${BASE}/api/v1/admin/ops/dashboard/overview" -H "x-api-key: ${KEY}" +``` + +--- + +### 3. System Settings + +#### 3.1 Get system settings + +``` +GET /api/v1/admin/settings +``` + +```bash +curl -s "${BASE}/api/v1/admin/settings" -H "x-api-key: ${KEY}" +``` + +#### 3.2 Update system settings + +``` +PUT /api/v1/admin/settings +``` + +```bash +curl -X PUT "${BASE}/api/v1/admin/settings" \ + -H "x-api-key: ${KEY}" \ + -H "Content-Type: application/json" \ + -d '{ ... }' +``` + +#### 3.3 Admin API Key status (masked) + +``` +GET /api/v1/admin/settings/admin-api-key +``` + +```bash +curl -s "${BASE}/api/v1/admin/settings/admin-api-key" -H "x-api-key: ${KEY}" +``` + +--- + +### 4. 
User Management + +#### 4.1 List users + +``` +GET /api/v1/admin/users +``` + +```bash +curl -s "${BASE}/api/v1/admin/users?page=1&page_size=20" -H "x-api-key: ${KEY}" +``` + +#### 4.2 User details + +``` +GET /api/v1/admin/users/:id +``` + +```bash +curl -s "${BASE}/api/v1/admin/users/1" -H "x-api-key: ${KEY}" +``` + +#### 4.3 Update user balance + +``` +POST /api/v1/admin/users/:id/balance +``` + +```bash +curl -X POST "${BASE}/api/v1/admin/users/1/balance" \ + -H "x-api-key: ${KEY}" \ + -H "Content-Type: application/json" \ + -d '{"amount": 100, "reason": "top-up"}' +``` + +--- + +### 5. Group Management + +#### 5.1 List groups + +``` +GET /api/v1/admin/groups +``` + +```bash +curl -s "${BASE}/api/v1/admin/groups" -H "x-api-key: ${KEY}" +``` + +#### 5.2 All groups (unpaginated) + +``` +GET /api/v1/admin/groups/all +``` + +```bash +curl -s "${BASE}/api/v1/admin/groups/all" -H "x-api-key: ${KEY}" +``` + +--- + +## Notes + +1. **The frontend must be baked into the image**: build with `docker build` on the build server (`us-asaki-root`); the Dockerfile compiles the frontend and embeds it into the backend binary, then the image is transferred to the production server (`clicodeplus`) via `docker save | docker load` + +2. **Image tag**: docker-compose.yml uses `weishaw/sub2api:latest`; after a local build, overwrite it with `docker tag` + +3. **Windows line endings**: solved via `.gitattributes`, which forces LF for `*.sql` files + +4. **Versioning**: every release must update `backend/cmd/server/VERSION` and be tagged + +5. **Merge conflicts**: when merging a new upstream version, watch for conflicts in these files: + - `backend/internal/service/antigravity_gateway_service.go` + - `backend/internal/service/gateway_service.go` + - `backend/internal/pkg/antigravity/request_transformer.go` + +--- + +## Go Coding Standards + +### 1. Function Design + +#### Single responsibility +- **Function length**: a function should normally not exceed **30 lines**; split longer ones into helpers. Genuinely unsplittable logic (complex state machines, protocol parsing, etc.) may be an exception, with a comment explaining why +- **Nesting**: avoid more than 3 levels of nesting; use early returns to flatten it + +```go +// ❌ Avoid: deep nesting +func process(data []Item) { + for _, item := range data { + if item.Valid { + if item.Type == "A" { + if item.Status == "active" { + // business logic... 
+ } + } + } + } +} + +// ✅ Prefer: early return +func process(data []Item) { + for _, item := range data { + if !item.Valid { + continue + } + if item.Type != "A" { + continue + } + if item.Status != "active" { + continue + } + // business logic... + } +} +``` + +#### Extracting complex logic +Extract complex conditionals or handling into their own functions: + +```go +// ❌ Avoid: inlining complex logic +if resp.StatusCode == 429 || resp.StatusCode == 503 { + // 80+ lines of handling... +} + +// ✅ Prefer: extract into a function +result := handleRateLimitResponse(resp, params) +switch result.action { +case actionRetry: + continue +case actionBreak: + return result.resp, nil +} +``` + +### 2. Eliminating Duplication + +#### Config-access pattern +Extract repeated config-reading logic into a method: + +```go +// ❌ Avoid: duplicated code +logBody := s.settingService != nil && s.settingService.cfg != nil && s.settingService.cfg.Gateway.LogUpstreamErrorBody +maxBytes := 2048 +if s.settingService != nil && s.settingService.cfg != nil && s.settingService.cfg.Gateway.LogUpstreamErrorBodyMaxBytes > 0 { + maxBytes = s.settingService.cfg.Gateway.LogUpstreamErrorBodyMaxBytes +} + +// ✅ Prefer: extract into a method +func (s *Service) getLogConfig() (logBody bool, maxBytes int) { + maxBytes = 2048 + if s.settingService == nil || s.settingService.cfg == nil { + return false, maxBytes + } + cfg := s.settingService.cfg.Gateway + if cfg.LogUpstreamErrorBodyMaxBytes > 0 { + maxBytes = cfg.LogUpstreamErrorBodyMaxBytes + } + return cfg.LogUpstreamErrorBody, maxBytes +} +``` + +### 3. Constants + +#### No magic numbers +Define every hardcoded number as a constant: + +```go +// ❌ Avoid +if retryDelay >= 10*time.Second { + resetAt := time.Now().Add(30 * time.Second) +} + +// ✅ Prefer +const ( + rateLimitThreshold = 10 * time.Second + defaultRateLimitDuration = 30 * time.Second +) + +if retryDelay >= rateLimitThreshold { + resetAt := time.Now().Add(defaultRateLimitDuration) +} +``` + +#### Reference constant names in comments +Refer to constant names rather than hardcoded values in comments: + +```go +// ❌ Avoid +// < 10s: wait and retry + +// ✅ Prefer +// < rateLimitThreshold: wait and retry +``` + +### 4. 
Error Handling + +#### Use structured logging +Prefer `slog` for structured logging: + +```go +// ❌ Avoid +log.Printf("%s status=%d model_rate_limit_failed model=%s error=%v", prefix, statusCode, modelName, err) + +// ✅ Prefer +slog.Error("failed to set model rate limit", + "prefix", prefix, + "status_code", statusCode, + "model", modelName, + "error", err, +) +``` + +### 5. Testing + +#### Keep mock signatures in sync +When changing a function signature, update every mock of it in the tests: + +```go +// If the handleError signature changes +handleError func(..., groupID int64, sessionHash string) *Result + +// the mocks in the tests must be updated to match +handleError: func(..., groupID int64, sessionHash string) *Result { + return nil +}, +``` + +#### Test build tags +Use the standard test build tag: + +```go +//go:build unit + +package service +``` + +### 6. Parsing Durations + +#### Use the standard library +Prefer `time.ParseDuration`, which supports every Go duration format: + +```go +// ❌ Avoid: hand-rolled format restrictions +if !strings.HasSuffix(delay, "s") || strings.Contains(delay, "m") { + continue +} + +// ✅ Prefer: the standard library +dur, err := time.ParseDuration(delay) // supports "0.5s", "4m50s", "1h30m", etc. +``` + +### 7. Interface Design + +#### Interface segregation +Define minimal interfaces containing only the required methods: + +```go +// ❌ Avoid: overly broad interfaces +type AccountRepository interface { + // 20+ methods... +} + +// ✅ Prefer: a minimal interface +type ModelRateLimiter interface { + SetModelRateLimit(ctx context.Context, id int64, modelKey string, resetAt time.Time) error +} +``` + +### 8. Concurrency Safety + +#### Protecting shared data +Ensure thread safety when accessing data that may be modified concurrently: + +```go +// If Account.Extra can be modified concurrently, +// reads must be protected with a mutex or atomic operations +func (a *Account) GetRateLimitRemainingTime(model string) time.Duration { + a.mu.RLock() + defer a.mu.RUnlock() + // read the Extra field... +} +``` + +### 9. Naming + +#### Consistent style +- Constants use camelCase: `rateLimitThreshold` +- Types use PascalCase: `AntigravityQuotaScope` +- One concept, one name: `Threshold` or `Limit`, never both + +```go +// ❌ Avoid: inconsistent naming +antigravitySmartRetryMinWait // uses Min +antigravityRateLimitThreshold // uses Threshold + +// ✅ Prefer: one consistent style +antigravityMinRetryWait +antigravityRateLimitThreshold +``` + +### 10. Code Review Checklist + +Before submitting code, check the following: + +- [ ] Any function over 30 lines? (unsplittable logic excepted, with a comment) +- [ ] Nesting deeper than 3 levels? +- [ ] Duplicated code that could be extracted? +- [ ] Any magic numbers? 
+- [ ] Do mock signatures match the real functions? +- [ ] Do tests cover the new logic? +- [ ] Do logs carry enough context? +- [ ] Has concurrency safety been considered? + +--- + +## CI Checks and Release Gates + +### GitHub Actions checks + +The project has 4 CI jobs; **all of them must pass before any push or release**: + +| Workflow | Job | Description | Local verification command | +|----------|-----|------|-------------| +| CI | `test` | Unit + integration tests | `cd backend && make test-unit && make test-integration` | +| CI | `golangci-lint` | Go static analysis (golangci-lint v2.7) | `cd backend && golangci-lint run --timeout=5m` | +| Security Scan | `backend-security` | govulncheck + gosec security scans | `cd backend && govulncheck ./... && gosec -severity high -confidence high ./...` | +| Security Scan | `frontend-security` | pnpm audit of frontend dependencies | `cd frontend && pnpm audit --prod --audit-level=high` | + +### Submitting PRs upstream + +PRs target the upstream official repository and **contain only generic changes** (bug fixes, new features, performance optimizations, etc.). + +**These files must never appear in a PR** (they are fork-specific customizations): +- `CLAUDE.md`, `AGENTS.md` — our development docs +- `backend/cmd/server/VERSION` — our version file +- UI customizations (GitHub link removal, WeChat support button, homepage changes, etc.) +- Deployment config (custom changes under `deploy/`) + +**PR process**: +1. Create a feature branch from `develop` containing only the upstream-bound changes +2. After pushing, **wait for all 4 CI jobs to pass** +3. Only then open the PR +4. Check status with `gh run list --repo touwaeriol/sub2api --branch <branch>` + +### Pushing our own branches (develop / main) + +Pushes to our own `develop` or `main` branches include everything (customizations + generic changes). + +**Run the full CI suite locally before pushing** (do not wait for GitHub Actions): + +```bash +# Make sure the Go toolchain is available (macOS homebrew) +export PATH="/opt/homebrew/bin:$HOME/go/bin:$PATH" + +# 1. Unit tests (required) +cd backend && make test-unit + +# 2. Integration tests (recommended; needs Docker) +make test-integration + +# 3. golangci-lint static analysis (required) +golangci-lint run --timeout=5m + +# 4. gofmt format check (required) +gofmt -l ./... +# If it prints anything, fix with gofmt -w +``` + +**After pushing**: +1. Check GitHub Actions status with `gh run list --repo touwaeriol/sub2api --branch <branch>` +2. Confirm all 4 jobs across the CI and Security Scan workflows are green ✅ +3. Fix any failing job immediately; **never continue while CI is red** + +### Releasing + +1. All of the local CI checks above pass +2. Bump `backend/cmd/server/VERSION`, commit, and push +3. Confirm all 4 GitHub Actions CI jobs pass after pushing +4. **Never deploy with failing CI** — fix the problem first +5. 
Confirm status with `gh run list --repo touwaeriol/sub2api --limit 10` + +### Common CI failures and fixes +- **gofmt**: inconsistent struct field alignment → run `gofmt -w <file>` +- **golangci-lint**: unused variables/imports → remove them or ignore with `_` +- **test failures**: mock signature drift → update the mocks to match +- **gosec**: security findings → fix per the hints or add an exception + +--- + +## PR Description Format + +All PR descriptions are bilingual (Chinese first, then English) and contain the following three sections: + +### Template + +```markdown +## 背景 / Background + +<one or two sentences on the current problem or trigger> + +<English version> + +--- + +## 目的 / Purpose + +<the problem this change solves or the goal it achieves> + +<English version> + +--- + +## 改动内容 / Changes + +### 后端 / Backend + +- **改动点 1**:说明 +- **改动点 2**:说明 + +--- + +- **Change 1**: description +- **Change 2**: description + +### 前端 / Frontend + +- **改动点 1**:说明 +- **改动点 2**:说明 + +--- + +- **Change 1**: description +- **Change 2**: description + +--- + +## 截图 / Screenshot (optional) + +ASCII diagram or actual screenshot +``` + +### Key rules + +- **Title**: use conventional commits format, e.g. `feat(scope): description` +- **Language order**: within a section, Chinese first then English, separated by a blank line; do not use `---` to split content within the same section +- **Change grouping**: group by module (Backend / Frontend / Config, etc.), Chinese bullets first, then English +- **Screenshots/diagrams**: required for any UI change; ASCII layout sketches are fine +- **Target branch**: submit to the `main` branch of `touwaeriol/sub2api` diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 00000000..b634af05 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,1337 @@ +# Sub2API Development Notes + +## Version Management Strategy + +### Version Numbering + +We append our own patch number after the official version number: + +- Official version: `v0.1.68` +- Our versions: `v0.1.68.1`, `v0.1.68.2` (incrementing) + +### Branch Strategy + +| Branch | Description | +|------|------| +| `main` | Our main branch, containing all custom features | +| `release/custom-X.Y.Z` | Release branch based on the official `vX.Y.Z` | +| `upstream/main` | The upstream official repository | + +--- + +## Release Process (for a New Official Version) + +When a new official version (e.g. `v0.1.69`) is released: + +### 1. Sync upstream and create a release branch + +```bash +# Fetch the latest upstream code +git fetch upstream --tags + +# Create a new release branch from the official tag +git checkout v0.1.69 -b release/custom-0.1.69 + +# Merge our main branch (contains all custom features) +git merge main --no-edit + +# Resolve any conflicts, then continue +``` + +### 2. 
更新版本号并打标签
+
+```bash
+# 更新版本号文件
+echo "0.1.69.1" > backend/cmd/server/VERSION
+git add backend/cmd/server/VERSION
+git commit -m "chore: bump version to 0.1.69.1"
+
+# 打上我们自己的标签
+git tag v0.1.69.1
+
+# 推送分支和标签
+git push origin release/custom-0.1.69
+git push origin v0.1.69.1
+```
+
+### 3. 更新 main 分支
+
+```bash
+# 将发布分支合并回 main,保持 main 包含最新定制功能
+git checkout main
+git merge release/custom-0.1.69
+git push origin main
+```
+
+---
+
+## 热修复发布(在现有版本上修复)
+
+当需要在当前版本上发布修复时:
+
+```bash
+# 在当前发布分支上修复
+git checkout release/custom-0.1.68
+# ... 进行修复 ...
+git commit -m "fix: 修复描述"
+
+# 递增小版本号
+echo "0.1.68.2" > backend/cmd/server/VERSION
+git add backend/cmd/server/VERSION
+git commit -m "chore: bump version to 0.1.68.2"
+
+# 打标签并推送
+git tag v0.1.68.2
+git push origin release/custom-0.1.68
+git push origin v0.1.68.2
+
+# 同步修复到 main
+git checkout main
+git cherry-pick <修复提交哈希>
+git push origin main
+```
+
+---
+
+## 服务器部署流程
+
+### 前置条件
+
+- 本地已配置 SSH 别名 `clicodeplus` 连接到生产服务器(运行服务 + 构建镜像)
+- 生产服务器部署目录:`/root/sub2api`(正式)、`/root/sub2api-beta`(测试)、`/root/sub2api-star`(Star)
+- 生产服务器使用 Docker Compose 部署
+- **镜像在生产服务器本机构建**,使用资源限制的 `limited-builder` 构建器(3 核 CPU、4G 内存),避免构建占满服务器资源影响线上服务
+
+### 服务器角色说明
+
+| 服务器 | SSH 别名 | 职责 |
+|--------|----------|------|
+| 生产服务器 | `clicodeplus` | 拉取代码、构建镜像、运行服务、部署验证 |
+| 数据库服务器 | `db-clicodeplus` | PostgreSQL 16 + Redis 7,所有环境共用 |
+
+> 数据库服务器运维手册:`db-clicodeplus:/root/README.md`
+
+### 构建器说明
+
+生产服务器上配置了资源限制的 Docker buildx 构建器 `limited-builder`,**所有构建操作必须使用此构建器**:
+
+- **构建器名称**:`limited-builder`
+- **驱动**:`docker-container`(独立容器运行 BuildKit)
+- **资源限制**:3 核 CPU、4G 内存(服务器共 6 核 8G,预留一半给线上服务)
+- **容器名**:`buildx_buildkit_limited-builder0`
+
+```bash
+# 构建命令格式(必须指定 --builder)
+ssh clicodeplus "cd /root/sub2api && docker buildx build --builder limited-builder --no-cache --load -t sub2api:latest -f Dockerfile ." 
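+
+# (补充示例,非强制)构建前可确认 limited-builder 的资源限制仍然生效;
+# 容器名 buildx_buildkit_limited-builder0 以「构建器说明」一节为准
+ssh clicodeplus "docker inspect buildx_buildkit_limited-builder0 --format 'NanoCpus={{.HostConfig.NanoCpus}} Memory={{.HostConfig.Memory}}'"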
+ +# 查看构建器状态 +ssh clicodeplus "docker buildx inspect limited-builder" + +# 如果构建器容器被意外删除,重新创建: +ssh clicodeplus "docker buildx create --name limited-builder --driver docker-container --driver-opt 'default-load=true' && docker buildx inspect --builder limited-builder --bootstrap && docker update --cpus=3 --memory=4g --memory-swap=4g buildx_buildkit_limited-builder0" +``` + +### 部署环境说明 + +| 环境 | 目录(生产服务器) | 端口 | 数据库 | Redis DB | 容器名 | +|------|------|------|--------|----------|--------| +| 正式 | `/root/sub2api` | 8080 | `sub2api` | 0 | `sub2api` | +| Beta | `/root/sub2api-beta` | 8084 | `beta` | 2 | `sub2api-beta` | +| OpenAI | `/root/sub2api-openai` | 8083 | `openai` | 3 | `sub2api-openai` | +| Star | `/root/sub2api-star` | 8086 | `star` | 4 | `sub2api-star` | + +### 外部数据库与 Redis + +所有环境(正式、Beta、OpenAI、Star)共用 `db.clicodeplus.com` 上的 **PostgreSQL 16** 和 **Redis 7**,不使用容器内数据库或 Redis。 + +**PostgreSQL**(端口 5432,TLS 加密,scram-sha-256 认证): + +| 环境 | 用户名 | 数据库 | +|------|--------|--------| +| 正式 | `sub2api` | `sub2api` | +| Beta | `beta` | `beta` | +| OpenAI | `openai` | `openai` | +| Star | `star` | `star` | + +**Redis**(端口 6379,密码认证): + +| 环境 | DB | +|------|-----| +| 正式 | 0 | +| Beta | 2 | +| OpenAI | 3 | +| Star | 4 | + +**配置方式**: +- 数据库通过 `.env` 中的 `DATABASE_HOST`、`DATABASE_SSLMODE`、`POSTGRES_USER`、`POSTGRES_PASSWORD`、`POSTGRES_DB` 配置 +- Redis 通过 `docker-compose.override.yml` 覆盖 `REDIS_HOST`(因主 compose 文件硬编码为 `redis`),密码通过 `.env` 中的 `REDIS_PASSWORD` 配置 +- 各环境的 `docker-compose.override.yml` 已通过 `depends_on: !reset {}` 和 `redis: profiles: [disabled]` 去掉了对容器 Redis 的依赖 + +#### 数据库操作命令 + +通过 SSH 在服务器上执行数据库操作: + +```bash +# 正式环境 - 查询迁移记录 +ssh clicodeplus "source /root/sub2api/deploy/.env && PGPASSWORD=\"\$POSTGRES_PASSWORD\" psql -h \$DATABASE_HOST -U \$POSTGRES_USER -d \$POSTGRES_DB -c 'SELECT * FROM schema_migrations ORDER BY applied_at DESC LIMIT 5;'" + +# Beta 环境 - 查询迁移记录 +ssh clicodeplus "source /root/sub2api-beta/deploy/.env && PGPASSWORD=\"\$POSTGRES_PASSWORD\" psql -h 
\$DATABASE_HOST -U \$POSTGRES_USER -d \$POSTGRES_DB -c 'SELECT * FROM schema_migrations ORDER BY applied_at DESC LIMIT 5;'" + +# Beta 环境 - 清除指定迁移记录(重新执行迁移) +ssh clicodeplus "source /root/sub2api-beta/deploy/.env && PGPASSWORD=\"\$POSTGRES_PASSWORD\" psql -h \$DATABASE_HOST -U \$POSTGRES_USER -d \$POSTGRES_DB -c \"DELETE FROM schema_migrations WHERE filename LIKE '%049%';\"" + +# Beta 环境 - 更新账号数据 +ssh clicodeplus "source /root/sub2api-beta/deploy/.env && PGPASSWORD=\"\$POSTGRES_PASSWORD\" psql -h \$DATABASE_HOST -U \$POSTGRES_USER -d \$POSTGRES_DB -c \"UPDATE accounts SET credentials = credentials - 'model_mapping' WHERE platform = 'antigravity';\"" +``` + +> **注意**:使用 `source .env` 加载环境变量,避免在命令行中暴露密码。 + +### 部署步骤 + +**重要:每次部署都必须递增版本号!** + +#### 0. 递增版本号并推送(本地操作) + +每次部署前,先在本地递增小版本号并确保推送成功: + +```bash +# 查看当前版本号 +cat backend/cmd/server/VERSION +# 假设当前是 0.1.69.1 + +# 递增版本号 +echo "0.1.69.2" > backend/cmd/server/VERSION +git add backend/cmd/server/VERSION +git commit -m "chore: bump version to 0.1.69.2" +git push origin release/custom-0.1.69 + +# ⚠️ 确认推送成功(必须看到分支更新输出,不能有 rejected 错误) +``` + +> **检查点**:如果有其他未提交的改动,应先 commit 并 push,确保 release 分支上的所有代码都已推送到远程。 + +#### 1. 生产服务器拉取代码 + +```bash +# 拉取最新代码并切换分支 +ssh clicodeplus "cd /root/sub2api && git fetch fork && git checkout -B release/custom-0.1.69 fork/release/custom-0.1.69" + +# ⚠️ 验证版本号与步骤 0 一致 +ssh clicodeplus "cat /root/sub2api/backend/cmd/server/VERSION" +``` + +#### 2. 生产服务器构建镜像(使用 limited-builder) + +```bash +ssh clicodeplus "cd /root/sub2api && docker buildx build --builder limited-builder --no-cache --load -t sub2api:latest -f Dockerfile ." + +# ⚠️ 必须看到构建成功输出,如果失败需要先排查问题 +``` + +> **常见构建问题**: +> - 构建器未启动 → `docker buildx inspect --builder limited-builder --bootstrap` +> - 磁盘空间不足 → `docker system prune -f` 清理无用镜像 +> - 构建器被删除 → 参见上方「构建器说明」重新创建 + +#### 3. 
更新镜像标签并重启 + +```bash +# 更新镜像标签并重启 +ssh clicodeplus "docker tag sub2api:latest weishaw/sub2api:latest" +ssh clicodeplus "cd /root/sub2api/deploy && docker compose up -d --force-recreate sub2api" +``` + +#### 4. 验证部署 + +```bash +# 查看启动日志 +ssh clicodeplus "docker logs sub2api --tail 20" + +# 确认版本号(必须与步骤 0 中设置的版本号一致) +ssh clicodeplus "cat /root/sub2api/backend/cmd/server/VERSION" + +# 检查容器状态(必须显示 healthy) +ssh clicodeplus "docker ps | grep sub2api" +``` + +--- + +## Beta 并行部署(不影响现网) + +目标:在同一台服务器上并行启动一个 beta 实例(例如端口 `8084`),**严禁改动/重启**现网实例(默认目录 `/root/sub2api`)。 + +### 设计原则 + +- **新目录**:beta 使用独立目录,例如 `/root/sub2api-beta`。 +- **敏感信息只放 `.env`**:beta 的数据库密码、JWT_SECRET 等只写入 `/root/sub2api-beta/deploy/.env`,不要提交到 git。 +- **独立 Compose Project**:通过 `docker compose -p sub2api-beta ...` 启动,确保 network/volume 隔离。 +- **独立端口**:通过 `.env` 的 `SERVER_PORT` 映射宿主机端口(例如 `8084:8080`)。 + +### 前置检查 + +```bash +# 1) 确保 8084 未被占用 +ssh clicodeplus "ss -ltnp | grep :8084 || echo '8084 is free'" + +# 2) 确认现网容器还在(只读检查) +ssh clicodeplus "docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Ports}}' | sed -n '1,200p'" +``` + +### 首次部署步骤 + +> **构建说明**:正式和 beta 通过不同的镜像标签区分(`sub2api:latest` 用于正式,`sub2api:beta` 用于测试),均在生产服务器本机使用 `limited-builder` 构建。 + +```bash +# 1) 在生产服务器上拉取代码并构建 beta 镜像 +ssh clicodeplus "cd /root/sub2api-beta && git fetch --all --tags && git checkout -f release/custom-0.1.71 && git reset --hard origin/release/custom-0.1.71" +ssh clicodeplus "cd /root/sub2api-beta && docker buildx build --builder limited-builder --no-cache --load -t sub2api:beta -f Dockerfile ." 
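+
+# (补充示例,非强制)继续部署前,确认 beta 镜像已成功构建并加载到本机 Docker
+ssh clicodeplus "docker images sub2api:beta --format '{{.Repository}}:{{.Tag}} {{.CreatedSince}}'"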
+ +# 2) 在生产服务器上准备 beta 环境 +ssh clicodeplus + +# 克隆代码(仅用于 deploy 配置和版本号确认,不在此构建) +cd /root +git clone https://github.com/touwaeriol/sub2api.git sub2api-beta +cd /root/sub2api-beta +git checkout release/custom-0.1.71 + +# 4) 准备 beta 的 .env(敏感信息只写这里) +cd /root/sub2api-beta/deploy + +# 推荐:从现网 .env 复制,保证除 DB 名/用户/端口外完全一致 +cp -f /root/sub2api/deploy/.env ./.env + +# 仅修改以下三项(其他保持不变) +perl -pi -e 's/^SERVER_PORT=.*/SERVER_PORT=8084/' ./.env +perl -pi -e 's/^POSTGRES_USER=.*/POSTGRES_USER=beta/' ./.env +perl -pi -e 's/^POSTGRES_DB=.*/POSTGRES_DB=beta/' ./.env + +# 5) 写 compose override(避免与现网容器名冲突,镜像使用本机构建的 sub2api:beta,Redis 使用外部服务) +cat > docker-compose.override.yml <<'YAML' +services: + sub2api: + image: sub2api:beta + container_name: sub2api-beta + environment: + - DATABASE_HOST=${DATABASE_HOST:-postgres} + - DATABASE_SSLMODE=${DATABASE_SSLMODE:-disable} + - REDIS_HOST=db.clicodeplus.com + depends_on: !reset {} + redis: + profiles: + - disabled +YAML + +# 6) 启动 beta(独立 project,确保不影响现网) +cd /root/sub2api-beta/deploy +docker compose -p sub2api-beta --env-file .env -f docker-compose.yml -f docker-compose.override.yml up -d + +# 7) 验证 beta +curl -fsS http://127.0.0.1:8084/health +docker logs sub2api-beta --tail 50 +``` + +### 数据库配置约定(beta) + +- 数据库地址/SSL/密码:与现网一致(从现网 `.env` 复制即可),均指向 `db.clicodeplus.com`。 +- 仅修改: + - `POSTGRES_USER=beta` + - `POSTGRES_DB=beta` + - `REDIS_DB=2` + +注意:需要数据库侧已存在 `beta` 用户与 `beta` 数据库,并授予权限;否则容器会启动失败并不断重启。 + +### 更新 beta(本机构建 + 仅重启 beta 容器) + +```bash +# 1) 生产服务器拉取代码并构建镜像 +ssh clicodeplus "cd /root/sub2api-beta && git fetch --all --tags && git checkout -f release/custom-0.1.71 && git reset --hard origin/release/custom-0.1.71" +ssh clicodeplus "cd /root/sub2api-beta && docker buildx build --builder limited-builder --no-cache --load -t sub2api:beta -f Dockerfile ." 
+# ⚠️ 必须看到构建成功输出 + +# 2) 重启 beta 容器并验证 +ssh clicodeplus "cd /root/sub2api-beta/deploy && docker compose -p sub2api-beta --env-file .env -f docker-compose.yml -f docker-compose.override.yml up -d --no-deps --force-recreate sub2api" +ssh clicodeplus "sleep 5 && curl -fsS http://127.0.0.1:8084/health" +ssh clicodeplus "cat /root/sub2api-beta/backend/cmd/server/VERSION" +``` + +### 停止/回滚 beta(只影响 beta) + +```bash +ssh clicodeplus "cd /root/sub2api-beta/deploy && docker compose -p sub2api-beta -f docker-compose.yml -f docker-compose.override.yml down" +``` + +--- + +## 服务器首次部署 + +### 1. 生产服务器:克隆代码并配置环境 + +```bash +ssh clicodeplus +cd /root +git clone https://github.com/Wei-Shaw/sub2api.git +cd sub2api + +# 添加 fork 仓库 +git remote add fork https://github.com/touwaeriol/sub2api.git +git fetch fork +git checkout -B release/custom-0.1.69 fork/release/custom-0.1.69 + +# 配置环境变量 +cd deploy +cp .env.example .env +vim .env # 配置 DATABASE_HOST=db.clicodeplus.com, POSTGRES_PASSWORD, REDIS_PASSWORD, JWT_SECRET 等 + +# 创建 override 文件(Redis 指向外部服务,去掉容器 Redis 依赖) +cat > docker-compose.override.yml <<'YAML' +services: + sub2api: + environment: + - REDIS_HOST=db.clicodeplus.com + depends_on: !reset {} + redis: + profiles: + - disabled +YAML +``` + +### 2. 生产服务器:创建构建器并构建镜像 + +```bash +# 创建资源限制的构建器(首次执行一次即可) +docker buildx create --name limited-builder --driver docker-container --driver-opt "default-load=true" +docker buildx inspect --builder limited-builder --bootstrap +docker update --cpus=3 --memory=4g --memory-swap=4g buildx_buildkit_limited-builder0 + +# 构建镜像 +cd /root/sub2api +docker buildx build --builder limited-builder --no-cache --load -t sub2api:latest -f Dockerfile . + +# 更新镜像标签并启动 +docker tag sub2api:latest weishaw/sub2api:latest +cd /root/sub2api/deploy && docker compose up -d +``` + +### 3. 验证部署 + +```bash +# 查看应用日志 +docker logs sub2api --tail 50 + +# 检查健康状态 +curl http://localhost:8080/health + +# 确认版本号 +cat /root/sub2api/backend/cmd/server/VERSION +``` + +### 4. 
常用运维命令 + +```bash +# 查看实时日志 +docker logs -f sub2api + +# 重启服务 +docker compose restart sub2api + +# 停止所有服务 +docker compose down + +# 停止并删除数据卷(慎用!会删除数据库数据) +docker compose down -v + +# 查看资源使用情况 +docker stats sub2api +``` + +--- + +## 定制功能说明 + +当前定制分支包含以下功能(相对于官方版本): + +### UI/UX 定制 + +| 功能 | 说明 | +|------|------| +| 首页优化 | 面向用户的价值主张设计 | +| 移除 GitHub 链接 | 用户菜单中不显示 GitHub 导航 | +| 微信客服按钮 | 首页悬浮微信客服入口 | +| 限流时间精确显示 | 账号限流时间显示精确到秒 | + +### Antigravity 平台增强 + +| 功能 | 说明 | +|------|------| +| Scope 级别限流 | 按配额域(claude/gemini_text/gemini_image)独立限流,避免整个账号被锁定 | +| 模型级别限流 | 按具体模型(如 claude-opus-4-5)独立限流,更精细的限流控制 | +| 限流预检查 | 调度时预检查账号/模型限流状态,避免选中已限流账号 | +| 秒级冷却时间 | 支持 429 响应的秒级精确冷却时间 | +| 身份注入优化 | 模型身份信息注入 + 静默边界防止身份泄露 | +| thoughtSignature 修复 | Gemini 3 函数调用 400 错误修复 | +| max_tokens 自动修正 | 自动修正 max_tokens <= budget_tokens 导致的 400 错误 | + +### 调度算法优化 + +| 功能 | 说明 | +|------|------| +| 分层过滤选择 | 调度算法从全排序改为分层过滤,提升性能 | +| LRU 随机选择 | 相同 LRU 时间时随机选择,避免账号集中 | +| 限流等待阈值配置化 | 可配置的限流等待阈值 | + +### 运维增强 + +| 功能 | 说明 | +|------|------| +| Scope 限流统计 | 运维界面展示 Antigravity 账号 scope 级别限流统计 | +| 账号限流状态显示 | 账号列表显示 scope 和模型级别限流状态 | +| 清除限流按钮增强 | 有 scope/模型限流时也显示清除限流按钮 | + +### 其他修复 + +| 功能 | 说明 | +|------|------| +| .gitattributes | 确保迁移文件使用 LF 换行符(解决 Windows 下 SQL 摘要不一致) | +| 部署配置优化 | DATABASE_HOST 和 DATABASE_SSLMODE 可通过 .env 配置 | + +--- + +## Admin API 接口文档 + +### ⚠️ API 操作流程规范 + +当收到操作正式环境 Web 界面的新需求,但文档中未记录对应 API 接口时,**必须按以下流程执行**: + +1. **探索接口**:通过代码库搜索路由定义(`backend/internal/server/routes/`)、Handler(`backend/internal/handler/admin/`)和请求结构体,确定正确的 API 端点、请求方法、请求体格式 +2. **更新文档**:将新发现的接口补充到本文档的 Admin API 接口文档章节中,包含端点、参数说明和 curl 示例 +3. 
**执行操作**:根据最新文档中记录的接口完成用户需求 + +> **目的**:避免每次遇到相同需求都重复探索代码库,确保 API 文档持续完善,后续操作可直接查阅文档执行。 + +--- + +### 认证方式 + +所有 Admin API 通过 `x-api-key` 请求头传递 Admin API Key 认证。 + +``` +x-api-key: admin-xxx +``` + +> **使用说明**:Admin API Key 统一存放在项目根目录 `.env` 文件的 `ADMIN_API_KEY` 变量中(该文件已被 `.gitignore` 排除,不会提交到代码库)。操作前先从 `.env` 读取密钥;若密钥失效(返回 401),应提示用户提供新的密钥并更新到 `.env` 中。Token 格式为 `admin-` + 64 位十六进制字符,在管理后台 `设置 > Admin API Key` 中生成。**请勿将实际 token 写入文档或代码中。** + +### 环境地址 + +| 环境 | 基础地址 | 说明 | +|------|----------|------| +| 正式 | `https://clicodeplus.com` | 生产环境 | +| Beta | `http://<服务器IP>:8084` | 仅内网访问 | +| OpenAI | `http://<服务器IP>:8083` | 仅内网访问 | +| Star | `https://hyntoken.com` | 独立环境 | + +> 以下接口文档中,`${BASE}` 代表环境基础地址,`${KEY}` 代表 `.env` 中的 `ADMIN_API_KEY`。操作前执行 `source .env` 或 `export KEY=$ADMIN_API_KEY` 加载。 + +--- + +### 1. 账号管理 + +#### 1.1 获取账号列表 + +``` +GET /api/v1/admin/accounts +``` + +**查询参数**: + +| 参数 | 类型 | 必填 | 说明 | +|------|------|------|------| +| `platform` | string | 否 | 平台筛选:`antigravity` / `anthropic` / `openai` / `gemini` | +| `type` | string | 否 | 账号类型:`oauth` / `api_key` / `cookie` | +| `status` | string | 否 | 状态:`active` / `disabled` / `error` | +| `search` | string | 否 | 搜索关键词(名称、备注) | +| `page` | int | 否 | 页码,默认 1 | +| `page_size` | int | 否 | 每页数量,默认 20 | + +```bash +curl -s "${BASE}/api/v1/admin/accounts?platform=antigravity&page=1&page_size=100" \ + -H "x-api-key: ${KEY}" +``` + +**响应**: +```json +{ + "code": 0, + "message": "success", + "data": { + "items": [{"id": 1, "name": "xxx@gmail.com", "platform": "antigravity", "status": "active", ...}], + "total": 66 + } +} +``` + +#### 1.2 获取账号详情 + +``` +GET /api/v1/admin/accounts/:id +``` + +```bash +curl -s "${BASE}/api/v1/admin/accounts/1" -H "x-api-key: ${KEY}" +``` + +#### 1.3 测试账号连接 + +``` +POST /api/v1/admin/accounts/:id/test +``` + +**请求体**(JSON,可选): + +| 字段 | 类型 | 必填 | 说明 | +|------|------|------|------| +| `model_id` | string | 否 | 指定测试模型,如 `claude-opus-4-6`;不传则使用默认模型 | + +**响应格式**:SSE(Server-Sent Events)流 
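+
+> 示例(假设性脚本,事件字段以下方「SSE 事件类型」表为准):将测试接口的 SSE 输出保存为 `response.txt` 后,可按最后一个 `test_end` 事件判断测试结果:
+
+```bash
+# 以最后一个 test_end 事件的 success 字段为准
+grep '"type":"test_end"' response.txt | tail -1 | grep -q '"success":true' \
+  && echo "测试成功" || echo "测试失败"
+```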
+ +```bash +curl -N -X POST "${BASE}/api/v1/admin/accounts/1/test" \ + -H "x-api-key: ${KEY}" \ + -H "Content-Type: application/json" \ + -d '{"model_id": "claude-opus-4-6"}' +``` + +**SSE 事件类型**: + +| type | 字段 | 说明 | +|------|------|------| +| `test_start` | `model` | 测试开始,返回测试模型名 | +| `content` | `text` | 模型响应内容(流式文本片段) | +| `test_end` | `success`, `error` | 测试结束,`success=true` 表示成功 | +| `error` | `text` | 错误信息 | + +#### 1.4 清除账号限流 + +``` +POST /api/v1/admin/accounts/:id/clear-rate-limit +``` + +```bash +curl -X POST "${BASE}/api/v1/admin/accounts/1/clear-rate-limit" \ + -H "x-api-key: ${KEY}" +``` + +#### 1.5 清除账号错误状态 + +``` +POST /api/v1/admin/accounts/:id/clear-error +``` + +```bash +curl -X POST "${BASE}/api/v1/admin/accounts/1/clear-error" \ + -H "x-api-key: ${KEY}" +``` + +#### 1.6 获取账号可用模型 + +``` +GET /api/v1/admin/accounts/:id/models +``` + +```bash +curl -s "${BASE}/api/v1/admin/accounts/1/models" -H "x-api-key: ${KEY}" +``` + +#### 1.7 刷新 OAuth Token + +``` +POST /api/v1/admin/accounts/:id/refresh +``` + +```bash +curl -X POST "${BASE}/api/v1/admin/accounts/1/refresh" -H "x-api-key: ${KEY}" +``` + +#### 1.8 刷新账号等级 + +``` +POST /api/v1/admin/accounts/:id/refresh-tier +``` + +```bash +curl -X POST "${BASE}/api/v1/admin/accounts/1/refresh-tier" -H "x-api-key: ${KEY}" +``` + +#### 1.9 获取账号统计 + +``` +GET /api/v1/admin/accounts/:id/stats +``` + +```bash +curl -s "${BASE}/api/v1/admin/accounts/1/stats" -H "x-api-key: ${KEY}" +``` + +#### 1.10 获取账号用量 + +``` +GET /api/v1/admin/accounts/:id/usage +``` + +```bash +curl -s "${BASE}/api/v1/admin/accounts/1/usage" -H "x-api-key: ${KEY}" +``` + +#### 1.11 更新单个账号 + +``` +PUT /api/v1/admin/accounts/:id +``` + +**请求体**(JSON,所有字段均为可选,仅传需要更新的字段): + +| 字段 | 类型 | 说明 | +|------|------|------| +| `name` | string | 账号名称 | +| `notes` | *string | 备注 | +| `type` | string | 类型:`oauth` / `setup-token` / `apikey` / `upstream` | +| `credentials` | object | 凭证信息 | +| `extra` | object | 额外配置 | +| `proxy_id` | *int64 | 代理 ID | +| 
`concurrency` | *int | 并发数 | +| `priority` | *int | 优先级(默认 50) | +| `rate_multiplier` | *float64 | 速率倍数 | +| `status` | string | 状态:`active` / `inactive` | +| `group_ids` | *[]int64 | 分组 ID 列表 | +| `expires_at` | *int64 | 过期时间戳 | +| `auto_pause_on_expired` | *bool | 过期后自动暂停 | + +> 使用指针类型(`*`)的字段可以区分"未提供"和"设置为零值"。 + +```bash +# 示例:更新账号优先级为 100 +curl -X PUT "${BASE}/api/v1/admin/accounts/1" \ + -H "x-api-key: ${KEY}" \ + -H "Content-Type: application/json" \ + -d '{"priority": 100}' +``` + +#### 1.12 批量更新账号 + +``` +POST /api/v1/admin/accounts/bulk-update +``` + +**请求体**(JSON): + +| 字段 | 类型 | 必填 | 说明 | +|------|------|------|------| +| `account_ids` | []int64 | **是** | 要更新的账号 ID 列表 | +| `priority` | *int | 否 | 优先级 | +| `concurrency` | *int | 否 | 并发数 | +| `rate_multiplier` | *float64 | 否 | 速率倍数 | +| `status` | string | 否 | 状态:`active` / `inactive` / `error` | +| `schedulable` | *bool | 否 | 是否可调度 | +| `group_ids` | *[]int64 | 否 | 分组 ID 列表 | +| `proxy_id` | *int64 | 否 | 代理 ID | +| `credentials` | object | 否 | 凭证信息(批量覆盖) | +| `extra` | object | 否 | 额外配置(批量覆盖) | + +```bash +# 示例:批量设置多个账号优先级为 100 +curl -X POST "${BASE}/api/v1/admin/accounts/bulk-update" \ + -H "x-api-key: ${KEY}" \ + -H "Content-Type: application/json" \ + -d '{"account_ids": [1, 2, 3], "priority": 100}' +``` + +#### 1.13 批量测试账号(脚本) + +批量测试指定平台所有账号的指定模型连通性: + +```bash +# 用户需提供:BASE(环境地址)、KEY(admin token)、MODEL(测试模型) +ACCOUNT_IDS=$(curl -s "${BASE}/api/v1/admin/accounts?platform=antigravity&page=1&page_size=100" \ + -H "x-api-key: ${KEY}" | python3 -c " +import json, sys +data = json.load(sys.stdin) +for item in data['data']['items']: + print(f\"{item['id']}|{item['name']}\") +") + +while IFS='|' read -r ID NAME; do + echo "测试账号 ID=${ID} (${NAME})..." 
+ RESPONSE=$(curl -s --max-time 60 -N \ + -X POST "${BASE}/api/v1/admin/accounts/${ID}/test" \ + -H "x-api-key: ${KEY}" \ + -H "Content-Type: application/json" \ + -d "{\"model_id\": \"${MODEL}\"}" 2>&1) + if echo "$RESPONSE" | grep -q '"success":true'; then + echo " ✅ 成功" + elif echo "$RESPONSE" | grep -q '"type":"content"'; then + echo " ✅ 成功(有内容响应)" + else + ERROR_MSG=$(echo "$RESPONSE" | grep -o '"error":"[^"]*"' | tail -1) + echo " ❌ 失败: ${ERROR_MSG}" + fi +done <<< "$ACCOUNT_IDS" +``` + +--- + +### 2. 运维监控 + +#### 2.1 并发统计 + +``` +GET /api/v1/admin/ops/concurrency +``` + +```bash +curl -s "${BASE}/api/v1/admin/ops/concurrency" -H "x-api-key: ${KEY}" +``` + +#### 2.2 账号可用性 + +``` +GET /api/v1/admin/ops/account-availability +``` + +```bash +curl -s "${BASE}/api/v1/admin/ops/account-availability" -H "x-api-key: ${KEY}" +``` + +#### 2.3 实时流量摘要 + +``` +GET /api/v1/admin/ops/realtime-traffic +``` + +```bash +curl -s "${BASE}/api/v1/admin/ops/realtime-traffic" -H "x-api-key: ${KEY}" +``` + +#### 2.4 请求错误列表 + +``` +GET /api/v1/admin/ops/request-errors +``` + +**查询参数**:`page`、`page_size` + +```bash +curl -s "${BASE}/api/v1/admin/ops/request-errors?page=1&page_size=50" \ + -H "x-api-key: ${KEY}" +``` + +#### 2.5 上游错误列表 + +``` +GET /api/v1/admin/ops/upstream-errors +``` + +```bash +curl -s "${BASE}/api/v1/admin/ops/upstream-errors?page=1&page_size=50" \ + -H "x-api-key: ${KEY}" +``` + +#### 2.6 仪表板概览 + +``` +GET /api/v1/admin/ops/dashboard/overview +``` + +```bash +curl -s "${BASE}/api/v1/admin/ops/dashboard/overview" -H "x-api-key: ${KEY}" +``` + +--- + +### 3. 系统设置 + +#### 3.1 获取系统设置 + +``` +GET /api/v1/admin/settings +``` + +```bash +curl -s "${BASE}/api/v1/admin/settings" -H "x-api-key: ${KEY}" +``` + +#### 3.2 更新系统设置 + +``` +PUT /api/v1/admin/settings +``` + +```bash +curl -X PUT "${BASE}/api/v1/admin/settings" \ + -H "x-api-key: ${KEY}" \ + -H "Content-Type: application/json" \ + -d '{ ... 
}' +``` + +#### 3.3 Admin API Key 状态(脱敏) + +``` +GET /api/v1/admin/settings/admin-api-key +``` + +```bash +curl -s "${BASE}/api/v1/admin/settings/admin-api-key" -H "x-api-key: ${KEY}" +``` + +--- + +### 4. 用户管理 + +#### 4.1 用户列表 + +``` +GET /api/v1/admin/users +``` + +```bash +curl -s "${BASE}/api/v1/admin/users?page=1&page_size=20" -H "x-api-key: ${KEY}" +``` + +#### 4.2 用户详情 + +``` +GET /api/v1/admin/users/:id +``` + +```bash +curl -s "${BASE}/api/v1/admin/users/1" -H "x-api-key: ${KEY}" +``` + +#### 4.3 更新用户余额 + +``` +POST /api/v1/admin/users/:id/balance +``` + +```bash +curl -X POST "${BASE}/api/v1/admin/users/1/balance" \ + -H "x-api-key: ${KEY}" \ + -H "Content-Type: application/json" \ + -d '{"amount": 100, "reason": "充值"}' +``` + +--- + +### 5. 分组管理 + +#### 5.1 分组列表 + +``` +GET /api/v1/admin/groups +``` + +```bash +curl -s "${BASE}/api/v1/admin/groups" -H "x-api-key: ${KEY}" +``` + +#### 5.2 所有分组(不分页) + +``` +GET /api/v1/admin/groups/all +``` + +```bash +curl -s "${BASE}/api/v1/admin/groups/all" -H "x-api-key: ${KEY}" +``` + +--- + +## 注意事项 + +1. **前端必须打包进镜像**:使用 `docker buildx build --builder limited-builder` 在生产服务器(`clicodeplus`)本机构建,Dockerfile 会自动编译前端并 embed 到后端二进制中 + +2. **镜像标签**:docker-compose.yml 使用 `weishaw/sub2api:latest`,本地构建后需要 `docker tag` 覆盖 + +3. **Windows 换行符问题**:已通过 `.gitattributes` 解决,确保 `*.sql` 文件始终使用 LF + +4. **版本号管理**:每次发布必须更新 `backend/cmd/server/VERSION` 并打标签 + +5. **合并冲突**:合并上游新版本时,重点关注以下文件可能的冲突: + - `backend/internal/service/antigravity_gateway_service.go` + - `backend/internal/service/gateway_service.go` + - `backend/internal/pkg/antigravity/request_transformer.go` + +--- + +## Go 代码规范 + +### 1. 函数设计 + +#### 单一职责原则 +- **函数行数**:单个函数常规不应超过 **30 行**,超过时应拆分为子函数。若某段逻辑确实不可拆分(如复杂的状态机、协议解析等),可以例外,但需添加注释说明原因 +- **嵌套层级**:避免超过 3 层嵌套,使用 early return 减少嵌套 + +```go +// ❌ 不推荐:深层嵌套 +func process(data []Item) { + for _, item := range data { + if item.Valid { + if item.Type == "A" { + if item.Status == "active" { + // 业务逻辑... 
+ } + } + } + } +} + +// ✅ 推荐:early return +func process(data []Item) { + for _, item := range data { + if !item.Valid { + continue + } + if item.Type != "A" { + continue + } + if item.Status != "active" { + continue + } + // 业务逻辑... + } +} +``` + +#### 复杂逻辑提取 +将复杂的条件判断或处理逻辑提取为独立函数: + +```go +// ❌ 不推荐:内联复杂逻辑 +if resp.StatusCode == 429 || resp.StatusCode == 503 { + // 80+ 行处理逻辑... +} + +// ✅ 推荐:提取为独立函数 +result := handleRateLimitResponse(resp, params) +switch result.action { +case actionRetry: + continue +case actionBreak: + return result.resp, nil +} +``` + +### 2. 重复代码消除 + +#### 配置获取模式 +将重复的配置获取逻辑提取为方法: + +```go +// ❌ 不推荐:重复代码 +logBody := s.settingService != nil && s.settingService.cfg != nil && s.settingService.cfg.Gateway.LogUpstreamErrorBody +maxBytes := 2048 +if s.settingService != nil && s.settingService.cfg != nil && s.settingService.cfg.Gateway.LogUpstreamErrorBodyMaxBytes > 0 { + maxBytes = s.settingService.cfg.Gateway.LogUpstreamErrorBodyMaxBytes +} + +// ✅ 推荐:提取为方法 +func (s *Service) getLogConfig() (logBody bool, maxBytes int) { + maxBytes = 2048 + if s.settingService == nil || s.settingService.cfg == nil { + return false, maxBytes + } + cfg := s.settingService.cfg.Gateway + if cfg.LogUpstreamErrorBodyMaxBytes > 0 { + maxBytes = cfg.LogUpstreamErrorBodyMaxBytes + } + return cfg.LogUpstreamErrorBody, maxBytes +} +``` + +### 3. 常量管理 + +#### 避免魔法数字 +所有硬编码的数值都应定义为常量: + +```go +// ❌ 不推荐 +if retryDelay >= 10*time.Second { + resetAt := time.Now().Add(30 * time.Second) +} + +// ✅ 推荐 +const ( + rateLimitThreshold = 10 * time.Second + defaultRateLimitDuration = 30 * time.Second +) + +if retryDelay >= rateLimitThreshold { + resetAt := time.Now().Add(defaultRateLimitDuration) +} +``` + +#### 注释引用常量名 +在注释中引用常量名而非硬编码值: + +```go +// ❌ 不推荐 +// < 10s: 等待后重试 + +// ✅ 推荐 +// < rateLimitThreshold: 等待后重试 +``` + +### 4. 
错误处理 + +#### 使用结构化日志 +优先使用 `slog` 进行结构化日志记录: + +```go +// ❌ 不推荐 +log.Printf("%s status=%d model_rate_limit_failed model=%s error=%v", prefix, statusCode, modelName, err) + +// ✅ 推荐 +slog.Error("failed to set model rate limit", + "prefix", prefix, + "status_code", statusCode, + "model", modelName, + "error", err, +) +``` + +### 5. 测试规范 + +#### Mock 函数签名同步 +修改函数签名时,必须同步更新所有测试中的 mock 函数: + +```go +// 如果修改了 handleError 签名 +handleError func(..., groupID int64, sessionHash string) *Result + +// 必须同步更新测试中的 mock +handleError: func(..., groupID int64, sessionHash string) *Result { + return nil +}, +``` + +#### 测试构建标签 +统一使用测试构建标签: + +```go +//go:build unit + +package service +``` + +### 6. 时间格式解析 + +#### 使用标准库 +优先使用 `time.ParseDuration`,支持所有 Go duration 格式: + +```go +// ❌ 不推荐:手动限制格式 +if !strings.HasSuffix(delay, "s") || strings.Contains(delay, "m") { + continue +} + +// ✅ 推荐:使用标准库 +dur, err := time.ParseDuration(delay) // 支持 "0.5s", "4m50s", "1h30m" 等 +``` + +### 7. 接口设计 + +#### 接口隔离原则 +定义最小化接口,只包含必需的方法: + +```go +// ❌ 不推荐:使用过于宽泛的接口 +type AccountRepository interface { + // 20+ 个方法... +} + +// ✅ 推荐:定义最小化接口 +type ModelRateLimiter interface { + SetModelRateLimit(ctx context.Context, id int64, modelKey string, resetAt time.Time) error +} +``` + +### 8. 并发安全 + +#### 共享数据保护 +访问可能被并发修改的数据时,确保线程安全: + +```go +// 如果 Account.Extra 可能被并发修改 +// 需要使用互斥锁或原子操作保护读取 +func (a *Account) GetRateLimitRemainingTime(model string) time.Duration { + a.mu.RLock() + defer a.mu.RUnlock() + // 读取 Extra 字段... +} +``` + +### 9. 命名规范 + +#### 一致的命名风格 +- 常量使用 camelCase:`rateLimitThreshold` +- 类型使用 PascalCase:`AntigravityQuotaScope` +- 同一概念使用统一命名:`Threshold` 或 `Limit`,不要混用 + +```go +// ❌ 不推荐:命名不一致 +antigravitySmartRetryMinWait // 使用 Min +antigravityRateLimitThreshold // 使用 Threshold + +// ✅ 推荐:统一风格 +antigravityMinRetryWait +antigravityRateLimitThreshold +``` + +### 10. 代码审查清单 + +在提交代码前,检查以下项目: + +- [ ] 函数是否超过 30 行?(不可拆分的逻辑除外,需注释说明) +- [ ] 嵌套是否超过 3 层? +- [ ] 是否有重复代码可以提取? +- [ ] 是否使用了魔法数字? 
+- [ ] Mock 函数签名是否与实际函数一致?
+- [ ] 测试是否覆盖了新增逻辑?
+- [ ] 日志是否包含足够的上下文信息?
+- [ ] 是否考虑了并发安全?
+
+---
+
+## CI 检查与发布门禁
+
+### GitHub Actions 检查项
+
+本项目有 4 个 CI 任务,**任何代码推送或发布前都必须全部通过**:
+
+| Workflow | Job | 说明 | 本地验证命令 |
+|----------|-----|------|-------------|
+| CI | `test` | 单元测试 + 集成测试 | `cd backend && make test-unit && make test-integration` |
+| CI | `golangci-lint` | Go 代码静态检查(golangci-lint v2.7) | `cd backend && golangci-lint run --timeout=5m` |
+| Security Scan | `backend-security` | govulncheck + gosec 安全扫描 | `cd backend && govulncheck ./... && gosec -severity high -confidence high ./...` |
+| Security Scan | `frontend-security` | pnpm audit 前端依赖安全检查 | `cd frontend && pnpm audit --prod --audit-level=high` |
+
+### 向上游提交 PR
+
+PR 目标是上游官方仓库,**只包含通用功能改动**(bug fix、新功能、性能优化等)。
+
+**以下文件禁止出现在 PR 中**(属于我们 fork 的定制化内容):
+- `CLAUDE.md`、`AGENTS.md` — 我们的开发文档
+- `backend/cmd/server/VERSION` — 我们的版本号文件
+- UI 定制改动(GitHub 链接移除、微信客服按钮、首页定制等)
+- 部署配置(`deploy/` 目录下的定制修改)
+
+**PR 流程**:
+1. 从 `develop` 创建功能分支,只包含要提交给上游的改动
+2. 推送分支后,**等待 4 个 CI job 全部通过**
+3. 确认通过后再创建 PR
+4. 使用 `gh run list --repo touwaeriol/sub2api --branch <分支名>` 检查状态
+
+### 自有分支推送(develop / main)
+
+推送到我们自己的 `develop` 或 `main` 分支时,包含所有改动(定制化 + 通用功能)。
+
+**推送前必须在本地执行全部 CI 检查**(不要等 GitHub Actions):
+
+```bash
+# 确保 Go 工具链可用(macOS homebrew)
+export PATH="/opt/homebrew/bin:$HOME/go/bin:$PATH"
+
+# 1. 单元测试(必须)
+cd backend && make test-unit
+
+# 2. 集成测试(推荐,需要 Docker)
+make test-integration
+
+# 3. golangci-lint 静态检查(必须)
+golangci-lint run --timeout=5m
+
+# 4. gofmt 格式检查(必须)
+# 注意:gofmt 接受文件/目录路径,不支持 ./... 包通配符
+gofmt -l .
+# 如果有输出,运行 gofmt -w <文件> 修复
+```
+
+**推送后确认**:
+1. 使用 `gh run list --repo touwaeriol/sub2api --branch <分支名>` 检查 GitHub Actions 状态
+2. 确认 CI 和 Security Scan 两个 workflow 的 4 个 job 全部绿色 ✅
+3. 任何 job 失败必须立即修复,**禁止在 CI 未通过的状态下继续后续操作**
+
+### 发布版本
+
+1. 本地执行上述全部 CI 检查通过
+2. 递增 `backend/cmd/server/VERSION`,提交并推送
+3. 推送后确认 GitHub Actions 的 4 个 CI job 全部通过
+4. **CI 未通过时禁止部署** — 必须先修复问题
+5. 
使用 `gh run list --repo touwaeriol/sub2api --limit 10` 确认状态 + +### 常见 CI 失败原因及修复 +- **gofmt**:struct 字段对齐不一致 → 运行 `gofmt -w ` 修复 +- **golangci-lint**:未使用的变量/导入 → 删除或使用 `_` 忽略 +- **test 失败**:mock 函数签名不一致 → 同步更新 mock +- **gosec**:安全漏洞 → 根据提示修复或添加例外 + +--- + +## PR 描述格式规范 + +所有 PR 描述使用中英文同步(先中文、后英文),包含以下三个部分: + +### 模板 + +```markdown +## 背景 / Background + +<一两句说明问题现状或触发原因> + + + +--- + +## 目的 / Purpose + +<本次改动要解决的问题或达到的目标> + + + +--- + +## 改动内容 / Changes + +### 后端 / Backend + +- **改动点 1**:说明 +- **改动点 2**:说明 + +--- + +- **Change 1**: description +- **Change 2**: description + +### 前端 / Frontend + +- **改动点 1**:说明 +- **改动点 2**:说明 + +--- + +- **Change 1**: description +- **Change 2**: description + +--- + +## 截图 / Screenshot(可选) + +ASCII 示意图或实际截图 +``` + +### 规范要点 + +- **标题**:使用 conventional commits 格式,如 `feat(scope): description` +- **中英文顺序**:同一段落先中文后英文,用空行分隔,不用 `---` 分割同段内容 +- **改动分类**:按 Backend / Frontend / Config 等模块分组,先列中文要点再列英文要点 +- **截图/示意图**:有 UI 变动时必须附上,可用 ASCII 示意布局 +- **目标分支**:提交到 `touwaeriol/sub2api` 的 `main` 分支 diff --git a/backend/cmd/server/VERSION b/backend/cmd/server/VERSION index 32844913..592fd9b9 100644 --- a/backend/cmd/server/VERSION +++ b/backend/cmd/server/VERSION @@ -1 +1 @@ -0.1.88 \ No newline at end of file +0.1.90.9 diff --git a/backend/ent/account.go b/backend/ent/account.go index c77002b3..2dbfc3a2 100644 --- a/backend/ent/account.go +++ b/backend/ent/account.go @@ -41,6 +41,8 @@ type Account struct { ProxyID *int64 `json:"proxy_id,omitempty"` // Concurrency holds the value of the "concurrency" field. Concurrency int `json:"concurrency,omitempty"` + // LoadFactor holds the value of the "load_factor" field. + LoadFactor *int `json:"load_factor,omitempty"` // Priority holds the value of the "priority" field. Priority int `json:"priority,omitempty"` // RateMultiplier holds the value of the "rate_multiplier" field. 
@@ -143,7 +145,7 @@ func (*Account) scanValues(columns []string) ([]any, error) { values[i] = new(sql.NullBool) case account.FieldRateMultiplier: values[i] = new(sql.NullFloat64) - case account.FieldID, account.FieldProxyID, account.FieldConcurrency, account.FieldPriority: + case account.FieldID, account.FieldProxyID, account.FieldConcurrency, account.FieldLoadFactor, account.FieldPriority: values[i] = new(sql.NullInt64) case account.FieldName, account.FieldNotes, account.FieldPlatform, account.FieldType, account.FieldStatus, account.FieldErrorMessage, account.FieldTempUnschedulableReason, account.FieldSessionWindowStatus: values[i] = new(sql.NullString) @@ -243,6 +245,13 @@ func (_m *Account) assignValues(columns []string, values []any) error { } else if value.Valid { _m.Concurrency = int(value.Int64) } + case account.FieldLoadFactor: + if value, ok := values[i].(*sql.NullInt64); !ok { + return fmt.Errorf("unexpected type %T for field load_factor", values[i]) + } else if value.Valid { + _m.LoadFactor = new(int) + *_m.LoadFactor = int(value.Int64) + } case account.FieldPriority: if value, ok := values[i].(*sql.NullInt64); !ok { return fmt.Errorf("unexpected type %T for field priority", values[i]) @@ -445,6 +454,11 @@ func (_m *Account) String() string { builder.WriteString("concurrency=") builder.WriteString(fmt.Sprintf("%v", _m.Concurrency)) builder.WriteString(", ") + if v := _m.LoadFactor; v != nil { + builder.WriteString("load_factor=") + builder.WriteString(fmt.Sprintf("%v", *v)) + } + builder.WriteString(", ") builder.WriteString("priority=") builder.WriteString(fmt.Sprintf("%v", _m.Priority)) builder.WriteString(", ") diff --git a/backend/ent/account/account.go b/backend/ent/account/account.go index 1fc34620..4c134649 100644 --- a/backend/ent/account/account.go +++ b/backend/ent/account/account.go @@ -37,6 +37,8 @@ const ( FieldProxyID = "proxy_id" // FieldConcurrency holds the string denoting the concurrency field in the database. 
FieldConcurrency = "concurrency" + // FieldLoadFactor holds the string denoting the load_factor field in the database. + FieldLoadFactor = "load_factor" // FieldPriority holds the string denoting the priority field in the database. FieldPriority = "priority" // FieldRateMultiplier holds the string denoting the rate_multiplier field in the database. @@ -121,6 +123,7 @@ var Columns = []string{ FieldExtra, FieldProxyID, FieldConcurrency, + FieldLoadFactor, FieldPriority, FieldRateMultiplier, FieldStatus, @@ -250,6 +253,11 @@ func ByConcurrency(opts ...sql.OrderTermOption) OrderOption { return sql.OrderByField(FieldConcurrency, opts...).ToFunc() } +// ByLoadFactor orders the results by the load_factor field. +func ByLoadFactor(opts ...sql.OrderTermOption) OrderOption { + return sql.OrderByField(FieldLoadFactor, opts...).ToFunc() +} + // ByPriority orders the results by the priority field. func ByPriority(opts ...sql.OrderTermOption) OrderOption { return sql.OrderByField(FieldPriority, opts...).ToFunc() diff --git a/backend/ent/account/where.go b/backend/ent/account/where.go index 54db1dcb..3749b45c 100644 --- a/backend/ent/account/where.go +++ b/backend/ent/account/where.go @@ -100,6 +100,11 @@ func Concurrency(v int) predicate.Account { return predicate.Account(sql.FieldEQ(FieldConcurrency, v)) } +// LoadFactor applies equality check predicate on the "load_factor" field. It's identical to LoadFactorEQ. +func LoadFactor(v int) predicate.Account { + return predicate.Account(sql.FieldEQ(FieldLoadFactor, v)) +} + // Priority applies equality check predicate on the "priority" field. It's identical to PriorityEQ. func Priority(v int) predicate.Account { return predicate.Account(sql.FieldEQ(FieldPriority, v)) @@ -650,6 +655,56 @@ func ConcurrencyLTE(v int) predicate.Account { return predicate.Account(sql.FieldLTE(FieldConcurrency, v)) } +// LoadFactorEQ applies the EQ predicate on the "load_factor" field. 
+func LoadFactorEQ(v int) predicate.Account { + return predicate.Account(sql.FieldEQ(FieldLoadFactor, v)) +} + +// LoadFactorNEQ applies the NEQ predicate on the "load_factor" field. +func LoadFactorNEQ(v int) predicate.Account { + return predicate.Account(sql.FieldNEQ(FieldLoadFactor, v)) +} + +// LoadFactorIn applies the In predicate on the "load_factor" field. +func LoadFactorIn(vs ...int) predicate.Account { + return predicate.Account(sql.FieldIn(FieldLoadFactor, vs...)) +} + +// LoadFactorNotIn applies the NotIn predicate on the "load_factor" field. +func LoadFactorNotIn(vs ...int) predicate.Account { + return predicate.Account(sql.FieldNotIn(FieldLoadFactor, vs...)) +} + +// LoadFactorGT applies the GT predicate on the "load_factor" field. +func LoadFactorGT(v int) predicate.Account { + return predicate.Account(sql.FieldGT(FieldLoadFactor, v)) +} + +// LoadFactorGTE applies the GTE predicate on the "load_factor" field. +func LoadFactorGTE(v int) predicate.Account { + return predicate.Account(sql.FieldGTE(FieldLoadFactor, v)) +} + +// LoadFactorLT applies the LT predicate on the "load_factor" field. +func LoadFactorLT(v int) predicate.Account { + return predicate.Account(sql.FieldLT(FieldLoadFactor, v)) +} + +// LoadFactorLTE applies the LTE predicate on the "load_factor" field. +func LoadFactorLTE(v int) predicate.Account { + return predicate.Account(sql.FieldLTE(FieldLoadFactor, v)) +} + +// LoadFactorIsNil applies the IsNil predicate on the "load_factor" field. +func LoadFactorIsNil() predicate.Account { + return predicate.Account(sql.FieldIsNull(FieldLoadFactor)) +} + +// LoadFactorNotNil applies the NotNil predicate on the "load_factor" field. +func LoadFactorNotNil() predicate.Account { + return predicate.Account(sql.FieldNotNull(FieldLoadFactor)) +} + // PriorityEQ applies the EQ predicate on the "priority" field. 
func PriorityEQ(v int) predicate.Account { return predicate.Account(sql.FieldEQ(FieldPriority, v)) diff --git a/backend/ent/account_create.go b/backend/ent/account_create.go index 963ffee8..d6046c79 100644 --- a/backend/ent/account_create.go +++ b/backend/ent/account_create.go @@ -139,6 +139,20 @@ func (_c *AccountCreate) SetNillableConcurrency(v *int) *AccountCreate { return _c } +// SetLoadFactor sets the "load_factor" field. +func (_c *AccountCreate) SetLoadFactor(v int) *AccountCreate { + _c.mutation.SetLoadFactor(v) + return _c +} + +// SetNillableLoadFactor sets the "load_factor" field if the given value is not nil. +func (_c *AccountCreate) SetNillableLoadFactor(v *int) *AccountCreate { + if v != nil { + _c.SetLoadFactor(*v) + } + return _c +} + // SetPriority sets the "priority" field. func (_c *AccountCreate) SetPriority(v int) *AccountCreate { _c.mutation.SetPriority(v) @@ -623,6 +637,10 @@ func (_c *AccountCreate) createSpec() (*Account, *sqlgraph.CreateSpec) { _spec.SetField(account.FieldConcurrency, field.TypeInt, value) _node.Concurrency = value } + if value, ok := _c.mutation.LoadFactor(); ok { + _spec.SetField(account.FieldLoadFactor, field.TypeInt, value) + _node.LoadFactor = &value + } if value, ok := _c.mutation.Priority(); ok { _spec.SetField(account.FieldPriority, field.TypeInt, value) _node.Priority = value @@ -936,6 +954,30 @@ func (u *AccountUpsert) AddConcurrency(v int) *AccountUpsert { return u } +// SetLoadFactor sets the "load_factor" field. +func (u *AccountUpsert) SetLoadFactor(v int) *AccountUpsert { + u.Set(account.FieldLoadFactor, v) + return u +} + +// UpdateLoadFactor sets the "load_factor" field to the value that was provided on create. +func (u *AccountUpsert) UpdateLoadFactor() *AccountUpsert { + u.SetExcluded(account.FieldLoadFactor) + return u +} + +// AddLoadFactor adds v to the "load_factor" field. 
+func (u *AccountUpsert) AddLoadFactor(v int) *AccountUpsert { + u.Add(account.FieldLoadFactor, v) + return u +} + +// ClearLoadFactor clears the value of the "load_factor" field. +func (u *AccountUpsert) ClearLoadFactor() *AccountUpsert { + u.SetNull(account.FieldLoadFactor) + return u +} + // SetPriority sets the "priority" field. func (u *AccountUpsert) SetPriority(v int) *AccountUpsert { u.Set(account.FieldPriority, v) @@ -1419,6 +1461,34 @@ func (u *AccountUpsertOne) UpdateConcurrency() *AccountUpsertOne { }) } +// SetLoadFactor sets the "load_factor" field. +func (u *AccountUpsertOne) SetLoadFactor(v int) *AccountUpsertOne { + return u.Update(func(s *AccountUpsert) { + s.SetLoadFactor(v) + }) +} + +// AddLoadFactor adds v to the "load_factor" field. +func (u *AccountUpsertOne) AddLoadFactor(v int) *AccountUpsertOne { + return u.Update(func(s *AccountUpsert) { + s.AddLoadFactor(v) + }) +} + +// UpdateLoadFactor sets the "load_factor" field to the value that was provided on create. +func (u *AccountUpsertOne) UpdateLoadFactor() *AccountUpsertOne { + return u.Update(func(s *AccountUpsert) { + s.UpdateLoadFactor() + }) +} + +// ClearLoadFactor clears the value of the "load_factor" field. +func (u *AccountUpsertOne) ClearLoadFactor() *AccountUpsertOne { + return u.Update(func(s *AccountUpsert) { + s.ClearLoadFactor() + }) +} + // SetPriority sets the "priority" field. func (u *AccountUpsertOne) SetPriority(v int) *AccountUpsertOne { return u.Update(func(s *AccountUpsert) { @@ -2113,6 +2183,34 @@ func (u *AccountUpsertBulk) UpdateConcurrency() *AccountUpsertBulk { }) } +// SetLoadFactor sets the "load_factor" field. +func (u *AccountUpsertBulk) SetLoadFactor(v int) *AccountUpsertBulk { + return u.Update(func(s *AccountUpsert) { + s.SetLoadFactor(v) + }) +} + +// AddLoadFactor adds v to the "load_factor" field. 
+func (u *AccountUpsertBulk) AddLoadFactor(v int) *AccountUpsertBulk { + return u.Update(func(s *AccountUpsert) { + s.AddLoadFactor(v) + }) +} + +// UpdateLoadFactor sets the "load_factor" field to the value that was provided on create. +func (u *AccountUpsertBulk) UpdateLoadFactor() *AccountUpsertBulk { + return u.Update(func(s *AccountUpsert) { + s.UpdateLoadFactor() + }) +} + +// ClearLoadFactor clears the value of the "load_factor" field. +func (u *AccountUpsertBulk) ClearLoadFactor() *AccountUpsertBulk { + return u.Update(func(s *AccountUpsert) { + s.ClearLoadFactor() + }) +} + // SetPriority sets the "priority" field. func (u *AccountUpsertBulk) SetPriority(v int) *AccountUpsertBulk { return u.Update(func(s *AccountUpsert) { diff --git a/backend/ent/account_update.go b/backend/ent/account_update.go index 875888e0..6f443c65 100644 --- a/backend/ent/account_update.go +++ b/backend/ent/account_update.go @@ -172,6 +172,33 @@ func (_u *AccountUpdate) AddConcurrency(v int) *AccountUpdate { return _u } +// SetLoadFactor sets the "load_factor" field. +func (_u *AccountUpdate) SetLoadFactor(v int) *AccountUpdate { + _u.mutation.ResetLoadFactor() + _u.mutation.SetLoadFactor(v) + return _u +} + +// SetNillableLoadFactor sets the "load_factor" field if the given value is not nil. +func (_u *AccountUpdate) SetNillableLoadFactor(v *int) *AccountUpdate { + if v != nil { + _u.SetLoadFactor(*v) + } + return _u +} + +// AddLoadFactor adds value to the "load_factor" field. +func (_u *AccountUpdate) AddLoadFactor(v int) *AccountUpdate { + _u.mutation.AddLoadFactor(v) + return _u +} + +// ClearLoadFactor clears the value of the "load_factor" field. +func (_u *AccountUpdate) ClearLoadFactor() *AccountUpdate { + _u.mutation.ClearLoadFactor() + return _u +} + // SetPriority sets the "priority" field. 
func (_u *AccountUpdate) SetPriority(v int) *AccountUpdate { _u.mutation.ResetPriority() @@ -684,6 +711,15 @@ func (_u *AccountUpdate) sqlSave(ctx context.Context) (_node int, err error) { if value, ok := _u.mutation.AddedConcurrency(); ok { _spec.AddField(account.FieldConcurrency, field.TypeInt, value) } + if value, ok := _u.mutation.LoadFactor(); ok { + _spec.SetField(account.FieldLoadFactor, field.TypeInt, value) + } + if value, ok := _u.mutation.AddedLoadFactor(); ok { + _spec.AddField(account.FieldLoadFactor, field.TypeInt, value) + } + if _u.mutation.LoadFactorCleared() { + _spec.ClearField(account.FieldLoadFactor, field.TypeInt) + } if value, ok := _u.mutation.Priority(); ok { _spec.SetField(account.FieldPriority, field.TypeInt, value) } @@ -1063,6 +1099,33 @@ func (_u *AccountUpdateOne) AddConcurrency(v int) *AccountUpdateOne { return _u } +// SetLoadFactor sets the "load_factor" field. +func (_u *AccountUpdateOne) SetLoadFactor(v int) *AccountUpdateOne { + _u.mutation.ResetLoadFactor() + _u.mutation.SetLoadFactor(v) + return _u +} + +// SetNillableLoadFactor sets the "load_factor" field if the given value is not nil. +func (_u *AccountUpdateOne) SetNillableLoadFactor(v *int) *AccountUpdateOne { + if v != nil { + _u.SetLoadFactor(*v) + } + return _u +} + +// AddLoadFactor adds value to the "load_factor" field. +func (_u *AccountUpdateOne) AddLoadFactor(v int) *AccountUpdateOne { + _u.mutation.AddLoadFactor(v) + return _u +} + +// ClearLoadFactor clears the value of the "load_factor" field. +func (_u *AccountUpdateOne) ClearLoadFactor() *AccountUpdateOne { + _u.mutation.ClearLoadFactor() + return _u +} + // SetPriority sets the "priority" field. 
func (_u *AccountUpdateOne) SetPriority(v int) *AccountUpdateOne { _u.mutation.ResetPriority() @@ -1605,6 +1668,15 @@ func (_u *AccountUpdateOne) sqlSave(ctx context.Context) (_node *Account, err er if value, ok := _u.mutation.AddedConcurrency(); ok { _spec.AddField(account.FieldConcurrency, field.TypeInt, value) } + if value, ok := _u.mutation.LoadFactor(); ok { + _spec.SetField(account.FieldLoadFactor, field.TypeInt, value) + } + if value, ok := _u.mutation.AddedLoadFactor(); ok { + _spec.AddField(account.FieldLoadFactor, field.TypeInt, value) + } + if _u.mutation.LoadFactorCleared() { + _spec.ClearField(account.FieldLoadFactor, field.TypeInt) + } if value, ok := _u.mutation.Priority(); ok { _spec.SetField(account.FieldPriority, field.TypeInt, value) } diff --git a/backend/ent/group.go b/backend/ent/group.go index 76c3cae2..84dcccf8 100644 --- a/backend/ent/group.go +++ b/backend/ent/group.go @@ -62,22 +62,24 @@ type Group struct { SoraVideoPricePerRequestHd *float64 `json:"sora_video_price_per_request_hd,omitempty"` // SoraStorageQuotaBytes holds the value of the "sora_storage_quota_bytes" field. 
SoraStorageQuotaBytes int64 `json:"sora_storage_quota_bytes,omitempty"` - // 是否仅允许 Claude Code 客户端 + // allow Claude Code client only ClaudeCodeOnly bool `json:"claude_code_only,omitempty"` - // 非 Claude Code 请求降级使用的分组 ID + // fallback group for non-Claude-Code requests FallbackGroupID *int64 `json:"fallback_group_id,omitempty"` - // 无效请求兜底使用的分组 ID + // fallback group for invalid request FallbackGroupIDOnInvalidRequest *int64 `json:"fallback_group_id_on_invalid_request,omitempty"` - // 模型路由配置:模型模式 -> 优先账号ID列表 + // model routing config: pattern -> account ids ModelRouting map[string][]int64 `json:"model_routing,omitempty"` - // 是否启用模型路由配置 + // whether model routing is enabled ModelRoutingEnabled bool `json:"model_routing_enabled,omitempty"` - // 是否注入 MCP XML 调用协议提示词(仅 antigravity 平台) + // whether MCP XML prompt injection is enabled McpXMLInject bool `json:"mcp_xml_inject,omitempty"` - // 支持的模型系列:claude, gemini_text, gemini_image + // supported model scopes: claude, gemini_text, gemini_image SupportedModelScopes []string `json:"supported_model_scopes,omitempty"` - // 分组显示排序,数值越小越靠前 + // group display order, lower comes first SortOrder int `json:"sort_order,omitempty"` + // simulate claude usage as claude-max style (1h cache write) + SimulateClaudeMaxEnabled bool `json:"simulate_claude_max_enabled,omitempty"` // Edges holds the relations/edges for other nodes in the graph. // The values are being populated by the GroupQuery when eager-loading is set. 
Edges GroupEdges `json:"edges"` @@ -186,7 +188,7 @@ func (*Group) scanValues(columns []string) ([]any, error) { switch columns[i] { case group.FieldModelRouting, group.FieldSupportedModelScopes: values[i] = new([]byte) - case group.FieldIsExclusive, group.FieldClaudeCodeOnly, group.FieldModelRoutingEnabled, group.FieldMcpXMLInject: + case group.FieldIsExclusive, group.FieldClaudeCodeOnly, group.FieldModelRoutingEnabled, group.FieldMcpXMLInject, group.FieldSimulateClaudeMaxEnabled: values[i] = new(sql.NullBool) case group.FieldRateMultiplier, group.FieldDailyLimitUsd, group.FieldWeeklyLimitUsd, group.FieldMonthlyLimitUsd, group.FieldImagePrice1k, group.FieldImagePrice2k, group.FieldImagePrice4k, group.FieldSoraImagePrice360, group.FieldSoraImagePrice540, group.FieldSoraVideoPricePerRequest, group.FieldSoraVideoPricePerRequestHd: values[i] = new(sql.NullFloat64) @@ -415,6 +417,12 @@ func (_m *Group) assignValues(columns []string, values []any) error { } else if value.Valid { _m.SortOrder = int(value.Int64) } + case group.FieldSimulateClaudeMaxEnabled: + if value, ok := values[i].(*sql.NullBool); !ok { + return fmt.Errorf("unexpected type %T for field simulate_claude_max_enabled", values[i]) + } else if value.Valid { + _m.SimulateClaudeMaxEnabled = value.Bool + } default: _m.selectValues.Set(columns[i], values[i]) } @@ -608,6 +616,9 @@ func (_m *Group) String() string { builder.WriteString(", ") builder.WriteString("sort_order=") builder.WriteString(fmt.Sprintf("%v", _m.SortOrder)) + builder.WriteString(", ") + builder.WriteString("simulate_claude_max_enabled=") + builder.WriteString(fmt.Sprintf("%v", _m.SimulateClaudeMaxEnabled)) builder.WriteByte(')') return builder.String() } diff --git a/backend/ent/group/group.go b/backend/ent/group/group.go index 6ac4eea1..640c804f 100644 --- a/backend/ent/group/group.go +++ b/backend/ent/group/group.go @@ -75,6 +75,8 @@ const ( FieldSupportedModelScopes = "supported_model_scopes" // FieldSortOrder holds the string denoting the 
sort_order field in the database. FieldSortOrder = "sort_order" + // FieldSimulateClaudeMaxEnabled holds the string denoting the simulate_claude_max_enabled field in the database. + FieldSimulateClaudeMaxEnabled = "simulate_claude_max_enabled" // EdgeAPIKeys holds the string denoting the api_keys edge name in mutations. EdgeAPIKeys = "api_keys" // EdgeRedeemCodes holds the string denoting the redeem_codes edge name in mutations. @@ -180,6 +182,7 @@ var Columns = []string{ FieldMcpXMLInject, FieldSupportedModelScopes, FieldSortOrder, + FieldSimulateClaudeMaxEnabled, } var ( @@ -247,6 +250,8 @@ var ( DefaultSupportedModelScopes []string // DefaultSortOrder holds the default value on creation for the "sort_order" field. DefaultSortOrder int + // DefaultSimulateClaudeMaxEnabled holds the default value on creation for the "simulate_claude_max_enabled" field. + DefaultSimulateClaudeMaxEnabled bool ) // OrderOption defines the ordering options for the Group queries. @@ -397,6 +402,11 @@ func BySortOrder(opts ...sql.OrderTermOption) OrderOption { return sql.OrderByField(FieldSortOrder, opts...).ToFunc() } +// BySimulateClaudeMaxEnabled orders the results by the simulate_claude_max_enabled field. +func BySimulateClaudeMaxEnabled(opts ...sql.OrderTermOption) OrderOption { + return sql.OrderByField(FieldSimulateClaudeMaxEnabled, opts...).ToFunc() +} + // ByAPIKeysCount orders the results by api_keys count. func ByAPIKeysCount(opts ...sql.OrderTermOption) OrderOption { return func(s *sql.Selector) { diff --git a/backend/ent/group/where.go b/backend/ent/group/where.go index 4cf65d0f..43c24792 100644 --- a/backend/ent/group/where.go +++ b/backend/ent/group/where.go @@ -195,6 +195,11 @@ func SortOrder(v int) predicate.Group { return predicate.Group(sql.FieldEQ(FieldSortOrder, v)) } +// SimulateClaudeMaxEnabled applies equality check predicate on the "simulate_claude_max_enabled" field. It's identical to SimulateClaudeMaxEnabledEQ. 
+func SimulateClaudeMaxEnabled(v bool) predicate.Group { + return predicate.Group(sql.FieldEQ(FieldSimulateClaudeMaxEnabled, v)) +} + // CreatedAtEQ applies the EQ predicate on the "created_at" field. func CreatedAtEQ(v time.Time) predicate.Group { return predicate.Group(sql.FieldEQ(FieldCreatedAt, v)) @@ -1470,6 +1475,16 @@ func SortOrderLTE(v int) predicate.Group { return predicate.Group(sql.FieldLTE(FieldSortOrder, v)) } +// SimulateClaudeMaxEnabledEQ applies the EQ predicate on the "simulate_claude_max_enabled" field. +func SimulateClaudeMaxEnabledEQ(v bool) predicate.Group { + return predicate.Group(sql.FieldEQ(FieldSimulateClaudeMaxEnabled, v)) +} + +// SimulateClaudeMaxEnabledNEQ applies the NEQ predicate on the "simulate_claude_max_enabled" field. +func SimulateClaudeMaxEnabledNEQ(v bool) predicate.Group { + return predicate.Group(sql.FieldNEQ(FieldSimulateClaudeMaxEnabled, v)) +} + // HasAPIKeys applies the HasEdge predicate on the "api_keys" edge. func HasAPIKeys() predicate.Group { return predicate.Group(func(s *sql.Selector) { diff --git a/backend/ent/group_create.go b/backend/ent/group_create.go index 0ce5f959..99669ed3 100644 --- a/backend/ent/group_create.go +++ b/backend/ent/group_create.go @@ -424,6 +424,20 @@ func (_c *GroupCreate) SetNillableSortOrder(v *int) *GroupCreate { return _c } +// SetSimulateClaudeMaxEnabled sets the "simulate_claude_max_enabled" field. +func (_c *GroupCreate) SetSimulateClaudeMaxEnabled(v bool) *GroupCreate { + _c.mutation.SetSimulateClaudeMaxEnabled(v) + return _c +} + +// SetNillableSimulateClaudeMaxEnabled sets the "simulate_claude_max_enabled" field if the given value is not nil. +func (_c *GroupCreate) SetNillableSimulateClaudeMaxEnabled(v *bool) *GroupCreate { + if v != nil { + _c.SetSimulateClaudeMaxEnabled(*v) + } + return _c +} + // AddAPIKeyIDs adds the "api_keys" edge to the APIKey entity by IDs. func (_c *GroupCreate) AddAPIKeyIDs(ids ...int64) *GroupCreate { _c.mutation.AddAPIKeyIDs(ids...) 
@@ -613,6 +627,10 @@ func (_c *GroupCreate) defaults() error { v := group.DefaultSortOrder _c.mutation.SetSortOrder(v) } + if _, ok := _c.mutation.SimulateClaudeMaxEnabled(); !ok { + v := group.DefaultSimulateClaudeMaxEnabled + _c.mutation.SetSimulateClaudeMaxEnabled(v) + } return nil } @@ -683,6 +701,9 @@ func (_c *GroupCreate) check() error { if _, ok := _c.mutation.SortOrder(); !ok { return &ValidationError{Name: "sort_order", err: errors.New(`ent: missing required field "Group.sort_order"`)} } + if _, ok := _c.mutation.SimulateClaudeMaxEnabled(); !ok { + return &ValidationError{Name: "simulate_claude_max_enabled", err: errors.New(`ent: missing required field "Group.simulate_claude_max_enabled"`)} + } return nil } @@ -830,6 +851,10 @@ func (_c *GroupCreate) createSpec() (*Group, *sqlgraph.CreateSpec) { _spec.SetField(group.FieldSortOrder, field.TypeInt, value) _node.SortOrder = value } + if value, ok := _c.mutation.SimulateClaudeMaxEnabled(); ok { + _spec.SetField(group.FieldSimulateClaudeMaxEnabled, field.TypeBool, value) + _node.SimulateClaudeMaxEnabled = value + } if nodes := _c.mutation.APIKeysIDs(); len(nodes) > 0 { edge := &sqlgraph.EdgeSpec{ Rel: sqlgraph.O2M, @@ -1520,6 +1545,18 @@ func (u *GroupUpsert) AddSortOrder(v int) *GroupUpsert { return u } +// SetSimulateClaudeMaxEnabled sets the "simulate_claude_max_enabled" field. +func (u *GroupUpsert) SetSimulateClaudeMaxEnabled(v bool) *GroupUpsert { + u.Set(group.FieldSimulateClaudeMaxEnabled, v) + return u +} + +// UpdateSimulateClaudeMaxEnabled sets the "simulate_claude_max_enabled" field to the value that was provided on create. +func (u *GroupUpsert) UpdateSimulateClaudeMaxEnabled() *GroupUpsert { + u.SetExcluded(group.FieldSimulateClaudeMaxEnabled) + return u +} + // UpdateNewValues updates the mutable fields using the new values that were set on create. 
// Using this option is equivalent to using: // @@ -2188,6 +2225,20 @@ func (u *GroupUpsertOne) UpdateSortOrder() *GroupUpsertOne { }) } +// SetSimulateClaudeMaxEnabled sets the "simulate_claude_max_enabled" field. +func (u *GroupUpsertOne) SetSimulateClaudeMaxEnabled(v bool) *GroupUpsertOne { + return u.Update(func(s *GroupUpsert) { + s.SetSimulateClaudeMaxEnabled(v) + }) +} + +// UpdateSimulateClaudeMaxEnabled sets the "simulate_claude_max_enabled" field to the value that was provided on create. +func (u *GroupUpsertOne) UpdateSimulateClaudeMaxEnabled() *GroupUpsertOne { + return u.Update(func(s *GroupUpsert) { + s.UpdateSimulateClaudeMaxEnabled() + }) +} + // Exec executes the query. func (u *GroupUpsertOne) Exec(ctx context.Context) error { if len(u.create.conflict) == 0 { @@ -3022,6 +3073,20 @@ func (u *GroupUpsertBulk) UpdateSortOrder() *GroupUpsertBulk { }) } +// SetSimulateClaudeMaxEnabled sets the "simulate_claude_max_enabled" field. +func (u *GroupUpsertBulk) SetSimulateClaudeMaxEnabled(v bool) *GroupUpsertBulk { + return u.Update(func(s *GroupUpsert) { + s.SetSimulateClaudeMaxEnabled(v) + }) +} + +// UpdateSimulateClaudeMaxEnabled sets the "simulate_claude_max_enabled" field to the value that was provided on create. +func (u *GroupUpsertBulk) UpdateSimulateClaudeMaxEnabled() *GroupUpsertBulk { + return u.Update(func(s *GroupUpsert) { + s.UpdateSimulateClaudeMaxEnabled() + }) +} + // Exec executes the query. func (u *GroupUpsertBulk) Exec(ctx context.Context) error { if u.create.err != nil { diff --git a/backend/ent/group_update.go b/backend/ent/group_update.go index 85575292..bc460a3b 100644 --- a/backend/ent/group_update.go +++ b/backend/ent/group_update.go @@ -625,6 +625,20 @@ func (_u *GroupUpdate) AddSortOrder(v int) *GroupUpdate { return _u } +// SetSimulateClaudeMaxEnabled sets the "simulate_claude_max_enabled" field. 
+func (_u *GroupUpdate) SetSimulateClaudeMaxEnabled(v bool) *GroupUpdate { + _u.mutation.SetSimulateClaudeMaxEnabled(v) + return _u +} + +// SetNillableSimulateClaudeMaxEnabled sets the "simulate_claude_max_enabled" field if the given value is not nil. +func (_u *GroupUpdate) SetNillableSimulateClaudeMaxEnabled(v *bool) *GroupUpdate { + if v != nil { + _u.SetSimulateClaudeMaxEnabled(*v) + } + return _u +} + // AddAPIKeyIDs adds the "api_keys" edge to the APIKey entity by IDs. func (_u *GroupUpdate) AddAPIKeyIDs(ids ...int64) *GroupUpdate { _u.mutation.AddAPIKeyIDs(ids...) @@ -1110,6 +1124,9 @@ func (_u *GroupUpdate) sqlSave(ctx context.Context) (_node int, err error) { if value, ok := _u.mutation.AddedSortOrder(); ok { _spec.AddField(group.FieldSortOrder, field.TypeInt, value) } + if value, ok := _u.mutation.SimulateClaudeMaxEnabled(); ok { + _spec.SetField(group.FieldSimulateClaudeMaxEnabled, field.TypeBool, value) + } if _u.mutation.APIKeysCleared() { edge := &sqlgraph.EdgeSpec{ Rel: sqlgraph.O2M, @@ -2014,6 +2031,20 @@ func (_u *GroupUpdateOne) AddSortOrder(v int) *GroupUpdateOne { return _u } +// SetSimulateClaudeMaxEnabled sets the "simulate_claude_max_enabled" field. +func (_u *GroupUpdateOne) SetSimulateClaudeMaxEnabled(v bool) *GroupUpdateOne { + _u.mutation.SetSimulateClaudeMaxEnabled(v) + return _u +} + +// SetNillableSimulateClaudeMaxEnabled sets the "simulate_claude_max_enabled" field if the given value is not nil. +func (_u *GroupUpdateOne) SetNillableSimulateClaudeMaxEnabled(v *bool) *GroupUpdateOne { + if v != nil { + _u.SetSimulateClaudeMaxEnabled(*v) + } + return _u +} + // AddAPIKeyIDs adds the "api_keys" edge to the APIKey entity by IDs. func (_u *GroupUpdateOne) AddAPIKeyIDs(ids ...int64) *GroupUpdateOne { _u.mutation.AddAPIKeyIDs(ids...) 
@@ -2529,6 +2560,9 @@ func (_u *GroupUpdateOne) sqlSave(ctx context.Context) (_node *Group, err error) if value, ok := _u.mutation.AddedSortOrder(); ok { _spec.AddField(group.FieldSortOrder, field.TypeInt, value) } + if value, ok := _u.mutation.SimulateClaudeMaxEnabled(); ok { + _spec.SetField(group.FieldSimulateClaudeMaxEnabled, field.TypeBool, value) + } if _u.mutation.APIKeysCleared() { edge := &sqlgraph.EdgeSpec{ Rel: sqlgraph.O2M, diff --git a/backend/ent/migrate/schema.go b/backend/ent/migrate/schema.go index 85e94072..323568e7 100644 --- a/backend/ent/migrate/schema.go +++ b/backend/ent/migrate/schema.go @@ -106,6 +106,7 @@ var ( {Name: "credentials", Type: field.TypeJSON, SchemaType: map[string]string{"postgres": "jsonb"}}, {Name: "extra", Type: field.TypeJSON, SchemaType: map[string]string{"postgres": "jsonb"}}, {Name: "concurrency", Type: field.TypeInt, Default: 3}, + {Name: "load_factor", Type: field.TypeInt, Nullable: true}, {Name: "priority", Type: field.TypeInt, Default: 50}, {Name: "rate_multiplier", Type: field.TypeFloat64, Default: 1, SchemaType: map[string]string{"postgres": "decimal(10,4)"}}, {Name: "status", Type: field.TypeString, Size: 20, Default: "active"}, @@ -132,7 +133,7 @@ var ( ForeignKeys: []*schema.ForeignKey{ { Symbol: "accounts_proxies_proxy", - Columns: []*schema.Column{AccountsColumns[27]}, + Columns: []*schema.Column{AccountsColumns[28]}, RefColumns: []*schema.Column{ProxiesColumns[0]}, OnDelete: schema.SetNull, }, @@ -151,52 +152,52 @@ var ( { Name: "account_status", Unique: false, - Columns: []*schema.Column{AccountsColumns[13]}, + Columns: []*schema.Column{AccountsColumns[14]}, }, { Name: "account_proxy_id", Unique: false, - Columns: []*schema.Column{AccountsColumns[27]}, + Columns: []*schema.Column{AccountsColumns[28]}, }, { Name: "account_priority", Unique: false, - Columns: []*schema.Column{AccountsColumns[11]}, + Columns: []*schema.Column{AccountsColumns[12]}, }, { Name: "account_last_used_at", Unique: false, - Columns: 
[]*schema.Column{AccountsColumns[15]}, + Columns: []*schema.Column{AccountsColumns[16]}, }, { Name: "account_schedulable", Unique: false, - Columns: []*schema.Column{AccountsColumns[18]}, + Columns: []*schema.Column{AccountsColumns[19]}, }, { Name: "account_rate_limited_at", Unique: false, - Columns: []*schema.Column{AccountsColumns[19]}, + Columns: []*schema.Column{AccountsColumns[20]}, }, { Name: "account_rate_limit_reset_at", Unique: false, - Columns: []*schema.Column{AccountsColumns[20]}, + Columns: []*schema.Column{AccountsColumns[21]}, }, { Name: "account_overload_until", Unique: false, - Columns: []*schema.Column{AccountsColumns[21]}, + Columns: []*schema.Column{AccountsColumns[22]}, }, { Name: "account_platform_priority", Unique: false, - Columns: []*schema.Column{AccountsColumns[6], AccountsColumns[11]}, + Columns: []*schema.Column{AccountsColumns[6], AccountsColumns[12]}, }, { Name: "account_priority_status", Unique: false, - Columns: []*schema.Column{AccountsColumns[11], AccountsColumns[13]}, + Columns: []*schema.Column{AccountsColumns[12], AccountsColumns[14]}, }, { Name: "account_deleted_at", @@ -406,6 +407,7 @@ var ( {Name: "mcp_xml_inject", Type: field.TypeBool, Default: true}, {Name: "supported_model_scopes", Type: field.TypeJSON, SchemaType: map[string]string{"postgres": "jsonb"}}, {Name: "sort_order", Type: field.TypeInt, Default: 0}, + {Name: "simulate_claude_max_enabled", Type: field.TypeBool, Default: false}, } // GroupsTable holds the schema information for the "groups" table. 
GroupsTable = &schema.Table{ diff --git a/backend/ent/mutation.go b/backend/ent/mutation.go index 85e2ea71..994bf064 100644 --- a/backend/ent/mutation.go +++ b/backend/ent/mutation.go @@ -2260,6 +2260,8 @@ type AccountMutation struct { extra *map[string]interface{} concurrency *int addconcurrency *int + load_factor *int + addload_factor *int priority *int addpriority *int rate_multiplier *float64 @@ -2845,6 +2847,76 @@ func (m *AccountMutation) ResetConcurrency() { m.addconcurrency = nil } +// SetLoadFactor sets the "load_factor" field. +func (m *AccountMutation) SetLoadFactor(i int) { + m.load_factor = &i + m.addload_factor = nil +} + +// LoadFactor returns the value of the "load_factor" field in the mutation. +func (m *AccountMutation) LoadFactor() (r int, exists bool) { + v := m.load_factor + if v == nil { + return + } + return *v, true +} + +// OldLoadFactor returns the old "load_factor" field's value of the Account entity. +// If the Account object wasn't provided to the builder, the object is fetched from the database. +// An error is returned if the mutation operation is not UpdateOne, or the database query fails. +func (m *AccountMutation) OldLoadFactor(ctx context.Context) (v *int, err error) { + if !m.op.Is(OpUpdateOne) { + return v, errors.New("OldLoadFactor is only allowed on UpdateOne operations") + } + if m.id == nil || m.oldValue == nil { + return v, errors.New("OldLoadFactor requires an ID field in the mutation") + } + oldValue, err := m.oldValue(ctx) + if err != nil { + return v, fmt.Errorf("querying old value for OldLoadFactor: %w", err) + } + return oldValue.LoadFactor, nil +} + +// AddLoadFactor adds i to the "load_factor" field. +func (m *AccountMutation) AddLoadFactor(i int) { + if m.addload_factor != nil { + *m.addload_factor += i + } else { + m.addload_factor = &i + } +} + +// AddedLoadFactor returns the value that was added to the "load_factor" field in this mutation. 
+func (m *AccountMutation) AddedLoadFactor() (r int, exists bool) { + v := m.addload_factor + if v == nil { + return + } + return *v, true +} + +// ClearLoadFactor clears the value of the "load_factor" field. +func (m *AccountMutation) ClearLoadFactor() { + m.load_factor = nil + m.addload_factor = nil + m.clearedFields[account.FieldLoadFactor] = struct{}{} +} + +// LoadFactorCleared returns if the "load_factor" field was cleared in this mutation. +func (m *AccountMutation) LoadFactorCleared() bool { + _, ok := m.clearedFields[account.FieldLoadFactor] + return ok +} + +// ResetLoadFactor resets all changes to the "load_factor" field. +func (m *AccountMutation) ResetLoadFactor() { + m.load_factor = nil + m.addload_factor = nil + delete(m.clearedFields, account.FieldLoadFactor) +} + // SetPriority sets the "priority" field. func (m *AccountMutation) SetPriority(i int) { m.priority = &i @@ -3773,7 +3845,7 @@ func (m *AccountMutation) Type() string { // order to get all numeric fields that were incremented/decremented, call // AddedFields(). 
func (m *AccountMutation) Fields() []string { - fields := make([]string, 0, 27) + fields := make([]string, 0, 28) if m.created_at != nil { fields = append(fields, account.FieldCreatedAt) } @@ -3807,6 +3879,9 @@ func (m *AccountMutation) Fields() []string { if m.concurrency != nil { fields = append(fields, account.FieldConcurrency) } + if m.load_factor != nil { + fields = append(fields, account.FieldLoadFactor) + } if m.priority != nil { fields = append(fields, account.FieldPriority) } @@ -3885,6 +3960,8 @@ func (m *AccountMutation) Field(name string) (ent.Value, bool) { return m.ProxyID() case account.FieldConcurrency: return m.Concurrency() + case account.FieldLoadFactor: + return m.LoadFactor() case account.FieldPriority: return m.Priority() case account.FieldRateMultiplier: @@ -3948,6 +4025,8 @@ func (m *AccountMutation) OldField(ctx context.Context, name string) (ent.Value, return m.OldProxyID(ctx) case account.FieldConcurrency: return m.OldConcurrency(ctx) + case account.FieldLoadFactor: + return m.OldLoadFactor(ctx) case account.FieldPriority: return m.OldPriority(ctx) case account.FieldRateMultiplier: @@ -4066,6 +4145,13 @@ func (m *AccountMutation) SetField(name string, value ent.Value) error { } m.SetConcurrency(v) return nil + case account.FieldLoadFactor: + v, ok := value.(int) + if !ok { + return fmt.Errorf("unexpected type %T for field %s", value, name) + } + m.SetLoadFactor(v) + return nil case account.FieldPriority: v, ok := value.(int) if !ok { @@ -4189,6 +4275,9 @@ func (m *AccountMutation) AddedFields() []string { if m.addconcurrency != nil { fields = append(fields, account.FieldConcurrency) } + if m.addload_factor != nil { + fields = append(fields, account.FieldLoadFactor) + } if m.addpriority != nil { fields = append(fields, account.FieldPriority) } @@ -4205,6 +4294,8 @@ func (m *AccountMutation) AddedField(name string) (ent.Value, bool) { switch name { case account.FieldConcurrency: return m.AddedConcurrency() + case account.FieldLoadFactor: + 
return m.AddedLoadFactor() case account.FieldPriority: return m.AddedPriority() case account.FieldRateMultiplier: @@ -4225,6 +4316,13 @@ func (m *AccountMutation) AddField(name string, value ent.Value) error { } m.AddConcurrency(v) return nil + case account.FieldLoadFactor: + v, ok := value.(int) + if !ok { + return fmt.Errorf("unexpected type %T for field %s", value, name) + } + m.AddLoadFactor(v) + return nil case account.FieldPriority: v, ok := value.(int) if !ok { @@ -4256,6 +4354,9 @@ func (m *AccountMutation) ClearedFields() []string { if m.FieldCleared(account.FieldProxyID) { fields = append(fields, account.FieldProxyID) } + if m.FieldCleared(account.FieldLoadFactor) { + fields = append(fields, account.FieldLoadFactor) + } if m.FieldCleared(account.FieldErrorMessage) { fields = append(fields, account.FieldErrorMessage) } @@ -4312,6 +4413,9 @@ func (m *AccountMutation) ClearField(name string) error { case account.FieldProxyID: m.ClearProxyID() return nil + case account.FieldLoadFactor: + m.ClearLoadFactor() + return nil case account.FieldErrorMessage: m.ClearErrorMessage() return nil @@ -4386,6 +4490,9 @@ func (m *AccountMutation) ResetField(name string) error { case account.FieldConcurrency: m.ResetConcurrency() return nil + case account.FieldLoadFactor: + m.ResetLoadFactor() + return nil case account.FieldPriority: m.ResetPriority() return nil @@ -8089,6 +8196,7 @@ type GroupMutation struct { appendsupported_model_scopes []string sort_order *int addsort_order *int + simulate_claude_max_enabled *bool clearedFields map[string]struct{} api_keys map[int64]struct{} removedapi_keys map[int64]struct{} @@ -9833,6 +9941,42 @@ func (m *GroupMutation) ResetSortOrder() { m.addsort_order = nil } +// SetSimulateClaudeMaxEnabled sets the "simulate_claude_max_enabled" field. 
+func (m *GroupMutation) SetSimulateClaudeMaxEnabled(b bool) { + m.simulate_claude_max_enabled = &b +} + +// SimulateClaudeMaxEnabled returns the value of the "simulate_claude_max_enabled" field in the mutation. +func (m *GroupMutation) SimulateClaudeMaxEnabled() (r bool, exists bool) { + v := m.simulate_claude_max_enabled + if v == nil { + return + } + return *v, true +} + +// OldSimulateClaudeMaxEnabled returns the old "simulate_claude_max_enabled" field's value of the Group entity. +// If the Group object wasn't provided to the builder, the object is fetched from the database. +// An error is returned if the mutation operation is not UpdateOne, or the database query fails. +func (m *GroupMutation) OldSimulateClaudeMaxEnabled(ctx context.Context) (v bool, err error) { + if !m.op.Is(OpUpdateOne) { + return v, errors.New("OldSimulateClaudeMaxEnabled is only allowed on UpdateOne operations") + } + if m.id == nil || m.oldValue == nil { + return v, errors.New("OldSimulateClaudeMaxEnabled requires an ID field in the mutation") + } + oldValue, err := m.oldValue(ctx) + if err != nil { + return v, fmt.Errorf("querying old value for OldSimulateClaudeMaxEnabled: %w", err) + } + return oldValue.SimulateClaudeMaxEnabled, nil +} + +// ResetSimulateClaudeMaxEnabled resets all changes to the "simulate_claude_max_enabled" field. +func (m *GroupMutation) ResetSimulateClaudeMaxEnabled() { + m.simulate_claude_max_enabled = nil +} + // AddAPIKeyIDs adds the "api_keys" edge to the APIKey entity by ids. func (m *GroupMutation) AddAPIKeyIDs(ids ...int64) { if m.api_keys == nil { @@ -10191,7 +10335,7 @@ func (m *GroupMutation) Type() string { // order to get all numeric fields that were incremented/decremented, call // AddedFields(). 
func (m *GroupMutation) Fields() []string { - fields := make([]string, 0, 30) + fields := make([]string, 0, 31) if m.created_at != nil { fields = append(fields, group.FieldCreatedAt) } @@ -10282,6 +10426,9 @@ func (m *GroupMutation) Fields() []string { if m.sort_order != nil { fields = append(fields, group.FieldSortOrder) } + if m.simulate_claude_max_enabled != nil { + fields = append(fields, group.FieldSimulateClaudeMaxEnabled) + } return fields } @@ -10350,6 +10497,8 @@ func (m *GroupMutation) Field(name string) (ent.Value, bool) { return m.SupportedModelScopes() case group.FieldSortOrder: return m.SortOrder() + case group.FieldSimulateClaudeMaxEnabled: + return m.SimulateClaudeMaxEnabled() } return nil, false } @@ -10419,6 +10568,8 @@ func (m *GroupMutation) OldField(ctx context.Context, name string) (ent.Value, e return m.OldSupportedModelScopes(ctx) case group.FieldSortOrder: return m.OldSortOrder(ctx) + case group.FieldSimulateClaudeMaxEnabled: + return m.OldSimulateClaudeMaxEnabled(ctx) } return nil, fmt.Errorf("unknown Group field %s", name) } @@ -10638,6 +10789,13 @@ func (m *GroupMutation) SetField(name string, value ent.Value) error { } m.SetSortOrder(v) return nil + case group.FieldSimulateClaudeMaxEnabled: + v, ok := value.(bool) + if !ok { + return fmt.Errorf("unexpected type %T for field %s", value, name) + } + m.SetSimulateClaudeMaxEnabled(v) + return nil } return fmt.Errorf("unknown Group field %s", name) } @@ -11065,6 +11223,9 @@ func (m *GroupMutation) ResetField(name string) error { case group.FieldSortOrder: m.ResetSortOrder() return nil + case group.FieldSimulateClaudeMaxEnabled: + m.ResetSimulateClaudeMaxEnabled() + return nil } return fmt.Errorf("unknown Group field %s", name) } diff --git a/backend/ent/runtime/runtime.go b/backend/ent/runtime/runtime.go index 2c7467f6..e49a3247 100644 --- a/backend/ent/runtime/runtime.go +++ b/backend/ent/runtime/runtime.go @@ -212,29 +212,29 @@ func init() { // account.DefaultConcurrency holds the default 
value on creation for the concurrency field. account.DefaultConcurrency = accountDescConcurrency.Default.(int) // accountDescPriority is the schema descriptor for priority field. - accountDescPriority := accountFields[8].Descriptor() + accountDescPriority := accountFields[9].Descriptor() // account.DefaultPriority holds the default value on creation for the priority field. account.DefaultPriority = accountDescPriority.Default.(int) // accountDescRateMultiplier is the schema descriptor for rate_multiplier field. - accountDescRateMultiplier := accountFields[9].Descriptor() + accountDescRateMultiplier := accountFields[10].Descriptor() // account.DefaultRateMultiplier holds the default value on creation for the rate_multiplier field. account.DefaultRateMultiplier = accountDescRateMultiplier.Default.(float64) // accountDescStatus is the schema descriptor for status field. - accountDescStatus := accountFields[10].Descriptor() + accountDescStatus := accountFields[11].Descriptor() // account.DefaultStatus holds the default value on creation for the status field. account.DefaultStatus = accountDescStatus.Default.(string) // account.StatusValidator is a validator for the "status" field. It is called by the builders before save. account.StatusValidator = accountDescStatus.Validators[0].(func(string) error) // accountDescAutoPauseOnExpired is the schema descriptor for auto_pause_on_expired field. - accountDescAutoPauseOnExpired := accountFields[14].Descriptor() + accountDescAutoPauseOnExpired := accountFields[15].Descriptor() // account.DefaultAutoPauseOnExpired holds the default value on creation for the auto_pause_on_expired field. account.DefaultAutoPauseOnExpired = accountDescAutoPauseOnExpired.Default.(bool) // accountDescSchedulable is the schema descriptor for schedulable field. 
- accountDescSchedulable := accountFields[15].Descriptor() + accountDescSchedulable := accountFields[16].Descriptor() // account.DefaultSchedulable holds the default value on creation for the schedulable field. account.DefaultSchedulable = accountDescSchedulable.Default.(bool) // accountDescSessionWindowStatus is the schema descriptor for session_window_status field. - accountDescSessionWindowStatus := accountFields[23].Descriptor() + accountDescSessionWindowStatus := accountFields[24].Descriptor() // account.SessionWindowStatusValidator is a validator for the "session_window_status" field. It is called by the builders before save. account.SessionWindowStatusValidator = accountDescSessionWindowStatus.Validators[0].(func(string) error) accountgroupFields := schema.AccountGroup{}.Fields() @@ -447,6 +447,10 @@ func init() { groupDescSortOrder := groupFields[26].Descriptor() // group.DefaultSortOrder holds the default value on creation for the sort_order field. group.DefaultSortOrder = groupDescSortOrder.Default.(int) + // groupDescSimulateClaudeMaxEnabled is the schema descriptor for simulate_claude_max_enabled field. + groupDescSimulateClaudeMaxEnabled := groupFields[27].Descriptor() + // group.DefaultSimulateClaudeMaxEnabled holds the default value on creation for the simulate_claude_max_enabled field. + group.DefaultSimulateClaudeMaxEnabled = groupDescSimulateClaudeMaxEnabled.Default.(bool) idempotencyrecordMixin := schema.IdempotencyRecord{}.Mixin() idempotencyrecordMixinFields0 := idempotencyrecordMixin[0].Fields() _ = idempotencyrecordMixinFields0 diff --git a/backend/ent/schema/account.go b/backend/ent/schema/account.go index 443f9e09..5616d399 100644 --- a/backend/ent/schema/account.go +++ b/backend/ent/schema/account.go @@ -97,6 +97,8 @@ func (Account) Fields() []ent.Field { field.Int("concurrency"). Default(3), + field.Int("load_factor").Optional().Nillable(), + // priority: 账户优先级,数值越小优先级越高 // 调度器会优先使用高优先级的账户 field.Int("priority"). 
diff --git a/backend/ent/schema/group.go b/backend/ent/schema/group.go index 3fcf8674..456e38b2 100644 --- a/backend/ent/schema/group.go +++ b/backend/ent/schema/group.go @@ -33,8 +33,6 @@ func (Group) Mixin() []ent.Mixin { func (Group) Fields() []ent.Field { return []ent.Field{ - // 唯一约束通过部分索引实现(WHERE deleted_at IS NULL),支持软删除后重用 - // 见迁移文件 016_soft_delete_partial_unique_indexes.sql field.String("name"). MaxLen(100). NotEmpty(), @@ -51,7 +49,6 @@ func (Group) Fields() []ent.Field { MaxLen(20). Default(domain.StatusActive), - // Subscription-related fields (added by migration 003) field.String("platform"). MaxLen(50). Default(domain.PlatformAnthropic), @@ -73,7 +70,6 @@ func (Group) Fields() []ent.Field { field.Int("default_validity_days"). Default(30), - // 图片生成计费配置(antigravity 和 gemini 平台使用) field.Float("image_price_1k"). Optional(). Nillable(). @@ -87,7 +83,6 @@ func (Group) Fields() []ent.Field { Nillable(). SchemaType(map[string]string{dialect.Postgres: "decimal(20,8)"}), - // Sora 按次计费配置(阶段 1) field.Float("sora_image_price_360"). Optional(). Nillable(). @@ -109,45 +104,41 @@ func (Group) Fields() []ent.Field { field.Int64("sora_storage_quota_bytes"). Default(0), - // Claude Code 客户端限制 (added by migration 029) field.Bool("claude_code_only"). Default(false). - Comment("是否仅允许 Claude Code 客户端"), + Comment("allow Claude Code client only"), field.Int64("fallback_group_id"). Optional(). Nillable(). - Comment("非 Claude Code 请求降级使用的分组 ID"), + Comment("fallback group for non-Claude-Code requests"), field.Int64("fallback_group_id_on_invalid_request"). Optional(). Nillable(). - Comment("无效请求兜底使用的分组 ID"), + Comment("fallback group for invalid request"), - // 模型路由配置 (added by migration 040) field.JSON("model_routing", map[string][]int64{}). Optional(). SchemaType(map[string]string{dialect.Postgres: "jsonb"}). 
- Comment("模型路由配置:模型模式 -> 优先账号ID列表"), - - // 模型路由开关 (added by migration 041) + Comment("model routing config: pattern -> account ids"), field.Bool("model_routing_enabled"). Default(false). - Comment("是否启用模型路由配置"), + Comment("whether model routing is enabled"), - // MCP XML 协议注入开关 (added by migration 042) field.Bool("mcp_xml_inject"). Default(true). - Comment("是否注入 MCP XML 调用协议提示词(仅 antigravity 平台)"), + Comment("whether MCP XML prompt injection is enabled"), - // 支持的模型系列 (added by migration 046) field.JSON("supported_model_scopes", []string{}). Default([]string{"claude", "gemini_text", "gemini_image"}). SchemaType(map[string]string{dialect.Postgres: "jsonb"}). - Comment("支持的模型系列:claude, gemini_text, gemini_image"), + Comment("supported model scopes: claude, gemini_text, gemini_image"), - // 分组排序 (added by migration 052) field.Int("sort_order"). Default(0). - Comment("分组显示排序,数值越小越靠前"), + Comment("group display order, lower comes first"), + field.Bool("simulate_claude_max_enabled"). + Default(false). + Comment("simulate claude usage as claude-max style (1h cache write)"), } } @@ -163,14 +154,11 @@ func (Group) Edges() []ent.Edge { edge.From("allowed_users", User.Type). Ref("allowed_groups"). 
Through("user_allowed_groups", UserAllowedGroup.Type), - // 注意:fallback_group_id 直接作为字段使用,不定义 edge - // 这样允许多个分组指向同一个降级分组(M2O 关系) } } func (Group) Indexes() []ent.Index { return []ent.Index{ - // name 字段已在 Fields() 中声明 Unique(),无需重复索引 index.Fields("status"), index.Fields("platform"), index.Fields("subscription_type"), diff --git a/backend/go.mod b/backend/go.mod index ab76258a..d262199b 100644 --- a/backend/go.mod +++ b/backend/go.mod @@ -89,6 +89,7 @@ require ( github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f // indirect github.com/distribution/reference v0.6.0 // indirect + github.com/dlclark/regexp2 v1.10.0 // indirect github.com/docker/docker v28.5.1+incompatible // indirect github.com/docker/go-connections v0.6.0 // indirect github.com/docker/go-units v0.5.0 // indirect @@ -140,6 +141,8 @@ require ( github.com/opencontainers/image-spec v1.1.1 // indirect github.com/pelletier/go-toml/v2 v2.2.2 // indirect github.com/pkg/errors v0.9.1 // indirect + github.com/pkoukk/tiktoken-go v0.1.8 // indirect + github.com/pkoukk/tiktoken-go-loader v0.0.2 // indirect github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect github.com/power-devops/perfstat v0.0.0-20210106213030-5aafc221ea8c // indirect github.com/quic-go/qpack v0.6.0 // indirect diff --git a/backend/go.sum b/backend/go.sum index 32e389a7..10161387 100644 --- a/backend/go.sum +++ b/backend/go.sum @@ -124,6 +124,8 @@ github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f h1:lO4WD4F/r github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f/go.mod h1:cuUVRXasLTGF7a8hSLbxyZXjz+1KgoB3wDUb6vlszIc= github.com/distribution/reference v0.6.0 h1:0IXCQ5g4/QMHHkarYzh5l+u8T3t73zM5QvfrDyIgxBk= github.com/distribution/reference v0.6.0/go.mod h1:BbU0aIcezP1/5jX/8MP0YiH4SdvB5Y4f/wlDRiLyi3E= +github.com/dlclark/regexp2 v1.10.0 h1:+/GIL799phkJqYW+3YbOd8LCcbHzT0Pbo8zl70MHsq0= 
+github.com/dlclark/regexp2 v1.10.0/go.mod h1:DHkYz0B9wPfa6wondMfaivmHpzrQ3v9q8cnmRbL6yW8= github.com/docker/docker v28.5.1+incompatible h1:Bm8DchhSD2J6PsFzxC35TZo4TLGR2PdW/E69rU45NhM= github.com/docker/docker v28.5.1+incompatible/go.mod h1:eEKB0N0r5NX/I1kEveEz05bcu8tLC/8azJZsviup8Sk= github.com/docker/go-connections v0.6.0 h1:LlMG9azAe1TqfR7sO+NJttz1gy6KO7VJBh+pMmjSD94= @@ -171,8 +173,6 @@ github.com/goccy/go-json v0.10.2 h1:CrxCmQqYDkv1z7lO7Wbh2HN93uovUHgrECaO5ZrCXAU= github.com/goccy/go-json v0.10.2/go.mod h1:6MelG93GURQebXPDq3khkgXZkazVtN9CRI+MGFi0w8I= github.com/golang-jwt/jwt/v5 v5.2.2 h1:Rl4B7itRWVtYIHFrSNd7vhTiz9UpLdi6gZhZ3wEeDy8= github.com/golang-jwt/jwt/v5 v5.2.2/go.mod h1:pqrtFR0X4osieyHYxtmOUWsAWrfe1Q5UVIyoH402zdk= -github.com/golang/protobuf v1.5.4 h1:i7eJL8qZTpSEXOPTxNKhASYpMn+8e5Q6AdndVa1dWek= -github.com/golang/protobuf v1.5.4/go.mod h1:lnTiLA8Wa4RWRcIUkrtSVa5nRhsEGBg48fD6rSs7xps= github.com/google/go-cmp v0.5.2/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= github.com/google/go-cmp v0.5.6/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8= @@ -182,7 +182,6 @@ github.com/google/go-querystring v1.1.0/go.mod h1:Kcdr2DB4koayq7X8pmAG4sNG59So17 github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg= github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e h1:ijClszYn+mADRFY17kjQEVQ1XRhq2/JR1M3sGqeJoxs= github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e/go.mod h1:boTsfXsheKC2y+lKOCMpSfarhxDeIzfZG1jqGcPl3cA= -github.com/google/subcommands v1.2.0 h1:vWQspBTo2nEqTUFita5/KeEWlUL8kQObDFbub/EN9oE= github.com/google/subcommands v1.2.0/go.mod h1:ZjhPrFU+Olkh9WazFPsl27BQ4UPiG37m3yTrtFlrHVk= github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0= github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo= @@ -203,6 +202,8 @@ github.com/icholy/digest v1.1.0 
h1:HfGg9Irj7i+IX1o1QAmPfIBNu/Q5A5Tu3n/MED9k9H4= github.com/icholy/digest v1.1.0/go.mod h1:QNrsSGQ5v7v9cReDI0+eyjsXGUoRSUZQHeQ5C4XLa0Y= github.com/imroc/req/v3 v3.57.0 h1:LMTUjNRUybUkTPn8oJDq8Kg3JRBOBTcnDhKu7mzupKI= github.com/imroc/req/v3 v3.57.0/go.mod h1:JL62ey1nvSLq81HORNcosvlf7SxZStONNqOprg0Pz00= +github.com/inconshreveable/mousetrap v1.1.0 h1:wN+x4NVGpMsO7ErUn/mUI3vEoE6Jt13X2s0bqwp9tc8= +github.com/inconshreveable/mousetrap v1.1.0/go.mod h1:vpF70FUmC8bwa3OWnCshd2FqLfsEA9PFc4w1p2J65bw= github.com/jackc/pgpassfile v1.0.0 h1:/6Hmqy13Ss2zCq62VdNG8tM1wchn8zjSGOBJ6icpsIM= github.com/jackc/pgpassfile v1.0.0/go.mod h1:CEx0iS5ambNFdcRtxPj5JhEz+xB6uRky5eyVu/W2HEg= github.com/jackc/pgservicefile v0.0.0-20240606120523-5a60cdf6a761 h1:iCEnooe7UlwOQYpKFhBabPMi4aNAfoODPEFNiAnClxo= @@ -285,6 +286,10 @@ github.com/pelletier/go-toml/v2 v2.2.2 h1:aYUidT7k73Pcl9nb2gScu7NSrKCSHIDE89b3+6 github.com/pelletier/go-toml/v2 v2.2.2/go.mod h1:1t835xjRzz80PqgE6HHgN2JOsmgYu/h4qDAS4n929Rs= github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4= github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0= +github.com/pkoukk/tiktoken-go v0.1.8 h1:85ENo+3FpWgAACBaEUVp+lctuTcYUO7BtmfhlN/QTRo= +github.com/pkoukk/tiktoken-go v0.1.8/go.mod h1:9NiV+i9mJKGj1rYOT+njbv+ZwA/zJxYdewGl6qVatpg= +github.com/pkoukk/tiktoken-go-loader v0.0.2 h1:LUKws63GV3pVHwH1srkBplBv+7URgmOmhSkRxsIvsK4= +github.com/pkoukk/tiktoken-go-loader v0.0.2/go.mod h1:4mIkYyZooFlnenDlormIo6cd5wrlUKNr97wp9nGgEKo= github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 h1:Jamvg5psRIccs7FGNTlIRMkT8wgtp5eCXdBlqhYGL6U= github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= @@ -398,8 +403,6 @@ go.opentelemetry.io/otel/metric v1.37.0 h1:mvwbQS5m0tbmqML4NqK+e3aDiO02vsf/Wgbsd go.opentelemetry.io/otel/metric v1.37.0/go.mod 
h1:04wGrZurHYKOc+RKeye86GwKiTb9FKm1WHtO+4EVr2E= go.opentelemetry.io/otel/sdk v1.37.0 h1:ItB0QUqnjesGRvNcmAcU0LyvkVyGJ2xftD29bWdDvKI= go.opentelemetry.io/otel/sdk v1.37.0/go.mod h1:VredYzxUvuo2q3WRcDnKDjbdvmO0sCzOvVAiY+yUkAg= -go.opentelemetry.io/otel/sdk/metric v1.37.0 h1:90lI228XrB9jCMuSdA0673aubgRobVZFhbjxHHspCPc= -go.opentelemetry.io/otel/sdk/metric v1.37.0/go.mod h1:cNen4ZWfiD37l5NhS+Keb5RXVWZWpRE+9WyVCpbo5ps= go.opentelemetry.io/otel/trace v1.37.0 h1:HLdcFNbRQBE2imdSEgm/kwqmQj1Or1l/7bW6mxVK7z4= go.opentelemetry.io/otel/trace v1.37.0/go.mod h1:TlgrlQ+PtQO5XFerSPUYG0JSgGyryXewPGyayAWSBS0= go.opentelemetry.io/proto/otlp v1.3.1 h1:TrMUixzpM0yuc/znrFTP9MMRh8trP93mkCiDVeXrui0= @@ -455,8 +458,6 @@ golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGm golang.org/x/tools v0.41.0 h1:a9b8iMweWG+S0OBnlU36rzLp20z1Rp10w+IY2czHTQc= golang.org/x/tools v0.41.0/go.mod h1:XSY6eDqxVNiYgezAVqqCeihT4j1U2CCsqvH3WhQpnlg= golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= -gonum.org/v1/gonum v0.16.0 h1:5+ul4Swaf3ESvrOnidPp4GZbzf0mxVQpDCYUQE7OJfk= -gonum.org/v1/gonum v0.16.0/go.mod h1:fef3am4MQ93R2HHpKnLk4/Tbh/s0+wqD5nfa6Pnwy4E= google.golang.org/genproto v0.0.0-20231106174013-bbf56f31fb17 h1:wpZ8pe2x1Q3f2KyT5f8oP/fa9rHAKgFPr/HZdNuS+PQ= google.golang.org/genproto/googleapis/api v0.0.0-20250929231259-57b25ae835d4 h1:8XJ4pajGwOlasW+L13MnEGA8W4115jJySQtVfS2/IBU= google.golang.org/genproto/googleapis/api v0.0.0-20250929231259-57b25ae835d4/go.mod h1:NnuHhy+bxcg30o7FnVAZbXsPHUDQ9qKWAQKCD7VxFtk= diff --git a/backend/internal/handler/admin/account_handler.go b/backend/internal/handler/admin/account_handler.go index 14f9e05d..d66a009c 100644 --- a/backend/internal/handler/admin/account_handler.go +++ b/backend/internal/handler/admin/account_handler.go @@ -102,6 +102,7 @@ type CreateAccountRequest struct { Concurrency int `json:"concurrency"` Priority int `json:"priority"` RateMultiplier *float64 
`json:"rate_multiplier"` + LoadFactor *int `json:"load_factor"` GroupIDs []int64 `json:"group_ids"` ExpiresAt *int64 `json:"expires_at"` AutoPauseOnExpired *bool `json:"auto_pause_on_expired"` @@ -120,6 +121,7 @@ type UpdateAccountRequest struct { Concurrency *int `json:"concurrency"` Priority *int `json:"priority"` RateMultiplier *float64 `json:"rate_multiplier"` + LoadFactor *int `json:"load_factor"` Status string `json:"status" binding:"omitempty,oneof=active inactive"` GroupIDs *[]int64 `json:"group_ids"` ExpiresAt *int64 `json:"expires_at"` @@ -135,6 +137,7 @@ type BulkUpdateAccountsRequest struct { Concurrency *int `json:"concurrency"` Priority *int `json:"priority"` RateMultiplier *float64 `json:"rate_multiplier"` + LoadFactor *int `json:"load_factor"` Status string `json:"status" binding:"omitempty,oneof=active inactive error"` Schedulable *bool `json:"schedulable"` GroupIDs *[]int64 `json:"group_ids"` @@ -506,6 +509,7 @@ func (h *AccountHandler) Create(c *gin.Context) { Concurrency: req.Concurrency, Priority: req.Priority, RateMultiplier: req.RateMultiplier, + LoadFactor: req.LoadFactor, GroupIDs: req.GroupIDs, ExpiresAt: req.ExpiresAt, AutoPauseOnExpired: req.AutoPauseOnExpired, @@ -575,6 +579,7 @@ func (h *AccountHandler) Update(c *gin.Context) { Concurrency: req.Concurrency, // 指针类型,nil 表示未提供 Priority: req.Priority, // 指针类型,nil 表示未提供 RateMultiplier: req.RateMultiplier, + LoadFactor: req.LoadFactor, Status: req.Status, GroupIDs: req.GroupIDs, ExpiresAt: req.ExpiresAt, @@ -1101,6 +1106,7 @@ func (h *AccountHandler) BulkUpdate(c *gin.Context) { req.Concurrency != nil || req.Priority != nil || req.RateMultiplier != nil || + req.LoadFactor != nil || req.Status != "" || req.Schedulable != nil || req.GroupIDs != nil || @@ -1119,6 +1125,7 @@ func (h *AccountHandler) BulkUpdate(c *gin.Context) { Concurrency: req.Concurrency, Priority: req.Priority, RateMultiplier: req.RateMultiplier, + LoadFactor: req.LoadFactor, Status: req.Status, Schedulable: req.Schedulable, 
GroupIDs: req.GroupIDs, @@ -1132,6 +1139,12 @@ func (h *AccountHandler) BulkUpdate(c *gin.Context) { c.JSON(409, gin.H{ "error": "mixed_channel_warning", "message": mixedErr.Error(), + "details": gin.H{ + "group_id": mixedErr.GroupID, + "group_name": mixedErr.GroupName, + "current_platform": mixedErr.CurrentPlatform, + "other_platform": mixedErr.OtherPlatform, + }, }) return } @@ -1328,6 +1341,29 @@ func (h *AccountHandler) ClearRateLimit(c *gin.Context) { response.Success(c, h.buildAccountResponseWithRuntime(c.Request.Context(), account)) } +// ResetQuota handles resetting account quota usage +// POST /api/v1/admin/accounts/:id/reset-quota +func (h *AccountHandler) ResetQuota(c *gin.Context) { + accountID, err := strconv.ParseInt(c.Param("id"), 10, 64) + if err != nil { + response.BadRequest(c, "Invalid account ID") + return + } + + if err := h.adminService.ResetAccountQuota(c.Request.Context(), accountID); err != nil { + response.InternalError(c, "Failed to reset account quota: "+err.Error()) + return + } + + account, err := h.adminService.GetAccount(c.Request.Context(), accountID) + if err != nil { + response.ErrorFrom(c, err) + return + } + + response.Success(c, h.buildAccountResponseWithRuntime(c.Request.Context(), account)) +} + // GetTempUnschedulable handles getting temporary unschedulable status // GET /api/v1/admin/accounts/:id/temp-unschedulable func (h *AccountHandler) GetTempUnschedulable(c *gin.Context) { diff --git a/backend/internal/handler/admin/account_handler_mixed_channel_test.go b/backend/internal/handler/admin/account_handler_mixed_channel_test.go index 24ec5bcf..5b81db2a 100644 --- a/backend/internal/handler/admin/account_handler_mixed_channel_test.go +++ b/backend/internal/handler/admin/account_handler_mixed_channel_test.go @@ -111,7 +111,7 @@ func TestAccountHandlerCreateMixedChannelConflictSimplifiedResponse(t *testing.T var resp map[string]any require.NoError(t, json.Unmarshal(rec.Body.Bytes(), &resp)) require.Equal(t, 
"mixed_channel_warning", resp["error"]) - require.Contains(t, resp["message"], "mixed_channel_warning") + require.Contains(t, resp["message"], "claude-max") _, hasDetails := resp["details"] _, hasRequireConfirmation := resp["require_confirmation"] require.False(t, hasDetails) @@ -140,7 +140,7 @@ func TestAccountHandlerUpdateMixedChannelConflictSimplifiedResponse(t *testing.T var resp map[string]any require.NoError(t, json.Unmarshal(rec.Body.Bytes(), &resp)) require.Equal(t, "mixed_channel_warning", resp["error"]) - require.Contains(t, resp["message"], "mixed_channel_warning") + require.Contains(t, resp["message"], "claude-max") _, hasDetails := resp["details"] _, hasRequireConfirmation := resp["require_confirmation"] require.False(t, hasDetails) diff --git a/backend/internal/handler/admin/admin_service_stub_test.go b/backend/internal/handler/admin/admin_service_stub_test.go index f3b99ddb..84a9f102 100644 --- a/backend/internal/handler/admin/admin_service_stub_test.go +++ b/backend/internal/handler/admin/admin_service_stub_test.go @@ -425,5 +425,9 @@ func (s *stubAdminService) AdminUpdateAPIKeyGroupID(ctx context.Context, keyID i return nil, service.ErrAPIKeyNotFound } +func (s *stubAdminService) ResetAccountQuota(ctx context.Context, id int64) error { + return nil +} + // Ensure stub implements interface. 
var _ service.AdminService = (*stubAdminService)(nil) diff --git a/backend/internal/handler/admin/group_handler.go b/backend/internal/handler/admin/group_handler.go index 1edf4dcc..a3f16735 100644 --- a/backend/internal/handler/admin/group_handler.go +++ b/backend/internal/handler/admin/group_handler.go @@ -46,9 +46,10 @@ type CreateGroupRequest struct { FallbackGroupID *int64 `json:"fallback_group_id"` FallbackGroupIDOnInvalidRequest *int64 `json:"fallback_group_id_on_invalid_request"` // 模型路由配置(仅 anthropic 平台使用) - ModelRouting map[string][]int64 `json:"model_routing"` - ModelRoutingEnabled bool `json:"model_routing_enabled"` - MCPXMLInject *bool `json:"mcp_xml_inject"` + ModelRouting map[string][]int64 `json:"model_routing"` + ModelRoutingEnabled bool `json:"model_routing_enabled"` + MCPXMLInject *bool `json:"mcp_xml_inject"` + SimulateClaudeMaxEnabled *bool `json:"simulate_claude_max_enabled"` // 支持的模型系列(仅 antigravity 平台使用) SupportedModelScopes []string `json:"supported_model_scopes"` // Sora 存储配额 @@ -81,9 +82,10 @@ type UpdateGroupRequest struct { FallbackGroupID *int64 `json:"fallback_group_id"` FallbackGroupIDOnInvalidRequest *int64 `json:"fallback_group_id_on_invalid_request"` // 模型路由配置(仅 anthropic 平台使用) - ModelRouting map[string][]int64 `json:"model_routing"` - ModelRoutingEnabled *bool `json:"model_routing_enabled"` - MCPXMLInject *bool `json:"mcp_xml_inject"` + ModelRouting map[string][]int64 `json:"model_routing"` + ModelRoutingEnabled *bool `json:"model_routing_enabled"` + MCPXMLInject *bool `json:"mcp_xml_inject"` + SimulateClaudeMaxEnabled *bool `json:"simulate_claude_max_enabled"` // 支持的模型系列(仅 antigravity 平台使用) SupportedModelScopes *[]string `json:"supported_model_scopes"` // Sora 存储配额 @@ -201,6 +203,7 @@ func (h *GroupHandler) Create(c *gin.Context) { ModelRouting: req.ModelRouting, ModelRoutingEnabled: req.ModelRoutingEnabled, MCPXMLInject: req.MCPXMLInject, + SimulateClaudeMaxEnabled: req.SimulateClaudeMaxEnabled, SupportedModelScopes: 
req.SupportedModelScopes, SoraStorageQuotaBytes: req.SoraStorageQuotaBytes, CopyAccountsFromGroupIDs: req.CopyAccountsFromGroupIDs, @@ -252,6 +255,7 @@ func (h *GroupHandler) Update(c *gin.Context) { ModelRouting: req.ModelRouting, ModelRoutingEnabled: req.ModelRoutingEnabled, MCPXMLInject: req.MCPXMLInject, + SimulateClaudeMaxEnabled: req.SimulateClaudeMaxEnabled, SupportedModelScopes: req.SupportedModelScopes, SoraStorageQuotaBytes: req.SoraStorageQuotaBytes, CopyAccountsFromGroupIDs: req.CopyAccountsFromGroupIDs, diff --git a/backend/internal/handler/dto/mappers.go b/backend/internal/handler/dto/mappers.go index fe2a1d77..d3fc51ed 100644 --- a/backend/internal/handler/dto/mappers.go +++ b/backend/internal/handler/dto/mappers.go @@ -122,13 +122,14 @@ func GroupFromServiceAdmin(g *service.Group) *AdminGroup { return nil } out := &AdminGroup{ - Group: groupFromServiceBase(g), - ModelRouting: g.ModelRouting, - ModelRoutingEnabled: g.ModelRoutingEnabled, - MCPXMLInject: g.MCPXMLInject, - SupportedModelScopes: g.SupportedModelScopes, - AccountCount: g.AccountCount, - SortOrder: g.SortOrder, + Group: groupFromServiceBase(g), + ModelRouting: g.ModelRouting, + ModelRoutingEnabled: g.ModelRoutingEnabled, + MCPXMLInject: g.MCPXMLInject, + SimulateClaudeMaxEnabled: g.SimulateClaudeMaxEnabled, + SupportedModelScopes: g.SupportedModelScopes, + AccountCount: g.AccountCount, + SortOrder: g.SortOrder, } if len(g.AccountGroups) > 0 { out.AccountGroups = make([]AccountGroup, 0, len(g.AccountGroups)) @@ -183,6 +184,7 @@ func AccountFromServiceShallow(a *service.Account) *Account { Extra: a.Extra, ProxyID: a.ProxyID, Concurrency: a.Concurrency, + LoadFactor: a.LoadFactor, Priority: a.Priority, RateMultiplier: a.BillingRateMultiplier(), Status: a.Status, @@ -248,6 +250,17 @@ func AccountFromServiceShallow(a *service.Account) *Account { } } + // Extract API Key account quota limits (effective only for apikey-type accounts) + if a.Type == service.AccountTypeAPIKey { + if limit := a.GetQuotaLimit(); limit > 0 { + out.QuotaLimit =
&limit + } + used := a.GetQuotaUsed() + if out.QuotaLimit != nil { + out.QuotaUsed = &used + } + + return out } diff --git a/backend/internal/handler/dto/types.go b/backend/internal/handler/dto/types.go index 920615f7..f0d13d3f 100644 --- a/backend/internal/handler/dto/types.go +++ b/backend/internal/handler/dto/types.go @@ -111,6 +111,8 @@ type AdminGroup struct { // MCP XML 协议注入(仅 antigravity 平台使用) MCPXMLInject bool `json:"mcp_xml_inject"` + // Claude usage simulation toggle (visible to admins only) + SimulateClaudeMaxEnabled bool `json:"simulate_claude_max_enabled"` // 支持的模型系列(仅 antigravity 平台使用) SupportedModelScopes []string `json:"supported_model_scopes"` @@ -131,6 +133,7 @@ type Account struct { Extra map[string]any `json:"extra"` ProxyID *int64 `json:"proxy_id"` Concurrency int `json:"concurrency"` + LoadFactor *int `json:"load_factor,omitempty"` Priority int `json:"priority"` RateMultiplier float64 `json:"rate_multiplier"` Status string `json:"status"` @@ -185,6 +188,10 @@ type Account struct { CacheTTLOverrideEnabled *bool `json:"cache_ttl_override_enabled,omitempty"` CacheTTLOverrideTarget *string `json:"cache_ttl_override_target,omitempty"` + // API Key account quota limits + QuotaLimit *float64 `json:"quota_limit,omitempty"` + QuotaUsed *float64 `json:"quota_used,omitempty"` + Proxy *Proxy `json:"proxy,omitempty"` AccountGroups []AccountGroup `json:"account_groups,omitempty"` diff --git a/backend/internal/handler/gateway_handler.go b/backend/internal/handler/gateway_handler.go index 1c0ef8e6..3730bcf7 100644 --- a/backend/internal/handler/gateway_handler.go +++ b/backend/internal/handler/gateway_handler.go @@ -439,6 +439,7 @@ func (h *GatewayHandler) Messages(c *gin.Context) { h.submitUsageRecordTask(func(ctx context.Context) { if err := h.gatewayService.RecordUsage(ctx, &service.RecordUsageInput{ Result: result, + ParsedRequest: parsedReq, APIKey: apiKey, User: apiKey.User, Account: account, @@ -630,6 +631,7 @@ func (h *GatewayHandler) Messages(c *gin.Context) { // ===== 用户消息串行队列 END ===== //
转发请求 - 根据账号平台分流 + c.Set("parsed_request", parsedReq) var result *service.ForwardResult requestCtx := c.Request.Context() if fs.SwitchCount > 0 { @@ -734,6 +736,7 @@ func (h *GatewayHandler) Messages(c *gin.Context) { h.submitUsageRecordTask(func(ctx context.Context) { if err := h.gatewayService.RecordUsage(ctx, &service.RecordUsageInput{ Result: result, + ParsedRequest: parsedReq, APIKey: currentAPIKey, User: currentAPIKey.User, Account: account, diff --git a/backend/internal/handler/sora_client_handler_test.go b/backend/internal/handler/sora_client_handler_test.go index d2d9790d..d2a849b1 100644 --- a/backend/internal/handler/sora_client_handler_test.go +++ b/backend/internal/handler/sora_client_handler_test.go @@ -2132,6 +2132,14 @@ func (r *stubAccountRepoForHandler) BulkUpdate(context.Context, []int64, service return 0, nil } +func (r *stubAccountRepoForHandler) IncrementQuotaUsed(context.Context, int64, float64) error { + return nil +} + +func (r *stubAccountRepoForHandler) ResetQuotaUsed(context.Context, int64) error { + return nil +} + // ==================== Stub: SoraClient (用于 SoraGatewayService) ==================== var _ service.SoraClient = (*stubSoraClientForHandler)(nil) diff --git a/backend/internal/handler/sora_gateway_handler_test.go b/backend/internal/handler/sora_gateway_handler_test.go index b76ab67d..637462ad 100644 --- a/backend/internal/handler/sora_gateway_handler_test.go +++ b/backend/internal/handler/sora_gateway_handler_test.go @@ -216,6 +216,14 @@ func (r *stubAccountRepo) BulkUpdate(ctx context.Context, ids []int64, updates s return 0, nil } +func (r *stubAccountRepo) IncrementQuotaUsed(ctx context.Context, id int64, amount float64) error { + return nil +} + +func (r *stubAccountRepo) ResetQuotaUsed(ctx context.Context, id int64) error { + return nil +} + func (r *stubAccountRepo) listSchedulable() []service.Account { var result []service.Account for _, acc := range r.accounts { diff --git 
a/backend/internal/pkg/antigravity/stream_transformer.go b/backend/internal/pkg/antigravity/stream_transformer.go index 677435ad..54f7e282 100644 --- a/backend/internal/pkg/antigravity/stream_transformer.go +++ b/backend/internal/pkg/antigravity/stream_transformer.go @@ -18,6 +18,9 @@ const ( BlockTypeFunction ) +// UsageMapHook is a callback that can modify usage data before it's emitted in SSE events. +type UsageMapHook func(usageMap map[string]any) + // StreamingProcessor 流式响应处理器 type StreamingProcessor struct { blockType BlockType @@ -30,6 +33,7 @@ type StreamingProcessor struct { originalModel string webSearchQueries []string groundingChunks []GeminiGroundingChunk + usageMapHook UsageMapHook // 累计 usage inputTokens int @@ -45,6 +49,25 @@ func NewStreamingProcessor(originalModel string) *StreamingProcessor { } } +// SetUsageMapHook sets an optional hook that modifies usage maps before they are emitted. +func (p *StreamingProcessor) SetUsageMapHook(fn UsageMapHook) { + p.usageMapHook = fn +} + +func usageToMap(u ClaudeUsage) map[string]any { + m := map[string]any{ + "input_tokens": u.InputTokens, + "output_tokens": u.OutputTokens, + } + if u.CacheCreationInputTokens > 0 { + m["cache_creation_input_tokens"] = u.CacheCreationInputTokens + } + if u.CacheReadInputTokens > 0 { + m["cache_read_input_tokens"] = u.CacheReadInputTokens + } + return m +} + // ProcessLine 处理 SSE 行,返回 Claude SSE 事件 func (p *StreamingProcessor) ProcessLine(line string) []byte { line = strings.TrimSpace(line) @@ -158,6 +181,13 @@ func (p *StreamingProcessor) emitMessageStart(v1Resp *V1InternalResponse) []byte responseID = "msg_" + generateRandomID() } + var usageValue any = usage + if p.usageMapHook != nil { + usageMap := usageToMap(usage) + p.usageMapHook(usageMap) + usageValue = usageMap + } + message := map[string]any{ "id": responseID, "type": "message", @@ -166,7 +196,7 @@ func (p *StreamingProcessor) emitMessageStart(v1Resp *V1InternalResponse) []byte "model": p.originalModel, 
 		"stop_reason":   nil,
 		"stop_sequence": nil,
-		"usage": usage,
+		"usage": usageValue,
 	}
 
 	event := map[string]any{
@@ -477,13 +507,20 @@ func (p *StreamingProcessor) emitFinish(finishReason string) []byte {
 		CacheReadInputTokens: p.cacheReadTokens,
 	}
 
+	var usageValue any = usage
+	if p.usageMapHook != nil {
+		usageMap := usageToMap(usage)
+		p.usageMapHook(usageMap)
+		usageValue = usageMap
+	}
+
 	deltaEvent := map[string]any{
 		"type": "message_delta",
 		"delta": map[string]any{
 			"stop_reason":   stopReason,
 			"stop_sequence": nil,
 		},
-		"usage": usage,
+		"usage": usageValue,
 	}
 
 	_, _ = result.Write(p.formatSSE("message_delta", deltaEvent))
diff --git a/backend/internal/repository/account_repo.go b/backend/internal/repository/account_repo.go
index 6f0c5424..ffbfd466 100644
--- a/backend/internal/repository/account_repo.go
+++ b/backend/internal/repository/account_repo.go
@@ -84,6 +84,9 @@ func (r *accountRepository) Create(ctx context.Context, account *service.Account
 	if account.RateMultiplier != nil {
 		builder.SetRateMultiplier(*account.RateMultiplier)
 	}
+	if account.LoadFactor != nil {
+		builder.SetLoadFactor(*account.LoadFactor)
+	}
 
 	if account.ProxyID != nil {
 		builder.SetProxyID(*account.ProxyID)
@@ -318,6 +321,11 @@ func (r *accountRepository) Update(ctx context.Context, account *service.Account
 	if account.RateMultiplier != nil {
 		builder.SetRateMultiplier(*account.RateMultiplier)
 	}
+	if account.LoadFactor != nil {
+		builder.SetLoadFactor(*account.LoadFactor)
+	} else {
+		builder.ClearLoadFactor()
+	}
 
 	if account.ProxyID != nil {
 		builder.SetProxyID(*account.ProxyID)
@@ -1223,6 +1231,15 @@ func (r *accountRepository) BulkUpdate(ctx context.Context, ids []int64, updates
 		args = append(args, *updates.RateMultiplier)
 		idx++
 	}
+	if updates.LoadFactor != nil {
+		if *updates.LoadFactor <= 0 {
+			setClauses = append(setClauses, "load_factor = NULL")
+		} else {
+			setClauses = append(setClauses, "load_factor = $"+itoa(idx))
+			args = append(args, *updates.LoadFactor)
+			idx++
+		}
+	}
 	if updates.Status != nil {
 		setClauses = append(setClauses, "status = $"+itoa(idx))
 		args = append(args, *updates.Status)
@@ -1545,6 +1562,7 @@ func accountEntityToService(m *dbent.Account) *service.Account {
 		Concurrency:    m.Concurrency,
 		Priority:       m.Priority,
 		RateMultiplier: &rateMultiplier,
+		LoadFactor:     m.LoadFactor,
 		Status:         m.Status,
 		ErrorMessage:   derefString(m.ErrorMessage),
 		LastUsedAt:     m.LastUsedAt,
@@ -1657,3 +1675,60 @@ func (r *accountRepository) FindByExtraField(ctx context.Context, key string, va
 	return r.accountsToService(ctx, accounts)
 }
+
+// IncrementQuotaUsed 原子递增账号的 extra.quota_used 字段
+func (r *accountRepository) IncrementQuotaUsed(ctx context.Context, id int64, amount float64) error {
+	rows, err := r.sql.QueryContext(ctx,
+		`UPDATE accounts SET extra = jsonb_set(
+			COALESCE(extra, '{}'::jsonb),
+			'{quota_used}',
+			to_jsonb(COALESCE((extra->>'quota_used')::numeric, 0) + $1)
+		), updated_at = NOW()
+		WHERE id = $2 AND deleted_at IS NULL
+		RETURNING
+			COALESCE((extra->>'quota_used')::numeric, 0),
+			COALESCE((extra->>'quota_limit')::numeric, 0)`,
+		amount, id)
+	if err != nil {
+		return err
+	}
+	defer func() { _ = rows.Close() }()
+
+	var newUsed, limit float64
+	if rows.Next() {
+		if err := rows.Scan(&newUsed, &limit); err != nil {
+			return err
+		}
+	}
+	if err := rows.Err(); err != nil {
+		return err
+	}
+
+	// 配额刚超限时触发调度快照刷新,使账号及时从调度候选中移除
+	if limit > 0 && newUsed >= limit && (newUsed-amount) < limit {
+		if err := enqueueSchedulerOutbox(ctx, r.sql, service.SchedulerOutboxEventAccountChanged, &id, nil, nil); err != nil {
+			logger.LegacyPrintf("repository.account", "[SchedulerOutbox] enqueue quota exceeded failed: account=%d err=%v", id, err)
+		}
+	}
+	return nil
+}
+
+// ResetQuotaUsed 重置账号的 extra.quota_used 为 0
+func (r *accountRepository) ResetQuotaUsed(ctx context.Context, id int64) error {
+	_, err := r.sql.ExecContext(ctx,
+		`UPDATE accounts SET extra = jsonb_set(
+			COALESCE(extra, '{}'::jsonb),
+			'{quota_used}',
+			'0'::jsonb
+		), updated_at = NOW()
+		WHERE id = $1 AND deleted_at IS NULL`,
+		id)
+	if err != nil {
+		return err
+	}
+	// 重置配额后触发调度快照刷新,使账号重新参与调度
+	if err := enqueueSchedulerOutbox(ctx, r.sql, service.SchedulerOutboxEventAccountChanged, &id, nil, nil); err != nil {
+		logger.LegacyPrintf("repository.account", "[SchedulerOutbox] enqueue quota reset failed: account=%d err=%v", id, err)
+	}
+	return nil
+}
diff --git a/backend/internal/repository/api_key_repo.go b/backend/internal/repository/api_key_repo.go
index c761e8c9..8167a452 100644
--- a/backend/internal/repository/api_key_repo.go
+++ b/backend/internal/repository/api_key_repo.go
@@ -164,6 +164,7 @@ func (r *apiKeyRepository) GetByKeyForAuth(ctx context.Context, key string) (*se
 			group.FieldModelRoutingEnabled,
 			group.FieldModelRouting,
 			group.FieldMcpXMLInject,
+			group.FieldSimulateClaudeMaxEnabled,
 			group.FieldSupportedModelScopes,
 		)
 	}).
@@ -617,6 +618,7 @@ func groupEntityToService(g *dbent.Group) *service.Group {
 		ModelRouting:             g.ModelRouting,
 		ModelRoutingEnabled:      g.ModelRoutingEnabled,
 		MCPXMLInject:             g.McpXMLInject,
+		SimulateClaudeMaxEnabled: g.SimulateClaudeMaxEnabled,
 		SupportedModelScopes:     g.SupportedModelScopes,
 		SortOrder:                g.SortOrder,
 		CreatedAt:                g.CreatedAt,
diff --git a/backend/internal/repository/claude_usage_service.go b/backend/internal/repository/claude_usage_service.go
index f6054828..1264f6bb 100644
--- a/backend/internal/repository/claude_usage_service.go
+++ b/backend/internal/repository/claude_usage_service.go
@@ -8,6 +8,7 @@ import (
 	"net/http"
 	"time"
 
+	infraerrors "github.com/Wei-Shaw/sub2api/internal/pkg/errors"
 	"github.com/Wei-Shaw/sub2api/internal/pkg/httpclient"
 	"github.com/Wei-Shaw/sub2api/internal/service"
 )
@@ -95,7 +96,8 @@ func (s *claudeUsageService) FetchUsageWithOptions(ctx context.Context, opts *se
 	if resp.StatusCode != http.StatusOK {
 		body, _ := io.ReadAll(resp.Body)
-		return nil, fmt.Errorf("API returned status %d: %s", resp.StatusCode, string(body))
+		msg := fmt.Sprintf("API returned status %d: %s", resp.StatusCode, string(body))
+		return nil, infraerrors.New(http.StatusInternalServerError, "UPSTREAM_ERROR", msg)
 	}
 
 	var usageResp service.ClaudeUsageResponse
diff --git a/backend/internal/repository/group_repo.go b/backend/internal/repository/group_repo.go
index 4edc8534..aba11011 100644
--- a/backend/internal/repository/group_repo.go
+++ b/backend/internal/repository/group_repo.go
@@ -59,7 +59,8 @@ func (r *groupRepository) Create(ctx context.Context, groupIn *service.Group) er
 		SetNillableFallbackGroupIDOnInvalidRequest(groupIn.FallbackGroupIDOnInvalidRequest).
 		SetModelRoutingEnabled(groupIn.ModelRoutingEnabled).
 		SetMcpXMLInject(groupIn.MCPXMLInject).
-		SetSoraStorageQuotaBytes(groupIn.SoraStorageQuotaBytes)
+		SetSoraStorageQuotaBytes(groupIn.SoraStorageQuotaBytes).
+		SetSimulateClaudeMaxEnabled(groupIn.SimulateClaudeMaxEnabled)
 
 	// 设置模型路由配置
 	if groupIn.ModelRouting != nil {
@@ -125,7 +126,8 @@ func (r *groupRepository) Update(ctx context.Context, groupIn *service.Group) er
 		SetClaudeCodeOnly(groupIn.ClaudeCodeOnly).
 		SetModelRoutingEnabled(groupIn.ModelRoutingEnabled).
 		SetMcpXMLInject(groupIn.MCPXMLInject).
-		SetSoraStorageQuotaBytes(groupIn.SoraStorageQuotaBytes)
+		SetSoraStorageQuotaBytes(groupIn.SoraStorageQuotaBytes).
+		SetSimulateClaudeMaxEnabled(groupIn.SimulateClaudeMaxEnabled)
 
 	// 显式处理可空字段:nil 需要 clear,非 nil 需要 set。
 	if groupIn.DailyLimitUSD != nil {
diff --git a/backend/internal/repository/usage_log_repo.go b/backend/internal/repository/usage_log_repo.go
index 7fc11b78..26e975f6 100644
--- a/backend/internal/repository/usage_log_repo.go
+++ b/backend/internal/repository/usage_log_repo.go
@@ -1870,7 +1870,7 @@ func (r *usageLogRepository) GetGroupStatsWithFilters(ctx context.Context, start
 	query := `
 		SELECT
 			COALESCE(ul.group_id, 0) as group_id,
-			COALESCE(g.name, '') as group_name,
+			COALESCE(g.name, '(无分组)') as group_name,
 			COUNT(*) as requests,
 			COALESCE(SUM(ul.input_tokens + ul.output_tokens + ul.cache_creation_tokens + ul.cache_read_tokens), 0) as total_tokens,
 			COALESCE(SUM(ul.total_cost), 0) as cost,
diff --git a/backend/internal/server/api_contract_test.go b/backend/internal/server/api_contract_test.go
index 40b2d592..aafbbe21 100644
--- a/backend/internal/server/api_contract_test.go
+++ b/backend/internal/server/api_contract_test.go
@@ -1096,6 +1096,14 @@ func (s *stubAccountRepo) UpdateExtra(ctx context.Context, id int64, updates map
 	return errors.New("not implemented")
 }
 
+func (s *stubAccountRepo) IncrementQuotaUsed(ctx context.Context, id int64, amount float64) error {
+	return errors.New("not implemented")
+}
+
+func (s *stubAccountRepo) ResetQuotaUsed(ctx context.Context, id int64) error {
+	return errors.New("not implemented")
+}
+
 func (s *stubAccountRepo) BulkUpdate(ctx context.Context, ids []int64, updates service.AccountBulkUpdate) (int64, error) {
 	s.bulkUpdateIDs = append([]int64{}, ids...)
 	return int64(len(ids)), nil
diff --git a/backend/internal/server/routes/admin.go b/backend/internal/server/routes/admin.go
index e9f9bf62..2e53feb3 100644
--- a/backend/internal/server/routes/admin.go
+++ b/backend/internal/server/routes/admin.go
@@ -252,6 +252,7 @@ func registerAccountRoutes(admin *gin.RouterGroup, h *handler.Handlers) {
 	accounts.GET("/:id/today-stats", h.Admin.Account.GetTodayStats)
 	accounts.POST("/today-stats/batch", h.Admin.Account.GetBatchTodayStats)
 	accounts.POST("/:id/clear-rate-limit", h.Admin.Account.ClearRateLimit)
+	accounts.POST("/:id/reset-quota", h.Admin.Account.ResetQuota)
 	accounts.GET("/:id/temp-unschedulable", h.Admin.Account.GetTempUnschedulable)
 	accounts.DELETE("/:id/temp-unschedulable", h.Admin.Account.ClearTempUnschedulable)
 	accounts.POST("/:id/schedulable", h.Admin.Account.SetSchedulable)
diff --git a/backend/internal/service/account.go b/backend/internal/service/account.go
index 7d56b754..8eb3748c 100644
--- a/backend/internal/service/account.go
+++ b/backend/internal/service/account.go
@@ -28,6 +28,7 @@ type Account struct {
 	// RateMultiplier 账号计费倍率(>=0,允许 0 表示该账号计费为 0)。
 	// 使用指针用于兼容旧版本调度缓存(Redis)中缺字段的情况:nil 表示按 1.0 处理。
 	RateMultiplier *float64
+	LoadFactor     *int // 调度负载因子;nil 表示使用 Concurrency
 	Status         string
 	ErrorMessage   string
 	LastUsedAt     *time.Time
@@ -88,6 +89,19 @@ func (a *Account) BillingRateMultiplier() float64 {
 	return *a.RateMultiplier
 }
 
+func (a *Account) EffectiveLoadFactor() int {
+	if a == nil {
+		return 1
+	}
+	if a.LoadFactor != nil && *a.LoadFactor > 0 {
+		return *a.LoadFactor
+	}
+	if a.Concurrency > 0 {
+		return a.Concurrency
+	}
+	return 1
+}
+
 func (a *Account) IsSchedulable() bool {
 	if !a.IsActive() || !a.Schedulable {
 		return false
 	}
@@ -1117,6 +1131,38 @@ func (a *Account) GetCacheTTLOverrideTarget() string {
 	return "5m"
 }
 
+// GetQuotaLimit 获取 API Key 账号的配额限制(美元)
+// 返回 0 表示未启用
+func (a *Account) GetQuotaLimit() float64 {
+	if a.Extra == nil {
+		return 0
+	}
+	if v, ok := a.Extra["quota_limit"]; ok {
+		return parseExtraFloat64(v)
+	}
+	return 0
+}
+
+// GetQuotaUsed 获取 API Key 账号的已用配额(美元)
+func (a *Account) GetQuotaUsed() float64 {
+	if a.Extra == nil {
+		return 0
+	}
+	if v, ok := a.Extra["quota_used"]; ok {
+		return parseExtraFloat64(v)
+	}
+	return 0
+}
+
+// IsQuotaExceeded 检查 API Key 账号配额是否已超限
+func (a *Account) IsQuotaExceeded() bool {
+	limit := a.GetQuotaLimit()
+	if limit <= 0 {
+		return false
+	}
+	return a.GetQuotaUsed() >= limit
+}
+
 // GetWindowCostLimit 获取 5h 窗口费用阈值(美元)
 // 返回 0 表示未启用
 func (a *Account) GetWindowCostLimit() float64 {
diff --git a/backend/internal/service/account_load_factor_test.go b/backend/internal/service/account_load_factor_test.go
new file mode 100644
index 00000000..1cd1b17c
--- /dev/null
+++ b/backend/internal/service/account_load_factor_test.go
@@ -0,0 +1,42 @@
+package service
+
+import (
+	"testing"
+
+	"github.com/stretchr/testify/require"
+)
+
+func TestEffectiveLoadFactor_NilAccount(t *testing.T) {
+	var a *Account
+	require.Equal(t, 1, a.EffectiveLoadFactor())
+}
+
+func TestEffectiveLoadFactor_NilLoadFactor_PositiveConcurrency(t *testing.T) {
+	a := &Account{Concurrency: 5}
+	require.Equal(t, 5, a.EffectiveLoadFactor())
+}
+
+func TestEffectiveLoadFactor_NilLoadFactor_ZeroConcurrency(t *testing.T) {
+	a := &Account{Concurrency: 0}
+	require.Equal(t, 1, a.EffectiveLoadFactor())
+}
+
+func TestEffectiveLoadFactor_PositiveLoadFactor(t *testing.T) {
+	a := &Account{Concurrency: 5, LoadFactor: intPtr(20)}
+	require.Equal(t, 20, a.EffectiveLoadFactor())
+}
+
+func TestEffectiveLoadFactor_ZeroLoadFactor_FallbackToConcurrency(t *testing.T) {
+	a := &Account{Concurrency: 5, LoadFactor: intPtr(0)}
+	require.Equal(t, 5, a.EffectiveLoadFactor())
+}
+
+func TestEffectiveLoadFactor_NegativeLoadFactor_FallbackToConcurrency(t *testing.T) {
+	a := &Account{Concurrency: 3, LoadFactor: intPtr(-1)}
+	require.Equal(t, 3, a.EffectiveLoadFactor())
+}
+
+func TestEffectiveLoadFactor_ZeroLoadFactor_ZeroConcurrency(t *testing.T) {
+	a := &Account{Concurrency: 0, LoadFactor: intPtr(0)}
+	require.Equal(t, 1, a.EffectiveLoadFactor())
+}
diff --git a/backend/internal/service/account_service.go b/backend/internal/service/account_service.go
index 18a70c5c..26c0b1c2 100644
--- a/backend/internal/service/account_service.go
+++ b/backend/internal/service/account_service.go
@@ -68,6 +68,10 @@ type AccountRepository interface {
 	UpdateSessionWindow(ctx context.Context, id int64, start, end *time.Time, status string) error
 	UpdateExtra(ctx context.Context, id int64, updates map[string]any) error
 	BulkUpdate(ctx context.Context, ids []int64, updates AccountBulkUpdate) (int64, error)
+	// IncrementQuotaUsed 原子递增 API Key 账号的配额用量
+	IncrementQuotaUsed(ctx context.Context, id int64, amount float64) error
+	// ResetQuotaUsed 重置 API Key 账号的配额用量为 0
+	ResetQuotaUsed(ctx context.Context, id int64) error
 }
 
 // AccountBulkUpdate describes the fields that can be updated in a bulk operation.
@@ -78,6 +82,7 @@ type AccountBulkUpdate struct {
 	Concurrency    *int
 	Priority       *int
 	RateMultiplier *float64
+	LoadFactor     *int
 	Status         *string
 	Schedulable    *bool
 	Credentials    map[string]any
diff --git a/backend/internal/service/account_service_delete_test.go b/backend/internal/service/account_service_delete_test.go
index 768cf7b7..c96b436f 100644
--- a/backend/internal/service/account_service_delete_test.go
+++ b/backend/internal/service/account_service_delete_test.go
@@ -199,6 +199,14 @@ func (s *accountRepoStub) BulkUpdate(ctx context.Context, ids []int64, updates A
 	panic("unexpected BulkUpdate call")
 }
 
+func (s *accountRepoStub) IncrementQuotaUsed(ctx context.Context, id int64, amount float64) error {
+	return nil
+}
+
+func (s *accountRepoStub) ResetQuotaUsed(ctx context.Context, id int64) error {
+	return nil
+}
+
 // TestAccountService_Delete_NotFound 测试删除不存在的账号时返回正确的错误。
 // 预期行为:
 // - ExistsByID 返回 false(账号不存在)
diff --git a/backend/internal/service/account_test_service.go b/backend/internal/service/account_test_service.go
index 99046e30..9557e175 100644
--- a/backend/internal/service/account_test_service.go
+++ b/backend/internal/service/account_test_service.go
@@ -180,7 +180,7 @@ func (s *AccountTestService) TestAccountConnection(c *gin.Context, accountID int
 	}
 
 	if account.Platform == PlatformAntigravity {
-		return s.testAntigravityAccountConnection(c, account, modelID)
+		return s.routeAntigravityTest(c, account, modelID)
 	}
 
 	if account.Platform == PlatformSora {
@@ -1177,6 +1177,18 @@ func truncateSoraErrorBody(body []byte, max int) string {
 	return soraerror.TruncateBody(body, max)
 }
 
+// routeAntigravityTest 路由 Antigravity 账号的测试请求。
+// APIKey 类型走原生协议(与 gateway_handler 路由一致),OAuth/Upstream 走 CRS 中转。
+func (s *AccountTestService) routeAntigravityTest(c *gin.Context, account *Account, modelID string) error {
+	if account.Type == AccountTypeAPIKey {
+		if strings.HasPrefix(modelID, "gemini-") {
+			return s.testGeminiAccountConnection(c, account, modelID)
+		}
+		return s.testClaudeAccountConnection(c, account, modelID)
+	}
+	return s.testAntigravityAccountConnection(c, account, modelID)
+}
+
 // testAntigravityAccountConnection tests an Antigravity account's connection
 // 支持 Claude 和 Gemini 两种协议,使用非流式请求
 func (s *AccountTestService) testAntigravityAccountConnection(c *gin.Context, account *Account, modelID string) error {
diff --git a/backend/internal/service/admin_service.go b/backend/internal/service/admin_service.go
index 67e7c783..11f850ee 100644
--- a/backend/internal/service/admin_service.go
+++ b/backend/internal/service/admin_service.go
@@ -84,6 +84,7 @@ type AdminService interface {
 	DeleteRedeemCode(ctx context.Context, id int64) error
 	BatchDeleteRedeemCodes(ctx context.Context, ids []int64) (int64, error)
 	ExpireRedeemCode(ctx context.Context, id int64) (*RedeemCode, error)
+	ResetAccountQuota(ctx context.Context, id int64) error
 }
 
 // CreateUserInput represents input for creating a new user via admin operations.
@@ -137,9 +138,10 @@ type CreateGroupInput struct {
 	// 无效请求兜底分组 ID(仅 anthropic 平台使用)
 	FallbackGroupIDOnInvalidRequest *int64
 	// 模型路由配置(仅 anthropic 平台使用)
-	ModelRouting        map[string][]int64
-	ModelRoutingEnabled bool // 是否启用模型路由
-	MCPXMLInject        *bool
+	ModelRouting             map[string][]int64
+	ModelRoutingEnabled      bool // 是否启用模型路由
+	MCPXMLInject             *bool
+	SimulateClaudeMaxEnabled *bool
 	// 支持的模型系列(仅 antigravity 平台使用)
 	SupportedModelScopes []string
 	// Sora 存储配额
@@ -173,9 +175,10 @@ type UpdateGroupInput struct {
 	// 无效请求兜底分组 ID(仅 anthropic 平台使用)
 	FallbackGroupIDOnInvalidRequest *int64
 	// 模型路由配置(仅 anthropic 平台使用)
-	ModelRouting        map[string][]int64
-	ModelRoutingEnabled *bool // 是否启用模型路由
-	MCPXMLInject        *bool
+	ModelRouting             map[string][]int64
+	ModelRoutingEnabled      *bool // 是否启用模型路由
+	MCPXMLInject             *bool
+	SimulateClaudeMaxEnabled *bool
 	// 支持的模型系列(仅 antigravity 平台使用)
 	SupportedModelScopes *[]string
 	// Sora 存储配额
@@ -195,6 +198,7 @@ type CreateAccountInput struct {
 	Concurrency    int
 	Priority       int
 	RateMultiplier *float64 // 账号计费倍率(>=0,允许 0)
+	LoadFactor     *int
 	GroupIDs           []int64
 	ExpiresAt          *int64
 	AutoPauseOnExpired *bool
@@ -215,6 +219,7 @@ type UpdateAccountInput struct {
 	Concurrency    *int     // 使用指针区分"未提供"和"设置为0"
 	Priority       *int     // 使用指针区分"未提供"和"设置为0"
 	RateMultiplier *float64 // 账号计费倍率(>=0,允许 0)
+	LoadFactor     *int
 	Status    string
 	GroupIDs  *[]int64
 	ExpiresAt *int64
@@ -230,6 +235,7 @@ type BulkUpdateAccountsInput struct {
 	Concurrency    *int
 	Priority       *int
 	RateMultiplier *float64 // 账号计费倍率(>=0,允许 0)
+	LoadFactor     *int
 	Status      string
 	Schedulable *bool
 	GroupIDs    *[]int64
@@ -353,6 +359,10 @@ type ProxyExitInfoProber interface {
 	ProbeProxy(ctx context.Context, proxyURL string) (*ProxyExitInfo, int64, error)
 }
 
+type groupExistenceBatchReader interface {
+	ExistsByIDs(ctx context.Context, ids []int64) (map[int64]bool, error)
+}
+
 type proxyQualityTarget struct {
 	Target string
 	URL    string
@@ -428,10 +438,6 @@ type userGroupRateBatchReader interface {
 	GetByUserIDs(ctx context.Context, userIDs []int64) (map[int64]map[int64]float64, error)
 }
 
-type groupExistenceBatchReader interface {
-	ExistsByIDs(ctx context.Context, ids []int64) (map[int64]bool, error)
-}
-
 // NewAdminService creates a new AdminService
 func NewAdminService(
 	userRepo UserRepository,
@@ -847,6 +853,13 @@ func (s *adminServiceImpl) CreateGroup(ctx context.Context, input *CreateGroupIn
 	if input.MCPXMLInject != nil {
 		mcpXMLInject = *input.MCPXMLInject
 	}
+	simulateClaudeMaxEnabled := false
+	if input.SimulateClaudeMaxEnabled != nil {
+		if platform != PlatformAnthropic && *input.SimulateClaudeMaxEnabled {
+			return nil, fmt.Errorf("simulate_claude_max_enabled only supported for anthropic groups")
+		}
+		simulateClaudeMaxEnabled = *input.SimulateClaudeMaxEnabled
+	}
 
 	// 如果指定了复制账号的源分组,先获取账号 ID 列表
 	var accountIDsToCopy []int64
@@ -903,6 +916,7 @@ func (s *adminServiceImpl) CreateGroup(ctx context.Context, input *CreateGroupIn
 		FallbackGroupIDOnInvalidRequest: fallbackOnInvalidRequest,
 		ModelRouting:                    input.ModelRouting,
 		MCPXMLInject:                    mcpXMLInject,
+		SimulateClaudeMaxEnabled:        simulateClaudeMaxEnabled,
 		SupportedModelScopes:            input.SupportedModelScopes,
 		SoraStorageQuotaBytes:           input.SoraStorageQuotaBytes,
 	}
@@ -1112,6 +1126,15 @@ func (s *adminServiceImpl) UpdateGroup(ctx context.Context, id int64, input *Upd
 	if input.MCPXMLInject != nil {
 		group.MCPXMLInject = *input.MCPXMLInject
 	}
+	if input.SimulateClaudeMaxEnabled != nil {
+		if group.Platform != PlatformAnthropic && *input.SimulateClaudeMaxEnabled {
+			return nil, fmt.Errorf("simulate_claude_max_enabled only supported for anthropic groups")
+		}
+		group.SimulateClaudeMaxEnabled = *input.SimulateClaudeMaxEnabled
+	}
+	if group.Platform != PlatformAnthropic {
+		group.SimulateClaudeMaxEnabled = false
+	}
 
 	// 支持的模型系列(仅 antigravity 平台使用)
 	if input.SupportedModelScopes != nil {
@@ -1413,6 +1436,9 @@ func (s *adminServiceImpl) CreateAccount(ctx context.Context, input *CreateAccou
 		}
 		account.RateMultiplier = input.RateMultiplier
 	}
+	if input.LoadFactor != nil && *input.LoadFactor > 0 {
+		account.LoadFactor = input.LoadFactor
+	}
 	if err := s.accountRepo.Create(ctx, account); err != nil {
 		return nil, err
 	}
@@ -1458,6 +1484,10 @@ func (s *adminServiceImpl) UpdateAccount(ctx context.Context, id int64, input *U
 		account.Credentials = input.Credentials
 	}
 	if len(input.Extra) > 0 {
+		// 保留 quota_used,防止编辑账号时意外重置配额用量
+		if oldQuotaUsed, ok := account.Extra["quota_used"]; ok {
+			input.Extra["quota_used"] = oldQuotaUsed
+		}
 		account.Extra = input.Extra
 	}
 	if input.ProxyID != nil {
@@ -1483,6 +1513,13 @@ func (s *adminServiceImpl) UpdateAccount(ctx context.Context, id int64, input *U
 		}
 		account.RateMultiplier = input.RateMultiplier
 	}
+	if input.LoadFactor != nil {
+		if *input.LoadFactor <= 0 {
+			account.LoadFactor = nil // 0 或负数表示清除
+		} else {
+			account.LoadFactor = input.LoadFactor
+		}
+	}
 	if input.Status != "" {
 		account.Status = input.Status
 	}
@@ -1616,6 +1653,9 @@ func (s *adminServiceImpl) BulkUpdateAccounts(ctx context.Context, input *BulkUp
 	if input.RateMultiplier != nil {
 		repoUpdates.RateMultiplier = input.RateMultiplier
 	}
+	if input.LoadFactor != nil {
+		repoUpdates.LoadFactor = input.LoadFactor
+	}
 	if input.Status != "" {
 		repoUpdates.Status = &input.Status
 	}
@@ -2439,3 +2479,7 @@ func (e *MixedChannelError) Error() string {
 	return fmt.Sprintf("mixed_channel_warning: Group '%s' contains both %s and %s accounts. Using mixed channels in the same context may cause thinking block signature validation issues, which will fallback to non-thinking mode for historical messages.", e.GroupName, e.CurrentPlatform, e.OtherPlatform)
 }
+
+func (s *adminServiceImpl) ResetAccountQuota(ctx context.Context, id int64) error {
+	return s.accountRepo.ResetQuotaUsed(ctx, id)
+}
diff --git a/backend/internal/service/admin_service_bulk_update_test.go b/backend/internal/service/admin_service_bulk_update_test.go
index 4845d87c..e90ec93a 100644
--- a/backend/internal/service/admin_service_bulk_update_test.go
+++ b/backend/internal/service/admin_service_bulk_update_test.go
@@ -43,6 +43,16 @@ func (s *accountRepoStubForBulkUpdate) BindGroups(_ context.Context, accountID i
 	return nil
 }
 
+func (s *accountRepoStubForBulkUpdate) ListByGroup(_ context.Context, groupID int64) ([]Account, error) {
+	if err, ok := s.listByGroupErr[groupID]; ok {
+		return nil, err
+	}
+	if rows, ok := s.listByGroupData[groupID]; ok {
+		return rows, nil
+	}
+	return nil, nil
+}
+
 func (s *accountRepoStubForBulkUpdate) GetByIDs(_ context.Context, ids []int64) ([]*Account, error) {
 	s.getByIDsCalled = true
 	s.getByIDsIDs = append([]int64{}, ids...)
@@ -63,16 +73,6 @@ func (s *accountRepoStubForBulkUpdate) GetByID(_ context.Context, id int64) (*Ac
 	return nil, errors.New("account not found")
 }
 
-func (s *accountRepoStubForBulkUpdate) ListByGroup(_ context.Context, groupID int64) ([]Account, error) {
-	if err, ok := s.listByGroupErr[groupID]; ok {
-		return nil, err
-	}
-	if rows, ok := s.listByGroupData[groupID]; ok {
-		return rows, nil
-	}
-	return nil, nil
-}
-
 // TestAdminService_BulkUpdateAccounts_AllSuccessIDs 验证批量更新成功时返回 success_ids/failed_ids。
 func TestAdminService_BulkUpdateAccounts_AllSuccessIDs(t *testing.T) {
 	repo := &accountRepoStubForBulkUpdate{}
diff --git a/backend/internal/service/admin_service_group_test.go b/backend/internal/service/admin_service_group_test.go
index ef77a980..0e6fe084 100644
--- a/backend/internal/service/admin_service_group_test.go
+++ b/backend/internal/service/admin_service_group_test.go
@@ -785,3 +785,57 @@ func TestAdminService_UpdateGroup_InvalidRequestFallbackAllowsAntigravity(t *tes
 	require.NotNil(t, repo.updated)
 	require.Equal(t, fallbackID, *repo.updated.FallbackGroupIDOnInvalidRequest)
 }
+
+func TestAdminService_CreateGroup_SimulateClaudeMaxRequiresAnthropic(t *testing.T) {
+	repo := &groupRepoStubForAdmin{}
+	svc := &adminServiceImpl{groupRepo: repo}
+
+	enabled := true
+	_, err := svc.CreateGroup(context.Background(), &CreateGroupInput{
+		Name:                     "openai-group",
+		Platform:                 PlatformOpenAI,
+		SimulateClaudeMaxEnabled: &enabled,
+	})
+	require.Error(t, err)
+	require.Contains(t, err.Error(), "simulate_claude_max_enabled only supported for anthropic groups")
+	require.Nil(t, repo.created)
+}
+
+func TestAdminService_UpdateGroup_SimulateClaudeMaxRequiresAnthropic(t *testing.T) {
+	existingGroup := &Group{
+		ID:       1,
+		Name:     "openai-group",
+		Platform: PlatformOpenAI,
+		Status:   StatusActive,
+	}
+	repo := &groupRepoStubForAdmin{getByID: existingGroup}
+	svc := &adminServiceImpl{groupRepo: repo}
+
+	enabled := true
+	_, err := svc.UpdateGroup(context.Background(), 1, &UpdateGroupInput{
+		SimulateClaudeMaxEnabled: &enabled,
+	})
+	require.Error(t, err)
+	require.Contains(t, err.Error(), "simulate_claude_max_enabled only supported for anthropic groups")
+	require.Nil(t, repo.updated)
+}
+
+func TestAdminService_UpdateGroup_ClearsSimulateClaudeMaxWhenPlatformChanges(t *testing.T) {
+	existingGroup := &Group{
+		ID:                       1,
+		Name:                     "anthropic-group",
+		Platform:                 PlatformAnthropic,
+		Status:                   StatusActive,
+		SimulateClaudeMaxEnabled: true,
+	}
+	repo := &groupRepoStubForAdmin{getByID: existingGroup}
+	svc := &adminServiceImpl{groupRepo: repo}
+
+	group, err := svc.UpdateGroup(context.Background(), 1, &UpdateGroupInput{
+		Platform: PlatformOpenAI,
+	})
+	require.NoError(t, err)
+	require.NotNil(t, group)
+	require.NotNil(t, repo.updated)
+	require.False(t, repo.updated.SimulateClaudeMaxEnabled)
+}
diff --git a/backend/internal/service/antigravity_gateway_service.go b/backend/internal/service/antigravity_gateway_service.go
index 96ff3354..0fbbbfaf 100644
--- a/backend/internal/service/antigravity_gateway_service.go
+++ b/backend/internal/service/antigravity_gateway_service.go
@@ -1599,7 +1599,7 @@ func (s *AntigravityGatewayService) Forward(ctx context.Context, c *gin.Context,
 	var clientDisconnect bool
 	if claudeReq.Stream {
 		// 客户端要求流式,直接透传转换
-		streamRes, err := s.handleClaudeStreamingResponse(c, resp, startTime, originalModel)
+		streamRes, err := s.handleClaudeStreamingResponse(c, resp, startTime, originalModel, account.ID)
 		if err != nil {
 			logger.LegacyPrintf("service.antigravity_gateway", "%s status=stream_error error=%v", prefix, err)
 			return nil, err
 		}
@@ -1609,7 +1609,7 @@ func (s *AntigravityGatewayService) Forward(ctx context.Context, c *gin.Context,
 		clientDisconnect = streamRes.clientDisconnect
 	} else {
 		// 客户端要求非流式,收集流式响应后转换返回
-		streamRes, err := s.handleClaudeStreamToNonStreaming(c, resp, startTime, originalModel)
+		streamRes, err := s.handleClaudeStreamToNonStreaming(c, resp, startTime, originalModel, account.ID)
 		if err != nil {
 			logger.LegacyPrintf("service.antigravity_gateway", "%s status=stream_collect_error error=%v", prefix, err)
 			return nil, err
 		}
@@ -1618,6 +1618,9 @@ func (s *AntigravityGatewayService) Forward(ctx context.Context, c *gin.Context,
 		firstTokenMs = streamRes.firstTokenMs
 	}
 
+	// Claude Max cache billing: 同步 ForwardResult.Usage 与客户端响应体一致
+	applyClaudeMaxCacheBillingPolicyToUsage(usage, parsedRequestFromGinContext(c), claudeMaxGroupFromGinContext(c), originalModel, account.ID)
+
 	return &ForwardResult{
 		RequestID: requestID,
 		Usage:     *usage,
@@ -3415,7 +3418,7 @@ func (s *AntigravityGatewayService) writeGoogleError(c *gin.Context, status int,
 
 // handleClaudeStreamToNonStreaming 收集上游流式响应,转换为 Claude 非流式格式返回
 // 用于处理客户端非流式请求但上游只支持流式的情况
-func (s *AntigravityGatewayService) handleClaudeStreamToNonStreaming(c *gin.Context, resp *http.Response, startTime time.Time, originalModel string) (*antigravityStreamResult, error) {
+func (s *AntigravityGatewayService) handleClaudeStreamToNonStreaming(c *gin.Context, resp *http.Response, startTime time.Time, originalModel string, accountID int64) (*antigravityStreamResult, error) {
 	scanner := bufio.NewScanner(resp.Body)
 	maxLineSize := defaultMaxLineSize
 	if s.settingService.cfg != nil && s.settingService.cfg.Gateway.MaxLineSize > 0 {
@@ -3573,6 +3576,9 @@ returnResponse:
 		return nil, s.writeClaudeError(c, http.StatusBadGateway, "upstream_error", "Failed to parse upstream response")
 	}
 
+	// Claude Max cache billing simulation (non-streaming)
+	claudeResp = applyClaudeMaxNonStreamingRewrite(c, claudeResp, agUsage, originalModel, accountID)
+
 	c.Data(http.StatusOK, "application/json", claudeResp)
 
 	// 转换为 service.ClaudeUsage
@@ -3587,7 +3593,7 @@
 	}
 
 // handleClaudeStreamingResponse 处理 Claude 流式响应(Gemini SSE → Claude SSE 转换)
-func (s *AntigravityGatewayService) handleClaudeStreamingResponse(c *gin.Context, resp *http.Response, startTime time.Time, originalModel string) (*antigravityStreamResult, error) {
+func (s *AntigravityGatewayService) handleClaudeStreamingResponse(c *gin.Context, resp *http.Response, startTime time.Time, originalModel string, accountID int64) (*antigravityStreamResult, error) {
 	c.Header("Content-Type", "text/event-stream")
 	c.Header("Cache-Control", "no-cache")
 	c.Header("Connection", "keep-alive")
@@ -3600,6 +3606,8 @@ func (s *AntigravityGatewayService) handleClaudeStreamingResponse(c *gin.Context
 	}
 
 	processor := antigravity.NewStreamingProcessor(originalModel)
+	setupClaudeMaxStreamingHook(c, processor, originalModel, accountID)
+
 	var firstTokenMs *int
 	// 使用 Scanner 并限制单行大小,避免 ReadString 无上限导致 OOM
 	scanner := bufio.NewScanner(resp.Body)
diff --git a/backend/internal/service/antigravity_gateway_service_test.go b/backend/internal/service/antigravity_gateway_service_test.go
index 84b65adc..cbecfee5 100644
--- a/backend/internal/service/antigravity_gateway_service_test.go
+++ b/backend/internal/service/antigravity_gateway_service_test.go
@@ -710,7 +710,7 @@ func TestHandleClaudeStreamingResponse_NormalComplete(t *testing.T) {
 		fmt.Fprintln(pw, "")
 	}()
 
-	result, err := svc.handleClaudeStreamingResponse(c, resp, time.Now(), "claude-sonnet-4-5")
+	result, err := svc.handleClaudeStreamingResponse(c, resp, time.Now(), "claude-sonnet-4-5", 0)
 	_ = pr.Close()
 
 	require.NoError(t, err)
@@ -787,7 +787,7 @@ func TestHandleClaudeStreamingResponse_ThoughtsTokenCount(t *testing.T) {
 		fmt.Fprintln(pw, "")
 	}()
 
-	result, err := svc.handleClaudeStreamingResponse(c, resp, time.Now(), "gemini-2.5-pro")
+	result, err := svc.handleClaudeStreamingResponse(c, resp, time.Now(), "gemini-2.5-pro", 0)
 	_ = pr.Close()
 
 	require.NoError(t, err)
@@ -990,7 +990,7 @@ func TestHandleClaudeStreamingResponse_ClientDisconnect(t *testing.T) {
 		fmt.Fprintln(pw, "")
 	}()
 
-	result, err := svc.handleClaudeStreamingResponse(c, resp, time.Now(), "claude-sonnet-4-5")
+	result, err := svc.handleClaudeStreamingResponse(c, resp, time.Now(), "claude-sonnet-4-5", 0)
 	_ = pr.Close()
 
 	require.NoError(t, err)
@@ -1014,7 +1014,7 @@ func TestHandleClaudeStreamingResponse_ContextCanceled(t *testing.T) {
 	resp := &http.Response{StatusCode: http.StatusOK, Body: cancelReadCloser{}, Header: http.Header{}}
 
-	result, err := svc.handleClaudeStreamingResponse(c, resp, time.Now(), "claude-sonnet-4-5")
+	result, err := svc.handleClaudeStreamingResponse(c, resp, time.Now(), "claude-sonnet-4-5", 0)
 
 	require.NoError(t, err)
 	require.NotNil(t, result)
diff --git a/backend/internal/service/api_key_auth_cache.go b/backend/internal/service/api_key_auth_cache.go
index 83933f42..80e9c0c6 100644
--- a/backend/internal/service/api_key_auth_cache.go
+++ b/backend/internal/service/api_key_auth_cache.go
@@ -59,9 +59,10 @@ type APIKeyAuthGroupSnapshot struct {
 	// Model routing is used by gateway account selection, so it must be part of auth cache snapshot.
 	// Only anthropic groups use these fields; others may leave them empty.
-	ModelRouting        map[string][]int64 `json:"model_routing,omitempty"`
-	ModelRoutingEnabled bool               `json:"model_routing_enabled"`
-	MCPXMLInject        bool               `json:"mcp_xml_inject"`
+	ModelRouting             map[string][]int64 `json:"model_routing,omitempty"`
+	ModelRoutingEnabled      bool               `json:"model_routing_enabled"`
+	MCPXMLInject             bool               `json:"mcp_xml_inject"`
+	SimulateClaudeMaxEnabled bool               `json:"simulate_claude_max_enabled"`
 
 	// 支持的模型系列(仅 antigravity 平台使用)
 	SupportedModelScopes []string `json:"supported_model_scopes,omitempty"`
diff --git a/backend/internal/service/api_key_auth_cache_impl.go b/backend/internal/service/api_key_auth_cache_impl.go
index 0ca694af..2aaa20ac 100644
--- a/backend/internal/service/api_key_auth_cache_impl.go
+++ b/backend/internal/service/api_key_auth_cache_impl.go
@@ -244,6 +244,7 @@ func (s *APIKeyService) snapshotFromAPIKey(apiKey *APIKey) *APIKeyAuthSnapshot {
 			ModelRouting:             apiKey.Group.ModelRouting,
 			ModelRoutingEnabled:      apiKey.Group.ModelRoutingEnabled,
 			MCPXMLInject:             apiKey.Group.MCPXMLInject,
+			SimulateClaudeMaxEnabled: apiKey.Group.SimulateClaudeMaxEnabled,
 			SupportedModelScopes:     apiKey.Group.SupportedModelScopes,
 		}
 	}
@@ -301,6 +302,7 @@ func (s *APIKeyService) snapshotToAPIKey(key string, snapshot *APIKeyAuthSnapsho
 		ModelRouting:             snapshot.Group.ModelRouting,
 		ModelRoutingEnabled:      snapshot.Group.ModelRoutingEnabled,
 		MCPXMLInject:             snapshot.Group.MCPXMLInject,
+		SimulateClaudeMaxEnabled: snapshot.Group.SimulateClaudeMaxEnabled,
 		SupportedModelScopes:     snapshot.Group.SupportedModelScopes,
 	}
 }
diff --git a/backend/internal/service/claude_max_cache_billing_policy.go b/backend/internal/service/claude_max_cache_billing_policy.go
new file mode 100644
index 00000000..2381915e
--- /dev/null
+++ b/backend/internal/service/claude_max_cache_billing_policy.go
@@ -0,0 +1,450 @@
+package service
+
+import (
+	"encoding/json"
+	"strings"
+
+	"github.com/Wei-Shaw/sub2api/internal/pkg/claude"
+	"github.com/Wei-Shaw/sub2api/internal/pkg/logger"
+	"github.com/tidwall/gjson"
+)
+
+type claudeMaxCacheBillingOutcome struct {
+	Simulated bool
+}
+
+func applyClaudeMaxCacheBillingPolicyToUsage(usage *ClaudeUsage, parsed *ParsedRequest, group *Group, model string, accountID int64) claudeMaxCacheBillingOutcome {
+	var out claudeMaxCacheBillingOutcome
+	if usage == nil || !shouldApplyClaudeMaxBillingRulesForUsage(group, model, parsed) {
+		return out
+	}
+
+	resolvedModel := strings.TrimSpace(model)
+	if resolvedModel == "" && parsed != nil {
+		resolvedModel = strings.TrimSpace(parsed.Model)
+	}
+
+	if hasCacheCreationTokens(*usage) {
+		// Upstream already returned cache creation usage; keep original usage.
+ return out + } + + if !shouldSimulateClaudeMaxUsageForUsage(*usage, parsed) { + return out + } + beforeInputTokens := usage.InputTokens + out.Simulated = safelyProjectUsageToClaudeMax1H(usage, parsed) + if out.Simulated { + logger.LegacyPrintf("service.gateway", "simulate_claude_max_usage: model=%s account=%d input_tokens:%d->%d cache_creation_1h=%d", + resolvedModel, + accountID, + beforeInputTokens, + usage.InputTokens, + usage.CacheCreation1hTokens, + ) + } + return out +} + +func isClaudeFamilyModel(model string) bool { + normalized := strings.ToLower(strings.TrimSpace(claude.NormalizeModelID(model))) + if normalized == "" { + return false + } + return strings.Contains(normalized, "claude-") +} + +func shouldApplyClaudeMaxBillingRules(input *RecordUsageInput) bool { + if input == nil || input.Result == nil || input.APIKey == nil || input.APIKey.Group == nil { + return false + } + return shouldApplyClaudeMaxBillingRulesForUsage(input.APIKey.Group, input.Result.Model, input.ParsedRequest) +} + +func shouldApplyClaudeMaxBillingRulesForUsage(group *Group, model string, parsed *ParsedRequest) bool { + if group == nil { + return false + } + if !group.SimulateClaudeMaxEnabled || group.Platform != PlatformAnthropic { + return false + } + + resolvedModel := model + if resolvedModel == "" && parsed != nil { + resolvedModel = parsed.Model + } + if !isClaudeFamilyModel(resolvedModel) { + return false + } + return true +} + +func hasCacheCreationTokens(usage ClaudeUsage) bool { + return usage.CacheCreationInputTokens > 0 || usage.CacheCreation5mTokens > 0 || usage.CacheCreation1hTokens > 0 +} + +func shouldSimulateClaudeMaxUsage(input *RecordUsageInput) bool { + if input == nil || input.Result == nil { + return false + } + if !shouldApplyClaudeMaxBillingRules(input) { + return false + } + return shouldSimulateClaudeMaxUsageForUsage(input.Result.Usage, input.ParsedRequest) +} + +func shouldSimulateClaudeMaxUsageForUsage(usage ClaudeUsage, parsed *ParsedRequest) bool { + if 
usage.InputTokens <= 0 { + return false + } + if hasCacheCreationTokens(usage) { + return false + } + if !hasClaudeCacheSignals(parsed) { + return false + } + return true +} + +func safelyProjectUsageToClaudeMax1H(usage *ClaudeUsage, parsed *ParsedRequest) (changed bool) { + defer func() { + if r := recover(); r != nil { + logger.LegacyPrintf("service.gateway", "simulate_claude_max_usage skipped: panic=%v", r) + changed = false + } + }() + return projectUsageToClaudeMax1H(usage, parsed) +} + +func projectUsageToClaudeMax1H(usage *ClaudeUsage, parsed *ParsedRequest) bool { + if usage == nil { + return false + } + totalWindowTokens := usage.InputTokens + usage.CacheCreation5mTokens + usage.CacheCreation1hTokens + if totalWindowTokens <= 1 { + return false + } + + simulatedInputTokens := computeClaudeMaxProjectedInputTokens(totalWindowTokens, parsed) + if simulatedInputTokens <= 0 { + simulatedInputTokens = 1 + } + if simulatedInputTokens >= totalWindowTokens { + simulatedInputTokens = totalWindowTokens - 1 + } + + cacheCreation1hTokens := totalWindowTokens - simulatedInputTokens + if usage.InputTokens == simulatedInputTokens && + usage.CacheCreation5mTokens == 0 && + usage.CacheCreation1hTokens == cacheCreation1hTokens && + usage.CacheCreationInputTokens == cacheCreation1hTokens { + return false + } + + usage.InputTokens = simulatedInputTokens + usage.CacheCreation5mTokens = 0 + usage.CacheCreation1hTokens = cacheCreation1hTokens + usage.CacheCreationInputTokens = cacheCreation1hTokens + return true +} + +type claudeCacheProjection struct { + HasBreakpoint bool + BreakpointCount int + TotalEstimatedTokens int + TailEstimatedTokens int +} + +func computeClaudeMaxProjectedInputTokens(totalWindowTokens int, parsed *ParsedRequest) int { + if totalWindowTokens <= 1 { + return totalWindowTokens + } + + projection := analyzeClaudeCacheProjection(parsed) + if !projection.HasBreakpoint || projection.TotalEstimatedTokens <= 0 || projection.TailEstimatedTokens <= 0 { + return 
totalWindowTokens + } + + totalEstimate := int64(projection.TotalEstimatedTokens) + tailEstimate := int64(projection.TailEstimatedTokens) + if tailEstimate > totalEstimate { + tailEstimate = totalEstimate + } + + scaled := (int64(totalWindowTokens)*tailEstimate + totalEstimate/2) / totalEstimate + if scaled <= 0 { + scaled = 1 + } + if scaled >= int64(totalWindowTokens) { + scaled = int64(totalWindowTokens - 1) + } + return int(scaled) +} + +func hasClaudeCacheSignals(parsed *ParsedRequest) bool { + if parsed == nil { + return false + } + if hasTopLevelEphemeralCacheControl(parsed) { + return true + } + return countExplicitCacheBreakpoints(parsed) > 0 +} + +func hasTopLevelEphemeralCacheControl(parsed *ParsedRequest) bool { + if parsed == nil || len(parsed.Body) == 0 { + return false + } + cacheType := strings.TrimSpace(gjson.GetBytes(parsed.Body, "cache_control.type").String()) + return strings.EqualFold(cacheType, "ephemeral") +} + +func analyzeClaudeCacheProjection(parsed *ParsedRequest) claudeCacheProjection { + var projection claudeCacheProjection + if parsed == nil { + return projection + } + + total := 0 + lastBreakpointAt := -1 + + switch system := parsed.System.(type) { + case string: + total += claudeMaxMessageOverheadTokens + estimateClaudeTextTokens(system) + case []any: + for _, raw := range system { + block, ok := raw.(map[string]any) + if !ok { + total += claudeMaxUnknownContentTokens + continue + } + total += estimateClaudeBlockTokens(block) + if hasEphemeralCacheControl(block) { + lastBreakpointAt = total + projection.BreakpointCount++ + projection.HasBreakpoint = true + } + } + } + + for _, rawMsg := range parsed.Messages { + total += claudeMaxMessageOverheadTokens + msg, ok := rawMsg.(map[string]any) + if !ok { + total += claudeMaxUnknownContentTokens + continue + } + content, exists := msg["content"] + if !exists { + continue + } + msgTokens, msgLastBreak, msgBreakCount := estimateClaudeContentTokens(content) + total += msgTokens + if 
msgBreakCount > 0 { + lastBreakpointAt = total - msgTokens + msgLastBreak + projection.BreakpointCount += msgBreakCount + projection.HasBreakpoint = true + } + } + + if total <= 0 { + total = 1 + } + projection.TotalEstimatedTokens = total + + if projection.HasBreakpoint && lastBreakpointAt >= 0 { + tail := total - lastBreakpointAt + if tail <= 0 { + tail = 1 + } + projection.TailEstimatedTokens = tail + return projection + } + + if hasTopLevelEphemeralCacheControl(parsed) { + tail := estimateLastUserMessageTokens(parsed) + if tail <= 0 { + tail = 1 + } + projection.HasBreakpoint = true + projection.BreakpointCount = 1 + projection.TailEstimatedTokens = tail + } + return projection +} + +func countExplicitCacheBreakpoints(parsed *ParsedRequest) int { + if parsed == nil { + return 0 + } + total := 0 + if system, ok := parsed.System.([]any); ok { + for _, raw := range system { + if block, ok := raw.(map[string]any); ok && hasEphemeralCacheControl(block) { + total++ + } + } + } + for _, rawMsg := range parsed.Messages { + msg, ok := rawMsg.(map[string]any) + if !ok { + continue + } + content, ok := msg["content"].([]any) + if !ok { + continue + } + for _, raw := range content { + if block, ok := raw.(map[string]any); ok && hasEphemeralCacheControl(block) { + total++ + } + } + } + return total +} + +func hasEphemeralCacheControl(block map[string]any) bool { + if block == nil { + return false + } + raw, ok := block["cache_control"] + if !ok || raw == nil { + return false + } + switch cc := raw.(type) { + case map[string]any: + cacheType, _ := cc["type"].(string) + return strings.EqualFold(strings.TrimSpace(cacheType), "ephemeral") + case map[string]string: + return strings.EqualFold(strings.TrimSpace(cc["type"]), "ephemeral") + default: + return false + } +} + +func estimateClaudeContentTokens(content any) (tokens int, lastBreakAt int, breakpointCount int) { + switch value := content.(type) { + case string: + return estimateClaudeTextTokens(value), -1, 0 + case []any: + 
total := 0 + lastBreak := -1 + breaks := 0 + for _, raw := range value { + block, ok := raw.(map[string]any) + if !ok { + total += claudeMaxUnknownContentTokens + continue + } + total += estimateClaudeBlockTokens(block) + if hasEphemeralCacheControl(block) { + lastBreak = total + breaks++ + } + } + return total, lastBreak, breaks + default: + return estimateStructuredTokens(value), -1, 0 + } +} + +func estimateClaudeBlockTokens(block map[string]any) int { + if block == nil { + return claudeMaxUnknownContentTokens + } + tokens := claudeMaxBlockOverheadTokens + blockType, _ := block["type"].(string) + switch blockType { + case "text": + if text, ok := block["text"].(string); ok { + tokens += estimateClaudeTextTokens(text) + } + case "tool_result": + if content, ok := block["content"]; ok { + nested, _, _ := estimateClaudeContentTokens(content) + tokens += nested + } + case "tool_use": + if name, ok := block["name"].(string); ok { + tokens += estimateClaudeTextTokens(name) + } + if input, ok := block["input"]; ok { + tokens += estimateStructuredTokens(input) + } + default: + if text, ok := block["text"].(string); ok { + tokens += estimateClaudeTextTokens(text) + } else if content, ok := block["content"]; ok { + nested, _, _ := estimateClaudeContentTokens(content) + tokens += nested + } + } + if tokens <= claudeMaxBlockOverheadTokens { + tokens += claudeMaxUnknownContentTokens + } + return tokens +} + +func estimateLastUserMessageTokens(parsed *ParsedRequest) int { + if parsed == nil || len(parsed.Messages) == 0 { + return 0 + } + for i := len(parsed.Messages) - 1; i >= 0; i-- { + msg, ok := parsed.Messages[i].(map[string]any) + if !ok { + continue + } + role, _ := msg["role"].(string) + if !strings.EqualFold(strings.TrimSpace(role), "user") { + continue + } + tokens, _, _ := estimateClaudeContentTokens(msg["content"]) + return claudeMaxMessageOverheadTokens + tokens + } + return 0 +} + +func estimateStructuredTokens(v any) int { + if v == nil { + return 0 + } + raw, 
err := json.Marshal(v) + if err != nil { + return claudeMaxUnknownContentTokens + } + return estimateClaudeTextTokens(string(raw)) +} + +func estimateClaudeTextTokens(text string) int { + if tokens, ok := estimateTokensByThirdPartyTokenizer(text); ok { + return tokens + } + return estimateClaudeTextTokensHeuristic(text) +} + +func estimateClaudeTextTokensHeuristic(text string) int { + normalized := strings.Join(strings.Fields(strings.TrimSpace(text)), " ") + if normalized == "" { + return 0 + } + asciiChars := 0 + nonASCIIChars := 0 + for _, r := range normalized { + if r <= 127 { + asciiChars++ + } else { + nonASCIIChars++ + } + } + tokens := nonASCIIChars + if asciiChars > 0 { + tokens += (asciiChars + 3) / 4 + } + if words := len(strings.Fields(normalized)); words > tokens { + tokens = words + } + if tokens <= 0 { + return 1 + } + return tokens +} diff --git a/backend/internal/service/claude_max_simulation_test.go b/backend/internal/service/claude_max_simulation_test.go new file mode 100644 index 00000000..3d2ae2e6 --- /dev/null +++ b/backend/internal/service/claude_max_simulation_test.go @@ -0,0 +1,156 @@ +package service + +import ( + "strings" + "testing" +) + +func TestProjectUsageToClaudeMax1H_Conservation(t *testing.T) { + usage := &ClaudeUsage{ + InputTokens: 1200, + CacheCreationInputTokens: 0, + CacheCreation5mTokens: 0, + CacheCreation1hTokens: 0, + } + parsed := &ParsedRequest{ + Model: "claude-sonnet-4-5", + Messages: []any{ + map[string]any{ + "role": "user", + "content": []any{ + map[string]any{ + "type": "text", + "text": strings.Repeat("cached context ", 200), + "cache_control": map[string]any{"type": "ephemeral"}, + }, + map[string]any{ + "type": "text", + "text": "summarize quickly", + }, + }, + }, + }, + } + + changed := projectUsageToClaudeMax1H(usage, parsed) + if !changed { + t.Fatalf("expected usage to be projected") + } + + total := usage.InputTokens + usage.CacheCreation5mTokens + usage.CacheCreation1hTokens + if total != 1200 { + 
t.Fatalf("total tokens changed: got=%d want=%d", total, 1200) + } + if usage.CacheCreation5mTokens != 0 { + t.Fatalf("cache_creation_5m should be 0, got=%d", usage.CacheCreation5mTokens) + } + if usage.InputTokens <= 0 || usage.InputTokens >= 1200 { + t.Fatalf("simulated input out of range, got=%d", usage.InputTokens) + } + if usage.InputTokens > 100 { + t.Fatalf("simulated input should stay near cache breakpoint tail, got=%d", usage.InputTokens) + } + if usage.CacheCreation1hTokens <= 0 { + t.Fatalf("cache_creation_1h should be > 0, got=%d", usage.CacheCreation1hTokens) + } + if usage.CacheCreationInputTokens != usage.CacheCreation1hTokens { + t.Fatalf("cache_creation_input_tokens mismatch: got=%d want=%d", usage.CacheCreationInputTokens, usage.CacheCreation1hTokens) + } +} + +func TestComputeClaudeMaxProjectedInputTokens_Deterministic(t *testing.T) { + parsed := &ParsedRequest{ + Model: "claude-opus-4-5", + Messages: []any{ + map[string]any{ + "role": "user", + "content": []any{ + map[string]any{ + "type": "text", + "text": "build context", + "cache_control": map[string]any{"type": "ephemeral"}, + }, + map[string]any{ + "type": "text", + "text": "what is failing now", + }, + }, + }, + }, + } + + got1 := computeClaudeMaxProjectedInputTokens(4096, parsed) + got2 := computeClaudeMaxProjectedInputTokens(4096, parsed) + if got1 != got2 { + t.Fatalf("non-deterministic input tokens: %d != %d", got1, got2) + } +} + +func TestShouldSimulateClaudeMaxUsage(t *testing.T) { + group := &Group{ + Platform: PlatformAnthropic, + SimulateClaudeMaxEnabled: true, + } + input := &RecordUsageInput{ + Result: &ForwardResult{ + Model: "claude-sonnet-4-5", + Usage: ClaudeUsage{ + InputTokens: 3000, + CacheCreationInputTokens: 0, + CacheCreation5mTokens: 0, + CacheCreation1hTokens: 0, + }, + }, + ParsedRequest: &ParsedRequest{ + Messages: []any{ + map[string]any{ + "role": "user", + "content": []any{ + map[string]any{ + "type": "text", + "text": "cached", + "cache_control": 
map[string]any{"type": "ephemeral"}, + }, + map[string]any{ + "type": "text", + "text": "tail", + }, + }, + }, + }, + }, + APIKey: &APIKey{Group: group}, + } + + if !shouldSimulateClaudeMaxUsage(input) { + t.Fatalf("expected simulate=true for claude group with cache signal") + } + + input.ParsedRequest = &ParsedRequest{ + Messages: []any{ + map[string]any{"role": "user", "content": "no cache signal"}, + }, + } + if shouldSimulateClaudeMaxUsage(input) { + t.Fatalf("expected simulate=false when request has no cache signal") + } + + input.ParsedRequest = &ParsedRequest{ + Messages: []any{ + map[string]any{ + "role": "user", + "content": []any{ + map[string]any{ + "type": "text", + "text": "cached", + "cache_control": map[string]any{"type": "ephemeral"}, + }, + }, + }, + }, + } + input.Result.Usage.CacheCreationInputTokens = 100 + if shouldSimulateClaudeMaxUsage(input) { + t.Fatalf("expected simulate=false when cache creation already exists") + } +} diff --git a/backend/internal/service/claude_tokenizer.go b/backend/internal/service/claude_tokenizer.go new file mode 100644 index 00000000..61f5e961 --- /dev/null +++ b/backend/internal/service/claude_tokenizer.go @@ -0,0 +1,41 @@ +package service + +import ( + "sync" + + tiktoken "github.com/pkoukk/tiktoken-go" + tiktokenloader "github.com/pkoukk/tiktoken-go-loader" +) + +var ( + claudeTokenizerOnce sync.Once + claudeTokenizer *tiktoken.Tiktoken +) + +func getClaudeTokenizer() *tiktoken.Tiktoken { + claudeTokenizerOnce.Do(func() { + // Use offline loader to avoid runtime dictionary download. + tiktoken.SetBpeLoader(tiktokenloader.NewOfflineLoader()) + // Use a high-capacity tokenizer as the default approximation for Claude payloads. 
+ enc, err := tiktoken.GetEncoding(tiktoken.MODEL_O200K_BASE) + if err != nil { + enc, err = tiktoken.GetEncoding(tiktoken.MODEL_CL100K_BASE) + } + if err == nil { + claudeTokenizer = enc + } + }) + return claudeTokenizer +} + +func estimateTokensByThirdPartyTokenizer(text string) (int, bool) { + enc := getClaudeTokenizer() + if enc == nil { + return 0, false + } + tokens := len(enc.EncodeOrdinary(text)) + if tokens <= 0 { + return 0, false + } + return tokens, true +} diff --git a/backend/internal/service/concurrency_service.go b/backend/internal/service/concurrency_service.go index 4dcf84e0..19d3d536 100644 --- a/backend/internal/service/concurrency_service.go +++ b/backend/internal/service/concurrency_service.go @@ -331,8 +331,9 @@ func (s *ConcurrencyService) StartSlotCleanupWorker(accountRepo AccountRepositor }() } -// GetAccountConcurrencyBatch gets current concurrency counts for multiple accounts -// Returns a map of accountID -> current concurrency count +// GetAccountConcurrencyBatch gets current concurrency counts for multiple accounts. +// Uses a detached context with timeout to prevent HTTP request cancellation from +// causing the entire batch to fail (which would show all concurrency as 0). func (s *ConcurrencyService) GetAccountConcurrencyBatch(ctx context.Context, accountIDs []int64) (map[int64]int, error) { if len(accountIDs) == 0 { return map[int64]int{}, nil @@ -344,5 +345,11 @@ func (s *ConcurrencyService) GetAccountConcurrencyBatch(ctx context.Context, acc } return result, nil } - return s.cache.GetAccountConcurrencyBatch(ctx, accountIDs) + + // Use a detached context so that a cancelled HTTP request doesn't cause + // the Redis pipeline to fail and return all-zero concurrency counts. 
+ redisCtx, cancel := context.WithTimeout(context.Background(), 3*time.Second) + defer cancel() + + return s.cache.GetAccountConcurrencyBatch(redisCtx, accountIDs) } diff --git a/backend/internal/service/error_passthrough_runtime_test.go b/backend/internal/service/error_passthrough_runtime_test.go index 7032d15b..2b7bbf60 100644 --- a/backend/internal/service/error_passthrough_runtime_test.go +++ b/backend/internal/service/error_passthrough_runtime_test.go @@ -220,7 +220,7 @@ func TestApplyErrorPassthroughRule_SkipMonitoringSetsContextKey(t *testing.T) { v, exists := c.Get(OpsSkipPassthroughKey) assert.True(t, exists, "OpsSkipPassthroughKey should be set when skip_monitoring=true") boolVal, ok := v.(bool) - assert.True(t, ok, "value should be bool") + assert.True(t, ok, "value should be a bool") assert.True(t, boolVal) } diff --git a/backend/internal/service/gateway_claude_max_response_helpers.go b/backend/internal/service/gateway_claude_max_response_helpers.go new file mode 100644 index 00000000..a5f5f3d2 --- /dev/null +++ b/backend/internal/service/gateway_claude_max_response_helpers.go @@ -0,0 +1,196 @@ +package service + +import ( + "context" + "encoding/json" + + "github.com/Wei-Shaw/sub2api/internal/pkg/antigravity" + "github.com/gin-gonic/gin" + "github.com/tidwall/sjson" +) + +type claudeMaxResponseRewriteContext struct { + Parsed *ParsedRequest + Group *Group +} + +type claudeMaxResponseRewriteContextKeyType struct{} + +var claudeMaxResponseRewriteContextKey = claudeMaxResponseRewriteContextKeyType{} + +func withClaudeMaxResponseRewriteContext(ctx context.Context, c *gin.Context, parsed *ParsedRequest) context.Context { + if ctx == nil { + ctx = context.Background() + } + value := claudeMaxResponseRewriteContext{ + Parsed: parsed, + Group: claudeMaxGroupFromGinContext(c), + } + return context.WithValue(ctx, claudeMaxResponseRewriteContextKey, value) +} + +func claudeMaxResponseRewriteContextFromContext(ctx context.Context) claudeMaxResponseRewriteContext { 
+ if ctx == nil { + return claudeMaxResponseRewriteContext{} + } + value, _ := ctx.Value(claudeMaxResponseRewriteContextKey).(claudeMaxResponseRewriteContext) + return value +} + +func claudeMaxGroupFromGinContext(c *gin.Context) *Group { + if c == nil { + return nil + } + raw, exists := c.Get("api_key") + if !exists { + return nil + } + apiKey, ok := raw.(*APIKey) + if !ok || apiKey == nil { + return nil + } + return apiKey.Group +} + +func parsedRequestFromGinContext(c *gin.Context) *ParsedRequest { + if c == nil { + return nil + } + raw, exists := c.Get("parsed_request") + if !exists { + return nil + } + parsed, _ := raw.(*ParsedRequest) + return parsed +} + +func applyClaudeMaxSimulationToUsage(ctx context.Context, usage *ClaudeUsage, model string, accountID int64) claudeMaxCacheBillingOutcome { + var out claudeMaxCacheBillingOutcome + if usage == nil { + return out + } + rewriteCtx := claudeMaxResponseRewriteContextFromContext(ctx) + return applyClaudeMaxCacheBillingPolicyToUsage(usage, rewriteCtx.Parsed, rewriteCtx.Group, model, accountID) +} + +func applyClaudeMaxSimulationToUsageJSONMap(ctx context.Context, usageObj map[string]any, model string, accountID int64) claudeMaxCacheBillingOutcome { + var out claudeMaxCacheBillingOutcome + if usageObj == nil { + return out + } + usage := claudeUsageFromJSONMap(usageObj) + out = applyClaudeMaxSimulationToUsage(ctx, &usage, model, accountID) + if out.Simulated { + rewriteClaudeUsageJSONMap(usageObj, usage) + } + return out +} + +func rewriteClaudeUsageJSONBytes(body []byte, usage ClaudeUsage) []byte { + updated := body + var err error + + updated, err = sjson.SetBytes(updated, "usage.input_tokens", usage.InputTokens) + if err != nil { + return body + } + updated, err = sjson.SetBytes(updated, "usage.cache_creation_input_tokens", usage.CacheCreationInputTokens) + if err != nil { + return body + } + updated, err = sjson.SetBytes(updated, "usage.cache_creation.ephemeral_5m_input_tokens", usage.CacheCreation5mTokens) + 
if err != nil { + return body + } + updated, err = sjson.SetBytes(updated, "usage.cache_creation.ephemeral_1h_input_tokens", usage.CacheCreation1hTokens) + if err != nil { + return body + } + return updated +} + +func claudeUsageFromJSONMap(usageObj map[string]any) ClaudeUsage { + var usage ClaudeUsage + if usageObj == nil { + return usage + } + + usage.InputTokens = usageIntFromAny(usageObj["input_tokens"]) + usage.OutputTokens = usageIntFromAny(usageObj["output_tokens"]) + usage.CacheCreationInputTokens = usageIntFromAny(usageObj["cache_creation_input_tokens"]) + usage.CacheReadInputTokens = usageIntFromAny(usageObj["cache_read_input_tokens"]) + + if ccObj, ok := usageObj["cache_creation"].(map[string]any); ok { + usage.CacheCreation5mTokens = usageIntFromAny(ccObj["ephemeral_5m_input_tokens"]) + usage.CacheCreation1hTokens = usageIntFromAny(ccObj["ephemeral_1h_input_tokens"]) + } + return usage +} + +func rewriteClaudeUsageJSONMap(usageObj map[string]any, usage ClaudeUsage) { + if usageObj == nil { + return + } + usageObj["input_tokens"] = usage.InputTokens + usageObj["cache_creation_input_tokens"] = usage.CacheCreationInputTokens + + ccObj, _ := usageObj["cache_creation"].(map[string]any) + if ccObj == nil { + ccObj = make(map[string]any, 2) + usageObj["cache_creation"] = ccObj + } + ccObj["ephemeral_5m_input_tokens"] = usage.CacheCreation5mTokens + ccObj["ephemeral_1h_input_tokens"] = usage.CacheCreation1hTokens +} + +func usageIntFromAny(v any) int { + switch value := v.(type) { + case int: + return value + case int64: + return int(value) + case float64: + return int(value) + case json.Number: + if n, err := value.Int64(); err == nil { + return int(n) + } + } + return 0 +} + +// setupClaudeMaxStreamingHook installs the SSE usage rewrite hook for the Antigravity streaming path. +func setupClaudeMaxStreamingHook(c *gin.Context, processor *antigravity.StreamingProcessor, originalModel string, accountID int64) { + group := claudeMaxGroupFromGinContext(c) + parsed := 
parsedRequestFromGinContext(c) + if !shouldApplyClaudeMaxBillingRulesForUsage(group, originalModel, parsed) { + return + } + processor.SetUsageMapHook(func(usageMap map[string]any) { + svcUsage := claudeUsageFromJSONMap(usageMap) + outcome := applyClaudeMaxCacheBillingPolicyToUsage(&svcUsage, parsed, group, originalModel, accountID) + if outcome.Simulated { + rewriteClaudeUsageJSONMap(usageMap, svcUsage) + } + }) +} + +// applyClaudeMaxNonStreamingRewrite rewrites the usage fields in the response body on the Antigravity non-streaming path. +func applyClaudeMaxNonStreamingRewrite(c *gin.Context, claudeResp []byte, agUsage *antigravity.ClaudeUsage, originalModel string, accountID int64) []byte { + group := claudeMaxGroupFromGinContext(c) + parsed := parsedRequestFromGinContext(c) + if !shouldApplyClaudeMaxBillingRulesForUsage(group, originalModel, parsed) { + return claudeResp + } + svcUsage := &ClaudeUsage{ + InputTokens: agUsage.InputTokens, + OutputTokens: agUsage.OutputTokens, + CacheCreationInputTokens: agUsage.CacheCreationInputTokens, + CacheReadInputTokens: agUsage.CacheReadInputTokens, + } + outcome := applyClaudeMaxCacheBillingPolicyToUsage(svcUsage, parsed, group, originalModel, accountID) + if outcome.Simulated { + return rewriteClaudeUsageJSONBytes(claudeResp, *svcUsage) + } + return claudeResp +} diff --git a/backend/internal/service/gateway_multiplatform_test.go b/backend/internal/service/gateway_multiplatform_test.go index 1cb3c61e..320ceaa7 100644 --- a/backend/internal/service/gateway_multiplatform_test.go +++ b/backend/internal/service/gateway_multiplatform_test.go @@ -187,6 +187,14 @@ func (m *mockAccountRepoForPlatform) BulkUpdate(ctx context.Context, ids []int64 return 0, nil } +func (m *mockAccountRepoForPlatform) IncrementQuotaUsed(ctx context.Context, id int64, amount float64) error { + return nil +} + +func (m *mockAccountRepoForPlatform) ResetQuotaUsed(ctx context.Context, id int64) error { + return nil +} + // Verify interface implementation var _ AccountRepository = 
(*mockAccountRepoForPlatform)(nil) diff --git a/backend/internal/service/gateway_record_usage_claude_max_test.go b/backend/internal/service/gateway_record_usage_claude_max_test.go new file mode 100644 index 00000000..3cd86938 --- /dev/null +++ b/backend/internal/service/gateway_record_usage_claude_max_test.go @@ -0,0 +1,199 @@ +package service + +import ( + "context" + "testing" + "time" + + "github.com/Wei-Shaw/sub2api/internal/config" + "github.com/stretchr/testify/require" +) + +type usageLogRepoRecordUsageStub struct { + UsageLogRepository + + last *UsageLog + inserted bool + err error +} + +func (s *usageLogRepoRecordUsageStub) Create(_ context.Context, log *UsageLog) (bool, error) { + copied := *log + s.last = &copied + return s.inserted, s.err +} + +func newGatewayServiceForRecordUsageTest(repo UsageLogRepository) *GatewayService { + return &GatewayService{ + usageLogRepo: repo, + billingService: NewBillingService(&config.Config{}, nil), + cfg: &config.Config{RunMode: config.RunModeSimple}, + deferredService: &DeferredService{}, + } +} + +func TestRecordUsage_SimulateClaudeMaxEnabled_ProjectsUsageAndSkipsTTLOverride(t *testing.T) { + repo := &usageLogRepoRecordUsageStub{inserted: true} + svc := newGatewayServiceForRecordUsageTest(repo) + + groupID := int64(11) + input := &RecordUsageInput{ + Result: &ForwardResult{ + RequestID: "req-sim-1", + Model: "claude-sonnet-4", + Duration: time.Second, + Usage: ClaudeUsage{ + InputTokens: 160, + }, + }, + ParsedRequest: &ParsedRequest{ + Model: "claude-sonnet-4", + Messages: []any{ + map[string]any{ + "role": "user", + "content": []any{ + map[string]any{ + "type": "text", + "text": "long cached context for prior turns", + "cache_control": map[string]any{"type": "ephemeral"}, + }, + map[string]any{ + "type": "text", + "text": "please summarize the logs and provide root cause analysis", + }, + }, + }, + }, + }, + APIKey: &APIKey{ + ID: 1, + GroupID: &groupID, + Group: &Group{ + ID: groupID, + Platform: 
PlatformAnthropic, + RateMultiplier: 1, + SimulateClaudeMaxEnabled: true, + }, + }, + User: &User{ID: 2}, + Account: &Account{ + ID: 3, + Platform: PlatformAnthropic, + Type: AccountTypeOAuth, + Extra: map[string]any{ + "cache_ttl_override_enabled": true, + "cache_ttl_override_target": "5m", + }, + }, + } + + err := svc.RecordUsage(context.Background(), input) + require.NoError(t, err) + require.NotNil(t, repo.last) + + log := repo.last + require.Equal(t, 80, log.InputTokens) + require.Equal(t, 80, log.CacheCreationTokens) + require.Equal(t, 0, log.CacheCreation5mTokens) + require.Equal(t, 80, log.CacheCreation1hTokens) + require.False(t, log.CacheTTLOverridden, "simulate outcome should skip account ttl override") +} + +func TestRecordUsage_SimulateClaudeMaxDisabled_AppliesTTLOverride(t *testing.T) { + repo := &usageLogRepoRecordUsageStub{inserted: true} + svc := newGatewayServiceForRecordUsageTest(repo) + + groupID := int64(12) + input := &RecordUsageInput{ + Result: &ForwardResult{ + RequestID: "req-sim-2", + Model: "claude-sonnet-4", + Duration: time.Second, + Usage: ClaudeUsage{ + InputTokens: 40, + CacheCreationInputTokens: 120, + CacheCreation1hTokens: 120, + }, + }, + APIKey: &APIKey{ + ID: 2, + GroupID: &groupID, + Group: &Group{ + ID: groupID, + Platform: PlatformAnthropic, + RateMultiplier: 1, + SimulateClaudeMaxEnabled: false, + }, + }, + User: &User{ID: 3}, + Account: &Account{ + ID: 4, + Platform: PlatformAnthropic, + Type: AccountTypeOAuth, + Extra: map[string]any{ + "cache_ttl_override_enabled": true, + "cache_ttl_override_target": "5m", + }, + }, + } + + err := svc.RecordUsage(context.Background(), input) + require.NoError(t, err) + require.NotNil(t, repo.last) + + log := repo.last + require.Equal(t, 120, log.CacheCreationTokens) + require.Equal(t, 120, log.CacheCreation5mTokens) + require.Equal(t, 0, log.CacheCreation1hTokens) + require.True(t, log.CacheTTLOverridden) +} + +func 
TestRecordUsage_SimulateClaudeMaxEnabled_ExistingCacheCreationBypassesSimulation(t *testing.T) { + repo := &usageLogRepoRecordUsageStub{inserted: true} + svc := newGatewayServiceForRecordUsageTest(repo) + + groupID := int64(13) + input := &RecordUsageInput{ + Result: &ForwardResult{ + RequestID: "req-sim-3", + Model: "claude-sonnet-4", + Duration: time.Second, + Usage: ClaudeUsage{ + InputTokens: 20, + CacheCreationInputTokens: 120, + CacheCreation5mTokens: 120, + }, + }, + APIKey: &APIKey{ + ID: 3, + GroupID: &groupID, + Group: &Group{ + ID: groupID, + Platform: PlatformAnthropic, + RateMultiplier: 1, + SimulateClaudeMaxEnabled: true, + }, + }, + User: &User{ID: 4}, + Account: &Account{ + ID: 5, + Platform: PlatformAnthropic, + Type: AccountTypeOAuth, + Extra: map[string]any{ + "cache_ttl_override_enabled": true, + "cache_ttl_override_target": "5m", + }, + }, + } + + err := svc.RecordUsage(context.Background(), input) + require.NoError(t, err) + require.NotNil(t, repo.last) + + log := repo.last + require.Equal(t, 20, log.InputTokens) + require.Equal(t, 120, log.CacheCreation5mTokens) + require.Equal(t, 0, log.CacheCreation1hTokens) + require.Equal(t, 120, log.CacheCreationTokens) + require.False(t, log.CacheTTLOverridden, "existing cache_creation with SimulateClaudeMax enabled should skip account ttl override") +} diff --git a/backend/internal/service/gateway_response_usage_sync_test.go b/backend/internal/service/gateway_response_usage_sync_test.go new file mode 100644 index 00000000..445ee8ad --- /dev/null +++ b/backend/internal/service/gateway_response_usage_sync_test.go @@ -0,0 +1,170 @@ +package service + +import ( + "bytes" + "context" + "encoding/json" + "net/http" + "net/http/httptest" + "testing" + + "github.com/Wei-Shaw/sub2api/internal/config" + "github.com/gin-gonic/gin" + "github.com/stretchr/testify/require" + "github.com/tidwall/gjson" +) + +func TestHandleNonStreamingResponse_UsageAlignedWithClaudeMaxSimulation(t *testing.T) { + 
gin.SetMode(gin.TestMode) + + svc := &GatewayService{ + cfg: &config.Config{}, + rateLimitService: &RateLimitService{}, + } + + account := &Account{ + ID: 11, + Platform: PlatformAnthropic, + Type: AccountTypeOAuth, + Extra: map[string]any{ + "cache_ttl_override_enabled": true, + "cache_ttl_override_target": "5m", + }, + } + group := &Group{ + ID: 99, + Platform: PlatformAnthropic, + SimulateClaudeMaxEnabled: true, + } + parsed := &ParsedRequest{ + Model: "claude-sonnet-4", + Messages: []any{ + map[string]any{ + "role": "user", + "content": []any{ + map[string]any{ + "type": "text", + "text": "long cached context", + "cache_control": map[string]any{"type": "ephemeral"}, + }, + map[string]any{ + "type": "text", + "text": "new user question", + }, + }, + }, + }, + } + + upstreamBody := []byte(`{"id":"msg_1","model":"claude-sonnet-4","usage":{"input_tokens":120,"output_tokens":8}}`) + resp := &http.Response{ + StatusCode: http.StatusOK, + Header: http.Header{"Content-Type": []string{"application/json"}}, + Body: ioNopCloserBytes(upstreamBody), + } + + rec := httptest.NewRecorder() + c, _ := gin.CreateTestContext(rec) + c.Request = httptest.NewRequest(http.MethodPost, "/v1/messages", bytes.NewReader(nil)) + c.Set("api_key", &APIKey{Group: group}) + requestCtx := withClaudeMaxResponseRewriteContext(context.Background(), c, parsed) + + usage, err := svc.handleNonStreamingResponse(requestCtx, resp, c, account, "claude-sonnet-4", "claude-sonnet-4") + require.NoError(t, err) + require.NotNil(t, usage) + + var rendered struct { + Usage ClaudeUsage `json:"usage"` + } + require.NoError(t, json.Unmarshal(rec.Body.Bytes(), &rendered)) + rendered.Usage.CacheCreation5mTokens = int(gjson.GetBytes(rec.Body.Bytes(), "usage.cache_creation.ephemeral_5m_input_tokens").Int()) + rendered.Usage.CacheCreation1hTokens = int(gjson.GetBytes(rec.Body.Bytes(), "usage.cache_creation.ephemeral_1h_input_tokens").Int()) + + require.Equal(t, rendered.Usage.InputTokens, usage.InputTokens) + 
require.Equal(t, rendered.Usage.OutputTokens, usage.OutputTokens) + require.Equal(t, rendered.Usage.CacheCreationInputTokens, usage.CacheCreationInputTokens) + require.Equal(t, rendered.Usage.CacheCreation5mTokens, usage.CacheCreation5mTokens) + require.Equal(t, rendered.Usage.CacheCreation1hTokens, usage.CacheCreation1hTokens) + require.Equal(t, rendered.Usage.CacheReadInputTokens, usage.CacheReadInputTokens) + + require.Greater(t, usage.CacheCreation1hTokens, 0) + require.Equal(t, 0, usage.CacheCreation5mTokens) + require.Less(t, usage.InputTokens, 120) +} + +func TestHandleNonStreamingResponse_ClaudeMaxDisabled_NoSimulationIntercept(t *testing.T) { + gin.SetMode(gin.TestMode) + + svc := &GatewayService{ + cfg: &config.Config{}, + rateLimitService: &RateLimitService{}, + } + + account := &Account{ + ID: 12, + Platform: PlatformAnthropic, + Type: AccountTypeOAuth, + Extra: map[string]any{ + "cache_ttl_override_enabled": true, + "cache_ttl_override_target": "5m", + }, + } + group := &Group{ + ID: 100, + Platform: PlatformAnthropic, + SimulateClaudeMaxEnabled: false, + } + parsed := &ParsedRequest{ + Model: "claude-sonnet-4", + Messages: []any{ + map[string]any{ + "role": "user", + "content": []any{ + map[string]any{ + "type": "text", + "text": "long cached context", + "cache_control": map[string]any{"type": "ephemeral"}, + }, + map[string]any{ + "type": "text", + "text": "new user question", + }, + }, + }, + }, + } + + upstreamBody := []byte(`{"id":"msg_2","model":"claude-sonnet-4","usage":{"input_tokens":120,"output_tokens":8}}`) + resp := &http.Response{ + StatusCode: http.StatusOK, + Header: http.Header{"Content-Type": []string{"application/json"}}, + Body: ioNopCloserBytes(upstreamBody), + } + + rec := httptest.NewRecorder() + c, _ := gin.CreateTestContext(rec) + c.Request = httptest.NewRequest(http.MethodPost, "/v1/messages", bytes.NewReader(nil)) + c.Set("api_key", &APIKey{Group: group}) + requestCtx := 
withClaudeMaxResponseRewriteContext(context.Background(), c, parsed) + + usage, err := svc.handleNonStreamingResponse(requestCtx, resp, c, account, "claude-sonnet-4", "claude-sonnet-4") + require.NoError(t, err) + require.NotNil(t, usage) + + require.Equal(t, 120, usage.InputTokens) + require.Equal(t, 0, usage.CacheCreationInputTokens) + require.Equal(t, 0, usage.CacheCreation5mTokens) + require.Equal(t, 0, usage.CacheCreation1hTokens) +} + +func ioNopCloserBytes(b []byte) *readCloserFromBytes { + return &readCloserFromBytes{Reader: bytes.NewReader(b)} +} + +type readCloserFromBytes struct { + *bytes.Reader +} + +func (r *readCloserFromBytes) Close() error { + return nil +} diff --git a/backend/internal/service/gateway_service.go b/backend/internal/service/gateway_service.go index 132361f4..ccccdf4d 100644 --- a/backend/internal/service/gateway_service.go +++ b/backend/internal/service/gateway_service.go @@ -56,6 +56,12 @@ const ( claudeMimicDebugInfoKey = "claude_mimic_debug_info" ) +const ( + claudeMaxMessageOverheadTokens = 3 + claudeMaxBlockOverheadTokens = 1 + claudeMaxUnknownContentTokens = 4 +) + // ForceCacheBillingContextKey 强制缓存计费上下文键 // 用于粘性会话切换时,将 input_tokens 转为 cache_read_input_tokens 计费 type forceCacheBillingKeyType struct{} @@ -1228,6 +1234,10 @@ func (s *GatewayService) SelectAccountWithLoadAwareness(ctx context.Context, gro modelScopeSkippedIDs = append(modelScopeSkippedIDs, account.ID) continue } + // 配额检查 + if !s.isAccountSchedulableForQuota(account) { + continue + } // 窗口费用检查(非粘性会话路径) if !s.isAccountSchedulableForWindowCost(ctx, account, false) { filteredWindowCost++ @@ -1260,6 +1270,7 @@ func (s *GatewayService) SelectAccountWithLoadAwareness(ctx context.Context, gro s.isAccountAllowedForPlatform(stickyAccount, platform, useMixed) && (requestedModel == "" || s.isModelSupportedByAccountWithContext(ctx, stickyAccount, requestedModel)) && s.isAccountSchedulableForModelSelection(ctx, stickyAccount, requestedModel) && + 
s.isAccountSchedulableForQuota(stickyAccount) && s.isAccountSchedulableForWindowCost(ctx, stickyAccount, true) && s.isAccountSchedulableForRPM(ctx, stickyAccount, true) { // 粘性会话窗口费用+RPM 检查 @@ -1311,7 +1322,7 @@ func (s *GatewayService) SelectAccountWithLoadAwareness(ctx context.Context, gro for _, acc := range routingCandidates { routingLoads = append(routingLoads, AccountWithConcurrency{ ID: acc.ID, - MaxConcurrency: acc.Concurrency, + MaxConcurrency: acc.EffectiveLoadFactor(), }) } routingLoadMap, _ := s.concurrencyService.GetAccountsLoadBatch(ctx, routingLoads) @@ -1416,6 +1427,7 @@ func (s *GatewayService) SelectAccountWithLoadAwareness(ctx context.Context, gro s.isAccountAllowedForPlatform(account, platform, useMixed) && (requestedModel == "" || s.isModelSupportedByAccountWithContext(ctx, account, requestedModel)) && s.isAccountSchedulableForModelSelection(ctx, account, requestedModel) && + s.isAccountSchedulableForQuota(account) && s.isAccountSchedulableForWindowCost(ctx, account, true) && s.isAccountSchedulableForRPM(ctx, account, true) { // 粘性会话窗口费用+RPM 检查 @@ -1480,6 +1492,10 @@ func (s *GatewayService) SelectAccountWithLoadAwareness(ctx context.Context, gro if !s.isAccountSchedulableForModelSelection(ctx, acc, requestedModel) { continue } + // 配额检查 + if !s.isAccountSchedulableForQuota(acc) { + continue + } // 窗口费用检查(非粘性会话路径) if !s.isAccountSchedulableForWindowCost(ctx, acc, false) { continue @@ -1499,7 +1515,7 @@ func (s *GatewayService) SelectAccountWithLoadAwareness(ctx context.Context, gro for _, acc := range candidates { accountLoads = append(accountLoads, AccountWithConcurrency{ ID: acc.ID, - MaxConcurrency: acc.Concurrency, + MaxConcurrency: acc.EffectiveLoadFactor(), }) } @@ -2113,6 +2129,15 @@ func (s *GatewayService) withWindowCostPrefetch(ctx context.Context, accounts [] return context.WithValue(ctx, windowCostPrefetchContextKey, costs) } +// isAccountSchedulableForQuota 检查 API Key 账号是否在配额限制内 +// 仅适用于配置了 quota_limit 的 apikey 类型账号 +func (s 
*GatewayService) isAccountSchedulableForQuota(account *Account) bool { + if account.Type != AccountTypeAPIKey { + return true + } + return !account.IsQuotaExceeded() +} + // isAccountSchedulableForWindowCost 检查账号是否可根据窗口费用进行调度 // 仅适用于 Anthropic OAuth/SetupToken 账号 // 返回 true 表示可调度,false 表示不可调度 @@ -2590,7 +2615,7 @@ func (s *GatewayService) selectAccountForModelWithPlatform(ctx context.Context, if clearSticky { _ = s.cache.DeleteSessionAccountID(ctx, derefGroupID(groupID), sessionHash) } - if !clearSticky && s.isAccountInGroup(account, groupID) && account.Platform == platform && (requestedModel == "" || s.isModelSupportedByAccountWithContext(ctx, account, requestedModel)) && s.isAccountSchedulableForModelSelection(ctx, account, requestedModel) && s.isAccountSchedulableForWindowCost(ctx, account, true) && s.isAccountSchedulableForRPM(ctx, account, true) { + if !clearSticky && s.isAccountInGroup(account, groupID) && account.Platform == platform && (requestedModel == "" || s.isModelSupportedByAccountWithContext(ctx, account, requestedModel)) && s.isAccountSchedulableForModelSelection(ctx, account, requestedModel) && s.isAccountSchedulableForQuota(account) && s.isAccountSchedulableForWindowCost(ctx, account, true) && s.isAccountSchedulableForRPM(ctx, account, true) { if s.debugModelRoutingEnabled() { logger.LegacyPrintf("service.gateway", "[ModelRoutingDebug] legacy routed sticky hit: group_id=%v model=%s session=%s account=%d", derefGroupID(groupID), requestedModel, shortSessionHash(sessionHash), accountID) } @@ -2644,6 +2669,9 @@ func (s *GatewayService) selectAccountForModelWithPlatform(ctx context.Context, if !s.isAccountSchedulableForModelSelection(ctx, acc, requestedModel) { continue } + if !s.isAccountSchedulableForQuota(acc) { + continue + } if !s.isAccountSchedulableForWindowCost(ctx, acc, false) { continue } @@ -2700,7 +2728,7 @@ func (s *GatewayService) selectAccountForModelWithPlatform(ctx context.Context, if clearSticky { _ = 
s.cache.DeleteSessionAccountID(ctx, derefGroupID(groupID), sessionHash) } - if !clearSticky && s.isAccountInGroup(account, groupID) && account.Platform == platform && (requestedModel == "" || s.isModelSupportedByAccountWithContext(ctx, account, requestedModel)) && s.isAccountSchedulableForModelSelection(ctx, account, requestedModel) && s.isAccountSchedulableForWindowCost(ctx, account, true) && s.isAccountSchedulableForRPM(ctx, account, true) { + if !clearSticky && s.isAccountInGroup(account, groupID) && account.Platform == platform && (requestedModel == "" || s.isModelSupportedByAccountWithContext(ctx, account, requestedModel)) && s.isAccountSchedulableForModelSelection(ctx, account, requestedModel) && s.isAccountSchedulableForQuota(account) && s.isAccountSchedulableForWindowCost(ctx, account, true) && s.isAccountSchedulableForRPM(ctx, account, true) { return account, nil } } @@ -2743,6 +2771,9 @@ func (s *GatewayService) selectAccountForModelWithPlatform(ctx context.Context, if !s.isAccountSchedulableForModelSelection(ctx, acc, requestedModel) { continue } + if !s.isAccountSchedulableForQuota(acc) { + continue + } if !s.isAccountSchedulableForWindowCost(ctx, acc, false) { continue } @@ -2818,7 +2849,7 @@ func (s *GatewayService) selectAccountWithMixedScheduling(ctx context.Context, g if clearSticky { _ = s.cache.DeleteSessionAccountID(ctx, derefGroupID(groupID), sessionHash) } - if !clearSticky && s.isAccountInGroup(account, groupID) && (requestedModel == "" || s.isModelSupportedByAccountWithContext(ctx, account, requestedModel)) && s.isAccountSchedulableForModelSelection(ctx, account, requestedModel) && s.isAccountSchedulableForWindowCost(ctx, account, true) && s.isAccountSchedulableForRPM(ctx, account, true) { + if !clearSticky && s.isAccountInGroup(account, groupID) && (requestedModel == "" || s.isModelSupportedByAccountWithContext(ctx, account, requestedModel)) && s.isAccountSchedulableForModelSelection(ctx, account, requestedModel) && 
s.isAccountSchedulableForQuota(account) && s.isAccountSchedulableForWindowCost(ctx, account, true) && s.isAccountSchedulableForRPM(ctx, account, true) { if account.Platform == nativePlatform || (account.Platform == PlatformAntigravity && account.IsMixedSchedulingEnabled()) { if s.debugModelRoutingEnabled() { logger.LegacyPrintf("service.gateway", "[ModelRoutingDebug] legacy mixed routed sticky hit: group_id=%v model=%s session=%s account=%d", derefGroupID(groupID), requestedModel, shortSessionHash(sessionHash), accountID) @@ -2874,6 +2905,9 @@ func (s *GatewayService) selectAccountWithMixedScheduling(ctx context.Context, g if !s.isAccountSchedulableForModelSelection(ctx, acc, requestedModel) { continue } + if !s.isAccountSchedulableForQuota(acc) { + continue + } if !s.isAccountSchedulableForWindowCost(ctx, acc, false) { continue } @@ -2930,7 +2964,7 @@ func (s *GatewayService) selectAccountWithMixedScheduling(ctx context.Context, g if clearSticky { _ = s.cache.DeleteSessionAccountID(ctx, derefGroupID(groupID), sessionHash) } - if !clearSticky && s.isAccountInGroup(account, groupID) && (requestedModel == "" || s.isModelSupportedByAccountWithContext(ctx, account, requestedModel)) && s.isAccountSchedulableForModelSelection(ctx, account, requestedModel) && s.isAccountSchedulableForWindowCost(ctx, account, true) && s.isAccountSchedulableForRPM(ctx, account, true) { + if !clearSticky && s.isAccountInGroup(account, groupID) && (requestedModel == "" || s.isModelSupportedByAccountWithContext(ctx, account, requestedModel)) && s.isAccountSchedulableForModelSelection(ctx, account, requestedModel) && s.isAccountSchedulableForQuota(account) && s.isAccountSchedulableForWindowCost(ctx, account, true) && s.isAccountSchedulableForRPM(ctx, account, true) { if account.Platform == nativePlatform || (account.Platform == PlatformAntigravity && account.IsMixedSchedulingEnabled()) { return account, nil } @@ -2975,6 +3009,9 @@ func (s *GatewayService) selectAccountWithMixedScheduling(ctx 
context.Context, g if !s.isAccountSchedulableForModelSelection(ctx, acc, requestedModel) { continue } + if !s.isAccountSchedulableForQuota(acc) { + continue + } if !s.isAccountSchedulableForWindowCost(ctx, acc, false) { continue } @@ -4317,6 +4354,7 @@ func (s *GatewayService) Forward(ctx context.Context, c *gin.Context, account *A } // 处理正常响应 + ctx = withClaudeMaxResponseRewriteContext(ctx, c, parsed) // 触发上游接受回调(提前释放串行锁,不等流完成) if parsed.OnUpstreamAccepted != nil { @@ -5773,6 +5811,7 @@ func (s *GatewayService) handleStreamingResponse(ctx context.Context, resp *http needModelReplace := originalModel != mappedModel clientDisconnected := false // 客户端断开标志,断开后继续读取上游以获取完整usage + skipAccountTTLOverride := false pendingEventLines := make([]string, 0, 4) @@ -5833,17 +5872,25 @@ func (s *GatewayService) handleStreamingResponse(ctx context.Context, resp *http if msg, ok := event["message"].(map[string]any); ok { if u, ok := msg["usage"].(map[string]any); ok { eventChanged = reconcileCachedTokens(u) || eventChanged + claudeMaxOutcome := applyClaudeMaxSimulationToUsageJSONMap(ctx, u, originalModel, account.ID) + if claudeMaxOutcome.Simulated { + skipAccountTTLOverride = true + } } } } if eventType == "message_delta" { if u, ok := event["usage"].(map[string]any); ok { eventChanged = reconcileCachedTokens(u) || eventChanged + claudeMaxOutcome := applyClaudeMaxSimulationToUsageJSONMap(ctx, u, originalModel, account.ID) + if claudeMaxOutcome.Simulated { + skipAccountTTLOverride = true + } } } // Cache TTL Override: 重写 SSE 事件中的 cache_creation 分类 - if account.IsCacheTTLOverrideEnabled() { + if account.IsCacheTTLOverrideEnabled() && !skipAccountTTLOverride { overrideTarget := account.GetCacheTTLOverrideTarget() if eventType == "message_start" { if msg, ok := event["message"].(map[string]any); ok { @@ -6253,8 +6300,13 @@ func (s *GatewayService) handleNonStreamingResponse(ctx context.Context, resp *h } } + claudeMaxOutcome := applyClaudeMaxSimulationToUsage(ctx, &response.Usage, 
originalModel, account.ID) + if claudeMaxOutcome.Simulated { + body = rewriteClaudeUsageJSONBytes(body, response.Usage) + } + // Cache TTL Override: 重写 non-streaming 响应中的 cache_creation 分类 - if account.IsCacheTTLOverrideEnabled() { + if account.IsCacheTTLOverrideEnabled() && !claudeMaxOutcome.Simulated { overrideTarget := account.GetCacheTTLOverrideTarget() if applyCacheTTLOverride(&response.Usage, overrideTarget) { // 同步更新 body JSON 中的嵌套 cache_creation 对象 @@ -6363,6 +6415,7 @@ func (s *GatewayService) getUserGroupRateMultiplier(ctx context.Context, userID, // RecordUsageInput 记录使用量的输入参数 type RecordUsageInput struct { Result *ForwardResult + ParsedRequest *ParsedRequest APIKey *APIKey User *User Account *Account @@ -6379,6 +6432,89 @@ type APIKeyQuotaUpdater interface { UpdateRateLimitUsage(ctx context.Context, apiKeyID int64, cost float64) error } +// postUsageBillingParams 统一扣费所需的参数 +type postUsageBillingParams struct { + Cost *CostBreakdown + User *User + APIKey *APIKey + Account *Account + Subscription *UserSubscription + IsSubscriptionBill bool + AccountRateMultiplier float64 + APIKeyService APIKeyQuotaUpdater +} + +// postUsageBilling 统一处理使用量记录后的扣费逻辑: +// - 订阅/余额扣费 +// - API Key 配额更新 +// - API Key 限速用量更新 +// - 账号配额用量更新(账号口径:TotalCost × 账号计费倍率) +func postUsageBilling(ctx context.Context, p *postUsageBillingParams, deps *billingDeps) { + cost := p.Cost + + // 1. 
订阅 / 余额扣费 + if p.IsSubscriptionBill { + if cost.TotalCost > 0 { + if err := deps.userSubRepo.IncrementUsage(ctx, p.Subscription.ID, cost.TotalCost); err != nil { + slog.Error("increment subscription usage failed", "subscription_id", p.Subscription.ID, "error", err) + } + deps.billingCacheService.QueueUpdateSubscriptionUsage(p.User.ID, *p.APIKey.GroupID, cost.TotalCost) + } + } else { + if cost.ActualCost > 0 { + if err := deps.userRepo.DeductBalance(ctx, p.User.ID, cost.ActualCost); err != nil { + slog.Error("deduct balance failed", "user_id", p.User.ID, "error", err) + } + deps.billingCacheService.QueueDeductBalance(p.User.ID, cost.ActualCost) + } + } + + // 2. API Key 配额 + if cost.ActualCost > 0 && p.APIKey.Quota > 0 && p.APIKeyService != nil { + if err := p.APIKeyService.UpdateQuotaUsed(ctx, p.APIKey.ID, cost.ActualCost); err != nil { + slog.Error("update api key quota failed", "api_key_id", p.APIKey.ID, "error", err) + } + } + + // 3. API Key 限速用量 + if cost.ActualCost > 0 && p.APIKey.HasRateLimits() && p.APIKeyService != nil { + if err := p.APIKeyService.UpdateRateLimitUsage(ctx, p.APIKey.ID, cost.ActualCost); err != nil { + slog.Error("update api key rate limit usage failed", "api_key_id", p.APIKey.ID, "error", err) + } + deps.billingCacheService.QueueUpdateAPIKeyRateLimitUsage(p.APIKey.ID, cost.ActualCost) + } + + // 4. 账号配额用量(账号口径:TotalCost × 账号计费倍率) + if cost.TotalCost > 0 && p.Account.Type == AccountTypeAPIKey && p.Account.GetQuotaLimit() > 0 { + accountCost := cost.TotalCost * p.AccountRateMultiplier + if err := deps.accountRepo.IncrementQuotaUsed(ctx, p.Account.ID, accountCost); err != nil { + slog.Error("increment account quota used failed", "account_id", p.Account.ID, "cost", accountCost, "error", err) + } + } + + // 5. 
更新账号最近使用时间 + deps.deferredService.ScheduleLastUsedUpdate(p.Account.ID) +} + +// billingDeps 扣费逻辑依赖的服务(由各 gateway service 提供) +type billingDeps struct { + accountRepo AccountRepository + userRepo UserRepository + userSubRepo UserSubscriptionRepository + billingCacheService *BillingCacheService + deferredService *DeferredService +} + +func (s *GatewayService) billingDeps() *billingDeps { + return &billingDeps{ + accountRepo: s.accountRepo, + userRepo: s.userRepo, + userSubRepo: s.userSubRepo, + billingCacheService: s.billingCacheService, + deferredService: s.deferredService, + } +} + // RecordUsage 记录使用量并扣费(或更新订阅用量) func (s *GatewayService) RecordUsage(ctx context.Context, input *RecordUsageInput) error { result := input.Result @@ -6396,9 +6532,19 @@ func (s *GatewayService) RecordUsage(ctx context.Context, input *RecordUsageInpu result.Usage.InputTokens = 0 } + // Claude Max cache billing policy (group-level): + // - GatewayService 路径: Forward 已改写 usage(含 cache tokens)→ apply 见到 cache tokens 跳过 → simulatedClaudeMax=true(通过第二条件) + // - Antigravity 路径: Forward 中 hook 改写了客户端 SSE,但 ForwardResult.Usage 是原始值 → apply 实际执行模拟 → simulatedClaudeMax=true + var apiKeyGroup *Group + if apiKey != nil { + apiKeyGroup = apiKey.Group + } + claudeMaxOutcome := applyClaudeMaxCacheBillingPolicyToUsage(&result.Usage, input.ParsedRequest, apiKeyGroup, result.Model, account.ID) + simulatedClaudeMax := claudeMaxOutcome.Simulated || + (shouldApplyClaudeMaxBillingRulesForUsage(apiKeyGroup, result.Model, input.ParsedRequest) && hasCacheCreationTokens(result.Usage)) // Cache TTL Override: 确保计费时 token 分类与账号设置一致 cacheTTLOverridden := false - if account.IsCacheTTLOverrideEnabled() { + if account.IsCacheTTLOverrideEnabled() && !simulatedClaudeMax { applyCacheTTLOverride(&result.Usage, account.GetCacheTTLOverrideTarget()) cacheTTLOverridden = (result.Usage.CacheCreation5mTokens + result.Usage.CacheCreation1hTokens) > 0 } @@ -6542,45 +6688,21 @@ func (s *GatewayService) RecordUsage(ctx 
context.Context, input *RecordUsageInpu shouldBill := inserted || err != nil - // 根据计费类型执行扣费 - if isSubscriptionBilling { - // 订阅模式:更新订阅用量(使用 TotalCost 原始费用,不考虑倍率) - if shouldBill && cost.TotalCost > 0 { - if err := s.userSubRepo.IncrementUsage(ctx, subscription.ID, cost.TotalCost); err != nil { - logger.LegacyPrintf("service.gateway", "Increment subscription usage failed: %v", err) - } - // 异步更新订阅缓存 - s.billingCacheService.QueueUpdateSubscriptionUsage(user.ID, *apiKey.GroupID, cost.TotalCost) - } + if shouldBill { + postUsageBilling(ctx, &postUsageBillingParams{ + Cost: cost, + User: user, + APIKey: apiKey, + Account: account, + Subscription: subscription, + IsSubscriptionBill: isSubscriptionBilling, + AccountRateMultiplier: accountRateMultiplier, + APIKeyService: input.APIKeyService, + }, s.billingDeps()) } else { - // 余额模式:扣除用户余额(使用 ActualCost 考虑倍率后的费用) - if shouldBill && cost.ActualCost > 0 { - if err := s.userRepo.DeductBalance(ctx, user.ID, cost.ActualCost); err != nil { - logger.LegacyPrintf("service.gateway", "Deduct balance failed: %v", err) - } - // 异步更新余额缓存 - s.billingCacheService.QueueDeductBalance(user.ID, cost.ActualCost) - } + s.deferredService.ScheduleLastUsedUpdate(account.ID) } - // 更新 API Key 配额(如果设置了配额限制) - if shouldBill && cost.ActualCost > 0 && apiKey.Quota > 0 && input.APIKeyService != nil { - if err := input.APIKeyService.UpdateQuotaUsed(ctx, apiKey.ID, cost.ActualCost); err != nil { - logger.LegacyPrintf("service.gateway", "Update API key quota failed: %v", err) - } - } - - // Update API Key rate limit usage - if shouldBill && cost.ActualCost > 0 && apiKey.HasRateLimits() && input.APIKeyService != nil { - if err := input.APIKeyService.UpdateRateLimitUsage(ctx, apiKey.ID, cost.ActualCost); err != nil { - logger.LegacyPrintf("service.gateway", "Update API key rate limit usage failed: %v", err) - } - s.billingCacheService.QueueUpdateAPIKeyRateLimitUsage(apiKey.ID, cost.ActualCost) - } - - // Schedule batch update for account last_used_at - 
s.deferredService.ScheduleLastUsedUpdate(account.ID) - return nil } @@ -6740,44 +6862,21 @@ func (s *GatewayService) RecordUsageWithLongContext(ctx context.Context, input * shouldBill := inserted || err != nil - // 根据计费类型执行扣费 - if isSubscriptionBilling { - // 订阅模式:更新订阅用量(使用 TotalCost 原始费用,不考虑倍率) - if shouldBill && cost.TotalCost > 0 { - if err := s.userSubRepo.IncrementUsage(ctx, subscription.ID, cost.TotalCost); err != nil { - logger.LegacyPrintf("service.gateway", "Increment subscription usage failed: %v", err) - } - // 异步更新订阅缓存 - s.billingCacheService.QueueUpdateSubscriptionUsage(user.ID, *apiKey.GroupID, cost.TotalCost) - } + if shouldBill { + postUsageBilling(ctx, &postUsageBillingParams{ + Cost: cost, + User: user, + APIKey: apiKey, + Account: account, + Subscription: subscription, + IsSubscriptionBill: isSubscriptionBilling, + AccountRateMultiplier: accountRateMultiplier, + APIKeyService: input.APIKeyService, + }, s.billingDeps()) } else { - // 余额模式:扣除用户余额(使用 ActualCost 考虑倍率后的费用) - if shouldBill && cost.ActualCost > 0 { - if err := s.userRepo.DeductBalance(ctx, user.ID, cost.ActualCost); err != nil { - logger.LegacyPrintf("service.gateway", "Deduct balance failed: %v", err) - } - // 异步更新余额缓存 - s.billingCacheService.QueueDeductBalance(user.ID, cost.ActualCost) - // API Key 独立配额扣费 - if input.APIKeyService != nil && apiKey.Quota > 0 { - if err := input.APIKeyService.UpdateQuotaUsed(ctx, apiKey.ID, cost.ActualCost); err != nil { - logger.LegacyPrintf("service.gateway", "Add API key quota used failed: %v", err) - } - } - } + s.deferredService.ScheduleLastUsedUpdate(account.ID) } - // Update API Key rate limit usage - if shouldBill && cost.ActualCost > 0 && apiKey.HasRateLimits() && input.APIKeyService != nil { - if err := input.APIKeyService.UpdateRateLimitUsage(ctx, apiKey.ID, cost.ActualCost); err != nil { - logger.LegacyPrintf("service.gateway", "Update API key rate limit usage failed: %v", err) - } - 
s.billingCacheService.QueueUpdateAPIKeyRateLimitUsage(apiKey.ID, cost.ActualCost) - } - - // Schedule batch update for account last_used_at - s.deferredService.ScheduleLastUsedUpdate(account.ID) - return nil } diff --git a/backend/internal/service/gemini_multiplatform_test.go b/backend/internal/service/gemini_multiplatform_test.go index 9476e984..b0b804eb 100644 --- a/backend/internal/service/gemini_multiplatform_test.go +++ b/backend/internal/service/gemini_multiplatform_test.go @@ -176,6 +176,14 @@ func (m *mockAccountRepoForGemini) BulkUpdate(ctx context.Context, ids []int64, return 0, nil } +func (m *mockAccountRepoForGemini) IncrementQuotaUsed(ctx context.Context, id int64, amount float64) error { + return nil +} + +func (m *mockAccountRepoForGemini) ResetQuotaUsed(ctx context.Context, id int64) error { + return nil +} + // Verify interface implementation var _ AccountRepository = (*mockAccountRepoForGemini)(nil) diff --git a/backend/internal/service/group.go b/backend/internal/service/group.go index 6990caca..c4271038 100644 --- a/backend/internal/service/group.go +++ b/backend/internal/service/group.go @@ -50,6 +50,9 @@ type Group struct { // MCP XML 协议注入开关(仅 antigravity 平台使用) MCPXMLInject bool + // Claude usage 模拟开关:将无写缓存 usage 模拟为 claude-max 风格 + SimulateClaudeMaxEnabled bool + // 支持的模型系列(仅 antigravity 平台使用) // 可选值: claude, gemini_text, gemini_image SupportedModelScopes []string diff --git a/backend/internal/service/openai_account_scheduler.go b/backend/internal/service/openai_account_scheduler.go index 99013ce5..9cc6aae1 100644 --- a/backend/internal/service/openai_account_scheduler.go +++ b/backend/internal/service/openai_account_scheduler.go @@ -590,7 +590,7 @@ func (s *defaultOpenAIAccountScheduler) selectByLoadBalance( filtered = append(filtered, account) loadReq = append(loadReq, AccountWithConcurrency{ ID: account.ID, - MaxConcurrency: account.Concurrency, + MaxConcurrency: account.EffectiveLoadFactor(), }) } if len(filtered) == 0 { diff --git 
a/backend/internal/service/openai_gateway_service.go b/backend/internal/service/openai_gateway_service.go index d92b2ecf..73bdba65 100644 --- a/backend/internal/service/openai_gateway_service.go +++ b/backend/internal/service/openai_gateway_service.go @@ -319,6 +319,16 @@ func NewOpenAIGatewayService( return svc } +func (s *OpenAIGatewayService) billingDeps() *billingDeps { + return &billingDeps{ + accountRepo: s.accountRepo, + userRepo: s.userRepo, + userSubRepo: s.userSubRepo, + billingCacheService: s.billingCacheService, + deferredService: s.deferredService, + } +} + // CloseOpenAIWSPool 关闭 OpenAI WebSocket 连接池的后台 worker 和空闲连接。 // 应在应用优雅关闭时调用。 func (s *OpenAIGatewayService) CloseOpenAIWSPool() { @@ -1242,7 +1252,7 @@ func (s *OpenAIGatewayService) SelectAccountWithLoadAwareness(ctx context.Contex for _, acc := range candidates { accountLoads = append(accountLoads, AccountWithConcurrency{ ID: acc.ID, - MaxConcurrency: acc.Concurrency, + MaxConcurrency: acc.EffectiveLoadFactor(), }) } @@ -3474,37 +3484,21 @@ func (s *OpenAIGatewayService) RecordUsage(ctx context.Context, input *OpenAIRec shouldBill := inserted || err != nil - // Deduct based on billing type - if isSubscriptionBilling { - if shouldBill && cost.TotalCost > 0 { - _ = s.userSubRepo.IncrementUsage(ctx, subscription.ID, cost.TotalCost) - s.billingCacheService.QueueUpdateSubscriptionUsage(user.ID, *apiKey.GroupID, cost.TotalCost) - } + if shouldBill { + postUsageBilling(ctx, &postUsageBillingParams{ + Cost: cost, + User: user, + APIKey: apiKey, + Account: account, + Subscription: subscription, + IsSubscriptionBill: isSubscriptionBilling, + AccountRateMultiplier: accountRateMultiplier, + APIKeyService: input.APIKeyService, + }, s.billingDeps()) } else { - if shouldBill && cost.ActualCost > 0 { - _ = s.userRepo.DeductBalance(ctx, user.ID, cost.ActualCost) - s.billingCacheService.QueueDeductBalance(user.ID, cost.ActualCost) - } + s.deferredService.ScheduleLastUsedUpdate(account.ID) } - // Update API key 
quota if applicable (only for balance mode with quota set) - if shouldBill && cost.ActualCost > 0 && apiKey.Quota > 0 && input.APIKeyService != nil { - if err := input.APIKeyService.UpdateQuotaUsed(ctx, apiKey.ID, cost.ActualCost); err != nil { - logger.LegacyPrintf("service.openai_gateway", "Update API key quota failed: %v", err) - } - } - - // Update API Key rate limit usage - if shouldBill && cost.ActualCost > 0 && apiKey.HasRateLimits() && input.APIKeyService != nil { - if err := input.APIKeyService.UpdateRateLimitUsage(ctx, apiKey.ID, cost.ActualCost); err != nil { - logger.LegacyPrintf("service.openai_gateway", "Update API key rate limit usage failed: %v", err) - } - s.billingCacheService.QueueUpdateAPIKeyRateLimitUsage(apiKey.ID, cost.ActualCost) - } - - // Schedule batch update for account last_used_at - s.deferredService.ScheduleLastUsedUpdate(account.ID) - return nil } diff --git a/backend/internal/service/openai_ws_forwarder.go b/backend/internal/service/openai_ws_forwarder.go index a5c2fd7a..7b6591fa 100644 --- a/backend/internal/service/openai_ws_forwarder.go +++ b/backend/internal/service/openai_ws_forwarder.go @@ -864,7 +864,8 @@ func isOpenAIWSClientDisconnectError(err error) bool { strings.Contains(message, "unexpected eof") || strings.Contains(message, "use of closed network connection") || strings.Contains(message, "connection reset by peer") || - strings.Contains(message, "broken pipe") + strings.Contains(message, "broken pipe") || + strings.Contains(message, "an established connection was aborted") } func classifyOpenAIWSReadFallbackReason(err error) string { diff --git a/backend/internal/service/ops_concurrency.go b/backend/internal/service/ops_concurrency.go index 92b37e73..c03108c4 100644 --- a/backend/internal/service/ops_concurrency.go +++ b/backend/internal/service/ops_concurrency.go @@ -64,8 +64,9 @@ func (s *OpsService) getAccountsLoadMapBestEffort(ctx context.Context, accounts if acc.ID <= 0 { continue } - if prev, ok := 
unique[acc.ID]; !ok || acc.Concurrency > prev { - unique[acc.ID] = acc.Concurrency + lf := acc.EffectiveLoadFactor() + if prev, ok := unique[acc.ID]; !ok || lf > prev { + unique[acc.ID] = lf } } diff --git a/backend/internal/service/ops_metrics_collector.go b/backend/internal/service/ops_metrics_collector.go index 30adaae0..6c337071 100644 --- a/backend/internal/service/ops_metrics_collector.go +++ b/backend/internal/service/ops_metrics_collector.go @@ -389,13 +389,9 @@ func (c *OpsMetricsCollector) collectConcurrencyQueueDepth(parentCtx context.Con if acc.ID <= 0 { continue } - maxConc := acc.Concurrency - if maxConc < 0 { - maxConc = 0 - } batch = append(batch, AccountWithConcurrency{ ID: acc.ID, - MaxConcurrency: maxConc, + MaxConcurrency: acc.EffectiveLoadFactor(), }) } if len(batch) == 0 { diff --git a/backend/internal/service/subscription_calculate_progress_test.go b/backend/internal/service/subscription_calculate_progress_test.go index 22018bcd..6a6a1c12 100644 --- a/backend/internal/service/subscription_calculate_progress_test.go +++ b/backend/internal/service/subscription_calculate_progress_test.go @@ -34,9 +34,10 @@ func TestCalculateProgress_BasicFields(t *testing.T) { assert.Equal(t, int64(100), progress.ID) assert.Equal(t, "Premium", progress.GroupName) assert.Equal(t, sub.ExpiresAt, progress.ExpiresAt) - assert.Equal(t, 29, progress.ExpiresInDays) // 约 30 天 - assert.Nil(t, progress.Daily, "无日限额时 Daily 应为 nil") - assert.Nil(t, progress.Weekly, "无周限额时 Weekly 应为 nil") + assert.GreaterOrEqual(t, progress.ExpiresInDays, 29) + assert.LessOrEqual(t, progress.ExpiresInDays, 30) + assert.Nil(t, progress.Daily) + assert.Nil(t, progress.Weekly) assert.Nil(t, progress.Monthly, "无月限额时 Monthly 应为 nil") } diff --git a/backend/migrations/056_add_sonnet46_to_model_mapping.sql b/backend/migrations/056_add_sonnet46_to_model_mapping.sql new file mode 100644 index 00000000..aa7657d7 --- /dev/null +++ b/backend/migrations/056_add_sonnet46_to_model_mapping.sql @@ -0,0 +1,42 
@@ +-- Add claude-sonnet-4-6 to model_mapping for all Antigravity accounts +-- +-- Background: +-- Antigravity now supports claude-sonnet-4-6 +-- +-- Strategy: +-- Directly overwrite the entire model_mapping with updated mappings +-- This ensures consistency with DefaultAntigravityModelMapping in constants.go + +UPDATE accounts +SET credentials = jsonb_set( + credentials, + '{model_mapping}', + '{ + "claude-opus-4-6-thinking": "claude-opus-4-6-thinking", + "claude-opus-4-6": "claude-opus-4-6-thinking", + "claude-opus-4-5-thinking": "claude-opus-4-6-thinking", + "claude-opus-4-5-20251101": "claude-opus-4-6-thinking", + "claude-sonnet-4-6": "claude-sonnet-4-6", + "claude-sonnet-4-5": "claude-sonnet-4-5", + "claude-sonnet-4-5-thinking": "claude-sonnet-4-5-thinking", + "claude-sonnet-4-5-20250929": "claude-sonnet-4-5", + "claude-haiku-4-5": "claude-sonnet-4-5", + "claude-haiku-4-5-20251001": "claude-sonnet-4-5", + "gemini-2.5-flash": "gemini-2.5-flash", + "gemini-2.5-flash-lite": "gemini-2.5-flash-lite", + "gemini-2.5-flash-thinking": "gemini-2.5-flash-thinking", + "gemini-2.5-pro": "gemini-2.5-pro", + "gemini-3-flash": "gemini-3-flash", + "gemini-3-pro-high": "gemini-3-pro-high", + "gemini-3-pro-low": "gemini-3-pro-low", + "gemini-3-pro-image": "gemini-3-pro-image", + "gemini-3-flash-preview": "gemini-3-flash", + "gemini-3-pro-preview": "gemini-3-pro-high", + "gemini-3-pro-image-preview": "gemini-3-pro-image", + "gpt-oss-120b-medium": "gpt-oss-120b-medium", + "tab_flash_lite_preview": "tab_flash_lite_preview" + }'::jsonb +) +WHERE platform = 'antigravity' + AND deleted_at IS NULL + AND credentials->'model_mapping' IS NOT NULL; diff --git a/backend/migrations/057_add_gemini31_pro_to_model_mapping.sql b/backend/migrations/057_add_gemini31_pro_to_model_mapping.sql new file mode 100644 index 00000000..6305e717 --- /dev/null +++ b/backend/migrations/057_add_gemini31_pro_to_model_mapping.sql @@ -0,0 +1,45 @@ +-- Add gemini-3.1-pro-high, gemini-3.1-pro-low, 
gemini-3.1-pro-preview to model_mapping +-- +-- Background: +-- Antigravity now supports gemini-3.1-pro-high and gemini-3.1-pro-low +-- +-- Strategy: +-- Directly overwrite the entire model_mapping with updated mappings +-- This ensures consistency with DefaultAntigravityModelMapping in constants.go + +UPDATE accounts +SET credentials = jsonb_set( + credentials, + '{model_mapping}', + '{ + "claude-opus-4-6-thinking": "claude-opus-4-6-thinking", + "claude-opus-4-6": "claude-opus-4-6-thinking", + "claude-opus-4-5-thinking": "claude-opus-4-6-thinking", + "claude-opus-4-5-20251101": "claude-opus-4-6-thinking", + "claude-sonnet-4-6": "claude-sonnet-4-6", + "claude-sonnet-4-5": "claude-sonnet-4-5", + "claude-sonnet-4-5-thinking": "claude-sonnet-4-5-thinking", + "claude-sonnet-4-5-20250929": "claude-sonnet-4-5", + "claude-haiku-4-5": "claude-sonnet-4-5", + "claude-haiku-4-5-20251001": "claude-sonnet-4-5", + "gemini-2.5-flash": "gemini-2.5-flash", + "gemini-2.5-flash-lite": "gemini-2.5-flash-lite", + "gemini-2.5-flash-thinking": "gemini-2.5-flash-thinking", + "gemini-2.5-pro": "gemini-2.5-pro", + "gemini-3-flash": "gemini-3-flash", + "gemini-3-pro-high": "gemini-3-pro-high", + "gemini-3-pro-low": "gemini-3-pro-low", + "gemini-3-pro-image": "gemini-3-pro-image", + "gemini-3-flash-preview": "gemini-3-flash", + "gemini-3-pro-preview": "gemini-3-pro-high", + "gemini-3-pro-image-preview": "gemini-3-pro-image", + "gemini-3.1-pro-high": "gemini-3.1-pro-high", + "gemini-3.1-pro-low": "gemini-3.1-pro-low", + "gemini-3.1-pro-preview": "gemini-3.1-pro-high", + "gpt-oss-120b-medium": "gpt-oss-120b-medium", + "tab_flash_lite_preview": "tab_flash_lite_preview" + }'::jsonb +) +WHERE platform = 'antigravity' + AND deleted_at IS NULL + AND credentials->'model_mapping' IS NOT NULL; diff --git a/backend/migrations/060_add_group_simulate_claude_max.sql b/backend/migrations/060_add_group_simulate_claude_max.sql new file mode 100644 index 00000000..55662dfd --- /dev/null +++ 
b/backend/migrations/060_add_group_simulate_claude_max.sql @@ -0,0 +1,3 @@ +ALTER TABLE groups + ADD COLUMN IF NOT EXISTS simulate_claude_max_enabled BOOLEAN NOT NULL DEFAULT FALSE; + diff --git a/backend/migrations/067_add_account_load_factor.sql b/backend/migrations/067_add_account_load_factor.sql new file mode 100644 index 00000000..6805e8c2 --- /dev/null +++ b/backend/migrations/067_add_account_load_factor.sql @@ -0,0 +1 @@ +ALTER TABLE accounts ADD COLUMN IF NOT EXISTS load_factor INTEGER; diff --git a/deploy/docker-compose.yml b/deploy/docker-compose.yml index e5c97bf8..8715d75d 100644 --- a/deploy/docker-compose.yml +++ b/deploy/docker-compose.yml @@ -47,13 +47,15 @@ services: # ======================================================================= # Database Configuration (PostgreSQL) + # Default: uses local postgres container + # External DB: set DATABASE_HOST and DATABASE_SSLMODE in .env # ======================================================================= - - DATABASE_HOST=postgres - - DATABASE_PORT=5432 + - DATABASE_HOST=${DATABASE_HOST:-postgres} + - DATABASE_PORT=${DATABASE_PORT:-5432} - DATABASE_USER=${POSTGRES_USER:-sub2api} - DATABASE_PASSWORD=${POSTGRES_PASSWORD:?POSTGRES_PASSWORD is required} - DATABASE_DBNAME=${POSTGRES_DB:-sub2api} - - DATABASE_SSLMODE=disable + - DATABASE_SSLMODE=${DATABASE_SSLMODE:-disable} - DATABASE_MAX_OPEN_CONNS=${DATABASE_MAX_OPEN_CONNS:-50} - DATABASE_MAX_IDLE_CONNS=${DATABASE_MAX_IDLE_CONNS:-10} - DATABASE_CONN_MAX_LIFETIME_MINUTES=${DATABASE_CONN_MAX_LIFETIME_MINUTES:-30} @@ -139,8 +141,6 @@ services: # Examples: http://host:port, socks5://host:port - UPDATE_PROXY_URL=${UPDATE_PROXY_URL:-} depends_on: - postgres: - condition: service_healthy redis: condition: service_healthy networks: diff --git a/frontend/public/wechat-qr.jpg b/frontend/public/wechat-qr.jpg new file mode 100644 index 00000000..659068d8 Binary files /dev/null and b/frontend/public/wechat-qr.jpg differ diff --git 
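The first hunk above swaps `acc.Concurrency` for `acc.EffectiveLoadFactor()`, and migration 067 adds a nullable `load_factor INTEGER` column, but the helper itself is not shown in this diff. Based on the frontend hint ("Defaults to concurrency") and the `load_factor?: number | null` field added to the `Account` type, the intended behavior can be sketched as follows. This is an illustrative TypeScript sketch against the frontend types, not the real Go helper; the clamping of negative values is an assumption carried over from the `if maxConc < 0 { maxConc = 0 }` guard the diff removes.

```typescript
// Illustrative only: the real helper is Go (Account.EffectiveLoadFactor).
// Field names mirror the frontend Account type added in this diff.
interface AccountLoad {
  id: number
  concurrency: number
  load_factor?: number | null // nullable, per migration 067
}

// load_factor overrides concurrency when set; negatives clamp to 0
// (assumed, matching the removed `if maxConc < 0` guard).
function effectiveLoadFactor(acc: AccountLoad): number {
  const lf = acc.load_factor ?? acc.concurrency
  return lf > 0 ? lf : 0
}

// Dedup by account ID keeping the highest load factor, as in the first hunk.
function uniqueLoadFactors(accounts: AccountLoad[]): Map<number, number> {
  const unique = new Map<number, number>()
  for (const acc of accounts) {
    const lf = effectiveLoadFactor(acc)
    const prev = unique.get(acc.id)
    if (prev === undefined || lf > prev) unique.set(acc.id, lf)
  }
  return unique
}
```

Under this reading, leaving `load_factor` NULL keeps existing scheduling behavior unchanged, while setting it lets an account advertise more (or less) capacity to the ops metrics collector than its hard concurrency cap.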
a/frontend/src/api/admin/accounts.ts b/frontend/src/api/admin/accounts.ts index 25bb7b7b..5524e0cb 100644 --- a/frontend/src/api/admin/accounts.ts +++ b/frontend/src/api/admin/accounts.ts @@ -240,6 +240,18 @@ export async function clearRateLimit(id: number): Promise<Account> { return data } +/** + * Reset account quota usage + * @param id - Account ID + * @returns Updated account + */ +export async function resetAccountQuota(id: number): Promise<Account> { + const { data } = await apiClient.post<Account>( + `/admin/accounts/${id}/reset-quota` + ) + return data +} + /** * Get temporary unschedulable status * @param id - Account ID @@ -576,6 +588,7 @@ export const accountsAPI = { getTodayStats, getBatchTodayStats, clearRateLimit, + resetAccountQuota, getTempUnschedulableStatus, resetTempUnschedulable, setSchedulable, diff --git a/frontend/src/components/account/AccountCapacityCell.vue b/frontend/src/components/account/AccountCapacityCell.vue index 2a4babf2..2001b185 100644 --- a/frontend/src/components/account/AccountCapacityCell.vue +++ b/frontend/src/components/account/AccountCapacityCell.vue @@ -71,6 +71,24 @@ {{ rpmStrategyTag }} + +
+ + + + + ${{ formatCost(currentQuotaUsed) }} + / + ${{ formatCost(account.quota_limit) }} + +
@@ -286,6 +304,48 @@ const rpmTooltip = computed(() => { } }) +// 是否显示配额限制(仅 apikey 类型且设置了 quota_limit) +const showQuotaLimit = computed(() => { + return ( + props.account.type === 'apikey' && + props.account.quota_limit !== undefined && + props.account.quota_limit !== null && + props.account.quota_limit > 0 + ) +}) + +// 当前已用配额 +const currentQuotaUsed = computed(() => props.account.quota_used ?? 0) + +// 配额状态样式 +const quotaClass = computed(() => { + if (!showQuotaLimit.value) return '' + + const used = currentQuotaUsed.value + const limit = props.account.quota_limit || 0 + + if (used >= limit) { + return 'bg-red-100 text-red-700 dark:bg-red-900/30 dark:text-red-400' + } + if (used >= limit * 0.8) { + return 'bg-yellow-100 text-yellow-700 dark:bg-yellow-900/30 dark:text-yellow-400' + } + return 'bg-emerald-100 text-emerald-700 dark:bg-emerald-900/30 dark:text-emerald-400' +}) + +// 配额提示文字 +const quotaTooltip = computed(() => { + if (!showQuotaLimit.value) return '' + + const used = currentQuotaUsed.value + const limit = props.account.quota_limit || 0 + + if (used >= limit) { + return t('admin.accounts.capacity.quota.exceeded') + } + return t('admin.accounts.capacity.quota.normal') +}) + // 格式化费用显示 const formatCost = (value: number | null | undefined) => { if (value === null || value === undefined) return '0' diff --git a/frontend/src/components/account/AccountStatusIndicator.vue b/frontend/src/components/account/AccountStatusIndicator.vue index e8331c25..728b310c 100644 --- a/frontend/src/components/account/AccountStatusIndicator.vue +++ b/frontend/src/components/account/AccountStatusIndicator.vue @@ -81,15 +81,15 @@ v-if="activeModelRateLimits.length > 0" :class="[ activeModelRateLimits.length <= 4 - ? 'flex flex-col gap-1' + ? 'flex flex-col gap-0.5' : activeModelRateLimits.length <= 8 ? 'columns-2 gap-x-2' : 'columns-3 gap-x-2' ]" > -
+
{{ formatScopeName(item.model) }} diff --git a/frontend/src/components/account/AccountUsageCell.vue b/frontend/src/components/account/AccountUsageCell.vue index 859bd7c9..24851afb 100644 --- a/frontend/src/components/account/AccountUsageCell.vue +++ b/frontend/src/components/account/AccountUsageCell.vue @@ -278,11 +278,12 @@ + + diff --git a/frontend/src/components/account/__tests__/AccountUsageCell.spec.ts b/frontend/src/components/account/__tests__/AccountUsageCell.spec.ts index 0b61b3bd..64f15fc9 100644 --- a/frontend/src/components/account/__tests__/AccountUsageCell.spec.ts +++ b/frontend/src/components/account/__tests__/AccountUsageCell.spec.ts @@ -14,6 +14,10 @@ vi.mock('@/api/admin', () => ({ } })) +vi.mock('@/utils/usageLoadQueue', () => ({ + enqueueUsageRequest: (_p: string, _t: string, _id: unknown, fn: () => Promise) => fn() +})) + vi.mock('vue-i18n', async () => { const actual = await vi.importActual('vue-i18n') return { diff --git a/frontend/src/components/admin/account/AccountActionMenu.vue b/frontend/src/components/admin/account/AccountActionMenu.vue index fbff0bed..02596b9f 100644 --- a/frontend/src/components/admin/account/AccountActionMenu.vue +++ b/frontend/src/components/admin/account/AccountActionMenu.vue @@ -41,6 +41,10 @@ {{ t('admin.accounts.clearRateLimit') }} +
@@ -55,7 +59,7 @@ import { Icon } from '@/components/icons' import type { Account } from '@/types' const props = defineProps<{ show: boolean; account: Account | null; position: { top: number; left: number } | null }>() -const emit = defineEmits(['close', 'test', 'stats', 'schedule', 'reauth', 'refresh-token', 'reset-status', 'clear-rate-limit']) +const emit = defineEmits(['close', 'test', 'stats', 'schedule', 'reauth', 'refresh-token', 'reset-status', 'clear-rate-limit', 'reset-quota']) const { t } = useI18n() const isRateLimited = computed(() => { if (props.account?.rate_limit_reset_at && new Date(props.account.rate_limit_reset_at) > new Date()) { @@ -71,6 +75,12 @@ const isRateLimited = computed(() => { return false }) const isOverloaded = computed(() => props.account?.overload_until && new Date(props.account.overload_until) > new Date()) +const hasQuotaLimit = computed(() => { + return props.account?.type === 'apikey' && + props.account?.quota_limit !== undefined && + props.account?.quota_limit !== null && + props.account?.quota_limit > 0 +}) const handleKeydown = (event: KeyboardEvent) => { if (event.key === 'Escape') emit('close') diff --git a/frontend/src/components/common/WechatServiceButton.vue b/frontend/src/components/common/WechatServiceButton.vue new file mode 100644 index 00000000..9ee8d3d5 --- /dev/null +++ b/frontend/src/components/common/WechatServiceButton.vue @@ -0,0 +1,104 @@ + + + + + diff --git a/frontend/src/components/layout/AppHeader.vue b/frontend/src/components/layout/AppHeader.vue index 76bf684f..5729d880 100644 --- a/frontend/src/components/layout/AppHeader.vue +++ b/frontend/src/components/layout/AppHeader.vue @@ -121,23 +121,6 @@ {{ t('nav.apiKeys') }} - - - - - - {{ t('nav.github') }} - diff --git a/frontend/src/i18n/locales/en.ts b/frontend/src/i18n/locales/en.ts index 055998a7..d055a4c8 100644 --- a/frontend/src/i18n/locales/en.ts +++ b/frontend/src/i18n/locales/en.ts @@ -1482,6 +1482,14 @@ export default { enabled: 'Enabled', 
disabled: 'Disabled' }, + claudeMaxSimulation: { + title: 'Claude Max Usage Simulation', + tooltip: + 'When enabled, for Claude models without upstream cache-write usage, the system deterministically maps tokens to a small input plus 1h cache creation while keeping total tokens unchanged.', + enabled: 'Enabled (simulate 1h cache)', + disabled: 'Disabled', + hint: 'Only token categories in usage billing logs are adjusted. No per-request mapping state is persisted.' + }, supportedScopes: { title: 'Supported Model Families', tooltip: 'Select the model families this group supports. Unchecked families will not be routed to this group.', @@ -1734,6 +1742,10 @@ export default { stickyExemptWarning: 'RPM limit (Sticky Exempt) - Approaching limit', stickyExemptOver: 'RPM limit (Sticky Exempt) - Over limit, sticky only' }, + quota: { + exceeded: 'Quota exceeded, account paused', + normal: 'Quota normal' + }, }, tempUnschedulable: { title: 'Temp Unschedulable', @@ -1779,6 +1791,14 @@ export default { } }, clearRateLimit: 'Clear Rate Limit', + resetQuota: 'Reset Quota', + quotaLimit: 'Quota Limit', + quotaLimitPlaceholder: '0 means unlimited', + quotaLimitHint: 'Set max spending limit (USD). Account will be paused when reached. Changing limit won\'t reset usage.', + quotaLimitToggle: 'Enable Quota Limit', + quotaLimitToggleHint: 'When enabled, account will be paused when usage reaches the set limit', + quotaLimitAmount: 'Limit Amount', + quotaLimitAmountHint: 'Maximum spending limit (USD). Account will be auto-paused when reached. 
Changing limit won\'t reset usage.', testConnection: 'Test Connection', reAuthorize: 'Re-Authorize', refreshToken: 'Refresh Token', @@ -1991,10 +2011,12 @@ export default { proxy: 'Proxy', noProxy: 'No Proxy', concurrency: 'Concurrency', + loadFactor: 'Load Factor', + loadFactorHint: 'Defaults to concurrency', priority: 'Priority', priorityHint: 'Lower value accounts are used first', billingRateMultiplier: 'Billing Rate Multiplier', - billingRateMultiplierHint: '>=0, 0 means free. Affects account billing only', + billingRateMultiplierHint: '0 = free, affects account billing only', expiresAt: 'Expires At', expiresAtHint: 'Leave empty for no expiration', higherPriorityFirst: 'Lower value means higher priority', diff --git a/frontend/src/i18n/locales/zh.ts b/frontend/src/i18n/locales/zh.ts index cc203adb..ecaff13d 100644 --- a/frontend/src/i18n/locales/zh.ts +++ b/frontend/src/i18n/locales/zh.ts @@ -1784,8 +1784,20 @@ export default { stickyExemptWarning: 'RPM 限制 (粘性豁免) - 接近阈值', stickyExemptOver: 'RPM 限制 (粘性豁免) - 超限,仅粘性会话' }, + quota: { + exceeded: '配额已用完,账号暂停调度', + normal: '配额正常' + }, }, clearRateLimit: '清除速率限制', + resetQuota: '重置配额', + quotaLimit: '配额限制', + quotaLimitPlaceholder: '0 表示不限制', + quotaLimitHint: '设置最大使用额度(美元),达到后账号暂停调度。修改限额不会重置已用额度。', + quotaLimitToggle: '启用配额限制', + quotaLimitToggleHint: '开启后,当账号用量达到设定额度时自动暂停调度', + quotaLimitAmount: '限额金额', + quotaLimitAmountHint: '账号最大可用额度(美元),达到后自动暂停。修改限额不会重置已用额度。', testConnection: '测试连接', reAuthorize: '重新授权', refreshToken: '刷新令牌', @@ -2133,10 +2145,12 @@ export default { proxy: '代理', noProxy: '无代理', concurrency: '并发数', + loadFactor: '负载因子', + loadFactorHint: '不填则等于并发数', priority: '优先级', priorityHint: '优先级越小的账号优先使用', billingRateMultiplier: '账号计费倍率', - billingRateMultiplierHint: '>=0,0 表示该账号计费为 0;仅影响账号计费口径', + billingRateMultiplierHint: '0 表示不计费,仅影响账号计费', expiresAt: '过期时间', expiresAtHint: '留空表示不过期', higherPriorityFirst: '数值越小优先级越高', diff --git a/frontend/src/types/index.ts b/frontend/src/types/index.ts index 
915822f0..7da1b0c8 100644 --- a/frontend/src/types/index.ts +++ b/frontend/src/types/index.ts @@ -395,6 +395,8 @@ export interface AdminGroup extends Group { // MCP XML 协议注入(仅 antigravity 平台使用) mcp_xml_inject: boolean + // Claude usage 模拟开关(仅 anthropic 平台使用) + simulate_claude_max_enabled: boolean // 支持的模型系列(仅 antigravity 平台使用) supported_model_scopes?: string[] @@ -483,6 +485,7 @@ export interface CreateGroupRequest { fallback_group_id?: number | null fallback_group_id_on_invalid_request?: number | null mcp_xml_inject?: boolean + simulate_claude_max_enabled?: boolean supported_model_scopes?: string[] // 从指定分组复制账号 copy_accounts_from_group_ids?: number[] @@ -511,6 +514,7 @@ export interface UpdateGroupRequest { fallback_group_id?: number | null fallback_group_id_on_invalid_request?: number | null mcp_xml_inject?: boolean + simulate_claude_max_enabled?: boolean supported_model_scopes?: string[] copy_accounts_from_group_ids?: number[] } @@ -653,6 +657,7 @@ export interface Account { } & Record) proxy_id: number | null concurrency: number + load_factor?: number | null current_concurrency?: number // Real-time concurrency count from Redis priority: number rate_multiplier?: number // Account billing multiplier (>=0, 0 means free) @@ -705,6 +710,10 @@ export interface Account { cache_ttl_override_enabled?: boolean | null cache_ttl_override_target?: string | null + // API Key 账号配额限制 + quota_limit?: number | null + quota_used?: number | null + // 运行时状态(仅当启用对应限制时返回) current_window_cost?: number | null // 当前窗口费用 active_sessions?: number | null // 当前活跃会话数 @@ -783,6 +792,7 @@ export interface CreateAccountRequest { extra?: Record proxy_id?: number | null concurrency?: number + load_factor?: number | null priority?: number rate_multiplier?: number // Account billing multiplier (>=0, 0 means free) group_ids?: number[] @@ -799,6 +809,7 @@ export interface UpdateAccountRequest { extra?: Record proxy_id?: number | null concurrency?: number + load_factor?: number | null priority?: 
number rate_multiplier?: number // Account billing multiplier (>=0, 0 means free) schedulable?: boolean diff --git a/frontend/src/utils/__tests__/usageLoadQueue.spec.ts b/frontend/src/utils/__tests__/usageLoadQueue.spec.ts new file mode 100644 index 00000000..24cebec8 --- /dev/null +++ b/frontend/src/utils/__tests__/usageLoadQueue.spec.ts @@ -0,0 +1,87 @@ +import { describe, expect, it, vi } from 'vitest' +import { enqueueUsageRequest } from '../usageLoadQueue' + +function delay(ms: number) { + return new Promise((r) => setTimeout(r, ms)) +} + +describe('usageLoadQueue', () => { + it('同组请求串行执行,间隔 >= 1s', async () => { + const timestamps: number[] = [] + const makeFn = () => async () => { + timestamps.push(Date.now()) + return 'ok' + } + + const p1 = enqueueUsageRequest('anthropic', 'oauth', 1, makeFn()) + const p2 = enqueueUsageRequest('anthropic', 'oauth', 1, makeFn()) + const p3 = enqueueUsageRequest('anthropic', 'oauth', 1, makeFn()) + + await Promise.all([p1, p2, p3]) + + expect(timestamps).toHaveLength(3) + // 随机 1-1.5s 间隔,至少 950ms(留一点误差) + expect(timestamps[1] - timestamps[0]).toBeGreaterThanOrEqual(950) + expect(timestamps[1] - timestamps[0]).toBeLessThan(1600) + expect(timestamps[2] - timestamps[1]).toBeGreaterThanOrEqual(950) + expect(timestamps[2] - timestamps[1]).toBeLessThan(1600) + }) + + it('不同组请求并行执行', async () => { + const timestamps: Record = {} + const makeTracked = (key: string) => async () => { + timestamps[key] = Date.now() + return key + } + + const p1 = enqueueUsageRequest('anthropic', 'oauth', 1, makeTracked('group1')) + const p2 = enqueueUsageRequest('anthropic', 'oauth', 2, makeTracked('group2')) + const p3 = enqueueUsageRequest('gemini', 'oauth', 1, makeTracked('group3')) + + await Promise.all([p1, p2, p3]) + + // 不同组应几乎同时启动(差距 < 50ms) + const values = Object.values(timestamps) + const spread = Math.max(...values) - Math.min(...values) + expect(spread).toBeLessThan(50) + }) + + it('请求失败时 reject,后续任务继续执行', async () => { + const results: 
string[] = [] + + const p1 = enqueueUsageRequest('anthropic', 'oauth', 99, async () => { + throw new Error('fail') + }) + const p2 = enqueueUsageRequest('anthropic', 'oauth', 99, async () => { + results.push('second') + return 'ok' + }) + + await expect(p1).rejects.toThrow('fail') + await p2 + expect(results).toEqual(['second']) + }) + + it('返回值正确透传', async () => { + const result = await enqueueUsageRequest('test', 'oauth', null, async () => { + return { usage: 42 } + }) + expect(result).toEqual({ usage: 42 }) + }) + + it('proxy_id 为 null 的账号归为同一组', async () => { + const order: number[] = [] + const makeFn = (n: number) => async () => { + order.push(n) + return n + } + + const p1 = enqueueUsageRequest('anthropic', 'oauth', null, makeFn(1)) + const p2 = enqueueUsageRequest('anthropic', 'oauth', null, makeFn(2)) + + await Promise.all([p1, p2]) + + // 同组串行,按入队顺序执行 + expect(order).toEqual([1, 2]) + }) +}) diff --git a/frontend/src/utils/usageLoadQueue.ts b/frontend/src/utils/usageLoadQueue.ts new file mode 100644 index 00000000..97549b15 --- /dev/null +++ b/frontend/src/utils/usageLoadQueue.ts @@ -0,0 +1,72 @@ +/** + * Usage request queue that throttles API calls by group. + * + * Accounts sharing the same upstream (platform + type + proxy) are placed + * into a single serial queue with a configurable delay between requests, + * preventing upstream 429 rate-limit errors. + * + * Different groups run in parallel since they hit different upstreams. + */ + +const GROUP_DELAY_MIN_MS = 1000 +const GROUP_DELAY_MAX_MS = 1500 + +type Task = { + fn: () => Promise + resolve: (value: T) => void + reject: (reason: unknown) => void +} + +const queues = new Map[]>() +const running = new Set() + +function buildGroupKey(platform: string, type: string, proxyId: number | null): string { + return `${platform}:${type}:${proxyId ?? 
'direct'}` +} + +async function drain(groupKey: string) { + if (running.has(groupKey)) return + running.add(groupKey) + + const queue = queues.get(groupKey) + while (queue && queue.length > 0) { + const task = queue.shift()! + try { + const result = await task.fn() + task.resolve(result) + } catch (err) { + task.reject(err) + } + // Wait a random 1–1.5s before next request in the same group + if (queue.length > 0) { + const jitter = GROUP_DELAY_MIN_MS + Math.random() * (GROUP_DELAY_MAX_MS - GROUP_DELAY_MIN_MS) + await new Promise((r) => setTimeout(r, jitter)) + } + } + + running.delete(groupKey) + queues.delete(groupKey) +} + +/** + * Enqueue a usage fetch call. Returns a promise that resolves when the + * request completes (after waiting its turn in the group queue). + */ +export function enqueueUsageRequest( + platform: string, + type: string, + proxyId: number | null, + fn: () => Promise +): Promise { + const key = buildGroupKey(platform, type, proxyId) + + return new Promise((resolve, reject) => { + let queue = queues.get(key) + if (!queue) { + queue = [] + queues.set(key, queue) + } + queue.push({ fn, resolve, reject } as Task) + drain(key) + }) +} diff --git a/frontend/src/views/HomeView.vue b/frontend/src/views/HomeView.vue index 6a3753f1..babcf046 100644 --- a/frontend/src/views/HomeView.vue +++ b/frontend/src/views/HomeView.vue @@ -122,8 +122,11 @@ > {{ siteName }} -

- {{ siteSubtitle }} +

+ {{ t('home.heroSubtitle') }} +

+

+ {{ t('home.heroDescription') }}

@@ -177,7 +180,7 @@ -
+
@@ -204,6 +207,63 @@
+ +
+

+ {{ t('home.painPoints.title') }} +

+
+ +
+
+ + + +
+

{{ t('home.painPoints.items.expensive.title') }}

+

{{ t('home.painPoints.items.expensive.desc') }}

+
+ +
+
+ + + +
+

{{ t('home.painPoints.items.complex.title') }}

+

{{ t('home.painPoints.items.complex.desc') }}

+
+ +
+
+ + + +
+

{{ t('home.painPoints.items.unstable.title') }}

+

{{ t('home.painPoints.items.unstable.desc') }}

+
+ +
+
+ + + +
+

{{ t('home.painPoints.items.noControl.title') }}

+

{{ t('home.painPoints.items.noControl.desc') }}

+
+
+
+ + +
+

+ {{ t('home.solutions.title') }} +

+

{{ t('home.solutions.subtitle') }}

+
+
@@ -369,6 +429,77 @@ >
+ + +
+

+ {{ t('home.comparison.title') }} +

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
{{ t('home.comparison.headers.feature') }}{{ t('home.comparison.headers.official') }}{{ t('home.comparison.headers.us') }}
{{ t('home.comparison.items.pricing.feature') }}{{ t('home.comparison.items.pricing.official') }}{{ t('home.comparison.items.pricing.us') }}
{{ t('home.comparison.items.models.feature') }}{{ t('home.comparison.items.models.official') }}{{ t('home.comparison.items.models.us') }}
{{ t('home.comparison.items.management.feature') }}{{ t('home.comparison.items.management.official') }}{{ t('home.comparison.items.management.us') }}
{{ t('home.comparison.items.stability.feature') }}{{ t('home.comparison.items.stability.official') }}{{ t('home.comparison.items.stability.us') }}
{{ t('home.comparison.items.control.feature') }}{{ t('home.comparison.items.control.official') }}{{ t('home.comparison.items.control.us') }}
+
+
+ + +
+

+ {{ t('home.cta.title') }} +

+

+ {{ t('home.cta.description') }} +

+ + {{ t('home.cta.button') }} + + + + {{ t('home.goToDashboard') }} + + +
@@ -380,27 +511,20 @@

© {{ currentYear }} {{ siteName }}. {{ t('home.footer.allRightsReserved') }}

- + + {{ t('home.docs') }} + + + + @@ -410,6 +534,7 @@ import { useI18n } from 'vue-i18n' import { useAuthStore, useAppStore } from '@/stores' import LocaleSwitcher from '@/components/common/LocaleSwitcher.vue' import Icon from '@/components/icons/Icon.vue' +import WechatServiceButton from '@/components/common/WechatServiceButton.vue' const { t } = useI18n() @@ -419,7 +544,6 @@ const appStore = useAppStore() // Site settings - directly from appStore (already initialized from injected config) const siteName = computed(() => appStore.cachedPublicSettings?.site_name || appStore.siteName || 'Sub2API') const siteLogo = computed(() => appStore.cachedPublicSettings?.site_logo || appStore.siteLogo || '') -const siteSubtitle = computed(() => appStore.cachedPublicSettings?.site_subtitle || 'AI API Gateway Platform') const docUrl = computed(() => appStore.cachedPublicSettings?.doc_url || appStore.docUrl || '') const homeContent = computed(() => appStore.cachedPublicSettings?.home_content || '') @@ -432,9 +556,6 @@ const isHomeContentUrl = computed(() => { // Theme const isDark = ref(document.documentElement.classList.contains('dark')) -// GitHub URL -const githubUrl = 'https://github.com/Wei-Shaw/sub2api' - // Auth state const isAuthenticated = computed(() => authStore.isAuthenticated) const isAdmin = computed(() => authStore.isAdmin) diff --git a/frontend/src/views/admin/AccountsView.vue b/frontend/src/views/admin/AccountsView.vue index 146b2647..0173ea0a 100644 --- a/frontend/src/views/admin/AccountsView.vue +++ b/frontend/src/views/admin/AccountsView.vue @@ -261,7 +261,7 @@ - + @@ -1125,6 +1125,16 @@ const handleClearRateLimit = async (a: Account) => { console.error('Failed to clear rate limit:', error) } } +const handleResetQuota = async (a: Account) => { + try { + const updated = await adminAPI.accounts.resetAccountQuota(a.id) + patchAccountInList(updated) + enterAutoRefreshSilentWindow() + appStore.showSuccess(t('common.success')) + } catch (error) { + 
console.error('Failed to reset quota:', error) + } +} const handleDelete = (a: Account) => { deletingAcc.value = a; showDeleteDialog.value = true } const confirmDelete = async () => { if(!deletingAcc.value) return; try { await adminAPI.accounts.delete(deletingAcc.value.id); showDeleteDialog.value = false; deletingAcc.value = null; reload() } catch (error) { console.error('Failed to delete account:', error) } } const handleToggleSchedulable = async (a: Account) => { diff --git a/frontend/src/views/admin/GroupsView.vue b/frontend/src/views/admin/GroupsView.vue index aa0a49a7..7fd70fc5 100644 --- a/frontend/src/views/admin/GroupsView.vue +++ b/frontend/src/views/admin/GroupsView.vue @@ -708,6 +708,58 @@ + +
+
+ +
+ +
+
+

+ {{ t('admin.groups.claudeMaxSimulation.tooltip') }} +

+
+
+
+
+
+
+ + + {{ + createForm.simulate_claude_max_enabled + ? t('admin.groups.claudeMaxSimulation.enabled') + : t('admin.groups.claudeMaxSimulation.disabled') + }} + +
+

+ {{ t('admin.groups.claudeMaxSimulation.hint') }} +

+
+
+ +
+
+ +
+ +
+
+

+ {{ t('admin.groups.claudeMaxSimulation.tooltip') }} +

+
+
+
+
+
+
+ + + {{ + editForm.simulate_claude_max_enabled + ? t('admin.groups.claudeMaxSimulation.enabled') + : t('admin.groups.claudeMaxSimulation.disabled') + }} + +
+

+ {{ t('admin.groups.claudeMaxSimulation.hint') }} +

+
+
{ createForm.sora_video_price_per_request_hd = null createForm.sora_storage_quota_gb = null createForm.claude_code_only = false + createForm.simulate_claude_max_enabled = false createForm.fallback_group_id = null createForm.fallback_group_id_on_invalid_request = null createForm.supported_model_scopes = ['claude', 'gemini_text', 'gemini_image'] @@ -2278,6 +2387,8 @@ const handleCreateGroup = async () => { const requestData = { ...createRest, sora_storage_quota_bytes: createQuotaGb ? Math.round(createQuotaGb * 1024 * 1024 * 1024) : 0, + simulate_claude_max_enabled: + createForm.platform === 'anthropic' ? createForm.simulate_claude_max_enabled : false, model_routing: convertRoutingRulesToApiFormat(createModelRoutingRules.value) } await adminAPI.groups.create(requestData) @@ -2318,6 +2429,7 @@ const handleEdit = async (group: AdminGroup) => { editForm.sora_video_price_per_request_hd = group.sora_video_price_per_request_hd editForm.sora_storage_quota_gb = group.sora_storage_quota_bytes ? Number((group.sora_storage_quota_bytes / (1024 * 1024 * 1024)).toFixed(2)) : null editForm.claude_code_only = group.claude_code_only || false + editForm.simulate_claude_max_enabled = group.simulate_claude_max_enabled || false editForm.fallback_group_id = group.fallback_group_id editForm.fallback_group_id_on_invalid_request = group.fallback_group_id_on_invalid_request editForm.model_routing_enabled = group.model_routing_enabled || false @@ -2337,6 +2449,7 @@ const closeEditModal = () => { showEditModal.value = false editingGroup.value = null editModelRoutingRules.value = [] + editForm.simulate_claude_max_enabled = false editForm.copy_accounts_from_group_ids = [] } @@ -2354,6 +2467,8 @@ const handleUpdateGroup = async () => { const payload = { ...editRest, sora_storage_quota_bytes: editQuotaGb ? Math.round(editQuotaGb * 1024 * 1024 * 1024) : 0, + simulate_claude_max_enabled: + editForm.platform === 'anthropic' ? 
editForm.simulate_claude_max_enabled : false, fallback_group_id: editForm.fallback_group_id === null ? 0 : editForm.fallback_group_id, fallback_group_id_on_invalid_request: editForm.fallback_group_id_on_invalid_request === null @@ -2410,6 +2525,21 @@ watch( if (!['anthropic', 'antigravity'].includes(newVal)) { createForm.fallback_group_id_on_invalid_request = null } + if (newVal !== 'anthropic') { + createForm.simulate_claude_max_enabled = false + } + } +) + +watch( + () => editForm.platform, + (newVal) => { + if (!['anthropic', 'antigravity'].includes(newVal)) { + editForm.fallback_group_id_on_invalid_request = null + } + if (newVal !== 'anthropic') { + editForm.simulate_claude_max_enabled = false + } } ) diff --git a/frontend/src/views/admin/ops/components/OpsConcurrencyCard.vue b/frontend/src/views/admin/ops/components/OpsConcurrencyCard.vue index c7370ab5..ca640ade 100644 --- a/frontend/src/views/admin/ops/components/OpsConcurrencyCard.vue +++ b/frontend/src/views/admin/ops/components/OpsConcurrencyCard.vue @@ -122,6 +122,7 @@ const platformRows = computed((): SummaryRow[] => { available_accounts: availableAccounts, rate_limited_accounts: safeNumber(avail.rate_limit_count), + error_accounts: safeNumber(avail.error_count), total_concurrency: totalConcurrency, used_concurrency: usedConcurrency, @@ -161,7 +162,6 @@ const groupRows = computed((): SummaryRow[] => { total_accounts: totalAccounts, available_accounts: availableAccounts, rate_limited_accounts: safeNumber(avail.rate_limit_count), - error_accounts: safeNumber(avail.error_count), total_concurrency: totalConcurrency, used_concurrency: usedConcurrency, @@ -329,6 +329,7 @@ function formatDuration(seconds: number): string { } + watch( () => realtimeEnabled.value, async (enabled) => { diff --git a/stress_test_gemini_session.sh b/stress_test_gemini_session.sh new file mode 100644 index 00000000..1f2aca57 --- /dev/null +++ b/stress_test_gemini_session.sh @@ -0,0 +1,127 @@ +#!/bin/bash + +# Gemini 粘性会话压力测试脚本 +# 
Goal: verify that different sessions are assigned different accounts, while the same session stays on the same account

+BASE_URL="http://host.clicodeplus.com:8080"
+API_KEY="sk-32ad0a3197e528c840ea84f0dc6b2056dd3fead03526b5c605a60709bd408f7e"
+MODEL="gemini-2.5-flash"
+
+# Create a temp directory for the results
+RESULT_DIR="/tmp/gemini_stress_test_$(date +%s)"
+mkdir -p "$RESULT_DIR"
+
+echo "=========================================="
+echo "Gemini sticky-session stress test"
+echo "Result dir: $RESULT_DIR"
+echo "=========================================="
+
+# Send one request and record the response
+send_request() {
+  local session_id=$1
+  local round=$2
+  local system_prompt=$3
+  local contents=$4
+  local output_file="$RESULT_DIR/session_${session_id}_round_${round}.json"
+
+  # Build a Gemini-style request body and POST it.
+  # NOTE: assumes a Gemini-native generateContent route with Bearer auth;
+  # adjust the path/header if the gateway exposes a different one.
+  local request_body=$(cat <<EOF
+{
+  "system_instruction": {"parts": [{"text": "$system_prompt"}]},
+  "contents": $contents
+}
+EOF
+)
+
+  curl -s -X POST "$BASE_URL/v1beta/models/${MODEL}:generateContent" \
+    -H "Content-Type: application/json" \
+    -H "Authorization: Bearer $API_KEY" \
+    -d "$request_body" > "$output_file" 2>&1
+
+  echo "[Session $session_id Round $round] done"
+}
+
+# Session 1: math calculator (history grows each round)
+run_session_1() {
+  local sys_prompt="你是一个数学计算器,只返回计算结果数字,不要任何解释"
+
+  # Round 1: 1+1=?
+  send_request 1 1 "$sys_prompt" '[{"role":"user","parts":[{"text":"1+1=?"}]}]'
+
+  # Round 2: 2+2=? (with accumulated history)
+  send_request 1 2 "$sys_prompt" '[{"role":"user","parts":[{"text":"1+1=?"}]},{"role":"model","parts":[{"text":"2"}]},{"role":"user","parts":[{"text":"2+2=?"}]}]'
+
+  # Round 3: 3+3=?
+  send_request 1 3 "$sys_prompt" '[{"role":"user","parts":[{"text":"1+1=?"}]},{"role":"model","parts":[{"text":"2"}]},{"role":"user","parts":[{"text":"2+2=?"}]},{"role":"model","parts":[{"text":"4"}]},{"role":"user","parts":[{"text":"3+3=?"}]}]'
+
+  # Round 4: batch calculation 10+10, 20+20, 30+30
+  send_request 1 4 "$sys_prompt" '[{"role":"user","parts":[{"text":"1+1=?"}]},{"role":"model","parts":[{"text":"2"}]},{"role":"user","parts":[{"text":"2+2=?"}]},{"role":"model","parts":[{"text":"4"}]},{"role":"user","parts":[{"text":"3+3=?"}]},{"role":"model","parts":[{"text":"6"}]},{"role":"user","parts":[{"text":"计算: 10+10=? 20+20=? 
30+30=?"}]}]'
+}
+
+# Session 2: English translator (different system prompt = different session)
+run_session_2() {
+  local sys_prompt="你是一个英文翻译器,将中文翻译成英文,只返回翻译结果"
+
+  send_request 2 1 "$sys_prompt" '[{"role":"user","parts":[{"text":"你好"}]}]'
+  send_request 2 2 "$sys_prompt" '[{"role":"user","parts":[{"text":"你好"}]},{"role":"model","parts":[{"text":"Hello"}]},{"role":"user","parts":[{"text":"世界"}]}]'
+  send_request 2 3 "$sys_prompt" '[{"role":"user","parts":[{"text":"你好"}]},{"role":"model","parts":[{"text":"Hello"}]},{"role":"user","parts":[{"text":"世界"}]},{"role":"model","parts":[{"text":"World"}]},{"role":"user","parts":[{"text":"早上好"}]}]'
+}
+
+# Session 3: Japanese translator
+run_session_3() {
+  local sys_prompt="你是一个日文翻译器,将中文翻译成日文,只返回翻译结果"
+
+  send_request 3 1 "$sys_prompt" '[{"role":"user","parts":[{"text":"你好"}]}]'
+  send_request 3 2 "$sys_prompt" '[{"role":"user","parts":[{"text":"你好"}]},{"role":"model","parts":[{"text":"こんにちは"}]},{"role":"user","parts":[{"text":"谢谢"}]}]'
+  send_request 3 3 "$sys_prompt" '[{"role":"user","parts":[{"text":"你好"}]},{"role":"model","parts":[{"text":"こんにちは"}]},{"role":"user","parts":[{"text":"谢谢"}]},{"role":"model","parts":[{"text":"ありがとう"}]},{"role":"user","parts":[{"text":"再见"}]}]'
+}
+
+# Session 4: multiplication calculator (another math session, but a different system prompt)
+run_session_4() {
+  local sys_prompt="你是一个乘法专用计算器,只计算乘法,返回数字结果"
+
+  send_request 4 1 "$sys_prompt" '[{"role":"user","parts":[{"text":"2*3=?"}]}]'
+  send_request 4 2 "$sys_prompt" '[{"role":"user","parts":[{"text":"2*3=?"}]},{"role":"model","parts":[{"text":"6"}]},{"role":"user","parts":[{"text":"4*5=?"}]}]'
+  send_request 4 3 "$sys_prompt" '[{"role":"user","parts":[{"text":"2*3=?"}]},{"role":"model","parts":[{"text":"6"}]},{"role":"user","parts":[{"text":"4*5=?"}]},{"role":"model","parts":[{"text":"20"}]},{"role":"user","parts":[{"text":"计算: 10*10=? 
20*20=?"}]}]'
+}
+
+# Session 5: poet (a completely different persona)
+run_session_5() {
+  local sys_prompt="你是一位诗人,用简短的诗句回应每个话题,每次只写一句诗"
+
+  send_request 5 1 "$sys_prompt" '[{"role":"user","parts":[{"text":"春天"}]}]'
+  send_request 5 2 "$sys_prompt" '[{"role":"user","parts":[{"text":"春天"}]},{"role":"model","parts":[{"text":"春风拂面花满枝"}]},{"role":"user","parts":[{"text":"夏天"}]}]'
+  send_request 5 3 "$sys_prompt" '[{"role":"user","parts":[{"text":"春天"}]},{"role":"model","parts":[{"text":"春风拂面花满枝"}]},{"role":"user","parts":[{"text":"夏天"}]},{"role":"model","parts":[{"text":"蝉鸣蛙声伴荷香"}]},{"role":"user","parts":[{"text":"秋天"}]}]'
+}
+
+echo ""
+echo "Starting 5 independent sessions concurrently..."
+echo ""
+
+# Run all sessions in parallel
+run_session_1 &
+run_session_2 &
+run_session_3 &
+run_session_4 &
+run_session_5 &
+
+# Wait for all background jobs to finish
+wait
+
+echo ""
+echo "=========================================="
+echo "All requests finished; results saved in: $RESULT_DIR"
+echo "=========================================="
+
+# Print a short summary of each response
+echo ""
+echo "Response summary:"
+for f in "$RESULT_DIR"/*.json; do
+  filename=$(basename "$f")
+  response=$(head -c 200 "$f")
+  echo "[$filename]: ${response}..."
+done
+
+echo ""
+echo "Check the server logs to confirm which account each session was assigned"
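
After the test script finishes, the per-session result files can be tallied without digging through the raw directory listing. A minimal sketch, assuming the `session_<id>_round_<n>.json` naming scheme that `send_request` uses; the `summarize_rounds` helper name is an illustration, not part of the script:

```shell
#!/bin/bash
# Count completed rounds per session from the files written by the
# stress test script (hypothetical helper; point it at $RESULT_DIR).
summarize_rounds() {
  local result_dir=$1
  declare -A rounds   # session id -> number of result files seen
  local f name session s
  for f in "$result_dir"/session_*_round_*.json; do
    [ -e "$f" ] || continue
    name=$(basename "$f")
    session=${name#session_}     # strip "session_" prefix
    session=${session%%_*}       # keep the id before "_round_..."
    rounds[$session]=$(( ${rounds[$session]:-0} + 1 ))
  done
  for s in $(printf '%s\n' "${!rounds[@]}" | sort); do
    echo "session $s: ${rounds[$s]} round(s)"
  done
}
```

Usage: `summarize_rounds "$RESULT_DIR"` prints one line per session, which makes a missing round (a failed request) easy to spot before cross-checking account assignment in the server logs.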