fix(sandbox):deer-flow-provisioner container fails to start in local execution mode (#889)

This commit is contained in:
Willem Jiang
2026-02-24 08:31:52 +08:00
committed by GitHub
parent b5c11baece
commit 03705acf3a
12 changed files with 452 additions and 52 deletions

View File

@@ -0,0 +1,35 @@
name: Unit Tests
on:
pull_request:
types: [opened, synchronize, reopened, ready_for_review]
concurrency:
group: unit-tests-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
backend-unit-tests:
if: github.event.pull_request.draft == false
runs-on: ubuntu-latest
timeout-minutes: 15
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.12'
- name: Install uv
uses: astral-sh/setup-uv@v6
- name: Install backend dependencies
working-directory: backend
run: uv sync --group dev
- name: Run unit tests of backend
working-directory: backend
run: uv run pytest tests/test_provisioner_kubeconfig.py tests/test_docker_sandbox_mode_detection.py

View File

@@ -41,6 +41,8 @@ Docker provides a consistent, isolated environment with all dependencies pre-con
```bash
make docker-start
```
`make docker-start` reads `config.yaml` and starts `provisioner` only for provisioner/Kubernetes sandbox mode.
All services will start with hot-reload enabled:
- Frontend changes are automatically reloaded
- Backend changes trigger automatic restart
@@ -56,7 +58,7 @@ Docker provides a consistent, isolated environment with all dependencies pre-con
```bash
# Build the custom k3s image (with pre-cached sandbox image)
make docker-init
# Start all services in Docker (localhost:2026)
# Start Docker services (mode-aware, localhost:2026)
make docker-start
# Stop Docker development services
make docker-stop
@@ -77,7 +79,8 @@ Docker Compose (deer-flow-dev)
├→ nginx (port 2026) ← Reverse proxy
├→ web (port 3000) ← Frontend with hot-reload
├→ api (port 8001) ← Gateway API with hot-reload
→ langgraph (port 2024) ← LangGraph server with hot-reload
→ langgraph (port 2024) ← LangGraph server with hot-reload
└→ provisioner (optional, port 8002) ← Started only in provisioner/K8s sandbox mode
```
**Benefits of Docker Development**:
@@ -238,6 +241,13 @@ cd frontend
pnpm test
```
### PR Regression Checks
Every pull request runs the backend regression workflow at [.github/workflows/backend-unit-tests.yml](.github/workflows/backend-unit-tests.yml), including:
- `tests/test_provisioner_kubeconfig.py`
- `tests/test_docker_sandbox_mode_detection.py`
## Code Style
- **Backend (Python)**: We use `ruff` for linting and formatting

View File

@@ -13,7 +13,7 @@ help:
@echo ""
@echo "Docker Development Commands:"
@echo " make docker-init - Build the custom k3s image (with pre-cached sandbox image)"
@echo " make docker-start - Start all services in Docker (localhost:2026)"
@echo " make docker-start - Start Docker services (mode-aware from config.yaml, localhost:2026)"
@echo " make docker-stop - Stop Docker development services"
@echo " make docker-logs - View Docker development logs"
@echo " make docker-logs-frontend - View Docker frontend logs"

View File

@@ -105,9 +105,11 @@ The fastest way to get started with a consistent environment:
1. **Initialize and start**:
```bash
make docker-init # Pull sandbox image (Only once or when image updates)
make docker-start # Start all services and watch for code changes
make docker-start # Start services (auto-detects sandbox mode from config.yaml)
```
`make docker-start` now starts `provisioner` only when `config.yaml` uses provisioner mode (`sandbox.use: src.community.aio_sandbox:AioSandboxProvider` with `provisioner_url`).
2. **Access**: http://localhost:2026
See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed Docker development guide.
@@ -142,6 +144,8 @@ DeerFlow supports multiple sandbox execution modes:
- **Docker Execution** (runs sandbox code in isolated Docker containers)
- **Docker Execution with Kubernetes** (runs sandbox code in Kubernetes pods via provisioner service)
For Docker development, service startup follows `config.yaml` sandbox mode. In Local/Docker modes, `provisioner` is not started.
See the [Sandbox Configuration Guide](backend/docs/CONFIGURATION.md#sandbox) to configure your preferred mode.
#### MCP Server
@@ -242,6 +246,8 @@ DeerFlow is model-agnostic — it works with any LLM that implements the OpenAI-
We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, workflow, and guidelines.
Regression coverage includes Docker sandbox mode detection and provisioner kubeconfig-path handling tests in `backend/tests/`.
## License
This project is open source and available under the [MIT License](./LICENSE).

View File

@@ -11,6 +11,7 @@ DeerFlow is a LangGraph-based AI super agent system with a full-stack architectu
- **Gateway API** (port 8001): REST API for models, MCP, skills, memory, artifacts, and uploads
- **Frontend** (port 3000): Next.js web interface
- **Nginx** (port 2026): Unified reverse proxy entry point
- **Provisioner** (port 8002, optional in Docker dev): Started only when sandbox is configured for provisioner/Kubernetes mode
**Project Structure**:
```
@@ -83,8 +84,15 @@ make dev # Run LangGraph server only (port 2024)
make gateway # Run Gateway API only (port 8001)
make lint # Lint with ruff
make format # Format code with ruff
uv run pytest # Run backend tests
```
Regression tests related to Docker/provisioner behavior:
- `tests/test_docker_sandbox_mode_detection.py` (mode detection from `config.yaml`)
- `tests/test_provisioner_kubeconfig.py` (kubeconfig file/directory handling)
CI runs these regression tests for every pull request via [.github/workflows/backend-unit-tests.yml](../.github/workflows/backend-unit-tests.yml).
## Architecture
### Agent System

View File

@@ -98,6 +98,8 @@ sandbox:
provisioner_url: http://provisioner:8002
```
When using Docker development (`make docker-start`), DeerFlow starts the `provisioner` service only if this provisioner mode is configured. In local or plain Docker sandbox modes, `provisioner` is skipped.
See [Provisioner Setup Guide](docker/provisioner/README.md) for detailed configuration, prerequisites, and troubleshooting.
Choose between local execution or Docker-based isolation:

View File

@@ -0,0 +1,95 @@
"""Regression tests for docker sandbox mode detection logic."""
from __future__ import annotations
import subprocess
import tempfile
from pathlib import Path
REPO_ROOT = Path(__file__).resolve().parents[2]
SCRIPT_PATH = REPO_ROOT / "scripts" / "docker.sh"
def _detect_mode_with_config(config_content: str) -> str:
"""Write config content into a temp project root and execute detect_sandbox_mode."""
with tempfile.TemporaryDirectory() as tmpdir:
tmp_root = Path(tmpdir)
(tmp_root / "config.yaml").write_text(config_content)
command = (
f"source '{SCRIPT_PATH}' && "
f"PROJECT_ROOT='{tmp_root}' && "
"detect_sandbox_mode"
)
output = subprocess.check_output(
["bash", "-lc", command],
text=True,
).strip()
return output
def test_detect_mode_defaults_to_local_when_config_missing():
"""No config file should default to local mode."""
with tempfile.TemporaryDirectory() as tmpdir:
command = (
f"source '{SCRIPT_PATH}' && "
f"PROJECT_ROOT='{tmpdir}' && "
"detect_sandbox_mode"
)
output = subprocess.check_output(["bash", "-lc", command], text=True).strip()
assert output == "local"
def test_detect_mode_local_provider():
"""Local sandbox provider should map to local mode."""
config = """
sandbox:
use: src.sandbox.local:LocalSandboxProvider
""".strip()
assert _detect_mode_with_config(config) == "local"
def test_detect_mode_aio_without_provisioner_url():
"""AIO sandbox without provisioner_url should map to aio mode."""
config = """
sandbox:
use: src.community.aio_sandbox:AioSandboxProvider
""".strip()
assert _detect_mode_with_config(config) == "aio"
def test_detect_mode_provisioner_with_url():
"""AIO sandbox with provisioner_url should map to provisioner mode."""
config = """
sandbox:
use: src.community.aio_sandbox:AioSandboxProvider
provisioner_url: http://provisioner:8002
""".strip()
assert _detect_mode_with_config(config) == "provisioner"
def test_detect_mode_ignores_commented_provisioner_url():
"""Commented provisioner_url should not activate provisioner mode."""
config = """
sandbox:
use: src.community.aio_sandbox:AioSandboxProvider
# provisioner_url: http://provisioner:8002
""".strip()
assert _detect_mode_with_config(config) == "aio"
def test_detect_mode_unknown_provider_falls_back_to_local():
"""Unknown sandbox provider should default to local mode."""
config = """
sandbox:
use: custom.module:UnknownProvider
""".strip()
assert _detect_mode_with_config(config) == "local"

View File

@@ -0,0 +1,121 @@
"""Regression tests for provisioner kubeconfig path handling."""
from __future__ import annotations
import importlib.util
from pathlib import Path
def _load_provisioner_module():
"""Load docker/provisioner/app.py as an importable test module."""
repo_root = Path(__file__).resolve().parents[2]
module_path = repo_root / "docker" / "provisioner" / "app.py"
spec = importlib.util.spec_from_file_location("provisioner_app_test", module_path)
assert spec is not None
assert spec.loader is not None
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
return module
def test_wait_for_kubeconfig_rejects_directory(tmp_path):
"""Directory mount at kubeconfig path should fail fast with clear error."""
provisioner_module = _load_provisioner_module()
kubeconfig_dir = tmp_path / "config_dir"
kubeconfig_dir.mkdir()
provisioner_module.KUBECONFIG_PATH = str(kubeconfig_dir)
try:
provisioner_module._wait_for_kubeconfig(timeout=1)
raise AssertionError("Expected RuntimeError for directory kubeconfig path")
except RuntimeError as exc:
assert "directory" in str(exc)
def test_wait_for_kubeconfig_accepts_file(tmp_path):
"""Regular file mount should pass readiness wait."""
provisioner_module = _load_provisioner_module()
kubeconfig_file = tmp_path / "config"
kubeconfig_file.write_text("apiVersion: v1\n")
provisioner_module.KUBECONFIG_PATH = str(kubeconfig_file)
# Should return immediately without raising.
provisioner_module._wait_for_kubeconfig(timeout=1)
def test_init_k8s_client_rejects_directory_path(tmp_path):
"""KUBECONFIG_PATH that resolves to a directory should be rejected."""
provisioner_module = _load_provisioner_module()
kubeconfig_dir = tmp_path / "config_dir"
kubeconfig_dir.mkdir()
provisioner_module.KUBECONFIG_PATH = str(kubeconfig_dir)
try:
provisioner_module._init_k8s_client()
raise AssertionError("Expected RuntimeError for directory kubeconfig path")
except RuntimeError as exc:
assert "expected a file" in str(exc)
def test_init_k8s_client_uses_file_kubeconfig(tmp_path, monkeypatch):
"""When file exists, provisioner should load kubeconfig file path."""
provisioner_module = _load_provisioner_module()
kubeconfig_file = tmp_path / "config"
kubeconfig_file.write_text("apiVersion: v1\n")
called: dict[str, object] = {}
def fake_load_kube_config(config_file: str):
called["config_file"] = config_file
monkeypatch.setattr(
provisioner_module.k8s_config,
"load_kube_config",
fake_load_kube_config,
)
monkeypatch.setattr(
provisioner_module.k8s_client,
"CoreV1Api",
lambda *args, **kwargs: "core-v1",
)
provisioner_module.KUBECONFIG_PATH = str(kubeconfig_file)
result = provisioner_module._init_k8s_client()
assert called["config_file"] == str(kubeconfig_file)
assert result == "core-v1"
def test_init_k8s_client_falls_back_to_incluster_when_missing(
tmp_path, monkeypatch
):
"""When kubeconfig file is missing, in-cluster config should be attempted."""
provisioner_module = _load_provisioner_module()
missing_path = tmp_path / "missing-config"
calls: dict[str, int] = {"incluster": 0}
def fake_load_incluster_config():
calls["incluster"] += 1
monkeypatch.setattr(
provisioner_module.k8s_config,
"load_incluster_config",
fake_load_incluster_config,
)
monkeypatch.setattr(
provisioner_module.k8s_client,
"CoreV1Api",
lambda *args, **kwargs: "core-v1",
)
provisioner_module.KUBECONFIG_PATH = str(missing_path)
result = provisioner_module._init_k8s_client()
assert calls["incluster"] == 1
assert result == "core-v1"

View File

@@ -6,11 +6,10 @@
# - frontend: Frontend Next.js dev server (port 3000)
# - gateway: Backend Gateway API (port 8001)
# - langgraph: LangGraph server (port 2024)
# - provisioner: Sandbox provisioner (creates Pods in host Kubernetes)
# - provisioner (optional): Sandbox provisioner (creates Pods in host Kubernetes)
#
# Prerequisites:
# - Host machine must have a running Kubernetes cluster (Docker Desktop K8s,
# minikube, kind, etc.) with kubectl configured (~/.kube/config).
# - Kubernetes cluster + kubeconfig are only required when using provisioner mode.
#
# Access: http://localhost:2026
@@ -20,6 +19,8 @@ services:
# cluster via the K8s API.
# Backend accesses sandboxes directly via host.docker.internal:{NodePort}.
provisioner:
profiles:
- provisioner
build:
context: ./provisioner
dockerfile: Dockerfile
@@ -55,19 +56,21 @@ services:
start_period: 15s
# ── Reverse Proxy ──────────────────────────────────────────────────────
# Routes API traffic to gateway, langgraph, and provisioner services.
# Routes API traffic to gateway/langgraph and (optionally) provisioner.
# Select nginx config via NGINX_CONF:
# - nginx.local.conf (default): no provisioner route (local/aio modes)
# - nginx.conf: includes provisioner route (provisioner mode)
nginx:
image: nginx:alpine
container_name: deer-flow-nginx
ports:
- "2026:2026"
volumes:
- ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
- ./nginx/${NGINX_CONF:-nginx.local.conf}:/etc/nginx/nginx.conf:ro
depends_on:
- frontend
- gateway
- langgraph
- provisioner
networks:
- deer-flow-dev
restart: unless-stopped

View File

@@ -188,7 +188,7 @@ kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
The provisioner runs as part of the docker-compose-dev stack:
```bash
# Start all services including provisioner
# Start Docker services (provisioner starts only when config.yaml enables provisioner mode)
make docker-start
# Or start just the provisioner
@@ -249,6 +249,18 @@ docker exec deer-flow-gateway curl -s $SANDBOX_URL/v1/sandbox
- Run `kubectl config view` to verify
- Check the volume mount in docker-compose-dev.yaml
### Issue: "Kubeconfig path is a directory"
**Cause**: The mounted `KUBECONFIG_PATH` points to a directory instead of a file.
**Solution**:
- Ensure the compose mount source is a file (e.g., `~/.kube/config`) not a directory
- Verify inside container:
```bash
docker exec deer-flow-provisioner ls -ld /root/.kube/config
```
- Expected output should indicate a regular file (`-`), not a directory (`d`)
### Issue: "Connection refused" to K8s API
**Cause**: The provisioner can't reach the K8s API server.

View File

@@ -80,12 +80,29 @@ def _init_k8s_client() -> k8s_client.CoreV1Api:
Tries the mounted kubeconfig first, then falls back to in-cluster
config (useful if the provisioner itself runs inside K8s).
"""
try:
k8s_config.load_kube_config(config_file=KUBECONFIG_PATH)
logger.info(f"Loaded kubeconfig from {KUBECONFIG_PATH}")
except Exception:
logger.warning("Could not load kubeconfig from file, trying in-cluster config")
k8s_config.load_incluster_config()
if os.path.exists(KUBECONFIG_PATH):
if os.path.isdir(KUBECONFIG_PATH):
raise RuntimeError(
f"KUBECONFIG_PATH points to a directory, expected a file: {KUBECONFIG_PATH}"
)
try:
k8s_config.load_kube_config(config_file=KUBECONFIG_PATH)
logger.info(f"Loaded kubeconfig from {KUBECONFIG_PATH}")
except Exception as exc:
raise RuntimeError(
f"Failed to load kubeconfig from {KUBECONFIG_PATH}: {exc}"
) from exc
else:
logger.warning(
f"Kubeconfig not found at {KUBECONFIG_PATH}; trying in-cluster config"
)
try:
k8s_config.load_incluster_config()
except Exception as exc:
raise RuntimeError(
"Failed to initialize Kubernetes client. "
f"No kubeconfig at {KUBECONFIG_PATH}, and in-cluster config is unavailable: {exc}"
) from exc
# When connecting from inside Docker to the host's K8s API, the
# kubeconfig may reference ``localhost`` or ``127.0.0.1``. We
@@ -103,15 +120,27 @@ def _init_k8s_client() -> k8s_client.CoreV1Api:
def _wait_for_kubeconfig(timeout: int = 30) -> None:
"""Block until the kubeconfig file is available."""
"""Wait for kubeconfig file if configured, then continue with fallback support."""
deadline = time.time() + timeout
while time.time() < deadline:
if os.path.exists(KUBECONFIG_PATH):
logger.info(f"Found kubeconfig at {KUBECONFIG_PATH}")
return
if os.path.isfile(KUBECONFIG_PATH):
logger.info(f"Found kubeconfig file at {KUBECONFIG_PATH}")
return
if os.path.isdir(KUBECONFIG_PATH):
raise RuntimeError(
"Kubeconfig path is a directory. "
f"Please mount a kubeconfig file at {KUBECONFIG_PATH}."
)
raise RuntimeError(
f"Kubeconfig path exists but is not a regular file: {KUBECONFIG_PATH}"
)
logger.info(f"Waiting for kubeconfig at {KUBECONFIG_PATH}")
time.sleep(2)
raise RuntimeError(f"Kubeconfig not found at {KUBECONFIG_PATH} after {timeout}s")
logger.warning(
f"Kubeconfig not found at {KUBECONFIG_PATH} after {timeout}s; "
"will attempt in-cluster Kubernetes config"
)
def _ensure_namespace() -> None:

View File

@@ -15,6 +15,51 @@ DOCKER_DIR="$PROJECT_ROOT/docker"
# Docker Compose command with project name
COMPOSE_CMD="docker compose -p deer-flow-dev -f docker-compose-dev.yaml"
detect_sandbox_mode() {
local config_file="$PROJECT_ROOT/config.yaml"
local sandbox_use=""
local provisioner_url=""
if [ ! -f "$config_file" ]; then
echo "local"
return
fi
sandbox_use=$(awk '
/^[[:space:]]*sandbox:[[:space:]]*$/ { in_sandbox=1; next }
in_sandbox && /^[^[:space:]#]/ { in_sandbox=0 }
in_sandbox && /^[[:space:]]*use:[[:space:]]*/ {
line=$0
sub(/^[[:space:]]*use:[[:space:]]*/, "", line)
print line
exit
}
' "$config_file")
provisioner_url=$(awk '
/^[[:space:]]*sandbox:[[:space:]]*$/ { in_sandbox=1; next }
in_sandbox && /^[^[:space:]#]/ { in_sandbox=0 }
in_sandbox && /^[[:space:]]*provisioner_url:[[:space:]]*/ {
line=$0
sub(/^[[:space:]]*provisioner_url:[[:space:]]*/, "", line)
print line
exit
}
' "$config_file")
if [[ "$sandbox_use" == *"src.sandbox.local:LocalSandboxProvider"* ]]; then
echo "local"
elif [[ "$sandbox_use" == *"src.community.aio_sandbox:AioSandboxProvider"* ]]; then
if [ -n "$provisioner_url" ]; then
echo "provisioner"
else
echo "aio"
fi
else
echo "local"
fi
}
# Cleanup function for Ctrl+C
cleanup() {
echo ""
@@ -49,10 +94,32 @@ init() {
# Start Docker development environment
start() {
local sandbox_mode
local nginx_conf
local services
echo "=========================================="
echo " Starting DeerFlow Docker Development"
echo "=========================================="
echo ""
sandbox_mode="$(detect_sandbox_mode)"
if [ "$sandbox_mode" = "provisioner" ]; then
nginx_conf="nginx.conf"
services="frontend gateway langgraph provisioner nginx"
else
nginx_conf="nginx.local.conf"
services="frontend gateway langgraph nginx"
fi
echo -e "${BLUE}Detected sandbox mode: $sandbox_mode${NC}"
if [ "$sandbox_mode" = "provisioner" ]; then
echo -e "${BLUE}Provisioner enabled (Kubernetes mode).${NC}"
else
echo -e "${BLUE}Provisioner disabled (not required for this sandbox mode).${NC}"
fi
echo ""
# Set DEER_FLOW_ROOT for provisioner if not already set
if [ -z "$DEER_FLOW_ROOT" ]; then
@@ -62,7 +129,7 @@ start() {
fi
echo "Building and starting containers..."
cd "$DOCKER_DIR" && $COMPOSE_CMD up --build -d --remove-orphans
cd "$DOCKER_DIR" && NGINX_CONF="$nginx_conf" $COMPOSE_CMD up --build -d --remove-orphans $services
echo ""
echo "=========================================="
echo " DeerFlow Docker is starting!"
@@ -94,12 +161,16 @@ logs() {
service="nginx"
echo -e "${BLUE}Viewing nginx logs...${NC}"
;;
--provisioner)
service="provisioner"
echo -e "${BLUE}Viewing provisioner logs...${NC}"
;;
"")
echo -e "${BLUE}Viewing all logs...${NC}"
;;
*)
echo -e "${YELLOW}Unknown option: $1${NC}"
echo "Usage: $0 logs [--frontend|--gateway]"
echo "Usage: $0 logs [--frontend|--gateway|--nginx|--provisioner]"
exit 1
;;
esac
@@ -138,40 +209,48 @@ help() {
echo ""
echo "Commands:"
echo " init - Pull the sandbox image (speeds up first Pod startup)"
echo " start - Start all services in Docker (localhost:2026)"
echo " start - Start Docker services (auto-detects sandbox mode from config.yaml)"
echo " restart - Restart all running Docker services"
echo " logs [option] - View Docker development logs"
echo " --frontend View frontend logs only"
echo " --gateway View gateway logs only"
echo " --gateway View gateway logs only"
echo " --nginx View nginx logs only"
echo " --provisioner View provisioner logs only"
echo " stop - Stop Docker development services"
echo " help - Show this help message"
echo ""
}
# Main command dispatcher
case "$1" in
init)
init
;;
start)
start
;;
restart)
restart
;;
logs)
logs "$2"
;;
stop)
stop
;;
help|--help|-h|"")
help
;;
*)
echo -e "${YELLOW}Unknown command: $1${NC}"
echo ""
help
exit 1
;;
esac
main() {
# Main command dispatcher
case "$1" in
init)
init
;;
start)
start
;;
restart)
restart
;;
logs)
logs "$2"
;;
stop)
stop
;;
help|--help|-h|"")
help
;;
*)
echo -e "${YELLOW}Unknown command: $1${NC}"
echo ""
help
exit 1
;;
esac
}
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
main "$@"
fi