Add Kubernetes-based sandbox provider for multi-instance support (#19)

* feat: adds docker-based dev environment
* docs: updates Docker command help
* fix local dev
* feat(sandbox): add Kubernetes-based sandbox provider for multi-instance support
* fix: skills path in k8s
* feat: add example config for k8s sandbox
* fix: docker config
* fix: load skills on docker dev
* feat: support sandbox execution to Kubernetes Deployment model
* chore: rename web service name
2026-04-18 20:14:44 +08:00 · 2026-02-09 21:59:13 +08:00
parent 554ec7a91e
commit b6da3a219e
20 changed files with 981 additions and 94 deletions
--- a/docker/k8s/README.md
+++ b/docker/k8s/README.md
@@ -0,0 +1,427 @@
+# Kubernetes Sandbox Setup
+
+This guide explains how to deploy and configure the DeerFlow sandbox execution environment on Kubernetes.
+
+## Overview
+
+The Kubernetes sandbox deployment allows you to run DeerFlow's code execution sandbox in a Kubernetes cluster, providing:
+
+- **Isolated Execution**: Sandbox runs in dedicated Kubernetes pods
+- **Scalability**: Easy horizontal scaling with replica configuration
+- **Cluster Integration**: Seamless integration with existing Kubernetes infrastructure
+- **Persistent Skills**: Skills directory mounted from host or PersistentVolume
+
+## Prerequisites
+
+Before you begin, ensure you have:
+
+1. **Kubernetes Cluster**: One of the following:
+   - Docker Desktop with Kubernetes enabled
+   - OrbStack with Kubernetes enabled
+   - Minikube
+   - Any production Kubernetes cluster
+
+2. **kubectl**: Kubernetes command-line tool
+   ```bash
+   # macOS
+   brew install kubectl
+   
+   # Linux
+   # See: https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/
+   ```
+
+3. **Docker**: For pulling the sandbox image (optional, but recommended)
+   ```bash
+   # Verify installation
+   docker version
+   ```
+
+## Quick Start
+
+### 1. Enable Kubernetes
+
+**Docker Desktop:**
+```
+Settings → Kubernetes → Enable Kubernetes → Apply & Restart
+```
+
+**OrbStack:**
+```
+Settings → Enable Kubernetes
+```
+
+**Minikube:**
+```bash
+minikube start
+```
+
+### 2. Run Setup Script
+
+The easiest way to get started:
+
+```bash
+cd docker/k8s
+./setup.sh
+```
+
+This will:
+- ✅ Check kubectl installation and cluster connectivity
+- ✅ Pull the sandbox Docker image (optional, can be skipped)
+- ✅ Create the `deer-flow` namespace
+- ✅ Deploy the sandbox service and deployment
+- ✅ Verify the deployment is running
+
+### 3. Configure Backend
+
+Add the following to `backend/config.yaml`:
+
+```yaml
+sandbox:
+  use: src.community.aio_sandbox:AioSandboxProvider
+  base_url: http://deer-flow-sandbox.deer-flow.svc.cluster.local:8080
+```
+
+### 4. Verify Deployment
+
+Check that the sandbox pod is running:
+
+```bash
+kubectl get pods -n deer-flow
+```
+
+You should see:
+```
+NAME                                 READY   STATUS    RESTARTS   AGE
+deer-flow-sandbox-xxxxxxxxxx-xxxxx   1/1     Running   0          1m
+```
+
+## Advanced Configuration
+
+### Custom Skills Path
+
+By default, the setup script uses `PROJECT_ROOT/skills`. You can specify a custom path:
+
+**Using command-line argument:**
+```bash
+./setup.sh --skills-path /custom/path/to/skills
+```
+
+**Using environment variable:**
+```bash
+SKILLS_PATH=/custom/path/to/skills ./setup.sh
+```
+
+### Custom Sandbox Image
+
+To use a different sandbox image:
+
+**Using command-line argument:**
+```bash
+./setup.sh --image your-registry/sandbox:tag
+```
+
+**Using environment variable:**
+```bash
+SANDBOX_IMAGE=your-registry/sandbox:tag ./setup.sh
+```
+
+### Skip Image Pull
+
+If you already have the image locally or want to pull it manually later:
+
+```bash
+./setup.sh --skip-pull
+```
+
+### Combined Options
+
+```bash
+./setup.sh --skip-pull --skills-path /custom/skills --image custom/sandbox:latest
+```
+
+## Manual Deployment
+
+If you prefer manual deployment or need more control:
+
+### 1. Create Namespace
+
+```bash
+kubectl apply -f namespace.yaml
+```
+
+### 2. Create Service
+
+```bash
+kubectl apply -f sandbox-service.yaml
+```
+
+### 3. Deploy Sandbox
+
+First, update the skills path in `sandbox-deployment.yaml`:
+
+```bash
+# Replace /absolute/path/to/deer-flow/skills with the absolute path to your skills directory
+sed 's|__SKILLS_PATH__|/absolute/path/to/deer-flow/skills|g' \
+  sandbox-deployment.yaml | kubectl apply -f -
+```
+
+Or manually edit `sandbox-deployment.yaml` and replace `__SKILLS_PATH__` with your skills directory path.
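+
+After substitution, the volume section of the manifest should look roughly like the fragment below. The path shown is only an illustrative value; use the absolute path to your own skills directory.
+
+```yaml
+volumes:
+  - name: skills
+    hostPath:
+      path: /absolute/path/to/deer-flow/skills  # example value, replaces __SKILLS_PATH__
+      type: Directory  # the directory must already exist on the node
+```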
+
+### 4. Verify Deployment
+
+```bash
+# Check all resources
+kubectl get all -n deer-flow
+
+# Check pod status
+kubectl get pods -n deer-flow
+
+# Check pod logs
+kubectl logs -n deer-flow -l app=deer-flow-sandbox
+
+# Describe pod for detailed info
+kubectl describe pod -n deer-flow -l app=deer-flow-sandbox
+```
+
+## Configuration Options
+
+### Resource Limits
+
+Edit `sandbox-deployment.yaml` to adjust resource limits:
+
+```yaml
+resources:
+  requests:
+    cpu: 100m      # Minimum CPU
+    memory: 256Mi  # Minimum memory
+  limits:
+    cpu: 1000m     # Maximum CPU (1 core)
+    memory: 1Gi    # Maximum memory
+```
+
+### Scaling
+
+Adjust the number of replicas:
+
+```yaml
+spec:
+  replicas: 3  # Run 3 sandbox pods
+```
+
+Or scale dynamically:
+
+```bash
+kubectl scale deployment deer-flow-sandbox -n deer-flow --replicas=3
+```
+
+### Health Checks
+
+The deployment includes readiness and liveness probes:
+
+- **Readiness Probe**: Checks if the pod is ready to serve traffic
+- **Liveness Probe**: Restarts the container if it becomes unhealthy
+
+Configure in `sandbox-deployment.yaml`:
+
+```yaml
+readinessProbe:
+  httpGet:
+    path: /v1/sandbox
+    port: 8080
+  initialDelaySeconds: 5
+  periodSeconds: 5
+  timeoutSeconds: 3
+  failureThreshold: 3
+```
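+
+The liveness probe is configured the same way. The values below mirror the defaults in `sandbox-deployment.yaml`, which probe the same endpoint at a slower cadence:
+
+```yaml
+livenessProbe:
+  httpGet:
+    path: /v1/sandbox
+    port: 8080
+  initialDelaySeconds: 10  # first check runs later than the readiness probe's
+  periodSeconds: 10
+  timeoutSeconds: 3
+  failureThreshold: 3      # restart the container after 3 consecutive failures
+```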
+
+## Troubleshooting
+
+### Pod Not Starting
+
+Check pod status and events:
+
+```bash
+kubectl describe pod -n deer-flow -l app=deer-flow-sandbox
+```
+
+Common issues:
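+- **ImagePullBackOff**: Docker image cannot be pulled
+  - Solution: Pre-pull the image with `docker pull <image>` and check that the registry is reachable from the cluster. Because the manifest uses the `latest` tag without an explicit `imagePullPolicy`, Kubernetes defaults to `Always` and contacts the registry on every pod start.
+- **Skills path not found**: The hostPath directory doesn't exist
+  - Solution: Verify the skills path exists on the node that runs the pod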
+- **Resource constraints**: Not enough CPU/memory
+  - Solution: Adjust resource requests/limits
+
+### Service Not Accessible
+
+Verify the service is running:
+
+```bash
+kubectl get service -n deer-flow
+kubectl describe service deer-flow-sandbox -n deer-flow
+```
+
+Test connectivity from another pod:
+
+```bash
+kubectl run test-pod -n deer-flow --rm -it --image=curlimages/curl -- \
+  curl http://deer-flow-sandbox.deer-flow.svc.cluster.local:8080/v1/sandbox
+```
+
+### Check Logs
+
+View sandbox logs:
+
+```bash
+# Follow logs in real-time
+kubectl logs -n deer-flow -l app=deer-flow-sandbox -f
+
+# View logs from previous container (if crashed)
+kubectl logs -n deer-flow -l app=deer-flow-sandbox --previous
+```
+
+### Health Check Failures
+
+If pods show as not ready:
+
+```bash
+# Check readiness probe
+kubectl get events -n deer-flow --sort-by='.lastTimestamp'
+
+# Exec into pod to debug
+kubectl exec -it -n deer-flow <pod-name> -- /bin/sh
+```
+
+## Cleanup
+
+### Remove All Resources
+
+Using the setup script:
+
+```bash
+./setup.sh --cleanup
+```
+
+Or manually:
+
+```bash
+kubectl delete -f sandbox-deployment.yaml
+kubectl delete -f sandbox-service.yaml
+kubectl delete namespace deer-flow
+```
+
+### Remove Specific Resources
+
+```bash
+# Delete only the deployment (keeps namespace and service)
+kubectl delete deployment deer-flow-sandbox -n deer-flow
+
+# Delete pods (they will be recreated by deployment)
+kubectl delete pods -n deer-flow -l app=deer-flow-sandbox
+```
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────┐
+│         DeerFlow Backend                    │
+│  (config.yaml: base_url configured)         │
+└────────────────┬────────────────────────────┘
+                 │ HTTP requests
+                 ↓
+┌─────────────────────────────────────────────┐
+│    Kubernetes Service (headless ClusterIP)  │
+│  deer-flow-sandbox.deer-flow.svc:8080       │
+└────────────────┬────────────────────────────┘
+                 │ DNS resolves to pod IPs
+                 ↓
+┌─────────────────────────────────────────────┐
+│         Sandbox Pods (replicas)             │
+│  ┌──────────┐  ┌──────────┐  ┌──────────┐  │
+│  │  Pod 1   │  │  Pod 2   │  │  Pod 3   │  │
+│  │ Port 8080│  │ Port 8080│  │ Port 8080│  │
+│  └──────────┘  └──────────┘  └──────────┘  │
+└────────────────┬────────────────────────────┘
+                 │ Volume mount
+                 ↓
+┌─────────────────────────────────────────────┐
+│         Host Skills Directory               │
+│    /path/to/deer-flow/skills                │
+└─────────────────────────────────────────────┘
+```
+
+## Setup Script Reference
+
+### Command-Line Options
+
+```bash
+./setup.sh [options]
+
+Options:
+  -h, --help              Show help message
+  -c, --cleanup           Remove all Kubernetes resources
+  -p, --skip-pull         Skip pulling sandbox image
+  --image <image>         Use custom sandbox image
+  --skills-path <path>    Custom skills directory path
+
+Environment Variables:
+  SANDBOX_IMAGE      Custom sandbox image
+  SKILLS_PATH        Custom skills path
+
+Examples:
+  ./setup.sh                                    # Use default settings
+  ./setup.sh --skills-path /custom/path         # Use custom skills path
+  ./setup.sh --skip-pull --image custom:tag     # Custom image, skip pull
+  SKILLS_PATH=/custom/path ./setup.sh           # Use env variable
+```
+
+## Production Considerations
+
+### Security
+
+1. **Network Policies**: Restrict pod-to-pod communication
+2. **RBAC**: Configure appropriate service account permissions
+3. **Pod Security**: Enable pod security standards
+4. **Image Security**: Scan images for vulnerabilities
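+
+As a starting point for the network policy item, the sketch below admits traffic to the sandbox pods only on port 8080 and only from backend pods in the same namespace. The policy name and the `app: deer-flow-backend` label are assumptions for illustration (this repository does not define them), and the policy takes effect only if the cluster's network plugin enforces NetworkPolicy.
+
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: deer-flow-sandbox-ingress  # hypothetical name
+  namespace: deer-flow
+spec:
+  podSelector:
+    matchLabels:
+      app: deer-flow-sandbox       # label used by sandbox-deployment.yaml
+  policyTypes:
+    - Ingress
+  ingress:
+    - from:
+        - podSelector:
+            matchLabels:
+              app: deer-flow-backend  # assumed label; adjust to your backend pods
+      ports:
+        - protocol: TCP
+          port: 8080
+```
+
+If the backend runs outside the cluster, a pod selector will not match it, and the `from` rule would need a different form (for example an `ipBlock`).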
+
+### High Availability
+
+1. **Multiple Replicas**: Run at least 3 replicas
+2. **Pod Disruption Budget**: Prevent all pods from being evicted
+3. **Pod Anti-Affinity**: Distribute pods across nodes (or use topology spread constraints)
+4. **Resource Quotas**: Set namespace resource limits
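+
+A Pod Disruption Budget for the sandbox could look like the sketch below. The resource name and the `minAvailable` value are illustrative assumptions; the selector matches the `app: deer-flow-sandbox` label from `sandbox-deployment.yaml`.
+
+```yaml
+apiVersion: policy/v1
+kind: PodDisruptionBudget
+metadata:
+  name: deer-flow-sandbox-pdb  # hypothetical name
+  namespace: deer-flow
+spec:
+  minAvailable: 1              # keep at least one sandbox pod during voluntary evictions
+  selector:
+    matchLabels:
+      app: deer-flow-sandbox
+```
+
+A budget like this only helps when more than one replica is running; with a single replica it blocks node drains instead.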
+
+### Monitoring
+
+1. **Prometheus**: Scrape metrics from pods
+2. **Logging**: Centralized log aggregation
+3. **Alerting**: Set up alerts for pod failures
+4. **Tracing**: Distributed tracing for requests
+
+### Storage
+
+For production, consider using PersistentVolume instead of hostPath:
+
+1. **Create PersistentVolume**: Define storage backend
+2. **Create PersistentVolumeClaim**: Request storage
+3. **Update Deployment**: Use PVC instead of hostPath
+
+See `skills-pv-pvc.yaml.bak` for a reference implementation.
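+
+The steps above can be sketched as follows. This is a minimal illustration, not the content of the referenced file: the claim name, size, and access mode are assumptions, and `ReadOnlyMany` works only with storage backends that support it.
+
+```yaml
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: deer-flow-skills  # hypothetical claim name
+  namespace: deer-flow
+spec:
+  accessModes:
+    - ReadOnlyMany        # lets replicas on different nodes mount the same volume
+  resources:
+    requests:
+      storage: 1Gi        # assumed size
+```
+
+The Deployment would then reference the claim in place of the hostPath volume:
+
+```yaml
+volumes:
+  - name: skills
+    persistentVolumeClaim:
+      claimName: deer-flow-skills  # must match the claim above
+      readOnly: true
+```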
+
+## Next Steps
+
+After successful deployment:
+
+1. **Start Backend**: `make dev` or `make docker-start`
+2. **Test Sandbox**: Create a conversation and execute code
+3. **Monitor**: Watch pod logs and resource usage
+4. **Scale**: Adjust replicas based on workload
+
+## Support
+
+For issues and questions:
+
+- Check the troubleshooting section above
+- Review pod logs: `kubectl logs -n deer-flow -l app=deer-flow-sandbox`
+- See main project documentation: [../../README.md](../../README.md)
+- Report issues on GitHub
--- a/docker/k8s/namespace.yaml
+++ b/docker/k8s/namespace.yaml
@@ -0,0 +1,7 @@
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: deer-flow
+  labels:
+    app.kubernetes.io/name: deer-flow
+    app.kubernetes.io/component: sandbox
--- a/docker/k8s/sandbox-deployment.yaml
+++ b/docker/k8s/sandbox-deployment.yaml
@@ -0,0 +1,65 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: deer-flow-sandbox
+  namespace: deer-flow
+  labels:
+    app.kubernetes.io/name: deer-flow
+    app.kubernetes.io/component: sandbox
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: deer-flow-sandbox
+  template:
+    metadata:
+      labels:
+        app: deer-flow-sandbox
+        app.kubernetes.io/name: deer-flow
+        app.kubernetes.io/component: sandbox
+    spec:
+      containers:
+        - name: sandbox
+          image: enterprise-public-cn-beijing.cr.volces.com/vefaas-public/all-in-one-sandbox:latest
+          ports:
+            - name: http
+              containerPort: 8080
+              protocol: TCP
+          readinessProbe:
+            httpGet:
+              path: /v1/sandbox
+              port: 8080
+            initialDelaySeconds: 5
+            periodSeconds: 5
+            timeoutSeconds: 3
+            failureThreshold: 3
+          livenessProbe:
+            httpGet:
+              path: /v1/sandbox
+              port: 8080
+            initialDelaySeconds: 10
+            periodSeconds: 10
+            timeoutSeconds: 3
+            failureThreshold: 3
+          resources:
+            requests:
+              cpu: 100m
+              memory: 256Mi
+            limits:
+              cpu: 1000m
+              memory: 1Gi
+          volumeMounts:
+            - name: skills
+              mountPath: /mnt/skills
+              readOnly: true
+          securityContext:
+            privileged: false
+            allowPrivilegeEscalation: true
+      volumes:
+        - name: skills
+          hostPath:
+            # Path to skills directory on the host machine
+            # This will be replaced by setup.sh with the actual path
+            path: __SKILLS_PATH__
+            type: Directory
+      restartPolicy: Always
--- a/docker/k8s/sandbox-service.yaml
+++ b/docker/k8s/sandbox-service.yaml
@@ -0,0 +1,21 @@
+apiVersion: v1
+kind: Service
+metadata:
+  name: deer-flow-sandbox
+  namespace: deer-flow
+  labels:
+    app.kubernetes.io/name: deer-flow
+    app.kubernetes.io/component: sandbox
+spec:
+  type: ClusterIP
+  clusterIP: None # Headless service for direct Pod DNS access
+  ports:
+    - name: http
+      port: 8080
+      targetPort: 8080
+      protocol: TCP
+  selector:
+    app: deer-flow-sandbox
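+  # DNS-based service discovery: the service name resolves directly to the IPs of ready Pods.
+  # Per-Pod names ({hostname}.deer-flow-sandbox.deer-flow.svc.cluster.local:8080) exist only
+  # when the Pod spec sets hostname and subdomain, which this Deployment does not.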
+  publishNotReadyAddresses: false
--- a/docker/k8s/setup.sh
+++ b/docker/k8s/setup.sh
@@ -0,0 +1,245 @@
+#!/bin/bash
+
+# Kubernetes Sandbox Initialization Script for Deer-Flow
+# This script sets up the Kubernetes environment for the sandbox provider
+
+set -e
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_ROOT="$(cd "${SCRIPT_DIR}/../.." && pwd)"
+
+# Default sandbox image
+DEFAULT_SANDBOX_IMAGE="enterprise-public-cn-beijing.cr.volces.com/vefaas-public/all-in-one-sandbox:latest"
+
+# Colors for output
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+BLUE='\033[0;34m'
+NC='\033[0m' # No Color
+
+echo -e "${BLUE}╔════════════════════════════════════════════╗${NC}"
+echo -e "${BLUE}║   Deer-Flow Kubernetes Sandbox Setup       ║${NC}"
+echo -e "${BLUE}╚════════════════════════════════════════════╝${NC}"
+echo
+
+# Function to print status messages
+info() {
+    echo -e "${BLUE}[INFO]${NC} $1"
+}
+
+success() {
+    echo -e "${GREEN}[SUCCESS]${NC} $1"
+}
+
+warn() {
+    echo -e "${YELLOW}[WARN]${NC} $1"
+}
+
+error() {
+    echo -e "${RED}[ERROR]${NC} $1"
+}
+
+# Check if kubectl is installed
+check_kubectl() {
+    info "Checking kubectl installation..."
+    if ! command -v kubectl &> /dev/null; then
+        error "kubectl is not installed. Please install kubectl first."
+        echo "  - macOS: brew install kubectl"
+        echo "  - Linux: https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/"
+        exit 1
+    fi
+    success "kubectl is installed"
+}
+
+# Check if Kubernetes cluster is accessible
+check_cluster() {
+    info "Checking Kubernetes cluster connection..."
+    if ! kubectl cluster-info &> /dev/null; then
+        error "Cannot connect to Kubernetes cluster."
+        echo "Please ensure:"
+        echo "  - Docker Desktop: Settings → Kubernetes → Enable Kubernetes"
+        echo "  - Or OrbStack: Enable Kubernetes in settings"
+        echo "  - Or Minikube: minikube start"
+        exit 1
+    fi
+    success "Connected to Kubernetes cluster"
+}
+
+# Apply Kubernetes resources
+apply_resources() {
+    info "Applying Kubernetes resources..."
+    
+    # Determine skills path
+    SKILLS_PATH="${SKILLS_PATH:-${PROJECT_ROOT}/skills}"
+    info "Using skills path: ${SKILLS_PATH}"
+    
+    # Validate skills path exists
+    if [[ ! -d "${SKILLS_PATH}" ]]; then
+        warn "Skills path does not exist: ${SKILLS_PATH}"
+        warn "Creating directory..."
+        mkdir -p "${SKILLS_PATH}"
+    fi
+    
+    echo "  → Creating namespace..."
+    kubectl apply -f "${SCRIPT_DIR}/namespace.yaml"
+    
+    echo "  → Creating sandbox service..."
+    kubectl apply -f "${SCRIPT_DIR}/sandbox-service.yaml"
+    
+    echo "  → Creating sandbox deployment with skills path: ${SKILLS_PATH}"
+    # Replace the __SKILLS_PATH__ placeholder and, when SANDBOX_IMAGE is set,
+    # the default image. The same sed invocation works on macOS and Linux.
+    local image="${SANDBOX_IMAGE:-$DEFAULT_SANDBOX_IMAGE}"
+    sed -e "s|__SKILLS_PATH__|${SKILLS_PATH}|g" \
+        -e "s|${DEFAULT_SANDBOX_IMAGE}|${image}|g" \
+        "${SCRIPT_DIR}/sandbox-deployment.yaml" | kubectl apply -f -
+    
+    success "All Kubernetes resources applied"
+}
+
+# Verify deployment
+verify_deployment() {
+    info "Verifying deployment..."
+    
+    echo "  → Checking namespace..."
+    kubectl get namespace deer-flow
+    
+    echo "  → Checking service..."
+    kubectl get service -n deer-flow
+    
+    echo "  → Checking deployment..."
+    kubectl get deployment -n deer-flow
+    
+    echo "  → Checking pods..."
+    kubectl get pods -n deer-flow
+    
+    success "Deployment verified"
+}
+
+# Pull sandbox image
+pull_image() {
+    info "Checking sandbox image..."
+    
+    IMAGE="${SANDBOX_IMAGE:-$DEFAULT_SANDBOX_IMAGE}"
+    
+    # Check if image already exists locally
+    if docker image inspect "$IMAGE" &> /dev/null; then
+        success "Image already exists locally: $IMAGE"
+        return 0
+    fi
+    
+    info "Pulling sandbox image (this may take a few minutes on first run)..."
+    echo "  → Image: $IMAGE"
+    echo
+    
+    if docker pull "$IMAGE"; then
+        success "Image pulled successfully"
+    else
+        warn "Failed to pull image. Pod startup may be slow on first run."
+        echo "  You can manually pull the image later with:"
+        echo "    docker pull $IMAGE"
+    fi
+}
+
+# Print next steps
+print_next_steps() {
+    echo
+    echo -e "${BLUE}╔════════════════════════════════════════════╗${NC}"
+    echo -e "${BLUE}║   Setup Complete!                          ║${NC}"
+    echo -e "${BLUE}╚════════════════════════════════════════════╝${NC}"
+    echo
+    echo -e "${YELLOW}To enable Kubernetes sandbox, add the following to backend/config.yaml:${NC}"
+    echo
+    echo -e "${GREEN}sandbox:${NC}"
+    echo -e "${GREEN}  use: src.community.aio_sandbox:AioSandboxProvider${NC}"
+    echo -e "${GREEN}  base_url: http://deer-flow-sandbox.deer-flow.svc.cluster.local:8080${NC}"
+    echo
+    echo
+    echo -e "${GREEN}Next steps:${NC}"
+    echo "  make dev                # Start backend and frontend in development mode"
+    echo "  make docker-start       # Start backend and frontend in Docker containers"
+    echo
+}
+
+# Cleanup function
+cleanup() {
+    if [[ "$1" == "--cleanup" ]] || [[ "$1" == "-c" ]]; then
+        info "Cleaning up Kubernetes resources..."
+        kubectl delete -f "${SCRIPT_DIR}/sandbox-deployment.yaml" --ignore-not-found=true
+        kubectl delete -f "${SCRIPT_DIR}/sandbox-service.yaml" --ignore-not-found=true
+        kubectl delete -f "${SCRIPT_DIR}/namespace.yaml" --ignore-not-found=true
+        success "Cleanup complete"
+        exit 0
+    fi
+}
+
+# Show help
+show_help() {
+    echo "Usage: $0 [options]"
+    echo
+    echo "Options:"
+    echo "  -h, --help              Show this help message"
+    echo "  -c, --cleanup           Remove all Kubernetes resources"
+    echo "  -p, --skip-pull         Skip pulling sandbox image"
+    echo "  --image <image>         Use custom sandbox image"
+    echo "  --skills-path <path>    Custom skills directory path"
+    echo
+    echo "Environment variables:"
+    echo "  SANDBOX_IMAGE      Custom sandbox image (default: $DEFAULT_SANDBOX_IMAGE)"
+    echo "  SKILLS_PATH        Custom skills path (default: PROJECT_ROOT/skills)"
+    echo
+    echo "Examples:"
+    echo "  $0                                    # Use default settings"
+    echo "  $0 --skills-path /custom/path         # Use custom skills path"
+    echo "  SKILLS_PATH=/custom/path $0           # Use env variable"
+    echo
+    exit 0
+}
+
+# Parse arguments
+SKIP_PULL=false
+while [[ $# -gt 0 ]]; do
+    case $1 in
+        -h|--help)
+            show_help
+            ;;
+        -c|--cleanup)
+            cleanup "$1"
+            ;;
+        -p|--skip-pull)
+            SKIP_PULL=true
+            shift
+            ;;
+        --image)
+            SANDBOX_IMAGE="$2"
+            shift 2
+            ;;
+        --skills-path)
+            SKILLS_PATH="$2"
+            shift 2
+            ;;
+        *)
+            warn "Ignoring unknown option: $1"
+            shift
+            ;;
+    esac
+done
+
+# Main execution
+main() {
+    check_kubectl
+    check_cluster
+    
+    # Pull image first to avoid Pod startup timeout
+    if [[ "$SKIP_PULL" == false ]]; then
+        pull_image
+    fi
+    
+    apply_resources
+    verify_deployment
+    print_next_steps
+}
+
+main