mirror of
https://gitee.com/wanwujie/deer-flow
synced 2026-04-03 06:12:14 +08:00
Add GuardrailMiddleware that evaluates every tool call before execution. Three provider options: built-in AllowlistProvider (zero deps), OAP passport providers (open standard), or custom providers loaded by class path. - GuardrailProvider protocol with GuardrailRequest/Decision dataclasses - GuardrailMiddleware (AgentMiddleware, position 5 in chain) - AllowlistProvider for simple deny/allow by tool name - GuardrailsConfig (Pydantic singleton, loaded from config.yaml) - 25 tests covering allow/deny, fail-closed/open, async, GraphBubbleUp - Comprehensive docs at backend/docs/GUARDRAILS.md Closes #1213 Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
386 lines
18 KiB
Markdown
386 lines
18 KiB
Markdown
# Guardrails: Pre-Tool-Call Authorization
|
|
|
|
> **Context:** [Issue #1213](https://github.com/bytedance/deer-flow/issues/1213) — DeerFlow has Docker sandboxing and human approval via `ask_clarification`, but no deterministic, policy-driven authorization layer for tool calls. An agent running autonomous multi-step tasks can execute any loaded tool with any arguments. Guardrails add a middleware that evaluates every tool call against a policy **before** execution.
|
|
|
|
## Why Guardrails
|
|
|
|
```
|
|
Without guardrails: With guardrails:
|
|
|
|
Agent Agent
|
|
│ │
|
|
▼ ▼
|
|
┌──────────┐ ┌──────────┐
|
|
│ bash │──▶ executes immediately │ bash │──▶ GuardrailMiddleware
|
|
│ rm -rf / │ │ rm -rf / │ │
|
|
└──────────┘ └──────────┘ ▼
|
|
┌──────────────┐
|
|
│ Provider │
|
|
│ evaluates │
|
|
│ against │
|
|
│ policy │
|
|
└──────┬───────┘
|
|
│
|
|
┌─────┴─────┐
|
|
│ │
|
|
ALLOW DENY
|
|
│ │
|
|
▼ ▼
|
|
Tool runs Agent sees:
|
|
normally "Guardrail denied:
|
|
rm -rf blocked"
|
|
```
|
|
|
|
- **Sandboxing** provides process isolation but not semantic authorization. A sandboxed `bash` can still `curl` data out.
|
|
- **Human approval** (`ask_clarification`) requires a human in the loop for every action. Not viable for autonomous workflows.
|
|
- **Guardrails** provide deterministic, policy-driven authorization that works without human intervention.
|
|
|
|
## Architecture
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────────────┐
|
|
│ Middleware Chain │
|
|
│ │
|
|
│ 1. ThreadDataMiddleware ─── per-thread dirs │
|
|
│ 2. UploadsMiddleware ─── file upload tracking │
|
|
│ 3. SandboxMiddleware ─── sandbox acquisition │
|
|
│ 4. DanglingToolCallMiddleware ── fix incomplete tool calls │
|
|
│ 5. GuardrailMiddleware ◄──── EVALUATES EVERY TOOL CALL │
|
|
│ 6. ToolErrorHandlingMiddleware ── convert exceptions to messages │
|
|
│ 7-12. (Summarization, Title, Memory, Vision, Subagent, Clarify) │
|
|
│ │
|
|
└─────────────────────────────────────────────────────────────────────┘
|
|
│
|
|
▼
|
|
┌──────────────────────────┐
|
|
│ GuardrailProvider │ ◄── pluggable: any class
|
|
│ (configured in YAML) │ with evaluate/aevaluate
|
|
└────────────┬─────────────┘
|
|
│
|
|
┌─────────┼──────────────┐
|
|
│ │ │
|
|
▼ ▼ ▼
|
|
Built-in OAP Passport Custom
|
|
Allowlist Provider Provider
|
|
(zero dep) (open standard) (your code)
|
|
│
|
|
Any implementation
|
|
(e.g. APort, or
|
|
your own evaluator)
|
|
```
|
|
|
|
The `GuardrailMiddleware` implements `wrap_tool_call` / `awrap_tool_call` (the same `AgentMiddleware` pattern used by `ToolErrorHandlingMiddleware`). It:
|
|
|
|
1. Builds a `GuardrailRequest` with tool name, arguments, and passport reference
|
|
2. Calls `provider.evaluate(request)` on whatever provider is configured
|
|
3. If **deny**: returns `ToolMessage(status="error")` with the reason -- agent sees the denial and adapts
|
|
4. If **allow**: passes through to the actual tool handler
|
|
5. If **provider error** and `fail_closed=true` (default): blocks the call
|
|
6. `GraphBubbleUp` exceptions (LangGraph control signals) are always propagated, never caught
|
|
|
|
## Three Provider Options
|
|
|
|
### Option 1: Built-in AllowlistProvider (Zero Dependencies)
|
|
|
|
The simplest option. Ships with DeerFlow. Block or allow tools by name. No external packages, no passport, no network.
|
|
|
|
**config.yaml:**
|
|
```yaml
|
|
guardrails:
|
|
enabled: true
|
|
provider:
|
|
use: deerflow.guardrails.builtin:AllowlistProvider
|
|
config:
|
|
denied_tools: ["bash", "write_file"]
|
|
```
|
|
|
|
This blocks `bash` and `write_file` for all requests. All other tools pass through.
|
|
|
|
You can also use an allowlist (only these tools are permitted):
|
|
```yaml
|
|
guardrails:
|
|
enabled: true
|
|
provider:
|
|
use: deerflow.guardrails.builtin:AllowlistProvider
|
|
config:
|
|
allowed_tools: ["web_search", "read_file", "ls"]
|
|
```
|
|
|
|
**Try it:**
|
|
1. Add the config above to your `config.yaml`
|
|
2. Start DeerFlow: `make dev`
|
|
3. Ask the agent: "Use bash to run echo hello"
|
|
4. The agent sees: `Guardrail denied: tool 'bash' was blocked (oap.tool_not_allowed)`
|
|
|
|
### Option 2: OAP Passport Provider (Policy-Based)
|
|
|
|
For policy enforcement based on the [Open Agent Passport (OAP)](https://github.com/aporthq/aport-spec) open standard. An OAP passport is a JSON document that declares an agent's identity, capabilities, and operational limits. Any provider that reads an OAP passport and returns OAP-compliant decisions works with DeerFlow.
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────┐
|
|
│ OAP Passport (JSON) │
|
|
│ (open standard, any provider) │
|
|
│ { │
|
|
│ "spec_version": "oap/1.0", │
|
|
│ "status": "active", │
|
|
│ "capabilities": [ │
|
|
│ {"id": "system.command.execute"}, │
|
|
│ {"id": "data.file.read"}, │
|
|
│ {"id": "data.file.write"}, │
|
|
│ {"id": "web.fetch"}, │
|
|
│ {"id": "mcp.tool.execute"} │
|
|
│ ], │
|
|
│ "limits": { │
|
|
│ "system.command.execute": { │
|
|
│ "allowed_commands": ["git", "npm", "node", "ls"], │
|
|
│ "blocked_patterns": ["rm -rf", "sudo", "chmod 777"] │
|
|
│ } │
|
|
│ } │
|
|
│ } │
|
|
└──────────────────────────┬──────────────────────────────────┘
|
|
│
|
|
Any OAP-compliant provider
|
|
┌────────────────┼────────────────┐
|
|
│ │ │
|
|
Your own APort (ref. Other future
|
|
evaluator implementation) implementations
|
|
```
|
|
|
|
**Creating a passport manually:**
|
|
|
|
An OAP passport is just a JSON file. You can create one by hand following the [OAP specification](https://github.com/aporthq/aport-spec/blob/main/oap/oap-spec.md) and validate it against the [JSON schema](https://github.com/aporthq/aport-spec/blob/main/oap/passport-schema.json). See the [examples](https://github.com/aporthq/aport-spec/tree/main/oap/examples) directory for templates.
|
|
|
|
**Using APort as a reference implementation:**
|
|
|
|
[APort Agent Guardrails](https://github.com/aporthq/aport-agent-guardrails) is one open-source (Apache 2.0) implementation of an OAP provider. It handles passport creation, local evaluation, and optional hosted API evaluation.
|
|
|
|
```bash
|
|
pip install aport-agent-guardrails
|
|
aport setup --framework deerflow
|
|
```
|
|
|
|
This creates:
|
|
- `~/.aport/deerflow/config.yaml` -- evaluator config (local or API mode)
|
|
- `~/.aport/deerflow/aport/passport.json` -- OAP passport with capabilities and limits
|
|
|
|
**config.yaml (using APort as the provider):**
|
|
```yaml
|
|
guardrails:
|
|
enabled: true
|
|
provider:
|
|
use: aport_guardrails.providers.generic:OAPGuardrailProvider
|
|
```
|
|
|
|
**config.yaml (using your own OAP provider):**
|
|
```yaml
|
|
guardrails:
|
|
enabled: true
|
|
provider:
|
|
use: my_oap_provider:MyOAPProvider
|
|
config:
|
|
passport_path: ./my-passport.json
|
|
```
|
|
|
|
Any provider that accepts `framework` as a kwarg and implements `evaluate`/`aevaluate` works. The OAP standard defines the passport format and decision codes; DeerFlow doesn't care which provider reads them.
|
|
|
|
**What the passport controls:**
|
|
|
|
| Passport field | What it does | Example |
|
|
|---|---|---|
|
|
| `capabilities[].id` | Which tool categories the agent can use | `system.command.execute`, `data.file.write` |
|
|
| `limits.*.allowed_commands` | Which commands are allowed | `["git", "npm", "node"]` or `["*"]` for all |
|
|
| `limits.*.blocked_patterns` | Patterns always denied | `["rm -rf", "sudo", "chmod 777"]` |
|
|
| `status` | Kill switch | `active`, `suspended`, `revoked` |
|
|
|
|
**Evaluation modes (provider-dependent):**
|
|
|
|
OAP providers may support different evaluation modes. For example, the APort reference implementation supports:
|
|
|
|
| Mode | How it works | Network | Latency |
|
|
|---|---|---|---|
|
|
| **Local** | Evaluates passport locally (bash script). | None | ~300ms |
|
|
| **API** | Sends passport + context to a hosted evaluator. Signed decisions. | Yes | ~65ms |
|
|
|
|
A custom OAP provider can implement any evaluation strategy -- the DeerFlow middleware doesn't care how the provider reaches its decision.
|
|
|
|
**Try it:**
|
|
1. Install and set up as above
|
|
2. Start DeerFlow and ask: "Create a file called test.txt with content hello"
|
|
3. Then ask: "Now delete it using bash rm -rf"
|
|
4. Guardrail blocks it: `oap.blocked_pattern: Command contains blocked pattern: rm -rf`
|
|
|
|
### Option 3: Custom Provider (Bring Your Own)
|
|
|
|
Any Python class with `evaluate(request)` and `aevaluate(request)` methods works. No base class or inheritance needed -- it's a structural protocol.
|
|
|
|
```python
|
|
# my_guardrail.py
|
|
|
|
class MyGuardrailProvider:
|
|
name = "my-company"
|
|
|
|
def evaluate(self, request):
|
|
from deerflow.guardrails.provider import GuardrailDecision, GuardrailReason
|
|
|
|
# Example: block any bash command containing "delete"
|
|
if request.tool_name == "bash" and "delete" in str(request.tool_input):
|
|
return GuardrailDecision(
|
|
allow=False,
|
|
reasons=[GuardrailReason(code="custom.blocked", message="delete not allowed")],
|
|
policy_id="custom.v1",
|
|
)
|
|
return GuardrailDecision(allow=True, reasons=[GuardrailReason(code="oap.allowed")])
|
|
|
|
async def aevaluate(self, request):
|
|
return self.evaluate(request)
|
|
```
|
|
|
|
**config.yaml:**
|
|
```yaml
|
|
guardrails:
|
|
enabled: true
|
|
provider:
|
|
use: my_guardrail:MyGuardrailProvider
|
|
```
|
|
|
|
Make sure `my_guardrail.py` is on the Python path (e.g. in the backend directory or installed as a package).
|
|
|
|
**Try it:**
|
|
1. Create `my_guardrail.py` in the backend directory
|
|
2. Add the config
|
|
3. Start DeerFlow and ask: "Use bash to delete test.txt"
|
|
4. Your provider blocks it
|
|
|
|
## Implementing a Provider
|
|
|
|
### Required Interface
|
|
|
|
```
|
|
┌──────────────────────────────────────────────────┐
|
|
│ GuardrailProvider Protocol │
|
|
│ │
|
|
│ name: str │
|
|
│ │
|
|
│ evaluate(request: GuardrailRequest) │
|
|
│ -> GuardrailDecision │
|
|
│ │
|
|
│ aevaluate(request: GuardrailRequest) (async) │
|
|
│ -> GuardrailDecision │
|
|
└──────────────────────────────────────────────────┘
|
|
|
|
┌──────────────────────────┐ ┌──────────────────────────┐
|
|
│ GuardrailRequest │ │ GuardrailDecision │
|
|
│ │ │ │
|
|
│ tool_name: str │ │ allow: bool │
|
|
│ tool_input: dict │ │ reasons: [GuardrailReason]│
|
|
│ agent_id: str | None │ │ policy_id: str | None │
|
|
│ thread_id: str | None │ │ metadata: dict │
|
|
│ is_subagent: bool │ │ │
|
|
│ timestamp: str │ │ GuardrailReason: │
|
|
│ │ │ code: str │
|
|
└──────────────────────────┘ │ message: str │
|
|
└──────────────────────────┘
|
|
```
|
|
|
|
### DeerFlow Tool Names
|
|
|
|
These are the tool names your provider will see in `request.tool_name`:
|
|
|
|
| Tool | What it does |
|
|
|---|---|
|
|
| `bash` | Shell command execution |
|
|
| `write_file` | Create/overwrite a file |
|
|
| `str_replace` | Edit a file (find and replace) |
|
|
| `read_file` | Read file content |
|
|
| `ls` | List directory |
|
|
| `web_search` | Web search query |
|
|
| `web_fetch` | Fetch URL content |
|
|
| `image_search` | Image search |
|
|
| `present_file` | Present file to user |
|
|
| `view_image` | Display image |
|
|
| `ask_clarification` | Ask user a question |
|
|
| `task` | Delegate to subagent |
|
|
| `mcp__*` | MCP tools (dynamic) |
|
|
|
|
### OAP Reason Codes
|
|
|
|
Standard codes used by the [OAP specification](https://github.com/aporthq/aport-spec):
|
|
|
|
| Code | Meaning |
|
|
|---|---|
|
|
| `oap.allowed` | Tool call authorized |
|
|
| `oap.tool_not_allowed` | Tool not in allowlist |
|
|
| `oap.command_not_allowed` | Command not in allowed_commands |
|
|
| `oap.blocked_pattern` | Command matches a blocked pattern |
|
|
| `oap.limit_exceeded` | Operation exceeds a limit |
|
|
| `oap.passport_suspended` | Passport status is suspended/revoked |
|
|
| `oap.evaluator_error` | Provider crashed (fail-closed) |
|
|
|
|
### Provider Loading
|
|
|
|
DeerFlow loads providers via `resolve_variable()` -- the same mechanism used for models, tools, and sandbox providers. The `use:` field is a Python class path: `package.module:ClassName`.
|
|
|
|
The provider is instantiated with `**config` kwargs if `config:` is set, plus `framework="deerflow"` is always injected. Accept `**kwargs` to stay forward-compatible:
|
|
|
|
```python
|
|
class YourProvider:
|
|
def __init__(self, framework: str = "generic", **kwargs):
|
|
# framework="deerflow" tells you which config dir to use
|
|
...
|
|
```
|
|
|
|
## Configuration Reference
|
|
|
|
```yaml
|
|
guardrails:
|
|
# Enable/disable guardrail middleware (default: false)
|
|
enabled: true
|
|
|
|
# Block tool calls if provider raises an exception (default: true)
|
|
fail_closed: true
|
|
|
|
# Passport reference -- passed as request.agent_id to the provider.
|
|
# File path, hosted agent ID, or null (provider resolves from its config).
|
|
passport: null
|
|
|
|
# Provider: loaded by class path via resolve_variable
|
|
provider:
|
|
use: deerflow.guardrails.builtin:AllowlistProvider
|
|
config: # optional kwargs passed to provider.__init__
|
|
denied_tools: ["bash"]
|
|
```
|
|
|
|
## Testing
|
|
|
|
```bash
|
|
cd backend
|
|
uv run python -m pytest tests/test_guardrail_middleware.py -v
|
|
```
|
|
|
|
25 tests covering:
|
|
- AllowlistProvider: allow, deny, both allowlist+denylist, async
|
|
- GuardrailMiddleware: allow passthrough, deny with OAP codes, fail-closed, fail-open, passport forwarding, empty reasons fallback, empty tool name, protocol isinstance check
|
|
- Async paths: awrap_tool_call for allow, deny, fail-closed, fail-open
|
|
- GraphBubbleUp: LangGraph control signals propagate through (not caught)
|
|
- Config: defaults, from_dict, singleton load/reset
|
|
|
|
## Files
|
|
|
|
```
|
|
packages/harness/deerflow/guardrails/
|
|
__init__.py # Public exports
|
|
provider.py # GuardrailProvider protocol, GuardrailRequest, GuardrailDecision
|
|
middleware.py # GuardrailMiddleware (AgentMiddleware subclass)
|
|
builtin.py # AllowlistProvider (zero deps)
|
|
|
|
packages/harness/deerflow/config/
|
|
guardrails_config.py # GuardrailsConfig Pydantic model + singleton
|
|
|
|
packages/harness/deerflow/agents/middlewares/
|
|
tool_error_handling_middleware.py # Registers GuardrailMiddleware in chain
|
|
|
|
config.example.yaml # Three provider options documented
|
|
tests/test_guardrail_middleware.py # 25 tests
|
|
docs/GUARDRAILS.md # This file
|
|
```
|