OpenClaw 2026.3.1 Security Evaluation: Grade B

SpiderShield Team · 6 min read
Tags: OpenClaw, Evaluation, Security, SpiderScore

OpenClaw is the most popular open-source AI agent framework, with 297K+ GitHub stars and hundreds of thousands of active installs. We evaluated version 2026.3.1 (released March 2) through SpiderRating's full security evaluation -- scanning 3,566 source files, analyzing 202 tool definitions, and checking the entire skill ecosystem. Note: the latest release at time of writing is v2026.3.8; findings here apply to the 2026.3.1 codebase.

SpiderScore: 7.3/10 (Grade B)

Layer                 Score        Weight
Description Quality   3.0/10       35%
Security Analysis     10.0/10      35%
Metadata Health       9.2/10       30%
Overall               7.3/10 (B)

The good news: OpenClaw's codebase is clean from a security standpoint. No command injection, no credential theft, no supply chain risks in the core platform. The bad news: its tool descriptions are holding it back.

Security: 10.0/10 -- no issues found

We scanned the entire codebase against 46 security rules covering command injection, path traversal, SQL injection, SSRF, unsafe deserialization, and credential exposure. Zero real findings.

One pattern worth noting: the browser automation tool (pw-tools-core.interactions.ts) uses new Function() + eval() to execute user-provided JavaScript in browser contexts. This is by design -- browser automation requires it, and the code has explicit eslint-disable annotations confirming the intent. It's not a vulnerability, but users should understand that the browser tool has full script execution capability.
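To make the pattern concrete, here is a minimal sketch of user-supplied JavaScript executed via new Function(). The function name (runUserScript) and shape are illustrative, not OpenClaw's actual API:

```typescript
// Minimal sketch of the pattern: user-provided JavaScript executed via
// new Function(). The name runUserScript is illustrative, not OpenClaw's API.
// eslint-disable-next-line no-new-func
function runUserScript(source: string, context: Record<string, unknown>): unknown {
  // Build a function whose parameters are the context keys, then call it
  // with the matching values -- full script execution, by design.
  const fn = new Function(...Object.keys(context), `return (${source});`);
  return fn(...Object.values(context));
}

// The script has full access to whatever context it is handed:
const title = runUserScript("doc.title.toUpperCase()", { doc: { title: "openclaw" } });
// title === "OPENCLAW"
```

This is why "not a vulnerability" still deserves a caveat: anything reachable from the context object is reachable from the user's script.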

We also manually reviewed all exec() calls across the codebase. Every instance uses typed wrappers with explicit argument arrays (e.g., exec(tailscaleBin, ["status", "--json"])), not shell string interpolation. The infra and security modules follow secure subprocess patterns consistently.
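A sketch of what that arg-array pattern looks like in practice (the wrapper name runTool is ours for illustration; OpenClaw's typed wrappers differ):

```typescript
import { execFileSync } from "node:child_process";

// Sketch of the arg-array subprocess pattern described above.
// The wrapper name (runTool) is illustrative, not OpenClaw's.
function runTool(bin: string, args: string[]): string {
  // execFileSync passes args as a real argv array -- no shell is involved,
  // so metacharacters in arguments are never interpreted as shell syntax.
  return execFileSync(bin, args, { encoding: "utf8" });
}

// Safe: "; rm -rf /" is just a literal argument, never a shell command.
const out = runTool("node", ["-e", "console.log(process.argv[1])", "; rm -rf /"]);
// out.trim() === "; rm -rf /"
```

Contrast with exec(`${bin} ${userInput}`) through a shell, where the same string would be executed as two commands.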

Description Quality: 3.0/10 -- the real weakness

This is where OpenClaw falls short. With 202 tool definitions, we evaluated each against 5 dimensions:

  • Intent Clarity -- Does the description explain what the tool does?
  • Permission Scope -- Does it disclose what resources it accesses?
  • Side Effects -- Does it mention what it modifies?
  • Capability Disclosure -- Does it explain the full range of capabilities?
  • Operational Boundaries -- Does it say when NOT to use the tool?

Most tool descriptions cover basic functionality but skip critical context that LLM agents need. For example, a tool that writes files should say "Creates or overwrites the file at the specified path" -- not just "Write a file." An agent relying on vague descriptions may misuse tools, skip safer alternatives, or fail to anticipate side effects.
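As a hypothetical illustration (neither definition is from OpenClaw), here is what the same tool looks like before and after covering the five dimensions:

```typescript
// Hypothetical tool definitions -- illustrative shapes, not OpenClaw's schema.
const vague = {
  name: "write_file",
  description: "Write a file.",
};

const complete = {
  name: "write_file",
  description:
    "Creates or overwrites the file at the specified path. " +              // intent + side effects
    "Requires write access to the workspace directory. " +                  // permission scope
    "Can create missing parent directories; cannot write outside the workspace. " + // capabilities
    "Do not use for appending to an existing file.",                        // operational boundaries
};
```

The second description costs a few lines but gives an agent enough signal to choose the tool correctly and anticipate the overwrite.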

This matters because AI agents make tool selection decisions entirely based on descriptions. A 3.0/10 score means agents using OpenClaw tools are operating with incomplete information about what those tools can do.

Metadata: 9.2/10 -- strong project health

OpenClaw scores near-perfect on project health signals:

  • 297K+ GitHub stars, 56K+ forks
  • Active development (last push March 8, two days before our scan)
  • 12K+ open issues (high volume, but actively triaged)
  • Clear provenance and MIT licensing
  • Strong contributor community

What we learned from the scan

Our scanner had false positives too

Transparency note: our initial automated scan flagged 18 issues (4 critical, 10 high, 4 medium) and rated OpenClaw an F. After manual review, every single finding was a false positive.

The biggest culprit: our dangerous_eval rule matched JavaScript's RegExp.prototype.exec() as if it were Python's exec(). A date-parsing regex like /^\d{4}-\d{2}-\d{2}$/.exec(input) is completely safe -- but our scanner couldn't tell the difference.
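The distinction fits in a few lines: RegExp.prototype.exec() only pattern-matches a string and never evaluates code, unlike Python's exec().

```typescript
// JavaScript's RegExp.exec() is pure pattern matching -- no code execution.
const datePattern = /^\d{4}-\d{2}-\d{2}$/;

const match = datePattern.exec("2026-03-02");
// match?.[0] === "2026-03-02"

const noMatch = datePattern.exec("rm -rf /");
// noMatch === null -- a "dangerous-looking" input is just an unmatched string
```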

We've since fixed three scanner rules:

  1. Python-only rules (dangerous_eval, sql_injection) no longer run on TypeScript files
  2. Path traversal detection now requires HTTP request context (req.params), not just any function parameter
  3. The fetch() SSRF rule was removed for JS/TS -- it's a standard API, not a vulnerability indicator
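The first fix amounts to gating rules by file extension. A sketch of that idea (structure assumed, not SpiderRating's actual implementation; rule names are from the fixes above):

```typescript
// Sketch of per-language rule gating -- assumed structure, not
// SpiderRating's real rule engine. Rule IDs come from the fixes above.
interface Rule {
  id: string;
  extensions: string[]; // file extensions the rule is allowed to run on
}

const rules: Rule[] = [
  { id: "dangerous_eval", extensions: [".py"] },                 // Python-only now
  { id: "sql_injection", extensions: [".py"] },                  // Python-only now
  { id: "command_injection", extensions: [".py", ".ts", ".js"] },
];

function applicableRules(file: string): string[] {
  const ext = file.slice(file.lastIndexOf("."));
  return rules.filter((r) => r.extensions.includes(ext)).map((r) => r.id);
}

// With the gate in place, a TypeScript file can no longer trip
// dangerous_eval on RegExp.prototype.exec():
// applicableRules("pw-tools-core.interactions.ts") -> ["command_injection"]
```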

This is why we manually review every evaluation before publishing. Automated scanning is a starting point, not a final answer.

Monorepo scope limiting matters

OpenClaw has 3,566 source files. Our scope limiter reduced this to 325 files in MCP-related directories. Manual review confirmed no real issues were hiding in the excluded code -- but the limiter was too aggressive, excluding core directories like src/browser/ and src/infra/ that aren't named with MCP keywords but are part of the agent platform.
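Keyword-based scoping explains the gap. A sketch of the assumed logic (our illustration, not the real limiter) shows exactly how src/browser/ and src/infra/ fall through:

```typescript
// Sketch of keyword-based scope limiting -- assumed logic, not the actual
// limiter. Paths are included only if they mention an MCP-related keyword.
const MCP_KEYWORDS = ["mcp", "tool", "skill", "server"];

function inScope(path: string): boolean {
  const lower = path.toLowerCase();
  return MCP_KEYWORDS.some((kw) => lower.includes(kw));
}

// inScope("src/tools/fs.ts")       -> true  ("tool" matches)
// inScope("src/browser/page.ts")   -> false (excluded despite being core)
// inScope("src/infra/exec.ts")     -> false (also wrongly excluded)
```

Any allow-list built from naming conventions will miss core code that happens not to follow them; the fix is scoping by dependency graph or explicit configuration rather than path keywords.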

Recommendations

For OpenClaw users

  1. Upgrade to the latest release (v2026.3.8+) -- Security patches are cumulative across releases
  2. Run spiderrating check --skills after any update -- Verify your config and installed skills
  3. Pin your skills with spiderrating pin add-all -- Detect tampering via SHA-256 hashing
  4. Enable sandbox mode -- Set sandbox.mode to all in openclaw.json
  5. Be aware of browser tool capabilities -- The evaluate tool executes arbitrary JavaScript in page context
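The pinning step (item 3) boils down to hashing each installed skill file and comparing on later runs. A sketch of that idea, our own illustration rather than the CLI's internals:

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Sketch of SHA-256 pinning as described for `spiderrating pin add-all`.
// Our illustration -- the CLI's internals may differ.
function sha256Hex(data: string | Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}

// Pin: record the digest of each installed skill file.
// Verify: recompute and compare -- any edit to the file changes the digest.
function verifyPin(path: string, pinnedHash: string): boolean {
  return sha256Hex(readFileSync(path)) === pinnedHash;
}

// sha256Hex("hello") === "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
```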

For the OpenClaw team

The biggest opportunity is improving tool descriptions. Going from 3.0 to 7.0 on descriptions would push the overall SpiderScore from B to A. Specifically:

  • Add side-effect declarations to file/network tools
  • Specify operational boundaries (what the tool won't do)
  • Document permission requirements for each tool

Try it yourself

pip install spidershield
spidershield agent-check ~/.openclaw

Or run the full SpiderRating evaluation:

spiderrating rate openclaw/openclaw
spiderrating check --skills --verify

Browse all ratings in our server directory, or check the leaderboard for top-rated tools.

Related reads:

  • How We Score MCP Servers -- deep dive into the SpiderScore model
  • 98% of MCP Tools Don't Tell AI Agents When to Use Them -- the description quality crisis
  • The State of MCP Security in 2026 -- ecosystem-wide findings

Try it yourself with our free scanner or read the methodology.