We Scanned 5,928 MCP Servers, Then Manually Audited the Worst Ones
> TL;DR: We scanned 5,928 MCP servers with 46+ automated rules. 114 scored Grade F. We then manually audited the top ones and found our scanner was wrong 14% of the time. We fixed the false positives, corrected 16 ratings, and published everything. Transparency > perfection.
---
The Numbers (Post-Calibration)
| Grade | Count | % | Meaning |
|---|---|---|---|
| A (9.0+) | 0 | 0% | Nobody reaches top security |
| B (7.0-8.9) | 268 | 4.5% | Good |
| C (5.0-6.9) | 4,373 | 73.8% | Average |
| D (3.0-4.9) | 1,189 | 20.1% | Below average |
| F (0-2.9) | 98 | 1.7% | Critical issues |
Average score: 4.81/10. Zero servers achieve Grade A.
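To make the grade bands concrete, here is a minimal sketch that maps a score to a letter and recomputes the percentage column from the counts in the table. The function names are ours for illustration, and the handling of scores that fall between the listed bands (for example 8.95) is an assumption, since the table does not specify it.

```python
# Grade bands as listed in the table. Scores between bands (e.g. 8.95) are
# not specified there, so this sketch assumes simple lower-bound thresholds.
GRADE_FLOORS = [(9.0, "A"), (7.0, "B"), (5.0, "C"), (3.0, "D"), (0.0, "F")]

# Post-calibration counts copied from the table.
COUNTS = {"A": 0, "B": 268, "C": 4373, "D": 1189, "F": 98}


def grade_for(score: float) -> str:
    """Map a 0-10 score to a letter grade using lower-bound thresholds."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "F"  # not reached for in-range scores; kept for completeness


def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Share of servers per grade, rounded to one decimal place."""
    total = sum(counts.values())
    return {grade: round(100 * n / total, 1) for grade, n in counts.items()}
```

Under these assumptions, the average score of 4.81 lands in Grade D, and the counts sum to the 5,928 servers scanned.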
The Most Dangerous MCP Servers (Verified)
After automated scanning AND manual source code review, these are the most popular servers with confirmed critical vulnerabilities:
1. hangwin/mcp-chrome (10,789 stars) — Grade F, 9 critical
A Chrome browser automation tool with real, unpatched vulnerabilities:
- `readBase64File()` accepts arbitrary file paths with zero validation. It can read `/etc/shadow`, `~/.ssh/id_rsa`, browser credentials, or anything else on the filesystem that the server process has permission to read. The `cleanupFile()` function in the *same file* has a temp-directory check, proving the developer knew about path restrictions but missed the read operations.
- 9 instances of `new Function()` for JavaScript execution in browser tabs. This is by design (browser automation requires JS injection), but in an MCP context, a prompt injection attack means arbitrary JS in your browser session: stealing cookies, session tokens, or performing actions as you.
This is the most dangerous MCP server we've found. The path traversal in `file-handler.ts` is a genuine vulnerability, not a design choice.
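For readers unfamiliar with this class of bug, here is a minimal sketch of the kind of check a file-read helper needs before touching the disk. It is written in Python for illustration only and is not mcp-chrome's code (which is TypeScript). `ALLOWED_ROOT`, `resolve_inside`, and `read_file_checked` are hypothetical names, and using the system temp directory as the allowed root is our assumption.

```python
import tempfile
from pathlib import Path

# Hypothetical allowed root; the real project's temp directory may differ.
ALLOWED_ROOT = Path(tempfile.gettempdir()).resolve()


def resolve_inside(candidate: str, root: Path = ALLOWED_ROOT) -> Path:
    """Resolve a path and refuse it unless it sits under root."""
    resolved = Path(candidate).resolve()  # collapses ".." and follows symlinks
    if resolved != root and root not in resolved.parents:
        raise PermissionError(f"path outside allowed directory: {candidate}")
    return resolved


def read_file_checked(candidate: str, root: Path = ALLOWED_ROOT) -> bytes:
    """Read a file only after the resolved path passes the containment check."""
    return resolve_inside(candidate, root).read_bytes()
```

The important detail is that the comparison happens on the resolved path, so inputs such as `../` sequences or symlinks pointing outside the root are rejected rather than followed.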
2. Klavis-AI/klavis (5,666 stars) — Grade F, 5 critical + 210 high
The largest attack surface in the MCP ecosystem: 1,369 tools with 5 critical and 210 high-severity issues. This is an aggregator that wraps many services — the sheer volume of tools means a vast number of potential entry points.
3. wonderwhy-er/DesktopCommanderMCP (5,711 stars) — Grade F, 2 critical + 7 high
A desktop control tool where shell execution IS the product. The `execute_command` tool runs arbitrary shell commands by design. In an MCP context, this means a successful prompt injection lets an attacker run commands on your machine. The `allowedDirectories` mechanism defaults to the entire home directory.
Not a bug, but a dangerous feature. Use it only with a strict policy or in a sandboxed environment.
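As a rough illustration of what a strict policy could look like in front of a shell-execution tool, the sketch below accepts only a single allowlisted executable and refuses shell metacharacters. Everything here is hypothetical: the allowlist, the metacharacter set, and the function name are ours, not DesktopCommanderMCP's configuration or SpiderShield's policy engine. The function only decides whether a command would be permitted; it never runs anything.

```python
import shlex

# Hypothetical allowlist for illustration; not any real tool's configuration.
ALLOWED_EXECUTABLES = {"ls", "cat", "rg"}
SHELL_METACHARACTERS = set(";|&$`<>\n")


def is_command_allowed(command: str) -> bool:
    """Accept only a single allowlisted executable with plain arguments."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False  # refuse chaining, substitution, and redirection
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes
    return bool(tokens) and tokens[0] in ALLOWED_EXECUTABLES
```

An allowlist like this is not sufficient on its own: `cat ~/.ssh/id_rsa` would still pass, so it needs to be combined with directory restrictions narrower than the whole home directory, or with a sandbox.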
What We Got Wrong (And Fixed)
We manually audited 6 high-star F-grade servers and found our scanner was wrong on 3 of them:
| Server | Stars | What Scanner Said | What Code Does | Action |
|---|---|---|---|---|
| sansan0/TrendRadar | 49,093 | "SQL injection via f-string" | f-string only generates ? placeholders; values properly parameterized | Fixed: F → C (6.84) |
| AgentDeskAI/browser-tools-mcp | 7,126 | "Command injection via exec()" | exec() argument is hardcoded AppleScript; no MCP input reaches it | Fixed: F → C (6.91) |
| bytebase/dbhub | 2,328 | "SQL injection (19 instances)" | All 19 use ? parameterized queries correctly | Fixed: F → C (6.82) |
We then batch-rescanned all 111 F-grade servers with the fixed rules. 16 ratings were corrected (14% false positive rate). The other 86% of F-grade ratings were confirmed accurate.
What Caused the False Positives
Our scanner uses regex + Semgrep taint analysis. Two patterns caused most FPs:
- `.execute(f"...{placeholders}...")`: the scanner sees f-string + `execute()` and flags SQL injection. But when the f-string only interpolates `?` placeholder counts (not data), it's the standard safe parameterized query pattern.
- `execSync(variable)`: flagged as command injection whenever the argument isn't a string literal. But when `variable` contains hardcoded values like `"which rg"` or AppleScript strings, there's no user input in the path.
We've added FP suppression rules for both patterns.
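The first pattern is easier to see side by side. The sketch below uses Python's built-in `sqlite3` module with an in-memory table; the table, the server names, and the function names are made up for illustration and are not taken from any of the audited projects. The first function is the safe pattern our scanner wrongly flagged, and the second is the genuinely injectable one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE servers (name TEXT, grade TEXT)")
conn.executemany(
    "INSERT INTO servers VALUES (?, ?)",
    [("alpha", "C"), ("beta", "F"), ("gamma", "B")],  # hypothetical rows
)


def find_by_names(names: list[str]) -> list[tuple[str, str]]:
    """Safe: the f-string interpolates only '?' markers, never the values."""
    if not names:
        return []
    placeholders = ", ".join("?" for _ in names)
    query = (
        f"SELECT name, grade FROM servers "
        f"WHERE name IN ({placeholders}) ORDER BY name"
    )
    return conn.execute(query, names).fetchall()


def find_by_name_unsafe(name: str) -> list[tuple[str, str]]:
    """Unsafe: caller-supplied data is interpolated into the SQL text itself."""
    return conn.execute(
        f"SELECT name, grade FROM servers WHERE name = '{name}' ORDER BY name"
    ).fetchall()
```

Passing `x' OR '1'='1` to `find_by_names` matches nothing, because the value is bound as data. Passing the same string to `find_by_name_unsafe` rewrites the query and returns every row. Both functions combine an f-string with `execute()`, which is why a purely syntactic rule cannot tell them apart.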
What This Teaches Us
1. Automated Scanning Catches Real Issues
mcp-chrome's readBase64File() is a genuine vulnerability that leaks any file on your system. No human reviewer caught it before our scanner. Automation matters.
2. But Automation Isn't Enough
3 of our 6 highest-profile F-grades were wrong. If we'd published "TrendRadar (49K stars) is dangerous!" without checking, we'd have lost all credibility.
3. "By Design" ≠ "Safe for MCP"
DesktopCommanderMCP's shell execution is intentional. But MCP's threat model is different from traditional software — the AI agent can be manipulated by prompt injection, turning "useful feature" into "attack vector."
4. Transparency Builds Trust
We're publishing our false positive rate (14%), our corrections, and our methodology. We'd rather be honest about our limitations than pretend our scanner is perfect.
How to Protect Yourself
- Check before you connect: spiderrating.com/servers/{owner}/{repo}
- Scan locally: `npx spidershield scan ./your-server` (free, open source)
- Add runtime protection: SpiderShield PreToolUse hook blocks Grade F servers automatically
- For dangerous-by-design tools: Use `balanced` or `strict` policy mode
---
*Browse all 5,928 ratings at spiderrating.com. Scanner source: github.com/teehooai/spidershield (MIT). Questions? [email protected].*