We Scanned 5,928 MCP Servers, Then Manually Audited the Worst Ones

SpiderRating Research · 8 min read

> TL;DR: We scanned 5,928 MCP servers with 46+ automated rules, and 114 initially scored Grade F. We manually audited the highest-profile ones, found the scanner had misfired on half of them, fixed the rules behind those false positives, and rescanned. 16 F ratings were corrected, meaning the scanner was wrong on 14% of its F grades. We're publishing everything. Transparency > perfection.

---

The Numbers (Post-Calibration)

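| Grade (score) | Count | % of servers | Meaning |
|---|---|---|---|
| A (9.0+) | 0 | 0% | Top security (no server reaches it) |
| B (7.0-8.9) | 268 | 4.5% | Good |
| C (5.0-6.9) | 4,373 | 73.8% | Average |
| D (3.0-4.9) | 1,189 | 20.1% | Below average |
| F (0-2.9) | 98 | 1.7% | Critical issues |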

Average score: 4.81/10. Zero servers achieve Grade A.
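The score bands in the table amount to a simple threshold function. Here is a minimal sketch, assuming each band's lower bound is inclusive (so 9.0 is an A and a hypothetical 8.95 would be a B), plus a check that the published counts reproduce the percentage column. The names `grade`, `shares`, and `GRADE_BANDS` are illustrative, not taken from the scanner.

```python
# Sketch only: we assume lower bounds are inclusive, e.g. 9.0 -> A, 8.95 -> B.
GRADE_BANDS = [(9.0, "A"), (7.0, "B"), (5.0, "C"), (3.0, "D"), (0.0, "F")]


def grade(score: float) -> str:
    """Map a 0-10 security score to a letter grade."""
    if not 0.0 <= score <= 10.0:  # also rejects NaN
        raise ValueError(f"score out of range: {score}")
    for lower_bound, letter in GRADE_BANDS:
        if score >= lower_bound:
            return letter
    raise AssertionError("unreachable: the 0.0 band catches every valid score")


# Post-calibration counts from the grade table.
COUNTS = {"A": 0, "B": 268, "C": 4373, "D": 1189, "F": 98}


def shares(counts: dict) -> dict:
    """Percentage of servers per grade, rounded to one decimal place."""
    total = sum(counts.values())
    return {g: round(100 * n / total, 1) for g, n in counts.items()}


print(sum(COUNTS.values()))     # 5928
print(shares(COUNTS))           # {'A': 0.0, 'B': 4.5, 'C': 73.8, 'D': 20.1, 'F': 1.7}
print(grade(6.84), grade(2.9))  # C F
```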

The Most Dangerous MCP Servers (Verified)

After automated scanning AND manual source code review, these are the most popular servers with confirmed critical vulnerabilities:

1. hangwin/mcp-chrome (10,789 stars) — Grade F, 9 critical

A Chrome browser automation tool with real, unpatched vulnerabilities:

  • `readBase64File()` accepts arbitrary file paths with zero validation, so it can read any file the server process can access: ~/.ssh/id_rsa, browser credential stores, even /etc/shadow if the process runs as root. The `cleanupFile()` function in the *same file* has a temp-directory check, proving the developer knew about path restrictions but did not apply them to read operations.
  • 9 instances of `new Function()` execute JavaScript in browser tabs. This is by design (browser automation requires JS injection), but in an MCP context a successful prompt injection means arbitrary JS running in your browser session, able to steal cookies and session tokens or perform actions as you.

This is the most dangerous MCP server we've found. The path traversal in file-handler.ts is a genuine vulnerability, not a design choice.
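For contrast, here is a minimal Python sketch of the kind of check the read path is missing: resolve the requested path (collapsing `..` and following symlinks) and refuse anything outside a single allowed directory. `read_base64_file` and `allowed_root` are hypothetical names for illustration; this is not mcp-chrome's TypeScript code or a patch for it.

```python
import base64
import os
import tempfile


def read_base64_file(requested: str, allowed_root: str) -> str:
    """Return a file's bytes as base64, but only if it resolves inside allowed_root.

    Sketch: ignores the race between the containment check and open().
    """
    root = os.path.realpath(allowed_root)
    # realpath() collapses ".." and follows symlinks, so "a/../../.ssh/id_rsa"
    # or a symlink pointing out of the root resolves to its true location.
    # An absolute `requested` path replaces root entirely in os.path.join().
    target = os.path.realpath(os.path.join(root, requested))
    # commonpath() compares whole components, so "/srv/rootX" is not treated
    # as inside "/srv/root" (a plain startswith() check would get this wrong).
    if os.path.commonpath([root, target]) != root:
        raise PermissionError(f"path escapes the allowed directory: {requested}")
    with open(target, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


# Usage: reads inside the temp directory work; escapes are refused.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "ok.txt"), "wb") as f:
        f.write(b"hi")
    print(read_base64_file("ok.txt", tmp))  # aGk=
    for bad in ("../../etc/passwd", "/etc/passwd"):
        try:
            read_base64_file(bad, tmp)
        except PermissionError as exc:
            print("refused:", exc)
```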

2. Klavis-AI/klavis (5,666 stars) — Grade F, 5 critical + 210 high

The largest attack surface we found in the MCP ecosystem: 1,369 tools. Klavis is an aggregator that wraps many services, and every tool it exposes is another potential entry point.

3. wonderwhy-er/DesktopCommanderMCP (5,711 stars) — Grade F, 2 critical + 7 high

A desktop control tool where shell execution IS the product. The `execute_command` tool runs arbitrary shell commands by design, so in an MCP context a successful prompt injection lets an attacker run commands on your machine. The `allowedDirectories` mechanism defaults to the entire home directory.

Not a bug, but a dangerous feature. Use it only with a strict policy or in a sandboxed environment.
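One way to picture a "strict policy" for a shell-execution tool is a gate that vets each command before it runs. The sketch below is hypothetical; it is not DesktopCommanderMCP's or SpiderShield's policy engine, and `vet_command`, `run_vetted`, and the allowlist contents are our own illustrative choices. It rejects shell metacharacters, allows only bare binary names from an explicit list, pins the working directory to one project folder instead of the whole home directory, and runs without a shell.

```python
import os
import shlex
import subprocess

# Hypothetical strict policy: an explicit allowlist of bare binary names.
# Paths such as /tmp/x/ls are rejected, so a planted binary can't pose as "ls".
ALLOWED_BINARIES = {"ls", "wc", "grep"}
# Characters that would let one string chain commands, substitute, or redirect.
SHELL_METACHARACTERS = set(";&|`$<>()\n")


def vet_command(command: str, cwd: str, allowed_dir: str) -> list:
    """Return argv if `command` passes the policy; raise PermissionError otherwise.

    Arguments are not inspected, so `grep -r key /etc` would still pass.
    That gap is why a sandbox remains the stronger control.
    """
    if any(ch in SHELL_METACHARACTERS for ch in command):
        raise PermissionError("shell metacharacters are not allowed")
    try:
        argv = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        raise PermissionError(f"unparseable command: {exc}") from exc
    if not argv or argv[0] not in ALLOWED_BINARIES:
        raise PermissionError(f"binary not on allowlist: {argv[:1]}")
    # Pin the working directory to one project folder, not the whole home dir.
    root = os.path.realpath(allowed_dir)
    if os.path.commonpath([root, os.path.realpath(cwd)]) != root:
        raise PermissionError("working directory is outside the allowed directory")
    return argv


def run_vetted(command: str, cwd: str, allowed_dir: str) -> subprocess.CompletedProcess:
    """Run a vetted command without a shell (argv goes straight to exec)."""
    argv = vet_command(command, cwd, allowed_dir)
    return subprocess.run(argv, cwd=cwd, capture_output=True, text=True, timeout=30)
```

An allowlist like this trades flexibility for a much smaller blast radius; a prompt-injected "curl evil | sh" fails at the first check instead of reaching a shell.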

What We Got Wrong (And Fixed)

We manually audited 6 high-star F-grade servers and found our scanner was wrong on 3 of them:

| Server | Stars | What the scanner said | What the code does | Action |
|---|---|---|---|---|
| sansan0/TrendRadar | 49,093 | "SQL injection via f-string" | f-string only generates `?` placeholders; values are properly parameterized | Fixed: F → C (6.84) |
| AgentDeskAI/browser-tools-mcp | 7,126 | "Command injection via exec()" | `exec()` argument is hardcoded AppleScript; no MCP input reaches it | Fixed: F → C (6.91) |
| bytebase/dbhub | 2,328 | "SQL injection (19 instances)" | All 19 use `?` parameterized queries correctly | Fixed: F → C (6.82) |

We then batch-rescanned all 111 F-grade servers with the fixed rules. 16 ratings were corrected (14% false positive rate). The other 86% of F-grade ratings were confirmed accurate.

What Caused the False Positives

Our scanner combines regex rules with Semgrep taint analysis. Two patterns caused most of the false positives (FPs):

  1. `.execute(f"...{placeholders}...")`: the scanner sees an f-string passed to `execute()` and flags SQL injection. But when the f-string only interpolates a generated run of `?` placeholders (one per value, never the values themselves) and the values are passed as separate parameters, this is the standard, safe parameterized-query pattern.
  2. `execSync(variable)`: flagged as command injection whenever the argument isn't a string literal. But when `variable` only ever holds hardcoded values such as `"which rg"` or a fixed AppleScript string, no user input reaches the command.
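Pattern 1 is easy to check empirically. The sketch below uses Python's built-in `sqlite3` (the `users` table and `find_users` function are made up for illustration) to show that only the *number* of values reaches the SQL text, so a classic injection payload is bound as an ordinary string and matches nothing. The last lines show the genuinely unsafe variant for contrast.

```python
import sqlite3


def find_users(conn: sqlite3.Connection, names: list) -> list:
    """Look up users by name using one '?' placeholder per value."""
    if not names:
        return []  # avoid building an empty "IN ()" list
    # The f-string only ever interpolates "?, ?, ...": the values themselves
    # travel separately as bound parameters.
    placeholders = ", ".join("?" for _ in names)
    sql = f"SELECT name FROM users WHERE name IN ({placeholders}) ORDER BY name"
    return [row[0] for row in conn.execute(sql, names)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

print(find_users(conn, ["bob", "alice"]))  # ['alice', 'bob']
print(find_users(conn, ["x' OR '1'='1"]))  # []: the payload is just a string

# For contrast, interpolating the VALUE itself is real SQL injection:
payload = "x' OR '1'='1"
unsafe = conn.execute(
    f"SELECT name FROM users WHERE name = '{payload}' ORDER BY name"
).fetchall()
print(unsafe)  # [('alice',), ('bob',)]
```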

We've added FP suppression rules for both patterns.
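Here is a minimal sketch of how a pattern-1 suppression could work, written with Python's standard `ast` module. It illustrates the idea under our own assumptions and is not SpiderShield's actual rule: an `execute(f"...")` finding is cleared only if every value interpolated into the f-string is a variable whose *every* assignment is a placeholder builder such as `", ".join("?" for _ in xs)`. Anything else, including a lookalike that joins real data, stays flagged.

```python
import ast
from collections import Counter


def _is_placeholder_builder(expr: ast.AST) -> bool:
    """True for ', '.join('?' for _ in xs), ', '.join(['?' for _ in xs]) or ', '.join(['?'] * n)."""
    if not (isinstance(expr, ast.Call) and isinstance(expr.func, ast.Attribute)
            and expr.func.attr == "join" and len(expr.args) == 1 and not expr.keywords):
        return False
    sep = expr.func.value
    if not (isinstance(sep, ast.Constant) and isinstance(sep.value, str)
            and set(sep.value) <= {",", " "}):
        return False
    arg = expr.args[0]
    if isinstance(arg, (ast.GeneratorExp, ast.ListComp)):
        return isinstance(arg.elt, ast.Constant) and arg.elt.value == "?"
    if isinstance(arg, ast.BinOp) and isinstance(arg.op, ast.Mult):
        return (isinstance(arg.left, ast.List) and len(arg.left.elts) == 1
                and isinstance(arg.left.elts[0], ast.Constant)
                and arg.left.elts[0].value == "?")
    return False


def risky_execute_calls(source: str) -> list:
    """Line numbers of .execute(f'...') calls that this suppression does NOT clear.

    Sketch limitations: names are tracked per file rather than per scope, and
    only the builder shapes above count as placeholder-only.
    """
    tree = ast.parse(source)
    builder_writes, all_writes = Counter(), Counter()
    for node in ast.walk(tree):
        if isinstance(node, ast.Assign) and _is_placeholder_builder(node.value):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    builder_writes[target.id] += 1
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            all_writes[node.id] += 1  # every binding: =, +=, for-targets, :=
        elif isinstance(node, ast.arg):
            all_writes[node.arg] += 1  # parameters can carry caller data
    # A name is trusted only if EVERY write to it is a placeholder builder.
    safe_names = {n for n, c in builder_writes.items() if all_writes[n] == c}

    risky = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
                and node.func.attr in ("execute", "executemany")
                and node.args and isinstance(node.args[0], ast.JoinedStr)):
            holes = [v.value for v in node.args[0].values
                     if isinstance(v, ast.FormattedValue)]
            if not all(isinstance(h, ast.Name) and h.id in safe_names for h in holes):
                risky.append(node.lineno)
    return sorted(risky)


SAFE = (
    "def lookup(conn, ids):\n"
    "    placeholders = ', '.join('?' for _ in ids)\n"
    "    return conn.execute(f'SELECT * FROM t WHERE id IN ({placeholders})', ids)\n"
)
UNSAFE = (
    "def lookup(conn, name):\n"
    "    return conn.execute(f\"SELECT * FROM t WHERE name = '{name}'\")\n"
)
print(risky_execute_calls(SAFE))    # []
print(risky_execute_calls(UNSAFE))  # [2]
```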

What This Teaches Us

1. Automated Scanning Catches Real Issues

mcp-chrome's `readBase64File()` is a genuine vulnerability that can leak any file the server process is able to read. No human reviewer caught it before our scanner did. Automation matters.

2. But Automation Isn't Enough

3 of our 6 highest-profile F grades were wrong. If we'd published "TrendRadar (49K stars) is dangerous!" without checking, we'd have lost all credibility.

3. "By Design" ≠ "Safe for MCP"

DesktopCommanderMCP's shell execution is intentional. But MCP's threat model differs from that of traditional software: the AI agent can be manipulated through prompt injection, turning a "useful feature" into an "attack vector."

4. Transparency Builds Trust

We're publishing our false positive rate (14%), our corrections, and our methodology. We'd rather be honest about our limitations than pretend our scanner is perfect.

How to Protect Yourself

  1. Check before you connect: `spiderrating.com/servers/{owner}/{repo}`
  2. Scan locally: `npx spidershield scan ./your-server` (free, open source)
  3. Add runtime protection: SpiderShield's PreToolUse hook blocks Grade F servers automatically
  4. For dangerous-by-design tools: use balanced or strict policy mode
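To make the runtime-protection step concrete, here is a sketch of what a grade-based PreToolUse hook could do. It follows Claude Code's hook conventions as we understand them: the hook receives a JSON event with a `tool_name` field on stdin, MCP tools are named `mcp__<server>__<tool>`, and exit code 2 blocks the call. It is not SpiderShield's implementation, and the `GRADES` table with its server names is a hypothetical local stand-in for a real grade lookup.

```python
import json
import sys

# Hypothetical local grade table, keyed by the MCP server name in your client
# config. A real hook would look grades up instead of hard-coding them.
GRADES = {"chrome": "F", "dbhub": "C"}


def decide(tool_name: str, grades: dict) -> tuple:
    """Return (allowed, reason) for a single tool call."""
    parts = tool_name.split("__")
    if len(parts) < 3 or parts[0] != "mcp":
        return True, "not an MCP tool"
    server = parts[1]
    grade = grades.get(server)
    if grade == "F":
        return False, f"blocked: MCP server '{server}' is rated Grade F"
    return True, f"server '{server}' grade: {grade or 'unrated'}"


def hook_main(stdin_text: str, grades: dict) -> int:
    """Process one PreToolUse event; the return value is the process exit code."""
    event = json.loads(stdin_text)
    allowed, reason = decide(event.get("tool_name", ""), grades)
    if not allowed:
        print(reason, file=sys.stderr)  # assumption: stderr is shown to the agent
        return 2  # assumption: exit code 2 means "block this tool call"
    return 0


# Wiring as a hook script (not executed here):
#     sys.exit(hook_main(sys.stdin.read(), GRADES))
```

Blocking on the server grade alone is deliberately coarse; per-tool rules (for example, allowing a server's read-only tools) would sit naturally in `decide`.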

---

*Browse all 5,928 ratings at spiderrating.com. Scanner source: github.com/teehooai/spidershield (MIT). Questions? [email protected].*