Agent-driven security audits

An autonomous white-hat security auditor.

Install the skill once. Ask Codex, Claude Code, or another skills-aware agent to prepare the target, audit the code, run proof tests, and collect the report.

Works with Codex, Claude Code, Gemini CLI, Cursor, OpenCode, and OpenHands
Flounder skill: agent-driven audit
$ npx skills add adshao/flounder -g
◇ skill installed for Codex / Claude Code
Audit this repository with Flounder.
◇ target authorized boundary captured
◇ agent prepared workspace · mapped scope · dug promising regions
test runner returned PASS with command evidence
confirmed-executable report package ready
sealed audit complete · network stayed off

Use it with an agent

Ask naturally. Flounder handles the audit contract.

The installed skill triggers from Flounder audit requests, daemon/provider setup, suspected-finding verification, real-finding confirmation, and report collection.

1. install skill: one-time setup
2. ask agent: Codex or Claude Code
3. prove: sandboxed local tests
4. report: private disclosure draft

agent owns strategy  ·  Flounder owns safety and evidence

01 natural language

Codex / Claude Code driver

No custom scenario pipeline

Ask for an authorized audit, verification, confirmation, or report package. The Flounder skill gives the agent the operating manual and keeps it on the workflow.

The source of truth is skills/flounder/SKILL.md, not a marketing-only prompt.

02 execution-backed

End-to-end audit system

Prep → audit → proof → report

Flounder can prepare the workspace, read source and corpus, map attack surface, dig promising regions, construct exploit paths, run proof tests, and collect reports.

The framework supplies sandboxing, command policy, durable state, gates, daemon execution, and reporting.

Local dashboard

Track audits while the agent works.

flounder ui gives operators a localhost control plane for projects, daemons, provider profiles, runs, scopes, findings, live activity, and reports.

[Screenshot: Flounder dashboard showing an Aztec Rollup demo audit with workflow phases, scope coverage, live activity, candidates, and report-ready reproduced findings]
Project view: prepare → map → dig → synthesize → verify → confirm → report, with live model activity and finding-grained report actions.
Daemon-owned execution · Live tool and model activity · Finding-grained Verify / Confirm / Report

Why Flounder

Thin framework. Strong guarantees.

Flounder is not a scanner, checklist runner, or set of hand-written bug rules. The model decides how to reason; Flounder makes the result usable.

Agent-native

Install the skill once. Codex, Claude Code, or another skills-aware agent can drive the workflow from a plain request.

Framework-agnostic

Source, corpus, and optional profiles are inputs. The audit strategy comes from the model, not a stack-specific scanner.

Execution-grounded

A finding is not real because the model says so. It must cite command evidence from a passing local proof test.

Blind then real

Discovery runs network-sealed. Reproduction can use real-world ground truth under white-hat no-broadcast rules.

Sandbox boundary

Model-written tests, PoCs, dependencies, and commands run in a copied workspace away from the host checkout.

Local control

The UI is a control plane. Audits run on a daemon, so target code and provider credentials stay on the executor host.
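
The copied-workspace idea from the "Sandbox boundary" card can be sketched in a few lines of Python. This is only an illustration, not Flounder's implementation: the function name `run_in_copy` and the scratch-directory layout are assumptions, and a plain copy protects only the host checkout's files. Process and network isolation need a container backend on top of this.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_in_copy(checkout: Path, argv: list[str]) -> subprocess.CompletedProcess:
    """Copy the checkout to a scratch directory and run argv inside the copy.

    Files written by the command land in the copy, which is deleted when
    the function returns, so the original checkout is never modified.
    """
    with tempfile.TemporaryDirectory(prefix="audit-ws-") as scratch:
        workspace = Path(scratch) / "workspace"
        # symlinks=True copies links as links instead of following them.
        shutil.copytree(checkout, workspace, symlinks=True)
        return subprocess.run(argv, cwd=workspace, capture_output=True, text=True)
```

A caller would pass the repository path and a test command, then inspect `returncode`, `stdout`, and `stderr` on the result as the command evidence.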

Use cases

Use Flounder when a security question needs proof.

Choose the path by what you already have: a clean target, a factual clue, a public bounty scope, local source, a suspected finding, or confirmed evidence.

blind capability audit

Measure unaided audit ability.

Start with an authorized project, repo, package, source tree, or project link and no bug hint.

Input: target only, no incident writeup

incident investigation

Explain a suspicious transaction or exploit clue.

Use Prepare to collect chain facts, deployed source, official material, and reproduction requirements.

Input: transaction, address, exploit link

open-world bounty

Audit with official public context.

Let Flounder gather bounty scope, docs, deployments, provenance, and package metadata before sealed audit.

Input: public program, repo, deployment

source-provided audit

Audit code that is already staged locally.

Provide source paths, build root, and optional corpus to enter sealed map/dig directly.

Input: source, build root, docs

targeted follow-up

Settle one claim or region.

Verify suspected findings, dig selected scopes, confirm a run, or continue from prior project state.

Output: confirmed, refuted, or narrowed

disclosure prep

Package only evidence-backed bugs.

Consolidate duplicates, run real-target confirmation when needed, and regenerate selected reports.

Output: reports, decisions, command evidence

Prepare target → Map scope → Dig deeply → Run proof → Collect report

Proof boundary

Execution is the promotion rule.

A candidate stays suspected until it cites a passing confirmation-eligible command. The status is a framework verdict from command evidence, not the model's assertion.

refuted

The claim failed reproduction or skeptic review.

suspected

Credible, but no passing cited test yet.

confirmed-executable

A real local test/build runner passed.

confirmed-differential

The same exploit is blocked by its own minimal fix.
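
The four statuses above can be read as a small decision function over recorded evidence. The sketch below is a hedged illustration of that promotion rule, not Flounder's actual data model: the `Evidence` and `Candidate` shapes and the `verdict` function are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    REFUTED = "refuted"
    SUSPECTED = "suspected"
    CONFIRMED_EXECUTABLE = "confirmed-executable"
    CONFIRMED_DIFFERENTIAL = "confirmed-differential"


@dataclass
class Evidence:
    """One recorded command run (hypothetical shape)."""
    command: str
    passed: bool
    eligible: bool  # True only for a real test/build runner


@dataclass
class Candidate:
    title: str
    refuted: bool = False  # failed reproduction or skeptic review
    exploit_runs: list[Evidence] = field(default_factory=list)
    fix_blocks_exploit: bool = False  # same exploit fails after its minimal fix


def verdict(c: Candidate) -> Status:
    """Derive the status from evidence only; the model's claim is not an input."""
    if c.refuted:
        return Status.REFUTED
    proven = any(e.passed and e.eligible for e in c.exploit_runs)
    if not proven:
        return Status.SUSPECTED
    if c.fix_blocks_exploit:
        return Status.CONFIRMED_DIFFERENTIAL
    return Status.CONFIRMED_EXECUTABLE
```

The point of the design is that no field on `Candidate` lets the model assert its own status; promotion happens only when a passing, eligible run is attached.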

  • 1 Model-owned strategy

    Flounder is not a stack scanner or checklist runner. Source, corpus, and optional profiles are inputs, not conclusions.

  • 2 Sandboxed execution

    Commands run in a copied workspace. The default OCI backend fails closed if the sandbox image is missing.

  • 3 Real test runners only

    Inspection commands cannot mint proof. Confirmation needs a command like cargo test, forge test, or pytest.

  • 4 Local control

    The control plane queues work; the daemon executes it. Target code and provider credentials stay on the executor host.
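
The "real test runners only" rule could be approximated by checking the leading words of a command against an allowlist. The following is a minimal sketch assuming only the three runners named above; the set `TEST_RUNNERS` and the function `is_confirmation_eligible` are hypothetical, and Flounder's real command policy is not shown in this document.

```python
import shlex

# Illustrative allowlist built from the runners named in the text.
# This is an assumption, not Flounder's actual policy.
TEST_RUNNERS = {("cargo", "test"), ("forge", "test"), ("pytest",)}


def is_confirmation_eligible(command: str) -> bool:
    """Return True if the command starts with a known test-runner invocation."""
    argv = shlex.split(command)
    return any(tuple(argv[:len(prefix)]) == prefix for prefix in TEST_RUNNERS)
```

Under this sketch, `cargo test --release` and `pytest -k exploit` qualify, while inspection commands such as `cat` or `grep` do not, no matter what they print.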

Quickstart

Install once. Ask your agent.

The Flounder skill is the product interface for Codex, Claude Code, and other skills-aware agents.

1. Install Skill
# add Flounder to your agent once
$ npx skills add adshao/flounder -g

Installs the operating manual, safety boundary, and workflow contract.

2. Ask Agent
# use plain language from Codex or Claude Code
 Audit this repository with Flounder.
 Verify this suspected finding with Flounder.
 Collect the execution-backed bug report package.

The agent handles setup, audit planning, proof runs, and report collection.

Dashboard, CLI, and REST API remain available when you want direct control.

White-hat by construction.

Flounder is for authorized auditing only: your own code or public bug-bounty scope. Discovery is network-sealed. Reproduction may fork and read live networks but never broadcasts, moves funds, or writes to any live system; exploits replay against a local fork only. Build the smallest proof needed, report privately, and coordinate disclosure.

Read the security policy →

FAQ

Practical questions before you run it.

Answers for operators setting up their first agent-driven audit.

Is Flounder a local service or a cloud service?

Flounder is local-first. The dashboard and control plane run on localhost by default, and audits execute on a daemon you control. That daemon can be on your machine or another executor host you connect; Flounder does not require uploading targets to a hosted Flounder cloud.

Is Flounder open source? What license?

Yes. Flounder is open source under the GNU AGPL v3. The repository includes the full license text.

How do I use Flounder with Codex or Claude Code?

Install the Flounder skill once, then ask a skills-aware coding agent to audit an authorized target, verify a suspected finding, confirm a real finding, or collect the final report package. The dashboard, CLI, and REST API are control surfaces; the skill is the recommended way to drive the workflow.

Is Flounder a scanner?

No. The agent owns the audit strategy and target-specific reasoning. Flounder supplies the sandbox, command policy, durable state, execution gates, daemon control plane, and report package so the agent's work can be resumed, checked, and proven.

Will Flounder use a lot of tokens?

High-quality audits can be token-heavy. You can cap map, dig, and confirm budgets, but hard caps can stop a productive investigation. The default is unbounded: the agent stops when the work is done, and interrupted runs can resume. For serious use, plan around high-cap subscriptions such as ChatGPT Pro or Claude Max 20x, or set explicit budgets for API/pay-as-you-go usage.
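
To make the trade-off between unbounded and capped budgets concrete, here is a small sketch of per-phase token accounting. Everything in it is an assumption for illustration: the `PhaseBudget` class, its fields, and the example limit are hypothetical, and Flounder's real budget settings are not described in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhaseBudget:
    limit: Optional[int] = None  # None means unbounded, the stated default
    used: int = 0

    def charge(self, tokens: int) -> bool:
        """Record usage; return False if a hard cap would be exceeded."""
        if self.limit is not None and self.used + tokens > self.limit:
            return False
        self.used += tokens
        return True


# Hypothetical setup: only the dig phase is capped.
budgets = {
    "map": PhaseBudget(),
    "dig": PhaseBudget(limit=2_000_000),
    "confirm": PhaseBudget(),
}
```

A refused `charge` is the point where a hard cap would stop an investigation, which is why the text suggests explicit caps mainly for API or pay-as-you-go usage.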

Does my source code leave my machine?

Flounder keeps its database, artifacts, workspaces, and provider auth under local control, with default state under ~/.flounder. Provider credentials stay on the executor host. Your chosen model provider still receives the prompts and context your agent sends, so keep sensitive material out of scope unless that provider and account are approved for it.

What do I need to run a real audit?

You need Node.js 24.13 or newer on the current 24 LTS line, a skills-aware agent, the Flounder skill, a configured model provider on the daemon, and a sandbox backend. For execution-backed audits, use Docker or a Docker-compatible runtime with the Flounder sandbox image or a target-specific image. Host mode is for trusted local smoke tests only.

What targets are a good fit?

Flounder fits source audits where claims can be proven locally: repositories, packages, smart contracts, Solidity/EVM projects, ZK/proof systems, suspected findings, transactions, addresses, and prior reports. It is strongest when the target has tests, forks, fixtures, or harnesses that can turn a vulnerability claim into command evidence.

Is it safe to run model-written exploit code?

Model-written files and commands run in a copied workspace. The default OCI sandbox fails closed if the sandbox image is missing, instead of silently falling back to the host. Use host execution only when you explicitly trust the target and the command environment.
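
"Fails closed" means a missing sandbox image is an error, not a reason to run on the host. The sketch below shows one way that choice could look. It assumes a Docker-compatible CLI that supports `image inspect`; the names `choose_backend`, `image_present`, and `SandboxUnavailable` are hypothetical and not taken from Flounder.

```python
import shutil
import subprocess


class SandboxUnavailable(RuntimeError):
    """Raised instead of silently falling back to host execution."""


def image_present(image: str, runtime: str = "docker") -> bool:
    """Check for a local image; a missing runtime counts as a missing image."""
    if shutil.which(runtime) is None:
        return False
    probe = subprocess.run([runtime, "image", "inspect", image], capture_output=True)
    return probe.returncode == 0


def choose_backend(image: str, allow_host: bool = False, probe=image_present) -> str:
    """Pick the OCI sandbox when available; use the host only on explicit opt-in."""
    if probe(image):
        return "oci"
    if allow_host:
        return "host"  # operator explicitly trusts target and environment
    raise SandboxUnavailable(f"sandbox image {image!r} not found; refusing host fallback")
```

The `probe` parameter exists only so the decision logic can be exercised without a container runtime installed.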

Can Flounder be used on live targets?

Only with authorization. Discovery stays sealed and local. Confirmation may fetch, search, fork, or read real-world ground truth, but it must never broadcast, move funds, submit writes, persist access, or go outside the approved scope.
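
For an EVM target, the read-but-never-broadcast rule could be enforced by filtering JSON-RPC methods before they reach a live endpoint. This is a hedged sketch, not Flounder's mechanism: it assumes an Ethereum-style JSON-RPC interface, the method names are standard Ethereum ones, and the `allow_rpc` function and the `"live"` / `"fork"` labels are hypothetical.

```python
# Read-only Ethereum JSON-RPC methods permitted against a live endpoint.
# An allowlist is used so that unknown methods are denied by default.
LIVE_READ_METHODS = {
    "eth_blockNumber",
    "eth_call",
    "eth_chainId",
    "eth_getBalance",
    "eth_getBlockByNumber",
    "eth_getCode",
    "eth_getLogs",
    "eth_getStorageAt",
    "eth_getTransactionByHash",
    "eth_getTransactionReceipt",
}


def allow_rpc(method: str, target: str) -> bool:
    """Allow any method on a local fork, but only reads on a live network."""
    if target == "fork":
        return True
    if target == "live":
        return method in LIVE_READ_METHODS
    return False  # unknown target labels are rejected
```

With this filter, `eth_sendRawTransaction` succeeds only against the local fork, which matches the rule that exploits replay locally and never reach a live system.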

Give your agent an authorized target.

Flounder turns the request into a sandboxed, evidence-gated audit workflow.

$ npx skills add adshao/flounder -g