FiniteMonkey is an advanced vulnerability mining engine powered purely by GPT, requiring no prior knowledge base or fine-tuning. Its effectiveness significantly surpasses most current related research approaches.
- Task-driven, not question-driven
- Prompt-driven, not code-driven
- Focus on prompt design, not model design
- Leveraging "deception" and hallucination as key mechanics
As of May 2024, this tool has helped identify vulnerabilities worth over $60,000 in bounties.
2024.11.19: Version 1.0 released - Demonstrating feasibility of LLM-based auditing and productization
Earlier Updates:
- 2024.08.02: Project renamed to finite-monkey-engine
- 2024.08.01: Added support for func, tact
- 2024.07.23: Added support for cairo, move
- 2024.07.01: Updated license
- 2024.06.01: Added Python language support
- 2024.05.18: Improved false positive reduction (~20%)
- 2024.05.16: Added cross-contract vulnerability confirmation
- 2024.04.29: Added basic Rust language support
- PostgreSQL database
- OpenAI API access
- Python environment
- Configure test environment in
src/main.py
:
switch_production_or_test = 'test'
-
Place project under
src/dataset/agent-v1-c4
-
Configure project in
datasets.json
:
{
"StEverVault2": {
"path": "StEverVault",
"files": [],
"functions": []
}
}
-
Create database using
src/db.sql
-
Configure
.env
:
DATABASE_URL=postgresql://postgres:1234@127.0.0.1:5432/postgres
OPENAI_API_BASE="api.openai.com"
OPENAI_API_KEY=xxxxxx
BUSINESS_FLOW_MODEL_ID=gpt-4-turbo
VUL_MODEL_ID=gpt-4-turbo
BUSINESS_FLOW_COUNT=10
SWITCH_FUNCTION_CODE=False
SWITCH_BUSINESS_CODE=True
- Scans can be resumed if interrupted due to network/API issues by rerunning main.py with same project_id
- Strongly recommend using GPT-4-turbo - GPT-3.5 and GPT-4.0 have inferior reasoning capabilities
- Results are marked with detailed annotations and Chinese explanations:
- Prioritize entries with
"result":"yes"
in result column - Filter for
"dont need In-project other contract"
in category column - Check business_flow_code column for specific code
- Reference name column for code locations
- Prioritize entries with
- Best suited for logic vulnerability mining in real projects
- Not recommended for academic vulnerability testing
- GPT-4-turbo recommended for optimal results
- Average scan time: 2-3 hours for medium projects
- Cost estimate: $20-30 for medium projects with 10 iterations
- Current false positive rate: 30-65% depending on project size
- GPT-4 provides better results, GPT-3 not thoroughly tested
- The tricky prompt theory can be adapted for any language with minor modifications
- ANTLR AST parsing support recommended for better code slicing results
- Currently supports Solidity with plans for expansion
- Code structure optimization
- Additional language support
- Documentation and code analysis
- Command line interface implementation
- Excellent at code comprehension and logic vulnerability detection
- Less effective for control flow vulnerability detection
- Designed for real-world projects rather than academic test cases
- Each scan preserves progress automatically
- GPT-4-turbo provides optimal performance compared to other models
- Medium projects with 10 iterations take approximately 2.5 hours
- Results include detailed categorization and Chinese explanations
GNU General Public License v3.0 (GPL-3.0)
Contributions welcome! Please feel free to submit pull requests.
Note: The name is inspired by Large Language Monkeys paper