Welcome to the Pythia main query repository, the place where threat hunters and cybersecurity researchers collaborate on malware/threat actor infrastructure hunting.
Pythia offers a generic, standardized query format (similar to, and inspired by, Sigma) that is easily convertible to the formats of multiple infrastructure hunting platforms. Such platforms scan multiple IP ranges daily (at different frequencies) and collect data (snapshots of a state in time). Because each platform holds a different snapshot, it is good practice to validate your findings with multiple tools, which can also surface more up-to-date results. That's why Pythia was developed.
Named after the legendary Oracle of Delphi in Greek mythology, Pythia is a versatile tool designed to translate and adapt search queries across multiple cybersecurity platforms. Just as the ancient Pythia provided insights and prophecies to seekers from diverse backgrounds, this modern tool delivers precise and actionable intelligence by converting a standard query format into the specific syntax required by various search engines. Pythia streamlines the process of querying multiple data sources, making it an indispensable asset for cybersecurity professionals seeking comprehensive network visibility and threat intelligence.
Pythia aims to be for publicly discoverable data what Snort is for network traffic, YARA is for files, and Sigma is for log files.
Don't judge, we are still in Beta! Feel free to contribute!
Today, IoCs (Indicators of Compromise) are starting to fade away as their lifespan decreases. Cybercriminals no longer reuse the same infrastructure, making it difficult for traditional detection methods to keep up. Instead, the new trend is IoFA (Indicators of Future Attacks), where threat actors deploy large amounts of infrastructure in automated ways. The job of security researchers now involves fingerprinting these assets and searching for them on platforms like Shodan, Censys, FOFA and others, in order to detect them before threat actors actively use them in their malicious campaigns.
Pythia supports this modern approach by providing a common format and easy conversions into multiple platform formats. This enables researchers to validate and enrich their findings efficiently, staying ahead of cyber threats by identifying and mitigating potential attacks before they occur.
- title: A short, capitalised title describing the query at a high level.
- id: A unique Pythia id. The id starts with "pythia-" followed by a UUID (you can generate one here: https://www.uuidgenerator.net/version4).
- status: The status of the query (experimental, test, or stable).
- description: The description of the query. It holds the details of the query, such as the malware/threat actor infrastructure it is trying to identify, along with the specific fingerprints used.
- references: Any URLs with further information.
- tags: Any tags, useful for clustering the queries.
- author: The author of the query (e.g. FirstName Lastname, @twitter_name).
- date: The date the query was created.
- query: The query consists of two sections: the parameters and the condition. The parameters section is made up of parts (part1, part2, ..., partN), each of which is a field-value pair. The field must be one that Pythia allows (you can find them in the mappings folders). The value can be the one used on the platform where the query was originally run. Remember to also include the operator (e.g. : or =) in each part. The condition section combines the parts using logical operators (also found in the mappings folders). For more information on how to structure your Pythia queries, read Query_Creation_Guide.
- falsepositives: Any potential false positives generated by the query.
- level: The level of confidence for successfully identifying the true positives. Values include: low, moderate and high.
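The metadata rules in the list above can be checked mechanically. The following is a minimal sketch, not Pythia's own validator: the function names `new_query_id` and `check_metadata` are hypothetical, the query is assumed to be already loaded into a Python dictionary, and treating every listed field as required is an assumption. The internal layout of the query field is not checked here.

```python
import uuid

# Field names taken from the list above; treating all of them as
# required is an assumption made for this sketch.
REQUIRED_FIELDS = (
    "title", "id", "status", "description", "references", "tags",
    "author", "date", "query", "falsepositives", "level",
)
ALLOWED_STATUS = {"experimental", "test", "stable"}
ALLOWED_LEVEL = {"low", "moderate", "high"}
ID_PREFIX = "pythia-"


def new_query_id() -> str:
    """Build an id of the form 'pythia-<uuid4>' (hypothetical helper)."""
    return f"{ID_PREFIX}{uuid.uuid4()}"


def check_metadata(query: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in query]

    if "id" in query:
        query_id = str(query["id"])
        if not query_id.startswith(ID_PREFIX):
            problems.append("id must start with 'pythia-'")
        else:
            try:
                # uuid.UUID is lenient about formatting (e.g. missing hyphens).
                uuid.UUID(query_id[len(ID_PREFIX):])
            except ValueError:
                problems.append("id suffix is not a valid UUID")

    if "status" in query and query["status"] not in ALLOWED_STATUS:
        problems.append("status must be one of: "
                        + ", ".join(sorted(ALLOWED_STATUS)))
    if "level" in query and query["level"] not in ALLOWED_LEVEL:
        problems.append("level must be one of: "
                        + ", ".join(sorted(ALLOWED_LEVEL)))
    return problems
```

Calling `check_metadata` on a dictionary that has all fields, an id produced by `new_query_id()`, and allowed status and level values returns an empty list.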
Pythia includes one-to-one mappings for each of the supported platforms. The mappings are stored as strings in a dictionary. The convertors search for those strings inside the condition field of the Pythia query using regular expressions and replace every match with the corresponding string from the target platform's mappings.
Example conversion (substituted strings: jarm_fingerprint, http_favicon_hash, :, and):
- Pythia query (the condition part):
jarm_fingerprint:"29d29d00000000021c29d29d29d29d1f4989c319e75da83988253a39553038" and http_favicon_hash:"1768726119"
- FOFA query (converted result):
jarm="29d29d00000000021c29d29d29d29d1f4989c319e75da83988253a39553038" && icon_hash="1768726119"
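The substitution step behind this example can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Pythia's actual convertor: the `FOFA_MAPPINGS` table and the `convert_condition` function are hypothetical, the keys bundle each field with its operator for simplicity, and the real tables in the mappings folders may be organised differently.

```python
import re

# Hypothetical mapping table covering only the example in this section.
FOFA_MAPPINGS = {
    "jarm_fingerprint:": "jarm=",
    "http_favicon_hash:": "icon_hash=",
    " and ": " && ",
}


def convert_condition(condition: str, mappings: dict) -> str:
    """Replace every mapped string found in the condition."""
    # Longest keys first, so a short key never rewrites part of a longer one.
    for key in sorted(mappings, key=len, reverse=True):
        # re.escape makes the key match literally; a function replacement
        # keeps characters such as '&' or '\' in the value from being
        # interpreted as regex replacement syntax.
        condition = re.sub(re.escape(key),
                           lambda _match, value=mappings[key]: value,
                           condition)
    return condition


pythia_condition = (
    'jarm_fingerprint:"29d29d00000000021c29d29d29d29d1f4989c319e75da83988253a39553038"'
    ' and http_favicon_hash:"1768726119"'
)
fofa_query = convert_condition(pythia_condition, FOFA_MAPPINGS)
```

With the table above, `fofa_query` holds the FOFA query shown in the example.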
- Standardized format
- Validator scripts
- Convertor scripts to any of the supported platforms
- Searching the platforms directly with the converted queries through their APIs
- Shared location to store infrastructure hunting queries
- Create abstract format ✅
- Create validator script ✅
- Create mappings for each platform ✅
- Create script that converts the queries ✅
- Add API integration for searching the platforms directly ✅
- Expand mappings 🔜
- Store more queries 🔜
Please refer to Query_Creation_Guide.md
Please refer to CONTRIBUTING.md.
If you are interested in collaboration please reach out:
Installation:
- Clone and Move to the Repository
git clone https://github.com/EfstratiosLontzetidis/pythia.git
cd pythia
- Optional: Set Up a Virtual Environment
python3 -m venv venv
source venv/bin/activate
- Install Dependencies
pip install -r requirements.txt
- Start using Pythia! There are queries stored in the queries folder for testing or usage.
Usage Examples:
- Validate Pythia query:
python3 pythia.py -file queries/TOOLS/mythic_c2_favicon_hash_dec_or_title.yml -validate
- Convert Pythia query to a specified platform's format
python3 pythia.py -file queries/MALWARE/asyncrat_subject_issuer_cn.yml -convert FOFA
- Convert Pythia query to a specified platform's format and open the URL in the browser
python3 pythia.py -file queries/MALWARE/hookbot_panel_html_title.yml -convert SHODAN -open_url
- Convert Pythia query to a specified platform's format and search its API for results (you must supply the API credentials in config/api_configs.py)
python3 pythia.py -file queries/MALWARE/meduza_stealer_html_title.yml -convert CENSYS -api
- Convert Pythia query to all platforms' format and save them in a file
python3 pythia.py -file queries/MALWARE/quasar_rat_subject_common_name.yml -convert ALL -output_file quasar_results.txt
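When many queries need the same treatment, the validate command above can be generated for every file in the queries folder. The sketch below only builds the command lines and is a convenience wrapper under assumptions, not part of Pythia: the function name `build_validate_commands` is hypothetical, and it assumes all query files use the .yml extension, as the examples above do.

```python
from pathlib import Path


def build_validate_commands(queries_dir="queries", script="pythia.py"):
    """Return one argument list per .yml file found under queries_dir."""
    commands = []
    # rglob walks subfolders such as queries/MALWARE and queries/TOOLS.
    for path in sorted(Path(queries_dir).rglob("*.yml")):
        commands.append(
            ["python3", script, "-file", path.as_posix(), "-validate"]
        )
    return commands
```

Each returned list can be passed to `subprocess.run` from the repository root. How pythia.py reports a failed validation (exit code or printed message) is not covered here, so inspect the output rather than relying on the return code.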
- Hunting Adversary Infrastructure Course - Intel-Ops
- Will Thomas Blog
- Intel-Ops Blog
- Embee Research Blog
- Censys Blog
- Pivot Atlas
- Pythia works with manual mappings. They need to be validated regularly to ensure they remain accurate and up to date.
- Not all field mappings are included as Pythia is still in Beta.
- Pythia works with string substitutions, so it may occasionally convert the wrong part of a query. Manual intervention may be required.
- Complex queries do not work well with Pythia at the moment.
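The string-substitution limitation is easiest to see with a value that happens to contain a mapped string. The sketch below is illustrative only: the field names `http_title` and `port`, the one-entry `MAPPINGS` table, and both functions are hypothetical, and the quote-aware variant is one possible mitigation rather than something Pythia is stated to do.

```python
import re

# Hypothetical one-entry mapping, for illustration only.
MAPPINGS = {" and ": " && "}


def naive_convert(condition, mappings):
    """Substitute everywhere, including inside quoted values."""
    for key, value in mappings.items():
        condition = condition.replace(key, value)
    return condition


def quote_aware_convert(condition, mappings):
    """Substitute only outside double-quoted values."""
    # The capture group makes re.split keep the quoted chunks, which
    # land at the odd indices of the resulting list.
    chunks = re.split(r'("[^"]*")', condition)
    for i in range(0, len(chunks), 2):  # even indices are unquoted text
        for key, value in mappings.items():
            chunks[i] = chunks[i].replace(key, value)
    return "".join(chunks)


condition = 'http_title:"Command and Control" and port:"443"'
naive = naive_convert(condition, MAPPINGS)
careful = quote_aware_convert(condition, MAPPINGS)
```

Here `naive` becomes `http_title:"Command && Control" && port:"443"`, which corrupts the searched title, while `careful` leaves the quoted value intact and only rewrites the logical operator. This sketch does not handle escaped quotes inside values.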
This section will list every contributor, whatever form their assistance takes. We expect Pythia to grow with the help of the great CTI community, and we are very grateful for any kind of help, from contributing to the project to adding or updating Pythia queries.