How to Use the Command 'theHarvester' (with Examples)
theHarvester is a widely used penetration testing tool designed to gather information about a particular domain. It is typically employed in the initial stages of a security assessment to collect publicly available data, which helps ethical hackers and security analysts identify potential vulnerabilities. By querying search engines and other online resources, theHarvester aggregates data such as email addresses, subdomain names, virtual hosts, open ports, and banners associated with the target domain. This intelligence forms the foundation for further security audits.
Use Case 1: Gather Information on a Domain Using Google
Code:
theHarvester --domain domain_name --source google
Motivation:
The main reason to use this command is that it harvests information quickly from a single, very large index. Google indexes an enormous number of web pages, which makes it a potent source of information about a specific domain. Ethical hackers and penetration testers often begin with Google to get a broad overview of what is publicly accessible about a domain. This step is crucial for understanding how much sensitive information is exposed and may need to be secured.
Explanation:
--domain domain_name: This argument specifies the target domain from which you want to gather information. Replace domain_name with the actual domain of interest.
--source google: This argument tells theHarvester to use Google as the source for collecting data. Google's extensive index makes it a rich source of publicly accessible information for a domain.
Example Output:
Hosts found:
------------------
www.example.com
mail.example.com
Emails found:
------------------
contact@example.com
info@example.com
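Once the output has been saved to a text file, the individual sections can be pulled out for use in other tools. The following is a minimal sketch, assuming the saved output follows the layout shown in the example above (a heading ending in "found:", a dashed rule, then one entry per line). Real output varies between versions and may need a different pattern. The function name section_entries and the file name results.txt are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the entries listed under one section heading.
# $1 = exact heading text (e.g. "Emails found:"); the saved output is read from stdin.
section_entries() {
    awk -v heading="$1" '
        $0 == heading { in_section = 1; next }   # the wanted section starts here
        /found:$/     { in_section = 0 }         # any other heading ends it
        in_section && !/^-+$/ && NF { print }    # skip the dashed rule and blank lines
    ' | LC_ALL=C sort -u                         # de-duplicate in a stable order
}

# Example (not run here):
#   theHarvester --domain example.com --source google > results.txt
#   section_entries "Emails found:" < results.txt
```

Sorting with LC_ALL=C keeps the order identical across machines, which makes later comparisons between runs easier.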
Use Case 2: Gather Information on a Domain Using Multiple Sources
Code:
theHarvester --domain domain_name --source duckduckgo,bing,crtsh
Motivation:
Using multiple sources broadens the scope of data collection. Each search engine and source indexes or prioritizes data differently, so relying on just one can miss valuable information. By tapping into alternative search engines like DuckDuckGo and Bing, in addition to certificate transparency logs via crtsh, a more comprehensive dataset can be gathered. This practice reduces the chance that relevant details slip through the cracks during early discovery.
Explanation:
--domain domain_name: As in the previous example, this specifies the target domain.
--source duckduckgo,bing,crtsh: This argument tells theHarvester to use DuckDuckGo, Bing, and crtsh (Certificate Transparency Log Search) as sources. DuckDuckGo offers privacy-focused search results, Bing offers global search indexing similar to Google, and crtsh provides data from certificate transparency logs that can uncover active domains and subdomains.
Example Output:
Hosts found:
------------------
blog.example.com
dev.example.com
Emails found:
------------------
support@example.com
admin@example.com
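To see what the additional sources actually contributed, the host or email list from a single-source run can be compared with the list from a multi-source run. The sketch below assumes each list has already been written to a plain text file with one entry per line. The function name new_entries and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print entries that appear in the second list but not the first.
# $1 = file with entries from the single-source run
# $2 = file with entries from the multi-source run
new_entries() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
    # comm requires sorted input, so sort both lists the same way first.
    LC_ALL=C sort -u "$1" > "$tmp_a"
    LC_ALL=C sort -u "$2" > "$tmp_b"
    # -13 hides lines unique to the first file and lines common to both,
    # leaving only the lines unique to the second file.
    LC_ALL=C comm -13 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Example (not run here):
#   new_entries hosts_google.txt hosts_multi.txt
```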
Use Case 3: Change the Limit of Results to Work With
Code:
theHarvester --domain domain_name --source google --limit 200
Motivation:
Adjusting the result limit matters when conducting thorough reconnaissance. The default limit may not suit every engagement: too low a value leaves the data incomplete, while too high a value slows the run. Setting the limit explicitly, here to 200, lets penetration testers control how wide a net they cast, which is particularly relevant for large organizations with extensive digital footprints.
Explanation:
--domain domain_name: This argument denotes the target domain.
--source google: Google is the source for this invocation.
--limit 200: This argument caps the number of search results at 200, overriding the tool's default.
Example Output:
Hosts found:
------------------
api.example.com
store.example.com
...
Emails found:
------------------
sales@example.com
hr@example.com
...
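When the limit comes from a script argument or a configuration file, it can be worth checking the value before the tool is launched. The following is a small sketch of such a guard. The function names is_valid_limit and harvest_with_limit are hypothetical, and the wrapper only passes along the options already shown in this use case.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 is a positive whole number.
is_valid_limit() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    esac
    [ "$1" -gt 0 ]                 # reject zero
}

# Hypothetical wrapper: $1 = domain, $2 = result limit.
harvest_with_limit() {
    if ! is_valid_limit "$2"; then
        echo "limit must be a positive integer: $2" >&2
        return 2
    fi
    theHarvester --domain "$1" --source google --limit "$2"
}

# Example (not run here):
#   harvest_with_limit example.com 200
```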
Use Case 4: Save the Output to Two Files in XML and HTML Format
Code:
theHarvester --domain domain_name --source google --file output_file_name
Motivation:
Exporting results is a vital aspect of the penetration testing process. Storing the harvested data in both XML and HTML formats provides flexibility and ease of access for further analysis and reporting. XML is suited for structured data parsing and can be imported into other tools for detailed examination, whereas HTML provides a more human-readable format, useful for presentations and documentation. This dual-format export supports varied workflows within security assessment teams.
Explanation:
--domain domain_name: The specific domain to target.
--source google: Google will again serve as the source of information.
--file output_file_name: This argument instructs theHarvester to save the collected data into files named output_file_name.xml and output_file_name.html, allowing for easy retrieval and use in subsequent processes.
Example Output:
Data successfully saved to output_file_name.xml and output_file_name.html
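In an automated workflow it helps to confirm that the export really produced both files before later steps rely on them. The sketch below assumes the .xml and .html extensions described above. If the installed version writes different formats, the extensions need adjusting. The function names reports_written and harvest_to_files are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if both report files for the given
# base name exist and are non-empty.
# $1 = base name passed to --file (without extension)
reports_written() {
    [ -s "$1.xml" ] && [ -s "$1.html" ]
}

# Hypothetical wrapper: run the export, then verify the result.
# $1 = domain, $2 = base name for the output files.
harvest_to_files() {
    theHarvester --domain "$1" --source google --file "$2" || return 1
    reports_written "$2"
}

# Example (not run here):
#   harvest_to_files example.com output_file_name && echo "reports ready"
```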
Use Case 5: Display Help
Code:
theHarvester --help
Motivation:
Understanding the full capabilities and options of any tool is critical to its effective use. Displaying the help menu is a fundamental step for new users seeking to familiarize themselves with all functionalities theHarvester offers. It provides a comprehensive list of available options, switches, and detailed descriptions, empowering users to tailor their reconnaissance efforts precisely and efficiently according to their specific needs.
Explanation:
--help
: This argument calls for the tool’s help menu, listing all possible commands, options, descriptions, and usage scenarios to aid users in wielding theHarvester effectively.
Example Output:
Usage: theHarvester [OPTIONS] COMMAND [ARGS]...
Options:
-d, --domain TEXT Domain to search
-b, --source TEXT Data source
-l, --limit INTEGER Limit the number of search results
--help Show this message and exit
...
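Because option names can differ between releases, a script can check the help text before relying on a particular flag. The following is a minimal sketch that reads help text from standard input and looks for an option as a whole word. The function name help_mentions is hypothetical, and the -w option is assumed to be available in the local grep (it is in GNU and BSD grep).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the help text on stdin mentions the given
# option as a whole word (so "--lim" does not match "--limit").
# $1 = option to look for, e.g. --limit
help_mentions() {
    # -F: treat the option as a literal string, -w: whole-word match,
    # -e: allows a pattern that starts with a dash.
    grep -q -w -F -e "$1"
}

# Example (not run here):
#   theHarvester --help 2>&1 | help_mentions --limit && echo "--limit is supported"
```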
Conclusion:
theHarvester is a powerful tool for gathering domain-related information. By querying various search engines and exporting results in several formats, it gives security professionals a foundational understanding of what potentially sensitive information a domain may inadvertently expose. With tailored search parameters and output formats, theHarvester fits naturally into the broader strategy of penetration testing and vulnerability assessment.