How to use the command 'gau' (with examples)
The ‘gau’ (Get All URLs) command retrieves known URLs for a domain from AlienVault’s Open Threat Exchange (OTX), the Wayback Machine, Common Crawl, and URLScan. Because it queries URLs these services have already recorded instead of crawling the live site, security analysts and researchers can use it to gather URL data for cybersecurity investigations, web archiving, and crawling. Below, we explore various use cases for the ‘gau’ command.
Use case 1: Fetch all URLs of a domain from AlienVault, the Wayback Machine, Common Crawl, and URLScan
Code:
gau example.com
Motivation:
When conducting a security audit or investigation for a specific domain, retrieving a comprehensive list of URLs is often the first step. This command allows security professionals and researchers to fetch known URLs quickly, enhancing their understanding of the domain’s digital footprint and history.
Explanation:
gau: Invokes the command.
example.com: Specifies the domain for which URL data is to be retrieved. In practice, replace it with the target domain of interest.
Example Output:
A list of URLs associated with example.com, each sourced from the aforementioned providers.
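gau prints one URL per line to standard output, so its results can be piped into standard Unix tools. A minimal sketch, assuming you want a sorted list with every URL appearing once; the function name `unique_urls` is hypothetical and not part of gau.

```shell
# unique_urls: hypothetical helper, not part of gau.
# Reads URLs on stdin, drops blank lines, and prints each distinct URL once, sorted.
unique_urls() {
  grep -v '^[[:space:]]*$' | sort -u
}

# Usage (requires gau on PATH):
#   gau example.com | unique_urls > urls.txt
```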
Use case 2: Fetch URLs of multiple domains
Code:
gau domain1 domain2 ...
Motivation:
In many scenarios, especially when dealing with a large organization or related group of domains, gathering URL data for multiple domains simultaneously can be vital. This is useful for comparing domain activities, cross-validating threat intelligence data, and gaining insights into interconnected domain infrastructures.
Explanation:
gau: Base command to initiate the URL gathering.
domain1 domain2 ...: A space-separated list of domains for which URL information is sought.
Example Output:
A combined list of URLs covering all the input domains, one URL per line.
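To see how the gathered URLs are distributed across hosts, the output can be summarized per host. A minimal sketch, assuming plain URLs of the form scheme://host/path on stdin; the helper `urls_per_host` is hypothetical, and any explicit port (host:8080) is counted as part of the host.

```shell
# urls_per_host: hypothetical helper, not part of gau.
# Reads URLs on stdin and prints "<count> <host>" per host, most frequent first.
# With "/" as the field separator, https://sub.example.com/path splits into
# "https:", "", "sub.example.com", "path", so the host is field 3.
urls_per_host() {
  awk -F/ 'NF >= 3 && $3 != "" { print $3 }' |
    sort | uniq -c | sort -rn |
    awk '{ print $1, $2 }'   # normalize uniq -c padding to "<count> <host>"
}

# Usage (requires gau on PATH):
#   gau domain1 domain2 | urls_per_host
```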
Use case 3: Fetch all URLs of several domains from an input file, running multiple threads
Code:
gau --threads 4 < path/to/domains.txt
Motivation:
When working with a large number of domains, entering each one by hand is inefficient. Reading them from an input file automates the process, and running several threads lets gau fetch results concurrently, which shortens the overall retrieval time for time-sensitive investigations.
Explanation:
--threads 4: Runs four threads in parallel, speeding up the retrieval process.
< path/to/domains.txt: Shell input redirection that feeds gau a text file containing a newline-separated list of domains to be processed.
Example Output:
A combined list of URLs for every domain listed in the input file.
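Since gau reads the domains from standard input, it can help to clean the input file first. A minimal sketch, assuming the file may contain "#" comments, stray whitespace, mixed case, and duplicates; the helper `clean_domains` and the output file name are hypothetical.

```shell
# clean_domains: hypothetical helper, not part of gau.
# Strips "#" comments, leading/trailing whitespace and blank lines,
# lowercases each domain, and removes duplicates.
clean_domains() {
  sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" |
    grep -v '^$' |
    tr '[:upper:]' '[:lower:]' |
    sort -u
}

# Usage (requires gau on PATH):
#   clean_domains path/to/domains.txt > path/to/domains.clean.txt
#   gau --threads 4 < path/to/domains.clean.txt
```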
Use case 4: Write output results to a file
Code:
gau example.com --o path/to/found_urls.txt
Motivation:
For documentation, analysis, or further processing, having a persistent file of retrieved URLs is invaluable. This functionality supports audit trails, data sharing across teams, and archival purposes.
Explanation:
example.com: The target domain for which URLs are being fetched.
--o path/to/found_urls.txt: Specifies the destination file path where the retrieved URLs should be saved.
Example Output:
URLs corresponding to example.com, saved into the specified text file, found_urls.txt, for future reference.
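Saved result files also make it easy to see what changed between two runs, which supports the audit-trail use mentioned above. A minimal sketch, assuming two plain-text result files with one URL per line; the helper `new_urls` and the older file name are hypothetical.

```shell
# new_urls: hypothetical helper, not part of gau.
# Prints the URLs present in the newer file ($2) but absent from the older one ($1).
new_urls() {
  old_sorted=$(mktemp)
  new_sorted=$(mktemp)
  sort -u "$1" > "$old_sorted"   # comm needs both inputs sorted the same way
  sort -u "$2" > "$new_sorted"
  comm -13 "$old_sorted" "$new_sorted"   # -1/-3 hide old-only and shared lines
  rm -f "$old_sorted" "$new_sorted"
}

# Usage:
#   new_urls path/to/found_urls_old.txt path/to/found_urls.txt
```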
Use case 5: Search for URLs from only one specific provider
Code:
gau --providers wayback|commoncrawl|otx|urlscan example.com
Motivation:
In certain situations, URLs from a specific source may be more trustworthy, more relevant, or better matched to the criteria set by investigators. Targeting a single provider lets you work with data whose characteristics you already know, or compare what each source contributes on its own.
Explanation:
--providers wayback|commoncrawl|otx|urlscan: Restricts the data retrieval to a single provider: the Wayback Machine, Common Crawl, AlienVault’s Open Threat Exchange, or URLScan. The | separates the alternatives; pass exactly one name (for example, --providers wayback), since a literal | would be interpreted by the shell as a pipe.
example.com: Domain for which URLs are gathered from the chosen provider.
Example Output:
URLs exclusively from the selected provider associated with the domain example.com.
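When the provider name comes from a variable or script argument, a typo can be caught before gau runs. A minimal sketch that accepts only the four provider names used in this article; the helper `is_valid_provider` is hypothetical, and if your gau version supports other providers the pattern would need extending.

```shell
# is_valid_provider: hypothetical helper, not part of gau.
# Succeeds only for the provider names listed in this article.
is_valid_provider() {
  case "$1" in
    wayback|commoncrawl|otx|urlscan) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (requires gau on PATH):
#   provider=wayback
#   is_valid_provider "$provider" && gau --providers "$provider" example.com
```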
Use case 6: Search for URLs from multiple providers
Code:
gau --providers wayback,otx,... example.com
Motivation:
To obtain a diverse set of URLs from various trusted sources, users might need information collected from several providers. This approach provides a well-rounded dataset that supports cross-verification and comprehensive analysis.
Explanation:
--providers wayback,otx,...: Specifies multiple providers from which to pull URLs, separated by commas without spaces (a space would make the shell pass the rest as a separate argument).
example.com: The domain to target for URL extraction across the indicated providers.
Example Output:
A diverse list of URLs derived from the specified multiple providers for the input domain.
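In scripts, the comma-separated value for --providers can be built from a list of names. A minimal sketch; the helper `join_providers` is hypothetical and only joins its arguments, so it does not check that each name is a valid provider.

```shell
# join_providers: hypothetical helper, not part of gau.
# Joins its arguments with commas, e.g. "wayback otx" -> "wayback,otx".
# The function body runs in a subshell ( ... ), so changing IFS does not
# affect the calling shell; "$*" joins arguments with the first IFS character.
join_providers() (
  IFS=,
  printf '%s\n' "$*"
)

# Usage (requires gau on PATH):
#   gau --providers "$(join_providers wayback otx)" example.com
```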
Use case 7: Search for URLs within a specific date range
Code:
gau --from YYYYMM --to YYYYMM example.com
Motivation:
Some projects necessitate information within a certain time window, such as event-based investigations, historical data analysis, and regulatory compliance checks. This use case enables targeted data retrieval based on temporal constraints.
Explanation:
--from YYYYMM: Sets the beginning of the search window, written as a four-digit year followed by a two-digit month (for example, 202301).
--to YYYYMM: Sets the end of the search window, in the same format.
example.com: The domain whose URLs are to be extracted within the specified time range.
Example Output:
A list of URLs associated with example.com, filtered to the specified date range.
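Because YYYYMM values sort correctly as plain integers (202312 comes before 202401), a date range can be sanity-checked before it is passed to gau. A minimal sketch; the helper `valid_range` is hypothetical and checks only the format, the month bounds, and that the start is not after the end.

```shell
# valid_range: hypothetical helper, not part of gau.
# Succeeds when both arguments are six-digit YYYYMM values with a month
# from 01 to 12 and the start ($1) is not later than the end ($2).
valid_range() {
  for ym in "$1" "$2"; do
    case "$ym" in
      [0-9][0-9][0-9][0-9]0[1-9]|[0-9][0-9][0-9][0-9]1[0-2]) ;;
      *) return 1 ;;
    esac
  done
  [ "$1" -le "$2" ]   # numeric comparison works for same-width YYYYMM values
}

# Usage (requires gau on PATH):
#   valid_range 202301 202312 && gau --from 202301 --to 202312 example.com
```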
Conclusion:
The ‘gau’ command offers a robust and versatile set of functionalities for fetching URLs from a variety of sources. Whether used for a single domain, multiple domains, or specific providers, ‘gau’ equips cybersecurity professionals with essential data, facilitating informed decision-making and analytic processes. By leveraging each use case effectively, users can tailor their URL gathering strategies to meet their distinct investigative or research needs.