How to use the command 'cewl' (with examples)
- Linux
- December 17, 2024
Cewl is a Ruby-based tool that harvests words from a website and turns them into custom wordlists for password cracking and security assessments. It "spiders" the site, following links from a starting URL and collecting the words it finds along the way. This is useful in security auditing and penetration testing, especially when a target uses distinctive keywords that standard wordlists do not contain. The tool provides options to tailor the wordlist, such as setting the crawl depth, including words with numbers, setting a minimum word length, or routing requests through a proxy.
Create a Wordlist File from a Given URL up to 2 Links Depth
Code:
cewl --depth 2 --write path/to/wordlist.txt url
Motivation:
In security assessments, especially for web applications, it helps to gather as many site-specific keywords as possible, because users often build passwords from terms that appear in an organization's own content. By setting the link depth to 2, Cewl crawls not just the specified URL but also follows links found on that first page and on the pages it reaches from there, which yields a richer collection of words while keeping the scope controlled. This lets penetration testers build targeted wordlists from the actual content of a website.
Explanation:
--depth 2: Sets the level of link depth, meaning the tool will visit links on the initial page and those on the pages it visits next. This level of depth provides a reasonable balance between coverage and resource use.
--write path/to/wordlist.txt: Directs Cewl to output the discovered words into a specified file. Having a text file of potential keywords improves the ease of use for subsequent tests or cracking attempts.
url: The base webpage from which the spidering begins. This URL is the starting point for wordlist generation and collection.
Example Output:
Generated wordlist saved to path/to/wordlist.txt with words like "login", "account", "secure", appearing based on the website's content hierarchy up to 2 links deep.
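When several sites are in scope, it can help to keep one wordlist per host. The following is a minimal sketch, not part of Cewl itself: the helper names `wordlist_name` and `crawl`, the `DRY_RUN` variable, and the example URLs are all assumptions made for illustration. By default it only prints the command it would run.

```shell
#!/bin/sh
# Derive an output filename from the host part of a URL (hypothetical helper).
wordlist_name() {
    host=${1#*://}      # drop the scheme, e.g. "https://"
    host=${host%%/*}    # drop any path after the host
    printf '%s.txt\n' "$host"
}

# Print (or run) the crawl command from this section for one URL.
# DRY_RUN is an assumed variable; it defaults to 1 so nothing is crawled.
crawl() {
    out=$(wordlist_name "$1")
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "cewl --depth 2 --write $out $1"
    else
        cewl --depth 2 --write "$out" "$1"
    fi
}

# Placeholder URLs; replace with targets you are authorized to test.
for url in https://example.com https://example.org/blog; do
    crawl "$url"
done
```

Setting `DRY_RUN=0` before running the script would execute the real crawls instead of printing them.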
Output an Alphanumeric Wordlist from the Given URL with Words of Minimum 5 Characters
Code:
cewl --with-numbers --min_word_length 5 url
Motivation:
A raw crawl often produces a large number of short or trivial words that add noise to a wordlist. During penetration testing, keeping words that contain numbers and dropping words below a length threshold improves efficiency by discarding entries that are unlikely to be useful. This is particularly helpful when cracking passwords or testing systems where a password policy rules out short, simple words.
Explanation:
--with-numbers: Instructs Cewl to accept words that contain digits as well as letters, hence capturing words like 'pass123' which might otherwise be split or omitted.
--min_word_length 5: Tells the tool to discard any words shorter than 5 characters. This filters out most insignificant words, leaving a cleaner and potentially more useful wordlist.
url: The target website to crawl for content. This is where Cewl starts its spidering.
Example Output:
Extracted alphanumeric list with minimum 5 characters: 'admin123', 'securelist', 'alpha5omega'.
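To see why these two options matter, the effect can be approximated on a line of sample text with standard tools. This is only a rough sketch of the idea, not Cewl's actual tokenizer: the sample sentence and the function names `letters_only` and `with_numbers` are made up for illustration.

```shell
#!/bin/sh
# Sample text standing in for page content (made up for illustration).
sample='Login to admin123 with the pass or alpha5omega key'

# Letters-only tokens of at least 5 characters: digits act as separators,
# so "admin123" is cut down to "admin".
letters_only() {
    printf '%s\n' "$1" | LC_ALL=C grep -oE '[A-Za-z]+' | awk 'length($0) >= 5'
}

# Letters-and-digits tokens of at least 5 characters: "admin123" survives whole.
with_numbers() {
    printf '%s\n' "$1" | LC_ALL=C grep -oE '[A-Za-z0-9]+' | awk 'length($0) >= 5'
}

echo "letters only:"; letters_only "$sample"   # Login admin alpha omega
echo "with numbers:"; with_numbers "$sample"   # Login admin123 alpha5omega
```

The second list keeps 'admin123' and 'alpha5omega' intact, which is the behavior the explanation above attributes to --with-numbers.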
Output a Wordlist from the Given URL in Debug Mode Including Email Addresses
Code:
cewl --debug --email url
Motivation:
Enabling debug mode gives a more transparent view of Cewl's inner workings, which is useful when troubleshooting a complex crawl or working against an unusual web environment. When gathering email addresses during a security check, debug output helps confirm that the tool is operating correctly, especially since the way email addresses are presented can vary significantly from site to site.
Explanation:
--debug: Activates detailed runtime output, giving deeper insights than usual. This is useful for developers or testers who need to understand or resolve unexpected tool behavior.
--email: Tells Cewl to seek out and collect email addresses on the site. These can matter in assessments where email is part of a social engineering vector.
url: The destination site for extracting words and emails. This website is the central focus of Cewl's crawling activity.
Example Output:
Alongside the wordlist and the extra debug information, the run reports the email addresses it found, for example contact@example.com and admin@organization.org.
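The idea behind email harvesting can be sketched with a simple pattern match over text. This is a simplified illustration, not the matcher Cewl uses: the regular expression deliberately covers only common address shapes, and the function name `extract_emails` and the sample lines are assumptions.

```shell
#!/bin/sh
# Pull email-like strings out of text on stdin and de-duplicate them.
# The pattern is a simplification; real-world addresses can be more varied.
extract_emails() {
    LC_ALL=C grep -oE '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' | sort -u
}

# Made-up page text containing a repeated address.
printf '%s\n' \
    'Contact contact@example.com or admin@organization.org.' \
    'Again: contact@example.com' | extract_emails
```

A site that obfuscates addresses (for example "name at example dot com") would defeat a pattern like this, which is one reason the way a site presents addresses affects what gets collected.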
Output a Wordlist from the Given URL Using HTTP Basic or Digest Authentication
Code:
cewl --auth_type basic --auth_user username --auth_pass password url
Motivation:
Many web environments require authentication to access internal or sensitive pages where valuable content may reside. For penetration testers or cybersecurity specialists, the ability to authenticate using basic or digest methods allows Cewl to generate wordlists from authenticated pages, greatly expanding the range of harvested information.
Explanation:
--auth_type basic|digest: Specifies the authentication type. Basic or Digest should be aligned with what the target site implements for access.
--auth_user username: The username credential to authenticate and allow Cewl's access to restricted areas.
--auth_pass password: Works alongside the username to authenticate. The password facilitates access for the wordlist generation process.
url: The target site which has secured sections requiring authentication to explore.
Example Output:
With valid basic auth credentials, the wordlist can include words from protected pages, such as "confidential" and "internalsecure".
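For context on what the basic auth type involves: HTTP Basic authentication sends the username and password joined by a colon and Base64-encoded in an Authorization header. The sketch below builds that header by hand to show the format; the function name `basic_auth_header` is hypothetical, and the credentials are the placeholders from the command above.

```shell
#!/bin/sh
# Build the header value that HTTP Basic authentication sends.
# Base64 is an encoding, not encryption, so the credentials are
# readable by anyone who can see the request unless HTTPS is used.
basic_auth_header() {
    printf 'Authorization: Basic %s\n' "$(printf '%s' "$1:$2" | base64)"
}

basic_auth_header username password
# prints: Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
```

Digest authentication avoids sending the password in this reversible form, which is why the auth type has to match what the server expects.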
Output a Wordlist from the Given URL Through a Proxy
Code:
cewl --proxy_host host --proxy_port port url
Motivation:
Using a proxy is a common way to mask the origin of a crawl, route traffic through an intercepting tool for inspection, or get around regional restrictions. When conducting penetration tests, a proxy can keep the tester's own address out of the target's logs, and Cewl supports this by sending its requests through the proxy host and port you specify.
Explanation:
--proxy_host host: The hostname or IP address of the proxy server through which Cewl connects to the target site.
--proxy_port port: The port number on which the proxy server listens.
url: The site to crawl, reached through the proxy.
Example Output:
With a proxy listening at, say, 192.168.1.1:8080, the crawl runs through it and produces the same kind of wordlist as a direct run, with words drawn from the site's content.
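A mistyped port is an easy way to end up with a crawl that fails or silently goes nowhere, so a small check before launching can help. This is a sketch under stated assumptions: the helper names `valid_port` and `proxy_cmd` are hypothetical, and the function only prints the command rather than running it.

```shell
#!/bin/sh
# Succeed only if the argument is a whole number between 1 and 65535.
valid_port() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty or contains a non-digit
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}

# Print the proxy command from this section after checking the port.
proxy_cmd() {
    valid_port "$2" || { echo "invalid proxy port: $2" >&2; return 1; }
    echo "cewl --proxy_host $1 --proxy_port $2 $3"
}

# Placeholder proxy address and URL.
proxy_cmd 192.168.1.1 8080 https://example.com
```

Replacing the final `echo` with a direct call to cewl would turn the sketch into a thin wrapper around the real command.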
Conclusion:
Cewl is a versatile utility for penetration testers and security analysts. Each use case above builds on its core ability to produce site-specific wordlists: controlling crawl depth, filtering by word length and digits, collecting email addresses, reaching pages behind basic or digest authentication, and routing requests through a proxy. Combined as needed, these options give tailored wordlists for both routine assessments and more complex audits.