How to use the command 'wget2' (with examples)

wget2 is a powerful command-line utility for downloading files from the internet. As the successor to the traditional wget, it retains support for HTTP and HTTPS and adds HTTP/2, offering markedly improved performance. By default it downloads with multiple threads, retrieving web content more quickly and making it an efficient tool for programmers and web developers who need robust downloads from a variety of web sources.

Use case 1: Download the contents of a URL using multiple threads

Code:

wget2 https://example.com/foo

Motivation:

The primary draw of wget2 over its predecessor is its use of multiple threads by default. This feature speeds up the downloading process significantly. When working with large files or when downloading from servers with high bandwidth capabilities, using multiple threads can make the process several times faster than downloading with a single connection.

Explanation:

  • wget2: This is the command itself, initiating the download action.
  • https://example.com/foo: This specifies the URL from which to fetch the content. The URL can point to any downloadable resource.

Example Output:

Saving to: ‘foo’
Downloaded: 100%[=====================================>]    5.00M  1.25MB/s  in 4s

Use case 2: Limit the number of threads used for downloading

Code:

wget2 --max-threads=10 https://example.com/foo

Motivation:

While wget2 employs multiple threads by default to accelerate downloads, there are times when you might want to limit the number of threads. This is useful in cases where server load is a concern, or when your local network resources are limited. By controlling the number of threads, you can manage bandwidth usage more effectively and prevent overloading the server.

Explanation:

  • --max-threads=10: This flag specifies the maximum number of threads to use while downloading. By adjusting this number, users can throttle the degree of parallelism according to their needs or constraints.
  • https://example.com/foo: The web resource to be downloaded.

Example Output:

Connecting to example.com (...):443... connected.
HTTP request sent, awaiting response... 200 OK
Thread 1 started; Thread 2 started; ... [total 10 threads active]

Use case 3: Download a single web page and all its resources

Code:

wget2 --page-requisites --convert-links https://example.com/somepage.html

Motivation:

When a complete offline version of a webpage is needed, it is essential to download not only the HTML file but also all ancillary resources like images, CSS files, and JavaScript. This ensures that when the HTML file is viewed offline, it appears as it would online. This is highly beneficial when presenting web designs or analyzing existing sites without needing internet access.

Explanation:

  • --page-requisites: This option specifies that all the elements required to display the page properly should be downloaded.
  • --convert-links: Converts the links in the document to make them suitable for local viewing, ensuring the offline version of the page is functional.
  • https://example.com/somepage.html: The URL of the target webpage to be downloaded.

Example Output:

Loading web page and requisites: /somepage.html
Saving: stylesheet.css, script.js, image.jpg,... 
Total files downloaded: 10

Use case 4: Mirror a website, but do not ascend to the parent directory

Code:

wget2 --mirror --no-parent https://example.com/somepath/

Motivation:

Mirroring a website using wget2 is an efficient way of downloading all content available in a directory structure on a website. The --no-parent argument is critical in preventing the retrieval process from moving up to parent directories, ensuring downloads are constrained only to the specified subdirectory. This is practical for backing up web resources within a specific site path without delving into unrelated content.

Explanation:

  • --mirror: Facilitates the download of the entire website or directory specified.
  • --no-parent: Prevents the download process from ascending beyond the specified directory path.
  • https://example.com/somepath/: The base directory in the website intended for mirroring.

Example Output:

Mirroring the directory: somepath/
Retrieved directories: 3, Retrieved files: 48
Total download completed: 20MB

Use case 5: Limit the download speed and number of connection retries

Code:

wget2 --limit-rate=300k --tries=100 https://example.com/somepath/

Motivation:

Controlling download speed and retries can dramatically impact download success rates, especially over unstable connections. By capping download speed, network performance is safeguarded, particularly when concurrent downloads are running. The retry limit increases the likelihood of completing the download in cases of intermittent connectivity.

Explanation:

  • --limit-rate=300k: This option throttles the download speed, setting a maximum rate of 300 kilobytes per second.
  • --tries=100: Retries the download up to 100 times before giving up, which helps ride out temporary server or network issues.
  • https://example.com/somepath/: The URL path of the files or resources intended for download.

Example Output:

Download speed limited to 300KB/s
Attempting connection: 1...5...10... 
Download successful after 10 retries

Use case 6: Continue an incomplete download

Code:

wget2 --continue https://example.com

Motivation:

For users downloading large files or operating over unstable internet connections, downloads might terminate unexpectedly. Utilizing the --continue option enables a seamless resumption where the download was interrupted, preventing data redundancy and saving bandwidth and time.

Explanation:

  • --continue: Instructs wget2 to resume a partially downloaded file instead of starting over.
  • https://example.com: The target URL for the file to continue downloading.

Example Output:

Continuing download from offset 325MB
Progress: 70% [==============================>           ]...
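Under the hood, --continue works from whatever partial file already exists: wget2, like classic wget, looks at the size of the local file and asks the server for only the remaining bytes via an HTTP Range request. A minimal sketch of that precondition, using a hypothetical file name:

```shell
# Simulate a partially downloaded file (hypothetical name 'large.iso'):
# 1 MiB is already on disk from an interrupted transfer.
head -c 1048576 /dev/zero > large.iso
wc -c < large.iso
# With the partial file present, resuming would roughly translate to
# sending 'Range: bytes=1048576-' to the server:
#   wget2 --continue https://example.com/large.iso
```

Note that the server must support range requests for resumption to work; otherwise the file is fetched again from the start.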

Use case 7: Download all URLs stored in a text file to a specific directory

Code:

wget2 --directory-prefix path/to/directory --input-file URLs.txt

Motivation:

Downloading multiple files listed in a text document saves effort and time, avoiding the need to manually initiate each download. Additionally, by setting a directory prefix, all downloaded files are organized neatly within a specified folder, aiding file management on local storage.

Explanation:

  • --directory-prefix path/to/directory: Specifies the directory path where downloaded files will be stored.
  • --input-file URLs.txt: Informs the utility to read a list of URLs from the specified text file, downloading each sequentially.
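The input file is just a plain-text list, one URL per line. A quick sketch of creating one (the URLs are placeholders):

```shell
# Write two placeholder URLs to URLs.txt, one per line --
# the format --input-file expects.
printf '%s\n' \
  'https://example.com/file1.iso' \
  'https://example.com/file2.iso' > URLs.txt
cat URLs.txt
```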

Example Output:

Reading URLs from: URLs.txt
Downloading http://site1.com/file1
... (additional files)
Files successfully saved to path/to/directory

Use case 8: Download a file from an HTTP server using Basic Auth

Code:

wget2 --user=username --password=password https://example.com

Motivation:

Accessing restricted content requires supplying credentials during the download. Passing Basic Auth options directly to wget2 streamlines access to protected files with no interactive login prompt, which is especially useful in scripts and automated workflows.

Explanation:

  • --user=username: Supplies the username required for authentication.
  • --password=password: Supplies the corresponding password.
  • https://example.com: The URL pointing to the protected resource.
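Be aware that credentials given on the command line can leak via shell history and the process list. As an alternative, wget2, like wget, can read credentials from a ~/.netrc file; a sketch of the entry format, with placeholder values:

```
# ~/.netrc -- restrict permissions with: chmod 600 ~/.netrc
machine example.com
login username
password password
```

If your build supports it, --ask-password prompts for the password interactively instead, which also keeps it out of shell history.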

Example Output:

Authenticating as username...
Access granted, downloading file
Download complete

Conclusion:

wget2 enhances the traditional capabilities of wget by adding features like multithreading and HTTP/2 support, increasing performance considerably. Each use case highlights the command’s versatility across different downloading scenarios, making it an invaluable tool for anyone needing to manage and expedite downloads from the web.
