Introduction to Data Acquisition
Wget is short for World Wide Web get and is used on the command line to download a file from a website or webserver.
Learning Objective
Upon completion of this section the learner will be able to:
- Use wget to download a file
- Download multiple files using regular expressions
- Download an entire website
Here is a generic example of how to use wget to download a file.
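A minimal sketch of the generic form follows. The URL and the file name `renamed.txt` are hypothetical placeholders, not addresses from this tutorial, and the two small wrapper functions exist only so that each command can be read and tried on its own.

```shell
# Hypothetical URL, used only for illustration; substitute your own.
url="https://example.com/data/sample.txt"

# Download one file into the current directory. wget names the local
# copy after the last part of the URL (here, sample.txt).
download_file() {
    wget "$1"
}

# Same download, saved under a name of your choosing
# (-O sets the output file).
download_file_as() {
    wget -O "$2" "$1"
}

# Example calls (commented out so nothing is fetched by accident):
# download_file "$url"
# download_file_as "$url" renamed.txt
```

In everyday use you would normally type `wget` followed by the URL directly at the prompt; the functions only wrap that one-line command.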
Here are a couple of specific examples:
- Photo of a kitten in Rizal Park
- Photo of Arabidopsis
Sometimes you need to download an entire directory of files, and doing that with wget is not straightforward.
wget for multiple files and directories
There are two options. You can either specify a regular expression for the file names, or put a regular expression in the URL itself. The first option is useful when a directory contains a large number of files but you only want those of a specific format (e.g., fasta).
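The first option might look like the sketch below. The directory URL is a hypothetical placeholder. Note that wget's `-A` (accept) flag takes shell-style wildcards or file suffixes rather than full regular expressions; wget also offers `--accept-regex`, which matches a regular expression against the complete URL.

```shell
# Hypothetical directory URL, for illustration only.
dir_url="https://example.com/genomes/"

# Fetch only the fasta files from a remote directory:
#   -r           follow links recursively
#   --no-parent  never climb above the starting directory
#   -nd          save everything into the current directory (no subfolders)
#   -A           accept only files matching this pattern
download_fasta() {
    wget -r --no-parent -nd -A '*.fasta' "$1"
}

# Example call (commented out so nothing is fetched by accident):
# download_fasta "$dir_url"
```

The pattern is in single quotes so that the local shell does not expand `*.fasta` against files in your own directory before wget sees it.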
The second option is useful if you have numerous files that share the same name but sit in different directories.
The files will not be overwritten, even though they all have the same name; they are saved as-is, preserving the directory structure.
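One way to get this behaviour is sketched below. It is an assumption about how the idea could be expressed, not the original command: instead of a pattern inside the URL, it uses a plain recursive download restricted by file name. The URL and the file name `genome.fa.gz` are hypothetical.

```shell
# Hypothetical base URL, for illustration only.
base_url="https://example.com/species/"

# Recursively fetch every file whose name ends in genome.fa.gz.
# Because -nd is NOT given, wget recreates the remote directory tree
# locally (starting with a folder named after the host), so files
# with identical names land in different folders and do not clash.
download_keep_dirs() {
    wget -r --no-parent -A 'genome.fa.gz' "$1"
}

# Example call (commented out so nothing is fetched by accident):
# download_keep_dirs "$base_url"
```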
Sometimes, if you have a series of numbered files to download, you can use UNIX shell brace expansion.
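For a numbered series, one common approach is brace expansion, sketched here with a hypothetical base URL and hypothetical file names. The expansion is done by the shell, not by wget, so wget simply receives a list of URLs.

```shell
# Hypothetical base URL and file names, for illustration only.
base="https://example.com/data"

# bash, zsh and ksh expand file_{1..5}.txt into
# file_1.txt file_2.txt ... file_5.txt before wget starts.
# A strictly POSIX sh passes the braces through unchanged.
download_series() {
    wget "$base"/file_{1..5}.txt
}

# Example call (commented out so nothing is fetched by accident):
# download_series
```

Running `echo file_{1..5}.txt` first is a quick way to check what your shell will hand to wget.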
To archive the entire website (yes, every single file of that domain), you can use the mirror option.
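A minimal sketch of a mirror command is shown below. The site URL and the local folder name are hypothetical; mirroring can generate a lot of traffic, so it is worth checking that the site owner permits it.

```shell
# Hypothetical site and target folder, for illustration only.
site_url="https://example.com/"
local_dir="./example-mirror"

# Archive a whole site:
#   --mirror         recursive download with timestamping, unlimited depth
#   -p               also fetch images, CSS and other page requisites
#   --convert-links  rewrite links so the local copy can be browsed offline
#   -P               folder in which to store the copy
mirror_site() {
    wget --mirror -p --convert-links -P "$2" "$1"
}

# Example call (commented out so nothing is fetched by accident):
# mirror_site "$site_url" "$local_dir"
```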
Other options to consider
Option | What it does | Use case |
---|---|---|
`--limit-rate=20k` | Limits speed to 20 KiB/s | Limit the data rate to avoid impacting other users accessing the server. |
`--spider` | Checks if a file exists | For when you don’t want to save a file but just want to know if it still exists. |
`-w` | Waits a number of seconds | After this flag, add a number of seconds to wait between each request, again to avoid overloading a server. |
`--user=` | Sets the username | wget will attempt to log in using the username provided. |
`--password=` | Uses a password | wget will use this password with your username to authenticate. |
`--ftp-user=` or `--ftp-password=` | FTP credentials | Just like the previous settings, wget can log in to an FTP server to retrieve files. |
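These options can be combined with the commands shown earlier. The sketch below uses wget's `--spider`, `--limit-rate` and `--wait` flags with a hypothetical URL; the two-second wait is an arbitrary illustrative value.

```shell
# Hypothetical URL, for illustration only.
url="https://example.com/data/"

# Check that a URL still exists without saving anything.
check_exists() {
    wget --spider "$1"
}

# Recursive download that is gentle on the server:
# at most 20 KiB/s, with a 2-second pause between requests.
polite_download() {
    wget --limit-rate=20k --wait=2 -r --no-parent "$1"
}

# Example calls (commented out so nothing is fetched by accident):
# check_exists "$url"
# polite_download "$url"
```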