There are many ways to schedule downloads, and one of the simplest is to use a download scheduler: a program that runs your downloads at a time you choose. That is handy when your ISP limits your data or bandwidth during the day, or when you simply do not want a large download hogging your connection while you work. There are plenty of download schedulers available on the internet, but Ubuntu already ships with everything you need.
In this article we will show you how to use wget, a command line tool built into Ubuntu, to download files from the internet. On top of that, we will show you how to schedule those downloads using Cron.
Download Using Wget
Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely-used Internet protocols. It is a non-interactive command line tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc.
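Because wget is non-interactive, it is easy to run unattended. As a minimal sketch (the log file path here is just an illustration), a shell script could call it like this, using the -o option to write wget's progress messages to a log file since no terminal may be attached:
[code]
#!/bin/sh
# Download the wget manual unattended; -o sends wget's progress log to a file.
wget -o /tmp/wget-download.log http://www.gnu.org/software/wget/manual/wget.pdf
[/code]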
Open your terminal and let’s explore how we can use wget to download files from the net. The basic syntax for downloading with wget is the following:
[code] wget [option]… [URL]… [/code]
This command will download the wget manual to your local drive:
[code] wget http://www.gnu.org/software/wget/manual/wget.pdf [/code]
Linux Cron
Ubuntu comes with a cron daemon used for scheduling tasks to be executed at a certain time. Crontab allows you to specify actions and times that they should be executed. This is how you would normally schedule a task using the command line tool.
Open a terminal window and enter crontab -e.
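To see what is already scheduled before you edit anything, crontab -l prints the current table without opening an editor:
[code] crontab -l [/code]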
Each field in a crontab entry is separated by a space, with the final field (the command to run) allowed to contain spaces of its own. A cron entry consists of minute (0-59), hour (0-23, 0 = midnight), day of month (1-31), month (1-12), and weekday (0-6, 0 = Sunday), followed by the command. For example, an entry that downloads wget.pdf at 2 AM would put 0 in the minute field and 2 in the hour field (meaning 2:00), leave the three remaining time fields as asterisks (any day, month, or weekday), and end with the wget command to download wget.pdf from the specified URL.
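Put together, such an entry looks like this:
[code] 0 2 * * * wget http://www.gnu.org/software/wget/manual/wget.pdf [/code]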
Those are the basics of wget and how Cron works. Let’s take a look at a real life example of how to schedule a download.
Scheduling a Download
We are going to download Firefox 3.6 at 2 AM. Since our ISP only gives us a limited amount of data, we need to stop the download at 8 AM. The setup only needs two crontab entries. The first sets up a task that will download Firefox at 2 AM:
[code] 0 2 * * * wget -c "http://download.mozilla.org/?product=firefox-3.6.6&os=win&lang=en-GB" [/code]
The -c option tells wget to resume the existing download if it has not been completed. The URL is quoted so that the & characters are not interpreted by the shell.
The second entry will stop wget at 8 AM. ‘killall’ is a Unix command that kills processes by name:
[code] 0 8 * * * killall wget [/code]
The killall wget entry tells Ubuntu to stop wget from downloading the file at 8 AM.
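Putting both entries together, the crontab section for this schedule looks like this:
[code]
0 2 * * * wget -c "http://download.mozilla.org/?product=firefox-3.6.6&os=win&lang=en-GB"
0 8 * * * killall wget
[/code]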
Other Useful Wget Commands
- Specifying the directory and file name for a download
[code] wget --output-document="/home/zainul/Downloads/wget manual.pdf" http://www.gnu.org/software/wget/manual/wget.pdf [/code]
The --output-document option lets you specify the directory and the name of the file that you download. Since the file name here contains a space, it needs to be quoted.
- Downloading a website
Wget is also capable of downloading an entire website.
[code] wget -m http://www.google.com/profiles/zainul.franciscus [/code]
The above command will download my entire Google profile page. The ‘-m’ option tells wget to download a ‘mirror’ image of the specified URL; it turns on recursion and timestamping, with infinite recursion depth.
Another important option tells wget how many levels of links it should follow when it downloads a website.
[code] wget -r -l1 http://www.google.com/profiles/zainul.franciscus [/code]
The above wget command uses two options. The first option ‘-r’ tells wget to download the specified website recursively. The second option ‘-l1’ tells wget to only fetch the first level of links from the specified website. Deeper levels can be fetched with ‘-l2’, ‘-l3’, and so on.
- Ignoring robots.txt
Webmasters maintain a text file called robots.txt, which lists the URLs that a web page crawler such as wget should not crawl. We can tell wget to ignore robots.txt with the ‘-erobots=off’ option. The following command tells wget to download the first page of my Google profile and ignore robots.txt:
[code] wget -erobots=off http://www.google.com/profiles/zainul.franciscus [/code]
Another useful option is -U, which sets the user agent string that wget sends, so it can identify itself as a browser. Take note that masquerading one application as another may violate the terms of service of a web service provider.
[code] wget -erobots=off -U Mozilla http://www.google.com/profiles/zainul.franciscus [/code]
Conclusion
Wget is a very old school yet hackable GNU software package that we can use to download files. Since wget is a non-interactive command line tool, we can let it run in the background without having to keep an application open. Check out the wget man page
[code] $ man wget [/code]
to understand other options that we can use with wget.
Links
- Wget Manual
- How to Combine Two Downloaded Files When wget Fails Halfway Through
- Linux QuickTip: Downloading and Un-tarring in One Step