One of the most important aspects of mirroring information from the Internet is updating your archives.
Downloading the whole archive again and again just to replace a few changed files is expensive, both in wasted bandwidth and money and in the time the update takes. This is why all mirroring tools offer the option of incremental updating.
Such an updating mechanism means that the remote server is scanned in search of new files. Only those new files will be downloaded in the place of the old ones.
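A file is considered new if one of these two conditions is met: no file of that name exists locally, or a file of that name does exist but the remote file was modified more recently than the local one.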
To implement this, the program needs to know the time of last modification of both the remote and the local files. This information is called the time-stamp.
Time-stamping in GNU Wget is turned on using the `--timestamping' (`-N') option, or through the `timestamping = on' directive in `.wgetrc'. With this option, for each file it intends to download, Wget will check whether a local file of the same name exists. If it does, and the remote file is older, Wget will not download it.
If the local file does not exist, or the sizes of the files do not match, Wget will download the remote file no matter what the time-stamps say.
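The rule above can be sketched in Python as follows. This is only an illustration of the decision described in the text, not Wget's actual code; the function name `should_download` and its parameters are hypothetical, and treating equal time-stamps as "not newer" is an assumption.

```python
import os


def should_download(local_path, remote_mtime, remote_size):
    """Return True if the remote file should be fetched.

    remote_mtime is in seconds since the epoch, remote_size in bytes.
    Both are assumed to have been obtained from the server already.
    """
    if not os.path.exists(local_path):
        return True  # no local copy yet
    if os.path.getsize(local_path) != remote_size:
        return True  # size mismatch overrides the time-stamps
    # Assumption: a remote file with the same time-stamp is not "newer".
    return remote_mtime > os.path.getmtime(local_path)
```

With a local file stamped later than the remote one and of the same size, the function returns False; any size difference makes it return True.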
The usage of time-stamping is simple. Say you would like to download a file so that it keeps its date of modification.
wget -S http://www.gnu.ai.mit.edu/
ls -l shows that the time-stamp on the local file matches the Last-Modified header, as returned by the server.
As you can see, the time-stamping info is preserved locally, even without `-N'.
Several days later, you would like Wget to check if the remote file has changed, and download it if it has.
wget -N http://www.gnu.ai.mit.edu/
Wget will ask the server for the last-modified date. If the local file is newer, the remote file will not be re-fetched. However, if the remote file is more recent, Wget will proceed to fetch it normally.
The same goes for FTP. For example:
ls will show that the time-stamps are set according to the state on the remote server. Reissuing the command with `-N' will make Wget re-fetch only the files that have been modified.
In both HTTP and FTP retrieval Wget will time-stamp the local file correctly (with or without `-N') if it gets the stamps, i.e. gets the directory listing for FTP or the Last-Modified header for HTTP.
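As an illustration of what "time-stamping the local file" amounts to for HTTP, the following Python sketch copies a Last-Modified value onto a local file. It is not taken from Wget; the function name `apply_last_modified` is hypothetical, and a header without time zone information is assumed to be in UTC.

```python
import os
from datetime import timezone
from email.utils import parsedate_to_datetime


def apply_last_modified(local_path, last_modified):
    """Set the modification time of local_path from a Last-Modified value.

    Returns the new modification time in seconds since the epoch.
    """
    when = parsedate_to_datetime(last_modified)
    if when.tzinfo is None:
        # Assumption: a value without zone information is UTC.
        when = when.replace(tzinfo=timezone.utc)
    mtime = when.timestamp()
    atime = os.stat(local_path).st_atime  # leave the access time alone
    os.utime(local_path, (atime, mtime))
    return mtime
```

After the call, ls -l on the file shows the date carried by the header rather than the time of the download.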
If you wished to mirror the GNU archive every week, you would use the following command every week:
wget --timestamping -r ftp://prep.ai.mit.edu/pub/gnu/
Time-stamping in HTTP is implemented by checking the Last-Modified header. If you wish to retrieve the file `foo.html' through HTTP, Wget will check whether `foo.html' exists locally. If it doesn't, `foo.html' will be retrieved.
If the file does exist locally, Wget will first check its local time-stamp (similar to the way ls -l checks it), and then send a HEAD request to the remote server, asking for information on the remote file.
The Last-Modified header is examined to find which file was modified more recently (which makes it "newer"). If the remote file is newer, it will be downloaded; if it is older, Wget will give up.
Arguably, HTTP time-stamping should be implemented using the
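The HEAD-based comparison described in this section can be sketched in Python with the standard urllib module. This is a rough illustration, not Wget's implementation; the names `head_last_modified` and `http_needs_fetch` are hypothetical, and fetching the file when the server sends no Last-Modified header is an assumption.

```python
import os
import urllib.request
from datetime import timezone
from email.utils import parsedate_to_datetime


def head_last_modified(url, timeout=10.0):
    """Send a HEAD request; return Last-Modified as epoch seconds or None."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        value = response.headers.get("Last-Modified")
    if value is None:
        return None
    when = parsedate_to_datetime(value)
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # assumption: UTC
    return when.timestamp()


def http_needs_fetch(url, local_path, head=head_last_modified):
    """Return True if url should be downloaded to local_path."""
    if not os.path.exists(local_path):
        return True  # no local copy: fetch without asking the server
    remote_mtime = head(url)
    if remote_mtime is None:
        # Assumption: without a Last-Modified header, fetch anyway.
        return True
    return remote_mtime > os.path.getmtime(local_path)
```

The `head` parameter exists only so that the comparison can be tried without a network connection, by passing a function that returns a fixed time-stamp.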
In theory, FTP time-stamping works much the same as HTTP, except that FTP has no headers: time-stamps must be obtained from the directory listings.
For each directory files must be retrieved from, Wget will use the LIST command to get the listing. It will try to analyze the listing, assuming that it is a Unix ls -l listing, and extract the time-stamps. The rest is exactly the same as for HTTP.
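To give an idea of what extracting time-stamps from a Unix-style listing involves, here is a minimal Python sketch. It is not Wget's parser: the function name `parse_ls_line` is hypothetical, it assumes the common nine-column ls -l layout with English month names, and it guesses the year for entries that show a time of day instead of a year.

```python
from datetime import datetime, timedelta

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def parse_ls_line(line, now=None):
    """Parse one ls -l style line into (name, size, time-stamp).

    Returns None for lines that do not look like a file entry.
    The time-stamp is a naive datetime in the server's local time.
    """
    fields = line.split(None, 8)
    if len(fields) < 9 or fields[5] not in MONTHS:
        return None
    if now is None:
        now = datetime.now()
    try:
        size = int(fields[4])
        month, day, stamp = MONTHS[fields[5]], int(fields[6]), fields[7]
        if ":" not in stamp:
            # Older entries carry an explicit year and no time of day.
            return fields[8], size, datetime(int(stamp), month, day)
        hour, minute = (int(part) for part in stamp.split(":"))
    except ValueError:
        return None
    # Recent entries omit the year: take the latest year that does not
    # put the file in the future (one day of slack for time zones).
    for year in (now.year, now.year - 1):
        try:
            when = datetime(year, month, day, hour, minute)
        except ValueError:  # e.g. Feb 29 in a non-leap year
            continue
        if when <= now + timedelta(days=1):
            return fields[8], size, when
    return None
```

Listings in other formats simply yield None here, which is where a real client has to fall back on other heuristics.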
The assumption that every directory listing is a Unix-style listing may sound extremely constraining, but in practice it is not, as many non-Unix FTP servers use the Unixoid listing format because most (all?) of the clients understand it. Bear in mind that RFC 959 defines no standard way to get a file list, let alone the time-stamps. We can only hope that a future standard will define this.
Another non-standard solution includes the use of
that is supported by some FTP servers (including the popular
wu-ftpd), which returns the exact time of the specified file.
Wget may support this command in the future.