| % File src/library/utils/man/download.file.Rd |
| % Part of the R package, https://www.R-project.org |
| % Copyright 1995-2022 R Core Team |
| % Distributed under GPL 2 or later |
| |
| \name{download.file} |
| \alias{download.file} |
| \concept{proxy} |
| \concept{ftp} |
| \concept{http} |
| \title{Download File from the Internet} |
| \description{ |
| This function can be used to download a file from the Internet. |
| } |
| \usage{ |
| download.file(url, destfile, method, quiet = FALSE, mode = "w", |
| cacheOK = TRUE, |
| extra = getOption("download.file.extra"), |
| headers = NULL, \dots) |
| } |
| \arguments{ |
| \item{url}{a \code{\link{character}} string (or longer vector e.g., |
| for the \code{"libcurl"} method) naming the URL of a resource to be |
| downloaded.} |
| |
| \item{destfile}{a character string (or vector, see the \code{url} |
| argument) with the file path where the downloaded file is to be |
| saved. Tilde-expansion is performed.} |
| |
| \item{method}{Method to be used for downloading files. Current |
| download methods are \code{"internal"}, \code{"wininet"} (Windows |
| only) \code{"libcurl"}, \code{"wget"} and \code{"curl"}, and there |
| is a value \code{"auto"}: see \sQuote{Details} and \sQuote{Note}. |
| |
| The method can also be set through the option |
| \code{"download.file.method"}: see \code{\link{options}()}. |
| } |
| |
| \item{quiet}{If \code{TRUE}, suppress status messages (if any), and |
| the progress bar.} |
| |
| \item{mode}{character. The mode with which to write the file. Useful |
| values are \code{"w"}, \code{"wb"} (binary), \code{"a"} (append) and |
| \code{"ab"}. Not used for methods \code{"wget"} and \code{"curl"}. |
| See also \sQuote{Details}, notably about using \code{"wb"} for Windows. |
| } |
| \item{cacheOK}{logical. Is a server-side cached value acceptable?} |
| |
| \item{extra}{character vector of additional command-line arguments for |
| the \code{"wget"} and \code{"curl"} methods.} |
| |
| \item{headers}{named character vector of HTTP headers to use in HTTP |
| requests. It is ignored for non-HTTP URLs. The \code{User-Agent} |
| header, coming from the \code{HTTPUserAgent} option (see |
| \code{\link{options}}) is used as the first header, automatically.} |
| |
| \item{\dots}{allow additional arguments to be passed, unused.} |
| } |
| \details{ |
| The function \code{download.file} can be used to download a single |
| file as described by \code{url} from the internet and store it in |
| \code{destfile}. |
| |
| The \code{url} must start with a scheme such as \samp{http://}, |
| \samp{https://}, \samp{ftp://} or \samp{file://}. Which methods |
| support which schemes varies by \R version, but \code{method = "auto"} |
| will try to find a method which supports the scheme. |
| |
| For \code{method = "auto"} (the default) the \code{"internal"} method |
| is used for \samp{file://} URLs, and \code{"libcurl"} for all others. |
| |
| Support for method \code{"libcurl"} was optional on Windows prior to |
| \R 4.2.0: use \code{\link{capabilities}("libcurl")} to see if it is |
| supported on an earlier version. It uses an external library of that |
| name (\url{https://curl.se/libcurl/}) against which \R can be |
| compiled. |
| |
| When method \code{"libcurl"} is used, it provides |
| (non-blocking) access to \samp{https://} and (usually) \samp{ftps://} |
| URLs. There is support for simultaneous downloads, so \code{url} and |
| \code{destfile} can be character vectors of the same length greater |
| than one (but the method has to be specified explicitly and not |
| \emph{via} \code{"auto"}). For a single URL and \code{quiet = FALSE} |
| a progress bar is shown in interactive use. |
| |
| For methods \code{"wget"} and \code{"curl"} a system call is made to |
| the tool given by \code{method}, and the respective program must be |
| installed on your system and be in the search path for executables. |
| They will block all other activity on the \R process until they |
| complete: this may make a GUI unresponsive. |
| |
| \code{cacheOK = FALSE} is useful for \samp{http://} and |
| \samp{https://} URLs: it will attempt to get a copy directly from the |
| site rather than from an intermediate cache. It is used by |
| \code{\link{available.packages}}. |
| |
| The \code{"libcurl"} and \code{"wget"} methods follow \samp{http://} |
| and \samp{https://} redirections to any scheme they support. (For |
| method \code{"curl"} use argument \code{extra = "-L"}. To disable |
| redirection in \command{wget}, use \code{extra = "--max-redirect=0"}.) |
| The \code{"wininet"} method supports some redirections but not all. |
| (For method \code{"libcurl"}, messages will quote the endpoint of |
| redirections.) |
| |
| Support for \samp{http://} URLs in the \code{"internal"} method and |
| \samp{ftp://} URLs in the \code{"internal"} and \code{"wininet"} |
| methods was deprecated in \R 4.1.1 and removed in \R 4.2.0. |
| |
| See \code{\link{url}} for how \samp{file://} URLs are interpreted, |
| especially on Windows. The \code{"internal"} and \code{"wininet"} |
| methods do not percent-decode, but the \code{"libcurl"} and |
| \code{"curl"} methods do: method \code{"wget"} does not support them. |
| |
| Most methods do not percent-encode special characters such as spaces |
| in URLs (see \code{\link{URLencode}}), but it seems the |
| \code{"wininet"} method does. |
| |
| The remaining details apply to the \code{"wininet"} and |
| \code{"libcurl"} methods only. |
| |
| The timeout for many parts of the transfer can be set by the option |
| \code{timeout} which defaults to 60 seconds. This is often |
| insufficient for downloads of large files (50MB or more) and |
| so should be increased when \code{download.file} is used in packages |
| to do so. Note that the user can set the default timeout by the |
| environment variable \env{R_DEFAULT_INTERNET_TIMEOUT} in recent |
| versions of \R, so to ensure that this is not decreased packages should |
| use something like |
| \preformatted{ |
| options(timeout = max(300, getOption("timeout"))) |
| } |
| (It is unrealistic to require download times of less than 1s/MB.) |
| |
| The level of detail provided during transfer can be set by the |
| \code{quiet} argument and the \code{internet.info} option: the details |
| depend on the platform and scheme. For the \code{"internal"} method |
| setting option \code{internet.info} to 0 gives all available details, |
| including all server responses. Using 2 (the default) gives only |
| serious messages, and 3 or more suppresses all messages. For the |
| \code{"libcurl"} method values of the option less than 2 give verbose |
| output. |
| |
| A progress bar tracks the transfer platform-specifically: |
| \describe{ |
| \item{On Windows}{If the file length is known, the |
| full width of the bar is the known length. Otherwise the initial |
| width represents 100 Kbytes and is doubled whenever the current width |
| is exceeded. (In non-interactive use this uses a text version. If the |
| file length is known, an equals sign represents 2\% of the transfer |
| completed: otherwise a dot represents 10Kb.)} |
| \item{On a unix-alike}{If the file length is known, an |
| equals sign represents 2\% of the transfer completed: otherwise a dot |
| represents 10Kb.} |
| } |
| |
| |
| The choice of binary transfer (\code{mode = "wb"} or \code{"ab"}) is |
| important on Windows, since unlike Unix-alikes it does distinguish |
| between text and binary files and for text transfers changes \code{\\n} |
| line endings to \code{\\r\\n} (aka \file{CRLF}). |
| |
| On Windows, if \code{mode} is not supplied (\code{\link{missing}()}) |
| and \code{url} ends in one of \code{.gz}, \code{.bz2}, \code{.xz}, |
| \code{.tgz}, \code{.zip}, \code{.jar}, \code{.rda}, \code{.rds} or |
| \code{.RData}, \code{mode = "wb"} is set so that a binary transfer |
| is done to help unwary users. |
| |
| Code written to download binary files must use \code{mode = "wb"} (or |
| \code{"ab"}), but the problems incurred by a text transfer will only |
| be seen on Windows. |
| } |
| \note{ |
| Files of more than 2GB are supported on 64-bit builds of \R; they |
| may be truncated on some 32-bit builds. |
| |
| Methods \code{"wget"} and \code{"curl"} are mainly for historical |
| compatibility but provide may provide capabilities not supported by |
| the \code{"libcurl"} or \code{"wininet"} methods. |
| |
| Method \code{"wget"} can be used with proxy firewalls which require |
| user/password authentication if proper values are stored in the |
| configuration file for \code{wget}. |
| |
| \command{wget} (\url{https://www.gnu.org/software/wget/}) is commonly |
| installed on Unix-alikes (but not macOS). Windows binaries are |
| available from MSYS2 and elsewhere. |
| |
| \command{curl} (\url{https://curl.se/}) is installed on macOS and |
| increasingly commonly on Unix-alikes. Windows binaries are available |
| at that URL. |
| } |
| \section{Setting Proxies}{ |
| For the Windows-only method \code{"wininet"}, the \sQuote{Internet |
| Options} of the system are used to choose proxies and so on; these are |
| set in the Control Panel and are those used for system browsers. |
| |
| For the \code{"libcurl"} and \code{"curl"} methods, proxies can be set |
| \emph{via} the environment variables \env{http_proxy} or |
| \env{ftp_proxy}. See |
| \url{https://curl.se/libcurl/c/libcurl-tutorial.html} for further |
| details. |
| } |
| \section{Secure URLs}{ |
| Methods which access \samp{https://} and \samp{ftps://} URLs should |
| try to verify the site certificates. This is usually done using the CA |
| root certificates installed by the OS (although we have seen instances |
| in which these got removed rather than updated). For further information |
| see \url{https://curl.se/docs/sslcerts.html}. |
| |
| On Windows, with \code{method = "libcurl"}, the CA root certificates are |
| provided by the OS when \R was linked with \code{libcurl} with |
| \code{Schannel} enabled, which is the default in Rtools. This can be |
| verified by that \code{libcurlVersion()} returns a version string |
| containing "Schannel". If it does not, for verification to be on, the |
| environment variable \env{CURL_CA_BUNDLE} must be set to a path to a |
| certificate bundle file, usually named \file{ca-bundle.crt} or |
| \file{curl-ca-bundle.crt}. (This is normally done automatically for a |
| binary installation of \R, which installs |
| \file{\var{R_HOME}/etc/curl-ca-bundle.crt} and sets \env{CURL_CA_BUNDLE} |
| to point to it if that environment variable is not already set.) For an |
| updated certificate bundle, see \url{https://curl.se/docs/sslcerts.html}. |
| Currently one can download a copy from |
| \url{https://raw.githubusercontent.com/bagder/ca-bundle/master/ca-bundle.crt} |
| and set \env{CURL_CA_BUNDLE} to the full path to the downloaded file. |
| |
| Note that the root certificates used by \R may or may not be the same |
| as used in a browser, and indeed different browsers may use different |
| certificate bundles (there is typically a build option to choose |
| either their own or the system ones). |
| } |
| \section{FTP sites}{ |
| \samp{ftp:} URLs are accessed using the FTP protocol which has a |
| number of variants. One distinction is between \sQuote{active} and |
| \sQuote{(extended) passive} modes: which is used is chosen by the |
| client. The \code{"libcurl"} methods uses passive |
| mode, and that is almost universally used by browsers. |
| } |
| \section{Good practice}{ |
| Setting the \code{method} should be left to the end user. Neither of |
| the \command{wget} nor \command{curl} commands is widely available: |
| you can check if one is available \emph{via} \code{\link{Sys.which}}, |
| and should do so in a package or script. |
| |
| If you use \code{download.file} in a package or script, you must check |
| the return value, since it is possible that the download will fail |
| with a non-zero status but not an \R error. |
| |
| The supported \code{method}s do change: method \code{libcurl} was |
| introduced in \R 3.2.0 and was optional on Windows until \R 4.2.0 -- |
| use \code{\link{capabilities}("libcurl")} in a program to see if it is |
| available. |
| } |
| \value{ |
| An (invisible) integer code, \code{0} for success and non-zero for |
| failure. For the \code{"wget"} and \code{"curl"} methods this is the |
| status code returned by the external program. The \code{"internal"} |
| method can return \code{1}, but will in most cases throw an error. |
| |
| What happens to the destination file(s) in the case of error depends |
| on the method and \R{} version. Currently the \code{"internal"}, |
| \code{"wininet"} and \code{"libcurl"} methods will remove the file if |
| there the URL is unavailable except when \code{mode} specifies |
| appending when the file should be unchanged. |
| } |
| \seealso{ |
| \code{\link{options}} to set the \code{HTTPUserAgent}, \code{timeout} |
| and \code{internet.info} options used by some of the methods. |
| |
| \code{\link{url}} for a finer-grained way to read data from URLs. |
| |
| \code{\link{url.show}}, \code{\link{available.packages}}, |
| \code{\link{download.packages}} for applications. |
| |
| Contributed packages \CRANpkg{RCurl} and \CRANpkg{curl} provide more |
| comprehensive facilities to download from URLs. |
| } |
| \keyword{utilities} |