How can I make an HTTP request from a shell script?
Jul 4th, 2003 01:49
markus wolf, Tim Brown, O R
Try "wget":
bink@foo> wget http://www.foo.com/bar.html
bink@foo> wget ftp://www.foo.com/bar.tgz
bink@foo> wget --help
GNU Wget 1.6, a non-interactive network retriever.
Usage: wget [OPTION]... [URL]...
Mandatory arguments to long options are mandatory for short options too.
Startup:
-V, --version display the version of Wget and exit.
-h, --help print this help.
-b, --background go to background after startup.
-e, --execute=COMMAND execute a `.wgetrc'-style command.
Logging and input file:
-o, --output-file=FILE log messages to FILE.
-a, --append-output=FILE append messages to FILE.
-d, --debug print debug output.
-q, --quiet quiet (no output).
-v, --verbose be verbose (this is the default).
-nv, --non-verbose turn off verboseness, without being quiet.
-i, --input-file=FILE download URLs found in FILE.
-F, --force-html treat input file as HTML.
-B, --base=URL prepends URL to relative links in -F -i file.
Download:
--bind-address=ADDRESS bind to ADDRESS (hostname or IP) on
local host.
-t, --tries=NUMBER set number of retries to NUMBER (0
unlimits).
-O --output-document=FILE write documents to FILE.
-nc, --no-clobber don't clobber existing files or use .#
suffixes.
-c, --continue restart getting an existing file.
--dot-style=STYLE set retrieval display style.
-N, --timestamping don't retrieve files if older than local.
-S, --server-response print server response.
--spider don't download anything.
-T, --timeout=SECONDS set the read timeout to SECONDS.
-w, --wait=SECONDS wait SECONDS between retrievals.
--waitretry=SECONDS wait 1...SECONDS between retries of a
retrieval.
-Y, --proxy=on/off turn proxy on or off.
-Q, --quota=NUMBER set retrieval quota to NUMBER.
Directories:
-nd --no-directories don't create directories.
-x, --force-directories force creation of directories.
-nH, --no-host-directories don't create host directories.
-P, --directory-prefix=PREFIX save files to PREFIX/...
--cut-dirs=NUMBER ignore NUMBER remote directory
components.
HTTP options:
--http-user=USER set http user to USER.
--http-passwd=PASS set http password to PASS.
-C, --cache=on/off (dis)allow server-cached data (normally
allowed).
-E, --html-extension save all text/html documents with .html
extension.
--ignore-length ignore `Content-Length' header field.
--header=STRING insert STRING among the headers.
--proxy-user=USER set USER as proxy username.
--proxy-passwd=PASS set PASS as proxy password.
--referer=URL include `Referer: URL' header in HTTP request.
-s, --save-headers save the HTTP headers to file.
-U, --user-agent=AGENT identify as AGENT instead of Wget/VERSION.
FTP options:
--retr-symlinks when recursing, retrieve linked-to files (not
dirs).
-g, --glob=on/off turn file name globbing on or off.
--passive-ftp use the "passive" transfer mode.
Recursive retrieval:
-r, --recursive recursive web-suck -- use with care!.
-l, --level=NUMBER maximum recursion depth (inf or 0 for
infinite).
--delete-after delete files locally after downloading them.
-k, --convert-links convert non-relative links to relative.
-K, --backup-converted before converting file X, back up as X.orig.
-m, --mirror shortcut option equivalent to -r -N -l
inf -nr.
-nr, --dont-remove-listing don't remove `.listing' files.
-p, --page-requisites get all images, etc. needed to display
HTML page.
Recursive accept/reject:
-A, --accept=LIST comma-separated list of accepted
extensions.
-R, --reject=LIST comma-separated list of rejected
extensions.
-D, --domains=LIST comma-separated list of accepted
domains.
--exclude-domains=LIST comma-separated list of rejected
domains.
--follow-ftp follow FTP links from HTML documents.
--follow-tags=LIST comma-separated list of followed
HTML tags.
-G, --ignore-tags=LIST comma-separated list of ignored HTML
tags.
-H, --span-hosts go to foreign hosts when recursive.
-L, --relative follow relative links only.
-I, --include-directories=LIST list of allowed directories.
-X, --exclude-directories=LIST list of excluded directories.
-nh, --no-host-lookup don't DNS-lookup hosts.
-np, --no-parent don't ascend to the parent directory.
Mail bug reports and suggestions to <bug-wget@gnu.org>.
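Inside a script you usually want the response on stdout rather than in a file. The sketch below uses only options listed in the help text above: -q, -O -, -T, -t and --header. The function names `fetch_with_wget` and `fetch_with_header` are hypothetical, and the 30-second timeout and 3 tries are arbitrary example values.

```shell
#!/bin/sh
# fetch_with_wget URL: write the document at URL to stdout.
#   -q      quiet: suppress wget's progress and log messages
#   -O -    write the document to stdout instead of a file
#   -T 30   read timeout of 30 seconds (arbitrary example value)
#   -t 3    try at most 3 times (arbitrary example value)
fetch_with_wget() {
    wget -q -T 30 -t 3 -O - "$1"
}

# fetch_with_header URL HEADER: same as above, but add one extra request
# header such as "Accept-Language: en" (the header is only an example).
fetch_with_header() {
    wget -q -T 30 -t 3 --header="$2" -O - "$1"
}
```

For example, `page=$(fetch_with_wget http://www.foo.com/bar.html)` captures the page in a shell variable. Check `$?` afterwards. How finely the exit status distinguishes failure causes (DNS, network, HTTP error) depends on the wget version.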
Alternatively, use curl (http://curl.haxx.se/) or HTTrack (http://www.httrack.com/):
> curl --help
curl 7.10.4 (win32) libcurl/7.10.4 OpenSSL/0.9.7a zlib/1.1.4
Usage: curl [options...] <url>
Options: (H) means HTTP/HTTPS only, (F) means FTP only
-a/--append Append to target file when uploading (F)
-A/--user-agent <string> User-Agent to send to server (H)
-b/--cookie <name=string/file> Cookie string or file to read cookies from (H)
-B/--use-ascii Use ASCII/text transfer
-c/--cookie-jar <file> Write all cookies to this file after operation (H)
-C/--continue-at <offset> Specify absolute resume offset
-d/--data <data> HTTP POST data (H)
--data-ascii <data> HTTP POST ASCII data (H)
--data-binary <data> HTTP POST binary data (H)
--disable-epsv Prevents curl from using EPSV (F)
-D/--dump-header <file> Write the headers to this file
--egd-file <file> EGD socket path for random data (SSL)
-e/--referer Referer page (H)
-E/--cert <cert[:passwd]> Specifies your certificate file and password (HTTPS)
--cert-type <type> Specifies certificate file type (DER/PEM/ENG) (HTTPS)
--key <key> Specifies private key file (HTTPS)
--key-type <type> Specifies private key file type (DER/PEM/ENG) (HTTPS)
--pass <pass> Specifies passphrase for the private key (HTTPS)
--engine <eng> Specifies the crypto engine to use (HTTPS)
--cacert <file> CA certifciate to verify peer against (SSL)
--capath <directory> CA directory (made using c_rehash) to verify
peer against (SSL)
--ciphers <list> What SSL ciphers to use (SSL)
--compressed Request a compressed response (using deflate).
--connect-timeout <seconds> Maximum time allowed for connection
--create-dirs Create the necessary local directory hierarchy
--crlf Convert LF to CRLF in upload. Useful for MVS (OS/390)
-f/--fail Fail silently (no output at all) on errors (H)
-F/--form <name=content> Specify HTTP POST data (H)
-g/--globoff Disable URL sequences and ranges using {} and []
-G/--get Send the -d data with a HTTP GET (H)
-h/--help This help text
-H/--header <line> Custom header to pass to server. (H)
-i/--include Include the HTTP-header in the output (H)
-I/--head Fetch document info only (HTTP HEAD/FTP SIZE)
-j/--junk-session-cookies Ignore session cookies read from file (H)
--interface <interface> Specify the interface to be used
--krb4 <level> Enable krb4 with specified security level (F)
-k/--insecure Allow curl to connect to SSL sites without certs (H)
-K/--config Specify which config file to read
-l/--list-only List only names of an FTP directory (F)
--limit-rate <rate> Limit how fast transfers to allow
-L/--location Follow Location: hints (H)
--location-trusted Same, and continue to send authentication when
following locations, even when hostname changed
-m/--max-time <seconds> Maximum time allowed for the transfer
-M/--manual Display huge help text
-n/--netrc Must read .netrc for user name and password
--netrc-optional Use either .netrc or URL; overrides -n
-N/--no-buffer Disables the buffering of the output stream
-o/--output <file> Write output to <file> instead of stdout
-O/--remote-name Write output to a file named as the remote file
-p/--proxytunnel Perform non-HTTP services through a HTTP proxy
-P/--ftpport <address> Use PORT with address instead of PASV when ftping (F)
-q When used as the first parameter disables .curlrc
-Q/--quote <cmd> Send QUOTE command to FTP before file transfer (F)
-r/--range <range> Retrieve a byte range from a HTTP/1.1 or FTP server
-R/--remote-time Set the remote file's time on the local output
-s/--silent Silent mode. Don't output anything
-S/--show-error Show error. With -s, make curl show errors when they occur
--stderr <file> Where to redirect stderr. - means stdout.
-t/--telnet-option <OPT=val> Set telnet option
--trace <file> Dump a network/debug trace to the given file
--trace-ascii <file> Like --trace but without the hex output
-T/--upload-file <file> Transfer/upload <file> to remote site
--url <URL> Another way to specify URL to work with
-u/--user <user[:password]> Specify user and password to use
Overrides -n and --netrc-optional
-U/--proxy-user <user[:password]> Specify Proxy authentication
-v/--verbose Makes the operation more talkative
-V/--version Outputs version number then quits
-w/--write-out [format] What to output after completion
-x/--proxy <host[:port]> Use proxy. (Default port is 1080)
--random-file <file> File to use for reading random data from (SSL)
-X/--request <command> Specific request command to use
-y/--speed-time Time needed to trig speed-limit abort. Defaults to 30
-Y/--speed-limit Stop transfer if below speed-limit for 'speed-time' secs
-z/--time-cond <time> Includes a time condition to the server (H)
-Z/--max-redirs <num> Set maximum number of redirections allowed (H)
-0/--http1.0 Force usage of HTTP 1.0 (H)
-1/--tlsv1 Force usage of TLSv1 (H)
-2/--sslv2 Force usage of SSLv2 (H)
-3/--sslv3 Force usage of SSLv3 (H)
-#/--progress-bar Display transfer progress as a progress bar
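curl writes to stdout by default, so it fits naturally into scripts. The sketch below assumes you want three things: the status code alone, a form POST, and a fetch that falls back to wget when curl is missing. It uses the -s, -S, -f, -m, -o, -w and -d options listed above. The function names `http_status`, `post_form` and `http_get` are hypothetical, and the 30-second limit is an arbitrary example value.

```shell
#!/bin/sh
# http_status URL: print only the HTTP status code returned for URL.
#   -s                silent: no progress meter or error messages
#   -o /dev/null      discard the response body
#   -m 30             allow at most 30 seconds for the whole transfer
#   -w '%{http_code}' print the status code after the transfer
#                     (000 if no HTTP response was received)
http_status() {
    curl -s -o /dev/null -m 30 -w '%{http_code}' "$1"
}

# post_form URL DATA: send DATA (e.g. "name=value&other=1") as an HTTP POST
# body and print the response.
#   -S  together with -s, still print an error message on failure
#   -f  exit non-zero instead of printing the server's error page
#       when the server answers with an HTTP error
post_form() {
    curl -s -S -f -m 30 -d "$2" "$1"
}

# http_get URL: print the document at URL, using curl if it is installed
# and falling back to wget otherwise.
http_get() {
    if command -v curl >/dev/null 2>&1; then
        curl -s -S -f -m 30 "$1"
    else
        wget -q -T 30 -O - "$1"
    fi
}
```

For example, `[ "$(http_status http://www.foo.com/bar.html)" = 200 ]` tests whether a page is reachable. Without -f, curl exits successfully even when the server returns an error page, which is why -f is used in `post_form` and `http_get`.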
> httrack --help
HTTrack version 3.23+swf (compiled Mar 8 2003)
usage: httrack <URLs> [-option] [+<FILTERs>] [-<FILTERs>]
with options listed below: (* is the default value)
General options:
O path for mirror/logfiles+cache (-O path_mirror[,path_cache_and_logfiles]) (--path <param>)
Action options:
w *mirror web sites (--mirror)
W mirror web sites, semi-automatic (asks questions) (--mirror-wizard)
g just get files (saved in the current directory) (--get-files)
i continue an interrupted mirror using the cache (--continue)
Y mirror ALL links located in the first level pages (mirror links) (--mirrorlinks)
Proxy options:
P proxy use (-P proxy:port or -P user:pass@proxy:port) (--proxy <param>)
%f *use proxy for ftp (f0 don't use) (--httpproxy-ftp[=N])
Limits options:
rN set the mirror depth to N (* r9999) (--depth[=N])
%eN set the external links depth to N (* %e0) (--ext-depth[=N])
mN maximum file length for a non-html file (--max-files[=N])
mN,N2 maximum file length for non html (N) and html (N2)
MN maximum overall size that can be uploaded/scanned (--max-size[=N])
EN maximum mirror time in seconds (60=1 minute, 3600=1 hour) (--max-time[=N])
AN maximum transfer rate in bytes/seconds (1000=1KB/s max) (--max-rate[=N])
%cN maximum number of connections/seconds (*%c10) (--connection-per-second[=N])
GN pause transfer if N bytes reached, and wait until lock file is deleted (--max-pause[=N])
Flow control:
cN number of multiple connections (*c8) (--sockets[=N])
TN timeout, number of seconds after a non-responding link is shutdown (--timeout)
RN number of retries, in case of timeout or non-fatal errors (*R1) (--retries[=N])
JN traffic jam control, minimum transfert rate (bytes/seconds) tolerated for a link (--min-rate[=N])
HN host is abandonned if: 0=never, 1=timeout, 2=slow, 3=timeout or slow (--host-control[=N])
Links options:
%P *extended parsing, attempt to parse all links, even in unknown tags or Javascript (%P0 don't use) (--extended-parsing[=N])
n get non-html files 'near' an html file (ex: an image located outside) (--near)
t test all URLs (even forbidden ones) (--test)
%L <file> add all URL located in this text file (one URL per line) (--list <param>)
%S <file> add all scan rules located in this text file (one scan rule per line) (--urllist <param>)
Build options:
NN structure type (0 *original structure, 1+: see below) (--structure[=N])
or user defined structure (-N "%h%p/%n%q.%t")
LN long names (L1 *long names / L0 8-3 conversion / L2 ISO9660 compatible) (--long-names[=N])
KN keep original links (e.g. http://www.adr/link) (K0 *relative link, K absolute links, K4 original links, K3 absolute URI links) (--keep-links[=N])
x replace external html links by error pages (--replace-external)
%x do not include any password for external password protected websites (%x0 include) (--no-passwords)
%q *include query string for local files (useless, for information purpose only) (%q0 don't include) (--include-query-string)
o *generate output html file in case of error (404..) (o0 don't generate) (--generate-errors)
X *purge old files after update (X0 keep delete) (--purge-old[=N])
%p preserve html files 'as is' (identical to '-K4 -%F ""') (--preserve)
Spider options:
bN accept cookies in cookies.txt (0=do not accept,* 1=accept) (--cookies[=N])
u check document type if unknown (cgi,asp..) (u0 don't check, * u1 check but /, u2 check always) (--check-type[=N])
j *parse Java Classes (j0 don't parse) (--parse-java[=N])
sN follow robots.txt and meta robots tags (0=never,1=sometimes,* 2=always) (--robots[=N])
%h force HTTP/1.0 requests (reduce update features, only for old servers or proxies) (--http-10)
%k use keep-alive if possible, greately reducing latency for small files and test requests (%k0 don't use) (--keep-alive)
%B tolerant requests (accept bogus responses on some servers, but not standard!) (--tolerant)
%s update hacks: various hacks to limit re-transfers when updating (identical size, bogus response..) (--updatehack)
%A assume that a type (cgi,asp..) is always linked with a mime type (-%A php3,cgi=text/html;dat,bin=application/x-zip) (--assume <param>)
shortcut: '--assume standard' is equivalent to -%A php2,php3,php4,php,cgi,asp,jsp,pl,cfm=text/html
@iN internet protocol (0=both ipv6+ipv4, 4=ipv4 only, 6=ipv6 only) (--protocol[=N])
Browser ID:
F user-agent field (-F "user-agent name") (--user-agent <param>)
%F footer string in Html code (-%F "Mirrored [from host %s [file %s [at %s]]]" (--footer <param>)
%l preffered language (-%l "fr, en, jp, *" (--language <param>)
Log, index, cache
C create/use a cache for updates and retries (C0 no cache,C1 cache is prioritary,* C2 test update before) (--cache[=N])
k store all files in cache (not useful if files on disk) (--store-all-in-cache)
%n do not re-download locally erased files (--do-not-recatch)
%v display on screen filenames downloaded (in realtime) - * %v1 short version (--display)
Q no log - quiet mode (--do-not-log)
q no questions - quiet mode (--quiet)
z log - extra infos (--extra-log)
Z log - debug (--debug-log)
v log on screen (--verbose)
f *log in files (--file-log)
f2 one single log file (--single-log)
I *make an index (I0 don't make) (--index)
%I make an searchable index for this mirror (* %I0 don't make) (--search-index)
Expert options:
pN priority mode: (* p3) (--priority[=N])
p0 just scan, don't save anything (for checking links)
p1 save only html files
p2 save only non html files
*p3 save all files
p7 get html files before, then treat other files
S stay on the same directory (--stay-on-same-dir)
D *can only go down into subdirs (--can-go-down)
U can only go to upper directories (--can-go-up)
B can both go up&down into the directory structure (--can-go-up-and-down)
a *stay on the same address (--stay-on-same-address)
d stay on the same principal domain (--stay-on-same-domain)
l stay on the same TLD (eg: .com) (--stay-on-same-tld)
e go everywhere on the web (--go-everywhere)
%H debug HTTP headers in logfile (--debug-headers)
Guru options: (do NOT use if possible)
#X *use optimized engine (limited memory boundary checks) (--fast-engine)
#0 filter test (-#0 '*.gif' 'www.bar.com/foo.gif') (--debug-testfilters <param>)
#C cache list (-#C '*.com/spider*.gif' (--debug-cache <param>)
#f always flush log files (--advanced-flushlogs)
#FN maximum number of filters (--advanced-maxfilters[=N])
#h version info (--version)
#K scan stdin (debug) (--debug-scanstdin)
#L maximum number of links (-#L1000000) (--advanced-maxlinks)
#p display ugly progress information (--advanced-progressinfo)
#P catch URL (--catch-url)
#R old FTP routines (debug) (--debug-oldftp)
#T generate transfer ops. log every minutes (--debug-xfrstats)
#u wait time (--advanced-wait)
#Z generate transfer rate statictics every minutes (--debug-ratestats)
#! execute a shell command (-#! "echo hello") (--exec <param>)
Command-line specific options:
V execute system command after each files ($0 is the filename: -V "rm \$0") (--userdef-cmd <param>)
%U run the engine with another id when called as root (-%U smith) (--user <param>)
Details: Option N
N0 Site-structure (default)
N1 HTML in web/, images/other files in web/images/
N2 HTML in web/HTML, images/other in web/images
N3 HTML in web/, images/other in web/
N4 HTML in web/, images/other in web/xxx, where xxx is the file extension (all gif will be placed onto web/gif, for example)
N5 Images/other in web/xxx and HTML in web/HTML
N99 All files in web/, with random names (gadget !)
N100 Site-structure, without www.domain.xxx/
N101 Identical to N1 exept that "web" is replaced by the site's name
N102 Identical to N2 exept that "web" is replaced by the site's name
N103 Identical to N3 exept that "web" is replaced by the site's name
N104 Identical to N4 exept that "web" is replaced by the site's name
N105 Identical to N5 exept that "web" is replaced by the site's name
N199 Identical to N99 exept that "web" is replaced by the site's name
N1001 Identical to N1 exept that there is no "web" directory
N1002 Identical to N2 exept that there is no "web" directory
N1003 Identical to N3 exept that there is no "web" directory (option set for g option)
N1004 Identical to N4 exept that there is no "web" directory
N1005 Identical to N5 exept that there is no "web" directory
N1099 Identical to N99 exept that there is no "web" directory
Details: User-defined option N
'%n' Name of file without file type (ex: image)
'%N' Name of file, including file type (ex: image.gif)
'%t' File type (ex: gif)
'%p' Path [without ending /] (ex: /someimages)
'%h' Host name (ex: www.someweb.com)
'%M' URL MD5 (128 bits, 32 ascii bytes)
'%Q' query string MD5 (128 bits, 32 ascii bytes)
'%q' small query string MD5 (16 bits, 4 ascii bytes)
'%s?' Short name version (ex: %sN)
'%[param]' param variable in query string
'%[param:before:after:notfound:empty]' advanced variable extraction
Details: User-defined option N and advanced variable extraction
%[param:before:after:notfound:empty]
param : parameter name
before : string to prepend if the parameter was found
after : string to append if the parameter was found
notfound : string replacement if the parameter could not be found
empty : string replacement if the parameter was empty
all fields, except the first one (the parameter name), can be empty
Details: Option K
K0 foo.cgi?q=45 -> foo4B54.html?q=45 (relative URI, default)
K -> http://www.foobar.com/folder/foo.cgi?q=45 (absolute URL) (--keep-links[=N])
K4 -> foo.cgi?q=45 (original URL)
K3 -> /folder/foo.cgi?q=45 (absolute URI)
Shortcuts:
--mirror <URLs> *make a mirror of site(s) (default)
--get <URLs> get the files indicated, do not seek other URLs (-qg)
--list <text file> add all URL located in this text file (-%L)
--mirrorlinks <URLs> mirror all links in 1st level pages (-Y)
--testlinks <URLs> test links in pages (-r1p0C0I0t)
--spider <URLs> spider site(s), to test links: reports Errors & Warnings (-p0C0I0t)
--testsite <URLs> identical to --spider
--skeleton <URLs> make a mirror, but gets only html files (-p1)
--update update a mirror, without confirmation (-iC2)
--continue continue a mirror, without confirmation (-iC1)
--catchurl create a temporary proxy to capture an URL or a form post URL
--clean erase cache & log files
--http10 force http/1.0 requests (-%h)
example: httrack www.someweb.com/bob/
means: mirror site www.someweb.com/bob/ and only this site
example: httrack www.someweb.com/bob/ www.anothertest.com/mike/ +*.com/*.jpg
means: mirror the two sites together (with shared links) and accept any .jpg files on .com sites
example: httrack www.someweb.com/bob/bobby.html +* -r6
means get all files starting from bobby.html, with 6 link-depth, and possibility of going everywhere on the web
example: httrack www.someweb.com/bob/bobby.html --spider -P proxy.myhost.com:8080
runs the spider on www.someweb.com/bob/bobby.html using a proxy
example: httrack --update
updates a mirror in the current folder
example: httrack
will bring you to the interactive mode
example: httrack --continue
continues a mirror in the current folder
HTTrack version 3.23+swf (compiled Mar 8 2003)
Copyright (C) Xavier Roche and other contributors