Download a whole website with wget (or other) including all its downloadable content
I'm trying to download Winamp's website in case they shut it down. I need to download literally everything.
I tried once with wget and managed to download the website itself, but when I try to download any file from it, I get a file without an extension or name. How can I fix that?
downloads wget
edited Apr 17 '17 at 10:35
Martin Thoma
asked Dec 16 '13 at 14:58
Mina Michael
3 Answers
You may need to mirror the website completely, but be aware that some links may really be dead. You can use HTTrack or wget:
wget -r http://winapp.com # or whatever
With HTTrack, first install it:
sudo apt-get install httrack
Now run it, following external links only one level deep:
httrack --ext-depth=1 http://winapp.com
This will download the winapp CDN files, but it won't go on to fetch the files linked from those files, and so on across the whole internet.
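For the wget route, a bare -r is often not enough for an archival copy. The sketch below is one possible combination of documented wget options, not something taken from the answer above. The function name mirror_site, the DRY_RUN switch and the example.com URL are illustrative assumptions. --content-disposition may help with the "file without an extension or name" problem when the server sends a Content-Disposition header, though the wget manual marks that option as experimental.

```shell
#!/bin/sh
# Sketch: mirror a site with wget. mirror_site and DRY_RUN are hypothetical
# names used for illustration; the flags are documented in `man wget`.
mirror_site() {
    url=$1
    # --mirror               recursion with unlimited depth plus timestamping
    # --page-requisites      also fetch images, CSS, etc. needed by each page
    # --convert-links        rewrite links so the copy can be browsed locally
    # --adjust-extension     append .html/.css when the URL lacks the suffix
    # --content-disposition  name files from the Content-Disposition header
    # --no-parent            never ascend above the starting directory
    set -- wget --mirror --page-requisites --convert-links \
        --adjust-extension --content-disposition --no-parent "$url"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"    # only print the command
    else
        "$@"         # actually run wget
    fi
}

# Preview the command for a placeholder URL without touching the network.
DRY_RUN=1 mirror_site "http://example.com/"
```

Calling mirror_site with the real URL and without DRY_RUN=1 starts the download. Whether the result is complete depends on the site, for example on content that is only reachable through JavaScript.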
wget -p -k http://somewebsite.com
From man wget
-p
--page-requisites
This option causes Wget to download all the files that are
necessary to properly display a given HTML page. This includes
such things as inlined images, sounds, and referenced stylesheets.
Ordinarily, when downloading a single HTML page, any requisite
documents that may be needed to display it properly are not
downloaded. Using -r together with -l can help, but since Wget
does not ordinarily distinguish between external and inlined
documents, one is generally left with "leaf documents" that are
missing their requisites.
For instance, say document 1.html contains an "<IMG>" tag
referencing 1.gif and an "<A>" tag pointing to external document
2.html. Say that 2.html is similar but that its image is 2.gif and
it links to 3.html. Say this continues up to some arbitrarily high
number.
If one executes the command:
wget -r -l 2 http://<site>/1.html
then 1.html, 1.gif, 2.html, 2.gif, and 3.html will be downloaded.
As you can see, 3.html is without its requisite 3.gif because Wget
is simply counting the number of hops (up to 2) away from 1.html in
order to determine where to stop the recursion. However, with this
command:
wget -r -l 2 -p http://<site>/1.html
all the above files and 3.html's requisite 3.gif will be
downloaded. Similarly,
wget -r -l 1 -p http://<site>/1.html
will cause 1.html, 1.gif, 2.html, and 2.gif to be downloaded. One
might think that:
wget -r -l 0 -p http://<site>/1.html
would download just 1.html and 1.gif, but unfortunately this is not
the case, because -l 0 is equivalent to -l inf---that is, infinite
recursion. To download a single HTML page (or a handful of them,
all specified on the command-line or in a -i URL input file) and
its (or their) requisites, simply leave off -r and -l:
wget -p http://<site>/1.html
Note that Wget will behave as if -r had been specified, but only
that single page and its requisites will be downloaded. Links from
that page to external documents will not be followed. Actually, to
download a single page and all its requisites (even if they exist
on separate websites), and make sure the lot displays properly
locally, this author likes to use a few options in addition to -p:
wget -E -H -k -K -p http://<site>/<document>
To finish off this topic, it's worth knowing that Wget's idea of an
external document link is any URL specified in an "<A>" tag, an
"<AREA>" tag, or a "<LINK>" tag other than "<LINK
REL="stylesheet">".
==================================================================
-k
--convert-links
After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that
links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.
Each link will be changed in one of the two ways:
· The links to files that have been downloaded by Wget will be changed to refer to the file they point to as a relative link.
Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also downloaded, then the link in doc.html will be modified to point to ../bar/img.gif. This kind of transformation
works reliably for arbitrary combinations of directories.
· The links to files that have not been downloaded by Wget will be changed to include host name and absolute path of the location they point to.
Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to ../bar/img.gif), then the link in doc.html will be modified to point to http://hostname/bar/img.gif.
Because of this, local browsing works reliably: if a linked file was downloaded, the link will refer to its local name; if it was not downloaded, the link will refer to its full Internet
address rather than presenting a broken link. The fact that the former links are converted to relative links ensures that you can move the downloaded hierarchy to another directory.
Note that only at the end of the download can Wget know which links have been downloaded. Because of that, the work done by -k will be performed at the end of all the downloads.
--convert-file-only
This option converts only the filename part of the URLs, leaving the rest of the URLs untouched. This filename part is sometimes referred to as the "basename", although we avoid that term
here in order not to cause confusion.
It works particularly well in conjunction with --adjust-extension, although this coupling is not enforced. It proves useful to populate Internet caches with files downloaded from different
hosts.
Example: if some link points to //foo.com/bar.cgi?xyz with --adjust-extension asserted and its local destination is intended to be ./foo.com/bar.cgi?xyz.css, then the link would be converted
to //foo.com/bar.cgi?xyz.css. Note that only the filename part has been modified. The rest of the URL has been left untouched, including the net path ("//") which would otherwise be
processed by Wget and converted to the effective scheme (ie. "http://").
sorry for my bad indentation :(
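The short options in the manual's "wget -E -H -k -K -p" recipe are easy to mix up, so here is the same command spelled out with long option names. This is only a sketch: save_page, DRY_RUN and the example.com URL are made-up names for illustration, while the long options are the documented equivalents of the short ones.

```shell
#!/bin/sh
# Sketch: the manual's "-E -H -k -K -p" recipe with long option names.
# save_page and DRY_RUN are hypothetical names, not part of wget.
save_page() {
    url=$1
    # -E  --adjust-extension   add .html/.css suffixes where they are missing
    # -H  --span-hosts         allow requisites hosted on other domains
    # -k  --convert-links      rewrite links for local viewing
    # -K  --backup-converted   keep the unconverted original as *.orig
    # -p  --page-requisites    fetch images, stylesheets, and so on
    set -- wget --adjust-extension --span-hosts --convert-links \
        --backup-converted --page-requisites "$url"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"    # only print the command
    else
        "$@"         # actually run wget
    fi
}

# Preview the command for a placeholder page without touching the network.
DRY_RUN=1 save_page "http://example.com/index.html"
```

Note that this fetches one page and its requisites; for a whole site you would still need -r or -m on top.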
If you want to download everything associated with the link you have, you can try this:
wget -r -U "BrowserName" "Url"
You may want to use --wait="duration" to avoid your IP being blocked.
Requesting page after page without any wait periods looks weird; that's not how a human browses.
Welcome to Ask Ubuntu! It would be a vast improvement to this answer to improve your grammar, or at least to approve the suggested edit...
– anonymous2
Apr 11 '17 at 13:49
wget -m could also be used instead of -r
– tricasse
Dec 7 '17 at 12:50
Use --random-wait with --wait=X to further reduce the chance of being blocked.
– Patrick
Jan 4 at 21:40
@Patrick Would you care to post a full answer? Your comment sounds interesting.
– WinEunuuchs2Unix
Jan 15 at 2:52
answered Dec 16 '13 at 15:13
Braiam
answered Dec 11 at 10:19
waLL e
1113
1113
add a comment |
add a comment |
If You want to download everything associated with the link you have
You can try this
wget -r -U "BrowserName" "Url"
You may wanna use --wait="duration"
to avoid your ip being blocked.
Its weird requesting page after page without wait periods. that's not human
1
Welcome to Ask Ubuntu! It would be a vast improvement to this answer to improve your grammar, or at least to approve the suggested edit...
– anonymous2
Apr 11 '17 at 13:49
wget -m could also be used instead of -r
– tricasse
Dec 7 '17 at 12:50
Use --random-wait together with --wait=X to further reduce the chance of being blocked.
– Patrick
Jan 4 at 21:40
@Patrick Would you care to post a full answer? Your comment sounds interesting.
– WinEunuuchs2Unix
Jan 15 at 2:52
edited Apr 11 '17 at 14:22
Sumeet Deshmukh
answered Apr 11 '17 at 13:03
Sp1k3