Download a whole website with wget (or other) including all its downloadable content

I'm trying to download Winamp's website in case they shut it down. I need to download literally everything.

I tried once with wget and managed to download the website itself, but when I try to download any file from it, I get a file without an extension or name. How can I fix that?

edited Apr 17 '17 at 10:35 by Martin Thoma
asked Dec 16 '13 at 14:58 by Mina Michael

downloads wget

3 Answers

You may need to mirror the website completely, but be aware that some links may already be dead. You can use HTTrack or wget:

wget -r http://winapp.com # or whatever

With HTTrack, first install it:

sudo apt-get install httrack

Now run it so that it follows external links only one level deep:

httrack --ext-depth=1 http://winapp.com

This will download the winapp CDN files, but it will not keep following links outward across the whole internet.
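
For the wget route, a fuller invocation may behave better than a bare -r. The sketch below is one possible combination of real wget options, not a tested recipe for this particular site. The function name mirror_site and the DRY_RUN switch are made up for illustration, and http://example.com/ is a placeholder URL. The options --adjust-extension and --content-disposition are included because they may help with files that arrive without an extension or a proper name, assuming the server sends usable headers.

```shell
#!/bin/sh
# Hypothetical wrapper around wget for mirroring a whole site.
# Set DRY_RUN=1 to print the command instead of running it.
mirror_site() {
    url=$1
    # --mirror              recursion with infinite depth plus timestamping
    # --page-requisites     also fetch the images, CSS, etc. each page needs
    # --convert-links       rewrite links so the copy can be browsed offline
    # --adjust-extension    append .html/.css when the saved file lacks it
    # --content-disposition use the file name suggested by the server, if any
    # --no-parent           never climb above the starting directory
    set -- wget --mirror --page-requisites --convert-links \
        --adjust-extension --content-disposition --no-parent "$url"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage (placeholder URL):
#   mirror_site http://example.com/
```

Running it with DRY_RUN=1 first is a cheap way to check the exact command line before starting a long download.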

answered Dec 16 '13 at 15:13 by Braiam

wget -p -k http://somewebsite.com

From man wget

-p
--page-requisites
   This option causes Wget to download all the files that are
   necessary to properly display a given HTML page.  This includes
   such things as inlined images, sounds, and referenced stylesheets.

   Ordinarily, when downloading a single HTML page, any requisite
   documents that may be needed to display it properly are not
   downloaded.  Using -r together with -l can help, but since Wget
   does not ordinarily distinguish between external and inlined
   documents, one is generally left with "leaf documents" that are
   missing their requisites.

   For instance, say document 1.html contains an "<IMG>" tag
   referencing 1.gif and an "<A>" tag pointing to external document
   2.html.  Say that 2.html is similar but that its image is 2.gif and
   it links to 3.html.  Say this continues up to some arbitrarily high
   number.

   If one executes the command:

           wget -r -l 2 http://<site>/1.html

   then 1.html, 1.gif, 2.html, 2.gif, and 3.html will be downloaded.
   As you can see, 3.html is without its requisite 3.gif because Wget
   is simply counting the number of hops (up to 2) away from 1.html in
   order to determine where to stop the recursion.  However, with this
   command:

           wget -r -l 2 -p http://<site>/1.html

   all the above files and 3.html's requisite 3.gif will be
   downloaded.  Similarly,

           wget -r -l 1 -p http://<site>/1.html

   will cause 1.html, 1.gif, 2.html, and 2.gif to be downloaded.  One
   might think that:

           wget -r -l 0 -p http://<site>/1.html

   would download just 1.html and 1.gif, but unfortunately this is not
   the case, because -l 0 is equivalent to -l inf---that is, infinite
   recursion.  To download a single HTML page (or a handful of them,
   all specified on the command-line or in a -i URL input file) and
   its (or their) requisites, simply leave off -r and -l:

           wget -p http://<site>/1.html

   Note that Wget will behave as if -r had been specified, but only
   that single page and its requisites will be downloaded.  Links from
   that page to external documents will not be followed.  Actually, to
   download a single page and all its requisites (even if they exist
   on separate websites), and make sure the lot displays properly
   locally, this author likes to use a few options in addition to -p:

           wget -E -H -k -K -p http://<site>/<document>

   To finish off this topic, it's worth knowing that Wget's idea of an
   external document link is any URL specified in an "<A>" tag, an
   "<AREA>" tag, or a "<LINK>" tag other than "<LINK
   REL="stylesheet">".
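
As an aside to the excerpt, the hop counting it describes can be modelled with a toy shell function. This is only an illustration of the man page's 1.html, 2.html, 3.html chain, not of how Wget is implemented. The name fetched_files is made up, and the model deliberately ignores the special case -l 0 (infinite recursion).

```shell
#!/bin/sh
# Hypothetical model of the man page example: i.html embeds i.gif and
# links to (i+1).html. Prints the files a recursive run would fetch.
#   $1 = depth given to -l (must be 1 or more; -l 0 is not modelled)
#   $2 = 1 if -p is used, 0 otherwise
fetched_files() {
    depth=$1
    with_p=$2
    out=""
    i=1
    # Pages 1.html .. (depth+1).html are within "depth" hops of 1.html.
    while [ "$i" -le $((depth + 1)) ]; do
        out="$out $i.html"
        # i.gif is i hops away, so plain recursion reaches it only when
        # i <= depth; -p fetches it anyway as a requisite of i.html.
        if [ "$i" -le "$depth" ] || [ "$with_p" -eq 1 ]; then
            out="$out $i.gif"
        fi
        i=$((i + 1))
    done
    echo "${out# }"
}

fetched_files 2 0   # prints: 1.html 1.gif 2.html 2.gif 3.html
fetched_files 2 1   # prints: 1.html 1.gif 2.html 2.gif 3.html 3.gif
```

The only difference between the two calls is the last page's image, which is exactly the "leaf document" gap that -p closes.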



  ==================================================================



-k
--convert-links
   After the download is complete, convert the links in the document to make them suitable for local viewing.  This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.

   Each link will be changed in one of the two ways:

   ·   The links to files that have been downloaded by Wget will be changed to refer to the file they point to as a relative link.

       Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also downloaded, then the link in doc.html will be modified to point to ../bar/img.gif.  This kind of transformation works reliably for arbitrary combinations of directories.

   ·   The links to files that have not been downloaded by Wget will be changed to include host name and absolute path of the location they point to.

       Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to ../bar/img.gif), then the link in doc.html will be modified to point to http://hostname/bar/img.gif.

   Because of this, local browsing works reliably: if a linked file was downloaded, the link will refer to its local name; if it was not downloaded, the link will refer to its full Internet address rather than presenting a broken link.  The fact that the former links are converted to relative links ensures that you can move the downloaded hierarchy to another directory.

   Note that only at the end of the download can Wget know which links have been downloaded.  Because of that, the work done by -k will be performed at the end of all the downloads.
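
The relative-link rewriting in the first bullet can be reproduced by hand, which may help when checking a converted page. The sketch below is not what Wget runs internally; it assumes GNU coreutils' realpath is available, and relative_link is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: relative link from one downloaded file to another,
# in the spirit of what --convert-links writes. Relies on GNU realpath.
relative_link() {
    from_file=$1   # file containing the link, e.g. /foo/doc.html
    to_file=$2     # link target, e.g. /bar/img.gif
    # -m: the paths need not exist; -s: do not resolve symlinks
    realpath -m -s --relative-to="$(dirname "$from_file")" "$to_file"
}

relative_link /foo/doc.html /bar/img.gif   # prints ../bar/img.gif
```

Because the result depends only on the two paths relative to each other, the downloaded tree can be moved elsewhere without breaking such links, as the excerpt notes.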



--convert-file-only
   This option converts only the filename part of the URLs, leaving the rest of the URLs untouched. This filename part is sometimes referred to as the "basename", although we avoid that term here in order not to cause confusion.

   It works particularly well in conjunction with --adjust-extension, although this coupling is not enforced. It proves useful to populate Internet caches with files downloaded from different hosts.

   Example: if some link points to //foo.com/bar.cgi?xyz with --adjust-extension asserted and its local destination is intended to be ./foo.com/bar.cgi?xyz.css, then the link would be converted to //foo.com/bar.cgi?xyz.css. Note that only the filename part has been modified. The rest of the URL has been left untouched, including the net path ("//") which would otherwise be processed by Wget and converted to the effective scheme (ie. "http://").

sorry for my bad indentation :(

answered Dec 11 at 10:19 by waLL e

-1

If you want to download everything linked from the URL you have, you can try this:

wget -r -U "BrowserName" "Url"

You may want to use --wait="duration" to avoid getting your IP blocked.
Requesting page after page with no pause in between looks suspicious; that is not how a human browses.
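
A rough sketch of such a throttled run, combining --wait with --random-wait, could look like the following. The function name polite_mirror, the DRY_RUN switch, the 2-second default and the Mozilla/5.0 User-Agent string are all placeholders chosen for illustration; suitable values depend on the site.

```shell
#!/bin/sh
# Hypothetical wrapper: recursive download with pauses between requests.
# Set DRY_RUN=1 to print the command instead of running it.
polite_mirror() {
    url=$1
    wait_seconds=${2:-2}      # base delay in seconds; 2 is an arbitrary default
    agent=${3:-Mozilla/5.0}   # placeholder User-Agent string
    # --wait         pause this many seconds between retrievals
    # --random-wait  vary each pause around the --wait value
    # --user-agent   identify as the given browser string (same as -U)
    set -- wget --recursive --wait="$wait_seconds" --random-wait \
        --user-agent="$agent" "$url"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage (placeholder URL, 5-second base delay):
#   polite_mirror http://example.com/ 5
```

Longer delays make the download slower but are less likely to trigger rate limiting.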

edited Apr 11 '17 at 14:22 by Sumeet Deshmukh
answered Apr 11 '17 at 13:03 by Sp1k3

Welcome to Ask Ubuntu! It would be a vast improvement to this answer to improve your grammar, or at least to approve the suggested edit...
– anonymous2
Apr 11 '17 at 13:49

wget -m could also be used instead of -r
– tricasse
Dec 7 '17 at 12:50

1

Use --random-wait together with --wait=X to further reduce the chance of being blocked.
– Patrick
Jan 4 at 21:40

@Patrick Would you care to post a full answer? Your comment sounds interesting.
– WinEunuuchs2Unix
Jan 15 at 2:52

wget -p -k http://somewebsite.com

From man wget

-p

--page-requisites

   This option causes Wget to download all the files that are

   necessary to properly display a given HTML page.  This includes

   such things as inlined images, sounds, and referenced stylesheets.



   Ordinarily, when downloading a single HTML page, any requisite

   documents that may be needed to display it properly are not

   downloaded.  Using -r together with -l can help, but since Wget

   does not ordinarily distinguish between external and inlined

   documents, one is generally left with "leaf documents" that are

   missing their requisites.



   For instance, say document 1.html contains an "<IMG>" tag

   referencing 1.gif and an "<A>" tag pointing to external document

   2.html.  Say that 2.html is similar but that its image is 2.gif and

   it links to 3.html.  Say this continues up to some arbitrarily high

   number.



   If one executes the command:



           wget -r -l 2 http://<site>/1.html



   then 1.html, 1.gif, 2.html, 2.gif, and 3.html will be downloaded.

   As you can see, 3.html is without its requisite 3.gif because Wget

   is simply counting the number of hops (up to 2) away from 1.html in

   order to determine where to stop the recursion.  However, with this

   command:



           wget -r -l 2 -p http://<site>/1.html



   all the above files and 3.html's requisite 3.gif will be

   downloaded.  Similarly,



           wget -r -l 1 -p http://<site>/1.html



   will cause 1.html, 1.gif, 2.html, and 2.gif to be downloaded.  One

   might think that:



           wget -r -l 0 -p http://<site>/1.html



   would download just 1.html and 1.gif, but unfortunately this is not

   the case, because -l 0 is equivalent to -l inf---that is, infinite

   recursion.  To download a single HTML page (or a handful of them,

   all specified on the command-line or in a -i URL input file) and

   its (or their) requisites, simply leave off -r and -l:



           wget -p http://<site>/1.html



   Note that Wget will behave as if -r had been specified, but only

   that single page and its requisites will be downloaded.Links from

   that page to external documents will not be followed.  Actually, to

   download a single page and all its requisites (even if they exist

   on separate websites), and make sure the lot displays properly

   locally, this author likes to use a few options in addition to -p:



          wget -E -H -k -K -p http://<site>/<document>



   To finish off this topic, it's worth knowing that Wget's idea of an

   external document link is any URL specified in an "<A>" tag, an

   "<AREA>" tag, or a "<LINK>" tag other than "<LINK

   REL="stylesheet">".



  ==================================================================



 -k

 --convert-links

   After the download is complete, convert the links in the document to make them suitable for local viewing.  This affects not only the visible hyperlinks, but any part of the document that

   links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.



   Each link will be changed in one of the two ways:



   ·   The links to files that have been downloaded by Wget will be changed to refer to the file they point to as a relative link.



       Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also downloaded, then the link in doc.html will be modified to point to ../bar/img.gif.  This kind of transformation

       works reliably for arbitrary combinations of directories.



   ·   The links to files that have not been downloaded by Wget will be changed to include host name and absolute path of the location they point to.



       Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to ../bar/img.gif), then the link in doc.html will be modified to point to http://hostname/bar/img.gif.



   Because of this, local browsing works reliably: if a linked file was downloaded, the link will refer to its local name; if it was not downloaded, the link will refer to its full Internet

   address rather than presenting a broken link.  The fact that the former links are converted to relative links ensures that you can move the downloaded hierarchy to another directory.



   Note that only at the end of the download can Wget know which links have been downloaded.  Because of that, the work done by -k will be performed at the end of all the downloads.



  --convert-file-only

   This option converts only the filename part of the URLs, leaving the rest of the URLs untouched. This filename part is sometimes referred to as the "basename", although we avoid that term

   here in order not to cause confusion.



   It works particularly well in conjunction with --adjust-extension, although this coupling is not enforced. It proves useful to populate Internet caches with files downloaded from different

   hosts.



   Example: if some link points to //foo.com/bar.cgi?xyz with --adjust-extension asserted and its local destination is intended to be ./foo.com/bar.cgi?xyz.css, then the link would be converted

   to //foo.com/bar.cgi?xyz.css. Note that only the filename part has been modified. The rest of the URL has been left untouched, including the net path ("//") which would otherwise be

   processed by Wget and converted to the effective scheme (ie. "http://").

sorry for my bad indentation :(

answered Dec 11 at 10:19

waLL e

1113

add a comment |

wget -p -k http://somewebsite.com

From man wget

-p

--page-requisites

   This option causes Wget to download all the files that are

   necessary to properly display a given HTML page.  This includes

   such things as inlined images, sounds, and referenced stylesheets.



   Ordinarily, when downloading a single HTML page, any requisite

   documents that may be needed to display it properly are not

   downloaded.  Using -r together with -l can help, but since Wget

   does not ordinarily distinguish between external and inlined

   documents, one is generally left with "leaf documents" that are

   missing their requisites.



   For instance, say document 1.html contains an "<IMG>" tag

   referencing 1.gif and an "<A>" tag pointing to external document

   2.html.  Say that 2.html is similar but that its image is 2.gif and

   it links to 3.html.  Say this continues up to some arbitrarily high

   number.



   If one executes the command:



           wget -r -l 2 http://<site>/1.html



   then 1.html, 1.gif, 2.html, 2.gif, and 3.html will be downloaded.

   As you can see, 3.html is without its requisite 3.gif because Wget

   is simply counting the number of hops (up to 2) away from 1.html in

   order to determine where to stop the recursion.  However, with this

   command:



           wget -r -l 2 -p http://<site>/1.html



   all the above files and 3.html's requisite 3.gif will be

   downloaded.  Similarly,



           wget -r -l 1 -p http://<site>/1.html



   will cause 1.html, 1.gif, 2.html, and 2.gif to be downloaded.  One

   might think that:



           wget -r -l 0 -p http://<site>/1.html



   would download just 1.html and 1.gif, but unfortunately this is not

   the case, because -l 0 is equivalent to -l inf---that is, infinite

   recursion.  To download a single HTML page (or a handful of them,

   all specified on the command-line or in a -i URL input file) and

   its (or their) requisites, simply leave off -r and -l:



           wget -p http://<site>/1.html



   Note that Wget will behave as if -r had been specified, but only

   that single page and its requisites will be downloaded.Links from

   that page to external documents will not be followed.  Actually, to

   download a single page and all its requisites (even if they exist

   on separate websites), and make sure the lot displays properly

   locally, this author likes to use a few options in addition to -p:



          wget -E -H -k -K -p http://<site>/<document>



   To finish off this topic, it's worth knowing that Wget's idea of an

   external document link is any URL specified in an "<A>" tag, an

   "<AREA>" tag, or a "<LINK>" tag other than "<LINK

   REL="stylesheet">".



  ==================================================================



 -k

 --convert-links

   After the download is complete, convert the links in the document to make them suitable for local viewing.  This affects not only the visible hyperlinks, but any part of the document that

   links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.



   Each link will be changed in one of the two ways:



   ·   The links to files that have been downloaded by Wget will be changed to refer to the file they point to as a relative link.



       Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also downloaded, then the link in doc.html will be modified to point to ../bar/img.gif.  This kind of transformation

       works reliably for arbitrary combinations of directories.



   ·   The links to files that have not been downloaded by Wget will be changed to include host name and absolute path of the location they point to.



       Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to ../bar/img.gif), then the link in doc.html will be modified to point to http://hostname/bar/img.gif.



   Because of this, local browsing works reliably: if a linked file was downloaded, the link will refer to its local name; if it was not downloaded, the link will refer to its full Internet

   address rather than presenting a broken link.  The fact that the former links are converted to relative links ensures that you can move the downloaded hierarchy to another directory.



   Note that only at the end of the download can Wget know which links have been downloaded.  Because of that, the work done by -k will be performed at the end of all the downloads.



  --convert-file-only

   This option converts only the filename part of the URLs, leaving the rest of the URLs untouched. This filename part is sometimes referred to as the "basename", although we avoid that term

   here in order not to cause confusion.



   It works particularly well in conjunction with --adjust-extension, although this coupling is not enforced. It proves useful to populate Internet caches with files downloaded from different

   hosts.



   Example: if some link points to //foo.com/bar.cgi?xyz with --adjust-extension asserted and its local destination is intended to be ./foo.com/bar.cgi?xyz.css, then the link would be converted

   to //foo.com/bar.cgi?xyz.css. Note that only the filename part has been modified. The rest of the URL has been left untouched, including the net path ("//") which would otherwise be

   processed by Wget and converted to the effective scheme (ie. "http://").

sorry for my bad indentation :(

answered Dec 11 at 10:19

waLL e

1113

add a comment |

wget -p -k http://somewebsite.com

From man wget

-p

--page-requisites

   This option causes Wget to download all the files that are

   necessary to properly display a given HTML page.  This includes

   such things as inlined images, sounds, and referenced stylesheets.



   Ordinarily, when downloading a single HTML page, any requisite

   documents that may be needed to display it properly are not

   downloaded.  Using -r together with -l can help, but since Wget

   does not ordinarily distinguish between external and inlined

   documents, one is generally left with "leaf documents" that are

   missing their requisites.



   For instance, say document 1.html contains an "<IMG>" tag

   referencing 1.gif and an "<A>" tag pointing to external document

   2.html.  Say that 2.html is similar but that its image is 2.gif and

   it links to 3.html.  Say this continues up to some arbitrarily high

   number.



   If one executes the command:



           wget -r -l 2 http://<site>/1.html



   then 1.html, 1.gif, 2.html, 2.gif, and 3.html will be downloaded.

   As you can see, 3.html is without its requisite 3.gif because Wget

   is simply counting the number of hops (up to 2) away from 1.html in

   order to determine where to stop the recursion.  However, with this

   command:



           wget -r -l 2 -p http://<site>/1.html



   all the above files and 3.html's requisite 3.gif will be

   downloaded.  Similarly,



           wget -r -l 1 -p http://<site>/1.html



   will cause 1.html, 1.gif, 2.html, and 2.gif to be downloaded.  One

   might think that:



           wget -r -l 0 -p http://<site>/1.html



   would download just 1.html and 1.gif, but unfortunately this is not

   the case, because -l 0 is equivalent to -l inf---that is, infinite

   recursion.  To download a single HTML page (or a handful of them,

   all specified on the command-line or in a -i URL input file) and

   its (or their) requisites, simply leave off -r and -l:



           wget -p http://<site>/1.html



   Note that Wget will behave as if -r had been specified, but only

   that single page and its requisites will be downloaded.Links from

   that page to external documents will not be followed.  Actually, to

   download a single page and all its requisites (even if they exist

   on separate websites), and make sure the lot displays properly

   locally, this author likes to use a few options in addition to -p:



          wget -E -H -k -K -p http://<site>/<document>



   To finish off this topic, it's worth knowing that Wget's idea of an

   external document link is any URL specified in an "<A>" tag, an

   "<AREA>" tag, or a "<LINK>" tag other than "<LINK

   REL="stylesheet">".



  ==================================================================



 -k

 --convert-links

   After the download is complete, convert the links in the document to make them suitable for local viewing.  This affects not only the visible hyperlinks, but any part of the document that

   links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.



   Each link will be changed in one of the two ways:



   ·   The links to files that have been downloaded by Wget will be changed to refer to the file they point to as a relative link.



       Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also downloaded, then the link in doc.html will be modified to point to ../bar/img.gif.  This kind of transformation

       works reliably for arbitrary combinations of directories.



   ·   The links to files that have not been downloaded by Wget will be changed to include host name and absolute path of the location they point to.



       Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to ../bar/img.gif), then the link in doc.html will be modified to point to http://hostname/bar/img.gif.



   Because of this, local browsing works reliably: if a linked file was downloaded, the link will refer to its local name; if it was not downloaded, the link will refer to its full Internet

   address rather than presenting a broken link.  The fact that the former links are converted to relative links ensures that you can move the downloaded hierarchy to another directory.



   Note that only at the end of the download can Wget know which links have been downloaded.  Because of that, the work done by -k will be performed at the end of all the downloads.



  --convert-file-only

   This option converts only the filename part of the URLs, leaving the rest of the URLs untouched. This filename part is sometimes referred to as the "basename", although we avoid that term

   here in order not to cause confusion.



   It works particularly well in conjunction with --adjust-extension, although this coupling is not enforced. It proves useful to populate Internet caches with files downloaded from different

   hosts.



   Example: if some link points to //foo.com/bar.cgi?xyz with --adjust-extension asserted and its local destination is intended to be ./foo.com/bar.cgi?xyz.css, then the link would be converted

   to //foo.com/bar.cgi?xyz.css. Note that only the filename part has been modified. The rest of the URL has been left untouched, including the net path ("//") which would otherwise be

   processed by Wget and converted to the effective scheme (ie. "http://").

sorry for my bad indentation :(

answered Dec 11 at 10:19

waLL e

1113

wget -p -k http://somewebsite.com

From man wget

-p

--page-requisites

   This option causes Wget to download all the files that are

   necessary to properly display a given HTML page.  This includes

   such things as inlined images, sounds, and referenced stylesheets.



   Ordinarily, when downloading a single HTML page, any requisite

   documents that may be needed to display it properly are not

   downloaded.  Using -r together with -l can help, but since Wget

   does not ordinarily distinguish between external and inlined

   documents, one is generally left with "leaf documents" that are

   missing their requisites.



   For instance, say document 1.html contains an "<IMG>" tag

   referencing 1.gif and an "<A>" tag pointing to external document

   2.html.  Say that 2.html is similar but that its image is 2.gif and

   it links to 3.html.  Say this continues up to some arbitrarily high

   number.



   If one executes the command:



           wget -r -l 2 http://<site>/1.html



   then 1.html, 1.gif, 2.html, 2.gif, and 3.html will be downloaded.

   As you can see, 3.html is without its requisite 3.gif because Wget

   is simply counting the number of hops (up to 2) away from 1.html in

   order to determine where to stop the recursion.  However, with this

   command:



           wget -r -l 2 -p http://<site>/1.html



   all the above files and 3.html's requisite 3.gif will be

   downloaded.  Similarly,



           wget -r -l 1 -p http://<site>/1.html



   will cause 1.html, 1.gif, 2.html, and 2.gif to be downloaded.  One

   might think that:



           wget -r -l 0 -p http://<site>/1.html



   would download just 1.html and 1.gif, but unfortunately this is not

   the case, because -l 0 is equivalent to -l inf---that is, infinite

   recursion.  To download a single HTML page (or a handful of them,

   all specified on the command-line or in a -i URL input file) and

   its (or their) requisites, simply leave off -r and -l:



           wget -p http://<site>/1.html



   Note that Wget will behave as if -r had been specified, but only

   that single page and its requisites will be downloaded.Links from

   that page to external documents will not be followed.  Actually, to

   download a single page and all its requisites (even if they exist

   on separate websites), and make sure the lot displays properly

   locally, this author likes to use a few options in addition to -p:



          wget -E -H -k -K -p http://<site>/<document>



   To finish off this topic, it's worth knowing that Wget's idea of an

   external document link is any URL specified in an "<A>" tag, an

   "<AREA>" tag, or a "<LINK>" tag other than "<LINK

   REL="stylesheet">".
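The hop counting described above can be modelled with a small shell sketch. It downloads nothing and is not how Wget is implemented; the function name simulate_fetch and its yes/no second argument (standing in for -p) are made up for illustration, and it only reproduces the man page's toy chain in which N.html embeds N.gif and links to (N+1).html.

```shell
# Toy model of the man page's example: N.html embeds N.gif and links
# to (N+1).html. Lists what "wget -r -l DEPTH [-p]" would fetch when
# started at 1.html. Hypothetical helper, no network access involved.
simulate_fetch() {
    depth=$1        # value given to -l (must be 1 or more here)
    requisites=$2   # "yes" models -p, "no" models its absence
    n=1
    out=""
    # Pages up to DEPTH hops away from 1.html are always fetched.
    while [ "$n" -le $((depth + 1)) ]; do
        out="$out $n.html"
        # N.gif is N hops away, so without -p the image of the last
        # page falls outside the depth limit.
        if [ "$n" -le "$depth" ] || [ "$requisites" = yes ]; then
            out="$out $n.gif"
        fi
        n=$((n + 1))
    done
    echo "${out# }"
}

simulate_fetch 2 no    # 1.html 1.gif 2.html 2.gif 3.html
simulate_fetch 2 yes   # 1.html 1.gif 2.html 2.gif 3.html 3.gif
simulate_fetch 1 yes   # 1.html 1.gif 2.html 2.gif
```

The three calls correspond to the three -l commands quoted above. The -l 0 case is deliberately left out, since the man page states it means infinite recursion.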



  ==================================================================



 -k

 --convert-links

   After the download is complete, convert the links in the document to make them suitable for local viewing.  This affects not only the visible hyperlinks, but any part of the document that

   links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.



   Each link will be changed in one of the two ways:



   ·   The links to files that have been downloaded by Wget will be changed to refer to the file they point to as a relative link.



       Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also downloaded, then the link in doc.html will be modified to point to ../bar/img.gif.  This kind of transformation

       works reliably for arbitrary combinations of directories.



   ·   The links to files that have not been downloaded by Wget will be changed to include host name and absolute path of the location they point to.



       Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to ../bar/img.gif), then the link in doc.html will be modified to point to http://hostname/bar/img.gif.



   Because of this, local browsing works reliably: if a linked file was downloaded, the link will refer to its local name; if it was not downloaded, the link will refer to its full Internet

   address rather than presenting a broken link.  The fact that the former links are converted to relative links ensures that you can move the downloaded hierarchy to another directory.



   Note that only at the end of the download can Wget know which links have been downloaded.  Because of that, the work done by -k will be performed at the end of all the downloads.
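To see what "relative link" means in the first case, the path arithmetic can be reproduced with realpath. This is only a sketch of the result, not of Wget's own code; it assumes GNU coreutils realpath (for --relative-to), and the function name relative_link is hypothetical.

```shell
# Compute the relative link that points from one downloaded file to
# another. Assumes GNU coreutils realpath: -m accepts paths that do
# not exist and -s leaves symlinks unresolved, so nothing needs to be
# on disk for this to work.
relative_link() {
    from_file=$1   # document that contains the link, e.g. /foo/doc.html
    to_file=$2     # file the link points to, e.g. /bar/img.gif
    realpath -m -s --relative-to="$(dirname "$from_file")" "$to_file"
}

relative_link /foo/doc.html /bar/img.gif   # prints ../bar/img.gif
```

Because the result never mentions the top-level directory, the downloaded tree can be moved elsewhere without breaking such links, which is the property the man page points out.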



  --convert-file-only

   This option converts only the filename part of the URLs, leaving the rest of the URLs untouched. This filename part is sometimes referred to as the "basename", although we avoid that term

   here in order not to cause confusion.



   It works particularly well in conjunction with --adjust-extension, although this coupling is not enforced. It proves useful to populate Internet caches with files downloaded from different

   hosts.



   Example: if some link points to //foo.com/bar.cgi?xyz with --adjust-extension asserted and its local destination is intended to be ./foo.com/bar.cgi?xyz.css, then the link would be converted

   to //foo.com/bar.cgi?xyz.css. Note that only the filename part has been modified. The rest of the URL has been left untouched, including the net path ("//") which would otherwise be

   processed by Wget and converted to the effective scheme (ie. "http://").

sorry for my bad indentation :(

answered Dec 11 at 10:19

waLL e

1113


-1

If you want to download everything linked from a URL, you can try this:

wget -r -U "BrowserName" "Url"

You may want to add --wait="duration" to avoid getting your IP blocked.
Requesting page after page with no pause in between does not look like a human visitor.
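A fuller command along these lines could look like the sketch below. It only prints the command rather than running it, so it is safe to try. The function name polite_wget_cmd, the user agent string, the 2-second default delay and example.com are placeholders chosen for illustration. --random-wait is a standard Wget option that varies the pause between 0.5 and 1.5 times the --wait value, and -p, -E and -k are the options explained in the man page excerpt in the other answer.

```shell
# Print (do not run) a polite recursive wget command for a given URL.
# The user agent string and the default delay are placeholder
# assumptions; pick values that suit the site being copied.
polite_wget_cmd() {
    url=$1
    delay=${2:-2}   # seconds to wait between requests
    echo "wget -r -p -E -k --wait=$delay --random-wait -U 'Mozilla/5.0' '$url'"
}

polite_wget_cmd "http://example.com/"
# To start the download for real, run the printed command in a shell.
```

-E (--adjust-extension) may also help with pages that would otherwise be saved without an .html extension, though it will not give a name to every kind of download.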

edited Apr 11 '17 at 14:22

Sumeet Deshmukh

4,34152971

answered Apr 11 '17 at 13:03

Sp1k3

1

Welcome to Ask Ubuntu! It would be a vast improvement to this answer to improve your grammar, or at least to approve the suggested edit...
– anonymous2
Apr 11 '17 at 13:49

wget -m could also be used instead of -r
– tricasse
Dec 7 '17 at 12:50

1

Use --random-wait together with --wait=X to further reduce the chance of being blocked.
– Patrick
Jan 4 at 21:40

@Patrick Would you care to post a full answer? Your comment sounds interesting.
– WinEunuuchs2Unix
Jan 15 at 2:52

