Download a whole website with wget (or other) including all its downloadable content












18














I'm trying to download Winamp's website in case they shut it down. I need to download literally everything.

I tried once with wget and I managed to download the website itself, but when I try to download any file from it, I get a file without an extension or name. How can I fix that?










Tags: downloads wget






asked Dec 16 '13 at 14:58 by Mina Michael; edited Apr 17 '17 at 10:35 by Martin Thoma






















          3 Answers


















          15














          You may need to mirror the website completely, but be aware that some links may really be dead. You can use HTTrack or wget:

          wget -r http://winamp.com # or whatever

          With HTTrack, first install it:

          sudo apt-get install httrack

          Now run it, following external links just 1 level deep:

          httrack --ext-depth=1 http://winamp.com

          This will pick up the files the site serves from its CDN as well, but it will not keep recursing through the whole Internet.
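
          For completeness, a fuller wget invocation is a common starting point for offline mirrors. This is only a sketch built from standard GNU Wget options, not part of the original answer; --adjust-extension in particular tackles files saved without an extension (the problem in the question), and --content-disposition lets the server supply proper filenames for download links:

          # Mirror recursively, fetch page requisites, rewrite links for offline
          # viewing, add .html extensions, and honor server-suggested filenames.
          wget --mirror --page-requisites --convert-links --adjust-extension --content-disposition --no-parent http://winamp.com

          Here --mirror is shorthand for -r -N -l inf --no-remove-listing, and --no-parent keeps the crawl from wandering above the starting directory.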






          answered Dec 16 '13 at 15:13 by Braiam

            1














            wget -p -k http://somewebsite.com


            From man wget:



            -p
            --page-requisites
            This option causes Wget to download all the files that are
            necessary to properly display a given HTML page. This includes
            such things as inlined images, sounds, and referenced stylesheets.

            Ordinarily, when downloading a single HTML page, any requisite
            documents that may be needed to display it properly are not
            downloaded. Using -r together with -l can help, but since Wget
            does not ordinarily distinguish between external and inlined
            documents, one is generally left with "leaf documents" that are
            missing their requisites.

            For instance, say document 1.html contains an "<IMG>" tag
            referencing 1.gif and an "<A>" tag pointing to external document
            2.html. Say that 2.html is similar but that its image is 2.gif and
            it links to 3.html. Say this continues up to some arbitrarily high
            number.

            If one executes the command:

            wget -r -l 2 http://<site>/1.html

            then 1.html, 1.gif, 2.html, 2.gif, and 3.html will be downloaded.
            As you can see, 3.html is without its requisite 3.gif because Wget
            is simply counting the number of hops (up to 2) away from 1.html in
            order to determine where to stop the recursion. However, with this
            command:

            wget -r -l 2 -p http://<site>/1.html

            all the above files and 3.html's requisite 3.gif will be
            downloaded. Similarly,

            wget -r -l 1 -p http://<site>/1.html

            will cause 1.html, 1.gif, 2.html, and 2.gif to be downloaded. One
            might think that:

            wget -r -l 0 -p http://<site>/1.html

            would download just 1.html and 1.gif, but unfortunately this is not
            the case, because -l 0 is equivalent to -l inf---that is, infinite
            recursion. To download a single HTML page (or a handful of them,
            all specified on the command-line or in a -i URL input file) and
            its (or their) requisites, simply leave off -r and -l:

            wget -p http://<site>/1.html

            Note that Wget will behave as if -r had been specified, but only
            that single page and its requisites will be downloaded. Links from
            that page to external documents will not be followed. Actually, to
            download a single page and all its requisites (even if they exist
            on separate websites), and make sure the lot displays properly
            locally, this author likes to use a few options in addition to -p:

            wget -E -H -k -K -p http://<site>/<document>

            To finish off this topic, it's worth knowing that Wget's idea of an
            external document link is any URL specified in an "<A>" tag, an
            "<AREA>" tag, or a "<LINK>" tag other than "<LINK
            REL="stylesheet">".

            ==================================================================

            -k
            --convert-links
            After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that
            links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.

            Each link will be changed in one of the two ways:

            · The links to files that have been downloaded by Wget will be changed to refer to the file they point to as a relative link.

            Example: if the downloaded file /foo/doc.html links to /bar/img.gif, also downloaded, then the link in doc.html will be modified to point to ../bar/img.gif. This kind of transformation
            works reliably for arbitrary combinations of directories.

            · The links to files that have not been downloaded by Wget will be changed to include host name and absolute path of the location they point to.

            Example: if the downloaded file /foo/doc.html links to /bar/img.gif (or to ../bar/img.gif), then the link in doc.html will be modified to point to http://hostname/bar/img.gif.

            Because of this, local browsing works reliably: if a linked file was downloaded, the link will refer to its local name; if it was not downloaded, the link will refer to its full Internet
            address rather than presenting a broken link. The fact that the former links are converted to relative links ensures that you can move the downloaded hierarchy to another directory.

            Note that only at the end of the download can Wget know which links have been downloaded. Because of that, the work done by -k will be performed at the end of all the downloads.

            --convert-file-only
            This option converts only the filename part of the URLs, leaving the rest of the URLs untouched. This filename part is sometimes referred to as the "basename", although we avoid that term
            here in order not to cause confusion.

            It works particularly well in conjunction with --adjust-extension, although this coupling is not enforced. It proves useful to populate Internet caches with files downloaded from different
            hosts.

            Example: if some link points to //foo.com/bar.cgi?xyz with --adjust-extension asserted and its local destination is intended to be ./foo.com/bar.cgi?xyz.css, then the link would be converted
            to //foo.com/bar.cgi?xyz.css. Note that only the filename part has been modified. The rest of the URL has been left untouched, including the net path ("//") which would otherwise be
            processed by Wget and converted to the effective scheme (ie. "http://").
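
            Following the manual's own closing suggestion in the -p section, a single-page grab that should render correctly offline, even when its assets live on other hosts, could look like this (the URL is a placeholder):

            # -E add .html extensions, -H span hosts for requisites, -k convert
            # links, -K keep .orig backups of rewritten files, -p page requisites.
            wget -E -H -k -K -p http://somewebsite.com/page.html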


            answered Dec 11 at 10:19 by waLL e
              -1














              If you want to download everything associated with the link you have, you can try this:

              wget -r -U "BrowserName" "Url"

              You may want to use --wait="duration" to avoid your IP being blocked. Requesting page after page with no pause in between looks suspicious; that's not how a human browses.
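
              As a concrete sketch of the placeholders above (the user-agent string and the one-second wait are illustrative values, not from the answer):

              # Recursive download with a browser-like user agent and a pause
              # between requests so the crawl looks less automated.
              wget -r --wait=1 -U "Mozilla/5.0 (X11; Linux x86_64)" http://winamp.com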






              answered Apr 11 '17 at 13:03 by Sp1k3; edited Apr 11 '17 at 14:22 by Sumeet Deshmukh
              • Welcome to Ask Ubuntu! It would be a vast improvement to this answer to improve your grammar, or at least to approve the suggested edit... – anonymous2, Apr 11 '17 at 13:49

              • wget -m could also be used instead of -r. – tricasse, Dec 7 '17 at 12:50

              • Use --random-wait with --wait=X to further avoid blocks. – Patrick, Jan 4 at 21:40

              • @Patrick Would you care to post a full answer? Your comment sounds interesting. – WinEunuuchs2Unix, Jan 15 at 2:52
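
              Putting the comments together, a politer mirror might look like the sketch below; the two-second wait is an illustrative value, not from the thread:

              # -m mirrors recursively; --wait pauses between retrievals and
              # --random-wait varies that pause (0.5x to 1.5x) to avoid blocks.
              wget -m --wait=2 --random-wait -U "Mozilla/5.0" http://winamp.com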










