How can I use Python 3 with BeautifulSoup to get an article's text from Wikipedia?











6 votes, 1 favorite

I have the following script, written in Python 3:

    from bs4 import BeautifulSoup

    url = "https://en.wikipedia.org/wiki/Mathematics"
    response = simple_get(url)  # simple_get is a helper defined elsewhere (not shown)
    result = {'url': url}
    if response is not None:
        html = BeautifulSoup(response, 'html.parser')
        title = html.select("#firstHeading")[0].text

As you can see, I can get the title of the article, but I cannot figure out how to get the text from "Mathematics" (from Greek μά...) up to the table of contents.










      python web-scraping beautifulsoup






      asked 4 hours ago by wiki one (edited 2 hours ago by Aaron_ab)

          4 Answers

















          4 votes, accepted










          Select the <p> tags. There are 52 of them on this page. I'm not sure whether you want the whole article, but you can iterate through those tags and store the text however you like. I just print each one to show the output.

              import bs4
              import requests

              response = requests.get("https://en.wikipedia.org/wiki/Mathematics")

              if response is not None:
                  html = bs4.BeautifulSoup(response.text, 'html.parser')

                  title = html.select("#firstHeading")[0].text
                  paragraphs = html.select("p")
                  for para in paragraphs:
                      print(para.text)

                  # just grab the text up to the contents, as stated in the question
                  intro = '\n'.join([para.text for para in paragraphs[0:5]])
                  print(intro)





          answered 4 hours ago by chitown88 (edited 4 hours ago)



















          • 3
            "if response is not None" can be rewritten as "if response". Also, since the content may change in the future, I would have suggested getting the entire div, reading only the p elements, and stopping when you reach the div with class "toclimit-3".
            – PinoSan
            4 hours ago






          • 2
            @PinoSan I think it doesn't hurt to check for None explicitly. For example, bool('' is not None) is not the same as bool(''). However, in this case the None check is completely unnecessary, because response will always be a requests.models.Response object. If the request fails, an exception will be raised.
            – t.m.adam
            2 hours ago












          • @t.m.adam What you are saying is true, but as you said, the response is not a string, so you just want to check that it is a valid object and not an empty string, None, an empty dictionary, and so on. As for exceptions, I agree that we should catch them in case of network errors, but we should also check that the status code is 200.
            – PinoSan
            2 hours ago












          • @PinoSan Of course, and I too prefer the "if response" style, but you know, "Explicit is better than implicit." The problem with "if response" is that it may produce strange errors that are difficult to debug. But yes, in most cases a simple boolean check should be enough.
            – t.m.adam
            2 hours ago
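
A rough sketch of what the comments above suggest: raise on HTTP errors instead of testing the response for None, then walk the article body and collect paragraphs until the table of contents is reached. This is only an illustration under stated assumptions. The "#mw-content-text" selector, the "toc" id, and the helper names fetch_html and intro_paragraphs are not confirmed by the answer, and Wikipedia's markup changes over time.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Return the page HTML, raising requests.HTTPError on a 4xx/5xx status."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def intro_paragraphs(html_text):
    """Collect the paragraph texts that appear before the table of contents."""
    soup = BeautifulSoup(html_text, "html.parser")
    # Assumption: the article body lives in the element with id "mw-content-text".
    body = soup.select_one("#mw-content-text")
    if body is None:
        body = soup
    intro = []
    for element in body.find_all(["p", "div"]):
        # Assumption: the contents box is a div with id "toc",
        # possibly wrapped in a div with class "toclimit-3".
        classes = element.get("class") or []
        if element.name == "div" and (element.get("id") == "toc" or "toclimit-3" in classes):
            break
        if element.name == "p":
            text = element.get_text().strip()
            if text:  # skip empty placeholder paragraphs
                intro.append(text)
    return intro
```

Under those assumptions, something like intro_paragraphs(fetch_html("https://en.wikipedia.org/wiki/Mathematics")) would replace the hard-coded paragraphs[0:5] slice, so the result no longer depends on how many paragraphs the lead section happens to have.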




















          5 votes













          Use the wikipedia library:



          import wikipedia
          #print(wikipedia.summary("Mathematics"))
          #wikipedia.search("Mathematics")
          print(wikipedia.page("Mathematics").content)





          answered 4 hours ago by QHarr (edited 4 hours ago)























          • I wish I had known about this lib before I messed around with the BS4 lib. Thanks!
            – wiki one
            4 hours ago










          • It would be the logical choice for me. I haven't explored much though.
            – QHarr
            4 hours ago








          • 2
            I'd use wikipediaapi instead; the wikipedia module seems to be unmaintained. Both would do the job in a similar manner, though.
            – alecxe
            3 hours ago












          • @alecxe ooh.. Thanks for that +
            – QHarr
            3 hours ago


















          5 votes













          There is a much easier way to get information from Wikipedia: the Wikipedia API.

          There is a Python wrapper for it (imported as wikipediaapi below) that lets you do this in just a few lines, with no HTML parsing:



          import wikipediaapi

          wiki_wiki = wikipediaapi.Wikipedia('en')

          page = wiki_wiki.page('Mathematics')
          print(page.summary)


          Prints:




          Mathematics (from Greek μάθημα máthēma, "knowledge, study, learning")
          includes the study of such topics as quantity, structure, space, and
          change...(omitted intentionally)
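
The Wikipedia API mentioned above can also be queried directly over HTTP. The sketch below is one way that might look, assuming the MediaWiki "extracts" query with the exintro and explaintext options. The helper names build_intro_query and intro_from_api_response are hypothetical, and nothing here is taken from the wrapper's own code.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_intro_query(title):
    """Build a URL asking for the plain-text lead section of one article."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "exintro": 1,       # only the text before the first section heading
        "explaintext": 1,   # plain text instead of HTML
        "titles": title,
    }
    return API_ENDPOINT + "?" + urlencode(params)


def intro_from_api_response(data):
    """Pull the extract out of the decoded JSON; return None if there is none."""
    pages = data.get("query", {}).get("pages", {})
    for page in pages.values():
        if "extract" in page:
            return page["extract"]
    return None
```

With requests, the decoded JSON from requests.get(build_intro_query("Mathematics")).json() would be the dictionary to pass to intro_from_api_response. Wikimedia asks API clients to send a descriptive User-Agent header, so a real script should set one.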



































            1 vote













            You can get the desired output using the lxml library, as follows:

                import requests
                from lxml.html import fromstring

                url = "https://en.wikipedia.org/wiki/Mathematics"

                res = requests.get(url)
                source = fromstring(res.content)
                # paragraphs whose second following <h2> is the "History" heading
                paragraph = '\n'.join([item.text_content() for item in source.xpath('//p[following::h2[2][span="History"]]')])
                print(paragraph)

            Using BeautifulSoup:

                from bs4 import BeautifulSoup
                import requests

                res = requests.get("https://en.wikipedia.org/wiki/Mathematics")
                soup = BeautifulSoup(res.text, 'html.parser')
                for item in soup.find_all("p"):
                    # stop at the first paragraph of the "History" section
                    if item.text.startswith("The history"):
                        break
                    print(item.text)
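
Matching on the words "The history" ties the loop to the current wording of the article. As a hedged alternative, the sketch below anchors on structure instead: it takes every paragraph that precedes the first h2 heading, using BeautifulSoup's find_all_previous. It assumes the lead section is everything before the first h2, which may not hold for every page layout, and the function name paragraphs_before_first_section is made up for this example.

```python
from bs4 import BeautifulSoup


def paragraphs_before_first_section(html_text):
    """Return the text of every <p> that precedes the first <h2> heading."""
    soup = BeautifulSoup(html_text, "html.parser")
    first_heading = soup.find("h2")
    if first_heading is None:
        # No section headings at all: treat every paragraph as intro text.
        paragraphs = soup.find_all("p")
    else:
        # find_all_previous walks backwards through the document,
        # so reverse the result to restore reading order.
        paragraphs = list(reversed(first_heading.find_all_previous("p")))
    texts = [p.get_text().strip() for p in paragraphs]
    return [text for text in texts if text]
```

It takes the HTML as a string, for example res.text from the snippet above, so it can be tried on a small hand-written fragment before pointing it at the live page.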





              up vote
              5
              down vote













              There is a much, much more easy way to get information from wikipedia - Wikipedia API.



              There is this Python wrapper, which allows you to do it in a few lines only with zero HTML-parsing:



              import wikipediaapi

              wiki_wiki = wikipediaapi.Wikipedia('en')

              page = wiki_wiki.page('Mathematics')
              print(page.summary)


              Prints:




              Mathematics (from Greek μάθημα máthēma, "knowledge, study, learning")
              includes the study of such topics as quantity, structure, space, and
              change...(omitted intentionally)







                  answered 3 hours ago









                  alecxe

                      up vote
                      1
                      down vote













                      You can get the desired output using the lxml library, as follows.



                      import requests
                      from lxml.html import fromstring

                      url = "https://en.wikipedia.org/wiki/Mathematics"

                      res = requests.get(url)
                      source = fromstring(res.content)
                      paragraph = '\n'.join([item.text_content() for item in source.xpath('//p[following::h2[2][span="History"]]')])
                      print(paragraph)


                      Using BeautifulSoup:



                      from bs4 import BeautifulSoup
                      import requests

                      res = requests.get("https://en.wikipedia.org/wiki/Mathematics")
                      soup = BeautifulSoup(res.text, 'html.parser')
                      for item in soup.find_all("p"):
                          # Stop at the first paragraph of the "History" section.
                          if item.text.startswith("The history"):
                              break
                          print(item.text)
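
Matching on the text "The history" only works for this one article. A more general sketch is to walk the paragraphs and headings in document order and stop at the first `h2`. It assumes the article body lives in `div.mw-parser-output` and that the first `h2` inside it marks the end of the lead, which may not hold if Wikipedia's markup changes. `SAMPLE_HTML` and `lead_paragraphs` are illustrative names, and the sample markup is a simplified stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Simplified stand-in for a Wikipedia article body (not the real markup).
SAMPLE_HTML = """
<div class="mw-parser-output">
  <p class="mw-empty-elt"></p>
  <p><b>Mathematics</b> is the first intro paragraph.</p>
  <p>Second intro paragraph.</p>
  <div id="toc">Contents</div>
  <h2>History</h2>
  <p>The history of the subject.</p>
</div>
"""


def lead_paragraphs(html):
    """Return the text of every non-empty <p> that precedes the first <h2>."""
    soup = BeautifulSoup(html, "html.parser")
    content = soup.select_one("div.mw-parser-output")
    if content is None:
        # Fall back to the whole document if the assumed container is missing.
        content = soup
    texts = []
    # find_all with a list of names yields matches in document order.
    for element in content.find_all(["p", "h2"]):
        if element.name == "h2":
            break
        text = element.get_text().strip()
        if text:
            texts.append(text)
    return texts


for paragraph in lead_paragraphs(SAMPLE_HTML):
    print(paragraph)
```

For the real page, pass `res.text` from the `requests.get(...)` call above instead of `SAMPLE_HTML`.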





                          edited 1 hour ago

























                          answered 2 hours ago









                          SIM
