Creating a CSV file using Scrapy

I've created a Python script using Scrapy to parse movie names and their years, spread across multiple pages of a torrent site. My goal is to write the parsed data to a CSV file myself rather than use Scrapy's built-in export, because when I run scrapy crawl torrentdata -o outputfile.csv -t csv I get a blank line in every alternate row of the CSV file.



However, I took a slightly different approach to achieve the same result, and now I get a correctly formatted, data-filled CSV file when I run the following script. Most importantly, I used a with statement when creating the CSV file so that the file is closed automatically once the writing is done. I used CrawlerProcess to execute the script from within an IDE.




My question: isn't it a better idea to follow the approach I've tried below?




This is the working script:



import scrapy
from scrapy.crawler import CrawlerProcess
import csv

class TorrentSpider(scrapy.Spider):
    name = "torrentdata"
    # pages 2 through 19 of the browse listing
    start_urls = ["https://yts.am/browse-movies?page={}".format(page) for page in range(2, 20)]
    itemlist = []  # collected items; class-level so all parse() calls share it

    def parse(self, response):
        for record in response.css('.browse-movie-bottom'):
            items = {}
            items["Name"] = record.css('.browse-movie-title::text').extract_first(default='')
            items["Year"] = record.css('.browse-movie-year::text').extract_first(default='')
            self.itemlist.append(items)

        # Rewrite the CSV file with everything collected so far; the "with"
        # block guarantees the file is closed when the writing is done.
        with open("outputfile.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, ['Name', 'Year'])
            writer.writeheader()
            for data in self.itemlist:
                writer.writerow(data)

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
})
c.crawl(TorrentSpider)
c.start()
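
One detail worth noting about the script above: parse() runs once per listing page, and each run reopens outputfile.csv and rewrites it with everything collected so far, so all but the last write are wasted disk work. A minimal variation that keeps this write-it-yourself approach but writes only once is to move the CSV code into the spider's closed() hook, which Scrapy calls after the crawl finishes (a sketch, reusing the itemlist and csv import from the script above):

    def closed(self, reason):
        # Scrapy calls this once, after the crawl is over, so the file is
        # opened and written a single time instead of once per page.
        with open("outputfile.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, ['Name', 'Year'])
            writer.writeheader()
            writer.writerows(self.itemlist)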

python python-3.x web-scraping scrapy

asked 48 mins ago by robots.txt (162)

1 Answer

By putting the CSV exporting logic into the spider itself, you are re-inventing the wheel and not taking advantage of Scrapy and its components; you are also making the crawl slower, since you write to disk during the crawling stage every time the callback is triggered.



As you mentioned, the CSV exporter is built in; you just need to yield/return items from the parse() callback:



import scrapy


class TorrentSpider(scrapy.Spider):
    name = "torrentdata"
    start_urls = ["https://yts.am/browse-movies?page={}".format(page) for page in range(2, 20)]

    def parse(self, response):
        for record in response.css('.browse-movie-bottom'):
            yield {
                "Name": record.css('.browse-movie-title::text').extract_first(default=''),
                "Year": record.css('.browse-movie-year::text').extract_first(default='')
            }


Then, by running:

scrapy runspider spider.py -o outputfile.csv -t csv

(or the crawl command) you would get the following in outputfile.csv:

Name,Year
"Faith, Love & Chocolate",2018
Bennett's Song,2018
...
Tender Mercies,1983
You Might Be the Killer,2018
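
Since the question runs the crawl from an IDE, it is worth noting that the built-in exporter can be driven from CrawlerProcess as well, by passing the feed settings directly. A minimal sketch, assuming the FEED_FORMAT and FEED_URI setting names used by Scrapy releases of this era, and reusing the TorrentSpider defined above:

from scrapy.crawler import CrawlerProcess

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
    'FEED_FORMAT': 'csv',           # the exporter the -t csv flag selects
    'FEED_URI': 'outputfile.csv',   # the target the -o flag points at
})
c.crawl(TorrentSpider)
c.start()

This keeps the spider free of file handling while still launching everything from the IDE; any remaining custom CSV needs (column order, quoting) would usually go into an item pipeline rather than the spider itself.
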
answered 30 mins ago by alecxe (14.6k)