Question:

Scrapy (Python): CSV output has a blank line between every row

韩嘉胜

2023-03-14

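In the generated CSV output file, there is an unwanted blank line between every row of Scrapy output.

I have migrated from Python 2 to Python 3, and I am on Windows 10, so I am adapting my Scrapy projects for Python 3.

My current (and, for now, only) problem is that when I write the Scrapy output to a CSV file, I get a blank line between each row. This has been raised in several posts here (it is related to Windows), but I have not been able to find a working solution.

As it happens, I have also added some code to pipelines.py to ensure the CSV output is in a given column order rather than some random order. As a result, I can run this code with the plain scrapy crawl charleschurch rather than scrapy crawl charleschurch -o charleschurch2017xxxx.csv.

Does anyone know how to skip or omit this blank line in the CSV output?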
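
For readers who want to see where the blank lines come from, here is a minimal sketch that reproduces the effect on any platform using only the standard library. It assumes the cause is the usual one on Windows: the csv module ends each row with '\r\n', and a text layer then expands the '\n' into '\r\n' again. The sample rows are made up for illustration.

```python
import csv
import io


def write_rows(newline):
    """Write two rows through a text layer with the given newline policy."""
    raw = io.BytesIO()
    # newline='\r\n' mimics Windows text mode: every '\n' written becomes '\r\n'.
    # newline='' disables translation, which is what the csv docs recommend.
    text = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    writer = csv.writer(text)  # the default dialect ends each row with '\r\n'
    writer.writerow(["plot 1", "100000"])
    writer.writerow(["plot 2", "120000"])
    text.flush()
    return raw.getvalue()


translated = write_rows("\r\n")   # rows end in '\r\r\n'
untranslated = write_rows("")     # rows end in '\r\n'

# Reading the doubled file back with universal newlines yields empty rows.
reader_side = io.TextIOWrapper(io.BytesIO(translated), encoding="utf-8", newline=None)
rows = list(csv.reader(reader_side))

print(translated)
print(untranslated)
print(rows)
```

The extra '\r' is what spreadsheet programs and text-mode readers show as an empty row after every record.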

Below is my pipelines.py code (I may not need the import csv line, but I suspect I may need it for the final answer):

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

import csv
from scrapy import signals
from scrapy.exporters import CsvItemExporter

class CSVPipeline(object):

  def __init__(self):
    self.files = {}

  @classmethod
  def from_crawler(cls, crawler):
    pipeline = cls()
    crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
    crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
    return pipeline

  def spider_opened(self, spider):
    file = open('%s_items.csv' % spider.name, 'w+b')
    self.files[spider] = file
    self.exporter = CsvItemExporter(file)
    self.exporter.fields_to_export = ["plotid","plotprice","plotname","name","address"]
    self.exporter.start_exporting()

  def spider_closed(self, spider):
    self.exporter.finish_exporting()
    file = self.files.pop(spider)
    file.close()

  def process_item(self, item, spider):
    self.exporter.export_item(item)
    return item

I added this line to settings.py (I am not sure of the relevance of the 300):

ITEM_PIPELINES = {'CharlesChurch.pipelines.CSVPipeline': 300 }
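
On the 300: in Scrapy the number attached to each pipeline is its order, with lower values running first and values conventionally chosen in the 0-1000 range, so with a single pipeline the exact value makes no difference. The fragment below is a sketch of a settings.py that also uses the FEED_EXPORT_FIELDS setting, which fixes the column order for feed exports started with -o. It does not affect the custom pipeline above, and the file name out.csv in the comment is only a placeholder.

```python
# settings.py (sketch)

# Lower numbers run earlier; with one pipeline the value is arbitrary.
ITEM_PIPELINES = {
    'CharlesChurch.pipelines.CSVPipeline': 300,
}

# Column order for feed exports, e.g. `scrapy crawl charleschurch -o out.csv`
# (out.csv is a hypothetical file name).
FEED_EXPORT_FIELDS = ["plotid", "plotprice", "plotname", "name", "address"]
```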

My scraper code is below:

import scrapy
from urllib.parse import urljoin

from CharlesChurch.items import CharleschurchItem

class charleschurchSpider(scrapy.Spider):
    name = "charleschurch"
    allowed_domains = ["charleschurch.com"]    
    start_urls = ["https://www.charleschurch.com/county-durham_willington/the-ridings-1111"]


    def parse(self, response):

        for sel in response.xpath('//*[@id="aspnetForm"]/div[4]'):
           item = CharleschurchItem()
           item['name'] = sel.xpath('//*[@id="XplodePage_ctl12_dsDetailsSnippet_pDetailsContainer"]/span[1]/b/text()').extract()
           item['address'] = sel.xpath('//*[@id="XplodePage_ctl12_dsDetailsSnippet_pDetailsContainer"]/div/*[@itemprop="postalCode"]/text()').extract()
           plotnames = sel.xpath('//div[@class="housetype js-filter-housetype"]/div[@class="housetype__col-2"]/div[@class="housetype__plots"]/div[not(contains(@data-status,"Sold"))]/div[@class="plot__name"]/a/text()').extract()
           plotnames = [plotname.strip() for plotname in plotnames]
           plotids = sel.xpath('//div[@class="housetype js-filter-housetype"]/div[@class="housetype__col-2"]/div[@class="housetype__plots"]/div[not(contains(@data-status,"Sold"))]/div[@class="plot__name"]/a/@href').extract()
           plotids = [plotid.strip() for plotid in plotids]
           plotprices = sel.xpath('//div[@class="housetype js-filter-housetype"]/div[@class="housetype__col-2"]/div[@class="housetype__plots"]/div[not(contains(@data-status,"Sold"))]/div[@class="plot__price"]/text()').extract()
           plotprices = [plotprice.strip() for plotprice in plotprices]
           result = zip(plotnames, plotids, plotprices)
           for plotname, plotid, plotprice in result:
               item['plotname'] = plotname
               item['plotid'] = plotid
               item['plotprice'] = plotprice
               yield item
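
A side note on the loop above, unrelated to the blank lines: it fills in and yields the same item object for every plot. That tends to work when each item is exported immediately, but building a fresh record per plot is the safer pattern. Here is a minimal sketch of that pattern with plain dicts; build_rows is a hypothetical helper and the sample values are invented placeholders, not data from the site.

```python
def build_rows(name, address, plotnames, plotids, plotprices):
    """Return one fresh dict per plot, each carrying the shared site fields."""
    rows = []
    for plotname, plotid, plotprice in zip(plotnames, plotids, plotprices):
        # A new dict on every pass, so later passes cannot overwrite earlier rows.
        rows.append({
            "name": name,
            "address": address,
            "plotname": plotname.strip(),
            "plotid": plotid.strip(),
            "plotprice": plotprice.strip(),
        })
    return rows


# Invented sample values, purely for illustration.
rows = build_rows(
    "Example site",
    "EX4 MPL",
    [" Plot 1 ", "Plot 2"],
    ["/plot/1", "/plot/2 "],
    ["100000", " 120000"],
)
print(rows)
```

In a spider, the same idea would mean creating the item inside the inner loop and yielding it there.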

2 answers

闻人凯泽

2023-03-14

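The b in the w+b mode is most likely part of the problem: it makes the file be treated as a binary file, so line breaks are written exactly as they are.

So the first step is to remove the b. Then, by adding U, you can also activate universal newlines support (see: https://docs.python.org/3/glossary.html#term-universal-newlines)

So the line should look like this: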

file = open('%s_items.csv' % spider.name, 'Uw+')
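
A caveat before trying that line: in Python 3, open() raises ValueError when U is combined with a write mode, and U was removed altogether in Python 3.11. The text-mode remedy the csv documentation recommends is newline='' instead. The sketch below shows both points with the standard library only. The last part assumes the consumer needs a binary file object, as the pipeline in the question does, and wraps it by hand. The temporary file name is a placeholder, and whether a given Scrapy version already does this wrapping inside CsvItemExporter is worth checking against the installed release.

```python
import csv
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo_items.csv")  # placeholder file

# 1. 'U' cannot be combined with 'w' or '+' in Python 3.
try:
    open(path, "Uw+")
    mode_error = None
except ValueError as exc:
    mode_error = exc

# 2. Text mode with newline='' leaves the csv module's '\r\n' untouched.
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerow(["plotid", "plotprice"])
with open(path, "rb") as f:
    text_mode_bytes = f.read()

# 3. If a binary file object is required, wrap it with newline='' yourself.
with open(path, "wb") as binary_file:
    wrapper = io.TextIOWrapper(binary_file, encoding="utf-8", newline="")
    csv.writer(wrapper).writerow(["plotid", "plotprice"])
    wrapper.flush()
    wrapper.detach()  # hand the binary file back so the with-block closes it
with open(path, "rb") as f:
    wrapped_bytes = f.read()

print(repr(mode_error))
print(text_mode_bytes, wrapped_bytes)
```

Both writes end each row with a single '\r\n', on Windows as well as elsewhere.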

萧永长

2023-03-14

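I suspect this is not ideal, but I have found a workaround. In pipelines.py I added more code that reads the CSV file (with its blank lines) into a list, drops the blank lines, and then writes the cleaned list to a new file.

The code I added is: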

with open('%s_items.csv' % spider.name, 'r') as f:
  reader = csv.reader(f)
  original_list = list(reader)
  cleaned_list = list(filter(None,original_list))

with open('%s_items_cleaned.csv' % spider.name, 'w', newline='') as output_file:
    wr = csv.writer(output_file, dialect='excel')
    for data in cleaned_list:
      wr.writerow(data)
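
The same cleanup can be written as a small reusable function. This is a sketch under the assumption that the blank lines come from doubled line endings ('\r\r\n'); strip_blank_rows and the temporary file names are hypothetical, and the input bytes are fabricated to imitate the broken file.

```python
import csv
import os
import tempfile


def strip_blank_rows(src_path, dst_path):
    """Copy src_path to dst_path, dropping rows that parse as empty lists."""
    with open(src_path, "r", encoding="utf-8") as src:
        # csv.reader yields [] for a blank line, and [] is falsy.
        rows = [row for row in csv.reader(src) if row]
    # newline='' stops the text layer from expanding '\n' a second time.
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(rows)
    return len(rows)


workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "demo_items.csv")
dst_path = os.path.join(workdir, "demo_items_cleaned.csv")

# Fabricated input that imitates the doubled line endings.
with open(src_path, "wb") as f:
    f.write(b"plotid,plotprice\r\r\n/plot/1,100000\r\r\n")

kept = strip_blank_rows(src_path, dst_path)
with open(dst_path, "rb") as f:
    cleaned_bytes = f.read()

print(kept, cleaned_bytes)
```

Note that this also drops any rows that were intentionally empty, so it suits files where every record has at least one field.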

So the whole pipelines.py file is:

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

import csv
from scrapy import signals
from scrapy.exporters import CsvItemExporter

class CSVPipeline(object):

  def __init__(self):
    self.files = {}

  @classmethod
  def from_crawler(cls, crawler):
    pipeline = cls()
    crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
    crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
    return pipeline

  def spider_opened(self, spider):
    file = open('%s_items.csv' % spider.name, 'w+b')
    self.files[spider] = file
    self.exporter = CsvItemExporter(file)
    self.exporter.fields_to_export = ["plotid","plotprice","plotname","name","address"]
    self.exporter.start_exporting()

  def spider_closed(self, spider):
    self.exporter.finish_exporting()
    file = self.files.pop(spider)
    file.close()

    # On Windows the exported CSV has a blank line after every row,
    # so read it back, drop the empty rows and write a cleaned copy.
    print("Starting csv blank line cleaning")
    with open('%s_items.csv' % spider.name, 'r') as f:
      reader = csv.reader(f)
      original_list = list(reader)
      cleaned_list = list(filter(None, original_list))

    with open('%s_items_cleaned.csv' % spider.name, 'w', newline='') as output_file:
      wr = csv.writer(output_file, dialect='excel')
      for data in cleaned_list:
        wr.writerow(data)

  def process_item(self, item, spider):
    self.exporter.export_item(item)
    return item


class CharleschurchPipeline(object):
    def process_item(self, item, spider):
        return item

Not ideal, but it solves the problem for now.
