Getting debug information: (504) error when crawling with Scrapy-Splash and a Lua script whose wait loop times out

2021-5-13 20:26:15


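I'm new to programming and am trying to build a web crawler. I use a Lua script so that my Scrapy request waits until some web element appears once the site's JavaScript has finished loading (I don't care which element; I only need the initial page loader to finish so that I can access the HTML elements). The specific site I want to access is https://www.ladbrokes.com.au/sports/basketball/usa/nba, which shows a JS initial-loader page before any site element loads.

My current code is as follows: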

import scrapy
from scrapy_splash import SplashRequest


class Ladbrokes(scrapy.Spider):
    name = 'Ladbrokes'
    allowed_domains = ['ladbrokes.com.au']
    start_urls = ['https://www.ladbrokes.com.au/sports']

    def parse(self, response):
        # select_ladbrokes is not shown in the question; it returns the sport links.
        sports_link = select_ladbrokes(response)

        for link in sports_link:
            url = response.urljoin(link)
            # Render each sport page through Splash's 'execute' endpoint.
            yield SplashRequest(url=url, callback=self.ladbrokes_all_comps,
                                endpoint='execute',
                                args={'lua_source': lua_script})

    def ladbrokes_all_comps(self, response):
        comps = response.xpath('//*[@id="accordion_4e099d27-0f11-4c6e-848e-965fff7ad995"]/div[2]/div[2]/div[1]/div[2]/div[1]/div/div[1]/text()').extract()


# Lua script: load the page, then poll every 0.1 s until the element appears.
lua_script = '''
function main(splash)
    assert(splash:go(splash.args.url))
    while not splash:select('#page-content-left > div > div') do
        splash:wait(0.1)
    end
    return {html = splash:html()}
end
'''

When I run my spider, I end up with these errors:

2019-11-25 16:41:30 [scrapy.core.engine] DEBUG: Crawled (504) <GET https://www.ladbrokes.com.au/sports/nrl via http://0.0.0.0:8050/execute> (referer: None)
2019-11-25 16:41:30 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <504 https://www.ladbrokes.com.au/sports/nrl>: HTTP status code is not handled or not allowed
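
The second log line shows Scrapy's HttpError middleware dropping the 504 before any callback runs, which hides whatever Splash reported about the failure. One way to surface it, sketched below, is to let 504 responses through (for example with `handle_httpstatus_list = [504]` on the spider) and log the error body. The helper name `describe_splash_error` is made up for illustration, and the JSON keys it reads (`type`, `description`, `info`) are an assumption about what Splash sends on a timeout, so it falls back gracefully when they are missing.

```python
import json


def describe_splash_error(status, body):
    """Summarize an error response from Splash for logging.

    `body` may be bytes or str. The keys read here are an assumption about
    Splash's JSON error format; missing keys are reported as '?'.
    """
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        payload = None
    if not isinstance(payload, dict):
        # Not JSON (or not a JSON object): show the start of the raw body.
        preview = body[:200] if body else body
        return "Splash returned %s, non-JSON body: %r" % (status, preview)
    return "Splash returned %s: type=%s, description=%s, info=%s" % (
        status,
        payload.get("type", "?"),
        payload.get("description", "?"),
        payload.get("info", "?"),
    )


# Sample body written by hand for illustration (not captured from a real run).
sample = (b'{"error": 504, "type": "GlobalTimeoutError", '
          b'"description": "Timeout exceeded rendering page", '
          b'"info": {"timeout": 30}}')
print(describe_splash_error(504, sample))
```

In the spider, this could be called from the callback when `response.status == 504`, for instance `self.logger.warning(describe_splash_error(response.status, response.body))`.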

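It seems to time out in the Lua script's while loop, but I'm not sure whether that's because I chose the wrong web element or for some other reason.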
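
Whichever cause applies, a loop with no upper bound can only end in a Splash timeout when the selector never matches. A minimal sketch of a bounded variant follows: the Lua script gives up after `max_wait` seconds and returns a `found` flag along with the HTML, so the spider receives a normal response to inspect instead of a 504. The names `build_splash_args`, `css`, `max_wait`, `found` and `waited` are hypothetical; `timeout` is Splash's own request argument (30 seconds by default, with an upper limit set by Splash's `--max-timeout` option).

```python
# Lua source for Splash's 'execute' endpoint; `args` mirrors splash.args.
LUA_BOUNDED_WAIT = """
function main(splash, args)
    assert(splash:go(args.url))
    local waited = 0
    -- Poll for the element, but stop once max_wait seconds have passed.
    while not splash:select(args.css) do
        if waited >= args.max_wait then
            return {html = splash:html(), found = false, waited = waited}
        end
        splash:wait(0.1)
        waited = waited + 0.1
    end
    return {html = splash:html(), found = true, waited = waited}
end
"""


def build_splash_args(css, max_wait=20.0, timeout=30.0):
    """Build the `args` dict for a SplashRequest to the 'execute' endpoint."""
    # Page loading also counts against `timeout`, so leave some headroom.
    if max_wait >= timeout:
        raise ValueError("max_wait must be below the Splash timeout")
    return {
        "lua_source": LUA_BOUNDED_WAIT,
        "css": css,            # read in Lua as args.css
        "max_wait": max_wait,  # read in Lua as args.max_wait
        "timeout": timeout,    # enforced by Splash itself
    }


args = build_splash_args("#page-content-left > div > div")
```

The dict would be passed as `SplashRequest(url=url, callback=..., endpoint='execute', args=args)`. In the callback, `response.data['found']` then indicates whether the element ever appeared; if it is false, the returned HTML shows what the page looked like when the script gave up, which helps tell a wrong selector apart from a loader that never finishes.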

I also tried setting a long 'wait' argument in the SplashRequest call, but the initial page loader never seems to finish loading. Any help with this would be appreciated!
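
With `endpoint='execute'`, an argument such as `wait` is only handed to the Lua script as `splash.args.wait`; nothing pauses unless the script itself calls `splash:wait`. To check whether the loader ever finishes, independently of the spider, it may help to ask Splash directly for a render and open the result in a browser. The sketch below only builds such a URL. The base address `http://localhost:8050` and the helper name `splash_render_url` are assumptions; `render.html` and `render.png`, with the `url`, `wait` and `timeout` parameters, are Splash's standard HTTP endpoints.

```python
from urllib.parse import urlencode


def splash_render_url(page_url, wait=5.0, timeout=60.0,
                      endpoint="render.html",
                      splash_base="http://localhost:8050"):
    """Build a URL that asks Splash to render `page_url` after `wait` seconds."""
    # Splash requires the wait to fit inside the overall timeout.
    if wait >= timeout:
        raise ValueError("wait must be smaller than timeout")
    query = urlencode({"url": page_url, "wait": wait, "timeout": timeout})
    return "%s/%s?%s" % (splash_base.rstrip("/"), endpoint, query)


page = "https://www.ladbrokes.com.au/sports/nrl"
html_url = splash_render_url(page)                        # rendered HTML
png_url = splash_render_url(page, endpoint="render.png")  # screenshot
print(html_url)
print(png_url)
```

If the screenshot still shows the loader after a generous `wait`, the page is probably not finishing under Splash at all, and no selector would match; if the content is there, the selector is the more likely culprit.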


Author: 用户12427737
