Lua scripts in Scrapy

2019-6-25 14:59:2

I'm using Scrapy 1.6 and Splash 3.2. I have the following code:

import scrapy
import random
from scrapy_splash import SplashRequest
from scrapy.utils.response import open_in_browser
from scrapy.linkextractors import LinkExtractor

USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:48.0) Gecko/20100101 Firefox/48.0'

class MySpider(scrapy.Spider):

    start_urls = ["http://yahoo.com"]
    name = 'mytest'

    def start_requests(self):
        for url in self.start_urls:
            yield SplashRequest(
                url, self.parse, endpoint='render.html',
                args={'wait': 2.5},
                headers={'User-Agent': USER_AGENT,
                         'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'},
            )

    def parse(self, response):
        # response.body is a result of render.html call; it
        # contains HTML processed by a browser.
        # from scrapy.http.response.html import HtmlResponse
        # ht = HtmlResponse('jj')
        # ht.body.replace =response
        open_in_browser(response)
        return None

I'm reading https://blog.scrapinghub.com/2015/03/02/handling-javascript-in-scrapy-with-splash, which gives this example:

function main(splash)
    assert(splash:go(splash.args.url))
    splash:wait(0.5)
    local title = splash:evaljs("document.title")
    return {title=title}
end

Obviously, I can't put Lua code directly into my Python script. Where should I put it, and how do I access it and pass it to my Splash request?

user3923463

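You can pass the Lua script as an ordinary string in the lua_source argument. Note that the request must go to the execute endpoint: render.html does not run Lua scripts.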

script = """
    function main(splash)
        assert(splash:go(splash.args.url))
        splash:wait(0.5)
        local title = splash:evaljs('document.title')
        return {title=title}
    end
"""
yield SplashRequest(
    url, self.parse, endpoint='render.html',
    args={'wait': 2.5, 'lua_source': script},
    headers={'User-Agent': USER_AGENT,'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'}
)
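To see what Splash itself receives, here is a minimal stdlib-only sketch that builds (without sending) a raw HTTP request that runs the script above, roughly what scrapy-splash does for you. It assumes a Splash instance on localhost at its default port 8050; SPLASH_EXECUTE_URL, LUA_SCRIPT and build_execute_request are hypothetical names for illustration, and scrapy-splash adds further fields (such as headers) that are omitted here.

```python
import json
import urllib.request

# Assumption: Splash is listening on localhost at its default port 8050.
SPLASH_EXECUTE_URL = "http://localhost:8050/execute"

LUA_SCRIPT = """
function main(splash)
    assert(splash:go(splash.args.url))
    splash:wait(0.5)
    local title = splash:evaljs('document.title')
    return {title=title}
end
"""


def build_execute_request(url, lua_source, splash_url=SPLASH_EXECUTE_URL):
    """Build (but do not send) a JSON POST to Splash's /execute endpoint.

    The other keys in the JSON body (here just "url") are visible to the
    script through splash.args, which is where splash.args.url comes from.
    """
    payload = {"lua_source": lua_source, "url": url}
    return urllib.request.Request(
        splash_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_execute_request("http://yahoo.com", LUA_SCRIPT)
print(req.get_method(), req.full_url)  # POST http://localhost:8050/execute
```

Sending it with urllib.request.urlopen(req) against a running Splash should return the table the script returns, serialized as JSON (a single "title" key here).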

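Because the script returns a Lua table, Splash replies with JSON and scrapy-splash passes a SplashJsonResponse to the callback, so in parse the title is available as response.data['title']. For details, see the scrapy-splash documentation: https://github.com/scrapy-plugins/scrapy-splash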
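If you would rather keep the Lua code in its own file, you can read that file into a string when the spider module is imported and pass the string as lua_source exactly like the inline version. A minimal sketch; load_lua_script and the file name title.lua are hypothetical, not part of Scrapy or scrapy-splash:

```python
from pathlib import Path


def load_lua_script(path):
    """Return the contents of a Lua script file as a string (hypothetical helper)."""
    return Path(path).read_text(encoding="utf-8")


# Typical use in the spider module: resolve the (hypothetical) title.lua
# relative to the module file, so it works whatever the working directory is.
#
#     LUA_SCRIPT = load_lua_script(Path(__file__).with_name("title.lua"))
#     ...
#     yield SplashRequest(url, self.parse, endpoint='execute',
#                         args={'lua_source': LUA_SCRIPT})
```

Reading the file once at import time avoids touching the disk for every request; to Splash the result is indistinguishable from an inline string.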

2019-06-28 19:25:06

Author: user1592380
