[
  {
    "path": ".gitignore",
    "content": "# Created by .ignore support plugin (hsz.mobi)\n### Eclipse template\n\n.metadata\nbin/\ntmp/\n*.tmp\n*.bak\n*.swp\n*~.nib\nlocal.properties\n.settings/\n.loadpath\n.recommenders\n\n# External tool builders\n.externalToolBuilders/\n\n# Locally stored \"Eclipse launch configurations\"\n*.launch\n\n# PyDev specific (Python IDE for Eclipse)\n*.pydevproject\n\n# CDT-specific (C/C++ Development Tooling)\n.cproject\n\n# Java annotation processor (APT)\n.factorypath\n\n# PDT-specific (PHP Development Tools)\n.buildpath\n\n# sbteclipse plugin\n.target\n\n# Tern plugin\n.tern-project\n\n# TeXlipse plugin\n.texlipse\n\n# STS (Spring Tool Suite)\n.springBeans\n\n# Code Recommenders\n.recommenders/\n\n# Scala IDE specific (Scala & Java development for Eclipse)\n.cache-main\n.scala_dependencies\n.worksheet\n### Maven template\ntarget/\npom.xml.tag\npom.xml.releaseBackup\npom.xml.versionsBackup\npom.xml.next\nrelease.properties\ndependency-reduced-pom.xml\nbuildNumber.properties\n.mvn/timing.properties\n\n# Avoid ignoring Maven wrapper jar file (.jar files are usually ignored)\n!/.mvn/wrapper/maven-wrapper.jar\n### Java template\n# Compiled class file\n*.class\n\n# Log file\n*.log\n\n# BlueJ files\n*.ctxt\n\n# Mobile Tools for Java (J2ME)\n.mtj.tmp/\n\n# Package Files #\n*.jar\n*.war\n*.ear\n*.zip\n*.tar.gz\n*.rar\n\n# virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml\nhs_err_pid*\n### JetBrains template\n# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and Webstorm\n# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839\n\n# User-specific stuff:\n.idea/**/workspace.xml\n.idea/**/tasks.xml\n.idea/dictionaries\n\n# Sensitive or high-churn files:\n.idea/**/dataSources/\n.idea/**/dataSources.ids\n.idea/**/dataSources.xml\n.idea/**/dataSources.local.xml\n.idea/**/sqlDataSources.xml\n.idea/**/dynamic.xml\n.idea/**/uiDesigner.xml\n\n# Gradle:\n.idea/**/gradle.xml\n.idea/**/libraries\n\n# CMake\ncmake-build-debug/\n\n# Mongo Explorer plugin:\n.idea/**/mongoSettings.xml\n\n## File-based project format:\n*.iws\n\n## Plugin-specific files:\n\n# IntelliJ\nout/\n\n# mpeltonen/sbt-idea plugin\n.idea_modules/\n\n# JIRA plugin\natlassian-ide-plugin.xml\n\n# Cursive Clojure plugin\n.idea/replstate.xml\n\n# Crashlytics plugin (for Android Studio and IntelliJ)\ncom_crashlytics_export_strings.xml\ncrashlytics.properties\ncrashlytics-build.properties\nfabric.properties\n\n"
  },
  {
    "path": ".travis.yml",
    "content": "language: java\n\njdk:\n  - oraclejdk8\n\nnotifications:\n  email: false\n\nsudo: false\n\nbefore_install:\n    - export TZ='Asia/Shanghai'\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2018 王爵nice (biezhi)\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# Elves\n\n一个轻量级的爬虫框架设计与实现，[博文分析](https://blog.biezhi.me/2018/01/design-and-implement-a-crawler-framework.html)。\n\n[![](https://img.shields.io/travis/biezhi/elves.svg)](https://travis-ci.org/biezhi/elves)\n[![](https://img.shields.io/maven-central/v/io.github.biezhi/elves.svg)](https://mvnrepository.com/artifact/io.github.biezhi/elves)\n[![@biezhi on zhihu](https://img.shields.io/badge/zhihu-%40biezhi-red.svg)](https://www.zhihu.com/people/biezhi)\n[![](https://img.shields.io/badge/license-MIT-FF0080.svg)](https://github.com/biezhi/elves/blob/master/LICENSE)\n[![](https://img.shields.io/github/followers/biezhi.svg?style=social&label=Follow%20Me)](https://github.com/biezhi)\n\n## 特性\n\n- 事件驱动\n- 易于定制\n- 多线程执行\n- `CSS` 选择器和 `XPath` 支持\n\n**Maven** 坐标\n\n```xml\n<dependency>\n    <groupId>io.github.biezhi</groupId>\n    <artifactId>elves</artifactId>\n    <version>0.0.2</version>\n</dependency>\n```\n\n如果你想在本地运行这个项目源码，请确保你是 `Java8` 环境并且安装了 [lombok](https://projectlombok.org/) 插件。\n\n## 架构图\n\n<img src=\"docs/static/elves.png\" width=\"60%\"/>\n\n## 调用流程图\n\n<img src=\"docs/static/dispatch.png\" width=\"90%\"/>\n\n## 快速上手\n\n搭建一个爬虫程序需要进行这么几步操作\n\n1. 编写一个爬虫类继承自 `Spider`\n2. 设置要抓取的 URL 列表\n3. 实现 `Spider` 的 `parse` 方法\n4. 添加 `Pipeline` 处理 `parse` 过滤后的数据\n\n举个栗子:\n\n```java\npublic class DoubanSpider extends Spider {\n\n    public DoubanSpider(String name) {\n        super(name);\n        this.startUrls(\n            \"https://movie.douban.com/tag/爱情\",\n            \"https://movie.douban.com/tag/喜剧\",\n            \"https://movie.douban.com/tag/动画\",\n            \"https://movie.douban.com/tag/动作\",\n            \"https://movie.douban.com/tag/史诗\",\n            \"https://movie.douban.com/tag/犯罪\");\n    }\n\n    @Override\n    public void onStart(Config config) {\n        this.addPipeline((Pipeline<List<String>>) (item, request) -> log.info(\"保存到文件: {}\", item));\n    }\n\n    public Result parse(Response response) {\n        Result<List<String>> result   = new Result<>();\n        Elements             elements = response.body().css(\"#content table .pl2 a\");\n\n        List<String> titles = elements.stream().map(Element::text).collect(Collectors.toList());\n        result.setItem(titles);\n\n        // 获取下一页 URL\n        Elements nextEl = response.body().css(\"#content > div > div.article > div.paginator > span.next > a\");\n        if (null != nextEl && nextEl.size() > 0) {\n            String  nextPageUrl = nextEl.get(0).attr(\"href\");\n            Request nextReq     = this.makeRequest(nextPageUrl, this::parse);\n            result.addRequest(nextReq);\n        }\n        return result;\n    }\n\n}\n\npublic static void main(String[] args) {\n    DoubanSpider doubanSpider = new DoubanSpider(\"豆瓣电影\");\n    Elves.me(doubanSpider, Config.me()).start();\n}\n```\n\n## 爬虫例子\n\n- [豆瓣电影](https://github.com/biezhi/elves/blob/master/src/test/java/io/github/biezhi/elves/examples/DoubanExample.java)\n- [网易新闻](https://github.com/biezhi/elves/blob/master/src/test/java/io/github/biezhi/elves/examples/News163Example.java)\n- [糗事百科](https://github.com/biezhi/elves/blob/master/src/test/java/io/github/biezhi/elves/examples/QiubaiExample.java)\n- [妹。。。妹子图](https://github.com/biezhi/elves/blob/master/src/test/java/io/github/biezhi/elves/examples/MeiziExample.java)\n\n## 开源协议\n\n[MIT](https://github.com/biezhi/elves/blob/master/LICENSE)"
  },
  {
    "path": "pom.xml",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<project xmlns=\"http://maven.apache.org/POM/4.0.0\"\n         xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n         xsi:schemaLocation=\"http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd\">\n    <modelVersion>4.0.0</modelVersion>\n\n    <groupId>io.github.biezhi</groupId>\n    <artifactId>elves</artifactId>\n    <version>0.0.2</version>\n    <name>elves</name>\n    <url>https://biezhi.github.io/elves</url>\n    <description>crawler framework</description>\n\n    <licenses>\n        <license>\n            <name>The Apache Software License, Version 2.0</name>\n            <url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>\n        </license>\n    </licenses>\n    <developers>\n        <developer>\n            <name>biezhi</name>\n            <email>biezhi.me@gmail.com</email>\n        </developer>\n    </developers>\n    <scm>\n        <connection>scm:git@github.com:biezhi/elves.git</connection>\n        <developerConnection>scm:git@github.com:biezhi/elves.git</developerConnection>\n        <url>git@github.com:biezhi/elves.git</url>\n    </scm>\n\n    <dependencies>\n        <dependency>\n            <groupId>org.slf4j</groupId>\n            <artifactId>slf4j-api</artifactId>\n            <version>1.7.25</version>\n        </dependency>\n        <dependency>\n            <groupId>org.jsoup</groupId>\n            <artifactId>jsoup</artifactId>\n            <version>1.10.3</version>\n        </dependency>\n        <dependency>\n            <groupId>us.codecraft</groupId>\n            <artifactId>xsoup</artifactId>\n            <version>0.3.1</version>\n        </dependency>\n        <dependency>\n            <groupId>io.github.biezhi</groupId>\n            <artifactId>oh-my-request</artifactId>\n            <version>0.0.1</version>\n        </dependency>\n        <dependency>\n            <groupId>org.projectlombok</groupId>\n            <artifactId>lombok</artifactId>\n            <version>1.16.18</version>\n            <scope>provided</scope>\n        </dependency>\n\n        <dependency>\n            <groupId>ch.qos.logback</groupId>\n            <artifactId>logback-classic</artifactId>\n            <version>1.2.3</version>\n            <scope>test</scope>\n        </dependency>\n\n    </dependencies>\n\n    <build>\n        <plugins>\n            <plugin>\n                <groupId>org.apache.maven.plugins</groupId>\n                <artifactId>maven-compiler-plugin</artifactId>\n                <configuration>\n                    <source>1.8</source>\n                    <target>1.8</target>\n                </configuration>\n            </plugin>\n        </plugins>\n    </build>\n\n    <profiles>\n        <profile>\n            <id>release</id>\n            <distributionManagement>\n                <snapshotRepository>\n                    <id>oss</id>\n                    <url>\n                        https://oss.sonatype.org/content/repositories/snapshots/\n                    </url>\n                </snapshotRepository>\n                <repository>\n                    <id>oss</id>\n                    <url>\n                        https://oss.sonatype.org/service/local/staging/deploy/maven2/\n                    </url>\n                </repository>\n            </distributionManagement>\n            <build>\n                <plugins>\n                    <!--  Source  -->\n                    <plugin>\n                        <groupId>org.apache.maven.plugins</groupId>\n                        <artifactId>maven-source-plugin</artifactId>\n                        <version>2.4</version>\n                        <executions>\n                            <execution>\n                                <phase>package</phase>\n                                <goals>\n                                    <goal>jar-no-fork</goal>\n                                </goals>\n                            </execution>\n                        </executions>\n                    </plugin>\n                    <!--  Javadoc  -->\n                    <plugin>\n                        <groupId>org.apache.maven.plugins</groupId>\n                        <artifactId>maven-javadoc-plugin</artifactId>\n                        <version>2.10.2</version>\n                        <configuration>\n                            <charset>UTF-8</charset>\n                            <docencoding>UTF-8</docencoding>\n                        </configuration>\n                        <executions>\n                            <execution>\n                                <phase>package</phase>\n                                <goals>\n                                    <goal>jar</goal>\n                                </goals>\n                                <configuration>\n                                    <additionalparam>-Xdoclint:none</additionalparam>\n                                </configuration>\n                            </execution>\n                        </executions>\n                    </plugin>\n                    <!--  Gpg Signature  -->\n                    <plugin>\n                        <groupId>org.apache.maven.plugins</groupId>\n                        <artifactId>maven-gpg-plugin</artifactId>\n                        <version>1.6</version>\n                        <executions>\n                            <execution>\n                                <id>sign-artifacts</id>\n                                <phase>verify</phase>\n                                <goals>\n                                    <goal>sign</goal>\n                                </goals>\n                            </execution>\n                        </executions>\n                    </plugin>\n                </plugins>\n            </build>\n        </profile>\n        <profile>\n            <id>snapshots</id>\n            <distributionManagement>\n                <snapshotRepository>\n                    <id>oss</id>\n                    <url>\n                        https://oss.sonatype.org/content/repositories/snapshots/\n                    </url>\n                </snapshotRepository>\n                <repository>\n                    <id>oss</id>\n                    <url>\n                        https://oss.sonatype.org/service/local/staging/deploy/maven2/\n                    </url>\n                </repository>\n            </distributionManagement>\n            <build>\n                <plugins>\n                    <!--  Source  -->\n                    <plugin>\n                        <groupId>org.apache.maven.plugins</groupId>\n                        <artifactId>maven-source-plugin</artifactId>\n                        <version>2.4</version>\n                        <executions>\n                            <execution>\n                                <phase>package</phase>\n                                <goals>\n                                    <goal>jar-no-fork</goal>\n                                </goals>\n                            </execution>\n                        </executions>\n                    </plugin>\n                    <!--skip test-->\n                    <plugin>\n                        <groupId>org.apache.maven.plugins</groupId>\n                        <artifactId>maven-surefire-plugin</artifactId>\n                        <version>2.17</version>\n                        <configuration>\n                            <skipTests>true</skipTests>\n                        </configuration>\n                    </plugin>\n                    <!--  Gpg Signature  -->\n                    <plugin>\n                        <groupId>org.apache.maven.plugins</groupId>\n                        <artifactId>maven-gpg-plugin</artifactId>\n                        <version>1.6</version>\n                        <executions>\n                            <execution>\n                                <id>sign-artifacts</id>\n                                <phase>verify</phase>\n                                <goals>\n                                    <goal>sign</goal>\n                                </goals>\n                            </execution>\n                        </executions>\n                    </plugin>\n                </plugins>\n            </build>\n        </profile>\n    </profiles>\n\n</project>"
  },
  {
    "path": "src/main/java/io/github/biezhi/elves/Elves.java",
    "content": "package io.github.biezhi.elves;\n\nimport io.github.biezhi.elves.config.Config;\nimport io.github.biezhi.elves.event.ElvesEvent;\nimport io.github.biezhi.elves.event.EventManager;\nimport io.github.biezhi.elves.spider.Spider;\nimport lombok.NoArgsConstructor;\nimport lombok.extern.slf4j.Slf4j;\n\nimport java.util.ArrayList;\nimport java.util.List;\nimport java.util.function.Consumer;\n\n/**\n * Elves\n *\n * @author biezhi\n * @date 2018/1/11\n */\n@Slf4j\n@NoArgsConstructor\npublic class Elves {\n\n    List<Spider> spiders = new ArrayList<>();\n    Config config;\n\n    public static Elves me(Spider spider) {\n        return me(spider, Config.me());\n    }\n\n    public static Elves me(Spider spider, Config config) {\n        Elves elves = new Elves();\n        elves.spiders.add(spider);\n        elves.config = config;\n        return elves;\n    }\n\n    public void start() {\n        new ElvesEngine(this).start();\n    }\n\n    public Elves onStart(Consumer<Config> consumer) {\n        EventManager.registerEvent(ElvesEvent.GLOBAL_STARTED, consumer);\n        return this;\n    }\n\n}\n"
  },
  {
    "path": "src/main/java/io/github/biezhi/elves/ElvesEngine.java",
    "content": "package io.github.biezhi.elves;\n\nimport io.github.biezhi.elves.config.Config;\nimport io.github.biezhi.elves.download.Downloader;\nimport io.github.biezhi.elves.event.ElvesEvent;\nimport io.github.biezhi.elves.event.EventManager;\nimport io.github.biezhi.elves.pipeline.Pipeline;\nimport io.github.biezhi.elves.request.Parser;\nimport io.github.biezhi.elves.request.Request;\nimport io.github.biezhi.elves.response.Response;\nimport io.github.biezhi.elves.response.Result;\nimport io.github.biezhi.elves.scheduler.Scheduler;\nimport io.github.biezhi.elves.spider.Spider;\nimport io.github.biezhi.elves.utils.ElvesUtils;\nimport io.github.biezhi.elves.utils.NamedThreadFactory;\nimport lombok.extern.slf4j.Slf4j;\n\nimport java.util.List;\nimport java.util.concurrent.*;\nimport java.util.stream.Collectors;\n\n/**\n * Elves Engine\n *\n * @author biezhi\n * @date 2018/1/12\n */\n@Slf4j\npublic class ElvesEngine {\n\n    private List<Spider>    spiders;\n    private Config          config;\n    private boolean         isRunning;\n    private Scheduler       scheduler;\n    private ExecutorService executorService;\n\n    ElvesEngine(Elves elves) {\n        this.scheduler = new Scheduler();\n        this.spiders = elves.spiders;\n        this.config = elves.config;\n        this.executorService = new ThreadPoolExecutor(config.parallelThreads(), config.parallelThreads(), 0, TimeUnit.MILLISECONDS,\n                config.queueSize() == 0 ? new SynchronousQueue<>()\n                        : (config.queueSize() < 0 ? new LinkedBlockingQueue<>()\n                        : new LinkedBlockingQueue<>(config.queueSize())), new NamedThreadFactory(\"task\"));\n    }\n\n    public void start() {\n        if (isRunning) {\n            throw new RuntimeException(\"Elves 已经启动\");\n        }\n\n        isRunning = true;\n        // 全局启动事件\n        EventManager.fireEvent(ElvesEvent.GLOBAL_STARTED, config);\n\n        spiders.forEach(spider -> {\n\n            Config conf = config.clone();\n\n            log.info(\"Spider [{}] 启动...\", spider.getName());\n            log.info(\"Spider [{}] 配置 [{}]\", spider.getName(), conf);\n            spider.setConfig(conf);\n\n            List<Request> requests = spider.getStartUrls().stream()\n                    .map(spider::makeRequest).collect(Collectors.toList());\n\n            spider.getRequests().addAll(requests);\n            scheduler.addRequests(requests);\n\n            EventManager.fireEvent(ElvesEvent.SPIDER_STARTED, conf);\n\n        });\n\n        // 后台生产\n        Thread downloadTread = new Thread(() -> {\n            while (isRunning) {\n                if (!scheduler.hasRequest()) {\n                    ElvesUtils.sleep(100);\n                    continue;\n                }\n                Request request = scheduler.nextRequest();\n                executorService.submit(new Downloader(scheduler, request));\n                ElvesUtils.sleep(request.getSpider().getConfig().delay());\n            }\n        });\n        downloadTread.setDaemon(true);\n        downloadTread.setName(\"download-thread\");\n        downloadTread.start();\n        // 消费\n        this.complete();\n    }\n\n    private void complete() {\n        while (isRunning) {\n            if (!scheduler.hasResponse()) {\n                ElvesUtils.sleep(100);\n                continue;\n            }\n            Response response = scheduler.nextResponse();\n            Parser   parser   = response.getRequest().getParser();\n            if (null != parser) {\n                Result<?>     result   = parser.parse(response);\n                List<Request> requests = result.getRequests();\n                if (!ElvesUtils.isEmpty(requests)) {\n                    requests.forEach(scheduler::addRequest);\n                }\n                if (null != result.getItem()) {\n                    List<Pipeline> pipelines = response.getRequest().getSpider().getPipelines();\n                    pipelines.forEach(pipeline -> pipeline.process(result.getItem(), response.getRequest()));\n                }\n            }\n        }\n    }\n\n    public void stop(){\n        isRunning = false;\n        scheduler.clear();\n        log.info(\"爬虫已经停止.\");\n    }\n\n}\n"
  },
  {
    "path": "src/main/java/io/github/biezhi/elves/config/Config.java",
    "content": "package io.github.biezhi.elves.config;\n\nimport lombok.ToString;\n\n/**\n * 爬虫配置\n *\n * @author biezhi\n * @date 2018/1/11\n */\n@ToString\npublic class Config implements Cloneable {\n\n    /**\n     * 读取超时设置\n     */\n    private int timeout = 10_000;\n\n    /**\n     * 下载间隔\n     */\n    private int delay = 1000;\n\n    /**\n     * 下载线程数\n     */\n    private int parallelThreads = Runtime.getRuntime().availableProcessors() * 2;\n\n    /**\n     * UserAgent\n     */\n    private String userAgent = UserAgent.CHROME_FOR_MAC;\n\n    private int queueSize;\n\n    public static Config me() {\n        return new Config();\n    }\n\n    public Config timeout(int timeout) {\n        this.timeout = timeout;\n        return this;\n    }\n\n    public int timeout() {\n        return this.timeout;\n    }\n\n    public Config delay(int delay) {\n        this.delay = delay;\n        return this;\n    }\n\n    public long delay() {\n        return this.delay;\n    }\n\n    public Config parallelThreads(int parallelThreads) {\n        this.parallelThreads = parallelThreads;\n        return this;\n    }\n\n    public int parallelThreads() {\n        return this.parallelThreads;\n    }\n\n    public String userAgent() {\n        return userAgent;\n    }\n\n    public Config userAgent(String userAgent) {\n        this.userAgent = userAgent;\n        return this;\n    }\n\n    public int queueSize() {\n        return queueSize;\n    }\n\n    public Config queueSize(int queueSize) {\n        this.queueSize = queueSize;\n        return this;\n    }\n\n    @Override\n    public Config clone() {\n        try {\n            return (Config) super.clone();\n        } catch (CloneNotSupportedException e) {\n            e.printStackTrace();\n        }\n        return null;\n    }\n\n}"
  },
  {
    "path": "src/main/java/io/github/biezhi/elves/config/UserAgent.java",
    "content": "package io.github.biezhi.elves.config;\n\n/**\n * 浏览器UA常量\n *\n * @author biezhi\n * @date 2018/1/11\n */\npublic interface UserAgent {\n\n    String SAFARI_FOR_MAC  = \"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50\";\n    String IE_9_FOR_WIN    = \"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0;\";\n    String IE_8_FOR_WIN    = \"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)\";\n    String IE_7_FOR_WIN    = \"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)\";\n    String FIREFOX_FOR_MAC = \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1\";\n    String OPERA_FOR_MAC   = \"Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11\";\n    String CHROME_FOR_MAC  = \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11\";\n    String TENCENT_TT      = \"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; TencentTraveler 4.0)\";\n    String THE_WORLD       = \"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; The World)\";\n    String SOUGOU          = \"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SE 2.X MetaSr 1.0; SE 2.X MetaSr 1.0; .NET CLR 2.0.50727; SE 2.X MetaSr 1.0)\";\n    String QIHU_360        = \"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 360SE)\";\n\n    /**\n     * 移动端UA\n     */\n\n    String SAFARI_FOR_IPHONE = \"Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5\";\n}\n"
  },
  {
    "path": "src/main/java/io/github/biezhi/elves/download/Downloader.java",
    "content": "package io.github.biezhi.elves.download;\n\nimport io.github.biezhi.elves.request.Request;\nimport io.github.biezhi.elves.response.Response;\nimport io.github.biezhi.elves.scheduler.Scheduler;\nimport lombok.extern.slf4j.Slf4j;\n\nimport java.io.InputStream;\n\n/**\n * 下载器线程\n *\n * @author biezhi\n * @date 2018/1/11\n */\n@Slf4j\npublic class Downloader implements Runnable {\n\n    private final Scheduler scheduler;\n    private final Request   request;\n\n    public Downloader(Scheduler scheduler, Request request) {\n        this.scheduler = scheduler;\n        this.request = request;\n    }\n\n    @Override\n    public void run() {\n        log.debug(\"[{}] 开始请求\", request.getUrl());\n        io.github.biezhi.request.Request httpReq = null;\n        if (\"get\".equalsIgnoreCase(request.method())) {\n            httpReq = io.github.biezhi.request.Request.get(request.getUrl());\n        }\n        if (\"post\".equalsIgnoreCase(request.method())) {\n            httpReq = io.github.biezhi.request.Request.post(request.getUrl());\n        }\n\n        InputStream result = httpReq.contentType(request.contentType()).headers(request.getHeaders())\n                .connectTimeout(request.getSpider().getConfig().timeout())\n                .readTimeout(request.getSpider().getConfig().timeout())\n                .stream();\n\n        log.debug(\"[{}] 下载完毕\", request.getUrl());\n        Response response = new Response(request, result);\n        scheduler.addResponse(response);\n    }\n\n}\n"
  },
  {
    "path": "src/main/java/io/github/biezhi/elves/event/ElvesEvent.java",
    "content": "package io.github.biezhi.elves.event;\n\n/**\n * 事件枚举\n *\n * @author biezhi\n * @date 2018/1/11\n */\npublic enum ElvesEvent {\n\n    GLOBAL_STARTED,\n    SPIDER_STARTED\n\n}\n"
  },
  {
    "path": "src/main/java/io/github/biezhi/elves/event/EventManager.java",
    "content": "package io.github.biezhi.elves.event;\n\nimport io.github.biezhi.elves.config.Config;\n\nimport java.util.*;\nimport java.util.function.Consumer;\n\n/**\n * 事件管理器\n *\n * @author biezhi\n * @date 2018/1/11\n */\npublic class EventManager {\n\n    private static final Map<ElvesEvent, List<Consumer<Config>>> elvesEventConsumerMap = new HashMap<>();\n\n    public static void registerEvent(ElvesEvent elvesEvent, Consumer<Config> consumer) {\n        List<Consumer<Config>> consumers = elvesEventConsumerMap.get(elvesEvent);\n        if (null == consumers) {\n            consumers = new ArrayList<>();\n        }\n        consumers.add(consumer);\n        elvesEventConsumerMap.put(elvesEvent, consumers);\n    }\n\n    public static void fireEvent(ElvesEvent elvesEvent, Config config) {\n        Optional.ofNullable(elvesEventConsumerMap.get(elvesEvent)).ifPresent(consumers -> consumers.forEach(consumer -> consumer.accept(config)));\n    }\n\n}"
  },
  {
    "path": "src/main/java/io/github/biezhi/elves/pipeline/Pipeline.java",
    "content": "package io.github.biezhi.elves.pipeline;\n\nimport io.github.biezhi.elves.request.Request;\nimport io.github.biezhi.elves.spider.Spider;\n\n/**\n * 数据处理接口\n *\n * @author biezhi\n * @date 2018/1/12\n */\npublic interface Pipeline<T> {\n\n    void process(T item, Request request);\n\n}\n"
  },
  {
    "path": "src/main/java/io/github/biezhi/elves/request/Parser.java",
    "content": "package io.github.biezhi.elves.request;\n\nimport io.github.biezhi.elves.response.Result;\nimport io.github.biezhi.elves.response.Response;\n\n/**\n * 解析器接口\n *\n * @author biezhi\n * @date 2018/1/12\n */\npublic interface Parser<T> {\n\n    Result<T> parse(Response response);\n\n}\n"
  },
  {
    "path": "src/main/java/io/github/biezhi/elves/request/Request.java",
    "content": "package io.github.biezhi.elves.request;\n\nimport io.github.biezhi.elves.spider.Spider;\nimport lombok.Getter;\n\nimport java.nio.charset.Charset;\nimport java.nio.charset.StandardCharsets;\nimport java.util.HashMap;\nimport java.util.Map;\n\n/**\n * Request 请求\n *\n * @author biezhi\n * @date 2018/1/11\n */\n@Getter\npublic class Request<T> {\n\n    private Spider spider;\n    private String url;\n    private String              method  = \"GET\";\n    private Map<String, String> headers = new HashMap<>();\n    private Map<String, String> cookies = new HashMap<>();\n    private String contentType = \"text/html; charset=UTF-8\";\n    private String charset = \"UTF-8\";\n    private Parser<T> parser;\n\n    public Request(Spider spider, String url, Parser<T> parser) {\n        this.spider = spider;\n        this.url = url;\n        this.parser = parser;\n        this.header(\"User-Agent\", spider.getConfig().userAgent());\n    }\n\n    public Request header(String key, String value) {\n        this.headers.put(key, value);\n        return this;\n    }\n\n    public Request cookie(String key, String value) {\n        this.cookies.put(key, value);\n        return this;\n    }\n\n    public String header(String key) {\n        return this.headers.get(key);\n    }\n\n    public String cookie(String key) {\n        return this.cookies.get(key);\n    }\n\n    public void setParser(Parser<T> parser) {\n        this.parser = parser;\n    }\n\n    public String contentType() {\n        return contentType;\n    }\n\n    public Request contentType(String contentType) {\n        this.contentType = contentType;\n        return this;\n    }\n\n    public String charset() {\n        return charset;\n    }\n\n    public Request charset(String charset) {\n        this.charset = charset;\n        return this;\n    }\n\n    public Request method(String method) {\n        this.method = method;\n        return this;\n    }\n\n    public String method() {\n        return this.method;\n    }\n}\n"
  },
  {
    "path": "src/main/java/io/github/biezhi/elves/response/Body.java",
    "content": "package io.github.biezhi.elves.response;\n\nimport org.jsoup.Jsoup;\nimport org.jsoup.select.Elements;\nimport us.codecraft.xsoup.XElements;\nimport us.codecraft.xsoup.Xsoup;\n\nimport java.io.BufferedReader;\nimport java.io.InputStream;\nimport java.io.InputStreamReader;\n\n/**\n * 响应Body\n *\n * @author biezhi\n * @date 2018/1/12\n */\npublic class Body {\n\n    private final InputStream inputStream;\n    private final String      charset;\n    private       String      bodyString;\n\n    public Body(InputStream inputStream, String charset) {\n        this.inputStream = inputStream;\n        this.charset = charset;\n    }\n\n    @Override\n    public String toString() {\n        if (null == this.bodyString) {\n            StringBuilder html = new StringBuilder(100);\n            try {\n                BufferedReader br = new BufferedReader(new InputStreamReader(inputStream, charset));\n                String         temp;\n                while ((temp = br.readLine()) != null) {\n                    html.append(temp).append(\"\\n\");\n                }\n            } catch (Exception e) {\n                e.printStackTrace();\n            }\n            this.bodyString = html.toString();\n        }\n        return this.bodyString;\n    }\n\n    public InputStream getInputStream() {\n        return inputStream;\n    }\n\n    public Elements css(String css) {\n        return Jsoup.parse(this.toString()).select(css);\n    }\n\n    public XElements xpath(String xpath) {\n        return Xsoup.compile(xpath).evaluate(Jsoup.parse(this.toString()));\n    }\n\n}\n"
  },
  {
    "path": "src/main/java/io/github/biezhi/elves/response/Response.java",
    "content": "package io.github.biezhi.elves.response;\n\nimport io.github.biezhi.elves.request.Request;\nimport lombok.Getter;\n\nimport java.io.InputStream;\n\n/**\n * 响应对象\n *\n * @author biezhi\n * @date 2018/1/11\n */\npublic class Response {\n\n    @Getter\n    private Request request;\n    private Body    body;\n\n    public Response(Request request, InputStream inputStream) {\n        this.request = request;\n        this.body = new Body(inputStream, request.charset());\n    }\n\n    public Body body() {\n        return body;\n    }\n\n}\n"
  },
  {
    "path": "src/main/java/io/github/biezhi/elves/response/Result.java",
    "content": "package io.github.biezhi.elves.response;\n\nimport io.github.biezhi.elves.request.Request;\nimport io.github.biezhi.elves.utils.ElvesUtils;\nimport lombok.Data;\nimport lombok.NoArgsConstructor;\n\nimport java.util.ArrayList;\nimport java.util.List;\n\n/**\n * 响应结果封装\n * <p>\n * 存储 Item 数据和新添加的 Request 列表\n *\n * @author biezhi\n * @date 2018/1/12\n */\n@Data\n@NoArgsConstructor\npublic class Result<T> {\n\n    private List<Request> requests = new ArrayList<>();\n    private T item;\n\n    public Result(T item) {\n        this.item = item;\n    }\n\n    public Result addRequest(Request request) {\n        this.requests.add(request);\n        return this;\n    }\n\n    public Result addRequests(List<Request> requests) {\n        if (!ElvesUtils.isEmpty(requests)) {\n            this.requests.addAll(requests);\n        }\n        return this;\n    }\n\n}\n"
  },
  {
    "path": "src/main/java/io/github/biezhi/elves/scheduler/Scheduler.java",
    "content": "package io.github.biezhi.elves.scheduler;\n\nimport io.github.biezhi.elves.request.Request;\nimport io.github.biezhi.elves.response.Response;\nimport lombok.extern.slf4j.Slf4j;\n\nimport java.util.List;\nimport java.util.concurrent.BlockingQueue;\nimport java.util.concurrent.LinkedBlockingQueue;\n\n/**\n * 爬虫调度器\n *\n * @author biezhi\n * @date 2018/1/12\n */\n@Slf4j\npublic class Scheduler {\n\n    private BlockingQueue<Request>  pending = new LinkedBlockingQueue<>();\n    private BlockingQueue<Response> result  = new LinkedBlockingQueue<>();\n\n    public void addRequest(Request request) {\n        try {\n            this.pending.put(request);\n        } catch (InterruptedException e) {\n            log.error(\"向调度器添加 Request 出错\", e);\n        }\n    }\n\n    public void addResponse(Response response) {\n        try {\n            this.result.put(response);\n        } catch (InterruptedException e) {\n            log.error(\"向调度器添加 Response 出错\", e);\n        }\n    }\n\n    public boolean hasRequest() {\n        return pending.size() > 0;\n    }\n\n    public Request nextRequest() {\n        try {\n            return pending.take();\n        } catch (InterruptedException e) {\n            log.error(\"从调度器获取 Request 出错\", e);\n            return null;\n        }\n    }\n\n    public boolean hasResponse() {\n        return result.size() > 0;\n    }\n\n    public Response nextResponse() {\n        try {\n            return result.take();\n        } catch (InterruptedException e) {\n            log.error(\"从调度器获取 Response 出错\", e);\n            return null;\n        }\n    }\n\n    public void addRequests(List<Request> requests) {\n        requests.forEach(this::addRequest);\n    }\n\n    public void clear() {\n        pending.clear();\n    }\n\n}\n"
  },
  {
    "path": "src/main/java/io/github/biezhi/elves/spider/Spider.java",
    "content": "package io.github.biezhi.elves.spider;\n\nimport io.github.biezhi.elves.config.Config;\nimport io.github.biezhi.elves.event.ElvesEvent;\nimport io.github.biezhi.elves.event.EventManager;\nimport io.github.biezhi.elves.pipeline.Pipeline;\nimport io.github.biezhi.elves.request.Parser;\nimport io.github.biezhi.elves.request.Request;\nimport io.github.biezhi.elves.response.Response;\nimport io.github.biezhi.elves.response.Result;\nimport lombok.Data;\n\nimport java.util.ArrayList;\nimport java.util.Arrays;\nimport java.util.List;\nimport java.util.function.Consumer;\n\n/**\n * 爬虫基类\n *\n * @author biezhi\n * @date 2018/1/11\n */\n@Data\npublic abstract class Spider {\n\n    protected String name;\n    protected Config config;\n    protected List<String>   startUrls = new ArrayList<>();\n    protected List<Pipeline> pipelines = new ArrayList<>();\n    protected List<Request>  requests  = new ArrayList<>();\n\n    public Spider(String name) {\n        this.name = name;\n        EventManager.registerEvent(ElvesEvent.SPIDER_STARTED, this::onStart);\n    }\n\n    public Spider startUrls(String... urls) {\n        this.startUrls.addAll(Arrays.asList(urls));\n        return this;\n    }\n\n    /**\n     * 爬虫启动前执行\n     */\n    public void onStart(Config config) {\n    }\n\n    /**\n     * 添加 Pipeline 处理\n     */\n    protected <T> Spider addPipeline(Pipeline<T> pipeline) {\n        this.pipelines.add(pipeline);\n        return this;\n    }\n\n    /**\n     * 构建一个Request\n     */\n    public <T> Request<T> makeRequest(String url) {\n        return makeRequest(url, this::parse);\n    }\n\n    public <T> Request<T> makeRequest(String url, Parser<T> parser) {\n        return new Request(this, url, parser);\n    }\n\n    /**\n     * 解析 DOM\n     */\n    protected abstract <T> Result<T> parse(Response response);\n\n    protected void resetRequest(Consumer<Request> requestConsumer) {\n        this.resetRequest(this.requests, requestConsumer);\n    }\n\n    protected void resetRequest(List<Request> requests, Consumer<Request> requestConsumer) {\n        requests.forEach(requestConsumer::accept);\n    }\n\n}\n"
  },
  {
    "path": "src/main/java/io/github/biezhi/elves/utils/ElvesUtils.java",
    "content": "package io.github.biezhi.elves.utils;\n\nimport java.util.Collection;\nimport java.util.concurrent.TimeUnit;\n\n/**\n * Elves Utils\n *\n * @author biezhi\n * @date 2018/1/12\n */\npublic class ElvesUtils {\n\n    public static void sleep(long time){\n        try {\n            TimeUnit.MILLISECONDS.sleep(time);\n        } catch (InterruptedException e) {\n            e.printStackTrace();\n        }\n    }\n\n    public static <E> boolean isEmpty(Collection<E> collection){\n        return null == collection || collection.size() == 0;\n    }\n\n}\n"
  },
  {
    "path": "src/main/java/io/github/biezhi/elves/utils/NamedThreadFactory.java",
    "content": "package io.github.biezhi.elves.utils;\n\nimport java.util.concurrent.ThreadFactory;\nimport java.util.concurrent.atomic.LongAdder;\n\npublic class NamedThreadFactory implements ThreadFactory {\n\n    private final String prefix;\n    private final LongAdder threadNumber = new LongAdder();\n\n    public NamedThreadFactory(String prefix) {\n        this.prefix = prefix;\n    }\n\n    @Override\n    public Thread newThread(Runnable runnable) {\n        threadNumber.add(1);\n        return new Thread(runnable, prefix + \"@thread-\" + threadNumber.intValue());\n    }\n}"
  },
  {
    "path": "src/test/java/io/github/biezhi/elves/event/ElvesEventTest.java",
    "content": "package io.github.biezhi.elves.event;\n\nimport io.github.biezhi.elves.Elves;\nimport io.github.biezhi.elves.config.Config;\nimport io.github.biezhi.elves.response.Response;\nimport io.github.biezhi.elves.response.Result;\nimport io.github.biezhi.elves.spider.Spider;\n\n/**\n * @author biezhi\n * @date 2018/1/12\n */\npublic class ElvesEventTest {\n\n    public static void main(String[] args) {\n        Elves.me(new Spider(\"测试爬虫\") {\n            @Override\n            public Result<String> parse(Response response) {\n                return new Result<>(response.body().toString());\n            }\n        }, Config.me()).onStart(config -> System.out.println(\"asasas\")).start();\n    }\n\n}\n"
  },
  {
    "path": "src/test/java/io/github/biezhi/elves/examples/DoubanExample.java",
    "content": "package io.github.biezhi.elves.examples;\n\nimport io.github.biezhi.elves.Elves;\nimport io.github.biezhi.elves.config.Config;\nimport io.github.biezhi.elves.pipeline.Pipeline;\nimport io.github.biezhi.elves.request.Request;\nimport io.github.biezhi.elves.response.Response;\nimport io.github.biezhi.elves.response.Result;\nimport io.github.biezhi.elves.spider.Spider;\nimport lombok.extern.slf4j.Slf4j;\nimport org.jsoup.nodes.Element;\nimport org.jsoup.select.Elements;\n\nimport java.util.List;\nimport java.util.stream.Collectors;\n\n/**\n * 豆瓣电影示例\n *\n * @author biezhi\n * @date 2018/1/11\n */\npublic class DoubanExample {\n\n    @Slf4j\n    static class DoubanSpider extends Spider {\n\n        public DoubanSpider(String name) {\n            super(name);\n            this.startUrls(\n                    \"https://movie.douban.com/tag/爱情\",\n                    \"https://movie.douban.com/tag/喜剧\",\n                    \"https://movie.douban.com/tag/动画\",\n                    \"https://movie.douban.com/tag/动作\",\n                    \"https://movie.douban.com/tag/史诗\",\n                    \"https://movie.douban.com/tag/犯罪\");\n        }\n\n        @Override\n        public void onStart(Config config) {\n            this.addPipeline((Pipeline<List<String>>) (item, request) -> log.info(\"保存到文件: {}\", item));\n        }\n\n        public Result parse(Response response) {\n            Result<List<String>> result   = new Result<>();\n            Elements             elements = response.body().css(\"#content table .pl2 a\");\n\n            List<String> titles = elements.stream().map(Element::text).collect(Collectors.toList());\n            result.setItem(titles);\n\n            // 获取下一页 URL\n            Elements nextEl = response.body().css(\"#content > div > div.article > div.paginator > span.next > a\");\n            if (null != nextEl && nextEl.size() > 0) {\n                String  nextPageUrl = nextEl.get(0).attr(\"href\");\n                Request nextReq     = this.makeRequest(nextPageUrl, this::parse);\n                result.addRequest(nextReq);\n            }\n            return result;\n        }\n\n    }\n\n    public static void main(String[] args) {\n        DoubanSpider doubanSpider = new DoubanSpider(\"豆瓣电影\");\n        Elves.me(doubanSpider, Config.me()).start();\n    }\n\n}\n"
  },
  {
    "path": "src/test/java/io/github/biezhi/elves/examples/MeiziExample.java",
    "content": "package io.github.biezhi.elves.examples;\n\nimport io.github.biezhi.elves.Elves;\nimport io.github.biezhi.elves.config.Config;\nimport io.github.biezhi.elves.config.UserAgent;\nimport io.github.biezhi.elves.pipeline.Pipeline;\nimport io.github.biezhi.elves.request.Parser;\nimport io.github.biezhi.elves.request.Request;\nimport io.github.biezhi.elves.response.Response;\nimport io.github.biezhi.elves.response.Result;\nimport io.github.biezhi.elves.spider.Spider;\nimport lombok.extern.slf4j.Slf4j;\nimport org.jsoup.nodes.Element;\nimport org.jsoup.select.Elements;\n\nimport java.io.File;\nimport java.util.List;\nimport java.util.Optional;\nimport java.util.stream.Collectors;\n\n/**\n * 妹子图示例\n *\n * @author biezhi\n * @date 2018/1/12\n */\npublic class MeiziExample {\n\n    @Slf4j\n    static class MeiziSpider extends Spider {\n\n        private String storageDir = \"/Users/biezhi/Desktop/meizi\";\n\n        public MeiziSpider(String name) {\n            super(name);\n            this.startUrls(\n                    \"http://www.meizitu.com/a/pure.html\",\n                    \"http://www.meizitu.com/a/cute.html\",\n                    \"http://www.meizitu.com/a/sexy.html\",\n                    \"http://www.meizitu.com/a/fuli.html\",\n                    \"http://www.meizitu.com/a/legs.html\");\n        }\n\n        @Override\n        public void onStart(Config config) {\n            this.addPipeline((Pipeline<List<String>>) (item, request) -> {\n                item.forEach(imgUrl -> {\n                    log.info(\"开始下载: {}\", imgUrl);\n                    io.github.biezhi.request.Request.get(imgUrl)\n                            .header(\"Referer\", request.getUrl())\n                            .header(\"User-Agent\", UserAgent.CHROME_FOR_MAC)\n                            .connectTimeout(20_000)\n                            .readTimeout(20_000)\n                            .receive(new File(storageDir, System.currentTimeMillis() + \".jpg\"));\n                });\n\n                log.info(\"[{}] 图片下载 OJ8K.\", request.getUrl());\n            });\n\n            this.requests.forEach(this::resetRequest);\n        }\n\n        private Request resetRequest(Request request) {\n            request.contentType(\"text/html; charset=gb2312\");\n            request.charset(\"gb2312\");\n            return request;\n        }\n\n        @Override\n        protected Result parse(Response response) {\n            Result   result   = new Result<>();\n            Elements elements = response.body().css(\"#maincontent > div.inWrap > ul > li:nth-child(1) > div > div > a\");\n            log.info(\"elements size: {}\", elements.size());\n\n            List<Request> requests = elements.stream()\n                    .map(element -> element.attr(\"href\"))\n                    .map(href -> MeiziSpider.this.makeRequest(href, new MeiziSpider.PictureParser()))\n                    .map(this::resetRequest)\n                    .collect(Collectors.toList());\n            result.addRequests(requests);\n\n            // 获取下一页 URL\n            Optional<Element> nextEl = response.body().css(\"#wp_page_numbers > ul > li > a\").stream().filter(element -> \"下一页\".equals(element.text())).findFirst();\n            if (nextEl.isPresent()) {\n                String          nextPageUrl = \"http://www.meizitu.com/a/\" + nextEl.get().attr(\"href\");\n                Request<String> nextReq     = MeiziSpider.this.makeRequest(nextPageUrl, this::parse);\n                result.addRequest(this.resetRequest(nextReq));\n            }\n            return result;\n        }\n\n        static class PictureParser implements Parser<List<String>> {\n            @Override\n            public Result<List<String>> parse(Response response) {\n                Elements     elements = response.body().css(\"#picture > p > img\");\n                List<String> src      = elements.stream().map(element -> element.attr(\"src\")).collect(Collectors.toList());\n                return new Result<>(src);\n            }\n        }\n\n    }\n\n\n    public static void main(String[] args) {\n        MeiziSpider meiziSpider = new MeiziSpider(\"妹子图\");\n        Elves.me(meiziSpider, Config.me().delay(3000)).start();\n    }\n\n}\n"
  },
  {
    "path": "src/test/java/io/github/biezhi/elves/examples/News163Example.java",
    "content": "package io.github.biezhi.elves.examples;\n\nimport io.github.biezhi.elves.Elves;\nimport io.github.biezhi.elves.config.Config;\nimport io.github.biezhi.elves.pipeline.Pipeline;\nimport io.github.biezhi.elves.response.Response;\nimport io.github.biezhi.elves.response.Result;\nimport io.github.biezhi.elves.spider.Spider;\nimport lombok.extern.slf4j.Slf4j;\nimport org.jsoup.nodes.Element;\n\nimport java.util.List;\nimport java.util.stream.Collectors;\n\n/**\n * 网易新闻示例\n *\n * @author biezhi\n * @date 2018/1/15\n */\npublic class News163Example {\n\n    @Slf4j\n    static class News163Spider extends Spider {\n        public News163Spider(String name) {\n            super(name);\n            this.startUrls(\n                    \"http://news.163.com/special/0001386F/rank_news.html\",\n                    \"http://news.163.com/special/0001386F/rank_ent.html\", // 娱乐\n                    \"http://news.163.com/special/0001386F/rank_sports.html\", // 体育\n                    \"http://news.163.com/special/0001386F/rank_tech.html\", // 科技\n                    \"http://news.163.com/special/0001386F/game_rank.html\", //游戏\n                    \"http://news.163.com/special/0001386F/rank_book.html\"); // 读书\n        }\n\n        @Override\n        public void onStart(Config config) {\n            this.addPipeline((Pipeline<List<String>>) (item, request) -> item.forEach(System.out::println));\n            this.requests.forEach(request -> {\n                request.contentType(\"text/html; charset=gb2312\");\n                request.charset(\"gb2312\");\n            });\n        }\n\n        @Override\n        protected Result parse(Response response) {\n            List<String> titles = response.body().css(\"div.areabg1 .area-half.left div.tabContents td a\").stream()\n                    .map(Element::text)\n                    .collect(Collectors.toList());\n\n            return new Result(titles);\n        }\n    }\n\n    public static void main(String[] args) {\n        Elves.me(new News163Spider(\"网易新闻\")).start();\n    }\n\n}\n"
  },
  {
    "path": "src/test/java/io/github/biezhi/elves/examples/QiubaiExample.java",
    "content": "package io.github.biezhi.elves.examples;\n\nimport io.github.biezhi.elves.Elves;\nimport io.github.biezhi.elves.config.Config;\nimport io.github.biezhi.elves.pipeline.Pipeline;\nimport io.github.biezhi.elves.request.Request;\nimport io.github.biezhi.elves.response.Response;\nimport io.github.biezhi.elves.response.Result;\nimport io.github.biezhi.elves.spider.Spider;\nimport lombok.extern.slf4j.Slf4j;\nimport org.jsoup.nodes.Element;\n\nimport java.util.List;\nimport java.util.Optional;\nimport java.util.stream.Collectors;\n\n/**\n * 糗事百科示例\n *\n * @author biezhi\n * @date 2018/1/15\n */\npublic class QiubaiExample {\n\n    private static final String BASE_URL = \"https://www.qiushibaike.com\";\n\n    @Slf4j\n    static class QiubaiSpider extends Spider {\n        public QiubaiSpider(String name) {\n            super(name);\n            this.startUrls(BASE_URL);\n        }\n\n        @Override\n        public void onStart(Config config) {\n            this.addPipeline((Pipeline<List<String>>) (items, request) -> {\n                log.info(\"=== 段子来了 ===\");\n                items.forEach(item -> System.out.println(\"\\r\\n\" + item + \"\\r\\n============END==========\\r\\n\"));\n            });\n        }\n\n        @Override\n        protected Result parse(Response response) {\n            Result result = new Result();\n\n            List<String> items = response.body().css(\"#content-left div.article div.content span\").stream()\n                    .map(element -> element.text().replace(\"<br/>\", \"\\r\\n\"))\n                    .collect(Collectors.toList());\n\n            result.setItem(items);\n\n            // 下一页\n            Optional<Element> nextEl = response.body().css(\"ul.pagination a span\").stream()\n                    .filter(element -> \"下一页\".equals(element.text()))\n                    .map(Element::parent)\n                    .findFirst();\n            if (nextEl.isPresent()) {\n                String          nextPageUrl = BASE_URL + nextEl.get().attr(\"href\");\n                Request<String> nextReq     = QiubaiSpider.this.makeRequest(nextPageUrl, this::parse);\n                result.addRequest(nextReq);\n            }\n            return result;\n        }\n    }\n\n    public static void main(String[] args) {\n        QiubaiSpider qiubaiSpider = new QiubaiSpider(\"糗事百科\");\n        Elves.me(qiubaiSpider, Config.me().delay(2000)).start();\n    }\n\n}\n"
  }
]