Full Code of dataabc/weiboSpider for AI

Repository: dataabc/weiboSpider
Branch: master
Commit: 720d52a58aef
Files: 80
Total size: 255.0 KB

Directory structure:
gitextract_3r6f4gjt/

├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug-report.md
│   │   ├── failed.md
│   │   ├── feature-request.md
│   │   └── other.md
│   ├── stale.yml
│   └── workflows/
│       └── python-app.yml
├── .gitignore
├── CONTRIBUTING.md
├── README.md
├── docs/
│   ├── FAQ.md
│   ├── academic.md
│   ├── automation.md
│   ├── contributors.md
│   ├── cookie.md
│   ├── example.md
│   ├── known_issues.md
│   ├── settings.md
│   └── userid.md
├── requirements.txt
├── setup.py
├── tests/
│   ├── __init__.py
│   ├── test_downloader_async.py
│   ├── test_parser/
│   │   ├── __init__.py
│   │   ├── test_album_parser.py
│   │   ├── test_comment_parser.py
│   │   ├── test_index_parser.py
│   │   ├── test_info_parser.py
│   │   ├── test_mblog_picAll_parser.py
│   │   ├── test_page_parser.py
│   │   ├── test_photo_parser.py
│   │   └── util.py
│   └── testdata/
│       ├── 2f62165fa3ca1e85e0d398d385c377a068b76eb95765f7020ffffd3e.html
│       ├── 4957814af5a123b82e974b5537dea736dfb34e48d8835203a45d2e67.html
│       ├── 4d5ed0a3ebd0303cb45edd544dbc0ab5e86d43e103405f0c60515884.html
│       ├── 63a98849ec82b2c87ec55bca03cbf5988f7eac233a23d86b4fdf5ffd.html
│       ├── 76233b3f90394581aac6f19cfa5d674a610e8b442b1f83de7673ab49.html
│       ├── a4437630f3bdfa2757bae1595186ac063fe5ec25cf2f98116ece83cb.html
│       ├── b541fd1751117498b6d6f40d3321686ddf871651237c4ac854a5c3eb.html
│       ├── ca5f2a555e8d62f728c66fa90afb2d54d19f8c898e164204a61bdf03.html
│       ├── d486235d4a17dd0accb0f2cc77b3648abfa03580b9e0cdb61f1e618f.html
│       ├── e4d541ecb02253c14abc1d52605fc00d91279df9ac4c1465c85b91b3.html
│       ├── e97222acd5bc7d8d1bfbd3f352f8cad3e36fdd19e40b69e1c33fb3c3.html
│       └── url_map.json
└── weibo_spider/
    ├── __init__.py
    ├── __main__.py
    ├── config_sample.json
    ├── config_util.py
    ├── datetime_util.py
    ├── downloader/
    │   ├── __init__.py
    │   ├── avatar_picture_downloader.py
    │   ├── downloader.py
    │   ├── img_downloader.py
    │   ├── origin_picture_downloader.py
    │   ├── retweet_picture_downloader.py
    │   └── video_downloader.py
    ├── logging.conf
    ├── parser/
    │   ├── __init__.py
    │   ├── album_parser.py
    │   ├── comment_parser.py
    │   ├── index_parser.py
    │   ├── info_parser.py
    │   ├── mblog_picAll_parser.py
    │   ├── page_parser.py
    │   ├── parser.py
    │   ├── photo_parser.py
    │   └── util.py
    ├── spider.py
    ├── user.py
    ├── user_id_list.txt
    ├── weibo.py
    └── writer/
        ├── __init__.py
        ├── csv_writer.py
        ├── json_writer.py
        ├── kafka_writer.py
        ├── mongo_writer.py
        ├── mysql_writer.py
        ├── post_writer.py
        ├── sqlite_writer.py
        ├── txt_writer.py
        └── writer.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/ISSUE_TEMPLATE/bug-report.md
================================================
---
name: Bug report
about: Report a bug to the developers
title: ''
labels: bug
assignees: ''

---

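Thank you for reporting a bug. As a token of appreciation, if the bug is confirmed, you will be added to this project's contributor list. If you not only found the bug but also provide a good fix, we will invite you to become a code contributor (Contributor) via pull request. If you repeatedly submit good pull requests, we will invite you to become a Collaborator on this project. Providing a fix is, of course, entirely voluntary. Whether or not it turns out to be a real bug, and whether or not you provide a fix, we appreciate your help with this project.

- Q: Which version has the bug (GitHub version / PyPI version / both)?

A:

- Q: Are you using the latest version of the program (yes/no)?

A:

- Q: Does the bug reproduce when crawling any user (yes/no)?

A:

- Q: If the bug only occurs when crawling specific posts, can you provide the weibo_id or URL of the failing post (optional)?

A:

- Q: If you already provided the weibo_id or URL of the failing post, skip this question. Otherwise, can you provide the **user_id** of the failing account and the **since_date** you configured, so that we can locate the failing post (optional)?

A:

- Q: If convenient, please describe the bug in detail. If the code raised an error, please include the error message.

A: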


================================================
FILE: .github/ISSUE_TEMPLATE/failed.md
================================================
---
name: Program runtime error
about: The program fails at runtime and I need help
title: ''
labels: failed
assignees: ''

---

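To help us resolve the problem, please answer the questions below carefully. Once the problem is solved, please close this issue promptly.

- Q: Which version fails (GitHub version / PyPI version / both)?

A:

- Q: Are you using the latest version of the program (yes/no)?

A:

- Q: Does the error occur when crawling any user (yes/no)?

A:

- Q: If the error only occurs when crawling specific posts, can you provide the weibo_id or URL of the failing post (optional)?

A:

- Q: If you already provided the weibo_id or URL of the failing post, skip this question. Otherwise, can you provide the **user_id** of the failing account and the **since_date** you configured, so that we can locate the failing post (optional)?

A:

- Q: If convenient, please describe the error in detail, preferably including the error message.

A: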


================================================
FILE: .github/ISSUE_TEMPLATE/feature-request.md
================================================
---
name: New feature or suggestion
about: Propose a new feature, or offer other suggestions for this project even if you have no feature request
title: ''
labels: 'feature'
assignees: ''

---

- Q: Please describe the new feature you need.

A:

- Q: Please explain why this feature is worth adding. (optional)

A:


================================================
FILE: .github/ISSUE_TEMPLATE/other.md
================================================
---
name: Other
about: Other questions you would like to discuss
title: ''
labels: ''
assignees: ''

---




================================================
FILE: .github/stale.yml
================================================
# Number of days of inactivity before an issue becomes stale
daysUntilStale: 60

# Number of days of inactivity before a stale issue is closed
daysUntilClose: 7

# Issues with these labels will never be considered stale
exemptLabels:
  - pinned
  - security
  - to do

# Set to true to ignore issues with an assignee
exemptAssignees: true

# Label to use when marking an issue as stale
staleLabel: wontfix

# Comment to post when marking an issue as stale. Set to `false` to disable
markComment: >
  This issue has been automatically marked as stale because it has not had
  recent activity. It will be closed if no further activity occurs. Thank you
  for your contributions.

# Comment to post when closing a stale issue. Set to `false` to disable
closeComment: >
  Closing as stale, please reopen if you'd like to work on this further.

# Limit to only `issues` or `pulls`
only: issues


================================================
FILE: .github/workflows/python-app.yml
================================================
# This workflow will install Python dependencies, run tests and lint with a single version of Python
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: Python application

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python 3.8
      uses: actions/setup-python@v2
      with:
        python-version: 3.8
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install flake8 pytest
        if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
    - name: Lint with flake8
      run: |
        # stop the build if there are Python syntax errors or undefined names
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Test with pytest
      run: |
        pytest


================================================
FILE: .gitignore
================================================
.vscode 

*.pyc
__pycache__

build/
dist/
*.egg-info

config.json

weibo/
weibo.db
*.log

.idea


================================================
FILE: CONTRIBUTING.md
================================================
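# Contributing to This Project

This project is written in **Python 3**. Thank you for your support; contributions to this open-source project are welcome. People differ in skills, experience, perspective, and available time, so everyone can contribute in whatever way suits them. We will not be any less grateful because a contributor wrote only a little code or made a suggestion that did not work out; everyone who is willing to help deserves respect. If you hold back because you think your code or suggestion is not good enough, the project may lose a chance to get better. So if you have a good idea or suggestion, or have found a bug, please raise it in an issue; that is a contribution too. Code contributions are also very welcome. At first you can submit code through pull requests. If we find that your code is of very high quality or that you bring strong ideas, we will invite you to become a [Collaborator](https://help.github.com/cn/github/setting-up-and-managing-your-github-user-account/permission-levels-for-a-user-account-repository#collaborator-access-on-a-repository-owned-by-a-user-account) on this project, so that you can push code to it directly. Before contributing code, please read the notes below; they will help you contribute more effectively.

## Before Contributing Code

If you plan to develop a new feature or make another change that requires a lot of code, it is best to open an issue first, for example "I plan to develop feature xx" or "I would like to change feature xx". A proposed feature may not suit this project, so discussing it in advance lets us decide whether the feature or change is needed. Otherwise you may spend a lot of time and effort on code that ends up not being accepted.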

## Python Style Guide (recommended reading for Python beginners)

See the [Python style guide](https://zh-google-styleguide.readthedocs.io/en/latest/google-python-styleguide/python_style_rules/)
or the [Python style guide](https://github.com/zh-google-styleguide/zh-google-styleguide/blob/master/google-python-styleguide/python_style_rules.rst);
the two have the same content.

## Git Commit Conventions

See [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/)
or the [Git commit conventions](https://zhuanlan.zhihu.com/p/67804026). Commit messages may be in Chinese or English, as long as they follow the conventions.

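## Git Commit Suggestions (optional)

This suggestion is optional; if you find it unreasonable, write code your own way. We suggest keeping each commit small. If a new feature requires extensive changes, split it into several small modules and commit each one separately, because that makes the code easier to manage. For example, suppose a new feature consists of several modules, most of which are well written but one of which has a bug. With per-module commits, only the faulty module needs to be dealt with, and the others are unaffected.

## Python Linter

This project uses flake8.

## Python Formatter

This project uses yapf.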

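## Use of Quotes

**Single quotes are recommended** in code. Use double quotes only in special cases: docstrings at the start of classes, methods, and functions are wrapped in six double quotes (three on the left and three on the right), and strings that already contain a single quote should be wrapped in double quotes.

## Avoid Excessive Dependencies

Unless necessary, avoid non-built-in modules, since they raise the installation cost for users. If a module brings substantial convenience to the project or its users, it may of course be used.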


================================================
FILE: README.md
================================================
[![Build Status](https://github.com/dataabc/weiboSpider/workflows/Python%20application/badge.svg)](https://badge.fury.io/py/weibo-spider)
[![Python](https://img.shields.io/pypi/pyversions/weibo-spider)](https://badge.fury.io/py/weibo-spider)
[![PyPI](https://badge.fury.io/py/weibo-spider.svg)](https://badge.fury.io/py/weibo-spider)

# Weibo Spider

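This program can continuously crawl the data of **one** or **more** Sina Weibo users (such as [胡歌](https://weibo.cn/u/1223178222), [迪丽热巴](https://weibo.cn/u/1669879400), and [郭碧婷](https://weibo.cn/u/1729370543)) and write the results to **files** or **databases**. The output covers nearly all the data of a user's Weibo account, in two broad categories: **user information** and **post information**. The full list is given in [Fields Collected](#获取到的字段). If you only need user information, a setting lets you crawl user information only. This program requires a cookie to gain access to Weibo; [How to Get a Cookie](#如何获取cookie) explains how to obtain one. If you prefer not to set a cookie, you can use the [cookie-free version](https://github.com/dataabc/weibo-crawler); the two have similar features.

Results can be written to files and databases. The supported output types are:

- **txt file** (default)
- **csv file** (default)
- **json file** (optional)
- **MySQL database** (optional)
- **MongoDB database** (optional)
- **SQLite database** (optional)

Downloading the images and videos in posts is also supported. The downloadable files are:

- original-size **images** in **original** posts (optional)
- original-size **images** in **retweeted** posts (optional)
- **videos** in **original** posts (optional)
- **videos** in **retweeted** posts (optional)
- **videos** in the **Live Photos** of **original** posts ([cookie-free version](https://github.com/dataabc/weibo-crawler) only)
- **videos** in the **Live Photos** of **retweeted** posts ([cookie-free version](https://github.com/dataabc/weibo-crawler) only)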

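## Table of Contents

[TOC]

- [Weibo Spider](#weibo-spider)
  - [Table of Contents](#内容列表)
  - [Fields Collected](#获取到的字段)
    - [User Information](#用户信息)
    - [Post Information](#微博信息)
  - [Example](#示例)
  - [Runtime Environment](#运行环境)
  - [Usage](#使用说明)
    - [0. Versions](#0版本)
    - [1. Installation](#1安装程序)
      - [Install from Source](#源码安装)
      - [Install with pip](#pip安装)
    - [2. Configuration](#2程序设置)
    - [3. Running the Program](#3运行程序)
  - [Customizing the Program (optional)](#个性化定制程序可选)
  - [Scheduled Automatic Crawling (optional)](#定期自动爬取微博可选)
  - [How to Get a Cookie](#如何获取cookie)
  - [How to Get a user_id](#如何获取user_id)
  - [FAQ](#常见问题)
  - [Academic Research](#学术研究)
  - [Related Projects](#相关项目)
  - [Contributing](#贡献)
  - [Contributors](#贡献者)
  - [Notes](#注意事项)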

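## Fields Collected

This section describes the fields that are crawled. To distinguish this program from the [cookie-free version](https://github.com/dataabc/weibo-crawler), the fields collected by both are listed below. Fields specific to the cookie-free version are marked "(cookie-free version)"; unmarked fields are common to both.

### User Information

- User id: the Weibo user id, e.g. "1669879400"; this field is in fact known in advance
- Nickname: the user's nickname, e.g. "Dear-迪丽热巴"
- Gender: the user's gender
- Birthday: the user's date of birth
- Location: the user's location
- Education: names and dates of the schools the user attended
- Work: names and dates of the companies the user has worked for
- Sunshine Credit (cookie-free version): the user's Sunshine Credit rating
- Registration date (cookie-free version): the date the user registered on Weibo
- Post count: total number of the user's posts (retweets + original posts)
- Following count: number of accounts the user follows
- Follower count: number of the user's followers
- Description: the user's profile description
- Profile URL (cookie-free version): URL of the user's home page on mobile Weibo
- Avatar URL (cookie-free version): URL of the user's avatar
- HD avatar URL (cookie-free version): URL of the user's high-definition avatar
- Weibo level (cookie-free version): the user's Weibo level
- Membership level (cookie-free version): the level of a Weibo member; 0 for ordinary users
- Verified (cookie-free version): whether the user is verified; a boolean
- Verification type (cookie-free version): the user's verification type, e.g. individual, enterprise, or government
- Verification info: verified users only; the verification text shown in the user's profile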

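### Post Information

- Post id: unique identifier of the post
- Content: body text of the post
- Headline article URL: URL of the headline article in the post; '' if the post contains none
- Original image URLs: URLs of the images in original posts and in the retweet comment of retweeted posts; if a post has several images, the URLs are separated by ASCII commas; "无" if there are no images
- Video URL: URL of the video in the post; "无" if the post has no video
- Publish location: for location-tagged posts, the location the post was published from
- Publish time: when the post was published, accurate to the minute
- Like count: number of likes the post received
- Retweet count: number of times the post was retweeted
- Comment count: number of comments on the post
- Publish tool: the tool the post was published from, e.g. iPhone客户端 (iPhone client) or HUAWEI Mate 20 Pro
- Result files: saved under the weibo folder in the current directory, in a folder named after the user's nickname, as "user_id.csv" and "user_id.txt"
- Images: images in original posts and in the retweet comment of retweeted posts, saved in the img folder inside the folder named after the user's nickname
- Videos: videos in original posts, saved in the video folder inside the folder named after the user's nickname
- Post bid (cookie-free version): specific to the [cookie-free version](https://github.com/dataabc/weibo-crawler); the same value as the post id in this program
- Topics (cookie-free version): the post's topics, i.e. the text between two # signs; multiple topics are separated by ASCII commas; '' if there are none
- @users (cookie-free version): the users mentioned in the post; multiple users are separated by ASCII commas; '' if there are none
- Original post (cookie-free version): retweets only; the post being retweeted, stored as a dictionary containing all the post fields above, such as post id and content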

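## Example

To see what the program actually produces, see the [example document](https://github.com/dataabc/weiboSpider/blob/master/docs/example.md), which walks through crawling [迪丽热巴's Weibo](https://weibo.cn/u/1669879400) and includes screenshots of some result files.

## Runtime Environment

- Language: python2/python3
- OS: Windows/Linux/macOS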

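## Usage

### 0. Versions

This program has two versions. The one you are looking at is the Python 3 version; the Python 2 version is on the [python2 branch](https://github.com/dataabc/weiboSpider/tree/python2). Active development, including new features and bug fixes, targets the Python 3 version; the Python 2 version receives bug fixes only. Python 3 users should use the current version, and Python 2 users the [Python 2 version](https://github.com/dataabc/weiboSpider/tree/python2). These instructions are for the Python 3 version.

### 1. Installation

There are two ways to install the program: **from source** or **with pip**. Both provide exactly the same functionality. If you need to modify the source, use the first; otherwise either works.

#### Install from Source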

```bash
$ git clone https://github.com/dataabc/weiboSpider.git
$ cd weiboSpider
$ pip install -r requirements.txt
```

#### Install with pip

```bash
$ python3 -m pip install weibo-spider
```

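### 2. Configuration

To learn about the program settings, see the [settings document](https://github.com/dataabc/weiboSpider/blob/master/docs/settings.md).

### 3. Running the Program

Users who installed **from source** can run the following command in the weiboSpider directory; users who installed **with pip** can run it in any directory they have write permission for: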

```bash
$ python3 -m weibo_spider
```

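On the first run, a config.json configuration file is created automatically in the current directory. After filling it in, run the same command again to fetch posts.

If you already have a config.json file, you can also pass its path with the config_path argument: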

```bash
$ python3 -m weibo_spider --config_path="config.json"
```

To choose where the files (csv, txt, json, images, videos) are saved, use the output_dir argument. For example, to save files to the /home/weibo/ directory, run:

```bash
$ python3 -m weibo_spider --output_dir="/home/weibo/"
```

To pass user_ids on the command line, use the u argument. You can give one or more user_ids separated by ASCII commas; duplicate user_ids are removed automatically. The command is:

```bash
$ python3 -m weibo_spider --u="1669879400,1223178222"
```

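The program fetches the posts of the users whose user_ids are 1669879400 and 1223178222; [How to Get a user_id](#如何获取user_id) explains how to find a user_id. All user_ids given this way use the since_date and end_date settings in config.json, so changing those values controls the time range crawled. If user_id_list in config.json is a file path, every user_id from the command line is saved to that file and its since_date is updated automatically. If it is not a path, the user_ids are saved to user_id_list.txt in the current directory, again with since_date updated automatically; if that file does not exist, the program creates it.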
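
The comma-separated, de-duplicated `--u` value described above can be pictured with a short sketch. This is only an illustration: `parse_user_ids` is a hypothetical helper, not the program's own function, and keeping the first-seen order is an assumption.

```python
def parse_user_ids(arg):
    """Split a comma-separated --u value and drop duplicates.

    Hypothetical helper for illustration; order of first appearance is kept.
    """
    seen = set()
    user_ids = []
    for raw in arg.split(','):
        user_id = raw.strip()
        # Skip empty pieces (e.g. a trailing comma) and ids already seen.
        if user_id and user_id not in seen:
            seen.add(user_id)
            user_ids.append(user_id)
    return user_ids


print(parse_user_ids('1669879400,1223178222,1669879400'))
```

The repeated id is kept only once, so each user would be crawled a single time.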

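## Customizing the Program (optional)

This section is optional; skip it if you do not need to customize the program or add features.

The main code lives in weibo_spider/spider.py. Its core is the Spider class, and all of the features above are implemented by calling the Spider class from the main function. The default calling code is: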

```python
config = get_config()
wb = Spider(config)
wb.start()  # crawl the Weibo data
```

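You can call or modify the Spider class to suit your needs. Running the program yields a lot of information.

<details>

<summary>Click to view details</summary>

- wb.user['nickname']: the user's nickname;
- wb.user['gender']: the user's gender;
- wb.user['location']: the user's location;
- wb.user['birthday']: the user's date of birth;
- wb.user['description']: the user's profile description;
- wb.user['verified_reason']: the user's verification info;
- wb.user['talent']: the user's tags;
- wb.user['education']: the user's education history;
- wb.user['work']: the user's work history;
- wb.user['weibo_num']: number of posts;
- wb.user['following']: number of accounts followed;
- wb.user['followers']: number of followers;

</details>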

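**wb.weibo**: wb.weibo holds all the crawled post information other than the user information above, such as the **post id**, **content**, **original image URLs**, **publish location**, **publish time**, **publish tool**, **like count**, **retweet count**, and **comment count**. When all posts are crawled (original + retweets), it additionally contains the **original image URLs of the retweeted post** and **whether the post is original**. wb.weibo is a list: wb.weibo[0] is the first post crawled, wb.weibo[1] the second, and so on. When filter=1, wb.weibo[0] is the first **original** post crawled, and so on. wb.weibo[0]['id'] is the id of the first post, wb.weibo[0]['content'] its content, and wb.weibo[0]['publish_time'] its publish time. The remaining fields are listed under "Details" below.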

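<details>

<summary>Details</summary>

If the target user has posts:

- id: the post id. For example, wb.weibo[0]['id'] is the id of the latest post;
- content: the post content. For example, wb.weibo[0]['content'] is the content of the latest post;
- article_url: the URL of the headline article in the post. For example, wb.weibo[0]['article_url'] is the headline article URL of the latest post; '' if the post contains no headline article;
- original_pictures: the original image URLs of an original post, or the image URLs in the retweet comment of a retweet. For example, wb.weibo[0]['original_pictures'] is the original image URLs of the latest post. If the post has several images, the URLs are stored separated by ASCII commas; if it has none, the value is "无";
- retweet_pictures: the original image URLs of the retweeted post. The value is "无" when the post is original or is a retweet without images; otherwise it is the image URLs of the retweeted post. Multiple URLs are separated by ASCII commas;
- publish_place: the post's publish location. For example, wb.weibo[0]['publish_place'] is the publish location of the latest post; "无" if the post has no location;
- publish_time: the post's publish time. For example, wb.weibo[0]['publish_time'] is the publish time of the latest post;
- up_num: the number of likes the post received. For example, wb.weibo[0]['up_num'] is the like count of the latest post;
- retweet_num: the number of retweets the post received. For example, wb.weibo[0]['retweet_num'] is the retweet count of the latest post;
- comment_num: the number of comments the post received. For example, wb.weibo[0]['comment_num'] is the comment count of the latest post;
- publish_tool: the post's publish tool. For example, wb.weibo[0]['publish_tool'] is the publish tool of the latest post.

</details>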
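
The field access described above can be sketched as follows. This is a minimal illustration that assumes each post is a plain dictionary with the keys listed; `picture_urls`, the sample list, and its example.com URLs are made up for the sketch and are not part of the program.

```python
NO_VALUE = '无'  # placeholder the documentation says is stored for an empty field


def picture_urls(weibo):
    """Return the original_pictures field as a list of URLs (empty if none)."""
    value = weibo.get('original_pictures', NO_VALUE)
    if not value or value == NO_VALUE:
        return []
    return value.split(',')


# Hypothetical stand-in for wb.weibo; the values are invented for illustration.
weibos = [
    {'id': 'post_a',
     'original_pictures': 'http://example.com/1.jpg,http://example.com/2.jpg',
     'up_num': 12},
    {'id': 'post_b', 'original_pictures': '无', 'up_num': 34},
]

for weibo in weibos:
    print(weibo['id'], len(picture_urls(weibo)), weibo['up_num'])
```

Checking for the "无" placeholder before splitting avoids treating it as a URL.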

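## Scheduled Automatic Crawling (optional)

To have the program run automatically at intervals and fetch only newly added content (excluding posts already fetched), see [Scheduled Automatic Crawling](https://github.com/dataabc/weiboSpider/blob/master/docs/automation.md).

## How to Get a Cookie

For how to obtain a cookie, see the [cookie document](https://github.com/dataabc/weiboSpider/blob/master/docs/cookie.md).

## How to Get a user_id

For how to obtain a user_id, see the [user_id document](https://github.com/dataabc/weiboSpider/blob/master/docs/userid.md), which explains how to obtain the user_id of one or many Weibo users.

## FAQ

If an error occurs while running the program, check the [FAQ](https://github.com/dataabc/weiboSpider/blob/master/docs/FAQ.md) page, which covers the most common problems and their solutions. In addition, because of limitations in the technology or APIs this project uses, some cases are known to be unsupported and some requests cannot be fulfilled; these are summarized in [Known Issues](https://github.com/dataabc/weiboSpider/blob/master/docs/known_issues.md). Beyond that, if the program behaves differently from what you expect, you can [open an issue](https://github.com/dataabc/weiboSpider/issues/new/choose) to ask for help, and we will be happy to answer.

## Academic Research

This project collects Weibo data to supply the data needed for non-commercial work such as papers and research. The [academic research document](https://github.com/dataabc/weiboSpider/blob/master/docs/academic.md) lists projects that have used this program in papers or research; they are shown with their owners' consent. Where descriptions touch on privacy, the owners were consulted and only the parts they allowed are described. If an owner previously agreed to have some information shown and no longer wants it shown, let me know by email (chillychen1991@gmail.com) or through an issue and I will remove it. Anyone who uses this project for a paper or other academic research is also welcome to have their results listed in the [academic research document](https://github.com/dataabc/weiboSpider/blob/master/docs/academic.md); this is entirely voluntary.

For ease of citation, the BibTeX entry for this project is: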

```
@misc{weibospider2020,
  author = {Lei Chen and Zhengyang Song and schaepher and minami9 and bluerthanever and MKSP2015 and moqimoqidea and windlively and eggachecat and mtuwei and codermino and duangan1},
  title = {{Weibo Spider}},
  howpublished = {\url{https://github.com/dataabc/weiboSpider}},
  year = {2020}
}
```

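## Related Projects

- [weibo-crawler](https://github.com/dataabc/weibo-crawler) - has the same functionality as this project, needs no cookie, and collects more post attributes;
- [weibo-search](https://github.com/dataabc/weibo-search) - continuously fetches the results of one or more **Weibo keyword searches** and writes them to files (optional), databases (optional), and so on. A Weibo keyword search means **searching for posts whose body contains the given keywords**, optionally within a time range. For very popular keywords, a one-day range can yield more than **10 million** results, and an N-day range 10 million x N results. For most keywords, fewer than 10 million matching posts are produced per day, so the program can obtain all or nearly all search results for most keywords. It also obtains all the information in the search results: every post field this program collects, that program collects as well.

## Contributing

Contributions are welcome. You can contribute by submitting code, by making suggestions in issues (new features, improvements, and so on), or by reporting bugs and shortcomings in issues. See [Contributing to This Project](https://github.com/dataabc/weiboSpider/blob/master/CONTRIBUTING.md) for details.

## Contributors

Thanks to everyone who has contributed to this project. See the [Contributors](https://github.com/dataabc/weiboSpider/blob/master/docs/contributors.md) page for details.

## Notes

1. user_id must not be the user_id of the crawler account. To crawl Weibo data you must first log in to some Weibo account, which we call the crawler account. The pages the crawler account gets when visiting its own page differ in format from those it gets when visiting other users' pages, so the program cannot crawl the crawler account's own posts. To crawl them anyway, see [Fetching your own posts](https://github.com/dataabc/weiboSpider/issues/113);
2. Cookies expire after roughly three months. If you see a message that the cookie is wrong or expired, update the cookie.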


================================================
FILE: docs/FAQ.md
================================================
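# FAQ

## 1. The program fails with an error containing "ImportError: cannot import name 'config_util' from '__main__'". How do I fix it?

This error most likely means that you ran a .py file directly. The correct way to run the program is from the weiboSpider directory, with the following command: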

```bash
python3 -m weibo_spider
```

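## 2. The program fails with an error containing "'NoneType' object". How do I fix it?

This is one of the most common problems. The cause is that crawling too fast has triggered a temporary restriction, which may apply to the crawler account, to the IP address, or to both. The restriction is usually lifted automatically after a while. You can avoid it by slowing down the crawl, which is done by changing the following settings in config.json: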

```json
    "random_wait_pages": [1, 5],
    "random_wait_seconds": [6, 10],
    "global_wait": [[1000, 3600], [500, 2000]],    
```

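The first two lines mean that after every 1 to 5 pages crawled, the program waits a random 6 to 10 seconds. To avoid restrictions, pause more often (lower the values in random_wait_pages) or wait longer (raise the values in random_wait_seconds). The last line means that after fetching 1000 pages the program waits 3600 seconds in one go, and after a further 500 pages it waits 2000 seconds. By default there are only two global_wait entries ([1000, 3600] and [500, 2000]); you can add more or define your own. Once the entries are used up (the defaults are used up after 1500 (1000+500) pages), the program cycles back to the first entry: it waits 3600 seconds after fetching pages 1501 to 2500, 2000 seconds after pages 2501 to 3000, and so on.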
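
The cycling behaviour of global_wait described above can be reproduced with a small sketch. It is not the program's code: `global_wait_schedule` is a hypothetical function that only computes after which page a long pause would fall and how long it would last, assuming every entry has a positive page count.

```python
from itertools import cycle


def global_wait_schedule(global_wait, total_pages):
    """Yield (page, seconds): pause for `seconds` after fetching `page` pages.

    Hypothetical illustration of the documented cycling behaviour.
    """
    page = 0
    for pages, seconds in cycle(global_wait):
        if pages <= 0:
            raise ValueError('page counts in global_wait must be positive')
        page += pages
        if page > total_pages:
            return
        yield page, seconds


for page, seconds in global_wait_schedule([[1000, 3600], [500, 2000]], 3000):
    print('after page', page, 'wait', seconds, 'seconds')
```

With the default entries and 3000 pages, the pauses fall after pages 1000, 1500, 2500, and 3000, matching the description above.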

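## 3. The program fails with an error containing "browser_cookie3.BrowserCookieError: Unable to get key for cookie decryption". How do I fix it? [issue619](https://github.com/dataabc/weiboSpider/issues/619)

This is related to Google Chrome's security policy; see borisbabic/browser_cookie3#210 (comment). In our testing, switching to an older version of Google Chrome resolved it.

## 4. The program fails with an error containing "Failed to obtain weibo.cn cookie from Chrome browser: [Errno 13] Permission denied: 'xxxxxxx'". How do I fix it? [issue621](https://github.com/dataabc/weiboSpider/issues/621)

Chrome may have been running at the same time as the program. Close Chrome, or see https://blog.csdn.net/weixin_43667972/article/details/132197618

## 5. How do I get the comments on a post?

Because of restrictions, only some of the comments can be fetched, not all of them, so there is currently no plan to add comment crawling.

## 6. Only part of the text of some long posts is fetched. How do I fix it?

The program can fetch the full text of long posts. It first fetches posts from the post list page. When it finds a long post (one whose text is not shown in full, with part of it replaced by "全文", meaning "full text"), it saves the incomplete text and then tries to fetch the full text from that post's detail page. If that succeeds, the fetched text is used as the post text. If it fails, the program waits a few seconds and tries again. After 5 consecutive failures it falls back to the incomplete text saved earlier, so that a few failing long posts do not stall the crawl. To allow more attempts, change the number of iterations of the for loop in the get_long_weibo method in comment_parser.py.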
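
The retry-then-fall-back logic described above follows a common pattern, sketched below. This is not the code of get_long_weibo: `fetch_with_fallback` and its parameters are hypothetical, and the sketch assumes the fetch function returns None (or an empty string) on failure.

```python
import time


def fetch_with_fallback(fetch_full_text, partial_text, attempts=5, wait_seconds=0.0):
    """Try fetch_full_text up to `attempts` times, else return partial_text.

    fetch_full_text is any zero-argument callable; a falsy result counts
    as a failed attempt (an assumption made for this sketch).
    """
    for _ in range(attempts):
        full_text = fetch_full_text()
        if full_text:
            return full_text
        time.sleep(wait_seconds)  # pause before the next attempt
    return partial_text


# A fake fetcher that fails twice and then succeeds.
responses = iter([None, None, 'full text of the post'])
print(fetch_with_fallback(lambda: next(responses), 'partial text'))
```

Raising `attempts` corresponds to increasing the loop count mentioned above; the fallback keeps one failing post from blocking the rest of the crawl.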

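## 7. How do I fetch posts by keyword?

Use [weibo-search](https://github.com/dataabc/weibo-search). It continuously fetches the results of one or more Weibo keyword searches and writes them to files (optional), databases (optional), and so on. A Weibo keyword search means searching for posts whose body contains the given keywords, optionally within a time range. For very popular keywords, a one-day range can yield more than 10 million results, and an N-day range 10 million x N results. For most keywords, fewer than 10 million matching posts are produced per day, so the program can obtain all or nearly all search results for most keywords. It also obtains all the information in the search results: every post field this program collects, that program collects as well.

## 8. How do I get the user_ids of the users in a Weibo user's following list?

Use [weibo-follow](https://github.com/dataabc/weibo-follow). Given one user_id, it fetches the user_ids of the accounts that user follows, up to 200 user_ids per user_id, and writes them to user_id_list.txt. The program can also read that file, so from those 200 user_ids it can collect up to 200 x 200 = 40,000 user_ids, from those 40,000 another 40,000 x 200 = 8,000,000, and so on, yielding a large number of user_ids. This project can read the file too: assign the path of that program's user_id_list.txt output to the user_id_list parameter in this project's config.json to fetch the many posts published by those users.

## 9. How do I fetch my own posts?

In the __init__ method in page_parser.py, change self.url to: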

```python
        self.url = "https://weibo.cn/%s/profile?page=%d" % (self.user_uri, page)
```


================================================
FILE: docs/academic.md
================================================
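# Academic Research

This project collects Weibo data to supply the data needed for non-commercial work such as papers and research. Below are some projects that have used this program in papers or research. Where descriptions touch on privacy, the researchers were consulted and only the parts they allowed are described. If a researcher previously agreed to have some information shown in this document and no longer wants it shown, let me know by email (chillychen1991@gmail.com) or through an issue and I will remove it. Likewise, if you have used this project for a paper or other academic research and would like your results listed below, let me know by email or through an issue.

***

- [Dissertation](https://github.com/Mak-LokGay/KCL_Dissertation) by [Mak-LokGay](https://github.com/Mak-LokGay), King's College London, UK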


================================================
FILE: docs/automation.md
================================================
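# Scheduled Automatic Crawling (optional)

After we have crawled some accounts, many of them may publish new posts. Scheduled automatic crawling means running the program automatically at intervals to fetch the posts published in the meantime (ignoring old posts already crawled). This section is optional; skip it if you do not need it.

The idea is to **use a third-party tool such as crontab to run the program at intervals**. Because old posts should be skipped and only new posts crawled, we need a **dynamic since_date**. Often since_date is fixed, for example since_date="2018-01-01", in which case the program crawls from the latest post back to the posts published on 2018-01-01 (inclusive). To append new posts and skip old ones, the second crawl should set since_date to the time of the previous crawl, so that it covers the period from the previous crawl to now.
Done in the most basic way, incremental crawling looks like this: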

```text
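Suppose the program first runs on 2019-06-06 with since_date set to 2018-01-01; that run crawls the posts the user published from 2018-01-01 to 2019-06-06.
For the second crawl we want to continue where the last one stopped, so since_date should be the date of the previous run, i.e. 2019-06-06.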
```

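That approach is tedious, because since_date has to be set by hand every time. We therefore need a dynamic since_date, one that the program generates automatically from the actual situation.

There are two ways to update since_date dynamically; **method 2 is recommended**.

## Method 1: Set since_date to an Integer

Set since_date in config.json to an integer, for example: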

```json
"since_date": 10,
```

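This tells the program to crawl the posts of the last 10 days, or more precisely the posts published between **10 days ago and the moment the program starts**. since_date thus becomes dynamic: on each run its value is the current date minus 10 days. Combined with crontab running the program every 9 or 10 days, this gives scheduled incremental crawling.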
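
The "current date minus N days" rule can be written out as a short sketch. `resolve_since_date` is a hypothetical helper, not the program's own function; it assumes an integer means "days back" and that any other value is already a date string.

```python
from datetime import date, timedelta


def resolve_since_date(since_date, today=None):
    """Turn an integer since_date (days back) into a 'yyyy-mm-dd' string.

    Non-integer values are returned unchanged (assumed to be date strings).
    """
    if isinstance(since_date, int):
        today = today or date.today()
        return (today - timedelta(days=since_date)).isoformat()
    return since_date


# With a fixed "today" the result is reproducible.
print(resolve_since_date(10, today=date(2020, 1, 13)))  # 2020-01-03
print(resolve_since_date('2018-01-01'))
```

Passing `today` explicitly is only for the demonstration; in normal use the current date would be taken at start-up.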

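## Method 2: Write the Time of the Last Run to a File (recommended)

This method is simple: set user_id_list in the second of the two ways described in the [settings document](https://github.com/dataabc/weiboSpider/blob/master/docs/settings.md), that is, as a file path. No other setup is needed.

Here is how it works and why it helps. Suppose your txt file contains: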

```text
1669879400
1223178222 胡歌
1729370543 郭碧婷 2019-01-01 19:28
```

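On the first run, the first and second lines carry no time, so the program crawls those users with the since_date value from config.json. The third line carries the time "2019-01-01 19:28", which the program uses as since_date. After each user is crawled, the program updates the txt file automatically: the first part of each line is the user_id, the second the user's nickname, and the third the time at which the program was **about to** crawl that user's first (latest) post. After the three users are crawled, the txt file is updated to: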

```text
1669879400 Dear-迪丽热巴 2020-01-13 19:18
1223178222 胡歌 2020-01-13 19:28
1729370543 郭碧婷 2020-01-13 19:33
```

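On the next crawl, the program uses the time on each line as since_date. The benefits are, first, that since_date never needs editing because the program updates it, and second, that every user has a since_date of their own, independent of the others. since_date can be in either "yyyy-mm-dd" or "yyyy-mm-dd hh:mm" format. For example, to add a new user such as 杨紫 and fetch all her posts from 2018-01-23 to now, edit the txt file as follows: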

```text
1669879400 Dear-迪丽热巴 2020-01-13 19:18
1223178222 胡歌 2020-01-13 19:28
1729370543 郭碧婷 2020-01-13 19:33
1227368500 杨紫 2018-01-23
```

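Note that the fields on each line are separated by spaces. If the first field consists entirely of digits, the program treats the line as a user entry; otherwise it treats the line as a comment and skips it. The second field can be anything, though the user's nickname is recommended. If the third field is in date format (yyyy-mm-dd), the program uses that date as the user's own since_date; otherwise it crawls the user's posts with the since_date from config.json. The second and third fields may be omitted.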
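
The line format described above can be sketched as a small parser. This is an illustration under assumptions, not the program's implementation: `parse_user_line` is a hypothetical function, and it treats an "hh:mm" token right after the date as part of since_date.

```python
import re

DATE_RE = re.compile(r'\d{4}-\d{2}-\d{2}')
TIME_RE = re.compile(r'\d{2}:\d{2}')


def parse_user_line(line, default_since_date):
    """Parse one line of the user id file; return None for comment lines."""
    fields = line.split()
    # A user entry must start with an all-digit user_id.
    if not fields or not re.fullmatch(r'[0-9]+', fields[0]):
        return None
    entry = {
        'user_id': fields[0],
        'nickname': fields[1] if len(fields) > 1 else '',
        'since_date': default_since_date,
    }
    # Third field: 'yyyy-mm-dd', optionally followed by 'hh:mm'.
    if len(fields) > 2 and DATE_RE.fullmatch(fields[2]):
        entry['since_date'] = fields[2]
        if len(fields) > 3 and TIME_RE.fullmatch(fields[3]):
            entry['since_date'] += ' ' + fields[3]
    return entry


for text in ['1669879400', '1729370543 郭碧婷 2019-01-01 19:28', 'a comment line']:
    print(parse_user_line(text, '2018-01-01'))
```

Lines without a date fall back to the default, which plays the role of since_date from config.json.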

Method 2 is the recommended one. It was devised by [Evifly](https://github.com/Evifly), a very helpful and inventive user, to whom we are grateful.


================================================
FILE: docs/contributors.md
================================================
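# Contributors

Thanks to everyone who has contributed or will contribute to this project, and thank you for supporting open source. Every line of code contributed enriches the project's features, every suggestion improves the program, and every bug found makes the code more robust.

Contributors fall into three groups: main code developers, code contributors, and authors of high-quality issues. Within each group, contributors are listed alphabetically by username; someone who has contributed in more than one group is listed under their main contribution.

## Main Code Developers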

| [dataabc](https://github.com/dataabc) | [songzy12](https://github.com/songzy12) |
| - | - |

## Code Contributors

| [codermino](https://github.com/codermino) | [duangan1](https://github.com/duangan1) | [MKSP2015](https://github.com/MKSP2015) |
| - | - | - |

## Authors of High-Quality Issues

|   |   |   |   |   |   |
| - | - | - | - | - | - |
| [13531982270](https://github.com/13531982270) | [Archenemy61](https://github.com/Archenemy61) | [arctanx](https://github.com/arctanx) | [bossming](https://github.com/bossming) | [bubblesran](https://github.com/bubblesran) | [cangling](https://github.com/cangling)|
| [Ccccche](https://github.com/Ccccche) | [Evifly](https://github.com/Evifly) | [gudaost](https://github.com/gudaost) | [Hylan129](https://github.com/Hylan129) | [HZzzzy](https://github.com/HZzzzy) | [kur0mi](https://github.com/kur0mi) |
| [leonall](https://github.com/leonall) | [liu-song](https://github.com/liu-song) | [Issac110](https://github.com/Issac110) | [MengyingQian](https://github.com/MengyingQian) | [PandGnone](https://github.com/PandGnone) | [PLQin](https://github.com/PLQin) |
| [redMUSCLE](https://github.com/redMUSCLE) | [shengdade](https://github.com/shengdade) | [softrime](https://github.com/softrime) | [SugimitoYuuji](https://github.com/SugimitoYuuji) | [sunbat](https://github.com/sunbat) | [taichifox95](https://github.com/taichifox95) |
| [Twinklingcode](https://github.com/Twinklingcode) | [vincentlee5](https://github.com/vincentlee5) | [wiidi](https://github.com/wiidi) | [wwwpf](https://github.com/wwwpf) | [xiaomingdaily](https://github.com/xiaomingdaily) | [xiekeyi98](https://github.com/xiekeyi98) |
| [xnzmc](https://github.com/xnzmc) | [yangy9593](https://github.com/yangy9593) | [zhangjibao](https://github.com/zhangjibao) |   |   |   |


================================================
FILE: docs/cookie.md
================================================
# How to Get a Cookie

1. Open <https://passport.weibo.cn/signin/login> in Chrome;
2. Enter your Weibo username and password and log in, as shown:
![weibo log in page](https://github.com/dataabc/media/blob/master/weiboSpider/images/cookie1.png)
After a successful login you are redirected to <https://m.weibo.cn>;
3. Press F12 to open Chrome DevTools, then enter <https://weibo.cn> in the address bar and go to it. A page similar to the following appears:
![chrome debugger network tab](https://github.com/dataabc/media/blob/master/weiboSpider/images/cookie2.png)
4. In Chrome DevTools, click in turn Network -> weibo.cn under Name -> Headers -> Request Headers. The value after "Cookie:" is the cookie we need; copy it, as shown:
![cookie in request headers section](https://github.com/dataabc/media/blob/master/weiboSpider/images/cookie3.png)

================================================
FILE: docs/example.md
================================================
# Example

Taking 迪丽热巴's Weibo as the example, we edit the **config.json** file so that it contains:

```json
{
    "user_id_list": ["1669879400"],
    "filter": 1,
    "since_date": "1900-01-01",
    "end_date": "now",
    "write_mode": ["csv", "txt", "json"],
    "pic_download": 1,
    "video_download": 1,
    "result_dir_name": 0,
    "cookie": "your cookie"
}
```

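The meaning and allowed values of these parameters are only outlined here; see the [settings document](https://github.com/dataabc/weiboSpider/blob/master/docs/settings.md) for details.

- **user_id_list** is the user_id(s) of the Weibo users to crawl. It can be one or more ids, or a file path. The user_id of the Weibo user Dear-迪丽热巴 is 1669879400. For how to find a user_id, see [How to Get a user_id](https://github.com/dataabc/weiboSpider/blob/master/docs/userid.md);
- **filter**: 1 crawls all original posts; 0 crawls all posts (original + retweets);
- **since_date**: posts published from this date onward are crawled. Because I want all of 迪丽热巴's original posts, since_date is set to a very early value;
- **end_date**: posts published up to this date are crawled. Together, since_date and end_date select the posts published between the two dates, boundaries included. If end_date is "now", posts from since_date up to the present are crawled;
- **write_mode**: the output types of the result files. I want the results in txt, csv, and json files, so the value is ["csv", "txt", "json"]. To write to a database, see [Database settings](https://github.com/dataabc/weiboSpider/blob/master/docs/settings.md#设置数据库可选);
- **pic_download**: 1 downloads the images in posts; 0 does not;
- **video_download**: 1 downloads the videos in posts; 0 does not;
- **result_dir_name** controls the name of the result folder: 1 names it after the user id, 0 after the user's nickname;
- **cookie** is the crawler account's cookie. For how to obtain it, see the [cookie document](https://github.com/dataabc/weiboSpider/blob/master/docs/cookie.md); then replace "your cookie" with the real cookie value.

After setting the cookie, run the following command in the weiboSpider directory: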

```bash
$ python3 -m weibo_spider
```

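The program creates a weibo folder, where all posts crawled from now on are stored. Inside it the program creates a folder named "Dear-迪丽热巴", which holds all the results for 迪丽热巴. That folder contains a csv file, a txt file, a json file, an img folder, and a video folder; img stores the downloaded images and video stores the downloaded videos. If you enabled database output, the information is saved to the database as well; see the [Database settings](https://github.com/dataabc/weiboSpider/blob/master/docs/settings.md#设置数据库可选) section.

## csv Result File

*1669879400.csv*

![](https://github.com/dataabc/media/blob/master/weiboSpider/images/weibo_csv.png)

## txt Result File

*1669879400.txt*

![](https://github.com/dataabc/media/blob/master/weiboSpider/images/weibo_txt.png)

The json file contains 迪丽热巴's user information and more than a thousand posts, which is a lot of content. For clarity, only two posts are shown here.

## json Result File

*1669879400.json*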

```json
{
    "user": {
        "id": "1669879400",
        "nickname": "Dear-迪丽热巴",
        "gender": "女",
        "location": "上海",
        "birthday": "双子座",
        "description": "一只喜欢默默表演的小透明。工作联系jaywalk@jaywalk.com.cn 🍒",
        "verified_reason": "嘉行传媒签约演员",
        "talent": "",
        "education": "上海戏剧学院",
        "work": "嘉行传媒 ",
        "weibo_num": 1121,
        "following": 250,
        "followers": 66395910
    },
    "weibo": [
        {
            "id": "IonM9ryMy",
            "content": "2019#微博之夜#盛典即将开启,以微博之力,让世界更美。1月11日,不见不散@微博之夜  原图 ",
            "original_pictures": "http://wx1.sinaimg.cn/large/63885668ly1gao0a01kfzj20ku112k98.jpg",
            "video_url": "无",
            "publish_place": "无",
            "publish_time": "2020-01-07 14:59",
            "publish_tool": "无",
            "up_num": 239242,
            "retweet_num": 71914,
            "comment_num": 55916
        },
        {
            "id": "InB4Df73X",
            "content": "#happyNEOyear#都到了2020,还不换点新pose配新装[來] 穿上@adidasneo 迪士尼联名款,让#生来好动#的我们一起玩“新”大发、自拍不重样🤳http://t.cn/AiF7nREj adidasneo的微博视频  ",
            "original_pictures": "无",
            "video_url": "http://f.video.weibocdn.com/000pYrGmlx07zPTskBQQ010412008AOY0E010.mp4?label=mp4_hd&template=852x480.25.0&trans_finger=62b30a3f061b162e421008955c73f536&Expires=1578569162&ssig=IV3JEbh3Zu&KID=unistore,video",
            "publish_place": "无",
            "publish_time": "2020-01-02 11:00",
            "publish_tool": "无",
            "up_num": 275419,
            "retweet_num": 376734,
            "comment_num": 131069
        }
    ]
}
```
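
A result file of this shape can be read back with the standard json module. The sketch below is only an illustration: `summarize` is a hypothetical helper, and the inline document is a trimmed stand-in that reuses a few values from the example above, with the video URL replaced by a placeholder.

```python
import json

# Trimmed stand-in with the same shape as the result file shown above.
RESULT_JSON = '''
{
    "user": {"id": "1669879400", "nickname": "Dear-迪丽热巴", "weibo_num": 1121},
    "weibo": [
        {"id": "IonM9ryMy", "publish_time": "2020-01-07 14:59",
         "up_num": 239242, "video_url": "无"},
        {"id": "InB4Df73X", "publish_time": "2020-01-02 11:00",
         "up_num": 275419, "video_url": "http://example.com/video.mp4"}
    ]
}
'''


def summarize(result):
    """Return a few summary values from a parsed result document."""
    weibos = result['weibo']
    most_liked = max(weibos, key=lambda w: w['up_num'])
    return {
        'nickname': result['user']['nickname'],
        'posts': len(weibos),
        'most_liked': most_liked['id'],
        # "无" marks a post without a video.
        'with_video': [w['id'] for w in weibos if w['video_url'] != '无'],
    }


print(summarize(json.loads(RESULT_JSON)))
```

To work on a real result file, the string would be replaced by the contents of the json file, opened with `encoding='utf-8'`.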

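## Downloaded Images

*img folder*

![](https://github.com/dataabc/media/blob/master/weiboSpider/images/img.png)

This run downloaded 793 images totalling 1.21 GB, comprising the images in her original posts and the images in the retweet comments of her retweets. Images are named yyyymmdd + post id; if a post has several images, the name also includes the image's index within the post. If an image fails to download, for network or other reasons, the program writes the post id and image URL to not_downloaded.txt in the same folder, in the form "weibo_id:pic_url".

## Downloaded Videos

*video folder*

![](https://github.com/dataabc/media/blob/master/weiboSpider/images/video.png)

This run downloaded 70 videos, all from her original posts, named yyyymmdd + post id. One video failed to download because of a network problem, and the program wrote its post id and video URL to not_downloaded.txt in the same folder, in the form "weibo_id:video_url".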
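
Because a URL itself contains colons, a line of not_downloaded.txt has to be split at the first colon only. A minimal sketch, assuming one "weibo_id:url" entry per line; `parse_not_downloaded` is a hypothetical helper and the sample line is made up.

```python
def parse_not_downloaded(line):
    """Split a 'weibo_id:url' line at the first colon only."""
    weibo_id, url = line.strip().split(':', 1)
    return weibo_id, url


print(parse_not_downloaded('IonM9ryMy:http://example.com/a.jpg\n'))
```

The pairs recovered this way could then be fed to a retry of the failed downloads.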

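Because I do not have MySQL or MongoDB installed locally, database output is turned off for now. To write the results to a database, first install the database (MySQL or MongoDB), then install the matching package (pymysql or pymongo), and then set mysql_write or mongodb_write to 1. Writing to MySQL needs a username, password, and other settings; see the [Database settings](https://github.com/dataabc/weiboSpider/blob/master/docs/settings.md#设置数据库可选) section for how to configure them.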


================================================
FILE: docs/known_issues.md
================================================
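# Known Issues

This document lists known problems that stem from limitations of the technology this project uses and that cannot be fixed, or are hard to fix, in the short term.

## 1. The program cannot crawl posts that contain both images and videos

See: https://github.com/dataabc/weiboSpider/issues/668

The specific cause is as follows:

The crawler works through the mobile version of Weibo (weibo.cn rather than weibo.com), because the mobile version has a simpler structure.

A post that contains both images and a video is displayed like this in the mobile version: https://weibo.cn/3384824824/Q11YMrQtB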

```
#PS5##合金装备# [查看全部图片/视频]   08月22日 15:07  关注他
```

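Here `[查看全部图片/视频]` ("view all pictures/videos") is nothing more than a link to the desktop version of Weibo; the mobile page itself carries no data that can be crawled directly: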

```
<a href="https://weibo.com/3384824824/Q11YMrQtB">查看全部图片/视频</a>
```
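
A crawler could at least detect such weibos by looking for that link. The sketch below is an illustration only and is not part of the project: the name `desktop_only_url` is hypothetical, and the pattern assumes the link markup looks exactly like the fragment shown above.

```python
import re

# Matches the desktop-only link; the link text is taken verbatim from the
# mobile page fragment.
DESKTOP_ONLY_LINK = re.compile(
    r'<a href="(https://weibo\.com/[^"]+)">查看全部图片/视频</a>')


def desktop_only_url(html):
    """Return the weibo.com URL if the fragment has the desktop-only link."""
    match = DESKTOP_ONLY_LINK.search(html)
    return match.group(1) if match else None
```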


================================================
FILE: docs/settings.md
================================================
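# Program settings

Users who **installed from source** should run the following command in the weiboSpider directory; users who **installed with pip** can run it in any directory they have write permission for: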

```bash
$ python3 -m weibo_spider
```

The first run generates a **config.json** file. Open it and you will see the following content:

```json
{
    "user_id_list": ["1669879400"],
    "filter": 1,
    "since_date": "2018-01-01",
    "end_date": "now",
    "random_wait_pages": [1, 5],
    "random_wait_seconds": [6, 10],
    "global_wait": [[1000, 3600], [500, 2000]],    
    "write_mode": ["csv", "txt"],
    "pic_download": 1,
    "video_download": 1,
    "result_dir_name": 0,
    "cookie": "your cookie",
    "mysql_config": {
        "host": "localhost",
        "port": 3306,
        "user": "root",
        "password": "123456",
        "charset": "utf8mb4"
    },
    "sqlite_config": "weibo.db"
}
```

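The sections below explain what each parameter means and how to set it.

## Setting user_id_list

user_id_list holds the ids of the Weibo users to crawl. It can contain one id or several, for example: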

```json
"user_id_list": ["1223178222", "1669879400", "1729370543"],
```

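The line above means we crawl, one after another, the weibos of the three users whose user_ids are "1223178222", "1669879400", and "1729370543". For how to find a user_id, see [How to get a user_id](https://github.com/dataabc/weiboSpider/blob/master/docs/userid.md).

The value of user_id_list can also be a file path: write the user_ids of all the Weibo users you want to crawl into a txt file and assign the file's path to user_id_list. **This is the recommended approach.**

In the txt file, each user_id takes one line. You may follow the user_id with an optional comment, such as the user's nickname; the user_id and the comment must be separated by whitespace. The file can have any name, must be a txt file, and lives in the same directory as the program. Example file content: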

```text
1223178222 胡歌
1669879400 迪丽热巴
1729370543 郭碧婷
```

If the file is named user_id_list.txt, set user_id_list as follows:

```json
"user_id_list": "user_id_list.txt",
```
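
Reading such a file can be sketched as below. This is a minimal sketch under stated assumptions, not the project's own loader: the name `parse_user_id_lines` is hypothetical, and everything after the first whitespace is treated as a free-form comment.

```python
def parse_user_id_lines(lines):
    """Return (user_id, comment) pairs, one per non-blank line."""
    users = []
    for line in lines:
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        user_id = parts[0]
        comment = parts[1].strip() if len(parts) > 1 else ""
        users.append((user_id, comment))
    return users
```

With a real file, pass the open file object, for example `parse_user_id_lines(open("user_id_list.txt", encoding="utf-8"))`.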

## Setting filter

filter controls the crawl scope: 1 crawls all original weibos, and 0 crawls all weibos (original + reposts). For example, to crawl all original weibos, use:

```json
"filter": 1,
```

## Setting since_date

since_date can be a date or an integer. A date means crawling weibos published from that date onward, and must be in "yyyy-mm-dd" format, for example:

```json
"since_date": "2018-01-01",
```

This crawls weibos from January 1, 2018 up to now.

An integer means crawling weibos from the last n days, for example:

```json
"since_date": 10,
```

This crawls weibos from the last 10 days. More precisely, it crawls weibos published between **10 days ago and the moment the program starts running**.
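
The two accepted forms of since_date can be normalised to a single start date. The sketch below shows one way to do that; `resolve_since_date` is a hypothetical helper, and the project's own conversion may differ in detail.

```python
from datetime import date, datetime, timedelta


def resolve_since_date(since_date, today=None):
    """Turn a 'yyyy-mm-dd' string or an integer day count into a date."""
    today = today or date.today()
    if isinstance(since_date, int):
        # An integer n means "n days before the program starts".
        return today - timedelta(days=since_date)
    return datetime.strptime(since_date, "%Y-%m-%d").date()
```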

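**since_date is the crawl start time for every user, which is inflexible. If you are crawling several users and want a separate since_date for each, use method 2 in [Crawling weibos automatically on a schedule](https://github.com/dataabc/weiboSpider/blob/master/docs/automation.md), which lets you set a different since_date per user.**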

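## Setting end_date

end_date can be a date or "now". A date means crawling weibos published up to that date, in "yyyy-mm-dd" format; "now" means crawling weibos published from since_date up to the present. Together, since_date and end_date select the weibos published between the two dates, boundaries included. since_date is the start date and end_date is the end date, so end_date should be later than since_date. Note that since_date can be set either through the since_date parameter in config.json or through user_id_list.txt, whereas end_date can only be set through the end_date parameter in config.json: it is global, and every user_id shares the same end_date.

**"now" is the recommended end_date value.** With "now", the results are correct and stable. With any other value, the program may become unstable when crawling accounts with a very large number of weibos and return many empty pages, and it cannot fetch the videos in weibos. If you need videos, set end_date to "now".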
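
The inclusive date window can be expressed as a simple comparison. This is a sketch only: `in_crawl_range` and `to_date` are hypothetical names, and both boundaries are assumed to be given as "yyyy-mm-dd" strings (or "now" for end_date).

```python
from datetime import date, datetime


def to_date(text):
    """Parse the leading yyyy-mm-dd part of a date or date-time string."""
    return datetime.strptime(text[:10], "%Y-%m-%d").date()


def in_crawl_range(publish_time, since_date, end_date, today=None):
    """True if publish_time falls in [since_date, end_date], inclusive."""
    end = (today or date.today()) if end_date == "now" else to_date(end_date)
    return to_date(since_date) <= to_date(publish_time) <= end
```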

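## Setting random_wait_pages

random_wait_pages is a list of two integers. The program pauses once every x pages, where x is an integer drawn at random between the two values. The default [1, 5] means a pause after every 1 to 5 pages. If the program gets rate-limited, pause more often by lowering the values in random_wait_pages.

## Setting random_wait_seconds

random_wait_seconds is a list of two integers. Each pause sleeps for x seconds, where x is an integer drawn at random between the two values. The default [6, 10] means each pause lasts 6 to 10 seconds. If the program gets rate-limited, wait longer by raising the values in random_wait_seconds.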
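
How the two settings interact can be sketched by planning the pauses for a crawl up front. This is an illustration, not the project's implementation: `plan_pauses` is a hypothetical helper that returns the schedule instead of actually sleeping, and the real program may count pages slightly differently.

```python
import random


def plan_pauses(total_pages, random_wait_pages, random_wait_seconds, rng=random):
    """Return (page, seconds) pairs: after which page to pause, and for how long."""
    plan = []
    next_pause = rng.randint(*random_wait_pages)
    for page in range(1, total_pages + 1):
        if page == next_pause:
            plan.append((page, rng.randint(*random_wait_seconds)))
            # Draw a fresh gap for the next pause.
            next_pause = page + rng.randint(*random_wait_pages)
    return plan
```

In a real crawl loop, each planned pause would become a `time.sleep(seconds)` call after the given page.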

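## Setting global_wait

global_wait controls the global waits. The default [[1000, 3600], [500, 2000]] means that after fetching 1000 pages the program pauses once for 3600 seconds; after another 500 pages it pauses once for 2000 seconds; after a further 1000 pages it pauses for 3600 seconds again, and so on. The default contains only those two entries ([1000, 3600] and [500, 2000]), but you can configure more, for example [[1000, 3600], [500, 3000], [700, 3600]]. The program applies the entries in order and, once all of them have been used, starts over from the first one.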
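
The cycling behaviour described above can be sketched with `itertools.cycle`. The function name `global_pauses` is hypothetical and the sketch only computes the schedule; it does not reproduce the project's actual bookkeeping.

```python
from itertools import cycle


def global_pauses(global_wait, total_pages):
    """Return (page, seconds) pairs for every global pause within total_pages."""
    if any(pages <= 0 for pages, _ in global_wait):
        raise ValueError("page counts in global_wait must be positive")
    pauses = []
    page = 0
    # cycle() restarts from the first entry once every entry has been used.
    for pages_before_pause, seconds in cycle(global_wait):
        page += pages_before_pause
        if page > total_pages:
            break
        pauses.append((page, seconds))
    return pauses
```

With the default setting and a 3000-page crawl, the pauses fall after pages 1000, 1500, 2500, and 3000.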

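## Setting write_mode

write_mode controls the output format. Valid values are csv, txt, json, mongo, mysql, and sqlite, which write the results to a csv file, a txt file, a json file, MongoDB, MySQL, and SQLite respectively. write_mode can contain one or more of these values, for example: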

```json
"write_mode": ["csv", "txt"],
```

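This writes the results to a csv file and a txt file. Note that to write to a database, besides adding the database's name to write_mode you must also install the database and the matching Python module; see the [Database setup](https://github.com/dataabc/weiboSpider/blob/master/docs/settings.md#设置数据库可选) section for details.

## Setting pic_download

pic_download controls whether the pictures in weibos are downloaded: 1 downloads them and 0 does not, for example: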

```json
"pic_download": 1,
```

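This downloads the pictures in weibos.

## Setting video_download

video_download controls whether the videos in weibos are downloaded: 1 downloads them and 0 does not, for example: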

```json
"video_download": 1,
```

This downloads the videos in weibos.

## Setting result_dir_name

result_dir_name controls how the result directory is named. Valid values are 0 and 1, and the default is 0:

```json
"result_dir_name": 0,
```

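0 saves the result files in a folder named after the user's nickname, which makes the results easier to read. 1 saves them in a folder named after the user id, which keeps repeated crawls consistent, because a nickname can change while a user id cannot.

## Setting cookie

Follow [How to get a cookie](https://github.com/dataabc/weiboSpider/blob/master/docs/cookie.md) to obtain a cookie, then replace "your cookie" with the real cookie value.

## Setting mysql_config (optional)

mysql_config holds the MySQL connection settings. If you do not need to write results to MySQL, you can ignore this parameter; deleting it or leaving it in place makes no difference. If you do write to MySQL and the mysql_config values in config.json differ from your MySQL setup, change them to match your own MySQL settings.

## Setting sqlite_config (optional)

sqlite_config holds the SQLite setting: the path where the SQLite database file is saved. Change it as needed.

## Database setup (optional)

This section is optional; skip it if you do not need to write crawl results to a database. The program currently supports MySQL and MongoDB. To write to another database, you can write your own code modelled on the code for these two.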

## Writing to MySQL

To write crawl results to MySQL, install MySQL for your system, then run on the command line:

```bash
$ pip install pymysql
```

## Writing to MongoDB

To write crawl results to MongoDB, install MongoDB for your system, then run on the command line:

```bash
$ pip install pymongo
```
connection_string is a standard MongoDB URI:
```text
mongodb://[username:password@]host1[:port1][,...hostN[:portN]][/[defaultauthdb][?options]]
```

dba_name and dba_password correspond to username and password in the URI. Leave them empty if access is not restricted.
Example without access restrictions:
```json
"connection_string": "mongodb://localhost:27017/weibo",
```
Example with a user name and password:
```json
"connection_string": "mongodb://admin:password@localhost:27017/weibo",
"dba_name": "",
"dba_password": "",
```
or
```json
"connection_string": "mongodb://localhost:27017/weibo",
"dba_name": "admin",
"dba_password": "password",
```

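MySQL and MongoDB store the same content. The program first creates a database named "weibo", then creates a "user" table and a "weibo" table, which together hold everything that was crawled. Crawled **user information** is inserted or updated in the user table, and crawled **weibo information** is inserted or updated in the weibo table; the two tables are linked by user_id. To see the fields of each table, click "Details".

<details>

<summary>Details</summary>

- **user table**
- **id**: the user id, e.g. "1669879400";
- **nickname**: the user's nickname, e.g. "Dear-迪丽热巴";
- **gender**: the user's gender;
- **location**: the user's location;
- **birthday**: the user's date of birth;
- **description**: the user's bio;
- **verified_reason**: the user's verification info;
- **talent**: the user's tags;
- **education**: the user's education history;
- **work**: the user's work history;
- **weibo_num**: the number of weibos;
- **following**: the number of accounts followed;
- **followers**: the number of followers.

***

- **weibo table**
- **id**: the weibo id;
- **user_id**: the user id of the weibo's author, e.g. "1669879400";
- **content**: the weibo text;
- **article_url**: the URL of the headline article in the weibo; '' if the weibo has no headline article;
- **original_pictures**: the original picture URLs of an original weibo, or the picture URLs in the repost reason of a reposted weibo. If a weibo has several pictures, the URLs are stored separated by ASCII commas; if it has none, the value is "无" (none);
- **retweet_pictures**: the original picture URLs of the reposted weibo. The value is "无" when the weibo is an original weibo or a repost without pictures; otherwise it holds the picture URLs of the reposted weibo, separated by ASCII commas if there are several;
- **publish_place**: where the weibo was published; "无" if the weibo has no location information;
- **publish_time**: when the weibo was published;
- **up_num**: the number of likes the weibo received;
- **retweet_num**: the number of reposts the weibo received;
- **comment_num**: the number of comments the weibo received;
- **publish_tool**: the tool the weibo was published with.

</details>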

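## Setting up API POST integration (optional)

This section is optional; skip it if you do not need to send crawl results to an API endpoint via POST requests.

Requests are sent with `content-type: application/json`. The endpoint's response must also be `content-type: application/json`, with HTTP status code `200`.

The request body matches the `json` output format configured through `write_mode`: it is the JSON for a whole page of fetched data, and one POST is sent per page.

`api_url` is the address of the API endpoint.

`api_token` is the authentication token for the endpoint; it is added to the request headers as the `api-token` field. Configure it as needed.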
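
The request described above can be sketched with the standard library. This is an illustration only: `build_api_request` is a hypothetical helper, the example URL and payload are made up, and the project itself may build the request differently.

```python
import json
import urllib.request


def build_api_request(api_url, api_token, page_payload):
    """Build (but do not send) the POST request for one page of results."""
    body = json.dumps(page_payload, ensure_ascii=False).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_token:
        # The token travels in the api-token request header.
        headers["api-token"] = api_token
    return urllib.request.Request(
        api_url, data=body, headers=headers, method="POST")
```

Sending it is then a matter of `urllib.request.urlopen(request)` and checking that the response status is 200.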

================================================
FILE: docs/userid.md
================================================
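## How to get a user_id

1. Open <https://weibo.cn> and search for the person you are looking for, e.g. "迪丽热巴", then go to her home page;
   ![user home](https://github.com/dataabc/media/blob/master/weiboSpider/images/user_home.png)
2. Following the arrow in the image above, click the "资料" (profile) link to go to the user's profile page;
   ![user info](https://github.com/dataabc/media/blob/master/weiboSpider/images/user_info.png)

As the image above shows, the address of 迪丽热巴's profile page is "<https://weibo.cn/1669879400/info>", and the "1669879400" in it is this account's user_id.

The user_id also appears in the address of the user's home page (<https://weibo.cn/u/1669879400?f=search_0>). We still click "资料" on the home page to get it because many users' home pages are not of the form "<https://weibo.cn/user_id?f=search_0>" but of the form "<https://weibo.cn/个性域名?f=search_0>" (a custom domain) or "<https://weibo.cn/微号?f=search_0>" (a "微号", or micro number). Both a "微号" and a user_id are strings of digits, so if you extract the user_id from the home page address alone, it is easy to mistake a "微号" for a user_id.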
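
Pulling the user_id out of a profile page address can be sketched with a regular expression. `user_id_from_info_url` is a hypothetical helper, and the pattern assumes the address has exactly the `https://weibo.cn/<digits>/info` shape shown above; home page addresses are deliberately rejected because the digits there may be a "微号".

```python
import re

INFO_URL = re.compile(r"^https?://weibo\.cn/(\d+)/info/?(?:\?.*)?$")


def user_id_from_info_url(url):
    """Return the user_id from a profile ('info') page URL, or None."""
    match = INFO_URL.match(url.strip())
    return match.group(1) if match else None
```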

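The steps above yield a single user_id. To obtain a **large number** of user_ids, see the [How to get many user_ids](#how-to-get-many-user_ids) section.

## How to get many user_ids

The first section of this document yields a single user_id. <https://github.com/dataabc/weibo-follow> can take that user_id and fetch the user_ids of the accounts that user follows; one user_id yields at most 200 user_ids, which are written to user_id_list.txt. That program can read the file back in, so those 200 user_ids yield at most 200 × 200 = 40,000 user_ids, those 40,000 yield at most 40,000 × 200 = 8,000,000, and so on, producing a large number of user_ids. This project can read the file too: assign the path of the user_id_list.txt produced by that program to the user_id_list parameter in this project's config.json to crawl the weibos published by all of those users.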
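
The growth is plain exponentiation, as the short sketch below shows. `max_user_ids` is a hypothetical name, and the result is an upper bound only: in practice many accounts follow fewer than 200 users and follow lists overlap.

```python
def max_user_ids(rounds, per_user=200):
    """Upper bound on user_ids reachable after the given number of rounds."""
    return per_user ** rounds


for rounds in (1, 2, 3):
    print(rounds, max_user_ids(rounds))
```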


================================================
FILE: requirements.txt
================================================
lxml
requests==2.32.4
tqdm==4.66.3
absl-py==0.12.0
browser_cookie3==0.20.1
aiohttp

================================================
FILE: setup.py
================================================
import setuptools

with open('README.md', 'r', encoding='utf-8') as fh:
    long_description = fh.read()

setuptools.setup(
    name='weibo-spider',
    version='0.2.8',
    author='Chen Lei',
    author_email='chillychen1991@gmail.com',
    description='Sina Weibo spider: crawl Sina Weibo data with Python.',
    long_description=long_description,
    long_description_content_type='text/markdown',
    url='https://github.com/dataabc/weiboSpider',
    packages=setuptools.find_packages(),
    package_data={'weibo_spider': ['config_sample.json', 'logging.conf']},
    classifiers=[
        'Programming Language :: Python :: 3',
        'Operating System :: OS Independent',
    ],
    install_requires=[
        'absl-py',
        'lxml',
        'requests',
        'tqdm',
    ],
    python_requires='>=3.6',
)


================================================
FILE: tests/__init__.py
================================================


================================================
FILE: tests/test_downloader_async.py
================================================
import asyncio
import unittest
from unittest.mock import MagicMock, AsyncMock, patch
import os
import shutil

from weibo_spider.downloader.downloader import Downloader
from weibo_spider.downloader.img_downloader import ImgDownloader

class MockWeibo:
    def __init__(self):
        self.id = '12345'
        self.publish_time = '2023-10-27 10:00:00'
        self.media = {}
        self.original_pictures = 'http://example.com/pic.jpg'

class TestDownloaderAsync(unittest.TestCase):
    def setUp(self):
        self.test_dir = 'tests/tmp_downloader'
        if not os.path.exists(self.test_dir):
            os.makedirs(self.test_dir)

    def tearDown(self):
        if os.path.exists(self.test_dir):
            shutil.rmtree(self.test_dir)

    def test_img_downloader(self):
        async def run_test():
            downloader = ImgDownloader(self.test_dir, [1, 1, 1])
            downloader.key = 'original_pictures' # Set key explicitly
            weibo = MockWeibo()
            
            mock_session = MagicMock()
            mock_response = AsyncMock()
            mock_response.status = 200
            mock_response.read.return_value = b'fake_image_content'
            
            # Mock session.get to return an async context manager
            mock_context = AsyncMock()
            mock_context.__aenter__.return_value = mock_response
            mock_context.__aexit__.return_value = None
            mock_session.get.return_value = mock_context

            # Patch asyncio.sleep to speed up tests
            with patch('asyncio.sleep', AsyncMock()):
                # Test download_files
                await downloader.download_files([weibo], mock_session)
            
            # Check if file exists
            expected_file = os.path.join(self.test_dir, '图片', '20231027_12345.jpg')
            self.assertTrue(os.path.exists(expected_file), f"File {expected_file} does not exist")
            
            # Check content
            with open(expected_file, 'rb') as f:
                content = f.read()
                self.assertEqual(content, b'fake_image_content')

        asyncio.run(run_test())

if __name__ == '__main__':
    unittest.main()


================================================
FILE: tests/test_parser/__init__.py
================================================


================================================
FILE: tests/test_parser/test_album_parser.py
================================================
from unittest.mock import patch

from .util import mock_request_get_content
from weibo_spider.parser.album_parser import AlbumParser


@patch('requests.get', mock_request_get_content)
def test_album_parser():
    album_parser = AlbumParser(
        cookie="",
        album_url="https://weibo.cn/album/166564740000001980768563?rl=1")

    pic_urls = album_parser.extract_pic_urls()
    assert (len(pic_urls) == 4)
    assert (pic_urls == [
        'http://wx1.sinaimg.cn/wap180/76102133ly8ga961tpte6j20u00u0q65.jpg',
        'http://wx2.sinaimg.cn/wap180/76102133ly8fwr33wpn8fj20v90v9tbw.jpg',
        'http://wx4.sinaimg.cn/wap180/76102133ly8fvlyn5n52gj20v90v949a.jpg',
        'http://wx2.sinaimg.cn/wap180/76102133ly8fk0btnrn5zj20dp0e8q3t.jpg'
    ])


================================================
FILE: tests/test_parser/test_comment_parser.py
================================================
from unittest.mock import patch

from .util import mock_request_get_content
from weibo_spider.parser.comment_parser import CommentParser


@patch('requests.get', mock_request_get_content)
def test_comment_parser():
    comment_parser = CommentParser(cookie="", weibo_id="J5cVGuUNq")

    long_weibo = comment_parser.get_long_weibo()
    long_retweet = comment_parser.get_long_retweet()

    assert (
        long_retweet == """去年和亲善大使热巴@Dear-迪丽热巴 的特别回忆。"""
        """我们在藏北羌塘一起爬山,探访藏羚羊、雪豹、黑颈鹤的栖息地,感受野生动物保护工作的点滴。"""
        """此时此刻,我们比以往更加重视与自然相处的方式,我们也从未如此迫切需要将想法付诸行动。"""
        """热巴已经和我们@北京绿色阳光 站在一起,希望看完视频的你们,也能获得同样感受与动力。\n"""
        """We Stand for Wildlife. \n"""
        """明日朝阳68309的优酷视频""")
    assert (
        long_weibo == """去年和亲善大使热巴@Dear-迪丽热巴 的特别回忆。"""
        """我们在藏北羌塘一起爬山,探访藏羚羊、雪豹、黑颈鹤的栖息地,感受野生动物保护工作的点滴。"""
        """此时此刻,我们比以往更加重视与自然相处的方式,我们也从未如此迫切需要将想法付诸行动。"""
        """热巴已经和我们@北京绿色阳光 站在一起,希望看完视频的你们,也能获得同样感受与动力。\n"""
        """We Stand for Wildlife. \n"""
        """明日朝阳68309的优酷视频""")


================================================
FILE: tests/test_parser/test_index_parser.py
================================================
from unittest.mock import patch

from .util import mock_request_get_content
from weibo_spider.parser.index_parser import IndexParser


@patch('requests.get', mock_request_get_content)
def test_index_parser():
    index_parser = IndexParser(cookie="", user_uri="1669879400")
    assert (index_parser.get_page_num() == 117)
    assert (str(index_parser.get_user()) == """用户昵称: Dear-迪丽热巴\n"""
            """用户id: 1669879400\n"""
            """微博数: 1159\n"""
            """关注数: 253\n"""
            """粉丝数: 70805574\n""")


================================================
FILE: tests/test_parser/test_info_parser.py
================================================
from unittest.mock import patch

from .util import mock_request_get_content
from weibo_spider.parser.info_parser import InfoParser


@patch('requests.get', mock_request_get_content)
def test_info_parser():
    info_parser = InfoParser(cookie="", user_id="1669879400")
    user = info_parser.extract_user_info()
    # With info_parser, we can only get the nickname.
    assert (user.nickname == "Dear-迪丽热巴")


================================================
FILE: tests/test_parser/test_mblog_picAll_parser.py
================================================
from unittest.mock import patch

from .util import mock_request_get_content
from weibo_spider.parser.mblog_picAll_parser import MblogPicAllParser


@patch('requests.get', mock_request_get_content)
def test_mblog_picAll_parser():
    mblog_picAll_parser = MblogPicAllParser(cookie="", weibo_id="J5ZcSnCAg")
    preview_picture_list = mblog_picAll_parser.extract_preview_picture_list()
    # The fixture page for this weibo lists 18 preview pictures.
    assert (len(preview_picture_list) == 18)
    assert (
        preview_picture_list[0] ==
        'http://ww3.sinaimg.cn/thumb180/63885668ly1gfn5qz5m1yj20u0140472.jpg')


================================================
FILE: tests/test_parser/test_page_parser.py
================================================
from unittest.mock import patch

from weibo_spider.parser.page_parser import PageParser

from .util import mock_request_get_content


@patch('requests.get', mock_request_get_content)
def test_page_parser():
    user_config = {
        'user_uri': '1669879400',
        'since_date': '2020-06-01',
        'end_date': 'now'
    }
    page_parser = PageParser(cookie="",
                             user_config=user_config,
                             page=2,
                             filter=True)
    weibos, weibo_id_list, to_continue = page_parser.get_one_page([])
    assert (weibo_id_list == ['J4PGk4yMw', 'J4EUStJKu'])
    assert (len(weibos) == 2)
    assert (str(weibos[0]) == """生日动态 \xa0\n"""
            """微博发布位置:无\n"""
            """发布时间:2020-06-03 00:00\n"""
            """发布工具:生日动态\n"""
            """点赞数:1499675\n"""
            """转发数:1000000\n"""
            """评论数:1000000\n"""
            """url:https://weibo.cn/comment/J4PGk4yMw\n""")
    assert (str(weibos[1]) ==
            """#微博剧场# #周放设计淡黄的长裙# 这是一幅有声音的手稿#幸福触手可及# 绿洲 \xa0原图\xa0\n"""
            """微博发布位置:无\n"""
            """发布时间:2020-06-01 20:35\n"""
            """发布工具:绿洲APP\n"""
            """点赞数:419181\n"""
            """转发数:1000000\n"""
            """评论数:1000000\n"""
            """url:https://weibo.cn/comment/J4EUStJKu\n""")


================================================
FILE: tests/test_parser/test_photo_parser.py
================================================
from unittest.mock import patch

from weibo_spider.parser.photo_parser import PhotoParser

from .util import mock_request_get_content


@patch('requests.get', mock_request_get_content)
def test_photo_parser():
    photo_parser = PhotoParser(cookie="", user_id=1980768563)

    avatar_album_url = photo_parser.extract_avatar_album_url()
    assert (avatar_album_url ==
            "https://weibo.cn/album/166564740000001980768563?rl=1")


================================================
FILE: tests/test_parser/util.py
================================================
import json
import os
from unittest.mock import Mock

from weibo_spider.parser.util import TEST_DATA_DIR, URL_MAP_FILE


def mock_request_get_content(url, headers):
    with open(os.path.join(TEST_DATA_DIR, URL_MAP_FILE)) as f:
        url_map = json.loads(f.read())
    resp_file = url_map[url]
    mock = Mock()
    with open(resp_file, "rb") as f:
        mock.content = f.read()
    return mock


================================================
FILE: tests/testdata/2f62165fa3ca1e85e0d398d385c377a068b76eb95765f7020ffffd3e.html
================================================
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta http-equiv="Cache-Control" content="no-cache"/><meta id="viewport" name="viewport" content="width=device-width,initial-scale=1.0,minimum-scale=1.0, maximum-scale=2.0" /><link rel="icon" sizes="any" mask href="https://h5.sinaimg.cn/upload/2015/05/15/28/WeiboLogoCh.svg" color="black"><meta name="MobileOptimized" content="240"/><title>Dear-迪丽热巴的微博</title><style type="text/css" id="internalStyle">html,body,p,form,div,table,textarea,input,span,select{font-size:12px;word-wrap:break-word;}body{background:#F8F9F9;color:#000;padding:1px;margin:1px;}table,tr,td{border-width:0px;margin:0px;padding:0px;}form{margin:0px;padding:0px;border:0px;}textarea{border:1px solid #96c1e6}textarea{width:95%;}a,.tl{color:#2a5492;text-decoration:underline;}/*a:link {color:#023298}*/.k{color:#2a5492;text-decoration:underline;}.kt{color:#F00;}.ib{border:1px solid #C1C1C1;}.pm,.pmy{clear:both;background:#ffffff;color:#676566;border:1px solid #b1cee7;padding:3px;margin:2px 1px;overflow:hidden;}.pms{clear:both;background:#c8d9f3;color:#666666;padding:3px;margin:0 1px;overflow:hidden;}.pmst{margin-top: 5px;}.pmsl{clear:both;padding:3px;margin:0 1px;overflow:hidden;}.pmy{background:#DADADA;border:1px solid #F8F8F8;}.t{padding:0px;margin:0px;height:35px;}.b{background:#e3efff;text-align:center;color:#2a5492;clear:both;padding:4px;}.bl{color:#2a5492;}.n{clear:both;background:#436193;color:#FFF;padding:4px; margin: 1px;}.nt{color:#b9e7ff;}.nl{color:#FFF;text-decoration:none;}.nfw{clear:both;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.s{border-bottom:1px dotted #666666;margin:3px;clear:both;}.tip{clear:both; background:#c8d9f3;color:#676566;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.tip2{color:#000000;padding:2px 
3px;clear:both;}.ps{clear:both;background:#FFF;color:#676566;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.tm{background:#feffe5;border:1px solid #e6de8d;padding:4px;}.tm a{color:#ba8300;}.tmn{color:#f00}.tk{color:#ffffff}.tc{color:#63676A;}.c{padding:2px 5px;}.c div a img{border:1px solid #C1C1C1;}.ct{color:#9d9d9d;font-style:italic;}.cmt{color:#9d9d9d;}.ctt{color:#000;}.cc{color:#2a5492;}.nk{color:#2a5492;}.por {border: 1px solid #CCCCCC;height:50px;width:50px;}.me{color:#000000;background:#FEDFDF;padding:2px 5px;}.pa{padding:2px 4px;}.nm{margin:10px 5px;padding:2px;}.hm{padding:5px;background:#FFF;color:#63676A;}.u{margin:2px 1px;background:#ffffff;border:1px solid #b1cee7;}.ut{padding:2px 3px;}.cd{text-align:center;}.r{color:#F00;}.g{color:#0F0;}.bn{background: transparent;border: 0 none;text-align: left;padding-left: 0;}</style></head><body><div class="n" style="padding: 6px 4px;"><a href="https://weibo.cn/?tf=5_009" class="nl">首页<span class="tk">!</span></a>|<a href="https://weibo.cn/msg/?tf=5_010" class="nl">消息</a>|<a href="https://huati.weibo.cn" class="nl">话题</a>|<a href="https://weibo.cn/search/?tf=5_012" class="nl">搜索</a>|<a href="/1669879400?page=2&amp;rand=2195&amp;p=r" class="nl">刷新</a></div><div class="c tip"><a href="https://m.weibo.cn" id="top" class="tl">手机微博触屏版,点击前往>></a></div><div class="u"><div class="ut">Dear-迪丽热巴的微博&nbsp;<a href="/attention/add?uid=1669879400&amp;rl=0&amp;st=46d484">加关注</a></div><div class="tip2"><span class="tc">微博[1159]</span>&nbsp;<a href="/1669879400/follow">关注[253]</a>&nbsp;<a href="/1669879400/fans">粉丝[70805575]</a>&nbsp;<a href="/attgroup/opening?uid=1669879400">分组[1]</a>&nbsp;<a href="/at/weibo?uid=1669879400">@她的</a></div></div><div class="pmst"><span class="pms">&nbsp;微博&nbsp;</span><span class="pmsl">&nbsp;<a href="/1669879400/photo?tf=6_008">相册</a>&nbsp;</span></div><div class="pms" >全部-<a href="/1669879400?filter=1">原创</a>-<a href="/1669879400?filter=2">图片</a>-<a 
href="/attgroup/opening?uid=1669879400">分组</a>-<a href="/1669879400/search?f=u&amp;rl=0">筛选</a></div><div class="c" id="M_J4PGk4yMw"><div><span class="ctt"><a href="https://weibo.cn/sinaurl?f=w&amp;u=http%3A%2F%2Ft.cn%2FA62OI7FO&amp;ep=J4PGk4yMw%2C1669879400%2CJ4PGk4yMw%2C1669879400">生日动态</a> </span>&nbsp;<br /><a href="https://weibo.cn/attitude/J4PGk4yMw/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[1499675]</a>&nbsp;<a href="https://weibo.cn/repost/J4PGk4yMw?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J4PGk4yMw?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J4PGk4yMw?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">2020-06-03 00:00&nbsp;来自生日动态</span></div></div><div class="s"></div><div class="c" id="M_J4EUStJKu"><div><span class="ctt"><a href="https://weibo.cn/pages/100808topic?extparam=%E5%BE%AE%E5%8D%9A%E5%89%A7%E5%9C%BA&amp;from=feed">#微博剧场#</a> <a href="https://weibo.cn/pages/100808topic?extparam=%E5%91%A8%E6%94%BE%E8%AE%BE%E8%AE%A1%E6%B7%A1%E9%BB%84%E7%9A%84%E9%95%BF%E8%A3%99&amp;from=feed">#周放设计淡黄的长裙#</a> 这是一幅有声音的手稿<img alt="[喵喵]" src="//h5.sinaimg.cn/m/emoticon/icon/others/d_miao-61fe2a7aaa.png" style="width:1em; height:1em;" /></span><a href="https://weibo.cn/pages/100808topic?extparam=%E5%B9%B8%E7%A6%8F%E8%A7%A6%E6%89%8B%E5%8F%AF%E5%8F%8A&amp;from=feed">#幸福触手可及#</a> <a href="https://weibo.cn/sinaurl?f=w&amp;u=http%3A%2F%2Ft.cn%2FA62NoEfa&amp;ep=J4EUStJKu%2C1669879400%2CJ4EUStJKu%2C1669879400">绿洲</a> </span></div><div><a href="https://weibo.cn/mblog/pic/J4EUStJKu?rl=0"><img src="http://wx4.sinaimg.cn/wap180/63885668ly3gfd2h742y4j2116166kfe.jpg" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J4EUStJKu&amp;u=63885668ly3gfd2h742y4j2116166kfe">原图</a>&nbsp;<br /><a href="https://weibo.cn/attitude/J4EUStJKu/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[419181]</a>&nbsp;<a 
href="https://weibo.cn/repost/J4EUStJKu?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J4EUStJKu?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J4EUStJKu?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">2020-06-01 20:35&nbsp;来自绿洲APP</span></div></div><div class="s"></div><div class="c" id="M_J4rIrl3HH"><div><span class="ctt">炎炎夏日让每天的沐浴时光都变得尤其重要,精致的沙龙香相伴让沐浴也可以成为清新浪漫的享受!给大家<a href="/n/LUX%E5%8A%9B%E5%A3%AB">@LUX力士</a> 的沐浴小秘密分享,有力士植萃沐浴露,把沐浴变成“仪式感”!我的心选好物分享给你们啦 <img alt="[笑而不语]" src="//h5.sinaimg.cn/m/emoticon/icon/default/d_heiheihei-f7ca09d6e8.png" style="width:1em; height:1em;" /></span> <a href="https://m.weibo.cn/s/video/show?object_id=1034:4510423408640056&amp;fromWap=1">LUX力士的微博视频</a> </span>&nbsp;<br /><a href="https://weibo.cn/attitude/J4rIrl3HH/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[377578]</a>&nbsp;<a href="https://weibo.cn/repost/J4rIrl3HH?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J4rIrl3HH?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J4rIrl3HH?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">2020-05-31 10:59</span></div></div><div class="s"></div><div class="c" id="M_J4ls8e0QD"><div><span class="ctt"><a href="https://weibo.cn/pages/100808topic?extparam=idoltube&amp;from=feed">#idoltube#</a><a href="https://weibo.cn/pages/100808topic?extparam=%E5%91%A8%E6%94%BEvlog&amp;from=feed">#周放vlog#</a> 第二篇来啦!今天邀请大家走进生活,走进幸福的放放子一家~<img alt="[喵喵]" src="//h5.sinaimg.cn/m/emoticon/icon/others/d_miao-61fe2a7aaa.png" style="width:1em; height:1em;" /></span><a href="https://weibo.cn/pages/100808topic?extparam=%E5%B9%B8%E7%A6%8F%E8%A7%A6%E6%89%8B%E5%8F%AF%E5%8F%8A&amp;from=feed">#幸福触手可及#</a> <a href="https://m.weibo.cn/s/video/show?object_id=1034:4510357356740654&amp;fromWap=1">Dear-迪丽热巴的微博视频</a> </span>&nbsp;<br /><a 
href="https://weibo.cn/attitude/J4ls8e0QD/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[397970]</a>&nbsp;<a href="https://weibo.cn/repost/J4ls8e0QD?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J4ls8e0QD?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J4ls8e0QD?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">2020-05-30 19:02&nbsp;来自国产剧集 · 视频社区</span></div></div><div class="s"></div><div class="c" id="M_J4iKCc2N7"><div><span class="ctt"><a href="/n/%E6%B3%95%E5%9B%BD%E5%A8%87%E9%9F%B5%E8%AF%97">@法国娇韵诗</a> 收到宠爱了~小娇的618<a href="https://weibo.cn/pages/100808topic?extparam=%E5%A8%87%E5%AE%A0%E4%BD%A0%E6%9C%89%E4%B8%80%E5%A5%97&amp;from=feed">#娇宠你有一套#</a>,早晚护肤都靠它,超级喜欢这份宠爱!现在给全体爱丽丝们施法,希望你们都可以拥有这份让你变美的娇宠礼物哦~同款娇宠<a href="https://weibo.cn/sinaurl?f=w&amp;u=http%3A%2F%2Ft.cn%2FA62cgDJp&amp;ep=J4iKCc2N7%2C1669879400%2CJ4iKCc2N7%2C1669879400">http://t.cn/A62cgDJp</a>一起享用! </span>&nbsp;[<a href="https://weibo.cn/mblog/picAll/J4iKCc2N7?rl=1">组图共2张</a>]</div><div><a href="https://weibo.cn/mblog/pic/J4iKCc2N7?rl=0"><img src="http://wx4.sinaimg.cn/wap180/63885668ly1gfacma37xcj20zk0npwxn.jpg" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J4iKCc2N7&amp;u=63885668ly1gfacma37xcj20zk0npwxn">原图</a>&nbsp;<br /><a href="https://weibo.cn/attitude/J4iKCc2N7/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[646984]</a>&nbsp;<a href="https://weibo.cn/repost/J4iKCc2N7?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J4iKCc2N7?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J4iKCc2N7?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">2020-05-30 12:09</span></div></div><div class="s"></div><div class="c" id="M_J4au6chHI"><div><span class="ctt"><a href="https://weibo.cn/pages/100808topic?extparam=%E5%BE%AE%E5%8D%9A%E5%89%A7%E5%9C%BA&amp;from=feed">#微博剧场#</a> 我为4A景区代言,酷飒周放的追剧邀请,你来吗? 
<a href="https://weibo.cn/pages/100808topic?extparam=4A%E6%99%AF%E5%8C%BA%E8%A7%A6%E6%89%8B%E5%8F%AF%E5%8F%8A&amp;from=feed">#4A景区触手可及#</a> </span></div><div><a href="https://weibo.cn/mblog/pic/J4au6chHI?rl=0"><img src="http://wx4.sinaimg.cn/wap180/63885668ly1gf9c4lx836g20gi0ginpg.gif" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J4au6chHI&amp;u=63885668ly1gf9c4lx836g20gi0ginpg">原图</a>&nbsp;<br /><a href="https://weibo.cn/attitude/J4au6chHI/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[393006]</a>&nbsp;<a href="https://weibo.cn/repost/J4au6chHI?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J4au6chHI?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[424989]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J4au6chHI?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">2020-05-29 15:07</span></div></div><div class="s"></div><div class="c" id="M_J49dYce7d"><div><span class="ctt"><a href="/n/%E8%B7%AF%E6%98%93%E5%A8%81%E7%99%BB">@路易威登</a>  PONT 9 手袋 陪你摩登一夏<img alt="[嘻嘻]" src="//h5.sinaimg.cn/m/emoticon/icon/default/d_xixi-813ededea2.png" style="width:1em; height:1em;" /></span><a href="https://weibo.cn/pages/100808topic?extparam=LVPONT9&amp;from=feed">#LVPONT9#</a> </span>&nbsp;[<a href="https://weibo.cn/mblog/picAll/J49dYce7d?rl=1">组图共3张</a>]</div><div><a href="https://weibo.cn/mblog/pic/J49dYce7d?rl=0"><img src="http://wx2.sinaimg.cn/wap180/63885668ly1gf96jxvgqtj21jk223x6q.jpg" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J49dYce7d&amp;u=63885668ly1gf96jxvgqtj21jk223x6q">原图</a>&nbsp;<br /><a href="https://weibo.cn/attitude/J49dYce7d/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[750023]</a>&nbsp;<a href="https://weibo.cn/repost/J49dYce7d?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J49dYce7d?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J49dYce7d?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span 
class="ct">2020-05-29 11:54</span></div></div><div class="s"></div><div class="c" id="M_J3RvgnUuJ"><div><span class="ctt"><a href="https://weibo.cn/pages/100808topic?extparam=%E7%83%AD%E5%B7%B4%E6%89%8B%E7%A8%BF%E5%A1%AB%E8%89%B2%E5%A4%A7%E8%B5%9B&amp;from=feed">#热巴手稿填色大赛#</a>服装手稿填色游戏正式开启!图一出自迪迪子,图二出自放放子。迪迪子的面子就靠大家的后期填色了<img alt="[微笑]" src="//h5.sinaimg.cn/m/emoticon/icon/default/d_hehe-039d0a6a8a.png" style="width:1em; height:1em;" /></span> <a href="https://weibo.cn/sinaurl?f=w&amp;u=http%3A%2F%2Ft.cn%2FA62Ia4cl&amp;ep=J3RvgnUuJ%2C1669879400%2CJ3RvgnUuJ%2C1669879400">绿洲</a> </span>&nbsp;[<a href="https://weibo.cn/mblog/picAll/J3RvgnUuJ?rl=1">组图共2张</a>]</div><div><a href="https://weibo.cn/mblog/pic/J3RvgnUuJ?rl=0"><img src="http://wx1.sinaimg.cn/wap180/63885668ly3gf70bvty0jj216n1kw1kx.jpg" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J3RvgnUuJ&amp;u=63885668ly3gf70bvty0jj216n1kw1kx">原图</a>&nbsp;<br /><a href="https://weibo.cn/attitude/J3RvgnUuJ/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[733671]</a>&nbsp;<a href="https://weibo.cn/repost/J3RvgnUuJ?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J3RvgnUuJ?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J3RvgnUuJ?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">2020-05-27 14:48&nbsp;来自绿洲APP</span></div></div><div class="s"></div><div class="c" id="M_J3GG65557"><div><span class="cmt">转发了&nbsp;<a href="https://weibo.cn/happywhisper">护舒宝</a><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/5337.gif" alt="V"/><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/donate_btn_s.png" alt="M"/>&nbsp;的微博:</span><span class="ctt">还记得和宝宝陪着<a href="/n/Dear-%E8%BF%AA%E4%B8%BD%E7%83%AD%E5%B7%B4">@Dear-迪丽热巴</a> 走过的花路吗?谢谢阿丝们一直以来的陪伴<img alt="[太开心]" src="//h5.sinaimg.cn/m/emoticon/icon/default/d_taikaixin-97bd3f82d6.png" style="width:1em; height:1em;" /></span><img alt="[太开心]" 
src="//h5.sinaimg.cn/m/emoticon/icon/default/d_taikaixin-97bd3f82d6.png" style="width:1em; height:1em;" /></span>~为你甄选护舒宝天然纯棉卫生巾,给你透气亲肤的体验。现在上天猫超市购买,1套减25,第2套只要19.9。未来的花路,和宝宝一起用好物,守护热巴!<a href="https://weibo.cn/pages/100808topic?extparam=%E8%BF%AA%E4%B8%BD%E7%83%AD%E5%B7%B4%5B%E8%B6%85%E8%AF%9D%5D&amp;from=feed">#迪丽热巴[超话]#</a> </span></div><div><a href="https://weibo.cn/mblog/pic/J3GbU6T1r?rl=0"><img src="http://wx1.sinaimg.cn/wap180/6a8c26b5gy1gf5mc6hi0tj20u01hcwna.jpg" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J3GbU6T1r&amp;u=6a8c26b5gy1gf5mc6hi0tj20u01hcwna">原图</a>&nbsp;<span class="cmt">赞[43521]</span>&nbsp;<span class="cmt">原文转发[1000000]</span>&nbsp;<a href="https://weibo.cn/comment/J3GbU6T1r?rl=0#cmtfrm" class="cc">原文评论[13967]</a><!----></div><div><span class="cmt">转发理由:</span>谢谢<a href="/n/%E6%8A%A4%E8%88%92%E5%AE%9D">@护舒宝</a> 和阿丝们的守护,每一刻都非常有意义。未来请继续指教啦~&nbsp;&nbsp;<br /><a href="https://weibo.cn/attitude/J3GG65557/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[418834]</a>&nbsp;<a href="https://weibo.cn/repost/J3GG65557?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J3GG65557?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J3GG65557?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">2020-05-26 11:14</span></div></div><div class="s"></div><div class="c" id="M_J3B2NqFyZ"><div><span class="ctt"><a href="https://weibo.cn/pages/100808topic?extparam=idoltube&amp;from=feed">#idoltube#</a><a href="https://weibo.cn/pages/100808topic?extparam=%E5%91%A8%E6%94%BEvlog&amp;from=feed">#周放vlog#</a> 放放子的第一支搞事业篇vlog已上线~约vlog的朋友们可以放下你们的号码牌了<img alt="[可爱]" src="//h5.sinaimg.cn/m/emoticon/icon/default/d_keai-7a5bf88086.png" style="width:1em; height:1em;" /></span> <a href="https://weibo.cn/pages/100808topic?extparam=%E5%B9%B8%E7%A6%8F%E8%A7%A6%E6%89%8B%E5%8F%AF%E5%8F%8A&amp;from=feed">#幸福触手可及#</a> <a 
href="https://m.weibo.cn/s/video/show?object_id=1034:4508573473112072&amp;fromWap=1">Dear-迪丽热巴的微博视频</a> </span>&nbsp;<br /><a href="https://weibo.cn/attitude/J3B2NqFyZ/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[450541]</a>&nbsp;<a href="https://weibo.cn/repost/J3B2NqFyZ?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J3B2NqFyZ?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[216934]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J3B2NqFyZ?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">2020-05-25 20:53&nbsp;来自影视剪辑 · 视频社区</span></div></div><div class="s"></div><div class="pa" id="pagelist"><form action="/1669879400" method="post"><div><a href="/1669879400?page=3">下页</a>&nbsp;<a href="/1669879400?page=1">上页</a>&nbsp;<a href="/1669879400">首页</a>&nbsp;<input name="mp" type="hidden" value="117" /><input type="text" name="page" size="2" style='-wap-input-format: "*N"' /><input type="submit" value="跳页" />&nbsp;2/117页</div></form></div><div class="pm"><form action="/search/" method="post"><div><input type="text" name="keyword" value="" size="15" /><input type="submit" name="smblog" value="搜微博" /><input type="submit" name="suser" value="找人" /><br/><span class="pmf"><a href="/search/mblog/?keyword=%E7%A7%A6%E6%98%8A%E5%9B%A0%E5%A5%B3%E5%84%BF%E8%A2%AB%E6%AC%BA%E8%B4%9F%E8%90%BD%E6%B3%AA&amp;rl=0" class="k">秦昊因女儿被欺负落泪</a>&nbsp;<a href="/search/mblog/?keyword=%E5%94%90%E7%A6%B9%E5%93%B2%E6%89%BF%E8%AE%A4%E6%81%8B%E6%83%85&amp;rl=0" class="k">唐禹哲承认恋情</a>&nbsp;<a href="/search/mblog/?keyword=%E8%82%96%E6%88%98%E9%87%87%E8%AE%BF&amp;rl=0" class="k">肖战采访</a>&nbsp;<a href="/search/mblog/?keyword=%E8%9B%8B%E5%A3%B3%E5%85%AC%E5%AF%93CEO%E8%A2%AB%E8%B0%83%E6%9F%A5&amp;rl=0" class="k">蛋壳公寓CEO被调查</a>&nbsp;<a href="/search/mblog/?keyword=%E5%BC%A0%E6%98%8A&amp;rl=0" class="k">张昊</a></span></div></form></div><div class="cd"><a href="#top"><img src="https://h5.sinaimg.cn/upload/2017/04/27/319/5e990ec2.gif" alt="TOP"/></a></div><div class="pms"><a 
href="https://weibo.cn">首页<span class="tk">!</span></a>.<a href="https://weibo.cn/topic/240489">反馈</a>.<a href="https://weibo.cn/page/91">帮助</a>.<a  href="http://down.sina.cn/weibo/default/index/soft_id/1/mid/0"  >客户端</a>.<a href="https://weibo.cn/spam/?rl=0&amp;type=3&amp;fuid=1669879400" class="kt">举报</a>.<a href="https://passport.sina.cn/sso/logout?r=https%3A%2F%2Fweibo.cn%2Fpub%2F%3Fvt%3D&amp;entry=mweibo">退出</a></div><div class="c">设置:<a href="https://weibo.cn/account/customize/skin?tf=7_005&amp;st=46d484">皮肤</a>.<a href="https://weibo.cn/account/customize/pic?tf=7_006&amp;st=46d484">图片</a>.<a href="https://weibo.cn/account/customize/pagesize?tf=7_007&amp;st=46d484">条数</a>.<a href="https://weibo.cn/account/privacy/?tf=7_008&amp;st=46d484">隐私</a></div><div class="c">彩版|<a href="https://m.weibo.cn/?tf=7_010">触屏</a>|<a href="https://weibo.cn/page/521?tf=7_011">语音</a></div><div class="b">weibo.cn[06-19 00:47]</div></body></html>


================================================
FILE: tests/testdata/4957814af5a123b82e974b5537dea736dfb34e48d8835203a45d2e67.html
================================================
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta http-equiv="Cache-Control" content="no-cache"/><meta id="viewport" name="viewport" content="width=device-width,initial-scale=1.0,minimum-scale=1.0, maximum-scale=2.0" /><link rel="icon" sizes="any" mask href="https://h5.sinaimg.cn/upload/2015/05/15/28/WeiboLogoCh.svg" color="black"><meta name="MobileOptimized" content="240"/><title>Dear-迪丽热巴的微博</title><style type="text/css" id="internalStyle">html,body,p,form,div,table,textarea,input,span,select{font-size:12px;word-wrap:break-word;}body{background:#F8F9F9;color:#000;padding:1px;margin:1px;}table,tr,td{border-width:0px;margin:0px;padding:0px;}form{margin:0px;padding:0px;border:0px;}textarea{border:1px solid #96c1e6}textarea{width:95%;}a,.tl{color:#2a5492;text-decoration:underline;}/*a:link {color:#023298}*/.k{color:#2a5492;text-decoration:underline;}.kt{color:#F00;}.ib{border:1px solid #C1C1C1;}.pm,.pmy{clear:both;background:#ffffff;color:#676566;border:1px solid #b1cee7;padding:3px;margin:2px 1px;overflow:hidden;}.pms{clear:both;background:#c8d9f3;color:#666666;padding:3px;margin:0 1px;overflow:hidden;}.pmst{margin-top: 5px;}.pmsl{clear:both;padding:3px;margin:0 1px;overflow:hidden;}.pmy{background:#DADADA;border:1px solid #F8F8F8;}.t{padding:0px;margin:0px;height:35px;}.b{background:#e3efff;text-align:center;color:#2a5492;clear:both;padding:4px;}.bl{color:#2a5492;}.n{clear:both;background:#436193;color:#FFF;padding:4px; margin: 1px;}.nt{color:#b9e7ff;}.nl{color:#FFF;text-decoration:none;}.nfw{clear:both;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.s{border-bottom:1px dotted #666666;margin:3px;clear:both;}.tip{clear:both; background:#c8d9f3;color:#676566;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.tip2{color:#000000;padding:2px 
3px;clear:both;}.ps{clear:both;background:#FFF;color:#676566;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.tm{background:#feffe5;border:1px solid #e6de8d;padding:4px;}.tm a{color:#ba8300;}.tmn{color:#f00}.tk{color:#ffffff}.tc{color:#63676A;}.c{padding:2px 5px;}.c div a img{border:1px solid #C1C1C1;}.ct{color:#9d9d9d;font-style:italic;}.cmt{color:#9d9d9d;}.ctt{color:#000;}.cc{color:#2a5492;}.nk{color:#2a5492;}.por {border: 1px solid #CCCCCC;height:50px;width:50px;}.me{color:#000000;background:#FEDFDF;padding:2px 5px;}.pa{padding:2px 4px;}.nm{margin:10px 5px;padding:2px;}.hm{padding:5px;background:#FFF;color:#63676A;}.u{margin:2px 1px;background:#ffffff;border:1px solid #b1cee7;}.ut{padding:2px 3px;}.cd{text-align:center;}.r{color:#F00;}.g{color:#0F0;}.bn{background: transparent;border: 0 none;text-align: left;padding-left: 0;}</style></head><body><div class="n" style="padding: 6px 4px;"><a href="https://weibo.cn/?tf=5_009" class="nl">首页<span class="tk">!</span></a>|<a href="https://weibo.cn/msg/?tf=5_010" class="nl">消息</a>|<a href="https://huati.weibo.cn" class="nl">话题</a>|<a href="https://weibo.cn/search/?tf=5_012" class="nl">搜索</a>|<a href="/1669879400?page=1&amp;rand=5066&amp;p=r" class="nl">刷新</a></div><div class="c tip"><a href="https://m.weibo.cn" id="top" class="tl">手机微博触屏版,点击前往>></a></div><div class="u"><table><tr><td valign="top"><a href="/1669879400/avatar?rl=0"><img src="https://tvax1.sinaimg.cn/crop.0.0.1080.1080.50/63885668ly8geyrcrw0zjj20u00u0mz6.jpg?KID=imgbed,tva&Expires=1592509637&ssig=5Mmd1nGHJ2" alt="头像" class="por" /></a></td><td valign="top"><div class="ut"><span class="ctt">Dear-迪丽热巴<img src="https://h5.sinaimg.cn/upload/2016/05/26/319/5338.gif" alt="V"/><a href="http://vip.weibo.cn/?F=W_tq_zsbs_01"><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/donate_btn_s.png" alt="M"/></a>&nbsp;女/上海    &nbsp;    <a href="/attention/add?uid=1669879400&amp;rl=0&amp;st=46d484">加关注</a></span><br /><span class="ctt">认证:嘉行传媒签约演员 </span><br /><span 
class="ctt" style="word-break:break-all; width:50px;">一只喜欢默默表演的小透明。工作联系jaywalk@jaywalk.com.cn ...</span><br /><a href="/im/chat?uid=1669879400&amp;rl=0">私信</a>&nbsp;<a href="/1669879400/info">资料</a>&nbsp;<a href="/1669879400/operation?rl=0">操作</a>&nbsp;<a href="/attgroup/special?fuid=1669879400&amp;st=46d484">特别关注</a>&nbsp;<a href="http://new.vip.weibo.cn/vippay/payother?present=1&amp;action=comfirmTime&amp;uid=1669879400">送Ta会员</a></div></td></tr></table><div class="tip2"><span class="tc">微博[1159]</span>&nbsp;<a href="/1669879400/follow">关注[253]</a>&nbsp;<a href="/1669879400/fans">粉丝[70805573]</a>&nbsp;<a href="/attgroup/opening?uid=1669879400">分组[1]</a>&nbsp;<a href="/at/weibo?uid=1669879400">@她的</a></div></div><div class="pmst"><span class="pms">&nbsp;微博&nbsp;</span><span class="pmsl">&nbsp;<a href="/1669879400/photo?tf=6_008">相册</a>&nbsp;</span></div><div class="pms" >全部-<a href="/1669879400?filter=1">原创</a>-<a href="/1669879400?filter=2">图片</a>-<a href="/attgroup/opening?uid=1669879400">分组</a>-<a href="/1669879400/search?f=u&amp;rl=0">筛选</a></div><div class="c" id="M_J74OAhxtL"><div><span class="ctt"><a href="https://weibo.cn/pages/100808topic?extparam=%E4%B8%80%E8%B5%B7%E7%83%AD%E7%88%B1%E5%B0%B1%E7%8E%B0%E5%9C%A8&amp;from=feed">#一起热爱就现在#</a>给你们康康我眼前的画面<img alt="[嘻嘻]" src="//h5.sinaimg.cn/m/emoticon/icon/default/d_xixi-813ededea2.png" style="width:1em; height:1em;" /></span> <a href="https://weibo.cn/sinaurl?f=w&amp;u=http%3A%2F%2Ft.cn%2FA6Lftj1K&amp;ep=J74OAhxtL%2C1669879400%2CJ74OAhxtL%2C1669879400">绿洲</a> </span></div><div><a href="https://weibo.cn/mblog/pic/J74OAhxtL?rl=0"><img src="http://wx4.sinaimg.cn/wap180/63885668ly3gfvg8ubqjmj20u01407wh.jpg" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J74OAhxtL&amp;u=63885668ly3gfvg8ubqjmj20u01407wh">原图</a>&nbsp;<br /><a href="https://weibo.cn/attitude/J74OAhxtL/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[523801]</a>&nbsp;<a 
href="https://weibo.cn/repost/J74OAhxtL?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J74OAhxtL?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[393530]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J74OAhxtL?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">06月17日 18:12&nbsp;来自绿洲APP</span></div></div><div class="s"></div><div class="c" id="M_J6INEAyUV"><div><span class="ctt">刚收到我定制的亓那眼镜,猜猜定制了什么<img alt="[doge]" src="//h5.sinaimg.cn/m/emoticon/icon/others/d_doge-861403219c.png" style="width:1em; height:1em;" /></span>好奇?没关系,你们也可以拥有自己的定制眼镜。关注<a href="/n/QINA%E4%BA%93%E9%82%A3%E7%9C%BC%E9%95%9C">@QINA亓那眼镜</a> 解锁6月限定惊喜,<a href="https://weibo.cn/pages/100808topic?extparam=%E6%97%B6%E9%AB%A6%E5%AF%BB%E5%AE%9D%E8%AE%A1%E5%88%92&amp;from=feed">#时髦寻宝计划#</a> 线上线下都安排了<img alt="[偷笑]" src="//h5.sinaimg.cn/m/emoticon/icon/default/d_touxiao-15afb1c739.png" style="width:1em; height:1em;" /></span><a href="https://m.weibo.cn/s/video/show?object_id=1034:4514680232673315&amp;fromWap=1">QINA亓那眼镜的微博视频</a> </span>&nbsp;<br /><a href="https://weibo.cn/attitude/J6INEAyUV/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[415899]</a>&nbsp;<a href="https://weibo.cn/repost/J6INEAyUV?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J6INEAyUV?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J6INEAyUV?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">06月15日 10:09</span></div></div><div class="s"></div><div class="c" id="M_J6Divw6j7"><div><span class="ctt"><a href="https://weibo.cn/pages/100808topic?extparam=idoltube&amp;from=feed">#idoltube#</a><a href="https://weibo.cn/pages/100808topic?extparam=%E5%91%A8%E6%94%BEvlog&amp;from=feed">#周放vlog#</a> 什么?放放子还有两副面孔呢?<img alt="[喵喵]" src="//h5.sinaimg.cn/m/emoticon/icon/others/d_miao-61fe2a7aaa.png" style="width:1em; height:1em;" /></span> <a 
href="https://weibo.cn/pages/100808topic?extparam=%E5%B9%B8%E7%A6%8F%E8%A7%A6%E6%89%8B%E5%8F%AF%E5%8F%8A&amp;from=feed">#幸福触手可及#</a> <a href="https://m.weibo.cn/s/video/show?object_id=1034:4515809876181011&amp;fromWap=1">Dear-迪丽热巴的微博视频</a> </span>&nbsp;<br /><a href="https://weibo.cn/attitude/J6Divw6j7/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[318054]</a>&nbsp;<a href="https://weibo.cn/repost/J6Divw6j7?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J6Divw6j7?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[514546]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J6Divw6j7?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">06月14日 20:09&nbsp;来自影视剪辑 · 视频社区</span></div></div><div class="s"></div><div class="c" id="M_J6k49kbTc"><div><span class="ctt">7000<img alt="[耶]" src="//h5.sinaimg.cn/m/emoticon/icon/others/h_ye-256191c090.png" style="width:1em; height:1em;" /></span> <a href="https://weibo.cn/sinaurl?f=w&amp;u=http%3A%2F%2Ft.cn%2FA6Lzm5BB&amp;ep=J6k49kbTc%2C1669879400%2CJ6k49kbTc%2C1669879400">绿洲</a> </span>&nbsp;[<a href="https://weibo.cn/mblog/picAll/J6k49kbTc?rl=1">组图共2张</a>]</div><div><a href="https://weibo.cn/mblog/pic/J6k49kbTc?rl=0"><img src="http://wx1.sinaimg.cn/wap180/63885668ly3gfppuxsc3lj216n1kwqv5.jpg" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J6k49kbTc&amp;u=63885668ly3gfppuxsc3lj216n1kwqv5">原图</a>&nbsp;<br /><a href="https://weibo.cn/attitude/J6k49kbTc/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[1150265]</a>&nbsp;<a href="https://weibo.cn/repost/J6k49kbTc?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J6k49kbTc?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J6k49kbTc?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">06月12日 19:11&nbsp;来自绿洲APP</span></div></div><div class="s"></div><div class="c" id="M_J5ZcSnCAg"><div><span class="ctt">言出必行,说了18张就是18张,送给七千万的你们 ~ </span>&nbsp;[<a 
href="https://weibo.cn/mblog/picAll/J5ZcSnCAg?rl=1">组图共18张</a>]</div><div><a href="https://weibo.cn/mblog/pic/J5ZcSnCAg?rl=0"><img src="http://wx3.sinaimg.cn/wap180/63885668ly1gfn5qz5m1yj20u0140472.jpg" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J5ZcSnCAg&amp;u=63885668ly1gfn5qz5m1yj20u0140472">原图</a>&nbsp;<br /><a href="https://weibo.cn/attitude/J5ZcSnCAg/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[2511439]</a>&nbsp;<a href="https://weibo.cn/repost/J5ZcSnCAg?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J5ZcSnCAg?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J5ZcSnCAg?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">06月10日 14:05</span></div></div><div class="s"></div><div class="c" id="M_J5GPjo3Tu"><div><span class="ctt">放放子缺个快板<img alt="[偷笑]" src="//h5.sinaimg.cn/m/emoticon/icon/default/d_touxiao-15afb1c739.png" style="width:1em; height:1em;" /></span> <a href="https://weibo.cn/sinaurl?f=w&amp;u=http%3A%2F%2Ft.cn%2FA62rvGPb&amp;ep=J5GPjo3Tu%2C1669879400%2CJ5GPjo3Tu%2C1669879400">绿洲</a> </span></div><div><a href="https://weibo.cn/mblog/pic/J5GPjo3Tu?rl=0"><img src="http://wx3.sinaimg.cn/wap180/63885668ly3gfkwmkb6sej21kw1kw7wi.jpg" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J5GPjo3Tu&amp;u=63885668ly3gfkwmkb6sej21kw1kw7wi">原图</a>&nbsp;<br /><a href="https://weibo.cn/attitude/J5GPjo3Tu/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[571755]</a>&nbsp;<a href="https://weibo.cn/repost/J5GPjo3Tu?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J5GPjo3Tu?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J5GPjo3Tu?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">06月08日 15:17&nbsp;来自绿洲APP</span></div></div><div class="s"></div><div class="c" id="M_J5ykphxc8"><div><span class="ctt"><a 
href="https://weibo.cn/pages/100808topic?extparam=%E5%87%BA%E6%89%8B%E5%90%A7%E5%85%84%E5%BC%9F&amp;from=feed">#出手吧兄弟#</a> <a href="https://weibo.cn/sinaurl?f=w&amp;u=http%3A%2F%2Ft.cn%2FA623TGxZ&amp;ep=J5ykphxc8%2C1669879400%2CJ5ykphxc8%2C1669879400">今晚20:10,我在湖南卫视</a> ,为湖南永州的夏橙出手! <a href="https://weibo.cn/pages/100808topic?extparam=%E5%87%BA%E6%89%8B%E5%90%A7%E5%85%84%E5%BC%9F%E8%8A%82%E7%9B%AE%E5%8D%95&amp;from=feed">#出手吧兄弟节目单#</a> </span></div><div><a href="https://weibo.cn/mblog/pic/J5ykphxc8?rl=0"><img src="http://wx4.sinaimg.cn/wap180/63885668gy1gfjv3m6ixxj20u01hcjxe.jpg" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J5ykphxc8&amp;u=63885668gy1gfjv3m6ixxj20u01hcjxe">原图</a>&nbsp;<br /><a href="https://weibo.cn/attitude/J5ykphxc8/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[0]</a>&nbsp;<a href="https://weibo.cn/repost/J5ykphxc8?uid=1669879400&amp;rl=0">转发[0]</a>&nbsp;<a href="https://weibo.cn/comment/J5ykphxc8?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[0]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J5ykphxc8?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">06月07日 17:39</span></div></div><div class="s"></div><div class="c" id="M_J5cVGuUNq"><div><span class="cmt">转发了&nbsp;<a href="https://weibo.cn/wcschinaprogram">WCS野生生物保护学会</a><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/5337.gif" alt="V"/>&nbsp;的微博:</span><span class="ctt">去年和亲善大使热巴<a href="/n/Dear-%E8%BF%AA%E4%B8%BD%E7%83%AD%E5%B7%B4">@Dear-迪丽热巴</a> 的特别回忆<img alt="[心]" src="//h5.sinaimg.cn/m/emoticon/icon/others/l_xin-6912791858.png" style="width:1em; height:1em;" /></span>。我们在藏北羌塘一起爬山,探访藏羚羊、雪豹、黑颈鹤的栖息地,感受野生动物保护工作的点滴。此时此刻,我们比以往更加重视与自然相处的方式,我们也从未如此迫切需要将想法付诸行动。热巴已经和我们<a href="/n/%E5%8C%97%E4%BA%AC%E7%BB%BF%E8%89%B2%E9%98%B3%E5%85%89">@北京绿色阳光</a> 站在一起,希望看完视频的你们,也...<a href='/comment/J5cRdli6m?ckAll=1'>全文</a></span>&nbsp;<span class="cmt">赞[119296]</span>&nbsp;<span class="cmt">原文转发[1000000]</span>&nbsp;<a href="https://weibo.cn/comment/J5cRdli6m?rl=0#cmtfrm" 
class="cc">原文评论[38688]</a><!----></div><div><span class="cmt">转发理由:</span>在羌塘的美好回忆~第一次来到这片独特的荒野,看到野生动物自由生活,还有一群快乐可爱的人在守护着它们。把这些美好留存下来,关注野生动物保护,积极行动,我们每个人都能贡献力量。<img alt="[心]" src="//h5.sinaimg.cn/m/emoticon/icon/others/l_xin-6912791858.png" style="width:1em; height:1em;" /></span>&nbsp;&nbsp;<br /><a href="https://weibo.cn/attitude/J5cVGuUNq/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[554415]</a>&nbsp;<a href="https://weibo.cn/repost/J5cVGuUNq?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J5cVGuUNq?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J5cVGuUNq?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">06月05日 11:11</span></div></div><div class="s"></div><div class="c" id="M_J4YtUxJjO"><div><span class="ctt">要开心。要充实。 </span></div><div><a href="https://weibo.cn/mblog/pic/J4YtUxJjO?rl=0"><img src="http://wx2.sinaimg.cn/wap180/63885668ly1gffgspjnqnj21400u07ao.jpg" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J4YtUxJjO&amp;u=63885668ly1gffgspjnqnj21400u07ao">原图</a>&nbsp;<br /><a href="https://weibo.cn/attitude/J4YtUxJjO/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[2386877]</a>&nbsp;<a href="https://weibo.cn/repost/J4YtUxJjO?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J4YtUxJjO?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J4YtUxJjO?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">06月03日 22:24</span></div></div><div class="s"></div><div class="c" id="M_J4X9cvVA4"><div><span class="ctt"><a href="https://weibo.cn/pages/100808topic?extparam=%E5%BE%AE%E5%8D%9Alive%E7%A7%80&amp;from=feed">#微博live秀#</a> 28岁的直播~<a href="https://weibo.cn/sinaurl?f=w&amp;u=http%3A%2F%2Ft.cn%2FA62lK1TV&amp;ep=J4X9cvVA4%2C1669879400%2CJ4X9cvVA4%2C1669879400">@Dear-迪丽热巴 的一直播</a>(下载App-&gt;<a 
href="https://weibo.cn/sinaurl?f=w&amp;u=http%3A%2F%2Ft.cn%2FRDUuslr&amp;ep=J4X9cvVA4%2C1669879400%2CJ4X9cvVA4%2C1669879400">http://t.cn/RDUuslr</a>) </span>&nbsp;<br /><a href="https://weibo.cn/attitude/J4X9cvVA4/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[435650]</a>&nbsp;<a href="https://weibo.cn/repost/J4X9cvVA4?uid=1669879400&amp;rl=0">转发[23584]</a>&nbsp;<a href="https://weibo.cn/comment/J4X9cvVA4?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J4X9cvVA4?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">06月03日 19:00&nbsp;来自一直播Yi</span></div></div><div class="s"></div><div class="pa" id="pagelist"><form action="/1669879400" method="post"><div><a href="/1669879400?page=2">下页</a>&nbsp;<input name="mp" type="hidden" value="117" /><input type="text" name="page" size="2" style='-wap-input-format: "*N"' /><input type="submit" value="跳页" />&nbsp;1/117页</div></form></div><div class="pm"><form action="/search/" method="post"><div><input type="text" name="keyword" value="" size="15" /><input type="submit" name="smblog" value="搜微博" /><input type="submit" name="suser" value="找人" /><br/><span class="pmf"><a href="/search/mblog/?keyword=%E7%A7%A6%E6%98%8A%E5%9B%A0%E5%A5%B3%E5%84%BF%E8%A2%AB%E6%AC%BA%E8%B4%9F%E8%90%BD%E6%B3%AA&amp;rl=0" class="k">秦昊因女儿被欺负落泪</a>&nbsp;<a href="/search/mblog/?keyword=%E8%9B%8B%E5%A3%B3%E5%85%AC%E5%AF%93CEO%E8%A2%AB%E8%B0%83%E6%9F%A5&amp;rl=0" class="k">蛋壳公寓CEO被调查</a>&nbsp;<a href="/search/mblog/?keyword=%E5%94%90%E7%A6%B9%E5%93%B2%E6%89%BF%E8%AE%A4%E6%81%8B%E6%83%85&amp;rl=0" class="k">唐禹哲承认恋情</a>&nbsp;<a href="/search/mblog/?keyword=%E6%B5%B7%E6%B8%85%E5%92%8C%E6%88%91%E4%BB%AC%E7%9A%84%E6%B8%A9%E5%B7%AE&amp;rl=0" class="k">海清和我们的温差</a>&nbsp;<a href="/search/mblog/?keyword=%E8%82%96%E6%88%98%E9%87%87%E8%AE%BF&amp;rl=0" class="k">肖战采访</a></span></div></form></div><div class="cd"><a href="#top"><img src="https://h5.sinaimg.cn/upload/2017/04/27/319/5e990ec2.gif" alt="TOP"/></a></div><div 
class="pms"><a href="https://weibo.cn">首页<span class="tk">!</span></a>.<a href="https://weibo.cn/topic/240489">反馈</a>.<a href="https://weibo.cn/page/91">帮助</a>.<a  href="http://down.sina.cn/weibo/default/index/soft_id/1/mid/0"  >客户端</a>.<a href="https://weibo.cn/spam/?rl=0&amp;type=3&amp;fuid=1669879400" class="kt">举报</a>.<a href="https://passport.sina.cn/sso/logout?r=https%3A%2F%2Fweibo.cn%2Fpub%2F%3Fvt%3D&amp;entry=mweibo">退出</a></div><div class="c">设置:<a href="https://weibo.cn/account/customize/skin?tf=7_005&amp;st=46d484">皮肤</a>.<a href="https://weibo.cn/account/customize/pic?tf=7_006&amp;st=46d484">图片</a>.<a href="https://weibo.cn/account/customize/pagesize?tf=7_007&amp;st=46d484">条数</a>.<a href="https://weibo.cn/account/privacy/?tf=7_008&amp;st=46d484">隐私</a></div><div class="c">彩版|<a href="https://m.weibo.cn/?tf=7_010">触屏</a>|<a href="https://weibo.cn/page/521?tf=7_011">语音</a></div><div class="b">weibo.cn[06-19 00:47]</div></body></html>


================================================
FILE: tests/testdata/4d5ed0a3ebd0303cb45edd544dbc0ab5e86d43e103405f0c60515884.html
================================================
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta http-equiv="Cache-Control" content="no-cache"/><meta id="viewport" name="viewport" content="width=device-width,initial-scale=1.0,minimum-scale=1.0, maximum-scale=2.0" /><link rel="shortcut icon" type="image/x-icon" href="https://weibo.cn/favicon.ico"><link rel="icon" sizes="any" mask href="https://h5.sinaimg.cn/upload/2015/05/15/28/WeiboLogoCh.svg" color="black"><meta name="MobileOptimized" content="240"/><title>评论列表</title><style type="text/css" id="internalStyle">html,body,p,form,div,table,textarea,input,span,select{font-size:12px;word-wrap:break-word;}body{background:#F8F9F9;color:#000;padding:1px;margin:1px;}table,tr,td{border-width:0px;margin:0px;padding:0px;}form{margin:0px;padding:0px;border:0px;}textarea{border:1px solid #96c1e6}textarea{width:95%;}a,.tl{color:#2a5492;text-decoration:underline;}/*a:link {color:#023298}*/.k{color:#2a5492;text-decoration:underline;}.kt{color:#F00;}.ib{border:1px solid #C1C1C1;}.pm,.pmy{clear:both;background:#ffffff;color:#676566;border:1px solid #b1cee7;padding:3px;margin:2px 1px;overflow:hidden;}.pms{clear:both;background:#c8d9f3;color:#666666;padding:3px;margin:0 1px;overflow:hidden;}.pmst{margin-top: 5px;}.pmsl{clear:both;padding:3px;margin:0 1px;overflow:hidden;}.pmy{background:#DADADA;border:1px solid #F8F8F8;}.t{padding:0px;margin:0px;height:35px;}.b{background:#e3efff;text-align:center;color:#2a5492;clear:both;padding:4px;}.bl{color:#2a5492;}.n{clear:both;background:#436193;color:#FFF;padding:4px; margin: 1px;}.nt{color:#b9e7ff;}.nl{color:#FFF;text-decoration:none;}.nfw{clear:both;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.s{border-bottom:1px dotted #666666;margin:3px;clear:both;}.tip{clear:both; background:#c8d9f3;color:#676566;border:1px 
solid #BACDEB;padding:3px;margin:2px 1px;}.tip2{color:#000000;padding:2px 3px;clear:both;}.ps{clear:both;background:#FFF;color:#676566;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.tm{background:#feffe5;border:1px solid #e6de8d;padding:4px;}.tm a{color:#ba8300;}.tmn{color:#f00}.tk{color:#ffffff}.tc{color:#63676A;}.c{padding:2px 5px;}.c div a img{border:1px solid #C1C1C1;}.ct{color:#9d9d9d;font-style:italic;}.cmt{color:#9d9d9d;}.ctt{color:#000;}.cc{color:#2a5492;}.nk{color:#2a5492;}.por {border: 1px solid #CCCCCC;height:50px;width:50px;}.me{color:#000000;background:#FEDFDF;padding:2px 5px;}.pa{padding:2px 4px;}.nm{margin:10px 5px;padding:2px;}.hm{padding:5px;background:#FFF;color:#63676A;}.u{margin:2px 1px;background:#ffffff;border:1px solid #b1cee7;}.ut{padding:2px 3px;}.cd{text-align:center;}.r{color:#F00;}.g{color:#0F0;}.bn{background: transparent;border: 0 none;text-align: left;padding-left: 0;}</style><script>if(top != self){top.location = self.location;}</script></head><body><div class="n" style="padding: 6px 4px;"><a href="https://weibo.cn/?tf=5_009" class="nl">首页</a>|<a href="https://weibo.cn/msg/?tf=5_010" class="nl">消息</a>|<a href="/comment/J5cVGuUNq?rand=9659&amp;p=r" class="nl">刷新</a></div><div class="c tip"><a href="https://m.weibo.cn" id="top" class="tl">手机微博触屏版,点击前往>></a></div><div class="s"></div><div class="c" id="M_"><div>    <a href="/1669879400">Dear-迪丽热巴</a><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/5338.gif" alt="V"/><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/donate_btn_s.png" alt="M"/>    <span class="cmt">&nbsp;转发了&nbsp;<a href="/2620077053">@WCS野生生物保护学会</a><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/5337.gif" alt="V"/>&nbsp;的微博:</span><span class="ctt">去年和亲善大使热巴<a href="/n/Dear-%E8%BF%AA%E4%B8%BD%E7%83%AD%E5%B7%B4">@Dear-迪丽热巴</a> 的特别回忆<span class="url-icon"><img alt=[心] src="https://h5.sinaimg.cn/m/emoticon/icon/others/l_xin-43af9086c0.png" style="width:1em; height:1em;" 
/></span>。我们在藏北羌塘一起爬山,探访藏羚羊、雪豹、黑颈鹤的栖息地,感受野生动物保护工作的点滴。此时此刻,我们比以往更加重视与自然相处的方式,我们也从未如此迫切需要将想法付诸行动。热巴已经和我们<a href="/n/%E5%8C%97%E4%BA%AC%E7%BB%BF%E8%89%B2%E9%98%B3%E5%85%89">@北京绿色阳光</a> 站在一起,希望看完视频的你们,也能获得同样感受与动力。<br/><br/>We Stand for Wildlife. <br/> <br/><a href='https://m.weibo.cn/s/video/show?object_id=1007002:4512128925630475&amp;fromWap=1'>明日朝阳68309的优酷视频</a></span>                &nbsp;<span class="cmt">原文转发[1000000]</span>&nbsp;<a href="/comment/J5cRdli6m#cmtfrm" class="cc">原文评论[38774]</a></div><div><span class="cmt">转发理由:</span>在羌塘的美好回忆~第一次来到这片独特的荒野,看到野生动物自由生活,还有一群快乐可爱的人在守护着它们。把这些美好留存下来,关注野生动物保护,积极行动,我们每个人都能贡献力量。<span class="url-icon"><img alt=[心] src="https://h5.sinaimg.cn/m/emoticon/icon/others/l_xin-43af9086c0.png" style="width:1em; height:1em;" /></span>        <!-- 是否进行翻译 -->        &nbsp;    <span class="ct">2020-06-05 11:11:05    </span>    &nbsp;<a href="/attention/add?uid=1669879400&amp;rl=1&amp;st=f156f2">关注她</a>        &nbsp;<a href="/spam/?mid=J5cVGuUNq&amp;fuid=1669879400&amp;type=1&amp;rl=1">举报</a>&nbsp;<a href="/fav/addFav/J5cVGuUNq?rl=1&amp;st=f156f2">收藏</a>&nbsp;<a href="/mblog/operation/J5cVGuUNq?uid=1669879400&amp;rl=1" >操作</a>    </div></div><div class="c"></div><div><span>&nbsp;<a href="/repost/J5cVGuUNq?uid=1669879400&amp;#rt">转发[1000000]</a>&nbsp;</span><span class="pms">&nbsp;评论[1000000]&nbsp;</span><span >&nbsp;<a href="/attitude/J5cVGuUNq?#attitude">赞[572950]</a>&nbsp;</span><br/></div><div class="pms" id="cmtfrm"><form action="/comments/addcomment?st=f156f2" method="post"><div>    评论只显示前140字:<br/>    <input type="hidden" name="srcuid" value="1669879400" />    <input type="hidden" name="id" value="J5cVGuUNq" />        <input type="hidden" name="rl" value="1" />    <textarea name="content" rows="2" cols="20"></textarea><br/>        <input type="submit" value="评论" />&nbsp;<input type="submit" value="评论并转发" name="rt" /></div></form></div><div class="c" id="C_4512413737386822"><span class="kt">[热门]</span><a href="/luxchina">LUX力士</a><img 
src="https://h5.sinaimg.cn/upload/2016/05/26/319/5337.gif" alt="V"/>:<span class="ctt">大家和迪迪一起保护动物<span class="url-icon"><img alt=[心] src="https://h5.sinaimg.cn/m/emoticon/icon/others/l_xin-43af9086c0.png" style="width:1em; height:1em;" /></span></span>&nbsp;<a href="/spam/?cid=4512413737386822&amp;fuid=2184967550&amp;type=2&amp;rl=1">举报</a>&nbsp;<span class="cc"><a href="/attitude/J5cW9uZEi/update?object_type=comment&amp;uid=3113276555&amp;rl=1&amp;st=f156f2">赞[18737]</a></span>&nbsp;<span class="cc"><a href="/comments/reply/J5cVGuUNq/4512413737386822?rl=1&amp;st=f156f2">回复</a></span>&nbsp;<span class="ct">2020-06-05 11:12:14&nbsp;网页</span></div><div class="s"></div><div class="c" id="C_4512414056541102"><span class="kt">[热门]</span><a href="/happywhisper">护舒宝</a><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/5337.gif" alt="V"/><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/donate_btn_s.png" alt="M"/>:<span class="ctt">跟迪迪一起好好保护野生动物<span class="url-icon"><img alt=[心] src="https://h5.sinaimg.cn/m/emoticon/icon/others/l_xin-43af9086c0.png" style="width:1em; height:1em;" /></span></span>&nbsp;<a href="/spam/?cid=4512414056541102&amp;fuid=1787569845&amp;type=2&amp;rl=1">举报</a>&nbsp;<span class="cc"><a href="/attitude/J5cWFrrDE/update?object_type=comment&amp;uid=3113276555&amp;rl=1&amp;st=f156f2">赞[17423]</a></span>&nbsp;<span class="cc"><a href="/comments/reply/J5cVGuUNq/4512414056541102?rl=1&amp;st=f156f2">回复</a></span>&nbsp;<span class="ct">2020-06-05 11:13:30&nbsp;网页</span></div><div class="s"></div><div class="c" id="C_4512413871654824"><span class="kt">[热门]</span><a href="/baidudilraba">Dear迪丽热巴后援会</a><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/5337.gif" alt="V"/><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/donate_btn_s.png" alt="M"/>:<span class="ctt">一起保护野生动物<span class="url-icon"><img alt=[给你小心心] src="https://h5.sinaimg.cn/m/emoticon/icon/others/qixi2018_xiaoxinxin-c76bf85343.png" style="width:1em; height:1em;" 
/></span></span>&nbsp;<a href="/spam/?cid=4512413871654824&amp;fuid=3511007781&amp;type=2&amp;rl=1">举报</a>&nbsp;<span class="cc"><a href="/attitude/J5cWn6WuI/update?object_type=comment&amp;uid=3113276555&amp;rl=1&amp;st=f156f2">赞[15945]</a></span>&nbsp;<span class="cc"><a href="/comments/reply/J5cVGuUNq/4512413871654824?rl=1&amp;st=f156f2">回复</a></span>&nbsp;<span class="ct">2020-06-05 11:12:46&nbsp;网页</span></div><div class="s"></div><div class="c"><a href="/comment/hot/J5cVGuUNq?rl=1">查看更多热门&gt;&gt;</a></div><div class="s"></div><div class="c" id="C_5121248325535333">        <a href="/u/5685899949">源汐梦</a> <img src="https://h5.sinaimg.cn/upload/2016/05/26/319/donate_btn_s.png" alt="M"/>    :<span class="ctt">//<a href="/n/Dear-%E8%BF%AA%E4%B8%BD%E7%83%AD%E5%B7%B4">@Dear-迪丽热巴</a>:在羌塘的美好回忆~第一次来到这片独特的荒野,看到野生动物自由生活,还有一群快乐可爱的人在守护着它们。把这些美好留存下来,关注野生动物保护,积极行动,我们每个人都能贡献力量。<span class="url-icon"><img alt=[心] src="https://h5.sinaimg.cn/m/emoticon/icon/others/l_xin-43af9086c0.png" style="width:1em; height:1em;" /></span></span>    &nbsp;<a href="/spam/?cid=5121248325535333&amp;fuid=5685899949&amp;type=2&amp;rl=1">举报</a>    &nbsp;    <span class="cc">    <a href="/attitude/P8ULundZz/update?object_type=comment&amp;uid=3113276555&amp;rl=1&amp;st=f156f2">赞[0]</a></span>        &nbsp;<span class="cc"><a href="/comments/reply/J5cVGuUNq/5121248325535333?rl=1&amp;st=f156f2">回复</a></span>        &nbsp;    <span class="ct">01月10日 12:43&nbsp;来自来自河南    </span></div>    <div class="s"></div>            <div class="c" id="C_5093106131142464">        <a href="/u/7890176598">今天就中热巴亲签</a>     :<span class="ctt"><span class="url-icon"><img alt="[送花花]" src="https://face.t.sinajs.cn/t4/appstyle/expression/ext/normal/cb/2022_Flowers_org.png" style="width:1em; height:1em;" /></span><span class="url-icon"><img alt=[可怜] src="https://h5.sinaimg.cn/m/emoticon/icon/default/d_kelian-a9df4278bf.png" style="width:1em; height:1em;" /></span><span class="url-icon"><img alt=[可怜] 
src="https://h5.sinaimg.cn/m/emoticon/icon/default/d_kelian-a9df4278bf.png" style="width:1em; height:1em;" /></span></span>    &nbsp;<a href="/spam/?cid=5093106131142464&amp;fuid=7890176598&amp;type=2&amp;rl=1">举报</a>    &nbsp;    <span class="cc">    <a href="/attitude/OD47b4NcQ/update?object_type=comment&amp;uid=3113276555&amp;rl=1&amp;st=f156f2">赞[0]</a></span>        &nbsp;<span class="cc"><a href="/comments/reply/J5cVGuUNq/5093106131142464?rl=1&amp;st=f156f2">回复</a></span>        &nbsp;    <span class="ct">2024-10-24 20:56:29&nbsp;来自来自安徽    </span></div>    <div class="s"></div>            <div class="c" id="C_5036098864549413">        <a href="/u/6556871376">qadxxghlrrr</a>     :<span class="ctt">饱饱<span class="url-icon"><img alt=[打call] src="https://h5.sinaimg.cn/m/emoticon/icon/default/fb_a1dacall-1e0c4593fc.png" style="width:1em; height:1em;" /></span><span class="url-icon"><img alt=[打call] src="https://h5.sinaimg.cn/m/emoticon/icon/default/fb_a1dacall-1e0c4593fc.png" style="width:1em; height:1em;" /></span><span class="url-icon"><img alt=[打call] src="https://h5.sinaimg.cn/m/emoticon/icon/default/fb_a1dacall-1e0c4593fc.png" style="width:1em; height:1em;" /></span></span>    &nbsp;<a href="/spam/?cid=5036098864549413&amp;fuid=6556871376&amp;type=2&amp;rl=1">举报</a>    &nbsp;    <span class="cc">    <a href="/attitude/Of95Yj5vD/update?object_type=comment&amp;uid=3113276555&amp;rl=1&amp;st=f156f2">赞[0]</a></span>        &nbsp;<span class="cc"><a href="/comments/reply/J5cVGuUNq/5036098864549413?rl=1&amp;st=f156f2">回复</a></span>        &nbsp;    <span class="ct">2024-05-20 13:29:58&nbsp;来自来自湖南    </span></div>    <div class="s"></div>            <div class="c" id="C_4944898253655886">        <a href="/u/5505673339">正义终将会归来</a>     :<span class="ctt">👍👍👍👍❤️❤️</span>    &nbsp;<a href="/spam/?cid=4944898253655886&amp;fuid=5505673339&amp;type=2&amp;rl=1">举报</a>    &nbsp;    <span class="cc">    <a 
href="/attitude/NiQ0xfl3U/update?object_type=comment&amp;uid=3113276555&amp;rl=1&amp;st=f156f2">赞[1]</a></span>        &nbsp;<span class="cc"><a href="/comments/reply/J5cVGuUNq/4944898253655886?rl=1&amp;st=f156f2">回复</a></span>        &nbsp;    <span class="ct">2023-09-11 21:31:18&nbsp;来自来自广东    </span></div>    <div class="s"></div>            <div class="c" id="C_4934753994146647">        <a href="/u/7745472536">守护热爱星球</a>     :<span class="ctt">//<a href="/n/Dear-%E8%BF%AA%E4%B8%BD%E7%83%AD%E5%B7%B4">@Dear-迪丽热巴</a>:在羌塘的美好回忆~第一次来到这片独特的荒野,看到野生动物自由生活,还有一群快乐可爱的人在守护着它们。把这些美好留存下来,关注野生动物保护,积极行动,我们每个人都能贡献力量。<span class="url-icon"><img alt=[心] src="https://h5.sinaimg.cn/m/emoticon/icon/others/l_xin-43af9086c0.png" style="width:1em; height:1em;" /></span></span>    &nbsp;<a href="/spam/?cid=4934753994146647&amp;fuid=7745472536&amp;type=2&amp;rl=1">举报</a>    &nbsp;    <span class="cc">    <a href="/attitude/NeA6PhoJp/update?object_type=comment&amp;uid=3113276555&amp;rl=1&amp;st=f156f2">赞[0]</a></span>        &nbsp;<span class="cc"><a href="/comments/reply/J5cVGuUNq/4934753994146647?rl=1&amp;st=f156f2">回复</a></span>        &nbsp;    <span class="ct">2023-08-14 21:41:37&nbsp;来自来自山东    </span></div>    <div class="s"></div>            <div class="c" id="C_4851421549435244">        <a href="/u/5892818823">林深时听瑜_</a> <img src="https://h5.sinaimg.cn/upload/2016/05/26/319/5338.gif" alt="V"/><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/donate_btn_s.png" alt="M"/>    :<span class="ctt">和巴巴一起好好保护野生动物</span>    &nbsp;<a href="/spam/?cid=4851421549435244&amp;fuid=5892818823&amp;type=2&amp;rl=1">举报</a>    &nbsp;    <span class="cc">    <a href="/attitude/MlzHYDAxm/update?object_type=comment&amp;uid=3113276555&amp;rl=1&amp;st=f156f2">赞[0]</a></span>        &nbsp;<span class="cc"><a href="/comments/reply/J5cVGuUNq/4851421549435244?rl=1&amp;st=f156f2">回复</a></span>        &nbsp;    <span class="ct">2022-12-27 22:48:15&nbsp;来自来自福建    </span></div>    <div class="s"></div>          
  <div class="c" id="C_4806884668213217">        <a href="/u/7768981618">-土豆削成丝-</a>     :<span class="ctt">宝贝</span>    &nbsp;<a href="/spam/?cid=4806884668213217&amp;fuid=7768981618&amp;type=2&amp;rl=1">举报</a>    &nbsp;    <span class="cc">    <a href="/attitude/M2T6iysDf/update?object_type=comment&amp;uid=3113276555&amp;rl=1&amp;st=f156f2">赞[0]</a></span>        &nbsp;<span class="cc"><a href="/comments/reply/J5cVGuUNq/4806884668213217?rl=1&amp;st=f156f2">回复</a></span>        &nbsp;    <span class="ct">2022-08-27 01:14:35&nbsp;来自来自安徽    </span></div>    <div class="s"></div>            <div class="c" id="C_4763485337027708">        <a href="/u/7551190607">诗润我心</a>     :<span class="ctt">刚看完</span>    &nbsp;<a href="/spam/?cid=4763485337027708&amp;fuid=7551190607&amp;type=2&amp;rl=1">举报</a>    &nbsp;    <span class="cc">    <a href="/attitude/LqDxHtue8/update?object_type=comment&amp;uid=3113276555&amp;rl=1&amp;st=f156f2">赞[0]</a></span>        &nbsp;<span class="cc"><a href="/comments/reply/J5cVGuUNq/4763485337027708?rl=1&amp;st=f156f2">回复</a></span>        &nbsp;    <span class="ct">2022-04-29 07:01:08&nbsp;来自来自河北    </span></div>    <div class="s"></div>            <div class="c" id="C_4763485294037480">        <a href="/u/7551190607">诗润我心</a>     :<span class="ctt">太棒了,热巴姐姐</span>    &nbsp;<a href="/spam/?cid=4763485294037480&amp;fuid=7551190607&amp;type=2&amp;rl=1">举报</a>    &nbsp;    <span class="cc">    <a href="/attitude/LqDxDgWkE/update?object_type=comment&amp;uid=3113276555&amp;rl=1&amp;st=f156f2">赞[0]</a></span>        &nbsp;<span class="cc"><a href="/comments/reply/J5cVGuUNq/4763485294037480?rl=1&amp;st=f156f2">回复</a></span>        &nbsp;    <span class="ct">2022-04-29 07:00:58&nbsp;来自来自河北    </span></div>        <div class="s"></div><div class="pa" id="pagelist"><form action="/comment/J5cVGuUNq" method="post"><div><a href="/comment/J5cVGuUNq?page=2">下页</a>&nbsp;<input name="mp" type="hidden" value="100000" /><input type="text" name="page" size="2" 
style='-wap-input-format: "*N"' /><input type="submit" value="跳页" />&nbsp;1/100000页</div></form></div><div class="s"></div></body></html>

================================================
FILE: tests/testdata/63a98849ec82b2c87ec55bca03cbf5988f7eac233a23d86b4fdf5ffd.html
================================================
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta http-equiv="Cache-Control" content="no-cache"/><meta id="viewport" name="viewport" content="width=device-width,initial-scale=1.0,minimum-scale=1.0, maximum-scale=2.0" /><link rel="icon" sizes="any" mask href="https://h5.sinaimg.cn/upload/2015/05/15/28/WeiboLogoCh.svg" color="black"><meta name="MobileOptimized" content="240"/><title>微博</title><style type="text/css" id="internalStyle">html,body,p,form,div,table,textarea,input,span,select{font-size:12px;word-wrap:break-word;}body{background:#F8F9F9;color:#000;padding:1px;margin:1px;}table,tr,td{border-width:0px;margin:0px;padding:0px;}form{margin:0px;padding:0px;border:0px;}textarea{border:1px solid #96c1e6}textarea{width:95%;}a,.tl{color:#2a5492;text-decoration:underline;}/*a:link {color:#023298}*/.k{color:#2a5492;text-decoration:underline;}.kt{color:#F00;}.ib{border:1px solid #C1C1C1;}.pm,.pmy{clear:both;background:#ffffff;color:#676566;border:1px solid #b1cee7;padding:3px;margin:2px 1px;overflow:hidden;}.pms{clear:both;background:#c8d9f3;color:#666666;padding:3px;margin:0 1px;overflow:hidden;}.pmst{margin-top: 5px;}.pmsl{clear:both;padding:3px;margin:0 1px;overflow:hidden;}.pmy{background:#DADADA;border:1px solid #F8F8F8;}.t{padding:0px;margin:0px;height:35px;}.b{background:#e3efff;text-align:center;color:#2a5492;clear:both;padding:4px;}.bl{color:#2a5492;}.n{clear:both;background:#436193;color:#FFF;padding:4px; margin: 1px;}.nt{color:#b9e7ff;}.nl{color:#FFF;text-decoration:none;}.nfw{clear:both;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.s{border-bottom:1px dotted #666666;margin:3px;clear:both;}.tip{clear:both; background:#c8d9f3;color:#676566;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.tip2{color:#000000;padding:2px 
3px;clear:both;}.ps{clear:both;background:#FFF;color:#676566;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.tm{background:#feffe5;border:1px solid #e6de8d;padding:4px;}.tm a{color:#ba8300;}.tmn{color:#f00}.tk{color:#ffffff}.tc{color:#63676A;}.c{padding:2px 5px;}.c div a img{border:1px solid #C1C1C1;}.ct{color:#9d9d9d;font-style:italic;}.cmt{color:#9d9d9d;}.ctt{color:#000;}.cc{color:#2a5492;}.nk{color:#2a5492;}.por {border: 1px solid #CCCCCC;height:50px;width:50px;}.me{color:#000000;background:#FEDFDF;padding:2px 5px;}.pa{padding:2px 4px;}.nm{margin:10px 5px;padding:2px;}.hm{padding:5px;background:#FFF;color:#63676A;}.u{margin:2px 1px;background:#ffffff;border:1px solid #b1cee7;}.ut{padding:2px 3px;}.cd{text-align:center;}.r{color:#F00;}.g{color:#0F0;}.bn{background: transparent;border: 0 none;text-align: left;padding-left: 0;}</style></head><body><div class="c"><a href="/">返回</a></div><div class="c"><a href="/mblog/pic/J5ZcSnCAg?picId=63885668ly1gfn5qz5m1yj20u0140472&amp;rl=1"><img src="http://ww3.sinaimg.cn/thumb180/63885668ly1gfn5qz5m1yj20u0140472.jpg" alt="图片加载中..." /></a>&nbsp;<span class="tc">1/18</span>&nbsp;<a href="/mblog/oripic?id=J5ZcSnCAg&amp;u=63885668ly1gfn5qz5m1yj20u0140472&amp;rl=1">原图</a><br/></div><div class="c"><a href="/mblog/pic/J5ZcSnCAg?picId=63885668ly1gfn5qzjciuj20u0140qbc&amp;rl=1"><img src="http://ww1.sinaimg.cn/thumb180/63885668ly1gfn5qzjciuj20u0140qbc.jpg" alt="图片加载中..." /></a>&nbsp;<span class="tc">2/18</span>&nbsp;<a href="/mblog/oripic?id=J5ZcSnCAg&amp;u=63885668ly1gfn5qzjciuj20u0140qbc&amp;rl=1">原图</a><br/></div><div class="c"><a href="/mblog/pic/J5ZcSnCAg?picId=63885668ly1gfn5qywnfoj20u0140n4x&amp;rl=1"><img src="http://ww2.sinaimg.cn/thumb180/63885668ly1gfn5qywnfoj20u0140n4x.jpg" alt="图片加载中..." 
/></a>&nbsp;<span class="tc">3/18</span>&nbsp;<a href="/mblog/oripic?id=J5ZcSnCAg&amp;u=63885668ly1gfn5qywnfoj20u0140n4x&amp;rl=1">原图</a><br/></div><div class="c"><a href="/mblog/pic/J5ZcSnCAg?picId=63885668ly1gfn5qzsl0rj20u0140k3i&amp;rl=1"><img src="http://ww2.sinaimg.cn/thumb180/63885668ly1gfn5qzsl0rj20u0140k3i.jpg" alt="图片加载中..." /></a>&nbsp;<span class="tc">4/18</span>&nbsp;<a href="/mblog/oripic?id=J5ZcSnCAg&amp;u=63885668ly1gfn5qzsl0rj20u0140k3i&amp;rl=1">原图</a><br/></div><div class="c"><a href="/mblog/pic/J5ZcSnCAg?picId=63885668ly1gfn5r03jlwj20u0140tje&amp;rl=1"><img src="http://ww2.sinaimg.cn/thumb180/63885668ly1gfn5r03jlwj20u0140tje.jpg" alt="图片加载中..." /></a>&nbsp;<span class="tc">5/18</span>&nbsp;<a href="/mblog/oripic?id=J5ZcSnCAg&amp;u=63885668ly1gfn5r03jlwj20u0140tje&amp;rl=1">原图</a><br/></div><div class="c"><a href="/mblog/pic/J5ZcSnCAg?picId=63885668ly1gfn5r0ce4yj20u0140drj&amp;rl=1"><img src="http://ww4.sinaimg.cn/thumb180/63885668ly1gfn5r0ce4yj20u0140drj.jpg" alt="图片加载中..." /></a>&nbsp;<span class="tc">6/18</span>&nbsp;<a href="/mblog/oripic?id=J5ZcSnCAg&amp;u=63885668ly1gfn5r0ce4yj20u0140drj&amp;rl=1">原图</a><br/></div><div class="c"><a href="/mblog/pic/J5ZcSnCAg?picId=63885668ly1gfn5r0p5qrj20u014045u&amp;rl=1"><img src="http://ww2.sinaimg.cn/thumb180/63885668ly1gfn5r0p5qrj20u014045u.jpg" alt="图片加载中..." /></a>&nbsp;<span class="tc">7/18</span>&nbsp;<a href="/mblog/oripic?id=J5ZcSnCAg&amp;u=63885668ly1gfn5r0p5qrj20u014045u&amp;rl=1">原图</a><br/></div><div class="c"><a href="/mblog/pic/J5ZcSnCAg?picId=63885668ly1gfn5r12yjfj20u0140n4q&amp;rl=1"><img src="http://ww1.sinaimg.cn/thumb180/63885668ly1gfn5r12yjfj20u0140n4q.jpg" alt="图片加载中..." 
/></a>&nbsp;<span class="tc">8/18</span>&nbsp;<a href="/mblog/oripic?id=J5ZcSnCAg&amp;u=63885668ly1gfn5r12yjfj20u0140n4q&amp;rl=1">原图</a><br/></div><div class="c"><a href="/mblog/pic/J5ZcSnCAg?picId=63885668ly1gfn5r1cpj9j20u0140dmd&amp;rl=1"><img src="http://ww4.sinaimg.cn/thumb180/63885668ly1gfn5r1cpj9j20u0140dmd.jpg" alt="图片加载中..." /></a>&nbsp;<span class="tc">9/18</span>&nbsp;<a href="/mblog/oripic?id=J5ZcSnCAg&amp;u=63885668ly1gfn5r1cpj9j20u0140dmd&amp;rl=1">原图</a><br/></div><div class="c"><a href="/mblog/pic/J5ZcSnCAg?picId=63885668ly1gfn5r2ph31j20u0140k2p&amp;rl=1"><img src="http://ww2.sinaimg.cn/thumb180/63885668ly1gfn5r2ph31j20u0140k2p.jpg" alt="图片加载中..." /></a>&nbsp;<span class="tc">10/18</span>&nbsp;<a href="/mblog/oripic?id=J5ZcSnCAg&amp;u=63885668ly1gfn5r2ph31j20u0140k2p&amp;rl=1">原图</a><br/></div><div class="c"><a href="/mblog/pic/J5ZcSnCAg?picId=63885668ly1gfn5r281exj20u014049o&amp;rl=1"><img src="http://ww1.sinaimg.cn/thumb180/63885668ly1gfn5r281exj20u014049o.jpg" alt="图片加载中..." /></a>&nbsp;<span class="tc">11/18</span>&nbsp;<a href="/mblog/oripic?id=J5ZcSnCAg&amp;u=63885668ly1gfn5r281exj20u014049o&amp;rl=1">原图</a><br/></div><div class="c"><a href="/mblog/pic/J5ZcSnCAg?picId=63885668ly1gfn5r36o7nj20u0140tik&amp;rl=1"><img src="http://ww3.sinaimg.cn/thumb180/63885668ly1gfn5r36o7nj20u0140tik.jpg" alt="图片加载中..." /></a>&nbsp;<span class="tc">12/18</span>&nbsp;<a href="/mblog/oripic?id=J5ZcSnCAg&amp;u=63885668ly1gfn5r36o7nj20u0140tik&amp;rl=1">原图</a><br/></div><div class="c"><a href="/mblog/pic/J5ZcSnCAg?picId=63885668ly1gfn5r4i7k7j20u01407aw&amp;rl=1"><img src="http://ww3.sinaimg.cn/thumb180/63885668ly1gfn5r4i7k7j20u01407aw.jpg" alt="图片加载中..." 
/></a>&nbsp;<span class="tc">13/18</span>&nbsp;<a href="/mblog/oripic?id=J5ZcSnCAg&amp;u=63885668ly1gfn5r4i7k7j20u01407aw&amp;rl=1">原图</a><br/></div><div class="c"><a href="/mblog/pic/J5ZcSnCAg?picId=63885668ly1gfn5r4rkfyj20u0140ah1&amp;rl=1"><img src="http://ww2.sinaimg.cn/thumb180/63885668ly1gfn5r4rkfyj20u0140ah1.jpg" alt="图片加载中..." /></a>&nbsp;<span class="tc">14/18</span>&nbsp;<a href="/mblog/oripic?id=J5ZcSnCAg&amp;u=63885668ly1gfn5r4rkfyj20u0140ah1&amp;rl=1">原图</a><br/></div><div class="c"><a href="/mblog/pic/J5ZcSnCAg?picId=63885668ly1gfn5r549j8j20u0140dyn&amp;rl=1"><img src="http://ww2.sinaimg.cn/thumb180/63885668ly1gfn5r549j8j20u0140dyn.jpg" alt="图片加载中..." /></a>&nbsp;<span class="tc">15/18</span>&nbsp;<a href="/mblog/oripic?id=J5ZcSnCAg&amp;u=63885668ly1gfn5r549j8j20u0140dyn&amp;rl=1">原图</a><br/></div><div class="c"><a href="/mblog/pic/J5ZcSnCAg?picId=63885668ly1gfn5r5inyej20u0140h69&amp;rl=1"><img src="http://ww2.sinaimg.cn/thumb180/63885668ly1gfn5r5inyej20u0140h69.jpg" alt="图片加载中..." /></a>&nbsp;<span class="tc">16/18</span>&nbsp;<a href="/mblog/oripic?id=J5ZcSnCAg&amp;u=63885668ly1gfn5r5inyej20u0140h69&amp;rl=1">原图</a><br/></div><div class="c"><a href="/mblog/pic/J5ZcSnCAg?picId=63885668ly1gfn5qynm7kj20u0140ql2&amp;rl=1"><img src="http://ww3.sinaimg.cn/thumb180/63885668ly1gfn5qynm7kj20u0140ql2.jpg" alt="图片加载中..." /></a>&nbsp;<span class="tc">17/18</span>&nbsp;<a href="/mblog/oripic?id=J5ZcSnCAg&amp;u=63885668ly1gfn5qynm7kj20u0140ql2&amp;rl=1">原图</a><br/></div><div class="c"><a href="/mblog/pic/J5ZcSnCAg?picId=63885668ly1gfn5r3vo9nj20u0140kb9&amp;rl=1"><img src="http://ww2.sinaimg.cn/thumb180/63885668ly1gfn5r3vo9nj20u0140kb9.jpg" alt="图片加载中..." /></a>&nbsp;<span class="tc">18/18</span>&nbsp;<a href="/mblog/oripic?id=J5ZcSnCAg&amp;u=63885668ly1gfn5r3vo9nj20u0140kb9&amp;rl=1">原图</a><br/></div><div class="c"><a href="/">返回</a></div></body></html>

================================================
FILE: tests/testdata/76233b3f90394581aac6f19cfa5d674a610e8b442b1f83de7673ab49.html
================================================
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta http-equiv="Cache-Control" content="no-cache"/><meta id="viewport" name="viewport" content="width=device-width,initial-scale=1.0,minimum-scale=1.0, maximum-scale=2.0" /><link rel="icon" sizes="any" mask href="https://h5.sinaimg.cn/upload/2015/05/15/28/WeiboLogoCh.svg" color="black"><meta name="MobileOptimized" content="240"/><title>微博</title><style type="text/css" id="internalStyle">html,body,p,form,div,table,textarea,input,span,select{font-size:12px;word-wrap:break-word;}body{background:#F8F9F9;color:#000;padding:1px;margin:1px;}table,tr,td{border-width:0px;margin:0px;padding:0px;}form{margin:0px;padding:0px;border:0px;}textarea{border:1px solid #96c1e6}textarea{width:95%;}a,.tl{color:#2a5492;text-decoration:underline;}/*a:link {color:#023298}*/.k{color:#2a5492;text-decoration:underline;}.kt{color:#F00;}.ib{border:1px solid #C1C1C1;}.pm,.pmy{clear:both;background:#ffffff;color:#676566;border:1px solid #b1cee7;padding:3px;margin:2px 1px;overflow:hidden;}.pms{clear:both;background:#c8d9f3;color:#666666;padding:3px;margin:0 1px;overflow:hidden;}.pmst{margin-top: 5px;}.pmsl{clear:both;padding:3px;margin:0 1px;overflow:hidden;}.pmy{background:#DADADA;border:1px solid #F8F8F8;}.t{padding:0px;margin:0px;height:35px;}.b{background:#e3efff;text-align:center;color:#2a5492;clear:both;padding:4px;}.bl{color:#2a5492;}.n{clear:both;background:#436193;color:#FFF;padding:4px; margin: 1px;}.nt{color:#b9e7ff;}.nl{color:#FFF;text-decoration:none;}.nfw{clear:both;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.s{border-bottom:1px dotted #666666;margin:3px;clear:both;}.tip{clear:both; background:#c8d9f3;color:#676566;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.tip2{color:#000000;padding:2px 
3px;clear:both;}.ps{clear:both;background:#FFF;color:#676566;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.tm{background:#feffe5;border:1px solid #e6de8d;padding:4px;}.tm a{color:#ba8300;}.tmn{color:#f00}.tk{color:#ffffff}.tc{color:#63676A;}.c{padding:2px 5px;}.c div a img{border:1px solid #C1C1C1;}.ct{color:#9d9d9d;font-style:italic;}.cmt{color:#9d9d9d;}.ctt{color:#000;}.cc{color:#2a5492;}.nk{color:#2a5492;}.por {border: 1px solid #CCCCCC;height:50px;width:50px;}.me{color:#000000;background:#FEDFDF;padding:2px 5px;}.pa{padding:2px 4px;}.nm{margin:10px 5px;padding:2px;}.hm{padding:5px;background:#FFF;color:#63676A;}.u{margin:2px 1px;background:#ffffff;border:1px solid #b1cee7;}.ut{padding:2px 3px;}.cd{text-align:center;}.r{color:#F00;}.g{color:#0F0;}.bn{background: transparent;border: 0 none;text-align: left;padding-left: 0;}</style></head><body><div class="c"><a href="/1669879400?page=3&amp;rand=169384">返回</a></div><div class="c"><a href="/mblog/pic/J3xfm61AZ?picId=63885668ly1gf4iwe1dlwj24as2a1b2d&amp;rl=1"><img src="http://ww4.sinaimg.cn/thumb180/63885668ly1gf4iwe1dlwj24as2a1b2d.jpg" alt="图片加载中..." /></a>&nbsp;<span class="tc">1/2</span>&nbsp;<a href="/mblog/oripic?id=J3xfm61AZ&amp;u=63885668ly1gf4iwe1dlwj24as2a1b2d&amp;rl=1">原图</a><br/></div><div class="c"><a href="/mblog/pic/J3xfm61AZ?picId=63885668ly1gf4iwgftuzj22pg1ww1l3&amp;rl=1"><img src="http://ww4.sinaimg.cn/thumb180/63885668ly1gf4iwgftuzj22pg1ww1l3.jpg" alt="图片加载中..." /></a>&nbsp;<span class="tc">2/2</span>&nbsp;<a href="/mblog/oripic?id=J3xfm61AZ&amp;u=63885668ly1gf4iwgftuzj22pg1ww1l3&amp;rl=1">原图</a><br/></div><div class="c"><a href="/1669879400?page=3&amp;rand=169384">返回</a></div></body></html>


================================================
FILE: tests/testdata/a4437630f3bdfa2757bae1595186ac063fe5ec25cf2f98116ece83cb.html
================================================
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta http-equiv="Cache-Control" content="no-cache"/><meta id="viewport" name="viewport" content="width=device-width,initial-scale=1.0,minimum-scale=1.0, maximum-scale=2.0" /><link rel="icon" sizes="any" mask href="https://h5.sinaimg.cn/upload/2015/05/15/28/WeiboLogoCh.svg" color="black"><meta name="MobileOptimized" content="240"/><title>Dear-迪丽热巴的微博</title><style type="text/css" id="internalStyle">html,body,p,form,div,table,textarea,input,span,select{font-size:12px;word-wrap:break-word;}body{background:#F8F9F9;color:#000;padding:1px;margin:1px;}table,tr,td{border-width:0px;margin:0px;padding:0px;}form{margin:0px;padding:0px;border:0px;}textarea{border:1px solid #96c1e6}textarea{width:95%;}a,.tl{color:#2a5492;text-decoration:underline;}/*a:link {color:#023298}*/.k{color:#2a5492;text-decoration:underline;}.kt{color:#F00;}.ib{border:1px solid #C1C1C1;}.pm,.pmy{clear:both;background:#ffffff;color:#676566;border:1px solid #b1cee7;padding:3px;margin:2px 1px;overflow:hidden;}.pms{clear:both;background:#c8d9f3;color:#666666;padding:3px;margin:0 1px;overflow:hidden;}.pmst{margin-top: 5px;}.pmsl{clear:both;padding:3px;margin:0 1px;overflow:hidden;}.pmy{background:#DADADA;border:1px solid #F8F8F8;}.t{padding:0px;margin:0px;height:35px;}.b{background:#e3efff;text-align:center;color:#2a5492;clear:both;padding:4px;}.bl{color:#2a5492;}.n{clear:both;background:#436193;color:#FFF;padding:4px; margin: 1px;}.nt{color:#b9e7ff;}.nl{color:#FFF;text-decoration:none;}.nfw{clear:both;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.s{border-bottom:1px dotted #666666;margin:3px;clear:both;}.tip{clear:both; background:#c8d9f3;color:#676566;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.tip2{color:#000000;padding:2px 
3px;clear:both;}.ps{clear:both;background:#FFF;color:#676566;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.tm{background:#feffe5;border:1px solid #e6de8d;padding:4px;}.tm a{color:#ba8300;}.tmn{color:#f00}.tk{color:#ffffff}.tc{color:#63676A;}.c{padding:2px 5px;}.c div a img{border:1px solid #C1C1C1;}.ct{color:#9d9d9d;font-style:italic;}.cmt{color:#9d9d9d;}.ctt{color:#000;}.cc{color:#2a5492;}.nk{color:#2a5492;}.por {border: 1px solid #CCCCCC;height:50px;width:50px;}.me{color:#000000;background:#FEDFDF;padding:2px 5px;}.pa{padding:2px 4px;}.nm{margin:10px 5px;padding:2px;}.hm{padding:5px;background:#FFF;color:#63676A;}.u{margin:2px 1px;background:#ffffff;border:1px solid #b1cee7;}.ut{padding:2px 3px;}.cd{text-align:center;}.r{color:#F00;}.g{color:#0F0;}.bn{background: transparent;border: 0 none;text-align: left;padding-left: 0;}</style></head><body><div class="n" style="padding: 6px 4px;"><a href="https://weibo.cn/?tf=5_009" class="nl">首页<span class="tk">!</span></a>|<a href="https://weibo.cn/msg/?tf=5_010" class="nl">消息</a>|<a href="https://huati.weibo.cn" class="nl">话题</a>|<a href="https://weibo.cn/search/?tf=5_012" class="nl">搜索</a>|<a href="/1669879400?rand=4158&amp;p=r" class="nl">刷新</a></div><div class="c tip"><a href="https://m.weibo.cn" id="top" class="tl">手机微博触屏版,点击前往>></a></div><div class="u"><table><tr><td valign="top"><a href="/1669879400/avatar?rl=0"><img src="https://tvax1.sinaimg.cn/crop.0.0.1080.1080.50/63885668ly8geyrcrw0zjj20u00u0mz6.jpg?KID=imgbed,tva&Expires=1592509636&ssig=E%2Bca9cPhHO" alt="头像" class="por" /></a></td><td valign="top"><div class="ut"><span class="ctt">Dear-迪丽热巴<img src="https://h5.sinaimg.cn/upload/2016/05/26/319/5338.gif" alt="V"/><a href="http://vip.weibo.cn/?F=W_tq_zsbs_01"><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/donate_btn_s.png" alt="M"/></a>&nbsp;女/上海    &nbsp;    <a href="/attention/add?uid=1669879400&amp;rl=0&amp;st=46d484">加关注</a></span><br /><span class="ctt">认证:嘉行传媒签约演员 </span><br /><span 
class="ctt" style="word-break:break-all; width:50px;">一只喜欢默默表演的小透明。工作联系jaywalk@jaywalk.com.cn ...</span><br /><a href="/im/chat?uid=1669879400&amp;rl=0">私信</a>&nbsp;<a href="/1669879400/info">资料</a>&nbsp;<a href="/1669879400/operation?rl=0">操作</a>&nbsp;<a href="/attgroup/special?fuid=1669879400&amp;st=46d484">特别关注</a>&nbsp;<a href="http://new.vip.weibo.cn/vippay/payother?present=1&amp;action=comfirmTime&amp;uid=1669879400">送Ta会员</a></div></td></tr></table><div class="tip2"><span class="tc">微博[1159]</span>&nbsp;<a href="/1669879400/follow">关注[253]</a>&nbsp;<a href="/1669879400/fans">粉丝[70805574]</a>&nbsp;<a href="/attgroup/opening?uid=1669879400">分组[1]</a>&nbsp;<a href="/at/weibo?uid=1669879400">@她的</a></div></div><div class="pmst"><span class="pms">&nbsp;微博&nbsp;</span><span class="pmsl">&nbsp;<a href="/1669879400/photo?tf=6_008">相册</a>&nbsp;</span></div><div class="pms" >全部-<a href="/1669879400?filter=1">原创</a>-<a href="/1669879400?filter=2">图片</a>-<a href="/attgroup/opening?uid=1669879400">分组</a>-<a href="/1669879400/search?f=u&amp;rl=0">筛选</a></div><div class="c" id="M_J74OAhxtL"><div><span class="ctt"><a href="https://weibo.cn/pages/100808topic?extparam=%E4%B8%80%E8%B5%B7%E7%83%AD%E7%88%B1%E5%B0%B1%E7%8E%B0%E5%9C%A8&amp;from=feed">#一起热爱就现在#</a>给你们康康我眼前的画面<img alt="[嘻嘻]" src="//h5.sinaimg.cn/m/emoticon/icon/default/d_xixi-813ededea2.png" style="width:1em; height:1em;" /></span> <a href="https://weibo.cn/sinaurl?f=w&amp;u=http%3A%2F%2Ft.cn%2FA6Lftj1K&amp;ep=J74OAhxtL%2C1669879400%2CJ74OAhxtL%2C1669879400">绿洲</a> </span></div><div><a href="https://weibo.cn/mblog/pic/J74OAhxtL?rl=0"><img src="http://wx4.sinaimg.cn/wap180/63885668ly3gfvg8ubqjmj20u01407wh.jpg" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J74OAhxtL&amp;u=63885668ly3gfvg8ubqjmj20u01407wh">原图</a>&nbsp;<br /><a href="https://weibo.cn/attitude/J74OAhxtL/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[523801]</a>&nbsp;<a 
href="https://weibo.cn/repost/J74OAhxtL?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J74OAhxtL?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[393530]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J74OAhxtL?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">06月17日 18:12&nbsp;来自绿洲APP</span></div></div><div class="s"></div><div class="c" id="M_J6INEAyUV"><div><span class="ctt">刚收到我定制的亓那眼镜,猜猜定制了什么<img alt="[doge]" src="//h5.sinaimg.cn/m/emoticon/icon/others/d_doge-861403219c.png" style="width:1em; height:1em;" /></span>好奇?没关系,你们也可以拥有自己的定制眼镜。关注<a href="/n/QINA%E4%BA%93%E9%82%A3%E7%9C%BC%E9%95%9C">@QINA亓那眼镜</a> 解锁6月限定惊喜,<a href="https://weibo.cn/pages/100808topic?extparam=%E6%97%B6%E9%AB%A6%E5%AF%BB%E5%AE%9D%E8%AE%A1%E5%88%92&amp;from=feed">#时髦寻宝计划#</a> 线上线下都安排了<img alt="[偷笑]" src="//h5.sinaimg.cn/m/emoticon/icon/default/d_touxiao-15afb1c739.png" style="width:1em; height:1em;" /></span><a href="https://m.weibo.cn/s/video/show?object_id=1034:4514680232673315&amp;fromWap=1">QINA亓那眼镜的微博视频</a> </span>&nbsp;<br /><a href="https://weibo.cn/attitude/J6INEAyUV/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[415899]</a>&nbsp;<a href="https://weibo.cn/repost/J6INEAyUV?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J6INEAyUV?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J6INEAyUV?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">06月15日 10:09</span></div></div><div class="s"></div><div class="c" id="M_J6Divw6j7"><div><span class="ctt"><a href="https://weibo.cn/pages/100808topic?extparam=idoltube&amp;from=feed">#idoltube#</a><a href="https://weibo.cn/pages/100808topic?extparam=%E5%91%A8%E6%94%BEvlog&amp;from=feed">#周放vlog#</a> 什么?放放子还有两副面孔呢?<img alt="[喵喵]" src="//h5.sinaimg.cn/m/emoticon/icon/others/d_miao-61fe2a7aaa.png" style="width:1em; height:1em;" /></span> <a 
href="https://weibo.cn/pages/100808topic?extparam=%E5%B9%B8%E7%A6%8F%E8%A7%A6%E6%89%8B%E5%8F%AF%E5%8F%8A&amp;from=feed">#幸福触手可及#</a> <a href="https://m.weibo.cn/s/video/show?object_id=1034:4515809876181011&amp;fromWap=1">Dear-迪丽热巴的微博视频</a> </span>&nbsp;<br /><a href="https://weibo.cn/attitude/J6Divw6j7/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[318054]</a>&nbsp;<a href="https://weibo.cn/repost/J6Divw6j7?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J6Divw6j7?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[514546]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J6Divw6j7?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">06月14日 20:09&nbsp;来自影视剪辑 · 视频社区</span></div></div><div class="s"></div><div class="c" id="M_J6k49kbTc"><div><span class="ctt">7000<img alt="[耶]" src="//h5.sinaimg.cn/m/emoticon/icon/others/h_ye-256191c090.png" style="width:1em; height:1em;" /></span> <a href="https://weibo.cn/sinaurl?f=w&amp;u=http%3A%2F%2Ft.cn%2FA6Lzm5BB&amp;ep=J6k49kbTc%2C1669879400%2CJ6k49kbTc%2C1669879400">绿洲</a> </span>&nbsp;[<a href="https://weibo.cn/mblog/picAll/J6k49kbTc?rl=1">组图共2张</a>]</div><div><a href="https://weibo.cn/mblog/pic/J6k49kbTc?rl=0"><img src="http://wx1.sinaimg.cn/wap180/63885668ly3gfppuxsc3lj216n1kwqv5.jpg" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J6k49kbTc&amp;u=63885668ly3gfppuxsc3lj216n1kwqv5">原图</a>&nbsp;<br /><a href="https://weibo.cn/attitude/J6k49kbTc/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[1150265]</a>&nbsp;<a href="https://weibo.cn/repost/J6k49kbTc?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J6k49kbTc?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J6k49kbTc?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">06月12日 19:11&nbsp;来自绿洲APP</span></div></div><div class="s"></div><div class="c" id="M_J5ZcSnCAg"><div><span class="ctt">言出必行,说了18张就是18张,送给七千万的你们 ~ </span>&nbsp;[<a 
href="https://weibo.cn/mblog/picAll/J5ZcSnCAg?rl=1">组图共18张</a>]</div><div><a href="https://weibo.cn/mblog/pic/J5ZcSnCAg?rl=0"><img src="http://wx3.sinaimg.cn/wap180/63885668ly1gfn5qz5m1yj20u0140472.jpg" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J5ZcSnCAg&amp;u=63885668ly1gfn5qz5m1yj20u0140472">原图</a>&nbsp;<br /><a href="https://weibo.cn/attitude/J5ZcSnCAg/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[2511439]</a>&nbsp;<a href="https://weibo.cn/repost/J5ZcSnCAg?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J5ZcSnCAg?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J5ZcSnCAg?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">06月10日 14:05</span></div></div><div class="s"></div><div class="c" id="M_J5GPjo3Tu"><div><span class="ctt">放放子缺个快板<img alt="[偷笑]" src="//h5.sinaimg.cn/m/emoticon/icon/default/d_touxiao-15afb1c739.png" style="width:1em; height:1em;" /></span> <a href="https://weibo.cn/sinaurl?f=w&amp;u=http%3A%2F%2Ft.cn%2FA62rvGPb&amp;ep=J5GPjo3Tu%2C1669879400%2CJ5GPjo3Tu%2C1669879400">绿洲</a> </span></div><div><a href="https://weibo.cn/mblog/pic/J5GPjo3Tu?rl=0"><img src="http://wx3.sinaimg.cn/wap180/63885668ly3gfkwmkb6sej21kw1kw7wi.jpg" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J5GPjo3Tu&amp;u=63885668ly3gfkwmkb6sej21kw1kw7wi">原图</a>&nbsp;<br /><a href="https://weibo.cn/attitude/J5GPjo3Tu/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[571755]</a>&nbsp;<a href="https://weibo.cn/repost/J5GPjo3Tu?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J5GPjo3Tu?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J5GPjo3Tu?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">06月08日 15:17&nbsp;来自绿洲APP</span></div></div><div class="s"></div><div class="c" id="M_J5ykphxc8"><div><span class="ctt"><a 
href="https://weibo.cn/pages/100808topic?extparam=%E5%87%BA%E6%89%8B%E5%90%A7%E5%85%84%E5%BC%9F&amp;from=feed">#出手吧兄弟#</a> <a href="https://weibo.cn/sinaurl?f=w&amp;u=http%3A%2F%2Ft.cn%2FA623TGxZ&amp;ep=J5ykphxc8%2C1669879400%2CJ5ykphxc8%2C1669879400">今晚20:10,我在湖南卫视</a> ,为湖南永州的夏橙出手! <a href="https://weibo.cn/pages/100808topic?extparam=%E5%87%BA%E6%89%8B%E5%90%A7%E5%85%84%E5%BC%9F%E8%8A%82%E7%9B%AE%E5%8D%95&amp;from=feed">#出手吧兄弟节目单#</a> </span></div><div><a href="https://weibo.cn/mblog/pic/J5ykphxc8?rl=0"><img src="http://wx4.sinaimg.cn/wap180/63885668gy1gfjv3m6ixxj20u01hcjxe.jpg" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J5ykphxc8&amp;u=63885668gy1gfjv3m6ixxj20u01hcjxe">原图</a>&nbsp;<br /><a href="https://weibo.cn/attitude/J5ykphxc8/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[0]</a>&nbsp;<a href="https://weibo.cn/repost/J5ykphxc8?uid=1669879400&amp;rl=0">转发[0]</a>&nbsp;<a href="https://weibo.cn/comment/J5ykphxc8?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[0]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J5ykphxc8?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">06月07日 17:39</span></div></div><div class="s"></div><div class="c" id="M_J5cVGuUNq"><div><span class="cmt">转发了&nbsp;<a href="https://weibo.cn/wcschinaprogram">WCS野生生物保护学会</a><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/5337.gif" alt="V"/>&nbsp;的微博:</span><span class="ctt">去年和亲善大使热巴<a href="/n/Dear-%E8%BF%AA%E4%B8%BD%E7%83%AD%E5%B7%B4">@Dear-迪丽热巴</a> 的特别回忆<img alt="[心]" src="//h5.sinaimg.cn/m/emoticon/icon/others/l_xin-6912791858.png" style="width:1em; height:1em;" /></span>。我们在藏北羌塘一起爬山,探访藏羚羊、雪豹、黑颈鹤的栖息地,感受野生动物保护工作的点滴。此时此刻,我们比以往更加重视与自然相处的方式,我们也从未如此迫切需要将想法付诸行动。热巴已经和我们<a href="/n/%E5%8C%97%E4%BA%AC%E7%BB%BF%E8%89%B2%E9%98%B3%E5%85%89">@北京绿色阳光</a> 站在一起,希望看完视频的你们,也...<a href='/comment/J5cRdli6m?ckAll=1'>全文</a></span>&nbsp;<span class="cmt">赞[119296]</span>&nbsp;<span class="cmt">原文转发[1000000]</span>&nbsp;<a href="https://weibo.cn/comment/J5cRdli6m?rl=0#cmtfrm" 
class="cc">原文评论[38688]</a><!----></div><div><span class="cmt">转发理由:</span>在羌塘的美好回忆~第一次来到这片独特的荒野,看到野生动物自由生活,还有一群快乐可爱的人在守护着它们。把这些美好留存下来,关注野生动物保护,积极行动,我们每个人都能贡献力量。<img alt="[心]" src="//h5.sinaimg.cn/m/emoticon/icon/others/l_xin-6912791858.png" style="width:1em; height:1em;" /></span>&nbsp;&nbsp;<br /><a href="https://weibo.cn/attitude/J5cVGuUNq/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[554415]</a>&nbsp;<a href="https://weibo.cn/repost/J5cVGuUNq?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J5cVGuUNq?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J5cVGuUNq?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">06月05日 11:11</span></div></div><div class="s"></div><div class="c" id="M_J4YtUxJjO"><div><span class="ctt">要开心。要充实。 </span></div><div><a href="https://weibo.cn/mblog/pic/J4YtUxJjO?rl=0"><img src="http://wx2.sinaimg.cn/wap180/63885668ly1gffgspjnqnj21400u07ao.jpg" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J4YtUxJjO&amp;u=63885668ly1gffgspjnqnj21400u07ao">原图</a>&nbsp;<br /><a href="https://weibo.cn/attitude/J4YtUxJjO/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[2386877]</a>&nbsp;<a href="https://weibo.cn/repost/J4YtUxJjO?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J4YtUxJjO?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J4YtUxJjO?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">06月03日 22:24</span></div></div><div class="s"></div><div class="c" id="M_J4X9cvVA4"><div><span class="ctt"><a href="https://weibo.cn/pages/100808topic?extparam=%E5%BE%AE%E5%8D%9Alive%E7%A7%80&amp;from=feed">#微博live秀#</a> 28岁的直播~<a href="https://weibo.cn/sinaurl?f=w&amp;u=http%3A%2F%2Ft.cn%2FA62lK1TV&amp;ep=J4X9cvVA4%2C1669879400%2CJ4X9cvVA4%2C1669879400">@Dear-迪丽热巴 的一直播</a>(下载App-&gt;<a 
href="https://weibo.cn/sinaurl?f=w&amp;u=http%3A%2F%2Ft.cn%2FRDUuslr&amp;ep=J4X9cvVA4%2C1669879400%2CJ4X9cvVA4%2C1669879400">http://t.cn/RDUuslr</a>) </span>&nbsp;<br /><a href="https://weibo.cn/attitude/J4X9cvVA4/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[435650]</a>&nbsp;<a href="https://weibo.cn/repost/J4X9cvVA4?uid=1669879400&amp;rl=0">转发[23584]</a>&nbsp;<a href="https://weibo.cn/comment/J4X9cvVA4?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J4X9cvVA4?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">06月03日 19:00&nbsp;来自一直播Yi</span></div></div><div class="s"></div><div class="pa" id="pagelist"><form action="/1669879400" method="post"><div><a href="/1669879400?page=2">下页</a>&nbsp;<input name="mp" type="hidden" value="117" /><input type="text" name="page" size="2" style='-wap-input-format: "*N"' /><input type="submit" value="跳页" />&nbsp;1/117页</div></form></div><div class="pm"><form action="/search/" method="post"><div><input type="text" name="keyword" value="" size="15" /><input type="submit" name="smblog" value="搜微博" /><input type="submit" name="suser" value="找人" /><br/><span class="pmf"><a href="/search/mblog/?keyword=%E6%97%A0%E8%BF%AA%E7%8E%8B%E7%89%8C%E7%BB%84%E5%90%88&amp;rl=0" class="k">无迪王牌组合</a>&nbsp;<a href="/search/mblog/?keyword=%E7%A7%91%E6%AF%94%E5%9D%A0%E6%9C%BA%E7%B3%BB%E5%9B%A0%E9%A3%9E%E8%A1%8C%E5%91%98%E8%BF%B7%E5%A4%B1%E6%96%B9%E5%90%91&amp;rl=0" class="k">科比坠机系因飞行员迷失方向</a>&nbsp;<a href="/search/mblog/?keyword=%E9%82%93%E4%BC%A6%E6%9D%8E%E4%BD%B3%E7%90%A6%E7%9B%B4%E6%92%AD&amp;rl=0" class="k">邓伦李佳琦直播</a>&nbsp;<a href="/search/mblog/?keyword=%E8%9B%8B%E5%A3%B3%E5%85%AC%E5%AF%93CEO%E8%A2%AB%E8%B0%83%E6%9F%A5&amp;rl=0" class="k">蛋壳公寓CEO被调查</a>&nbsp;<a href="/search/mblog/?keyword=%E4%B8%BA%E4%BB%9D%E5%8D%93%E5%8A%9E%E7%90%86%E8%99%9A%E5%81%87%E8%BD%AC%E5%AD%A6%E6%89%8B%E7%BB%AD6%E4%BA%BA%E8%A2%AB%E5%A4%84%E7%90%86&amp;rl=0" 
class="k">为仝卓办理虚假转学手续6人被处理</a></span></div></form></div><div class="cd"><a href="#top"><img src="https://h5.sinaimg.cn/upload/2017/04/27/319/5e990ec2.gif" alt="TOP"/></a></div><div class="pms"><a href="https://weibo.cn">首页<span class="tk">!</span></a>.<a href="https://weibo.cn/topic/240489">反馈</a>.<a href="https://weibo.cn/page/91">帮助</a>.<a  href="http://down.sina.cn/weibo/default/index/soft_id/1/mid/0"  >客户端</a>.<a href="https://weibo.cn/spam/?rl=0&amp;type=3&amp;fuid=1669879400" class="kt">举报</a>.<a href="https://passport.sina.cn/sso/logout?r=https%3A%2F%2Fweibo.cn%2Fpub%2F%3Fvt%3D&amp;entry=mweibo">退出</a></div><div class="c">设置:<a href="https://weibo.cn/account/customize/skin?tf=7_005&amp;st=46d484">皮肤</a>.<a href="https://weibo.cn/account/customize/pic?tf=7_006&amp;st=46d484">图片</a>.<a href="https://weibo.cn/account/customize/pagesize?tf=7_007&amp;st=46d484">条数</a>.<a href="https://weibo.cn/account/privacy/?tf=7_008&amp;st=46d484">隐私</a></div><div class="c">彩版|<a href="https://m.weibo.cn/?tf=7_010">触屏</a>|<a href="https://weibo.cn/page/521?tf=7_011">语音</a></div><div class="b">weibo.cn[06-19 00:47]</div></body></html>


================================================
FILE: tests/testdata/b541fd1751117498b6d6f40d3321686ddf871651237c4ac854a5c3eb.html
================================================
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta http-equiv="Cache-Control" content="no-cache"/><meta id="viewport" name="viewport" content="width=device-width,initial-scale=1.0,minimum-scale=1.0, maximum-scale=2.0" /><link rel="shortcut icon" type="image/x-icon" href="https://weibo.cn/favicon.ico"><link rel="icon" sizes="any" mask href="https://h5.sinaimg.cn/upload/2015/05/15/28/WeiboLogoCh.svg" color="black"><meta name="MobileOptimized" content="240"/><title>专辑:头像相册</title><style type="text/css" id="internalStyle">html,body,p,form,div,table,textarea,input,span,select{font-size:12px;word-wrap:break-word;}body{background:#F8F9F9;color:#000;padding:1px;margin:1px;}table,tr,td{border-width:0px;margin:0px;padding:0px;}form{margin:0px;padding:0px;border:0px;}textarea{border:1px solid #96c1e6}textarea{width:95%;}a,.tl{color:#2a5492;text-decoration:underline;}/*a:link {color:#023298}*/.k{color:#2a5492;text-decoration:underline;}.kt{color:#F00;}.ib{border:1px solid #C1C1C1;}.pm,.pmy{clear:both;background:#ffffff;color:#676566;border:1px solid #b1cee7;padding:3px;margin:2px 1px;overflow:hidden;}.pms{clear:both;background:#c8d9f3;color:#666666;padding:3px;margin:0 1px;overflow:hidden;}.pmst{margin-top: 5px;}.pmsl{clear:both;padding:3px;margin:0 1px;overflow:hidden;}.pmy{background:#DADADA;border:1px solid #F8F8F8;}.t{padding:0px;margin:0px;height:35px;}.b{background:#e3efff;text-align:center;color:#2a5492;clear:both;padding:4px;}.bl{color:#2a5492;}.n{clear:both;background:#436193;color:#FFF;padding:4px; margin: 1px;}.nt{color:#b9e7ff;}.nl{color:#FFF;text-decoration:none;}.nfw{clear:both;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.s{border-bottom:1px dotted #666666;margin:3px;clear:both;}.tip{clear:both; background:#c8d9f3;color:#676566;border:1px 
solid #BACDEB;padding:3px;margin:2px 1px;}.tip2{color:#000000;padding:2px 3px;clear:both;}.ps{clear:both;background:#FFF;color:#676566;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.tm{background:#feffe5;border:1px solid #e6de8d;padding:4px;}.tm a{color:#ba8300;}.tmn{color:#f00}.tk{color:#ffffff}.tc{color:#63676A;}.c{padding:2px 5px;}.c div a img{border:1px solid #C1C1C1;}.ct{color:#9d9d9d;font-style:italic;}.cmt{color:#9d9d9d;}.ctt{color:#000;}.cc{color:#2a5492;}.nk{color:#2a5492;}.por {border: 1px solid #CCCCCC;height:50px;width:50px;}.me{color:#000000;background:#FEDFDF;padding:2px 5px;}.pa{padding:2px 4px;}.nm{margin:10px 5px;padding:2px;}.hm{padding:5px;background:#FFF;color:#63676A;}.u{margin:2px 1px;background:#ffffff;border:1px solid #b1cee7;}.ut{padding:2px 3px;}.cd{text-align:center;}.r{color:#F00;}.g{color:#0F0;}.bn{background: transparent;border: 0 none;text-align: left;padding-left: 0;}</style><script>if(top != self){top.location = self.location;}</script></head><body><div class="tm"><a href="https://weibo.cn/msg/comment/receive?unread=1"><span class="tmn">1</span>评论</a>&nbsp;&nbsp;<a href="https://weibo.cn/msg/clearAllUnread?type=dcm&amp;rl=11"><img src="https://h5.sinaimg.cn/upload/2016/12/30/125/5366.gif" alt="[X]" /></a><br/></div><div class="c" style="padding: 6px 4px;"><a href="/?tf=5_009">首页!</a>|<a href="/msg/?tf=5_010">消息</a>|<a href="/album/166564740000001980768563?rl=1&amp;rand=5759&amp;p=r">刷新</a></div><div style="background-color:#77BBE0;"></div><div style="margin:0px;" class="n"><a href="/album/updates?st=81505d" class="nl">好友</a>|<a href="/album/square" class="nl">美图</a>|<a href="/album/likelist" class="nl">喜欢</a>|<a href="/album/albumlist" class="nl">我的</a></div><div class="tip"><a href="https://weibo.cn/shuangye2012" class="nk">霜叶</a><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/5338.gif" alt="V" /><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/donate_btn_s.png" alt="M"/>&gt;<a 
href="/album/albumlist?fuid=1980768563">他的相册</a>&gt;浏览</div><div class="c">专辑:头像相册</div><div class="c"><a href="/album/166564740000001980768563/photo/44534610716263050000001980768563/detail?page=1&amp;rl=11"><img src="http://wx1.sinaimg.cn/wap180/76102133ly8ga961tpte6j20u00u0q65.jpg" alt='' class="c"/></a><a href="/album/166564740000001980768563/photo/43010955171725440000001980768563/detail?page=2&amp;rl=11"><img src="http://wx2.sinaimg.cn/wap180/76102133ly8fwr33wpn8fj20v90v9tbw.jpg" alt='' class="c"/></a><a href="/album/166564740000001980768563/photo/42882049152386420000001980768563/detail?page=3&amp;rl=11"><img src="http://wx4.sinaimg.cn/wap180/76102133ly8fvlyn5n52gj20v90v949a.jpg" alt='' class="c"/></a><a href="/album/166564740000001980768563/photo/41572947986015870000001980768563/detail?page=4&amp;rl=11"><img src="http://wx2.sinaimg.cn/wap180/76102133ly8fk0btnrn5zj20dp0e8q3t.jpg" alt='' class="c"/></a></div><div class="c"><a href="/album/166564740000001980768563/rt?rl=11">转发</a>&nbsp;<a href="/album/166564740000001980768563/comment?rl=11">评论</a>&nbsp;</div><div class="pm">照片墙|<a href="/album/166564740000001980768563/?DisplayMode=2&amp;rl=1">传统列表</a></div><div class="cd"><a href="#top"><img src="https://h5.sinaimg.cn/upload/2017/04/27/319/5e990ec2.gif" alt="TOP"/></a></div><div class="pms"><a href="https://weibo.cn">首页<span class="tk">!</span></a>.<a href="https://weibo.cn/topic/240489">反馈</a>.<a href="https://weibo.cn/page/91">帮助</a>.<a  href="https://c.weibo.cn"  >客户端</a>.<a href="https://weibo.cn/spam/?rl=11&amp;type=3&amp;fuid=3113276555" class="kt">举报</a>.<a href="https://weibo.cn/logout">退出</a></div><div class="b"><a href="https://beian.miit.gov.cn" target="_blank">京ICP备12002058号-1</a> [08-16 00:54]</div></body></html>

================================================
FILE: tests/testdata/ca5f2a555e8d62f728c66fa90afb2d54d19f8c898e164204a61bdf03.html
================================================
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta http-equiv="Cache-Control" content="no-cache"/><meta id="viewport" name="viewport" content="width=device-width,initial-scale=1.0,minimum-scale=1.0, maximum-scale=2.0" /><link rel="icon" sizes="any" mask href="https://h5.sinaimg.cn/upload/2015/05/15/28/WeiboLogoCh.svg" color="black"><meta name="MobileOptimized" content="240"/><title>Dear-迪丽热巴的资料</title><style type="text/css" id="internalStyle">html,body,p,form,div,table,textarea,input,span,select{font-size:12px;word-wrap:break-word;}body{background:#F8F9F9;color:#000;padding:1px;margin:1px;}table,tr,td{border-width:0px;margin:0px;padding:0px;}form{margin:0px;padding:0px;border:0px;}textarea{border:1px solid #96c1e6}textarea{width:95%;}a,.tl{color:#2a5492;text-decoration:underline;}/*a:link {color:#023298}*/.k{color:#2a5492;text-decoration:underline;}.kt{color:#F00;}.ib{border:1px solid #C1C1C1;}.pm,.pmy{clear:both;background:#ffffff;color:#676566;border:1px solid #b1cee7;padding:3px;margin:2px 1px;overflow:hidden;}.pms{clear:both;background:#c8d9f3;color:#666666;padding:3px;margin:0 1px;overflow:hidden;}.pmst{margin-top: 5px;}.pmsl{clear:both;padding:3px;margin:0 1px;overflow:hidden;}.pmy{background:#DADADA;border:1px solid #F8F8F8;}.t{padding:0px;margin:0px;height:35px;}.b{background:#e3efff;text-align:center;color:#2a5492;clear:both;padding:4px;}.bl{color:#2a5492;}.n{clear:both;background:#436193;color:#FFF;padding:4px; margin: 1px;}.nt{color:#b9e7ff;}.nl{color:#FFF;text-decoration:none;}.nfw{clear:both;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.s{border-bottom:1px dotted #666666;margin:3px;clear:both;}.tip{clear:both; background:#c8d9f3;color:#676566;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.tip2{color:#000000;padding:2px 
3px;clear:both;}.ps{clear:both;background:#FFF;color:#676566;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.tm{background:#feffe5;border:1px solid #e6de8d;padding:4px;}.tm a{color:#ba8300;}.tmn{color:#f00}.tk{color:#ffffff}.tc{color:#63676A;}.c{padding:2px 5px;}.c div a img{border:1px solid #C1C1C1;}.ct{color:#9d9d9d;font-style:italic;}.cmt{color:#9d9d9d;}.ctt{color:#000;}.cc{color:#2a5492;}.nk{color:#2a5492;}.por {border: 1px solid #CCCCCC;height:50px;width:50px;}.me{color:#000000;background:#FEDFDF;padding:2px 5px;}.pa{padding:2px 4px;}.nm{margin:10px 5px;padding:2px;}.hm{padding:5px;background:#FFF;color:#63676A;}.u{margin:2px 1px;background:#ffffff;border:1px solid #b1cee7;}.ut{padding:2px 3px;}.cd{text-align:center;}.r{color:#F00;}.g{color:#0F0;}.bn{background: transparent;border: 0 none;text-align: left;padding-left: 0;}</style></head><body><div class="n" style="padding: 6px 4px;"><a href="https://weibo.cn/?tf=5_009" class="nl">首页<span class="tk">!</span></a>|<a href="https://weibo.cn/msg/?tf=5_010" class="nl">消息</a>|<a href="https://huati.weibo.cn" class="nl">话题</a>|<a href="https://weibo.cn/search/?tf=5_012" class="nl">搜索</a>|<a href="/1669879400/info?rand=4415&amp;p=r" class="nl">刷新</a></div><div class="c tip"><a href="https://m.weibo.cn" id="top" class="tl">手机微博触屏版,点击前往>></a></div><div class="c"><img src="https://tvax1.sinaimg.cn/crop.0.0.1080.1080.180/63885668ly8geyrcrw0zjj20u00u0mz6.jpg?KID=imgbed,tva&Expires=1592509636&ssig=qGABB3EqzX" alt="头像" /></div><div class="c">会员等级:7级&nbsp;<a href="/member/present/comfirmTime?uid=1669879400">送Ta会员</a><br/><img src="http://img.t.sinajs.cn/t4/style/images/medal/433_s.gif?version=" alt="微身份" />&nbsp;<img src="http://img.t.sinajs.cn/t4/style/images/medal/98_s.gif?version=" alt="语惊四座" />&nbsp;<img src="http://img.t.sinajs.cn/t4/style/images/medal/1_s.gif?version=" alt="七步成诗" />&nbsp;<img src="http://img.t.sinajs.cn/t4/style/images/medal/8_s.gif?version=" alt="谈笑风生" />&nbsp;<a 
href="/medal/owned?uid=1669879400">更多勋章</a></div><div class="tip">基本信息</div><div class="c">昵称:Dear-迪丽热巴<br/>认证:嘉行传媒签约演员 <br/>性别:女<br/>地区:上海<br/>生日:双子座<br/>认证信息:嘉行传媒签约演员 <br/>简介:一只喜欢默默表演的小透明。工作联系jaywalk@jaywalk.com.cn 🍒<br/></div><div class="tip">学习经历</div><div class="c">·上海戏剧学院<br/></div><div class="tip">工作经历</div><div class="c">·嘉行传媒&nbsp;<br/></div><div class="tip">其他信息</div><div class="c">互联网:http://weibo.com/u/1669879400<br/>手机版:https://weibo.cn/u/1669879400<br/><a href="/album/albumlist?fuid=1669879400">她的相册&gt;&gt;</a></div><div class="cd"><a href="#top"><img src="https://h5.sinaimg.cn/upload/2017/04/27/319/5e990ec2.gif" alt="TOP"/></a></div><div class="pms"><a href="https://weibo.cn">首页<span class="tk">!</span></a>.<a href="https://weibo.cn/topic/240489">反馈</a>.<a href="https://weibo.cn/page/91">帮助</a>.<a  href="http://down.sina.cn/weibo/default/index/soft_id/1/mid/0"  >客户端</a>.<a href="https://weibo.cn/spam/?rl=1&amp;type=3&amp;fuid=1669879400" class="kt">举报</a>.<a href="https://passport.sina.cn/sso/logout?r=https%3A%2F%2Fweibo.cn%2Fpub%2F%3Fvt%3D&amp;entry=mweibo">退出</a></div><div class="c">设置:<a href="https://weibo.cn/account/customize/skin?tf=7_005&amp;st=46d484">皮肤</a>.<a href="https://weibo.cn/account/customize/pic?tf=7_006&amp;st=46d484">图片</a>.<a href="https://weibo.cn/account/customize/pagesize?tf=7_007&amp;st=46d484">条数</a>.<a href="https://weibo.cn/account/privacy/?tf=7_008&amp;st=46d484">隐私</a></div><div class="c">彩版|<a href="https://m.weibo.cn/?tf=7_010">触屏</a>|<a href="https://weibo.cn/page/521?tf=7_011">语音</a></div><div class="b">weibo.cn[06-19 00:47]</div></body></html>


================================================
FILE: tests/testdata/d486235d4a17dd0accb0f2cc77b3648abfa03580b9e0cdb61f1e618f.html
================================================
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta http-equiv="Cache-Control" content="no-cache"/><meta id="viewport" name="viewport" content="width=device-width,initial-scale=1.0,minimum-scale=1.0, maximum-scale=2.0" /><link rel="icon" sizes="any" mask href="https://h5.sinaimg.cn/upload/2015/05/15/28/WeiboLogoCh.svg" color="black"><meta name="MobileOptimized" content="240"/><title>Dear-迪丽热巴的微博</title><style type="text/css" id="internalStyle">html,body,p,form,div,table,textarea,input,span,select{font-size:12px;word-wrap:break-word;}body{background:#F8F9F9;color:#000;padding:1px;margin:1px;}table,tr,td{border-width:0px;margin:0px;padding:0px;}form{margin:0px;padding:0px;border:0px;}textarea{border:1px solid #96c1e6}textarea{width:95%;}a,.tl{color:#2a5492;text-decoration:underline;}/*a:link {color:#023298}*/.k{color:#2a5492;text-decoration:underline;}.kt{color:#F00;}.ib{border:1px solid #C1C1C1;}.pm,.pmy{clear:both;background:#ffffff;color:#676566;border:1px solid #b1cee7;padding:3px;margin:2px 1px;overflow:hidden;}.pms{clear:both;background:#c8d9f3;color:#666666;padding:3px;margin:0 1px;overflow:hidden;}.pmst{margin-top: 5px;}.pmsl{clear:both;padding:3px;margin:0 1px;overflow:hidden;}.pmy{background:#DADADA;border:1px solid #F8F8F8;}.t{padding:0px;margin:0px;height:35px;}.b{background:#e3efff;text-align:center;color:#2a5492;clear:both;padding:4px;}.bl{color:#2a5492;}.n{clear:both;background:#436193;color:#FFF;padding:4px; margin: 1px;}.nt{color:#b9e7ff;}.nl{color:#FFF;text-decoration:none;}.nfw{clear:both;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.s{border-bottom:1px dotted #666666;margin:3px;clear:both;}.tip{clear:both; background:#c8d9f3;color:#676566;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.tip2{color:#000000;padding:2px 
3px;clear:both;}.ps{clear:both;background:#FFF;color:#676566;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.tm{background:#feffe5;border:1px solid #e6de8d;padding:4px;}.tm a{color:#ba8300;}.tmn{color:#f00}.tk{color:#ffffff}.tc{color:#63676A;}.c{padding:2px 5px;}.c div a img{border:1px solid #C1C1C1;}.ct{color:#9d9d9d;font-style:italic;}.cmt{color:#9d9d9d;}.ctt{color:#000;}.cc{color:#2a5492;}.nk{color:#2a5492;}.por {border: 1px solid #CCCCCC;height:50px;width:50px;}.me{color:#000000;background:#FEDFDF;padding:2px 5px;}.pa{padding:2px 4px;}.nm{margin:10px 5px;padding:2px;}.hm{padding:5px;background:#FFF;color:#63676A;}.u{margin:2px 1px;background:#ffffff;border:1px solid #b1cee7;}.ut{padding:2px 3px;}.cd{text-align:center;}.r{color:#F00;}.g{color:#0F0;}.bn{background: transparent;border: 0 none;text-align: left;padding-left: 0;}</style></head><body><div class="n" style="padding: 6px 4px;"><a href="https://weibo.cn/?tf=5_009" class="nl">首页<span class="tk">!</span></a>|<a href="https://weibo.cn/msg/?tf=5_010" class="nl">消息</a>|<a href="https://huati.weibo.cn" class="nl">话题</a>|<a href="https://weibo.cn/search/?tf=5_012" class="nl">搜索</a>|<a href="/1669879400?page=3&amp;rand=7758&amp;p=r" class="nl">刷新</a></div><div class="c tip"><a href="https://m.weibo.cn" id="top" class="tl">手机微博触屏版,点击前往>></a></div><div class="u"><div class="ut">Dear-迪丽热巴的微博&nbsp;<a href="/attention/add?uid=1669879400&amp;rl=0&amp;st=46d484">加关注</a></div><div class="tip2"><span class="tc">微博[1159]</span>&nbsp;<a href="/1669879400/follow">关注[253]</a>&nbsp;<a href="/1669879400/fans">粉丝[70805575]</a>&nbsp;<a href="/attgroup/opening?uid=1669879400">分组[1]</a>&nbsp;<a href="/at/weibo?uid=1669879400">@她的</a></div></div><div class="pmst"><span class="pms">&nbsp;微博&nbsp;</span><span class="pmsl">&nbsp;<a href="/1669879400/photo?tf=6_008">相册</a>&nbsp;</span></div><div class="pms" >全部-<a href="/1669879400?filter=1">原创</a>-<a href="/1669879400?filter=2">图片</a>-<a 
href="/attgroup/opening?uid=1669879400">分组</a>-<a href="/1669879400/search?f=u&amp;rl=0">筛选</a></div><div class="c" id="M_J3xfm61AZ"><div><span class="ctt">粉色天空、闪耀夜色、浪漫爱意…我把我喜爱的元素和巴黎限定记忆全部定格在这一瓶<a href="https://weibo.cn/pages/100808topic?extparam=YSL%E5%8F%8D%E8%BD%AC%E5%B7%B4%E9%BB%8E&amp;from=feed">#YSL反转巴黎#</a>热爱限定中,第一次与YSL一起合作设计香水,在这<a href="https://weibo.cn/pages/100808topic?extparam=%E6%8B%A6%E4%B8%8D%E4%BD%8F%E7%9A%84%E5%A4%8F%E5%A4%A9&amp;from=feed">#拦不住的夏天#</a>把甜甜的曼陀罗花香送给你们,喜欢吗?💓 </span>&nbsp;[<a href="https://weibo.cn/mblog/picAll/J3xfm61AZ?rl=1">组图共2张</a>]</div><div><a href="https://weibo.cn/mblog/pic/J3xfm61AZ?rl=0"><img src="http://wx4.sinaimg.cn/wap180/63885668ly1gf4iwe1dlwj24as2a1b2d.jpg" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J3xfm61AZ&amp;u=63885668ly1gf4iwe1dlwj24as2a1b2d">原图</a>&nbsp;<br /><a href="https://weibo.cn/attitude/J3xfm61AZ/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[692092]</a>&nbsp;<a href="https://weibo.cn/repost/J3xfm61AZ?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J3xfm61AZ?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J3xfm61AZ?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">05月25日 11:13</span></div></div><div class="s"></div><div class="c" id="M_J2CpmFpJ2"><div><span class="ctt"><a href="https://weibo.cn/pages/100808topic?extparam=%E5%B9%B8%E7%A6%8F%E8%A7%A6%E6%89%8B%E5%8F%AF%E5%8F%8A%E5%BC%80%E6%92%AD&amp;from=feed">#幸福触手可及开播#</a><a href="https://weibo.cn/pages/100808topic?extparam=%E5%B9%B8%E7%A6%8F%E8%A7%A6%E6%89%8B%E5%8F%AF%E5%8F%8A&amp;from=feed">#幸福触手可及#</a> 度量自身,方能修炼精彩人生。追梦不易,披荆斩棘。今晚八点<a href="/n/%E6%B9%96%E5%8D%97%E5%8D%AB%E8%A7%86">@湖南卫视</a> 和周放,一起守护梦想,书写初夏。 </span></div><div><a href="https://weibo.cn/mblog/pic/J2CpmFpJ2?rl=0"><img src="http://wx1.sinaimg.cn/wap180/63885668gy1gexjyzy4jbj227d3344qr.jpg" alt="图片" class="ib" /></a>&nbsp;<a 
href="https://weibo.cn/mblog/oripic?id=J2CpmFpJ2&amp;u=63885668gy1gexjyzy4jbj227d3344qr">原图</a>&nbsp;<br /><a href="https://weibo.cn/attitude/J2CpmFpJ2/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[537887]</a>&nbsp;<a href="https://weibo.cn/repost/J2CpmFpJ2?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J2CpmFpJ2?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[101799]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J2CpmFpJ2?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">05月19日 10:31</span></div></div><div class="s"></div><div class="c" id="M_J2BPExOpw"><div><span class="ctt">很高兴成为力士大中华区沐浴系列代言人,520就要到啦,大家快来接收告白福利哦!全新植萃泡泡沐浴露让每一位小仙女都能在浓密泡泡浴中拥有夏日嫩白肌,仙气香气都十足!关注<a href="/n/LUX%E5%8A%9B%E5%A3%AB">@LUX力士</a> 第一时间锁定新品哦!<a href="https://m.weibo.cn/s/video/show?object_id=1034:4506086666076198&amp;fromWap=1">LUX力士的微博视频</a> </span>&nbsp;<br /><a href="https://weibo.cn/attitude/J2BPExOpw/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[741402]</a>&nbsp;<a href="https://weibo.cn/repost/J2BPExOpw?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J2BPExOpw?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J2BPExOpw?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">05月19日 09:03</span></div></div><div class="s"></div><div class="c" id="M_J24zxbSsA"><div><span class="cmt">转发了&nbsp;<a href="https://weibo.cn/u/7288464772">电视剧幸福触手可及</a><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/5337.gif" alt="V"/><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/donate_btn_s.png" alt="M"/>&nbsp;的微博:</span><span class="ctt"><a href="https://weibo.cn/pages/100808topic?extparam=%E5%B9%B8%E7%A6%8F%E8%A7%A6%E6%89%8B%E5%8F%AF%E5%8F%8A&amp;from=feed">#幸福触手可及#</a><a href="https://weibo.cn/pages/100808topic?extparam=%E5%B9%B8%E7%A6%8F%E8%A7%A6%E6%89%8B%E5%8F%AF%E5%8F%8A%E5%AE%9A%E6%A1%A30519&amp;from=feed">#幸福触手可及定档0519#</a> 从没有一个时刻,幸福如此靠近,只因有你在身边<img alt="[心]" 
src="//h5.sinaimg.cn/m/emoticon/icon/others/l_xin-6912791858.png" style="width:1em; height:1em;" /></span><a href="https://weibo.cn/sinaurl?f=w&amp;u=http%3A%2F%2Ft.cn%2FA6AeIVWe&amp;ep=J24zxbSsA%2C1669879400%2CJ24pYpDpY%2C7288464772">5月19日20:00锁定@湖南卫视</a>  金鹰独播剧场,<a href="/n/%E4%BC%98%E9%85%B7">@优酷</a> <a href="/n/%E7%88%B1%E5%A5%87%E8%89%BA">@爱奇艺</a> <a href="/n/%E8%85%BE%E8%AE%AF%E8%A7%86%E9%A2%91">@腾讯视频</a> 24点同步更新,等你解锁初夏甜梦! </span></div><div><a href="https://weibo.cn/mblog/pic/J24pYpDpY?rl=0"><img src="http://wx4.sinaimg.cn/wap180/007XfEDqly1getehjsrm4j33c41o0hdv.jpg" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J24pYpDpY&amp;u=007XfEDqly1getehjsrm4j33c41o0hdv">原图</a>&nbsp;<span class="cmt">赞[129675]</span>&nbsp;<span class="cmt">原文转发[332651]</span>&nbsp;<a href="https://weibo.cn/comment/J24pYpDpY?rl=0#cmtfrm" class="cc">原文评论[6900]</a><!----></div><div><span class="cmt">转发理由:</span><a href="https://weibo.cn/pages/100808topic?extparam=%E5%B9%B8%E7%A6%8F%E8%A7%A6%E6%89%8B%E5%8F%AF%E5%8F%8A%E5%AE%9A%E6%A1%A30519&amp;from=feed">#幸福触手可及定档0519#</a> 唯有热爱,不负韶华,为之全力以赴,才能成为更优秀的人。<a href="https://weibo.cn/sinaurl?f=w&amp;u=http%3A%2F%2Ft.cn%2FA6AeJN3R&amp;ep=J24zxbSsA%2C1669879400%2CJ24zxbSsA%2C1669879400">5月19日20:00锁定湖南卫视#幸福触手可及#</a>  ,愈挫愈勇的独立设计师周放来啦。&nbsp;&nbsp;<br /><a href="https://weibo.cn/attitude/J24zxbSsA/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[480418]</a>&nbsp;<a href="https://weibo.cn/repost/J24zxbSsA?uid=1669879400&amp;rl=0">转发[57012]</a>&nbsp;<a href="https://weibo.cn/comment/J24zxbSsA?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[40966]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J24zxbSsA?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">05月15日 20:23</span></div></div><div class="s"></div><div class="c" id="M_J20Muott9"><div><span class="ctt">哈哈哈哈哈哈👅 </span></div><div><a href="https://weibo.cn/mblog/pic/J20Muott9?rl=0"><img src="http://wx4.sinaimg.cn/wap180/63885668ly1gesxtznhuxj21o0280b2a.jpg" alt="图片" class="ib" 
/></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J20Muott9&amp;u=63885668ly1gesxtznhuxj21o0280b2a">原图</a>&nbsp;<br /><a href="https://weibo.cn/attitude/J20Muott9/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[1136574]</a>&nbsp;<a href="https://weibo.cn/repost/J20Muott9?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J20Muott9?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J20Muott9?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">05月15日 10:44</span></div></div><div class="s"></div><div class="c" id="M_J20CL0m1Z"><div><span class="cmt">转发了&nbsp;<a href="https://weibo.cn/u/5980037952">北京2022年冬奥会</a><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/5337.gif" alt="V"/><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/donate_btn_s.png" alt="M"/>&nbsp;的微博:</span><span class="ctt">【爱豆喊你来助力<a href="https://weibo.cn/pages/100808topic?extparam=%E5%8C%97%E4%BA%AC2022&amp;from=feed">#北京2022#</a>】<br/>花样滑冰,旋转跳跃 ,“迪丽”前行 <a href="/n/Dear-%E8%BF%AA%E4%B8%BD%E7%83%AD%E5%B7%B4">@Dear-迪丽热巴</a>  <a href="https://m.weibo.cn/s/video/show?object_id=1034:4504796736978981&amp;fromWap=1">北京2022年冬奥会的微博视频</a> </span>&nbsp;<span class="cmt">赞[680201]</span>&nbsp;<span class="cmt">原文转发[1450782]</span>&nbsp;<a href="https://weibo.cn/comment/J20ur0QeN?rl=0#cmtfrm" class="cc">原文评论[50694]</a><!----></div><div><span class="cmt">转发理由:</span>与我一起,关注花样滑冰,为中国健儿鼓劲加油[加油]&nbsp;&nbsp;<br /><a href="https://weibo.cn/attitude/J20CL0m1Z/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[501777]</a>&nbsp;<a href="https://weibo.cn/repost/J20CL0m1Z?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J20CL0m1Z?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J20CL0m1Z?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">05月15日 10:20</span></div></div><div class="s"></div><div class="c" id="M_J1kt05kM4"><div><span class="cmt">转发了&nbsp;<a 
href="https://weibo.cn/yangshiwangnet">央视网</a><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/5337.gif" alt="V"/><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/donate_btn_s.png" alt="M"/>&nbsp;的微博:</span><span class="ctt">【想看看战疫一线医护人员们的脸!<a href="https://weibo.cn/pages/100808topic?extparam=%E6%9E%81%E9%99%90%E6%8C%91%E6%88%98%E8%87%B4%E6%95%AC%E5%8C%BB%E6%8A%A4%E4%BA%BA%E5%91%98&amp;from=feed">#极限挑战致敬医护人员#</a>】脱下防疫服,援鄂人员们原来是这个模样。八位医护人员集体分享支援一线的故事,是他们为后方的我们竖起了最坚实的屏障,感谢这群医护天使的负重前行,致敬!<a href="/n/%E5%A4%AE%E8%A7%86%E7%BD%91%E9%9D%92%E5%B9%B4">@央视网青年</a> <a href="/n/%E9%9B%B7%E4%BD%B3%E9%9F%B3">@雷佳音</a> <a href="/n/%E5%B2%B3%E4%BA%91%E9%B9%8F">@岳云鹏</a> <a href="/n/%E6%BC%94%E5%91%98%E7%8E%8B%E8%BF%85">@演员王迅</a> <a href="/n/%E8%B4%BE%E4%B9%83%E4%BA%AE">@贾乃亮</a> <a href="/n/%E5%8A%AA%E5%8A%9B%E5%8A%AA%E5%8A%9B%E5%86%8D%E5%8A%AA%E5%8A%9Bx">@努力努力再努力x</a> <a href="/n/Dear-%E8%BF%AA%E4%B8%BD%E7%83%AD%E5%B7%B4">@Dear-迪丽热巴</a>...<a href='/comment/J1jyI3vZq?ckAll=1'>全文</a></span>&nbsp;<span class="cmt">赞[364004]</span>&nbsp;<span class="cmt">原文转发[1056354]</span>&nbsp;<a href="https://weibo.cn/comment/J1jyI3vZq?rl=0#cmtfrm" class="cc">原文评论[3645]</a><!----></div><div><span class="cmt">转发理由:</span><a href="https://weibo.cn/pages/100808topic?extparam=%E6%9E%81%E9%99%90%E6%8C%91%E6%88%98&amp;from=feed">#极限挑战#</a> 感谢你们的守护,最美的逆行者们<img alt="[心]" src="//h5.sinaimg.cn/m/emoticon/icon/others/l_xin-6912791858.png" style="width:1em; height:1em;" /></span>&nbsp;&nbsp;<br /><a href="https://weibo.cn/attitude/J1kt05kM4/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[571256]</a>&nbsp;<a href="https://weibo.cn/repost/J1kt05kM4?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J1kt05kM4?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[362127]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J1kt05kM4?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">05月10日 23:01</span></div></div><div class="s"></div><div class="c" id="M_J1i2D3wbq"><div><span class="ctt"><a 
href="https://weibo.cn/pages/100808topic?extparam=%E6%9E%81%E9%99%90%E6%8C%91%E6%88%98&amp;from=feed">#极限挑战#</a> 无奖填词竞答,今晚看👉登峰造_,不可_量,百里_一,南征北_~ <a href="https://m.weibo.cn/s/video/show?object_id=1034:4503076439261200&amp;fromWap=1">Dear-迪丽热巴的微博视频</a> </span>&nbsp;<br /><a href="https://weibo.cn/attitude/J1i2D3wbq/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[731516]</a>&nbsp;<a href="https://weibo.cn/repost/J1i2D3wbq?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J1i2D3wbq?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J1i2D3wbq?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">05月10日 16:50</span></div></div><div class="s"></div><div class="c" id="M_J04yojRft"><div><span class="cmt">转发了&nbsp;<a href="https://weibo.cn/chinayouthdaily">中国青年报</a><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/5337.gif" alt="V"/><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/donate_btn_s.png" alt="M"/>&nbsp;的微博:</span><span class="ctt"><a href="https://weibo.cn/pages/100808topic?extparam=%E4%BA%94%E5%9B%9B%E8%87%B4%E6%95%AC%E6%88%98%E7%96%AB%E9%9D%92%E5%B9%B4&amp;from=feed">#五四致敬战疫青年#</a> <a href="https://weibo.cn/pages/100808topic?extparam=%E9%9D%92%E6%98%A5%E4%B8%87%E5%B2%81&amp;from=feed">#青春万岁#</a>各地应急响应级别陆续下调,我们正在走向痊愈。回望这些年轻医务人员的脸,不应忘记,正是他们在危难之下,白衣执甲,毅然逆行,为我们筑起血肉长城。感恩提灯天使,致敬最可爱的人!春暖花开,等到疫情完全解除,无论你是从医还是就医,请记住医患之间的休戚与共、唇齿...<a href='/comment/J01SMgPdf?ckAll=1'>全文</a></span>&nbsp;[<a href="https://weibo.cn/mblog/picAll/J01SMgPdf?rl=1">组图共12张</a>]</div><div><a href="https://weibo.cn/mblog/pic/J01SMgPdf?rl=0"><img src="http://wx4.sinaimg.cn/wap180/66eeadffly1gedvuinbs6j20u01dtb2a.jpg" alt="图片" class="ib" /></a>&nbsp;<a href="https://weibo.cn/mblog/oripic?id=J01SMgPdf&amp;u=66eeadffly1gedvuinbs6j20u01dtb2a">原图</a>&nbsp;<span class="cmt">赞[32125]</span>&nbsp;<span class="cmt">原文转发[4801631]</span>&nbsp;<a href="https://weibo.cn/comment/J01SMgPdf?rl=0#cmtfrm" 
class="cc">原文评论[6975]</a><!----></div><div><span class="cmt">转发理由:</span><a href="https://weibo.cn/pages/100808topic?extparam=%E4%BA%94%E5%9B%9B%E8%87%B4%E6%95%AC%E6%88%98%E7%96%AB%E9%9D%92%E5%B9%B4&amp;from=feed">#五四致敬战疫青年#</a>五四青年节前夕,让我们说一声,<a href="https://weibo.cn/pages/100808topic?extparam=%E8%B0%A2%E8%B0%A2%E4%BD%A0%E4%BF%9D%E6%8A%A4%E4%BA%86%E6%88%91%E4%BB%AC&amp;from=feed">#谢谢你保护了我们#</a>!&nbsp;&nbsp;<br /><a href="https://weibo.cn/attitude/J04yojRft/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[721484]</a>&nbsp;<a href="https://weibo.cn/repost/J04yojRft?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/J04yojRft?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[597487]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/J04yojRft?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">05月02日 16:40</span></div></div><div class="s"></div><div class="c" id="M_IFHwwaVNL"><div><span class="cmt">转发了&nbsp;<a href="https://weibo.cn/dfwsjxtz">东方卫视极限挑战</a><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/5337.gif" alt="V"/><img src="https://h5.sinaimg.cn/upload/2016/05/26/319/donate_btn_s.png" alt="M"/>&nbsp;的微博:</span><span class="ctt">鸡条君目睹了vivo<a href="https://weibo.cn/pages/100808topic?extparam=%E6%9E%81%E9%99%90%E6%8C%91%E6%88%98&amp;from=feed">#极限挑战#</a>第六季首发阵容<a href="/n/%E9%9B%B7%E4%BD%B3%E9%9F%B3">@雷佳音</a> <a href="/n/%E5%B2%B3%E4%BA%91%E9%B9%8F">@岳云鹏</a> <a href="/n/%E6%BC%94%E5%91%98%E7%8E%8B%E8%BF%85">@演员王迅</a> <a href="/n/%E8%B4%BE%E4%B9%83%E4%BA%AE">@贾乃亮</a> <a href="/n/%E5%8A%AA%E5%8A%9B%E5%8A%AA%E5%8A%9B%E5%86%8D%E5%8A%AA%E5%8A%9Bx">@努力努力再努力x</a> <a href="/n/Dear-%E8%BF%AA%E4%B8%BD%E7%83%AD%E5%B7%B4">@Dear-迪丽热巴</a> <a href="/n/%E9%83%AD%E4%BA%AC%E9%A3%9E">@郭京飞</a> <a href="/n/%E9%82%93%E4%BC%A6">@邓伦</a> 集结的整个过程,这就是欢迎新人的方式<img alt="[疑问]" src="//h5.sinaimg.cn/m/emoticon/icon/default/d_yiwen-40a816d206.png" style="width:1em; height:1em;" /></span>说好要相亲相爱的呢😂<a 
href="https://m.weibo.cn/s/video/show?object_id=1034:4499239728775175&amp;fromWap=1">东方卫视极限挑战的微博视频</a> </span>&nbsp;<span class="cmt">赞[711013]</span>&nbsp;<span class="cmt">原文转发[1409505]</span>&nbsp;<a href="https://weibo.cn/comment/IFGJv3G00?rl=0#cmtfrm" class="cc">原文评论[14761]</a><!----></div><div><span class="cmt">转发理由:</span><a href="https://weibo.cn/pages/100808topic?extparam=%E6%9E%81%E9%99%90%E6%8C%91%E6%88%98&amp;from=feed">#极限挑战#</a>举手之劳,岳岳哥别客气!//<a href="/n/%E5%B2%B3%E4%BA%91%E9%B9%8F">@岳云鹏</a>:<a href="https://weibo.cn/pages/100808topic?extparam=%E6%9E%81%E9%99%90%E6%8C%91%E6%88%98&amp;from=feed">#极限挑战#</a>谢谢热巴<a href="/n/Dear-%E8%BF%AA%E4%B8%BD%E7%83%AD%E5%B7%B4">@Dear-迪丽热巴</a> 给我p图,我这里还有好多库存 <a href="https://ww4.sinaimg.cn/large/68687195gy1gebo93u29jj20m80dnqbr.jpg">查看图片</a>&nbsp;&nbsp;<br /><a href="https://weibo.cn/attitude/IFHwwaVNL/add?uid=3113276555&amp;rl=0&amp;st=46d484">赞[983376]</a>&nbsp;<a href="https://weibo.cn/repost/IFHwwaVNL?uid=1669879400&amp;rl=0">转发[1000000]</a>&nbsp;<a href="https://weibo.cn/comment/IFHwwaVNL?uid=1669879400&amp;rl=0#cmtfrm" class="cc">评论[1000000]</a>&nbsp;<a href="https://weibo.cn/fav/addFav/IFHwwaVNL?rl=0&amp;st=46d484">收藏</a><!---->&nbsp;<span class="ct">04月30日 12:30</span></div></div><div class="s"></div><div class="pa" id="pagelist"><form action="/1669879400" method="post"><div><a href="/1669879400?page=4">下页</a>&nbsp;<a href="/1669879400?page=2">上页</a>&nbsp;<a href="/1669879400">首页</a>&nbsp;<input name="mp" type="hidden" value="117" /><input type="text" name="page" size="2" style='-wap-input-format: "*N"' /><input type="submit" value="跳页" />&nbsp;3/117页</div></form></div><div class="pm"><form action="/search/" method="post"><div><input type="text" name="keyword" value="" size="15" /><input type="submit" name="smblog" value="搜微博" /><input type="submit" name="suser" value="找人" /><br/><span class="pmf"><a 
href="/search/mblog/?keyword=%E7%A7%91%E6%AF%94%E5%9D%A0%E6%9C%BA%E7%B3%BB%E5%9B%A0%E9%A3%9E%E8%A1%8C%E5%91%98%E8%BF%B7%E5%A4%B1%E6%96%B9%E5%90%91&amp;rl=0" class="k">科比坠机系因飞行员迷失方向</a>&nbsp;<a href="/search/mblog/?keyword=RNG%E6%88%98%E8%83%9CJDG&amp;rl=0" class="k">RNG战胜JDG</a>&nbsp;<a href="/search/mblog/?keyword=%E7%A7%A6%E6%98%8A%E5%9B%A0%E5%A5%B3%E5%84%BF%E8%A2%AB%E6%AC%BA%E8%B4%9F%E8%90%BD%E6%B3%AA&amp;rl=0" class="k">秦昊因女儿被欺负落泪</a>&nbsp;<a href="/search/mblog/?keyword=%E9%82%93%E4%BC%A6%E6%9D%8E%E4%BD%B3%E7%90%A6%E7%9B%B4%E6%92%AD&amp;rl=0" class="k">邓伦李佳琦直播</a>&nbsp;<a href="/search/mblog/?keyword=%E4%B8%BA%E4%BB%9D%E5%8D%93%E5%8A%9E%E7%90%86%E8%99%9A%E5%81%87%E8%BD%AC%E5%AD%A6%E6%89%8B%E7%BB%AD6%E4%BA%BA%E8%A2%AB%E5%A4%84%E7%90%86&amp;rl=0" class="k">为仝卓办理虚假转学手续6人被处理</a></span></div></form></div><div class="cd"><a href="#top"><img src="https://h5.sinaimg.cn/upload/2017/04/27/319/5e990ec2.gif" alt="TOP"/></a></div><div class="pms"><a href="https://weibo.cn">首页<span class="tk">!</span></a>.<a href="https://weibo.cn/topic/240489">反馈</a>.<a href="https://weibo.cn/page/91">帮助</a>.<a  href="http://down.sina.cn/weibo/default/index/soft_id/1/mid/0"  >客户端</a>.<a href="https://weibo.cn/spam/?rl=0&amp;type=3&amp;fuid=1669879400" class="kt">举报</a>.<a href="https://passport.sina.cn/sso/logout?r=https%3A%2F%2Fweibo.cn%2Fpub%2F%3Fvt%3D&amp;entry=mweibo">退出</a></div><div class="c">设置:<a href="https://weibo.cn/account/customize/skin?tf=7_005&amp;st=46d484">皮肤</a>.<a href="https://weibo.cn/account/customize/pic?tf=7_006&amp;st=46d484">图片</a>.<a href="https://weibo.cn/account/customize/pagesize?tf=7_007&amp;st=46d484">条数</a>.<a href="https://weibo.cn/account/privacy/?tf=7_008&amp;st=46d484">隐私</a></div><div class="c">彩版|<a href="https://m.weibo.cn/?tf=7_010">触屏</a>|<a href="https://weibo.cn/page/521?tf=7_011">语音</a></div><div class="b">weibo.cn[06-19 00:47]</div></body></html>


================================================
FILE: tests/testdata/e4d541ecb02253c14abc1d52605fc00d91279df9ac4c1465c85b91b3.html
================================================
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta http-equiv="Cache-Control" content="no-cache"/><meta id="viewport" name="viewport" content="width=device-width,initial-scale=1.0,minimum-scale=1.0, maximum-scale=2.0" /><link rel="shortcut icon" type="image/x-icon" href="https://weibo.cn/favicon.ico"><link rel="icon" sizes="any" mask href="https://h5.sinaimg.cn/upload/2015/05/15/28/WeiboLogoCh.svg" color="black"><meta name="MobileOptimized" content="240"/><title>微博</title><style type="text/css" id="internalStyle">html,body,p,form,div,table,textarea,input,span,select{font-size:12px;word-wrap:break-word;}body{background:#F8F9F9;color:#000;padding:1px;margin:1px;}table,tr,td{border-width:0px;margin:0px;padding:0px;}form{margin:0px;padding:0px;border:0px;}textarea{border:1px solid #96c1e6}textarea{width:95%;}a,.tl{color:#2a5492;text-decoration:underline;}/*a:link {color:#023298}*/.k{color:#2a5492;text-decoration:underline;}.kt{color:#F00;}.ib{border:1px solid #C1C1C1;}.pm,.pmy{clear:both;background:#ffffff;color:#676566;border:1px solid #b1cee7;padding:3px;margin:2px 1px;overflow:hidden;}.pms{clear:both;background:#c8d9f3;color:#666666;padding:3px;margin:0 1px;overflow:hidden;}.pmst{margin-top: 5px;}.pmsl{clear:both;padding:3px;margin:0 1px;overflow:hidden;}.pmy{background:#DADADA;border:1px solid #F8F8F8;}.t{padding:0px;margin:0px;height:35px;}.b{background:#e3efff;text-align:center;color:#2a5492;clear:both;padding:4px;}.bl{color:#2a5492;}.n{clear:both;background:#436193;color:#FFF;padding:4px; margin: 1px;}.nt{color:#b9e7ff;}.nl{color:#FFF;text-decoration:none;}.nfw{clear:both;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.s{border-bottom:1px dotted #666666;margin:3px;clear:both;}.tip{clear:both; background:#c8d9f3;color:#676566;border:1px 
solid #BACDEB;padding:3px;margin:2px 1px;}.tip2{color:#000000;padding:2px 3px;clear:both;}.ps{clear:both;background:#FFF;color:#676566;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.tm{background:#feffe5;border:1px solid #e6de8d;padding:4px;}.tm a{color:#ba8300;}.tmn{color:#f00}.tk{color:#ffffff}.tc{color:#63676A;}.c{padding:2px 5px;}.c div a img{border:1px solid #C1C1C1;}.ct{color:#9d9d9d;font-style:italic;}.cmt{color:#9d9d9d;}.ctt{color:#000;}.cc{color:#2a5492;}.nk{color:#2a5492;}.por {border: 1px solid #CCCCCC;height:50px;width:50px;}.me{color:#000000;background:#FEDFDF;padding:2px 5px;}.pa{padding:2px 4px;}.nm{margin:10px 5px;padding:2px;}.hm{padding:5px;background:#FFF;color:#63676A;}.u{margin:2px 1px;background:#ffffff;border:1px solid #b1cee7;}.ut{padding:2px 3px;}.cd{text-align:center;}.r{color:#F00;}.g{color:#0F0;}.bn{background: transparent;border: 0 none;text-align: left;padding-left: 0;}</style><script>if(top != self){top.location = self.location;}</script></head><body><div class="n" style="padding: 6px 4px;"><a href="https://weibo.cn/?tf=5_009" class="nl">首页<span class="tk">!</span></a>|<a href="https://weibo.cn/msg/?tf=5_010" class="nl">消息</a>|<a href="/1980768563/photo?tf=6_008&amp;rand=6508&amp;p=r" class="nl">刷新</a></div><div class="c tip"><a href="https://m.weibo.cn" id="top" class="tl">手机微博触屏版,点击前往>></a></div><div class="c">霜叶的相册</div><div class="s"></div><div class="pmst"><span class="pmsl">&nbsp;<a href="/shuangye2012">微博</a>&nbsp;</span><span class="pms">&nbsp;相册&nbsp;</span></div><div class="pms" style="margin: 0;padding: 0;line-height: 3px;">&nbsp;</div><div class="s"></div><div class="c"><table><tr><td><a href="/album/albummblog?fuid=1980768563"><img width="80" height="80" src="//img.t.sinajs.cn/t5/style/images/staticlogo/groups1_3.png?version=74d5a0aee49e3f11" alt="微博配图"/></a></td><td><div class="c"><a href="/album/albummblog?fuid=1980768563">微博配图</a></div></td></tr></table></div><div class="s"></div><div class="c"><table><tr><td><a 
href="/album/34589831934400230000001980768563?rl=1"><img width="80" height="80" src="http://ss1.sinaimg.cn/wap180/&690" alt='默认专辑'/></a></td><td><div class="c"><a href="/album/34589831934400230000001980768563?rl=1">默认专辑(3张)</a></div></td></tr></table></div><div class="s"></div><div class="c"><table><tr><td><a href="/album/166564740000001980768563?rl=1"><img width="80" height="80" src="https://tvax1.sinaimg.cn/crop.0.0.1080.1080.180/76102133ly8ga961tpte6j20u00u0q65.jpg?KID=imgbed,tva&Expires=1629140012&ssig=fJbL8N5deV" alt='头像相册'/></a></td><td><div class="c"><a href="/album/166564740000001980768563?rl=1">头像相册(4张)</a></div></td></tr></table></div><div class="cd"><a href="#top"><img src="https://h5.sinaimg.cn/upload/2017/04/27/319/5e990ec2.gif" alt="TOP"/></a></div><div class="pms"><a href="https://weibo.cn">首页<span class="tk">!</span></a>.<a href="https://weibo.cn/topic/240489">反馈</a>.<a href="https://weibo.cn/page/91">帮助</a>.<a  href="https://c.weibo.cn"  >客户端</a>.<a href="https://weibo.cn/spam/?rl=1&amp;type=3&amp;fuid=1980768563" class="kt">举报</a>.<a href="https://weibo.cn/logout">退出</a></div><div class="c">设置:<a href="https://weibo.cn/account/customize/skin?tf=7_005&amp;st=81505d">皮肤</a>.<a href="https://weibo.cn/account/customize/pic?tf=7_006&amp;st=81505d">图片</a>.<a href="https://weibo.cn/account/customize/pagesize?tf=7_007&amp;st=81505d">条数</a>.<a href="https://weibo.cn/account/privacy/?tf=7_008&amp;st=81505d">隐私</a></div><div class="b"><a href="https://beian.miit.gov.cn" target="_blank">京ICP备12002058号-1</a> [08-16 23:53]</div></body></html>

================================================
FILE: tests/testdata/e97222acd5bc7d8d1bfbd3f352f8cad3e36fdd19e40b69e1c33fb3c3.html
================================================
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta http-equiv="Cache-Control" content="no-cache"/><meta id="viewport" name="viewport" content="width=device-width,initial-scale=1.0,minimum-scale=1.0, maximum-scale=2.0" /><link rel="icon" sizes="any" mask href="https://h5.sinaimg.cn/upload/2015/05/15/28/WeiboLogoCh.svg" color="black"><meta name="MobileOptimized" content="240"/><title>微博</title><style type="text/css" id="internalStyle">html,body,p,form,div,table,textarea,input,span,select{font-size:12px;word-wrap:break-word;}body{background:#F8F9F9;color:#000;padding:1px;margin:1px;}table,tr,td{border-width:0px;margin:0px;padding:0px;}form{margin:0px;padding:0px;border:0px;}textarea{border:1px solid #96c1e6}textarea{width:95%;}a,.tl{color:#2a5492;text-decoration:underline;}/*a:link {color:#023298}*/.k{color:#2a5492;text-decoration:underline;}.kt{color:#F00;}.ib{border:1px solid #C1C1C1;}.pm,.pmy{clear:both;background:#ffffff;color:#676566;border:1px solid #b1cee7;padding:3px;margin:2px 1px;overflow:hidden;}.pms{clear:both;background:#c8d9f3;color:#666666;padding:3px;margin:0 1px;overflow:hidden;}.pmst{margin-top: 5px;}.pmsl{clear:both;padding:3px;margin:0 1px;overflow:hidden;}.pmy{background:#DADADA;border:1px solid #F8F8F8;}.t{padding:0px;margin:0px;height:35px;}.b{background:#e3efff;text-align:center;color:#2a5492;clear:both;padding:4px;}.bl{color:#2a5492;}.n{clear:both;background:#436193;color:#FFF;padding:4px; margin: 1px;}.nt{color:#b9e7ff;}.nl{color:#FFF;text-decoration:none;}.nfw{clear:both;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.s{border-bottom:1px dotted #666666;margin:3px;clear:both;}.tip{clear:both; background:#c8d9f3;color:#676566;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.tip2{color:#000000;padding:2px 
3px;clear:both;}.ps{clear:both;background:#FFF;color:#676566;border:1px solid #BACDEB;padding:3px;margin:2px 1px;}.tm{background:#feffe5;border:1px solid #e6de8d;padding:4px;}.tm a{color:#ba8300;}.tmn{color:#f00}.tk{color:#ffffff}.tc{color:#63676A;}.c{padding:2px 5px;}.c div a img{border:1px solid #C1C1C1;}.ct{color:#9d9d9d;font-style:italic;}.cmt{color:#9d9d9d;}.ctt{color:#000;}.cc{color:#2a5492;}.nk{color:#2a5492;}.por {border: 1px solid #CCCCCC;height:50px;width:50px;}.me{color:#000000;background:#FEDFDF;padding:2px 5px;}.pa{padding:2px 4px;}.nm{margin:10px 5px;padding:2px;}.hm{padding:5px;background:#FFF;color:#63676A;}.u{margin:2px 1px;background:#ffffff;border:1px solid #b1cee7;}.ut{padding:2px 3px;}.cd{text-align:center;}.r{color:#F00;}.g{color:#0F0;}.bn{background: transparent;border: 0 none;text-align: left;padding-left: 0;}</style></head><body><div class="c"><a href="/1669879400?rand=483937">返回</a></div><div class="c"><a href="/mblog/pic/J6k49kbTc?picId=63885668ly3gfppuxsc3lj216n1kwqv5&amp;rl=1"><img src="http://ww1.sinaimg.cn/thumb180/63885668ly3gfppuxsc3lj216n1kwqv5.jpg" alt="图片加载中..." /></a>&nbsp;<span class="tc">1/2</span>&nbsp;<a href="/mblog/oripic?id=J6k49kbTc&amp;u=63885668ly3gfppuxsc3lj216n1kwqv5&amp;rl=1">原图</a><br/></div><div class="c"><a href="/mblog/pic/J6k49kbTc?picId=63885668ly3gfppuxthecj216o1kwqv5&amp;rl=1"><img src="http://ww3.sinaimg.cn/thumb180/63885668ly3gfppuxthecj216o1kwqv5.jpg" alt="图片加载中..." /></a>&nbsp;<span class="tc">2/2</span>&nbsp;<a href="/mblog/oripic?id=J6k49kbTc&amp;u=63885668ly3gfppuxthecj216o1kwqv5&amp;rl=1">原图</a><br/></div><div class="c"><a href="/1669879400?rand=483937">返回</a></div></body></html>


================================================
FILE: tests/testdata/url_map.json
================================================
{
    "https://weibo.cn/1669879400/profile": "tests/testdata/a4437630f3bdfa2757bae1595186ac063fe5ec25cf2f98116ece83cb.html",
    "https://weibo.cn/1669879400/info": "tests/testdata/ca5f2a555e8d62f728c66fa90afb2d54d19f8c898e164204a61bdf03.html",
    "https://weibo.cn/1669879400/profile?page=1": "tests/testdata/4957814af5a123b82e974b5537dea736dfb34e48d8835203a45d2e67.html",
    "https://weibo.cn/mblog/picAll/J6k49kbTc?rl=1": "tests/testdata/e97222acd5bc7d8d1bfbd3f352f8cad3e36fdd19e40b69e1c33fb3c3.html",
    "https://weibo.cn/mblog/picAll/J5ZcSnCAg?rl=1": "tests/testdata/63a98849ec82b2c87ec55bca03cbf5988f7eac233a23d86b4fdf5ffd.html",
    "https://weibo.cn/1669879400/profile?page=2": "tests/testdata/2f62165fa3ca1e85e0d398d385c377a068b76eb95765f7020ffffd3e.html",
    "https://weibo.cn/1669879400/profile?page=3": "tests/testdata/d486235d4a17dd0accb0f2cc77b3648abfa03580b9e0cdb61f1e618f.html",
    "https://weibo.cn/mblog/picAll/J3xfm61AZ?rl=1": "tests/testdata/76233b3f90394581aac6f19cfa5d674a610e8b442b1f83de7673ab49.html",
    "https://weibo.cn/comment/J5cVGuUNq": "tests/testdata/4d5ed0a3ebd0303cb45edd544dbc0ab5e86d43e103405f0c60515884.html",
    "https://weibo.cn/1980768563/photo?tf=6_008": "tests/testdata/e4d541ecb02253c14abc1d52605fc00d91279df9ac4c1465c85b91b3.html",    
    "https://weibo.cn/album/166564740000001980768563?rl=1": "tests/testdata/b541fd1751117498b6d6f40d3321686ddf871651237c4ac854a5c3eb.html"
}


================================================
FILE: weibo_spider/__init__.py
================================================


================================================
FILE: weibo_spider/__main__.py
================================================
import os
import sys

from absl import app
sys.path.append(os.path.abspath(os.path.dirname(os.getcwd())))
from weibo_spider.spider import main

app.run(main)


================================================
FILE: weibo_spider/config_sample.json
================================================
{
    "user_id_list": ["1669879400"],
    "filter": 1,
    "since_date": "2018-01-01",
    "end_date": "now",
    "random_wait_pages": [1, 5],
    "random_wait_seconds": [6, 10],
    "global_wait": [[1000, 3600], [500, 2000]],
    "write_mode": ["csv", "txt"],
    "pic_download": 1,
    "video_download": 1,
    "file_download_timeout": [5, 5, 10],
    "result_dir_name": 0,
    "cookie": "your cookie",
    "mysql_config": {
        "host": "localhost",
        "port": 3306,
        "user": "root",
        "password": "123456",
        "charset": "utf8mb4"
    },
    "kafka_config": {
        "bootstrap-server": "127.0.0.1:9092",
        "weibo_topics": ["spider_weibo"],
        "user_topics": ["spider_weibo"]
    },
    "sqlite_config": "weibo.db",
    "mongo_config": {
        "connection_string": "mongodb://admin:password@localhost:27017/weibo",
        "dba_name": "",
        "dba_password": ""
    },
    "post_config": {
        "api_url": "",
        "api_token": ""
    }
}


================================================
FILE: weibo_spider/config_util.py
================================================
import codecs
import logging
import os
import sys
import browser_cookie3
from datetime import datetime
import json

logger = logging.getLogger('spider.config_util')


def _is_date(date_str):
    """Return True if date_str is 'YYYY-MM-DD' or 'YYYY-MM-DD HH:MM'."""
    try:
        if ':' in date_str:
            datetime.strptime(date_str, '%Y-%m-%d %H:%M')
        else:
            datetime.strptime(date_str, '%Y-%m-%d')
        return True
    except ValueError:
        return False


def validate_config(config):
    """Validate the configuration; exit if any value is invalid."""

    # Validate filter, pic_download and video_download
    argument_list = ['filter', 'pic_download', 'video_download']
    for argument in argument_list:
        if config[argument] != 0 and config[argument] != 1:
            # Report the offending option name, not its value.
            logger.warning(u'%s值应为0或1,请重新输入', argument)
            sys.exit()

    # Validate since_date
    since_date = config['since_date']
    if (not _is_date(str(since_date))) and (not isinstance(since_date, int)):
        logger.warning(u'since_date值应为yyyy-mm-dd形式或整数,请重新输入')
        sys.exit()

    # Validate end_date
    end_date = str(config['end_date'])
    if (not _is_date(end_date)) and (end_date != 'now'):
        logger.warning(u'end_date值应为yyyy-mm-dd形式或"now",请重新输入')
        sys.exit()

    # Validate random_wait_pages
    random_wait_pages = config['random_wait_pages']
    if not isinstance(random_wait_pages, list):
        logger.warning(u'random_wait_pages参数值应为list类型,请重新输入')
        sys.exit()
    if (not isinstance(min(random_wait_pages), int)) or (not isinstance(
            max(random_wait_pages), int)):
        logger.warning(u'random_wait_pages列表中的值应为整数类型,请重新输入')
        sys.exit()
    if min(random_wait_pages) < 1:
        logger.warning(u'random_wait_pages列表中的值应大于0,请重新输入')
        sys.exit()

    # Validate random_wait_seconds
    random_wait_seconds = config['random_wait_seconds']
    if not isinstance(random_wait_seconds, list):
        logger.warning(u'random_wait_seconds参数值应为list类型,请重新输入')
        sys.exit()
    if (not isinstance(min(random_wait_seconds), int)) or (not isinstance(
            max(random_wait_seconds), int)):
        logger.warning(u'random_wait_seconds列表中的值应为整数类型,请重新输入')
        sys.exit()
    if min(random_wait_seconds) < 1:
        logger.warning(u'random_wait_seconds列表中的值应大于0,请重新输入')
        sys.exit()

    # Validate global_wait
    global_wait = config['global_wait']
    if not isinstance(global_wait, list):
        logger.warning(u'global_wait参数值应为list类型,请重新输入')
        sys.exit()
    for g in global_wait:
        if not isinstance(g, list):
            logger.warning(u'global_wait参数内的值应为长度为2的list类型,请重新输入')
            sys.exit()
        if len(g) != 2:
            logger.warning(u'global_wait参数内的list长度应为2,请重新输入')
            sys.exit()
        for i in g:
            if (not isinstance(i, int)) or i < 1:
                logger.warning(u'global_wait列表中的值应为大于0的整数,请重新输入')
                sys.exit()

    # Validate write_mode
    write_mode = ['txt', 'csv', 'json', 'mongo', 'mysql', 'sqlite', 'kafka', 'post']
    if not isinstance(config['write_mode'], list):
        logger.warning(u'write_mode值应为list类型')
        sys.exit()
    for mode in config['write_mode']:
        if mode not in write_mode:
            logger.warning(
                u'%s为无效模式,请从txt、csv、json、post、mongo、sqlite, kafka和mysql中挑选一个或多个作为write_mode',
                mode)
            sys.exit()

    # Validate user_id_list: either a list or the path of a .txt file
    user_id_list = config['user_id_list']
    if (not isinstance(user_id_list, list)) and not (
            isinstance(user_id_list, str) and user_id_list.endswith('.txt')):
        logger.warning(u'user_id_list值应为list类型或txt文件路径')
        sys.exit()
    if not isinstance(user_id_list, list):
        if not os.path.isabs(user_id_list):
            user_id_list = os.getcwd() + os.sep + user_id_list
        if not os.path.isfile(user_id_list):
            logger.warning(u'不存在%s文件', user_id_list)
            sys.exit()


def get_user_config_list(file_name, default_since_date):
    """Read user ids (and optional since_date values) from a text file."""
    with open(file_name, 'rb') as f:
        try:
            lines = f.read().splitlines()
            lines = [line.decode('utf-8-sig') for line in lines]
        except UnicodeDecodeError:
            logger.error(u'%s文件应为utf-8编码,请先将文件编码转为utf-8再运行程序', file_name)
            sys.exit()
        user_config_list = []
        for line in lines:
            info = line.split(' ')
            if len(info) > 0 and info[0].isdigit():
                user_config = {}
                user_config['user_uri'] = info[0]
                if len(info) > 2 and _is_date(info[2]):
                    if len(info) > 3 and _is_date(info[2] + ' ' + info[3]):
                        user_config['since_date'] = info[2] + ' ' + info[3]
                    else:
                        user_config['since_date'] = info[2]
                else:
                    user_config['since_date'] = default_since_date
                if user_config not in user_config_list:
                    user_config_list.append(user_config)
    return user_config_list
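
# Illustrative sketch, not part of the module: get_user_config_list above
# reads lines of the form "user_id [nickname [YYYY-MM-DD [HH:MM]]]".
# The helpers is_date and parse_user_line are hypothetical names; they only
# mirror the per-line rules shown above so the format is easier to see.

```python
from datetime import datetime


def is_date(text):
    """Accept 'YYYY-MM-DD' or 'YYYY-MM-DD HH:MM', like _is_date above."""
    fmt = '%Y-%m-%d %H:%M' if ':' in text else '%Y-%m-%d'
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def parse_user_line(line, default_since_date):
    """Parse one line; return None when it does not start with a numeric id."""
    info = line.split(' ')
    if not info[0].isdigit():
        return None
    since_date = default_since_date
    if len(info) > 2 and is_date(info[2]):
        since_date = info[2]
        # An optional fourth field refines the date with a time of day.
        if len(info) > 3 and is_date(info[2] + ' ' + info[3]):
            since_date = info[2] + ' ' + info[3]
    return {'user_uri': info[0], 'since_date': since_date}
```

# A line holding only an id falls back to default_since_date; lines that do
# not start with digits are skipped.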


def update_user_config_file(user_config_file_path, user_uri, nickname,
                            start_time):
    """Update the user's line in the user config file with nickname and start_time."""
    if not user_config_file_path:
        user_config_file_path = os.getcwd() + os.sep + 'user_id_list.txt'
    with open(user_config_file_path, 'rb') as f:
        lines = f.read().splitlines()
        lines = [line.decode('utf-8-sig') for line in lines]
        for i, line in enumerate(lines):
            info = line.split(' ')
            if len(info) > 0:
                if user_uri == info[0]:
                    if len(info) == 1:
                        info.append(nickname)
                        info.append(start_time)
                    if len(info) == 2:
                        info.append(start_time)
                    if len(info) > 3 and _is_date(info[2] + ' ' + info[3]):
                        del info[3]
                    if len(info) > 2:
                        info[2] = start_time
                    lines[i] = ' '.join(info)
                    break
    with codecs.open(user_config_file_path, 'w', encoding='utf-8') as f:
        f.write('\n'.join(lines))


def add_user_uri_list(user_config_file_path, user_uri_list):
    """Append several user_uri entries to user_id_list.txt."""
    if not user_uri_list:
        return
    if not user_config_file_path:
        user_config_file_path = os.getcwd() + os.sep + 'user_id_list.txt'
    content = '\n'.join(user_uri_list)
    # Start on a new line when appending to an existing file,
    # without mutating the caller's list.
    if os.path.isfile(user_config_file_path):
        content = '\n' + content
    with codecs.open(user_config_file_path, 'a', encoding='utf-8') as f:
        f.write(content)


def get_cookie():
    """Get weibo.cn cookie from Chrome browser"""
    try:
        chrome_cookies = browser_cookie3.chrome(domain_name='weibo.cn')
        cookies_dict = {cookie.name: cookie.value for cookie in chrome_cookies}
        return cookies_dict
    except Exception as e:
        logger.error(u'Failed to obtain weibo.cn cookie from Chrome browser: %s', str(e))
        raise


def update_cookie_config(cookie, user_config_file_path):
    """Update cookie in config.json"""
    if not user_config_file_path:
        user_config_file_path = os.getcwd() + os.sep + 'config.json'
    try:
        with codecs.open(user_config_file_path, 'r', encoding='utf-8') as f:
            config = json.load(f)

        cookie_string = '; '.join(f'{name}={value}' for name, value in cookie.items())

        if config['cookie'] != cookie_string:
            config['cookie'] = cookie_string
            with codecs.open(user_config_file_path, 'w', encoding='utf-8') as f:
                json.dump(config, f, indent=4, ensure_ascii=False)
    except Exception as e:
        logger.error(u'Failed to update cookie in config file: %s', str(e))
        raise


def check_cookie(user_config_file_path):
    """Checks if user is logged in"""
    try:
        cookie = get_cookie()
        if cookie.get("MLOGIN", '0') == '0':
            logger.warning("使用 Chrome 在此登录 %s", "https://passport.weibo.com/sso/signin?entry=wapsso&source=wapssowb&url=https://m.weibo.cn/")
            sys.exit()
        else:
            update_cookie_config(cookie, user_config_file_path)
    except Exception as e:
        logger.error(u'Check for cookie failed: %s', str(e))
        raise 
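
# Illustrative sketch, not part of the module: the two cookie rules used
# above, isolated so they can be tried without a browser. The function names
# and the sample cookie values are hypothetical; only the join format and the
# MLOGIN check are taken from update_cookie_config and check_cookie.

```python
def cookie_dict_to_header(cookies):
    """Join a name -> value mapping into a 'name=value; name=value' string."""
    return '; '.join(f'{name}={value}' for name, value in cookies.items())


def looks_logged_in(cookies):
    """Treat a missing or '0' MLOGIN cookie as not logged in."""
    return cookies.get('MLOGIN', '0') != '0'


# Hypothetical cookie values, for illustration only.
sample = {'SUB': 'abc', 'MLOGIN': '1'}
header = cookie_dict_to_header(sample)
```

# The resulting string has the same shape as the "cookie" value stored in
# config.json.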


================================================
FILE: weibo_spider/datetime_util.py
================================================
from datetime import datetime


def str_to_time(text):
    """Convert a 'YYYY-MM-DD' or 'YYYY-MM-DD HH:MM' string to a datetime."""
    if ':' in text:
        result = datetime.strptime(text, '%Y-%m-%d %H:%M')
    else:
        result = datetime.strptime(text, '%Y-%m-%d')
    return result


================================================
FILE: weibo_spider/downloader/__init__.py
================================================
from .origin_picture_downloader import OriginPictureDownloader
from .retweet_picture_downloader import RetweetPictureDownloader
from .avatar_picture_downloader import AvatarPictureDownloader
from .video_downloader import VideoDownloader

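# __all__ must list names as strings; class objects here would make
# `from weibo_spider.downloader import *` fail.
__all__ = [
    'OriginPictureDownloader', 'RetweetPictureDownloader',
    'AvatarPictureDownloader', 'VideoDownloader'
]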


================================================
FILE: weibo_spider/downloader/avatar_picture_downloader.py
================================================
import os

from .img_downloader import ImgDownloader


class AvatarPictureDownloader(ImgDownloader):
    def __init__(self, file_dir, file_download_timeout):
        super().__init__(file_dir, file_download_timeout)
        self.describe = u'头像图片'
        self.key = 'avatar_pictures'

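    async def handle_download(self, urls, session):
        """Download every avatar URL into the avatar directory.

        Note: unlike the base class, this override takes no `w` argument,
        so it must be called directly rather than through download_files.
        """
        file_dir = self.file_dir + os.sep + self.describe
        if not os.path.isdir(file_dir):
            os.makedirs(file_dir)

        for url in urls:
            # Use the last path segment, without the leading '/', as the name.
            file_name = url[url.rfind('/') + 1:]
            file_path = file_dir + os.sep + file_name
            await self.download_one_file(url, file_path, 'xxx', session)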

================================================
FILE: weibo_spider/downloader/downloader.py
================================================
# -*- coding: UTF-8 -*-
import asyncio
import logging
import os
import sys
import random
from abc import ABC, abstractmethod

import aiohttp
from tqdm import tqdm

logger = logging.getLogger('spider.downloader')


class Downloader(ABC):
    def __init__(self, file_dir, file_download_timeout):
        self.file_dir = file_dir
        self.describe = ''
        self.key = ''
        self.file_download_timeout = [5, 5, 10]
        if (isinstance(file_download_timeout, list)
                and len(file_download_timeout) == 3):
            for i in range(3):
                v = file_download_timeout[i]
                if isinstance(v, (int, float)) and v > 0:
                    self.file_download_timeout[i] = v

    @abstractmethod
    async def handle_download(self, urls, w, session):
        """Download the image or video files referenced by urls.

        The information in w is used to build the file names.
        """
        pass

    async def download_one_file(self, url, file_path, weibo_id, session):
        """Download a single file (image or video)."""
        try:
            if not os.path.isfile(file_path):
                # Random delay to mimic human behaviour
                await asyncio.sleep(random.uniform(0.5, 1.5))
                
                # Retry logic
                retries = self.file_download_timeout[0]
                timeout = aiohttp.ClientTimeout(
                    connect=self.file_download_timeout[1],
                    total=self.file_download_timeout[2]
                )
                
                last_exception = None
                for _ in range(retries + 1):
                    try:
                        async with session.get(url, timeout=timeout) as response:
                            if response.status == 200:
                                content = await response.read()
                                with open(file_path, 'wb') as f:
                                    f.write(content)
                                break
                    except Exception as e:
                        last_exception = e
                else:
                    if last_exception:
                        raise last_exception

            return os.path.isfile(file_path)
        except Exception as e:
            error_file = self.file_dir + os.sep + 'not_downloaded.txt'
            with open(error_file, 'ab') as f:
                line = str(weibo_id) + ':' + file_path + ':' + url + '\n'
                # Fall back to UTF-8 when stdout has no usable encoding.
                encoding = getattr(sys.stdout, 'encoding', None) or 'utf-8'
                f.write(line.encode(encoding))
            logger.exception(e)
            return False

    async def download_files(self, weibos, session):
        """Download files (images or videos)."""
        try:
            logger.info(u'即将进行%s下载', self.describe)
            for w in tqdm(weibos, desc='Download progress'):
                if getattr(w, self.key) != u'无':
                    await self.handle_download(getattr(w, self.key), w, session)
            logger.info(u'%s下载完毕,保存路径:', self.describe)
            logger.info(self.file_dir)
        except Exception as e:
            logger.exception(e)
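
# Illustrative sketch, not part of the module: how the three
# file_download_timeout values are interpreted above (retry count, connect
# timeout, total timeout) and the shape of the retry loop in
# download_one_file, rewritten synchronously so it runs without aiohttp.
# normalize_timeout and call_with_retries are hypothetical helper names.

```python
def normalize_timeout(value, default=(5, 5, 10)):
    """Return [retries, connect_timeout, total_timeout], keeping valid entries."""
    result = list(default)
    if isinstance(value, list) and len(value) == 3:
        for i, v in enumerate(value):
            # Only positive numbers override the defaults.
            if isinstance(v, (int, float)) and v > 0:
                result[i] = v
    return result


def call_with_retries(action, retries):
    """Call action() up to retries + 1 times; return its first truthy result.

    If no attempt succeeds and at least one raised, re-raise the last
    exception; otherwise return None.
    """
    last_exception = None
    for _ in range(retries + 1):
        try:
            result = action()
            if result:
                return result
        except Exception as e:
            last_exception = e
    if last_exception:
        raise last_exception
    return None
```

# A malformed setting, such as a list of the wrong length, leaves the
# defaults [5, 5, 10] in place.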


================================================
FILE: weibo_spider/downloader/img_downloader.py
================================================
import os

from .downloader import Downloader


class ImgDownloader(Downloader):
    def __init__(self, file_dir, file_download_timeout):
        super().__init__(file_dir, file_download_timeout)
        self.describe = u'图片'
        self.key = ''

    async def handle_download(self, urls, w, session):
        """Download the pictures of one weibo and record the saved paths."""
        file_prefix = w.publish_time[:10].replace('-', '') + '_' + w.id
        file_dir = self.file_dir + os.sep + self.describe
        if not os.path.isdir(file_dir):
            os.makedirs(file_dir)
        media_key = self.key or 'original_pictures'
        if ',' in urls:
            url_list = urls.split(',')
            for i, url in enumerate(url_list):
                index = url.rfind('.')
                if len(url) - index > 5:
                    file_suffix = '.jpg'
                else:
                    file_suffix = url[index:]
                file_name = file_prefix + '_' + str(i + 1) + file_suffix
                file_path = file_dir + os.sep + file_name
                ok = await self.download_one_file(url, file_path, w.id, session)
                if ok:
                    w.media.setdefault(media_key, []).append({
                        'url': url,
                        'path': file_path
                    })
        else:
            index = urls.rfind('.')
            if len(urls) - index > 5:
                file_suffix = '.jpg'
            else:
                file_suffix = urls[index:]
            file_name = file_prefix + file_suffix
            file_path = file_dir + os.sep + file_name
            ok = await self.download_one_file(urls, file_path, w.id, session)
            if ok:
                w.media.setdefault(media_key, []).append({
                    'url': urls,
                    'path': file_path
                })
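
The file-naming rule in handle_download can be sketched in isolation. The helpers pick_suffix and build_file_name below are hypothetical names, not functions in the repository. They assume the rule used by the single-URL branch: keep the URL's own extension when it is at most five characters including the dot, otherwise fall back to '.jpg'.

```python
def pick_suffix(url):
    """Return the URL's extension, or '.jpg' when no short extension is present."""
    index = url.rfind('.')
    # A tail longer than 5 characters (e.g. '.cn/large/abc') is not an extension.
    if len(url) - index > 5:
        return '.jpg'
    return url[index:]


def build_file_name(publish_time, weibo_id, url, position=None):
    """Build '<yyyymmdd>_<weibo_id>[_<position>]<suffix>' as handle_download does."""
    prefix = publish_time[:10].replace('-', '') + '_' + weibo_id
    if position is not None:
        prefix += '_' + str(position)  # 1-based index for multi-picture weibos
    return prefix + pick_suffix(url)
```

Under this rule a five-character extension such as '.jpeg' or '.webp' is preserved, while a URL with no extension after its last path segment is saved as '.jpg'.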


================================================
FILE: weibo_spider/downloader/origin_picture_downloader.py
================================================
from .img_downloader import ImgDownloader


class OriginPictureDownloader(ImgDownloader):
    def __init__(self, file_dir, file_download_timeout):
        super().__init__(file_dir, file_download_timeout)
        self.describe = u'原创微博图片'
        self.key = 'original_pictures'


================================================
FILE: weibo_spider/downloader/retweet_picture_downloader.py
================================================
from .img_downloader import ImgDownloader


class RetweetPictureDownloader(ImgDownloader):
    def __init__(self, file_dir, file_download_timeout):
        super().__init__(file_dir, file_download_timeout)
        self.describe = u'转发微博图片'
        self.key = 'retweet_pictures'


================================================
FILE: weibo_spider/downloader/video_downloader.py
================================================
import os

from .downloader import Downloader


class VideoDownloader(Downloader):
    def __init__(self, file_dir, file_download_timeout):
        super().__init__(file_dir, file_download_timeout)
        self.describe = u'视频'
        self.key = 'video_url'

    async def handle_download(self, urls, w, session):
        """Handle download-related operations."""
        file_prefix = w.publish_time[:10].replace('-', '') + '_' + w.id
        file_suffix = '.mp4'
        file_name = file_prefix + file_suffix
        file_path = self.file_dir + os.sep + file_name
        ok = await self.download_one_file(urls, file_path, w.id, session)
        if ok:
            w.media.setdefault('video', []).append({
                'url': urls,
                'path': file_path
            })


================================================
FILE: weibo_spider/logging.conf
================================================
[loggers]
keys=root,spider

[handlers]
keys=consoleHandler,fileHandler,errorHandler

[formatters]
keys=consoleFormatter,fileFormatter,errorFormatter

[logger_root]
level=DEBUG
handlers=consoleHandler,fileHandler,errorHandler

[logger_spider]
level=DEBUG
handlers=consoleHandler,fileHandler,errorHandler
qualname=spider
propagate=0

[handler_consoleHandler]
class=StreamHandler
level=DEBUG
formatter=consoleFormatter
args=(sys.stdout,)

[handler_fileHandler]
class=handlers.TimedRotatingFileHandler
level=INFO
formatter=fileFormatter
args=('all.log', 'D', 1, 5, 'utf-8', False, False)

[handler_errorHandler]
class=FileHandler
level=WARNING
formatter=errorFormatter
args=('error.log', 'a','utf-8')

[formatter_consoleFormatter]
format=%(message)s

[formatter_fileFormatter]
format=%(asctime)s - %(filename)s - %(levelname)s - %(message)s

[formatter_errorFormatter]
format=%(asctime)s - %(levelname)s - %(filename)s[:%(lineno)d] - %(message)s
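
The logger hierarchy declared above can be illustrated with a small sketch. It assumes nothing beyond the standard logging module: a child logger such as 'spider.comment_parser' has no handlers of its own, so its records are handled by the 'spider' logger, and propagate=0 stops them from also reaching the root logger. The function demo_spider_logger is a hypothetical name, and the in-memory streams replace the console and file handlers of the real configuration.

```python
import io
import logging


def demo_spider_logger():
    """Return (spider_output, root_output) after one record from a child logger."""
    spider_stream, root_stream = io.StringIO(), io.StringIO()

    root = logging.getLogger()
    root_handler = logging.StreamHandler(root_stream)
    root.addHandler(root_handler)

    spider = logging.getLogger('spider')
    old_level, old_propagate = spider.level, spider.propagate
    spider.setLevel(logging.DEBUG)
    spider.propagate = False  # same effect as propagate=0 in logging.conf
    spider_handler = logging.StreamHandler(spider_stream)
    spider_handler.setFormatter(logging.Formatter('%(levelname)s - %(message)s'))
    spider.addHandler(spider_handler)

    try:
        # The child inherits its effective level and handlers from 'spider'.
        logging.getLogger('spider.comment_parser').info('page fetched')
    finally:
        # Undo the temporary setup so the demo leaves logging untouched.
        spider.removeHandler(spider_handler)
        root.removeHandler(root_handler)
        spider.setLevel(old_level)
        spider.propagate = old_propagate

    return spider_stream.getvalue(), root_stream.getvalue()
```

The record shows up once, in the 'spider' handler's stream, and the root handler's stream stays empty.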

================================================
FILE: weibo_spider/parser/__init__.py
================================================
from .index_parser import IndexParser
from .page_parser import PageParser
from .photo_parser import PhotoParser
from .album_parser import AlbumParser

__all__ = ['IndexParser', 'PageParser', 'PhotoParser', 'AlbumParser']


================================================
FILE: weibo_spider/parser/album_parser.py
================================================
from .parser import Parser
from .util import handle_html


class AlbumParser(Parser):
    def __init__(self, cookie, album_url):
        self.cookie = cookie
        self.url = album_url
        self.selector = handle_html(self.cookie, self.url)

    def extract_pic_urls(self):
        # <img src="http://wx2.sinaimg.cn/wap180/76102133ly8fwr33wpn8fj20v90v9tbw.jpg" alt="" class="c">
        pic_list = self.selector.xpath('//div[@class="c"]//img/@src')
        for i, pic in enumerate(pic_list):
            if "?" in pic:
                pic = pic[:pic.index("?")]
            pic_list[i] = pic
        return pic_list
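
The query-string stripping in extract_pic_urls can be sketched without lxml. The function strip_query is a hypothetical name, and the sample URLs in the usage note are made up for illustration.

```python
def strip_query(urls):
    """Return a new list with everything from the first '?' onward removed."""
    return [url.split('?', 1)[0] for url in urls]
```

For example, strip_query(['http://wx2.sinaimg.cn/wap180/a.jpg?KID=1', 'http://wx2.sinaimg.cn/wap180/b.jpg']) leaves the second URL unchanged and drops '?KID=1' from the first. Unlike the loop in extract_pic_urls, this version builds a new list instead of modifying the input in place.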


================================================
FILE: weibo_spider/parser/comment_parser.py
================================================
import logging
import random
import requests
import re
from time import sleep
from lxml.html import tostring
from lxml.html import fromstring
from lxml import etree
from .parser import Parser
from .util import handle_garbled, handle_html

logger = logging.getLogger('spider.comment_parser')


class CommentParser(Parser):
    def __init__(self, cookie, weibo_id):
        self.cookie = cookie
        self.url = 'https://weibo.cn/comment/' + weibo_id
        self.selector = handle_html(self.cookie, self.url)

    def get_long_weibo(self):
        """Get the full text of a long original weibo."""
        try:
            for i in range(5):
                self.selector = handle_html(self.cookie, self.url)
                if self.selector is not None:
                    info_div = self.selector.xpath("//div[@class='c' and @id='M_']")[0]
                    info_span = info_div.xpath("//span[@class='ctt']")[0]
                    # 1. Serialize all HTML inside info_span to a string
                    html_string = etree.tostring(info_span, encoding='unicode', method='html')
                    # 2. Replace <br> with \n
                    html_string = html_string.replace('<br>', '\n')
                    # 3. Strip all HTML tags but keep the text inside them
                    new_content = fromstring(html_string).text_content()
                    # 4. Collapse runs of consecutive \n into a single \n
                    new_content = re.sub(r'\n+\s*', '\n', new_content)
                    weibo_content = handle_garbled(new_content)
           
SYMBOL INDEX (159 symbols across 38 files)

FILE: tests/test_downloader_async.py
  class MockWeibo (line 10) | class MockWeibo:
    method __init__ (line 11) | def __init__(self):
  class TestDownloaderAsync (line 17) | class TestDownloaderAsync(unittest.TestCase):
    method setUp (line 18) | def setUp(self):
    method tearDown (line 23) | def tearDown(self):
    method test_img_downloader (line 27) | def test_img_downloader(self):

FILE: tests/test_parser/test_album_parser.py
  function test_album_parser (line 8) | def test_album_parser():

FILE: tests/test_parser/test_comment_parser.py
  function test_comment_parser (line 8) | def test_comment_parser():

FILE: tests/test_parser/test_index_parser.py
  function test_index_parser (line 8) | def test_index_parser():

FILE: tests/test_parser/test_info_parser.py
  function test_info_parser (line 8) | def test_info_parser():

FILE: tests/test_parser/test_mblog_picAll_parser.py
  function test_mblog_picAll_parser (line 8) | def test_mblog_picAll_parser():

FILE: tests/test_parser/test_page_parser.py
  function test_page_parser (line 9) | def test_page_parser():

FILE: tests/test_parser/test_photo_parser.py
  function test_photo_parser (line 9) | def test_photo_parser():

FILE: tests/test_parser/util.py
  function mock_request_get_content (line 8) | def mock_request_get_content(url, headers):

FILE: weibo_spider/config_util.py
  function _is_date (line 12) | def _is_date(date_str):
  function validate_config (line 24) | def validate_config(config):
  function get_user_config_list (line 115) | def get_user_config_list(file_name, default_since_date):
  function update_user_config_file (line 142) | def update_user_config_file(user_config_file_path, user_uri, nickname,
  function add_user_uri_list (line 169) | def add_user_uri_list(user_config_file_path, user_uri_list):
  function get_cookie (line 178) | def get_cookie():
  function update_cookie_config (line 188) | def update_cookie_config(cookie, user_config_file_path):
  function check_cookie (line 206) | def check_cookie(user_config_file_path):

FILE: weibo_spider/datetime_util.py
  function str_to_time (line 4) | def str_to_time(text):

FILE: weibo_spider/downloader/avatar_picture_downloader.py
  class AvatarPictureDownloader (line 6) | class AvatarPictureDownloader(ImgDownloader):
    method __init__ (line 7) | def __init__(self, file_dir, file_download_timeout):
    method handle_download (line 12) | async def handle_download(self, urls, session):

FILE: weibo_spider/downloader/downloader.py
  class Downloader (line 15) | class Downloader(ABC):
    method __init__ (line 16) | def __init__(self, file_dir, file_download_timeout):
    method handle_download (line 29) | async def handle_download(self, urls, w, session):
    method download_one_file (line 33) | async def download_one_file(self, url, file_path, weibo_id, session):
    method download_files (line 71) | async def download_files(self, weibos, session):

FILE: weibo_spider/downloader/img_downloader.py
  class ImgDownloader (line 6) | class ImgDownloader(Downloader):
    method __init__ (line 7) | def __init__(self, file_dir, file_download_timeout):
    method handle_download (line 12) | async def handle_download(self, urls, w, session):

FILE: weibo_spider/downloader/origin_picture_downloader.py
  class OriginPictureDownloader (line 4) | class OriginPictureDownloader(ImgDownloader):
    method __init__ (line 5) | def __init__(self, file_dir, file_download_timeout):

FILE: weibo_spider/downloader/retweet_picture_downloader.py
  class RetweetPictureDownloader (line 4) | class RetweetPictureDownloader(ImgDownloader):
    method __init__ (line 5) | def __init__(self, file_dir, file_download_timeout):

FILE: weibo_spider/downloader/video_downloader.py
  class VideoDownloader (line 6) | class VideoDownloader(Downloader):
    method __init__ (line 7) | def __init__(self, file_dir, file_download_timeout):
    method handle_download (line 12) | async def handle_download(self, urls, w, session):

FILE: weibo_spider/parser/album_parser.py
  class AlbumParser (line 5) | class AlbumParser(Parser):
    method __init__ (line 6) | def __init__(self, cookie, album_url):
    method extract_pic_urls (line 11) | def extract_pic_urls(self):

FILE: weibo_spider/parser/comment_parser.py
  class CommentParser (line 15) | class CommentParser(Parser):
    method __init__ (line 16) | def __init__(self, cookie, weibo_id):
    method get_long_weibo (line 21) | def get_long_weibo(self):
    method get_long_retweet (line 44) | def get_long_retweet(self):
    method get_video_page_url (line 48) | def get_video_page_url(self):

FILE: weibo_spider/parser/index_parser.py
  class IndexParser (line 10) | class IndexParser(Parser):
    method __init__ (line 11) | def __init__(self, cookie, user_uri, selector=None):
    method _get_user_id (line 17) | def _get_user_id(self):
    method get_user (line 30) | def get_user(self):
    method get_user_async (line 46) | async def get_user_async(self, session):
    method get_page_num (line 67) | def get_page_num(self):

FILE: weibo_spider/parser/info_parser.py
  class InfoParser (line 11) | class InfoParser(Parser):
    method __init__ (line 12) | def __init__(self, cookie, user_id, selector=None):
    method extract_user_info (line 17) | def extract_user_info(self):

FILE: weibo_spider/parser/mblog_picAll_parser.py
  class MblogPicAllParser (line 5) | class MblogPicAllParser(Parser):
    method __init__ (line 6) | def __init__(self, cookie, weibo_id):
    method extract_preview_picture_list (line 11) | def extract_preview_picture_list(self):

FILE: weibo_spider/parser/page_parser.py
  class PageParser (line 18) | class PageParser(Parser):
    method __init__ (line 21) | def __init__(self, cookie, user_config, page, filter, selector=None, d...
    method get_one_page (line 68) | def get_one_page(self, weibo_id_list):
    method is_original (line 101) | def is_original(self, info):
    method get_original_weibo (line 109) | def get_original_weibo(self, info, weibo_id):
    method get_retweet (line 124) | def get_retweet(self, info, weibo_id):
    method get_weibo_content (line 152) | def get_weibo_content(self, info, is_original):
    method get_article_url (line 164) | def get_article_url(self, info):
    method get_publish_place (line 174) | def get_publish_place(self, info):
    method get_publish_time (line 198) | def get_publish_time(self, info):
    method get_publish_tool (line 229) | def get_publish_tool(self, info):
    method get_weibo_footer (line 242) | def get_weibo_footer(self, info):
    method get_picture_urls (line 264) | def get_picture_urls(self, info, is_original):
    method get_video_url (line 290) | def get_video_url(self, info):
    method get_one_weibo (line 317) | def get_one_weibo(self, info):
    method extract_picture_urls (line 360) | def extract_picture_urls(self, info, weibo_id):

FILE: weibo_spider/parser/parser.py
  class Parser (line 1) | class Parser:
    method __init__ (line 2) | def __init__(self, cookie):

FILE: weibo_spider/parser/photo_parser.py
  class PhotoParser (line 5) | class PhotoParser(Parser):
    method __init__ (line 6) | def __init__(self, cookie, user_id):
    method extract_avatar_album_url (line 12) | def extract_avatar_album_url(self):

FILE: weibo_spider/parser/util.py
  function hash_url (line 17) | def hash_url(url):
  function handle_html_async (line 21) | async def handle_html_async(cookie, url, session):
  function handle_html (line 50) | def handle_html(cookie, url):
  function handle_garbled (line 78) | def handle_garbled(info):
  function bid2mid (line 94) | def bid2mid(bid):
  function to_video_download_url (line 121) | def to_video_download_url(cookie, video_page_url):
  function string_to_int (line 142) | def string_to_int(string):

FILE: weibo_spider/spider.py
  class Spider (line 38) | class Spider:
    method __init__ (line 39) | def __init__(self, config):
    method write_weibo (line 131) | async def write_weibo(self, weibos):
    method write_user (line 138) | def write_user(self, user):
    method get_user_info (line 143) | async def get_user_info(self, user_uri):
    method download_user_avatar (line 150) | async def download_user_avatar(self, user_uri):
    method get_weibo_info (line 161) | async def get_weibo_info(self):
    method _get_filepath (line 243) | def _get_filepath(self, type):
    method initialize_info (line 264) | def initialize_info(self, user_config):
    method get_one_user (line 331) | async def get_one_user(self, user_config):
    method start (line 358) | async def start(self):
  function _get_config (line 382) | def _get_config():
  function async_main (line 409) | async def async_main(_):
  function main (line 418) | def main(_):

FILE: weibo_spider/user.py
  class User (line 1) | class User:
    method __init__ (line 8) | def __init__(self):
    method to_dict (line 27) | def to_dict(self):
    method __str__ (line 31) | def __str__(self):

FILE: weibo_spider/weibo.py
  class Weibo (line 1) | class Weibo:
    method to_dict (line 9) | def to_dict(self):
    method __init__ (line 13) | def __init__(self):
    method __str__ (line 32) | def __str__(self):

FILE: weibo_spider/writer/csv_writer.py
  class CsvWriter (line 9) | class CsvWriter(Writer):
    method __init__ (line 10) | def __init__(self, file_path, filter):
    method write_user (line 32) | def write_user(self, user):
    method write_weibo (line 35) | def write_weibo(self, weibos):

FILE: weibo_spider/writer/json_writer.py
  class JsonWriter (line 11) | class JsonWriter(Writer):
    method __init__ (line 12) | def __init__(self, file_path):
    method write_user (line 15) | def write_user(self, user):
    method _update_json_data (line 18) | def _update_json_data(self, data, weibo_info):
    method write_weibo (line 43) | def write_weibo(self, weibos):

FILE: weibo_spider/writer/kafka_writer.py
  class KafkaWriter (line 10) | class KafkaWriter(Writer):
    method __init__ (line 11) | def __init__(self, kafka_config):
    method write_weibo (line 28) | def write_weibo(self, weibo):
    method write_user (line 34) | def write_user(self, user):
    method __del__ (line 40) | def __del__(self):

FILE: weibo_spider/writer/mongo_writer.py
  class MongoWriter (line 10) | class MongoWriter(Writer):
    method __init__ (line 11) | def __init__(self, mongo_config):
    method _info_to_mongodb (line 17) | def _info_to_mongodb(self, collection, info_list):
    method write_weibo (line 48) | def write_weibo(self, weibos):
    method write_user (line 57) | def write_user(self, user):

FILE: weibo_spider/writer/mysql_writer.py
  class MySqlWriter (line 10) | class MySqlWriter(Writer):
    method __init__ (line 11) | def __init__(self, mysql_config):
    method _mysql_create (line 20) | def _mysql_create(self, connection, sql):
    method _mysql_create_database (line 28) | def _mysql_create_database(self, sql):
    method _mysql_create_table (line 43) | def _mysql_create_table(self, sql):
    method _mysql_insert (line 49) | def _mysql_insert(self, table, data_list):
    method write_weibo (line 81) | def write_weibo(self, weibos):
    method write_user (line 115) | def write_user(self, user):

FILE: weibo_spider/writer/post_writer.py
  class PostWriter (line 13) | class PostWriter(Writer):
    method __init__ (line 14) | def __init__(self, post_config):
    method write_user (line 20) | def write_user(self, user):
    method _update_json_data (line 23) | def _update_json_data(self, data, weibo_info):
    method send_post_request_with_token (line 32) | def send_post_request_with_token(self, url, data, token, max_retries, ...
    method write_weibo (line 51) | def write_weibo(self, weibos):

FILE: weibo_spider/writer/sqlite_writer.py
  class SqliteWriter (line 10) | class SqliteWriter(Writer):
    method __init__ (line 11) | def __init__(self, sqlite_config):
    method _sqlite_create (line 14) | def _sqlite_create(self, connection, sql):
    method _sqlite_create_table (line 22) | def _sqlite_create_table(self, sql):
    method _sqlite_insert (line 28) | def _sqlite_insert(self, table, data_list):
    method write_weibo (line 53) | def write_weibo(self, weibos):
    method write_user (line 84) | def write_user(self, user):

FILE: weibo_spider/writer/txt_writer.py
  class TxtWriter (line 9) | class TxtWriter(Writer):
    method __init__ (line 10) | def __init__(self, file_path, filter):
    method write_user (line 26) | def write_user(self, user):
    method write_weibo (line 37) | def write_weibo(self, weibo):

FILE: weibo_spider/writer/writer.py
  class Writer (line 4) | class Writer(ABC):
    method __init__ (line 5) | def __init__(self):
    method write_weibo (line 10) | def write_weibo(self, weibo):
    method write_user (line 15) | def write_user(self, user):
Condensed preview: 80 files, each showing path, character count, and a content snippet (276K chars).
[
  {
    "path": ".github/ISSUE_TEMPLATE/bug-report.md",
    "chars": 588,
    "preview": "---\nname: Bug报修\nabout: 向程序开发者申报bug\ntitle: ''\nlabels: bug\nassignees: ''\n\n---\n\n感谢您申报bug,为了表示感谢,如果bug确实存在,您将出现在本项目的贡献者列表里;如"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/failed.md",
    "chars": 399,
    "preview": "---\nname: 程序运行出错\nabout: 运行出错,需要帮助\ntitle: ''\nlabels: failed\nassignees: ''\n\n---\n\n为了更好的解决问题,请认真回答下面的问题。等到问题解决,请及时关闭本issue。\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature-request.md",
    "chars": 146,
    "preview": "---\nname: 新需求或建议\nabout: 建议开发新功能,或虽然没有新需求但对本项目有其它建议\ntitle: ''\nlabels: 'feature'\nassignees: ''\n\n---\n\n- 问:请说明需要什么新功能。\n\n答:\n\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/other.md",
    "chars": 73,
    "preview": "---\nname: 其它问题\nabout: 其它想讨论的问题\ntitle: ''\nlabels: ''\nassignees: ''\n\n---\n\n\n"
  },
  {
    "path": ".github/stale.yml",
    "chars": 889,
    "preview": "# Number of days of inactivity before an issue becomes stale\ndaysUntilStale: 60\n\n# Number of days of inactivity before a"
  },
  {
    "path": ".github/workflows/python-app.yml",
    "chars": 1155,
    "preview": "# This workflow will install Python dependencies, run tests and lint with a single version of Python\n# For more informat"
  },
  {
    "path": ".gitignore",
    "chars": 96,
    "preview": ".vscode \n\n*.pyc\n__pycache__\n\nbuild/\ndist/\n*.egg-info\n\nconfig.json\n\nweibo/\nweibo.db\n*.log\n\n.idea\n"
  },
  {
    "path": "CONTRIBUTING.md",
    "chars": 1643,
    "preview": "# 为本项目做贡献\n\n本项目使用**Python3**编写,感谢大家对项目的支持,也欢迎大家为开源项目做贡献。鉴于大家拥有不同的技能、经验、认知、时间等,每个人可以根据自身的情况为本项目贡献力量。我们不会因为贡献者写的代码少或者提的建议不好"
  },
  {
    "path": "README.md",
    "chars": 9518,
    "preview": "[![Build Status](https://github.com/dataabc/weiboSpider/workflows/Python%20application/badge.svg)](https://badge.fury.io"
  },
  {
    "path": "docs/FAQ.md",
    "chars": 2563,
    "preview": "# 常见问题\n\n## 1. 程序运行出错,错误提示中包含“ImportError: cannot import name 'config_util' from '__main__'”,如何解决?\n\n出现这种错误,说明使用者很可能是直接运行的"
  },
  {
    "path": "docs/academic.md",
    "chars": 375,
    "preview": "# 学术研究\n\n本项目通过获取微博数据,为写论文、做研究等非商业项目提供所需数据。下面是一些在论文或研究等方面使用过本程序的项目。在一些涉及隐私的描述上,已与研究者做了沟通,在下面的描述中只介绍研究者\n允许展示的部分。如果部分信息研究者之前"
  },
  {
    "path": "docs/automation.md",
    "chars": 2095,
    "preview": "# 定期自动爬取微博(可选)\n\n我们爬取了微博以后,很多微博账号又可能发了一些新微博,定期自动爬取微博就是每隔一段时间自动运行程序,自动爬取这段时间产生的新微博(忽略以前爬过的旧微博)。本部分为可选部分,如果不需要可以忽略。\n\n思路是**利"
  },
  {
    "path": "docs/contributors.md",
    "chars": 1930,
    "preview": "# 贡献者\n\n感谢所有为本项目作出贡献和将要作出贡献的朋友,感谢对开源事业的支持。大家每贡献一行code都让项目功能更丰富,每提一个建议都让程序更完善,每发现一个bug都让代码更健壮。\n\n本项目贡献者包含三部分:主要代码开发者、代码贡献者和"
  },
  {
    "path": "docs/cookie.md",
    "chars": 615,
    "preview": "# 如何获取cookie\n\n1. 用Chrome打开<https://passport.weibo.cn/signin/login>;\n2. 输入微博的用户名、密码,登录,如图所示:\n![weibo log in page](https:/"
  },
  {
    "path": "docs/example.md",
    "chars": 4492,
    "preview": "# 实例\n\n以爬取迪丽热巴的微博为例,我们需要修改**config.json**文件,文件内容如下:\n\n```json\n{\n    \"user_id_list\": [\"1669879400\"],\n    \"filter\": 1,\n    \""
  },
  {
    "path": "docs/known_issues.md",
    "chars": 434,
    "preview": "# 已知问题\n\n该文档列出由于本项目所选用的技术局限而导致的已知的无法或难以在短时间内修复的问题。\n\n## 1. 程序无法爬取同时带有图片和视频的微博\n\n参见:https://github.com/dataabc/weiboSpider/i"
  },
  {
    "path": "docs/settings.md",
    "chars": 6610,
    "preview": "# 程序设置\n\n**源码下载安装**的用户在weiboSpider目录下运行如下命令,**pip安装**的用户在任意有写权限的目录运行如下命令:\n\n```bash\n$ python3 -m weibo_spider\n```\n\n第一次运行会生"
  },
  {
    "path": "docs/userid.md",
    "chars": 1147,
    "preview": "## 如何获取user_id\n\n1. 打开网址<https://weibo.cn>,搜索我们要找的人,如\"迪丽热巴\",进入她的主页;\n   ![user home](https://github.com/dataabc/media/blob"
  },
  {
    "path": "requirements.txt",
    "chars": 82,
    "preview": "lxml\nrequests==2.32.4\ntqdm==4.66.3\nabsl-py==0.12.0\nbrowser_cookie3==0.20.1\naiohttp"
  },
  {
    "path": "setup.py",
    "chars": 787,
    "preview": "import setuptools\n\nwith open('README.md', 'r', encoding='utf-8') as fh:\n    long_description = fh.read()\n\nsetuptools.set"
  },
  {
    "path": "tests/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "tests/test_downloader_async.py",
    "chars": 2190,
    "preview": "import asyncio\nimport unittest\nfrom unittest.mock import MagicMock, AsyncMock, patch\nimport os\nimport shutil\n\nfrom weibo"
  },
  {
    "path": "tests/test_parser/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "tests/test_parser/test_album_parser.py",
    "chars": 754,
    "preview": "from unittest.mock import patch\n\nfrom .util import mock_request_get_content\nfrom weibo_spider.parser.album_parser import"
  },
  {
    "path": "tests/test_parser/test_comment_parser.py",
    "chars": 1013,
    "preview": "from unittest.mock import patch\n\nfrom .util import mock_request_get_content\nfrom weibo_spider.parser.comment_parser impo"
  },
  {
    "path": "tests/test_parser/test_index_parser.py",
    "chars": 521,
    "preview": "from unittest.mock import patch\n\nfrom .util import mock_request_get_content\nfrom weibo_spider.parser.index_parser import"
  },
  {
    "path": "tests/test_parser/test_info_parser.py",
    "chars": 407,
    "preview": "from unittest.mock import patch\n\nfrom .util import mock_request_get_content\nfrom weibo_spider.parser.info_parser import "
  },
  {
    "path": "tests/test_parser/test_mblog_picAll_parser.py",
    "chars": 610,
    "preview": "from unittest.mock import patch\n\nfrom .util import mock_request_get_content\nfrom weibo_spider.parser.mblog_picAll_parser"
  },
  {
    "path": "tests/test_parser/test_page_parser.py",
    "chars": 1323,
    "preview": "from unittest.mock import patch\n\nfrom weibo_spider.parser.page_parser import PageParser\n\nfrom .util import mock_request_"
  },
  {
    "path": "tests/test_parser/test_photo_parser.py",
    "chars": 436,
    "preview": "from unittest.mock import patch\n\nfrom weibo_spider.parser.photo_parser import PhotoParser\n\nfrom .util import mock_reques"
  },
  {
    "path": "tests/test_parser/util.py",
    "chars": 399,
    "preview": "import json\nimport os\nfrom unittest.mock import Mock\n\nfrom weibo_spider.parser.util import TEST_DATA_DIR, URL_MAP_FILE\n\n"
  },
  {
    "path": "tests/testdata/2f62165fa3ca1e85e0d398d385c377a068b76eb95765f7020ffffd3e.html",
    "chars": 18643,
    "preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xht"
  },
  {
    "path": "tests/testdata/4957814af5a123b82e974b5537dea736dfb34e48d8835203a45d2e67.html",
    "chars": 18639,
    "preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xht"
  },
  {
    "path": "tests/testdata/4d5ed0a3ebd0303cb45edd544dbc0ab5e86d43e103405f0c60515884.html",
    "chars": 15979,
    "preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xht"
  },
  {
    "path": "tests/testdata/63a98849ec82b2c87ec55bca03cbf5988f7eac233a23d86b4fdf5ffd.html",
    "chars": 9002,
    "preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xht"
  },
  {
    "path": "tests/testdata/76233b3f90394581aac6f19cfa5d674a610e8b442b1f83de7673ab49.html",
    "chars": 3682,
    "preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xht"
  },
  {
    "path": "tests/testdata/a4437630f3bdfa2757bae1595186ac063fe5ec25cf2f98116ece83cb.html",
    "chars": 18752,
    "preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xht"
  },
  {
    "path": "tests/testdata/b541fd1751117498b6d6f40d3321686ddf871651237c4ac854a5c3eb.html",
    "chars": 5720,
    "preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xht"
  },
  {
    "path": "tests/testdata/ca5f2a555e8d62f728c66fa90afb2d54d19f8c898e164204a61bdf03.html",
    "chars": 5582,
    "preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xht"
  },
  {
    "path": "tests/testdata/d486235d4a17dd0accb0f2cc77b3648abfa03580b9e0cdb61f1e618f.html",
    "chars": 21566,
    "preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xht"
  },
  {
    "path": "tests/testdata/e4d541ecb02253c14abc1d52605fc00d91279df9ac4c1465c85b91b3.html",
    "chars": 5567,
    "preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xht"
  },
  {
    "path": "tests/testdata/e97222acd5bc7d8d1bfbd3f352f8cad3e36fdd19e40b69e1c33fb3c3.html",
    "chars": 3660,
    "preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xht"
  },
  {
    "path": "tests/testdata/url_map.json",
    "chars": 1428,
    "preview": "{\n    \"https://weibo.cn/1669879400/profile\": \"tests/testdata/a4437630f3bdfa2757bae1595186ac063fe5ec25cf2f98116ece83cb.ht"
  },
  {
    "path": "weibo_spider/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "weibo_spider/__main__.py",
    "chars": 158,
    "preview": "import os\nimport sys\n\nfrom absl import app\nsys.path.append(os.path.abspath(os.path.dirname(os.getcwd())))\nfrom weibo_spi"
  },
  {
    "path": "weibo_spider/config_sample.json",
    "chars": 987,
    "preview": "{\n    \"user_id_list\": [\"1669879400\"],\n    \"filter\": 1,\n    \"since_date\": \"2018-01-01\",\n    \"end_date\": \"now\",\n    \"rando"
  },
  {
    "path": "weibo_spider/config_util.py",
    "chars": 8209,
    "preview": "import codecs\nimport logging\nimport os\nimport sys\nimport browser_cookie3\nfrom datetime import datetime\nimport json\n\nlogg"
  },
  {
    "path": "weibo_spider/datetime_util.py",
    "chars": 237,
    "preview": "from datetime import datetime\n\n\ndef str_to_time(text):\n    \"\"\"将字符串转换成时间类型\"\"\"\n    if ':' in text:\n        result = dateti"
  },
  {
    "path": "weibo_spider/downloader/__init__.py",
    "chars": 352,
    "preview": "from .origin_picture_downloader import OriginPictureDownloader\nfrom .retweet_picture_downloader import RetweetPictureDow"
  },
  {
    "path": "weibo_spider/downloader/avatar_picture_downloader.py",
    "chars": 730,
    "preview": "import os\n\nfrom .img_downloader import ImgDownloader\n\n\nclass AvatarPictureDownloader(ImgDownloader):\n    def __init__(se"
  },
  {
    "path": "weibo_spider/downloader/downloader.py",
    "chars": 2941,
    "preview": "# -*- coding: UTF-8 -*-\nimport asyncio\nimport logging\nimport os\nimport sys\nimport random\nfrom abc import ABC, abstractme"
  },
  {
    "path": "weibo_spider/downloader/img_downloader.py",
    "chars": 1818,
    "preview": "import os\n\nfrom .downloader import Downloader\n\n\nclass ImgDownloader(Downloader):\n    def __init__(self, file_dir, file_d"
  },
  {
    "path": "weibo_spider/downloader/origin_picture_downloader.py",
    "chars": 278,
    "preview": "from .img_downloader import ImgDownloader\n\n\nclass OriginPictureDownloader(ImgDownloader):\n    def __init__(self, file_di"
  },
  {
    "path": "weibo_spider/downloader/retweet_picture_downloader.py",
    "chars": 278,
    "preview": "from .img_downloader import ImgDownloader\n\n\nclass RetweetPictureDownloader(ImgDownloader):\n    def __init__(self, file_d"
  },
  {
    "path": "weibo_spider/downloader/video_downloader.py",
    "chars": 760,
    "preview": "import os\n\nfrom .downloader import Downloader\n\n\nclass VideoDownloader(Downloader):\n    def __init__(self, file_dir, file"
  },
  {
    "path": "weibo_spider/logging.conf",
    "chars": 941,
    "preview": "[loggers]\nkeys=root,spider\n\n[handlers]\nkeys=consoleHandler,fileHandler,errorHandler\n\n[formatters]\nkeys=consoleFormatter,"
  },
  {
    "path": "weibo_spider/parser/__init__.py",
    "chars": 213,
    "preview": "from .index_parser import IndexParser\nfrom .page_parser import PageParser\nfrom .photo_parser import PhotoParser\nfrom .al"
  },
  {
    "path": "weibo_spider/parser/album_parser.py",
    "chars": 621,
    "preview": "from .parser import Parser\nfrom .util import handle_html\n\n\nclass AlbumParser(Parser):\n    def __init__(self, cookie, alb"
  },
  {
    "path": "weibo_spider/parser/comment_parser.py",
    "chars": 2394,
    "preview": "import logging\nimport random\nimport requests\nimport re\nfrom time import sleep\nfrom lxml.html import tostring\nfrom lxml.h"
  },
  {
    "path": "weibo_spider/parser/index_parser.py",
    "chars": 3005,
    "preview": "import logging\n\nfrom .info_parser import InfoParser\nfrom .parser import Parser\nfrom .util import handle_html, handle_htm"
  },
  {
    "path": "weibo_spider/parser/info_parser.py",
    "chars": 2212,
    "preview": "import logging\nimport sys\n\nfrom ..user import User\nfrom .parser import Parser\nfrom .util import handle_html\n\nlogger = lo"
  },
  {
    "path": "weibo_spider/parser/mblog_picAll_parser.py",
    "chars": 389,
    "preview": "from .parser import Parser\nfrom .util import handle_html\n\n\nclass MblogPicAllParser(Parser):\n    def __init__(self, cooki"
  },
  {
    "path": "weibo_spider/parser/page_parser.py",
    "chars": 16604,
    "preview": "import logging\nimport re\nimport sys\nfrom datetime import datetime, timedelta\n\nfrom .. import datetime_util\nfrom ..weibo "
  },
  {
    "path": "weibo_spider/parser/parser.py",
    "chars": 126,
    "preview": "class Parser:\n    def __init__(self, cookie):\n        self.cookie = cookie\n        self.url = ''\n        self.selector ="
  },
  {
    "path": "weibo_spider/parser/photo_parser.py",
    "chars": 931,
    "preview": "from .parser import Parser\nfrom .util import handle_html\n\n\nclass PhotoParser(Parser):\n    def __init__(self, cookie, use"
  },
  {
    "path": "weibo_spider/parser/util.py",
    "chars": 4965,
    "preview": "import hashlib\nimport json\nimport logging\nimport sys\n\nimport aiohttp\nimport requests\nfrom lxml import etree\n\n# Set GENER"
  },
  {
    "path": "weibo_spider/spider.py",
    "chars": 17087,
    "preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\nimport json\nimport logging\nimport logging.config\nimport os\nimport random\n"
  },
  {
    "path": "weibo_spider/user.py",
    "chars": 1055,
    "preview": "class User:\n    __slots__ = (\n        'id', 'nickname', 'gender', 'location', 'birthday', 'description',\n        'verifi"
  },
  {
    "path": "weibo_spider/user_id_list.txt",
    "chars": 100,
    "preview": "1669879400 Dear-迪丽热巴 2020-01-13 19:18\n1223178222 胡歌 2020-01-13 19:28\n1729370543 郭碧婷 2020-01-13 19:33"
  },
  {
    "path": "weibo_spider/weibo.py",
    "chars": 1470,
    "preview": "class Weibo:\n    __slots__ = (\n        'id', 'user_id', 'content', 'article_url', 'original_pictures',\n        'retweet_"
  },
  {
    "path": "weibo_spider/writer/__init__.py",
    "chars": 405,
    "preview": "from .csv_writer import CsvWriter\nfrom .json_writer import JsonWriter\nfrom .mongo_writer import MongoWriter\nfrom .mysql_"
  },
  {
    "path": "weibo_spider/writer/csv_writer.py",
    "chars": 1781,
    "preview": "import csv\nimport logging\n\nfrom .writer import Writer\n\nlogger = logging.getLogger('spider.csv_writer')\n\n\nclass CsvWriter"
  },
  {
    "path": "weibo_spider/writer/json_writer.py",
    "chars": 1754,
    "preview": "import codecs\nimport json\nimport logging\nimport os\n\nfrom .writer import Writer\n\nlogger = logging.getLogger('spider.json_"
  },
  {
    "path": "weibo_spider/writer/kafka_writer.py",
    "chars": 1272,
    "preview": "import json\nimport logging\nimport sys\n\nfrom .writer import Writer\n\nlogger = logging.getLogger('spider.kafka_writer')\n\n\nc"
  },
  {
    "path": "weibo_spider/writer/mongo_writer.py",
    "chars": 2167,
    "preview": "import copy\nimport logging\nimport sys\n\nfrom .writer import Writer\n\nlogger = logging.getLogger('spider.mongo_writer')\n\n\nc"
  },
  {
    "path": "weibo_spider/writer/mysql_writer.py",
    "chars": 5244,
    "preview": "import copy\nimport logging\nimport sys\n\nfrom .writer import Writer\n\nlogger = logging.getLogger('spider.mysql_writer')\n\n\nc"
  },
  {
    "path": "weibo_spider/writer/post_writer.py",
    "chars": 2077,
    "preview": "import codecs\nimport json\nimport logging\nimport os\nimport requests\n\nfrom .writer import Writer\nfrom time import sleep\nfr"
  },
  {
    "path": "weibo_spider/writer/sqlite_writer.py",
    "chars": 3730,
    "preview": "import copy\nimport logging\nimport sys\n\nfrom .writer import Writer\n\nlogger = logging.getLogger('spider.sqlite_writer')\n\n\n"
  },
  {
    "path": "weibo_spider/writer/txt_writer.py",
    "chars": 1983,
    "preview": "import logging\nimport sys\n\nfrom .writer import Writer\n\nlogger = logging.getLogger('spider.txt_writer')\n\n\nclass TxtWriter"
  },
  {
    "path": "weibo_spider/writer/writer.py",
    "chars": 333,
    "preview": "from abc import ABC, abstractmethod\n\n\nclass Writer(ABC):\n    def __init__(self):\n        \"\"\"根据需要,初始化结果路径、初始化表头、初始化数据库等\"\""
  }
]

About this extraction

This page contains the full source code of the dataabc/weiboSpider GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 80 files (255.0 KB), approximately 97.1k tokens, and a symbol index with 159 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
