Repository: Chengyumeng/spider163
Branch: master
Commit: cf655e37dcc7
Files: 54
Total size: 110.7 KB
Directory structure:
gitextract_2_c_xszu/
├── .gitignore
├── .travis.yml
├── LICENSE
├── MANIFEST.in
├── Makefile
├── README.md
├── doc/
│   ├── 2017.Q4.TODO.md
│   ├── 2018.Q1.TODO.md
│   └── 2018.Q2.TODO.md
├── hack/
│   ├── docker-compose.yaml
│   └── spider/
│       ├── Dockerfile
│       └── spider163.conf
├── pypi.sh
├── setup.py
└── spider163/
    ├── __init__.py
    ├── bin/
    │   ├── __init__.py
    │   ├── cli.py
    │   └── cli_test.py
    ├── mail/
    │   ├── __init__.py
    │   └── mail.py
    ├── settings.py
    ├── spider/
    │   ├── __init__.py
    │   ├── authorize.py
    │   ├── comment.py
    │   ├── lyric.py
    │   ├── mp3.py
    │   ├── music.py
    │   ├── playlist.py
    │   ├── public.py
    │   ├── read.py
    │   └── search.py
    ├── template/
    │   └── spider163.conf
    ├── utils/
    │   ├── __init__.py
    │   ├── config.py
    │   ├── const.py
    │   ├── encrypt.py
    │   ├── healthz.py
    │   ├── mail.py
    │   ├── pylog.py
    │   ├── pysql.py
    │   └── tools.py
    ├── version.py
    └── www/
        ├── __init__.py
        ├── static/
        │   ├── css/
        │   │   └── spider163.css
        │   └── js/
        │       ├── macarons.js
        │       ├── scan.js
        │       ├── spider163.js
        │       └── stat.js
        ├── templates/
        │   ├── bussiness.html
        │   ├── index.html
        │   ├── scan.html
        │   ├── spider.html
        │   └── stat.html
        └── web.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover
.hypothesis/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# IPython Notebook
.ipynb_checkpoints
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# dotenv
.env
# virtualenv
venv/
ENV/
# Spyder project settings
.spyderproject
# Rope project settings
.ropeproject
# Developer-specific entries
.DS_Store
.idea/
hack/_dev
================================================
FILE: .travis.yml
================================================
language: python
python:
- 3.6
install:
- pip install -e .
script:
- python -m unittest discover -p "*_test.py"
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2017 Cheng YuMeng
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: MANIFEST.in
================================================
graft spider163/www/templates
graft spider163/www/static
graft spider163/template
================================================
FILE: Makefile
================================================
.PHONY: docker-build docker-run build
USER := $(shell id -u)
BRANCH := $(shell git rev-parse --abbrev-ref HEAD)
VERSION := $(shell git describe --always --tags | grep -Eo "[0-9]+\.[0-9]+\.[0-9]+")
docker-build:
	docker build -t chengtian/spider163:$(VERSION) -f hack/spider/Dockerfile .
docker-run:
	cd hack && docker-compose up
build:
	pip install -e .
================================================
FILE: README.md
================================================

# spider163
[License][license] [Releases][releases] [PyPI][pyversions] [Build status][buildstatus]
[license]: https://github.com/Chengyumeng/spider163/blob/master/LICENSE
[releases]:https://github.com/Chengyumeng/spider163/releases
[pyversions]:https://pypi.python.org/pypi/spider163
[buildstatus]:https://travis-ci.org/chengyumeng/spider163
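###### The easiest-to-use NetEase Cloud Music crawler on GitHub
## Installation
- Step 1: Set the SPIDER163_PATH environment variable; it defaults to $HOME/spider163.
- Step 2: Copy the default configuration file spider163.conf into SPIDER163_PATH and configure the database.
- Step 3: Run `pip install spider163`.
- Step 4: Run `spider163 --help` to list the available commands.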
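
The first two steps can be sketched in Python. This is a minimal sketch, not the project's own code: the helper names `resolve_spider163_path` and `write_default_config` are hypothetical, the `[core]` section with `db` and `port` keys is assumed from the sample `hack/spider/spider163.conf`, and the database URL you pass in is your own.

```python
import configparser
import os


def resolve_spider163_path(environ=None):
    """Return SPIDER163_PATH, falling back to $HOME/spider163 (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    default = os.path.join(os.path.expanduser("~"), "spider163")
    return environ.get("SPIDER163_PATH", default)


def write_default_config(directory, db_url, port=1630):
    """Write a spider163.conf with a [core] section into `directory` (assumed layout)."""
    os.makedirs(directory, exist_ok=True)
    # Interpolation is disabled so a '%' in the database URL is kept literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser["core"] = {"db": db_url, "port": str(port)}
    path = os.path.join(directory, "spider163.conf")
    with open(path, "w") as handle:
        parser.write(handle)
    return path
```

Calling `write_default_config(resolve_spider163_path(), "<your database URL>")` would then place the file where step 2 expects it.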
## Past articles
- [Not important: the 5 versions of spider163 released in 2018](https://mp.weixin.qq.com/s/pim5tYPHd0zBTKYQaijkbQ)
- [How Spider163 evolved to support both python2.x and python3.x](https://mp.weixin.qq.com/s/FFoD3gKM5touGVbvKebRlA)
- [Spider163 now supports downloading NetEase paid songs](https://mp.weixin.qq.com/s/L8uvPV_CiAS6vcnaOaifJw)
- [Not breaking news | NetEase Cloud Music crawler Spider163 updated to v2.0](https://mp.weixin.qq.com/s?__biz=MzI2NTMxMDYxMg==&mid=2247483955&idx=1&sn=c1d8a38b4929cb298fc6172cf894e641&chksm=ea9e1ac8dde993de1d6095d000f289389ee92609bccebda3d2ebc88bfa1939eceb6b94cc3fce&scene=38#wechat_redirect)
- ...
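## Usage guide
```console
$ spider163 initdb
$ # Create the database tables from the database settings in the configuration file; use resetdb to delete all data
```
```console
$ spider163 resetdb
$ # Drop and rebuild the database tables
```
```console
$ spider163 updatedb
$ # Reset expired data by date so that it is fetched again
```
```console
$ spider163 classify
$ # List the known music genres
```
```console
$ spider163 playlist
$ # By default, download all recommended playlists (1000+); you can also fetch one page (-p=1) or one genre (--classify=小语种; the default is 全部, i.e. all genres)
```
```console
$ spider163 mp3 --playlist=2033391777
$ # Download every licensed song in the specified playlist
```
```console
$ spider163 music
$ # By default, fetch song data for 10 playlists; set a loop count (-c=2) to fetch songs from 10 * c playlists
```
```console
$ spider163 comment
$ # By default, fetch the comments of one song stored in the database that has not been downloaded yet; use -c to set how many songs to process, or -s to force a specific song ID
$ # Examples: spider163 comment -c 10, or spider163 comment -s 209115
```
```console
$ spider163 lyric --count=10
$ # Fetch the lyrics of 10 songs; use --song to fetch one specific song by ID
```
```console
$ spider163 search -q="林依晨"
$ # Search (work in progress; currently supports song search)
```
```console
$ spider163 get -s 209115
$ # Show a song's basic information, lyrics, and hot comments
```
```console
$ spider163 get --playlist 922064582
$ # Show a playlist's basic information, songs, and so on
```
```console
$ spider163 doc --playlist 922064582
$ # Export playlist/song information to a Word document
```
```console
$ spider163 top50 --playlist 922064582 --username=xxx --password=xxx
$ # Create the TOP 50 playlist
```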
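
The commands above are typically chained: playlists feed the song crawl, and songs feed the comment and lyric crawls. The sketch below only builds the command lines for one such run, using flags that appear in this guide. The function name `build_pipeline`, its defaults, and the ordering are assumptions for illustration, not part of spider163.

```python
import shlex


def build_pipeline(page=1, batches=2, comments=10, lyrics=10):
    """Return CLI invocations for one crawl, in an assumed dependency order."""
    return [
        ["spider163", "initdb"],
        ["spider163", "playlist", "-p={}".format(page)],
        ["spider163", "music", "-c={}".format(batches)],
        ["spider163", "comment", "-c", str(comments)],
        ["spider163", "lyric", "--count={}".format(lyrics)],
    ]


if __name__ == "__main__":
    for argv in build_pipeline():
        # Print instead of executing; hand argv to subprocess.run() to run it.
        print(" ".join(shlex.quote(part) for part in argv))
```

Keeping each command as an argument list avoids shell quoting problems if a genre name or path is added later.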
## TODO
- [2018 Q2](https://github.com/Chengyumeng/spider163/blob/master/doc/2018.Q2.TODO.md)
- [2018 Q1](https://github.com/Chengyumeng/spider163/blob/master/doc/2018.Q1.TODO.md)
- [2017 Q4](https://github.com/Chengyumeng/spider163/blob/master/doc/2017.Q4.TODO.md)
- ...
# Follow the WeChat official account: 程天写代码

================================================
FILE: doc/2017.Q4.TODO.md
================================================
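#### Minor version updates
* Improve the lyric download flow and support batch downloads √
* Improve progress visualization during crawling √
* Manage and control error flows √
* Switch to colored output √
* Improve search (broader coverage) √
* Add some automation scripts √
* Add pip support √
* Add a Docker image √
* Add more variety to playlist crawling √
#### WEB UI
* WEB UI supports real-time search
* WEB UI supports recommendations √
* WEB UI supports online crawling √
* ...
#### WeChat interface
* ...
#### Cross-platform support
* Windows support
* Support for different MySQL versions
* Consider supporting sqlite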
================================================
FILE: doc/2018.Q1.TODO.md
================================================
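#### Main directions
* File downloads (mp3, etc.) [mp3 download done]
* Search framework (based on ES; may spin off a subproject) [TODO]
* Explore a minimal installation (the MySQL dependency is too complex) [Word generation]
* Migrate to python3 [done as a proof of concept]
* Deployment on k8s (provide an official image)
* Support PDF generation [Word support done; PDF can be generated manually]
#### Improvements
* web ui
* web ui: use a popular front-end framework (angular or react)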
================================================
FILE: doc/2018.Q2.TODO.md
================================================
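#### Main directions
* Explore a search framework [possibly through a spin-off subproject]
* Improve integration tests
* Develop an email subscription feature [dev]
* Integrate telegraf/influxdb/grafana [dev]
* Make money! Make money!! Make money!!!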
================================================
FILE: hack/docker-compose.yaml
================================================
version: '2'
services:
  mysql163:
    image: mysql:5.6.36
    container_name: mysql163
    networks:
      - default
    environment:
      MYSQL_DATABASE: "spider163"
      MYSQL_ROOT_PASSWORD: "a1b2c3d4e"
    volumes:
      - ./_dev/mysql:/var/lib/mysql:z
    ports:
      - "3336:3306"
    expose:
      - "3306"
  spider163:
    image: chengtian/spider163:2.7.5
    container_name: spider163
    volumes:
      - ../:/root/code
    ports:
      # Bind the container port to the host so it is reachable from outside
      - "1630:1630"
    expose:
      - "1630"
    links:
      - mysql163:mysql163.localhost
    depends_on:
      - mysql163
    networks:
      - default
networks:
  default:
    external:
      name: spider163
================================================
FILE: hack/spider/Dockerfile
================================================
FROM python:3.6
RUN mkdir /root/code && mkdir /root/spider163
WORKDIR /root/code
ADD ./ /root/code/
ADD hack/spider/spider163.conf /root/spider163/spider163.conf
RUN pip install -e . -i http://mirrors.aliyun.com/pypi/simple --trusted-host=mirrors.aliyun.com && spider163 --help
ENTRYPOINT spider163 webserver
================================================
FILE: hack/spider/spider163.conf
================================================
[core]
db=mysql://root:a1b2c3d4e@mysql163.localhost/spider163?charset=utf8mb4
port=1630
================================================
FILE: pypi.sh
================================================
python setup.py clean
python setup.py sdist
twine upload dist/*
python setup.py clean
================================================
FILE: setup.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*
from setuptools import setup, find_packages, Command
import os
import imp
class CleanCommand(Command):
"""Custom clean command to tidy up the project root."""
user_options = []
def initialize_options(self):
pass
def finalize_options(self):
pass
def run(self):
os.system('rm -vrf ./build ./dist ./*.pyc ./*.tgz ./*.egg-info')
# Load the version module once instead of re-reading the file per attribute.
_version = imp.load_source(
    'spider163.version', os.path.join('spider163', 'version.py'))
py3 = _version.PYTHON3
version = _version.VERSION
desc = _version.DESCRIPTION
install_requires = [
"beautifulsoup4==4.6.0",
"bs4==0.0.1",
"cement==2.10.2",
"certifi==2017.7.27.1",
"chardet==3.0.4",
"idna==2.6",
"Naked==0.1.31",
"pprint==0.1",
"cryptography==2.3",
"PyYAML==3.12",
"requests==2.18.4",
"shellescape==3.4.1",
"SQLAlchemy==1.1.15",
"SQLAlchemy-Utils==0.32.18",
"terminaltables==3.1.0",
"urllib3==1.24.2",
"Logbook==1.1.0",
"colorama==0.3.9",
"flask==1.0",
"python-docx==0.8.6",
"xlwt==1.3.0"
]
if py3 is True:
install_requires.append("mysqlclient==1.3.12")
else:
install_requires.append("MySQL-python==1.2.5")
setup(
version=version,
name='spider163',
author='ChengTian',
description='A simple, easy-to-use, and powerful NetEase Cloud Music crawler',
long_description=desc,
entry_points={
"console_scripts": ["spider163=spider163.bin.cli:main"]
},
url='https://github.com/Chengyumeng/spider163',
author_email='792400644@qq.com',
packages=find_packages(),
include_package_data=True,
license='MIT License',
zip_safe=False,
install_requires=install_requires,
classifiers=[
'Development Status :: 5 - Production/Stable',
'Environment :: Console',
'Environment :: Web Environment',
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3.4',
],
cmdclass={
'clean': CleanCommand,
},
)
================================================
FILE: spider163/__init__.py
================================================
================================================
FILE: spider163/bin/__init__.py
================================================
================================================
FILE: spider163/bin/cli.py
================================================
# -*- coding: utf-8 -*-
import os, datetime
from cement.core.foundation import CementApp
from cement.core.controller import CementBaseController, expose
from cement.core.exc import CaughtSignal
from colorama import Fore
from colorama import init
from spider163.utils import pysql
from spider163.spider import playlist
from spider163.spider import mp3
from spider163.spider import music
from spider163.spider import comment
from spider163.spider import lyric
from spider163.spider import search
from spider163.spider import read
from spider163.spider import authorize
from spider163.spider import public as uapi
from spider163.mail import mail
from spider163 import version
from spider163.www import web
from spider163.utils import config
from spider163.utils import pylog
from spider163.utils import healthz
BANNER = """
Spider163 Application v{}
Copyright (c) {} Cheng Tian Enterprises
Follow my WeChat official account "程天写代码"
""".format(version.VERSION, datetime.datetime.now().year)
init(autoreset=True)
class VersionController(CementBaseController):
class Meta:
label = 'base'
description = 'Spider163是Github上最流行的网易云音乐爬虫系统'
arguments = [
(['-v', '--version'], dict(action='version', version=BANNER)),
]
@expose(help="运行前健康检查")
def healthz(self):
healthz.is_correct_config()
healthz.is_correct_db()
healthz.can_spider()
@expose(help="提供生态信息数据【for telegraf】")
def expose(self):
healthz.expose_data()
class DatabaseController(CementBaseController):
class Meta:
label = "database"
description = "数据库相关操作"
arguments = [
(['-d', '--date'], dict(help="时间限制(example:-1,-2)")),
(['-t', '--table'], dict(help="修改的表(example:playlist,music)")),
]
@expose(help="自动生成数据库相关依赖")
def initdb(self):
print("正在生成全部数据库表结构……")
pysql.initdb()
@expose(help="重置数据库配置")
def resetdb(self):
print("正在删除全部已下载数据……")
pysql.dropdb()
pysql.initdb()
@expose(help="重置过期的playlist/music数据,便于重新抓取")
def updatedb(self):
if self.app.pargs.date is None:
print(Fore.RED + '没有指定参数 date 无法进行操作')
return
if self.app.pargs.table == "music":
m = music.Music()
m.create_update_strategy(date=int(self.app.pargs.date))
elif self.app.pargs.table == "playlist":
p = playlist.Playlist()
p.create_update_strategy(date=int(self.app.pargs.date))
else:
print(Fore.RED + '指定参数 table 不正确!')
class SpiderController(CementBaseController):
class Meta:
label = "spider"
description = "爬虫-蜘蛛等相关操作"
arguments = [
(['-p', '--page'],
dict(help="抓取的页码")),
(['-c', '--count'],
dict(help="")),
(['-s', '--song'],
dict(help="歌曲ID")),
(['--classify'],
dict(help="歌曲风格")),
(['--path'],
dict(help="存储路径"))
]
@expose(help="获取全部歌曲风格列表(作为抓取歌单的参照)")
def classify(self):
playlist.Playlist().get_classify()
@expose(help="根据推荐歌单抓取网易云音乐歌单数据(-p --page | --classify)")
def playlist(self):
pg = self.app.pargs.page
cf = "全部"
pl = playlist.Playlist()
if self.app.pargs.classify is not None:
cf = self.app.pargs.classify
if pg is not None:
print(Fore.GREEN + '正在抓取 曲风为 {} 的第 {} 页歌单……'.format(cf, pg))
pl.view_capture(int(pg), cf)
else:
for i in range(36):
print(Fore.GREEN + '正在抓取 曲风为 {} 的第 {} 页歌单……'.format(cf ,i + 1))
pl.view_capture(i + 1, cf)
@expose(help="根据指定的歌单下载歌单歌曲MP3(--playlist | --path)")
def mp3(self):
path = "."
if self.app.pargs.path is not None:
path = self.app.pargs.path
if not os.path.exists(path):
os.makedirs(path)
if self.app.pargs.playlist is not None:
m = mp3.MP3()
m.view_down(self.app.pargs.playlist, path)
@expose(help="通过歌单抓取网易云音乐歌曲,单次抓取歌单10个(-c --count)")
def music(self):
msc = music.Music()
if self.app.pargs.count is None:
msc.views_capture()
return
cnt = int(self.app.pargs.count)
if cnt <= 0:
print(Fore.RED + "不合法的--count -c 变量( > 0 )")
else:
for i in range(cnt):
print(Fore.GREEN + '正在执行第 {} 批抓取计划,本次抓取歌单歌曲 10 个\r\n'.format(i + 1))
msc.views_capture()
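    @expose(help="Fetch the NetEase Cloud Music official chart playlists (-c --count)")
    def toplist(self):
        msc = music.Music()
        cmt = comment.Comment(comment.Comment.Official)
        # Loop variable renamed from `id` so it no longer shadows the builtin.
        for playlist_id in uapi.top:
            pylog.print_info('Fetching official chart. Playlist ID: {} Playlist name: {}'.format(playlist_id, uapi.top[playlist_id]))
            msc.view_capture(playlist_id)
        # --count is optional on the command line; int(None) would raise TypeError.
        if self.app.pargs.count is None:
            print(Fore.RED + "--count / -c was not given; skipping comment capture")
            return
        cnt = int(self.app.pargs.count)
        if cnt <= 0:
            print(Fore.RED + "Invalid --count / -c value ( > 0 )")
        else:
            cmt.auto_view(cnt)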
@expose(help="通过音乐列表抓取网易云音乐热评,单次抓取音乐1首(-c --count),也可以指定歌曲ID(-s --song)")
def comment(self):
cmt = comment.Comment()
if self.app.pargs.song is not None:
print(Fore.BLUE + '正在执行抓取歌曲 {} 热门评论计划'.format(self.app.pargs.song))
cmt.view_capture(int(self.app.pargs.song), 1)
print(Fore.GREEN + '抓取完成\r\n')
return
if self.app.pargs.count is not None:
print(Fore.GREEN + '正在执行批量抓取热门评论计划,本次计划抓取歌曲 {} 首\r\n'.format(self.app.pargs.count))
cmt.auto_view(int(self.app.pargs.count))
else:
cmt.auto_view(1)
@expose(help="通过音乐列表抓取网易云音乐歌词,可以指定抓取歌曲数量(-c --count),也可以指定歌曲ID(-s --song)")
def lyric(self):
lrc = lyric.Lyric()
if self.app.pargs.song is not None:
print(Fore.BLUE + '正在执行抓取歌曲 {} 歌词的计划'.format(self.app.pargs.song))
lrc.view_lyric(self.app.pargs.song)
print(Fore.GREEN + '抓取完成\r\n')
elif self.app.pargs.count is not None:
print(Fore.GREEN + '正在执行批量抓取歌词计划,本次计划抓取歌曲 {} 首\r\n'.format(self.app.pargs.count))
lrc.view_lyrics(int(self.app.pargs.count))
else:
print("您至少指定--song或者--count一个参数")
class QueryController(CementBaseController):
class Meta:
label = "query"
stacked_on = 'base'
description = "爬虫-蜘蛛等相关操作"
arguments = [
(['--playlist'],
dict(help="")),
(['-q', '--query'],
dict(help="")),
]
@expose(help="通过歌单ID和歌曲ID获取歌单、歌曲相关信息(--song --playlist)")
def get(self):
if self.app.pargs.song is not None:
comment.Comment().get_music(self.app.pargs.song)
lyric.Lyric().get_lyric(self.app.pargs.song)
if self.app.pargs.playlist is not None:
music.Music().get_playlist(self.app.pargs.playlist)
@expose(help="搜索功能(-q --query)")
def search(self):
if self.app.pargs.query is not None:
search.searchSong(self.app.pargs.query)
search.searchAlbum(self.app.pargs.query)
search.searchSinger(self.app.pargs.query)
search.searchPlaylist(self.app.pargs.query)
@expose(help="生成文档(word)(--playlist | --count)")
def doc(self):
if self.app.pargs.playlist is not None:
read.print_pdf(self.app.pargs.playlist)
if self.app.pargs.count is not None:
read.print_comment(int(self.app.pargs.count))
@expose(help="Spider163邮件系统(--playlist)")
def mail(self):
try:
if self.app.pargs.playlist is not None:
playlist_id = int(self.app.pargs.playlist)
mail.music(playlist_id)
except Exception as e:
print("{} 发送邮件发生意外 {}".format(Fore.RED, e))
class WebController(CementBaseController):
class Meta:
label = "web"
stacked_on = 'base'
description = "网络平台"
arguments = [
]
@expose(help="Spider163管理Web平台")
def webserver(self):
try:
webport = config.get_port()
web.app.run(host="0.0.0.0", port=webport, debug=True)
except Exception as e:
print("{} 退出web服务:{}".format(Fore.RED, e))
class AuthController(CementBaseController):
class Meta:
label = "auth"
stacked_on = 'base'
description = "登录授权操作"
arguments = [
(['--username'],
dict(help="登录账号(必须为中国大陆手机号)")),
(['--password'],
dict(help="登录密码")),
]
@expose(help="维护评论Top 50 歌单")
def top50(self):
if self.app.pargs.username is None:
pylog.print_warn("没有指定用户名(--username)参数,无法执行任务!")
return
if self.app.pargs.password is None:
pylog.print_warn("没有指定密码(--password)参数,无法执行任务!")
return
if self.app.pargs.playlist is None:
pylog.print_warn("没有指定目标歌单ID(--playlist),无法执行任务!")
return
username = self.app.pargs.username
password = self.app.pargs.password
playlist_id = self.app.pargs.playlist
cmd = authorize.Command()
cmd.do_login(username, password)
cmd.clear_playlist(playlist_id)
cmd.create_playlist_comment_top100(playlist_id)
class App(CementApp):
class Meta:
label = "Spider163"
base_controller = "base"
handlers = [VersionController, DatabaseController, SpiderController, QueryController, WebController, AuthController]
def main():
with App() as app:
try:
app.run()
except CaughtSignal as e:
pylog.print_warn("控制台异常:{}".format(e))
except Exception as e:
pylog.print_err("执行抓取任务遭遇配置异常: {}".format(e))
================================================
FILE: spider163/bin/cli_test.py
================================================
import unittest
from spider163.spider import playlist
from spider163.spider import mp3
from spider163.spider import search
from spider163.spider import read
from spider163.utils import healthz
class TestStringMethods(unittest.TestCase):
def test_config(self):
healthz.is_correct_config()
healthz.is_correct_db()
healthz.can_spider()
def test_classify(self):
playlist.Playlist().get_classify()
def test_mp3(self):
m = mp3.MP3()
# m.view_down(2127220577, ".")
def test_search(self):
search.searchSong("李荣浩")
search.searchAlbum("韩寒")
search.searchSinger("林依晨")
search.searchPlaylist("SHE")
def test_doc(self):
read.print_pdf(2127220577)
if __name__ == '__main__':
unittest.main()
================================================
FILE: spider163/mail/__init__.py
================================================
================================================
FILE: spider163/mail/mail.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from spider163 import settings
from spider163 import version
from spider163.spider import public as uapi
from spider163.utils import pysql
from spider163.utils import pylog
from spider163.utils import mail
from spider163.utils import config
def music(playlist_id):
if playlist_id not in uapi.top.keys():
pylog.print_info("歌单 {} 不在合法排行榜序列,合法歌单如下".format(playlist_id))
tb = [['歌单ID','歌单名字']]
for k,v in uapi.top.items():
tb.append([k, v])
pylog.Table(tb)
return
data = settings.Session.query(
pysql.Toplist163.song_name,
pysql.Toplist163.song_id,
pysql.Toplist163.author,
pysql.Toplist163.comment.label("count")
).filter(pysql.Toplist163.playlist_id == playlist_id, pysql.Toplist163.mailed == "N").order_by(pysql.Toplist163.id.asc()).slice(1,5).all()
page = []
body = version.MAILBODY
title = version.MAILMUSIC
comments = version.MAILCOMMENT
for m in data:
settings.Session.query(pysql.Toplist163).filter(pysql.Toplist163.song_id == m[1]).update({'mailed': 'Y'})
settings.Session.commit()
detail = ""
cms = settings.Session.query(pysql.Comment163).filter(pysql.Comment163.song_id == m[1]).order_by(pysql.Comment163.id).all()
for c in cms:
detail = detail + comments.format(c.author,c.liked,c.txt)
head = title.format(m[1], m[0], m[2], m[3], detail)
page.append(head + detail)
body = body.format(uapi.top[playlist_id],"<br>".join(page))
host,port,users = config.get_mail()
for user in users.split(","):
mail.send_email(host,port,"spider163每日网易云音乐分享", user, body)
================================================
FILE: spider163/settings.py
================================================
# -*- coding: utf-8 -*-
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
from spider163.utils import config
from sqlalchemy_utils.functions import database_exists
from spider163.utils import pylog
def configure_orm():
global engine
global Session
engine_args = {}
Session = scoped_session(
sessionmaker(autocommit=False, autoflush=False))
try:
if database_exists(config.get_db()) is False:
create_engine(config.get_mysql()['uri'], echo=False).execute("create database IF NOT EXISTS {} DEFAULT CHARACTER SET utf8mb4".format(config.get_mysql()['db']))
engine = create_engine(config.get_db(), **engine_args)
Session = scoped_session(
sessionmaker(autocommit=False, autoflush=False, bind=engine))
except Exception as e:
pylog.print_err("初始化数据库出现问题: {}".format(e))
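
`configure_orm` creates the database when it does not exist yet, which needs a server-level URI (without the database name) plus the bare database name. `config.get_mysql()` is not shown in this section, so the following is only a sketch of how such a split could be done with the standard library. The function name `split_db_url` and the returned keys are assumptions modeled on the `['uri']` and `['db']` lookups above.

```python
from urllib.parse import urlsplit


def split_db_url(db_url):
    """Split a database URL into a server-level URI and the database name."""
    parts = urlsplit(db_url)
    # scheme://user:password@host[:port], without the trailing /database
    server_uri = "{}://{}".format(parts.scheme, parts.netloc)
    database = parts.path.lstrip("/")
    return {"uri": server_uri, "db": database}


info = split_db_url("mysql://root:secret@mysql163.localhost/spider163?charset=utf8mb4")
print(info["uri"])  # mysql://root:secret@mysql163.localhost
print(info["db"])   # spider163
```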
================================================
FILE: spider163/spider/__init__.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from spider163 import settings
from spider163.utils import pylog
try:
settings.configure_orm()
except Exception as e:
pylog.print_info("无法执行数据库相关的任务: {}".format(e))
================================================
FILE: spider163/spider/authorize.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import requests
import time
from spider163.utils import encrypt
from spider163.spider import public as uapi
from spider163.spider import music
from spider163.utils import pysql
from spider163.utils import pylog
from spider163.utils import tools
from spider163 import settings
class Command():
def __init__(self):
modulus = uapi.comment_module
pubKey = uapi.pubKey
secKey = uapi.secKey
self.__encSecKey = self.rsaEncrypt(secKey, pubKey, modulus)
self.session = requests.session()
self.session.headers = uapi.header
def createPlaylistParams(self,ids,playlist_id,cmd,csrf_token):
text = '{"trackIds": ['+",".join(ids) + '],"pid": "{}","op": "{}","csrf_token": "{}"'.format(playlist_id,cmd,csrf_token) + '}'
nonce = '0CoJUm6Qyw8W8jud'
nonce2 = 16 * 'F'
encText = encrypt.aes(
encrypt.aes(text, nonce).decode("utf-8"), nonce2
)
return encText
def createPlaylistRemoveParams(self):
pass
def createLoginParams(self,username,password):
psw = tools.md5(password)
text = '{' + '"phone": "{}","password": "{}","rememberLogin": "true"'.format(username,psw)+'}'
nonce = '0CoJUm6Qyw8W8jud'
nonce2 = 16 * 'F'
encText = encrypt.aes(
encrypt.aes(text, nonce).decode("utf-8"), nonce2
)
return encText
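    def rsaEncrypt(self, text, pubKey, modulus):
        text = text[::-1]
        # Three-argument pow() performs modular exponentiation without
        # building the huge intermediate power first.
        rs = pow(int(tools.hex(text), 16), int(pubKey, 16), int(modulus, 16))
        return format(rs, 'x').zfill(256)
    def createSecretKey(self, size):
        # Iterating bytes yields ints on Python 3, so ord() would fail there;
        # a bytearray yields ints on both Python 2 and Python 3.
        return (
            ''.join(map(lambda xx: (hex(xx)[2:]), bytearray(os.urandom(size))))
        )[0:16]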
def post_playlist_add(self,ids, playlist_id=2098905487, csrf_token="da2216e4b4ca4efcfab94d8d4920ef9"):
data = {
'params': self.createPlaylistParams(ids,playlist_id,"add",csrf_token),
'encSecKey': self.__encSecKey
}
url = uapi.playlist_add_api.format(csrf_token)
req = self.session.post(
url, data=data, timeout=100
)
return req.json()
def post_playlist_delete(self, ids, playlist_id=2098905487, csrf_token="da2216e4b4ca4efcfab94d8d4920ef9"):
data = {
'params': self.createPlaylistParams(ids, playlist_id, "delete", csrf_token),
'encSecKey': self.__encSecKey
}
url = uapi.playlist_add_api.format(csrf_token)
req = self.session.post(
url, data=data, timeout=10
)
return req.json()
def do_login(self,username,password):
data = {
'params': self.createLoginParams(username,password),
'encSecKey': self.__encSecKey
}
url = uapi.login_api
res = self.session.post(url, data=data, timeout=10).json()
# TODO 处理rep信息
if res["code"] != 200:
if res["code"] == 400:
raise Exception("用户名不合法!")
raise Exception(res["msg"])
return res
def clear_playlist(self,playlist_id=2098905487):
m = music.Music()
data = m.curl_playlist(playlist_id)
for d in data["tracks"]:
res = self.post_playlist_delete([str(d["id"]),],playlist_id)
if res["code"] == 200:
pylog.print_info("成功删除《{}》到指定歌单,歌单目前包含歌曲 {} 首".format(d["name"],res["count"]))
else:
time.sleep(5)
pylog.print_warn("歌曲《{}》不存在于歌单中!".format(d["name"]))
pylog.print_warn("删除歌单歌曲任务完成,请检查!")
def create_playlist_comment_top100(self,playlist_id=2098905487):
data = settings.Session.query(pysql.Music163.song_name, pysql.Music163.song_id,pysql.Music163.comment.label("count")).order_by(
pysql.Music163.comment.label("count").desc()).limit(200).all()
for d in data:
res = self.post_playlist_add([str(d[1]),],playlist_id)
if res["code"] == 502:
pylog.print_warn("歌曲《{}》已经存在于歌单中!".format(d[0]))
elif res["code"] == 200:
pylog.print_info("成功添加《{}》到指定歌单,歌单目前包含歌曲 {} 首".format(d[0],res["count"]))
else:
time.sleep(5)
pylog.print_warn("歌曲《{}》没有添加成功!".format(d[0]))
pylog.print_warn("增加歌单歌曲任务完成,请检查!")
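
The `rsaEncrypt` method of `Command` reverses the key text, reads its bytes as one large hexadecimal integer, raises it to the public exponent modulo the modulus, and zero-pads the hexadecimal result. The sketch below restates that with toy numbers. It assumes `tools.hex` simply hex-encodes the string (that helper is not shown here), and `rsa_encrypt_hex` is a hypothetical name, so treat it as an illustration rather than the project's implementation.

```python
import binascii


def rsa_encrypt_hex(text, pub_key_hex, modulus_hex, width=256):
    """Unpadded RSA over a reversed string; keys and result are hex strings."""
    reversed_text = text[::-1]
    # Interpret the reversed text's bytes as one big-endian integer.
    message = int(binascii.hexlify(reversed_text.encode("utf-8")), 16)
    cipher = pow(message, int(pub_key_hex, 16), int(modulus_hex, 16))
    return format(cipher, "x").zfill(width)


# Toy parameters for illustration only: n = 61 * 53 = 3233 (0xca1), e = 17 (0x11).
toy = rsa_encrypt_hex("A", "11", "ca1", width=4)
print(toy)  # same value as format(65 ** 17 % 3233, "x").zfill(4)
```

With a real 1024-bit modulus the three-argument `pow` matters: it keeps every intermediate value below the modulus instead of computing the full power first.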
================================================
FILE: spider163/spider/comment.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import requests
import datetime
from bs4 import BeautifulSoup
from terminaltables import AsciiTable
from spider163 import settings
from spider163.utils import pysql
from spider163.utils import pylog
from spider163.utils import tools
from spider163.utils import encrypt
from spider163.spider import public as uapi
class Comment:
Common = 'common music'
Official = 'official music'
def __init__(self, music_type=Common):
self.__headers = uapi.header
self.music_type = music_type
self.session = settings.Session()
modulus = uapi.comment_module
pubKey = uapi.pubKey
secKey = uapi.secKey
self.__encSecKey = self.rsaEncrypt(secKey, pubKey, modulus)
    def createParams(self, page=1):
        if page == 1:
            text = (
                '{rid:"", offset:"0", total:"true", limit:"20", csrf_token:""}'
            )
        else:
            offset = str((page-1)*20)
            # Literal braces must be doubled; otherwise str.format() parses
            # "{rid:...}" as a replacement field and raises an exception.
            text = (
                '{{rid:"", offset:"{}", total:"{}", limit:"20", '
                'csrf_token:""}}'.format(offset, 'false')
            )
        nonce = '0CoJUm6Qyw8W8jud'
        nonce2 = 16 * 'F'
        encText = encrypt.aes(
            encrypt.aes(text, nonce).decode("utf-8"), nonce2
        )
        return encText
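
The plaintext that `createParams` encrypts is a small query object whose `offset` advances by 20 per page and whose `total` flag is only `"true"` on the first page. One way to avoid brace-escaping mistakes in `str.format` is to build the object as a dict and serialize it. This is a sketch under assumptions: `build_comment_query` is a hypothetical name, and whether the endpoint accepts strict JSON with quoted keys for this call is not confirmed here (the payloads in `authorize.py` do use quoted keys).

```python
import json


def build_comment_query(page=1, page_size=20):
    """Build the plaintext query for one page of comments (before encryption)."""
    first_page = page == 1
    payload = {
        "rid": "",
        "offset": str((page - 1) * page_size),
        # The total count is only requested on the first page.
        "total": "true" if first_page else "false",
        "limit": str(page_size),
        "csrf_token": "",
    }
    return json.dumps(payload)
```

The returned string would then go through the same two `encrypt.aes` passes as the original text.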
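    def rsaEncrypt(self, text, pubKey, modulus):
        text = text[::-1]
        # Three-argument pow() performs modular exponentiation without
        # building the huge intermediate power first.
        rs = pow(int(tools.hex(text), 16), int(pubKey, 16), int(modulus, 16))
        return format(rs, 'x').zfill(256)
    def createSecretKey(self, size):
        # Iterating bytes yields ints on Python 3, so ord() would fail there;
        # a bytearray yields ints on both Python 2 and Python 3.
        return (
            ''.join(map(lambda xx: (hex(xx)[2:]), bytearray(os.urandom(size))))
        )[0:16]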
def post(self,song_id, page):
data = {
'params': self.createParams(page),
'encSecKey': self.__encSecKey
}
url = uapi.comment_url.format(song_id)
req = requests.post(
url, headers=self.__headers, data=data, timeout=10
)
return req.json()
def views_capture(self, song_id, page=1, pages=1024):
if pages > 1:
while page < pages:
pages = self.view_capture(song_id, page)
page = page + 1
else:
self.view_capture(song_id, 1)
self.view_links(song_id)
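
`views_capture` keeps requesting pages while `page < pages`, and `pages` is refreshed from the value returned by `view_capture` (the total comment count divided by 20). A sketch of computing the page count explicitly with a ceiling, so that a trailing partial page is counted too, could look like this. The function names are hypothetical and the page size of 20 is taken from the `limit` used in `createParams`.

```python
import math


def comment_pages(total_comments, page_size=20):
    """Number of pages needed to cover `total_comments` comments (at least 1)."""
    return max(1, math.ceil(total_comments / page_size))


def page_offsets(total_comments, page_size=20):
    """Offsets to request, one per page, in order."""
    pages = comment_pages(total_comments, page_size)
    return [page * page_size for page in range(pages)]
```

For example, 45 comments need three requests, while a plain `45 / 20` comparison with `page < pages` would stop after the second page.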
def view_capture(self, song_id, page=1):
if page == 1:
self.session.query(pysql.Comment163).filter(
pysql.Comment163.song_id == song_id
).delete()
self.session.commit()
try:
data = self.post(song_id,page)
for comment in data['comments']:
if comment['likedCount'] > 30:
txt = tools.encode(comment['content'])
author = tools.encode(comment['user']['nickname'])
liked = comment['likedCount']
self.session.add(pysql.Comment163(
song_id=song_id, txt=txt, author=author, liked=liked
))
self.session.flush()
if page == 1:
for comment in data['hotComments']:
txt = tools.encode(comment['content'])
author = tools.encode(comment['user']['nickname'])
liked = comment['likedCount']
self.session.add(pysql.Comment163(
song_id=song_id, txt=txt, author=author, liked=liked
))
self.session.flush()
cnt = int(data['total'])
self.session.query(pysql.Music163).filter(
pysql.Music163.song_id == song_id
).update({'done': 'Y', 'comment': cnt, 'update_time': datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")})
if self.music_type == self.Official:
self.session.query(pysql.Toplist163).filter(
pysql.Toplist163.song_id == song_id
).update(
{'done': 'Y', 'comment': cnt})
self.session.commit()
return cnt / 20
except Exception as e:
self.session.rollback()
self.session.query(pysql.Music163).filter(
pysql.Music163.song_id == song_id
).update({'done': 'E', 'comment': -2})
self.session.commit()
pylog.log.error(
"解析歌曲评论的时候出现问题:{} 歌曲ID:{} 页码:{}".format(
e, song_id, page
)
)
raise
def view_links(self, song_id):
url = "http://music.163.com/song?id=" + str(song_id)
data = {'id': str(song_id)}
headers = {
'Cookie': 'MUSIC_U=e45797021db3403ab9fffb11c0f70a7994f71177b26efb5169b46948f2f9a60073d23a2665346106c9295f8f6dbb6c7731b299d667364ed3;' # noqa
}
try:
req = requests.get(url, headers=headers, data=data, timeout=100)
sup = BeautifulSoup(req.content, "html.parser")
for link in sup.find_all('li', class_="f-cb"):
html = link.find('a', 's-fc1')
if html is not None:
title = tools.encode(html.get('title'))
song_id = html.get('href')[9:]
author = tools.encode(link.find(
'div', 'f-thide s-fc4'
).find('span').get('title'))
if pysql.single("music163", "song_id", song_id) is True:
self.session.add(pysql.Music163(
song_id=song_id, song_name=title, author=author
))
self.session.flush()
for link in sup.find_all('a', 'sname f-fs1 s-fc0'):
play_link = link.get("href").replace("/playlist?id=", "")
play_name = tools.encode(link.get("title"))
if pysql.single("playlist163", "link", play_link) is True:
self.session.add(pysql.Playlist163(
title=play_name, link=play_link, cnt=-1,
dsc="来源:热评"
))
self.session.flush()
except Exception as e:
pylog.log.error("解析页面推荐时出现问题:{} 歌曲ID:{}".format(e, song_id))
    def auto_view(self, count=1):
        song = []
        if self.music_type == self.Common:
            msc = self.session.query(pysql.Music163).filter(pysql.Music163.done == "N").order_by(pysql.Music163.id).limit(count)
            for m in msc:
                try:
                    print("抓取热评 ID {} 歌曲 {}".format(m.song_id, pylog.Blue(tools.encode(m.song_name))))
                    self.views_capture(m.song_id, 1, 1)
                    song.append({"name": m.song_name, "author": m.author, "song_id": m.song_id})
                except Exception as e:
                    pylog.log.error("自动抓取热评出现异常:{} 歌曲ID:{}".format(e, m.song_id))
        elif self.music_type == self.Official:
            msc = self.session.query(pysql.Toplist163).filter(pysql.Toplist163.done == "N").order_by(pysql.Toplist163.id).limit(count)
            for m in msc:
                try:
                    print("抓取官方榜单歌曲热评 ID {} 歌曲 {}".format(m.song_id, pylog.Blue(tools.encode(m.song_name))))
                    self.views_capture(m.song_id, 1, 2)  # i.e. capture the comments on every page
                    song.append({"name": m.song_name, "author": m.author, "song_id": m.song_id})
                except Exception as e:
                    pylog.log.error("自动抓取官方榜单热评出现异常:{} 歌曲ID:{}".format(e, m.song_id))
        return song

    def get_music(self, music_id):
        self.view_capture(int(music_id), 1)
        url = uapi.music_api.format(music_id, music_id)
        data = tools.curl(url, self.__headers)
        music = data['songs']
        print("《" + tools.encode(music[0]['name']) + "》")
        author = []
        for a in music[0]['artists']:
            author.append(tools.encode(a['name']))
        album = str(tools.encode(music[0]['album']['name']))
        print("演唱:{} 专辑:{}".format(",".join(author), album))
        comments = self.session.query(pysql.Comment163).filter(
            pysql.Comment163.song_id == int(music_id)
        )
        tb = AsciiTable([["序号", "作者", "评论", "点赞"]])
        max_width = tb.column_max_width(2) - tb.column_max_width(2) % 3
        cnt = 0
        try:
            for cmt in comments:
                cnt = cnt + 1
                au = tools.encode(cmt.author)
                txt = ""
                length = 0
                # Wrap the comment text so it fits the table column
                for u in cmt.txt:
                    txt = txt + u
                    if ord(u) < 128:
                        length = length + 3
                    else:
                        length = length + 1
                    # ">=" rather than "==": mixed increments of 1 and 3 can
                    # step over max_width, which would skip the line break
                    if length >= max_width:
                        txt = txt + "\n"
                        length = 0
                liked = str(cmt.liked)
                tb.table_data.append([str(cnt), str(au), str(txt), liked])
            print(tb.table)
        except UnicodeEncodeError:
            pylog.log.info("获取歌曲详情编码存在问题,转为非表格形式,歌曲ID:{}".format(music_id))
            for cmt in comments:
                print("评论: {}".format(tools.encode(cmt.txt)))
                print(
                    "作者: {} 点赞: {}".format(
                        tools.encode(cmt.author), str(cmt.liked)
                    )
                )
                print("")
        except Exception as e:
            pylog.print_warn("获取歌曲时出现异常: {} 歌曲ID:{}".format(e, music_id))

"""
curl 'http://music.163.com/eapi/v1/resource/hotcomments/R_SO_4_439915614?limit=30&offset=30' -H 'MUSIC_U=b14e134d57809f6f2cad59071320962f70351b98b979328186bab129f64585d877f086e4dccc2d68d4631490c2eade1fcb19b68a33677785; versioncode=114; mobilename=SM901; buildver=1517983086; resolution=1920x1080; __csrf=33a68d2b8c79270a8b770ef851ca322b; channel=chuizi; os=android' -H 'Connection: keep-alive' --data 'params=E8C4EA3B185998031030633EE8255315B179427FC8206489FBB24BB0592665FDDD3729945E06958F8E1D7E9D3B8336C82A051CA692ED4EAD270699F0CCFA87BE252577E9DBA7D4ACE1ECAFAB78C190513D439E46D2E62F125C771C5A05EBF5B7F8A9783A2721EE3894DFFE3AAF6751B7A7C412947A0C49CC73F7DBE0D285B45F97A16013F7B4576F2CD2D611150B0ABFF40C8FCE075ED7ED25BE61CCA9154A4F1CB23BF9C720A7BE0A952F25EC77E746B1688AE3FCDE73BC19600468DB7D9175013144D6D759C1660A471A66B8C42B171A2BB3AA48BA8638978B7299A10F08A472D1E13D071136C670A3E748E7DFD5F0E6819E725D793FB2D2BB6852002D1E30A850F90D7F6556C50394E83D4F3FCF79C9721E766D8758399F17538CA1DF87E32DF3468FC6EB592EF5AE7F0E5D295184AEC16C1019FD6F54B41AE835D1967CA7F7E892A6059B95EBACF785D1512402C13A3C8A491970030A1F8E97B35DEDEECFF34BA27F5869047DB5FAABBFEFDE833E3B7E8C7B15C6B1F0764A1CD298039BF6BC7C38832C5B8B4644714C25F4CE1F256AC2456B9D315941CF3CBF69224CF3F0DB7D4BE81486C72562C024C6EB3897D0DED5740A345CAE3592482BD36208DA99F197119A497DC736E58ABF7C80A338EA64059455FD065C61D46499586DDD6A4BEBAF431C2839D49EE192CAA3165B3B6B116FF45760DD0C94FCC5ED5E6E0B990662EC900671ED89AEEB6B7A2F73B7008FC711CB44F9EE23F53415A6C39DF781D13A11B9BEBC87156F67DEC8E6D023394953735006FD471F3A7885B57C0F826CBB3CD4F286BC407FDBA5B4D83ED8CAE4BF17E7F07C2DC3FD072A21727B2FDECB551EB05364287AB201904518E10007EED6' --compressed
"""
================================================
FILE: spider163/spider/lyric.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from spider163.spider import public as uapi
from spider163 import settings
from spider163.utils import pysql
from spider163.utils import pylog
from spider163.utils import tools
class Lyric:

    def __init__(self):
        self.__headers = uapi.header
        self.session = settings.Session()

    def view_lyric(self, song_id):
        url = uapi.lyric_url.format(str(song_id))
        try:
            data = tools.curl(url, self.__headers)
            lrc = data['lrc']['lyric']
            if pysql.single("lyric163", "song_id", song_id):
                self.session.add(pysql.Lyric163(song_id=song_id, txt=lrc))
            self.session.query(pysql.Music163).filter(pysql.Music163.song_id == song_id).update({"has_lyric": "Y"})
            self.session.commit()
        except Exception as e:
            # Discard the failed transaction before recording the error state
            self.session.rollback()
            self.session.query(pysql.Music163).filter(pysql.Music163.song_id == song_id).update({"has_lyric": "E"})
            self.session.commit()
            pylog.log.error("抓取歌词出现问题:{} 歌曲ID:{}".format(e, song_id))
            # raise

    def get_lyric(self, song_id):
        self.view_lyric(song_id)
        lrc = self.session.query(pysql.Lyric163).filter(pysql.Lyric163.song_id == song_id)
        print(lrc[0].txt)

    def view_lyrics(self, count):
        song = []
        # Work in batches of 10, then handle the remainder
        for i in range(int(count / 10)):
            ms = self.session.query(pysql.Music163).filter(pysql.Music163.has_lyric == "N").order_by(pysql.Music163.id).limit(10)
            for m in ms:
                print("正在抓取歌词 ID {} 歌曲 {}".format(m.song_id, pylog.Blue(tools.encode(m.song_name))))
                self.view_lyric(m.song_id)
                song.append({"name": m.song_name, "author": m.author, "comment": m.comment})
        ms = self.session.query(pysql.Music163).filter(pysql.Music163.has_lyric == "N").order_by(pysql.Music163.id).limit(count % 10)
        for m in ms:
            print("正在抓取歌词 ID {} 歌曲 {}".format(m.song_id, pylog.Blue(tools.encode(m.song_name))))
            self.view_lyric(m.song_id)
            song.append({"name": m.song_name, "author": m.author, "comment": m.comment})
        return song
================================================
FILE: spider163/spider/mp3.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import requests
from terminaltables import AsciiTable
from spider163 import settings
from spider163.utils import pylog
from spider163.utils import tools
from spider163.utils import encrypt
from spider163.spider import public as uapi
class MP3:

    def __init__(self):
        self.__headers = uapi.header
        self.session = settings.Session()
        modulus = uapi.comment_module
        pubKey = uapi.pubKey
        secKey = uapi.secKey
        self.__encSecKey = self.rsa_encrypt(secKey, pubKey, modulus)

    def create_params(self, song_id):
        text = '{"ids":[' + str(song_id) + '], br:"320000",csrf_token:"csrf"}'
        nonce = '0CoJUm6Qyw8W8jud'
        nonce2 = 16 * 'F'
        # Two rounds of AES: first with the fixed nonce, then with the secret key
        encText = encrypt.aes(
            encrypt.aes(text, nonce).decode("utf-8"), nonce2
        )
        return encText

    def rsa_encrypt(self, text, pubKey, modulus):
        text = text[::-1]
        # Three-argument pow() does modular exponentiation without
        # building the full power first; the result is unchanged
        rs = pow(int(tools.hex(text), 16), int(pubKey, 16), int(modulus, 16))
        return format(rs, 'x').zfill(256)

    def create_secretKey(self, size):
        # bytearray() yields integers on both Python 2 and Python 3,
        # whereas ord() on an iterated bytes object fails on Python 3
        return (
            ''.join(format(b, 'x') for b in bytearray(os.urandom(size)))
        )[0:16]

    def view_down(self, playlist_id, path="."):
        playlist = self.get_playlist(str(playlist_id))
        msg = {"success": 0, "failed": 0, "failed_list": []}
        for music in playlist['tracks']:
            pylog.print_info(
                "正在下载歌曲 {}-{}.mp3".format(
                    tools.encode(music['name']),
                    tools.encode(music['artists'][0]['name'])
                )
            )
            link = self.get_mp3_link(music["id"])
            if link is None:
                msg["failed"] = msg["failed"] + 1
                msg["failed_list"].append(music)
                continue
            r = requests.get(link)
            with open("{}/{}-{}{}".format(
                path,
                tools.encode(music['name']).replace("/", "-"),
                tools.encode(music['artists'][0]['name']).replace("/", "-"),
                ".mp3"
            ), "wb") as code:
                code.write(r.content)
                msg["success"] = msg["success"] + 1
        pylog.print_warn(
            "下载成功:{} 首,下载失败:{}首".format(msg["success"], msg["failed"])
        )
        tb = [["歌曲名字", "艺术家", "ID"]]
        for music in msg["failed_list"]:
            n = music['name'].encode("utf-8")
            a = music['artists'][0]['name'].encode("utf-8")
            i = music['id']
            tb.append([n, a, i])
        print(AsciiTable(tb).table)

    def get_playlist(self, playlist_id):
        url = uapi.playlist_api.format(playlist_id)
        # Errors propagate to the caller unchanged
        data = tools.curl(url, self.__headers)
        return data['result']

    def get_mp3_link(self, song_id):
        data = {
            'params': self.create_params(song_id),
            'encSecKey': self.__encSecKey
        }
        url = uapi.mp3_url
        req = requests.post(
            url, headers=self.__headers, data=data, timeout=10
        ).json()
        if req['code'] == 200:
            return req['data'][0]['url']
        # No download link available; view_down counts this as a failure
        return None
================================================
FILE: spider163/spider/music.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import datetime
from spider163.spider import public as uapi
from spider163 import settings
from spider163.utils import pysql
from spider163.utils import pylog
from spider163.utils import tools
from terminaltables import AsciiTable
class Music:

    def __init__(self):
        self.__headers = uapi.header
        self.__url = uapi.music_url
        self.session = settings.Session()

    def views_capture(self, source=None):
        playlist = {}
        if source is None:
            urls = self.session.query(pysql.Playlist163).filter(pysql.Playlist163.done == 'N').order_by(pysql.Playlist163.id).limit(10)
        else:
            # Playlist.view_capture stores dsc as "曲风:<style>" with an ASCII
            # colon, so the same prefix is used here for the filter to match
            if not source.startswith("曲风:"):
                source = "曲风:" + source
            urls = self.session.query(pysql.Playlist163).filter(pysql.Playlist163.done == 'N', pysql.Playlist163.dsc == source).order_by(pysql.Playlist163.id).limit(1)
        for url in urls:
            print("正在抓取歌单《{}》的歌曲……".format(tools.encode(url.title)))
            songs = self.view_capture(url.link)
            playlist[tools.encode(url.title)] = songs
        return playlist

    def view_capture(self, link):
        url = self.__url + str(link)
        songs = []
        try:
            data = self.curl_playlist(link)
            musics = data['tracks']
            exist = 0
            for music in musics:
                name = tools.encode(music['name'])
                authors = []
                for art in music['artists']:
                    authors.append(tools.encode(art['name']))
                if music["bMusic"] is None:
                    play_time = 0
                else:
                    play_time = music["bMusic"]["playTime"]
                if pysql.single("music163", "song_id", (music['id'])) is True:
                    self.session.add(pysql.Music163(song_id=music['id'], song_name=name, author=",".join(authors), playTime=play_time))
                    self.session.commit()
                    exist = exist + 1
                    songs.append({"name": name, "author": ",".join(authors)})
                else:
                    pylog.log.info('{} : {} {}'.format("重复抓取歌曲", name, "取消持久化"))
                # Handle the official charts
                if int(link) in uapi.top.keys():
                    updateTime = datetime.datetime.fromtimestamp(data['updateTime'] / 1000).strftime(
                        "%Y-%m-%d %H:%M:%S")
                    createTime = datetime.datetime.fromtimestamp(data['createTime'] / 1000).strftime(
                        "%Y-%m-%d %H:%M:%S")
                    position = music['position']
                    lastrank = 100000000
                    with tools.ignored(Exception):
                        lastrank = music['lastRank']
                    cnt = self.session.query(pysql.Toplist163).filter(pysql.Toplist163.update_time == updateTime,
                                                                      pysql.Toplist163.song_id == music['id'],
                                                                      pysql.Toplist163.playlist_id == link).count()
                    mcnt = self.session.query(pysql.Toplist163).filter(pysql.Toplist163.mailed == "Y",
                                                                       pysql.Toplist163.song_id == music['id'],
                                                                       pysql.Toplist163.playlist_id == link).count()
                    if cnt == 0:
                        mailed = "N"
                        if mcnt > 0:
                            mailed = "Y"
                        self.session.add(pysql.Toplist163(song_id=music['id'], song_name=name, author=",".join(authors),
                                                          playTime=play_time, position=position, playlist_id=link,
                                                          lastRank=lastrank,
                                                          mailed=mailed,
                                                          create_time=createTime,
                                                          update_time=updateTime))
                        self.session.commit()
            print("歌单包含歌曲 {} 首,数据库 merge 歌曲 {} 首 \r\n".format(len(musics), exist))
            self.session.query(pysql.Playlist163).filter(pysql.Playlist163.link == link).update({'done': 'Y', 'update_time': datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")})
            self.session.commit()
            return songs
        except Exception as e:
            pylog.log.error("抓取歌单页面存在问题:{} 歌单ID:{}".format(e, url))
            # Discard any half-finished transaction before recording the error
            self.session.rollback()
            # Playlist163.link holds the playlist ID, so match on link rather than the full URL
            self.session.query(pysql.Playlist163).filter(pysql.Playlist163.link == link).update({'done': 'E', 'update_time': datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")})
            self.session.commit()

    def curl_playlist(self, playlist_id):
        url = uapi.playlist_api.format(playlist_id)
        try:
            data = tools.curl(url, self.__headers)
            playlist = data['result']
            self.session.query(pysql.Playlist163).\
                filter(pysql.Playlist163.link == playlist_id).\
                update({"playCount": playlist["playCount"],
                        "shareCount": playlist["shareCount"],
                        "commentCount": playlist["commentCount"],
                        "description": playlist["description"],
                        "tags": ",".join(playlist["tags"])})
            return playlist
        except Exception as e:
            pylog.Log("抓取歌单页面存在问题:{} 歌单ID:{}".format(e, playlist_id))
            # pylog.print_warn("抓取歌单页面存在问题:{} 歌单ID:{}".format(e, playlist_id))
            self.session.query(pysql.Playlist163).filter(pysql.Playlist163.link == playlist_id).update({'done': 'E', 'update_time': datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")})
            self.session.commit()

    def get_playlist(self, playlist_id):
        self.view_capture(int(playlist_id))
        playlist = self.curl_playlist(playlist_id)
        print("《" + tools.encode(playlist['name']) + "》")
        author = tools.encode(playlist['creator']['nickname'])
        pc = str(playlist['playCount'])
        sc = str(playlist['subscribedCount'])
        rc = str(playlist['shareCount'])
        cc = str(playlist['commentCount'])
        with tools.ignored(Exception):
            print("维护者:{} 播放:{} 关注:{} 分享:{} 评论:{}".format(author, pc, sc, rc, cc))
            print("描述:{}".format(tools.encode(playlist['description'])))
            print("标签:{}".format(",".join(tools.encode(playlist['tags']))))
        tb = [["ID", "歌曲名字", "艺术家", "唱片"]]
        for music in playlist['tracks']:
            artists = []
            for s in music['artists']:
                artists.append(s['name'])
            ms = tools.encode(music['name'])
            ar = tools.encode(",".join(artists))
            ab = tools.encode(music['album']['name'])
            id = music['id']
            tb.append([id, ms, ar, ab])
        print(AsciiTable(tb).table)

    # kwargs["date"]: offset in days relative to now (a negative value points into the past)
    def create_update_strategy(self, **kwargs):
        date = (datetime.datetime.now() + datetime.timedelta(days=kwargs["date"])).strftime("%Y-%m-%d %H:%M:%S")
        self.session.query(pysql.Music163).filter(pysql.Music163.done == "Y", pysql.Music163.update_time > date).update({"done": "N", "update_time": datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")})
        self.session.commit()
        pylog.print_info("完成 重置时间 {} 之后的歌曲,可重新抓取评论".format(date))
================================================
FILE: spider163/spider/playlist.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import datetime
from terminaltables import AsciiTable
from spider163.utils import pysql
from spider163.utils import pylog
from spider163.utils import tools,const
from spider163.spider import public as uapi
from spider163 import settings
class Playlist:

    __play_url = None
    __headers = None

    def __init__(self):
        self.__headers = uapi.header
        self.__play_url = uapi.play_url
        self.session = settings.Session()

    def get_classify(self):
        table = [["类别", "风格列表"]]
        for k, v in uapi.classify.items():
            c = 0
            lst = ""
            # Break the style list into rows of five entries
            for style in v:
                c = c + 1
                if c % 5 == 0:
                    lst = lst + style + "\n"
                else:
                    lst = lst + style + ","
            table.append([k, lst])
        print(AsciiTable(table).table)

    def view_capture(self, page, type="全部"):
        play_url = self.__play_url.format(type, page * 35)
        titles = []
        try:
            acmsk = {'class': 'msk'}
            scnb = {'class': 'nb'}
            dcu = {'class': 'u-cover u-cover-1'}
            ucm = {'class': 'm-cvrlst f-cb'}
            data = tools.curl(play_url, self.__headers, type=const.RETURE_HTML)
            lst = data.find('ul', ucm)
            for play in lst.find_all('div', dcu):
                title = tools.encode(play.find('a', acmsk)['title'])
                link = tools.encode(play.find('a', acmsk)['href']).replace("/playlist?id=", "")
                cnt = tools.encode(play.find('span', scnb).text).replace('万', '0000')
                if pysql.single("playlist163", "link", link) is True:
                    pl = pysql.Playlist163(title=title, link=link, cnt=int(cnt), dsc="曲风:{}".format(type))
                    self.session.add(pl)
                    self.session.commit()
                    titles.append(title)
            return titles
        except Exception as e:
            pylog.log.error("抓取歌单出现问题:{} 歌单类型:{} 页码:{}".format(e, type, page))
            raise

    # kwargs["date"]: offset in days relative to now (a negative value points into the past)
    def create_update_strategy(self, **kwargs):
        date = (datetime.datetime.now() + datetime.timedelta(days=kwargs["date"])).strftime("%Y-%m-%d %H:%M:%S")
        self.session.query(pysql.Playlist163).filter(pysql.Playlist163.done == "Y",
                                                     pysql.Playlist163.update_time > date).update(
            {"done": "N", "update_time": datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")})
        self.session.commit()
        pylog.print_info("完成 重置时间 {} 之后的歌单,可重新抓取歌曲".format(date))
================================================
FILE: spider163/spider/public.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
header = {
'Referer': 'http://music.163.com/',
'Host': 'music.163.com',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
}
comment_text = {
'username': '13393376853',
'password': 'wangyidafahao',
'rememberLogin': 'true'
}
comment_module = '00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7'
pubKey = '010001'
secKey = 16 * 'F'
comment_url = "http://music.163.com/weapi/v1/resource/comments/R_SO_4_{}/?csrf_token="
lyric_url = "http://music.163.com/api/song/lyric?os=pc&id={}&lv=-1&kv=-1&tv=-1"
play_url = "http://music.163.com/discover/playlist/?order=hot&cat={}&limit=35&offset={}"
music_url = "http://music.163.com/api/playlist/detail?id="
mp3_url = "http://music.163.com/weapi/song/enhance/player/url?csrf_token="
playlist_api = "http://music.163.com/api/playlist/detail?id={}&upd"
music_api = "http://music.163.com/api/song/detail/?id={}&ids=[{}]"
search_api = "http://music.163.com/api/search/pc"
playlist_add_api = "http://music.163.com/weapi/playlist/manipulate/tracks?csrf_token={}"
login_api = "http://music.163.com/weapi/login/cellphone"
classify = {
"语种":["华语", "欧美", "日语","韩语", "粤语", "小语种", ],
"风格":["流行", "摇滚", "民谣", "电子", "舞曲", "说唱", "轻音乐", "爵士", "乡村", "R&B/Soul", "古典", "民族", "英伦", "金属", "朋克", "蓝调", "雷鬼", "世界音乐", "拉丁", "另类/独立", "New Age", "古风", "后摇", "Bossa Nova"],
"场景":["清晨", "夜晚", "学习", "工作", "午休", "下午茶", "地铁", "驾车", "运动", "旅行", "散步", "酒吧"],
"情感":["怀旧", "清新", "浪漫", "性感", "伤感", "治愈", "放松", "孤独", "感动", "兴奋", "快乐", "安静", "思念"],
"主题":["影视原声", "ACG", "校园", "游戏", "70后", "80后", "90后", "网络歌曲", "KTV", "经典", "翻唱", "吉他", "钢琴", "器乐", "儿童", "榜单", "00后"]
}
top = {
19723756: '云音乐飙升榜',
3779629: '云音乐新歌榜',
2884035: '网易原创歌曲榜',
3778678: '云音乐热歌榜',
1978921795: '云音乐电音榜',
991319590: '云音乐嘻哈榜',
71385702: '云音乐ACG音乐榜',
10520166: '云音乐新电力榜',
}
================================================
FILE: spider163/spider/read.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from docx import Document
from xlwt import Workbook
from docx.enum.dml import MSO_THEME_COLOR_INDEX
from sqlalchemy import desc
from spider163 import settings
from spider163.spider import public as uapi
from spider163.utils import tools
from spider163.utils import pylog
from spider163.utils import pysql
from spider163.spider import comment
def read_playlist_json(id):
    url = uapi.playlist_api.format(id)
    data = tools.curl(url, uapi.header)
    return data


def read_music_data(id):
    # Unfinished: only builds the URL; nothing is requested or returned
    url = uapi.mp3_url.format(id)


def read_comment_data(id):
    cmt = comment.Comment()
    return cmt.post(id, 1)


def read_lyric_data(id):
    url = uapi.lyric_url.format(id)
    data = tools.curl(url, uapi.header)
    return data


def print_pdf(id):
    # Despite the name, this writes a .docx document
    data = read_playlist_json(id)
    if data["code"] != 200:
        pylog.print_warn("歌单信息拉取失败!")
        return
    document = Document()
    try:
        document.add_heading(data["result"]["name"], 0)
        document.add_paragraph(" ".join(data["result"]["tags"]))
        document.add_paragraph(data["result"]["description"])
        for m in data["result"]["tracks"]:
            document.add_paragraph().add_run(m["name"]).font.color.theme_color = MSO_THEME_COLOR_INDEX.ACCENT_2
            lyric = read_lyric_data(m["id"])
            document.add_paragraph().add_run(lyric["lrc"]["lyric"]).font.color.theme_color = MSO_THEME_COLOR_INDEX.ACCENT_3
            comments = read_comment_data(m["id"])
            for c in comments["hotComments"]:
                document.add_paragraph().add_run(c["user"]["nickname"]).style = 'Emphasis'
                document.add_paragraph(c["content"])
    except Exception as e:
        # print_warn concatenates its argument with a colour code, so pass a string
        pylog.print_warn(str(e))
    document.save("{}.docx".format(data["result"]["name"]))
    pylog.print_info("文档 {}.docx 已经生成!".format(data["result"]["name"]))


def print_comment(count):
    session = settings.Session()
    comments = session.query(pysql.Comment163).order_by(
        desc(pysql.Comment163.liked)).limit(count)
    document = Document()
    workbook = Workbook()
    try:
        document.add_heading("TOP {} 评论".format(count), 0)
        sheet = workbook.add_sheet("TOP {} 评论".format(count))
        i = 0
        sheet.write(i, 0, "歌曲名字")
        sheet.write(i, 1, "评论作者")
        sheet.write(i, 2, "评论内容")
        sheet.write(i, 3, "点赞数量")
        sheet.write(i, 4, "歌曲链接")
        for c in comments:
            i = i + 1
            # Fetch the song row once instead of re-running the query on every song[0] access
            song = session.query(pysql.Music163).filter(pysql.Music163.song_id == c.song_id).first()
            pylog.print_info("正在填充第 {} 条评论,歌曲:{}".format(i, song.song_name))
            document.add_paragraph().add_run(
                "作者:{}".format(c.author)).font.color.theme_color = MSO_THEME_COLOR_INDEX.ACCENT_2
            document.add_paragraph().add_run(
                "内容:{}".format(c.txt)).font.color.theme_color = MSO_THEME_COLOR_INDEX.ACCENT_2
            document.add_paragraph().add_run(
                "歌曲:《{}》 链接:http://music.163.com/#/song?id={}".format(song.song_name, c.song_id)).font.color.theme_color = MSO_THEME_COLOR_INDEX.ACCENT_2
            document.add_paragraph().add_run(
                "赞同:{}".format(c.liked)).font.color.theme_color = MSO_THEME_COLOR_INDEX.ACCENT_2
            document.add_paragraph("")
            sheet.write(i, 0, song.song_name)
            sheet.write(i, 1, c.author)
            sheet.write(i, 2, c.txt)
            sheet.write(i, 3, c.liked)
            sheet.write(i, 4, "http://music.163.com/#/song?id={}".format(c.song_id))
    except Exception as e:
        # print_warn concatenates its argument with a colour code, so pass a string
        pylog.print_warn(str(e))
    document.save("TOP {} 评论.docx".format(count))
    pylog.print_warn("\n完成文档 TOP {} 评论.docx 的生成!\n".format(count))
    workbook.save("TOP {} 评论.xls".format(count))
    pylog.print_warn("\n完成文档 TOP {} 评论.xls 的生成!\n".format(count))
================================================
FILE: spider163/spider/search.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import requests
from terminaltables import AsciiTable
from spider163.utils import pylog
from spider163.utils import tools
from spider163.spider import public as uapi
offset = 0
limit = 20
type = {1: "歌曲", 10: "专辑", 100: "歌手", 1000: "歌单"}


def searchSong(key):
    url = uapi.search_api
    data = {'s': key, 'offset': 0, 'limit': 20, 'type': "1"}
    req = requests.post(url, headers=uapi.header, data=data, timeout=10)
    if req.json()["result"]['songCount'] == 0:
        pylog.log.warn("关键词 {} 没有可搜索歌曲".format(key))
        return
    songs = req.json()["result"]['songs']
    song_table = AsciiTable([["ID", "歌曲", "专辑", "演唱"]])
    for item in songs:
        id = item['id']
        name = tools.encode(item['name'])
        album = tools.encode(item['album']['name'])
        artist = []
        for a in item['artists']:
            artist.append(tools.encode(a['name']))
        song_table.table_data.append([str(id), name, album, ",".join(artist)])
    print(pylog.Blue("与 \"{}\" 有关的歌曲".format(key)))
    print(song_table.table)


def searchAlbum(key):
    url = uapi.search_api
    data = {'s': key, 'offset': 0, 'limit': 20, 'type': "10"}
    req = requests.post(url, headers=uapi.header, data=data, timeout=10)
    if req.json()["result"]['albumCount'] == 0:
        pylog.log.warn("关键词 {} 没有可搜索专辑".format(key))
        return
    albums = req.json()["result"]['albums']
    song_table = AsciiTable([["ID", "专辑", "演唱", "发行方"]])
    for item in albums:
        id = item['id']
        name = tools.encode(item['name'])
        company = ""
        if item['company'] is not None:
            company = tools.encode(item['company'])
        artist = []
        for a in item['artists']:
            artist.append(tools.encode(a['name']))
        song_table.table_data.append([str(id), name, ",".join(artist), company])
    print(pylog.Blue("与 \"{}\" 有关的专辑".format(key)))
    print(song_table.table)


def searchSinger(key):
    url = uapi.search_api
    data = {'s': key, 'offset': 0, 'limit': 10, 'type': "100"}
    req = requests.post(url, headers=uapi.header, data=data, timeout=10)
    if req.json()["result"]['artistCount'] == 0:
        pylog.log.warn("关键词 {} 没有可搜索艺术家".format(key))
        return
    artists = req.json()["result"]['artists']
    song_table = AsciiTable([["ID", "姓名", "专辑数量", "MV数量"]])
    for item in artists:
        id = str(item['id'])
        name = tools.encode(item['name'])
        acount = str(item['albumSize'])
        mcount = str(item['mvSize'])
        song_table.table_data.append([id, name, acount, mcount])
    print(pylog.Blue("与 \"{}\" 有关的歌手".format(key)))
    print(song_table.table)


def searchPlaylist(key):
    url = uapi.search_api
    data = {'s': key, 'offset': 0, 'limit': 5, 'type': "1000"}
    req = requests.post(url, headers=uapi.header, data=data, timeout=10)
    if req.json()["result"]['playlistCount'] == 0:
        pylog.log.warn("关键词 {} 没有可搜索歌单".format(key))
        return
    playlists = req.json()["result"]['playlists']
    song_table = AsciiTable([["ID", "歌单", "维护者", "播放数量", "收藏数量"]])
    for item in playlists:
        id = str(item['id'])
        name = tools.encode(item['name'])
        creator = tools.encode(item['creator']['nickname'])
        pcount = str(item['playCount'])
        bcount = str(item['bookCount'])
        song_table.table_data.append([id, name, creator, pcount, bcount])
    print(pylog.Blue("与 \"{}\" 有关的歌单".format(key)))
    print(song_table.table)
================================================
FILE: spider163/template/spider163.conf
================================================
# Remove ".default" from this file's name and save it in the working directory (default: ~/spider163/)
# Adjust the settings below to match your local environment
[core]
db=mysql://root:password@127.0.0.1/database?charset=utf8mb4
port=1630
[mail]
host=localhost
port=25
users=792400644@qq.com,
================================================
FILE: spider163/utils/__init__.py
================================================
# -*- coding: utf-8 -*-
================================================
FILE: spider163/utils/config.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import re

from spider163 import version

if version.PYTHON3 is True:
    import configparser as ConfigParser
else:
    import ConfigParser

# expanduser("~") resolves to $HOME when it is set and still works when it is not
PATH = os.path.expanduser("~") + "/spider163"
if os.environ.get("SPIDER163_PATH") is not None:
    PATH = os.environ.get("SPIDER163_PATH")
if not os.path.exists(PATH):
    os.makedirs(PATH)

cf = ConfigParser.ConfigParser()
if not os.path.exists(PATH + "/spider163.conf"):
    print("请在默认路径 " + PATH + " 下增加配置文件 spider163.conf 格式参照官方")
    # Fall back to the template shipped with the package
    cf.read("{}/template/spider163.conf".format(version.root_path))
else:
    cf.read(PATH + "/spider163.conf")


def get_path():
    return PATH


def get_db():
    try:
        return cf.get("core", "db")
    except Exception as e:
        print("配置文件存在问题,请在 {}/spider163.conf 中配置db=xxx选项".format(PATH))
        print("错误详情: {}".format(e))
        raise e


def get_mail():
    try:
        return cf.get("mail", "host"), cf.get("mail", "port"), cf.get("mail", "users")
    except Exception as e:
        print("配置文件存在问题,请在 {}/spider163.conf 中配置mail选项".format(PATH))
        print("错误详情: {}".format(e))
        raise e


def format_db():
    """db=mysql://root:password@127.0.0.1/spider?charset=utf8mb4"""
    link = get_db()
    # Raw string so the regex escapes are not treated as string escapes
    r = re.search(r"mysql://([^:]+):([^@]+)@((?:[0-9]{1,3}\.){3}[0-9]{1,3})/([^\?]+)\?charset=utf8mb4", link)
    if r is None:
        return r
    else:
        return {
            "link": r.group(0),
            "user": r.group(1),
            "password": r.group(2),
            "ip": r.group(3),
            "database": r.group(4)
        }


def get_mysql():
    link = get_db()
    # Database name: the text between the last "/" and the "?"
    db = re.search(r'(?<=/)[^/]+(?=\?)', link).group(0)
    # Connection URI without the database name
    uri = re.search(r'.*(?=/)', link).group(0)
    return {"db": db, "uri": uri}


def get_port():
    try:
        return int(cf.get("core", "port"))
    except Exception as e:
        print("配置文件存在问题,请在 {}/spider163.conf 中配置port=xxx选项".format(PATH))
        print("错误详情: {}".format(e))
        raise e
================================================
FILE: spider163/utils/const.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
RETURN_JSON = "return json data"
RETURE_HTML = "return html data"
================================================
FILE: spider163/utils/encrypt.py
================================================
# coding=utf-8
import base64
from cryptography.hazmat.primitives.ciphers import (
Cipher, algorithms, modes
)
from cryptography.hazmat.backends import default_backend
def aes(text, sec_key):
    backend = default_backend()
    # PKCS#7-style padding up to the 16-byte AES block size
    pad = 16 - len(text) % 16
    text = text + pad * chr(pad)
    cipher = Cipher(
        algorithms.AES(sec_key.encode('utf-8')),
        modes.CBC(b'0102030405060708'),
        backend=backend
    )
    encryptor = cipher.encryptor()
    ciphertext = encryptor.update(text.encode('utf-8')) + encryptor.finalize()
    ciphertext = base64.b64encode(ciphertext)
    return ciphertext
================================================
FILE: spider163/utils/healthz.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import json
from sqlalchemy import func
from spider163.utils import config
from spider163.utils import pylog
from spider163.utils import pysql
from spider163 import settings
def is_correct_config():
    pylog.print_info("正在检查配置路径:{}".format(config.PATH))
    if not os.path.exists(config.PATH + "/spider163.conf"):
        print(" - 配置路径下 spider163.conf {}".format(pylog.red("不存在")))
    else:
        print(" - 配置路径下 spider163.conf {}".format(pylog.green("存在")))
    pylog.print_info("正在检查配置文件 {}/spider163.conf 内容是否完整".format(config.PATH))
    try:
        config.cf.get("core", "db")
        print(" - 配置文件中 db 选项 {}".format(pylog.green("存在")))
    except Exception:
        print(" - 配置文件中 db 选项 {}".format(pylog.red("不存在")))
    try:
        config.cf.get("core", "port")
        print(" - 配置文件中 port 选项 {}".format(pylog.green("存在")))
    except Exception:
        print(" - 配置文件中 port 选项 {}".format(pylog.red("不存在")))


def is_correct_db():
    db = config.format_db()
    pylog.print_info("正在检查配置的数据库格式和可用性")
    if db is None:
        print(" - 配置文件中 db 选项 {}".format(pylog.red("不正确")))
    else:
        print(" - 账号:{} 密码:{} IP:{} 数据库:{}".format(db["user"], db["password"], db["ip"], db["database"]))
        try:
            from sqlalchemy import create_engine
            create_engine(db["link"], echo=False).execute("show databases")
            print("数据库连接验证 {}".format(pylog.green("成功")))
        except Exception as e:
            pylog.print_err("数据库连接失败,上述配置信息有问题: {}".format(e))


def can_spider():
    print("抓取验证未完成")


def expose_data():
    playlist = settings.Session.query(pysql.Playlist163).count()
    playlist_type = settings.Session.query(pysql.Playlist163, func.count(pysql.Playlist163.id)).group_by(pysql.Playlist163.id).all()
    music = settings.Session.query(pysql.Music163).count()
    comment = settings.Session.query(pysql.Comment163).count()
    lyric = settings.Session.query(pysql.Lyric163).count()
    top = settings.Session.query(pysql.Toplist163).count()
    data = {'playlist': {'count': playlist},
            'music': {'count': music},
            'comment': {'count': comment},
            'lyric': {'count': lyric},
            'top': {'count': top}
            }
    js = json.dumps(data, ensure_ascii=False, indent=2)
    print(js)
================================================
FILE: spider163/utils/mail.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import smtplib
import datetime
def send_email(host, port, subject, user, content):
    # Strip line breaks so the subject cannot inject extra headers
    illegal = ["\n", "\r"]
    for ill in illegal:
        subject = subject.replace(ill, ' ')
    headers = {
        'Content-Type': 'text/html; charset=utf-8',
        'Content-Disposition': 'inline',
        'Content-Transfer-Encoding': '8bit',
        'Subject': subject,
        'From': "chengyumeng@github.com",
        'To': user,
        'Date': datetime.datetime.now().strftime('%a, %d %b %Y %H:%M:%S %Z'),
        'X-Mailer': 'ChengYumeng',
    }
    msg = ''
    for key, value in headers.items():
        msg += "%s: %s\n" % (key, value)
    # add contents
    msg += "\n%s\n" % content
    s = smtplib.SMTP(host, port)
    print("sending %s to %s" % (subject, headers['To']))
    s.sendmail(headers['From'], headers['To'], msg.encode("utf8"))
    s.quit()
================================================
FILE: spider163/utils/pylog.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import logging
from spider163.utils import config
from logbook import FileHandler, Logger
from terminaltables import AsciiTable
from colorama import Fore
from colorama import init
path = config.get_path()
log_handler = FileHandler(filename=path + '/spider163.log')
log_handler.push_application()
log = Logger("")
init(autoreset=True)
def Log(msg):
    print_warn(msg)
    log.warn(msg)


def Table(tb):
    print(AsciiTable(tb).table)


def Blue(msg):
    return Fore.BLUE + msg


def green(msg):
    return Fore.GREEN + msg


def red(msg):
    return Fore.RED + msg


def print_err(msg):
    print(Fore.RED + msg)


def print_warn(msg):
    print(Fore.YELLOW + msg)


def print_info(msg):
    print(Fore.BLUE + msg)
================================================
FILE: spider163/utils/pysql.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import random
from sqlalchemy import Column, Integer, String, TIMESTAMP, Index
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.sql import func
from sqlalchemy.dialects.mysql import MEDIUMTEXT
from spider163 import settings
from spider163.utils import pylog
Base = declarative_base()
class Playlist163(Base):
    __tablename__ = "playlist163"
    id = Column(Integer(), primary_key=True, autoincrement=True)  # playlist ID
    title = Column(String(5000), server_default="System Title")  # playlist name
    link = Column(String(255), server_default="No Link")  # playlist link
    cnt = Column(Integer(), server_default="-1")  # number of songs
    playCount = Column(Integer(), server_default="-1")  # play count
    shareCount = Column(Integer(), server_default="-1")  # share count
    commentCount = Column(Integer(), server_default="-1")  # comment count
    description = Column(MEDIUMTEXT)
    tags = Column(String(255), server_default="")
    dsc = Column(String(255), server_default="No Description")
    create_time = Column(TIMESTAMP, server_default=func.now())
    update_time = Column(TIMESTAMP, server_default=func.now())
    done = Column(String(255), server_default="N")
    done_link = Index("done_link", done, link)
class Music163(Base):
    __tablename__ = "music163"
    id = Column(Integer(), primary_key=True, autoincrement=True)
    song_id = Column(Integer())
    song_name = Column(String(5000), server_default="No Name")
    author = Column(String(5000), server_default="No Author")
    playTime = Column(Integer(), server_default="-1")  # song play count
    done = Column(String(255), server_default="N")
    has_lyric = Column(String(255), server_default="N")
    create_time = Column(TIMESTAMP, server_default=func.now())
    update_time = Column(TIMESTAMP, server_default=func.now())
    comment = Column(Integer(), server_default="-1")
    done_id = Index("done_id", done, id)
    song_id_comment = Index("song_id_comment", song_id, comment)
class Toplist163(Base):
    __tablename__ = "top163"
    id = Column(Integer(), primary_key=True, autoincrement=True)
    song_id = Column(Integer())
    song_name = Column(String(5000), server_default="No Name")
    author = Column(String(5000), server_default="No Author")
    playTime = Column(Integer(), server_default="-1")  # song play count
    done = Column(String(255), server_default="N")
    mailed = Column(String(255), server_default="N")
    has_lyric = Column(String(255), server_default="N")
    create_time = Column(TIMESTAMP, server_default=func.now())
    update_time = Column(TIMESTAMP, server_default=func.now())
    comment = Column(Integer(), server_default="-1")
    lastRank = Column(Integer(), server_default="100000000")  # previous rank
    playlist_id = Column(Integer(), server_default="-1")  # ID of the chart playlist
    position = Column(Integer(), server_default="0")
    done_id = Index("done_id", done, id)
    song_id_comment = Index("song_id_comment", song_id, comment)
class Comment163(Base):
    __tablename__ = "comment163"
    id = Column(Integer(), primary_key=True, autoincrement=True)
    song_id = Column(Integer())
    txt = Column(MEDIUMTEXT)
    author = Column(String(5000), server_default="No Author")
    liked = Column(Integer(), server_default="0")
    create_time = Column(TIMESTAMP, server_default=func.now())
    Index("liked_song_id", liked, song_id)
    Index("song_id_liked", song_id, liked)
class Lyric163(Base):
    __tablename__ = "lyric163"
    id = Column(Integer(), primary_key=True, autoincrement=True)
    song_id = Column(Integer())
    txt = Column(MEDIUMTEXT)
    create_time = Column(TIMESTAMP, server_default=func.now())
    key_song_id = Index("song_id", song_id)
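def single(table, k, v):
    """Return True when no row in `table` has column `k` equal to `v`."""
    # Local import keeps this helper self-contained.
    from sqlalchemy import text
    # The value is bound as a parameter. `table` and `k` are identifiers and
    # cannot be bound, so callers must never pass untrusted input for them.
    sql = text('select count(*) from ' + table + ' where ' + k + ' = :v')
    cnt = settings.engine.execute(sql, v=str(v)).fetchone()
    return cnt[0] == 0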
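# A minimal sketch, not part of the original module: single() above checks
# whether a value is absent from a table. The same check is shown here against
# an in-memory SQLite database, with the value passed as a bound parameter and
# the identifiers validated against a whitelist. ALLOWED, is_absent and the
# two-column demo table are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical whitelist; a real one would list the actual schema.
ALLOWED = {"music163": {"song_id", "song_name"}}


def is_absent(conn, table, column, value):
    """Return True when no row in `table` has `column` equal to `value`."""
    # Identifiers cannot be bound as parameters, so validate them explicitly.
    if table not in ALLOWED or column not in ALLOWED[table]:
        raise ValueError("unexpected table or column")
    sql = "SELECT COUNT(*) FROM {} WHERE {} = ?".format(table, column)
    # The value travels as a bound parameter, never as part of the SQL text.
    count = conn.execute(sql, (value,)).fetchone()[0]
    return count == 0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE music163 (song_id INTEGER, song_name TEXT)")
conn.execute("INSERT INTO music163 VALUES (?, ?)", (1, "demo"))
```

# With this setup, is_absent(conn, "music163", "song_id", 2) is True, and a
# value containing quotes is compared literally instead of altering the query.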
def stat_playlist():
    data = {}
    data["gdType"] = settings.Session.query(func.substring(Playlist163.dsc, 4, 2).label('type'), func.count('*').label('count')).group_by("type").all()
    data["gdOver"] = settings.Session.query(Playlist163.done.label('over'), func.count('*').label('count')).group_by("over").all()
    return data
def stat_music():
    data = {"author-comment-count": []}
    # Top 30 authors by total comment count.
    total = func.sum(Music163.comment)
    cd = settings.Session.query(Music163.author.label('author'), total.label('count')).group_by("author").order_by(total.desc()).limit(30).all()
    for m in cd:
        data["author-comment-count"].append([m[0], int(m[1])])
    # Top 30 songs by comment count.
    data["music-comment-count"] = settings.Session.query(Music163.song_name, Music163.comment.label("count")).order_by(Music163.comment.desc()).limit(30).all()
    return data
def stat_data():
    """Return crawl progress as whole percentages."""
    data = {}
    # An empty table makes the division yield NULL, hence the "or 0" fallback.
    data["countPlaylist"] = int(settings.engine.execute("select(select count(*) from playlist163 where done = 'Y')*100 / count(*) from playlist163").fetchone()[0] or 0)
    data["countComment"] = int(settings.engine.execute("select(select count(*) from music163 where done = 'Y')*100 / count(*) from music163").fetchone()[0] or 0)
    data["countLyric"] = int(settings.engine.execute("select(select count(*) from music163 where has_lyric = 'Y')*100 / count(*) from music163").fetchone()[0] or 0)
    return data
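def random_data():
    """Return up to 12 random comments together with their songs."""
    rng = settings.Session.query(func.min(Comment163.id), func.max(Comment163.id)).all()[0]
    data = []
    if rng[0] is None:  # comment163 is empty
        return data
    for i in range(12):
        # randint keeps the probe an integer, so it is safe to embed in the SQL text.
        v = random.randint(rng[0], rng[1])
        d = settings.engine.execute("select txt,liked,a.author,song_name,a.song_id,b.author from comment163 a inner join music163 b on a.song_id= b.song_id where a.id>=" + str(v) + " limit 1").fetchone()
        if d is None:  # no joined row at or after the probe id
            continue
        data.append({"txt": d[0], "like": d[1], "author": d[2], "song": {"name": d[3], "author": d[5], "id": d[4]}})
    return data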
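# A minimal sketch, not part of the original module: random_data() above picks
# rows by probing random ids between the smallest and largest id. The same
# idea is shown here against an in-memory SQLite table with gaps in its ids.
# sample_rows and the `comment` demo table are hypothetical names; the same
# row may be returned more than once.

```python
import random
import sqlite3


def sample_rows(conn, n, rng=random):
    """Pick up to `n` rows by probing random ids between MIN(id) and MAX(id)."""
    low, high = conn.execute("SELECT MIN(id), MAX(id) FROM comment").fetchone()
    if low is None:  # empty table: nothing to sample
        return []
    rows = []
    for _ in range(n):
        probe = rng.randint(low, high)  # inclusive integer bounds
        # ">=" with ORDER BY id finds a row even when probe == MAX(id).
        row = conn.execute(
            "SELECT id, txt FROM comment WHERE id >= ? ORDER BY id LIMIT 1",
            (probe,),
        ).fetchone()
        if row is not None:
            rows.append(row)
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY, txt TEXT)")
conn.executemany("INSERT INTO comment VALUES (?, ?)", [(1, "a"), (5, "b"), (9, "c")])
```

# Rows that follow a large gap in the id sequence are picked more often, so
# this trades uniformity for speed.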
def initdb():
    try:
        Base.metadata.create_all(settings.engine)
    except Exception as e:
        pylog.print_warn("Failed to create the database tables automatically: {}".format(e))


def dropdb():
    try:
        Base.metadata.drop_all(settings.engine)
    except Exception as e:
        pylog.print_warn("Failed to drop the database tables automatically: {}".format(e))
================================================
FILE: spider163/utils/tools.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import contextlib
import codecs
import requests
import json
import hashlib
from bs4 import BeautifulSoup
from spider163 import version
from spider163.utils import const
@contextlib.contextmanager
def ignored(*exceptions):
    """Context manager that silently swallows the given exception types."""
    try:
        yield
    except exceptions:
        pass
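# A minimal sketch, not part of the original module: on Python 3.4+ the
# standard library's contextlib.suppress behaves like ignored() above.
# parse_int is a hypothetical helper used only to show the pattern.

```python
import contextlib


def parse_int(text, default=0):
    """Return int(text), or `default` when the text is not a valid integer."""
    result = default
    # suppress() swallows only the listed exception types.
    with contextlib.suppress(ValueError, TypeError):
        result = int(text)
    return result
```

# Listing the exception types narrowly keeps unrelated errors visible.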
def encode(s):
    if version.PYTHON3 is True:
        return codecs.encode(s, "utf-8").decode("utf-8")
    else:
        return s.encode("utf-8")


def hex(s):
    if version.PYTHON3 is True:
        return codecs.encode(bytes(s, encoding="utf8"), 'hex')
    else:
        return s.encode("hex")


def md5(s):
    m = hashlib.md5()
    m.update(s.encode("utf-8"))
    return m.hexdigest()
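def curl(url, headers, type=const.RETURN_JSON):
    """Fetch `url` and return parsed JSON, a BeautifulSoup tree, or plain text."""
    # `type` shadows the builtin but is kept so keyword callers still work.
    # Exceptions propagate to the caller unchanged.
    s = requests.session()
    bs = BeautifulSoup(s.get(url, headers=headers).content, "html.parser")
    if type == const.RETURN_JSON:
        return json.loads(bs.text)
    elif type == const.RETURE_HTML:
        return bs
    else:
        return bs.text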
================================================
FILE: spider163/version.py
================================================
# -*- coding: utf-8 -*-
import os
import sys
# Compare the numeric major version; comparing version strings is fragile.
PYTHON3 = sys.version_info[0] >= 3
VERSION = "2.7.8"
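DESCRIPTION = """
Spider163's data comes from NetEase Cloud Music, a product of NetEase. Its license terms include the NetEase Cloud Music Terms of Service, excluding any unfair clauses therein.
This project is released under the MIT License. We believe that knowledge belongs to all of humanity, and that the NetEase Cloud Music comment sections belong to their users, not to NetEase Cloud Music.
Internet users have the right to read, organize, analyze, and summarize open, non-private information as they see fit.
You can support the development of this project in four ways:
No.1 Star this project on GitHub, or promote it anywhere else.
Spider163 on GitHub: https://github.com/Chengyumeng/spider163
No.2 Follow the author's only personal WeChat official account.
Account name: 程天写代码
No.3 Sponsor the author via Alipay.
Alipay QR code: https://github.com/Chengyumeng/spider163/blob/master/spider163/www/static/img/zhifubao.jpeg
No.4 Sponsor the author via WeChat.
WeChat QR code: https://github.com/Chengyumeng/spider163/blob/master/spider163/www/static/img/weixin.jpeg
"""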
root_path = os.path.dirname(os.path.abspath(__file__))
MAILBODY = """
<h2 style="color: #C20C0C; margin: 10px 0;"><a href="https://github.com/Chengyumeng/spider163" target="_blank">Spider163</a> Today's NetEase Cloud Music picks (WeChat official account: pod1024)</h2>
<h2 style="color: #C20C0C; margin: 10px 0;"></h2>
<p style="color: #C10B0B; margin: 10px 0;" >Today's {} featured songs:</p>
<ul>{}</ul>
<div style="margin: 20px 0 0 0;">
<p style="font-weight: 400;font-style: normal;font-size: 30px;color: #333;text-align: center;margin: 30px auto;">Follow the 程天写代码 WeChat official account: pod1024</p>
</div>
"""
MAILMUSIC = """
<li><span style="font-weight: bold; margin: 2px 10px 5px 10px;"><a href="http://music.163.com/#/song?id={}" target="_blank">{}</a></span>
<span style="font-weight: bold; margin: 2px 10px 5px 10px;">{}</span>
<span style="font-weight: bold; margin: 2px 10px 5px 10px;">Comments: {}</span></li><hr>
{}
"""
MAILCOMMENT = """
<p><span style="font-weight: bold; margin: 2px 10px 5px 10px;color: #a40011;">{}</span>
<span style="font-weight: bold; margin: 2px 10px 5px 10px;">{} :</span> </p>
<p style="margin: 12px 20px 15px 20px;">{}</p>
"""
================================================
FILE: spider163/www/__init__.py
================================================
================================================
FILE: spider163/www/static/css/spider163.css
================================================
.brand-intro {
text-align: center;
background-color: #a811b5;
color: #fff;
padding: 120px 0;
background-position: center center;
background-repeat: no-repeat;
}
.sui-btn {
display: inline-block;
padding: 2px 14px;
box-sizing: border-box;
margin-bottom: 0;
font-size: 12px;
line-height: 18px;
text-align: center;
vertical-align: middle;
cursor: pointer;
color: #333333;
background-color: #eeeeee;
filter: progid:DXImageTransform.Microsoft.gradient(enabled = false);
border: 1px solid #e1e1e1;
-webkit-border-radius: 2px;
-moz-border-radius: 2px;
border-radius: 2px;
-webkit-user-select: none;
-moz-user-select: none;
-ms-user-select: none;
-o-user-select: none;
user-select: none;
}
.btn-lead {
padding: 1em 4em;
border: 2px solid #fff;
opacity: .8;
background: transparent;
color: #fff;
font-size: 24px;
height: auto;
}
.footer {
border-top: 1px solid #aaa;
padding: 30px;
text-align: center;
color: #666;
font-size: 13px;
margin-top: 50px;
}
.index-body {
margin-top: 22px;
margin-bottom: 22px;
}
#index-group .i-c-p1 {
background: #A020F0 no-repeat 35px -47px;
}
#index-group .i-c-p2 {
background: #FF1493 no-repeat 35px -47px;
}
#index-group .i-c-p3 {
background: #e8b875 no-repeat 35px -47px;
}
#index-group .i-c-p4 {
background: #baa1e2 no-repeat 25px -260px;
}
#index-group .i-c-p {
height: 42px;
position: relative;
text-align: center;
overflow: hidden;
}
#index-group .i-c-ph, #index-group .i-c-ph1 {
color: #fff;
font-size: 1.3em;
margin: 0;
text-align: center;
position: absolute;
z-index: 10;
font-weight: 400;
left: 0;
width: 100%;
}
.i-c-ph, .i-c-ph1 {
color: #fff;
font-size: 1.5em;
text-align: center;
position: absolute;
z-index: 10;
bottom: .7em;
font-weight: 400;
left: 0;
width: 100%;
}
.i-c-list li {
height: 2.5em;
line-height: 2.5em;
overflow: hidden;
border-bottom: 1px dashed #eaeaea;
list-style: none;
padding-left: 10px;
font-size: 1.1em;
}
.modals {
margin-top:200px;
}
.gdspider, .gqspider, .gcspider, .rpspider {
margin:20px auto auto auto;
width:50%;
min-width:400px;
}
================================================
FILE: spider163/www/static/js/macarons.js
================================================
(function (root, factory) {
if (typeof define === 'function' && define.amd) {
// AMD. Register as an anonymous module.
define(['exports', 'echarts'], factory);
} else if (typeof exports === 'object' && typeof exports.nodeName !== 'string') {
// CommonJS
factory(exports, require('echarts'));
} else {
// Browser globals
factory({}, root.echarts);
}
}(this, function (exports, echarts) {
var log = function (msg) {
if (typeof console !== 'undefined') {
console && console.error && console.error(msg);
}
};
if (!echarts) {
log('ECharts is not Loaded');
return;
}
var colorPalette = [
'#2ec7c9','#b6a2de','#5ab1ef','#ffb980','#d87a80',
'#8d98b3','#e5cf0d','#97b552','#95706d','#dc69aa',
'#07a2a4','#9a7fd1','#588dd5','#f5994e','#c05050',
'#59678c','#c9ab00','#7eb00a','#6f5553','#c14089'
];
var theme = {
color: colorPalette,
title: {
textStyle: {
fontWeight: 'normal',
color: '#008acd'
}
},
visualMap: {
itemWidth: 15,
color: ['#5ab1ef','#e0ffff']
},
toolbox: {
iconStyle: {
normal: {
borderColor: colorPalette[0]
}
}
},
tooltip: {
backgroundColor: 'rgba(50,50,50,0.5)',
axisPointer : {
type : 'line',
lineStyle : {
color: '#008acd'
},
crossStyle: {
color: '#008acd'
},
shadowStyle : {
color: 'rgba(200,200,200,0.2)'
}
}
},
dataZoom: {
dataBackgroundColor: '#efefff',
fillerColor: 'rgba(182,162,222,0.2)',
handleColor: '#008acd'
},
grid: {
borderColor: '#eee'
},
categoryAxis: {
axisLine: {
lineStyle: {
color: '#008acd'
}
},
splitLine: {
lineStyle: {
color: ['#eee']
}
}
},
valueAxis: {
axisLine: {
lineStyle: {
color: '#008acd'
}
},
splitArea : {
show : true,
areaStyle : {
color: ['rgba(250,250,250,0.1)','rgba(200,200,200,0.1)']
}
},
splitLine: {
lineStyle: {
color: ['#eee']
}
}
},
timeline : {
lineStyle : {
color : '#008acd'
},
controlStyle : {
normal : { color : '#008acd'},
emphasis : { color : '#008acd'}
},
symbol : 'emptyCircle',
symbolSize : 3
},
line: {
smooth : true,
symbol: 'emptyCircle',
symbolSize: 3
},
candlestick: {
itemStyle: {
normal: {
color: '#d87a80',
color0: '#2ec7c9',
lineStyle: {
color: '#d87a80',
color0: '#2ec7c9'
}
}
}
},
scatter: {
symbol: 'circle',
symbolSize: 4
},
map: {
label: {
normal: {
textStyle: {
color: '#d87a80'
}
}
},
itemStyle: {
normal: {
borderColor: '#eee',
areaColor: '#ddd'
},
emphasis: {
areaColor: '#fe994e'
}
}
},
graph: {
color: colorPalette
},
gauge : {
axisLine: {
lineStyle: {
color: [[0.2, '#2ec7c9'],[0.8, '#5ab1ef'],[1, '#d87a80']],
width: 10
}
},
axisTick: {
splitNumber: 10,
length :15,
lineStyle: {
color: 'auto'
}
},
splitLine: {
length :22,
lineStyle: {
color: 'auto'
}
},
pointer : {
width : 5
}
}
};
echarts.registerTheme('macarons', theme);
}));
================================================
FILE: spider163/www/static/js/scan.js
================================================
$(function () {
this.createDom = function(){
}
this.documentEvent = function(){
}
this.init = function() {
commentlist()
}
this.init()
});
function commentlist() {
$.ajax({
url : "/scan/data",
type:"get",
dataType : "json",
success : function (data) {
var code = "";
for (var i in data['comment']) {
var c = data['comment'][i];
code = code + "<div class=\"col-md-5 col-sm-4 col-xs-6 i-c-item\">"
+ "<a title=\""+ c["song"]["name"]+"\" target=\"_blank\" href=\"http://music.163.com/#/song?id=" + c["song"]["id"]+ "\"><div class=\"i-c-p i-c-p3\"><h2 class=\"i-c-ph\">"
+ c["song"]["name"] + " - " + c["song"]["author"] + "</h2></div></a>"
+ "<p class=\"navbar-text\">" + c["txt"] + "</p>"
+ "<div style=\"position: absolute\"><small><span class=\"glyphicon glyphicon-user\" aria-hidden=\"true\"> " + c["author"] + "</span> <span class=\"glyphicon glyphicon-heart\" aria-hidden=\"true\"> " + c["like"] + "</span>" + "</small></div></div>"
}
$("#index-group").html(code);
}
});
}
================================================
FILE: spider163/www/static/js/spider163.js
================================================
$(function () {
this.createDom = function () {
this.spiderPlaylistObj = $("#spiderPlaylist");
this.SpiderMusicObj = $("#spiderMusic");
this.SpiderLyricObj = $("#spiderLyric");
this.SpiderCommentObj = $("#spiderComment");
}
this.documentEvent = function () {
var self = this;
this.hideBox = function() {
$(".gdspider").css({"display":"none"});
$(".gqspider").css({"display":"none"});
$(".gcspider").css({"display":"none"});
$(".rpspider").css({"display":"none"});
}
$("#gd").click(function(){
self.hideBox();
$(".gdspider").css({"display":"block"});
});
$("#gq").click(function(){
self.hideBox();
$(".gqspider").css({"display":"block"});
});
$("#gc").click(function(){
self.hideBox();
$(".gcspider").css({"display":"block"});
});
$("#rp").click(function(){
self.hideBox();
$(".rpspider").css({"display":"block"});
});
this.spiderPlaylistObj.click(function() {
var gdType = $("#gdType").val();
var gdPage = $("#gdPage").val();
$.ajax({
url : "/spider/getPlaylist",
data:"gdType="+gdType+"&gdPage="+gdPage,
type:"post",
dataType : "json",
success : function (data) {
var thead = " <thead><tr><th>#</th><th>Playlist</th></tr></thead>";
var tbody = "";
for (var t in data['title']) {
tbody = tbody + "<tr><th scope=\"row\">"+ t +"</th><td>"+ data['title'][t] +"</td></tr>";
}
$("#printTable").html(thead + "<tbody>" + tbody + "</tbody>");
},
});
});
this.SpiderMusicObj.click(function() {
var gdSource = $("#gdSource").val();
var gdCount = $("#gdCount").val();
for (var i = 0; i < gdCount; i++) {
$.ajax({
url : "/spider/getMusic",
data:"gdSource="+gdSource,
type:"post",
dataType : "json",
success : function (data) {
var thead = " <thead><tr><th>#</th><th>Playlist</th><th>Song</th><th>Artist</th></tr></thead>";
var tbody = "";
var row = 0;
for (var playlist in data['data']) {
for (var m in data['data'][playlist]) {
row++;
tbody = tbody + "<tr><th scope=\"row\">" + row
+ "</th><td>"+ playlist +"</td>"
+"<td>"+ data['data'][playlist][m]["name"] + "</td>"
+ "<td>"+ data['data'][playlist][m]["author"] + "</td>"
+"</tr>";
}
}
// Redraw only when at least one song came back; this also avoids
// reading a playlist key that is undefined for an empty response.
if (row > 0) {
$("#printTable").html(thead + "<tbody>" + tbody + "</tbody>");
}
},
});
}
});
this.SpiderLyricObj.click(function() {
var gqCount = $("#gqCount").val();
$.ajax({
url : "/spider/getLyric",
data:"gqCount="+gqCount,
type:"post",
dataType : "json",
success : function (data) {
var thead = " <thead><tr><th>#</th><th>Song</th><th>Artist</th><th>Comments</th></tr></thead>";
var tbody = "";
for (var cnt in data['data']) {
tbody = tbody + "<tr><th scope=\"row\">" + cnt
+ "</th>"
+"<td>"+ data['data'][cnt]["name"] + "</td>"
+ "<td>"+ data['data'][cnt]["author"] + "</td>"
+"<td>"+ data['data'][cnt]["comment"] + "</td>"
+"</tr>";
}
$("#printTable").html(thead + "<tbody>" + tbody + "</tbody>");
},
});
});
this.SpiderCommentObj.click(function() {
var gqCount = $("#gqCount-1").val();
$.ajax({
url : "/spider/getComment",
data:"gqCount="+gqCount,
type:"post",
dataType : "json",
success : function (data) {
var thead = " <thead><tr><th>#</th><th>Song</th><th>Artist</th><th>ID</th></tr></thead>";
var tbody = "";
for (var cnt in data['data']) {
tbody = tbody + "<tr><th scope=\"row\">" + cnt
+ "</th>"
+"<td>"+ data['data'][cnt]["name"] + "</td>"
+ "<td>"+ data['data'][cnt]["author"] + "</td>"
+"<td>"+ data['data'][cnt]["song_id"] + "</td>"
+"</tr>";
}
$("#printTable").html(thead + "<tbody>" + tbody + "</tbody>");
},
});
});
};
this.init = function() {
this.createDom();
this.documentEvent();
}
this.init()
});
================================================
FILE: spider163/www/static/js/stat.js
================================================
$(function () {
this.createDom = function () {
this.spiderPlaylistObj = $("#spiderPlaylist");
}
this.documentEvent = function () {
var self = this;
this.spiderPlaylistObj.click(function() {
var gdType = $("#gdType").val();
var gdCount = $("#gdCount").val();
$('#gdModal').modal('hide')
$.ajax({
url : "/spider/getPlaylist",
data:"gdType="+gdType+"&gdCount="+gdCount,
type:"post",
dataType : "json",
success : function (data) {
alert(data["test"]);
},
});
});
};
this.createCharts = function() {
dataCount()
playlist()
music()
setInterval(dataCount,10000);
setInterval(playlist,1000000);
setInterval(music,1000000);
}
this.init = function() {
this.createDom();
this.documentEvent();
this.createCharts();
}
this.init()
});
function dataCount() {
$.ajax({
url : "/stat/dataCount",
type:"get",
dataType : "json",
success : function (data) {
var name = {"countPlaylist":"Playlists crawled","countLyric":"Lyrics crawled","countComment":"Comments crawled"};
for (var k in data){
var chart = echarts.init(document.getElementById(k), 'macarons');
var option = {
tooltip : {formatter: "{a} <br/>{b} : {c}%"},
// toolbox: {feature: {restore: {},saveAsImage: {}}},
series: [
{
name: k,
type: 'gauge',
detail: {formatter:'{value}%'},
data: [{value: data[k], name: name[k]}]
}
]};
chart.setOption(option);
}
},
});
}
function playlist() {
$.ajax({
url : "/stat/playlist",
type:"get",
dataType : "json",
success : function (data) {
for (var k in data){
var chart = echarts.init(document.getElementById(k), 'macarons');
var t = [];
var v = [];
for (var d in data[k]) {
t.push(data[k][d][0]);
v.push(data[k][d][1]);
}
var option = {
title: {text: k},
tooltip: {},
legend: {data:['Count']},
xAxis: {data: t},
yAxis: {},
series: [{name: 'Count',type: 'bar',data: v }]
};
chart.setOption(option);
}
},
});
}
function music() {
$.ajax({
url : "/stat/music",
type:"get",
dataType : "json",
success : function (data) {
for (var k in data){
var chart = echarts.init(document.getElementById(k), 'macarons');
var t = [];
var v = [];
for (var d in data[k]) {
t.push(data[k][d][0]);
v.push(data[k][d][1]);
}
var option = {
title: {text: k},
tooltip: {},
legend: {data:['Comments']},
xAxis: {data: t,
axisLabel:{
interval:0,
rotate:45, // label rotation, -90 to 90, default 0
margin:4,
},
},
yAxis: {},
series: [{name: 'Comments',type: 'bar',data: v }]
};
chart.setOption(option);
}
},
});
}
================================================
FILE: spider163/www/templates/bussiness.html
================================================
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Welcome To Spider163</title>
<link rel="stylesheet" href="/static/css/spider163.css">
<link href="https://cdn.bootcss.com/bootstrap/3.3.7/css/bootstrap.min.css" rel="stylesheet">
<link href="http://g.alicdn.com/sj/dpl/1.5.1/css/sui.min.css" rel="stylesheet">
</head>
<body>
<ul class="nav nav-tabs">
<li role="presentation"><a href="/#">Home</a></li>
<li role="presentation"><a href="/spider">Crawl</a></li>
<li role="presentation"><a href="/stat">Statistics</a></li>
<li role="presentation"><a href="/scan">Browse</a></li>
<li role="presentation"><a href="/bussiness">Business</a></li>
</ul>
<div class="page-header" style="padding-left:100px">
<h2>Spider163 User License and Our Expectations</h2>
</div>
<div>
<p class="navbar-text">
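Spider163's data comes from NetEase Cloud Music, a product of NetEase. Its license terms include the NetEase Cloud Music Terms of Service, excluding any unfair clauses therein.<br>
This project is released under the MIT License. We believe that knowledge belongs to all of humanity, and that the NetEase Cloud Music comment sections belong to their users, not to NetEase Cloud Music.<br>
Internet users have the right to read, organize, analyze, and summarize open, non-private information as they see fit.<br>
See: <a href="http://music.163.com/html/web2/service.html" target="_blank">NetEase Cloud Music Terms of Service</a><br>
You can support the development of this project in four ways:<br>
No.1 Star this project on GitHub, or promote it anywhere else.<br>
See: <a href="https://github.com/Chengyumeng/spider163" target="_blank">Spider163 on GitHub</a><br>
No.2 Follow the author's only personal WeChat official account.<br>
Account name: 程天写代码<br>
No.3 Sponsor the author via Alipay.<br>
<img src="/static/img/zhifubao.jpeg" width="180px"><br>
No.4 Sponsor the author via WeChat.<br>
<img src="/static/img/weixin.jpeg" width="180px"><br>
Since 2018, everyone who sponsors the project by cash transfer can leave a message on this page (up to 50 characters) via the transfer note.<br>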
</p>
</div>
</body>
</html>
================================================
FILE: spider163/www/templates/index.html
================================================
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Welcome To Spider163</title>
<link rel="stylesheet" href="/static/css/spider163.css">
<link href="https://cdn.bootcss.com/bootstrap/3.3.7/css/bootstrap.min.css" rel="stylesheet">
</head>
<body>
<ul class="nav nav-tabs">
<li role="presentation"><a href="/#">Home</a></li>
<li role="presentation"><a href="/spider">Crawl</a></li>
<li role="presentation"><a href="/stat">Statistics</a></li>
<li role="presentation"><a href="/scan">Browse</a></li>
<li role="presentation"><a href="/bussiness">Business</a></li>
</ul>
<div class="brand-intro">
<div class="sui-container">
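<h1>An easy-to-use, powerful NetEase Cloud Music crawler</h1>
<p class="sui-lead">Spider163 is a NetEase Cloud Music crawler library developed by 程天 (GitHub: Chengyumeng, WeChat official account: 程天写代码).<br>
The project is released under the MIT License.</p>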
<p class="btn-wrap">
<a href="/spider" class="sui-btn btn-lead">Crawl</a>
<a href="/stat" class="sui-btn btn-lead">Statistics</a>
<a href="/scan" class="sui-btn btn-lead">Browse</a>
<a href="/bussiness" class="sui-btn btn-lead">Business</a>
</p>
</div>
</div>
<div class="footer">
<ul class="unstyled">
<li>@time 2017.04.07</li>
<li>@author 程天
<a href="https://www.zhihu.com/people/toocooltohavefriends/activities" class="">Zhihu</a>
<a href="" class="">WeChat</a>
<a href="https://github.com/Chengyumeng" class="">GitHub</a></li>
</ul>
</div>
</body>
<script src="https://ss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/js/lib/jquery-1.10.2_d88366fd.js"></script>
<script src="/static/js/spider163.js"></script>
</html>
================================================
FILE: spider163/www/templates/scan.html
================================================
<!DOCTYPE html>
<html lang="en" ng-app="spider">
<head>
<meta charset="UTF-8">
<title>Welcome To Spider163</title>
<link href="http://g.alicdn.com/sj/dpl/1.5.1/css/sui.min.css" rel="stylesheet">
<link href="https://cdn.bootcss.com/bootstrap/3.3.7/css/bootstrap.min.css" rel="stylesheet">
<link rel="stylesheet" href="/static/css/spider163.css">
</head>
<body>
<ul class="nav nav-tabs">
<li role="presentation"><a href="/#">Home</a></li>
<li role="presentation"><a href="/spider">Crawl</a></li>
<li role="presentation"><a href="/stat">Statistics</a></li>
<li role="presentation"><a href="/scan">Browse</a></li>
<li role="presentation"><a href="/bussiness">Business</a></li>
</ul>
<div class="container index-body">
<h2 class="m-h2 m-h2-mb">Random picks <i>Search will be added to this page in the future</i></h2>
<div class="row">
<div class="i-c-g" id="index-group"></div>
</div>
<div class="process"></div>
</div>
<div class="footer">
<ul class="unstyled">
<li>@time 2017.04.07</li>
<li>@author 程天
<a href="https://www.zhihu.com/people/toocooltohavefriends/activities" class="">Zhihu</a>
<a href="" class="">WeChat</a>
<a href="https://github.com/Chengyumeng" class="">GitHub</a></li>
</ul>
</div>
</body>
<script src="https://ss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/js/lib/jquery-1.10.2_d88366fd.js"></script>
<script src="/static/js/scan.js"></script>
</html>
================================================
FILE: spider163/www/templates/spider.html
================================================
<!DOCTYPE html>
<html lang="en" ng-app="spider">
<head>
<meta charset="UTF-8">
<title>Welcome To Spider163</title>
<link href="http://g.alicdn.com/sj/dpl/1.5.1/css/sui.min.css" rel="stylesheet">
<link href="https://cdn.bootcss.com/bootstrap/3.3.7/css/bootstrap.min.css" rel="stylesheet">
<link rel="stylesheet" href="/static/css/spider163.css">
</head>
<body>
<ul class="nav nav-tabs">
<li role="presentation"><a href="/#">Home</a></li>
<li role="presentation"><a href="/spider">Crawl</a></li>
<li role="presentation"><a href="/stat">Statistics</a></li>
<li role="presentation"><a href="/scan">Browse</a></li>
<li role="presentation"><a href="/bussiness">Business</a></li>
</ul>
<div class="container index-body">
<h2 class="m-h2 m-h2-mb">Smart crawling <i>User needs · Smart customization</i></h2>
<div class="row">
<div id="index-group">
<div class="col-md-3 col-sm-4 col-xs-6">
<div class="i-c-item">
<a id="gd" title="Playlists"><div class="i-c-p i-c-p1"><h2 class="i-c-ph">Playlists</h2></div></a>
</div>
</div>
<div class="col-md-3 col-sm-4 col-xs-6">
<div class="i-c-item">
<a id="gq" title="Songs"><div class="i-c-p i-c-p2"><h2 class="i-c-ph">Songs</h2></div></a>
</div>
</div>
<div class="col-md-3 col-sm-4 col-xs-6">
<div class="i-c-item">
<a id="gc" title="Lyrics"><div class="i-c-p i-c-p3"><h2 class="i-c-ph">Lyrics</h2></div></a>
</div>
</div>
<div class="col-md-3 col-sm-4 col-xs-6">
<div class="i-c-item">
<a id="rp" title="Hot comments"><div class="i-c-p i-c-p4"><h2 class="i-c-ph">Hot comments</h2></div></a>
</div>
</div>
</div>
</div>
<div class="gdspider form-horizontal" style="display:none;">
<div class="form-group">
<label class="col-sm-2 control-label">Playlist category</label>
<div class="col-sm-10">
<input type="text" class="form-control" id="gdType" placeholder="e.g. 全部, 华语, 欧美, 日语, 韩语, 粤语, 怀旧, 清新, 00后 ...">
</div>
</div>
<div class="form-group">
<label class="col-sm-2 control-label">Playlist page</label>
<div class="col-sm-10">
<input type="number" class="form-control" id="gdPage" placeholder="Usually 1-34">
</div>
</div>
<button type="button" class="btn btn-primary btn-lg btn-block" id="spiderPlaylist">Submit</button>
</div>
<div class="gqspider form-horizontal" style="display:none;">
<div class="form-group">
<label class="col-sm-2 control-label">Playlist category</label>
<div class="col-sm-10">
<input type="text" class="form-control" id="gdSource" placeholder="e.g. 全部, 华语, 欧美, 日语, 韩语, 粤语, 怀旧, 清新, 00后 ...">
</div>
</div>
<div class="form-group">
<label class="col-sm-2 control-label">Number of playlists</label>
<div class="col-sm-10">
<input type="number" class="form-control" id="gdCount" placeholder="Keep within 1-300 to avoid freezing the system">
</div>
</div>
<button type="button" class="btn btn-primary btn-lg btn-block" id="spiderMusic">Submit</button>
</div>
<div class="gcspider form-horizontal" style="display:none;">
<div class="form-group">
<label class="col-sm-2 control-label">Number of songs</label>
<div class="col-sm-10">
<input type="number" class="form-control" id="gqCount" placeholder="Keep within 1-300 to avoid freezing the system">
</div>
</div>
<button type="button" class="btn btn-primary btn-lg btn-block" id="spiderLyric">Submit</button>
</div>
<div class="rpspider form-horizontal" style="display:none;">
<div class="form-group">
<label class="col-sm-2 control-label">Number of songs</label>
<div class="col-sm-10">
<input type="number" class="form-control" id="gqCount-1" placeholder="Keep within 1-300 to avoid freezing the system">
</div>
</div>
<button type="button" class="btn btn-primary btn-lg btn-block" id="spiderComment">Submit</button>
</div>
<div class="process">
<table class="table table-condensed" id="printTable">
</table>
</div>
</div>
<div class="footer">
<ul class="unstyled">
<li>@time 2017.04.07</li>
<li>@author 程天
<a href="https://www.zhihu.com/people/toocooltohavefriends/activities" class="">Zhihu</a>
<a href="" class="">WeChat</a>
<a href="https://github.com/Chengyumeng" class="">GitHub</a></li>
</ul>
</div>
</body>
<!--<script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.5.6/angular.min.js"></script>-->
<script src="https://ss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/js/lib/jquery-1.10.2_d88366fd.js"></script>
<script src="/static/js/spider163.js"></script>
<script src="https://cdn.bootcss.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script>
</html>
================================================
FILE: spider163/www/templates/stat.html
================================================
<!DOCTYPE html>
<html lang="en" ng-app="stat">
<head>
<meta charset="UTF-8">
<title>Welcome To Spider163</title>
<link href="http://g.alicdn.com/sj/dpl/1.5.1/css/sui.min.css" rel="stylesheet">
<link href="https://cdn.bootcss.com/bootstrap/3.3.7/css/bootstrap.min.css" rel="stylesheet">
<link rel="stylesheet" href="/static/css/spider163.css">
<script src="/static/js/echarts.min.js"></script>
<script src="/static/js/macarons.js"></script>
</head>
<body>
<ul class="nav nav-tabs">
<li role="presentation"><a href="/#">Home</a></li>
<li role="presentation"><a href="/spider">Crawl</a></li>
<li role="presentation"><a href="/stat">Statistics</a></li>
<li role="presentation"><a href="/scan">Browse</a></li>
<li role="presentation"><a href="/bussiness">Business</a></li>
</ul>
<div class="container index-body">
<h2 class="m-h2 m-h2-mb">Spider163 <i>Statistics · Analysis</i></h2>
<div class="row">
<div class="i-c-g" id="index-group">
<div class="col-md-4 col-sm-4 col-xs-6 i-c-item-outer">
<div id="countPlaylist" style="width: 100%;height:400px;">
</div>
</div>
<div class="col-md-4 col-sm-4 col-xs-6 i-c-item-outer">
<div id="countComment" style="width: 100%;height:400px;">
</div>
</div>
<div class="col-md-4 col-sm-4 col-xs-6 i-c-item-outer">
<div id="countLyric" style="width: 100%;height:400px;">
</div>
</div>
<!--<div class="col-md-4 col-sm-4 col-xs-6 i-c-item-outer">-->
<!--<div id="countLyric" style="width: 100%;height:400px;">-->
<!--</div>-->
<!--</div>-->
</div>
</div>
<div class="row">
<div class="i-c-g" id="index-group">
<div class="col-md-6 col-sm-4 col-xs-6 i-c-item-outer">
<div id="gdType" style="width: 100%;height:400px;">
</div>
</div>
<div class="col-md-6 col-sm-4 col-xs-6 i-c-item-outer">
<div id="gdOver" style="width: 100%;height:400px;">
</div>
</div>
</div>
</div>
<div class="row">
<div class="i-c-g" id="index-group">
<div class="col-md-12 col-sm-4 col-xs-6 i-c-item-outer">
<div id="author-comment-count" style="width: 100%;height:400px;">
</div>
</div>
</div>
</div>
<div class="row">
<div class="i-c-g" id="index-group">
<div class="col-md-12 col-sm-4 col-xs-6 i-c-item-outer">
<div id="music-comment-count" style="width: 100%;height:400px;">
</div>
</div>
</div>
</div>
<div class="process"></div>
</div>
<div class="footer">
<ul class="unstyled">
<li>@time 2017.04.07</li>
<li>@author 程天
<a href="https://www.zhihu.com/people/toocooltohavefriends/activities" class="">Zhihu</a>
<a href="" class="">WeChat</a>
<a href="https://github.com/Chengyumeng" class="">GitHub</a></li>
</ul>
</div>
</body>
<script src="https://ss0.bdstatic.com/5aV1bjqh_Q23odCf/static/superman/js/lib/jquery-1.10.2_d88366fd.js"></script>
<script src="/static/js/stat.js"></script>
<!--<script src="https://cdn.bootcss.com/bootstrap/3.3.7/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script>-->
</html>
================================================
FILE: spider163/www/web.py
================================================
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from flask import Flask, request, json, jsonify
from flask import render_template, make_response

from spider163.spider import playlist
from spider163.spider import music
from spider163.spider import lyric
from spider163.spider import comment
from spider163.utils import pysql
from spider163.utils import tools


app = Flask(__name__, static_url_path='/static')  # `static_path` was removed in Flask 1.0

@app.route("/")
def index():
    return render_template('index.html')


@app.route("/spider")
def spider(type=None):
    return render_template('spider.html')


@app.route("/spider/getPlaylist", methods=['POST'])
def get_playlist():
    pl = playlist.Playlist()
    title = pl.view_capture(int(request.form['gdPage']), tools.encode(request.form["gdType"]))
    return jsonify({"type": request.form["gdType"], "title": title})


@app.route("/spider/getMusic", methods=['POST'])
def get_music():
    mu = music.Music()
    data = mu.views_capture(tools.encode(request.form["gdSource"]))
    return jsonify({"type": request.form["gdSource"], "data": data})


@app.route("/spider/getLyric", methods=["POST"])
def get_lyric():
    ly = lyric.Lyric()
    data = ly.view_lyrics(int(request.form["gqCount"]))
    return jsonify({"count": request.form["gqCount"], "data": data})


@app.route("/spider/getComment", methods=["POST"])
def get_comment():
    cm = comment.Comment()
    data = cm.auto_view(int(request.form["gqCount"]))
    return jsonify({"count": request.form["gqCount"], "data": data})


@app.route("/stat")
def statistics():
    return render_template('stat.html')


@app.route("/stat/playlist")
def stat_playlist():
    return jsonify(pysql.stat_playlist())


@app.route("/stat/music")
def stat_music():
    return jsonify(pysql.stat_music())


@app.route("/stat/dataCount")
def stat_data():
    return jsonify(pysql.stat_data())


@app.route("/scan")
def scan():
    return render_template('scan.html')


@app.route("/scan/data")
def scan_data():
    comment = pysql.random_data()  # local name shadows the imported `comment` module in this function only
    if len(comment) > 0:
        res = {"msg": "ok", "num": len(comment), "comment": comment}
        return jsonify(res)
    else:
        return jsonify({"msg": "ok", "num": len(comment), "comment": []})


@app.route("/bussiness")
def bussiness():
    return render_template('bussiness.html')
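
Note on the POST handlers in web.py: they call `int(request.form[...])` directly, so a missing or non-numeric `gdPage` or `gqCount` field raises an exception before any spider code runs. A minimal sketch of a more defensive parser is shown here. The helper `parse_int_field`, its defaults, and its bounds are hypothetical and are not part of spider163.

```python
# Hypothetical helper (an assumption for illustration, not part of spider163):
# read an integer form field such as "gdPage" or "gqCount" without raising.
def parse_int_field(form, key, default=1, minimum=1, maximum=1000):
    """Return form[key] as an int clamped to [minimum, maximum].

    Falls back to `default` when the key is absent or not an integer.
    `form` only needs a dict-style .get(), which Flask's request.form provides.
    """
    raw = form.get(key)
    try:
        value = int(raw)
    except (TypeError, ValueError):
        # TypeError: key missing (raw is None); ValueError: non-numeric text.
        return default
    return max(minimum, min(maximum, value))
```

Under these assumptions a handler could call `parse_int_field(request.form, "gqCount")` in place of `int(request.form["gqCount"])`. The upper bound is an arbitrary choice here and would need tuning to how many items a single request should be allowed to crawl.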
SYMBOL INDEX (157 symbols across 23 files)
FILE: setup.py
class CleanCommand (line 9) | class CleanCommand(Command):
method initialize_options (line 13) | def initialize_options(self):
method finalize_options (line 16) | def finalize_options(self):
method run (line 19) | def run(self):
FILE: spider163/bin/cli.py
class VersionController (line 37) | class VersionController(CementBaseController):
class Meta (line 38) | class Meta:
method healthz (line 46) | def healthz(self):
method expose (line 52) | def expose(self):
class DatabaseController (line 56) | class DatabaseController(CementBaseController):
class Meta (line 57) | class Meta:
method initdb (line 66) | def initdb(self):
method resetdb (line 71) | def resetdb(self):
method updatedb (line 77) | def updatedb(self):
class SpiderController (line 91) | class SpiderController(CementBaseController):
class Meta (line 92) | class Meta:
method classify (line 109) | def classify(self):
method playlist (line 113) | def playlist(self):
method mp3 (line 128) | def mp3(self):
method music (line 140) | def music(self):
method toplist (line 154) | def toplist(self):
method comment (line 168) | def comment(self):
method lyric (line 182) | def lyric(self):
class QueryController (line 195) | class QueryController(CementBaseController):
class Meta (line 196) | class Meta:
method get (line 208) | def get(self):
method search (line 216) | def search(self):
method doc (line 224) | def doc(self):
method mail (line 232) | def mail(self):
class WebController (line 241) | class WebController(CementBaseController):
class Meta (line 242) | class Meta:
method webserver (line 250) | def webserver(self):
class AuthController (line 258) | class AuthController(CementBaseController):
class Meta (line 259) | class Meta:
method top50 (line 271) | def top50(self):
class App (line 292) | class App(CementApp):
class Meta (line 293) | class Meta:
function main (line 299) | def main():
FILE: spider163/bin/cli_test.py
class TestStringMethods (line 10) | class TestStringMethods(unittest.TestCase):
method test_config (line 12) | def test_config(self):
method test_classify (line 17) | def test_classify(self):
method test_mp3 (line 20) | def test_mp3(self):
method test_search (line 24) | def test_search(self):
method test_doc (line 30) | def test_doc(self):
FILE: spider163/mail/mail.py
function music (line 13) | def music(playlist_id):
FILE: spider163/settings.py
function configure_orm (line 11) | def configure_orm():
FILE: spider163/spider/authorize.py
class Command (line 15) | class Command():
method __init__ (line 17) | def __init__(self):
method createPlaylistParams (line 25) | def createPlaylistParams(self,ids,playlist_id,cmd,csrf_token):
method createPlaylistRemoveParams (line 34) | def createPlaylistRemoveParams(self):
method createLoginParams (line 37) | def createLoginParams(self,username,password):
method rsaEncrypt (line 47) | def rsaEncrypt(self, text, pubKey, modulus):
method createSecretKey (line 52) | def createSecretKey(self, size):
method post_playlist_add (line 57) | def post_playlist_add(self,ids, playlist_id=2098905487, csrf_token="da...
method post_playlist_delete (line 68) | def post_playlist_delete(self, ids, playlist_id=2098905487, csrf_token...
method do_login (line 79) | def do_login(self,username,password):
method clear_playlist (line 94) | def clear_playlist(self,playlist_id=2098905487):
method create_playlist_comment_top100 (line 106) | def create_playlist_comment_top100(self,playlist_id=2098905487):
FILE: spider163/spider/comment.py
class Comment (line 19) | class Comment:
method __init__ (line 23) | def __init__(self, music_type=Common):
method createParams (line 32) | def createParams(self, page=1):
method rsaEncrypt (line 50) | def rsaEncrypt(self, text, pubKey, modulus):
method createSecretKey (line 55) | def createSecretKey(self, size):
method post (line 60) | def post(self,song_id, page):
method views_capture (line 71) | def views_capture(self, song_id, page=1, pages=1024):
method view_capture (line 80) | def view_capture(self, song_id, page=1):
method view_links (line 130) | def view_links(self, song_id):
method auto_view (line 164) | def auto_view(self, count=1):
method get_music (line 187) | def get_music(self, music_id):
FILE: spider163/spider/lyric.py
class Lyric (line 10) | class Lyric:
method __init__ (line 12) | def __init__(self):
method view_lyric (line 16) | def view_lyric(self, song_id):
method get_lyric (line 31) | def get_lyric(self, song_id):
method view_lyrics (line 36) | def view_lyrics(self, count):
FILE: spider163/spider/mp3.py
class MP3 (line 16) | class MP3:
method __init__ (line 18) | def __init__(self):
method create_params (line 26) | def create_params(self, song_id):
method rsa_encrypt (line 35) | def rsa_encrypt(self, text, pubKey, modulus):
method create_secretKey (line 40) | def create_secretKey(self, size):
method view_down (line 45) | def view_down(self, playlist_id, path="."):
method get_playlist (line 80) | def get_playlist(self, playlist_id):
method get_mp3_link (line 89) | def get_mp3_link(self, song_id):
FILE: spider163/spider/music.py
class Music (line 14) | class Music:
method __init__ (line 16) | def __init__(self):
method views_capture (line 21) | def views_capture(self,source=None):
method view_capture (line 35) | def view_capture(self, link):
method curl_playlist (line 95) | def curl_playlist(self,playlist_id):
method get_playlist (line 114) | def get_playlist(self, playlist_id):
method create_update_strategy (line 142) | def create_update_strategy(self, **kwargs):
FILE: spider163/spider/playlist.py
class Playlist (line 15) | class Playlist:
method __init__ (line 19) | def __init__(self):
method get_classify (line 24) | def get_classify(self):
method view_capture (line 38) | def view_capture(self, page, type="全部"):
method create_update_strategy (line 63) | def create_update_strategy(self, **kwargs):
FILE: spider163/spider/read.py
function read_playlist_json (line 17) | def read_playlist_json(id):
function read_music_data (line 23) | def read_music_data(id):
function read_comment_data (line 27) | def read_comment_data(id):
function read_lyric_data (line 32) | def read_lyric_data(id):
function print_pdf (line 38) | def print_pdf(id):
function print_comment (line 66) | def print_comment(count):
FILE: spider163/spider/search.py
function searchSong (line 15) | def searchSong(key):
function searchAlbum (line 36) | def searchAlbum(key):
function searchSinger (line 59) | def searchSinger(key):
function searchPlaylist (line 78) | def searchPlaylist(key):
FILE: spider163/utils/config.py
function get_path (line 28) | def get_path():
function get_db (line 32) | def get_db():
function get_mail (line 41) | def get_mail():
function format_db (line 49) | def format_db():
function get_mysql (line 65) | def get_mysql():
function get_port (line 72) | def get_port():
FILE: spider163/utils/encrypt.py
function aes (line 11) | def aes(text, sec_key):
FILE: spider163/utils/healthz.py
function is_correct_config (line 13) | def is_correct_config():
function is_correct_db (line 32) | def is_correct_db():
function can_spider (line 47) | def can_spider():
function expose_data (line 51) | def expose_data():
FILE: spider163/utils/mail.py
function send_email (line 8) | def send_email(host,port, subject, user, content):
FILE: spider163/utils/pylog.py
function Log (line 18) | def Log(msg):
function Table (line 23) | def Table(tb):
function Blue (line 27) | def Blue(msg):
function green (line 31) | def green(msg):
function red (line 35) | def red(msg):
function print_err (line 39) | def print_err(msg):
function print_warn (line 43) | def print_warn(msg):
function print_info (line 47) | def print_info(msg):
FILE: spider163/utils/pysql.py
class Playlist163 (line 16) | class Playlist163(Base):
class Music163 (line 35) | class Music163(Base):
class Toplist163 (line 51) | class Toplist163(Base):
class Comment163 (line 71) | class Comment163(Base):
class Lyric163 (line 83) | class Lyric163(Base):
function single (line 92) | def single(table, k, v):
function stat_playlist (line 100) | def stat_playlist():
function stat_music (line 107) | def stat_music():
function stat_data (line 116) | def stat_data():
function random_data (line 124) | def random_data():
function initdb (line 134) | def initdb():
function dropdb (line 141) | def dropdb():
FILE: spider163/utils/tools.py
function ignored (line 16) | def ignored(*exceptions):
function encode (line 23) | def encode(s):
function hex (line 30) | def hex(s):
function md5 (line 37) | def md5(s):
function curl (line 43) | def curl(url, headers, type = const.RETURN_JSON):
FILE: spider163/www/static/js/scan.js
function commentlist (line 12) | function commentlist() {
FILE: spider163/www/static/js/stat.js
function dataCount (line 40) | function dataCount() {
function playlist (line 66) | function playlist() {
function music (line 97) | function music() {
FILE: spider163/www/web.py
function index (line 17) | def index():
function spider (line 22) | def spider(type=None):
function get_playlist (line 27) | def get_playlist():
function get_music (line 34) | def get_music():
function get_lyric (line 41) | def get_lyric():
function get_comment (line 48) | def get_comment():
function statistics (line 55) | def statistics():
function stat_playlist (line 60) | def stat_playlist():
function stat_music (line 65) | def stat_music():
function stat_data (line 70) | def stat_data():
function scan (line 75) | def scan():
function scan_data (line 80) | def scan_data():
function bussiness (line 90) | def bussiness():