Full Code of kangvcar/InfoSpider for AI

master 610cbf45dc8d cached

65 files

718.4 KB

209.5k tokens

369 symbols

Repository: kangvcar/InfoSpider
Branch: master
Commit: 610cbf45dc8d
Files: 65
Total size: 718.4 KB

Directory structure:
gitextract_ndsd6n6o/

├── .github/
│   ├── FUNDING.yml
│   └── ISSUE_TEMPLATE/
│       ├── bug_report.md
│       └── feature_request.md
├── .gitignore
├── LICENSE
├── README.md
├── README_EN.md
├── Spiders/
│   ├── A12306/
│   │   └── main12306.py
│   ├── JdSpider/
│   │   └── jd_more_info.py
│   ├── __init__.py
│   ├── alipay/
│   │   └── main.py
│   ├── bilibili/
│   │   └── main.py
│   ├── browser/
│   │   └── main.py
│   ├── chsi/
│   │   └── main.py
│   ├── cloudmusic/
│   │   └── main.py
│   ├── cnblog/
│   │   └── main.py
│   ├── csdn/
│   │   └── main.py
│   ├── ctrip/
│   │   └── main.py
│   ├── github/
│   │   └── main.py
│   ├── jianshu/
│   │   └── main.py
│   ├── mail/
│   │   └── main.py
│   ├── moments_album/
│   │   └── main.py
│   ├── oschina/
│   │   └── main.py
│   ├── qqfriend/
│   │   └── main.py
│   ├── qqqun/
│   │   └── main.py
│   ├── shgjj/
│   │   └── main.py
│   ├── taobao/
│   │   └── spider.py
│   ├── telephone/
│   │   └── main.py
│   ├── yidong/
│   │   └── main.py
│   └── zhihu/
│       └── main.py
├── docs/
│   ├── .nojekyll
│   ├── QuickStart.md
│   ├── README.md
│   ├── _coverpage.md
│   ├── ads.txt
│   └── index.html
├── extension/
│   ├── index.css
│   ├── index.html
│   └── js/
│       ├── FileSaver.js
│       ├── cnblog/
│       │   ├── cnblogrun0.js
│       │   ├── cnblogrun1.js
│       │   └── cnblogrun2.js
│       ├── github/
│       │   ├── githubrun1.js
│       │   ├── githubrun2.js
│       │   ├── githubrun3.js
│       │   ├── githubrun4.js
│       │   └── githubrun5.js
│       ├── index.js
│       ├── jianshu/
│       │   ├── jianshurun1.js
│       │   └── jianshurun2.js
│       ├── jquery.js
│       └── oschina/
│           └── oschinarun0.js
├── install_deps.sh
├── requirements.txt
├── tests/
│   ├── DeepAnalysis/
│   │   ├── dataprocess.py
│   │   ├── model.py
│   │   └── trainer.py
│   ├── blog_analyse/
│   │   ├── cnblog.ipynb
│   │   ├── postdate_line.html
│   │   ├── stop_word.txt
│   │   └── topic_wordcloud.html
│   └── ctrip/
│       └── main.py
├── tools/
│   ├── main.py
│   └── stop_word.txt
└── uitest/
    └── main.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .github/FUNDING.yml
================================================
# These are supported funding model platforms
ko_fi: kangvcar
custom: ['https://afdian.net/a/kangvcar']


================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.md
================================================
---
name:  "\U0001F41B Bug Report"
about: "If something isn't working as expected \U0001F914."
title: ''
labels: 'bug'
assignees: ''

---

## Bug Report

**Description**: [Description of the issue]

**Expected behavior**: [What should happen]

**Current behavior**: [What happens instead of the expected behavior]

**Steps to Reproduce**:

1. [First Step]
2. [Second Step]
3. [and so on]

**Reproduce how often**: [What percentage of the time does it reproduce?]

**Possible solution**: [Not obligatory, but suggest a fix/reason for the bug]


**Context (Environment)**: [The code version, Python version, operating system, or other software/libraries you use]

## Additional Information

[Any other useful information about the problem].

================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.md
================================================


================================================
FILE: .gitignore
================================================
*.pyc
*.json
*.xlsx
*.swp
data
.idea
*.log
__pycache__

================================================
FILE: LICENSE
================================================
GNU GENERAL PUBLIC LICENSE
                       Version 3, 29 June 2007

 Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.

                            Preamble

  The GNU General Public License is a free, copyleft license for
software and other kinds of works.

  The licenses for most software and other practical works are designed
to take away your freedom to share and change the works.  By contrast,
the GNU General Public License is intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users.  We, the Free Software Foundation, use the
GNU General Public License for most of our software; it applies also to
any other work released this way by its authors.  You can apply it to
your programs, too.

  When we speak of free software, we are referring to freedom, not
price.  Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.

  To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights.  Therefore, you have
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.

  For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received.  You must make sure that they, too, receive
or can get the source code.  And you must show them these terms so they
know their rights.

  Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
giving you legal permission to copy, distribute and/or modify it.

  For the developers' and authors' protection, the GPL clearly explains
that there is no warranty for this free software.  For both users' and
authors' sake, the GPL requires that modified versions be marked as
changed, so that their problems will not be attributed erroneously to
authors of previous versions.

  Some devices are designed to deny users access to install or run
modified versions of the software inside them, although the manufacturer
can do so.  This is fundamentally incompatible with the aim of
protecting users' freedom to change the software.  The systematic
pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable.  Therefore, we
have designed this version of the GPL to prohibit the practice for those
products.  If such problems arise substantially in other domains, we
stand ready to extend this provision to those domains in future versions
of the GPL, as needed to protect the freedom of users.

  Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of
software on general-purpose computers, but in those that do, we wish to
avoid the special danger that patents applied to a free program could
make it effectively proprietary.  To prevent this, the GPL assures that
patents cannot be used to render the program non-free.

  The precise terms and conditions for copying, distribution and
modification follow.

                       TERMS AND CONDITIONS

  0. Definitions.

  "This License" refers to version 3 of the GNU General Public License.

  "Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.

  "The Program" refers to any copyrightable work licensed under this
License.  Each licensee is addressed as "you".  "Licensees" and
"recipients" may be individuals or organizations.

  To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy.  The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.

  A "covered work" means either the unmodified Program or a work based
on the Program.

  To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy.  Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.

  To "convey" a work means any kind of propagation that enables other
parties to make or receive copies.  Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.

  An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License.  If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.

  1. Source Code.

  The "source code" for a work means the preferred form of the work
for making modifications to it.  "Object code" means any non-source
form of a work.

  A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.

  The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form.  A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.

  The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities.  However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work.  For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.

  The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.

  The Corresponding Source for a work in source code form is that
same work.

  2. Basic Permissions.

  All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met.  This License explicitly affirms your unlimited
permission to run the unmodified Program.  The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work.  This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.

  You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force.  You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright.  Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.

  Conveying under any other circumstances is permitted solely under
the conditions stated below.  Sublicensing is not allowed; section 10
makes it unnecessary.

  3. Protecting Users' Legal Rights From Anti-Circumvention Law.

  No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.

  When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.

  4. Conveying Verbatim Copies.

  You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.

  You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.

  5. Conveying Modified Source Versions.

  You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:

    a) The work must carry prominent notices stating that you modified
    it, and giving a relevant date.

    b) The work must carry prominent notices stating that it is
    released under this License and any conditions added under section
    7.  This requirement modifies the requirement in section 4 to
    "keep intact all notices".

    c) You must license the entire work, as a whole, under this
    License to anyone who comes into possession of a copy.  This
    License will therefore apply, along with any applicable section 7
    additional terms, to the whole of the work, and all its parts,
    regardless of how they are packaged.  This License gives no
    permission to license the work in any other way, but it does not
    invalidate such permission if you have separately received it.

    d) If the work has interactive user interfaces, each must display
    Appropriate Legal Notices; however, if the Program has interactive
    interfaces that do not display Appropriate Legal Notices, your
    work need not make them do so.

  A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit.  Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.

  6. Conveying Non-Source Forms.

  You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:

    a) Convey the object code in, or embodied in, a physical product
    (including a physical distribution medium), accompanied by the
    Corresponding Source fixed on a durable physical medium
    customarily used for software interchange.

    b) Convey the object code in, or embodied in, a physical product
    (including a physical distribution medium), accompanied by a
    written offer, valid for at least three years and valid for as
    long as you offer spare parts or customer support for that product
    model, to give anyone who possesses the object code either (1) a
    copy of the Corresponding Source for all the software in the
    product that is covered by this License, on a durable physical
    medium customarily used for software interchange, for a price no
    more than your reasonable cost of physically performing this
    conveying of source, or (2) access to copy the
    Corresponding Source from a network server at no charge.

    c) Convey individual copies of the object code with a copy of the
    written offer to provide the Corresponding Source.  This
    alternative is allowed only occasionally and noncommercially, and
    only if you received the object code with such an offer, in accord
    with subsection 6b.

    d) Convey the object code by offering access from a designated
    place (gratis or for a charge), and offer equivalent access to the
    Corresponding Source in the same way through the same place at no
    further charge.  You need not require recipients to copy the
    Corresponding Source along with the object code.  If the place to
    copy the object code is a network server, the Corresponding Source
    may be on a different server (operated by you or a third party)
    that supports equivalent copying facilities, provided you maintain
    clear directions next to the object code saying where to find the
    Corresponding Source.  Regardless of what server hosts the
    Corresponding Source, you remain obligated to ensure that it is
    available for as long as needed to satisfy these requirements.

    e) Convey the object code using peer-to-peer transmission, provided
    you inform other peers where the object code and Corresponding
    Source of the work are being offered to the general public at no
    charge under subsection 6d.

  A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.

  A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling.  In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage.  For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product.  A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.

  "Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source.  The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.

  If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information.  But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).

  The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed.  Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.

  Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.

  7. Additional Terms.

  "Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law.  If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.

  When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it.  (Additional permissions may be written to require their own
removal in certain cases when you modify the work.)  You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.

  Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:

    a) Disclaiming warranty or limiting liability differently from the
    terms of sections 15 and 16 of this License; or

    b) Requiring preservation of specified reasonable legal notices or
    author attributions in that material or in the Appropriate Legal
    Notices displayed by works containing it; or

    c) Prohibiting misrepresentation of the origin of that material, or
    requiring that modified versions of such material be marked in
    reasonable ways as different from the original version; or

    d) Limiting the use for publicity purposes of names of licensors or
    authors of the material; or

    e) Declining to grant rights under trademark law for use of some
    trade names, trademarks, or service marks; or

    f) Requiring indemnification of licensors and authors of that
    material by anyone who conveys the material (or modified versions of
    it) with contractual assumptions of liability to the recipient, for
    any liability that these contractual assumptions directly impose on
    those licensors and authors.

  All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10.  If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term.  If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.

  If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.

  Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.

  8. Termination.

  You may not propagate or modify a covered work except as expressly
provided under this License.  Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).

  However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.

  Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.

  Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License.  If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.

  9. Acceptance Not Required for Having Copies.

  You are not required to accept this License in order to receive or
run a copy of the Program.  Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance.  However,
nothing other than this License grants you permission to propagate or
modify any covered work.  These actions infringe copyright if you do
not accept this License.  Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.

  10. Automatic Licensing of Downstream Recipients.

  Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License.  You are not responsible
for enforcing compliance by third parties with this License.

  An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations.  If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.

  You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License.  For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.

  11. Patents.

  A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based.  The
work thus licensed is called the contributor's "contributor version".

  A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version.  For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.

  Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.

  In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement).  To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.

  If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients.  "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.

  If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.

  A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License.  You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.

  Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.

  12. No Surrender of Others' Freedom.

  If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License.  If you cannot convey a
covered work so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you may
not convey it at all.  For example, if you agree to terms that obligate you
to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.

  13. Use with the GNU Affero General Public License.

  Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU Affero General Public License into a single
combined work, and to convey the resulting work.  The terms of this
License will continue to apply to the part which is the covered work,
but the special requirements of the GNU Affero General Public License,
section 13, concerning interaction through a network will apply to the
combination as such.

  14. Revised Versions of this License.

  The Free Software Foundation may publish revised and/or new versions of
the GNU General Public License from time to time.  Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

  Each version is given a distinguishing version number.  If the
Program specifies that a certain numbered version of the GNU General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation.  If the Program does not specify a version number of the
GNU General Public License, you may choose any version ever published
by the Free Software Foundation.

  If the Program specifies that a proxy can decide which future
versions of the GNU General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.

  Later license versions may give you additional or different
permissions.  However, no additional obligations are imposed on any
author or copyright holder as a result of your choosing to follow a
later version.

  15. Disclaimer of Warranty.

  THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

  16. Limitation of Liability.

  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.

  17. Interpretation of Sections 15 and 16.

  If the disclaimer of warranty and limitation of liability provided
above cannot be given local legal effect according to their terms,
reviewing courts shall apply local law that most closely approximates
an absolute waiver of all civil liability in connection with the
Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.

                     END OF TERMS AND CONDITIONS

            How to Apply These Terms to Your New Programs

  If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.

  To do so, attach the following notices to the program.  It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

    {one line to give the program's name and a brief idea of what it does.}
    Copyright (C) {year}  {name of author}

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.

Also add information on how to contact you by electronic and paper mail.

  If the program does terminal interaction, make it output a short
notice like this when it starts in an interactive mode:

    {project}  Copyright (C) {year}  {fullname}
    This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
    This is free software, and you are welcome to redistribute it
    under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License.  Of course, your program's commands
might be different; for a GUI interface, you would use an "about box".

  You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU GPL, see
<http://www.gnu.org/licenses/>.

  The GNU General Public License does not permit incorporating your program
into proprietary programs.  If your program is a subroutine library, you
may consider it more useful to permit linking proprietary applications with
the library.  If this is what you want to do, use the GNU Lesser General
Public License instead of this License.  But first, please read
<http://www.gnu.org/philosophy/why-not-lgpl.html>.

================================================
FILE: README.md
================================================
<p align="center">
    <img src="https://i.loli.net/2020/10/20/SKOdFZpVYo4LvgT.png" alt="InfoSpider logo"/>
</p>

***

<p align="center">
    <a>
        <img alt="GitHub stars" src="https://img.shields.io/github/stars/kangvcar/infospider?style=social">
    </a>
    <a>
        <img src="https://img.shields.io/badge/python-v3-blue?style=flat-square" alt="UW2eVx.png" />
    </a>
    <a>
        <img src="https://img.shields.io/badge/platform-Windows-blue?style=flat-square" alt="UW2eVx.png" />
    </a>
    <a>
        <img src="https://img.shields.io/website?up_message=%E4%BD%BF%E7%94%A8%E6%96%87%E6%A1%A3&url=https%3A%2F%2Finfospider.vercel.app%2F" alt="UW2eVx.png" />
    </a>
    <a>
    <img alt="GitHub repo size" src="https://img.shields.io/github/repo-size/kangvcar/infospider?style=flat-square">
    </a>
    <a>
    <img alt="GitHub repo size" src="https://img.shields.io/badge/license-GPL-blue?style=flat-square">
    </a>
</p>
<p align="center">A magic toolbox to get back your personal information.</p>
<p align="center">👉⚡<a href="https://infospider.vercel.app/">Documentation</a> ⚡| <a href="https://www.bilibili.com/video/BV14f4y1R7oF/">Video Demo</a> | <a href="https://github.com/kangvcar/InfoSpider/blob/master/README_EN.md">English</a> | <a href="https://mbd.pub/o/bread/aZiTlJo=">Get the latest maintained version</a> | <a href="https://t.me/+b-Rdy7_9QuwyMGI1">Telegram group</a></p>

### 🗣️ Telegram group: [Join the group](https://t.me/+b-Rdy7_9QuwyMGI1)

👍 Recommended proxy IP service: RapidProxy

<a href="https://www.rapidproxy.io/?ref=Spider" target="_blank">
  <img src="https://pic1.imgdb.cn/item/69846f90013471eb7b528c3f.jpg" style="width:40%;" alt="RapidProxy">
</a>

👍 Recommended proxy IP service: BestProxy

<a href="https://bestproxy.com/?keyword=soocgr8r" target="_blank">
  <img src="https://pic1.imgdb.cn/item/688c621858cb8da5c8f7c873.png" style="width:40%;" alt="BestProxy">
</a>

👍 Recommended proxy IP service: ThorData

<a href="https://www.thordata.com/?ls=github&lk=infospider" target="_blank">
  <img src="https://pic1.imgdb.cn/item/6941236e0dd29e7e2247ce21.jpg" style="width:40%;" alt="Thordata">
</a>

### Developer Memoir
<details>
<summary>Click to expand👉 Developer Memoir</summary>

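#### Scene 1
Xiao Ming opens Chrome as usual and browses forums and Tieba. He accidentally clicks an ad on a page and lands on JD.com. Reaching to close the window, he notices something (**inner voice: Huh? How does JD know about the item I've been longing for lately? It's exactly what I need!**). Since the page is open anyway, he looks at the product details (**inner voice: not bad at all**) and decides to place an order.

#### Scene 2
Xiao Bai can't tear himself away from NetEase Cloud Music's daily recommended playlist (**inner voice: Wow! Why is the playlist full of my favorite styles? NetEase Cloud Music is great! It really gets me! I have to get a Black Vinyl membership!**). Then he browses Zhihu questions such as "How to XXX gracefully?", "What is it like to XXX?", and "What do you think of XXX?" (**inner voice: Huh? This is exactly the question I wanted to ask, and someone has already asked it! What? There are thousands of answers! Let me take a look!**)

#### Scene 3
Xiao Da keeps improving himself even at work, browsing tech communities such as Cnblogs, CSDN, OSChina, Jianshu, and Juejin. He finds the homepage recommendations excellent (**inner voice: these technical posts are great, and they show up without my having to search**). Opening his own blog homepage, he realizes he has been writing posts for three years and that his tech stack keeps growing (**inner voice: Why doesn't the blog backend offer a data analysis system? I want to see how many posts I've published over the years and when, which posts are popular, which technologies I've spent the most time on, and whether my creative peaks were in the evening or in the early hours. I wish the system gave me more data to guide my writing!**)

Seeing these scenes, you may marvel at how advancing technology has greatly improved the way we live.

But think a little deeper: every website you browse and every site you register on records your information and your footprints.

The unsettling truth is that your personal data is laid bare on the internet, and many companies use it to earn huge profits, for example by collecting and analyzing user data to push targeted ads and charge high advertising fees. Yet the people who produce the data get no share of the revenue from it.

#### Idea

Suppose there were a tool that could help you get your personal information back, aggregate the personal information scattered across different sites, analyze your personal data and give you suggestions, and visualize that data so you understand yourself better.

> Would you need such a tool? Would you like such a tool?

Based on the above, I started developing **[INFO-SPIDER](https://github.com/kangvcar/InfoSpider)** 👇👇👇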

</details>

### What is INFO-SPIDER

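INFO-SPIDER is a crawler toolbox that brings many data sources together. It aims to help users get their own data back safely and quickly; the code is open source and the process is transparent. It also provides data analysis, generating chart files from user data so that users can understand their information more intuitively and in more depth.
Currently supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments album generation, browser history, 12306, Cnblogs, CSDN blogs, OSChina blogs, and Jianshu.

For detailed instructions, see the [documentation](https://infospider.vercel.app) or the [video tutorial](https://www.bilibili.com/video/BV14f4y1R7oF/).

You can chat and learn with us on [![Gitter](https://badges.gitter.im/Info-Spider/community.svg)](https://gitter.im/Info-Spider/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge)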

### Features

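- Safe and reliable: the project is open source with concise code; all source is visible and everything runs locally.
- Easy to use: a GUI is provided; just click the data source you want and follow the prompts.
- Clear structure: all data sources are independent of each other and highly portable. **All crawler scripts live in the project's [Spiders](https://github.com/kangvcar/InfoSpider/tree/master/Spiders) folder**.
- Rich data sources: the project currently supports 24+ data sources, with more being added.
- Uniform data format: all crawled data is stored as JSON, which makes later analysis easier.
- Rich personal data: the project crawls as much of your personal data as possible; you can trim it during later processing as needed.
- Data analysis: visual analysis of personal data is provided, currently for only some data sources.
- Rich documentation: the project includes complete [documentation](https://infospider.vercel.app) and a [video tutorial](https://www.bilibili.com/video/BV14f4y1R7oF/).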

### Screenshot

![screenshot.png](https://i.loli.net/2020/10/26/4NJyMhrsGPwvxgd.png)

### QuickStart

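#### Install dependencies

1. Install [python3](https://www.python.org/downloads/) and the Chrome browser

2. Install the [driver](http://chromedriver.storage.googleapis.com/index.html) whose version matches your Chrome browser

3. Install the required libraries with `pip install -r requirements.txt`

> If you run into problems at this step, you can get the [no-install version of InfoSpider](https://mbd.pub/o/bread/aZiTlJo=)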
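
Step 2 asks for a driver that matches the installed Chrome version. A minimal sketch of that check, assuming both tools print a dotted version number in their `--version` output and that agreeing on the major version is what matters; the helper names and sample strings are illustrative and not part of InfoSpider:

```python
import re


def major_version(version_output):
    """Return the major version number found in a `--version` style string."""
    match = re.search(r"(\d+)\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError("no version number found in: %r" % version_output)
    return int(match.group(1))


def versions_match(chrome_output, driver_output):
    """True when Chrome and the driver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


# Sample strings in the style printed by the two `--version` commands.
print(versions_match("Google Chrome 86.0.4240.111", "ChromeDriver 86.0.4240.22"))
```

In practice you would paste in the text printed by your own Chrome and driver binaries instead of the sample strings.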

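#### Run the tool

1. Enter the tools directory

2. Run `python3 main.py`

3. In the window that opens, **click a data source button** and **choose where to save the data** when prompted

4. **Enter your username and password** in the browser that pops up; crawling starts automatically, and the browser closes by itself when crawling is finished.

5. In the chosen directory you can **view the downloaded data** (xxx.json) and the **data analysis charts** (xxx.html)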
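
Step 5 leaves the crawled data as JSON files in the folder you picked. A minimal sketch of reading them back for your own analysis, assuming the files are UTF-8 encoded JSON; `load_saved_data` and the demo folder are hypothetical and not part of InfoSpider:

```python
import json
import tempfile
from pathlib import Path


def load_saved_data(folder):
    """Map each *.json file name (without extension) to its parsed content."""
    data = {}
    for path in sorted(Path(folder).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            data[path.stem] = json.load(f)
    return data


# Demo with a throwaway folder standing in for the real save path.
with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "user_info.json").write_text('{"name": "demo"}', encoding="utf-8")
    print(load_saved_data(demo_dir))
```

Replace the temporary folder with the directory you selected in step 3 to load your real files.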

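### Paid service

> ***Limited availability...*** [Take a look](https://mbd.pub/o/bread/aZiTlJo=)

1. The latest maintained version of InfoSpider
2. More comprehensive personal data analysis
3. No dependencies to install; convenient and beginner-friendly
4. A prepackaged program that runs with a double-click
5. A step-by-step guide to packaging InfoSpider yourself
6. One-on-one technical support from the developer
7. ***Buyers get the upcoming 2.0 release for free***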


<p align="center">
<img src="https://i.loli.net/2020/10/20/IRbLzEmBv9Ktwp4.jpg" alt="wechat" height=20% width=20%/></br>
<a href="https://mbd.pub/o/bread/aZiTlJo="><b>Purchase link</b></a>
</p>

### Data sources
- [x] GitHub
- [x] QQ Mail
- [x] NetEase Mail
- [x] Alibaba Mail
- [x] Sina Mail
- [x] Hotmail
- [x] Outlook
- [x] JD.com
- [x] Taobao
- [x] Alipay
- [x] China Mobile
- [x] China Unicom
- [x] China Telecom
- [x] Zhihu
- [x] Bilibili
- [x] NetEase Cloud Music
- [x] QQ friends ([cjh0613](https://github.com/cjh0613/python-pub/blob/1c308fe90386f6d6e69e2202bb0c4acd4857576f/%E8%8E%B7%E5%8F%96QQ%E5%A5%BD%E5%8F%8B%E5%88%97%E8%A1%A8.py))
- [x] QQ groups ([cjh0613](https://github.com/cjh0613/python-pub/blob/1c308fe90386f6d6e69e2202bb0c4acd4857576f/%E8%8E%B7%E5%8F%96QQ%E5%A5%BD%E5%8F%8B%E5%88%97%E8%A1%A8.py))
- [x] WeChat Moments album generation
- [x] Browser history
- [x] 12306
- [x] Cnblogs
- [x] CSDN blogs
- [x] OSChina blogs
- [x] Jianshu

### Data analysis

- [x] Cnblogs
- [x] CSDN blogs
- [x] OSChina blogs
- [x] Jianshu

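### Plan

- Provide a web interface to support multiple platforms
- Run statistical analysis on the crawled personal data
- Apply machine learning and natural language processing for deeper analysis of the data
- Plot the analysis results as charts for an intuitive view
- Add more data sources...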

### Visitors

![](http://profile-counter.glitch.me/kangvcar/count.svg)

### Developers want to say

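1. This project addresses the pain point that personal data is scattered across many companies, often forming data silos in which multi-dimensional data cannot be combined.
2. The author believes the project's greatest potential lies in merging multi-dimensional data and analyzing personal data, so that people get the most value out of their own data.
3. The project obtains data by crawling, so it goes stale over time: it needs ongoing maintenance and must be adjusted whenever the target sites change.
4. The project has a clear structure: all data sources are independent of each other and highly portable. All crawler scripts are in the project's [Spiders](https://github.com/kangvcar/InfoSpider/tree/master/Spiders) folder and can be ported into your own programs.
5. The current v1.0 release has only been tested on Windows with Python 3.7 and has not been adapted to other platforms.
6. v2.0 is planned as a refactor that provides a web interface and data visualization in order to support multiple platforms.
7. The code of [INFO-SPIDER](https://github.com/kangvcar/InfoSpider) is open source; stars are welcome.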

### Contributors

<a href="https://github.com/kangvcar/infospider/graphs/contributors">
  <img src="https://contributors-img.web.app/image?repo=kangvcar/infospider" />
</a>

### License
GPL-3.0

### Star History

[![Star History Chart](https://api.star-history.com/svg?repos=kangvcar/InfoSpider&type=Date)](https://star-history.com/#kangvcar/InfoSpider&Date)



================================================
FILE: README_EN.md
================================================
<p align="center">
    <img src="https://i.loli.net/2020/10/20/SKOdFZpVYo4LvgT.png" alt="logo"/>
</p>

***

<p align="center">
    <a>
        <img alt="GitHub stars" src="https://img.shields.io/github/stars/kangvcar/infospider?style=social">
    </a>
    <a>
        <img src="https://img.shields.io/badge/python-v3-blue?style=flat-square" alt="UW2eVx.png" />
    </a>
    <a>
        <img src="https://img.shields.io/badge/platform-Windows-blue?style=flat-square" alt="UW2eVx.png" />
    </a>
    <a>
        <img src="https://img.shields.io/website?up_message=%E4%BD%BF%E7%94%A8%E6%96%87%E6%A1%A3&url=https%3A%2F%2Finfospider.vercel.app%2F" alt="UW2eVx.png" />
    </a>
    <a>
    <img alt="GitHub repo size" src="https://img.shields.io/github/repo-size/kangvcar/infospider?style=flat-square">
    </a>
    <a>
    <img alt="GitHub repo size" src="https://img.shields.io/badge/license-GPL-blue?style=flat-square">
    </a>
</p>
<p align="center">A magic toolbox to get back your personal information.</p>
<p align="center"><a href="https://infospider.vercel.app/">Documentation</a> | <a href="https://www.bilibili.com/video/BV14f4y1R7oF/">Video Demo</a></p>

### Donate

<p align="center">Support Me!</p>

[Paypal](https://paypal.me/kangvcar?locale.x=zh_XC)

### Developer Memoir🌈
<details>
<summary>Click to expand👉 Developer Memoir🌈</summary>

#### Scene 1

As usual, Xiao Ming opened Chrome to browse forums and Tieba. He accidentally clicked an ad on a page and landed on JD.com. As he reached to close the window, he noticed something (inner voice: Huh? How does JD know about the item I've been wanting lately? It's exactly what I need!). Since the page was already open, he looked at the product details (inner voice: not bad at all) and decided to place an order.

#### Scene 2

Xiao Bai could not tear himself away from NetEase Cloud Music's daily recommended playlist (inner voice: Wow! Why is the playlist full of my favorite music styles? NetEase Cloud Music is great! I have to buy a membership!). Then he browsed Zhihu questions such as "How to XXX gracefully?", "What is it like to XXX?", and "What do you think of XXX?" (inner voice: Huh? This is exactly the question I wanted to ask, and someone has already asked it! What? Thousands of answers! Let me take a look!)

#### Scene 3

Xiao Da never forgets to improve himself at work. Browsing the major tech communities such as Cnblogs, CSDN, OSChina, Jianshu, and Juejin, he finds the homepage recommendations excellent (inner voice: these technical posts are great, and they show up without my having to search). When he opens his own blog homepage, he realizes he has kept writing posts for three years and that his tech stack keeps growing (inner voice: Why doesn't the blog backend provide a data analysis system? I want to see how many posts I've published over the years and when, which posts are popular, which technologies I've spent the most time on, and whether my creative peaks were in the evening or in the early hours. I wish the system gave me more data to guide my writing!)

Looking at these scenes, you may marvel at how advancing technology has greatly improved the way we live.

#### Idea

Suppose there were a tool that could help you get your personal information back, aggregate the personal information scattered across various sites, analyze your personal data and give you suggestions, and visualize that data so that you know yourself better.

> Would you need such a tool? Would you like such a tool?

Based on the above, I started to develop **[INFO-SPIDER](https://github.com/kangvcar/InfoSpider)** 👇👇👇

</details>

### What is INFO-SPIDER

INFO-SPIDER is a crawler toolbox that brings many data sources together. It aims to help users get their data back safely and quickly. The code is open source and the process is transparent.
It also provides data analysis, generating chart files from user data so that users can understand their own information more intuitively and in more depth.
Currently supported data sources include GitHub, QQ Mail, NetEase Mail, Alibaba Mail, Sina Mail, Hotmail, Outlook, JD.com, Taobao, Alipay, China Mobile, China Unicom, China Telecom, Zhihu, Bilibili, NetEase Cloud Music, QQ friends, QQ groups, WeChat Moments album, browser history, 12306, Cnblogs, CSDN, OSChina, and Jianshu.

Refer to the [documentation](https://infospider.vercel.app) or the [video demo](https://www.bilibili.com/video/BV14f4y1R7oF/) for details.

You can communicate with us on [![Gitter](https://badges.gitter.im/Info-Spider/community.svg)](https://gitter.im/Info-Spider/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge)

### Features
- Safe and Reliable: the project is open source, all source code is visible, and everything runs locally.
- Easy to Use: a GUI is provided; just click the data source you want and follow the prompts.
- Clear Structure: all data sources are independent of each other and highly portable. All crawler scripts are under the [Spiders](https://github.com/kangvcar/InfoSpider/tree/master/Spiders) directory.
- Rich Data Sources: the project currently supports 24+ data sources, with more being added.
- Uniform Data Format: all crawled data is stored in JSON format.
- Rich Personal Data: the project crawls as much of your personal data as possible; you can trim it during later processing as needed.
- Data Analysis: the project provides visual analysis of personal data, currently for only some data sources.
- Documentation: the project includes complete [documentation](https://infospider.vercel.app) and a [video demo](https://www.bilibili.com/video/BV14f4y1R7oF/).
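
The kind of question the data analysis feature is meant to answer (for example, at what hour do I usually publish?) can be sketched on top of the JSON output. This is only an illustration: the `publish_time` field name and its timestamp format are assumptions, not the actual schema written by the spiders.

```python
from collections import Counter
from datetime import datetime


def posts_per_hour(posts, time_key="publish_time", fmt="%Y-%m-%d %H:%M"):
    """Count posts by the hour of day (0-23) in which they were published."""
    hours = Counter()
    for post in posts:
        hours[datetime.strptime(post[time_key], fmt).hour] += 1
    return hours


# Hypothetical records standing in for crawled blog posts.
sample_posts = [
    {"title": "A", "publish_time": "2020-10-01 23:15"},
    {"title": "B", "publish_time": "2020-10-02 23:40"},
    {"title": "C", "publish_time": "2020-10-03 08:05"},
]
print(posts_per_hour(sample_posts).most_common(1))  # [(23, 2)]
```

Adjust `time_key` and `fmt` to whatever fields your own exported files actually contain.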

### Screenshot

![screenshot.png](https://i.loli.net/2020/10/26/4NJyMhrsGPwvxgd.png)

### QuickStart

#### Requirements
- Step1: Install python3 and Chrome
- Step2: Install the driver whose version matches your Chrome browser
- Step3: Run `pip install -r requirements.txt`

#### Run the project
- Step1: `cd tools`
- Step2: `python3 main.py`
- Step3: In the window that opens, click a data source button and choose where to save the data when prompted
- Step4: Enter your username and password in the browser that pops up; crawling starts automatically, and the browser closes by itself when crawling is finished.
- Step5: In the chosen directory you can view the downloaded data (xxx.json) and the data analysis charts (xxx.html)
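
Some scripts under `Spiders`, such as `Info` in `Spiders/A12306/main12306.py`, take a raw cookie string and split it into name/value pairs before making requests. A minimal, standard-library sketch of that parsing step; `parse_cookie_header` is an illustrative name, not a function from the project:

```python
def parse_cookie_header(cookie):
    """Turn a raw 'k1=v1; k2=v2' cookie string into a dict."""
    cookies = {}
    for pair in cookie.split(";"):
        # partition() splits on the first '=' only, so values may contain '='
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


print(parse_cookie_header("JSESSIONID=abc123; tk=x=y; malformed"))
# {'JSESSIONID': 'abc123', 'tk': 'x=y'}
```

Entries without an `=` are skipped here; whether to keep or drop them is a design choice.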

### Plan
- Provide a web interface to support multiple platforms
- Run statistical analysis on the crawled personal data
- Apply machine learning and natural language processing to analyze the data in depth
- Plot the analysis results as charts
- Add more data sources...

### Visitors

![](http://profile-counter.glitch.me/kangvcar/count.svg)

### Contributors

<a href="https://github.com/kangvcar/infospider/graphs/contributors">
  <img src="https://contributors-img.web.app/image?repo=kangvcar/infospider" />
</a>

### Sponsors

![](https://github.com/kangvcar/InfoSpider/blob/master/docs/_media/JetBrains.png?raw=true)

Thanks to JetBrains for providing an open source license for PyCharm!

### License

GPL-3.0

================================================
FILE: Spiders/A12306/main12306.py
================================================
import json
import datetime
import os
import sys
import requests
from tkinter.filedialog import askdirectory

# session = requests.session()
# cookie_dict = {
#     'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
#     'Cookie': 'JSESSIONID=4C46731EE8AC434BD50749C80CFCF67F; tk=dwjpjiuCb4bW06qo4oJMr5MKOw-4IRX_s1zBI-KyKSUq7-Ialm1110; RAIL_EXPIRATION=1554990611687; RAIL_DEVICEID=DNgKpgqbWzSopAqvFXfNXT3opSKTcwfTxGEIB_s60TyBtq6xHTqC1XAjQUm57eeWNoksjoHBbHDLx5HTeC_5lomXnDhs5MQ0Sv8XOOrSe2TBpQo4nlBQTR9GXc286CHhhprU0rQccB5BQ9kL5O4bfEcJADAKZq52; BIGipServerpassport=786956554.50215.0000; route=6f50b51faa11b987e576cdb301e545c4; BIGipServerotn=3973513482.24610.0000'
#
# }
# requests.utils.add_dict_to_cookiejar(session.cookies, cookie_dict)
# resp = session.post('https://kyfw.12306.cn/otn/index/initMy12306Api')


class Info(object):
    def __init__(self, cookie):
        self.path = askdirectory(title='选择信息保存文件夹')
        if str(self.path) == "":
            sys.exit(1)
        self.session = requests.session()
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
        }
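        # Parse the raw "k1=v1; k2=v2" cookie string into a dict.
        cookie_dict = {}
        for pair in cookie.split(';'):
            # partition() splits on the first '=' only, so values may contain '='
            key, sep, value = pair.strip().partition('=')
            if sep:
                cookie_dict[key] = value
        requests.utils.add_dict_to_cookiejar(self.session.cookies, cookie_dict)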

    # User profile, in JSON format
    def get_user_info(self):
        url = 'https://kyfw.12306.cn/otn/modifyUser/initQueryUserInfoApi'
        resp = self.session.get(url)
        json_data = json.loads(resp.content.decode())
        self.save_json('user_info.json', resp.content.decode())
        return 0

    # Unfinished orders https://kyfw.12306.cn/otn/queryOrder/queryMyOrderNoComplete
    def get_OrderNoComplete(self):
        url = 'https://kyfw.12306.cn/otn/queryOrder/queryMyOrderNoComplete'
        data = '_json_att='
        resp = self.session.post(url, data=data, verify=False)
        json_data = json.loads(resp.content.decode())
        self.save_json('user_order_no_complete.json', resp.content.decode())
        return 0

    # Orders not yet traveled https://kyfw.12306.cn/otn/queryOrder/queryMyOrder
    def get_Order(self):
        url = 'https://kyfw.12306.cn/otn/queryOrder/queryMyOrder'
        from dateutil.relativedelta import relativedelta

        # Query window: from one month ago up to today (adjust as needed)
        queryStartDate = (datetime.date.today() - relativedelta(months=+1)).strftime("%Y-%m-%d")
        queryEndDate = datetime.datetime.now().strftime("%Y-%m-%d")
        data = {'come_from_flag': 'my_order',
                'pageIndex': 0,
                'pageSize': 8,
                'query_where': 'G',
                'queryStartDate': queryStartDate,
                'queryEndDate': queryEndDate,
                'queryType': 1,
                'sequeue_train_name': ''}
        resp = self.session.post(url, data=data)
        json_data = json.loads(resp.content.decode())
        self.save_json('user_order.json', resp.content.decode())
        return 0

    # Contacts (passengers)  https://kyfw.12306.cn/otn/passengers/query
    def get_passengers(self):
        url = 'https://kyfw.12306.cn/otn/passengers/query'
        data = {'pageIndex': 1,
                'pageSize': 10}
        resp = self.session.post(url, data=data)
        json_data = json.loads(resp.content.decode())
        self.save_json('user_passengers.json', resp.content.decode())
        return 0

    # Ticket delivery addresses  https://kyfw.12306.cn/otn/address/initApi
    def get_address(self):
        url = 'https://kyfw.12306.cn/otn/address/initApi'
        data = None
        resp = self.session.post(url, data=data)
        json_data = json.loads(resp.content.decode())
        self.save_json('user_address.json', resp.content.decode())
        return 0

    # Insurance orders  https://kyfw.12306.cn/otn/insurance/queryMyIns
    def get_insurance(self):
        url = 'https://kyfw.12306.cn/otn/insurance/queryMyIns'
        from dateutil.relativedelta import relativedelta
        # Query window: from one month ago up to today (adjust as needed)
        queryStartDate = (datetime.date.today() - relativedelta(months=+1)).strftime("%Y-%m-%d")
        queryEndDate = datetime.datetime.now().strftime("%Y-%m-%d")
        # The form fields are built from the computed dates
        data = {'come_from_flag': 'my_ins',
                'pageIndex': 0,
                'pageSize': 8,
                'query_where': 'H',
                'queryStartDate': queryStartDate,
                'queryEndDate': queryEndDate,
                'queryType': 1,
                'sequeue_train_name': ''}
        resp = self.session.post(url, data=data)
        self.save_json('user_insurance.json', resp.content.decode())
        return 0

    # Historical orders https://kyfw.12306.cn/otn/queryOrder/queryMyOrder
    def get_History_Order(self):
        url = 'https://kyfw.12306.cn/otn/queryOrder/queryMyOrder'
        from dateutil.relativedelta import relativedelta

        # Send the order page as the Referer for this request
        self.headers.update({'Referer': 'https://kyfw.12306.cn/otn/view/train_order.html'})
        # Query window: from one month ago up to today (adjust as needed)
        queryStartDate = (datetime.date.today() - relativedelta(months=+1)).strftime("%Y-%m-%d")
        queryEndDate = datetime.datetime.now().strftime("%Y-%m-%d")

        # The form fields are built from the computed dates
        data = {'come_from_flag': 'my_order',
                'pageIndex': 0,
                'pageSize': 8,
                'query_where': 'H',
                'queryStartDate': queryStartDate,
                'queryEndDate': queryEndDate,
                'queryType': 1,
                'sequeue_train_name': ''}
        resp = self.session.post(url, headers=self.headers, data=data, verify=False)
        self.save_json('user_history_order.json', resp.content.decode())
        return 0

    # Membership info
    def get_level(self):
        url = 'https://cx.12306.cn/tlcx/memberInfo/memberPointQuery'
        data = 'queryType=0'
        resp = self.session.post(url, data=data)
        self.save_json('user_level.json', resp.content.decode())
        return 0

    def save_json(self, name, ret):
        # file_path = os.path.join(os.path.dirname(__file__) + '/' + name)
        with open(self.path + os.sep + name, 'w', encoding='utf-8') as f:
            f.write(ret)


if __name__ == '__main__':
    pass
    # a = Info()
    # user = a.get_user_info()
    # a.save_json('user.json', user)
    # OrderNoComplete = a.get_OrderNoComplete()
    # a.save_json('OrderNoComplete.json',OrderNoComplete)
    # Order = a.get_Order()
    # a.save_json('Order.json',Order)
    # passengers = a.get_passengers()
    # a.save_json('passengers.json',passengers)
    # address = a.get_address()
    # a.save_json('address.json',address)
    # insurance = a.get_insurance()
    # a.save_json('insurance.json',insurance)
    # History_Order = a.get_History_Order()
    # a.save_json('History_Order.json',History_Order)
    #
    # # switch to JSON
    # level = a.get_level()
    # a.save_json('level.json',level)


================================================
FILE: Spiders/JdSpider/jd_more_info.py
================================================
# coding: utf8
import json
import os
import re
import sys
import requests
from lxml import etree
import datetime
import bs4
from pprint import pprint
from tkinter.filedialog import askdirectory
from tqdm import tqdm
from tqdm import trange

class JSpider(object):
    def __init__(self, cookie, data_dir="./"):
        # self.data_dir = data_dir
        self.data_dir = askdirectory(title='选择信息保存文件夹')
        if str(self.data_dir) == "":
            sys.exit(1)
        self.session = requests.session()
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
        }
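        # Parse the raw "k1=v1; k2=v2" cookie string into a dict.
        cookie_dict = {}
        for pair in cookie.split(';'):
            # partition() splits on the first '=' only, so values may contain '='
            key, sep, value = pair.strip().partition('=')
            if sep:
                cookie_dict[key] = value
            else:
                # no '=' present: keep the bare token under an empty name
                cookie_dict[''] = pair
        requests.utils.add_dict_to_cookiejar(self.session.cookies, cookie_dict)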

    # User profile
    def get_user_info(self):
        url = 'https://wq.jd.com/user/info/QueryXBCreditScore?_=1556338705353&sceneval=2&g_login_type=1&callback=getCreditInfoCb&g_tk=2127038752&g_ty=ls'
        self.headers['Referer'] = 'https://wqs.jd.com/my/asset.html'
        # Strip the JSONP wrapper try{getCreditInfoCb(...);}catch(e){} to get plain JSON
        resp = self.session.get(url, headers=self.headers).content.decode().replace('try{getCreditInfoCb(', '').replace(
            ');}catch(e){}', '')

        json_data = json.loads(resp)['data']
        url = 'https://api.m.jd.com/api?appid=pc_home_page&functionId=getBaseUserInfo&loginType=3'
        resp = json.loads(self.session.get(url, headers=self.headers).content.decode())['returnObj']
        json_data.append(resp)
        payload = json.dumps(json_data)
        self.write_json('user_info.json', payload)

    def write_json(self, name, text):
        # Make sure the target folder exists, then write the file as UTF-8
        os.makedirs(self.data_dir, exist_ok=True)
        file_path = os.path.join(self.data_dir, name)
        with open(file_path, 'w', encoding='utf-8') as f:
            f.write(text)

    # Credit bill
    def get_creditData(self):
        url = 'https://trade.jr.jd.com/async/creditData.action'
        resp = self.session.get(url, headers=self.headers)
        self.write_json('creditData.json', resp.content.decode())

    # Wallet overview
    def get_browseDataNew(self):
        url = 'https://trade.jr.jd.com/async/browseDataNew.action'
        resp = self.session.get(url)
        self.write_json('wallet.json', resp.content.decode())

    # Income statement
    def get_income(self):
        url = 'https://trade.jr.jd.com/centre/getOverviewInData.action'
        resp = self.session.get(url)
        self.write_json('income.json', resp.content.decode())

    # Addresses
    def get_addr(self):
        url = 'https://easybuy.jd.com/address/getEasyBuyList.action'
        headers = {
            'Host': 'easybuy.jd.com',
            'Connection': 'keep-alive',
            'Cache-Control': 'max-age=0',
            'Upgrade-Insecure-Requests': '1',
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
            'Referer': 'https://home.jd.com/',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8'
        }
        resp = self.session.get(url, headers=headers)
        obj_list = etree.HTML(resp.content.decode()).xpath('//div[@class="item-lcol"]')
        json_list = []
        for obj in obj_list:
            item = {}
            item['name'] = obj.xpath('./div[1]/div[1]/text()')[0].strip()
            item['addr'] = obj.xpath('./div[2]/div[1]/text()')[0].strip()
            item['detail_addr'] = obj.xpath('./div[3]/div[1]/text()')[0].strip()
            item['mobile'] = obj.xpath('./div[4]/div[1]/text()')[0].strip()
            item['tel'] = obj.xpath('./div[5]/div[1]/text()')[0].strip()
            item['email'] = obj.xpath('./div[6]/div[1]/text()')[0].strip()
            json_list.append(item)
        self.write_json('addr.json', json.dumps(json_list))

    # Bank card
    def get_YHK(self):
        url = 'https://authpay.jd.com/card/queryBindCard.action'
        resp = self.session.get(url, headers=self.headers)
        # Parse the page once and reuse the tree for every field.
        ele = etree.HTML(resp.content.decode())
        item = {}
        item['name'] = ele.xpath('//span[contains(text(),"持卡人姓名")]/text()')[0].replace(
            '持卡人姓名：', '')
        item['mobile'] = ele.xpath('//span[contains(text(),"手机号：")]/text()')[0].replace(
            '手机号：', '')
        item['last_num'] = ele.xpath('//span[contains(text(),"尾号")]/text()')[0].replace(
            '尾号', '')
        self.write_json('YHK_info.json', json.dumps(item))

    # Xiaojinku account statement
    def get_xjk_info(self):
        url = 'https://xjk.jr.jd.com/gold/account'
        resp = self.session.get(url)
        self.write_json('xjk.json', resp.content.decode())

    # Wealth-management income
    def get_finance_income(self):
        url = 'https://trade.jr.jd.com/ajaxFinance/queryFundInfo.action'
        resp = self.session.get(url)
        self.write_json('finance_income.json', resp.content.decode())

    # Gangbeng (JD coin) balance
    def get_GB_num(self):
        url = 'https://gb.jd.com/asset/myassets.html?from=myzc-left-gb'
        resp = self.session.get(url)
        num = etree.HTML(resp.content.decode()).xpath('//em[@class="h-i-num"]/text()')[0]
        str = json.dumps({'gb_num': num})
        self.write_json('GB_num.json', str)

    # JD Finance transaction list
    def get_JY_bill(self):
        # Multiple pages are available; set pageNo to choose the page.
        url = 'https://trade.jr.jd.com/trade/tradebuynew.action?pageNo=0&pageSize=10&timeFlag=0&projectType=0&orderStatus=-1&date1=&date2='
        resp = self.session.get(url, headers=self.headers)
        self.write_json('jiaoyi_bill.json', resp.content.decode())

    # Followed shops
    def get_follow_shops(self):
        url = 'https://t.jd.com/follow/vender'
        resp = self.session.get(url)
        ele = etree.HTML(resp.content.decode())
        obj_list = ele.xpath('//div[@class="mf-shop-list "]/div')
        json_list = []
        for obj in obj_list:
            item = {}
            item['name'] = ''.join(obj.xpath('.//div[@class="shop-name"]//text()')).strip()
            item['url'] = obj.xpath('.//div[@class="shop-name"]/a/@href')[0]
            json_list.append(item)
        str = json.dumps(json_list)
        self.write_json('follow_shops.json', str)

    # Followed products
    def get_follow_products(self):
        url = 'https://t.jd.com/follow/product'
        resp = self.session.get(url)
        ele = etree.HTML(resp.content.decode())
        obj_list = ele.xpath('//div[@class="mf-goods-list clearfix "]/div')
        json_list = []
        for obj in obj_list:
            item = {}
            item['name'] = ''.join(obj.xpath('.//div[@class="p-name"]//text()')).strip()
            item['url'] = obj.xpath('.//div[@class="p-name"]/a/@href')[0]
            item['price'] = ''.join(obj.xpath('.//div[@class="p-price"]/strong/@price')).strip()
            item['status'] = ''.join(obj.xpath('.//div[@class="p-stats"]//text()')).strip()
            json_list.append(item)
        str = json.dumps(json_list)
        self.write_json('follow_products.json', str)

    # Shopping cart
    def get_cart(self):
        url = 'https://cart.jd.com/cart.action'
        resp = self.session.get(url)
        ele = etree.HTML(resp.content.decode())
        obj_list = ele.xpath('//div[@class="item-form"]')
        json_list = []
        for obj in obj_list:
            item = {}
            item['name'] = ''.join(obj.xpath('.//div[@class="p-name"]//text()')).strip()
            item['skus'] = ''.join(obj.xpath('.//div[@class="cell p-props p-props-new"]//text()')).strip()
            item['url'] = obj.xpath('.//div[@class="p-name"]/a/@href')[0]
            item['price'] = ''.join(obj.xpath('.//div[@class="cell p-sum"]//text()')).strip()
            item['num'] = ''.join(obj.xpath('.//div[@class="cell p-quantity"]//input/@value')).strip()
            json_list.append(item)
        str = json.dumps(json_list)
        self.write_json('carts.json', str)

    # Purchased items
    def get_orders(self):
        for year in [2020, 2019, 2018]:
            url = 'https://order.jd.com/center/list.action?search=0&d={}&s=4096'.format(year)
            # Decode the list page once and reuse it for both XPath and regex extraction.
            html = self.session.get(url).content.decode('gbk')
            ele = etree.HTML(html)
            obj_list = ele.xpath('//table[@class="td-void order-tb"]/tbody')[1:]
            url = 'https://order.jd.com/lazy/getOrderProductInfo.action'
            try:
                # The page embeds each request parameter as ORDER_CONFIG['<key>']='<ids>'.
                data = {
                    key: re.findall(r"ORDER_CONFIG\['%s'\]='([\d,]+)'" % key, html)[0]
                    for key in ('orderWareIds', 'orderWareTypes', 'orderIds',
                                'orderTypes', 'orderSiteIds', 'sendPays')
                }
            except IndexError:
                # No order parameters found for this year; try the next year
                # instead of aborting the whole export.
                continue
            json_list = json.loads(self.session.post(url, data=data).content.decode('gbk'))
            ret_list = []
            for idx, obj in enumerate(obj_list):
                try:
                    item = json_list[idx]
                    item['goods-number'] = ''.join(obj.xpath('.//div[@class="goods-number"]//text()')).strip()
                    item['consignee tooltip'] = ''.join(obj.xpath('.//div[@class="consignee tooltip"]/text()')).strip()
                    item['amount'] = ''.join(obj.xpath('.//div[@class="amount"]//text()')).strip()
                    item['order-shop'] = ''.join(obj.xpath('.//span[@class="order-shop"]//text()')).strip()
                    ret_list.append(item)
                except Exception:
                    continue

            self.write_json('jd_orders_{}.json'.format(year), json.dumps(ret_list))

        # dict = {
        #     "orderType": 37,
        #     "erpOrderId": "%s"
        # }
        #
        # obj_list = [{"orderType": 37, "erpOrderId": "%s" % (i.replace('tb-', ''))} for i in
        #             ele.xpath('//table[@class="td-void order-tb"]/tbody/@id')]
        #
        # url = 'https://ordergw.jd.com/orderCenter/app/1.0/?queryList={}'.format(json.dumps(obj_list))
        # result_resp = self.session.get(url, headers=self.headers).content.decode().replace('null(', '')[:-1]


    #############################################################################

    def getAndStoreBoughtItems(self):
        orderParseResults = []
        currentYear = datetime.datetime.now().year
        # print(currentYear)
        for year in trange(currentYear, 2013-1, -1):
            paramYear = year
            if year == currentYear:
                paramYear = 2
            elif year < 2014:
                paramYear = 3
            # print("paramYear: ", paramYear)
            r = self.getOnePageOrder(paramYear)
            orderParseResults.extend(r)

        datatable = self.changeOrderParseResultListToTable(orderParseResults)
        self.writeDatatableIntoFile("allOrders.csv", datatable)

    def getOnePageOrder(self, year):
        orderParseResults = []
        curPage = 1
        while True:
            # print("curPage: ", curPage)
            params = {"search":"0", "d":str(year), "s":"4096", "page":str(curPage)}
            resp = self.session.get(url="https://order.jd.com/center/list.action", params=params, headers=self.headers)
            #print(resp.request.url)
            #print(resp.text)
            resultHtml = resp.text
            
            r = self.parseOnePageOrder(resultHtml)
            orderParseResults.extend(r)

            if '<span class="next-disabled">下一页<b></b></span>' in resultHtml or '最近没有下过订单哦~' in resultHtml:
                # print("breakout")
                break

            curPage += 1
        return orderParseResults

    def parseOnePageOrder(self, resultHtml):
        soup = bs4.BeautifulSoup(resultHtml, 'html.parser')
        linkItems = soup.find_all(attrs={"name":"orderIdLinks"})
        #print(linkItems)
        
        urls = []
        for e in linkItems:
            url = e['href']
            # print(url)
            if url.startswith("http://"):
                url = "https" + url[4:]
            elif url.startswith("//"):
                url = "https:" + url
            elif url == "javascript:void(0);":
                continue
            elif not url.startswith("https://"):
                raise Exception("Unsupported url: " + url)
            urls.append(url)
        #print(urls)

        orderParseResults = []
        for url in urls:
            # print(url)
            if url.startswith("https://details.jd.com/normal/item.action"):
                orderParseResult = self.getOrderOfNormal(url)
            elif url.startswith("https://chongzhi.jd.com/order/order_autoDetail.action"):
                orderParseResult = self.getOrderOfChongzhi(url)
            #elif url.startswith("https://home.jd.hk/"):
            #    # TODO
            #    pass
            #elif url.startswith("https://huishou.jd.com/orderdetail"):
            #    pass
            else:
                continue
                # raise Exception("TODO: unknown jd order detail url: " + url)
            orderParseResults.append(orderParseResult)

        return orderParseResults

    def getOrderOfNormal(self, orderDetailUrl, resultHtml=None):
        if resultHtml is None:
            resp = self.session.get(url=orderDetailUrl, headers=self.headers)
            resultHtml = resp.text
        #print(resultHtml)
        soup = bs4.BeautifulSoup(resultHtml, 'html.parser')

        orderId = soup.find(id="orderid")["value"]
        # print(orderId)
        
        goodsTotal = soup.find(class_="goods-total")
        #print(goodsTotal)
        #print(goodsTotal.ul)
        goodsTotalAmount = 0
        orderTotalAmount = 0
        for li in goodsTotal.ul.children:
            if not isinstance(li, bs4.element.Tag):
                continue
            #TODO:check the result of find_all
            label, txt = li.find_all("span")[:2]
            #print(label.string, txt.string)
            if "商品总额" in str(label.string):
                goodsTotalAmount = float(txt.string.strip().replace("¥", ""))
            elif "应付总额" in str(label.string):
                orderTotalAmount = float(txt.string.strip().replace("¥", ""))
        # print(goodsTotalAmount, orderTotalAmount)

        statusString = ""
        paymentTime = 0
        eleStatus = soup.find(class_="state-txt")
        statusString = eleStatus.string
        # print(statusString)
        payInfo = soup.find(id="pay-info-nozero")
        #print(payInfo)
        if payInfo:
            divs = payInfo.find_all(class_="item")
            # print(divs)
            for div in divs:
                label = div.find(class_="label").string.strip()
                if "付款时间" in label:
                    paymentTime = div.find(class_="info-rcol").string.strip()
                    break
        # print(paymentTime)

        orderItemParseResults = []

        goodsList = soup.find(class_="goods-list")
        goodsListTableBody = goodsList.table.tbody
        #print(goodsListTableBody)
        trs = goodsListTableBody.find_all("tr")
        #print(trs)
        for tr in trs:
            if tr.has_attr("style") and tr["style"] == "display: none":
                break
            tds = tr.find_all("td")
            #print(tds)
            productId = tds[2].string.strip()
            productName = tds[1].find(class_="p-name").a.string.strip()
            # print(productId, productName)

            orderItemParseResult = {}
            orderItemParseResult["productId"] = productId
            orderItemParseResult["productName"] = productName
            # TODO: more info
            orderItemParseResults.append(orderItemParseResult)

        orderParseResult = {}
        orderParseResult["orderId"] = orderId
        orderParseResult["paymentTime"] = paymentTime
        orderParseResult["status"] = statusString
        #orderParseResult["goodsTotalAmount"] = goodsTotalAmount
        orderParseResult["orderTotalAmount"] = orderTotalAmount
        orderParseResult["orderItems"] = orderItemParseResults
        # TODO: more info

        # pprint(orderParseResult)
        return orderParseResult

    def getOrderOfChongzhi(self, orderDetailUrl, resultHtml=None):
        if resultHtml is None:
            resp = self.session.get(url=orderDetailUrl, headers=self.headers)
            resultHtml = resp.text
        #print(resultHtml)
        soup = bs4.BeautifulSoup(resultHtml, 'html.parser')

        orderStateDiv = soup.find(id="orderstate")
        #print(orderStateDiv)
        orderId = orderStateDiv.find(class_="fl").string.replace("："," ").split()[1]
        # print(orderId)
        statusString = orderStateDiv.find(class_="ftx-02").string
        # print(statusString)

        totalAmountDiv = soup.find(class_="total")
        #print(totalAmountDiv)
        orderTotalAmount = float(totalAmountDiv.find(class_="ftx-01").b.string.strip())
        # print(orderTotalAmount)

        orderInfoDiv = soup.find(id="orderinfo")
        orderInfoUl = orderInfoDiv.find(class_="fore").dd.ul
        #print(orderInfoUl)
        paymentTime = 0
        for li in orderInfoUl.find_all("li"):
            if not li.string:
                break
            text = li.string.strip()
            if "下单时间" in text:
                paymentTime = text.replace("下单时间：", "")
                break
        # print(paymentTime)

        orderItemParseResults = []
        trs = orderInfoDiv.find(class_="p-list").table.tbody.find_all("tr")
        #print(trs)
        for i in range(1, len(trs)):
            tr = trs[i]
            tds = tr.find_all("td")
            #print(tds)
            
            productId = tds[0].string.strip()
            productName = tds[1].div.a.string.strip()
            
            orderItemParseResult = {}
            orderItemParseResult["productId"] = productId
            orderItemParseResult["productName"] = productName
            # TODO: more info
            orderItemParseResults.append(orderItemParseResult)

        orderParseResult = {}
        orderParseResult["orderId"] = orderId
        orderParseResult["paymentTime"] = paymentTime
        orderParseResult["status"] = statusString
        orderParseResult["orderTotalAmount"] = orderTotalAmount
        orderParseResult["orderItems"] = orderItemParseResults
        # TODO: more info

        # pprint(orderParseResult)
        return orderParseResult

    def changeOrderParseResultListToTable(self, orderParseResultList):
        datatable = []
        datatable.append(["订单id", "付款时间", "状态", "金额", "商品id", "商品名称"])
        for orderParseResult in orderParseResultList:
            orderItemList = orderParseResult["orderItems"]
            for i, orderItem in enumerate(orderItemList):
                if i == 0:
                    datatable.append([
                        orderParseResult["orderId"],
                        orderParseResult["paymentTime"],
                        orderParseResult["status"],
                        orderParseResult["orderTotalAmount"],
                        orderItem["productId"],
                        orderItem["productName"]
                    ])
                else:
                    datatable.append([
                        "",
                        "",
                        "",
                        "",
                        orderItem["productId"],
                        orderItem["productName"]
                    ])
        return datatable
    
    def writeDatatableIntoFile(self, filename, datatable):

        # file_dic = os.path.join(self.data_dir, "./jd/")
        file_dic = os.path.join(self.data_dir)
        if not os.path.exists(file_dic):
            os.mkdir(file_dic)
        # Pass directory and file name as separate components so a path separator is inserted.
        with open(os.path.join(file_dic, filename), "w+") as f:
            for row in datatable:
                rowdata = ",".join([str(tmp).replace(",","") for tmp in row])
                f.write(rowdata+"\n")

    #############################################################################
        

if __name__ == '__main__':
    cookie = 'shshshfpa=7da97119-ca02-a4b5-f883-70bffbb95d2d-1551953689; shshshfpb=oMWkS2uhzSRpZkjikQcMliQ%3D%3D; PCSYCityID=904; areaId=12; ipLoc-djd=12-965-967-38496; user-key=c2b0ef35-0281-4be2-bcf4-4148cb7f518c; sid=509cf3bcff856462fbf0d27defc956e5; __jdu=1551953690075275239163; cn=2; mt_xid=V2_52007VwMWVl1QVlgYQRhdA2MAFFZeWlpaGEspCAFjA0ZUXVBODx4cG0AAMlRFTlVQAA0DHB8MUDJQEAdcXgdTL0oYXA17AhdOXlBDWRxCHV0OZQUiUm1YYlgYTRFeAGYDGmJfXFNf; TrackID=15Z7AWYzc3eAvMsE9Og3XV9Vqvs8o3ajKGACHAWtDTYe7ivuULPCSkMo9tWJS9lHXwCZ8LdfUDyXQ3mWG_bQVpmsyOSMbH2Pp27E7FkLg140GREw3XL6HXu6C6fcCcMXL; __jda=122270672.1551953690075275239163.1551953690.1556614926.1557023684.18; __jdc=122270672; __jdv=122270672|baidu|-|organic|not set|1557023684472; shshshfp=4e62136f3f0d1f33c8191339402dd3a2; thor=BF23B370DD491EEE15CB4D3DBB29B61D6F15B23D58582409C58B0131D8A52E7B2A06114EC2F615519BB1B4EF9CC199A07E6CE4B2E82A125954D7C292D5F544DE013BB9CC77B3B4756CB9125C1021395832FFA1913E13F06D3D1F9ACF727A228E96D03B8E2FFAED7952795D7D31A24E79D58C916331895A6F660C9D84B083319A88C75EED9A114DC7D913A8DD9F83A466; pinId=PFNaS1XmeXF9TBCvJEIlnw; pin=18621759441_p; unick=jd_Miss+Vivi; ceshi3.com=000; _tp=s3iKj%2BxpTSF0rzuw4y6G4g%3D%3D; _pst=18621759441_p; shshshsID=a2fc2b444f7cc5ec786568326575de27_2_1557023849318; 3AB9D23F7A4B3C9B=Z2QEKJIHVKXOC2JNFZIJKKVZ5XVWWIJWQ5TRJ7BOO4K45QVTUZKRK4WXVBSIP5WQ7ZQYP22ZTJWGQILVDNH7G7QGDA; __jdb=122270672.6.1551953690075275239163|18.1557023684'
    spider = JSpider(cookie)
    # test
    '''
    with open("jd.txt", "r", encoding="utf8") as f:
        htmlcontent = f.read()
    spider.parseOnePageOrder(htmlcontent)
    '''
    '''
    #with open("normal_item_action_ori.html", "r", encoding="utf8") as f:
    with open("chongzhi_order_autoDetail_action_ori.html", "r", encoding="utf8") as f:
        htmlcontent = f.read()
    orderParseResult = spider.getOrderOfChongzhi(None, htmlcontent)
    datatable = spider.changeOrderParseResultListToTable([orderParseResult])
    spider.writeDatatableIntoFile(os.path.join(spider.data_dir, "allOrders.csv"), datatable)
    '''
    # spider.get_orders()
    # spider.get_creditData()
    # spider.get_browseDataNew()
    # spider.get_income()
    # spider.get_user_info()
    # spider.get_addr()
    # spider.get_YHK()
    # spider.get_xjk_info()
    # spider.get_finance_income()
    # spider.get_GB_num()
    # spider.get_JY_bill()
    # spider.get_follow_shops()
    # spider.get_follow_products()
    # spider.get_cart()


================================================
FILE: Spiders/__init__.py
================================================


================================================
FILE: Spiders/alipay/main.py
================================================
import json
import os
import re
import sys
import requests
from lxml import etree
from selenium import webdriver
from selenium.webdriver import ChromeOptions
from tkinter.filedialog import askdirectory
from tqdm import tqdm

class ASpider(object):
    def __init__(self, cookie):
        self.path = askdirectory(title='Select a folder to save the data')
        if str(self.path) == "":
            sys.exit(1)
        self.session = requests.session()
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
            'referer': ''
        }
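        cookie_dict = {}
        # Split each pair on the first '=' only: cookie values may themselves
        # contain '=' (e.g. base64 padding), and names may carry a leading space.
        for pair in cookie.split(';'):
            name, sep, value = pair.strip().partition('=')
            if sep:
                cookie_dict[name] = value
        requests.utils.add_dict_to_cookiejar(self.session.cookies, cookie_dict)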

    def get_user_info(self):
        url = 'https://custweb.alipay.com/account/index.htm'
        resp = self.session.get(url)
        obj = etree.HTML(resp.content.decode()).xpath('//tbody')[0]
        item = {}
        item['name'] = ''.join(obj.xpath('./tr[1]/td[1]//text()')).strip()
        item['email'] = ''.join(obj.xpath('./tr[2]/td[1]//text()')).strip()
        item['mobile'] = ''.join(obj.xpath('./tr[3]/td[1]//text()')).strip()
        item['tb_name'] = ''.join(obj.xpath('./tr[4]/td[1]//text()')).strip()
        item['register_time'] = ''.join(obj.xpath('./tr[7]/td[1]//text()')).strip()
        self.write_json(self.path + os.sep + 'user_info.json', json.dumps(item))

    def write_json(self, name, str):
        # file_path = os.path.join(os.path.dirname(__file__) + '/' + name)
        with open(name, 'w') as f:
            f.write(str)

    def get_YEB(self):
        url = 'https://yebprod.alipay.com/yeb/asset.htm'
        resp = self.session.get(url)
        ele = etree.HTML(resp.content.decode('gbk'))
        item = {}
        # print(etree.tostring(ele))
        # Raw strings keep '\s' a regex escape rather than an invalid string escape.
        item['eye-val'] = re.sub(r'\s', '', ele.xpath('.//span[@class="eye-val"]/text()')[0])
        item['total_val'] = re.sub(r'\s', '', ele.xpath('.//div[@class="box-bill-foot-account eye-val"]/text()')[0])
        item['Unavailable_val'] = re.sub(r'\s', '', ele.xpath('.//div[@class="box-bill-foot-account eye-val"]/text()')[1])
        self.write_json(self.path + os.sep + 'yu_e_bao.json', json.dumps(item))

    def get_bills(self):
        url = 'https://lab.alipay.com/consume/record/items.htm'
        self.headers['referer'] = 'https://my.alipay.com/portal/i.htm'
        resp = self.session.get(url, headers=self.headers, verify=False)
        obj_list = etree.HTML(resp.content.decode('gbk')).xpath('//tbody/tr')
        json_list = []
        for obj in tqdm(obj_list):
            item = {}
            item['number'] = ''.join(obj.xpath('./td[1]//text()')).strip()
            item['time'] = ''.join(obj.xpath('./td[2]//text()')).strip()
            item['info'] = ''.join(obj.xpath('./td[3]//text()')).strip()
            item['income'] = ''.join(obj.xpath('./td[4]//text()')).strip()
            item['outcome'] = ''.join(obj.xpath('./td[5]//text()')).strip()
            item['balance'] = ''.join(obj.xpath('./td[6]//text()')).strip()
            item['from'] = ''.join(obj.xpath('./td[7]//text()')).strip()
            item['detail'] = ''.join(obj.xpath('./td[8]//text()')).strip()
            json_list.append(item)
        ye = ''.join(obj_list[0].xpath('./td[6]//text()')).strip()
        ye_dict = {'YuE': ye}
        self.write_json(self.path + os.sep + 'bill_list.json', json.dumps(json_list))
        self.write_json(self.path + os.sep + 'balance.json', json.dumps(ye_dict))


if __name__ == '__main__':
    cookie = 'cna=FMHmFL1zqnUCASQH4bAneyUf; mobileSendTime=-1; credibleMobileSendTime=-1; ctuMobileSendTime=-1; riskMobileBankSendTime=-1; riskMobileAccoutSendTime=-1; riskMobileCreditSendTime=-1; riskCredibleMobileSendTime=-1; riskOriginalAccountMobileSendTime=-1; isg=BMTEs6f5RNVXdvCZiIUsYWqLlUR2dekgISd1n95lQQ9SCWTTBu-t19yoSeF0FCCf; l=bBgcZ5c7vJ2Of-mJBOCwCuI8L179_IRYSuPRwCmXi_5pZ6T68E7Olorn_F96Vj5Rs4TB4UJxb0v9-etXw; UM_distinctid=169b3c04ea8509-063bdd824c9e64-12306d51-fa000-169b3c04ea95a8; unicard1.vm="K1iSL1mnW5fEFTtXnTWZPQ=="; NEW_ALIPAY_TIP=1; csrfToken=M_AdqLObk41r9VvTDoRdyy2Q; CLUB_ALIPAY_COM=2088022680005311; iw.userid="K1iSL1mnW5fEFTtXnTWZPQ=="; ali_apache_tracktmp="uid=2088022680005311"; session.cookieNameId=ALIPAYJSESSIONID; LoginForm=alipay_login_auth; alipay="K1iSL1mnW5fEFTtXnTWZPca48DVsXJKl1U07jLnVskUcfw=="; spanner=hWXgcY78eHIkRX5btAjBSJV5G91m2+NMXt2T4qEYgj0=; locale=zh-cn; CHAIR_SESS=JWYmdXvINYrjfJhNfnAOApEy7drxxpERpaBXObg17RYQr9jGJZDWNQuk7GTZ-NeYuRSIYTsU7tiaFoLpKJpwTQ2FZqKmOSphZ98CHxZicmK3XOz8tgVdDWKxbBKLiiY4Tk4zkLNOIkCMlfoY4vOsGvxtikpzFXx61uyLzy-_-PGsZT1UzN0CDKSYTq1xRxaYhfp7vURB4eAzWjJpQXXmxXDq8A8cqmAyErsLtLBG8MfxigkVOwR88J5o95xQFcJ0; ctoken=QwetGqWKOjvvPRGx; zone=GZ00D; ALIPAYJSESSIONID=RZ257CXtTz7r7Ra0sc4QHeC4nrz1eyauthRZ25GZ00; rtk=umvDaVnzeH3Uz7V5rmCCnDE+MOkI1ZKNRTuJzmidxn8p1ZcI5EA'
    spider = ASpider(cookie)
    spider.get_bills()
    spider.get_user_info()
    spider.get_YEB()


================================================
FILE: Spiders/bilibili/main.py
================================================
import os
import sys
import json
import time
import requests
from tkinter.filedialog import askdirectory

class BilibiliHistory(object):
    def __init__(self, cookie_str):
        self.path = askdirectory(title='Select a folder to save the data')
        if str(self.path) == "":
            sys.exit(1)
        self.MAX_PAGE = 10
        self.PAGE_PER_NUM = 200

        self.cookie = cookie_str
        self.history = self.get_all_bili_history()
        self.save(self.history, 'bilibili_history.json')
        self.userinfo = self.get_user_info()
        self.save(self.userinfo, 'user_info.json')

    def get_all_bili_history(self):
        headers = self.get_header()
        # history = {'all': []}
        history = []
        for page_num in range(self.MAX_PAGE):
            
            url = 'https://api.bilibili.com/x/v2/history?pn={pn}&ps={ps}&jsonp=jsonp'.format(pn=page_num, ps=self.PAGE_PER_NUM)
            result = self.req_get(headers, url)
            # print('page = {} code = {} datalen = {}'.format(page_num, result['code'], len(result['data'])))
            print('Crawling...')
            time.sleep(1)
            # Stop when the page is empty; .get() also covers responses without a 'data' key.
            if not result.get('data'):
                print('Crawl finished.')
                break
            # if page_num == 2:
            #     break
            history.append(result)
        return history

    def get_user_info(self):
        headers = self.get_header()
        url = 'https://api.bilibili.com/x/member/web/account'
        result = self.req_get(headers, url)
        return result

    def req_get(self, headers, url):
        resp = requests.get(url, headers=headers)
        return json.loads(resp.text)

    def save(self, data, filename):
        with open(self.path + os.sep + filename, 'w', encoding='utf-8') as fp:
            json.dump(data, fp, ensure_ascii=False)
        return True

    def get_header(self):
        headers = {
            'Accept': '*/*',
            'Accept-Encoding': 'gzip, deflate, br',
            'Accept-Language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7',
            'Connection': 'keep-alive',
            'Cookie': self.cookie,
            'Host': 'api.bilibili.com',
            'Referer': 'https://www.bilibili.com/account/history',
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 '
                        '(KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
        }
        return headers




================================================
FILE: Spiders/browser/main.py
================================================
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import os
import sys
import json
import time
import sqlite3  
import operator  
from collections import OrderedDict  
import matplotlib.pyplot as plt  
from tkinter.filedialog import askdirectory
from tqdm import tqdm

class Browserhistory(object):
    def __init__(self):
        self.path = askdirectory(title='Select a folder to save the data')
        if str(self.path) == "":
            sys.exit(1)
        #path to user's history database (Chrome)  
        self.data_path=os.path.expanduser('~') + r"\AppData\Local\Google\Chrome\User Data\Default"
        self.history_db=os.path.join(self.data_path,'history')
        # Query the history database and always release the connection afterwards.
        c = sqlite3.connect(self.history_db)
        try:
            cursor = c.cursor()
            select_statement = "SELECT urls.id, urls.url, urls.title, urls.visit_count, urls.last_visit_time, visits.visit_time, visits.visit_duration FROM urls, visits WHERE urls.id = visits.url;"
            cursor.execute(select_statement)
            self.results = cursor.fetchall()  # list of row tuples
        finally:
            c.close()

        self.data_save_as_json(self.results)

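    # Convert a Chrome timestamp (microseconds since 1601-01-01 UTC) to a local-time string;
    # smaller values are returned unchanged.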
    def timestamp_format(self, timestamp):
        if timestamp > 13000000000000000:
            time_c = timestamp/1000000-11644473600
            return time.strftime("%Y-%m-%d %X", time.localtime(time_c))
        else:
            return timestamp

    # Convert the query rows to JSON and save them to a file.
    def data_save_as_json(self, data):
        history_list = []
        for i in tqdm(data):
            item = {}
            item['urls.id'] = i[0]
            item['urls.url'] = i[1]
            item['urls.title'] = i[2]
            item['urls.visit_count'] = i[3]
            item['urls.last_visit_time'] = self.timestamp_format(i[4])
            item['visits.visit_time'] = self.timestamp_format(i[5])
            item['visits.visit_duration'] = self.timestamp_format(i[6])
            history_list.append(item)
        history_json = json.dumps(history_list, ensure_ascii=False)
        with open(self.path + '/browser_data.json', 'w', encoding='utf-8') as f:
            f.write(history_json)

================================================
FILE: Spiders/chsi/main.py
================================================
import json
import os
import re

import requests
from lxml import etree


class Chis(object):
    def __init__(self, cookie):
        self.session = requests.session()
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
        }
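        cookie_dict = {}
        # Split each pair on the first '=' only so values containing '=' stay intact.
        for pair in cookie.split(';'):
            name, sep, value = pair.strip().partition('=')
            if sep:
                cookie_dict[name] = value
            else:
                # No '=' present: keep the fragment under an empty name, as before.
                cookie_dict[''] = pair
        requests.utils.add_dict_to_cookiejar(self.session.cookies, cookie_dict)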

    # Student enrollment record
    def get_xueji_info(self):
        url = 'https://my.chsi.com.cn/archive/gdjy/xj/show.action'
        resp = self.session.get(url, headers=self.headers, verify=False)
        ele = etree.HTML(resp.content.decode())
        try:
            pic_path = ele.xpath('//img[@alt="录取照片"]/@src')[0]
            # "no-photo" paths are kept as-is; other paths get the site prefix.
            if "no-photo" in pic_path:
                pic_1_url = pic_path
            else:
                pic_1_url = 'https://my.chsi.com.cn' + pic_path
        except Exception:
            pic_1_url = None
        try:
            pic_path = ele.xpath('//img[@alt="学历照片"]/@src')[0]
            if "no-photo" in pic_path:
                pic_2_url = pic_path
            else:
                pic_2_url = 'https://my.chsi.com.cn' + pic_path
        except Exception:
            pic_2_url = None
        try:
            xueji_pic_url = ele.xpath('//img[@class="xjxx-img"]/@src')[0]
        except Exception:
            xueji_pic_url = None

        return pic_1_url, pic_2_url, xueji_pic_url

    # Report
    def get_report(self):
        url = 'https://my.chsi.com.cn/archive/bab/index.action'
        resp = self.session.get(url, verify=False)
        report_detail_url = etree.HTML(resp.content.decode()).xpath(
            '//a[@class="green-btn mid-btn marginr20"]/@href')[0]
        detail_resp = self.session.get(report_detail_url, headers=self.headers, verify=False)
        report_detail_url = etree.HTML(detail_resp.content.decode()).xpath('//a[text()="查看"]/@href')[0]
        resp = self.session.get(report_detail_url, headers=self.headers, verify=False)
        ele = etree.HTML(resp.content.decode())
        if '请输入验证码以继续当前操作：' in resp.content.decode():
            ret = ele.xpath('//td[@class="tdRight"]')[1]
            capt_url = 'https://www.chsi.com.cn' + ret.xpath('./following-sibling::td[1]/img/@src')[0]
            value = ret.xpath('./following-sibling::td[1]/input/@value')[0]
            num = re.findall(r'cap=(\d{4})', capt_url)
            data = {'cap': num,
                    'capachatok': value,
                    'Submit': ' 继续'}
            self.session.post('https://www.chsi.com.cn/xlcx/yzm.do', data=data)
            resp = self.session.get(report_detail_url, verify=False)
            # Re-parse the page; the xpath() calls below need an element tree, not a str.
            ele = etree.HTML(resp.content.decode())

        pdf_url = 'https://www.chsi.com.cn' + ele.xpath('//a[@title="下载"]/@href')[0]
        item = {}
        item['name_url'] = 'https://www.chsi.com.cn' + \
                           ele.xpath('//td[@class="title1"]/following-sibling::td[1]/img/@src')[0]
        # The report lists its values in a fixed order; map them onto field names.
        # Fields missing from the end of the page are simply left out of `item`.
        fields = ['genre', 'sfz_id', 'nation', 'birth', 'school', 'education', 'faculty', 'class',
                  'major', 'student_id', 'style', 'entrance_time', 'duration', 'education_style', 'status']
        values = ele.xpath('//div[@class="cnt1"]/text()')
        print(values)
        for key, value in zip(fields, values):
            item[key] = value
        ret = json.dumps(item)
        file_path = os.path.join(os.path.dirname(__file__), 'info.json')
        with open(file_path, 'w') as f:
            f.write(ret)
        return pdf_url

    def save_ret(self, url, name):
        if url is None:
            return
        resp = self.session.get(url, verify=False)
        file_path = os.path.join(os.path.dirname(__file__), name)
        with open(file_path, 'wb') as f:
            f.write(resp.content)


if __name__ == '__main__':
    # chis = Chis()
    # p1, p2, x = chis.get_xueji_info()
    # chis.save_ret(p1, '录取前照片.jpg')
    # chis.save_ret(p2, '学籍照片.jpg')
    # chis.save_ret(x, '学信网信息.jpg')
    # p3 = chis.get_report()
    # chis.save_ret(p3, '学信报告.pdf')
    pass

================================================
FILE: Spiders/cloudmusic/main.py
================================================
import requests
import json
import re
import time
from tkinter.filedialog import askdirectory

class Cloudmusic(object):
    def __init__(self, username, password):
        self.path = askdirectory(title='Select a folder to save the data')
        self.username = username
        self.password = password
        self.api = 'http://45.129.2.73:3000'
        self.isphone = re.compile(r'[1][^1269]\d{9}')
        self.isemail = re.compile(r'[^\._][\w\._-]+@(?:[A-Za-z0-9]+\.)+[A-Za-z]+$')
        self.login_refresh()
        if self.isphone.match(self.username):
            self.userid = str(self.user_login_as_cellphone())
        elif self.isemail.match(self.username):
            self.userid = str(self.user_login_as_email())
        else:
            print('Login failed! The username must be a phone number or an email address.')

    ## Refresh the login status
    def login_refresh(self):
        url = self.api + '/login/refresh'
        response = requests.get(url)
        return 0

    ## Log in to NetEase Cloud Music with phone number + password
    def user_login_as_cellphone(self):
        url = self.api + '/login/cellphone?phone=' + self.username + '&password=' + self.password
        response = requests.get(url)
        code = response.json()['code']
        # Check the status code returned by the API rather than a constant.
        if str(code) == "200":
            print('Login succeeded')
        else:
            print('Login failed')
        userid = response.json()['account']['id']
        # print('userid = ' + str(userid))
        return userid

    ## Log in to NetEase Cloud Music with email + password
    def user_login_as_email(self):
        url = self.api + '/login?email=' + self.username + '&password=' + self.password
        response = requests.get(url)
        code = response.json()['code']
        # Check the status code returned by the API rather than a constant.
        if str(code) == "200":
            print('Login succeeded')
        else:
            print('Login failed')
        userid = response.json()['account']['id']
        # print('userid = ' + str(userid))
        return userid

    ## Write the fetched profile data to a JSON file
    def data_wirte_to_json(self, filename, context):
        filepath = self.path + '/' + filename + '.json'
        with open(filepath, 'w', encoding='utf-8') as f:
            f.write(context)
        return filepath

    ## Fetch the user's basic profile
    def get_user_detail(self):
        url = self.api + '/user/detail?uid=' + self.userid
        response = requests.get(url)
        self.data_wirte_to_json('user_detail', response.text)
        print('Fetched the user profile successfully!')
        return 0

    ## Fetch the user's playlists
    def get_playlist(self):
        url = self.api + '/user/playlist?uid=' + self.userid
        response = requests.get(url)
        self.data_wirte_to_json('user_playlist', response.text)
        print('Fetched the user playlists successfully!')
        return 0

    ## Fetch the list of accounts the user follows
    def get_user_follows(self):
        url = self.api + '/user/follows?uid=' + self.userid
        response = requests.post(url)
        self.data_wirte_to_json('user_follows', response.text)
        print('Fetched the following list successfully!')
        return 0

    ## Fetch the user's follower list
    def get_user_followeds(self):
        url = self.api + '/user/followeds?uid=' + self.userid
        response = requests.post(url)
        self.data_wirte_to_json('user_followeds', response.text)
        print('Fetched the follower list successfully!')
        return 0

    ## Fetch the user's activity feed
    def get_user_event(self):
        url = self.api + '/user/event?uid=' + self.userid
        response = requests.post(url)
        self.data_wirte_to_json('user_event', response.text)
        print('Fetched the user activity feed successfully!')
        return 0

    ## Fetch the user's listening ranking (weekly)
    def get_user_record_week(self):
        url = self.api + '/user/record?uid=' + self.userid + '&type=1'
        response = requests.get(url)
        self.data_wirte_to_json('user_record_week', response.text)
        print('Fetched the weekly listening ranking successfully!')
        return 0

    ## Fetch the user's listening ranking (all time)
    def get_user_record_all(self):
        url = self.api + '/user/record?uid=' + self.userid + '&type=0'
        response = requests.get(url)
        self.data_wirte_to_json('user_record_all', response.text)
        print('Fetched the all-time listening ranking successfully!')
        return 0

if __name__ == '__main__':
    music = Cloudmusic('132****', '*****')
    music.get_user_detail()
    music.get_playlist()
    music.get_user_follows()
    music.get_user_followeds()
    music.get_user_event()
    music.get_user_record_week()
    music.get_user_record_all()


================================================
FILE: Spiders/cnblog/main.py
================================================
import re
import os
import sys
import json
import requests
import pandas as pd
import numpy as np
import jieba
import pyecharts
from pyecharts import options as opts
from collections import Counter
from pyecharts.charts import WordCloud
from pyecharts.charts import Line
from bs4 import BeautifulSoup
from tkinter.filedialog import askdirectory
class Cnblog(object):
    def __init__(self, blogname):
        self.blogname = blogname
        self.path = askdirectory(title='Select a folder to save the data')
        if str(self.path) == "":
            sys.exit(1)
            
    def get_element_of_article(self):
        '''
        Fetch article elements (title, publish time, view count)
        '''
        url = 'https://www.cnblogs.com/' + str(self.blogname) + '/default.html'
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
        }
        pos = 1
        article_list = []
        while 1:
            key_dict = {'page': str(pos)}
            reps = requests.get(url, headers=headers, params=key_dict, timeout=3)
            soup = BeautifulSoup(reps.text, "html.parser")
            posts = soup.find_all("div", class_="day")
            if not len(posts):
                break
            date_pattern = re.compile(r"\d{4}-\d{1,2}-\d{1,2}")
            time_pattern = re.compile(r"\d{2}:\d{2}")
            views_pattern = re.compile(r"\d+")
            from tqdm import tqdm
            pbar = tqdm(posts)       
            for each_post in pbar:
                try:
                    item = {}
                    item['title'] = each_post.find("div", class_="postTitle").text.strip()
                    item['sumary'] = each_post.find("div", class_="c_b_p_desc").text.strip()
                    item['postdate'] = date_pattern.findall(each_post.find("div", class_="postDesc").text)[0]
                    item['posttime'] = time_pattern.findall(each_post.find("div", class_="postDesc").text)[0]
                    item['views'] = views_pattern.findall(each_post.find("span", class_="post-view-count").text)[0]
                    article_list.append(item)
                    pbar.set_description("Crawling article: %s" % item['title'])
                except Exception:
                    # Skip posts that lack any of the expected elements.
                    pass
                import time
                time.sleep(0.1)
            pos += 1
        article_json = json.dumps(article_list)
        return article_json
            
    def save_as_json(self, content_json):
        json_file_name = self.path + os.sep + 'cnblog_article.json'
        with open(json_file_name, 'w', encoding='utf-8') as f:
            f.write(content_json)
        return json_file_name

    # Concatenate every value of the given column into one string
    def get_text(self, json_file, column='title'):
        df_json = pd.read_json(json_file, encoding='utf-8')
        text = ''
        for i in df_json[column]:
            text += i
        return text

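    # Tokenize with jieba and remove stop words
    def split_word(self, text):
        word_list = list(jieba.cut(text))
        # Drop meaningless words and symbols using a custom stop-word list
        with open('stop_word.txt', encoding='utf-8') as f:
            # A set makes the membership test below fast
            meaningless_word = set(f.read().splitlines())
            # print(meaningless_word)
        result = []
        # Keep only the words that are not stop words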
        for i in word_list:
            if i not in meaningless_word:
                result.append(i.replace(' ', ''))
        return result

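    # Word-frequency statistics
    def word_counter(self, words):
        # Count occurrences with collections.Counter
        words_counter = Counter(words)
        # Take the 100 most common words as a list of (word, count) pairs
        words_list = words_counter.most_common(100)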
        return words_list

    # Generate the word cloud
    def create_wordcloud(self, json_file, title='Word cloud', column='title'):
        text = self.get_text(json_file, column=column)
        clear_word = self.split_word(text)
        data = self.word_counter(clear_word)
        wd = WordCloud()
        wd.add(series_name=title, data_pair=data, word_size_range=[40, 150])
        wd.set_global_opts(title_opts=opts.TitleOpts(title="Your article word cloud", subtitle="Generated from your blog data", title_textstyle_opts=opts.TextStyleOpts(font_size=23)), tooltip_opts=opts.TooltipOpts(is_show=True))
        # wd.render_notebook()
        wd.render(self.path + os.sep + 'topic_wordcloud.html')

    # Generate the line chart
    def create_postdate_line(self, json_file, title='Line chart', column='postdate'):
        df_json = pd.read_json(json_file, encoding='utf-8')
        postdate_month_list = []
        for i in df_json[column]:
            # Drop the day so posts are grouped by "YYYY-MM"
            postdate_month_list.append('-'.join(i.split('-')[:-1]))
        date_counter = Counter(postdate_month_list)
        line = Line()
        x_data = [i for i in date_counter]
        y_data = [date_counter[i] for i in date_counter]
        line.add_xaxis(x_data)
        line.add_yaxis(series_name="Number of posts", y_axis=y_data)
        line.set_global_opts(title_opts=opts.TitleOpts(title="Your post count", subtitle="Generated from your blog data"))
        line.render(self.path + os.sep + 'postdate_line.html')


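if __name__ == '__main__':
    # The methods live on Cnblog, so create an instance before calling them.
    cnblog = Cnblog('kangvcar')
    article = cnblog.get_element_of_article()
    json_file_name = cnblog.save_as_json(article)
    cnblog.create_wordcloud(json_file_name, title='Word cloud of your writing topics', column='title')
    cnblog.create_postdate_line(json_file_name, title='Posting timeline', column='postdate')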

================================================
FILE: Spiders/csdn/main.py
================================================
import re
import os
import sys
import json
import requests
from bs4 import BeautifulSoup
from tkinter.filedialog import askdirectory

class Csdn(object):
    def __init__(self, blogname):
        self.blogname = blogname
        self.path = askdirectory(title='Select a folder to save the data')
        if str(self.path) == "":
            sys.exit(1)
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
        }

    def get_element_of_article(self):
        '''
        Fetch article elements (title, publish time, view count)
        '''
        article_list = []
        pos = 1
        while 1:
            url='https://blog.csdn.net/' + self.blogname + '/article/list/' + str(pos)
            reps = requests.get(url,headers=self.headers,timeout=30)
            soup = BeautifulSoup(reps.text, 'lxml')
            posts = soup.find_all(name="div", attrs={"class": "article-item-box csdn-tracking-statistics"})
            if not len(posts):
                break
            date_pattern = re.compile(r"\d{4}-\d{1,2}-\d{1,2}")
            time_pattern = re.compile(r"\d{2}:\d{2}:\d{2}")
            views_pattern = re.compile(r"\d+")

            from tqdm import tqdm
            pbar = tqdm(posts)
            for each_post in pbar:
                item = {}
                try:
                    item['title'] = each_post.find(name="h4").text.split(' ', 1)[1].strip()
                    item['sumary'] = each_post.find(name="p", attrs={"class": "content"}).text.strip().replace('\n', "")
                    item['postdate'] = date_pattern.findall(each_post.find(name="span", attrs={"class": "date"}).text.strip())[0]
                    item['posttime'] = time_pattern.findall(each_post.find(name="span", attrs={"class": "date"}).text.strip())[0]
                    item['views'] = views_pattern.findall(each_post.find(name="span", attrs={"class": "read-num"}).text)[0]
                    # print(item)
                    article_list.append(item)
                    pbar.set_description("Crawling article: %s" % item['title'])
                except Exception as e:
                    print('Exception: ' + repr(e))
                import time
                time.sleep(0.1)
            pos += 1
        article_json = json.dumps(article_list)
        return article_json

    def save_as_json(self, content_json):
        with open(self.path + os.sep + 'csdn_article.json', 'w', encoding='utf-8') as f:
            f.write(content_json)


if __name__ == '__main__':
    # The methods live on Csdn, so create an instance before calling them.
    csdn = Csdn('kangvcar')
    article = csdn.get_element_of_article()
    csdn.save_as_json(article)

================================================
FILE: Spiders/ctrip/main.py
================================================
import os
import time

import requests
import json
import xlsxwriter


class Ctrip(object):
    def __init__(self, cookie):
        self.session = requests.session()
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
        }
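        # Parse a raw "name=value; name2=value2" Cookie header into a dict.
        cookie_dict = {}
        for pair in cookie.split(';'):
            # Split on the first '=' only, since cookie values may contain '='.
            name, sep, value = pair.strip().partition('=')
            if sep:
                cookie_dict[name] = value
            else:
                cookie_dict[''] = pair
        requests.utils.add_dict_to_cookiejar(self.session.cookies, cookie_dict)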

    
    def get_json_order(self):
        headers = self.headers
        headers['referer'] = 'https://my.ctrip.com/Home/Order/AllOrder.aspx'
        url = 'https://my.ctrip.com/Home/Ajax/GetAllOrder.ashx'
        sequence = int(time.time() * 10000000)
        # Date ranges and other filter fields may be supplied here
        data = {
            'BizTypes': '',
            'BookingDateTime': '',
            'BeginBookingDateTime': '',
            'EndBookingDateTime': '',
            'BeginUsageDateTime': '',
            'EndUsageDateTime': '',
            'PageSize': 10,
            'PageIndex': 1,
            'OrderStatusClassify': 'All',
            'OrderIDs': '',
            'OrderStatuses': '',
            'PassengerName': '',
            'OrderType': '',
            'FieldName': '',
            'IsASC': '',
            'sequence': sequence
        }
        resp = self.session.post(url, headers=headers, data=data, verify=False)
        return resp.content.decode('gbk')

    def transfer_and_save(self, json_str):
        
        json_orders = json.loads(json_str)

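        # 'OrderEnities' is the key this code expects in the response;
        # default to an empty list so a missing key does not raise NameError.
        json_order_lists = json_orders.get('OrderEnities', [])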

        book = xlsxwriter.Workbook('ctrip_order.xlsx')
        sheet = book.add_worksheet()
        sheet.write(0, 0, 'Date')
        sheet.write(0, 1, 'OrderDetails')
        sheet.write(0, 2, 'Price')

        # Row 0 holds the header, so order i goes to row i+1.
        for i, json_order in enumerate(json_order_lists):
            sheet.write(i+1, 0, json_order['BookingDate'])
            sheet.write(i+1, 1, json_order['OrderName'])
            sheet.write(i+1, 2, json_order['OrderTotalPrice'])
        
        book.close()

    # download orders and save them in an excel file
    def get_order(self):
        
        # get the orders from the Ctrip website
        json_order = self.get_json_order()

        # transfer the order and store it in an excel 
        self.transfer_and_save(json_order)


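if __name__ == '__main__':
    # Ctrip.__init__ requires the raw Cookie header of a logged-in session.
    cookie = ''  # paste the cookie string here
    ctrip = Ctrip(cookie)
    ctrip.get_order()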

================================================
FILE: Spiders/github/main.py
================================================
import json
import os
import re
import requests
from tkinter.filedialog import askdirectory

class Github(object):
    def __init__(self, username):
        self.username = username
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
        }
        self.path = askdirectory(title='Select a folder to save the data')



    # User profile
    def get_user_info(self):
        url = 'https://api.github.com/users/' + self.username
        resp = requests.get(url, headers=self.headers)
        # print(resp.text)
        file_path = self.path + '/user_infomation.json'
        with open(file_path, 'w') as f:
            f.write(resp.text.encode("gbk", 'ignore').decode("gbk", "ignore"))
        return file_path

    # User repositories
    def get_user_repos(self):
        url = 'https://api.github.com/users/' + self.username + '/repos'
        resp = requests.get(url, headers=self.headers)
        # print(resp.text)
        file_path = self.path + '/user_repository.json'
        with open(file_path, 'w') as f:
            f.write(resp.text.encode("gbk", 'ignore').decode("gbk", "ignore"))
        return file_path

    # Accounts the user follows
    def get_user_following(self):
        url = 'https://api.github.com/users/' + self.username + '/following'
        resp = requests.get(url, headers=self.headers)
        # print(resp.text)
        file_path = self.path + '/user_following.json'
        with open(file_path, 'w') as f:
            f.write(resp.text.encode("gbk", 'ignore').decode("gbk", "ignore"))
        return file_path

    # The user's followers
    def get_user_followers(self):
        url = 'https://api.github.com/users/' + self.username + '/followers'
        resp = requests.get(url, headers=self.headers)
        # print(resp.text)
        file_path = self.path + '/user_followers.json'
        with open(file_path, 'w') as f:
            f.write(resp.text.encode("gbk", 'ignore').decode("gbk", "ignore"))
        return file_path

    # User activity (received events)
    def get_user_activity(self):
        url = 'https://api.github.com/users/' + self.username + '/received_events'
        resp = requests.get(url, headers=self.headers)
        # print(resp.text)
        file_path = self.path + '/user_activity.json'
        with open(file_path, 'w') as f:
            f.write(resp.text.encode("gbk", 'ignore').decode("gbk", "ignore"))
        return file_path

    # Detailed information for all of the user's repositories
    def get_user_repos_detail(self):
        url = 'https://api.github.com/users/' + self.username + '/repos'
        resp = requests.get(url, headers=self.headers, verify=False, timeout=2)
        repo_detail = []
        for repo in resp.json():
            repo_url = 'https://api.github.com/repos/' + self.username + '/' + repo['name']
            detail = requests.get(repo_url, headers=self.headers, verify=False, timeout=2)
            # Keep the parsed objects so the output file is valid JSON
            # (str() of a list of JSON strings is not).
            repo_detail.append(detail.json())
            print('Downloading repository info >>> ', repo['name'])
        print('Downloaded details for', len(repo_detail), 'repositories')
        file_path = self.path + '/user_all_repos_detail.json'
        with open(file_path, 'w') as f:
            # json.dumps escapes non-ASCII by default, so any platform encoding is safe.
            f.write(json.dumps(repo_detail))
        return file_path

if __name__ == '__main__':
    github = Github('kangvcar')
    github.get_user_info()
    github.get_user_repos()
    github.get_user_following()
    github.get_user_followers()
    github.get_user_activity()
    github.get_user_repos_detail()

================================================
FILE: Spiders/jianshu/main.py
================================================
import re
import os
import sys
import json
import requests
from bs4 import BeautifulSoup
from tkinter.filedialog import askdirectory

class Jianshu(object):
    def __init__(self, blogurl):
        self.blogurl = blogurl
        self.path = askdirectory(title='Select a folder to save the data')
        if str(self.path) == "":
            sys.exit(1)
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
        }

    def get_element_of_article(self):
        '''
        Fetch article elements (title, publish time, view count)
        '''
        # url = "https://www.jianshu.com/u/d959ac37cdde?order_by=shared_at&page=1"
        url = self.blogurl
        pos = 1
        article_list = []
        while 1:
            key_dict = {
                'order_by': 'shared_at',
                'page': str(pos),
                }
            reps = requests.get(url, headers=self.headers, params=key_dict, timeout=10)
            soup = BeautifulSoup(reps.text, "html.parser")
            posts = soup.find_all("div", class_="content")
            print('=======================>>>>' + str(len(posts)))
            # if not len(posts):
            #     break
            date_pattern = re.compile(r"\d+-\d{1,2}-\d{1,2}")
            time_pattern = re.compile(r"\d{2}:\d{2}")
            from tqdm import tqdm
            pbar = tqdm(posts)       
            for each_post in pbar:
                try:
                    item = {}
                    item['title'] = each_post.find("a", class_="title").text.strip()
                    item['sumary'] = each_post.find("p", class_="abstract").text.strip()
                    item['postdate'] = date_pattern.findall(each_post.find("span", class_="time")['data-shared-at'])[0]
                    item['posttime'] = time_pattern.findall(each_post.find("span", class_="time")['data-shared-at'])[0]
                    item['views'] = each_post.find("div", class_="meta").find("a").text.strip()
                    article_list.append(item)
                    pbar.set_description("Crawling article: %s" % item['title'])
                except Exception:
                    # Skip entries that lack any of the expected elements.
                    pass
                import time
                time.sleep(0.1)
            pos += 1
            if len(posts) < 9:
                break
        article_json = json.dumps(article_list)
        return article_json

    def save_as_json(self, content_json):
        with open(self.path + os.sep + 'jianshu_article.json', 'w', encoding='utf-8') as f:
            f.write(content_json)


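if __name__ == '__main__':
    # The methods live on Jianshu, so create an instance before calling them.
    jianshu = Jianshu('https://www.jianshu.com/u/d959ac37cdde')
    article = jianshu.get_element_of_article()
    jianshu.save_as_json(article)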

================================================
FILE: Spiders/mail/main.py
================================================
# -*- coding: utf-8 -*-
import json
import os
import re
import time
import sys
from nltk.sem.drt import DrtParser
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium import webdriver
from selenium.webdriver import ChromeOptions
import requests
from lxml import etree
from selenium.webdriver.support.wait import WebDriverWait
from tkinter.filedialog import askdirectory

class YSpider(object):
    def gen_session(self, cookie):
        self.session = requests.session()
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
        }
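        # Parse a raw "name=value; name2=value2" Cookie header into a dict.
        cookie_dict = {}
        for pair in cookie.split(';'):
            # Split on the first '=' only, since cookie values may contain '='.
            name, sep, value = pair.strip().partition('=')
            if sep:
                cookie_dict[name] = value
            else:
                cookie_dict[''] = pair
        requests.utils.add_dict_to_cookiejar(self.session.cookies, cookie_dict)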

    def write_json(self, name, content):
        # `content` is the serialized JSON text; the old parameter name shadowed the built-in str.
        # file_path = os.path.join(os.path.dirname(__file__) + '/' + name)
        with open(name, 'w') as f:
            f.write(content)

    ## Crawl QQ Mail
    def qq_mail(self, cookie, sid):
        self.path = askdirectory(title='Select a folder to save the data')
        if str(self.path) == "":
            sys.exit(1)
        pn = 0
        self.gen_session(cookie)
        while 1:
            url = 'https://mail.qq.com/cgi-bin/mail_list?page={}&folderid=1&fun=&s=inbox&searchmode=&filetype=&listmode=&stype=&ftype=&AddrID=&grpid=&category=&showattachtag=&from=&sorttype=6&sortasc=0&sid={}&nocheckframe=true'.format(
                pn, sid)
            pn += 1
            resp = self.session.get(url, headers=self.headers, verify=False)
            print(resp, '>>>>>> Crawling page', pn)
            try:
                # Parse the page once; stop when there is no "下一页" (next page) link.
                page = etree.HTML(resp.content.decode('gbk'))
                page.xpath('//a[text()="下一页"]')[0]
            except Exception:
                print('Done. >>>>>> Crawl finished')
                break
            obj_list = page.xpath('//div[@id="div_showbefore"]/table')
            json_list = []
            # Adjust the [:100] slice to change how many mails are read per page
            for obj in obj_list[:100]:
                item = {}
                item['send_user'] = ''.join(obj.xpath('.//td[@class="tl tf "]//text()')).strip()
                try:
                    item['mailid'] = obj.xpath('.//td[@class="tl tf "]/nobr/@mailid')[0]
                except Exception:
                    continue
                item['title'] = ''.join(obj.xpath('.//td[@class="gt"]//text()')).strip()
                item['time'] = ''.join(obj.xpath('.//td[@class="dt"]//text()')).strip()
                detail_url = 'https://mail.qq.com/cgi-bin/readmail?folderid=1&folderkey=1&t=readmail&mailid={}&mode=pre&maxage=3600&base=12.52&ver=19894&sid={}&newwin=true&nocheckframe=true'.format(
                    item['mailid'], sid)
                try:
                    detail_resp = etree.HTML(self.session.get(detail_url, headers=self.headers).content.decode('gbk'))
                    item['email_addr'] = detail_resp.xpath('//b[@id="tipFromAddr_readmail"]/@fromaddr')[0]
                    content = ''.join(detail_resp.xpath('//div[@id="contentDiv"]//text()'))
                    item['content'] = re.sub(r'[\t\n\s]', '', content)
                    json_list.append(item)
                    print(item)
                except Exception:
                    continue
            if json_list == []:
                time.sleep(2)
                pn = pn - 1
            else:
                self.write_json(self.path + os.sep + 'qqmail_' + str(pn) + '.json', json.dumps(json_list))
            # break

    ## Crawl Sina Mail
    def sinamail(self, cookie):
        self.path = askdirectory(title='Select a folder to save the data')
        if str(self.path) == "":
            sys.exit(1)
        self.gen_session(cookie)
        pn = 1
        url = 'https://m0.mail.sina.com.cn/wa.php?a=list_mail'
        while 1:
            data = {
                'fid': 'new',
                'order': 'htime',
                'sorttype': 'desc',
                'type': '0',
                'pageno': str(pn),
                'tag': '-1',
                'webmail': '1',
            }
            # self.headers['Referer'] = 'https://m0.mail.sina.com.cn/classic/index.php?ssl=1'
            json_data = json.loads(
                self.session.post(url, headers=self.headers, data=data, verify=False).content.decode())
            obj_list = json_data['data']['maillist']
            if obj_list == []:
                break
            json_list = []
            for obj in obj_list[:100]:
                item = {}
                item['mid'] = obj[0]
                item['title'] = obj[3]
                item['send_user'] = obj[1]
                item['email_addr'] = obj[2]
                detail_url = 'https://m0.mail.sina.com.cn/classic/readmail.php?webmail=1&fid=new&mid={}'.format(
                    item['mid'])
                detail_resp = self.session.get(detail_url)
                content_json = json.loads(detail_resp.content.decode())
                item['content_json'] = content_json
                json_list.append(item)
            self.write_json(self.path + os.sep + 'sina_' + str(pn) + '.json', json.dumps(json_list))
            pn += 1

    def gen_driver(self, cookies_list):
        try: 
            option = ChromeOptions()
            option.add_experimental_option('excludeSwitches', ['enable-automation'])
            option.add_experimental_option("prefs", {"profile.managed_default_content_settings.images": 2})  # Do not load images, to speed up page loads
            option.add_argument('--headless')
            self.driver = webdriver.Chrome(options=option)
            # self.driver.get('https://outlook.live.com/mail/0/inbox')
            self.driver.get('https://outlook.live.com/mail/inbox')
            for i in cookies_list:
                self.driver.add_cookie(cookie_dict=i)
            self.driver.get('https://outlook.live.com/mail/inbox')
            # self.driver.get('https://outlook.live.com/mail/0/inbox')
            
        except Exception as e:
            print(e)

    ## Crawl Hotmail/Outlook mail
    def get_hotmail(self, cookie_list):
        self.path = askdirectory(title='选择信息保存文件夹')
        if str(self.path) == "":
            sys.exit(1)
        self.gen_driver(cookie_list)
        time.sleep(2)
        page_source = self.driver.page_source
        obj_list = etree.HTML(page_source).xpath('//div[contains(@class,"customScrollBar")]')[1].xpath('./div/div')[1:]
        # obj_list = etree.HTML(page_source).xpath('//div[contains(@role,"option")]')[:]
        json_list = []
        i = 0
        for obj in obj_list[:100]:
            try:
                i += 1
                print('进度 >>>>>>>>>>>', i, '/' , len(obj_list))
                item = {}
                item['send_user'] = ''.join(obj.xpath('./div/div/div/div[2]/div[1]//text()')).strip()
                #item['email_addr'] = obj.xpath('./div/div/div/div[2]/div[1]//span/@title')[0]
                item['title'] = ''.join(obj.xpath('./div/div/div/div[2]/div[2]/div//text()')).strip()
                item['time'] = ''.join(obj.xpath('./div/div/div/div[2]/div[2]/span//text()')).strip()
                item['content'] = ''.join(obj.xpath('./div/div/div/div[2]/div[3]//text()')).strip()
                json_list.append(item)
                # print(item)
            except Exception:
                continue
        self.write_json(self.path + os.sep + 'hotmail.json', json.dumps(json_list))

    ## Crawl Aliyun Mail
    def get_aliyun_mail(self, cookie):
        self.path = askdirectory(title='选择信息保存文件夹')
        if str(self.path) == "":
            sys.exit(1)
        self.gen_session(cookie)
        h = {
            'Host': 'mail.aliyun.com',
            'Origin': 'https://mail.aliyun.com',
            'Referer': 'https://mail.aliyun.com/alimail/',
            'User-Agent': 'Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Mobile Safari/537.36'
        }
        csrf_token = re.findall(r'_csrf_token_=(\w+);', cookie)[0]
        data = {
            'showFrom': '1',
            'query': '{"folderIds": ["2"]}',
            'fragment': '1',
            'offset': '0',
            'length': '75',
            'curIncrementId': '0',
            'forceReturnData': '1',
            '_csrf_token_': csrf_token,
            '_refer_hash_': '',
            '_tpl_': 'v5'
        }
        url = 'https://mail.aliyun.com/alimail/ajax/mail/queryMailList.txt'
        resp = self.session.post(url, headers=h, data=data)
        obj_list = json.loads(resp.content.decode())['dataList']
        self.write_json(self.path + os.sep + 'aliyun_mail.json', json.dumps(obj_list))

    ## Crawl NetEase (126) Mail
    def get_wangyi(self, cookie):
        self.path = askdirectory(title='选择信息保存文件夹')
        if str(self.path) == "":
            sys.exit(1)
        self.gen_session(cookie)
        offset = 0
        while 1:
            sid = re.findall(r'sid=(\w*);?', cookie)[0]
            data = {
                'var': '<?xml version="1.0"?><object><string name="id">{}</string><boolean name="header">true</boolean><boolean name="returnImageInfo">true</boolean><boolean name="returnAntispamInfo">true</boolean><boolean name="autoName">true</boolean><object name="returnHeaders"><string name="Resent-From">A</string><string name="Sender">A</string><string name="List-Unsubscribe">A</string><string name="Reply-To">A</string></object><boolean name="supportTNEF">true</boolean></object>'.format(
                    sid)
            }
            list_url = 'https://mail.126.com/js6/s?sid={}&func=mbox:listMessages'.format(sid)
            self.headers['Referer'] = 'https://mail.126.com/js6/main.jsp?sid={}&df=mail126_letter'.format(sid)
            self.headers['Host'] = 'mail.126.com'
            self.headers['Origin'] = 'https://mail.126.com'

            # Multiple pages can be fetched: <int name="start"> sets the offset to start from (0 for the first page), 20 messages per page
            list_data = {
                'var': '<?xml version="1.0"?><object><int name="fid">1</int><string name="order">date</string><boolean name="desc">true</boolean><int name="limit">20</int><int name="start">{}</int><boolean name="skipLockedFolders">false</boolean><string name="topFlag">top</string><boolean name="returnTag">true</boolean><boolean name="returnTotal">true</boolean></object>'.format(offset)}
            list_resp = self.session.post(list_url, data=list_data, headers=self.headers)
            try:
                xml = list_resp.content.decode()
                result = Xml2Json(xml).result
                obj_list = result['result']['array']['object']
            except Exception:  # no parsable message list left, so stop paging
                print('Done. >>>>>> 爬取完成')
                break
            json_list = []
            for obj in obj_list[:100]:
                item = {}
                item['mid'] = obj['string'][0]
                item['send_user'] = obj['string'][3]
                item['time'] = obj['date'][0]

                read_data = {
                    'var': '<?xml version="1.0"?><object><string name="id">{}</string><boolean name="header">true</boolean><boolean name="returnImageInfo">true</boolean><boolean name="returnAntispamInfo">true</boolean><boolean name="autoName">true</boolean><object name="returnHeaders"><string name="Resent-From">A</string><string name="Sender">A</string><string name="List-Unsubscribe">A</string><string name="Reply-To">A</string></object><boolean name="supportTNEF">true</boolean></object>'.format(
                        item['mid'])}
                # url = 'https://mail.126.com/js6/s?sid={}&func=mbox:readMessage&l=read&action=read'.format(sid)
                h = self.headers
                h['Referer'] = 'https://mail.126.com/js6/main.jsp?sid={}&df=mail126_letter'.format(sid)
                url = 'https://mail.126.com/js6/read/readhtml.jsp?mid={}&userType=browser&font=15&color=138144'.format(
                    item['mid'])
                read_resp = self.session.get(url, headers=h).content.decode()
                item['content'] = read_resp
                json_list.append(item)
            offset += 20
            self.write_json(self.path + os.sep + 'wangyiemail_' + str(offset) + '.json', json.dumps(json_list))
            print('>>>>>> 已爬取', offset , '封邮件')


from xml.parsers.expat import ParserCreate
import json


class Xml2Json:
    LIST_TAGS = ['COMMANDS']

    def __init__(self, data=None):
        self._parser = ParserCreate()
        self._parser.StartElementHandler = self.start
        self._parser.EndElementHandler = self.end
        self._parser.CharacterDataHandler = self.data
        self.result = None
        if data:
            self.feed(data)
            self.close()

    def feed(self, data):
        self._stack = []
        self._data = ''
        self._parser.Parse(data, 0)

    def close(self):
        self._parser.Parse("", 1)
        del self._parser

    def start(self, tag, attrs):
        self._stack.append([tag])
        self._data = ''

    def end(self, tag):
        last_tag = self._stack.pop()
        assert last_tag[0] == tag
        if len(last_tag) == 1:  # leaf
            data = self._data
        else:
            if tag not in Xml2Json.LIST_TAGS:
                # build a dict, repeating pairs get pushed into lists
                data = {}
                for k, v in last_tag[1:]:
                    if k not in data:
                        data[k] = v
                    else:
                        el = data[k]
                        if type(el) is not list:
                            data[k] = [el, v]
                        else:
                            el.append(v)
            else:  # force into a list
                data = [{k: v} for k, v in last_tag[1:]]
        if self._stack:
            self._stack[-1].append((tag, data))
        else:
            self.result = {tag: data}
        self._data = ''

    def data(self, data):
        # expat may deliver one text node in several chunks, so accumulate them
        self._data += data
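

# Illustrative sketch (not part of the original spider): the methods above pull
# single values such as `sid` or `_csrf_token_` out of the raw cookie string
# with regular expressions. Assuming the cookie is a plain "k=v; k2=v2" header
# string, the hypothetical helper below splits it into a dict once, so values
# can be looked up by name instead.
def cookie_to_dict(cookie):
    """Split a 'k=v; k2=v2' cookie string into a dict (hypothetical helper)."""
    pairs = {}
    for part in cookie.split(';'):
        if '=' not in part:
            continue  # skip empty or malformed fragments
        # Split on the first '=' only, so values that contain '=' stay intact
        name, _, value = part.strip().partition('=')
        pairs[name] = value
    return pairs


# Possible usage: cookie_to_dict(cookie).get('_csrf_token_')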


if __name__ == '__main__':
    pass
    # spider = YSpider()
    # cookie = 'RK=08INbKKteB; ptcz=34d123f5e73461008137394d19e04d60eab830a92bf41185d4f0182ee61a5521; pgv_pvid=4059846466; pgv_pvi=5613933568; tvfe_boss_uuid=992d9d744425918e; webp=1; sd_userid=98621550112504200; sd_cookie_crttime=1550112504200; o_cookie=1730116525; pac_uid=1_1730116525; ptui_loginuin=1730116525; qm_logintype=qq; edition=mail.qq.com; CCSHOW=000001; 3g_guest_id=-8633798135484776448; g_ut=2; pgv_si=s2145654784; ptisp=ctc; uin=o1730116525; p_uin=o1730116525; pt4_token=81P7iG1*AxAfGK4Jh0NRIlDerWaPnA5Ko3w7uaGk3PQ_; p_skey=ZsjlPkYlhysIUMXK0VG4o8KiWjdpwr4TSI6tcwdKd1Y_; wimrefreshrun=0&; qm_flag=0; qqmail_alias=1730116525@qq.com; sid=1730116525&a95f77b18430b21508bf8172132d2825,qWnNqbFBrWWxoeXNJVU1YSzBWRzRvOEtpV2pkcHdyNFRTSTZ0Y3dkS2QxWV8.; qm_username=1730116525; qm_domain=https://mail.qq.com; qm_ptsk=1730116525&@258wSqE3j; foxacc=1730116525&0; ssl_edition=sail.qq.com; username=1730116525&1730116525; qm_loginfrom=1730116525&wsk; new_mail_num=1730116525&237'
    # spider.qq_mail(cookie, sid='f9IbK1BLIzz5dRJO')
    # cookie = 'UOR=www.baidu.com,news.sina.com.cn,; SINAGLOBAL=122.194.13.97_1549887611.612474; U_TRS1=0000005f.8784126e.5c77a12a.6d4ae99c; SCF=AsFpDw15-joK8PaLwQ3zWw2EWY_LdjhaNMylkzKfpZelPwEBzKyQOAaYOfr72Bg_PCXfYEuYul3ugGHuCwhuBAs.; sso_info=v02m6alo5qztKWRk5yljpOQpZCToKWRk5iljoOgpZCjnLKNo5y3jaOMsY2DkLeJp5WpmYO0so2jnLeNo4yxjYOQtw==; ustat=__10.13.32.188_1553167729_0.51711200; vjuids=-82bcc10fb.169a38c601a.0.1bc3711c68cde; SGUID=1553676522217_12459803; lxlrttp=1554343419; U_TRS2=00000098.64c47105.5cc54e22.9e1054fd; SUB=_2A25xwT5mDeRhGeRI7FUX8y_IzzuIHXVStyiurDV_PUNbm9BeLW3skW9NUptz-3GnQYMi1TEeZF9ultH9I7fS80cx; SUBP=0033WrSXqPxfM725Ws9jqgMF55529P9D9WWrC_-Qw58znOpZVBddy.Te5NHD95QESoMNSoepShBNWs4Dqcjki--NiKy8iKn4i--fiKnfi-8hi--fi-82iK.7eK.4S7tt; ALF=1587970486; SWEBAPPSESSID=804320c62b770407bedab26616660c15f; CNZZDATA1261017783=860035498-1556432811-https%253A%252F%252Fmail.sina.com.cn%252F%7C1556432811; ULV=1556434495403:12:12:1::1556368966985; vjlast=1556434496; Apache=117.89.130.152_1556434495.729703; UM_distinctid=16a62ba4e6e6fc-0f9c50fa34410f-366e7e04-fa000-16a62ba4e6f6e1'
    # spider.sinamail(cookie)
    # cookie_list = json.loads(open('hotmail_cookies.json', 'r').read())
    # spider.get_aliyun_mail(cookie_list)
    # cookie = 'mail_health_check_time=1556501715894; starttime=; NTES_SESS=ini82ivtgl5wDI9_Y1knW14X.sTNHeHCLrAOfbOx.nna.6Js.4TFkUJpu9Az4LPUrfLKuM2jfSagfBxCqw6IA9gWzY5Njp0q9.671F7eJlDfRMqjbkJQO.uNEVYPyYMbDZjKSPEU_oCYZb8JPVteqp3Op8jF8Bc9JyIYISD7VfAQO9t86CatY1GPy36HvWikQULzz4muAFLAfI.YVjjrn99oq; S_INFO=1556501710|0|#3&80#|xyuniv@126.com; P_INFO=xyuniv@126.com|1556501710|0|mail126|00&99|US&1556463693&mail126#jis&320100#10#0#0|&0|mail126|xyuniv@126.com; nts_mail_user=xyuniv@126.com:-1:1; df=mail126_letter; mail_upx=t7hz.mail.126.com|t8hz.mail.126.com|t10hz.mail.126.com|t11hz.mail.126.com|t12hz.mail.126.com|t13hz.mail.126.com|t1hz.mail.126.com|t2hz.mail.126.com|t3hz.mail.126.com|t4hz.mail.126.com|t5hz.mail.126.com|t6hz.mail.126.com|t2bj.mail.126.com|t3bj.mail.126.com|t4bj.mail.126.com|t1bj.mail.126.com; mail_upx_nf=; mail_idc=; Coremail=194afc42dbf23%aAfbaxmmifMELLVPhOmmnqQaQXelGTBw%g3a24.mail.126.com; MAIL_MISC=xyuniv@126.com; cm_last_info=dT14eXVuaXYlNDAxMjYuY29tJmQ9aHR0cHMlM0ElMkYlMkZtYWlsLjEyNi5jb20lMkZqczYlMkZtYWluLmpzcCUzRnNpZCUzRGFBZmJheG1taWZNRUxMVlBoT21tbnFRYVFYZWxHVEJ3JnM9YUFmYmF4bW1pZk1FTExWUGhPbW1ucVFhUVhlbEdUQncmaD1odHRwcyUzQSUyRiUyRm1haWwuMTI2LmNvbSUyRmpzNiUyRm1haW4uanNwJTNGc2lkJTNEYUFmYmF4bW1pZk1FTExWUGhPbW1ucVFhUVhlbEdUQncmdz1odHRwcyUzQSUyRiUyRm1haWwuMTI2LmNvbSZsPS0xJnQ9LTEmYXM9dHJ1ZQ==; MAIL_SESS=ini82ivtgl5wDI9_Y1knW14X.sTNHeHCLrAOfbOx.nna.6Js.4TFkUJpu9Az4LPUrfLKuM2jfSagfBxCqw6IA9gWzY5Njp0q9.671F7eJlDfRMqjbkJQO.uNEVYPyYMbDZjKSPEU_oCYZb8JPVteqp3Op8jF8Bc9JyIYISD7VfAQO9t86CatY1GPy36HvWikQULzz4muAFLAfI.YVjjrn99oq; MAIL_SINFO=1556501710|0|#3&80#|xyuniv@126.com; MAIL_PINFO=xyuniv@126.com|1556501710|0|mail126|00&99|US&1556463693&mail126#jis&320100#10#0#0|&0|mail126|xyuniv@126.com; secu_info=1; 
    # mail_entry_sess=3bc519a600911b098d78381ef8e6496bf734b21965426384dd61eae13ccc567ce412b1a4d87268001a48b97d8e70360ff343a82d20bd63fc25ff3e45acefc7292f5726bc6964dc4b7f2b42205d34b857517fb04c5422155c9ced393fd3ae6338cdc5cbddc01366a4c4a25ef1d85ca2631bf42257cbc16917784d7545e3da13d78c0372063cd303a58b6aa028f6b3439a1620f8e4495d62706441f4bb25efddf603dbf9920f56eeea8daf407a4dc56e5b32966b7f224c0aafed33abb477a6bef8; JSESSIONID=893C397F90ED296D5B0A1D2BC36E9CB4; locale=; Coremail.sid=aAfbaxmmifMELLVPhOmmnqQaQXelGTBw; mail_style=js6; mail_uid=xyuniv@126.com; mail_host=mail.126.com'
    # spider.get_wangyi(cookie)
    # cookie = 'cna=FMHmFL1zqnUCASQH4bAneyUf; UM_distinctid=168e59f5374dd-0ffb4e388c2c9a-10316653-fa000-168e59f537912c; _ga=GA1.2.1733390193.1551330586; login_aliyunid_pk=1088681173604737; cnz=mbAIFT9WETACAbRtUGeqBwig; CLOSE_HELP_GUIDE=true; CONSOLE_TOPBAR_HIDE_CLOUDSHELL_TIPS=true; aliyun_choice=CN; aliyun_lang=zh; consoleRecentVisit=dms%2Cram%2Crds%2Cecs%2Cdns%2Cdomain; login_aliyunid_pks="BG+D8tiW5/jEgYGyPFZ3Z6jSLutdLxhnJnTZEOtwuKXDfw="; aliyun_country=CN; aliyun_site=CN; alimail_init_lang=zh_CN; alimail_browser_instance=dC03ODUxLTA5QzZQZg9411; alimail_sid=5F666U81-IX15ENGAYURMR38AR6RT2-WLQXT1VJ-GQ2; alimail_sdata0=a24zos5gOAbHitWQr5w%2FAD5E6xiiDmys%2B8hqW0CFvR7q7SBZ9K8RFSdHXC%2BJz1FyZZC5X7Zx9op7Qx5yNINzLXr5t2qBzTvVR1XOrEwxnPQ3CLpUmTHiHh2MpNcc53O8P1s8YPq6Pg18%2FNs2zcdmSw%3D%3D; CNZZDATA1254123247=1911432078-1556435554-null%7C1556506558; alimail_session_version_key=5548646; alimail_havana_session_key=QXltU2Vzc2lvbi0xMzg4MC1TUDVHUWlMOWd5NWl2bWRPT0ZoV1lNcnVRZ2IxV1FvREZHWGpWV2JiVmhSSUhnSEp1Qg; havana_session_id=1pCziS2gjiClEfHzQWUp0QQ1; alimail_auth_session_key=QXltU2Vzc2lvbi0xMzg4MS1kdDZNZ1hqaDdEOUdBM2RGSnFlREpRZXZwMDVGckFuaHM1cVJvQWpjSnJEZ1BkV1pabg; at="bluetips@aliyun.comTa0T71556438729180"; alimail_session_template_key=v5; isg=BAwMTLR4vJkU_aii11pxUIbk3Wz-7bKiWEki1WbNGbda8az7jlesfkRAlbnvjehH; l=bBaK1Kxmvo81eqs1BOCwRuI8UN2esIRvmuPRwdfHiOCH6A89CrT2AJBwSpNeVNKp7_CM4etPE4c11dLHRnOR.; CNZZDATA1000081634=1612451355-1556433941-https%253A%252F%252Fmail.aliyun.com%252F%7C1556509041; _csrf_token_=QXltVG9rZW4tMTkzODk5LTBCZHcycHFpWUZYR2RVd21LNFVoR1NXaU5XRlVnRnVFdldiSmNCSUtMVHJyTHJJT1dO; havana_heart_beat=1556510268875; udtoken="bluetips@aliyun.com:3db75983fca1c1b9df8d0e55d4ccbf03:5018131556510269033182"'
    # spider.get_aliyun_mail(cookie)
    # spider.get_hotmail(cookie_list)

================================================
FILE: Spiders/moments_album/main.py
================================================
# -*- coding:utf-8 -*-
from selenium import webdriver
import selenium.webdriver.support.expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from platform import system
import time
import json
import os
import sys
import random
from tkinter.filedialog import askdirectory

class Momentsablum(object):
    def __init__(self):
        self.path = askdirectory(title='选择信息保存文件夹')
        if str(self.path) == "":
            sys.exit(1)

    # Prompt the user for the URL through an in-page text input dialog
    def input_url(self, driver):
        while(True):
            # JavaScript snippet
            random_id = [str(random.randint(0, 9)) for i in range(0,10)]
            random_id = "".join(random_id)
            random_id = 'id_input_target_url_' + random_id
            js = """
                // Show a text prompt asking for the full link to the WeChat book
                target_url = prompt("请输入微信书的完整链接地址","https://");
                // Create an input element dynamically
                input_target_url = document.createElement("input");
                // Give it an id so that the program can read its value later
                input_target_url.id = "id_input_target_url";
                // Insert it into the current page
                document.getElementsByTagName("body")[0].appendChild(input_target_url);
                // Hide it
                document.getElementById("id_input_target_url").style.display = 'none';
                // Set its value to target_url
                document.getElementById("id_input_target_url").value = target_url
            """
            js = js.replace('id_input_target_url', random_id)
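            # Run the JavaScript snippet above
            driver.execute_script(js)
            # Poll until the prompt dialog is gone
            while(True):
                try:
                    # Check whether the prompt dialog is still open
                    alert = driver.switch_to.alert
                    time.sleep(0.5)
                except Exception:
                    # An exception means no dialog is present, i.e. the user clicked Cancel or OK
                    break
            # Read the link the user entered
            target_url = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.ID, random_id)))
            value = target_url.get_attribute('value')
            # Strip surrounding whitespace
            value = value.strip()
            # Check that the entered link is valid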
            if( value != '' and 'https://chushu.la' in value):
                break
        return value


    def make_album(self):
        chromedriver_path = './chromedriver_mac_74.0.3729.6'
        option = webdriver.ChromeOptions()
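        # Suppress Chrome's info bars
        option.add_argument('disable-infobars')
        # Silently print to a high-quality PDF file, saved to the directory set in `profile` below (self.path)
        appState = {
            # Add the "Save as PDF" destination
            "recentDestinations": [
                {
                    "id": "Save as PDF",
                    "origin": "local",
                    "account":""
                }
            ],
            # Select the "Save as PDF" destination
            "selectedDestinationId": "Save as PDF",
            # Version 2
            "version": 2,
            # Do not print headers and footers
            "isHeaderFooterEnabled": False
        }
        profile = {
            # Print preview settings
            'printing.print_preview_sticky_settings.appState': json.dumps(appState),
            # Default directory for downloads and printed files
            'savefile.default_directory': self.path
        }
        # Add the experimental preference settings
        option.add_experimental_option('prefs', profile)
        # Launch flag: print silently in the background
        option.add_argument('--kiosk-printing')
        # Bind Chrome to chromedriver; note that each Chrome version needs its matching chromedriver version
        driver = webdriver.Chrome(options=option)
        # Maximize the browser window so the captured pages look better
        driver.maximize_window()
        # Wait 2 seconds to give the window time to maximize
        time.sleep(2)
        # The URL of your WeChat Moments data; take care not to leak it to anyone else
        # While debugging, target_url can be assigned directly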
        target_url = self.input_url(driver)

        # Open the target page
        driver.get(target_url)
        for i in range(0, 10000):
            # Wait until the current page has finished loading; normally the "loading" element is hidden once all data is loaded
            while (True):
                loading_status = WebDriverWait(driver, 20).until(
                    EC.presence_of_element_located((By.CSS_SELECTOR, 'div.j-save-popup.save-popup')))
                if (loading_status.is_displayed() == False):
                    break
            # Hide the navigation bar so it does not spoil the captured pages
            js = 'document.querySelector("body > header").style.display="none";'
            driver.execute_script(js)
            # Wait for the "next month" button to appear
            next_month = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'button.next-month')))
            # The "next month" button must be visible before it can be clicked
            while(True):
                if(next_month.is_displayed() == True):
                    break
            # Click the "next month" button
            time.sleep(0.5)
            next_month.click()
            # If the button's class name is now "next-month disable", the last month has been reached
            page_source = driver.page_source
            if('next-month disable' in page_source):
                # Wait until the current page has finished loading; normally the "loading" element is hidden once all data is loaded
                while (True):
                    loading_status = WebDriverWait(driver, 20).until(
                        EC.presence_of_element_located((By.CSS_SELECTOR, 'div.j-save-popup.save-popup')))
                    if (loading_status.is_displayed() == False):
                        break
                # Wait for the main page container to appear
                WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'ul.main')))
                # find_element(By...) replaces the find_element_by_* helpers removed in Selenium 4
                main = driver.find_element(By.CSS_SELECTOR, 'ul.main')
                element_left_list = main.find_elements(By.CSS_SELECTOR, 'div.con-left')
                # Each element is one page; set the display style of every page to block so it becomes visible
                for index, element in enumerate(element_left_list):
                    # '..' in XPath selects the element one level up, i.e. the parent
                    parent_element = element.find_element(By.XPATH, '..')
                    # Get the parent element's full id
                    parent_element_id = parent_element.get_attribute('id')

                    # Make the parent element visible
                    js = 'document.getElementById("{}").style.display="block";'.format(parent_element_id)
                    driver.execute_script(js)

                    # Remove the gap between pages
                    js = 'document.getElementById("{}").style.marginTop="0px";'.format(parent_element_id)
                    driver.execute_script(js)
                # The site lazy-loads its images, so each image has to be scrolled into view in turn
                # Each pass scrolls the first remaining img element with class lazy-img into view
                # Once no such element is left, the script raises, which means every lazy image has loaded and the loop can exit
                while(True):
                    try:
                        lazy_img = driver.find_elements(By.CSS_SELECTOR, 'img.lazy-img')
                        js = 'document.getElementsByClassName("lazy-img")[0].scrollIntoView();'
                        driver.execute_script(js)
                        time.sleep(3)
                    except Exception:
                        # No img.lazy-img element is left, so exit the loop
                        break
                break
        # Invoke Chrome's print function
        driver.execute_script('window.print();')

        # Quit the browser
        driver.quit()


================================================
FILE: Spiders/oschina/main.py
================================================
import re
import os
import sys
import json
import requests
from bs4 import BeautifulSoup
from tkinter.filedialog import askdirectory

class Oschina(object):
    def __init__(self, blogurl):
        self.blogurl = blogurl
        self.path = askdirectory(title='选择信息保存文件夹')
        if str(self.path) == "":
            sys.exit(1)
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
        }

    def get_element_of_article(self):
        '''
        Collect the fields of each article (title, publish time, view count).
        '''
        # url = blogurl + 'widgets/_space_index_newest_blog?catalogId=0&q=&p={}&type=ajax'
        url = self.blogurl + '/widgets/_space_index_newest_blog'
        pos = 1
        article_list = []
        while 1:
            key_dict = {
                'catalogId': '0',
                'q': '',
                'p': str(pos),
                'type': 'ajax'
                }
            reps = requests.get(url, headers=self.headers, params=key_dict, timeout=10)
            soup = BeautifulSoup(reps.text, "html.parser")
            posts = soup.find_all("div", class_="content")
            # print(len(posts))
            if not len(posts):
                break
            date_pattern = re.compile(r"\d+/\d{1,2}/\d{1,2}")
            time_pattern = re.compile(r"\d{2}:\d{2}")
            from tqdm import tqdm
            pbar = tqdm(posts)       
            for each_post in pbar:
                try:
                    item = {}
                    item['title'] = each_post.find("a", class_="header").text.replace(" ", "").split('\n')[-2]
                    item['sumary'] = each_post.find("div", class_="description").text.strip().replace('\n', '')
                    # Read the date and time from the current post rather than always from posts[3]
                    item['postdate'] = date_pattern.findall(each_post.find("div", class_="extra").text)[0]
                    item['posttime'] = time_pattern.findall(each_post.find("div", class_="extra").text)[0]
                    item['views'] = each_post.find("div", class_="extra").find_all('div', class_='item')[-2].text.strip()
                    article_list.append(item)
                    pbar.set_description("正在爬取文章：%s" % item['title'])
                except:
                    pass
                import time
                time.sleep(0.1)
            pos += 1
        article_json = json.dumps(article_list)
        return article_json

    def save_as_json(self, content_json):
        with open(self.path + os.sep + 'oschina_article.json', 'w', encoding='utf-8') as f:
            f.write(content_json)


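if __name__ == '__main__':
    # The crawl and save functions are methods, so they need an Oschina instance
    spider = Oschina('https://my.oschina.net/kangvcar')
    article = spider.get_element_of_article()
    spider.save_as_json(article)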

================================================
FILE: Spiders/qqfriend/main.py
================================================
import selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import json
import tkinter as tk
from tkinter.filedialog import asksaveasfilename
from tkinter.filedialog import askdirectory
from bs4 import BeautifulSoup
import lxml
# import openpyxl
# from openpyxl import Workbook

class Qqfriend(object):
        def __init__(self):
            # Browser driver
            self.driver = webdriver.Chrome()
            self.browser = self.driver
            # self.browser = webdriver.Chrome()
            self.browser.get("https://pay.qq.com/index.shtml")
            self.root = tk.Tk()
            # Set the window title
            self.root.title('从QQ充值获取好友列表')
            # Set the window size
            self.root.geometry('400x200')
            # The message loop entered below refreshes the widgets whenever an event is detected
            # button1 = tk.Button(self.root, text='已登陆并打开充值界面且点开列表(不用选择表项),保存为excel', command=self.callback_excel)
            # button1.pack()
            button2 = tk.Button(self.root, text='已登陆并打开充值界面且，点开列表(不用选择表项),保存为json', command=self.callback_json)
            button2.pack()
            button3 = tk.Button(self.root, text='爬取完成后点击此按钮', command=self.close_chrome)
            button3.pack()
            self.root.mainloop()

        # Save as Excel
        # def callback_excel(self):
        #     self.driver.switch_to_frame('webpay-iframe')
        #     iframe = self.driver.find_element_by_xpath('//*[@id="midas-webpay-main-1450000186"]/div[2]/div[1]/iframe')
        #     self.driver.switch_to_frame(iframe)
        #     html = self.driver.page_source
        #     soup = BeautifulSoup(html, "lxml")
        #     a = soup.find_all(attrs={'class': 'icon-friend-s'})
        #     wb = Workbook()
        #     ws = wb.active
        #     ws.append(["raw", "group", "view_name", "qqnumber"])
        #     for i in a:
        #         if i.next_sibling != ' {{el.name}}({{el.qq}})':
        #             k = 0
        #             for x in i.next_sibling:
        #                 if x == '(':
        #                     f = k
        #                 if x == ')':
        #                     l = k
        #                 k = k + 1
        #             ws.append([i.next_sibling, i.next_sibling.parent.parent.parent.parent.find(
        #                 attrs={'class': 'icon-more-friend'}).next_sibling, i.next_sibling[:f], i.next_sibling[f + 1:l]])
        #             print([i.next_sibling, i.next_sibling.parent.parent.parent.parent.find(
        #                 attrs={'class': 'icon-more-friend'}).next_sibling, i.next_sibling[:f], i.next_sibling[f + 1:l]])
        #     wb.save(asksaveasfilename(defaultextension='.xlsx', filetypes=[('Excel 工作簿', '*.xlsx')]))
            
        #     return 0

        # Save as JSON
        def callback_json(self):
            self.path = askdirectory(title='选择信息保存文件夹')
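            # switch_to.frame and find_element replace helpers that were removed in Selenium 4
            self.driver.switch_to.frame('webpay-iframe')
            # 'xpath' is the locator string behind By.XPATH (By is not imported in this file)
            iframe = self.driver.find_element('xpath', '//*[@id="midas-webpay-main-1450000186"]/div[2]/div[1]/iframe')
            self.driver.switch_to.frame(iframe)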
            html = self.driver.page_source
            soup = BeautifulSoup(html, "lxml")
            a = soup.find_all(attrs={'class': 'icon-friend-s'})
            from tqdm import tqdm
            pbar = tqdm(a)  
            friend_list = []
            for i in pbar:
                if i.next_sibling != ' {{el.name}}({{el.qq}})':
                    k = 0
                    for x in i.next_sibling:

                        if x == '(':
                            f = k
                        if x == ')':
                            l = k
                        k = k + 1
                    item = {}
                    item['raw'] = i.next_sibling
                    item['group'] = i.next_sibling.parent.parent.parent.parent.find(
                        attrs={'class': 'icon-more-friend'}).next_sibling
                    item['view_name'] = i.next_sibling[:f]
                    item['qqnumber'] = i.next_sibling[f + 1:l]
                    friend_list.append(item)
                    pbar.set_description("正在爬取：%s" % item['raw'])
            friend_list_json = json.dumps(friend_list, ensure_ascii=False)
            # print(friend_list_json)
            with open(self.path + '/friend_list.json', 'w', encoding="utf-8") as f:
                f.write(friend_list_json)
            self.close_chrome()
            return 0

        def close_chrome(self):
            self.browser.close()
            self.root.destroy()
            return 0
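

# Illustrative sketch (not part of the original spider): callback_json finds the
# last '(' and ')' of each "name(qq)" entry by scanning it character by character.
# Assuming every entry ends in "(number)", the hypothetical helper below performs
# the same split with str.rfind and returns None when no parentheses are found.
def split_name_qq(raw):
    """Split 'name(qq)' into (name, qq) at the last pair of parentheses."""
    left = raw.rfind('(')
    right = raw.rfind(')')
    if left == -1 or right < left:
        return None  # no usable "(...)" suffix
    return raw[:left], raw[left + 1:right]


# Possible usage: name, qq = split_name_qq(i.next_sibling)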


================================================
FILE: Spiders/qqqun/main.py
================================================
# -*- coding: utf-8 -*-
import selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import tkinter as tk
from tkinter import *
from tkinter.filedialog import askdirectory
from lxml import etree
import lxml
from bs4 import BeautifulSoup
import time
import os
import json
import pandas

class Qqqun(object):
    def __init__(self):
        self.path = askdirectory(title='选择信息保存文件夹')
        self.driver = webdriver.Chrome()
        self.browser = self.driver
        self.browser.get("https://qun.qq.com/member.html")
        self.root = tk.Tk()
        # Set the window title
        self.root.title('从QQ群管理获取群成员列表')
        # Set the window size
        self.root.geometry('400x200')
        # The message loop entered below refreshes the widgets whenever an event is detected
        # button1 = tk.Button(self.root, text='已登陆并打开界面，保存为excel', pady=5, command=self.callback_excel)
        # button1.pack()
        button2 = tk.Button(self.root, text='已登陆并打开界面，保存为json', pady=5, command=self.callback_json)
        button2.pack()
        button3 = tk.Button(self.root, text='爬取完成后点击此按钮', pady=5, command=self.close_chrome)
        button3.pack()
        self.root.mainloop()

    # Strip leading and trailing '\n' and '\t' from a string
    def delNT(self, s):
        while s.startswith('\n') or s.startswith('\t'):
            s = s[1:]
        while s.endswith('\t') or s.endswith('\n'):
            s = s[:-1]
        return s

    # def callback_excel(self):
    #     a = self.driver.find_elements_by_class_name('icon-def-gicon')
    #     Num = len(a)
    #     time_start = time.time()
    #     for i in range(0, Num):
    #         # 点击进入具体群
    #         a = self.driver.find_elements_by_class_name('icon-def-gicon')
    #         # time.sleep(0.5)
    #         a[i].click()
    #         time.sleep(1)
    #         html = self.driver.page_source
    #         soup = BeautifulSoup(html, "lxml")
    #         groupTit = self.delNT(soup.find(attrs={'id': 'groupTit'}).text)
    #         groupMemberNum = self.delNT(soup.find(attrs={'id': 'groupMemberNum'}).text)

    #         while len(soup.find_all(attrs={'class': 'td-no'})) < int(groupMemberNum):
    #             self.driver.execute_script("window.scrollTo(0,document.body.scrollHeight);")
    #             time.sleep(0.1)
    #             html = self.driver.page_source
    #             soup = BeautifulSoup(html, "lxml")

    #         res_elements = etree.HTML(html)
    #         table = res_elements.xpath('//*[@id="groupMember"]')
    #         table = etree.tostring(table[0], encoding='utf-8').decode()
    #         df = pandas.read_html(table, encoding='utf-8', header=0)[0]
    #         try:
    #             print(str(int((time.time() - time_start) / 60)) + ':' + str(int((time.time() - time_start) % 60)),
    #                   '第' + str(i + 1) + '群,' + str(int((i + 1) / Num * 100)) + '%  ' + groupTit + '  此表完成')
    #             writer = pandas.ExcelWriter(self.path + '/' + groupTit + '.xlsx')
    #             df.to_excel(writer, 'Sheet1')
    #             writer.save()
    #         except:
    #             k = 0
    #             for v in groupTit:
    #                 if v == '(':
    #                     f = k
    #                 if v == ')':
    #                     l = k
    #                 k = k + 1

    #             writer = pandas.ExcelWriter(self.path + '/' + groupTit[f + 1:l] + '.xlsx')
    #             df.to_excel(writer, 'Sheet1')
    #             writer.save()
    #         self.driver.find_element_by_id('changeGroup').click()
    #         time.sleep(1)
    #     self.close_chrome()
    #     return 0

    def callback_json(self):
        a = self.driver.find_elements_by_class_name('icon-def-gicon')
        Num = len(a)
        time_start = time.time()

        # for i in range(0, Num):
        from tqdm import trange
        for i in trange(Num):
            # Click into the specific group
            a = self.driver.find_elements_by_class_name('icon-def-gicon')
            # time.sleep(0.5)
            a[i].click()
            time.sleep(1)
            html = self.driver.page_source
            soup = BeautifulSoup(html, "lxml")
            groupTit = self.delNT(soup.find(attrs={'id': 'groupTit'}).text)
            groupMemberNum = self.delNT(soup.find(attrs={'id': 'groupMemberNum'}).text)
            # Keep scrolling to the bottom until every member row has been loaded
            while len(soup.find_all(attrs={'class': 'td-no'})) < int(groupMemberNum):
                self.driver.execute_script("window.scrollTo(0,document.body.scrollHeight);")
                time.sleep(0.1)
                html = self.driver.page_source
                soup = BeautifulSoup(html, "lxml")
            res_elements = etree.HTML(html)
            table = res_elements.xpath('//*[@id="groupMember"]')
            table = etree.tostring(table[0], encoding='utf-8').decode()
            df = pandas.read_html(table, encoding='utf-8', header=0)[0]
            # Build one record per member row; columns 2-8 of the table hold the member fields
            qun_friend_list = []
            for j in range(0, df.shape[0]):
                data = df.values[j].tolist()
                item = {}
                item['member'] = data[2]
                item['nick_name'] = data[3]
                item['qqnumber'] = data[4]
                item['sex'] = data[5]
                item['qqage'] = data[6]
                item['join_date'] = data[7]
                item['last_post'] = data[8]
                qun_friend_list.append(item)
            qun_friend_list_json = json.dumps(qun_friend_list, ensure_ascii=False)
            try:
                with open(self.path + '/' + groupTit + '.json', 'w', encoding="utf-8") as f:
                    f.write(qun_friend_list_json)
            except OSError:
                # The group title may not be usable as a file name; fall back to the text
                # inside its last pair of parentheses, as the commented-out Excel export does
                start = groupTit.rfind('(')
                end = groupTit.rfind(')')
                with open(self.path + '/' + groupTit[start + 1:end] + '.json', 'w', encoding="utf-8") as f:
                    f.write(qun_friend_list_json)
            self.driver.find_element_by_id('changeGroup').click()
            time.sleep(1)
        self.close_chrome()
        return 0

    def close_chrome(self):
        self.browser.close()
        self.root.destroy()
        return 0


================================================
FILE: Spiders/shgjj/main.py
================================================
import json
import os

import requests


class GjjSpider(object):
    def __init__(self, cookie, token):
        self.session = requests.session()
        self.token = token
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
            'Authorization': 'Bearer ' + self.token
        }
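        cookie_dict = {}
        for pair in cookie.split(';'):
            pair = pair.strip()
            if not pair:
                continue  # skip empty segments, e.g. after a trailing ';'
            # Split on the first '=' only, because cookie values may contain '='
            name, _, value = pair.partition('=')
            cookie_dict[name] = value
        requests.utils.add_dict_to_cookiejar(self.session.cookies, cookie_dict)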

    def write_json(self, name, content):
        # Save next to this script; UTF-8 keeps Chinese text in the response intact
        file_path = os.path.join(os.path.dirname(__file__), name)
        with open(file_path, 'w', encoding='utf-8') as f:
            f.write(content)

    # Housing provident fund and supplementary provident fund accounts
    def get_priaccountForWeb(self):
        url = 'http://person.shgjj.com/gjjapi/private/priaccountForWeb?token={}&source=WANGZHAN'.format(self.token)
        self.headers['Referer'] = 'http://person.shgjj.com/gjjweb/'
        resp = self.session.get(url, headers=self.headers)
        self.write_json('priaccountForWeb_gjj.json', resp.content.decode())

    # Loan account
    def get_accountForWeb(self):
        url = 'http://person.shgjj.com/gjjapi/loan/accountForWeb?token={}&source=WANGZHAN'.format(self.token)
        self.headers['Referer'] = 'http://person.shgjj.com/gjjweb/'
        resp = self.session.get(url, headers=self.headers)
        self.write_json('贷款账户.json', resp.content.decode())


if __name__ == '__main__':
    cookie = 'ic-GJJGeRen=r-GJJGeRen-1;eks_cache_keys=true;'
    token = 'eyJhbGciOiJIUzUxMiJ9.eyJzdWIiOiIxMDA0OTgwNTAyMDUiLCJhdXRoIjoiUk9MRV9BRE1JTixST0xFX1VTRVIiLCJleHAiOjE1NTY2MTYwNTZ9.TAUPynGD52hwscJmM2Icam2q5SNXimQAFG19G9a4cESUh1eSBRLnbm6ZfTfEw62gUaR_movqxKeKWxMXIXXeJg'
    spider = GjjSpider(cookie, token)
    spider.get_priaccountForWeb()
    spider.get_accountForWeb()


================================================
FILE: Spiders/taobao/spider.py
================================================
import json
import random
import time
import sys
import os
import requests
import numpy as np
import math
from lxml import etree
from pyquery import PyQuery as pq
from selenium import webdriver
from selenium.webdriver import ChromeOptions
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver import ActionChains
from tkinter.filedialog import askdirectory
from tqdm import trange


def ease_out_quad(x):
    return 1 - (1 - x) * (1 - x)

def ease_out_quart(x):
    return 1 - pow(1 - x, 4)

def ease_out_expo(x):
    if x == 1:
        return 1
    else:
        return 1 - pow(2, -10 * x)

def get_tracks(distance, seconds, ease_func):
    tracks = [0]
    offsets = [0]
    for t in np.arange(0.0, seconds, 0.1):
        ease = globals()[ease_func]
        offset = round(ease(t / seconds) * distance)
        tracks.append(offset - offsets[-1])
        offsets.append(offset)
    return offsets, tracks

def drag_and_drop(browser, offset=26.5):
    knob = browser.find_element_by_id('nc_1_n1z')
    offsets, tracks = get_tracks(offset, 12, 'ease_out_expo')
    ActionChains(browser).click_and_hold(knob).perform()
    for x in tracks:
        ActionChains(browser).move_by_offset(x, 0).perform()
    ActionChains(browser).pause(0.5).release().perform()

def gen_session(cookie):
    session = requests.session()
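    cookie_dict = {}
    for pair in cookie.split(';'):
        pair = pair.strip()
        if not pair:
            continue  # skip empty segments, e.g. after a trailing ';'
        # Split on the first '=' only, because cookie values may contain '='
        name, _, value = pair.partition('=')
        cookie_dict[name] = value
    requests.utils.add_dict_to_cookiejar(session.cookies, cookie_dict)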
    return session

class TaobaoSpider(object):
    def __init__(self, cookies_list):
        self.path = askdirectory(title='选择信息保存文件夹')
        if str(self.path) == "":
            sys.exit(1)
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
        }
        option = ChromeOptions()
        option.add_experimental_option('excludeSwitches', ['enable-automation'])
        option.add_experimental_option("prefs", {"profile.managed_default_content_settings.images": 2})  # Do not load images, to speed up page loads
        option.add_argument('--headless')
        self.driver = webdriver.Chrome(options=option)
        self.driver.get('https://i.taobao.com/my_taobao.htm')
        for i in cookies_list:
            self.driver.add_cookie(cookie_dict=i)
        self.driver.get('https://i.taobao.com/my_taobao.htm')
        self.wait = WebDriverWait(self.driver, 20)  # wait timeout of 20 s

    # Simulate a user scrolling down the page
    def swipe_down(self, second):
        for i in range(int(second / 0.1)):
            # Alternate the scroll position based on i to mimic scrolling up and down
            if (i % 2 == 0):
                js = "var q=document.documentElement.scrollTop=" + str(300 + 400 * i)
            else:
                js = "var q=document.documentElement.scrollTop=" + str(200 * i)
            self.driver.execute_script(js)
            time.sleep(0.1)

        js = "var q=document.documentElement.scrollTop=100000"
        self.driver.execute_script(js)
        time.sleep(0.1)

    # Crawl the Taobao "bought items" order data; pn is the number of pages to crawl
    def crawl_good_buy_data(self, pn=3):

        # Open the list of bought items
        self.driver.get("https://buyertrade.taobao.com/trade/itemlist/list_bought_items.htm")

        # Iterate over pn pages
        for page in trange(pn):
            data_list = []

            # Wait until the bought-item containers on this page have loaded
            good_total = self.wait.until(
                EC.presence_of_element_located((By.CSS_SELECTOR, '#tp-bought-root > div.js-order-container')))

            # Get the source of the current page
            html = self.driver.page_source

            # Parse the page source with pyquery
            doc = pq(html)

            # Select every bought-item container on this page
            good_items = doc('#tp-bought-root .js-order-container').items()

            # Iterate over all items on this page
            for item in good_items:
                # Purchase time and order number
                good_time_and_id = item.find('.bought-wrapper-mod__head-info-cell___29cDO').text().replace('\n', "").replace('\r', "")
                # Seller name
                good_merchant = item.find('.bought-wrapper-mod__seller-container___3dAK3').text().replace('\n', "").replace('\r', "")
                # Item name
                good_name = item.find('.sol-mod__no-br___3Ev-2').text().replace('\n', "").replace('\r', "")
                # Item price
                good_price = item.find('.price-mod__price___cYafX').text().replace('\n', "").replace('\r', "")
                # Only the purchase time, order number, seller name, item name and price are collected;
                # extract any other fields yourself
                data_list.append(good_time_and_id)
                data_list.append(good_merchant)
                data_list.append(good_name)
                data_list.append(good_price)

            # Write this page's records once per page, one JSON array per line
            json_str = json.dumps(data_list)
            with open(self.path + os.sep + 'user_orders.json', 'a') as f:
                f.write(json_str + '\n')

            # print('\n\n')

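            # Most crawlers get flagged as bots because they skip this kind of human-like interaction.
            # Scroll down through the items as a person would, so the session is less likely to be identified as a bot.
            # Pick a random scroll duration
            swipe_time = random.randint(1, 3)
            self.swipe_down(swipe_time)

            # Wait for the "next page" button to appear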
            good_total = self.wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '.pagination-next')))
            good_total.click()
            time.sleep(2)
            # while 1:
            #     time.sleep(0.2)
            #     try:
            #         good_total = self.driver.find_element_by_xpath('//li[@title="下一页"]')
            #         break
            #     except:
            #         continue
            # # Click the "next page" button
            # while 1:
            #     time.sleep(2)
            #     try:
            #         good_total.click()
            #         break
            #     except Exception:
            #         pass

    # Favorited items; page is the number of pages to crawl (default 3)  https://shoucang.taobao.com/nodejs/item_collect_chunk.htm?ifAllTag=0&tab=0&tagId=&categoryCount=0&type=0&tagName=&categoryName=&needNav=false&startRow=60
    def get_choucang_item(self, page=3):
        url = 'https://shoucang.taobao.com/nodejs/item_collect_chunk.htm?ifAllTag=0&tab=0&tagId=&categoryCount=0&type=0&tagName=&categoryName=&needNav=false&startRow={}'
        pn = 0
        json_list = []
        for i in trange(page):
            self.driver.get(url.format(pn))
            pn += 30
            html_str = self.driver.page_source
            if html_str == '':
                break
            if '登录' in html_str:
                raise Exception('登录')
            obj_list = etree.HTML(html_str).xpath('//li')
            for obj in obj_list:
                item = {}
                item['title'] = ''.join([i.strip() for i in obj.xpath('./div[@class="img-item-title"]//text()')])
                item['url'] = ''.join([i.strip() for i in obj.xpath('./div[@class="img-item-title"]/a/@href')])
                item['price'] = ''.join([i.strip() for i in obj.xpath('./div[@class="price-container"]//text()')])
                if item['price'] == '':
                    item['price'] = '失效'
                json_list.append(item)
        # file_path = os.path.join(os.path.dirname(__file__) + '/shoucang_item.json')
        json_str = json.dumps(json_list)
        with open(self.path + os.sep + 'shoucang_item.json', 'w') as f:
            f.write(json_str)

    # Browsing footprints; page is the number of pages to crawl (default 3)  https://www.taobao.com/markets/footmark/tbfoot
    def get_footmark_item(self, page=3):
        url = 'https://www.taobao.com/markets/footmark/tbfoot'
        self.driver.get(url)
        pn = 0
        item_num = 0
        json_list = []
        for i in trange(page):
            html_str = self.driver.page_source
            obj_list = etree.HTML(html_str).xpath('//div[@class="item-list J_redsList"]/div')[item_num:]
            for obj in obj_list:
                item_num += 1
                item = {}
                item['date'] = ''.join([i.strip() for i in obj.xpath('./@data-date')])
                item['url'] = ''.join([i.strip() for i in obj.xpath('./a/@href')])
                item['name'] = ''.join([i.strip() for i in obj.xpath('.//div[@class="title"]//text()')])
                item['price'] = ''.join([i.strip() for i in obj.xpath('.//div[@class="price-box"]//text()')])
                json_list.append(item)
            self.driver.execute_script('window.scrollTo(0,1000000)')
        # file_path = os.path.join(os.path.dirname(__file__) + '/footmark_item.json')
        json_str = json.dumps(json_list)
        with open(self.path + os.sep + 'footmark_item.json', 'w') as f:
            f.write(json_str)

    # Delivery addresses
    def get_addr(self):
        url = 'https://member1.taobao.com/member/fresh/deliver_address.htm'
        self.driver.get(url)
        html_str = self.driver.page_source
        obj_list = etree.HTML(html_str).xpath('//tbody[@class="next-table-body"]/tr')
        data_list = []
        for obj in obj_list:
            item = {}
            item['name'] = obj.xpath('.//td[1]//text()')
            item['area'] = obj.xpath('.//td[2]//text()')
            item['detail_area'] = obj.xpath('.//td[3]//text()')
            item['youbian'] = obj.xpath('.//td[4]//text()')
            item['mobile'] = obj.xpath('.//td[5]//text()')
            data_list.append(item)
        # file_path = os.path.join(os.path.dirname(__file__) + '/addr.json')
        json_str = json.dumps(data_list)
        with open(self.path + os.sep + 'address.json', 'w') as f:
            f.write(json_str)


if __name__ == '__main__':
    # Cookies saved beforehand as a list of dicts in the format driver.add_cookie() accepts
    with open('taobao_cookies.json', 'r') as cookie_file:
        cookie_list = json.load(cookie_file)
    t = TaobaoSpider(cookie_list)
    t.crawl_good_buy_data()
    # t.get_addr()
    # t.get_choucang_item()
    # t.get_footmark_item()


================================================
FILE: Spiders/telephone/main.py
================================================
import json
import os
import re
import sys
import xlsxwriter
import requests
from tkinter.filedialog import askdirectory
from requests.packages.urllib3.exceptions import InsecureRequestWarning
# Suppress the insecure-request warnings triggered by verify=False
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)


class LianTong(object):
    def __init__(self, cookie):
        self.path = askdirectory(title='选择信息保存文件夹')
        if str(self.path) == "":
            sys.exit(1)
        self.session = requests.session()
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
        }
        self.cookie_dict = {}
        for pair in cookie.split(';'):
            pair = pair.strip()
            if not pair:
                continue  # skip empty segments, e.g. after a trailing ';'
            # Split on the first '=' only, because cookie values may contain '='
            name, _, value = pair.partition('=')
            self.cookie_dict[name] = value
        requests.utils.add_dict_to_cookiejar(self.session.cookies, self.cookie_dict)
        self.mobile = None

    def get_user_info(self):
        url = 'http://iservice.10010.com/e3/static/query/searchPerInfoUser/'
        resp = self.session.post(url, headers=self.headers, verify=False)
        file_path = os.path.join(self.path, '10010_user_info.json')
        with open(file_path, 'w', encoding='utf-8') as f:
            f.write(resp.content.decode())

    # Query the bill; a month can be passed in, e.g. http://iservice.10010.com/e3/static/wohistory/bill?dat=201902
    def get_bill_info(self, dat=''):
        try:
            url = 'http://iservice.10010.com/e3/static/wohistory/bill?dat={}'.format(dat)
            self.headers['Referer'] = 'http://iservice.10010.com/e4/skip.html?menuCode=000100020001'
            resp = self.session.post(url, data='', headers=self.headers, verify=False)
            file_path = os.path.join(self.path, '10010_bill_info.json')
            with open(file_path, 'w', encoding='utf-8') as f:
                f.write(resp.content.decode())
        except Exception:
            # An exception here indicates an SMS-code login rather than a service-password login
            pass


class DianXin(object):
    def __init__(self, cookie):
        self.path = askdirectory(title='选择信息保存文件夹')
        if str(self.path) == "":
            sys.exit(1)
        self.session = requests.session()
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
        }
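        cookie_dict = {}
        for pair in cookie.split(';'):
            pair = pair.strip()
            if not pair:
                continue  # skip empty segments, e.g. after a trailing ';'
            # Split on the first '=' only, because cookie values may contain '='
            name, _, value = pair.partition('=')
            cookie_dict[name] = value
        requests.utils.add_dict_to_cookiejar(self.session.cookies, cookie_dict)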
        self.mobile = None
        resp = self.session.get('https://service.sh.189.cn/service/mytelecom/deviceInfo', headers=self.headers,
                                verify=False)
        self.mobile = re.findall(r'var login = "(\d{11})";', resp.content.decode())[0]
        print(self.mobile)

    def get_user_info(self):
        url = 'https://service.sh.189.cn/service/my/basicinfo.do'
        resp = self.session.post(url, data=None, headers=self.headers, verify=False)
        file_path = os.path.join(self.path, '10000_user.json')
        with open(file_path, 'w', encoding='utf-8') as f:
            f.write(resp.content.decode())

    # Query the bill from https://service.sh.189.cn/service/mobileBill.do; the dat argument is currently unused
    def get_bill_info(self, dat=''):
        try:
            url = 'https://service.sh.189.cn/service/mobileBill.do'
            self.headers['Referer'] = 'https://service.sh.189.cn/service/query/bill'
            self.headers['Content-Type'] = 'application/x-www-form-urlencoded; charset=UTF-8'
            print('device={}&acctNum='.format(self.mobile))
            resp = self.session.post(url, data='device={}&acctNum='.format(self.mobile), headers=self.headers,
                                     verify=False)
            file_path = os.path.join(self.path, '10000_bill_info.json')
            with open(file_path, 'w', encoding='utf-8') as f:
                f.write(resp.content.decode())
        except Exception:
            # An exception here indicates an SMS-code login rather than a service-password login
            pass


if __name__ == '__main__':
    pass
    # y = YiDong(
    # y.get_user_info()
    # y.get_bill_info()

    # l = LianTong(
    # l.get_user_info()
    # l.get_bill_info()

    # d = DianXin(
    # # d.get_user_info()
    # d.get_bill_info()

# http://www.189.cn/dqmh/ssoLink.do?method=skip&platNo=93507&toStUrl=http://service.sh.189.cn/service/self_index
# http://ah.189.cn/service/
# http://www.189.cn/dqmh/frontLinkSkip.do?method=skip&shopId=10011&toStUrl=http://js.189.cn/nservice/login/toIndex


================================================
FILE: Spiders/yidong/main.py
================================================
import json
import os
import re
import xlsxwriter
import sys
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
from tkinter.filedialog import askdirectory
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)


class YiDong(object):
    def __init__(self, cookie):
        self.path = askdirectory(title='选择信息保存文件夹')
        if str(self.path) == "":
            sys.exit(1)
        self.session = requests.session()
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
        }
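        cookie_dict = {}
        for pair in cookie.split(';'):
            pair = pair.strip()
            if not pair:
                continue  # skip empty segments, e.g. after a trailing ';'
            # Split on the first '=' only, because cookie values may contain '='
            name, _, value = pair.partition('=')
            cookie_dict[name] = value
        requests.utils.add_dict_to_cookiejar(self.session.cookies, cookie_dict)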
        self.mobile = None

    def get_user_info(self):
        # print('执行----> get_user_info')
        url = 'https://shop.10086.cn/i/v1/auth/loginfo'
        resp = self.session.get(url, headers=self.headers, verify=False)
        self.mobile = json.loads(resp.content.decode())['data']['loginValue']

    def get_bill_info(self):
        # print('执行----> get_bill_info')
        # Get the mobile number from the website
        self.get_user_info()
        # Download the bill string
        bill_json_str = self.get_bill_json()
        # transfer and save the bill
        self.transfer_and_save_bill(bill_json_str)

    def get_bill_json(self):
        # print('执行----> get_bill_json')
        # construct the request url
        begin_month = '202001'
        import datetime
        end_month = datetime.date.today().strftime('%Y%m')
        url = 'https://touch.10086.cn/i/v1/fee/touchbillinfo/'+self.mobile+'?bgnMonth='+begin_month+'&endMonth='+end_month+'&time=202062215373895&channel=02'
        self.headers['Referer'] = 'https://touch.10086.cn/i/mobile/billqry.html'

        # get the bill json from website
        resp = self.session.get(url, headers=self.headers, verify=False)    
        return resp.content.decode()

    def transfer_and_save_bill(self, bill_json_str):
        # print('执行----> transfer_and_save_bill')
        bill_json = json.loads(bill_json_str)
        bill_json_month_lists = bill_json['data']

        bill_details = {}
        for bill_json_month in bill_json_month_lists:
            month = bill_json_month['billMonth']
            item_month = []
            for material in bill_json_month['billMaterials']:
                # extend() adds nothing for an empty list, so no length check is needed
                item_month.extend(material['billMaterialInfos'])
            bill_details[month] = item_month
        with open(self.path + os.sep + 'yidong_bill.json', 'w', encoding='utf-8') as f:
            f.write(json.dumps(bill_details))
        # print(bill_details)
        print('Done.')




================================================
FILE: Spiders/zhihu/main.py
================================================
# import zhihuapi as zhihu
import requests
from tkinter.filedialog import askdirectory

class Zhihu(object):
    def __init__(self, userToken):
        self.path = askdirectory(title='选择信息保存文件夹')
        self.userToken = userToken
        self.session = requests.session()
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36',
        }

    # Write the response text to a JSON file
    def info_write_to_json(self, filename, response):
        json_path = self.path + '/' + filename + '.json'
        with open(json_path, 'w', encoding='utf-8') as f:
            f.write(response)
        return json_path
        
    # Get the user's basic profile
    def get_user_profile(self):
        url = 'https://www.zhihu.com/api/v4/members/' + self.userToken
        resp = self.session.get(url, headers=self.headers).content.decode()
        print(resp)
        self.info_write_to_json('user_profile', resp)

    # Get the users this user follows
    def get_user_followees(self):
        url = 'https://www.zhihu.com/api/v4/members/'  + self.userToken + '/followees'
        resp = self.session.get(url, headers=self.headers).content.decode()
        print(resp)
        self.info_write_to_json('user_followees', resp)
    
    # Get the user's followers
    def get_user_followers(self):
        url = 'https://www.zhihu.com/api/v4/members/'  + self.userToken + '/followers'
        resp = self.session.get(url, headers=self.headers).content.decode()
        print(resp)
        self.info_write_to_json('user_followers', resp)

    # Get the articles published by the user
    def get_user_articles(self):
        url = 'https://www.zhihu.com/api/v4/members/'  + self.userToken + '/articles'
        resp = self.session.get(url, headers=self.headers).content.decode()
        print(resp)
        self.info_write_to_json('user_articles', resp)

    # Get the user's collections
    def get_user_collections(self):
        url = 'https://www.zhihu.com/api/v4/members/'  + self.userToken + '/collections'
        resp = self.session.get(url, headers=self.headers).content.decode()
        print(resp)
        self.info_write_to_json('user_collections', resp)

    # Get the videos published by the user
    def get_user_zvideos(self):
        url = 'https://www.zhihu.com/api/v4/members/'  + self.userToken + '/zvideos'
        resp = self.session.get(url, headers=self.headers).content.decode()
        print(resp)
        self.info_write_to_json('user_zvideos', resp)

    # Get the user's activity feed
    def get_user_activities(self):
        url = 'https://www.zhihu.com/api/v4/members/'  + self.userToken + '/activities'
        resp = self.session.get(url, headers=self.headers).content.decode()
        print(resp)
        self.info_write_to_json('user_activities', resp)

================================================
FILE: docs/.nojekyll
================================================


================================================
FILE: docs/QuickStart.md
================================================

## Prerequisites

* Ubuntu 16.04
* Python3 & pip3
* Chrome Browser and a [Chrome Driver](http://chromedriver.storage.googleapis.com/index.html) of the same version
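
The Chrome and Chrome Driver versions have to line up before Selenium can start the browser. A minimal sketch of such a check follows. It assumes that agreeing on the major version (the first dotted component) is enough, which is an assumption rather than something this guide states, and the helper names `major_version` and `versions_match` are hypothetical. The version strings are placeholders; substitute the ones your own installation reports.

```python
def major_version(version: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version.strip().split(".")[0])


def versions_match(chrome_version: str, driver_version: str) -> bool:
    """True when Chrome and Chrome Driver share the same major version (assumed rule)."""
    return major_version(chrome_version) == major_version(driver_version)


if __name__ == "__main__":
    # Placeholder version strings, not taken from any real installation.
    print(versions_match("83.0.4103.116", "83.0.4103.39"))
    print(versions_match("84.0.4147.89", "83.0.4103.39"))
```

If the check fails, downloading the driver build that corresponds to the installed browser is usually the simplest fix.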

## Installation
```
  $ ./install_deps.sh
```


================================================
FILE: docs/README.md
================================================
# **INFO-SPIDER** 

> A magical toolbox for taking back your personal information.

# **Introduction**
## Developer's Memoir🌈
<details>
<summary>Click to expand👉 Developer's Memoir🌈</summary>

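#### Scenario 1
Xiao Ming opens Chrome as usual and browses forums and Tieba. He accidentally clicks an ad on a page and lands on JD.com. As he reflexively goes to close the window he notices something (**Inner voice: Huh? How does JD know about the item I've been longing for lately? It's exactly what I need!**). Since the page is already open, he looks at the product details (**Inner voice: Hey, not bad!**) and decides to place an order.

#### Scenario 2
Xiao Bai can't tear himself away from NetEase Cloud Music's daily recommended playlist (**Inner voice: Wow! Why is the playlist full of the music styles I like? NetEase Cloud Music is amazing! It really gets me! I have to get a vinyl membership!**), while browsing Zhihu questions such as "How to elegantly XXX?", "What is it like to XXX?" and "What do you think of XXX?" (**Inner voice: Huh? That's exactly the question I wanted to ask, and someone has already asked it! What??? There are thousands of answers!! Let's go in and take a look!**)

#### Scenario 3
Xiao Da keeps learning even at work, browsing tech communities such as cnblogs, CSDN, OSChina, Jianshu and Juejin, and finds the homepage recommendations excellent (**Inner voice: These tech posts are great, and they show up without my even searching**). Opening his own blog homepage, he realizes he has been writing posts for three years and that his tech stack keeps growing (**Inner voice: Why doesn't the blog backend provide a data analysis system? I want to see how many posts I've published over the years and when I published them, which posts are the most popular, which technologies I've spent the most time on, and whether my most productive hours were in the evening or the small hours. I wish the system gave me more guiding data so I could write better!**)

Looking at these scenarios, you may marvel at how technology keeps advancing and has greatly improved the way we live.

But think a little deeper: every website you browse and every website you sign up for records your information and your footprints.

The unsettling truth behind this is that your personal data lies exposed on the internet and many companies use it to make huge profits, for example by collecting and analyzing user data to push targeted ads and charge high advertising fees. Yet the people who produce the data get no share of the revenue it generates.

#### The idea

Imagine a tool that helps you take back your personal information, aggregates the personal data scattered across all kinds of sites, analyzes that data and gives you suggestions, and visualizes it so that you understand yourself more clearly.

> Would you need such a tool? Would you like such a tool?

With this in mind, I started developing **[INFO-SPIDER](https://github.com/kangvcar/InfoSpider)** 👇👇👇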

</details>

## Why INFO-SPIDER

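- Personal data holds enormous value. Data will be at the core of the future world, and this is a trillion-scale market. Many companies make huge profits from user data, for example by collecting and analyzing it to push targeted ads and charge high advertising fees. Yet the end users who produce the data get no share of the revenue it generates.

- Personal data is scattered across many different companies, which often creates data silos in which multi-dimensional data cannot be combined. Many excellent startups are severely constrained as a result: they have algorithms and innovation, but lack a legal and efficient way to access data.

- The [INFO-SPIDER](https://github.com/kangvcar/InfoSpider) project aims to provide the most complete set of tools to help users safely and quickly take their own data back from the data oligarchs, freely choose which data consumers to share it with, mine the gold in their own data, and share in its value.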

## What is INFO-SPIDER

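To turn personal data into an asset, getting your own data back is the first step. Some data oligarchs have started to offer tools that let users export their data freely. Google, for example, already provides a way for users to [download](https://support.google.com/accounts/answer/3024190?hl=en) their own data.

This is a good start, but it is not enough. Many companies still offer no official tool, or only allow a very limited amount of data to be downloaded. The data-retrieval tools currently on the market either cover too few data sources or are closed-source and opaque, so there is no guarantee that the tool itself will not secretly steal the user's data, or even the user's username and password.

[INFO-SPIDER](https://github.com/kangvcar/InfoSpider) aims to help users take back **their own data** safely and quickly. The tool's code is open source and its process is transparent. It also provides **data analysis**, generating chart files from the user's data so that users can understand their information more intuitively and in more depth.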

## Features

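- Safe and reliable: this is an open-source project with concise code; all source code is visible and everything runs locally.
- Easy to use: a GUI is provided; just click the data source you want and follow the prompts.
- Clear structure: all data sources are independent of each other and highly portable. **All spider scripts live in the project's [Spiders](https://github.com/kangvcar/InfoSpider/tree/master/Spiders) folder**.
- Rich data sources: the project currently supports 24+ data sources and is continuously updated.
- Unified data format: all crawled data is stored as JSON.
- Rich personal data: the project crawls as much of your personal data as possible; you can trim it as needed in later processing.
- Data analysis: the project provides visual analysis of personal data; this is currently only partially supported.
- Rich documentation: the project includes complete [documentation](https://infospider.vercel.app) and a [video tutorial](https://www.bilibili.com/video/BV14f4y1R7oF/).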


## Screenshot

![screenshot.png](https://i.loli.net/2020/10/26/4NJyMhrsGPwvxgd.png ':size=80%')

## QuickStart

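### Installing dependencies

1. Install [python3](https://www.python.org/downloads/) and the Chrome browser

2. Install the [driver](http://chromedriver.storage.googleapis.com/index.html) that matches your Chrome browser version

3. Install the dependency libraries with `./install_deps.sh`    (on Windows, `pip install -r requirements.txt` is enough)

!> The toolbox currently runs properly only on Windows and has not yet been tested on Linux/macOS. Later updates will support multiple platforms.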

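### Running the tool

1. Enter the tools directory

2. Run `python3 main.py`

3. In the window that opens, **click a data source button** and **choose where to save the data** when prompted

4. **Enter your username and password** in the browser that pops up; crawling then starts automatically, and the browser closes by itself when it finishes.
   
5. In the corresponding directory you can **view the downloaded data** (xxx.json) and the **data analysis charts** (xxx.html)

?> 👍 Crawling a data source may generate several files, so it is recommended to create a separate folder for each data source.

?> The data analysis feature is still under development and currently supports only some data sources.

!> 😘😘😘 If you hit an error while running the program, or no data is crawled, let us know by opening an [Issue](https://github.com/kangvcar/InfoSpider/issues) on GitHub. We are happy to keep improving this project.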
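
Once a spider has finished, its output can be inspected with a few lines of Python. The sketch below is illustrative and not part of InfoSpider: it assumes every `*.json` file in the chosen folder holds a single valid UTF-8 JSON document, and `load_json_files` is a hypothetical helper name.

```python
import json
import tempfile
from pathlib import Path


def load_json_files(folder):
    """Map each *.json file name in `folder` to its parsed content."""
    results = {}
    for path in sorted(Path(folder).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            results[path.name] = json.load(f)
    return results


if __name__ == "__main__":
    # Demonstrate on a throwaway folder; point `folder` at your own save path instead.
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "example.json").write_text('{"login": "kangvcar"}', encoding="utf-8")
        for name, content in load_json_files(folder).items():
            print(name, type(content).__name__)
```

Keeping one folder per data source, as recommended above, means the returned dictionary only ever mixes files from a single spider.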

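## Purchase a service

?> ***Currently on limited sale at 60% off...***, [take a look](https://mianbaoduo.com/o/bread/aZiTlJo=)

1. The latest maintained version of InfoSpider
2. More comprehensive personal data analysis
3. No need to install any of the program's dependencies: convenient and beginner-friendly
4. A prepackaged program that runs with a double click
5. Step-by-step guidance on how to package InfoSpider
6. One-on-one technical support from the developer
7. ***Free access to the upcoming all-new version 2.0 after purchase***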


<p align="center">
<img src="https://i.loli.net/2020/10/20/IRbLzEmBv9Ktwp4.jpg" alt="wechat" height=50% width=50%/></br>
<a href="https://mianbaoduo.com/o/bread/aZiTlJo="><b>Purchase link</b></a>
</p>

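## Data sources
- [x] GitHub
- [x] QQ Mail
- [x] NetEase Mail
- [x] Aliyun Mail
- [x] Sina Mail
- [x] Hotmail
- [x] Outlook
- [x] JD.com
- [x] Taobao
- [x] Alipay
- [x] China Mobile
- [x] China Unicom
- [x] China Telecom
- [x] Zhihu
- [x] Bilibili
- [x] NetEase Cloud Music
- [x] QQ friends
- [x] QQ groups
- [x] Moments album generation
- [x] Browser history
- [x] 12306
- [x] Cnblogs
- [x] CSDN Blog
- [x] OSChina Blog
- [x] Jianshu

!> 😊 If you cannot find the data source you need, let us know by opening a GitHub [Issue](https://github.com/kangvcar/InfoSpider/issues). We are happy to keep improving the project.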

## Data analysis

- [x] Cnblogs
- [x] CSDN Blog
- [x] OSChina Blog
- [x] Jianshu
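
As a rough idea of what analysis over crawled blog data can look like, the sketch below counts the most frequent words in article titles. The record layout (a list of objects with a `title` field) and the function name `title_word_counts` are assumptions made for illustration; this is not the project's analysis code, and Chinese text would need a proper word segmenter instead of `str.split`.

```python
from collections import Counter


def title_word_counts(articles, column="title", top=3):
    """Count whitespace-separated words across one text column."""
    counter = Counter()
    for article in articles:
        counter.update(str(article.get(column, "")).lower().split())
    return counter.most_common(top)


# Made-up records, for illustration only.
sample = [
    {"title": "Python crawler basics"},
    {"title": "Python data analysis"},
    {"title": "Crawler tips"},
]
print(title_word_counts(sample, top=2))
```

Counts like these are the kind of input a word cloud or bar chart could be drawn from.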

# **Usage Guide**

***
## GitHub

!> **Note**: No login is required. Just enter a GitHub username (for example, kangvcar).
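
No login is needed because a GitHub user's public profile is available from GitHub's public REST API under `https://api.github.com/users/<username>`, the same URLs that appear in the sample data further down. The sketch below builds those URLs and shows one way to fetch them with the standard library. The function names are hypothetical, and this is not necessarily how `Spiders/github/main.py` is implemented.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def github_user_urls(username: str) -> dict:
    """Build public REST API URLs for one GitHub user."""
    base = f"{API_ROOT}/users/{username}"
    return {
        "user": base,
        "repos": f"{base}/repos",
        "followers": f"{base}/followers",
        "following": f"{base}/following",
    }


def fetch_json(url: str):
    """GET a URL and decode the JSON body (needs network access)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


print(github_user_urls("kangvcar")["followers"])
```

Calling `fetch_json(github_user_urls("kangvcar")["user"])` should return a profile object like the one shown under the data description. Unauthenticated requests are rate-limited by GitHub.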

### Steps

1. Click the **GitHub** data source button

    ![github1.png](https://i.loli.net/2020/07/18/EbucsBUhrZkzMvi.png ':size=10%')

2. Enter the GitHub username

    ![github2.png](https://i.loli.net/2020/07/14/aXb9uUZ7lzRpiVD.png ':size=40%')

3. Choose where to save the data

    ![github3.png](https://i.loli.net/2020/07/14/48nPlvr2ZLQdcJH.png ':size=50%')
    
?> 👍 Crawling a single data source may produce several files, so we recommend creating a separate folder for each data source.

4. View the crawled data (JSON format)

    ![github4.png](https://i.loli.net/2020/07/14/7JGaxhQ8S9BDgin.png ':size=50%')

### Data description

?> 👍 Because the data is long, only the main data items are described here. **Click to expand an example**

<details>
<summary>user_infomation.json 👉 Your profile information</summary>

```json
{
  "login": "kangvcar",
  "id": 20273349,
  "node_id": "MDQ6VXNlcjIwMjczMzQ5",
  "avatar_url": "https://avatars2.githubusercontent.com/u/20273349?v=4",
  "gravatar_id": "",
  "url": "https://api.github.com/users/kangvcar",
  "html_url": "https://github.com/kangvcar",
  "followers_url": "https://api.github.com/users/kangvcar/followers",
  "following_url": "https://api.github.com/users/kangvcar/following{/other_user}",
  "gists_url": "https://api.github.com/users/kangvcar/gists{/gist_id}",
  "starred_url": "https://api.github.com/users/kangvcar/starred{/owner}{/repo}",
  "subscriptions_url": "https://api.github.com/users/kangvcar/subscriptions",
  "organizations_url": "https://api.github.com/users/kangvcar/orgs",
  "repos_url": "https://api.github.com/users/kangvcar/repos",
  "events_url": "https://api.github.com/users/kangvcar/events{/privacy}",
  "received_events_url": "https://api.github.com/users/kangvcar/received_events",
  "type": "User",
  "site_admin": false,
  "name": "Kangvcar",
  "company": null,
  "blog": "https://kangvcar.com",
  "location": "Shenzhen, China",
  "email": null,
  "hireable": true,
  "bio": "֪ʶ�Ĺ������ȵĸ���Ʒ",
  "twitter_username": null,
  "public_repos": 76,
  "public_gists": 2,
  "followers": 17,
  "following": 2,
  "created_at": "2016-07-04T02:02:34Z",
  "updated_at": "2020-07-13T17:35:51Z"
}

```

</details>
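
The profile object is easy to summarize once it is loaded. The following sketch uses a trimmed copy of the example above; the function name `summarize_user` is hypothetical and only a handful of fields are read.

```python
import json

# Trimmed copy of the example profile; the remaining fields are omitted.
raw = """
{
  "login": "kangvcar",
  "name": "Kangvcar",
  "public_repos": 76,
  "followers": 17,
  "following": 2,
  "created_at": "2016-07-04T02:02:34Z"
}
"""


def summarize_user(info: dict) -> str:
    """Build a one-line summary from a GitHub profile object."""
    year = info["created_at"][:4]  # ISO 8601 timestamp starts with the year
    return (
        f'{info["login"]}: {info["public_repos"]} public repos, '
        f'{info["followers"]} followers, following {info["following"]}, '
        f"joined {year}"
    )


print(summarize_user(json.loads(raw)))
```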

<details>
<summary>user_followers.json 👉 Your followers</summary>

```json
[
  {
    "login": "huangguangda",
    "id": 30596987,
    "node_id": "MDQ6VXNlcjMwNTk2OTg3",
    "avatar_url": "https://avatars2.githubusercontent.com/u/30596987?v=4",
    "gravatar_id": "",
    "url": "https://api.github.com/users/huangguangda",
    "html_url": "https://github.com/huangguangda",
    "followers_url": "https://api.github.com/users/huangguangda/followers",
    "following_url": "https://api.github.com/users/huangguangda/following{/other_user}",
    "gists_url": "https://api.github.com/users/huangguangda/gists{/gist_id}",
    "starred_url": "https://api.github.com/users/huangguangda/starred{/owner}{/repo}",
    "subscriptions_url": "https://api.github.com/users/huangguangda/subscriptions",
    "organizations_url": "https://api.github.com/users/huangguangda/orgs",
    "repos_url": "https://api.github.com/users/huangguangda/repos",
    "events_url": "https://api.github.com/users/huangguangda/events{/privacy}",
    "received_events_url": "https://api.github.com/users/huangguangda/received_events",
    "type": "User",
    "site_admin": false
  },
  {
    "login": "encoredw",
    "id": 1918624,
    "node_id": "MDQ6VXNlcjE5MTg2MjQ=",
    "avatar_url": "https://avatars2.githubusercontent.com/u/1918624?v=4",
    "gravatar_id": "",
    "url": "https://api.github.com/users/encoredw",
    "html_url": "https://github.com/encoredw",
    "followers_url": "https://api.github.com/users/encoredw/followers",
    "following_url": "https://api.github.com/users/encoredw/following{/other_user}",
    "gists_url": "https://api.github.com/users/encoredw/gists{/gist_id}",
    "starred_url": "https://api.github.com/users/encoredw/starred{/owner}{/repo}",
    "subscriptions_url": "https://api.github.com/users/encoredw/subscriptions",
    "organizations_url": "https://api.github.com/users/encoredw/orgs",
    "repos_url": "https://api.github.com/users/encoredw/repos",
    "events_url": "https://api.github.com/users/encoredw/events{/privacy}",
    "received_events_url": "https://api.github.com/users/encoredw/received_events",
    "type": "User",
    "site_admin": false
  },
  ...
]
```

</details>

<details>
<summary>user_following.json 👉 People you follow</summary>

```json
[
  {
    "login": "dunwu",
    "id": 19661255,
    "node_id": "MDQ6VXNlcjE5NjYxMjU1",
    "avatar_url": "https://avatars3.githubusercontent.com/u/19661255?v=4",
    "gravatar_id": "",
    "url": "https://api.github.com/users/dunwu",
    "html_url": "https://github.com/dunwu",
    "followers_url": "https://api.github.com/users/dunwu/followers",
    "following_url": "https://api.github.com/users/dunwu/following{/other_user}",
    "gists_url": "https://api.github.com/users/dunwu/gists{/gist_id}",
    "starred_url": "https://api.github.com/users/dunwu/starred{/owner}{/repo}",
    "subscriptions_url": "https://api.github.com/users/dunwu/subscriptions",
    "organizations_url": "https://api.github.com/users/dunwu/orgs",
    "repos_url": "https://api.github.com/users/dunwu/repos",
    "events_url": "https://api.github.com/users/dunwu/events{/privacy}",
    "received_events_url": "https://api.github.com/users/dunwu/received_events",
    "type": "User",
    "site_admin": false
  },
  {
    "login": "fengdu78",
    "id": 26119052,
    "node_id": "MDQ6VXNlcjI2MTE5MDUy",
    "avatar_url": "https://avatars1.githubusercontent.com/u/26119052?v=4",
    "gravatar_id": "",
    "url": "https://api.github.com/users/fengdu78",
    "html_url": "https://github.com/fengdu78",
    "followers_url": "https://api.github.com/users/fengdu78/followers",
    "following_url": "https://api.github.com/users/fengdu78/following{/other_user}",
    "gists_url": "https://api.github.com/users/fengdu78/gists{/gist_id}",
    "starred_url": "https://api.github.com/users/fengdu78/starred{/owner}{/repo}",
    "subscriptions_url": "https://api.github.com/users/fengdu78/subscriptions",
    "organizations_url": "https://api.github.com/users/fengdu78/orgs",
    "repos_url": "https://api.github.com/users/fengdu78/repos",
    "events_url": "https://api.github.com/users/fengdu78/events{/privacy}",
    "received_events_url": "https://api.github.com/users/fengdu78/received_events",
    "type": "User",
    "site_admin": false
  }
]

```

</details>
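
Because the follower and following files share the same per-user layout, they can be compared by their `login` field. The sketch below is one possible way to do that; the function names are hypothetical, and the input keeps only the `login` values of the entries shown in the two examples, so the follower list is incomplete.

```python
def logins(users):
    """Collect the `login` field from a list of user objects."""
    return {user["login"] for user in users}


def compare_follow_lists(followers, following):
    """Split accounts into mutual follows and one-way follows."""
    follower_logins = logins(followers)
    following_logins = logins(following)
    return {
        "mutual": sorted(follower_logins & following_logins),
        "only_following": sorted(following_logins - follower_logins),
        "only_followers": sorted(follower_logins - following_logins),
    }


# Only the `login` field of the sample entries is kept here.
followers = [{"login": "huangguangda"}, {"login": "encoredw"}]
following = [{"login": "dunwu"}, {"login": "fengdu78"}]
print(compare_follow_lists(followers, following))
```

In practice the two lists would come from loading `user_followers.json` and `user_following.json` with `json.load`.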

<details>
<summary>user_repository.json 👉 Your repositories</summary>

```json
[
  {
    "id": 177291814,
    "node_id": "MDEwOlJlcG9zaXRvcnkxNzcyOTE4MTQ=",
    "name": "960-Grid-System",
    "full_name": "kangvcar/960-Grid-System",
    "private": false,
    "owner": {
      "login": "kangvcar",
      "id": 20273349,
      "node_id": "MDQ6VXNlcjIwMjczMzQ5",
      "avatar_url": "https://avatars2.githubusercontent.com/u/20273349?v=4",
      "gravatar_id": "",
      "url": "https://api.github.com/users/kangvcar",
      "html_url": "https://github.com/kangvcar",
      "followers_url": "https://api.github.com/users/kangvcar/followers",
      "following_url": "https://api.github.com/users/kangvcar/following{/other_user}",
      "gists_url": "https://api.github.com/users/kangvcar/gists{/gist_id}",
      "starred_url": "https://api.github.com/users/kangvcar/starred{/owner}{/repo}",
      "subscriptions_url": "https://api.github.com/users/kangvcar/subscriptions",
      "organizations_url": "https://api.github.com/users/kangvcar/orgs",
      "repos_url": "https://api.github.com/users/kangvcar/repos",
      "events_url": "https://api.github.com/users/kangvcar/events{/privacy}",
      "received_events_url": "https://api.github.com/users/kangvcar/received_events",
      "type": "User",
      "site_admin": false
    },
    "html_url": "https://github.com/kangvcar/960-Grid-System",
    "description": "The 960 Grid System is an effort to streamline web development workflow.",
    "fork": true,
    "url": "https://api.github.com/repos/kangvcar/960-Grid-System",
    "forks_url": "https://api.github.com/repos/kangvcar/960-Grid-System/forks",
    "keys_url": "https://api.github.com/repos/kangvcar/960-Grid-System/keys{/key_id}",
    "collaborators_url": "https://api.github.com/repos/kangvcar/960-Grid-System/collaborators{/collaborator}",
    "teams_url": "https://api.github.com/repos/kangvcar/960-Grid-System/teams",
    "hooks_url": "https://api.github.com/repos/kangvcar/960-Grid-System/hooks",
    "issue_events_url": "https://api.github.com/repos/kangvcar/960-Grid-System/issues/events{/number}",
    "events_url": "https://api.github.com/repos/kangvcar/960-Grid-System/events",
    "assignees_url": "https://api.github.com/repos/kangvcar/960-Grid-System/assignees{/user}",
    "branches_url": "https://api.github.com/repos/kangvcar/960-Grid-System/branches{/branch}",
    "tags_url": "https://api.github.com/repos/kangvcar/960-Grid-System/tags",
    "blobs_url": "https://api.github.com/repos/kangvcar/960-Grid-System/git/blobs{/sha}",
    "git_tags_url": "https://api.github.com/repos/kangvcar/960-Grid-System/git/tags{/sha}",
    "git_refs_url": "https://api.github.com/repos/kangvcar/960-Grid-System/git/refs{/sha}",
    "trees_url": "https://api.github.com/repos/kangvcar/960-Grid-System/git/trees{/sha}",
    "statuses_url": "https://api.github.com/repos/kangvcar/960-Grid-System/statuses/{sha}",
    "languages_url": "https://api.github.com/repos/kangvcar/960-Grid-System/languages",
    "stargazers_url": "https://api.github.com/repos/kangvcar/960-Grid-System/stargazers",
    "contributors_url": "http

SYMBOL INDEX (369 symbols across 33 files)

FILE: Spiders/A12306/main12306.py
  class Info (line 18) | class Info(object):
    method __init__ (line 19) | def __init__(self, cookie):
    method get_user_info (line 37) | def get_user_info(self):
    method get_OrderNoComplete (line 45) | def get_OrderNoComplete(self):
    method get_Order (line 54) | def get_Order(self):
    method get_passengers (line 75) | def get_passengers(self):
    method get_address (line 85) | def get_address(self):
    method get_insurance (line 94) | def get_insurance(self):
    method get_History_Order (line 114) | def get_History_Order(self):
    method get_level (line 140) | def get_level(self):
    method save_json (line 147) | def save_json(self, name, ret):

FILE: Spiders/JdSpider/jd_more_info.py
  class JSpider (line 15) | class JSpider(object):
    method __init__ (line 16) | def __init__(self, cookie, data_dir="./"):
    method get_user_info (line 35) | def get_user_info(self):
    method write_json (line 48) | def write_json(self, name, str):
    method get_creditData (line 61) | def get_creditData(self):
    method get_browseDataNew (line 67) | def get_browseDataNew(self):
    method get_income (line 73) | def get_income(self):
    method get_addr (line 79) | def get_addr(self):
    method get_YHK (line 107) | def get_YHK(self):
    method get_xjk_info (line 122) | def get_xjk_info(self):
    method get_finance_income (line 128) | def get_finance_income(self):
    method get_GB_num (line 134) | def get_GB_num(self):
    method get_JY_bill (line 142) | def get_JY_bill(self):
    method get_follow_shops (line 149) | def get_follow_shops(self):
    method get_follow_products (line 164) | def get_follow_products(self):
    method get_cart (line 181) | def get_cart(self):
    method get_orders (line 199) | def get_orders(self):
    method getAndStoreBoughtItems (line 255) | def getAndStoreBoughtItems(self):
    method getOnePageOrder (line 272) | def getOnePageOrder(self, year):
    method parseOnePageOrder (line 293) | def parseOnePageOrder(self, resultHtml):
    method getOrderOfNormal (line 332) | def getOrderOfNormal(self, orderDetailUrl, resultHtml=None):
    method getOrderOfChongzhi (line 410) | def getOrderOfChongzhi(self, orderDetailUrl, resultHtml=None):
    method changeOrderParseResultListToTable (line 470) | def changeOrderParseResultListToTable(self, orderParseResultList):
    method writeDatatableIntoFile (line 496) | def writeDatatableIntoFile(self, filename, datatable):

FILE: Spiders/alipay/main.py
  class ASpider (line 12) | class ASpider(object):
    method __init__ (line 13) | def __init__(self, cookie):
    method get_user_info (line 31) | def get_user_info(self):
    method write_json (line 43) | def write_json(self, name, str):
    method get_YEB (line 48) | def get_YEB(self):
    method get_bills (line 59) | def get_bills(self):

FILE: Spiders/bilibili/main.py
  class BilibiliHistory (line 7) | class BilibiliHistory(object):
    method __init__ (line 8) | def __init__(self, cookie_str):
    method get_all_bili_history (line 21) | def get_all_bili_history(self):
    method get_user_info (line 41) | def get_user_info(self):
    method req_get (line 47) | def req_get(self, headers, url):
    method save (line 51) | def save(self, data, filename):
    method get_header (line 56) | def get_header(self):

FILE: Spiders/browser/main.py
  class Browserhistory (line 14) | class Browserhistory(object):
    method __init__ (line 15) | def __init__(self):
    method timestamp_format (line 32) | def timestamp_format(self, timestamp):
    method data_save_as_json (line 40) | def data_save_as_json(self, data):

FILE: Spiders/chsi/main.py
  class Chis (line 9) | class Chis(object):
    method __init__ (line 10) | def __init__(self, cookie):
    method get_xueji_info (line 25) | def get_xueji_info(self):
    method get_report (line 55) | def get_report(self):
    method save_ret (line 147) | def save_ret(self, url, name):

FILE: Spiders/cloudmusic/main.py
  class Cloudmusic (line 7) | class Cloudmusic(object):
    method __init__ (line 8) | def __init__(self, username, password):
    method login_refresh (line 24) | def login_refresh(self):
    method user_login_as_cellphone (line 30) | def user_login_as_cellphone(self):
    method user_login_as_email (line 43) | def user_login_as_email(self):
    method data_wirte_to_json (line 56) | def data_wirte_to_json(self, filename, context):
    method get_user_detail (line 63) | def get_user_detail(self):
    method get_playlist (line 71) | def get_playlist(self):
    method get_user_follows (line 79) | def get_user_follows(self):
    method get_user_followeds (line 87) | def get_user_followeds(self):
    method get_user_event (line 95) | def get_user_event(self):
    method get_user_record_week (line 103) | def get_user_record_week(self):
    method get_user_record_all (line 111) | def get_user_record_all(self):

FILE: Spiders/cnblog/main.py
  class Cnblog (line 16) | class Cnblog(object):
    method __init__ (line 17) | def __init__(self, blogname):
    method get_element_of_article (line 23) | def get_element_of_article(self):
    method save_as_json (line 63) | def save_as_json(self, content_json):
    method get_text (line 70) | def get_text(self, json_file, column='title'):
    method split_word (line 78) | def split_word(self, text):
    method word_counter (line 92) | def word_counter(self, words):
    method create_wordcloud (line 100) | def create_wordcloud(self, json_file, title='词云', column='title'):
    method create_postdate_line (line 111) | def create_postdate_line(self, json_file, title='折线图', column='postdat...

FILE: Spiders/csdn/main.py
  class Csdn (line 9) | class Csdn(object):
    method __init__ (line 10) | def __init__(self, blogname):
    method get_element_of_article (line 19) | def get_element_of_article(self):
    method save_as_json (line 58) | def save_as_json(self, content_json):

FILE: Spiders/ctrip/main.py
  class Ctrip (line 9) | class Ctrip(object):
    method __init__ (line 10) | def __init__(self, cookie):
    method get_json_order (line 25) | def get_json_order(self):
    method transfer_and_save (line 52) | def transfer_and_save(self, json_str):
    method get_order (line 75) | def get_order(self):

FILE: Spiders/github/main.py
  class Github (line 7) | class Github(object):
    method __init__ (line 8) | def __init__(self, username):
    method get_user_info (line 18) | def get_user_info(self):
    method get_user_repos (line 28) | def get_user_repos(self):
    method get_user_following (line 38) | def get_user_following(self):
    method get_user_followers (line 48) | def get_user_followers(self):
    method get_user_activity (line 58) | def get_user_activity(self):
    method get_user_repos_detail (line 68) | def get_user_repos_detail(self):

FILE: Spiders/jianshu/main.py
  class Jianshu (line 9) | class Jianshu(object):
    method __init__ (line 10) | def __init__(self, blogurl):
    method get_element_of_article (line 19) | def get_element_of_article(self):
    method save_as_json (line 62) | def save_as_json(self, content_json):

FILE: Spiders/mail/main.py
  class YSpider (line 17) | class YSpider(object):
    method gen_session (line 18) | def gen_session(self, cookie):
    method write_json (line 32) | def write_json(self, name, str):
    method qq_mail (line 38) | def qq_mail(self, cookie, sid):
    method sinamail (line 86) | def sinamail(self, cookie):
    method gen_driver (line 125) | def gen_driver(self, cookies_list):
    method get_hotmail (line 143) | def get_hotmail(self, cookie_list):
    method get_aliyun_mail (line 171) | def get_aliyun_mail(self, cookie):
    method get_wangyi (line 201) | def get_wangyi(self, cookie):
  class Xml2Json (line 256) | class Xml2Json:
    method __init__ (line 259) | def __init__(self, data=None):
    method feed (line 269) | def feed(self, data):
    method close (line 274) | def close(self):
    method start (line 278) | def start(self, tag, attrs):
    method end (line 282) | def end(self, tag):
    method data (line 308) | def data(self, data):

FILE: Spiders/moments_album/main.py
  class Momentsablum (line 14) | class Momentsablum(object):
    method __init__ (line 15) | def __init__(self):
    method input_url (line 21) | def input_url(self, driver):
    method make_album (line 64) | def make_album(self):

FILE: Spiders/oschina/main.py
  class Oschina (line 9) | class Oschina(object):
    method __init__ (line 10) | def __init__(self, blogurl):
    method get_element_of_article (line 19) | def get_element_of_article(self):
    method save_as_json (line 62) | def save_as_json(self, content_json):

FILE: Spiders/qqfriend/main.py
  class Qqfriend (line 13) | class Qqfriend(object):
    method __init__ (line 14) | def __init__(self):
    method callback_json (line 63) | def callback_json(self):
    method close_chrome (line 99) | def close_chrome(self):

FILE: Spiders/qqqun/main.py
  class Qqqun (line 16) | class Qqqun(object):
    method __init__ (line 17) | def __init__(self):
    method delNT (line 37) | def delNT(self, s):
    method callback_json (line 92) | def callback_json(self):
    method close_chrome (line 153) | def close_chrome(self):

FILE: Spiders/shgjj/main.py
  class GjjSpider (line 7) | class GjjSpider(object):
    method __init__ (line 8) | def __init__(self, cookie, token):
    method write_json (line 24) | def write_json(self, name, str):
    method get_priaccountForWeb (line 30) | def get_priaccountForWeb(self):
    method get_accountForWeb (line 37) | def get_accountForWeb(self):

FILE: Spiders/taobao/spider.py
  function ease_out_quad (line 21) | def ease_out_quad(x):
  function ease_out_quart (line 24) | def ease_out_quart(x):
  function ease_out_expo (line 27) | def ease_out_expo(x):
  function get_tracks (line 33) | def get_tracks(distance, seconds, ease_func):
  function drag_and_drop (line 43) | def drag_and_drop(browser, offset=26.5):
  function gen_session (line 51) | def gen_session(cookie):
  class TaobaoSpider (line 63) | class TaobaoSpider(object):
    method __init__ (line 64) | def __init__(self, cookies_list):
    method swipe_down (line 83) | def swipe_down(self, second):
    method crawl_good_buy_data (line 98) | def crawl_good_buy_data(self, pn=3):
    method get_choucang_item (line 175) | def get_choucang_item(self, page=3):
    method get_footmark_item (line 202) | def get_footmark_item(self, page=3):
    method get_addr (line 226) | def get_addr(self):

FILE: Spiders/telephone/main.py
  class LianTong (line 13) | class LianTong(object):
    method __init__ (line 14) | def __init__(self, cookie):
    method get_user_info (line 32) | def get_user_info(self):
    method get_bill_info (line 41) | def get_bill_info(self, dat=''):
  class DianXin (line 55) | class DianXin(object):
    method __init__ (line 56) | def __init__(self, cookie):
    method get_user_info (line 78) | def get_user_info(self):
    method get_bill_info (line 86) | def get_bill_info(self, dat=''):

FILE: Spiders/yidong/main.py
  class YiDong (line 12) | class YiDong(object):
    method __init__ (line 13) | def __init__(self, cookie):
    method get_user_info (line 31) | def get_user_info(self):
    method get_bill_info (line 37) | def get_bill_info(self):
    method get_bill_json (line 46) | def get_bill_json(self):
    method transfer_and_save_bill (line 60) | def transfer_and_save_bill(self, bill_json_str):

FILE: Spiders/zhihu/main.py
  class Zhihu (line 5) | class Zhihu(object):
    method __init__ (line 6) | def __init__(self, userToken):
    method info_write_to_json (line 15) | def info_write_to_json(self, filename, response):
    method get_user_profile (line 22) | def get_user_profile(self):
    method get_user_followees (line 29) | def get_user_followees(self):
    method get_user_followers (line 36) | def get_user_followers(self):
    method get_user_articles (line 43) | def get_user_articles(self):
    method get_user_collections (line 50) | def get_user_collections(self):
    method get_user_zvideos (line 57) | def get_user_zvideos(self):
    method get_user_activities (line 64) | def get_user_activities(self):

FILE: extension/js/FileSaver.js
  function bom (line 29) | function bom(blob, opts) {
  function download (line 49) | function download(url, name, opts) {
  function corsEnabled (line 65) | function corsEnabled(url) {
  function click (line 78) | function click(node) {

FILE: extension/js/cnblog/cnblogrun2.js
  function bom (line 29) | function bom(blob, opts) {
  function download (line 49) | function download(url, name, opts) {
  function corsEnabled (line 65) | function corsEnabled(url) {
  function click (line 78) | function click(node) {

FILE: extension/js/github/githubrun5.js
  function bom (line 29) | function bom(blob, opts) {
  function download (line 49) | function download(url, name, opts) {
  function corsEnabled (line 65) | function corsEnabled(url) {
  function click (line 78) | function click(node) {

FILE: extension/js/jianshu/jianshurun2.js
  function bom (line 29) | function bom(blob, opts) {
  function download (line 49) | function download(url, name, opts) {
  function corsEnabled (line 65) | function corsEnabled(url) {
  function click (line 78) | function click(node) {

FILE: extension/js/jquery.js
  function DOMEval (line 107) | function DOMEval( code, node, doc ) {
  function toType (line 137) | function toType( obj ) {
  function isArrayLike (line 507) | function isArrayLike( obj ) {
  function Sizzle (line 759) | function Sizzle( selector, context, results, seed ) {
  function createCache (line 907) | function createCache() {
  function markFunction (line 927) | function markFunction( fn ) {
  function assert (line 936) | function assert( fn ) {
  function addHandle (line 960) | function addHandle( attrs, handler ) {
  function siblingCheck (line 975) | function siblingCheck( a, b ) {
  function createInputPseudo (line 1001) | function createInputPseudo( type ) {
  function createButtonPseudo (line 1012) | function createButtonPseudo( type ) {
  function createDisabledPseudo (line 1023) | function createDisabledPseudo( disabled ) {
  function createPositionalPseudo (line 1079) | function createPositionalPseudo( fn ) {
  function testContext (line 1102) | function testContext( context ) {
  function setFilters (line 2313) | function setFilters() {}
  function toSelector (line 2387) | function toSelector( tokens ) {
  function addCombinator (line 2397) | function addCombinator( matcher, combinator, base ) {
  function elementMatcher (line 2464) | function elementMatcher( matchers ) {
  function multipleContexts (line 2478) | function multipleContexts( selector, contexts, results ) {
  function condense (line 2487) | function condense( unmatched, map, filter, context, xml ) {
  function setMatcher (line 2508) | function setMatcher( preFilter, selector, matcher, postFilter, postFinde...
  function matcherFromTokens (line 2608) | function matcherFromTokens( tokens ) {
  function matcherFromGroupMatchers (line 2671) | function matcherFromGroupMatchers( elementMatchers, setMatchers ) {
  function nodeName (line 3029) | function nodeName( elem, name ) {
  function winnow (line 3039) | function winnow( elements, qualifier, not ) {
  function sibling (line 3334) | function sibling( cur, dir ) {
  function createOptions (line 3427) | function createOptions( options ) {
  function Identity (line 3652) | function Identity( v ) {
  function Thrower (line 3655) | function Thrower( ex ) {
  function adoptValue (line 3659) | function adoptValue( value, resolve, reject, noValue ) {
  function resolve (line 3752) | function resolve( depth, deferred, handler, special ) {
  function completed (line 4117) | function completed() {
  function fcamelCase (line 4212) | function fcamelCase( _all, letter ) {
  function camelCase (line 4219) | function camelCase( string ) {
  function Data (line 4236) | function Data() {
  function getData (line 4405) | function getData( data ) {
  function dataAttr (line 4430) | function dataAttr( elem, key, data ) {
  function adjustCSS (line 4742) | function adjustCSS( elem, prop, valueParts, tween ) {
  function getDefaultDisplay (line 4810) | function getDefaultDisplay( elem ) {
  function showHide (line 4833) | function showHide( elements, show ) {
  function getAll (line 4965) | function getAll( context, tag ) {
  function setGlobalEval (line 4990) | function setGlobalEval( elems, refElements ) {
  function buildFragment (line 5006) | function buildFragment( elems, context, scripts, selection, ignored ) {
  function returnTrue (line 5098) | function returnTrue() {
  function returnFalse (line 5102) | function returnFalse() {
  function expectSync (line 5112) | function expectSync( elem, type ) {
  function safeActiveElement (line 5119) | function safeActiveElement() {
  function on (line 5125) | function on( elem, types, selector, data, fn, one ) {
  function leverageNative (line 5613) | function leverageNative( el, type, expectSync ) {
  function manipulationTarget (line 5962) | function manipulationTarget( elem, content ) {
  function disableScript (line 5973) | function disableScript( elem ) {
  function restoreScript (line 5977) | function restoreScript( elem ) {
  function cloneCopyEvent (line 5987) | function cloneCopyEvent( src, dest ) {
  function fixInput (line 6020) | function fixInput( src, dest ) {
  function domManip (line 6033) | function domManip( collection, args, callback, ignored ) {
  function remove (line 6125) | function remove( elem, selector, keepData ) {
  function computeStyleTests (line 6439) | function computeStyleTests() {
  function roundPixelMeasures (line 6483) | function roundPixelMeasures( measure ) {
  function curCSS (line 6576) | function curCSS( elem, name, computed ) {
  function addGetHookIf (line 6629) | function addGetHookIf( conditionFn, hookFn ) {
  function vendorPropName (line 6654) | function vendorPropName( name ) {
  function finalPropName (line 6669) | function finalPropName( name ) {
  function setPositiveNumber (line 6695) | function setPositiveNumber( _elem, value, subtract ) {
  function boxModelAdjustment (line 6707) | function boxModelAdjustment( elem, dimension, box, isBorderBox, styles, ...
  function getWidthOrHeight (line 6775) | function getWidthOrHeight( elem, dimension, extra ) {
  function Tween (line 7151) | function Tween( elem, options, prop, end, easing ) {
  function schedule (line 7274) | function schedule() {
  function createFxNow (line 7287) | function createFxNow() {
  function genFx (line 7295) | function genFx( type, includeWidth ) {
  function createTween (line 7315) | function createTween( value, prop, animation ) {
  function defaultPrefilter (line 7329) | function defaultPrefilter( elem, props, opts ) {
  function propFilter (line 7501) | function propFilter( props, specialEasing ) {
  function Animation (line 7538) | function Animation( elem, properties, options ) {
  function stripAndCollapse (line 8254) | function stripAndCollapse( value ) {
  function getClass (line 8260) | function getClass( elem ) {
  function classesToArray (line 8264) | function classesToArray( value ) {
  function buildParams (line 8894) | function buildParams( prefix, obj, traditional, add ) {
  function addToPrefiltersOrTransports (line 9047) | function addToPrefiltersOrTransports( structure ) {
  function inspectPrefiltersOrTransports (line 9081) | function inspectPrefiltersOrTransports( structure, options, originalOpti...
  function ajaxExtend (line 9110) | function ajaxExtend( target, src ) {
  function ajaxHandleResponses (line 9130) | function ajaxHandleResponses( s, jqXHR, responses ) {
  function ajaxConvert (line 9188) | function ajaxConvert( s, response, jqXHR, isSuccess ) {
  function done (line 9704) | function done( status, nativeStatusText, responses, headers ) {

FILE: extension/js/oschina/oschinarun0.js
  function bom (line 29) | function bom(blob, opts) {
  function download (line 49) | function download(url, name, opts) {
  function corsEnabled (line 65) | function corsEnabled(url) {
  function click (line 78) | function click(node) {

FILE: tests/DeepAnalysis/dataprocess.py
  class StockDataset (line 51) | class StockDataset(Dataset):
    method __init__ (line 52) | def __init__(self, x, y):
    method __getitem__ (line 55) | def __getitem__(self, index):
    method __len__ (line 57) | def __len__(self):

FILE: tests/DeepAnalysis/model.py
  class LSTM (line 10) | class LSTM(nn.Module):
    method __init__ (line 11) | def __init__(self, input_size=1, hidden_size=100, output_size=1):
    method init_hidden (line 18) | def init_hidden(self):
    method forward (line 22) | def forward(self, input):

FILE: tests/DeepAnalysis/trainer.py
  function read_data (line 13) | def read_data():
  function normalize_data (line 22) | def normalize_data(data):
  function create_train_data (line 28) | def create_train_data(data, seq_len):
  function create_test_data (line 41) | def create_test_data(data, seq_len):
  function train_model (line 51) | def train_model(x_train, y_train, x_test, y_test):
  function predict (line 69) | def predict(model, x_test, y_test, scaler):
  function evaluate_model (line 79) | def evaluate_model(y_pred, y_test):

FILE: tests/ctrip/main.py
  class SpiderHelper (line 17) | class SpiderHelper:
    method __init__ (line 18) | def __init__(self):
    method Automation (line 21) | def Automation(self, url):
    method getCookie3 (line 28) | def getCookie3(self, login_url, quit):
    method getCookie2 (line 43) | def getCookie2(self, login_url, curr_url, extra_url, quit):
    method getCookie (line 60) | def getCookie(self, login):

FILE: tools/main.py
  class Button (line 43) | class Button:
    method __init__ (line 45) | def __init__(self, frame, pnl, item):
    method Automation (line 53) | def Automation(self, url):
    method getCookie3 (line 65) | def getCookie3(self, login_url, quit):
    method getCookie2 (line 81) | def getCookie2(self, login_url, curr_url, extra_url, quit):
    method getCookie4 (line 105) | def getCookie4(self, login_url, curr_url, quit):
    method getCookie (line 127) | def getCookie(self, login):
    method updateStatus (line 134) | def updateStatus(self, frame, status):
  class JdButton (line 150) | class JdButton(Button):
    method OnClick (line 151) | def OnClick(self, event):
  class ChisButton (line 185) | class ChisButton(Button):
    method OnClick (line 186) | def OnClick(self, event):
  class YidongButton (line 209) | class YidongButton(Button):
    method OnClick (line 210) | def OnClick(self, event):
  class GjjButton (line 233) | class GjjButton(Button):
    method OnClick (line 234) | def OnClick(self, event):
  class A12306Button (line 266) | class A12306Button(Button):
    method OnClick (line 267) | def OnClick(self, event):
  class CtripButton (line 306) | class CtripButton(Button):
    method OnClick (line 307) | def OnClick(self, event):
  class LiantongButton (line 318) | class LiantongButton(Button):
    method OnClick (line 319) | def OnClick(self, event):
  class DianxingButton (line 345) | class DianxingButton(Button):
    method OnClick (line 346) | def OnClick(self, event):
  class WymailButton (line 371) | class WymailButton(Button):
    method OnClick (line 372) | def OnClick(self, event):
  class HotmailButton (line 394) | class HotmailButton(Button):
    method OnClick (line 395) | def OnClick(self, event):
  class QqmailButton (line 418) | class QqmailButton(Button):
    method OnClick (line 419) | def OnClick(self, event):
  class AlimailButton (line 446) | class AlimailButton(Button):
    method OnClick (line 447) | def OnClick(self, event):
  class XlmailButton (line 472) | class XlmailButton(Button):
    method OnClick (line 473) | def OnClick(self, event):
  class TaobaoButton (line 500) | class TaobaoButton(Button):
    method OnClick (line 501) | def OnClick(self, event):
  class ZfbButton (line 534) | class ZfbButton(Button):
    method OnClick (line 535) | def OnClick(self, event):
  class GithubButton (line 561) | class GithubButton(Button):
    method OnClick (line 562) | def OnClick(self, event):
  class QqButton (line 583) | class QqButton(Button):
    method OnClick (line 584) | def OnClick(self, event):
  class QqqunButton (line 606) | class QqqunButton(Button):
    method OnClick (line 607) | def OnClick(self, event):
  class ZhihuButton (line 627) | class ZhihuButton(Button):
    method OnClick (line 628) | def OnClick(self, event):
  class CloudmusicButton (line 658) | class CloudmusicButton(Button):
    method OnClick (line 659) | def OnClick(self, event):
  class BilibiliButton (line 686) | class BilibiliButton(Button):
    method OnClick (line 687) | def OnClick(self, event):
  class WechatButton (line 704) | class WechatButton(Button):
    method OnClick (line 705) | def OnClick(self, event):
  class WechatmomentButton (line 717) | class WechatmomentButton(Button):
    method OnClick (line 718) | def OnClick(self, event):
  class MomentsalbumButton (line 721) | class MomentsalbumButton(Button):
    method OnClick (line 722) | def OnClick(self, event):
  class BrowserButton (line 751) | class BrowserButton(Button):
    method OnClick (line 752) | def OnClick(self, event):
  class CnblogButton (line 762) | class CnblogButton(Button):
    method OnClick (line 763) | def OnClick(self, event):
  class CsdnButton (line 781) | class CsdnButton(Button):
    method OnClick (line 782) | def OnClick(self, event):
  class OschinaButton (line 798) | class OschinaButton(Button):
    method OnClick (line 799) | def OnClick(self, event):
  class JianshuButton (line 815) | class JianshuButton(Button):
    method OnClick (line 816) | def OnClick(self, event):
  class Item (line 832) | class Item:
    method __init__ (line 837) | def __init__(self, x, y, title, img):
  class CreateFrame (line 843) | class CreateFrame(wx.Frame):
    method __init__ (line 845) | def __init__(self, *args, **kw):


Condensed preview: 65 files, each listed with its path, character count, and a content snippet. The full structured content is 803K chars.

[
  {
    "path": ".github/FUNDING.yml",
    "chars": 104,
    "preview": "# These are supported funding model platforms\nko_fi: kangvcar\ncustom: ['https://afdian.net/a/kangvcar']\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.md",
    "chars": 733,
    "preview": "---\nname:  \"\\U0001F41B Bug Report\"\nabout: \"If something isn't working as expected \\U0001F914.\"\ntitle: ''\nlabels: 'bug'\na"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.md",
    "chars": 0,
    "preview": ""
  },
  {
    "path": ".gitignore",
    "chars": 54,
    "preview": "*.pyc\n*.json\n*.xlsx\n*.swp\ndata\n.idea\n*.log\n__pycache__"
  },
  {
    "path": "LICENSE",
    "chars": 35120,
    "preview": "GNU GENERAL PUBLIC LICENSE\n                       Version 3, 29 June 2007\n\n Copyright (C) 2007 Free Software Foundation,"
  },
  {
    "path": "README.md",
    "chars": 6697,
    "preview": "<p align=\"center\">\n    <img src=\"https://i.loli.net/2020/10/20/SKOdFZpVYo4LvgT.png\" alt=\"InfoSpider logo\"/>\n</p>\n\n***\n\n<"
  },
  {
    "path": "README_EN.md",
    "chars": 7202,
    "preview": "<p align=\"center\">\n    <img src=\"https://i.loli.net/2020/10/20/SKOdFZpVYo4LvgT.png\" alt=\"logo\"/>\n</p>\n\n***\n\n<p align=\"ce"
  },
  {
    "path": "Spiders/A12306/main12306.py",
    "chars": 7322,
    "preview": "import json\nimport datetime\nimport os\nimport sys\nimport requests\nfrom tkinter.filedialog import askdirectory\n\n# session "
  },
  {
    "path": "Spiders/JdSpider/jd_more_info.py",
    "chars": 23789,
    "preview": "# coding: utf8\nimport json\nimport os\nimport re\nimport sys\nimport requests\nfrom lxml import etree\nimport datetime\nimport "
  },
  {
    "path": "Spiders/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "Spiders/alipay/main.py",
    "chars": 5170,
    "preview": "import json\nimport os\nimport re\nimport os\nimport requests\nfrom lxml import etree\nfrom selenium import webdriver\nfrom sel"
  },
  {
    "path": "Spiders/bilibili/main.py",
    "chars": 2438,
    "preview": "import os\nimport json\nimport time\nimport requests\nfrom tkinter.filedialog import askdirectory\n\nclass BilibiliHistory(obj"
  },
  {
    "path": "Spiders/browser/main.py",
    "chars": 2105,
    "preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\nimport os\nimport sys\nimport json\nimport time\nimport sqlite3  \nimport opera"
  },
  {
    "path": "Spiders/chsi/main.py",
    "chars": 6050,
    "preview": "import json\nimport os\nimport re\n\nimport requests\nfrom lxml import etree\n\n\nclass Chis(object):\n    def __init__(self, coo"
  },
  {
    "path": "Spiders/cloudmusic/main.py",
    "chars": 4160,
    "preview": "import requests\nimport json\nimport re\nimport time\nfrom tkinter.filedialog import askdirectory\n\nclass Cloudmusic(object):"
  },
  {
    "path": "Spiders/cnblog/main.py",
    "chars": 5148,
    "preview": "import re\nimport os\nimport sys\nimport json\nimport requests\nimport pandas as pd\nimport numpy as np\nimport jieba\nimport py"
  },
  {
    "path": "Spiders/csdn/main.py",
    "chars": 2625,
    "preview": "import re\nimport os\nimport sys\nimport json\nimport requests\nfrom bs4 import BeautifulSoup\nfrom tkinter.filedialog import "
  },
  {
    "path": "Spiders/ctrip/main.py",
    "chars": 2693,
    "preview": "import os\nimport time\n\nimport requests\nimport json\nimport xlsxwriter\n\n\nclass Ctrip(object):\n    def __init__(self, cooki"
  },
  {
    "path": "Spiders/github/main.py",
    "chars": 3462,
    "preview": "import json\nimport os\nimport re\nimport requests\nfrom tkinter.filedialog import askdirectory\n\nclass Github(object):\n    d"
  },
  {
    "path": "Spiders/jianshu/main.py",
    "chars": 2675,
    "preview": "import re\nimport os\nimport sys\nimport json\nimport requests\nfrom bs4 import BeautifulSoup\nfrom tkinter.filedialog import "
  },
  {
    "path": "Spiders/mail/main.py",
    "chars": 20521,
    "preview": "# -*- coding: utf-8 -*-\nimport json\nimport os\nimport re\nimport time\nimport sys\nfrom nltk.sem.drt import DrtParser\nfrom s"
  },
  {
    "path": "Spiders/moments_album/main.py",
    "chars": 6927,
    "preview": "# -*- coding:utf-8 -*-\nfrom selenium import webdriver\nimport selenium.webdriver.support.expected_conditions as EC\nfrom s"
  },
  {
    "path": "Spiders/oschina/main.py",
    "chars": 2715,
    "preview": "import re\nimport os\nimport sys\nimport json\nimport requests\nfrom bs4 import BeautifulSoup\nfrom tkinter.filedialog import "
  },
  {
    "path": "Spiders/qqfriend/main.py",
    "chars": 4514,
    "preview": "import selenium\nfrom selenium import webdriver\nfrom selenium.webdriver.chrome.options import Options\nimport json\nimport "
  },
  {
    "path": "Spiders/qqqun/main.py",
    "chars": 6538,
    "preview": "# -*- coding: utf-8 -*-\nimport selenium\nfrom selenium import webdriver\nfrom selenium.webdriver.chrome.options import Opt"
  },
  {
    "path": "Spiders/shgjj/main.py",
    "chars": 2247,
    "preview": "import json\nimport os\n\nimport requests\n\n\nclass GjjSpider(object):\n    def __init__(self, cookie, token):\n        self.se"
  },
  {
    "path": "Spiders/taobao/spider.py",
    "chars": 10416,
    "preview": "import json\nimport random\nimport time\nimport sys\nimport os\nimport requests\nimport numpy as np\nimport math\nfrom lxml impo"
  },
  {
    "path": "Spiders/telephone/main.py",
    "chars": 4799,
    "preview": "import json\nimport os\nimport re\nimport sys\nimport xlsxwriter\nimport requests\nfrom tkinter.filedialog import askdirectory"
  },
  {
    "path": "Spiders/yidong/main.py",
    "chars": 3188,
    "preview": "import json\nimport os\nimport re\nimport xlsxwriter\nimport sys\nimport requests\nfrom requests.packages.urllib3.exceptions i"
  },
  {
    "path": "Spiders/zhihu/main.py",
    "chars": 2692,
    "preview": "# import zhihuapi as zhihu\nimport requests\nfrom tkinter.filedialog import askdirectory\n\nclass Zhihu(object):\n    def __i"
  },
  {
    "path": "docs/.nojekyll",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "docs/QuickStart.md",
    "chars": 210,
    "preview": "\n## Prerequisites\n\n* Ubuntu 16.04\n* Python3 & pip3\n* Chrome Browser and [Chrome Driver](http://chromedriver.storage.goog"
  },
  {
    "path": "docs/README.md",
    "chars": 122335,
    "preview": "# **INFO-SPIDER** \n\n> 一个神奇的工具箱, 拿回你的个人信息.\n\n# **Introduction**\n## 开发者回忆录🌈\n<details>\n<summary>点击展开👉 开发者回忆录🌈</summary>\n\n###"
  },
  {
    "path": "docs/_coverpage.md",
    "chars": 409,
    "preview": "\n<!-- _coverpage.md -->\n<!-- ![cover_page](/_media/logo-transparent-100px.png) -->\n![logo](/_media/logo-transparent-100p"
  },
  {
    "path": "docs/ads.txt",
    "chars": 58,
    "preview": "google.com, pub-3091494829711028, DIRECT, f08c47fec0942fa0"
  },
  {
    "path": "docs/index.html",
    "chars": 3478,
    "preview": "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n  <meta charset=\"UTF-8\">\n  <title>INFO-SPIDER - 拿回你的个人信息</title>\n  <link rel=\"sh"
  },
  {
    "path": "extension/index.css",
    "chars": 601,
    "preview": ".cnblog{\n    width:40px;\n    height: 40px;\n    border:0;\n    background:url(img/cnblog.png) no-repeat;\n    background-si"
  },
  {
    "path": "extension/index.html",
    "chars": 973,
    "preview": "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n    <meta charset=\"UTF-8\">\n    <meta http-equiv=\"X-UA-Compatible\" content=\"IE=ed"
  },
  {
    "path": "extension/js/FileSaver.js",
    "chars": 6468,
    "preview": "(function (global, factory) {\n    if (typeof define === \"function\" && define.amd) {\n      define([], factory);\n    } els"
  },
  {
    "path": "extension/js/cnblog/cnblogrun0.js",
    "chars": 202,
    "preview": "window.onload = function () {\n    chrome.storage.sync.get('cnblogname',function(budget){\n        var cnblog1=\"https://ho"
  },
  {
    "path": "extension/js/cnblog/cnblogrun1.js",
    "chars": 1687,
    "preview": "window.onload = function () {\n    // for(var i=0;i<10;i++)\n    // {\n    //      console.log(document.getElementsByTagNam"
  },
  {
    "path": "extension/js/cnblog/cnblogrun2.js",
    "chars": 7729,
    "preview": "(function (global, factory) {\n    if (typeof define === \"function\" && define.amd) {\n      define([], factory);\n    } els"
  },
  {
    "path": "extension/js/github/githubrun1.js",
    "chars": 733,
    "preview": "window.onload = function () {\n\n  //  console.log(document.getElementsByTagName('pre')[0].innerHTML)\n    var data = docum"
  },
  {
    "path": "extension/js/github/githubrun2.js",
    "chars": 528,
    "preview": "\nwindow.onload = function () {\n    console.log(document.getElementsByTagName('pre')[0].innerHTML)\n    let data = documen"
  },
  {
    "path": "extension/js/github/githubrun3.js",
    "chars": 532,
    "preview": "\nwindow.onload = function () {\n    console.log(document.getElementsByTagName('pre')[0].innerHTML)\n    let data = documen"
  },
  {
    "path": "extension/js/github/githubrun4.js",
    "chars": 533,
    "preview": "\nwindow.onload = function () {\n    console.log(document.getElementsByTagName('pre')[0].innerHTML)\n    let data = documen"
  },
  {
    "path": "extension/js/github/githubrun5.js",
    "chars": 7451,
    "preview": "(function (global, factory) {\n    if (typeof define === \"function\" && define.amd) {\n      define([], factory);\n    } els"
  },
  {
    "path": "extension/js/index.js",
    "chars": 2681,
    "preview": "window.onload = function () {\n    $('#github').click(function(){\n        var githubname=prompt(\"请输入用户名\");\n        chrome"
  },
  {
    "path": "extension/js/jianshu/jianshurun1.js",
    "chars": 409,
    "preview": "window.onload = function () {\n    console.log(document.getElementsByTagName('pre')[0].innerHTML)\n\n    data=document.getE"
  },
  {
    "path": "extension/js/jianshu/jianshurun2.js",
    "chars": 6915,
    "preview": "(function (global, factory) {\n    if (typeof define === \"function\" && define.amd) {\n      define([], factory);\n    } els"
  },
  {
    "path": "extension/js/jquery.js",
    "chars": 288579,
    "preview": "/*!\n * jQuery JavaScript Library v3.6.0\n * https://jquery.com/\n *\n * Includes Sizzle.js\n * https://sizzlejs.com/\n *\n * C"
  },
  {
    "path": "extension/js/oschina/oschinarun0.js",
    "chars": 8106,
    "preview": "(function (global, factory) {\n    if (typeof define === \"function\" && define.amd) {\n      define([], factory);\n    } els"
  },
  {
    "path": "install_deps.sh",
    "chars": 166,
    "preview": "sudo apt-get install build-essential libgtk-3-dev libgstreamer-plugins-base1.0-dev  libwebkitgtk-3.0-dev libxslt-dev fre"
  },
  {
    "path": "requirements.txt",
    "chars": 271,
    "preview": "matplotlib==3.2.0\npyecharts==1.7.1\nselenium==3.141.0\nXlsxWriter==1.2.9\nopenpyxl==3.0.4\nnltk==3.9\npyquery==1.4.0\nlxml==4."
  },
  {
    "path": "tests/DeepAnalysis/dataprocess.py",
    "chars": 1758,
    "preview": "### 使用pytorch创建dataset和dataloader\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.optim "
  },
  {
    "path": "tests/DeepAnalysis/model.py",
    "chars": 849,
    "preview": "### 使用pytorch，创建一个LSTM模型\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.optim as optim\n"
  },
  {
    "path": "tests/DeepAnalysis/trainer.py",
    "chars": 2983,
    "preview": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.optim as optim\nimport numpy as np\nimport"
  },
  {
    "path": "tests/blog_analyse/cnblog.ipynb",
    "chars": 7194,
    "preview": "{\n \"metadata\": {\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_ext"
  },
  {
    "path": "tests/blog_analyse/postdate_line.html",
    "chars": 6761,
    "preview": "<!DOCTYPE html>\n<html>\n<head>\n    <meta charset=\"UTF-8\">\n    <title>Awesome-pyecharts</title>\n            <script type=\""
  },
  {
    "path": "tests/blog_analyse/stop_word.txt",
    "chars": 2124,
    "preview": "\n$\n0\n1\n2\n3\n4\n5\n6\n7\n8\n9\n?\n_\n“\n”\n、\n。\n《\n》\n一\n一些\n一何\n一切\n一则\n一方面\n一旦\n一来\n一样\n一般\n一转眼\n一瞬间\n万一\n上\n上下\n下\n不\n不仅\n不但\n不光\n不单\n不只\n不外乎\n不如\n不妨\n不尽\n不尽"
  },
  {
    "path": "tests/blog_analyse/topic_wordcloud.html",
    "chars": 31108,
    "preview": "<!DOCTYPE html>\n<html>\n<head>\n    <meta charset=\"UTF-8\">\n    <title>Awesome-pyecharts</title>\n            <script type=\""
  },
  {
    "path": "tests/ctrip/main.py",
    "chars": 2996,
    "preview": "import json\nimport os\nimport re\nimport threading\n\nimport wx\nimport time\nfrom selenium import webdriver\nfrom selenium.web"
  },
  {
    "path": "tools/main.py",
    "chars": 34574,
    "preview": "# -*- coding: utf-8 -*-\nimport json\nimport os\nimport re\nimport threading\nimport traceback\nimport wx\nimport time\nfrom sel"
  },
  {
    "path": "tools/stop_word.txt",
    "chars": 2078,
    "preview": "$\n0\n1\n2\n3\n4\n5\n6\n7\n8\n9\n?\n_\n“\n”\n、\n。\n《\n》\n一\n一些\n一何\n一切\n一则\n一方面\n一旦\n一来\n一样\n一般\n一转眼\n万一\n上\n上下\n下\n不\n不仅\n不但\n不光\n不单\n不只\n不外乎\n不如\n不妨\n不尽\n不尽然\n不得\n不"
  },
  {
    "path": "uitest/main.py",
    "chars": 59,
    "preview": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\nimport wx\n\n"
  }
]
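
The preview records above share three fields: `path`, `chars`, and `preview`. As a minimal sketch of how such a list could be consumed, the snippet below totals the character counts and finds the largest file. The function name `summarize_preview` is hypothetical and not part of the repository or of GitExtract; the two sample records are copied from the list above.

```python
import json


def summarize_preview(records):
    """Return (file_count, total_chars, largest_path) for preview records.

    Hypothetical helper: each record is assumed to be a dict with the
    "path", "chars", and "preview" keys shown in the listing above.
    """
    total_chars = sum(record["chars"] for record in records)
    # max() raises on an empty sequence, so guard against an empty list.
    largest_path = (
        max(records, key=lambda record: record["chars"])["path"]
        if records
        else None
    )
    return len(records), total_chars, largest_path


# Two records copied verbatim from the condensed preview.
sample = json.loads("""
[
  {"path": "docs/.nojekyll", "chars": 0, "preview": ""},
  {"path": "docs/ads.txt", "chars": 58,
   "preview": "google.com, pub-3091494829711028, DIRECT, f08c47fec0942fa0"}
]
""")

print(summarize_preview(sample))
```

Because `preview` holds only a truncated snippet, `chars` is the field to rely on for file size; the snippet length will not match it for larger files.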

About this extraction

This page contains the full source code of the kangvcar/InfoSpider GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 65 files (718.4 KB), approximately 209.5k tokens, and a symbol index with 369 extracted functions, classes, methods, constants, and types.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
