Full Code of Macr0phag3/GithubMonitor for AI

Repository: Macr0phag3/GithubMonitor
Branch: master
Commit: 7c0e0eb7a4a3
Files: 8
Total size: 21.9 KB

Directory structure:
gitextract_n6syoe_j/

├── .gitignore
├── LICENSE
├── README.md
├── leak_test/
│   └── leak_test
├── mysqlite.py
├── reporter.py
├── spider.py
└── template.html

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
config.json
result.html
github
__pycache__/
*.pyc


================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2019 Macr0phag3

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# github_monitor

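## Project Overview
Careless teammates make it common for sensitive company information to leak through GitHub. This project builds search keywords by combining a fixed keyword list with the configured hosts, uses the GitHub API to monitor for leaks, and sends an email notification when a leak is detected.

## Features
1. Each leak is assigned a level, which can serve as a reference for its severity
2. Simple yet complete: fetching GitHub search results through the API is the simplest and most efficient approach, and restricting the keywords keeps usage within GitHub's API limits
3. The code is commented in detail, so it can be customized quickly
4. Keywords are combined automatically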
## Quick Start
### Dependencies
- pip install PyGithub
- pip install jinja2

### Configuration
- Create a `config.json` file in the project folder and fill it in following the comments in `spider.py`. An example config.json:
```
{
  "hosts" : [
    "*********.com",
    "*********.com @"
  ],

"sender_email":{
  "uname":"*********@qq.com",
  "smtp":"smtp.qq.com",
  "port":25,
  "passwd":"*********"
},

"receiver_email":[
  "*********@qq.com",
  "*********@qq.com"
],

"token":"*******************",
"admin_email":"*********@qq.com"
}
```
In `hosts`, an entry containing `@` marks an email-type host, which the code handles specially; see the code for the details.
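
As a rough illustration of that special handling, the sketch below mirrors the keyword generation in `spider.py`: each host is paired with every term of a fixed list, and a host marked with `@` also gets the `smtp` term. The name `build_keywords` and the example hosts are made up for this illustration, and the `strip()` call is an addition that avoids a double space.

```python
def build_keywords(hosts, terms=("password", "passwd", "密码")):
    """Pair every host with every term; hosts marked with '@' also get 'smtp'."""
    keywords = []
    for host in hosts:
        if "@" in host:
            # "example.com @" becomes "example.com smtp"
            host = host.split("@")[0].strip() + " smtp"
        for term in terms:
            keywords.append(host + " " + term)
    return keywords


for kw in build_keywords(["example.com", "example.com @"]):
    print(kw)
```

Two hosts and three terms yield six search keywords, so the number of searches per round grows linearly with the number of hosts.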

- `file_url` in `spider.py` may need to be changed

### Running
- Run it once an hour via crontab, or simply run python spider.py
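
The hourly schedule matters because each report covers the previous full hour: `Get_Data` in `mysqlite.py` only selects records whose `update_time` falls inside that window. A minimal sketch of the window calculation follows; it uses `datetime` objects rather than the integer timestamps the project stores, and `previous_hour_window` is a name invented here.

```python
from datetime import datetime, timedelta


def previous_hour_window(now):
    """Return (start, end) of the last full hour before `now`."""
    end = now.replace(minute=0, second=0, microsecond=0)
    start = end - timedelta(hours=1)
    return start, end


start, end = previous_hour_window(datetime(2018, 11, 21, 19, 44, 2))
print(start, end)
```

For a run at 19:44:02 the window is 18:00:00 to 19:00:00, so records written during the current run are only reported by the next run.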

## Main Code Logic
![Main code logic](https://github.com/Macr0phag3/GithubMonitor/raw/master/pics/pic2.jpg)


## Example Results
![Example results](https://github.com/Macr0phag3/GithubMonitor/raw/master/pics/pic1.jpg)

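## Some Thoughts
In my opinion, the hardest part of GitHub monitoring is judging whether the retrieved data actually contains leaked sensitive information. This is a hard problem.

An attacker usually just wants to exploit leaked information, so for them it is enough to find some leaks. If there are 100 leaks in total, detecting even 10 of them is valuable. Of course, the more leaks found the better, and to that end one could even apply machine learning to improve the judgment of sensitive information. The false-positive rate should also stay fairly low (nobody wants to rush to look at a leak only to find `password: "********"` :D ).

**The purpose of this code, however, is to monitor your own company's leaks.** For a defender (the company) checking its own leaks, letting even one slip through means significant risk. In other words, out of 100 leaks the detection rate must get as close to 100% as possible, even at the cost of a higher false-positive rate. Leaving the judgment to code alone is therefore inadequate; a human has to review the results. That raises the next question: what if there is too much data for a human to review?

**Make the monitored keywords more precise.** For example, if your company's domain/IP is qq.com/1.1.1.1, it is best to include qq.com/1.1.1.1 in the monitored keywords. There are many similar approaches (your company's files should have some distinguishing features; there will certainly be special cases, which need special handling). The goal is to reduce the number of search results, which improves precision and lightens the human workload. If you monitor `qiniu.com password`, you will find a large amount of data in every round, so avoid vague keywords.

This approach also works around the fact that the GitHub API only returns the first 1000 search results (results, not pages). Fewer search results means fewer updated records per round, so the limit of 1000 is not exceeded. If you monitor `password`, each round brings more than 1000 updated records, which causes missed leaks (what if the leak happens to be result number 1001?).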
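
The arithmetic behind that limit can be checked with a short sketch. It assumes 30 results per page, which is the page size the comments in `spider.py` rely on; `pages_needed` is a name invented here.

```python
import math

RESULT_CAP = 1000   # only the first 1000 search results are reachable
PER_PAGE = 30       # page size assumed by the comments in spider.py


def pages_needed(total_results, per_page=PER_PAGE, cap=RESULT_CAP):
    """Number of pages to fetch to cover every reachable result."""
    reachable = min(total_results, cap)
    return math.ceil(reachable / per_page)


print(pages_needed(45))
print(pages_needed(5000))
```

A keyword with 45 hits needs 2 pages, while anything above the cap needs 34 pages, which matches the `while page_id < 34` loop in `spider.py`; results beyond the cap are simply never seen.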

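If you follow the reasoning above, there is no need to write your own GitHub crawler to parse pages; calling the API directly is enough.

**Trust what exists, monitor the increment.** An attacker assumes that the existing GitHub data contains leaks and needs to be sifted through (though some also monitor the increment). A company instead assumes that GitHub currently contains no leaks and monitors only the increment, without sifting through the existing GitHub data. Incremental data comes in two kinds:
1. New leaks: newly pushed files
2. Updated leaks: updated files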
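
The two kinds of increment map onto the leak levels that `Record` in `mysqlite.py` assigns. The sketch below restates that decision as a pure function; `leak_level` and its parameter names are invented for this illustration, and the database lookup is replaced by a `known_sha` argument.

```python
def leak_level(known_sha, new_sha, all_terms_present):
    """Return 3 (new), 2 (updated), 1 (suspected false positive) or 0 (unchanged).

    known_sha is the sha already stored for this URL, or None if the URL is new.
    """
    if known_sha == new_sha:
        return 0  # already recorded and the file has not changed
    if not all_terms_present:
        return 1  # some search term is missing from the file content
    return 3 if known_sha is None else 2


print(leak_level(None, "abc", True))    # new leak
print(leak_level("abc", "def", True))   # updated leak
print(leak_level("abc", "abc", True))   # unchanged
```

Only levels 3, 2 and 1 end up in the email report; level 0 records are old data and are skipped.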

Of course, nothing can withstand careless teammates :D

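## Updates
2019-01-07: private repositories can now be created on GitHub for free.

**Strongly recommended: make repositories that must stay confidential private**

**Strongly recommended: make repositories that must stay confidential private**

**Strongly recommended: make repositories that must stay confidential private**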

## License
Copyright © 2018 [Macr0phag3](https://github.com/Macr0phag3).

This project is MIT licensed.

## Others
<img src="https://clean-1252075454.cos.ap-nanjing.myqcloud.com/20200528120800990.png" width="500">

[![Stargazers over time](https://starchart.cc/Macr0phag3/GithubMonitor.svg)](https://starchart.cc/Macr0phag3/GithubMonitor)


================================================
FILE: leak_test/leak_test
================================================
host = http://yin126.com/
password = "test in 2019-03-03 13:53:44"



================================================
FILE: mysqlite.py
================================================
# -*- coding: utf-8 -*-

# 2018.11.23 11:07:22 by Tr0y

import sqlite3
import time


def _get_hour():
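    '''
    Return the timestamp of the start of the previous hour.
    If it is now 2018.11.21 19:44:02, the return value is 1542794400,
    i.e. the timestamp of 2018.11.21 18:00:00 (local time, UTC+8 in this example).

    Returns:
        int; timestamp of the previous hour
    '''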

    return int(
        time.mktime(
            time.strptime(
                time.strftime("%Y-%m-%d %H"), "%Y-%m-%d %H")
        )
    )-3600


class MySqlite:
    def __init__(self, dbname, tablename):
        '''
        Initialize.

        Args:
            dbname: str; database name
            tablename: str; table name
        '''

        self.dbname = dbname
        self.tablename = tablename

        self.conn = sqlite3.connect(self.dbname)

        self._create()

    def _create(self):  #
        '''
        Create the table if it does not exist yet.
        '''

        query = """create table IF NOT EXISTS {tablename}(
            url VARCHAR(100),
            sha VARCHAR(40),
            repository VARCHAR(100),
            keyword VARCHAR(100),
            filename VARCHAR(100),
            level VARCHAR(5),
            update_time VARCHAR(10),
            last_record_time VARCHAR(10),
            PRIMARY KEY (url, sha)
        );""".format(tablename=self.tablename)  # created only if it does not exist
        self.conn.execute(query)
        self.conn.commit()

    def _select(self, sql):
        '''
        Run a query.

        Args:
            sql: str; the query statement

        Returns:
            rows: 2-D list; the query result
        '''

        result = self.conn.execute(sql)
        self.conn.commit()
        rows = result.fetchall()
        return rows  # [(, ... ,), (, ... ,)]

    def _insert(self, url, sha, repository, filename, keyword, level, update_time):
        '''
        Insert a record.
        The column order matches the parameter order.
        **All inserted values are stored as strings.**

        Args:
            url:         str; URL of the code file
            sha:         str; sha of the code file
            repository:  str; repository of the code file
            filename:    str; name of the code file
            keyword:     str; keyword the code file matched
            level:       int; leak level
            update_time: str; time this record was updated in the database
        '''

        data = '''INSERT INTO {tablename}(url, sha, repository, keyword, filename, level, update_time, last_record_time) VALUES(?, ?, ?, ?, ?, ?, ?, '∞');
        '''.format(tablename=self.tablename)

        # Bind the values as parameters so that quotes in a path or keyword
        # cannot break the statement
        self.conn.execute(
            data,
            (url, sha, repository, keyword, filename, str(level), update_time)
        )
        self.conn.commit()

    def _update(self, url, sha, repository, filename, keyword, level, update_time, last_record_time):
        '''
        Update a record.

        Args:
            url:              str; URL of the code file
            sha:              str; sha of the code file
            repository:       str; repository of the code file
            filename:         str; name of the code file
            keyword:          str; keyword the code file matched
            level:            int; leak level
            update_time:      str; time this record was updated in the database
            last_record_time: str; time this record was previously updated in the database
        '''

        data = '''UPDATE {tablename} SET url=?, sha=?, repository=?, keyword=?, filename=?, level=?, update_time=?, last_record_time=? where url=?;
        '''.format(tablename=self.tablename)

        # Bind the values as parameters so that quotes in the data
        # cannot break the statement
        self.conn.execute(
            data,
            (url, sha, repository, keyword, filename, str(level),
             update_time, last_record_time, url)
        )
        self.conn.commit()

    def Record(self, url, sha, repository, filename, keyword, update_time, negative):
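        '''
        Decide how to record new data based on what is already in the database.

        Args:
            url:         str;  URL of the code file
            sha:         str;  sha of the code file
            repository:  str;  repository of the code file
            filename:    str;  name of the code file
            keyword:     str;  keyword the code file matched
            update_time: str;  time this record was updated in the database
            negative:    bool; whether this is a suspected false positive

        Returns:
            level: int; leak level
        '''

        result = self._select(
            '''SELECT url, sha, update_time FROM {tablename} where url='{url}'; '''.format(
                url=url,
                tablename=self.tablename
            ))  # check whether a record for this url already exists

        if result:  # already exists
            if result[0][1] != sha:  # the file's sha has changed
                if negative:
                    level = 1
                else:
                    level = 2

                # the old update_time becomes the new last_record_time
                self._update(url, sha, repository, filename, keyword, level, update_time, result[0][2])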
            else:
                level = 0
        else:
            if negative:
                level = 1
            else:
                level = 3

            self._insert(url, sha, repository, filename, keyword, level, update_time)

        return level

    def Get_Data(self, keyword, level):
        '''
        Fetch the leak records of the previous round.

        Args:
            keyword: str; the keyword
            level: str or int; leak level

        Returns:
            result: 2-D list; leak records
        '''

        last_hour_time = _get_hour()
        result = self._select('''SELECT * FROM {tablename} where keyword='{keyword}' and update_time>='{last_hour_time}' and update_time<'{now_hour_time}' and level='{level}';
        '''.format(
            tablename=self.tablename,
            keyword=keyword,
            level=level,
            last_hour_time=last_hour_time,
            now_hour_time=last_hour_time+3600  # upper bound below the current hour, so records updated in this round are not reported yet
        ))

        for i, r in enumerate(result):
            result[i] = list(result[i])  # tuple to list
            result[i][-1] = r[-1] if r[-1] == "∞" else time.strftime(
                "%Y-%m-%d %H:%M:%S", time.localtime(  # convert the timestamp to a readable time
                    int(r[-1])
                )
            )

        return result


================================================
FILE: reporter.py
================================================
# -*- coding: utf-8 -*-

# 2018.11.23 11:07:07 by nobody

import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


class Reporter:
    def __init__(self, email_from, smtp_server, smtp_port, email_username=None, email_password=None):
        self.email_from = email_from
        self.smtp_server = smtp_server
        self.smtp_port = smtp_port
        self.email_username = email_username
        self.email_password = email_password
        self.sent_emails_counter = 0

    def _send_email(self, email, email_to_string):
        mail_server = smtplib.SMTP(host=self.smtp_server, port=self.smtp_port)
        if int(self.smtp_port) != 25:
            mail_server.starttls()
        if self.email_username is not None:
            mail_server.login(self.email_username, self.email_password)
        try:
            mail_server.sendmail(self.email_from, email_to_string, email.as_string())
            self.sent_emails_counter += 1
        finally:
            mail_server.close()

    def alert(self, content, email_to_string):
        email = MIMEMultipart('alternative')
        email['Subject'] = "Github Monitor"
        email['From'] = self.email_from
        email['To'] = email_to_string
        part_html = MIMEText(content, 'html', 'utf-8')
        email.attach(part_html)

        self._send_email(email, email_to_string)


================================================
FILE: spider.py
================================================
# -*- coding: utf-8 -*-

# running by py3.x
# 2018.11.23 11:07:22 by Tr0y

import json
import random
import time
import traceback

from github import Github  # pip install PyGithub
from jinja2 import Template  # pip install jinja2

import mysqlite
from reporter import Reporter


def GenerateKeywords(hosts):
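    '''
    hosts * key
    One keyword is generated for every host/key combination.
    Hosts marked with @ additionally get the smtp keyword.

    A host has the form:
        www.baidu.com
        or
        www.baidu.com @

    Args:
        hosts: list; the monitored domains
    Returns:
        keywords: list; the generated keywords
    '''

    key = ["password", "passwd", "密码"]
    keywords = []

    for h in hosts:
        if "@" in h:
            # strip the space left before "@" to avoid a double space
            h = h.split("@")[0].strip() + " smtp"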

        for k in key:
            keywords.append(h + " " + k)

    return keywords


def GenerateHTML(results):
    '''
    Render the report from the template (results).

    Args:
        results: dict; leaks found in this round

    Returns:
        c: str; the generated HTML source
    '''

    with open(file_url + "template.html", "r") as fp:
        template = Template(fp.read())
        c = template.render(
            results=results,
        )

    return c


class GithubMonitor:
    '''
    GitHub leak monitor
    '''

    def __init__(self, keywords, token):
        '''
        Initialize.

        Args:
            keywords: list; keywords to search for
            token: str; token authorizing use of the GitHub API
        '''

        self.keywords = keywords
        self.token = token

        self.no_update = 0  # number of consecutive old records

        self.github = Github(self.token)

    def _analysis_page(self, result, keyword):
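        '''
        Process the search result pages.

        Args:
            result: instance; the result returned by the search
            keyword: str; the keyword
        '''

        page_id = 0

        # Pages 0-33, 30 results per page,
        # matching GitHub's limit of 1000 results
        while page_id < 34:
            try:
                items = result.get_page(page_id)  # fetch the detailed records of the page
                ana_result = self._analysis_result(items, keyword)
                # Check for None first: None is falsy, so testing
                # `not ana_result` first would hide the empty-page case
                if ana_result is None:
                    print("[WARNING] Search page is empty")
                    print("[WARNING] Exiting at page {}".format(page_id))
                    break

                elif not ana_result:
                    print("[WARNING] 30 consecutive records had no updates")
                    print("[WARNING] Exiting at page {}".format(page_id))
                    break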

            except Exception as e:
                err = str(e)

                # Going too fast triggers GitHub's crawler detection
                if "You have triggered an" in err:
                    sleep_time = random.randint(20, 60)
                    print("[WARNING] Too fast! Sleep for {}s".format(sleep_time))
                    time.sleep(sleep_time)  # sleep for a while
                    continue

                elif "timed out" in err:
                    # On a timeout, retry (page_id stays the same)
                    print("[WARNING] Read data time out! Just repeat it")
                    continue

                elif "Server Error" in err:
                    print("[WARNING] Github Server Error! Just repeat it")
                    continue

                elif "Connection aborted." in err:
                    # On "Connection aborted", retry
                    print(
                        "[WARNING] Remote end closed connection without response! Just repeat it")
                    continue

                elif "Unexpected problem" in err:
                    print("[WARNING] Unexpected problem! Just repeat it")
                    continue

                else:
                    # For any other error, report the exception by email
                    err = traceback.format_exc()
                    # Print it so that it shows up in the log
                    print("[ERROR] Something went wrong!\n" + err)
                    r.alert(
                        "Github Monitor ERROR: Something went wrong!\n\n" + err, admin_email)
                    raise  # re-raise the exception to stop the script

            page_id += 1

        print("[INFO] Finished keyword: " + keyword + "\n\n")

    def _analysis_result(self, items, keyword):
        '''
        Analyze one page of search results.
        '''

        result_id = 0

        result_count = len(items)
        if not result_count:  # empty result
            return None

        while result_id < result_count:
            item = items[result_id]
            try:
                content = item.decoded_content.decode("utf8")
                if all(kw in content for kw in keyword.split(" ")):
                    negative = False  # every keyword term is present
                else:
                    negative = True  # not all terms are present: suspected false positive

                url = "https://www.github.com/" + \
                    item.repository.full_name + "/blob/master/" + item.path

                update_time = str(int(time.time()))
                record_result = DB.Record(  # hand over to Record
                    url,
                    item.sha,
                    item.repository.full_name,  # repository
                    item.path,  # filename
                    keyword,
                    update_time,
                    negative,
                )

                if record_result == 3:  # new leak
                    self.no_update = 0

                elif record_result == 2:  # updated leak
                    self.no_update = 0

                elif record_result == 1:  # suspected false positive
                    self.no_update = 0

                else:  # old data (already crawled an hour ago)
                    if self.no_update > 30:
                        # 30 consecutive old records imply that the remaining data is old as well
                        return False

                    self.no_update += 1

            except Exception as e:
                err = str(e)

                # If going too fast triggers GitHub's crawler detection, sleep for a while
                if "You have triggered an" in err:
                    sleep_time = random.randint(20, 60)
                    print("sleep for {}s".format(sleep_time))
                    time.sleep(sleep_time)
                    continue

                elif "timed out" in err:
                    print("[WARNING] Read data time out! Just repeat it")
                    continue

                elif "Server Error" in err:
                    print("[WARNING] Github Server Error! Just repeat it")
                    continue

                elif "Unexpected problem" in err:
                    print("[WARNING] Unexpected problem! Just repeat it")
                    continue

                elif "Connection aborted." in err:
                    # On "Connection aborted", retry
                    print(
                        "[WARNING] Remote end closed connection without response! Just repeat it"
                    )
                    continue

                elif "Not Found" in err:
                    # Skip files that are not found
                    print("[WARNING] File not found! Just pass it")

                else:
                    # Any other error is left to the exception handling in _analysis_page()
                    raise

            result_id += 1

        return True

    def search(self):
        '''
        Search code on GitHub by keyword.
        '''

        for keyword in self.keywords:
            result = self.github.search_code(
                keyword,  # the keyword
                sort="indexed",  # sort by most recently indexed
                order="desc",  # most recently indexed first
            )

            self._analysis_page(result, keyword)


# --------------------- may need changing ----------------------
file_url = "./"
DB = mysqlite.MySqlite(file_url + "github", "leak")
# -------------------------------------------------------

# Read the configuration.
# Keep the configuration in a separate JSON file
# and list it in .gitignore to prevent leaks
with open(file_url + "config.json", "r") as fp:
    config = json.load(fp)

hosts = config["hosts"]  # monitored hosts

admin_email = config["admin_email"]  # administrator email (notified on errors)

token = config["token"]  # GitHub token
r = Reporter(
    config["sender_email"]["uname"],
    config["sender_email"]["smtp"],
    config["sender_email"]["port"],
    config["sender_email"]["uname"],
    config["sender_email"]["passwd"]
)

keywords = GenerateKeywords(hosts)

Monitor = GithubMonitor(keywords, token)
Monitor.search()

send_flag = 0
results = {}
for keyword in keywords:
    results[keyword] = []
    empty = True
    for level in range(3, 0, -1):
        result = DB.Get_Data(keyword, level)  # fetch the leak records of the previous round
        if result:
            send_flag = 1
            results[keyword].append(result)
            empty = False
        else:
            results[keyword].append([(None, ) * 7 + ("∞",)])

    if empty:  # do not report keywords without leaks
        results.pop(keyword)

DB.conn.close()

if send_flag:  # 0 means that no keyword has any leak
    print("[Info] Send email")
    c = GenerateHTML(results)

    for email_addr in config["receiver_email"]:
        r.alert(c, email_addr)

    with open(file_url + "result.html", 'w') as fp:
        fp.write(c)
else:
    print("[Info] Nothing to do")

'''
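Example of results (flat sample records; the script above groups each
keyword's records into three lists, one per level 3, 2, 1):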

{'qiniu 密码': [('www.github.com/nicoson/CNR-Video-Audit/blob/master/README.md',
   '0f00caf3b2bc2828428b568148b1939bdce5f6c6',
   'nicoson/CNR-Video-Audit',
   'qiniu 密码',
   'README.md',
   '3',
   '1542811078',
   '∞'),

  ('www.github.com/Macr0phag3/github_monitor/blob/master/template.html',
   'e7e35a1fd081e31675a2644fbe91d56356f5e74d',
   'Macr0phag3/github_monitor',
   'qiniu 密码',
   'template.html',
   '3',
   '1542811744',
   '∞'),

  ('www.github.com/Macr0phag3/github_monitor/blob/master/spider.py',
   '2b3fd456e58eb5dc0ee6d72b98a9494f7dda9423',
   'Macr0phag3/github_monitor',
   'qiniu 密码',
   'spider.py',
   '2',
   '1542811745',
   '∞'),

  ('www.github.com/shuaizhupeiqi/shuaizhupeiqi.github.io/blob/master/page/2/index.html',
   '413fc90095643fa9e0acc0e5bdb8a6d7c116fc3a',
   'shuaizhupeiqi/shuaizhupeiqi.github.io',
   'qiniu 密码',
   'page/2/index.html',
   '2',
   '1542811520',
   '∞')]}
'''


================================================
FILE: template.html
================================================
<!-- 2018.11.23 11:07:22 by Tr0y -->

<title>Github Leak Report</title>

{% for keyword, value in results.items() %}
    <h1>Matched keyword: {{ keyword }}</h1>

    <h2>New leaks (Level 3)</h2>
    <table border="0" cellpadding="5" cellspacing="1" bgcolor="black">
        <tr align="center" bgcolor="white">
          <td>ID</td>
          <td>File SHA</td>
          <td>Code Location</td>
          <td>Last Record Time</td>
        </tr>

        {% for item in value[0] %}
          <tr align="center" bgcolor="white">
              <td>{{ loop.index }}</td>
              <td>{{ item[1] }}</td>
              <td><a href="{{ item[0] }}"> {{ item[2] }} </a></td>
              <td>{{ item[7] }}</td>
          </tr>
        {% endfor %}
    </table>

    <h2>Updated leaks (Level 2)</h2>
    <table border="0" cellpadding="5" cellspacing="1" bgcolor="black">
      <tr align="center" bgcolor="white">
          <td>ID</td>
          <td>File SHA</td>
          <td>Code Location</td>
          <td>Last Record Time</td>
      </tr>

      {% for item in value[1] %}
      <tr align="center" bgcolor="white">
          <td>{{ loop.index }}</td>
          <td>{{ item[1] }}</td>
          <td><a href="{{ item[0] }}"> {{ item[2] }} </a></td>
          <td>{{ item[7] }}</td>
      </tr>
      {% endfor %}
    </table>

    <h2>Suspected false positives (Level 1)</h2>
    <table border="0" cellpadding="5" cellspacing="1" bgcolor="black">
      <tr align="center" bgcolor="white">
          <td>ID</td>
          <td>File SHA</td>
          <td>Code Location</td>
          <td>Last Record Time</td>
      </tr>

      {% for item in value[2] %}
      <tr align="center" bgcolor="white">
          <td>{{ loop.index }}</td>
          <td>{{ item[1] }}</td>
          <td><a href="{{ item[0] }}"> {{ item[2] }} </a></td>
          <td>{{ item[7] }}</td>
      </tr>
      {% endfor %}
    </table>

    <br>
    <hr size=3 noshade/>

{% endfor %}

<!--
0: url
1: sha
2: repository
3: keyword
4: filename
5: level
6: update_time
7: last_record_time
-->

SYMBOL INDEX (20 symbols across 3 files)

FILE: mysqlite.py
  function _get_hour (line 9) | def _get_hour():
  class MySqlite (line 27) | class MySqlite:
    method __init__ (line 28) | def __init__(self, dbname, tablename):
    method _create (line 44) | def _create(self):  #
    method _select (line 63) | def _select(self, sql):
    method _insert (line 79) | def _insert(self, url, sha, repository, filename, keyword, level, upda...
    method _update (line 110) | def _update(self, url, sha, repository, filename, keyword, level, upda...
    method Record (line 141) | def Record(self, url, sha, repository, filename, keyword, update_time,...
    method Get_Data (line 185) | def Get_Data(self, keyword, level):

FILE: reporter.py
  class Reporter (line 10) | class Reporter:
    method __init__ (line 11) | def __init__(self, email_from, smtp_server, smtp_port, email_username=...
    method _send_email (line 19) | def _send_email(self, email, email_to_string):
    method alert (line 31) | def alert(self, content, email_to_string):

FILE: spider.py
  function GenerateKeywords (line 18) | def GenerateKeywords(hosts):
  function GenerateHTML (line 48) | def GenerateHTML(results):
  class GithubMonitor (line 68) | class GithubMonitor:
    method __init__ (line 73) | def __init__(self, keywords, token):
    method _analysis_page (line 89) | def _analysis_page(self, result, keyword):
    method _analysis_result (line 158) | def _analysis_result(self, items, keyword):
    method search (line 248) | def search(self):

Condensed preview — 8 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (28K chars).

[
  {
    "path": ".gitignore",
    "chars": 50,
    "preview": "config.json\nresult.html\ngithub\n__pycache__/\n*.pyc\n"
  },
  {
    "path": "LICENSE",
    "chars": 1067,
    "preview": "MIT License\n\nCopyright (c) 2019 Macr0phag3\n\nPermission is hereby granted, free of charge, to any person obtaining a copy"
  },
  {
    "path": "README.md",
    "chars": 2465,
    "preview": "# github_monitor\n\n## 项目介绍\n由于很多猪队友的存在，公司敏感信息通过 GitHub 泄露出去是很常见的。这个项目主要根据关键字与 hosts 生成的关键词，利用 github 提供的 api，监控 git 泄漏，并在检"
  },
  {
    "path": "leak_test/leak_test",
    "chars": 68,
    "preview": "host = http://yin126.com/\npassword = \"test in 2019-03-03 13:53:44\"\n\n"
  },
  {
    "path": "mysqlite.py",
    "chars": 5916,
    "preview": "# -*- coding: utf-8 -*-\n\n# 2018.11.23 11:07:22 by Tr0y\n\nimport sqlite3\nimport time\n\n\ndef _get_hour():\n    '''\n    返回上个小时"
  },
  {
    "path": "reporter.py",
    "chars": 1377,
    "preview": "# -*- coding: utf-8 -*-\n\n# 2018.11.23 11:07:07 by nobody\n\nimport smtplib\nfrom email.mime.multipart import MIMEMultipart\n"
  },
  {
    "path": "spider.py",
    "chars": 9431,
    "preview": "# -*- coding: utf-8 -*-\n\n# running by py3.x\n# 2018.11.23 11:07:22 by Tr0y\n\nimport json\nimport random\nimport time\nimport "
  },
  {
    "path": "template.html",
    "chars": 2014,
    "preview": "<!-- 2018.11.23 11:07:22 by Tr0y -->\n\n<title>Github Leak Report</title>\n\n{% for keyword, value in results.items() %}\n   "
  }
]

About this extraction

This page contains the full source code of the Macr0phag3/GithubMonitor GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 8 files (21.9 KB), approximately 6.6k tokens, and a symbol index with 20 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
