Repository: twiny/spidy
Branch: main
Commit: fc5a8447c142
Files: 23
Total size: 46.0 KB
Directory structure:
gitextract_sc4cxpbk/
├── .github/
│   └── ISSUE_TEMPLATE/
│       ├── bug_report.md
│       ├── custom.md
│       └── feature_request.md
├── .gitignore
├── LICENSE
├── README.md
├── cmd/
│   └── spidy/
│       ├── api/
│       │   ├── spider.go
│       │   └── version
│       └── main.go
├── config/
│   └── example.config.yaml
├── go.mod
├── go.sum
└── internal/
    ├── pkg/
    │   ├── hbyte/
    │   │   └── hbyte.go
    │   └── spider/
    │       └── v1/
    │           ├── domain.go
    │           ├── page.go
    │           ├── setting.go
    │           ├── store.go
    │           ├── string_replacer.go
    │           ├── tld_list.go
    │           ├── utils.go
    │           └── writer.go
    └── service/
        ├── cache/
        │   └── cache.go
        └── writer/
            └── csv_writer.go
================================================
FILE CONTENTS
================================================
================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.md
================================================
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: ''
assignees: ''
---
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Go to '...'
2. Click on '....'
3. Scroll down to '....'
4. See error
**Expected behavior**
A clear and concise description of what you expected to happen.
**Screenshots**
If applicable, add screenshots to help explain your problem.
**Desktop (please complete the following information):**
- OS: [e.g. iOS]
- Browser [e.g. chrome, safari]
- Version [e.g. 22]
**Smartphone (please complete the following information):**
- Device: [e.g. iPhone6]
- OS: [e.g. iOS8.1]
- Browser [e.g. stock browser, safari]
- Version [e.g. 22]
**Additional context**
Add any other context about the problem here.
================================================
FILE: .github/ISSUE_TEMPLATE/custom.md
================================================
---
name: Custom issue template
about: Describe this issue template's purpose here.
title: ''
labels: ''
assignees: ''
---
================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.md
================================================
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: ''
assignees: ''
---
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**
A clear and concise description of what you want to happen.
**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.
**Additional context**
Add any other context or screenshots about the feature request here.
================================================
FILE: .gitignore
================================================
cmd/tests
config/config.yaml
log/
result/
store/
bin/
bbolt/
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2022 Twiny
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
## Spidy
A tool that crawls websites to find domain names and checks their availability.
### Install
```sh
git clone https://github.com/twiny/spidy.git
cd ./spidy
# build
go build -o bin/spidy -v cmd/spidy/main.go
# run
./bin/spidy -c config/config.yaml -u https://github.com
```
## Usage
```sh
NAME:
   Spidy - Domain name scraper

USAGE:
   spidy [global options] command [command options] [arguments...]

VERSION:
   2.0.0

COMMANDS:
   help, h  Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --config path, -c path  path to config file
   --help, -h              show help (default: false)
   --urls urls, -u urls    urls of page to scrape (accepts multiple inputs)
   --version, -v           print the version (default: false)
```
## Configuration
```yaml
# main crawler config
crawler:
  max_depth: 10 # max depth of pages to visit per website.
  # filter: [] # regexp filter
  rate_limit: "1/5s" # 1 request per 5 sec
  max_body_size: "20MB" # max page body size
  user_agents: # array of user-agents
    - "Spidy/2.1; +https://github.com/twiny/spidy"
  # proxies: [] # array of proxy. http(s), SOCKS5
# Logs
log:
  rotate: 7 # log rotation
  path: "./log" # log directory
# Store
store:
  ttl: "24h" # keep cache for 24h
  path: "./store" # store directory
# Results
result:
  path: ./result # result directory
parralle: 3 # number of concurrent workers
timeout: "5m" # request timeout
tlds: ["biz", "cc", "com", "edu", "info", "net", "org", "tv"] # array of domain extension to check.
```
## TODO
- [ ] Add support for more `writers`.
- [ ] Add terminal logging.
- [ ] Add test cases.
## Issues
NOTE: This package is provided "as is" with no guarantee. Use it at your own risk and always test it yourself before using it in a production environment. If you find any issues, please create a new issue.
================================================
FILE: cmd/spidy/api/spider.go
================================================
package api

import (
	"context"
	_ "embed"
	"fmt"
	"log"
	"net/http"
	"os"
	"os/signal"
	"strconv"
	"sync"
	"syscall"

	//
	"github.com/twiny/spidy/v2/internal/pkg/spider/v1"
	"github.com/twiny/spidy/v2/internal/service/cache"
	"github.com/twiny/spidy/v2/internal/service/writer"

	//
	"github.com/twiny/domaincheck"
	"github.com/twiny/flog"
	"github.com/twiny/wbot"
)

// Version is read from the embedded "version" file at build time.
//
//go:embed version
var Version string

// Spider ties together the crawler, domain checker, cache, result writer, and logger.
type Spider struct {
	wg      *sync.WaitGroup
	setting *spider.Setting
	bot     *wbot.WBot
	pages   chan *spider.Page
	check   *domaincheck.Checker
	store   spider.Storage
	write   spider.Writer
	log     *flog.Logger
}
// NewSpider
func NewSpider(fp string) (*Spider, error) {
	// get settings
	setting := spider.ParseSetting(fp)

	// crawler opts
	opts := []wbot.Option{
		wbot.SetParallel(setting.Parralle),
		wbot.SetMaxDepth(setting.Crawler.MaxDepth),
		wbot.SetRateLimit(setting.Crawler.Limit.Rate, setting.Crawler.Limit.Interval),
		wbot.SetMaxBodySize(setting.Crawler.MaxBodySize),
		wbot.SetUserAgents(setting.Crawler.UserAgents),
		wbot.SetProxies(setting.Crawler.Proxies),
	}

	bot := wbot.NewWBot(opts...)

	check, err := domaincheck.NewChecker()
	if err != nil {
		return nil, err
	}

	// store
	store, err := cache.NewCache(setting.Store.TTL, setting.Store.Path)
	if err != nil {
		return nil, err
	}

	// logger
	log, err := flog.NewLogger(setting.Log.Path, "spidy", setting.Log.Rotate)
	if err != nil {
		return nil, err
	}

	write, err := writer.NewCSVWriter(setting.Result.Path)
	if err != nil {
		return nil, err
	}

	return &Spider{
		wg:      &sync.WaitGroup{},
		setting: setting,
		bot:     bot,
		pages:   make(chan *spider.Page, setting.Parralle),
		check:   check,
		store:   store,
		write:   write,
		log:     log,
	}, nil
}
// Start crawls every link concurrently and runs setting.Parralle workers that
// check each discovered domain. It blocks until all goroutines have finished.
func (s *Spider) Start(links []string) error {
	// go crawl
	s.wg.Add(len(links))
	for _, link := range links {
		go func(l string) {
			defer s.wg.Done()
			//
			if err := s.bot.Crawl(l); err != nil {
				s.log.Error(err.Error(), map[string]string{"url": l})
			}
		}(link)
	}

	// check domains
	s.wg.Add(s.setting.Parralle)
	for i := 0; i < s.setting.Parralle; i++ {
		go func() {
			defer s.wg.Done()
			// results
			for res := range s.bot.Stream() {
				// skip responses that are not 200 OK
				if res.Status != http.StatusOK {
					s.log.Info("bad HTTP status", map[string]string{
						"url":    res.URL.String(),
						"status": strconv.Itoa(res.Status),
					})
					continue
				}

				// extract domains
				domains := spider.FindDomains(res.Body)

				// check availability
				for _, domain := range domains {
					root := fmt.Sprintf("%s.%s", domain.Name, domain.TLD)

					// check if allowed extension
					if len(s.setting.TLDs) > 0 {
						if ok := s.setting.TLDs[domain.TLD]; !ok {
							s.log.Info("unsupported domain", map[string]string{
								"domain": root,
								"url":    res.URL.String(),
							})
							continue
						}
					}

					// skip if already checked
					if s.store.HasChecked(root) {
						s.log.Info("already checked", map[string]string{
							"domain": root,
							"url":    res.URL.String(),
						})
						continue
					}

					// release the timeout context as soon as the check returns;
					// a defer inside this loop would not run until the worker exits
					ctx, cancel := context.WithTimeout(context.Background(), s.setting.Timeout)
					status, err := s.check.Check(ctx, root)
					cancel()
					if err != nil {
						s.log.Error(err.Error(), map[string]string{
							"domain": root,
							"url":    res.URL.String(),
						})
						continue
					}

					// save domain
					if err := s.write.Write(&spider.Domain{
						URL:    res.URL.String(),
						Name:   domain.Name,
						TLD:    domain.TLD,
						Status: status.String(),
					}); err != nil {
						s.log.Error(err.Error(), map[string]string{
							"domain": root,
							"url":    res.URL.String(),
						})
						continue
					}

					// terminal print
					fmt.Printf("[Spidy] == domain: %s - status %s\n", root, status.String())
				}
			}
		}()
	}

	s.wg.Wait()
	return nil
}
// Shutdown blocks until a termination signal arrives, then closes the crawler,
// logger, and store before exiting. A second signal exits immediately.
func (s *Spider) Shutdown() error {
	// attempt graceful shutdown
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM, syscall.SIGQUIT)
	<-sigs

	log.Println("shutting down ...")

	// 2nd ctrl+c kills program
	go func() {
		sigs := make(chan os.Signal, 1)
		signal.Notify(sigs, syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM, syscall.SIGQUIT)
		<-sigs
		log.Println("killing program ...")
		os.Exit(0)
	}()

	s.bot.Close()
	s.log.Close()
	if err := s.store.Close(); err != nil {
		return err
	}

	os.Exit(0)
	return nil
}
================================================
FILE: cmd/spidy/api/version
================================================
2.0.0
================================================
FILE: cmd/spidy/main.go
================================================
package main

import (
	"log"
	"os"

	//
	"github.com/twiny/spidy/v2/cmd/spidy/api"

	//
	"github.com/urfave/cli/v2"
)

// main
func main() {
	app := &cli.App{
		Name:     "Spidy",
		HelpName: "spidy",
		Usage:    "Domain name scraper",
		Version:  api.Version,
		Flags: []cli.Flag{
			&cli.StringFlag{
				Name:     "config",
				Aliases:  []string{"c"},
				Usage:    "`path` to config file",
				Required: true,
			},
			&cli.StringSliceFlag{
				Name:     "urls",
				Aliases:  []string{"u"},
				Usage:    "`urls` of page to scrape",
				Required: true,
			},
		},
		Action: func(c *cli.Context) error {
			s, err := api.NewSpider(c.String("config"))
			if err != nil {
				return err
			}

			go s.Shutdown()

			return s.Start(c.StringSlice("urls"))
		},
	}

	if err := app.Run(os.Args); err != nil {
		log.Println(err)
		return
	}
}
================================================
FILE: config/example.config.yaml
================================================
crawler:
  max_depth: 10
  # filter: []
  rate_limit: "1/5s"
  max_body_size: "20MB"
  user_agents:
    - "Spidy/2.1; +https://github.com/twiny/spidy"
  # proxies: []
log:
  rotate: 7
  path: "./log"
store:
  ttl: "24h"
  path: "./store"
result:
  path: ./result
parralle: 3
timeout: "5m"
tlds: ["biz", "cc", "com", "edu", "info", "net", "org", "tv"]
================================================
FILE: go.mod
================================================
module github.com/twiny/spidy/v2
go 1.18
require (
github.com/PuerkitoBio/goquery v1.8.0
github.com/twiny/carbon v1.0.1
github.com/twiny/domaincheck v0.1.0
github.com/twiny/flog v1.0.3
github.com/twiny/wbot v0.1.5
github.com/urfave/cli/v2 v2.10.3
golang.org/x/net v0.0.0-20220513224357-95641704303c
gopkg.in/yaml.v3 v3.0.1
)
require (
github.com/andybalholm/cascadia v1.3.1 // indirect
github.com/benbjohnson/clock v1.3.0 // indirect
github.com/cespare/xxhash v1.1.0 // indirect
github.com/cespare/xxhash/v2 v2.1.1 // indirect
github.com/cpuguy83/go-md2man/v2 v2.0.2 // indirect
github.com/dgraph-io/badger/v3 v3.2103.2 // indirect
github.com/dgraph-io/ristretto v0.1.0 // indirect
github.com/dustin/go-humanize v1.0.0 // indirect
github.com/fatih/color v1.10.0 // indirect
github.com/goccy/go-yaml v1.9.4 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b // indirect
github.com/golang/groupcache v0.0.0-20190702054246-869f871628b6 // indirect
github.com/golang/protobuf v1.3.1 // indirect
github.com/golang/snappy v0.0.3 // indirect
github.com/google/flatbuffers v1.12.1 // indirect
github.com/klauspost/compress v1.12.3 // indirect
github.com/mattn/go-colorable v0.1.8 // indirect
github.com/mattn/go-isatty v0.0.12 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/russross/blackfriday/v2 v2.1.0 // indirect
github.com/twiny/ratelimit v0.0.0-20220509163414-256d3376b0ac // indirect
github.com/twiny/whois/v2 v2.0.1 // indirect
github.com/xrash/smetrics v0.0.0-20201216005158-039620a65673 // indirect
go.opencensus.io v0.22.5 // indirect
golang.org/x/sys v0.0.0-20211216021012-1d35b9e2eb4e // indirect
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 // indirect
)
================================================
FILE: go.sum
================================================
cloud.google.com/go v0.26.0/go.mod h1:aQUYkXzVsufM+DwF1aE+0xfcU+56JwCaLick0ClmMTw=
github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
github.com/OneOfOne/xxhash v1.2.2 h1:KMrpdQIwFcEqXDklaen+P1axHaj9BSKzvpUUfnHldSE=
github.com/OneOfOne/xxhash v1.2.2/go.mod h1:HSdplMjZKSmBqAxg5vPj2TmRDmfkzw+cTzAElWljhcU=
github.com/PuerkitoBio/goquery v1.8.0 h1:PJTF7AmFCFKk1N6V6jmKfrNH9tV5pNE6lZMkG0gta/U=
github.com/PuerkitoBio/goquery v1.8.0/go.mod h1:ypIiRMtY7COPGk+I/YbZLbxsxn9g5ejnI2HSMtkjZvI=
github.com/andybalholm/cascadia v1.3.1 h1:nhxRkql1kdYCc8Snf7D5/D3spOX+dBgjA6u8x004T2c=
github.com/andybalholm/cascadia v1.3.1/go.mod h1:R4bJ1UQfqADjvDa4P6HZHLh/3OxWWEqc0Sk8XGwHqvA=
github.com/armon/consul-api v0.0.0-20180202201655-eb2c6b5be1b6/go.mod h1:grANhF5doyWs3UAsr3K4I6qtAmlQcZDesFNEHPZAzj8=
github.com/benbjohnson/clock v1.3.0 h1:ip6w0uFQkncKQ979AypyG0ER7mqUSBdKLOgAle/AT8A=
github.com/benbjohnson/clock v1.3.0/go.mod h1:J11/hYXuz8f4ySSvYwY0FKfm+ezbsZBKZxNJlLklBHA=
github.com/cespare/xxhash v1.1.0 h1:a6HrQnmkObjyL+Gs60czilIUGqrzKutQD6XZog3p+ko=
github.com/cespare/xxhash v1.1.0/go.mod h1:XrSqR1VqqWfGrhpAt58auRo0WTKS1nRRg3ghfAqPWnc=
github.com/cespare/xxhash/v2 v2.1.1 h1:6MnRN8NT7+YBpUIWxHtefFZOKTAPgGjpQSxqLNn0+qY=
github.com/cespare/xxhash/v2 v2.1.1/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs=
github.com/client9/misspell v0.3.4/go.mod h1:qj6jICC3Q7zFZvVWo7KLAzC3yx5G7kyvSDkc90ppPyw=
github.com/coreos/etcd v3.3.10+incompatible/go.mod h1:uF7uidLiAD3TWHmW31ZFd/JWoc32PjwdhPthX9715RE=
github.com/coreos/go-etcd v2.0.0+incompatible/go.mod h1:Jez6KQU2B/sWsbdaef3ED8NzMklzPG4d5KIOhIy30Tk=
github.com/coreos/go-semver v0.2.0/go.mod h1:nnelYz7RCh+5ahJtPPxZlU+153eP4D4r3EedlOD2RNk=
github.com/cpuguy83/go-md2man v1.0.10/go.mod h1:SmD6nW6nTyfqj6ABTjUi3V3JVMnlJmwcJI5acqYI6dE=
github.com/cpuguy83/go-md2man/v2 v2.0.2 h1:p1EgwI/C7NhT0JmVkwCD2ZBK8j4aeHQX2pMHHBfMQ6w=
github.com/cpuguy83/go-md2man/v2 v2.0.2/go.mod h1:tgQtvFlXSQOSOSIRvRPT7W67SCa46tRHOmNcaadrF8o=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/dgraph-io/badger/v3 v3.2103.2 h1:dpyM5eCJAtQCBcMCZcT4UBZchuTJgCywerHHgmxfxM8=
github.com/dgraph-io/badger/v3 v3.2103.2/go.mod h1:RHo4/GmYcKKh5Lxu63wLEMHJ70Pac2JqZRYGhlyAo2M=
github.com/dgraph-io/ristretto v0.1.0 h1:Jv3CGQHp9OjuMBSne1485aDpUkTKEcUqF+jm/LuerPI=
github.com/dgraph-io/ristretto v0.1.0/go.mod h1:fux0lOrBhrVCJd3lcTHsIJhq1T2rokOu6v9Vcb3Q9ug=
github.com/dgryski/go-farm v0.0.0-20190423205320-6a90982ecee2 h1:tdlZCpZ/P9DhczCTSixgIKmwPv6+wP5DGjqLYw5SUiA=
github.com/dgryski/go-farm v0.0.0-20190423205320-6a90982ecee2/go.mod h1:SqUrOPUnsFjfmXRMNPybcSiG0BgUW2AuFH8PAnS2iTw=
github.com/dustin/go-humanize v1.0.0 h1:VSnTsYCnlFHaM2/igO1h6X3HA71jcobQuxemgkq4zYo=
github.com/dustin/go-humanize v1.0.0/go.mod h1:HtrtbFcZ19U5GC7JDqmcUSB87Iq5E25KnS6fMYU6eOk=
github.com/fatih/color v1.10.0 h1:s36xzo75JdqLaaWoiEHk767eHiwo0598uUxyfiPkDsg=
github.com/fatih/color v1.10.0/go.mod h1:ELkj/draVOlAH/xkhN6mQ50Qd0MPOk5AAr3maGEBuJM=
github.com/fsnotify/fsnotify v1.4.7/go.mod h1:jwhsz4b93w/PPRr/qN1Yymfu8t87LnFCMoQvtojpjFo=
github.com/go-playground/assert/v2 v2.0.1/go.mod h1:VDjEfimB/XKnb+ZQfWdccd7VUvScMdVu0Titje2rxJ4=
github.com/go-playground/locales v0.13.0 h1:HyWk6mgj5qFqCT5fjGBuRArbVDfE4hi8+e8ceBS/t7Q=
github.com/go-playground/locales v0.13.0/go.mod h1:taPMhCMXrRLJO55olJkUXHZBHCxTMfnGwq/HNwmWNS8=
github.com/go-playground/universal-translator v0.17.0 h1:icxd5fm+REJzpZx7ZfpaD876Lmtgy7VtROAbHHXk8no=
github.com/go-playground/universal-translator v0.17.0/go.mod h1:UkSxE5sNxxRwHyU+Scu5vgOQjsIJAF8j9muTVoKLVtA=
github.com/go-playground/validator/v10 v10.4.1 h1:pH2c5ADXtd66mxoE0Zm9SUhxE20r7aM3F26W0hOn+GE=
github.com/go-playground/validator/v10 v10.4.1/go.mod h1:nlOn6nFhuKACm19sB/8EGNn9GlaMV7XkbRSipzJ0Ii4=
github.com/goccy/go-yaml v1.9.4 h1:S0GCYjwHKVI6IHqio7QWNKNThUl6NLzFd/g8Z65Axw8=
github.com/goccy/go-yaml v1.9.4/go.mod h1:U/jl18uSupI5rdI2jmuCswEA2htH9eXfferR3KfscvA=
github.com/gogo/protobuf v1.3.2 h1:Ov1cvc58UF3b5XjBnZv7+opcTcQFZebYjWzi34vdm4Q=
github.com/gogo/protobuf v1.3.2/go.mod h1:P1XiOD3dCwIKUDQYPy72D8LYyHL2YPYrpS2s69NZV8Q=
github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b h1:VKtxabqXZkF25pY9ekfRL6a582T4P37/31XEstQ5p58=
github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b/go.mod h1:SBH7ygxi8pfUlaOkMMuAQtPIUF8ecWP5IEl/CR7VP2Q=
github.com/golang/groupcache v0.0.0-20190702054246-869f871628b6 h1:ZgQEtGgCBiWRM39fZuwSd1LwSqqSW0hOdXCYYDX0R3I=
github.com/golang/groupcache v0.0.0-20190702054246-869f871628b6/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc=
github.com/golang/mock v1.1.1/go.mod h1:oTYuIxOrZwtPieC+H1uAHpcLFnEyAGVDL/k47Jfbm0A=
github.com/golang/protobuf v1.2.0/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
github.com/golang/protobuf v1.3.1 h1:YF8+flBXS5eO826T4nzqPrxfhQThhXl0YzfuUPu4SBg=
github.com/golang/protobuf v1.3.1/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
github.com/golang/snappy v0.0.3 h1:fHPg5GQYlCeLIPB9BZqMVR5nR9A+IM5zcgeTdjMYmLA=
github.com/golang/snappy v0.0.3/go.mod h1:/XxbfmMg8lxefKM7IXC3fBNl/7bRcc72aCRzEWrmP2Q=
github.com/google/flatbuffers v1.12.1 h1:MVlul7pQNoDzWRLTw5imwYsl+usrS1TXG2H4jg6ImGw=
github.com/google/flatbuffers v1.12.1/go.mod h1:1AeVuKshWv4vARoZatz6mlQ0JxURH0Kv5+zNeJKJCa8=
github.com/google/go-cmp v0.3.0/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU=
github.com/google/go-cmp v0.5.4/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/hashicorp/hcl v1.0.0/go.mod h1:E5yfLk+7swimpb2L/Alb/PJmXilQ/rhwaUYs4T20WEQ=
github.com/inconshreveable/mousetrap v1.0.0/go.mod h1:PxqpIevigyE2G7u3NXJIT2ANytuPF1OarO4DADm73n8=
github.com/kisielk/errcheck v1.5.0/go.mod h1:pFxgyoBC7bSaBwPgfKdkLd5X25qrDl4LWUI2bnpBCr8=
github.com/kisielk/gotool v1.0.0/go.mod h1:XhKaO+MFFWcvkIS/tQcRk01m1F5IRFswLeQ+oQHNcck=
github.com/klauspost/compress v1.12.3 h1:G5AfA94pHPysR56qqrkO2pxEexdDzrpFJ6yt/VqWxVU=
github.com/klauspost/compress v1.12.3/go.mod h1:8dP1Hq4DHOhN9w426knH3Rhby4rFm6D8eO+e+Dq5Gzg=
github.com/kr/pretty v0.1.0 h1:L/CwN0zerZDmRFUapSPitk6f+Q3+0za1rQkzVuMiMFI=
github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo=
github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ=
github.com/kr/text v0.1.0 h1:45sCR5RtlFHMR4UwH9sdQ5TC8v0qDQCHnXt+kaKSTVE=
github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI=
github.com/leodido/go-urn v1.2.0 h1:hpXL4XnriNwQ/ABnpepYM/1vCLWNDfUNts8dX3xTG6Y=
github.com/leodido/go-urn v1.2.0/go.mod h1:+8+nEpDfqqsY+g338gtMEUOtuK+4dEMhiQEgxpxOKII=
github.com/magiconair/properties v1.8.0/go.mod h1:PppfXfuXeibc/6YijjN8zIbojt8czPbwD3XqdrwzmxQ=
github.com/mattn/go-colorable v0.1.8 h1:c1ghPdyEDarC70ftn0y+A/Ee++9zz8ljHG1b13eJ0s8=
github.com/mattn/go-colorable v0.1.8/go.mod h1:u6P/XSegPjTcexA+o6vUJrdnUu04hMope9wVRipJSqc=
github.com/mattn/go-isatty v0.0.12 h1:wuysRhFDzyxgEmMf5xjvJ2M9dZoWAXNNr5LSBS7uHXY=
github.com/mattn/go-isatty v0.0.12/go.mod h1:cbi8OIDigv2wuxKPP5vlRcQ1OAZbq2CE4Kysco4FUpU=
github.com/mitchellh/go-homedir v1.1.0/go.mod h1:SfyaCUpYCn1Vlf4IUYiD9fPX4A5wJrkLzIz1N1q0pr0=
github.com/mitchellh/mapstructure v1.1.2/go.mod h1:FVVH3fgwuzCH5S8UJGiWEs2h04kUh9fWfEaFds41c1Y=
github.com/pelletier/go-toml v1.2.0/go.mod h1:5z9KED0ma1S8pY6P1sdut58dfprrGBbd/94hg7ilaic=
github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4=
github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/russross/blackfriday v1.5.2/go.mod h1:JO/DiYxRf+HjHt06OyowR9PTA263kcR/rfWxYHBV53g=
github.com/russross/blackfriday/v2 v2.1.0 h1:JIOH55/0cWyOuilr9/qlrm0BSXldqnqwMsf35Ld67mk=
github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
github.com/spaolacci/murmur3 v0.0.0-20180118202830-f09979ecbc72/go.mod h1:JwIasOWyU6f++ZhiEuf87xNszmSA2myDM2Kzu9HwQUA=
github.com/spaolacci/murmur3 v1.1.0 h1:7c1g84S4BPRrfL5Xrdp6fOJ206sU9y293DDHaoy0bLI=
github.com/spaolacci/murmur3 v1.1.0/go.mod h1:JwIasOWyU6f++ZhiEuf87xNszmSA2myDM2Kzu9HwQUA=
github.com/spf13/afero v1.1.2/go.mod h1:j4pytiNVoe2o6bmDsKpLACNPDBIoEAkihy7loJ1B0CQ=
github.com/spf13/cast v1.3.0/go.mod h1:Qx5cxh0v+4UWYiBimWS+eyWzqEqokIECu5etghLkUJE=
github.com/spf13/cobra v0.0.5/go.mod h1:3K3wKZymM7VvHMDS9+Akkh4K60UwM26emMESw8tLCHU=
github.com/spf13/jwalterweatherman v1.0.0/go.mod h1:cQK4TGJAtQXfYWX+Ddv3mKDzgVb68N+wFjFa4jdeBTo=
github.com/spf13/pflag v1.0.3/go.mod h1:DYY7MBk1bdzusC3SYhjObp+wFpr4gzcvqqNjLnInEg4=
github.com/spf13/viper v1.3.2/go.mod h1:ZiWeW+zYFKm7srdB9IoDzzZXaJaI5eL9QjNiN/DMA2s=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
github.com/stretchr/testify v1.2.2/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs=
github.com/stretchr/testify v1.4.0 h1:2E4SXV/wtOkTonXsotYi4li6zVWxYlZuYNCXe9XRJyk=
github.com/stretchr/testify v1.4.0/go.mod h1:j7eGeouHqKxXV5pUuKE4zz7dFj8WfuZ+81PSLYec5m4=
github.com/twiny/carbon v1.0.1 h1:srGnk3N4KbAvCVgieWzYgZkLoBYGjnerTdxqzPy3TQs=
github.com/twiny/carbon v1.0.1/go.mod h1:Ymh/hwZd8cZWYWnSL9xqSaQMd955k9EJx4/YS8wVdv0=
github.com/twiny/domaincheck v0.1.0 h1:ByFbTKzdLymEaEkqAoA+vFuBxi33zOOyXCOTvvAm95c=
github.com/twiny/domaincheck v0.1.0/go.mod h1:vlDqt80kuclqhfG3KrTu/rJd7aZe5P6viJ2acVuUvL4=
github.com/twiny/flog v1.0.3 h1:iBTf+yEm/maBTJYFaMgD2lXIE5g7gSZnaTnmVXbs1tI=
github.com/twiny/flog v1.0.3/go.mod h1:Hi9bzahz0Zmw30XiBT9oqWOrc10ive6L42Owwz02Vp8=
github.com/twiny/ratelimit v0.0.0-20220509163414-256d3376b0ac h1:nT+8DFvrU5Nu3Be2bK7LooU8AslFJeypQoAF+wm1CM0=
github.com/twiny/ratelimit v0.0.0-20220509163414-256d3376b0ac/go.mod h1:C589KqlnfcMeRAJ+evrNJwSf9ddkXO926hRDtgjjoYM=
github.com/twiny/wbot v0.1.5 h1:yTfTv6+tmVHik6aY2DLuJZUG5/WPP37oE2TAgXkXRno=
github.com/twiny/wbot v0.1.5/go.mod h1:JNeqtjncCXLALd0qaKw2q/4kC8F34weLiyf9QOljzQk=
github.com/twiny/whois/v2 v2.0.1 h1:jDqkiq0wv2qdm9d/bquhQpg7AhJDYf89g7ozZElSTuA=
github.com/twiny/whois/v2 v2.0.1/go.mod h1:UeyP4HmWFruXXuYQ722s/BnWgwxi7fRb/bk9Fnqm7OA=
github.com/ugorji/go/codec v0.0.0-20181204163529-d75b2dcb6bc8/go.mod h1:VFNgLljTbGfSG7qAOspJ7OScBnGdDN/yBr0sguwnwf0=
github.com/urfave/cli/v2 v2.10.3 h1:oi571Fxz5aHugfBAJd5nkwSk3fzATXtMlpxdLylSCMo=
github.com/urfave/cli/v2 v2.10.3/go.mod h1:f8iq5LtQ/bLxafbdBSLPPNsgaW0l/2fYYEHhAyPlwvo=
github.com/xordataexchange/crypt v0.0.3-0.20170626215501-b2862e3d0a77/go.mod h1:aYKd//L2LvnjZzWKhF00oedf4jCCReLcmhLdhm1A27Q=
github.com/xrash/smetrics v0.0.0-20201216005158-039620a65673 h1:bAn7/zixMGCfxrRTfdpNzjtPYqr8smhKouy9mxVdGPU=
github.com/xrash/smetrics v0.0.0-20201216005158-039620a65673/go.mod h1:N3UwUGtsrSj3ccvlPHLoLsHnpR27oXr4ZE984MbSER8=
github.com/yuin/goldmark v1.1.27/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
github.com/yuin/goldmark v1.2.1/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
go.opencensus.io v0.22.5 h1:dntmOdLpSpHlVqbW5Eay97DelsZHe+55D+xC6i0dDS0=
go.opencensus.io v0.22.5/go.mod h1:5pWMHQbX5EPX2/62yrJeAkowc+lfs/XD7Uxpq3pI6kk=
golang.org/x/crypto v0.0.0-20181203042331-505ab145d0a9/go.mod h1:6SG95UA2DQfeDnfUPMdvaQW0Q7yPrPDi9nlGo2tz2b4=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI=
golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9 h1:psW17arqaxU48Z5kZ0CQnkZWQJsqcURM6tKiBApRjXI=
golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto=
golang.org/x/exp v0.0.0-20190121172915-509febef88a4/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA=
golang.org/x/lint v0.0.0-20181026193005-c67002cb31c3/go.mod h1:UVdnD1Gm6xHRNCYTkRU2/jEulfH38KcIWyp/GAMgvoE=
golang.org/x/lint v0.0.0-20190227174305-5b3e6a55c961/go.mod h1:wehouNa3lNwaWXcvxsM5YxQ5yQlVC4a0KAMCusXpPoU=
golang.org/x/lint v0.0.0-20190313153728-d0100b6bd8b3/go.mod h1:6SW0HCj/g11FgYtHlgUYUwCkIfeOF89ocIRzGO/8vkc=
golang.org/x/mod v0.2.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
golang.org/x/mod v0.3.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
golang.org/x/net v0.0.0-20180724234803-3673e40ba225/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20180826012351-8a410e7b638d/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20190213061140-3a22650c66bd/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20190311183353-d8887717615a/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
golang.org/x/net v0.0.0-20200226121028-0de0cce0169b/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s=
golang.org/x/net v0.0.0-20201021035429-f5854403a974/go.mod h1:sp8m0HH+o8qH0wwXwYZr8TS3Oi6o0r6Gce1SSxlDquU=
golang.org/x/net v0.0.0-20210916014120-12bc252f5db8/go.mod h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y=
golang.org/x/net v0.0.0-20220513224357-95641704303c h1:nF9mHSvoKBLkQNQhJZNsc66z2UzAMUbLGjC95CF3pU0=
golang.org/x/net v0.0.0-20220513224357-95641704303c/go.mod h1:CfG3xpIq0wQ8r1q4Su4UZFWDARRcnwPjda9FqA0JpMk=
golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U=
golang.org/x/sync v0.0.0-20180314180146-1d60e4601c6f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20181108010431-42b317875d0f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20190227155943-e225da77a7e6/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20190911185100-cd5d95a43a6e/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sys v0.0.0-20180830151530-49385e6e1522/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20181205085412-a5c9d58dba9a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20190502145724-3ef323f4f1fd/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20200116001909-b77594299b42/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20200223170610-d5e6a3e2c0ae/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20200930185726-fdedc70b468f/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20210124154548-22da62e12c0c/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20210423082822-04245dca01da/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20211216021012-1d35b9e2eb4e h1:fLOSk5Q00efkSvAm+4xcoXD+RRmLmmulPn5I3Y9F2EM=
golang.org/x/sys v0.0.0-20211216021012-1d35b9e2eb4e/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.2/go.mod h1:bEr9sfX3Q8Zfm5fL9x+3itogRgK3+ptLWKqgva+5dAk=
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
golang.org/x/tools v0.0.0-20190114222345-bf090417da8b/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
golang.org/x/tools v0.0.0-20190226205152-f727befe758c/go.mod h1:9Yl7xja0Znq3iFh3HoIrodX9oNMXvdceNzlUR8zjMvY=
golang.org/x/tools v0.0.0-20190311212946-11955173bddd/go.mod h1:LCzVGOaR6xXOjkQ3onu1FJEFr0SW1gC7cKk1uF8kGRs=
golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo=
golang.org/x/tools v0.0.0-20200619180055-7c47624df98f/go.mod h1:EkVYQZoAsY45+roYkvgYkIh4xh/qjgUK9TdY2XT94GE=
golang.org/x/tools v0.0.0-20210106214847-113979e3529a/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA=
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 h1:go1bK/D/BFZV2I8cIQd1NKEZ+0owSTG1fDTci4IqFcE=
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
google.golang.org/appengine v1.1.0/go.mod h1:EbEs0AVv82hx2wNQdGPgUI5lhzA/G0D9YwlJXL52JkM=
google.golang.org/appengine v1.4.0/go.mod h1:xpcJRLb0r/rnEns0DIKYYv+WjYCduHsrkT7/EB5XEv4=
google.golang.org/genproto v0.0.0-20180817151627-c66870c02cf8/go.mod h1:JiN7NxoALGmiZfu7CAH4rXhgtRTLTxftemlI0sWmxmc=
google.golang.org/genproto v0.0.0-20190425155659-357c62f0e4bb/go.mod h1:VzzqZJRnGkLBvHegQrXjBqPurQTc5/KpmUdxsrq26oE=
google.golang.org/grpc v1.19.0/go.mod h1:mqu4LbDTu4XGKhr4mRzUsmM4RtVoemTSY81AxZiDr8c=
google.golang.org/grpc v1.20.1/go.mod h1:10oTOabMzJvdu6/UiuZezV6QK5dSlG84ov/aaiqXj38=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15 h1:YR8cESwS4TdDjEe65xsg0ogRM/Nc3DYOhEAlW+xobZo=
gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v2 v2.2.2 h1:ZCJp+EgiOT7lHqUV2J862kp8Qj64Jo6az82+3Td9dZw=
gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
honnef.co/go/tools v0.0.0-20190102054323-c2f93a96b099/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4=
================================================
FILE: internal/pkg/hbyte/hbyte.go
================================================
package hbyte
import (
"fmt"
"strings"
)
const (
b = "b"
kb = "kb"
mb = "mb"
gb = "gb"
tb = "tb"
)
// type BYTE int64
const (
B int64 = 1 << (10 * iota)
KB
MB
GB
TB
)
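// Parse converts a human-readable size such as "20MB" or "512kb" into bytes.
// Units are case-insensitive; a missing or unknown unit is treated as bytes,
// and input that does not start with a number yields 0.
func Parse(s string) int64 {
// lower case so that "MB" and "mb" match the same unit constant
s = strings.ToLower(s)
var n int64
var unit string
// the scan error is deliberately ignored: on failure n and unit keep their zero values
fmt.Sscanf(s, "%d%s", &n, &unit)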
switch unit {
case b:
return n
case kb:
return n * KB
case mb:
return n * MB
case gb:
return n * GB
case tb:
return n * TB
default:
return n
}
}
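// String formats n bytes using the largest unit that fits.
// Integer division truncates the remainder, e.g. 1536 becomes "1 kb".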
func String(n int64) string {
switch {
case n >= TB:
return fmt.Sprintf("%d %s", n/TB, tb)
case n >= GB:
return fmt.Sprintf("%d %s", n/GB, gb)
case n >= MB:
return fmt.Sprintf("%d %s", n/MB, mb)
case n >= KB:
return fmt.Sprintf("%d %s", n/KB, kb)
default:
return fmt.Sprintf("%d %s", n, b)
}
}
================================================
FILE: internal/pkg/spider/v1/domain.go
================================================
package spider
// Domain
type Domain struct {
URL string
Name string
TLD string
Status string
}
// CSVRow returns the domain's fields in CSV column order.
func (d Domain) CSVRow() []string {
return []string{d.URL, d.Name, d.TLD, d.Status}
}
================================================
FILE: internal/pkg/spider/v1/page.go
================================================
package spider
import "net/url"
// Page
type Page struct {
URL *url.URL
Status int
Body []byte
}
================================================
FILE: internal/pkg/spider/v1/setting.go
================================================
package spider
import (
"io/ioutil"
"runtime"
"strconv"
"strings"
"time"
//
"github.com/twiny/spidy/v2/internal/pkg/hbyte"
"gopkg.in/yaml.v3"
)
// core is the default worker count: all CPUs but one, with a minimum of one.
var core = func() int {
c := runtime.NumCPU()
if c == 1 {
return c
}
return c - 1
}()
// defaultSetting
var defaultSetting = &Setting{
Crawler: struct {
MaxDepth int32
Filter []string
Limit struct {
Rate int
Interval time.Duration
}
MaxBodySize int64
UserAgents []string
Proxies []string
}{
MaxDepth: 10,
Filter: []string{},
Limit: struct {
Rate int
Interval time.Duration
}{
Rate: 1,
Interval: time.Second,
},
MaxBodySize: 10 * 1024 * 1024, // 10 MB
UserAgents: []string{`Spidy/2.1; +https://github.com/twiny/spidy`},
Proxies: []string{},
},
Log: struct {
Rotate int
Path string
}{
Rotate: 7,
Path: "./log",
},
Store: struct {
TTL time.Duration
Path string
}{
TTL: 6 * time.Hour, // config values are parsed with time.ParseDuration, e.g. "6h", "24h"; "d" and "w" units are not supported
Path: "./store",
},
Result: struct{ Path string }{
Path: "./result",
},
Parralle: core,
Timeout: 1 * time.Minute,
TLDs: tlds,
}
// Setting
type Setting struct {
Crawler struct {
MaxDepth int32
Filter []string
Limit struct {
Rate int
Interval time.Duration
}
MaxBodySize int64
UserAgents []string
Proxies []string
}
Log struct {
Rotate int // plain integer, e.g. 30 (not a duration string such as "30d")
Path string
}
Store struct {
TTL time.Duration
Path string
}
Result struct {
Path string
}
Parralle int
Timeout time.Duration
TLDs map[string]bool
}
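// ParseSetting reads the YAML file at fp and returns the parsed settings.
// It returns defaultSetting when the file cannot be read or unmarshalled.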
func ParseSetting(fp string) *Setting {
data, err := ioutil.ReadFile(fp)
if err != nil {
return defaultSetting
}
var s = struct {
Crawler struct {
MaxDepth int32 `yaml:"max_depth"`
Filter []string `yaml:"filter,flow"`
RateLimit string `yaml:"rate_limit"` // format: req/time.Duration => 5/1s
MaxBodySize string `yaml:"max_body_size"`
UserAgents []string `yaml:"user_agents,flow"`
Proxies []string `yaml:"proxies,flow"`
} `yaml:"crawler"`
Log struct {
Rotate int `yaml:"rotate"` // plain integer, e.g. 30; a value like "30d" fails to unmarshal
Path string `yaml:"path"`
} `yaml:"log"`
Store struct {
TTL string `yaml:"ttl"` // format: 1h, 24h
Path string `yaml:"path"`
} `yaml:"store"`
Result struct {
Path string `yaml:"path"`
} `yaml:"result"`
Parralle int `yaml:"parralle"`
Timeout string `yaml:"timeout"`
TLDs []string `yaml:"tlds,flow"`
}{}
if err := yaml.Unmarshal(data, &s); err != nil {
return defaultSetting
}
rate, interval := parseRateLimit(s.Crawler.RateLimit)
return &Setting{
Crawler: struct {
MaxDepth int32
Filter []string
Limit struct {
Rate int
Interval time.Duration
}
MaxBodySize int64
UserAgents []string
Proxies []string
}{
MaxDepth: s.Crawler.MaxDepth,
Filter: s.Crawler.Filter,
Limit: struct {
Rate int
Interval time.Duration
}{
Rate: rate,
Interval: interval,
},
MaxBodySize: parseBodySize(s.Crawler.MaxBodySize),
UserAgents: s.Crawler.UserAgents,
Proxies: s.Crawler.Proxies,
},
Log: struct {
Rotate int
Path string
}{
Rotate: s.Log.Rotate,
Path: s.Log.Path,
},
Store: struct {
TTL time.Duration
Path string
}{
TTL: parseTTL(s.Store.TTL),
Path: s.Store.Path,
},
Result: struct{ Path string }{
Path: s.Result.Path,
},
Parralle: s.Parralle,
Timeout: parseTimeout(s.Timeout),
TLDs: parseTLDs(s.TLDs),
}
}
// parseRateLimit
func parseRateLimit(s string) (int, time.Duration) {
// default rate limit
dr, di := defaultSetting.Crawler.Limit.Rate, defaultSetting.Crawler.Limit.Interval
if s == "" {
return dr, di
}
parts := strings.Split(s, "/")
if len(parts) != 2 {
return dr, di
}
r, i := parts[0], parts[1]
rate, err := strconv.Atoi(r)
if err != nil {
return dr, di
}
interval, err := time.ParseDuration(i)
if err != nil {
return dr, di
}
return rate, interval
}
// parseTLDs
func parseTLDs(list []string) map[string]bool {
m := map[string]bool{}
for _, s := range list {
m[s] = true
}
return m
}
// parseTimeout
func parseTimeout(s string) time.Duration {
d, err := time.ParseDuration(s)
if err != nil {
return defaultSetting.Timeout
}
return d
}
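// parseTTL parses the store TTL and falls back to the default store TTL
// (not the request timeout) when s is empty or invalid.
func parseTTL(s string) time.Duration {
d, err := time.ParseDuration(s)
if err != nil {
return defaultSetting.Store.TTL
}
return d
}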
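// parseBodySize parses a size such as "20MB" and falls back to the default
// when the value is missing, unparsable, or not positive.
func parseBodySize(s string) int64 {
size := hbyte.Parse(s)
if size <= 0 {
return defaultSetting.Crawler.MaxBodySize
}
return size
}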
================================================
FILE: internal/pkg/spider/v1/store.go
================================================
package spider
// Storage
type Storage interface {
HasChecked(name string) bool
Close() error
}
================================================
FILE: internal/pkg/spider/v1/string_replacer.go
================================================
package spider
import "strings"
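// UnescapeHTML replaces escaped characters, both literal \uXXXX sequences and
// percent-encoded bytes (%XX, uppercase hex only), with a space so that
// escaped text does not run into neighbouring domain names during extraction.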
var UnescapeHTML = strings.NewReplacer(
`\u002f`, ` `,
`\u002F`, ` `,
//
`\u0020`, ` `,
`\u0021`, ` `,
`\u0022`, ` `,
`\u0023`, ` `,
`\u0024`, ` `,
`\u0025`, ` `,
`\u0026`, ` `,
`\u0027`, ` `,
`\u0028`, ` `,
`\u0029`, ` `,
//
`\u002a`, ` `,
`\u002A`, ` `,
//
`\u002b`, ` `,
`\u002B`, ` `,
//
`\u002c`, ` `,
`\u002C`, ` `,
//
`\u002d`, ` `,
`\u002D`, ` `,
//
`\u002e`, ` `,
`\u002E`, ` `,
//
`\u0030`, ` `,
`\u0031`, ` `,
`\u0032`, ` `,
`\u0033`, ` `,
`\u0034`, ` `,
`\u0035`, ` `,
`\u0036`, ` `,
`\u0037`, ` `,
`\u0038`, ` `,
`\u0039`, ` `,
//
`\u003a`, ` `,
`\u003A`, ` `,
//
`\u003b`, ` `,
`\u003B`, ` `,
//
`\u003c`, ` `,
`\u003C`, ` `,
//
`\u003d`, ` `,
`\u003D`, ` `,
//
`\u003e`, ` `,
`\u003E`, ` `,
//
`\u003f`, ` `,
`\u003F`, ` `,
//
`\u0040`, ` `,
`\u0041`, ` `,
`\u0042`, ` `,
`\u0043`, ` `,
`\u0044`, ` `,
`\u0045`, ` `,
`\u0046`, ` `,
`\u0047`, ` `,
`\u0048`, ` `,
`\u0049`, ` `,
//
`\u004a`, ` `,
`\u004A`, ` `,
//
`\u004b`, ` `,
`\u004B`, ` `,
//
`\u004c`, ` `,
`\u004C`, ` `,
//
`\u004d`, ` `,
`\u004D`, ` `,
//
`\u004e`, ` `,
`\u004E`, ` `,
//
`\u004f`, ` `,
`\u004F`, ` `,
//
`\u0050`, ` `,
`\u0051`, ` `,
`\u0052`, ` `,
`\u0053`, ` `,
`\u0054`, ` `,
`\u0055`, ` `,
`\u0056`, ` `,
`\u0057`, ` `,
`\u0058`, ` `,
`\u0059`, ` `,
//
`\u005a`, ` `,
`\u005A`, ` `,
//
`\u005b`, ` `,
`\u005B`, ` `,
//
`\u005c`, ` `,
`\u005C`, ` `,
//
`\u005d`, ` `,
`\u005D`, ` `,
//
`\u005e`, ` `,
`\u005E`, ` `,
//
`\u005f`, ` `,
`\u005F`, ` `,
//
`\u0060`, ` `,
`\u0061`, ` `,
`\u0062`, ` `,
`\u0063`, ` `,
`\u0064`, ` `,
`\u0065`, ` `,
`\u0066`, ` `,
`\u0067`, ` `,
`\u0068`, ` `,
`\u0069`, ` `,
//
`\u006a`, ` `,
`\u006A`, ` `,
//
`\u006b`, ` `,
`\u006B`, ` `,
//
`\u006c`, ` `,
`\u006C`, ` `,
//
`\u006d`, ` `,
`\u006D`, ` `,
//
`\u006e`, ` `,
`\u006E`, ` `,
//
`\u006f`, ` `,
`\u006F`, ` `,
//
`\u0070`, ` `,
`\u0071`, ` `,
`\u0072`, ` `,
`\u0073`, ` `,
`\u0074`, ` `,
`\u0075`, ` `,
`\u0076`, ` `,
`\u0077`, ` `,
`\u0078`, ` `,
`\u0079`, ` `,
//
`\u007a`, ` `,
`\u007A`, ` `,
//
`\u007b`, ` `,
`\u007B`, ` `,
//
`\u007c`, ` `,
`\u007C`, ` `,
//
`\u007d`, ` `,
`\u007D`, ` `,
//
`\u007e`, ` `,
`\u007E`, ` `,
//
`%20`, ` `,
`%21`, ` `,
`%22`, ` `,
`%23`, ` `,
`%24`, ` `,
`%25`, ` `,
`%26`, ` `,
`%27`, ` `,
`%28`, ` `,
`%29`, ` `,
`%2A`, ` `,
`%2B`, ` `,
`%2C`, ` `,
`%2D`, ` `,
`%2E`, ` `,
`%2F`, ` `,
`%30`, ` `,
`%31`, ` `,
`%32`, ` `,
`%33`, ` `,
`%34`, ` `,
`%35`, ` `,
`%36`, ` `,
`%37`, ` `,
`%38`, ` `,
`%39`, ` `,
`%3A`, ` `,
`%3B`, ` `,
`%3C`, ` `,
`%3D`, ` `,
`%3E`, ` `,
`%3F`, ` `,
`%40`, ` `,
`%41`, ` `,
`%42`, ` `,
`%43`, ` `,
`%44`, ` `,
`%45`, ` `,
`%46`, ` `,
`%47`, ` `,
`%48`, ` `,
`%49`, ` `,
`%4A`, ` `,
`%4B`, ` `,
`%4C`, ` `,
`%4D`, ` `,
`%4E`, ` `,
`%4F`, ` `,
`%50`, ` `,
`%51`, ` `,
`%52`, ` `,
`%53`, ` `,
`%54`, ` `,
`%55`, ` `,
`%56`, ` `,
`%57`, ` `,
`%58`, ` `,
`%59`, ` `,
`%5A`, ` `,
`%5B`, ` `,
`%5C`, ` `,
`%5D`, ` `,
`%5E`, ` `,
`%5F`, ` `,
`%60`, ` `,
`%61`, ` `,
`%62`, ` `,
`%63`, ` `,
`%64`, ` `,
`%65`, ` `,
`%66`, ` `,
`%67`, ` `,
`%68`, ` `,
`%69`, ` `,
`%6A`, ` `,
`%6B`, ` `,
`%6C`, ` `,
`%6D`, ` `,
`%6E`, ` `,
`%6F`, ` `,
`%70`, ` `,
`%71`, ` `,
`%72`, ` `,
`%73`, ` `,
`%74`, ` `,
`%75`, ` `,
`%76`, ` `,
`%77`, ` `,
`%78`, ` `,
`%79`, ` `,
`%7A`, ` `,
`%7B`, ` `,
`%7C`, ` `,
`%7D`, ` `,
`%7E`, ` `,
`%7F`, ` `,
`%80`, ` `,
`%81`, ` `,
`%82`, ` `,
`%83`, ` `,
`%84`, ` `,
`%85`, ` `,
`%86`, ` `,
`%87`, ` `,
`%88`, ` `,
`%89`, ` `,
`%8A`, ` `,
`%8B`, ` `,
`%8C`, ` `,
`%8D`, ` `,
`%8E`, ` `,
`%8F`, ` `,
`%90`, ` `,
`%91`, ` `,
`%92`, ` `,
`%93`, ` `,
`%94`, ` `,
`%95`, ` `,
`%96`, ` `,
`%97`, ` `,
`%98`, ` `,
`%99`, ` `,
`%9A`, ` `,
`%9B`, ` `,
`%9C`, ` `,
`%9D`, ` `,
`%9E`, ` `,
`%9F`, ` `,
`%A0`, ` `,
`%A1`, ` `,
`%A2`, ` `,
`%A3`, ` `,
`%A4`, ` `,
`%A5`, ` `,
`%A6`, ` `,
`%A7`, ` `,
`%A8`, ` `,
`%A9`, ` `,
`%AA`, ` `,
`%AB`, ` `,
`%AC`, ` `,
`%AD`, ` `,
`%AE`, ` `,
`%AF`, ` `,
`%B0`, ` `,
`%B1`, ` `,
`%B2`, ` `,
`%B3`, ` `,
`%B4`, ` `,
`%B5`, ` `,
`%B6`, ` `,
`%B7`, ` `,
`%B8`, ` `,
`%B9`, ` `,
`%BA`, ` `,
`%BB`, ` `,
`%BC`, ` `,
`%BD`, ` `,
`%BE`, ` `,
`%BF`, ` `,
`%C0`, ` `,
`%C1`, ` `,
`%C2`, ` `,
`%C3`, ` `,
`%C4`, ` `,
`%C5`, ` `,
`%C6`, ` `,
`%C7`, ` `,
`%C8`, ` `,
`%C9`, ` `,
`%CA`, ` `,
`%CB`, ` `,
`%CC`, ` `,
`%CD`, ` `,
`%CE`, ` `,
`%CF`, ` `,
`%D0`, ` `,
`%D1`, ` `,
`%D2`, ` `,
`%D3`, ` `,
`%D4`, ` `,
`%D5`, ` `,
`%D6`, ` `,
`%D7`, ` `,
`%D8`, ` `,
`%D9`, ` `,
`%DA`, ` `,
`%DB`, ` `,
`%DC`, ` `,
`%DD`, ` `,
`%DE`, ` `,
`%DF`, ` `,
`%E0`, ` `,
`%E1`, ` `,
`%E2`, ` `,
`%E3`, ` `,
`%E4`, ` `,
`%E5`, ` `,
`%E6`, ` `,
`%E7`, ` `,
`%E8`, ` `,
`%E9`, ` `,
`%EA`, ` `,
`%EB`, ` `,
`%EC`, ` `,
`%ED`, ` `,
`%EE`, ` `,
`%EF`, ` `,
`%F0`, ` `,
`%F1`, ` `,
`%F2`, ` `,
`%F3`, ` `,
`%F4`, ` `,
`%F5`, ` `,
`%F6`, ` `,
`%F7`, ` `,
`%F8`, ` `,
`%F9`, ` `,
`%FA`, ` `,
`%FB`, ` `,
`%FC`, ` `,
`%FD`, ` `,
`%FE`, ` `,
`%FF`, ` `,
)
================================================
FILE: internal/pkg/spider/v1/tld_list.go
================================================
package spider
// allowed TLDs: list of allowed domain tlds to avoid getting bad extensions.
var tlds = map[string]bool{
"ac": true,
"ae": true,
"aero": true,
"af": true,
"ag": true,
"am": true,
"as": true,
"asia": true,
"at": true,
"au": true,
"ax": true,
"be": true,
"bg": true,
"bi": true,
"biz": true,
"bj": true,
"br": true,
"by": true,
"ca": true,
"cat": true,
"cc": true,
"cl": true,
"cn": true,
"co": true,
"com": true,
"coop": true,
"cx": true,
"de": true,
"dk": true,
"dm": true,
"dz": true,
"edu": true,
"ee": true,
"eu": true,
"fi": true,
"fo": true,
"fr": true,
"gb.com": true,
"qc.com": true,
"ge": true,
"gl": true,
"gov": true,
"gs": true,
"hk": true,
"hr": true,
"hu": true,
"hu.com": true,
"id": true,
"ie": true,
"in": true,
"info": true,
"int": true,
"io": true,
"ir": true,
"is": true,
"je": true,
"jobs": true,
"kg": true,
"kr": true,
"la": true,
"lu": true,
"lv": true,
"ly": true,
"ma": true,
"md": true,
"me": true,
"mk": true,
"mobi": true,
"ms": true,
"mu": true,
"mx": true,
"name": true,
"net": true,
"nf": true,
"ng": true,
"no": true,
"no.com": true,
"nu": true,
"nz": true,
"org": true,
"pl": true,
"pr": true,
"pro": true,
"pw": true,
"ro": true,
"ru": true,
"sa.com": true,
"sc": true,
"se": true,
"se.com": true,
"sg": true,
"sh": true,
"si": true,
"sk": true,
"sm": true,
"st": true,
"so": true,
"su": true,
"tc": true,
"tel": true,
"tf": true,
"th": true,
"tk": true,
"tl": true,
"tm": true,
"tn": true,
"travel": true,
"tw": true,
"tv": true,
"tz": true,
"ua": true,
"uk": true,
"us": true,
"uy.com": true,
"uz": true,
"vc": true,
"ve": true,
"vg": true,
"ws": true,
"xxx": true,
"yu": true,
"za.com": true,
}
================================================
FILE: internal/pkg/spider/v1/utils.go
================================================
package spider
import (
"bytes"
"regexp"
"strings"
"github.com/PuerkitoBio/goquery"
"golang.org/x/net/publicsuffix"
)
var (
// domain regexp
domainRegexp = regexp.MustCompile(`(([[:alnum:]]-?)?([[:alnum:]]-?)+\.)+[[:alpha:]]{2,4}`)
)
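// FindDomains extracts domain names from the text content of an HTML body.
// Only Name and TLD are set on the returned values; URL and Status stay empty.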
func FindDomains(body []byte) (domains []Domain) {
doc, err := goquery.NewDocumentFromReader(bytes.NewReader(body))
if err != nil {
return
}
var s = UnescapeHTML.Replace(doc.Text())
for _, domain := range domainRegexp.FindAllString(s, -1) {
name, tld, ok := splitDomain(domain)
if ok {
domains = append(domains, Domain{
Name: name,
TLD: tld,
})
}
}
return
}
// splitDomain splits d into its registrable name and TLD.
// ok is false when d has no usable public suffix or its TLD is not in tlds.
func splitDomain(d string) (name string, tld string, ok bool) {
// lower-case first so the TLD lookup below is case-insensitive
d = strings.ToLower(d)
// get the registrable domain (eTLD+1)
root, err := publicsuffix.EffectiveTLDPlusOne(d)
if err != nil {
return
}
// split into domain name and TLD at the first dot
i := strings.Index(root, ".")
tld = root[i+1:]
if _, ok = tlds[tld]; !ok {
return
}
name = strings.TrimSuffix(root, "."+tld)
return
}
================================================
FILE: internal/pkg/spider/v1/writer.go
================================================
package spider
// Writer
type Writer interface {
Write(*Domain) error
}
================================================
FILE: internal/service/cache/cache.go
================================================
package cache
import (
"time"
//
//
"github.com/twiny/carbon"
)
// Cache
type Cache struct {
ttl time.Duration
db *carbon.Cache
}
// NewCache
func NewCache(ttl time.Duration, dir string) (*Cache, error) {
db, err := carbon.NewCache(dir)
if err != nil {
return nil, err
}
return &Cache{
ttl: ttl,
db: db,
}, nil
}
// HasChecked reports whether name has been seen before.
// An unseen name is recorded with the configured TTL as a side effect.
func (c *Cache) HasChecked(name string) bool {
// first check if domain is in cache
b, err := c.db.Get(name)
if err != nil || b == nil {
// not found: save it; if the write fails the name is simply rechecked later
_ = c.db.Set(name, []byte(name), c.ttl)
return false
}
return true
}
// Close
func (c *Cache) Close() error {
c.db.Close()
return nil
}
================================================
FILE: internal/service/writer/csv_writer.go
================================================
package writer
import (
"encoding/csv"
"os"
"path/filepath"
"sync"
"time"
"github.com/twiny/spidy/v2/internal/pkg/spider/v1"
)
// CSVWriter
type CSVWriter struct {
l *sync.Mutex
f *os.File
w *csv.Writer
}
// NewCSVWriter
func NewCSVWriter(dir string) (*CSVWriter, error) {
if _, err := os.Stat(dir); os.IsNotExist(err) {
if err := os.MkdirAll(dir, 0755); err != nil {
return nil, err
}
}
name := time.Now().Format("2006-01-02")
fp := filepath.Join(dir, name+"_domains.csv")
// open or create the day's result file in append mode
f, err := os.OpenFile(fp, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
if err != nil {
return nil, err
}
return &CSVWriter{
l: &sync.Mutex{},
f: f,
w: csv.NewWriter(f),
}, nil
}
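// Write appends one row (domain, status) and flushes it to disk.
func (c *CSVWriter) Write(d *spider.Domain) error {
c.l.Lock()
defer c.l.Unlock()
if err := c.w.Write([]string{d.Name + "." + d.TLD, d.Status}); err != nil {
return err
}
// flush while still holding the lock so concurrent writers cannot interleave
c.w.Flush()
return c.w.Error()
}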
// Close flushes any buffered rows and closes the file.
func (c *CSVWriter) Close() error {
c.l.Lock()
defer c.l.Unlock()
c.w.Flush()
return c.f.Close()
}
SYMBOL INDEX (39 symbols across 11 files)
FILE: cmd/spidy/api/spider.go
type Spider (line 30) | type Spider struct
method Start (line 93) | func (s *Spider) Start(links []string) error {
method Shutdown (line 188) | func (s *Spider) Shutdown() error {
function NewSpider (line 42) | func NewSpider(fp string) (*Spider, error) {
FILE: cmd/spidy/main.go
function main (line 16) | func main() {
FILE: internal/pkg/hbyte/hbyte.go
constant b (line 9) | b = "b"
constant kb (line 10) | kb = "kb"
constant mb (line 11) | mb = "mb"
constant gb (line 12) | gb = "gb"
constant tb (line 13) | tb = "tb"
constant B (line 19) | B int64 = 1 << (10 * iota)
constant KB (line 20) | KB
constant MB (line 21) | MB
constant GB (line 22) | GB
constant TB (line 23) | TB
function Parse (line 27) | func Parse(s string) int64 {
function String (line 53) | func String(n int64) string {
FILE: internal/pkg/spider/v1/domain.go
type Domain (line 4) | type Domain struct
method CSVRow (line 12) | func (d Domain) CSVRow() []string {
FILE: internal/pkg/spider/v1/page.go
type Page (line 6) | type Page struct
FILE: internal/pkg/spider/v1/setting.go
type Setting (line 74) | type Setting struct
function ParseSetting (line 103) | func ParseSetting(fp string) *Setting {
function parseRateLimit (line 189) | func parseRateLimit(s string) (int, time.Duration) {
function parseTLDs (line 218) | func parseTLDs(list []string) map[string]bool {
function parseTimeout (line 227) | func parseTimeout(s string) time.Duration {
function parseTTL (line 236) | func parseTTL(s string) time.Duration {
function parseBodySize (line 245) | func parseBodySize(s string) int64 {
FILE: internal/pkg/spider/v1/store.go
type Storage (line 4) | type Storage interface
FILE: internal/pkg/spider/v1/utils.go
function FindDomains (line 18) | func FindDomains(body []byte) (domains []Domain) {
function splitDomain (line 40) | func splitDomain(d string) (name string, tld string, ok bool) {
FILE: internal/pkg/spider/v1/writer.go
type Writer (line 4) | type Writer interface
FILE: internal/service/cache/cache.go
type Cache (line 13) | type Cache struct
method HasChecked (line 31) | func (c *Cache) HasChecked(name string) bool {
method Close (line 45) | func (c *Cache) Close() error {
function NewCache (line 19) | func NewCache(ttl time.Duration, dir string) (*Cache, error) {
FILE: internal/service/writer/csv_writer.go
type CSVWriter (line 14) | type CSVWriter struct
method Write (line 45) | func (c *CSVWriter) Write(d *spider.Domain) error {
method Close (line 56) | func (c *CSVWriter) Close() error {
function NewCSVWriter (line 21) | func NewCSVWriter(dir string) (*CSVWriter, error) {