Repository: twiny/spidy Branch: main Commit: fc5a8447c142 Files: 23 Total size: 46.0 KB Directory structure: gitextract_sc4cxpbk/ ├── .github/ │ └── ISSUE_TEMPLATE/ │ ├── bug_report.md │ ├── custom.md │ └── feature_request.md ├── .gitignore ├── LICENSE ├── README.md ├── cmd/ │ └── spidy/ │ ├── api/ │ │ ├── spider.go │ │ └── version │ └── main.go ├── config/ │ └── example.config.yaml ├── go.mod ├── go.sum └── internal/ ├── pkg/ │ ├── hbyte/ │ │ └── hbyte.go │ └── spider/ │ └── v1/ │ ├── domain.go │ ├── page.go │ ├── setting.go │ ├── store.go │ ├── string_replacer.go │ ├── tld_list.go │ ├── utils.go │ └── writer.go └── service/ ├── cache/ │ └── cache.go └── writer/ └── csv_writer.go ================================================ FILE CONTENTS ================================================ ================================================ FILE: .github/ISSUE_TEMPLATE/bug_report.md ================================================ --- name: Bug report about: Create a report to help us improve title: '' labels: '' assignees: '' --- **Describe the bug** A clear and concise description of what the bug is. **To Reproduce** Steps to reproduce the behavior: 1. Go to '...' 2. Click on '....' 3. Scroll down to '....' 4. See error **Expected behavior** A clear and concise description of what you expected to happen. **Screenshots** If applicable, add screenshots to help explain your problem. **Desktop (please complete the following information):** - OS: [e.g. iOS] - Browser [e.g. chrome, safari] - Version [e.g. 22] **Smartphone (please complete the following information):** - Device: [e.g. iPhone6] - OS: [e.g. iOS8.1] - Browser [e.g. stock browser, safari] - Version [e.g. 22] **Additional context** Add any other context about the problem here. ================================================ FILE: .github/ISSUE_TEMPLATE/custom.md ================================================ --- name: Custom issue template about: Describe this issue template's purpose here. 
title: '' labels: '' assignees: '' --- ================================================ FILE: .github/ISSUE_TEMPLATE/feature_request.md ================================================ --- name: Feature request about: Suggest an idea for this project title: '' labels: '' assignees: '' --- **Is your feature request related to a problem? Please describe.** A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] **Describe the solution you'd like** A clear and concise description of what you want to happen. **Describe alternatives you've considered** A clear and concise description of any alternative solutions or features you've considered. **Additional context** Add any other context or screenshots about the feature request here. ================================================ FILE: .gitignore ================================================ cmd/tests config/config.yaml log/ result/ store/ bin/ bbolt/ ================================================ FILE: LICENSE ================================================ MIT License Copyright (c) 2022 Twiny Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ================================================ FILE: README.md ================================================ ## Spidy A tool that crawls websites to find domain names and checks their availability. ### Install ```sh git clone https://github.com/twiny/spidy.git cd ./spidy # build go build -o bin/spidy -v cmd/spidy/main.go # run ./bin/spidy -c config/config.yaml -u https://github.com ``` ## Usage ```sh NAME: Spidy - Domain name scraper USAGE: spidy [global options] command [command options] [arguments...] VERSION: 2.0.0 COMMANDS: help, h Shows a list of commands or help for one command GLOBAL OPTIONS: --config path, -c path path to config file --help, -h show help (default: false) --urls urls, -u urls urls of page to scrape (accepts multiple inputs) --version, -v print the version (default: false) ``` ## Configuration ```yaml # main crawler config crawler: max_depth: 10 # max depth of pages to visit per website. # filter: [] # regexp filter rate_limit: "1/5s" # 1 request per 5 sec max_body_size: "20MB" # max page body size user_agents: # array of user-agents - "Spidy/2.1; +https://github.com/twiny/spidy" # proxies: [] # array of proxies: http(s), SOCKS5 # Logs log: rotate: 7 # log rotation path: "./log" # log directory # Store store: ttl: "24h" # keep cache for 24h path: "./store" # store directory # Results result: path: ./result # result directory parralle: 3 # number of concurrent workers timeout: "5m" # request timeout tlds: ["biz", "cc", "com", "edu", "info", "net", "org", "tv"] # array of domain extensions to check. ``` ## TODO - [ ] Add support for more `writers`. - [ ] Add terminal logging. - [ ] Add test cases. ## Issues NOTE: This package is provided "as is" with no guarantee. 
Use it at your own risk and always test it yourself before using it in a production environment. If you find any issues, please create a new issue. ================================================ FILE: cmd/spidy/api/spider.go ================================================ package api import ( "context" _ "embed" "fmt" "log" "net/http" "os" "os/signal" "strconv" "sync" "syscall" // "github.com/twiny/spidy/v2/internal/pkg/spider/v1" "github.com/twiny/spidy/v2/internal/service/cache" "github.com/twiny/spidy/v2/internal/service/writer" // "github.com/twiny/domaincheck" "github.com/twiny/flog" "github.com/twiny/wbot" ) //go:embed version var Version string // Spider type Spider struct { wg *sync.WaitGroup setting *spider.Setting bot *wbot.WBot pages chan *spider.Page check *domaincheck.Checker store spider.Storage write spider.Writer log *flog.Logger } // NewSpider func NewSpider(fp string) (*Spider, error) { // get settings setting := spider.ParseSetting(fp) // crawler opts opts := []wbot.Option{ wbot.SetParallel(setting.Parralle), wbot.SetMaxDepth(setting.Crawler.MaxDepth), wbot.SetRateLimit(setting.Crawler.Limit.Rate, setting.Crawler.Limit.Interval), wbot.SetMaxBodySize(setting.Crawler.MaxBodySize), wbot.SetUserAgents(setting.Crawler.UserAgents), wbot.SetProxies(setting.Crawler.Proxies), } bot := wbot.NewWBot(opts...) 
check, err := domaincheck.NewChecker() if err != nil { return nil, err } // store store, err := cache.NewCache(setting.Store.TTL, setting.Store.Path) if err != nil { return nil, err } // logger log, err := flog.NewLogger(setting.Log.Path, "spidy", setting.Log.Rotate) if err != nil { return nil, err } write, err := writer.NewCSVWriter(setting.Result.Path) if err != nil { return nil, err } return &Spider{ wg: &sync.WaitGroup{}, setting: setting, bot: bot, pages: make(chan *spider.Page, setting.Parralle), check: check, store: store, write: write, log: log, }, nil } // Start func (s *Spider) Start(links []string) error { // go crawl s.wg.Add(len(links)) for _, link := range links { go func(l string) { defer s.wg.Done() // if err := s.bot.Crawl(l); err != nil { s.log.Error(err.Error(), map[string]string{"url": l}) } }(link) } // check domains s.wg.Add(s.setting.Parralle) for i := 0; i < s.setting.Parralle; i++ { go func() { defer s.wg.Done() // results for res := range s.bot.Stream() { // if response is ok if res.Status != http.StatusOK { s.log.Info("bad HTTP status", map[string]string{ "url": res.URL.String(), "status": strconv.Itoa(res.Status), }) continue } // extract domains domains := spider.FindDomains(res.Body) // check availability for _, domain := range domains { root := fmt.Sprintf("%s.%s", domain.Name, domain.TLD) // check if allowed extension if len(s.setting.TLDs) > 0 { if ok := s.setting.TLDs[domain.TLD]; !ok { s.log.Info("unsupported domain", map[string]string{ "domain": root, "url": res.URL.String(), }) continue } } // skip if already checked if s.store.HasChecked(root) { s.log.Info("already checked", map[string]string{ "domain": root, "url": res.URL.String(), }) continue } // ctx, cancel := context.WithTimeout(context.Background(), s.setting.Timeout) status, err := s.check.Check(ctx, root) cancel() if err != nil { s.log.Error(err.Error(), map[string]string{ "domain": root, "url": res.URL.String(), }) continue } // save domain if err := 
s.write.Write(&spider.Domain{ URL: res.URL.String(), Name: domain.Name, TLD: domain.TLD, Status: status.String(), }); err != nil { s.log.Error(err.Error(), map[string]string{ "domain": root, "url": res.URL.String(), }) continue } // terminal print fmt.Printf("[Spidy] == domain: %s - status %s\n", root, status.String()) } } }() } s.wg.Wait() return nil } // Shutdown func (s *Spider) Shutdown() error { // attempt graceful shutdown sigs := make(chan os.Signal, 1) signal.Notify(sigs, syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM, syscall.SIGQUIT) <-sigs log.Println("shutting down ...") // 2nd ctrl+c kills program go func() { sigs := make(chan os.Signal, 1) signal.Notify(sigs, syscall.SIGHUP, syscall.SIGINT, syscall.SIGTERM, syscall.SIGQUIT) <-sigs log.Println("killing program ...") os.Exit(0) }() s.bot.Close() s.log.Close() if err := s.store.Close(); err != nil { return err } os.Exit(0) return nil } ================================================ FILE: cmd/spidy/api/version ================================================ 2.0.0 ================================================ FILE: cmd/spidy/main.go ================================================ package main import ( "log" "os" // "github.com/twiny/spidy/v2/cmd/spidy/api" // "github.com/urfave/cli/v2" ) // main func main() { app := &cli.App{ Name: "Spidy", HelpName: "spidy", Usage: "Domain name scraper", Version: api.Version, Flags: []cli.Flag{ &cli.StringFlag{ Name: "config", Aliases: []string{"c"}, Usage: "`path` to config file", Required: true, }, &cli.StringSliceFlag{ Name: "urls", Aliases: []string{"u"}, Usage: "`urls` of page to scrape", Required: true, }, }, Action: func(c *cli.Context) error { s, err := api.NewSpider(c.String("config")) if err != nil { return err } go s.Shutdown() return s.Start(c.StringSlice("urls")) }, } if err := app.Run(os.Args); err != nil { log.Println(err) return } } ================================================ FILE: config/example.config.yaml 
================================================ crawler: max_depth: 10 # filter: [] rate_limit: "1/5s" max_body_size: "20MB" user_agents: - "Spidy/2.1; +https://github.com/twiny/spidy" # proxies: [] log: rotate: 7 path: "./log" store: ttl: "24h" path: "./store" result: path: ./result parralle: 3 timeout: "5m" tlds: ["biz", "cc", "com", "edu", "info", "net", "org", "tv"] ================================================ FILE: go.mod ================================================ module github.com/twiny/spidy/v2 go 1.18 require ( github.com/PuerkitoBio/goquery v1.8.0 github.com/twiny/carbon v1.0.1 github.com/twiny/domaincheck v0.1.0 github.com/twiny/flog v1.0.3 github.com/twiny/wbot v0.1.5 github.com/urfave/cli/v2 v2.10.3 golang.org/x/net v0.0.0-20220513224357-95641704303c gopkg.in/yaml.v3 v3.0.1 ) require ( github.com/andybalholm/cascadia v1.3.1 // indirect github.com/benbjohnson/clock v1.3.0 // indirect github.com/cespare/xxhash v1.1.0 // indirect github.com/cespare/xxhash/v2 v2.1.1 // indirect github.com/cpuguy83/go-md2man/v2 v2.0.2 // indirect github.com/dgraph-io/badger/v3 v3.2103.2 // indirect github.com/dgraph-io/ristretto v0.1.0 // indirect github.com/dustin/go-humanize v1.0.0 // indirect github.com/fatih/color v1.10.0 // indirect github.com/goccy/go-yaml v1.9.4 // indirect github.com/gogo/protobuf v1.3.2 // indirect github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b // indirect github.com/golang/groupcache v0.0.0-20190702054246-869f871628b6 // indirect github.com/golang/protobuf v1.3.1 // indirect github.com/golang/snappy v0.0.3 // indirect github.com/google/flatbuffers v1.12.1 // indirect github.com/klauspost/compress v1.12.3 // indirect github.com/mattn/go-colorable v0.1.8 // indirect github.com/mattn/go-isatty v0.0.12 // indirect github.com/pkg/errors v0.9.1 // indirect github.com/russross/blackfriday/v2 v2.1.0 // indirect github.com/twiny/ratelimit v0.0.0-20220509163414-256d3376b0ac // indirect github.com/twiny/whois/v2 v2.0.1 // indirect 
github.com/xrash/smetrics v0.0.0-20201216005158-039620a65673 // indirect go.opencensus.io v0.22.5 // indirect golang.org/x/sys v0.0.0-20211216021012-1d35b9e2eb4e // indirect golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 // indirect ) ================================================ FILE: go.sum ================================================ cloud.google.com/go v0.26.0/go.mod h1:aQUYkXzVsufM+DwF1aE+0xfcU+56JwCaLick0ClmMTw= github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU= github.com/OneOfOne/xxhash v1.2.2 h1:KMrpdQIwFcEqXDklaen+P1axHaj9BSKzvpUUfnHldSE= github.com/OneOfOne/xxhash v1.2.2/go.mod h1:HSdplMjZKSmBqAxg5vPj2TmRDmfkzw+cTzAElWljhcU= github.com/PuerkitoBio/goquery v1.8.0 h1:PJTF7AmFCFKk1N6V6jmKfrNH9tV5pNE6lZMkG0gta/U= github.com/PuerkitoBio/goquery v1.8.0/go.mod h1:ypIiRMtY7COPGk+I/YbZLbxsxn9g5ejnI2HSMtkjZvI= github.com/andybalholm/cascadia v1.3.1 h1:nhxRkql1kdYCc8Snf7D5/D3spOX+dBgjA6u8x004T2c= github.com/andybalholm/cascadia v1.3.1/go.mod h1:R4bJ1UQfqADjvDa4P6HZHLh/3OxWWEqc0Sk8XGwHqvA= github.com/armon/consul-api v0.0.0-20180202201655-eb2c6b5be1b6/go.mod h1:grANhF5doyWs3UAsr3K4I6qtAmlQcZDesFNEHPZAzj8= github.com/benbjohnson/clock v1.3.0 h1:ip6w0uFQkncKQ979AypyG0ER7mqUSBdKLOgAle/AT8A= github.com/benbjohnson/clock v1.3.0/go.mod h1:J11/hYXuz8f4ySSvYwY0FKfm+ezbsZBKZxNJlLklBHA= github.com/cespare/xxhash v1.1.0 h1:a6HrQnmkObjyL+Gs60czilIUGqrzKutQD6XZog3p+ko= github.com/cespare/xxhash v1.1.0/go.mod h1:XrSqR1VqqWfGrhpAt58auRo0WTKS1nRRg3ghfAqPWnc= github.com/cespare/xxhash/v2 v2.1.1 h1:6MnRN8NT7+YBpUIWxHtefFZOKTAPgGjpQSxqLNn0+qY= github.com/cespare/xxhash/v2 v2.1.1/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs= github.com/client9/misspell v0.3.4/go.mod h1:qj6jICC3Q7zFZvVWo7KLAzC3yx5G7kyvSDkc90ppPyw= github.com/coreos/etcd v3.3.10+incompatible/go.mod h1:uF7uidLiAD3TWHmW31ZFd/JWoc32PjwdhPthX9715RE= github.com/coreos/go-etcd v2.0.0+incompatible/go.mod h1:Jez6KQU2B/sWsbdaef3ED8NzMklzPG4d5KIOhIy30Tk= 
github.com/coreos/go-semver v0.2.0/go.mod h1:nnelYz7RCh+5ahJtPPxZlU+153eP4D4r3EedlOD2RNk= github.com/cpuguy83/go-md2man v1.0.10/go.mod h1:SmD6nW6nTyfqj6ABTjUi3V3JVMnlJmwcJI5acqYI6dE= github.com/cpuguy83/go-md2man/v2 v2.0.2 h1:p1EgwI/C7NhT0JmVkwCD2ZBK8j4aeHQX2pMHHBfMQ6w= github.com/cpuguy83/go-md2man/v2 v2.0.2/go.mod h1:tgQtvFlXSQOSOSIRvRPT7W67SCa46tRHOmNcaadrF8o= github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c= github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= github.com/dgraph-io/badger/v3 v3.2103.2 h1:dpyM5eCJAtQCBcMCZcT4UBZchuTJgCywerHHgmxfxM8= github.com/dgraph-io/badger/v3 v3.2103.2/go.mod h1:RHo4/GmYcKKh5Lxu63wLEMHJ70Pac2JqZRYGhlyAo2M= github.com/dgraph-io/ristretto v0.1.0 h1:Jv3CGQHp9OjuMBSne1485aDpUkTKEcUqF+jm/LuerPI= github.com/dgraph-io/ristretto v0.1.0/go.mod h1:fux0lOrBhrVCJd3lcTHsIJhq1T2rokOu6v9Vcb3Q9ug= github.com/dgryski/go-farm v0.0.0-20190423205320-6a90982ecee2 h1:tdlZCpZ/P9DhczCTSixgIKmwPv6+wP5DGjqLYw5SUiA= github.com/dgryski/go-farm v0.0.0-20190423205320-6a90982ecee2/go.mod h1:SqUrOPUnsFjfmXRMNPybcSiG0BgUW2AuFH8PAnS2iTw= github.com/dustin/go-humanize v1.0.0 h1:VSnTsYCnlFHaM2/igO1h6X3HA71jcobQuxemgkq4zYo= github.com/dustin/go-humanize v1.0.0/go.mod h1:HtrtbFcZ19U5GC7JDqmcUSB87Iq5E25KnS6fMYU6eOk= github.com/fatih/color v1.10.0 h1:s36xzo75JdqLaaWoiEHk767eHiwo0598uUxyfiPkDsg= github.com/fatih/color v1.10.0/go.mod h1:ELkj/draVOlAH/xkhN6mQ50Qd0MPOk5AAr3maGEBuJM= github.com/fsnotify/fsnotify v1.4.7/go.mod h1:jwhsz4b93w/PPRr/qN1Yymfu8t87LnFCMoQvtojpjFo= github.com/go-playground/assert/v2 v2.0.1/go.mod h1:VDjEfimB/XKnb+ZQfWdccd7VUvScMdVu0Titje2rxJ4= github.com/go-playground/locales v0.13.0 h1:HyWk6mgj5qFqCT5fjGBuRArbVDfE4hi8+e8ceBS/t7Q= github.com/go-playground/locales v0.13.0/go.mod h1:taPMhCMXrRLJO55olJkUXHZBHCxTMfnGwq/HNwmWNS8= github.com/go-playground/universal-translator v0.17.0 
h1:icxd5fm+REJzpZx7ZfpaD876Lmtgy7VtROAbHHXk8no= github.com/go-playground/universal-translator v0.17.0/go.mod h1:UkSxE5sNxxRwHyU+Scu5vgOQjsIJAF8j9muTVoKLVtA= github.com/go-playground/validator/v10 v10.4.1 h1:pH2c5ADXtd66mxoE0Zm9SUhxE20r7aM3F26W0hOn+GE= github.com/go-playground/validator/v10 v10.4.1/go.mod h1:nlOn6nFhuKACm19sB/8EGNn9GlaMV7XkbRSipzJ0Ii4= github.com/goccy/go-yaml v1.9.4 h1:S0GCYjwHKVI6IHqio7QWNKNThUl6NLzFd/g8Z65Axw8= github.com/goccy/go-yaml v1.9.4/go.mod h1:U/jl18uSupI5rdI2jmuCswEA2htH9eXfferR3KfscvA= github.com/gogo/protobuf v1.3.2 h1:Ov1cvc58UF3b5XjBnZv7+opcTcQFZebYjWzi34vdm4Q= github.com/gogo/protobuf v1.3.2/go.mod h1:P1XiOD3dCwIKUDQYPy72D8LYyHL2YPYrpS2s69NZV8Q= github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b h1:VKtxabqXZkF25pY9ekfRL6a582T4P37/31XEstQ5p58= github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b/go.mod h1:SBH7ygxi8pfUlaOkMMuAQtPIUF8ecWP5IEl/CR7VP2Q= github.com/golang/groupcache v0.0.0-20190702054246-869f871628b6 h1:ZgQEtGgCBiWRM39fZuwSd1LwSqqSW0hOdXCYYDX0R3I= github.com/golang/groupcache v0.0.0-20190702054246-869f871628b6/go.mod h1:cIg4eruTrX1D+g88fzRXU5OdNfaM+9IcxsU14FzY7Hc= github.com/golang/mock v1.1.1/go.mod h1:oTYuIxOrZwtPieC+H1uAHpcLFnEyAGVDL/k47Jfbm0A= github.com/golang/protobuf v1.2.0/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U= github.com/golang/protobuf v1.3.1 h1:YF8+flBXS5eO826T4nzqPrxfhQThhXl0YzfuUPu4SBg= github.com/golang/protobuf v1.3.1/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U= github.com/golang/snappy v0.0.3 h1:fHPg5GQYlCeLIPB9BZqMVR5nR9A+IM5zcgeTdjMYmLA= github.com/golang/snappy v0.0.3/go.mod h1:/XxbfmMg8lxefKM7IXC3fBNl/7bRcc72aCRzEWrmP2Q= github.com/google/flatbuffers v1.12.1 h1:MVlul7pQNoDzWRLTw5imwYsl+usrS1TXG2H4jg6ImGw= github.com/google/flatbuffers v1.12.1/go.mod h1:1AeVuKshWv4vARoZatz6mlQ0JxURH0Kv5+zNeJKJCa8= github.com/google/go-cmp v0.3.0/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU= github.com/google/go-cmp v0.5.4/go.mod 
h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= github.com/hashicorp/hcl v1.0.0/go.mod h1:E5yfLk+7swimpb2L/Alb/PJmXilQ/rhwaUYs4T20WEQ= github.com/inconshreveable/mousetrap v1.0.0/go.mod h1:PxqpIevigyE2G7u3NXJIT2ANytuPF1OarO4DADm73n8= github.com/kisielk/errcheck v1.5.0/go.mod h1:pFxgyoBC7bSaBwPgfKdkLd5X25qrDl4LWUI2bnpBCr8= github.com/kisielk/gotool v1.0.0/go.mod h1:XhKaO+MFFWcvkIS/tQcRk01m1F5IRFswLeQ+oQHNcck= github.com/klauspost/compress v1.12.3 h1:G5AfA94pHPysR56qqrkO2pxEexdDzrpFJ6yt/VqWxVU= github.com/klauspost/compress v1.12.3/go.mod h1:8dP1Hq4DHOhN9w426knH3Rhby4rFm6D8eO+e+Dq5Gzg= github.com/kr/pretty v0.1.0 h1:L/CwN0zerZDmRFUapSPitk6f+Q3+0za1rQkzVuMiMFI= github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo= github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ= github.com/kr/text v0.1.0 h1:45sCR5RtlFHMR4UwH9sdQ5TC8v0qDQCHnXt+kaKSTVE= github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI= github.com/leodido/go-urn v1.2.0 h1:hpXL4XnriNwQ/ABnpepYM/1vCLWNDfUNts8dX3xTG6Y= github.com/leodido/go-urn v1.2.0/go.mod h1:+8+nEpDfqqsY+g338gtMEUOtuK+4dEMhiQEgxpxOKII= github.com/magiconair/properties v1.8.0/go.mod h1:PppfXfuXeibc/6YijjN8zIbojt8czPbwD3XqdrwzmxQ= github.com/mattn/go-colorable v0.1.8 h1:c1ghPdyEDarC70ftn0y+A/Ee++9zz8ljHG1b13eJ0s8= github.com/mattn/go-colorable v0.1.8/go.mod h1:u6P/XSegPjTcexA+o6vUJrdnUu04hMope9wVRipJSqc= github.com/mattn/go-isatty v0.0.12 h1:wuysRhFDzyxgEmMf5xjvJ2M9dZoWAXNNr5LSBS7uHXY= github.com/mattn/go-isatty v0.0.12/go.mod h1:cbi8OIDigv2wuxKPP5vlRcQ1OAZbq2CE4Kysco4FUpU= github.com/mitchellh/go-homedir v1.1.0/go.mod h1:SfyaCUpYCn1Vlf4IUYiD9fPX4A5wJrkLzIz1N1q0pr0= github.com/mitchellh/mapstructure v1.1.2/go.mod h1:FVVH3fgwuzCH5S8UJGiWEs2h04kUh9fWfEaFds41c1Y= github.com/pelletier/go-toml v1.2.0/go.mod h1:5z9KED0ma1S8pY6P1sdut58dfprrGBbd/94hg7ilaic= github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4= github.com/pkg/errors v0.9.1/go.mod 
h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0= github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM= github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= github.com/russross/blackfriday v1.5.2/go.mod h1:JO/DiYxRf+HjHt06OyowR9PTA263kcR/rfWxYHBV53g= github.com/russross/blackfriday/v2 v2.1.0 h1:JIOH55/0cWyOuilr9/qlrm0BSXldqnqwMsf35Ld67mk= github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM= github.com/spaolacci/murmur3 v0.0.0-20180118202830-f09979ecbc72/go.mod h1:JwIasOWyU6f++ZhiEuf87xNszmSA2myDM2Kzu9HwQUA= github.com/spaolacci/murmur3 v1.1.0 h1:7c1g84S4BPRrfL5Xrdp6fOJ206sU9y293DDHaoy0bLI= github.com/spaolacci/murmur3 v1.1.0/go.mod h1:JwIasOWyU6f++ZhiEuf87xNszmSA2myDM2Kzu9HwQUA= github.com/spf13/afero v1.1.2/go.mod h1:j4pytiNVoe2o6bmDsKpLACNPDBIoEAkihy7loJ1B0CQ= github.com/spf13/cast v1.3.0/go.mod h1:Qx5cxh0v+4UWYiBimWS+eyWzqEqokIECu5etghLkUJE= github.com/spf13/cobra v0.0.5/go.mod h1:3K3wKZymM7VvHMDS9+Akkh4K60UwM26emMESw8tLCHU= github.com/spf13/jwalterweatherman v1.0.0/go.mod h1:cQK4TGJAtQXfYWX+Ddv3mKDzgVb68N+wFjFa4jdeBTo= github.com/spf13/pflag v1.0.3/go.mod h1:DYY7MBk1bdzusC3SYhjObp+wFpr4gzcvqqNjLnInEg4= github.com/spf13/viper v1.3.2/go.mod h1:ZiWeW+zYFKm7srdB9IoDzzZXaJaI5eL9QjNiN/DMA2s= github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= github.com/stretchr/testify v1.2.2/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs= github.com/stretchr/testify v1.4.0 h1:2E4SXV/wtOkTonXsotYi4li6zVWxYlZuYNCXe9XRJyk= github.com/stretchr/testify v1.4.0/go.mod h1:j7eGeouHqKxXV5pUuKE4zz7dFj8WfuZ+81PSLYec5m4= github.com/twiny/carbon v1.0.1 h1:srGnk3N4KbAvCVgieWzYgZkLoBYGjnerTdxqzPy3TQs= github.com/twiny/carbon v1.0.1/go.mod h1:Ymh/hwZd8cZWYWnSL9xqSaQMd955k9EJx4/YS8wVdv0= github.com/twiny/domaincheck v0.1.0 h1:ByFbTKzdLymEaEkqAoA+vFuBxi33zOOyXCOTvvAm95c= github.com/twiny/domaincheck v0.1.0/go.mod 
h1:vlDqt80kuclqhfG3KrTu/rJd7aZe5P6viJ2acVuUvL4= github.com/twiny/flog v1.0.3 h1:iBTf+yEm/maBTJYFaMgD2lXIE5g7gSZnaTnmVXbs1tI= github.com/twiny/flog v1.0.3/go.mod h1:Hi9bzahz0Zmw30XiBT9oqWOrc10ive6L42Owwz02Vp8= github.com/twiny/ratelimit v0.0.0-20220509163414-256d3376b0ac h1:nT+8DFvrU5Nu3Be2bK7LooU8AslFJeypQoAF+wm1CM0= github.com/twiny/ratelimit v0.0.0-20220509163414-256d3376b0ac/go.mod h1:C589KqlnfcMeRAJ+evrNJwSf9ddkXO926hRDtgjjoYM= github.com/twiny/wbot v0.1.5 h1:yTfTv6+tmVHik6aY2DLuJZUG5/WPP37oE2TAgXkXRno= github.com/twiny/wbot v0.1.5/go.mod h1:JNeqtjncCXLALd0qaKw2q/4kC8F34weLiyf9QOljzQk= github.com/twiny/whois/v2 v2.0.1 h1:jDqkiq0wv2qdm9d/bquhQpg7AhJDYf89g7ozZElSTuA= github.com/twiny/whois/v2 v2.0.1/go.mod h1:UeyP4HmWFruXXuYQ722s/BnWgwxi7fRb/bk9Fnqm7OA= github.com/ugorji/go/codec v0.0.0-20181204163529-d75b2dcb6bc8/go.mod h1:VFNgLljTbGfSG7qAOspJ7OScBnGdDN/yBr0sguwnwf0= github.com/urfave/cli/v2 v2.10.3 h1:oi571Fxz5aHugfBAJd5nkwSk3fzATXtMlpxdLylSCMo= github.com/urfave/cli/v2 v2.10.3/go.mod h1:f8iq5LtQ/bLxafbdBSLPPNsgaW0l/2fYYEHhAyPlwvo= github.com/xordataexchange/crypt v0.0.3-0.20170626215501-b2862e3d0a77/go.mod h1:aYKd//L2LvnjZzWKhF00oedf4jCCReLcmhLdhm1A27Q= github.com/xrash/smetrics v0.0.0-20201216005158-039620a65673 h1:bAn7/zixMGCfxrRTfdpNzjtPYqr8smhKouy9mxVdGPU= github.com/xrash/smetrics v0.0.0-20201216005158-039620a65673/go.mod h1:N3UwUGtsrSj3ccvlPHLoLsHnpR27oXr4ZE984MbSER8= github.com/yuin/goldmark v1.1.27/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74= github.com/yuin/goldmark v1.2.1/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74= go.opencensus.io v0.22.5 h1:dntmOdLpSpHlVqbW5Eay97DelsZHe+55D+xC6i0dDS0= go.opencensus.io v0.22.5/go.mod h1:5pWMHQbX5EPX2/62yrJeAkowc+lfs/XD7Uxpq3pI6kk= golang.org/x/crypto v0.0.0-20181203042331-505ab145d0a9/go.mod h1:6SG95UA2DQfeDnfUPMdvaQW0Q7yPrPDi9nlGo2tz2b4= golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w= golang.org/x/crypto 
v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI= golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9 h1:psW17arqaxU48Z5kZ0CQnkZWQJsqcURM6tKiBApRjXI= golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto= golang.org/x/exp v0.0.0-20190121172915-509febef88a4/go.mod h1:CJ0aWSM057203Lf6IL+f9T1iT9GByDxfZKAQTCR3kQA= golang.org/x/lint v0.0.0-20181026193005-c67002cb31c3/go.mod h1:UVdnD1Gm6xHRNCYTkRU2/jEulfH38KcIWyp/GAMgvoE= golang.org/x/lint v0.0.0-20190227174305-5b3e6a55c961/go.mod h1:wehouNa3lNwaWXcvxsM5YxQ5yQlVC4a0KAMCusXpPoU= golang.org/x/lint v0.0.0-20190313153728-d0100b6bd8b3/go.mod h1:6SW0HCj/g11FgYtHlgUYUwCkIfeOF89ocIRzGO/8vkc= golang.org/x/mod v0.2.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= golang.org/x/mod v0.3.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= golang.org/x/net v0.0.0-20180724234803-3673e40ba225/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= golang.org/x/net v0.0.0-20180826012351-8a410e7b638d/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= golang.org/x/net v0.0.0-20190213061140-3a22650c66bd/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= golang.org/x/net v0.0.0-20190311183353-d8887717615a/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg= golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg= golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= golang.org/x/net v0.0.0-20200226121028-0de0cce0169b/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= golang.org/x/net v0.0.0-20201021035429-f5854403a974/go.mod h1:sp8m0HH+o8qH0wwXwYZr8TS3Oi6o0r6Gce1SSxlDquU= golang.org/x/net v0.0.0-20210916014120-12bc252f5db8/go.mod h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y= golang.org/x/net v0.0.0-20220513224357-95641704303c h1:nF9mHSvoKBLkQNQhJZNsc66z2UzAMUbLGjC95CF3pU0= golang.org/x/net 
v0.0.0-20220513224357-95641704303c/go.mod h1:CfG3xpIq0wQ8r1q4Su4UZFWDARRcnwPjda9FqA0JpMk= golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U= golang.org/x/sync v0.0.0-20180314180146-1d60e4601c6f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20181108010431-42b317875d0f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20190227155943-e225da77a7e6/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20190911185100-cd5d95a43a6e/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sys v0.0.0-20180830151530-49385e6e1522/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= golang.org/x/sys v0.0.0-20181205085412-a5c9d58dba9a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20190502145724-3ef323f4f1fd/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20200116001909-b77594299b42/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20200223170610-d5e6a3e2c0ae/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20200930185726-fdedc70b468f/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20210124154548-22da62e12c0c/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20210423082822-04245dca01da/go.mod 
h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20211216021012-1d35b9e2eb4e h1:fLOSk5Q00efkSvAm+4xcoXD+RRmLmmulPn5I3Y9F2EM= golang.org/x/sys v0.0.0-20211216021012-1d35b9e2eb4e/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo= golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= golang.org/x/text v0.3.2/go.mod h1:bEr9sfX3Q8Zfm5fL9x+3itogRgK3+ptLWKqgva+5dAk= golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ= golang.org/x/tools v0.0.0-20190114222345-bf090417da8b/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ= golang.org/x/tools v0.0.0-20190226205152-f727befe758c/go.mod h1:9Yl7xja0Znq3iFh3HoIrodX9oNMXvdceNzlUR8zjMvY= golang.org/x/tools v0.0.0-20190311212946-11955173bddd/go.mod h1:LCzVGOaR6xXOjkQ3onu1FJEFr0SW1gC7cKk1uF8kGRs= golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtnZ6UAqBI28+e2cm9otk0dWdXHAEo= golang.org/x/tools v0.0.0-20200619180055-7c47624df98f/go.mod h1:EkVYQZoAsY45+roYkvgYkIh4xh/qjgUK9TdY2XT94GE= golang.org/x/tools v0.0.0-20210106214847-113979e3529a/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA= golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 h1:go1bK/D/BFZV2I8cIQd1NKEZ+0owSTG1fDTci4IqFcE= golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0= 
google.golang.org/appengine v1.1.0/go.mod h1:EbEs0AVv82hx2wNQdGPgUI5lhzA/G0D9YwlJXL52JkM= google.golang.org/appengine v1.4.0/go.mod h1:xpcJRLb0r/rnEns0DIKYYv+WjYCduHsrkT7/EB5XEv4= google.golang.org/genproto v0.0.0-20180817151627-c66870c02cf8/go.mod h1:JiN7NxoALGmiZfu7CAH4rXhgtRTLTxftemlI0sWmxmc= google.golang.org/genproto v0.0.0-20190425155659-357c62f0e4bb/go.mod h1:VzzqZJRnGkLBvHegQrXjBqPurQTc5/KpmUdxsrq26oE= google.golang.org/grpc v1.19.0/go.mod h1:mqu4LbDTu4XGKhr4mRzUsmM4RtVoemTSY81AxZiDr8c= google.golang.org/grpc v1.20.1/go.mod h1:10oTOabMzJvdu6/UiuZezV6QK5dSlG84ov/aaiqXj38= gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15 h1:YR8cESwS4TdDjEe65xsg0ogRM/Nc3DYOhEAlW+xobZo= gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= gopkg.in/yaml.v2 v2.2.2 h1:ZCJp+EgiOT7lHqUV2J862kp8Qj64Jo6az82+3Td9dZw= gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= honnef.co/go/tools v0.0.0-20190102054323-c2f93a96b099/go.mod h1:rf3lG4BRIbNafJWhAfAdb/ePZxsR/4RtNHQocxwk9r4= ================================================ FILE: internal/pkg/hbyte/hbyte.go ================================================ package hbyte import ( "fmt" "strings" ) const ( b = "b" kb = "kb" mb = "mb" gb = "gb" tb = "tb" ) // type BYTE int64 const ( B int64 = 1 << (10 * iota) KB MB GB TB ) // Parse func Parse(s string) int64 { // lower case s = strings.ToLower(s) var n int64 var unit string fmt.Sscanf(s, "%d%s", &n, &unit) switch unit { case b: return n case kb: return n * KB case mb: return n * MB case gb: return n * GB case tb: return n * TB default: return n } } // String func String(n int64) string { switch { case n >= TB: return fmt.Sprintf("%d %s", n/TB, tb) case 
n >= GB: return fmt.Sprintf("%d %s", n/GB, gb) case n >= MB: return fmt.Sprintf("%d %s", n/MB, mb) case n >= KB: return fmt.Sprintf("%d %s", n/KB, kb) default: return fmt.Sprintf("%d %s", n, b) } } ================================================ FILE: internal/pkg/spider/v1/domain.go ================================================ package spider // Domain type Domain struct { URL string Name string TLD string Status string } // CSVRow func (d Domain) CSVRow() []string { return []string{d.URL, d.Name, d.TLD, d.Status} } ================================================ FILE: internal/pkg/spider/v1/page.go ================================================ package spider import "net/url" // Page type Page struct { URL *url.URL Status int Body []byte } ================================================ FILE: internal/pkg/spider/v1/setting.go ================================================ package spider import ( "io/ioutil" "runtime" "strconv" "strings" "time" // "github.com/twiny/spidy/v2/internal/pkg/hbyte" "gopkg.in/yaml.v3" ) // default cores var core = func() int { c := runtime.NumCPU() if c == 1 { return c } return c - 1 }() // defaultSetting var defaultSetting = &Setting{ Crawler: struct { MaxDepth int32 Filter []string Limit struct { Rate int Interval time.Duration } MaxBodySize int64 UserAgents []string Proxies []string }{ MaxDepth: 10, Filter: []string{}, Limit: struct { Rate int Interval time.Duration }{ Rate: 1, Interval: time.Second, }, MaxBodySize: 10 * 1024 * 1024, // 10 MB UserAgents: []string{`Spidy/2.1; +https://github.com/twiny/spidy`}, Proxies: []string{}, }, Log: struct { Rotate int Path string }{ Rotate: 7, Path: "./log", }, Store: struct { TTL time.Duration Path string }{ TTL: 6 * time.Hour, // format: time.ParseDuration string such as 6h or 24h (units like d or w are not accepted) - minimum 6h Path: "./store", }, Result: struct{ Path string }{ Path: "./result", }, Parralle: core, Timeout: 1 * time.Minute, TLDs: tlds, } // Setting type Setting struct { Crawler struct { MaxDepth int32 Filter 
[]string Limit struct { Rate int Interval time.Duration } MaxBodySize int64 UserAgents []string Proxies []string } Log struct { Rotate int // format: 30d Path string } Store struct { TTL time.Duration Path string } Result struct { Path string } Parralle int Timeout time.Duration TLDs map[string]bool } // ParseSetting func ParseSetting(fp string) *Setting { data, err := ioutil.ReadFile(fp) if err != nil { return defaultSetting } var s = struct { Crawler struct { MaxDepth int32 `yaml:"max_depth"` Filter []string `yaml:"filter,flow"` RateLimit string `yaml:"rate_limit"` // format: req/time.Duration => 5/1s MaxBodySize string `yaml:"max_body_size"` UserAgents []string `yaml:"user_agents,flow"` Proxies []string `yaml:"proxies,flow"` } `yaml:"crawler"` Log struct { Rotate int `yaml:"rotate"` // format: 30d Path string `yaml:"path"` } `yaml:"log"` Store struct { TTL string `yaml:"ttl"` // format: 1h, 24h Path string `yaml:"path"` } `yaml:"store"` Result struct { Path string `yaml:"path"` } `yaml:"result"` Parralle int `yaml:"parralle"` Timeout string `yaml:"timeout"` TLDs []string `yaml:"tlds,flow"` }{} if err := yaml.Unmarshal(data, &s); err != nil { return defaultSetting } rate, interval := parseRateLimit(s.Crawler.RateLimit) return &Setting{ Crawler: struct { MaxDepth int32 Filter []string Limit struct { Rate int Interval time.Duration } MaxBodySize int64 UserAgents []string Proxies []string }{ MaxDepth: s.Crawler.MaxDepth, Filter: s.Crawler.Filter, Limit: struct { Rate int Interval time.Duration }{ Rate: rate, Interval: interval, }, MaxBodySize: parseBodySize(s.Crawler.MaxBodySize), UserAgents: s.Crawler.UserAgents, Proxies: s.Crawler.Proxies, }, Log: struct { Rotate int Path string }{ Rotate: s.Log.Rotate, Path: s.Log.Path, }, Store: struct { TTL time.Duration Path string }{ TTL: parseTTL(s.Store.TTL), Path: s.Store.Path, }, Result: struct{ Path string }{ Path: s.Result.Path, }, Parralle: s.Parralle, Timeout: parseTimeout(s.Timeout), TLDs: parseTLDs(s.TLDs), } } // 
parseRateLimit func parseRateLimit(s string) (int, time.Duration) { // default rate limit dr, di := defaultSetting.Crawler.Limit.Rate, defaultSetting.Crawler.Limit.Interval if s == "" { return dr, di } parts := strings.Split(s, "/") if len(parts) != 2 { return dr, di } r, i := parts[0], parts[1] rate, err := strconv.Atoi(r) if err != nil { return dr, di } interval, err := time.ParseDuration(i) if err != nil { return dr, di } return rate, interval } // parseTLDs func parseTLDs(list []string) map[string]bool { m := map[string]bool{} for _, s := range list { m[s] = true } return m } // parseTimeout func parseTimeout(s string) time.Duration { d, err := time.ParseDuration(s) if err != nil { return defaultSetting.Timeout } return d } // parseTTL func parseTTL(s string) time.Duration { d, err := time.ParseDuration(s) if err != nil { return defaultSetting.Store.TTL } return d } // parseBodySize func parseBodySize(s string) int64 { size := hbyte.Parse(s) if size == 0 { return defaultSetting.Crawler.MaxBodySize } return size } ================================================ FILE: internal/pkg/spider/v1/store.go ================================================ package spider // Storage type Storage interface { HasChecked(name string) bool Close() error } ================================================ FILE: internal/pkg/spider/v1/string_replacer.go ================================================ package spider import "strings" // UnescapeHTML replaces escaped Unicode sequences (\uXXXX) and percent-encoded bytes (%XX) with a space // to avoid false matches when extracting domains from text.
var UnescapeHTML = strings.NewReplacer( `\u002f`, ` `, `\u002F`, ` `, // `\u0020`, ` `, `\u0021`, ` `, `\u0022`, ` `, `\u0023`, ` `, `\u0024`, ` `, `\u0025`, ` `, `\u0026`, ` `, `\u0027`, ` `, `\u0028`, ` `, `\u0029`, ` `, // `\u002a`, ` `, `\u002A`, ` `, // `\u002b`, ` `, `\u002B`, ` `, // `\u002c`, ` `, `\u002C`, ` `, // `\u002d`, ` `, `\u002D`, ` `, // `\u002e`, ` `, `\u002E`, ` `, // `\u0030`, ` `, `\u0031`, ` `, `\u0032`, ` `, `\u0033`, ` `, `\u0034`, ` `, `\u0035`, ` `, `\u0036`, ` `, `\u0037`, ` `, `\u0038`, ` `, `\u0039`, ` `, // `\u003a`, ` `, `\u003A`, ` `, // `\u003b`, ` `, `\u003B`, ` `, // `\u003c`, ` `, `\u003C`, ` `, // `\u003d`, ` `, `\u003D`, ` `, // `\u003e`, ` `, `\u003E`, ` `, // `\u003f`, ` `, `\u003F`, ` `, // `\u0040`, ` `, `\u0041`, ` `, `\u0042`, ` `, `\u0043`, ` `, `\u0044`, ` `, `\u0045`, ` `, `\u0046`, ` `, `\u0047`, ` `, `\u0048`, ` `, `\u0049`, ` `, // `\u004a`, ` `, `\u004A`, ` `, // `\u004b`, ` `, `\u004B`, ` `, // `\u004c`, ` `, `\u004C`, ` `, // `\u004d`, ` `, `\u004D`, ` `, // `\u004e`, ` `, `\u004E`, ` `, // `\u004f`, ` `, `\u004F`, ` `, // `\u0050`, ` `, `\u0051`, ` `, `\u0052`, ` `, `\u0053`, ` `, `\u0054`, ` `, `\u0055`, ` `, `\u0056`, ` `, `\u0057`, ` `, `\u0058`, ` `, `\u0059`, ` `, // `\u005a`, ` `, `\u005A`, ` `, // `\u005b`, ` `, `\u005B`, ` `, // `\u005c`, ` `, `\u005C`, ` `, // `\u005d`, ` `, `\u005D`, ` `, // `\u005e`, ` `, `\u005E`, ` `, // `\u005f`, ` `, `\u005F`, ` `, // `\u0060`, ` `, `\u0061`, ` `, `\u0062`, ` `, `\u0063`, ` `, `\u0064`, ` `, `\u0065`, ` `, `\u0066`, ` `, `\u0067`, ` `, `\u0068`, ` `, `\u0069`, ` `, // `\u006a`, ` `, `\u006A`, ` `, // `\u006b`, ` `, `\u006B`, ` `, // `\u006c`, ` `, `\u006C`, ` `, // `\u006d`, ` `, `\u006D`, ` `, // `\u006e`, ` `, `\u006E`, ` `, // `\u006f`, ` `, `\u006F`, ` `, // `\u0070`, ` `, `\u0071`, ` `, `\u0072`, ` `, `\u0073`, ` `, `\u0074`, ` `, `\u0075`, ` `, `\u0076`, ` `, `\u0077`, ` `, `\u0078`, ` `, `\u0079`, ` `, // `\u007a`, ` `, `\u007A`, ` `, // `\u007b`, ` `, 
`\u007B`, ` `, // `\u007c`, ` `, `\u007C`, ` `, // `\u007d`, ` `, `\u007D`, ` `, // `\u007e`, ` `, `\u007E`, ` `, // `%20`, ` `, `%21`, ` `, `%22`, ` `, `%23`, ` `, `%24`, ` `, `%25`, ` `, `%26`, ` `, `%27`, ` `, `%28`, ` `, `%29`, ` `, `%2A`, ` `, `%2B`, ` `, `%2C`, ` `, `%2D`, ` `, `%2E`, ` `, `%2F`, ` `, `%30`, ` `, `%31`, ` `, `%32`, ` `, `%33`, ` `, `%34`, ` `, `%35`, ` `, `%36`, ` `, `%37`, ` `, `%38`, ` `, `%39`, ` `, `%3A`, ` `, `%3B`, ` `, `%3C`, ` `, `%3D`, ` `, `%3E`, ` `, `%3F`, ` `, `%40`, ` `, `%41`, ` `, `%42`, ` `, `%43`, ` `, `%44`, ` `, `%45`, ` `, `%46`, ` `, `%47`, ` `, `%48`, ` `, `%49`, ` `, `%4A`, ` `, `%4B`, ` `, `%4C`, ` `, `%4D`, ` `, `%4E`, ` `, `%4F`, ` `, `%50`, ` `, `%51`, ` `, `%52`, ` `, `%53`, ` `, `%54`, ` `, `%55`, ` `, `%56`, ` `, `%57`, ` `, `%58`, ` `, `%59`, ` `, `%5A`, ` `, `%5B`, ` `, `%5C`, ` `, `%5D`, ` `, `%5E`, ` `, `%5F`, ` `, `%60`, ` `, `%61`, ` `, `%62`, ` `, `%63`, ` `, `%64`, ` `, `%65`, ` `, `%66`, ` `, `%67`, ` `, `%68`, ` `, `%69`, ` `, `%6A`, ` `, `%6B`, ` `, `%6C`, ` `, `%6D`, ` `, `%6E`, ` `, `%6F`, ` `, `%70`, ` `, `%71`, ` `, `%72`, ` `, `%73`, ` `, `%74`, ` `, `%75`, ` `, `%76`, ` `, `%77`, ` `, `%78`, ` `, `%79`, ` `, `%7A`, ` `, `%7B`, ` `, `%7C`, ` `, `%7D`, ` `, `%7E`, ` `, `%7F`, ` `, `%80`, ` `, `%81`, ` `, `%82`, ` `, `%83`, ` `, `%84`, ` `, `%85`, ` `, `%86`, ` `, `%87`, ` `, `%88`, ` `, `%89`, ` `, `%8A`, ` `, `%8B`, ` `, `%8C`, ` `, `%8D`, ` `, `%8E`, ` `, `%8F`, ` `, `%90`, ` `, `%91`, ` `, `%92`, ` `, `%93`, ` `, `%94`, ` `, `%95`, ` `, `%96`, ` `, `%97`, ` `, `%98`, ` `, `%99`, ` `, `%9A`, ` `, `%9B`, ` `, `%9C`, ` `, `%9D`, ` `, `%9E`, ` `, `%9F`, ` `, `%A0`, ` `, `%A1`, ` `, `%A2`, ` `, `%A3`, ` `, `%A4`, ` `, `%A5`, ` `, `%A6`, ` `, `%A7`, ` `, `%A8`, ` `, `%A9`, ` `, `%AA`, ` `, `%AB`, ` `, `%AC`, ` `, `%AD`, ` `, `%AE`, ` `, `%AF`, ` `, `%B0`, ` `, `%B1`, ` `, `%B2`, ` `, `%B3`, ` `, `%B4`, ` `, `%B5`, ` `, `%B6`, ` `, `%B7`, ` `, `%B8`, ` `, `%B9`, ` `, `%BA`, ` `, `%BB`, ` `, `%BC`, ` 
`, `%BD`, ` `, `%BE`, ` `, `%BF`, ` `, `%C0`, ` `, `%C1`, ` `, `%C2`, ` `, `%C3`, ` `, `%C4`, ` `, `%C5`, ` `, `%C6`, ` `, `%C7`, ` `, `%C8`, ` `, `%C9`, ` `, `%CA`, ` `, `%CB`, ` `, `%CC`, ` `, `%CD`, ` `, `%CE`, ` `, `%CF`, ` `, `%D0`, ` `, `%D1`, ` `, `%D2`, ` `, `%D3`, ` `, `%D4`, ` `, `%D5`, ` `, `%D6`, ` `, `%D7`, ` `, `%D8`, ` `, `%D9`, ` `, `%DA`, ` `, `%DB`, ` `, `%DC`, ` `, `%DD`, ` `, `%DE`, ` `, `%DF`, ` `, `%E0`, ` `, `%E1`, ` `, `%E2`, ` `, `%E3`, ` `, `%E4`, ` `, `%E5`, ` `, `%E6`, ` `, `%E7`, ` `, `%E8`, ` `, `%E9`, ` `, `%EA`, ` `, `%EB`, ` `, `%EC`, ` `, `%ED`, ` `, `%EE`, ` `, `%EF`, ` `, `%F0`, ` `, `%F1`, ` `, `%F2`, ` `, `%F3`, ` `, `%F4`, ` `, `%F5`, ` `, `%F6`, ` `, `%F7`, ` `, `%F8`, ` `, `%F9`, ` `, `%FA`, ` `, `%FB`, ` `, `%FC`, ` `, `%FD`, ` `, `%FE`, ` `, `%FF`, ` `, ) ================================================ FILE: internal/pkg/spider/v1/tld_list.go ================================================ package spider // allowed TLDs: list of allowed domain tlds to avoid getting bad extensions. 
var tlds = map[string]bool{ "ac": true, "ae": true, "aero": true, "af": true, "ag": true, "am": true, "as": true, "asia": true, "at": true, "au": true, "ax": true, "be": true, "bg": true, "bi": true, "biz": true, "bj": true, "br": true, "by": true, "ca": true, "cat": true, "cc": true, "cl": true, "cn": true, "co": true, "com": true, "coop": true, "cx": true, "de": true, "dk": true, "dm": true, "dz": true, "edu": true, "ee": true, "eu": true, "fi": true, "fo": true, "fr": true, "gb.com": true, "qc.com": true, "ge": true, "gl": true, "gov": true, "gs": true, "hk": true, "hr": true, "hu": true, "hu.com": true, "id": true, "ie": true, "in": true, "info": true, "int": true, "io": true, "ir": true, "is": true, "je": true, "jobs": true, "kg": true, "kr": true, "la": true, "lu": true, "lv": true, "ly": true, "ma": true, "md": true, "me": true, "mk": true, "mobi": true, "ms": true, "mu": true, "mx": true, "name": true, "net": true, "nf": true, "ng": true, "no": true, "no.com": true, "nu": true, "nz": true, "org": true, "pl": true, "pr": true, "pro": true, "pw": true, "ro": true, "ru": true, "sa.com": true, "sc": true, "se": true, "se.com": true, "sg": true, "sh": true, "si": true, "sk": true, "sm": true, "st": true, "so": true, "su": true, "tc": true, "tel": true, "tf": true, "th": true, "tk": true, "tl": true, "tm": true, "tn": true, "travel": true, "tw": true, "tv": true, "tz": true, "ua": true, "uk": true, "us": true, "uy.com": true, "uz": true, "vc": true, "ve": true, "vg": true, "ws": true, "xxx": true, "yu": true, "za.com": true, } ================================================ FILE: internal/pkg/spider/v1/utils.go ================================================ package spider import ( "bytes" "regexp" "strings" "github.com/PuerkitoBio/goquery" "golang.org/x/net/publicsuffix" ) var ( // domain regexp domainRegexp = regexp.MustCompile(`(([[:alnum:]]-?)?([[:alnum:]]-?)+\.)+[[:alpha:]]{2,4}`) ) // FindDomains func FindDomains(body []byte) (domains []Domain) { doc, err 
:= goquery.NewDocumentFromReader(bytes.NewReader(body)) if err != nil { return } var s = UnescapeHTML.Replace(doc.Text()) for _, domain := range domainRegexp.FindAllString(s, -1) { name, tld, ok := splitDomain(domain) if ok { domains = append(domains, Domain{ Name: name, TLD: tld, }) } } return } // splitDomain func splitDomain(d string) (name string, tld string, ok bool) { // lowercase first: the keys in tlds are lowercase d = strings.ToLower(d) // get domain tld root, err := publicsuffix.EffectiveTLDPlusOne(d) if err != nil { return } // convert to domain name and tld i := strings.Index(root, ".") tld = root[i+1:] if _, ok = tlds[tld]; !ok { return } name = strings.TrimSuffix(root, "."+tld) return } ================================================ FILE: internal/pkg/spider/v1/writer.go ================================================ package spider // Writer type Writer interface { Write(*Domain) error } ================================================ FILE: internal/service/cache/cache.go ================================================ package cache import ( "time" // // "github.com/twiny/carbon" ) // Cache type Cache struct { ttl time.Duration db *carbon.Cache } // NewCache func NewCache(ttl time.Duration, dir string) (*Cache, error) { db, err := carbon.NewCache(dir) if err != nil { return nil, err } return &Cache{ ttl: ttl, db: db, }, nil } // HasChecked func (c *Cache) HasChecked(name string) bool { // first check if domain is in cache b, err := c.db.Get(name) if err != nil || b == nil { // if not found save to cache if err := c.db.Set(name, []byte(name), c.ttl); err != nil { return false } return false } return true } // Close func (c *Cache) Close() error { c.db.Close() return nil } ================================================ FILE: internal/service/writer/csv_writer.go ================================================ package writer import ( "encoding/csv" "os" "path/filepath" "sync" "time" "github.com/twiny/spidy/v2/internal/pkg/spider/v1" ) // CSVWriter type CSVWriter 
struct { l *sync.Mutex f *os.File w *csv.Writer } // NewCSVWriter func NewCSVWriter(dir string) (*CSVWriter, error) { if _, err := os.Stat(dir); os.IsNotExist(err) { if err := os.MkdirAll(dir, 0755); err != nil { return nil, err } } name := time.Now().Format("2006-01-02") fp := filepath.Join(dir, name+"_domains.csv") // open or create the result file f, err := os.OpenFile(fp, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644) if err != nil { return nil, err } return &CSVWriter{ l: &sync.Mutex{}, f: f, w: csv.NewWriter(f), }, nil } // Write func (c *CSVWriter) Write(d *spider.Domain) error { c.l.Lock() defer c.l.Unlock() if err := c.w.Write([]string{d.Name + "." + d.TLD, d.Status}); err != nil { return err } // flush while still holding the lock, then report any write error c.w.Flush() return c.w.Error() } // Close func (c *CSVWriter) Close() error { return c.f.Close() }