Repository: bellecp/fast-p Branch: master Commit: 762d856f328c Files: 6 Total size: 14.5 KB Directory structure: gitextract_5ax35fwf/ ├── .gitignore ├── .goreleaser.yml ├── LICENSE ├── README.md ├── main.go └── p ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitignore ================================================ dist/* ================================================ FILE: .goreleaser.yml ================================================ # This is an example goreleaser.yaml file with some sane defaults. # Make sure to check the documentation at http://goreleaser.com builds: - env: - CGO_ENABLED=0 archive: replacements: darwin: Darwin linux: Linux windows: Windows 386: i386 amd64: x86_64 checksum: name_template: 'checksums.txt' snapshot: name_template: "{{ .Tag }}-next" changelog: sort: asc filters: exclude: - '^docs:' - '^test:' nfpm: # You can change the name of the package. # Default: `{{ .ProjectName }}_{{ .Version }}_{{ .Os }}_{{ .Arch }}{{ if .Arm }}v{{ .Arm }}{{ end }}` name_template: "{{ .ProjectName }}_{{ .Version }}_{{ .Os }}_{{ .Arch }}" homepage: https://github.com/bellecp/fast-p description: Fast commandline pdf fuzzy finder maintainer: http://github.com/bellecp # Formats to be generated. formats: - deb - rpm license: MIT brew: name: fast-pdf-finder github: owner: bellecp name: homebrew-fast-p # Git author used to commit to the repository. # Defaults are shown. commit_author: name: bellecp email: bellecp@users.noreply.github.com # Your app's homepage. # Default is empty. homepage: "https://github.com/bellecp/fast-p" # Your app's description. # Default is empty. description: "Fast, command-line PDF finder" # Packages your package depends on. dependencies: - grep - fzf - coreutils - findutils - poppler - pkg-config - the_silver_searcher # Custom install script for brew. # Default is 'bin.install "program"'. install: | bin.install "fast-p" ================================================ FILE: LICENSE ================================================ MIT License Copyright (c) 2018 bellecp (github.com/bellecp) Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ================================================ FILE: README.md ================================================ # fast-p Quickly find and open a pdf among a collection of thousands of unsorted pdfs through fzf (fuzzy finder) - [Installation on Linux](#installation-on-unix-or-linux-based-systems) - [Installation on OSX](#installation-on-osx-with-homebrew) - [Usage](#usage) - [How to clear the cache?](#how-to-clear-the-cache) - [Launch with keyboard shortcut in Ubuntu](#launch-with-keyboard-shortcut-in-ubuntu) - [See it in action](#see-it-in-action) - [Is the historical bash code still available?](#is-the-historical-bash-code-still-available) # Installation on Unix or Linux based systems 1. __Requirements.__ Make sure the following requirements are satisfied: - install ``pdftotext``. This comes with the texlive distribution on linux, On ubuntu, ``sudo apt-get install poppler-utils`` . - install ``fzf``: https://github.com/junegunn/fzf - install ``GNU grep``, ``ag`` (silver searcher). 2. __Install binary__. Do either one of the two steps below: - __Compile from source with ``go`` and ``go get``.__ With a working ``golang`` installation, do ```go install github.com/bellecp/fast-p@v0.2.5``` It will fetch the code and its dependencies, compile and create an executable ``fast-p`` in the ``/bin`` folder of your go installation, typically ``~/go/bin``. Make sure the command ``fast-p`` can be found (for instance, add ``~/go/bin`` to your ``$PATH``.) - Or: __Use the precompiled binary for your architecture.__ Download the binary that corresponds to your architecture at https://github.com/bellecp/fast-p/releases and make sure that the command ``fast-p`` can be found. For instance, put the binary file ``fast-p`` in ``~/custom/bin`` and add ``export PATH=~/custom/bin:$PATH`` to your ``.bashrc``. 3. __Tweak your .bashrc__. Add the following code to your ``.bashrc`` ``` p () { open=xdg-open # this will open pdf file withthe default PDF viewer on KDE, xfce, LXDE and perhaps on other desktops. ag -U -g ".pdf$" \ | fast-p \ | fzf --read0 --reverse -e -d $'\t' \ --preview-window down:80% --preview ' v=$(echo {q} | tr " " "|"); echo -e {1}"\n"{2} | grep -E "^|$v" -i --color=always; ' \ | cut -z -f 1 -d $'\t' | tr -d '\n' | xargs -r --null $open > /dev/null 2> /dev/null } ``` - You may replace ``ag -U -g ".pdf$"`` with another command that returns a list of pdf files. - You may replace ``open=...`` by your favorite PDF viewer, for instance ``open=evince`` or ``open=okular``. # Installation on OSX with homebrew 1. Install [homebrew](https://brew.sh/) and __run__ ``` brew install bellecp/fast-p/fast-pdf-finder ``` _The above brew formula is experimental. Please report any issues/suggestions/feedback at _ 2. __Tweak your .bashrc__. Add the following code to your ``.bashrc`` ``` p () { local open open=open # on OSX, "open" opens a pdf in preview ag -U -g ".pdf$" \ | fast-p \ | fzf --read0 --reverse -e -d $'\t' \ --preview-window down:80% --preview ' v=$(echo {q} | gtr " " "|"); echo -e {1}"\n"{2} | ggrep -E "^|$v" -i --color=always; ' \ | gcut -z -f 1 -d $'\t' | gtr -d '\n' | gxargs -r --null $open > /dev/null 2> /dev/null } ``` - You may replace ``ag -U -g ".pdf$"`` with another command that returns a list of pdf files. - You may replace ``open=...`` by your favorite PDF viewer, for instance ``open=evince`` or ``open=okular``. __Remark:__ On OSX, we use the command line tools ``gcut``, ``gxargs``, ``ggrep``, ``gtr`` which are the GNU versions of the tools ``cut``, ``xargs``, ``grep``, ``tr``. This way, we avoid the specifics of the versions of these tools pre-installed on OSX, and the same ``.bashrc`` code can be used for both OSX and GNU Linux systems. # Usage Use the command ``p`` to browse among the PDF files in the current directory and its subdirectories. The first run of the command will take some time to cache the text extracted from each pdf. Further runs of the command will be much faster since the text extraction will only apply to new pdfs. # How to clear the cache? To clear the cache (which contains text extracted from PDF), you can run 'fast-p --clear-cache'. This will safely remove the file located at: ``~/.cache/fast-p-pdftotext-output/fast-p_cached_pdftotext_output.db`` For older versions, please manually delete the cache file found at ``~/.cache/fast-p_cached_pdftotext_output.db`` # Launch with keyboard shortcut in Ubuntu On Ubuntu desktop (tested in 18.04), one may add a keyboard shortcut to launch a new terminal running the ``p`` command right away. With the following script, the new terminal window will automatically close after choosing a PDF. Create a file ``~/.fast-p-rc`` with ``` source .bashrc p; sleep 0.15; exit; ``` and in Ubuntu Settings/Keyboard, add a custom shortcut that runs the command ``gnome-terminal -- sh -c "bash --rcfile .fast-p-rc"``. # See it in action ![illustration of the p command](https://user-images.githubusercontent.com/1019692/34446795-12229072-ecac-11e7-856a-ec0df0de60ae.gif) # Is the historical bash code still available? Yes, see https://github.com/bellecp/fast-p/blob/master/p but using the go binary as explained above is recommended for speed and interoperability. ================================================ FILE: main.go ================================================ package main import ( "bufio" "encoding/hex" "flag" "fmt" "github.com/boltdb/bolt" "github.com/cespare/xxhash" "github.com/mitchellh/go-homedir" "io" "log" "os" "os/exec" "path/filepath" ) func hash_file_xxhash(filePath string) (string, error) { var returnMD5String string file, err := os.Open(filePath) if err != nil { return returnMD5String, err } defer file.Close() hash := xxhash.New() if _, err := io.Copy(hash, file); err != nil { return returnMD5String, err } hashInBytes := hash.Sum(nil)[:] returnMD5String = hex.EncodeToString(hashInBytes) return returnMD5String, nil } func main() { flag.Usage = func() { fmt.Printf(`Usage: fast-p [OPTIONS] Reads a list of PDF filenames from STDIN and returns a list of null-byte separated items of the form filename[TAB]text where "text" is the text extracted from the first two pages of the PDF by pdftotext and [TAB] denotes a tab character "\t". Common usage of this tool is to pipe the result to FZF with a command in your .bashrc as explained in https://github.com/bellecp/fast-p. `) flag.PrintDefaults() } version := flag.Bool("version", false, "Display program version") clearCache := flag.Bool("clear-cache", false, "Delete cache file located at: \n~/.cache/fast-p-pdftotext-output/fast-p_cached_pdftotext_output.db") flag.Parse() if *version != false { fmt.Printf("v.0.2.5 \nhttps://github.com/bellecp/fast-p\n") os.Exit(0) } if *clearCache != false { removePath, err := homedir.Expand("~/.cache/fast-p-pdftotext-output/fast-p_cached_pdftotext_output.db") if err != nil { log.Fatal(err) os.Exit(1) } os.Remove(removePath) os.Exit(0) } // Create ~/.cache folder if does not exist // https://stackoverflow.com/questions/37932551/mkdir-if-not-exists-using-golang cachePath, err := homedir.Expand("~/.cache/fast-p-pdftotext-output/") os.MkdirAll(cachePath, os.ModePerm) // open BoltDB cache database scanner := bufio.NewScanner(os.Stdin) boltDbFilepath := filepath.Join(cachePath, "fast-p_cached_pdftotext_output.db") if err != nil { log.Fatal(err) } db, err := bolt.Open(boltDbFilepath, 0600, nil) bucketName := "fast-p_bucket_for_cached_pdftotext_output" if err != nil { log.Fatal(err) } defer db.Close() nullByte := "\u0000" db.Update(func(tx *bolt.Tx) error { _, err := tx.CreateBucketIfNotExists([]byte(bucketName)) if err != nil { return fmt.Errorf("create bucket: %s", err) } return nil }) missing := make(map[string]string) alreadySeen := make(map[string]bool) for scanner.Scan() { filepath := scanner.Text() hash, err := hash_file_xxhash(filepath) if alreadySeen[hash] != true { alreadySeen[hash] = true if err != nil { log.Println("err", hash) } var content string found := false err2 := db.View(func(tx *bolt.Tx) error { b := tx.Bucket([]byte(bucketName)) v := b.Get([]byte(hash)) if v != nil { found = true content = string(v) } return nil }) if err2 != nil { log.Println(err2) } if found == true { fmt.Println(filepath + "\t" + content + nullByte) } else { missing[hash] = filepath } } } for hash, filepath := range missing { cmd := exec.Command("pdftotext", "-l", "2", filepath, "-") out, err := cmd.CombinedOutput() content := string(out) if err != nil { log.Println(err) } fmt.Println(filepath + "\t" + content + nullByte) db.Update(func(tx *bolt.Tx) error { b := tx.Bucket([]byte(bucketName)) err := b.Put([]byte(hash), []byte(content)) if err != nil { fmt.Println(err) } return nil }) } } ================================================ FILE: p ================================================ # This file is kept only for historical reasons. # It is recommended to use the go binary and the installatoin procedure # describe at https://github.com/bellecp/fast-p ## Installation # - install ``pdftotext``. This comes with the texlive distribution on linux or with poppler on OSX. # - install ``fzf``: https://github.com/junegunn/fzf # - install ``xxhash``: https://github.com/Cyan4973/xxHash # - install ``GNU grep``, ``ag`` (silver searcher) # - clone the repository: ``$ git clone https://github.com/bellecp/fast-p.git`` # - add a line ``source fast-p/p`` to your .bashrc or .bash_profile # - Run the command ``p``. The first run of the command will take some time to # cache the text extracted from each pdf. Further runs of the command will be # much faster since the text extraction will only apply to new pdfs. # ## Usage # # Run the command ``p`` and start typing keywords to search for pdf. # Type "enter" to view the pdf in the default viewer p () { local DIR open CACHEDLIST PDFLIST PDFLIST="/tmp/fewijbbioasBBBB" CACHEDLIST="/tmp/fewijbbioasAAAA" DIR="${HOME}/.cache/pdftotext" mkdir -p "${DIR}" touch "$DIR/NOOP" if [ "$(uname)" = "Darwin" ]; then open=open else open="gio open" fi # escale filenames # compute xxh sum # replace separator by tab character # sort to prepare for join # remove duplicates ag -U -g ".pdf$"| sed 's/\([ \o47()"&;\\]\)/\\\1/g;s/\o15/\\r/g' \ | xargs xxh64sum \ | sed 's/ /\t/' \ | sort \ | awk 'BEGIN {FS="\t"; OFS="\t"}; !seen[$1]++ {print $1, $2}' \ >| $PDFLIST # printed (hashsum,cached text) for every previously cached output of pdftotext # remove full path # replace separator by tab character # sort to prepare for join grep "" ~/.cache/pdftotext/* \ | sed 's=.*cache/pdftotext/==' \ | sed 's/:/\t/' \ | sort \ >| $CACHEDLIST { echo " "; # starting to type query sends it to fzf right away join -t ' ' $PDFLIST $CACHEDLIST; # already cached pdfs # Next, apply pdftotext to pdfs that haven't been cached yet comm -13 \ <(cat $CACHEDLIST | awk 'BEGIN {FS="\t"; OFS="\t"}; {print $1}') \ <(cat $PDFLIST | awk 'BEGIN {FS="\t"; OFS="\t"}; {print $1}') \ | join -t ' ' - $PDFLIST \ | awk 'BEGIN {FS="\t"; OFS="\t"}; !seen[$1]++ {print $1, $2}' \ | \ while read -r LINE; do local CACHE IFS=" "; set -- $LINE; CACHE="$DIR/$1" pdftotext -f 1 -l 2 "$2" - 2>/dev/null | tr "\n" "__" >| $CACHE echo -e "$1 $2 $(cat $CACHE)" done } | fzf --reverse -e -d '\t' \ --with-nth=2,3 \ --preview-window down:80% \ --preview ' v=$(echo {q} | tr " " "|"); echo {2} | grep -E "^|$v" -i --color=always; echo {3} | tr "__" "\n" | grep -E "^|$v" -i --color=always; ' \ | awk 'BEGIN {FS="\t"; OFS="\t"}; {print $2}' \ | sed 's/\([ \o47()"&;\\]\)/\\\1/g;s/\o15/\\r/g' \ | xargs $open > /dev/null 2> /dev/null }