Repository: bellecp/fast-p
Branch: master
Commit: 762d856f328c
Files: 6
Total size: 14.5 KB

Directory structure:
gitextract_5ax35fwf/

├── .gitignore
├── .goreleaser.yml
├── LICENSE
├── README.md
├── main.go
└── p

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
dist/*


================================================
FILE: .goreleaser.yml
================================================
# This is an example goreleaser.yaml file with some sane defaults.
# Make sure to check the documentation at http://goreleaser.com
builds:
- env:
  - CGO_ENABLED=0
archive:
  replacements:
    darwin: Darwin
    linux: Linux
    windows: Windows
    386: i386
    amd64: x86_64
checksum:
  name_template: 'checksums.txt'
snapshot:
  name_template: "{{ .Tag }}-next"
changelog:
  sort: asc
  filters:
    exclude:
    - '^docs:'
    - '^test:'

nfpm:
  # You can change the name of the package.
  # Default: `{{ .ProjectName }}_{{ .Version }}_{{ .Os }}_{{ .Arch }}{{ if .Arm }}v{{ .Arm }}{{ end }}`
  name_template: "{{ .ProjectName }}_{{ .Version }}_{{ .Os }}_{{ .Arch }}"

  homepage: https://github.com/bellecp/fast-p
  description: Fast commandline pdf fuzzy finder
  maintainer: http://github.com/bellecp

  # Formats to be generated.
  formats:
    - deb
    - rpm
  license: MIT

brew:
  name: fast-pdf-finder

  github:
    owner: bellecp
    name: homebrew-fast-p

  # Git author used to commit to the repository.
  # Defaults are shown.
  commit_author:
    name: bellecp
    email: bellecp@users.noreply.github.com

  # Your app's homepage.
  # Default is empty.
  homepage: "https://github.com/bellecp/fast-p"

  # Your app's description.
  # Default is empty.
  description: "Fast, command-line PDF finder"

  # Packages your package depends on.
  dependencies:
    - grep
    - fzf
    - coreutils
    - findutils
    - poppler
    - pkg-config
    - the_silver_searcher

  # Custom install script for brew.
  # Default is 'bin.install "program"'.
  install: |
    bin.install "fast-p"


================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2018 bellecp (github.com/bellecp)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# fast-p

Quickly find and open a pdf among a collection of thousands of unsorted pdfs through fzf (fuzzy finder)

- [Installation on Linux](#installation-on-unix-or-linux-based-systems)
- [Installation on OSX](#installation-on-osx-with-homebrew)
- [Usage](#usage)
- [How to clear the cache?](#how-to-clear-the-cache)
- [Launch with keyboard shortcut in Ubuntu](#launch-with-keyboard-shortcut-in-ubuntu)
- [See it in action](#see-it-in-action)
- [Is the historical bash code still available?](#is-the-historical-bash-code-still-available)

# Installation on Unix or Linux based systems

1. __Requirements.__ Make sure the following requirements are satisfied:
    - install ``pdftotext``. This comes with the texlive distribution on linux,
    On ubuntu, ``sudo apt-get install poppler-utils`` . 
    - install ``fzf``: https://github.com/junegunn/fzf
    - install ``GNU grep``,  ``ag`` (silver searcher).

2. __Install binary__. Do either one of the two steps below:
    - __Compile from source with ``go`` and ``go get``.__
    With a working ``golang`` installation, do 
    ```go install github.com/bellecp/fast-p@v0.2.5```
    It will fetch the code and its dependencies,
    compile and create an executable ``fast-p`` in the ``/bin`` folder of your go
    installation, typically ``~/go/bin``. Make sure the command ``fast-p`` can be
    found (for instance, add ``~/go/bin`` to your ``$PATH``.)
    - Or: __Use the precompiled binary for your architecture.__ Download the binary that corresponds to your
    architecture at https://github.com/bellecp/fast-p/releases and make sure that
    the command ``fast-p`` can be found. For instance,
    put the binary file ``fast-p`` in ``~/custom/bin`` and add ``export
    PATH=~/custom/bin:$PATH`` to your ``.bashrc``.

3. __Tweak your .bashrc__. Add the following code to your ``.bashrc``
```
p () {
    open=xdg-open   # this will open pdf file withthe default PDF viewer on KDE, xfce, LXDE and perhaps on other desktops.

    ag -U -g ".pdf$" \
    | fast-p \
    | fzf --read0 --reverse -e -d $'\t'  \
        --preview-window down:80% --preview '
            v=$(echo {q} | tr " " "|"); 
            echo -e {1}"\n"{2} | grep -E "^|$v" -i --color=always;
        ' \
    | cut -z -f 1 -d $'\t' | tr -d '\n' | xargs -r --null $open > /dev/null 2> /dev/null
}

```
- You may replace ``ag -U -g ".pdf$"`` with another command that returns a list of pdf files.
- You may replace ``open=...`` by your favorite PDF viewer, for instance ``open=evince`` or ``open=okular``.

# Installation on OSX with homebrew

1. Install [homebrew](https://brew.sh/) and  __run__
```
brew install bellecp/fast-p/fast-pdf-finder
```
_The above brew formula is experimental. 
Please report any issues/suggestions/feedback at <https://github.com/bellecp/fast-p/issues/11>_


2. __Tweak your .bashrc__. Add the following code to your ``.bashrc``
```
p () {
    local open
    open=open   # on OSX, "open" opens a pdf in preview
    ag -U -g ".pdf$" \
    | fast-p \
    | fzf --read0 --reverse -e -d $'\t'  \
        --preview-window down:80% --preview '
            v=$(echo {q} | gtr " " "|"); 
            echo -e {1}"\n"{2} | ggrep -E "^|$v" -i --color=always;
        ' \
    | gcut -z -f 1 -d $'\t' | gtr -d '\n' | gxargs -r --null $open > /dev/null 2> /dev/null
}

```
- You may replace ``ag -U -g ".pdf$"`` with another command that returns a list of pdf files.
- You may replace ``open=...`` by your favorite PDF viewer, for instance ``open=evince`` or ``open=okular``.

__Remark:__ On OSX, we use the command line tools ``gcut``, ``gxargs``, ``ggrep``, ``gtr`` which are the GNU versions
of the tools ``cut``, ``xargs``, ``grep``, ``tr``. This way, we avoid the specifics of the versions of these tools pre-installed on OSX,
and the same ``.bashrc`` code can be used for both OSX and GNU Linux systems.

# Usage

Use the command ``p`` to browse among the PDF files in the current directory and its subdirectories.

The first run of the command will take some time to cache the text extracted from each pdf. Further runs of the command will be much faster since the text extraction will only apply to new pdfs.

# How to clear the cache?

To clear the cache (which contains text extracted from PDF), you can run 'fast-p --clear-cache'. This will safely remove the file located at:
``~/.cache/fast-p-pdftotext-output/fast-p_cached_pdftotext_output.db``

For older versions, please manually delete the cache file found at
``~/.cache/fast-p_cached_pdftotext_output.db``

# Launch with keyboard shortcut in Ubuntu

On Ubuntu desktop (tested in 18.04), one may add a keyboard shortcut to launch a new terminal running the ``p`` command right away.
With the following script, the new terminal window will automatically close after choosing a PDF.

Create a file ``~/.fast-p-rc`` with
```
source .bashrc
p;
sleep 0.15; exit;
```
and in Ubuntu Settings/Keyboard, add a custom shortcut that runs the command
``gnome-terminal -- sh -c "bash --rcfile .fast-p-rc"``.


# See it in action

![illustration of the p command](https://user-images.githubusercontent.com/1019692/34446795-12229072-ecac-11e7-856a-ec0df0de60ae.gif)


# Is the historical bash code still available?

Yes, see https://github.com/bellecp/fast-p/blob/master/p but using the go binary as explained above is recommended for speed and interoperability.


================================================
FILE: main.go
================================================
package main

import (
	"bufio"
	"encoding/hex"
	"flag"
	"fmt"
	"github.com/boltdb/bolt"
	"github.com/cespare/xxhash"
	"github.com/mitchellh/go-homedir"
	"io"
	"log"
	"os"
	"os/exec"
	"path/filepath"
)

func hash_file_xxhash(filePath string) (string, error) {
	var returnMD5String string
	file, err := os.Open(filePath)
	if err != nil {
		return returnMD5String, err
	}
	defer file.Close()
	hash := xxhash.New()
	if _, err := io.Copy(hash, file); err != nil {
		return returnMD5String, err
	}
	hashInBytes := hash.Sum(nil)[:]
	returnMD5String = hex.EncodeToString(hashInBytes)
	return returnMD5String, nil

}

func main() {
	flag.Usage = func() {
		fmt.Printf(`Usage: fast-p [OPTIONS]
    Reads a list of PDF filenames from STDIN and returns a list of null-byte
    separated items of the form
        filename[TAB]text
    where "text" is the text extracted from the first two pages of the PDF
    by pdftotext and [TAB] denotes a tab character "\t".

    Common usage of this tool is to pipe the result to FZF with a command in
    your .bashrc as explained in https://github.com/bellecp/fast-p.


`)
		flag.PrintDefaults()
	}
	version := flag.Bool("version", false, "Display program version")
	clearCache := flag.Bool("clear-cache", false, "Delete cache file located at: \n~/.cache/fast-p-pdftotext-output/fast-p_cached_pdftotext_output.db")
	flag.Parse()

	if *version != false {
		fmt.Printf("v.0.2.5 \nhttps://github.com/bellecp/fast-p\n")
		os.Exit(0)
	}

	if *clearCache != false {
		removePath, err := homedir.Expand("~/.cache/fast-p-pdftotext-output/fast-p_cached_pdftotext_output.db")
		if err != nil {
			log.Fatal(err)
			os.Exit(1)
		}
		os.Remove(removePath)
		os.Exit(0)
	}

	// Create ~/.cache folder if does not exist
	// https://stackoverflow.com/questions/37932551/mkdir-if-not-exists-using-golang
	cachePath, err := homedir.Expand("~/.cache/fast-p-pdftotext-output/")
	os.MkdirAll(cachePath, os.ModePerm)

	// open BoltDB cache database
	scanner := bufio.NewScanner(os.Stdin)
	boltDbFilepath := filepath.Join(cachePath, "fast-p_cached_pdftotext_output.db")
	if err != nil {
		log.Fatal(err)
	}
	db, err := bolt.Open(boltDbFilepath, 0600, nil)
	bucketName := "fast-p_bucket_for_cached_pdftotext_output"
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	nullByte := "\u0000"

	db.Update(func(tx *bolt.Tx) error {
		_, err := tx.CreateBucketIfNotExists([]byte(bucketName))
		if err != nil {
			return fmt.Errorf("create bucket: %s", err)
		}
		return nil
	})

	missing := make(map[string]string)
	alreadySeen := make(map[string]bool)

	for scanner.Scan() {
		filepath := scanner.Text()
		hash, err := hash_file_xxhash(filepath)
		if alreadySeen[hash] != true {
			alreadySeen[hash] = true
			if err != nil {
				log.Println("err", hash)
			}
			var content string
			found := false
			err2 := db.View(func(tx *bolt.Tx) error {
				b := tx.Bucket([]byte(bucketName))
				v := b.Get([]byte(hash))
				if v != nil {
					found = true
					content = string(v)
				}
				return nil
			})
			if err2 != nil {
				log.Println(err2)
			}
			if found == true {
				fmt.Println(filepath + "\t" + content + nullByte)
			} else {
				missing[hash] = filepath
			}
		}
	}
	for hash, filepath := range missing {
		cmd := exec.Command("pdftotext", "-l", "2", filepath, "-")
		out, err := cmd.CombinedOutput()
		content := string(out)
		if err != nil {
			log.Println(err)
		}
		fmt.Println(filepath + "\t" + content + nullByte)
		db.Update(func(tx *bolt.Tx) error {
			b := tx.Bucket([]byte(bucketName))
			err := b.Put([]byte(hash), []byte(content))
			if err != nil {
				fmt.Println(err)
			}
			return nil
		})
	}
}


================================================
FILE: p
================================================
# This file is kept only for historical reasons.  
# It is recommended to use the go binary and the installatoin procedure
# describe at https://github.com/bellecp/fast-p

## Installation
# - install ``pdftotext``. This comes with the texlive distribution on linux or with poppler on OSX.
# - install ``fzf``: https://github.com/junegunn/fzf
# - install ``xxhash``: https://github.com/Cyan4973/xxHash
# - install ``GNU grep``,  ``ag`` (silver searcher)
# - clone the repository: ``$ git clone https://github.com/bellecp/fast-p.git`` 
# - add a line ``source fast-p/p`` to your .bashrc or .bash_profile
# - Run the command ``p``. The first run of the command will take some time to
# cache the text extracted from each pdf. Further runs of the command will be
# much faster since the text extraction will only apply to new pdfs.
#
## Usage
#
# Run the command ``p`` and start typing keywords to search for pdf.
# Type "enter" to view the pdf in the default viewer

p () {
    local DIR open CACHEDLIST PDFLIST
    PDFLIST="/tmp/fewijbbioasBBBB"
    CACHEDLIST="/tmp/fewijbbioasAAAA"
    DIR="${HOME}/.cache/pdftotext"
    mkdir -p "${DIR}"
    touch "$DIR/NOOP"
    if [ "$(uname)" = "Darwin" ]; then
        open=open
    else
        open="gio open"
    fi

    # escale filenames
    # compute xxh sum
    # replace separator by tab character
    # sort to prepare for join
    # remove duplicates
    ag -U -g ".pdf$"| sed 's/\([ \o47()"&;\\]\)/\\\1/g;s/\o15/\\r/g'  \
        | xargs xxh64sum \
        | sed 's/  /\t/' \
        | sort \
        | awk 'BEGIN {FS="\t"; OFS="\t"}; !seen[$1]++ {print $1, $2}' \
        >| $PDFLIST

    # printed (hashsum,cached text) for every previously cached output of pdftotext
    # remove full path
    # replace separator by tab character
    # sort to prepare for join
    grep "" ~/.cache/pdftotext/* \
        | sed 's=.*cache/pdftotext/==' \
        | sed 's/:/\t/' \
        | sort \
        >| $CACHEDLIST

    {
        echo " "; # starting to type query sends it to fzf right away
        join -t '	' $PDFLIST $CACHEDLIST; # already cached pdfs
        # Next, apply pdftotext to pdfs that haven't been cached yet
        comm -13 \
            <(cat $CACHEDLIST | awk 'BEGIN {FS="\t"; OFS="\t"}; {print $1}') \
            <(cat $PDFLIST | awk 'BEGIN {FS="\t"; OFS="\t"}; {print $1}') \
            | join -t '	' - $PDFLIST \
            | awk 'BEGIN {FS="\t"; OFS="\t"}; !seen[$1]++ {print $1, $2}' \
            | \
            while read -r LINE; do
                local CACHE
                IFS="	"; set -- $LINE;
                CACHE="$DIR/$1"
                pdftotext -f 1 -l 2 "$2" - 2>/dev/null | tr "\n" "__" >| $CACHE
                echo -e "$1	$2	$(cat $CACHE)"
            done
} | fzf --reverse -e -d '\t'  \
    --with-nth=2,3 \
    --preview-window down:80% \
    --preview '
v=$(echo {q} | tr " " "|");
echo {2} | grep -E "^|$v" -i --color=always;
echo {3} | tr "__" "\n" | grep -E "^|$v" -i --color=always;
' \
    | awk 'BEGIN {FS="\t"; OFS="\t"}; {print $2}'  \
    | sed 's/\([ \o47()"&;\\]\)/\\\1/g;s/\o15/\\r/g'  \
    | xargs $open > /dev/null 2> /dev/null

}