[
  {
    "path": ".gitignore",
    "content": "dist/*\n"
  },
  {
    "path": ".goreleaser.yml",
    "content": "# This is an example goreleaser.yaml file with some sane defaults.\n# Make sure to check the documentation at http://goreleaser.com\nbuilds:\n- env:\n  - CGO_ENABLED=0\narchive:\n  replacements:\n    darwin: Darwin\n    linux: Linux\n    windows: Windows\n    386: i386\n    amd64: x86_64\nchecksum:\n  name_template: 'checksums.txt'\nsnapshot:\n  name_template: \"{{ .Tag }}-next\"\nchangelog:\n  sort: asc\n  filters:\n    exclude:\n    - '^docs:'\n    - '^test:'\n\nnfpm:\n  # You can change the name of the package.\n  # Default: `{{ .ProjectName }}_{{ .Version }}_{{ .Os }}_{{ .Arch }}{{ if .Arm }}v{{ .Arm }}{{ end }}`\n  name_template: \"{{ .ProjectName }}_{{ .Version }}_{{ .Os }}_{{ .Arch }}\"\n\n  homepage: https://github.com/bellecp/fast-p\n  description: Fast commandline pdf fuzzy finder\n  maintainer: http://github.com/bellecp\n\n  # Formats to be generated.\n  formats:\n    - deb\n    - rpm\n  license: MIT\n\nbrew:\n  name: fast-pdf-finder\n\n  github:\n    owner: bellecp\n    name: homebrew-fast-p\n\n  # Git author used to commit to the repository.\n  # Defaults are shown.\n  commit_author:\n    name: bellecp\n    email: bellecp@users.noreply.github.com\n\n  # Your app's homepage.\n  # Default is empty.\n  homepage: \"https://github.com/bellecp/fast-p\"\n\n  # Your app's description.\n  # Default is empty.\n  description: \"Fast, command-line PDF finder\"\n\n  # Packages your package depends on.\n  dependencies:\n    - grep\n    - fzf\n    - coreutils\n    - findutils\n    - poppler\n    - pkg-config\n    - the_silver_searcher\n\n  # Custom install script for brew.\n  # Default is 'bin.install \"program\"'.\n  install: |\n    bin.install \"fast-p\"\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2018 bellecp (github.com/bellecp)\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# fast-p\n\nQuickly find and open a pdf among a collection of thousands of unsorted pdfs through fzf (fuzzy finder)\n\n- [Installation on Linux](#installation-on-unix-or-linux-based-systems)\n- [Installation on OSX](#installation-on-osx-with-homebrew)\n- [Usage](#usage)\n- [How to clear the cache?](#how-to-clear-the-cache)\n- [Launch with keyboard shortcut in Ubuntu](#launch-with-keyboard-shortcut-in-ubuntu)\n- [See it in action](#see-it-in-action)\n- [Is the historical bash code still available?](#is-the-historical-bash-code-still-available)\n\n# Installation on Unix or Linux based systems\n\n1. __Requirements.__ Make sure the following requirements are satisfied:\n    - install ``pdftotext``. This comes with the texlive distribution on Linux.\n    On Ubuntu, run ``sudo apt-get install poppler-utils``.\n    - install ``fzf``: https://github.com/junegunn/fzf\n    - install ``GNU grep``,  ``ag`` (silver searcher).\n\n2. __Install binary__. Do either one of the two steps below:\n    - __Compile from source with ``go install``.__\n    With a working ``golang`` installation, run\n    ```go install github.com/bellecp/fast-p@v0.2.5```\n    It will fetch the code and its dependencies,\n    compile and create an executable ``fast-p`` in the ``/bin`` folder of your go\n    installation, typically ``~/go/bin``. Make sure the command ``fast-p`` can be\n    found (for instance, add ``~/go/bin`` to your ``$PATH``).\n    - Or: __Use the precompiled binary for your architecture.__ Download the binary that corresponds to your\n    architecture at https://github.com/bellecp/fast-p/releases and make sure that\n    the command ``fast-p`` can be found. For instance,\n    put the binary file ``fast-p`` in ``~/custom/bin`` and add ``export\n    PATH=~/custom/bin:$PATH`` to your ``.bashrc``.\n\n3. __Tweak your .bashrc__. 
Add the following code to your ``.bashrc``\n```\np () {\n    open=xdg-open   # opens the PDF file with the default PDF viewer on KDE, Xfce, LXDE and perhaps other desktops.\n\n    ag -U -g \".pdf$\" \\\n    | fast-p \\\n    | fzf --read0 --reverse -e -d $'\\t'  \\\n        --preview-window down:80% --preview '\n            v=$(echo {q} | tr \" \" \"|\"); \n            echo -e {1}\"\\n\"{2} | grep -E \"^|$v\" -i --color=always;\n        ' \\\n    | cut -z -f 1 -d $'\\t' | tr -d '\\n' | xargs -r --null $open > /dev/null 2> /dev/null\n}\n\n```\n- You may replace ``ag -U -g \".pdf$\"`` with another command that returns a list of pdf files.\n- You may replace ``open=...`` with your favorite PDF viewer, for instance ``open=evince`` or ``open=okular``.\n\n# Installation on OSX with homebrew\n\n1. Install [homebrew](https://brew.sh/) and __run__\n```\nbrew install bellecp/fast-p/fast-pdf-finder\n```\n_The above brew formula is experimental. \nPlease report any issues/suggestions/feedback at <https://github.com/bellecp/fast-p/issues/11>_\n\n\n2. __Tweak your .bashrc__. Add the following code to your ``.bashrc``\n```\np () {\n    local open\n    open=open   # on OSX, \"open\" opens a pdf in preview\n    ag -U -g \".pdf$\" \\\n    | fast-p \\\n    | fzf --read0 --reverse -e -d $'\\t'  \\\n        --preview-window down:80% --preview '\n            v=$(echo {q} | gtr \" \" \"|\"); \n            echo -e {1}\"\\n\"{2} | ggrep -E \"^|$v\" -i --color=always;\n        ' \\\n    | gcut -z -f 1 -d $'\\t' | gtr -d '\\n' | gxargs -r --null $open > /dev/null 2> /dev/null\n}\n\n```\n- You may replace ``ag -U -g \".pdf$\"`` with another command that returns a list of pdf files.\n- You may replace ``open=...`` with your favorite PDF viewer, for instance ``open=evince`` or ``open=okular``.\n\n__Remark:__ On OSX, we use the command line tools ``gcut``, ``gxargs``, ``ggrep``, ``gtr``, which are the GNU versions\nof the tools ``cut``, ``xargs``, ``grep``, ``tr``. 
This way, we avoid the specifics of the versions of these tools pre-installed on OSX,\nand the same ``.bashrc`` code can be used for both OSX and GNU Linux systems.\n\n# Usage\n\nUse the command ``p`` to browse among the PDF files in the current directory and its subdirectories.\n\nThe first run of the command will take some time to cache the text extracted from each pdf. Further runs of the command will be much faster since the text extraction will only apply to new pdfs.\n\n# How to clear the cache?\n\nTo clear the cache (which contains text extracted from PDF), you can run 'fast-p --clear-cache'. This will safely remove the file located at:\n``~/.cache/fast-p-pdftotext-output/fast-p_cached_pdftotext_output.db``\n\nFor older versions, please manually delete the cache file found at\n``~/.cache/fast-p_cached_pdftotext_output.db``\n\n# Launch with keyboard shortcut in Ubuntu\n\nOn Ubuntu desktop (tested in 18.04), one may add a keyboard shortcut to launch a new terminal running the ``p`` command right away.\nWith the following script, the new terminal window will automatically close after choosing a PDF.\n\nCreate a file ``~/.fast-p-rc`` with\n```\nsource .bashrc\np;\nsleep 0.15; exit;\n```\nand in Ubuntu Settings/Keyboard, add a custom shortcut that runs the command\n``gnome-terminal -- sh -c \"bash --rcfile .fast-p-rc\"``.\n\n\n\n# See it in action\n\n![illustration of the p command](https://user-images.githubusercontent.com/1019692/34446795-12229072-ecac-11e7-856a-ec0df0de60ae.gif)\n\n\n# Is the historical bash code still available?\n\nYes, see https://github.com/bellecp/fast-p/blob/master/p but using the go binary as explained above is recommended for speed and interoperability.\n\n"
  },
  {
    "path": "main.go",
    "content": "package main\n\nimport (\n\t\"bufio\"\n\t\"encoding/hex\"\n\t\"flag\"\n\t\"fmt\"\n\t\"github.com/boltdb/bolt\"\n\t\"github.com/cespare/xxhash\"\n\t\"github.com/mitchellh/go-homedir\"\n\t\"io\"\n\t\"log\"\n\t\"os\"\n\t\"os/exec\"\n\t\"path/filepath\"\n)\n\n// hash_file_xxhash returns the hex-encoded xxHash digest of the file at filePath.\nfunc hash_file_xxhash(filePath string) (string, error) {\n\tfile, err := os.Open(filePath)\n\tif err != nil {\n\t\treturn \"\", err\n\t}\n\tdefer file.Close()\n\thash := xxhash.New()\n\tif _, err := io.Copy(hash, file); err != nil {\n\t\treturn \"\", err\n\t}\n\treturn hex.EncodeToString(hash.Sum(nil)), nil\n}\n\nfunc main() {\n\tflag.Usage = func() {\n\t\tfmt.Printf(`Usage: fast-p [OPTIONS]\n    Reads a list of PDF filenames from STDIN and returns a list of null-byte\n    separated items of the form\n        filename[TAB]text\n    where \"text\" is the text extracted from the first two pages of the PDF\n    by pdftotext and [TAB] denotes a tab character \"\\t\".\n\n    Common usage of this tool is to pipe the result to FZF with a command in\n    your .bashrc as explained in https://github.com/bellecp/fast-p.\n\n\n`)\n\t\tflag.PrintDefaults()\n\t}\n\tversion := flag.Bool(\"version\", false, \"Display program version\")\n\tclearCache := flag.Bool(\"clear-cache\", false, \"Delete cache file located at: \\n~/.cache/fast-p-pdftotext-output/fast-p_cached_pdftotext_output.db\")\n\tflag.Parse()\n\n\tif *version {\n\t\tfmt.Printf(\"v.0.2.5 \\nhttps://github.com/bellecp/fast-p\\n\")\n\t\tos.Exit(0)\n\t}\n\n\tif *clearCache {\n\t\tremovePath, err := homedir.Expand(\"~/.cache/fast-p-pdftotext-output/fast-p_cached_pdftotext_output.db\")\n\t\tif err != nil {\n\t\t\t// log.Fatal already exits with status 1\n\t\t\tlog.Fatal(err)\n\t\t}\n\t\t// A missing cache file is not an error: there is nothing to clear.\n\t\tif err := os.Remove(removePath); err != nil && !os.IsNotExist(err) {\n\t\t\tlog.Fatal(err)\n\t\t}\n\t\tos.Exit(0)\n\t}\n\n\t// Create ~/.cache folder if it does not exist\n\t// 
https://stackoverflow.com/questions/37932551/mkdir-if-not-exists-using-golang\n\tcachePath, err := homedir.Expand(\"~/.cache/fast-p-pdftotext-output/\")\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\tif err := os.MkdirAll(cachePath, os.ModePerm); err != nil {\n\t\tlog.Fatal(err)\n\t}\n\n\t// open BoltDB cache database\n\tscanner := bufio.NewScanner(os.Stdin)\n\tboltDbFilepath := filepath.Join(cachePath, \"fast-p_cached_pdftotext_output.db\")\n\tdb, err := bolt.Open(boltDbFilepath, 0600, nil)\n\tbucketName := \"fast-p_bucket_for_cached_pdftotext_output\"\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\tdefer db.Close()\n\n\tnullByte := \"\\u0000\"\n\n\t// Abort if the bucket cannot be created: the lookups below assume it exists.\n\tif err := db.Update(func(tx *bolt.Tx) error {\n\t\t_, err := tx.CreateBucketIfNotExists([]byte(bucketName))\n\t\tif err != nil {\n\t\t\treturn fmt.Errorf(\"create bucket: %s\", err)\n\t\t}\n\t\treturn nil\n\t}); err != nil {\n\t\tlog.Fatal(err)\n\t}\n\n\t// missing maps the hash of each uncached PDF to its path;\n\t// alreadySeen deduplicates files with identical content.\n\tmissing := make(map[string]string)\n\talreadySeen := make(map[string]bool)\n\n\tfor scanner.Scan() {\n\t\tfilepath := scanner.Text()\n\t\thash, err := hash_file_xxhash(filepath)\n\t\tif err != nil {\n\t\t\t// Skip unreadable files: an empty hash cannot serve as a cache key.\n\t\t\tlog.Println(\"err\", filepath, err)\n\t\t\tcontinue\n\t\t}\n\t\tif !alreadySeen[hash] {\n\t\t\talreadySeen[hash] = true\n\t\t\tvar content string\n\t\t\tfound := false\n\t\t\terr2 := db.View(func(tx *bolt.Tx) error {\n\t\t\t\tb := tx.Bucket([]byte(bucketName))\n\t\t\t\tv := b.Get([]byte(hash))\n\t\t\t\tif v != nil {\n\t\t\t\t\tfound = true\n\t\t\t\t\tcontent = string(v)\n\t\t\t\t}\n\t\t\t\treturn nil\n\t\t\t})\n\t\t\tif err2 != nil {\n\t\t\t\tlog.Println(err2)\n\t\t\t}\n\t\t\tif found {\n\t\t\t\tfmt.Println(filepath + \"\\t\" + content + nullByte)\n\t\t\t} else {\n\t\t\t\tmissing[hash] = filepath\n\t\t\t}\n\t\t}\n\t}\n\t// Extract text from the first two pages of each uncached PDF and cache it.\n\tfor hash, filepath := range missing {\n\t\tcmd := exec.Command(\"pdftotext\", \"-l\", \"2\", filepath, \"-\")\n\t\tout, err := cmd.CombinedOutput()\n\t\tcontent := string(out)\n\t\tif err != nil {\n\t\t\tlog.Println(err)\n\t\t}\n\t\tfmt.Println(filepath + \"\\t\" + content + nullByte)\n\t\tdb.Update(func(tx *bolt.Tx) error {\n\t\t\tb := 
tx.Bucket([]byte(bucketName))\n\t\t\terr := b.Put([]byte(hash), []byte(content))\n\t\t\tif err != nil {\n\t\t\t\t// Log to stderr; stdout carries the records piped to fzf.\n\t\t\t\tlog.Println(err)\n\t\t\t}\n\t\t\treturn nil\n\t\t})\n\t}\n}\n"
  },
  {
    "path": "p",
    "content": "# This file is kept only for historical reasons.  \n# It is recommended to use the go binary and the installation procedure\n# described at https://github.com/bellecp/fast-p\n\n## Installation\n# - install ``pdftotext``. This comes with the texlive distribution on linux or with poppler on OSX.\n# - install ``fzf``: https://github.com/junegunn/fzf\n# - install ``xxhash``: https://github.com/Cyan4973/xxHash\n# - install ``GNU grep``,  ``ag`` (silver searcher)\n# - clone the repository: ``$ git clone https://github.com/bellecp/fast-p.git`` \n# - add a line ``source fast-p/p`` to your .bashrc or .bash_profile\n# - Run the command ``p``. The first run of the command will take some time to\n# cache the text extracted from each pdf. Further runs of the command will be\n# much faster since the text extraction will only apply to new pdfs.\n#\n## Usage\n#\n# Run the command ``p`` and start typing keywords to search for pdf.\n# Type \"enter\" to view the pdf in the default viewer\n\np () {\n    local DIR open CACHEDLIST PDFLIST\n    PDFLIST=\"/tmp/fewijbbioasBBBB\"\n    CACHEDLIST=\"/tmp/fewijbbioasAAAA\"\n    DIR=\"${HOME}/.cache/pdftotext\"\n    mkdir -p \"${DIR}\"\n    touch \"$DIR/NOOP\"\n    if [ \"$(uname)\" = \"Darwin\" ]; then\n        open=open\n    else\n        open=\"gio open\"\n    fi\n\n    # escape filenames\n    # compute xxh sum\n    # replace separator by tab character\n    # sort to prepare for join\n    # remove duplicates\n    ag -U -g \".pdf$\"| sed 's/\\([ \\o47()\"&;\\\\]\\)/\\\\\\1/g;s/\\o15/\\\\r/g'  \\\n        | xargs xxh64sum \\\n        | sed 's/  /\\t/' \\\n        | sort \\\n        | awk 'BEGIN {FS=\"\\t\"; OFS=\"\\t\"}; !seen[$1]++ {print $1, $2}' \\\n        >| $PDFLIST\n\n    # print (hashsum, cached text) for every previously cached output of pdftotext\n    # remove full path\n    # replace separator by tab character\n    # sort to prepare for join\n    grep \"\" ~/.cache/pdftotext/* \\\n        | sed 
's=.*cache/pdftotext/==' \\\n        | sed 's/:/\\t/' \\\n        | sort \\\n        >| $CACHEDLIST\n\n    {\n        echo \" \"; # starting to type query sends it to fzf right away\n        join -t '\t' $PDFLIST $CACHEDLIST; # already cached pdfs\n        # Next, apply pdftotext to pdfs that haven't been cached yet\n        comm -13 \\\n            <(cat $CACHEDLIST | awk 'BEGIN {FS=\"\\t\"; OFS=\"\\t\"}; {print $1}') \\\n            <(cat $PDFLIST | awk 'BEGIN {FS=\"\\t\"; OFS=\"\\t\"}; {print $1}') \\\n            | join -t '\t' - $PDFLIST \\\n            | awk 'BEGIN {FS=\"\\t\"; OFS=\"\\t\"}; !seen[$1]++ {print $1, $2}' \\\n            | \\\n            while read -r LINE; do\n                local CACHE\n                IFS=\"\t\"; set -- $LINE;\n                CACHE=\"$DIR/$1\"\n                pdftotext -f 1 -l 2 \"$2\" - 2>/dev/null | tr \"\\n\" \"__\" >| $CACHE\n                echo -e \"$1\t$2\t$(cat $CACHE)\"\n            done\n} | fzf --reverse -e -d '\\t'  \\\n    --with-nth=2,3 \\\n    --preview-window down:80% \\\n    --preview '\nv=$(echo {q} | tr \" \" \"|\");\necho {2} | grep -E \"^|$v\" -i --color=always;\necho {3} | tr \"__\" \"\\n\" | grep -E \"^|$v\" -i --color=always;\n' \\\n    | awk 'BEGIN {FS=\"\\t\"; OFS=\"\\t\"}; {print $2}'  \\\n    | sed 's/\\([ \\o47()\"&;\\\\]\\)/\\\\\\1/g;s/\\o15/\\\\r/g'  \\\n    | xargs $open > /dev/null 2> /dev/null\n\n}\n"
  }
]