Repository: gilbertchen/benchmarking
Branch: master
Commit: b56d7e7f9771
Files: 8
Total size: 32.4 KB

Directory structure:
benchmarking/

├── LICENSE
├── README.md
├── common.sh
├── linux-backup-test.sh
├── linux-restore-test.sh
├── tabulate.py
├── vbox-backup-test.sh
└── vbox-restore-test.sh

================================================
FILE CONTENTS
================================================

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2017 

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
## Objective

To benchmark the performance and storage efficiency of 4 backup tools, [Duplicacy](https://github.com/gilbertchen/duplicacy), [restic](https://github.com/restic/restic), [Attic](https://github.com/borgbackup/borg), and [duplicity](http://duplicity.nongnu.org/), using datasets that are publicly available.

## Disclaimer
As the developer of Duplicacy, I have little first-hand experience with the other tools, beyond setting them up and running them for the first time for this performance study.  It is quite possible that the configurations for the other tools are not optimal.  Therefore, the results presented here should not be viewed as conclusive until they are independently confirmed by other people.

## Setup

All tests were performed on a Mac mini 2012 model running macOS Sierra (10.12.3), with a 2.3 GHz quad-core Intel Core i7 processor and 16 GB of memory.

The following table lists several important configuration parameters and algorithms that may have a significant impact on the overall performance.

|                    |   Duplicacy   |   restic              |   Attic    |  duplicity  | 
|:------------------:|:-------------:|:---------------------:|:----------:|:-----------:|
| Version            |   2.0.3      |    0.6.1               |    BorgBackup 1.1.0b6    |    0.7.12    |
| Average chunk size |     1MB<sup>[1]</sup>     |    1MB               |     2MB    |     25MB     |
| Hash               |     blake2    |    SHA256             |  blake2 <sup>[2]</sup>|  SHA1    |
| Compression        |    lz4        |    not implemented    |    lz4     | zlib level 1|
| Encryption         |    AES-GCM    |   AES-CTR             |  AES-CTR   |  GnuPG      |

[1] The chunk size in Duplicacy is configurable, with the default being 4MB.  It was set to 1MB to match that of restic.

[2] Enabled by `-e repokey-blake2`, which is only available in 1.1.0+.
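For reference, both the Duplicacy chunk size and the BorgBackup hash are chosen at storage initialization time; the commands below mirror the ones run by `linux-backup-test.sh` in this repository (the storage paths are placeholders):

```shell
# Duplicacy: encrypted storage with a 1MB average chunk size (default is 4MB)
duplicacy init test /path/to/duplicacy-storage -e -c 1M

# BorgBackup 1.1.0+: repokey mode with the blake2 hash
borg init -e repokey-blake2 /path/to/borg-storage
```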

## Backing up the Linux code base

The first dataset is the [Linux code base](https://github.com/torvalds/linux), chosen mostly because it is the largest GitHub repository we could find and it has frequent commits (good for testing incremental backups).  Its size is 1.76 GB with about 58K files, so it is a relatively small repository consisting of small files, but it represents a popular use case where a backup tool runs alongside a version control program such as git to frequently save changes made between check-ins.

To test incremental backups, a random commit from July 2016 was selected, and the entire code base was rolled back to that commit.  After the initial backup finished, further commits were chosen such that they were about one month apart.  The code base was then moved forward to these commits one by one to emulate incremental changes.  Details can be found in `linux-backup-test.sh`.

Backups were all saved to a storage directory on the same hard disk as the code base, to eliminate the performance variations introduced by different implementations of networked or cloud storage backends.

Here are the elapsed real times (in seconds) as reported by the `time` command, with the user and system CPU times in parentheses:

|                    |   Duplicacy  |   restic   |   Attic    |  duplicity  | 
|:------------------:|:----------------:|:----------:|:----------:|:-----------:|
| Initial backup | 13.7 (16.9, 1.6) | 20.7 (69.9, 9.9) | 26.9 (23.1, 3.1) | 44.2 (56.3, 4.6) | 
| 2nd backup | 4.8 (4.8, 0.5) | 8.0 (15.3, 2.5) | 15.4 (13.4, 1.5) | 19.5 (17.9, 1.1) | 
| 3rd backup | 6.9 (8.0, 1.0) | 11.9 (32.2, 4.0) | 19.6 (16.4, 2.0) | 29.8 (29.3, 1.9) | 
| 4th backup | 3.3 (3.1, 0.4) | 7.0 (12.7, 2.2) | 13.7 (12.1, 1.2) | 18.6 (17.3, 0.9) | 
| 5th backup | 9.9 (11.0, 1.0) | 11.4 (33.5, 3.8) | 19.9 (17.1, 2.1) | 28.0 (27.6, 1.5) | 
| 6th backup | 3.8 (3.9, 0.5) | 8.0 (17.7, 2.7) | 16.8 (14.1, 1.6) | 22.0 (20.7, 1.0) | 
| 7th backup | 5.1 (5.1, 0.5) | 7.8 (16.0, 2.4) | 14.3 (12.6, 1.3) | 21.6 (20.3, 1.0) | 
| 8th backup | 9.5 (10.8, 1.1) | 13.5 (49.3, 4.8) | 18.3 (15.9, 1.8) | 35.0 (33.6, 1.9) | 
| 9th backup | 4.3 (4.5, 0.6) | 9.0 (20.6, 2.8) | 15.7 (13.7, 1.5) | 24.9 (23.6, 1.1) | 
| 10th backup | 7.9 (9.1, 0.9) | 20.2 (38.4, 4.7) | 32.2 (18.1, 2.3) | 35.0 (33.8, 1.8) | 
| 11th backup | 4.6 (4.5, 0.6) | 9.1 (19.6, 2.8) | 16.8 (14.5, 1.7) | 28.1 (26.4, 1.3) | 
| 12th backup | 7.4 (8.8, 1.0) | 12.0 (38.4, 4.0) | 21.7 (18.4, 2.2) | 37.4 (37.0, 2.0) | 
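The numbers in these tables were extracted from the captured `time` output by `tabulate.py`; a minimal sketch of that parsing step (the regular expression mirrors the one in `tabulate.py`):

```python
import re

def parse_time(line):
    """Return the seconds (as a 1-decimal string) from a line of `time`
    output such as 'real\\t0m13.7s', or None if the line doesn't match."""
    m = re.match(r"(real|user|sys)\s+(\d+)m([\d.]+)s", line)
    if m is None:
        return None
    seconds = int(m.group(2)) * 60 + float(m.group(3))
    return "%.1f" % seconds

print(parse_time("real\t0m13.7s"))  # 13.7
print(parse_time("user\t1m09.9s"))  # 69.9
```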


Clearly Duplicacy was the winner by a comfortable margin.  It is interesting that restic, while the second fastest, consumed far more CPU time than elapsed real time, which is bad for the use case where users want to keep the backup tool running in the background to minimize interference with other tasks.  This could be caused by using too many threads (or more precisely goroutines, since restic is written in Go) in its local storage backend implementation.  However, even if this issue is fixable, restic currently does not support compression, and adding compression would only further slow down its backup speeds.

Now let us look at the sizes of the backup storage after each backup:

|                    |   Duplicacy  |   restic   |   Attic    |  duplicity  | 
|:------------------:|:----------------:|:----------:|:----------:|:-----------:|
| Initial backup     | 224MB | 631MB | 259MB | 183MB |
| 2nd backup         | 246MB | 692MB | 280MB | 185MB |
| 3rd backup         | 333MB | 912MB | 367MB | 203MB |
| 4th backup         | 340MB | 934MB | 373MB | 204MB |
| 5th backup         | 429MB | 1.1GB | 466MB | 222MB |
| 6th backup         | 457MB | 1.2GB | 492MB | 224MB |
| 7th backup         | 475MB | 1.2GB | 504MB | 227MB |
| 8th backup         | 576MB | 1.5GB | 607MB | 247MB |
| 9th backup         | 609MB | 1.6GB | 636MB | 251MB |
| 10th backup        | 706MB | 1.8GB | 739MB | 268MB |
| 11th backup        | 734MB | 1.9GB | 766MB | 270MB |
| 12th backup        | 834MB | 2.2GB | 869MB | 294MB |


Although duplicity was the most storage efficient, it should be noted that it uses zlib, which is known to compress better than the lz4 used by Duplicacy and Attic.  In addition, duplicity has a serious flaw in its incremental model: the user has to decide whether to perform a full backup or an incremental backup on each run.  While an incremental backup saves a lot of storage space, by design it also depends on previous backups, making it impossible to delete any single backup in a long chain of dependent backups.  So duplicity users always face a dilemma over how often to perform a full backup.
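The dependency problem can be illustrated with a toy model (hypothetical, not duplicity's actual on-disk format): restoring backup *n* requires every backup back to the most recent full one, so nothing in that span can be pruned.

```python
def restore_dependencies(chain, n):
    """Indices of all backups needed to restore backup n in a chain of
    'full' and 'incr' entries."""
    last_full = max(i for i in range(n + 1) if chain[i] == "full")
    return list(range(last_full, n + 1))

# Twelve backups with a single initial full backup: restoring the last
# one needs the entire chain, so no earlier backup can be deleted.
chain = ["full"] + ["incr"] * 11
print(restore_dependencies(chain, 11))   # all twelve indices, 0 through 11

# A periodic full backup shortens the chain, at the cost of extra space:
chain = ["full", "incr", "incr", "full", "incr", "incr"]
print(restore_dependencies(chain, 5))    # [3, 4, 5]
```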

We also ran `linux-restore-test.sh` to test restore speeds.  The destination directory was emptied before each restore, so only full restores were tested, not incremental restores.  Again, Duplicacy was not only the fastest but also the most stable.  The restore times of restic and Attic increased considerably for later backups, with restic's performance deteriorating far more quickly.  This is perhaps because both restic and Attic group a number of chunks into a pack, so restoring a later backup may require unpacking many packs belonging to earlier backups.  In contrast, chunks in Duplicacy are independent entities that are never packed, so any backup can be quickly restored from exactly the chunks that compose it, without retrieving data from other backups.

|                    |   Duplicacy  |   restic   |   Attic    |  duplicity  | 
|:------------------:|:----------------:|:----------:|:----------:|:-----------:|
| 1st restore | 38.8 (18.4, 11.5) | 38.4 (17.3, 8.6) | 81.5 (18.8, 12.5) | 251.6 (133.4, 51.9) | 
| 2nd restore | 35.2 (11.5, 12.9) | 92.7 (25.1, 12.6) | 41.1 (17.0, 11.4) | 256.6 (133.7, 48.4) | 
| 3rd restore | 33.9 (9.7, 10.9) | 136.7 (27.7, 15.0) | 35.3 (17.3, 11.5) | 231.4 (134.5, 46.9) | 
| 4th restore | 34.5 (14.0, 10.8) | 149.7 (26.9, 15.1) | 46.4 (17.9, 12.5) | 213.8 (134.5, 43.5) | 
| 5th restore | 30.2 (9.4, 9.4) | 198.3 (28.6, 17.3) | 58.2 (18.9, 13.3) | 236.4 (134.3, 49.2) | 
| 6th restore | 34.7 (11.2, 9.3) | 348.6 (30.2, 20.8) | 65.5 (19.5, 13.4) | 250.7 (135.3, 40.9) | 
| 7th restore | 36.8 (9.2, 9.6) | 238.8 (29.3, 18.6) | 64.8 (19.4, 13.6) | 225.7 (125.1, 42.7) | 
| 8th restore | 26.0 (9.7, 8.1) | 251.5 (32.5, 21.7) | 83.1 (20.9, 14.3) | 261.0 (126.0, 45.3) | 
| 9th restore | 31.5 (8.8, 8.7) | 269.5 (31.0, 21.0) | 80.3 (20.5, 14.1) | 230.6 (126.8, 43.0) | 
| 10th restore | 40.5 (8.7, 8.1) | 290.6 (32.0, 22.1) | 91.9 (21.5, 15.0) | 242.4 (128.9, 46.3) | 
| 11th restore | 34.6 (8.3, 7.6) | 472.7 (33.0, 26.3) | 125.3 (22.3, 15.1) | 278.5 (127.9, 49.1) | 
| 12th restore | 76.4 (20.4, 13.1) | 387.7 (33.4, 24.7) | 103.2 (23.1, 16.1) | 240.3 (134.9, 44.8) | 
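The packing effect described above can be sketched with a toy model (a hypothetical layout, not either tool's real storage format): when chunks are grouped into packs, a later backup's chunks end up scattered across packs written during earlier backups, all of which must then be opened.

```python
def packs_to_open(packs, wanted):
    """Return the indices of the packs that must be opened to recover
    the wanted chunks."""
    wanted = set(wanted)
    return [i for i, chunks in enumerate(packs) if wanted & set(chunks)]

# Packs written during backups 1-3; a later backup reuses old chunks
# "a" and "d", so every pack has to be touched.
packs = [["a", "b"], ["c", "d"], ["e", "f"]]
print(packs_to_open(packs, ["a", "d", "e"]))  # [0, 1, 2]

# With independent, unpacked chunks (one chunk per storage object),
# only the chunks actually needed are read:
independent = [["a"], ["b"], ["c"], ["d"], ["e"], ["f"]]
print(packs_to_open(independent, ["a", "d", "e"]))  # [0, 3, 4]
```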


## Backing up a VirtualBox virtual machine

The second test targeted the other end of the spectrum: a dataset with fewer but much larger files.  Virtual machine images typically fall into this category.  The dataset for this test is a VirtualBox virtual machine file.  The base disk image is 64-bit CentOS 7, downloaded from http://www.osboxes.org/centos/.  Its size is about 4 GB, still small compared to virtual machines in everyday use, but large enough to quantify the performance differences between these 4 backup tools.

The first backup was performed right after the virtual machine had been set up without installing any software.  The second backup was performed after installing common developer tools using the command `yum groupinstall 'Development Tools'`.  The third backup was performed after a power on immediately followed by a power off.

The following table lists the backup times for these 4 tools.  With default settings, Duplicacy was generally slower than Attic.  However, this is mainly because Attic does not [compute file hashes](https://www.bountysource.com/issues/31735500-show-which-distinct-versions-of-a-file-exist), while Duplicacy does.  For a fair comparison, an option was added to Duplicacy to disable file hash computation, and that made Duplicacy slightly faster than Attic.  This is not to say that this option should become the default.  Although chunk hashes alone can guarantee the integrity of backups, file hashes are useful in many ways.  For instance, file hashes enable users to quickly identify which files in existing backups have changed.  They also allow third-party tools to compare files on disk to those in the backups.  It is unlikely that Duplicacy will stop computing file hashes by default in exchange for slight performance gains.
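As a sketch of how file hashes enable such third-party comparisons (this uses Python's `hashlib` with BLAKE2b purely for illustration; it is not Duplicacy's actual hashing code):

```python
import hashlib

def file_hash(path):
    """Hash a file's contents incrementally with BLAKE2b, reading in
    1MB blocks so large files don't need to fit in memory."""
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

# Comparing a file on disk against a hash recorded in a backup is then
# a simple equality check:
#     if file_hash("/restored/etc/hosts") != recorded_hash: report change
```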

Surprisingly, restic was the fastest for the third backup.  This can be explained partly by its lack of compression and partly by its high CPU usage.

|                    |   Duplicacy (default settings)  | Duplicacy (no file hash) |   restic   |   Attic    |  duplicity  | 
|:------------------:|:----------------:|:----------------:|:----------:|:----------:|:-----------:|
| Initial backup | 80.6 (100.7, 3.3) | 41.4 (57.7, 3.2) | 136.5 (116.4, 13.7) | 47.6 (46.9, 4.9) | 255.6 (226.9, 18.5) | 
| 2nd backup | 49.4 (52.9, 2.0) | 36.5 (40.8, 2.1) | 32.2 (70.4, 4.8) | 39.2 (34.2, 2.4) | 334.3 (343.4, 4.6) | 
| 3rd backup | 45.7 (44.6, 1.4) | 34.5 (33.1, 1.4) | 17.3 (55.1, 2.2) | 36.1 (31.8, 1.7) | 42.0 (35.3, 2.2) | 
 
Not surprisingly, duplicity is still the most storage efficient, with restic the least:

|                    |   Duplicacy  |   restic   |   Attic    |  duplicity  | 
|:------------------:|:----------------:|:----------:|:----------:|:-----------:|
| Initial backup     | 2.0G | 4.1G | 2.0G | 1.7G |
| 2nd backup         | 2.6G | 5.0G | 2.6G | 1.9G |
| 3rd backup         | 2.6G | 5.1G | 2.7G | 1.9G |

A full restore was also performed for each backup.  Again, not computing file hashes improved performance, but at the risk of undetected data corruption.

|                    |   Duplicacy (default settings)  | Duplicacy (no file hash)  |   restic   |   Attic    |  duplicity  | 
|:------------------:|:----------------:|:----------------:|:----------:|:----------:|:-----------:|
| 1st restore | 130.5 (72.4, 5.7) | 76.8 (23.7, 4.2) | 202.6 (52.1, 7.1) | 99.6 (30.9, 6.6) | 728.3 (195.6, 87.0) | 
| 2nd restore | 138.9 (79.4, 5.4) | 121.5 (27.8, 6.6) | 230.8 (59.5, 8.3) | 115.7 (35.6, 7.8) | 720.5 (191.2, 87.7) | 
| 3rd restore | 145.4 (73.2, 5.4) | 123.9 (27.7, 6.6) | 244.8 (59.8, 8.1) | 122.2 (35.7, 7.9) | 749.7 (196.1, 87.9) | 

## Conclusion

The performance of 4 different backup tools was compared on two publicly available datasets.  Duplicacy is clearly the top performer for the first dataset, and as fast as Attic for the second when file hash computation is disabled.  However, it should be noted that both datasets are small and may be very different in nature from the data you intend to back up.  Therefore, I strongly encourage you to run your own experiments using the scripts in this GitHub repository in order to determine which tool is best for you.
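A minimal invocation, pieced together from the scripts' usage messages and `common.sh` (the tool path and GPG key below are placeholders you must supply):

```shell
# Tools are discovered via `which` unless these variables are set;
# a tool whose variable ends up empty is simply skipped.
export DUPLICACY_PATH=/usr/local/bin/duplicacy
export GPG_KEY=XXXXXXXX            # required, or duplicity is skipped

./linux-backup-test.sh /tmp/benchmark &> linux-backup-test.results
./linux-restore-test.sh /tmp/benchmark &> linux-restore-test.results
python tabulate.py linux-backup-test.results
```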


================================================
FILE: common.sh
================================================

if [ -z "$DUPLICACY_PATH" ]; then
    DUPLICACY_PATH="`which duplicacy 2>/dev/null || echo ""`"
fi

if [ -z "$RESTIC_PATH" ]; then
    RESTIC_PATH="`which restic  2>/dev/null || echo ""`"
fi

if [ -z "$ATTIC_PATH" ]; then
    ATTIC_PATH="`which attic 2>/dev/null || echo ""`"
fi

if [ -z "$DUPLICITY_PATH" ]; then
    DUPLICITY_PATH="`which duplicity 2>/dev/null || echo ""`"
fi

if [ -z "$RDEDUP_PATH" ]; then
    RDEDUP_PATH="`which rdedup 2>/dev/null || echo ""`"
fi

if [ -z "$RDUP_PATH" ]; then
    RDUP_PATH="`which rdup 2>/dev/null || echo ""`"
fi

if [ -z "$RDEDUP_PATH" -o -z "$RDUP_PATH" ]; then
    RDEDUP_PATH=""
    RDUP_PATH=""
fi

if [ ! -z "$DUPLICITY_PATH" ]; then
    if [ -z "$GPG_KEY" ]; then
        echo "GPG_KEY must be set for duplicity to work properly"
        DUPLICITY_PATH=""
    fi
fi

BACKUP_DIR="`realpath ${TEST_DIR}/linux`"

DUPLICACY_STORAGE=${TEST_DIR}/linux-duplicacy-storage
RESTIC_STORAGE=${TEST_DIR}/linux-restic-storage
ATTIC_STORAGE=${TEST_DIR}/linux-attic-storage
DUPLICITY_STORAGE=${TEST_DIR}/linux-duplicity-storage
RDEDUP_STORAGE=${TEST_DIR}/linux-rdedup-storage

DUPLICACY_RESTORE=${TEST_DIR}/linux-duplicacy-restore
RESTIC_RESTORE=${TEST_DIR}/linux-restic-restore
ATTIC_RESTORE=${TEST_DIR}/linux-attic-restore
DUPLICITY_RESTORE=${TEST_DIR}/linux-duplicity-restore
RDEDUP_RESTORE=${TEST_DIR}/linux-rdedup-restore

# Used as the storage password throughout the tests
PASSWORD=12345678




================================================
FILE: linux-backup-test.sh
================================================
#!/bin/bash

set -o errexit
set -o pipefail

if [ "$#" -eq 0 ]; then
    echo "Usage: $0 <test dir>"
    exit 1
fi


# Set up directories
TEST_DIR="`realpath $1`"
source "common.sh"

# Clean up the storages
rm -rf ${DUPLICACY_STORAGE}
mkdir -p ${DUPLICACY_STORAGE}
rm -rf ${RESTIC_STORAGE}
mkdir -p ${RESTIC_STORAGE}
rm -rf ${ATTIC_STORAGE}
mkdir -p ${ATTIC_STORAGE}
rm -rf ${DUPLICITY_STORAGE}
mkdir -p ${DUPLICITY_STORAGE}
rm -rf ${RDEDUP_STORAGE}
mkdir -p ${RDEDUP_STORAGE}

# Download the github repository if needed
if [ ! -d "${BACKUP_DIR}" ]; then
    git clone https://github.com/torvalds/linux.git ${BACKUP_DIR}
fi

function duplicacy_backup()
{
    pushd ${BACKUP_DIR}
    time env DUPLICACY_PASSWORD=${PASSWORD} ${DUPLICACY_PATH} backup -stats | grep -v Uploaded
    popd
}

function restic_backup()
{
    time env RESTIC_PASSWORD=${PASSWORD} ${RESTIC_PATH} -r ${RESTIC_STORAGE} --exclude-file=${BACKUP_DIR}/.duplicacy/restic-exclude backup ${BACKUP_DIR}
}

function attic_backup()
{
    time env BORG_PASSPHRASE=${PASSWORD} ${ATTIC_PATH} create --compression lz4 ${ATTIC_STORAGE}::$1 ${BACKUP_DIR} --exclude-from ${BACKUP_DIR}/.duplicacy/attic-exclude 
}

function duplicity_backup()
{
    time ${DUPLICITY_PATH} -v0 --encrypt-key ${GPG_KEY} --sign-key ${GPG_KEY} --gpg-options "--compress-level=1" --exclude-filelist ${BACKUP_DIR}/.duplicacy/duplicity-exclude ${BACKUP_DIR} file://${DUPLICITY_STORAGE}
}

function rdedup_backup()
{
    local TS=$(date '+%y%m%d%H%M%S')
    time bash -c "${RDUP_PATH} -n -E ${BACKUP_DIR}/.duplicacy/rdedup-exclude /dev/null ${BACKUP_DIR} | ${RDEDUP_PATH} --dir ${RDEDUP_STORAGE} store $TS"
}

function all_backup()
{
    echo ======================================== backup $1 ========================================
    if [ ! -z "$DUPLICACY_PATH" ]; then
        duplicacy_backup
    fi
    if [ ! -z "$RESTIC_PATH" ]; then
        restic_backup
    fi
    if [ ! -z "$ATTIC_PATH" ]; then
        attic_backup $1
    fi
    if [ ! -z "$DUPLICITY_PATH" ]; then
        duplicity_backup
    fi
    if [ ! -z "$RDEDUP_PATH" ]; then
        rdedup_backup
    fi
    du -sh ${TEST_DIR}/linux-*-storage
}

echo =========================================== init ========================================
rm -rf ${BACKUP_DIR}/.duplicacy
mkdir -p ${BACKUP_DIR}/.duplicacy

if [ ! -z "$DUPLICACY_PATH" ]; then
    pushd ${BACKUP_DIR}
    env DUPLICACY_PASSWORD=${PASSWORD} ${DUPLICACY_PATH} init test ${DUPLICACY_STORAGE} -e -c 1M
    echo "-.git/" > ${BACKUP_DIR}/.duplicacy/filters
    popd
fi

if [ ! -z "$RESTIC_PATH" ]; then
    echo ".git/**" > ${BACKUP_DIR}/.duplicacy/restic-exclude
    echo ".duplicacy/**" >> ${BACKUP_DIR}/.duplicacy/restic-exclude
    env RESTIC_PASSWORD=${PASSWORD} ${RESTIC_PATH} -r ${RESTIC_STORAGE} init
fi

if [ ! -z "$ATTIC_PATH" ]; then
    echo "${BACKUP_DIR}/.git/*" > ${BACKUP_DIR}/.duplicacy/attic-exclude
    echo "${BACKUP_DIR}/.duplicacy/*" >> ${BACKUP_DIR}/.duplicacy/attic-exclude
    env BORG_PASSPHRASE=${PASSWORD} ${ATTIC_PATH} init -e repokey-blake2 ${ATTIC_STORAGE}
fi

if [ ! -z "$DUPLICITY_PATH" ]; then
    echo "- ${BACKUP_DIR}/.git" > ${BACKUP_DIR}/.duplicacy/duplicity-exclude
    echo "- ${BACKUP_DIR}/.duplicacy" >> ${BACKUP_DIR}/.duplicacy/duplicity-exclude
fi

if [ ! -z "$RDEDUP_PATH" ]; then
    echo "${BACKUP_DIR}/.git" > ${BACKUP_DIR}/.duplicacy/rdedup-exclude
    echo "${BACKUP_DIR}/.duplicacy" >> ${BACKUP_DIR}/.duplicacy/rdedup-exclude
    env RDEDUP_PASSPHRASE=${PASSWORD} rdedup --dir ${RDEDUP_STORAGE} init --chunk-size 1M
fi

du -sh ${TEST_DIR}/linux-*-storage

cd ${BACKUP_DIR}

git checkout -f 4f302921c1458d790ae21147f7043f4e6b6a1085 # commit on 07/02/2016
all_backup 1

git checkout -f 3481b68285238054be519ad0c8cad5cc2425e26c # commit on 08/03/2016 
all_backup 2

git checkout -f 46e36683f433528bfb7e5754ca5c5c86c204c40a # commit on 09/02/2016 
all_backup 3

git checkout -f 566c56a493ea17fd321abb60d59bfb274489bb18 # commit on 10/05/2016 
all_backup 4

git checkout -f 1be81ea5860744520e06d0dfb9e3490b45902dbb # commit on 11/01/2016 
all_backup 5

git checkout -f ef3d232245ab7a1bf361c52449e612e4c8b7c5ab # commit on 12/02/2016 
all_backup 6

git checkout -f 0e377f3b9ae936aefe5aaca4c2e2546d57b63df7 # commit on 01/05/2017
all_backup 7

git checkout -f cb23ebdfa6a491cf2173323059d846b4c5c9264e # commit on 02/04/2017 
all_backup 8

git checkout -f 67db256ed1e09fa03551f90ab3562df34c802a0b # commit on 03/02/2017 
all_backup 9

git checkout -f 1aed89640a899cd695bbfc976a4356affa474646 # commit on 04/05/2017 
all_backup 10

git checkout -f a6128f47f7940d8388ca7c8623fbe24e52f8fae6 # commit on 05/05/2017 
all_backup 11

git checkout -f 57caf4ec2b8bfbcb4f738ab5a12eedf3a8786045 # commit on 06/05/2017 
all_backup 12



================================================
FILE: linux-restore-test.sh
================================================
#!/bin/bash

#
# This script is to be run after linux-backup-test.sh.  It will restore backups
# in TEST_DIR/linux-*-storage to TEST_DIR/linux-*-restore
#
# NOTE:
#    Please make sure that this script doesn't run past midnight; otherwise it
# will not be able to restore duplicity backups, because it assumes backups
# were created on the same day.

if [ "$#" -eq 0 ]; then
    echo "Usage: $0 <test dir>"
    exit 1
fi


# Set up directories
TEST_DIR="`realpath $1`"
source "common.sh"

rm -rf ${DUPLICACY_RESTORE}
mkdir -p ${DUPLICACY_RESTORE}
rm -rf ${RESTIC_RESTORE}
mkdir -p ${RESTIC_RESTORE}
rm -rf ${ATTIC_RESTORE}
mkdir -p ${ATTIC_RESTORE}
rm -rf ${DUPLICITY_RESTORE}
mkdir -p ${DUPLICITY_RESTORE}
rm -rf ${RDEDUP_RESTORE}
mkdir -p ${RDEDUP_RESTORE}

function duplicacy_restore()
{  
    rm -rf ${DUPLICACY_RESTORE}/* 
    pushd ${DUPLICACY_RESTORE}
    time env DUPLICACY_PASSWORD=${PASSWORD} ${DUPLICACY_PATH} restore -r $1 -stats | grep -v Downloaded
    popd
}


function restic_restore()
{
    rm -rf ${RESTIC_RESTORE}/* 
    # We need to find the snapshot id to restore
    TODAY=`date +"%Y-%m-%d"`
    SNAPSHOT=`env RESTIC_PASSWORD=${PASSWORD} ${RESTIC_PATH} -r ${RESTIC_STORAGE} snapshots | grep $TODAY | head -n $1 | tail -n 1 | awk '{print $1;}'`
    echo Restoring from $SNAPSHOT
    time env RESTIC_PASSWORD=${PASSWORD} ${RESTIC_PATH} -r ${RESTIC_STORAGE} restore $SNAPSHOT --target ${RESTIC_RESTORE}
}

function attic_restore()
{
    rm -rf ${ATTIC_RESTORE}/* 
    pushd ${ATTIC_RESTORE}
    time env BORG_PASSPHRASE=${PASSWORD} ${ATTIC_PATH} extract ${ATTIC_STORAGE}::$1    
    popd
}

function duplicity_restore()
{
    rm -rf ${DUPLICITY_RESTORE}/* 
    # duplicity is crazy -- the --restore-time option doesn't take the time format printed by its own collection-status command!
    TODAY=`date +"%Y-%m-%d"`
    RESTORE_TIME=`${DUPLICITY_PATH} -v0 --encrypt-key ${GPG_KEY} --sign-key ${GPG_KEY} collection-status file://${DUPLICITY_STORAGE} | grep 'Full\|Incremental' | head -n $1 | tail -n 1 | awk '{print $5;}'`
    RESTORE_TIME=${TODAY}T${RESTORE_TIME}
    echo Restoring from $RESTORE_TIME
    time ${DUPLICITY_PATH} --force -v0 --encrypt-key ${GPG_KEY} restore -t $RESTORE_TIME file://${DUPLICITY_STORAGE} ${DUPLICITY_RESTORE}
}

function rdedup_restore()
{
    rm -rf ${RDEDUP_RESTORE}/* 
    RESTORE_NAME="`${RDEDUP_PATH} --dir ${RDEDUP_STORAGE} list | sort | head -n $1 | tail -n 1`"
    echo Restoring from $RESTORE_NAME
    time bash -c "env RDEDUP_PASSPHRASE=${PASSWORD} ${RDEDUP_PATH} --dir ${RDEDUP_STORAGE} load $RESTORE_NAME | ${RDUP_PATH}-up -r ${BACKUP_DIR} ${RDEDUP_RESTORE}"
}


function all_restore()
{

    echo ======================================== restore $1 ========================================
    if [ ! -z "$DUPLICACY_PATH" ]; then
        duplicacy_restore $1
    fi
    if [ ! -z "$RESTIC_PATH" ]; then
        restic_restore $1
    fi
    if [ ! -z "$ATTIC_PATH" ]; then
        attic_restore $1
    fi
    if [ ! -z "$DUPLICITY_PATH" ]; then
        duplicity_restore $1
    fi
    if [ ! -z "$RDEDUP_PATH" ]; then
        rdedup_restore $1
    fi
}

# Initialize the duplicacy directory to be restored
if [ ! -z "$DUPLICACY_PATH" ]; then
    pushd ${DUPLICACY_RESTORE}
    env DUPLICACY_PASSWORD=${PASSWORD} ${DUPLICACY_PATH} init test ${DUPLICACY_STORAGE} -e
    popd
fi

if [ ! -z "$RESTIC_PATH" ]; then
    echo restic snapshots:
    env RESTIC_PASSWORD=${PASSWORD} ${RESTIC_PATH} -r ${RESTIC_STORAGE} snapshots
fi

if [ ! -z "$DUPLICITY_PATH" ]; then
    echo duplicity archives: 
    ${DUPLICITY_PATH} -v0 --encrypt-key ${GPG_KEY} --sign-key ${GPG_KEY} collection-status file://${DUPLICITY_STORAGE} | grep "Full\|Incremental"
fi

for i in `seq 1 12`; do
    all_restore $i
done



================================================
FILE: tabulate.py
================================================
#!/usr/bin/env python3

import sys
import re

#
# This script extracts the elapsed times from the output of
# linux-backup-test.sh or linux-restore-test.sh
#
# Usage:
#
#     ./linux-backup-test.sh &> linux-backup-test.results
#     python tabulate.py linux-backup-test.results

def getBackup(i):
    l = ["Initial", "2nd", "3rd"]
    if i < len(l):
        return l[i] + " backup"
    else:
        return str(i + 1) + "th backup"

def getTime(minute, second):
    t = int(minute) * 60 + float(second)
    return "%.1f" % t

if len(sys.argv) <= 1:
    print("usage:", sys.argv[0], "<test result file>")
    sys.exit(1)

i = 0
for line in open(sys.argv[1]).readlines():
    if line.startswith("====") and "init" not in line:
        print("\n|", getBackup(i), "|", end=" ")
        i += 1
        continue
    m = re.match(r"real\s+(\d+)m([\d.]+)s", line)
    if m:
        print(getTime(m.group(1), m.group(2)), end=" ")
        continue

    m = re.match(r"user\s+(\d+)m([\d.]+)s", line)
    if m:
        print("(", getTime(m.group(1), m.group(2)), ",", end=" ")
        continue
    m = re.match(r"sys\s+(\d+)m([\d.]+)s", line)
    if m:
        print(getTime(m.group(1), m.group(2)), ") |", end=" ")
        continue

print("")
     


================================================
FILE: vbox-backup-test.sh
================================================
#!/bin/bash

#
# Usage:
#     vbox-backup-test.sh <vm dir> <test dir> <action>
#
#     <vm dir>: the directory that contains the virtual machine; can't have spaces in the path
#     <test dir>: where the storage directories will be created
#     <action>: init or backup; init will also run the initial backup
#

if [ "$#" -lt 2 ]; then
    echo "Usage: $0 <vm dir> <test dir> <action>"
    exit 1
fi

if [ -z "$DUPLICACY_PATH" ]; then
    echo "DUPLICACY_PATH must be set to the path of the Duplicacy executable"
    exit 1
fi

if [ -z "$RESTIC_PATH" ]; then
    echo "RESTIC_PATH must be set to the path of the restic executable"
    exit 1
fi

if [ -z "$ATTIC_PATH" ]; then
    echo "ATTIC_PATH must be set to the path of the attic executable"
    exit 1
fi

if [ -z "$DUPLICITY_PATH" ]; then
    echo "DUPLICITY_PATH must be set to the path of the duplicity executable"
    exit 1
fi

if [ -z "$GPG_KEY" ]; then
    echo "GPG_KEY must be set for duplicity to work properly"
    exit 1 
fi

if [ -z "$PASSPHRASE" ]; then
    echo "PASSPHRASE must be set for duplicity to work properly"
    exit 1 
fi

# Set up directories
BACKUP_DIR=$1
TEST_DIR=$2
ACTION=$3
DUPLICACY_STORAGE=${TEST_DIR}/vbox-duplicacy-storage
RESTIC_STORAGE=${TEST_DIR}/vbox-restic-storage
ATTIC_STORAGE=${TEST_DIR}/vbox-attic-storage
DUPLICITY_STORAGE=${TEST_DIR}/vbox-duplicity-storage

# Used as the storage password throughout the tests
PASSWORD=12345678

function duplicacy_backup()
{
    time env DUPLICACY_PASSWORD=${PASSWORD} ${DUPLICACY_PATH} backup -stats | grep -v Uploaded | grep -v Skipped
}

function restic_backup()
{
    time env RESTIC_PASSWORD=${PASSWORD} ${RESTIC_PATH} -r ${RESTIC_STORAGE} --exclude-file=${BACKUP_DIR}/.duplicacy/restic-exclude backup ${BACKUP_DIR}
}

function attic_backup()
{
    time env BORG_PASSPHRASE=${PASSWORD} ${ATTIC_PATH} create --stats --debug --compression lz4 ${ATTIC_STORAGE}::$1 ${BACKUP_DIR} --exclude-from ${BACKUP_DIR}/.duplicacy/attic-exclude 
}

function duplicity_backup()
{
    time ${DUPLICITY_PATH} -v0 --encrypt-key ${GPG_KEY} --sign-key ${GPG_KEY} --gpg-options "--compress-level=1" --exclude-filelist ${BACKUP_DIR}/.duplicacy/duplicity-exclude ${BACKUP_DIR} file://${DUPLICITY_STORAGE}
}

function all_backup()
{
    echo ======================================== backup $1 ========================================
    duplicacy_backup
    restic_backup
    attic_backup $1
    duplicity_backup
    du -sh ${TEST_DIR}/vbox-*-storage
}

pushd ${BACKUP_DIR}

INDEX_FILE=${TEST_DIR}/vbox.index

if [ -e ${INDEX_FILE} ]; then
    INDEX=$((`cat ${INDEX_FILE}` + 1))
elif [ "$ACTION" != "init" ]; then
    echo "${INDEX_FILE} not found; run with the init action first"
    exit 1
fi

if [ "$ACTION" == "init" ]; then

   echo =========================================== init ========================================
   # Clean up the storages
   rm -rf ${DUPLICACY_STORAGE}
   mkdir -p ${DUPLICACY_STORAGE}
   rm -rf ${RESTIC_STORAGE}
   mkdir -p ${RESTIC_STORAGE}
   rm -rf ${ATTIC_STORAGE}
   mkdir -p ${ATTIC_STORAGE}
   rm -rf ${DUPLICITY_STORAGE}
   mkdir -p ${DUPLICITY_STORAGE}

   rm -rf ${BACKUP_DIR}/.duplicacy
   env DUPLICACY_PASSWORD=${PASSWORD} ${DUPLICACY_PATH} init test ${DUPLICACY_STORAGE} -e -c 2M
   echo "-.git/" > ${BACKUP_DIR}/.duplicacy/filters

   echo ".git/**" > ${BACKUP_DIR}/.duplicacy/restic-exclude
   echo ".duplicacy/**" >> ${BACKUP_DIR}/.duplicacy/restic-exclude
   env RESTIC_PASSWORD=${PASSWORD} ${RESTIC_PATH} -r ${RESTIC_STORAGE} init

   echo "${BACKUP_DIR}/.git/*" > ${BACKUP_DIR}/.duplicacy/attic-exclude
   echo "${BACKUP_DIR}/.duplicacy/*" >> ${BACKUP_DIR}/.duplicacy/attic-exclude
   env BORG_PASSPHRASE=${PASSWORD} ${ATTIC_PATH} init -e repokey ${ATTIC_STORAGE}

   echo "- ${BACKUP_DIR}/.git" > ${BACKUP_DIR}/.duplicacy/duplicity-exclude
   echo "- ${BACKUP_DIR}/.duplicacy" >> ${BACKUP_DIR}/.duplicacy/duplicity-exclude

   du -sh ${TEST_DIR}/vbox-*-storage

   INDEX=1
   echo ${INDEX} > ${INDEX_FILE}
fi

echo Backup ${INDEX}
all_backup ${INDEX}
echo ${INDEX} > ${INDEX_FILE}




================================================
FILE: vbox-restore-test.sh
================================================
#!/bin/bash

#
# This script is to be run after vbox-backup-test.sh.  It will restore backups
# in TEST_DIR/vbox-*-storage to TEST_DIR/vbox-*-restore
#
# NOTE:
#    Please make sure that this script doesn't run past midnight; otherwise it
# will not be able to restore duplicity backups, because it assumes backups
# were created on the same day.
#

if [ "$#" -eq 0 ]; then
    echo "Usage: $0 <test dir>"
    exit 1
fi

if [ -z "$DUPLICACY_PATH" ]; then
    echo "DUPLICACY_PATH must be set to the path of the Duplicacy executable"
    exit 1
fi

if [ -z "$RESTIC_PATH" ]; then
    echo "RESTIC_PATH must be set to the path of the restic executable"
    exit 1
fi

if [ -z "$ATTIC_PATH" ]; then
    echo "ATTIC_PATH must be set to the path of the attic executable"
    exit 1
fi

if [ -z "$DUPLICITY_PATH" ]; then
    echo "DUPLICITY_PATH must be set to the path of the duplicity executable"
    exit 1
fi

if [ -z "$GPG_KEY" ]; then
    echo "GPG_KEY must be set for duplicity to work properly"
    exit 1
fi

if [ -z "$PASSPHRASE" ]; then
    echo "PASSPHRASE must be set for duplicity to work properly"
    exit 1
fi

# Set up directories
TEST_DIR=$1
DUPLICACY_STORAGE=${TEST_DIR}/vbox-duplicacy-storage
RESTIC_STORAGE=${TEST_DIR}/vbox-restic-storage
ATTIC_STORAGE=${TEST_DIR}/vbox-attic-storage
DUPLICITY_STORAGE=${TEST_DIR}/vbox-duplicity-storage

DUPLICACY_RESTORE=${TEST_DIR}/vbox-duplicacy-restore
RESTIC_RESTORE=${TEST_DIR}/vbox-restic-restore
ATTIC_RESTORE=${TEST_DIR}/vbox-attic-restore
DUPLICITY_RESTORE=${TEST_DIR}/vbox-duplicity-restore

# Used as the storage password throughout the tests
PASSWORD=12345678

rm -rf ${DUPLICACY_RESTORE}
mkdir -p ${DUPLICACY_RESTORE}
rm -rf ${RESTIC_RESTORE}
mkdir -p ${RESTIC_RESTORE}
rm -rf ${ATTIC_RESTORE}
mkdir -p ${ATTIC_RESTORE}
rm -rf ${DUPLICITY_RESTORE}
mkdir -p ${DUPLICITY_RESTORE}

function duplicacy_restore()
{  
    rm -rf ${DUPLICACY_RESTORE}/* 
    pushd ${DUPLICACY_RESTORE}
    time env DUPLICACY_PASSWORD=${PASSWORD} ${DUPLICACY_PATH} restore -r $1 -stats | grep -v Downloaded
    popd
}


function restic_restore()
{
    rm -rf ${RESTIC_RESTORE}/* 
    # We need to find the snapshot id to restore
    TODAY=`date +"%Y-%m-%d"`
    SNAPSHOT=`env RESTIC_PASSWORD=${PASSWORD} ${RESTIC_PATH} -r ${RESTIC_STORAGE} snapshots | grep $TODAY | head -n $1 | tail -n 1 | awk '{print $1;}'`
    echo Restoring from $SNAPSHOT
    time env RESTIC_PASSWORD=${PASSWORD} ${RESTIC_PATH} -r ${RESTIC_STORAGE} restore $SNAPSHOT --target ${RESTIC_RESTORE}
}

function attic_restore()
{
    rm -rf ${ATTIC_RESTORE}/* 
    pushd ${ATTIC_RESTORE}
    time env BORG_PASSPHRASE=${PASSWORD} ${ATTIC_PATH} extract ${ATTIC_STORAGE}::$1    
    popd
}

function duplicity_restore()
{
    rm -rf ${DUPLICITY_RESTORE}/* 
    # duplicity quirk: the restore-time option (-t) doesn't accept the time
    # format printed by its own collection-status command, so the timestamp is
    # reassembled below from today's date plus the listed time of day.
    TODAY=`date +"%Y-%m-%d"`
    RESTORE_TIME=`${DUPLICITY_PATH} -v0 --encrypt-key ${GPG_KEY} --sign-key ${GPG_KEY} collection-status file://${DUPLICITY_STORAGE} | grep 'Full\|Incremental' | head -n $1 | tail -n 1 | awk '{print $5;}'`
    RESTORE_TIME=${TODAY}T${RESTORE_TIME}
    echo Restoring from $RESTORE_TIME
    time ${DUPLICITY_PATH} --force -v0 --encrypt-key ${GPG_KEY} restore -t $RESTORE_TIME file://${DUPLICITY_STORAGE} ${DUPLICITY_RESTORE}
}

function all_restore()
{

    echo ======================================== restore $1 ========================================
    duplicacy_restore $1
    #restic_restore $1
    attic_restore $1
    #duplicity_restore $1
}


# Initialize the duplicacy directory to be restored
pushd ${DUPLICACY_RESTORE}
env DUPLICACY_PASSWORD=${PASSWORD} ${DUPLICACY_PATH} init test ${DUPLICACY_STORAGE} -e
popd

echo restic snapshots:
env RESTIC_PASSWORD=${PASSWORD} ${RESTIC_PATH} -r ${RESTIC_STORAGE} snapshots

echo duplicity archives: 
${DUPLICITY_PATH} -v0 --encrypt-key ${GPG_KEY} --sign-key ${GPG_KEY} collection-status file://${DUPLICITY_STORAGE} | grep "Full\|Incremental"

all_restore 1
all_restore 2 
all_restore 3 
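
A restore benchmark is only meaningful if the restored trees actually match what was backed up. A hedged sketch of a post-restore verification pass (the `verify_restore` helper and the temporary demo trees are hypothetical; in the benchmark the operands would be the backed-up VM directory and `${DUPLICACY_RESTORE}`, `${ATTIC_RESTORE}`, etc.):

```shell
# Hypothetical post-restore check, not part of the original script: compare a
# restored tree against its source so a fast-but-incorrect restore doesn't
# score well.  Demonstrated here on two small temporary directories.
SOURCE=`mktemp -d`
RESTORED=`mktemp -d`
echo "disk image bytes" > ${SOURCE}/disk.vdi
cp ${SOURCE}/disk.vdi ${RESTORED}/disk.vdi

verify_restore()
{
    # diff -r recurses into subdirectories; -q only reports whether they differ
    if diff -r -q "$1" "$2" > /dev/null; then
        echo "OK: $2 matches $1"
    else
        echo "MISMATCH: $2 differs from $1"
    fi
}

verify_restore ${SOURCE} ${RESTORED}
```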
