[
  {
    "path": ".gitignore",
    "content": "/_tmp/\n"
  },
  {
    "path": "README.md",
    "content": "# Prof\n\nSelf-contained C/C++ profiler library for Linux.\n\nProf offers a quick way to measure performance events (CPU clock cycles,\ncache misses, branch mispredictions, etc.) of C/C++ code snippets. Prof is\njust a wrapper around the `perf_event_open` system call; its main goal is to\nbe easy to set up and painless to use for targeted optimizations, that is,\nwhen the hot spot has already been identified. Prof is in no way a\nreplacement for a fully-fledged profiler like perf, gprof, callgrind, etc.\n\n## Examples\n\n### Minimal\n\nThe following snippet prints the approximate number of CPU clock cycles spent\nexecuting the code between the two Prof calls:\n\n```c\n#include \"prof.h\"\n\nint main()\n{\n    PROF_START();\n    // slow code goes here...\n    PROF_STDOUT();\n}\n```\n\n### Custom options\n\nThe following snippet instead counts the read and write misses of the level 1\ndata cache that occur in the userland code of two separate regions, summing\nthe results:\n\n```c\n#include <inttypes.h>\n#include <stdio.h>\n\n#define PROF_USER_EVENTS_ONLY\n#define PROF_EVENT_LIST \\\n    PROF_EVENT_CACHE(L1D, READ, MISS) \\\n    PROF_EVENT_CACHE(L1D, WRITE, MISS)\n#include \"prof.h\"\n\nint main()\n{\n    uint64_t misses[2] = { 0 };\n\n    PROF_START();\n    // slow code goes here...\n    PROF_DO(misses[index] += counter);\n\n    // fast or uninteresting code goes here...\n\n    PROF_START();\n    // slow code goes here...\n    PROF_DO(misses[index] += counter);\n\n    printf(\"L1: R = %\" PRIu64 \"; W = %\" PRIu64 \"\\n\", misses[0], misses[1]);\n}\n```\n\n## Installation\n\nJust include `prof.h`. 
Here is a quick way to fetch the latest version:\n\n    wget -q https://raw.githubusercontent.com/cyrus-and/prof/master/prof.h\n\nPlease be aware that Prof uses `__attribute__((constructor))` to be as\nstraightforward to set up as possible, so the header must be included in only\none translation unit.\n\nSince the constructor runs only once, at program startup, and Prof keeps its\nstate in thread-local variables, using Prof from additional threads requires\nrepeating the setup code (the `prof_init` and `prof_fini` calls) in each of\nthem, for example:\n\n```c\nvoid *thread(void *args) {\n    prof_init();\n\n    // ...\n\n    prof_fini();\n    return NULL;\n}\n```\n\n## Setup\n\nSince Prof uses `perf_event_open`, make sure you have permission to access\nthe performance counters: either run the program as superuser (discouraged)\nor set the value of `perf_event_paranoid` appropriately, for example:\n\n```console\n$ echo 1 | sudo tee /proc/sys/kernel/perf_event_paranoid\n```\n\nOptionally, make it permanent with:\n\n```console\n$ echo 'kernel.perf_event_paranoid=1' | sudo tee /etc/sysctl.d/local.conf\n```\n\nSee `man perf_event_open` for more information.\n\n## API\n\n### PROF_START()\n\nReset the counters and (re)start counting the events.\n\nThe events to be monitored are specified by setting the `PROF_EVENT_LIST`\nmacro, before including this file, to a list of `PROF_EVENT_*` invocations;\nit defaults to counting the number of CPU clock cycles.\n\nIf the `PROF_USER_EVENTS_ONLY` macro is defined before including this\nfile, then kernel and hypervisor events are excluded from the count.\n\n### PROF_EVENT(type, config)\n\nSpecify an event to be monitored; `type` and `config` are defined in the\ndocumentation of the `perf_event_open` system call.\n\n### PROF_EVENT_HW(config)\n\nSame as `PROF_EVENT` but for hardware events; prefix `PERF_COUNT_HW_` must be\nomitted from `config`.\n\n### PROF_EVENT_SW(config)\n\nSame as `PROF_EVENT` but for software events; prefix `PERF_COUNT_SW_` must be\nomitted from `config`.\n\n### PROF_EVENT_CACHE(cache, op, result)\n\nSame 
as `PROF_EVENT` but for cache events; prefixes `PERF_COUNT_HW_CACHE_`,\n`PERF_COUNT_HW_CACHE_OP_` and `PERF_COUNT_HW_CACHE_RESULT_` must be omitted\nfrom `cache`, `op` and `result`, respectively. Again, `cache`, `op` and\n`result` are defined in the documentation of the `perf_event_open` system\ncall.\n\n### PROF_STOP()\n\nStop counting the events. The counter array can then be accessed with\n`PROF_COUNTERS`.\n\n### PROF_COUNTERS\n\nAccess the counter array. Counters are in the same order as the events\ndefined in `PROF_EVENT_LIST`. Elements of this array are 64-bit unsigned\nintegers.\n\n### PROF_DO(block)\n\nStop counting the events and execute the code provided by `block` for each\nevent. Within `block`, `index` is the position of the event in the counter\narray accessed via `PROF_COUNTERS`, and `counter` is the value of that\ncounter; both are 64-bit unsigned integers.\n\n### PROF_CALL(callback)\n\nSame as `PROF_DO` except that `callback` is the name of a *callable* object\n(e.g. a function) which is called, for each event, with the two parameters\n`index` and `counter`.\n\n### PROF_FILE(file)\n\nStop counting the events and write to `file` (a stdio.h `FILE *`) one line\nper event in `PROF_EVENT_LIST`. Each line contains `index` and `counter` (as\ndefined by `PROF_DO`) separated by a tab character. 
If there is only one\nevent, `index` is omitted.\n\n### PROF_STDOUT()\n\nSame as `PROF_FILE` except that `file` is `stdout`.\n\n### PROF_STDERR()\n\nSame as `PROF_FILE` except that `file` is `stderr`.\n\n## License\n\nCopyright (c) 2024 Andrea Cardaci <cyrus.and@gmail.com>\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in\nall copies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n\n<!-- autogenerated from prof.h -->\n"
  },
  {
    "path": "make-readme.sh",
    "content": "#!/bin/sh\n\n<prof.h >README.md awk '\nBEGIN                  { first_doc = 1 }\n/^ \\*\\/$/              { def = NR + 1; in_doc = 0 }\nin_doc                 { sub(\" \\\\* ?\", \"\"); doc=(doc == \"\" ? $0 : doc \"\\n\" $0) }\n/^\\/\\*$/               { in_doc = 1 }\nNR==def                { if (first_doc) first_doc = 0; else print \"\" }\nNR==def && /^#define/  { NF -= 1; $1 = \"###\"; printf \"%s\\n\\n%s\", $0, doc; doc=\"\" }\nNR==def && !/^#define/ { print doc; doc = \"\" }\nEND                    { printf \"\\n%s\\n\\n<!-- autogenerated from prof.h -->\\n\", doc }\n'\n"
  },
  {
    "path": "prof.h",
    "content": "/*\n * # Prof\n *\n * Self-contained C/C++ profiler library for Linux.\n *\n * Prof offers a quick way to measure performance events (CPU clock cycles,\n * cache misses, branch mispredictions, etc.) of C/C++ code snippets. Prof is\n * just a wrapper around the `perf_event_open` system call; its main goal is to\n * be easy to set up and painless to use for targeted optimizations, that is,\n * when the hot spot has already been identified. Prof is in no way a\n * replacement for a fully-fledged profiler like perf, gprof, callgrind, etc.\n *\n * ## Examples\n *\n * ### Minimal\n *\n * The following snippet prints the approximate number of CPU clock cycles spent\n * executing the code between the two Prof calls:\n *\n * ```c\n * #include \"prof.h\"\n *\n * int main()\n * {\n *     PROF_START();\n *     // slow code goes here...\n *     PROF_STDOUT();\n * }\n * ```\n *\n * ### Custom options\n *\n * The following snippet instead counts the read and write misses of the level 1\n * data cache that occur in the userland code of two separate regions, summing\n * the results:\n *\n * ```c\n * #include <inttypes.h>\n * #include <stdio.h>\n *\n * #define PROF_USER_EVENTS_ONLY\n * #define PROF_EVENT_LIST \\\n *     PROF_EVENT_CACHE(L1D, READ, MISS) \\\n *     PROF_EVENT_CACHE(L1D, WRITE, MISS)\n * #include \"prof.h\"\n *\n * int main()\n * {\n *     uint64_t misses[2] = { 0 };\n *\n *     PROF_START();\n *     // slow code goes here...\n *     PROF_DO(misses[index] += counter);\n *\n *     // fast or uninteresting code goes here...\n *\n *     PROF_START();\n *     // slow code goes here...\n *     PROF_DO(misses[index] += counter);\n *\n *     printf(\"L1: R = %\" PRIu64 \"; W = %\" PRIu64 \"\\n\", misses[0], misses[1]);\n * }\n * ```\n *\n * ## Installation\n *\n * Just include `prof.h`. 
Here is a quick way to fetch the latest version:\n *\n *     wget -q https://raw.githubusercontent.com/cyrus-and/prof/master/prof.h\n *\n * Please be aware that Prof uses `__attribute__((constructor))` to be as\n * straightforward to set up as possible, so the header must be included in only\n * one translation unit.\n *\n * Since the constructor runs only once, at program startup, and Prof keeps its\n * state in thread-local variables, using Prof from additional threads requires\n * repeating the setup code (the `prof_init` and `prof_fini` calls) in each of\n * them, for example:\n *\n * ```c\n * void *thread(void *args) {\n *     prof_init();\n *\n *     // ...\n *\n *     prof_fini();\n *     return NULL;\n * }\n * ```\n *\n * ## Setup\n *\n * Since Prof uses `perf_event_open`, make sure you have permission to access\n * the performance counters: either run the program as superuser (discouraged)\n * or set the value of `perf_event_paranoid` appropriately, for example:\n *\n * ```console\n * $ echo 1 | sudo tee /proc/sys/kernel/perf_event_paranoid\n * ```\n *\n * Optionally, make it permanent with:\n *\n * ```console\n * $ echo 'kernel.perf_event_paranoid=1' | sudo tee /etc/sysctl.d/local.conf\n * ```\n *\n * See `man perf_event_open` for more information.\n */\n#ifndef PROF_H\n#define PROF_H\n\n#include <errno.h>\n#include <inttypes.h>\n#include <linux/perf_event.h>\n#include <stdarg.h>\n#include <stdint.h>\n#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n#include <sys/ioctl.h>\n#include <sys/syscall.h>\n#include <unistd.h>\n\n/*\n * ## API\n */\n\n/*\n * Reset the counters and (re)start counting the events.\n *\n * The events to be monitored are specified by setting the `PROF_EVENT_LIST`\n * macro, before including this file, to a list of `PROF_EVENT_*` invocations;\n * it defaults to counting the number of CPU clock cycles.\n *\n * If the `PROF_USER_EVENTS_ONLY` macro is defined before including this\n * file, then kernel and hypervisor events are excluded from the count.\n */\n#define PROF_START()                                            
               \\\n    do {                                                                       \\\n        PROF_IOCTL_(ENABLE);                                                   \\\n        PROF_IOCTL_(RESET);                                                    \\\n    } while (0)\n\n/*\n * Specify an event to be monitored; `type` and `config` are defined in the\n * documentation of the `perf_event_open` system call.\n */\n#define PROF_EVENT(type, config)                                               \\\n    (uint32_t)(type), (uint64_t)(config),\n\n/*\n * Same as `PROF_EVENT` but for hardware events; prefix `PERF_COUNT_HW_` must be\n * omitted from `config`.\n */\n#define PROF_EVENT_HW(config)                                                  \\\n    PROF_EVENT(PERF_TYPE_HARDWARE, PERF_COUNT_HW_ ## config)\n\n/*\n * Same as `PROF_EVENT` but for software events; prefix `PERF_COUNT_SW_` must be\n * omitted from `config`.\n */\n#define PROF_EVENT_SW(config)                                                  \\\n    PROF_EVENT(PERF_TYPE_SOFTWARE, PERF_COUNT_SW_ ## config)\n\n/*\n * Same as `PROF_EVENT` but for cache events; prefixes `PERF_COUNT_HW_CACHE_`,\n * `PERF_COUNT_HW_CACHE_OP_` and `PERF_COUNT_HW_CACHE_RESULT_` must be omitted\n * from `cache`, `op` and `result`, respectively. Again, `cache`, `op` and\n * `result` are defined in the documentation of the `perf_event_open` system\n * call.\n */\n#define PROF_EVENT_CACHE(cache, op, result)                                    \\\n    PROF_EVENT(PERF_TYPE_HW_CACHE,                                             \\\n               (PERF_COUNT_HW_CACHE_ ## cache) |                               \\\n               (PERF_COUNT_HW_CACHE_OP_ ## op << 8) |                          \\\n               (PERF_COUNT_HW_CACHE_RESULT_ ## result << 16))\n\n/*\n * Stop counting the events. 
The counter array can then be accessed with\n * `PROF_COUNTERS`.\n */\n#define PROF_STOP()                                                            \\\n    do {                                                                       \\\n        PROF_IOCTL_(DISABLE);                                                  \\\n        PROF_READ_COUNTERS_(prof_event_buf_);                                  \\\n    } while (0)\n\n/*\n * Access the counter array. Counters are in the same order as the events\n * defined in `PROF_EVENT_LIST`. Elements of this array are 64-bit unsigned\n * integers.\n */\n#define PROF_COUNTERS                                                          \\\n    (prof_event_buf_ + 1)\n\n/*\n * Stop counting the events and execute the code provided by `block` for each\n * event. Within `block`, `index` is the position of the event in the counter\n * array accessed via `PROF_COUNTERS`, and `counter` is the value of that\n * counter; both are 64-bit unsigned integers.\n */\n#define PROF_DO(block)                                                         \\\n    do {                                                                       \\\n        uint64_t i_;                                                           \\\n        PROF_STOP();                                                           \\\n        for (i_ = 0; i_ < prof_event_cnt_; i_++) {                             \\\n            uint64_t index = i_;                                               \\\n            uint64_t counter = prof_event_buf_[i_ + 1];                        \\\n            (void)index;                                                       \\\n            (void)counter;                                                     \\\n            block;                                                             \\\n        }                                                                      \\\n    } while (0)\n\n/*\n * Same as `PROF_DO` except that `callback` is the 
name of a *callable* object\n * (e.g. a function) which is called, for each event, with the two parameters\n * `index` and `counter`.\n */\n#define PROF_CALL(callback)                                                    \\\n    PROF_DO(callback(index, counter))\n\n/*\n * Stop counting the events and write to `file` (a stdio.h `FILE *`) one line\n * per event in `PROF_EVENT_LIST`. Each line contains `index` and `counter` (as\n * defined by `PROF_DO`) separated by a tab character. If there is only one\n * event, `index` is omitted.\n */\n#define PROF_FILE(file)                                                        \\\n    PROF_DO(if (prof_event_cnt_ > 1) {                                         \\\n            fprintf((file), \"%\" PRIu64 \"\\t%\" PRIu64 \"\\n\", index, counter);     \\\n        } else {                                                               \\\n            fprintf((file), \"%\" PRIu64 \"\\n\", counter);                         \\\n        }                                                                      \\\n    )\n\n/*\n * Same as `PROF_FILE` except that `file` is `stdout`.\n */\n#define PROF_STDOUT()                                                          \\\n    PROF_FILE(stdout)\n\n/*\n * Same as `PROF_FILE` except that `file` is `stderr`.\n */\n#define PROF_STDERR()                                                          \\\n    PROF_FILE(stderr)\n\n/* DEFAULTS ----------------------------------------------------------------- */\n\n#ifndef PROF_EVENT_LIST\n#ifdef PERF_COUNT_HW_REF_CPU_CYCLES /* since Linux 3.3 */\n#define PROF_EVENT_LIST PROF_EVENT_HW(REF_CPU_CYCLES)\n#else\n#define PROF_EVENT_LIST PROF_EVENT_HW(CPU_CYCLES)\n#endif\n#endif\n\n/* UTILITY ------------------------------------------------------------------ */\n\n#define PROF_ASSERT_(x)                                                        \\\n    do {                                                                       \\\n        if 
(!(x)) {                                                            \\\n            fprintf(stderr, \"# %s:%d: PROF error\", __FILE__, __LINE__);        \\\n            if (errno) {                                                       \\\n                fprintf(stderr, \" (%s)\", strerror(errno));                     \\\n            }                                                                  \\\n            fprintf(stderr, \"\\n\");                                             \\\n            abort();                                                           \\\n        }                                                                      \\\n    } while (0)\n\n#define PROF_IOCTL_(mode)                                                      \\\n    do {                                                                       \\\n        PROF_ASSERT_(ioctl(prof_fd_,                                           \\\n                           PERF_EVENT_IOC_ ## mode,                            \\\n                           PERF_IOC_FLAG_GROUP) != -1);                        \\\n    } while (0)\n\n#define PROF_READ_COUNTERS_(buffer)                                            \\\n    do {                                                                       \\\n        const ssize_t to_read = sizeof(uint64_t) * (prof_event_cnt_ + 1);      \\\n        PROF_ASSERT_(read(prof_fd_, buffer, to_read) == to_read);              \\\n    } while (0)\n\n/* SETUP -------------------------------------------------------------------- */\n\nstatic __thread int prof_fd_;\nstatic __thread uint64_t prof_event_cnt_;\nstatic __thread uint64_t *prof_event_buf_;\n\nstatic void prof_init_(uint64_t dummy, ...) 
{\n    uint32_t type;\n    va_list ap;\n\n    prof_fd_ = -1;\n    prof_event_cnt_ = 0;\n    va_start(ap, dummy);\n    /* the arguments are (type, config) pairs terminated by (uint32_t)-1 */\n    while (type = va_arg(ap, uint32_t), type != (uint32_t)-1) {\n        struct perf_event_attr pe;\n        uint64_t config;\n        int fd;\n\n        config = va_arg(ap, uint64_t);\n\n        memset(&pe, 0, sizeof(struct perf_event_attr));\n        pe.size = sizeof(struct perf_event_attr);\n        pe.read_format = PERF_FORMAT_GROUP;\n        pe.type = type;\n        pe.config = config;\n        #ifdef PROF_USER_EVENTS_ONLY\n        pe.exclude_kernel = 1;\n        pe.exclude_hv = 1;\n        #endif\n\n        /* monitor the calling thread (pid 0) on any CPU (-1); the first event\n         * becomes the group leader and the following ones join its group */\n        fd = syscall(__NR_perf_event_open, &pe, 0, -1, prof_fd_, 0);\n        PROF_ASSERT_(fd != -1);\n        if (prof_fd_ == -1) {\n            prof_fd_ = fd;\n        }\n\n        prof_event_cnt_++;\n    }\n    va_end(ap);\n\n    /* with PERF_FORMAT_GROUP, read() returns the number of events followed by\n     * one value per event */\n    prof_event_buf_ = (uint64_t *)malloc((prof_event_cnt_ + 1) *\n                                         sizeof(uint64_t));\n    PROF_ASSERT_(prof_event_buf_ != NULL);\n}\n\nvoid __attribute__((constructor)) prof_init()\n{\n    prof_init_(0, PROF_EVENT_LIST /*,*/ (uint32_t)-1);\n}\n\nvoid __attribute__((destructor)) prof_fini()\n{\n    PROF_ASSERT_(close(prof_fd_) != -1);\n    free(prof_event_buf_);\n}\n\n#endif\n\n/*\n * ## License\n *\n * Copyright (c) 2024 Andrea Cardaci <cyrus.and@gmail.com>\n *\n * Permission is hereby granted, free of charge, to any person obtaining a copy\n * of this software and associated documentation files (the \"Software\"), to deal\n * in the Software without restriction, including without limitation the rights\n * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n * copies of the Software, and to permit persons to whom the Software is\n * furnished to do so, subject to the following conditions:\n *\n * The above copyright notice and this permission notice shall be included in\n * all copies or substantial portions of the Software.\n *\n * THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
OR\n * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n * SOFTWARE.\n */\n"
  }
]