Repository: cp2k/dbcsr
Branch: develop
Commit: b088a30daf33
Files: 395
Total size: 37.2 MB

Directory structure:
gitextract_o5s2z7tn/

├── .ccls
├── .clang-format
├── .cmake-format.py
├── .codecov.yml
├── .fortls
├── .git-blame-ignore-revs
├── .gitattributes
├── .github/
│   ├── ISSUE_TEMPLATE/
│   │   └── bug_report.md
│   └── workflows/
│       ├── doc-generation.yml
│       ├── docker-build-env.yml
│       ├── release.yml
│       ├── testing-gcc.yml
│       ├── testing-linux.yml
│       └── testing-macos.yml
├── .gitignore
├── .gitmodules
├── .packit.yaml
├── .pre-commit/
│   ├── check_header.py
│   ├── clang-format-fypp.sh
│   └── headers/
│       ├── c_cpp.1
│       ├── c_cpp.2
│       ├── c_cpp.3
│       ├── fortran.1
│       ├── fortran.2
│       ├── fypp.1
│       ├── script.1
│       └── script.2
├── .pre-commit-config.yaml
├── .ruff.toml
├── AUTHORS
├── CMakeLists.txt
├── CONTRIBUTING.md
├── DBCSR.md
├── LICENSE
├── README.md
├── VERSION
├── cmake/
│   ├── CheckCompilerSupport.cmake
│   ├── CompilerConfiguration.cmake
│   ├── CustomTargets.cmake
│   ├── GetGitRevisionDescription.cmake
│   ├── GetGitRevisionDescription.cmake.in
│   ├── compiler-tests/
│   │   ├── f2008-block_construct.f90
│   │   ├── f2008-contiguous.f90
│   │   ├── f2008-norm2.f90
│   │   └── f95-reshape-order-allocatable.f90
│   └── fypp-sources.cmake
├── docs/
│   ├── CMakeLists.txt
│   ├── guide/
│   │   ├── 1-DBCSR/
│   │   │   ├── index.md
│   │   │   └── publications.md
│   │   ├── 2-user-guide/
│   │   │   ├── 1-installation/
│   │   │   │   ├── 1-cmake-build-recipes.md
│   │   │   │   ├── 2-supported-compilers.md
│   │   │   │   ├── 3-using-dbcsr-in-a-cmake-project.md
│   │   │   │   ├── 4-docker.md
│   │   │   │   └── index.md
│   │   │   ├── 2-tests/
│   │   │   │   └── index.md
│   │   │   ├── 3-examples/
│   │   │   │   └── index.md
│   │   │   ├── 4-gpu/
│   │   │   │   └── index.md
│   │   │   └── index.md
│   │   ├── 3-developer-guide/
│   │   │   ├── 1-tooling/
│   │   │   │   └── index.md
│   │   │   ├── 2-documentation/
│   │   │   │   └── index.md
│   │   │   ├── 3-programming/
│   │   │   │   ├── 1-overview/
│   │   │   │   │   └── index.md
│   │   │   │   ├── 2-accelerator-backend/
│   │   │   │   │   ├── 1-code-structure.md
│   │   │   │   │   ├── 2-libsmm_acc/
│   │   │   │   │   │   ├── 1-kernels.md
│   │   │   │   │   │   ├── 2-parameters.md
│   │   │   │   │   │   ├── 3-tune.md
│   │   │   │   │   │   └── index.md
│   │   │   │   │   ├── 3-libsmm_ocl/
│   │   │   │   │   │   ├── 1-autotune.md
│   │   │   │   │   │   ├── 2-bulktune.md
│   │   │   │   │   │   └── index.md
│   │   │   │   │   └── index.md
│   │   │   │   └── index.md
│   │   │   ├── 4-performance/
│   │   │   │   ├── 1-insights.md
│   │   │   │   ├── 2-just-in-time-compilation.md
│   │   │   │   └── index.md
│   │   │   └── index.md
│   │   └── index.md
│   └── media/
│       └── logo/
│           └── logo.ppt
├── examples/
│   ├── .gitignore
│   ├── CMakeLists.txt
│   ├── README.md
│   ├── dbcsr_example_1.F
│   ├── dbcsr_example_2.F
│   ├── dbcsr_example_3.F
│   ├── dbcsr_example_3.cpp
│   ├── dbcsr_tensor_example_1.F
│   └── dbcsr_tensor_example_2.cpp
├── src/
│   ├── .gitignore
│   ├── CMakeLists.txt
│   ├── PACKAGE
│   ├── acc/
│   │   ├── PACKAGE
│   │   ├── README.md
│   │   ├── acc.h
│   │   ├── acc_bench.c
│   │   ├── acc_bench.h
│   │   ├── acc_libsmm.h
│   │   ├── acc_triplets.sh
│   │   ├── cuda/
│   │   │   ├── Makefile
│   │   │   ├── PACKAGE
│   │   │   ├── acc_cuda.cpp
│   │   │   ├── acc_cuda.h
│   │   │   ├── dbcsr_cuda_nvtx_cu.cpp
│   │   │   └── dbcsr_cuda_profiling.F
│   │   ├── cuda_hip/
│   │   │   ├── PACKAGE
│   │   │   ├── acc_blas.cpp
│   │   │   ├── acc_blas.h
│   │   │   ├── acc_dev.cpp
│   │   │   ├── acc_error.cpp
│   │   │   ├── acc_error.h
│   │   │   ├── acc_event.cpp
│   │   │   ├── acc_init.cpp
│   │   │   ├── acc_mem.cpp
│   │   │   ├── acc_stream.cpp
│   │   │   ├── acc_utils.cpp
│   │   │   ├── acc_utils.h
│   │   │   └── calculate_norms.cpp
│   │   ├── dbcsr_acc_device.F
│   │   ├── dbcsr_acc_devmem.F
│   │   ├── dbcsr_acc_event.F
│   │   ├── dbcsr_acc_hostmem.F
│   │   ├── dbcsr_acc_init.F
│   │   ├── dbcsr_acc_stream.F
│   │   ├── dbcsr_acc_timings.F
│   │   ├── hip/
│   │   │   ├── PACKAGE
│   │   │   ├── acc_hip.cpp
│   │   │   ├── acc_hip.h
│   │   │   └── dbcsr_hip_profiling.F
│   │   ├── libsmm_acc/
│   │   │   ├── .gitignore
│   │   │   ├── CMakeLists.txt
│   │   │   ├── PACKAGE
│   │   │   ├── README.md
│   │   │   ├── generate_kernels.py
│   │   │   ├── generate_parameters.py
│   │   │   ├── kernels/
│   │   │   │   ├── PACKAGE
│   │   │   │   ├── README.md
│   │   │   │   ├── __init__.py
│   │   │   │   ├── autotuning_properties.json
│   │   │   │   ├── gpu_properties.json
│   │   │   │   ├── smm_acc.py
│   │   │   │   ├── smm_acc_common.h
│   │   │   │   ├── smm_acc_dnt_base.py
│   │   │   │   ├── smm_acc_dnt_largeDB1.h
│   │   │   │   ├── smm_acc_dnt_largeDB1.py
│   │   │   │   ├── smm_acc_dnt_largeDB2.h
│   │   │   │   ├── smm_acc_dnt_largeDB2.py
│   │   │   │   ├── smm_acc_dnt_medium.h
│   │   │   │   ├── smm_acc_dnt_medium.py
│   │   │   │   ├── smm_acc_dnt_small.h
│   │   │   │   ├── smm_acc_dnt_small.py
│   │   │   │   ├── smm_acc_dnt_tiny.h
│   │   │   │   ├── smm_acc_dnt_tiny.py
│   │   │   │   ├── smm_acc_predict.py
│   │   │   │   └── smm_acc_transpose.h
│   │   │   ├── libcusmm/
│   │   │   │   ├── .gitignore
│   │   │   │   └── PACKAGE
│   │   │   ├── libsmm_acc.cpp
│   │   │   ├── libsmm_acc.h
│   │   │   ├── libsmm_acc_benchmark.cpp
│   │   │   ├── libsmm_acc_benchmark.h
│   │   │   ├── libsmm_acc_init.cpp
│   │   │   ├── libsmm_acc_init.h
│   │   │   ├── parameters/
│   │   │   │   ├── parameters_A100.json
│   │   │   │   ├── parameters_H100.json
│   │   │   │   ├── parameters_K20X.json
│   │   │   │   ├── parameters_K40.json
│   │   │   │   ├── parameters_K80.json
│   │   │   │   ├── parameters_Mi100.json
│   │   │   │   ├── parameters_Mi250.json
│   │   │   │   ├── parameters_Mi300.json
│   │   │   │   ├── parameters_Mi350.json
│   │   │   │   ├── parameters_Mi50.json
│   │   │   │   ├── parameters_P100.json
│   │   │   │   └── parameters_V100.json
│   │   │   ├── parameters_utils.h
│   │   │   └── tune/
│   │   │       ├── .gitignore
│   │   │       ├── README.md
│   │   │       ├── archive.sh
│   │   │       ├── cleanup.sh
│   │   │       ├── requirements.txt
│   │   │       ├── tune_collect.py
│   │   │       ├── tune_merge.py
│   │   │       ├── tune_setup.py
│   │   │       └── tune_submit.py
│   │   └── opencl/
│   │       ├── Makefile
│   │       ├── PACKAGE
│   │       ├── README.md
│   │       ├── acc_getenv.sh
│   │       ├── acc_opencl.c
│   │       ├── acc_opencl.h
│   │       ├── acc_opencl.sh
│   │       ├── acc_opencl_event.c
│   │       ├── acc_opencl_mem.c
│   │       ├── acc_opencl_stream.c
│   │       ├── common/
│   │       │   ├── opencl_atomics.h
│   │       │   └── opencl_common.h
│   │       └── smm/
│   │           ├── .gitignore
│   │           ├── CMakeLists.txt
│   │           ├── PACKAGE
│   │           ├── README-autotune.md
│   │           ├── README-bulktune.md
│   │           ├── README.md
│   │           ├── kernels/
│   │           │   ├── multiply.cl
│   │           │   └── transpose.cl
│   │           ├── opencl_libsmm.c
│   │           ├── opencl_libsmm.h
│   │           ├── opencl_test.sh
│   │           ├── params/
│   │           │   ├── README.md
│   │           │   ├── tune_multiply_A100.csv
│   │           │   ├── tune_multiply_BMG.csv
│   │           │   ├── tune_multiply_GH200.csv
│   │           │   ├── tune_multiply_H100.csv
│   │           │   ├── tune_multiply_Mi250.csv
│   │           │   ├── tune_multiply_P100.csv
│   │           │   ├── tune_multiply_PVC.csv
│   │           │   └── tune_multiply_V100.csv
│   │           ├── requirements.txt
│   │           ├── tune_multiply.py
│   │           └── tune_multiply.sh
│   ├── base/
│   │   ├── PACKAGE
│   │   ├── dbcsr_base_hooks.F
│   │   ├── dbcsr_base_uses.f90
│   │   ├── dbcsr_kinds.F
│   │   ├── dbcsr_machine.F
│   │   ├── dbcsr_machine_internal.F
│   │   └── dbcsr_machine_posix.f90
│   ├── block/
│   │   ├── PACKAGE
│   │   ├── dbcsr_block_access.F
│   │   ├── dbcsr_block_operations.F
│   │   ├── dbcsr_index_operations.F
│   │   └── dbcsr_iterator_operations.F
│   ├── cmake/
│   │   └── DBCSRConfig.cmake.in
│   ├── core/
│   │   ├── PACKAGE
│   │   ├── dbcsr_array_types.F
│   │   ├── dbcsr_config.F
│   │   ├── dbcsr_dict.F
│   │   ├── dbcsr_dict.fypp
│   │   ├── dbcsr_error_handling.F
│   │   ├── dbcsr_iter_types.F
│   │   ├── dbcsr_lib.F
│   │   ├── dbcsr_list.F
│   │   ├── dbcsr_list.fypp
│   │   ├── dbcsr_list_callstackentry.F
│   │   ├── dbcsr_list_routinereport.F
│   │   ├── dbcsr_list_routinestat.F
│   │   ├── dbcsr_list_timerenv.F
│   │   ├── dbcsr_log_handling.F
│   │   ├── dbcsr_methods.F
│   │   ├── dbcsr_print_messages.F
│   │   ├── dbcsr_timings.F
│   │   ├── dbcsr_timings_base_type.F
│   │   ├── dbcsr_timings_report.F
│   │   ├── dbcsr_timings_types.F
│   │   └── dbcsr_types.F
│   ├── data/
│   │   ├── PACKAGE
│   │   ├── dbcsr.fypp
│   │   ├── dbcsr_data_methods.F
│   │   ├── dbcsr_data_methods_low.F
│   │   ├── dbcsr_data_operations.F
│   │   ├── dbcsr_data_types.F
│   │   ├── dbcsr_mem_methods.F
│   │   └── dbcsr_ptr_util.F
│   ├── dbcsr.h
│   ├── dbcsr_api.F
│   ├── dbcsr_api_c.F
│   ├── mm/
│   │   ├── PACKAGE
│   │   ├── dbcsr_acc_operations.F
│   │   ├── dbcsr_mm.F
│   │   ├── dbcsr_mm_3d.F
│   │   ├── dbcsr_mm_accdrv.F
│   │   ├── dbcsr_mm_cannon.F
│   │   ├── dbcsr_mm_common.F
│   │   ├── dbcsr_mm_csr.F
│   │   ├── dbcsr_mm_dist_operations.F
│   │   ├── dbcsr_mm_hostdrv.F
│   │   ├── dbcsr_mm_multrec.F
│   │   ├── dbcsr_mm_sched.F
│   │   ├── dbcsr_mm_types.F
│   │   └── dbcsr_multiply_api.F
│   ├── mpi/
│   │   ├── PACKAGE
│   │   ├── dbcsr_mp_methods.F
│   │   ├── dbcsr_mp_operations.F
│   │   ├── dbcsr_mpiwrap.F
│   │   └── dbcsr_mpiwrap.fypp
│   ├── ops/
│   │   ├── PACKAGE
│   │   ├── dbcsr_csr_conversions.F
│   │   ├── dbcsr_io.F
│   │   ├── dbcsr_operations.F
│   │   ├── dbcsr_test_methods.F
│   │   ├── dbcsr_tests.F
│   │   └── dbcsr_transformations.F
│   ├── tas/
│   │   ├── PACKAGE
│   │   ├── dbcsr_tas.fypp
│   │   ├── dbcsr_tas_base.F
│   │   ├── dbcsr_tas_global.F
│   │   ├── dbcsr_tas_io.F
│   │   ├── dbcsr_tas_mm.F
│   │   ├── dbcsr_tas_reshape_ops.F
│   │   ├── dbcsr_tas_split.F
│   │   ├── dbcsr_tas_test.F
│   │   ├── dbcsr_tas_types.F
│   │   └── dbcsr_tas_util.F
│   ├── tensors/
│   │   ├── PACKAGE
│   │   ├── dbcsr_allocate_wrap.F
│   │   ├── dbcsr_array_list_methods.F
│   │   ├── dbcsr_tensor.F
│   │   ├── dbcsr_tensor.fypp
│   │   ├── dbcsr_tensor.h
│   │   ├── dbcsr_tensor_api.F
│   │   ├── dbcsr_tensor_api_c.F
│   │   ├── dbcsr_tensor_block.F
│   │   ├── dbcsr_tensor_index.F
│   │   ├── dbcsr_tensor_io.F
│   │   ├── dbcsr_tensor_reshape.F
│   │   ├── dbcsr_tensor_split.F
│   │   ├── dbcsr_tensor_test.F
│   │   └── dbcsr_tensor_types.F
│   ├── utils/
│   │   ├── PACKAGE
│   │   ├── dbcsr_array_sort.F
│   │   ├── dbcsr_array_sort.fypp
│   │   ├── dbcsr_blas_operations.F
│   │   ├── dbcsr_btree.F
│   │   ├── dbcsr_btree.fypp
│   │   ├── dbcsr_files.F
│   │   ├── dbcsr_hash_table.f90
│   │   ├── dbcsr_hash_table_types.f90
│   │   ├── dbcsr_min_heap.F
│   │   ├── dbcsr_string_utilities.F
│   │   └── dbcsr_toollib.F
│   └── work/
│       ├── PACKAGE
│       └── dbcsr_work_operations.F
├── tests/
│   ├── .gitignore
│   ├── CMakeLists.txt
│   ├── README.md
│   ├── dbcsr_acc_test.c
│   ├── dbcsr_performance_driver.F
│   ├── dbcsr_performance_multiply.F
│   ├── dbcsr_tas_unittest.F
│   ├── dbcsr_tensor_test.cpp
│   ├── dbcsr_tensor_unittest.F
│   ├── dbcsr_test.cpp
│   ├── dbcsr_test_add.F
│   ├── dbcsr_test_csr_conversions.F
│   ├── dbcsr_test_multiply.F
│   ├── dbcsr_test_scale_by_vector.F
│   ├── dbcsr_unittest1.F
│   ├── dbcsr_unittest2.F
│   ├── dbcsr_unittest3.F
│   ├── dbcsr_unittest4.F
│   ├── generate_libsmm_acc_timer_multiply.py
│   ├── generate_libsmm_acc_unittest_multiply.py
│   ├── input.perf
│   ├── inputs/
│   │   ├── test_H2O.perf
│   │   ├── test_rect1_dense.perf
│   │   ├── test_rect1_sparse.perf
│   │   ├── test_rect2_dense.perf
│   │   ├── test_rect2_sparse.perf
│   │   ├── test_singleblock.perf
│   │   ├── test_square_dense.perf
│   │   ├── test_square_sparse.perf
│   │   ├── test_square_sparse_bigblocks.perf
│   │   └── test_square_sparse_rma.perf
│   ├── libsmm_acc_timer_multiply.cpp.template
│   ├── libsmm_acc_unittest_multiply.cpp.template
│   └── libsmm_acc_unittest_transpose.cpp
└── tools/
    ├── build_libsmm/
    │   ├── COPYRIGHT
    │   ├── README
    │   ├── config/
    │   │   ├── cray.cce
    │   │   ├── cray.gnu
    │   │   ├── cray.intel.libsci
    │   │   ├── cray.intel.mkl
    │   │   ├── cray_mic.intel
    │   │   ├── linux.gnu
    │   │   ├── linux.intel
    │   │   ├── local_libxsmm.gnu
    │   │   ├── mic.intel
    │   │   ├── none.wlm
    │   │   ├── pbs.wlm
    │   │   └── slurm.wlm
    │   ├── config.in
    │   ├── generate
    │   ├── generate.bash
    │   ├── lib_gen.f90
    │   ├── make.gen
    │   ├── multrec_gen.f90
    │   ├── mults.f90
    │   ├── small_gen.f90
    │   └── tiny_gen.f90
    ├── docker/
    │   ├── Dockerfile.build-env-latest-gcc
    │   ├── Dockerfile.build-env-rocm
    │   ├── Dockerfile.build-env-ubuntu
    │   ├── Dockerfile.build-env-ubuntu-cuda
    │   ├── Makefile
    │   ├── README.md
    │   └── lsan.supp
    └── fedora/
        ├── dbcsr.rpmlintrc
        └── dbcsr.spec

================================================
FILE CONTENTS
================================================

================================================
FILE: .ccls
================================================
clang
%c -std=c17
%cpp -std=c++17
-Isrc/

================================================
FILE: .clang-format
================================================
---
AlignAfterOpenBracket: DontAlign
AlignEscapedNewlines: DontAlign
AlignTrailingComments: false
AllowShortCaseLabelsOnASingleLine: true
AllowShortIfStatementsOnASingleLine: AllIfsAndElse
AllowShortLoopsOnASingleLine: true
BraceWrapping:
  AfterControlStatement: MultiLine
  BeforeCatch: true
  BeforeElse: true
BreakBeforeBraces: Custom
ColumnLimit: 132
ConstructorInitializerIndentWidth: 0
ContinuationIndentWidth: 2
IndentCaseLabels: true
IndentPPDirectives: AfterHash
IndentWidth: 2
KeepEmptyLinesAtTheStartOfBlocks: false
MaxEmptyLinesToKeep: 2
PenaltyBreakAssignment: 50
PointerAlignment: Left
ReflowComments: false
SortIncludes: false
SpaceAfterTemplateKeyword: false
UseTab: Never

...


================================================
FILE: .cmake-format.py
================================================
# flake8: noqa
with section("format"):
    separate_ctrl_name_with_space = True


================================================
FILE: .codecov.yml
================================================
coverage:
  precision: 1
  round: down
  range: 60..100
comment:
  require_changes: true
  after_n_builds: 12


================================================
FILE: .fortls
================================================
{
  "excl_paths": ["build"]
}

================================================
FILE: .git-blame-ignore-revs
================================================
# git commit hashes with whitespace/reformatting changes only
# Make git-blame use this file by running:
#   git config blame.ignoreRevsFile .git-blame-ignore-revs

21f91d84e3cda3eaf838a3b510077c9e01c8aeb8
fcfa8ae3551ae0d3517db59cbc3ebaea0dccc967
53891420300f402fd70afcbfb00e1b21024040d3
e948f9db58989f081fa0b4e4e4236e06b5566364
972f09eb17c021c4970ddbd08596869b43e88a33
8e647643ad149d2b886c74db1d50376066d65483
8ebeab6fd522ec54e89ab7c6dd14bae36efc8d67
3169190d69019d1221b1fee92e35d2501479f403
df7ea8fb053b17a0bb6f9c3b243652395cf9dbe7
224479c43e58305d2da2f41eff727baca5ef72c8
f4d6abdf94381392a496e736c8684f06664bd9e5
e8e1fe118ce19e6e6942405baeae2f96dbd620cb
fa1838a1e4ae03b87c3c1e4b37ba4dc2b97d75ae
65f3f47c78c51548bb84332f239b79e6c0094d6f


================================================
FILE: .gitattributes
================================================
.gitattributes export-ignore
.gitignore export-ignore
.gitmodules export-ignore
.travis.yml export-ignore
.github export-ignore


================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.md
================================================
---
name: Bug report
about: Create a report to help us improve

---

**Describe the bug**
A clear and concise description of what the bug is, including the stacktrace (if there was one).

**To Reproduce**
Steps to reproduce the behavior:
1. Built with the command: '...'
2. Run like this: '....'
3. On the architecture/host/platform: '....'
4. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Environment:**
 - Operating system & version
 - Compiler vendor & version
 - Build environment (make or cmake)
 - Configuration of DBCSR (either the cmake flags or the `Makefile.inc`)
 - MPI implementation and version
 - If CUDA is being used: CUDA version and GPU architecture
 - BLAS/LAPACK implementation and version
 - If applicable: Runtime information (how many nodes, type of nodes, ...)


================================================
FILE: .github/workflows/doc-generation.yml
================================================
---
name: Generating documentation
on:
  push:
    branches:
    - 'develop'
    tags:
    - 'v*'
  workflow_dispatch:

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/cp2k/dbcsr-build-env-ubuntu-22.04:develop
      volumes:
      - "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro"

    steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0
        submodules: true

    - name: Configure
      run: |
        mkdir -p build
        cd build
        cmake -G Ninja \
          -DUSE_MPI=ON \
          -DBUILD_TESTING=ON \
          -DUSE_OPENMP=ON \
          -DUSE_SMM=libxsmm \
          -DMPI_EXECUTABLE_SUFFIX=.mpich \
          ..

    - name: Build
      run: |
        cmake --build build -- doc
        touch build/doc/.nojekyll

    - name: Configure git to trust the workspace despite the different owner
      run:
        git config --global --add safe.directory "$GITHUB_WORKSPACE"

    - name: Deploy Development Documentation
      if: github.repository == 'cp2k/dbcsr' && github.ref == 'refs/heads/develop'
      uses: JamesIves/github-pages-deploy-action@releases/v4
      with:
        branch: gh-pages
        folder: build/doc
        target-folder: develop
        clean: true
        clean-exclude: |
          releases/
        ssh-key: ${{ secrets.SSH_DEPLOY_KEY }}

    - name: Get the release version
      id: get_version
      run: echo "VERSION=${GITHUB_REF/refs\/tags\/v/}" >> "$GITHUB_OUTPUT"
      shell: bash

    - name: Deploy Release Documentation
      if: github.repository == 'cp2k/dbcsr' && contains(github.ref, 'tags')
      uses: JamesIves/github-pages-deploy-action@releases/v4
      with:
        branch: gh-pages
        folder: build/doc
        target-folder: 'releases/v${{ steps.get_version.outputs.VERSION }}'
        ssh-key: ${{ secrets.SSH_DEPLOY_KEY }}

#  vim: set ts=2 sw=2 tw=0 :


================================================
FILE: .github/workflows/docker-build-env.yml
================================================
---
name: Publish DBCSR Build Environments to the GitHub Container Registry

on:
  push:
    branches:
    - 'develop'
    paths:
    - 'tools/docker/**'
    - '.github/workflows/docker-build-env.yml'
  schedule:  # runs on the last commit of the repo's default branch
    - cron: '45 23 * * *'
  workflow_dispatch:

jobs:
  docker-build-env:
    runs-on: ubuntu-latest
    if: github.repository == 'cp2k/dbcsr' # Only run from main repo
    strategy:
      matrix:
        include:
          - docker_image: ghcr.io/cp2k/dbcsr-build-env-ubuntu-22.04
            context: tools/docker
            file: Dockerfile.build-env-ubuntu
            registry: ghcr.io
          - docker_image: ghcr.io/cp2k/dbcsr-build-env-latest-gcc
            context: tools/docker
            file: Dockerfile.build-env-latest-gcc
            registry: ghcr.io
          - docker_image: ghcr.io/cp2k/dbcsr-build-env-rocm
            context: tools/docker
            file: Dockerfile.build-env-rocm
            registry: ghcr.io
          - docker_image: ghcr.io/cp2k/dbcsr-build-env-ubuntu-22.04-cuda
            context: tools/docker
            file: Dockerfile.build-env-ubuntu-cuda
            registry: ghcr.io

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4

      - name: Prepare
        id: prep
        run: |
          DOCKER_IMAGE=${{ matrix.docker_image }}
          VERSION=latest
          if [[ $GITHUB_REF == refs/tags/* ]]; then
            VERSION=${GITHUB_REF#refs/tags/}
          elif [[ $GITHUB_REF == refs/heads/* ]]; then
            VERSION=$(echo ${GITHUB_REF#refs/heads/} | sed -r 's#/+#-#g')
          elif [[ $GITHUB_REF == refs/pull/* ]]; then
            VERSION=pr-${{ github.event.number }}
          fi
          TAGS="${DOCKER_IMAGE}:${VERSION}"
          if [ "${{ github.event_name }}" = "push" ]; then
            TAGS="$TAGS,${DOCKER_IMAGE}:sha-${GITHUB_SHA::8}"
          fi
          echo "version=${VERSION}" >> $GITHUB_OUTPUT
          echo "tags=${TAGS}" >> $GITHUB_OUTPUT
          echo "created=$(date -u +'%Y-%m-%dT%H:%M:%SZ')" >> $GITHUB_OUTPUT

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to Container registry
        uses: docker/login-action@v3
        with:
          registry: ${{ matrix.registry }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push container image
        uses: docker/build-push-action@v5
        with:
          context: ${{ matrix.context }}
          file: ${{ matrix.context }}/${{ matrix.file }}
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.prep.outputs.tags }}
          labels: |
            org.opencontainers.image.source=${{ github.event.repository.html_url }}
            org.opencontainers.image.created=${{ steps.prep.outputs.created }}
            org.opencontainers.image.revision=${{ github.sha }}
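
The `Prepare` step in the workflow above derives the image tags from the Git ref using shell pattern matching. The Python sketch below mirrors that logic for illustration only. The function name `derive_tags` and its parameters are hypothetical and do not exist in the repository.

```python
import re


def derive_tags(image, ref, sha, event_name, pr_number=None):
    """Mirror the shell logic of the `Prepare` step (illustrative sketch).

    Returns (version, tags), where tags is the list that the workflow
    joins with commas into its `tags` output.
    """
    version = "latest"
    if ref.startswith("refs/tags/"):
        # VERSION=${GITHUB_REF#refs/tags/}
        version = ref[len("refs/tags/"):]
    elif ref.startswith("refs/heads/"):
        # Strip the prefix, then collapse every run of '/' into '-'
        # (the sed expression s#/+#-#g in the workflow).
        version = re.sub(r"/+", "-", ref[len("refs/heads/"):])
    elif ref.startswith("refs/pull/"):
        version = f"pr-{pr_number}"

    tags = [f"{image}:{version}"]
    if event_name == "push":
        # Push events additionally get a tag with the first 8 SHA characters.
        tags.append(f"{image}:sha-{sha[:8]}")
    return version, tags
```

Under these assumptions, a push to `develop` yields one tag named after the branch and one `sha-` tag, while scheduled and manual runs yield only the branch tag.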


================================================
FILE: .github/workflows/release.yml
================================================
---
name: Create release
on:
  push:
    tags:
    - 'v*'

jobs:
  build-and-upload:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/cp2k/dbcsr-build-env-ubuntu-22.04:develop

    steps:
    - uses: actions/checkout@v4
      with:
        submodules: true

    - name: Configure
      run: |
        mkdir -p build
        cd build
        cmake -G Ninja \
          -DUSE_MPI=ON \
          -DBUILD_TESTING=ON \
          -DUSE_OPENMP=ON \
          -DUSE_SMM=libxsmm \
          -DMPI_EXECUTABLE_SUFFIX=.mpich \
          ..

    - name: Configure git to trust the workspace despite the different owner
      run:
        git config --global --add safe.directory "$GITHUB_WORKSPACE"

    - name: Build Release Asset
      run: cmake --build build -- dist

    - name: Get the release version
      id: get_version
      run: echo "VERSION=${GITHUB_REF/refs\/tags\/v/}" >> "$GITHUB_OUTPUT"
      shell: bash

    - name: Create Release
      id: create_release
      uses: actions/create-release@latest
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      with:
        tag_name: ${{ github.ref }}
        release_name: Release ${{ github.ref }}
        draft: true
        prerelease: true

    - name: Upload Release Asset
      uses: actions/upload-release-asset@v1
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      with:
        upload_url: ${{ steps.create_release.outputs.upload_url }}
        asset_path: ./build/dist/dbcsr-${{ steps.get_version.outputs.VERSION }}.tar.gz
        asset_name: dbcsr-${{ steps.get_version.outputs.VERSION }}.tar.gz
        asset_content_type: application/gzip

#  vim: set ts=2 sw=2 tw=0 :
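
The release workflow above strips `refs/tags/v` from the Git ref with the Bash substitution `${GITHUB_REF/refs\/tags\/v/}` and uses the result in the asset path. A minimal Python sketch of that mapping follows. The helper name `release_asset` is hypothetical, and the example tag in the usage note is invented for illustration.

```python
def release_asset(ref):
    """Return (version, asset_path, asset_name) for a tag ref (illustrative).

    Bash's ${VAR/pattern/} removes only the first match of the pattern,
    which str.replace with count=1 reproduces.
    """
    version = ref.replace("refs/tags/v", "", 1)
    asset_name = f"dbcsr-{version}.tar.gz"
    asset_path = f"./build/dist/{asset_name}"
    return version, asset_path, asset_name
```

For a hypothetical ref `refs/tags/v1.2.3`, this gives the version `1.2.3` and the asset name `dbcsr-1.2.3.tar.gz`.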


================================================
FILE: .github/workflows/testing-gcc.yml
================================================
---
name: Testing with latest gcc
on:
  push:
    branches:
    - 'develop'
  pull_request:

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/cp2k/dbcsr-build-env-latest-gcc:develop

    steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0
        submodules: true

    - name: Configure
      run: |
        mkdir -p build
        cd build
        cmake -G Ninja \
          -DCMAKE_BUILD_TYPE=Debug \
          -DBUILD_TESTING=ON \
          -DUSE_MPI=OFF \
          -DUSE_OPENMP=ON \
          -DUSE_SMM=blas \
          -DUSE_MPI_F08=ON \
          ..

    - name: Build
      run: cmake --build build -- --verbose

    - name: Test
      run: |
        export LSAN_OPTIONS=suppressions=$PWD/tools/docker/lsan.supp
        cd build
        ctest --output-on-failure

#  vim: set ts=2 sw=2 tw=0 :


================================================
FILE: .github/workflows/testing-linux.yml
================================================
---
name: Testing on Linux
on:
  push:
    branches:
    - 'develop'
  pull_request:

jobs:
  ##################################################################################
  # Run pre-commit
  ##################################################################################
  pre-commit:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/cp2k/dbcsr-build-env-ubuntu-22.04:develop
    steps:
    - uses: actions/checkout@v4
    - name: Run pre-commit
      run: |
        git config --global --add safe.directory "$GITHUB_WORKSPACE"
        pre-commit run --all-files || ( git status --short ; git diff ; exit 1 )

  ##################################################################################
  # Build and test on linux, no accelerator
  ##################################################################################
  build-and-test:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/cp2k/dbcsr-build-env-ubuntu-22.04:develop

    strategy:
      matrix:
        use_mpi: [MPI=ON, MPI=OFF]
        use_openmp: [OPENMP=ON, OPENMP=OFF]
        use_smm: [SMM=blas, SMM=libxsmm]
        mpi_suffix: [openmpi, mpich]
        exclude:
          - use_mpi: MPI=OFF
            mpi_suffix: mpich

    steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0
        submodules: true

    - name: Configure
      run: |
        mkdir -p build
        cd build
        cmake -G Ninja \
          -DCMAKE_BUILD_TYPE=Coverage \
          -DBUILD_TESTING=ON \
          -DUSE_${{ matrix.use_mpi }} \
          -DUSE_${{ matrix.use_openmp }} \
          -DUSE_${{ matrix.use_smm }} \
          -DMPI_EXECUTABLE_SUFFIX=.${{ matrix.mpi_suffix }} \
          -DMPIEXEC_PREFLAGS="$([ "${{ matrix.mpi_suffix }}" = "openmpi" ] && echo "-mca btl ^openib --allow-run-as-root --oversubscribe")" \
          -DLCOV_ARGS="--test-name;${{ matrix.use_mpi }}-${{ matrix.use_openmp }}-${{ matrix.use_smm }}-cpu" \
          -DTEST_MPI_RANKS=auto \
          ..

    - name: Build
      run: cmake --build build -- --verbose

    - name: Test
      run: |
        cd build
        ctest --output-on-failure

    - name: Generate coverage info
      run: |
        cmake --build build -- cov-info
        mv build/coverage.info build/coverage-Linux-${{ matrix.use_mpi }}-${{ matrix.use_openmp }}-${{ matrix.use_smm }}-cpu.info

    - name: Upload coverage data
      uses: actions/upload-artifact@v4
      with:
        name: coverage-data-${{ matrix.use_mpi }}-${{ matrix.use_openmp }}-${{ matrix.use_smm }}-${{ matrix.mpi_suffix }}
        path: build/coverage-*.info

    - name: Upload coverage data (generated files)
      uses: actions/upload-artifact@v4
      if: matrix.use_mpi == 'MPI=ON' && matrix.use_openmp == 'OPENMP=ON' && matrix.use_smm == 'SMM=blas' && matrix.mpi_suffix == 'openmpi'
      with:
        name: coverage-data-${{ matrix.use_mpi }}-${{ matrix.use_openmp }}-${{ matrix.use_smm }}-${{ matrix.mpi_suffix }}-generated-files
        path: |
          build/src/dbcsr.h
          build/src/tensors/dbcsr_tensor.h

  ##################################################################################
  # Build on CUDA
  ##################################################################################
  build-on-cuda:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/cp2k/dbcsr-build-env-ubuntu-22.04-cuda:develop

    strategy:
      matrix:
        use_mpi: [MPI=ON, MPI=OFF]
        use_openmp: [OPENMP=ON]
        mpi_suffix: [mpich]

    steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0
        submodules: true

    - name: Configure
      run: |
        mkdir -p build
        cd build
        cmake -G Ninja \
          -DCMAKE_BUILD_TYPE=Debug \
          -DBUILD_TESTING=ON \
          -DUSE_${{ matrix.use_mpi }} \
          -DUSE_${{ matrix.use_openmp }} \
          -DUSE_ACCEL=cuda \
          -DWITH_GPU=H100 \
          -DWITH_EXAMPLES=ON \
          -DWITH_CUDA_PROFILING=ON \
          ..
    - name: Build
      run: cmake --build build -- --verbose

  ##################################################################################
  # Build on OpenCL
  ##################################################################################
  build-on-opencl:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/cp2k/dbcsr-build-env-ubuntu-22.04-cuda:develop

    strategy:
      matrix:
        use_openmp: [OPENMP=ON]
        use_smm: [SMM=libxsmm]

    steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0
        submodules: true

    - name: Configure
      run: |
        mkdir -p build
        cd build
        cmake -G Ninja \
          -DCMAKE_BUILD_TYPE=Debug \
          -DBUILD_TESTING=ON \
          -DUSE_${{ matrix.use_openmp }} \
          -DUSE_${{ matrix.use_smm }} \
          -DUSE_ACCEL=opencl \
          -DWITH_EXAMPLES=ON \
          ..
    - name: Build
      run: cmake --build build -- --verbose

  ##################################################################################
  # Build on ROCm
  ##################################################################################
  build-on-rocm:
    runs-on: ubuntu-latest
    container:
      image: ghcr.io/cp2k/dbcsr-build-env-rocm:develop

    strategy:
      matrix:
        use_mpi: [MPI=ON, MPI=OFF]
        use_openmp: [OPENMP=ON]
        mpi_suffix: [mpich]

    steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0
        submodules: true

    - name: Configure
      run: |
        mkdir -p build
        cd build
        cmake -G Ninja \
          -DCMAKE_BUILD_TYPE=Release \
          -DBUILD_TESTING=ON \
          -DUSE_${{ matrix.use_mpi }} \
          -DUSE_${{ matrix.use_openmp }} \
          -DUSE_ACCEL=hip \
          -DWITH_GPU=Mi250 \
          -DWITH_EXAMPLES=ON \
          -DCMAKE_PREFIX_PATH=/opt/rocm \
          ..
    - name: Build
      run: cmake --build build -- --verbose

  coverage:
    name: Combine & check coverage.
    runs-on: ubuntu-latest
    needs: build-and-test
    container:
      image: ghcr.io/cp2k/dbcsr-build-env-ubuntu-22.04:develop

    steps:
      - uses: actions/checkout@v4

      - name: Download coverage data
        uses: actions/download-artifact@v4.1.7
        with:
          pattern: coverage-data-*
          merge-multiple: true

      - name: Combine coverage
        run: |
          mkdir -p build/src
          mv dbcsr.h tensors build/src/
          echo *.info | xargs printf -- '-a %s\n' | xargs lcov -o merged.info
          genhtml merged.info -o htmlcov
          lcov --summary merged.info

      - name: Upload merged HTML report
        uses: actions/upload-artifact@v4
        with:
          name: html-report
          path: htmlcov

#  vim: set ts=2 sw=2 tw=0 :


================================================
FILE: .github/workflows/testing-macos.yml
================================================
---
name: Testing on macOS
on:
  push:
    branches:
    - 'develop'
  pull_request:

jobs:
  build-and-test:
    runs-on: macos-latest

    strategy:
      matrix:
        use_mpi: [MPI=ON]
        use_openmp: [OPENMP=ON]
        use_smm: [SMM=blas]
        blas_impl: [accelerate,openblas]
        mpi_suffix: [mpich] # Brew openmpi doesn't provide mpi.mod

    steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0
        submodules: true

    - name: Install common dependencies
      run: |
        env HOMEBREW_NO_AUTO_UPDATE=1 brew install \
          ninja

    - name: Install ${{ matrix.mpi_suffix }}
      run: |
        env HOMEBREW_NO_AUTO_UPDATE=1 brew install ${{ matrix.mpi_suffix }}

    - name: Configure
      run: |
        mkdir -p build
        cd build
        env \
          CC=gcc-15 CXX=g++-15 FC=gfortran-15 \
          cmake -G Ninja \
          -DCMAKE_BUILD_TYPE=Release \
          -DBUILD_TESTING=ON \
          -DUSE_${{ matrix.use_mpi }} \
          -DUSE_${{ matrix.use_openmp }} \
          -DUSE_${{ matrix.use_smm }} \
          $([ "${{ matrix.blas_impl }}" = "openblas" ] && echo '-DCMAKE_PREFIX_PATH=/usr/local/opt/openblas') \
          -DMPIEXEC_PREFLAGS="$([ "${{ matrix.mpi_suffix }}" = "openmpi" ] && echo "-mca btl ^openib --allow-run-as-root")" \
          -DTEST_MPI_RANKS=auto \
          ..

    - name: Build
      run: cmake --build build -- --verbose

    - name: Test
      run: |
        cd build
        ctest --output-on-failure

#  vim: set ts=2 sw=2 tw=0 :


================================================
FILE: .gitignore
================================================
# ignore project specific locations & files
/lib/
/obj/
/bin/
/doc/
/install/
*.callgraph

# exclude personal makefile
/Makefile.inc.*

# The following covers some more,
# Created by https://www.gitignore.io/api/vim,emacs,python,fortran

### Emacs ###
# -*- mode: gitignore; -*-
*~
\#*\#
/.emacs.desktop
/.emacs.desktop.lock
*.elc
auto-save-list
tramp
.\#*

# Org-mode
.org-id-locations
*_archive

# flymake-mode
*_flymake.*

# eshell files
/eshell/history
/eshell/lastdir

# elpa packages
/elpa/

# reftex files
*.rel

# AUCTeX auto folder
/auto/

# cask packages
.cask/
dist/

# Flycheck
flycheck_*.el

# server auth directory
/server/

# projectiles files
.projectile
projectile-bookmarks.eld

# directory configuration
.dir-locals.el

# saveplace
places

# url cache
url/cache/

# cedet
ede-projects.el

# smex
smex-items

# company-statistics
company-statistics-cache.el

# anaconda-mode
anaconda-mode/

### Fortran ###
# Prerequisites
*.d

# Compiled Object files
*.slo
*.lo
*.o
*.obj

# Precompiled Headers
*.gch
*.pch

# Compiled Dynamic libraries
*.so
*.dylib
*.dll

# Fortran module files
*.mod
*.smod

# Compiled Static libraries
*.lai
*.la
*.a
*.lib

# Executables
*.exe
*.out
*.app

### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions

# Distribution / packaging
.Python
build/
develop-eggs/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
.pytest_cache/
nosetests.xml
coverage.xml
*.cover
.hypothesis/

# Translations
*.mo
*.pot

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule.*

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

### Vim ###
# swap
.sw[a-p]
.*.sw[a-p]
# session
Session.vim
# temporary
.netrwhist
# auto-generated tag files
tags
.tags
.tags_swap

# End of https://www.gitignore.io/api/vim,emacs,python,fortran

spack-*

.ccls-cache/

.DS_Store

BUILD/
!tools/fedora/dbcsr.spec


================================================
FILE: .gitmodules
================================================
[submodule "tools/build_utils/fypp"]
	path = tools/build_utils/fypp
	url = https://github.com/aradi/fypp.git


================================================
FILE: .packit.yaml
================================================
specfile_path: tools/fedora/dbcsr.spec
files_to_sync:
  - src: tools/fedora/
    dest: ./
    delete: true
    filters:
      - "protect .git*"
      - "protect sources"
      - "protect changelog"
  - .packit.yaml
upstream_package_name: dbcsr
downstream_package_name: dbcsr
upstream_tag_template: v{version}

targets:
  - fedora-development-x86_64
  - fedora-development-aarch64

_:
  # Job templates
  - &build-in-packit
    job: copr_build
  - &build-at-lecris
    <<: *build-in-packit
    owner: lecris

jobs:
  - <<: *build-at-lecris
    trigger: release
    project: release
  - <<: *build-at-lecris
    trigger: commit
    branch: master
    project: nightly
  - <<: *build-in-packit
    trigger: pull_request
  - job: propose_downstream
    trigger: release
    dist_git_branches:
      - fedora-rawhide
  - job: koji_build
    trigger: commit
    dist_git_branches:
      - fedora-all
  - job: bodhi_update
    trigger: commit
    dist_git_branches:
      - fedora-branched


================================================
FILE: .pre-commit/check_header.py
================================================
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
####################################################################################################
# Copyright (C) by the DBCSR developers group - All rights reserved                                #
# This file is part of the DBCSR library.                                                          #
#                                                                                                  #
# For information on the license, see the LICENSE file.                                            #
# For further information please visit https://dbcsr.cp2k.org                                      #
# SPDX-License-Identifier: GPL-2.0+                                                                #
####################################################################################################

import argparse
import re
import mmap
import sys
import pathlib
from collections import defaultdict
from os import path, listdir
from contextlib import contextmanager

TYPES = {
    "c_cpp": [".c", ".h", ".cc", ".hh", ".cxx", ".hxx", ".cpp", ".hpp", ".cu", ".cl"],
    "fortran": [".F", ".f", ".f90", ".f03"],
    "script": [".py", ".sh"],
    "fypp": [".fypp"],
}

# max number of lines allowed between header and top of file
ALLOWED_LINES = 5

# some assumed max line length to terminate early for large files
MAX_LINE_LENGTH = 128
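# Illustrative arithmetic (the 800-byte header is a hypothetical size): the
# header search in check_header() stops after
# ALLOWED_LINES * MAX_LINE_LENGTH + header length bytes, i.e. after
# 5 * 128 + 800 = 1440 bytes for an 800-byte header.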


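@contextmanager
def mmap_open(name, mode="r"):
    """Memory-map the file `name` and release the mapping on exit."""
    access = mmap.ACCESS_READ if mode == "r" else mmap.ACCESS_WRITE
    with open(name, mode + "b") as fhandle:
        fmapped = mmap.mmap(fhandle.fileno(), 0, access=access)
        try:
            yield fmapped
        finally:
            # close the mapping even if the caller raises
            fmapped.close()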


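def check_header(header_dir, files, verbose=False):
    """Check that each file carries one of the license headers for its type.

    Header templates are read from `header_dir`. The process exits with
    status 1 if any file lacks a matching header or has more than
    ALLOWED_LINES lines above it, and with status 0 otherwise.
    """
    retval = 0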
    header_re = defaultdict(list)
    header_len = defaultdict(list)

    for headerfile in listdir(header_dir):
        headertype = pathlib.Path(headerfile).stem
        if headertype in TYPES:
            with open(path.join(header_dir, headerfile), "rb") as fhandle:
                header_content = fhandle.read()
                header_re[headertype].append(re.compile(re.escape(header_content)))
                header_len[headertype].append(len(header_content))
        else:
            print("header file {} matches no known file type".format(headerfile))
            sys.exit(1)

    ext_map = {e: t for t, exts in TYPES.items() for e in exts}

    for fpath in files:
        _, fext = path.splitext(fpath)

        if fext not in ext_map:
            if verbose:
                print("? {} ... unknown file type, ignoring".format(fpath))
            continue

        with mmap_open(fpath) as fmapped:
            header_type = ext_map[fext]
            match = None  # reset per file; stays None without header templates
            for h_re, h_len in zip(header_re[header_type], header_len[header_type]):
                match = h_re.search(fmapped, 0, ALLOWED_LINES * MAX_LINE_LENGTH + h_len)
                if match:
                    break

            if not match:
                print("✗ {} ... required header not found".format(fpath))
                retval = 1
                continue

            lines_above = fmapped[0 : match.start()].splitlines()
            if len(lines_above) > ALLOWED_LINES:
                print(
                    "✗ {} ... header not within first {} lines".format(
                        fpath, ALLOWED_LINES
                    )
                )
                retval = 1
                continue

        if verbose:
            print("✓ {}".format(fpath))

    sys.exit(retval)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Check files for header presence")
    parser.add_argument(
        "files", metavar="FILE", type=str, nargs="+", help="files to check"
    )
    parser.add_argument("--verbose", "-v", action="store_true", default=False)
    args = parser.parse_args()

    header_dir = path.join(path.dirname(path.abspath(__file__)), "headers")
    check_header(header_dir, args.files, args.verbose)


================================================
FILE: .pre-commit/clang-format-fypp.sh
================================================
#!/usr/bin/env bash
####################################################################################################
# Copyright (C) by the DBCSR developers group - All rights reserved                                #
# This file is part of the DBCSR library.                                                          #
#                                                                                                  #
# For information on the license, see the LICENSE file.                                            #
# For further information please visit https://dbcsr.cp2k.org                                      #
# SPDX-License-Identifier: GPL-2.0+                                                                #
####################################################################################################

# clang-format changes FYPP directives; those changes need to be reverted.

function sed_darwin()
{
    sed -i "" "$@"
}

function sed_linux()
{
    sed -i "$@"
}

function main()
{
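    # Collect the file arguments; everything starting with '-' is an option.
    local files=()
    for i in "$@"; do
    case $i in
      -*)
      ;;
      *)
          files+=("$i")
      ;;
    esac
    done

    clang-format "$@"
    # Fix FYPP directives
    uname="$(uname -s)"
    case "${uname}" in
        Darwin*)
            sed_fcn=sed_darwin
        ;;
        *)
            sed_fcn=sed_linux
        ;;
    esac

    for i in "${files[@]}"; do
        # rejoin a '${' left at the end of a line with the following line
        ${sed_fcn} -e '/\${$/ { N; s/\${\n[[:space:]]*/\${/; }' "$i"
        # restore '#:' directives and drop the space inserted between '}' and '$'
        ${sed_fcn} -e 's/#[[:space:]]*: /#:/g' -e 's/} \$/}\$/g' "$i"
    done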
}

main "$@"


================================================
FILE: .pre-commit/headers/c_cpp.1
================================================
/*------------------------------------------------------------------------------------------------*/
/* Copyright (C) by the DBCSR developers group - All rights reserved                              */
/* This file is part of the DBCSR library.                                                        */
/*                                                                                                */
/* For information on the license, see the LICENSE file.                                          */
/* For further information please visit https://dbcsr.cp2k.org                                    */
/* SPDX-License-Identifier: GPL-2.0+                                                              */
/*------------------------------------------------------------------------------------------------*/


================================================
FILE: .pre-commit/headers/c_cpp.2
================================================
/*------------------------------------------------------------------------------------------------*/
/* Copyright (C) by the DBCSR developers group - All rights reserved                              */
/* Copyright (C) 2022 Advanced Micro Devices, Inc. - All rights reserved                          */
/* This file is part of the DBCSR library.                                                        */
/*                                                                                                */
/* For information on the license, see the LICENSE file.                                          */
/* For further information please visit https://dbcsr.cp2k.org                                    */
/* SPDX-License-Identifier: GPL-2.0+                                                              */
/*------------------------------------------------------------------------------------------------*/


================================================
FILE: .pre-commit/headers/c_cpp.3
================================================
/*------------------------------------------------------------------------------------------------*/
/* Copyright (C) by the DBCSR developers group - All rights reserved                              */
/* This file is part of the DBCSR library.                                                        */
/*                                                                                                */
/* For information on the license, see the LICENSE file.                                          */
/* For further information please visit https://dbcsr.cp2k.org                                    */
/* SPDX-License-Identifier: BSD-3-Clause                                                          */
/*------------------------------------------------------------------------------------------------*/


================================================
FILE: .pre-commit/headers/fortran.1
================================================
!--------------------------------------------------------------------------------------------------!
! Copyright (C) by the DBCSR developers group - All rights reserved                                !
! This file is part of the DBCSR library.                                                          !
!                                                                                                  !
! For information on the license, see the LICENSE file.                                            !
! For further information please visit https://dbcsr.cp2k.org                                      !
! SPDX-License-Identifier: GPL-2.0+                                                                !
!--------------------------------------------------------------------------------------------------!


================================================
FILE: .pre-commit/headers/fortran.2
================================================
!--------------------------------------------------------------------------------------------------!
! Copyright (C) by the DBCSR developers group - All rights reserved                                !
! Copyright (C) 2022 Advanced Micro Devices, Inc. - All rights reserved                            !
! This file is part of the DBCSR library.                                                          !
!                                                                                                  !
! For information on the license, see the LICENSE file.                                            !
! For further information please visit https://dbcsr.cp2k.org                                      !
! SPDX-License-Identifier: GPL-2.0+                                                                !
!--------------------------------------------------------------------------------------------------!


================================================
FILE: .pre-commit/headers/fypp.1
================================================
#!--------------------------------------------------------------------------------------------------!
#! Copyright (C) by the DBCSR developers group - All rights reserved                                !
#! This file is part of the DBCSR library.                                                          !
#!                                                                                                  !
#! For information on the license, see the LICENSE file.                                            !
#! For further information please visit https://dbcsr.cp2k.org                                      !
#! SPDX-License-Identifier: GPL-2.0+                                                                !
#!--------------------------------------------------------------------------------------------------!


================================================
FILE: .pre-commit/headers/script.1
================================================
####################################################################################################
# Copyright (C) by the DBCSR developers group - All rights reserved                                #
# This file is part of the DBCSR library.                                                          #
#                                                                                                  #
# For information on the license, see the LICENSE file.                                            #
# For further information please visit https://dbcsr.cp2k.org                                      #
# SPDX-License-Identifier: GPL-2.0+                                                                #
####################################################################################################


================================================
FILE: .pre-commit/headers/script.2
================================================
####################################################################################################
# Copyright (C) by the DBCSR developers group - All rights reserved                                #
# This file is part of the DBCSR library.                                                          #
#                                                                                                  #
# For information on the license, see the LICENSE file.                                            #
# For further information please visit https://dbcsr.cp2k.org                                      #
# SPDX-License-Identifier: BSD-3-Clause                                                            #
####################################################################################################


================================================
FILE: .pre-commit-config.yaml
================================================
default_language_version:
    python: python3

exclude: '^tools/(build_utils/fypp)'
fail_fast: false
minimum_pre_commit_version: 3.2.0
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
  rev: 'v0.15.13'
  hooks:
  - id: ruff
    args: [ --fix, --exit-non-zero-on-fix ]
    exclude: >-
      (?x)^(
        .cp2k/.*|
      )$
- repo: https://github.com/psf/black-pre-commit-mirror
  rev: 26.5.0
  hooks:
  - id: black
    name: Reformat Python files with the black code formatter
    files: '^.*(/PACKAGE)|(\.py)$'
- repo: https://github.com/pre-commit/pre-commit-hooks
  rev: v6.0.0
  hooks:
  - id: check-ast
  - id: check-yaml
  - id: check-symlinks
  - id: trailing-whitespace
- repo: https://github.com/fortran-lang/fprettify
  rev: v0.3.7
  hooks:
  - id: fprettify
- repo: https://github.com/cheshirekow/cmake-format-precommit
  rev: v0.6.13
  hooks:
  - id: cmake-format
    exclude: >-
      (?x)^(
        cmake/(CheckFortranSourceRuns|CompilerConfiguration|Find(BLAS|LAPACK)|GetGitRevisionDescription).cmake|
      )$
- repo: local
  hooks:
  - id: check-header
    name: check file headers
    entry: ./.pre-commit/check_header.py --verbose
    language: script
    types: [text]
    exclude: >-
      (?x)^(
        tools/.*|
        .cp2k/.*|
        .cmake-format.py|
        src/acc/hip/dbcsr_hip_profiling.F|
      )$
  - id: check-doxygen-tags
    name: no doxygen tags present
    entry: '^\s*!>'
    language: pygrep
    types: [text]
  - id: clang-format-fypp
    name: clang-format-fypp
    description: Format files with ClangFormat, ignore FYPP directives.
    entry: ./.pre-commit/clang-format-fypp.sh
    language: python
    files: \.(c|cc|cxx|cpp|cl|frag|glsl|h|hpp|hxx|ih|ispc|ipp|java|js|m|mm|proto|textproto|vert)$
    args: ['-i', '-fallback-style=none', '--style=file']
    # specify version since clang-format is not stable version-to-version
    additional_dependencies: ['clang-format~=19.1.0']


================================================
FILE: .ruff.toml
================================================
select = ["E", "F", "B"]
line-length = 128
ignore = ["B905"]


================================================
FILE: AUTHORS
================================================
Alfio Lazzaro <alfio.lazzaro@gmail.com>
Andreas Gloeß <agloess@cp2k.org>
Christian Pousa <pousa@cp2k.org>
Dorothea Golze <dorotheagolze@cp2k.org>
Fawzi Mohamed <fawzi@cp2k.org>
Florian Schiffmann <fschiff@cp2k.org>
Gina Sitaraman <gina.sitaraman@amd.com>
Harald Forbert <hforbert@cp2k.org>
H. Bani-Hashemian <hbani@cp2k.org>
Iain Bethune <ibethune@cp2k.org>
Ilia Sivkov <ilia.sivkov@chem.uzh.ch>
Jan Wilhelm <jwilhelm@cp2k.org>
Joost VandeVondele <joost.vandevondele@cscs.ch>
Juerg Hutter <hutter@chem.uzh.ch>
Leopold Grinberg <leopold.grinberg@amd.com>
Lianheng Tong <ltong@cp2k.org>
Marcella Mauri-Iannuzzi <marcella@cp2k.org>
Matthias Krack <mkrack@cp2k.org>
Maximilien Ambroise <maximilien.ambroise@gmail.com>
Nico Holmberg <nholmberg@cp2k.org>
Ole Schuett <ole.schuett@cp2k.org>
Patrick Seewald <pseewald@cp2k.org>
Samuel Andermatt <sandermatt@cp2k.org>
Sergey Chulkov <schulkov@cp2k.org>
Shoshana Alice Jakobovits <jakobovits@cscs.ch>
Teodoro Laino <tlaino@cp2k.org>
Thomas Chassaing <tchassai@cp2k.org>
Urban Borstnik <uborstnik@cp2k.org>
Valery Weber <vweber@cp2k.org>
Vladimir Rybkin <rybkinjr@cp2k.org>


================================================
FILE: CMakeLists.txt
================================================
cmake_minimum_required(VERSION 3.22)

set(CMAKE_INTERPROCEDURAL_OPTIMIZATION FALSE FORCE)

# include our cmake snippets
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} ${CMAKE_CURRENT_SOURCE_DIR}/cmake)
# DBCSR's source directory
set(DBCSR_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/src)

# =================================================================================================
# REQUIRE OUT-OF-SOURCE BUILDS
file(TO_CMAKE_PATH "${PROJECT_BINARY_DIR}/CMakeLists.txt" LOC_PATH)
if (EXISTS "${LOC_PATH}")
  message(
    FATAL_ERROR
      "You cannot build in a source directory (or any directory with a CMakeLists.txt file). Please make a build subdirectory."
  )
endif ()

# =================================================================================================
# PROJECT AND VERSION
include(GetGitRevisionDescription)

git_describe(GIT_DESC)

if (GIT_DESC)
  string(REGEX REPLACE "^v([0-9]+)\\..*" "\\1" VERSION_MAJOR "${GIT_DESC}")
  string(REGEX REPLACE "^v[0-9]+\\.([0-9]+).*" "\\1" VERSION_MINOR
                       "${GIT_DESC}")
  string(REGEX REPLACE "^v[0-9]+\\.[0-9]+\\.([0-9]+).*" "\\1" VERSION_PATCH
                       "${GIT_DESC}")
  string(REGEX REPLACE "^v[0-9]+\\.[0-9]+\\.[0-9]+(.*)" "\\1" VERSION_GIT
                       "${GIT_DESC}")

  git_local_changes(GIT_STATE)
  if ("${GIT_STATE}" STREQUAL "DIRTY")
    set(VERSION_GIT "${VERSION_GIT}-dirty")
  endif ()

  execute_process(
    COMMAND git log -1 --format=%ai
    WORKING_DIRECTORY ${CMAKE_SOURCE_DIR}
    OUTPUT_VARIABLE GIT_COMMIT_DATE
    OUTPUT_STRIP_TRAILING_WHITESPACE)

  # take only the date from the git timestamp:
  string(REGEX REPLACE "^([0-9\\-]+) .*" "\\1" VERSION_DATE
                       "${GIT_COMMIT_DATE}")
else ()
  file(STRINGS VERSION VERSION_INFO)
  foreach (line ${VERSION_INFO})
    if (${line} MATCHES "^([^#].*)=[ \t]*(.*)$")
      set(key ${CMAKE_MATCH_1})
      set(value ${CMAKE_MATCH_2})
      string(REGEX REPLACE "[ \t\n]+$" "" key "${key}")
      string(REGEX REPLACE "[ \t\n]+$" "" value "${value}")
      set(VERSION_${key} "${value}")
      continue()
    endif ()
  endforeach ()
endif ()
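
# Worked example of the git-based parsing above, assuming a hypothetical
# `git describe` result of "v2.6.0-12-gabcdef" in a clean checkout:
#   VERSION_MAJOR = 2, VERSION_MINOR = 6, VERSION_PATCH = 0,
#   VERSION_GIT = "-12-gabcdef"
# so that dbcsr_VERSION below would become 2.6.0-12-gabcdef.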

project(
  dbcsr
  DESCRIPTION
    "DBCSR: Distributed Block Compressed Sparse Row matrix library (https://dbcsr.cp2k.org)"
)
set(dbcsr_VERSION
    ${VERSION_MAJOR}.${VERSION_MINOR}.${VERSION_PATCH}${VERSION_GIT})
set(dbcsr_APIVERSION ${VERSION_MAJOR}.${VERSION_MINOR})

# =================================================================================================
# OPTIONS
include(CMakeDependentOption)

option(BUILD_SHARED_LIBS "Build shared libraries" OFF)
option(USE_OPENMP "Build with OpenMP support" ON)
option(USE_MPI "Build with MPI support" ON)
option(USE_MPI_F08 "Build with the mpi_f08 module support" OFF)
option(BUILD_TESTING "Build dbcsr unit tests" OFF)
cmake_dependent_option(
  WITH_C_API "Build the C API (ISO_C_BINDINGS)" ON "USE_MPI" OFF
)# the ISO_C_BINDINGS require MPI unconditionally
cmake_dependent_option(WITH_EXAMPLES "Build the examples" ON "USE_MPI" OFF
)# all examples require MPI

set(TEST_MPI_RANKS
    2
    CACHE STRING "Number of MPI ranks for testing")
set(TEST_OMP_THREADS
    2
    CACHE STRING "Number of OpenMP threads for testing")

set(USE_SMM
    "auto"
    CACHE STRING
          "Small Matrix Multiplication implementation to use (default: auto)")
set_property(CACHE USE_SMM PROPERTY STRINGS auto blas libxsmm)

set(USE_ACCEL
    ""
    CACHE STRING "Build with acceleration support (default: none)")
set_property(CACHE USE_ACCEL PROPERTY STRINGS "" opencl cuda hip)

set(SUPPORTED_CUDA_ARCHITECTURES
    K20X
    K40
    K80
    P100
    V100
    A100
    H100)
set(SUPPORTED_HIP_ARCHITECTURES Mi50 Mi100 Mi250 Mi300 Mi350)
set(WITH_GPU
    $<IF:$<STREQUAL:${USE_ACCEL},"opencl">,"","P100">
    CACHE
      STRING
      "Select GPU arch. and embed parameters (default: CUDA/HIP=P100, OPENCL=all)"
)
set(WITH_GPU_PARAMS "${WITH_GPU}")
set_property(CACHE WITH_GPU PROPERTY STRINGS ${SUPPORTED_CUDA_ARCHITECTURES}
                                     ${SUPPORTED_HIP_ARCHITECTURES})

option(WITH_CUDA_PROFILING "Enable profiling within CUDA" OFF)
option(WITH_HIP_PROFILING "Enable profiling within HIP" OFF)
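
# Example configure call combining the options above (a sketch only; the build
# directory name "build" is arbitrary and Mi250 is one of the supported HIP
# architectures listed above):
#   cmake -S . -B build -DUSE_ACCEL=hip -DWITH_GPU=Mi250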

# =================================================================================================
# LANGUAGES AND TESTING
enable_language(Fortran)

if ((WITH_C_API AND WITH_EXAMPLES) OR (NOT USE_ACCEL MATCHES "none"))
  enable_language(CXX)
  enable_language(C)

  if (NOT DEFINED CMAKE_CXX_STANDARD)
    set(CMAKE_CXX_STANDARD 14)
    set(CMAKE_CXX_STANDARD_REQUIRED ON)
  endif ()

  if (NOT DEFINED CMAKE_C_STANDARD)
    set(CMAKE_C_STANDARD 11)
    set(CMAKE_C_STANDARD_REQUIRED ON)
  endif ()

endif ()

# =================================== OpenMP
if (USE_OPENMP)
  find_package(OpenMP REQUIRED)
endif ()

# =================================== LIBXSMM (rely on pkg-config)
if (USE_SMM MATCHES "libxsmm|auto")
  if (USE_SMM MATCHES "libxsmm")
    set(LIBXSMM_REQUIRED "REQUIRED")
  endif ()
  find_package(PkgConfig ${LIBXSMM_REQUIRED})
  if (USE_OPENMP)
    if (BUILD_SHARED_LIBS OR USE_SMM MATCHES "libxsmm-shared")
      pkg_check_modules(LIBXSMMEXT IMPORTED_TARGET GLOBAL libxsmmext-shared)
    else ()
      pkg_check_modules(LIBXSMMEXT IMPORTED_TARGET GLOBAL libxsmmext-static)
    endif ()
    if (NOT LIBXSMMEXT_FOUND)
      pkg_check_modules(LIBXSMMEXT ${LIBXSMM_REQUIRED} IMPORTED_TARGET GLOBAL
                        libxsmmext)
    endif ()
  endif ()
  if (BUILD_SHARED_LIBS OR USE_SMM MATCHES "libxsmm-shared")
    pkg_check_modules(LIBXSMM IMPORTED_TARGET GLOBAL libxsmmf-shared)
  else ()
    pkg_check_modules(LIBXSMM IMPORTED_TARGET GLOBAL libxsmmf-static)
  endif ()
  if (NOT LIBXSMM_FOUND)
    pkg_check_modules(LIBXSMM ${LIBXSMM_REQUIRED} IMPORTED_TARGET GLOBAL
                      libxsmmf)
  endif ()
endif ()

# =================================== SMM (Small Matrix-Matrix multiplication)
if (USE_SMM MATCHES "blas" OR (USE_SMM MATCHES "auto" AND NOT LIBXSMM_FOUND))
  if (USE_ACCEL MATCHES "opencl")
    message(FATAL_ERROR "OpenCL requires USE_SMM=libxsmm")
  endif ()
  message(STATUS "Using BLAS for Small Matrix Multiplication")
elseif (USE_SMM MATCHES "libxsmm" OR USE_SMM MATCHES "auto")
  message(STATUS "Using libxsmm for Small Matrix Multiplication")
else ()
  message(FATAL_ERROR "Unknown SMM library specified")
endif ()

# =================================== BLAS & LAPACK, PkgConfig
find_package(LAPACK REQUIRED) # needed for some of the integrated test routines,
                              # also calls find_package(BLAS)

# =================================== Python
# The FindPython module preferably looks for Python 3; if that is not found, it
# searches for Python 2. Since CMake 3.15, an activated Python virtual
# environment is searched for an interpreter before the rest of the system. In
# CMake <3.15, the system is searched before the virtual environment.
if (NOT Python_EXECUTABLE)
  # If the python interpreter is not specified (command line), try finding it:
  find_package(
    Python
    COMPONENTS Interpreter
    REQUIRED)
endif ()

# =================================== MPI
if (USE_MPI)
  get_property(REQUIRED_MPI_COMPONENTS GLOBAL PROPERTY ENABLED_LANGUAGES)
  if (NOT CMAKE_CROSSCOMPILING) # when cross compiling, assume the users know
                                # what they are doing
    set(MPI_DETERMINE_LIBRARY_VERSION TRUE)
  endif ()
  find_package(
    MPI
    COMPONENTS ${REQUIRED_MPI_COMPONENTS}
    REQUIRED)

  if (NOT MPI_Fortran_HAVE_F90_MODULE)
    message(
      FATAL_ERROR
        "\
The listed MPI implementation does not provide the required mpi.mod interface. \
When using the GNU compiler in combination with Intel MPI, please use the \
Intel MPI compiler wrappers. Check the INSTALL.md for more information.")
  endif ()
  if (USE_MPI_F08)
    if (NOT MPI_Fortran_HAVE_F08_MODULE)
      message(
        FATAL_ERROR
          "The listed MPI implementation does not provide the required mpi_f08.mod interface."
      )
    endif ()
  endif ()
  if ("${MPI_Fortran_LIBRARY_VERSION_STRING}" MATCHES "Open MPI v2.1"
      OR "${MPI_Fortran_LIBRARY_VERSION_STRING}" MATCHES "Open MPI v3.1")
    message(
      WARNING
        "RMA with ${MPI_Fortran_LIBRARY_VERSION_STRING} is not supported due to issues with its implementation."
        " Please use a newer version of OpenMPI or switch to MPICH if you plan on using MPI-RMA."
    )
  endif ()
endif ()

# =================================== GPU backends

if (NOT USE_ACCEL MATCHES "none")
  set(DBCSR_ACC_HEADER acc/acc.h acc/acc_bench.h acc/acc_libsmm.h)
endif ()

if (USE_ACCEL MATCHES "opencl")
  find_package(OpenCL REQUIRED)

  set(DBCSR_OPENCL_SCRIPT ${DBCSR_SOURCE_DIR}/acc/opencl/acc_opencl.sh)
  set(DBCSR_OPENCL_COMMON acc/opencl/common/opencl_atomics.h
                          acc/opencl/common/opencl_common.h)
  list(APPEND DBCSR_ACC_HEADER acc/opencl/smm/opencl_libsmm.h
       acc/opencl/acc_opencl.h)
endif ()

if (USE_ACCEL MATCHES "cuda|hip")
  set(GPU_ARCH_NUMBER_K20X 35)
  set(GPU_ARCH_NUMBER_K40 35)
  set(GPU_ARCH_NUMBER_K80 37)
  set(GPU_ARCH_NUMBER_P100 60)
  set(GPU_ARCH_NUMBER_V100 70)
  set(GPU_ARCH_NUMBER_A100 80)
  set(GPU_ARCH_NUMBER_H100 90)
  set(GPU_ARCH_NUMBER_Mi50 gfx906)
  set(GPU_ARCH_NUMBER_Mi100 gfx908)
  set(GPU_ARCH_NUMBER_Mi250 gfx90a)
  set(GPU_ARCH_NUMBER_Mi300 gfx942)
  set(GPU_ARCH_NUMBER_Mi350 gfx950)
endif ()

if (USE_ACCEL MATCHES "cuda")
  enable_language(CUDA)
  find_package(CUDAToolkit REQUIRED)

  if (NOT DEFINED CMAKE_CUDA_STANDARD)
    set(CMAKE_CUDA_STANDARD 14)
    set(CMAKE_CUDA_STANDARD_REQUIRED ON)
  endif ()

  if (CUDAToolkit_VERSION VERSION_LESS 5.5)
    message(FATAL_ERROR "CUDA version >= 5.5 is required.")
  endif ()

  # Make sure the GPU required is supported
  list(FIND SUPPORTED_CUDA_ARCHITECTURES ${WITH_GPU} GPU_SUPPORTED)
  if (GPU_SUPPORTED EQUAL -1)
    message(
      FATAL_ERROR "GPU architecture requested (${WITH_GPU}) is not supported. "
                  "Please choose from: ${SUPPORTED_CUDA_ARCHITECTURES}")
  endif ()

  # set cuda architecture number and compilation flags
  set(ACC_ARCH_NUMBER ${GPU_ARCH_NUMBER_${WITH_GPU}})

  message(STATUS "GPU target architecture: " ${WITH_GPU})
  message(STATUS "Kernel parameters: " ${WITH_GPU_PARAMS})
  message(STATUS "GPU architecture number: " ${ACC_ARCH_NUMBER})
  message(STATUS "GPU profiling enabled: " ${WITH_CUDA_PROFILING})
endif ()

if (USE_ACCEL MATCHES "hip")
  if (NOT CMAKE_HIP_ARCHITECTURES)
    set(CMAKE_HIP_ARCHITECTURES OFF)
  endif ()
  enable_language(HIP)

  if (NOT DEFINED CMAKE_HIP_STANDARD)
    set(CMAKE_HIP_STANDARD 14)
    set(CMAKE_HIP_STANDARD_REQUIRED ON)
  endif ()

  # Make sure the GPU required is supported
  list(FIND SUPPORTED_HIP_ARCHITECTURES ${WITH_GPU} GPU_SUPPORTED)
  if (GPU_SUPPORTED EQUAL -1)
    message(
      FATAL_ERROR "GPU architecture requested (${WITH_GPU}) is not supported. "
                  "Please choose from: ${SUPPORTED_HIP_ARCHITECTURES}")
  endif ()

  # ROCm is typically installed in /opt/rocm; otherwise let the user set
  # ROCM_PATH either as an environment variable or as a CMake variable.
  if (NOT DEFINED ROCM_PATH)
    if (NOT DEFINED ENV{ROCM_PATH})
      set(ROCM_PATH
          "/opt/rocm"
          CACHE PATH "Path to ROCm installation")
    else ()
      set(ROCM_PATH
          $ENV{ROCM_PATH}
          CACHE PATH "Path to ROCm installation")
    endif ()
  endif ()

  # Notice: this is not FindHIP.cmake for hip language support, but
  # hip-config.cmake which contains targets like hip::host for jitting.
  find_package(hip CONFIG REQUIRED HINTS ${ROCM_PATH})

  message(STATUS "Build with HIP ${hip_VERSION}")
  if (hip_VERSION VERSION_LESS 4.4.0)
    message(FATAL_ERROR "HIP version >= 4.4.0 is required.")
  endif ()

  set(ACC_ARCH_NUMBER ${GPU_ARCH_NUMBER_${WITH_GPU}})
  message(STATUS "GPU target architecture: " ${WITH_GPU})
  message(STATUS "Kernel parameters: " ${WITH_GPU_PARAMS})
  message(STATUS "GPU architecture number: " ${ACC_ARCH_NUMBER})
  message(STATUS "GPU profiling enabled: " ${WITH_HIP_PROFILING})

  # =================================== BLAS on GPU backend
  find_package(hipblas CONFIG REQUIRED HINTS ${ROCM_PATH})

  # =================================== HIPRTC
  find_package(hiprtc REQUIRED)
endif ()

# =================================================================================================
# OPTION HANDLING

# make sure that the default build type is RELEASE
set(default_build_type "Release")
if (NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES)
  message(
    STATUS
      "Setting build type to '${default_build_type}' as none was specified.")
  set(CMAKE_BUILD_TYPE
      "${default_build_type}"
      CACHE STRING
            "Choose the type of build, options are: Debug Release Coverage."
            FORCE)
  # set the possible values of build type for cmake-gui
  set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS "Debug" "Release"
                                               "Coverage")
endif ()

# compiler configuration could have impacted package discovery (above)
include(CompilerConfiguration)
include(CheckCompilerSupport)

# subdirectories
add_subdirectory(${DBCSR_SOURCE_DIR})

if (BUILD_TESTING)
  include(CTest)
  add_subdirectory(tests)
endif ()

if (WITH_EXAMPLES)
  add_subdirectory(examples)
endif ()

add_subdirectory(docs)

include(CustomTargets)

# Disable LTO
set(CMAKE_INTERPROCEDURAL_OPTIMIZATION FALSE)


================================================
FILE: CONTRIBUTING.md
================================================
# Contributing to DBCSR
The core of DBCSR is written in Fortran. All other languages must be supported through bindings.

There is a single [API](./src/dbcsr_api.F) file for DBCSR, which is provided for external usage only. **Do not use the API for any internal DBCSR development!** Packages built on top of DBCSR, for example [DBCSR Tensors](./src/tensors), **must only use** the DBCSR API. Note that any change to the API will require a major release of the library.

We support CMake for compilation; please keep the build system updated when adding or removing files. When adding new functions, it is extremely important to provide simple test programs, aka "unit tests", to check whether these functions perform as they should. The [tests](./tests) directory serves as infrastructure for that. If you do not feel comfortable integrating these tests with the build system, please notify the other developers.

Having examples (under the directory [examples](./examples)) is also appreciated. They must be independent of the DBCSR compilation and only use the DBCSR APIs.

DBCSR developers can find additional information on the [Development](https://github.com/cp2k/dbcsr/wiki/Development) wiki page.

## Fortran Code conventions

The code is automatically formatted (via pre-commit hooks) by the [fprettify tool](https://github.com/pseewald/fprettify/).

Please follow these code conventions (based on the [CP2K conventions](https://www.cp2k.org/dev:codingconventions)):
1. Every `USE` statement should have an `ONLY:` clause, which lists the imported symbols.
2. Every `OMP PARALLEL` region should declare `default(none)`.
3. Every static variable should be marked with the `SAVE` attribute.
4. Every Fortran module should contain the line `IMPLICIT NONE`.
5. Every conversion that might change value should be explicit.
6. Each `.F` file should contain either a `PROGRAM` or a single `MODULE`, whose name must start with the `dbcsr_` prefix and match the filename. The file itself should start with the DBCSR header. Note that module names must be unique, even across different directories!
7. Use the routines from [MPI wrappers](./src/mpi) instead of calling MPI directly.
8. Don't use `UNIT=*` in `WRITE` or `PRINT` statements. Instead, request a unit from the logger: `iw=dbcsr_logger_get_default_unit_nr()` and write only if you actually received a unit: `IF(iw>0) WRITE (UNIT=iw, ...)`.
9. Avoid using `STOP`. Prefer the DBCSR error handlers: `DBCSR_WARN`, `DBCSR_ABORT`, `DBCSR_ASSERT`.
10. Each preprocessor flag should start with two underscores and be documented in the [documentation](./docs/guide/3-developer-guide/3-programming/1-overview/index.md#list-of-macros-used-in-the-code).
11. All routines in the API must start with the `dbcsr_` namespace. For submodule APIs (e.g. [DBCSR Tensors](./src/tensors)), each function has to start with the `dbcsr_<unique ID of the submodule>_` namespace.
12. If you are including files (i.e. via the `#include` directive), note that the base directory is `src`; please use paths relative to it (e.g. `#include "base/dbcsr_base_uses.f90"` instead of `#include "../base/dbcsr_base_uses.f90"`).
13. All Fortran keywords (`FUNCTION`, `SUBROUTINE`, data types...) must be in capital letters.
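
Some of the conventions above (items 1, 4, 6 and 8) are mechanical enough to be checked by a script. The following is only a minimal sketch of such a check, not part of the DBCSR tooling: the function name `check_source` and the regular expressions are assumptions made for illustration, and the check is purely line-based (it ignores continuation lines and string contents).

```python
import re
from pathlib import Path

# Hypothetical, line-based patterns; they do not parse Fortran properly.
MODULE_RE = re.compile(r"^\s*MODULE\s+(\w+)\s*$", re.IGNORECASE)
USE_WITHOUT_ONLY_RE = re.compile(r"^\s*USE\b(?!.*\bONLY\s*:)", re.IGNORECASE)
IMPLICIT_NONE_RE = re.compile(r"^\s*IMPLICIT\s+NONE\b", re.IGNORECASE)
STAR_UNIT_RE = re.compile(r"\bUNIT\s*=\s*\*", re.IGNORECASE)


def check_source(filename, text):
    """Return a list of convention violations found in one Fortran source."""
    problems = []
    stem = Path(filename).stem.lower()
    has_module = False
    has_implicit_none = False

    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("!"):
            continue  # skip full-line comments
        where = f"{filename}:{lineno}"

        module = MODULE_RE.match(line)
        if module:
            has_module = True
            name = module.group(1).lower()
            # Convention 6: dbcsr_ prefix, and module name equal to the filename
            if not name.startswith("dbcsr_"):
                problems.append(f"{where}: module name lacks the dbcsr_ prefix")
            if name != stem:
                problems.append(f"{where}: module name differs from the filename")

        # Convention 1: every USE statement needs an ONLY: clause
        if USE_WITHOUT_ONLY_RE.match(line):
            problems.append(f"{where}: USE statement without an ONLY: clause")
        # Convention 4: remember whether IMPLICIT NONE was seen
        if IMPLICIT_NONE_RE.match(line):
            has_implicit_none = True
        # Convention 8: no UNIT=* in WRITE or PRINT statements
        if STAR_UNIT_RE.search(line):
            problems.append(f"{where}: UNIT=* used instead of a logger unit")

    if has_module and not has_implicit_none:
        problems.append(f"{filename}: module without IMPLICIT NONE")
    return problems
```

Calling `check_source("dbcsr_demo.F", source_text)` returns an empty list when none of the four checks fires, and one message per violation otherwise. The remaining conventions (for example explicit conversions or the use of the MPI wrappers) need real parsing or review and are deliberately left out of this sketch.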

**Most importantly, please avoid committing dead code and useless comments!**


================================================
FILE: DBCSR.md
================================================
---
project: DBCSR
project_github: https://github.com/cp2k/dbcsr
project_download: https://github.com/cp2k/dbcsr/releases
project_website: https://dbcsr.cp2k.org
summary: ![DBCSR](media/logo/logo.png)
         {: style="text-align: center"}
author: DBCSR Authors
github: https://github.com/cp2k/dbcsr/blob/master/AUTHORS
fpp_extensions: F
fixed_extensions:
extensions: F
preprocessor: cpp -traditional-cpp -E -Wno-invalid-pp-token
include: ../src
predocmark: >
media_dir: @CMAKE_SOURCE_DIR@/docs/media
md_base_dir: @CMAKE_SOURCE_DIR@
page_dir: @CMAKE_SOURCE_DIR@/docs/guide
src_dir: ./src
         ./tests
         ./examples
output_dir: @CMAKE_BINARY_DIR@/doc
docmark_alt: #
predocmark_alt: <
display: public
         protected
         private
source: true
graph: false
search: false
favicon: @CMAKE_SOURCE_DIR@/docs/media/logo/logo.png
version: @dbcsr_VERSION@
exclude: Makefile
extra_filetypes: cpp #
---

--------------------

DBCSR stands for **D**istributed **B**locked **C**ompressed **S**parse **R**ow.

DBCSR is a library designed to efficiently perform sparse matrix-matrix multiplication, among other operations.

It is MPI and OpenMP parallel and can exploit Nvidia and AMD GPUs via CUDA and HIP.

To get started with DBCSR, go to

- [Installation guide](page/2-user-guide/1-installation/index.html)
- [User guide](page/2-user-guide/index.html)
- [Developer guide](page/3-developer-guide/index.html)

License
-------

DBCSR's source code and related files and documentation are distributed under the GNU General Public License (GPL), version 2 or later. See the [LICENSE](https://github.com/cp2k/dbcsr/blob/develop/LICENSE) file for more details.

How to cite
-----------------

To cite DBCSR, use the following paper:

```latex
@article{dbcsr,
	title = {{Sparse Matrix Multiplication: The Distributed Block-Compressed Sparse Row Library}},
	journal = {Parallel Computing},
	volume = {40},
	number = {5-6},
	year = {2014},
	issn = {0167-8191},
	author = {Urban Borstnik and Joost VandeVondele and Valery Weber and Juerg Hutter}
}
```

To cite the DBCSR software library, use:

```latex
@misc{dbcsr-software,
	author = {The CP2K Developers Group},
	title = {{DBCSR: Distributed Block Compressed Sparse Row matrix library}},
	publisher = {GitHub},
	journal = {GitHub repository},
	year = {2020},
	url = {https://github.com/cp2k/dbcsr}
}
```

Contributing
-----------------

Your contribution to the project is welcome! Please see [DBCSR's contribution guidelines](https://github.com/cp2k/dbcsr/blob/develop/CONTRIBUTING.md) and [this wiki page](https://github.com/cp2k/dbcsr/wiki/Development).


================================================
FILE: LICENSE
================================================
                    GNU GENERAL PUBLIC LICENSE
                       Version 2, June 1991

 Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.

                            Preamble

  The licenses for most software are designed to take away your
freedom to share and change it.  By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users.  This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it.  (Some other Free Software Foundation software is covered by
the GNU Lesser General Public License instead.)  You can apply it to
your programs, too.

  When we speak of free software, we are referring to freedom, not
price.  Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.

  To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.

  For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have.  You must make sure that they, too, receive or can get the
source code.  And you must show them these terms so they know their
rights.

  We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.

  Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software.  If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.

  Finally, any free program is threatened constantly by software
patents.  We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary.  To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.

  The precise terms and conditions for copying, distribution and
modification follow.

                    GNU GENERAL PUBLIC LICENSE
   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

  0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License.  The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language.  (Hereinafter, translation is included without limitation in
the term "modification".)  Each licensee is addressed as "you".

Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope.  The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.

  1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.

You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.

  2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:

    a) You must cause the modified files to carry prominent notices
    stating that you changed the files and the date of any change.

    b) You must cause any work that you distribute or publish, that in
    whole or in part contains or is derived from the Program or any
    part thereof, to be licensed as a whole at no charge to all third
    parties under the terms of this License.

    c) If the modified program normally reads commands interactively
    when run, you must cause it, when started running for such
    interactive use in the most ordinary way, to print or display an
    announcement including an appropriate copyright notice and a
    notice that there is no warranty (or else, saying that you provide
    a warranty) and that users may redistribute the program under
    these conditions, and telling the user how to view a copy of this
    License.  (Exception: if the Program itself is interactive but
    does not normally print such an announcement, your work based on
    the Program is not required to print an announcement.)

These requirements apply to the modified work as a whole.  If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works.  But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.

In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.

  3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:

    a) Accompany it with the complete corresponding machine-readable
    source code, which must be distributed under the terms of Sections
    1 and 2 above on a medium customarily used for software interchange; or,

    b) Accompany it with a written offer, valid for at least three
    years, to give any third party, for a charge no more than your
    cost of physically performing source distribution, a complete
    machine-readable copy of the corresponding source code, to be
    distributed under the terms of Sections 1 and 2 above on a medium
    customarily used for software interchange; or,

    c) Accompany it with the information you received as to the offer
    to distribute corresponding source code.  (This alternative is
    allowed only for noncommercial distribution and only if you
    received the program in object code or executable form with such
    an offer, in accord with Subsection b above.)

The source code for a work means the preferred form of the work for
making modifications to it.  For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable.  However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.

If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.

  4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License.  Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.

  5. You are not required to accept this License, since you have not
signed it.  However, nothing else grants you permission to modify or
distribute the Program or its derivative works.  These actions are
prohibited by law if you do not accept this License.  Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.

  6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions.  You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.

  7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License.  If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all.  For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.

It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices.  Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.

This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.

  8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded.  In such case, this License incorporates
the limitation as if written in the body of this License.

  9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time.  Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number.  If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation.  If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.

  10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission.  For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this.  Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.

                            NO WARRANTY

  11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.

  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

                     END OF TERMS AND CONDITIONS

            How to Apply These Terms to Your New Programs

  If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.

  To do so, attach the following notices to the program.  It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

    DBCSR: Distributed Block Compressed Sparse Row matrix library
    Copyright (C) by the DBCSR developers group - All rights reserved

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License along
    with this program; if not, write to the Free Software Foundation, Inc.,
    51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:

    Gnomovision version 69, Copyright (C) year name of author
    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
    This is free software, and you are welcome to redistribute it
    under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License.  Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.

You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary.  Here is a sample; alter the names:

  Yoyodyne, Inc., hereby disclaims all copyright interest in the program
  `Gnomovision' (which makes passes at compilers) written by James Hacker.

  <signature of Ty Coon>, 1 April 1989
  Ty Coon, President of Vice

This General Public License does not permit incorporating your program into
proprietary programs.  If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library.  If this is what you want to do, use the GNU Lesser General
Public License instead of this License.


================================================
FILE: README.md
================================================
# DBCSR: Distributed Block Compressed Sparse Row matrix library

[![Build Status Linux](https://github.com/cp2k/dbcsr/actions/workflows/testing-linux.yml/badge.svg)](https://github.com/cp2k/dbcsr/actions/workflows/testing-linux.yml) [![Build Status MacOS](https://github.com/cp2k/dbcsr/actions/workflows/testing-macos.yml/badge.svg)](https://github.com/cp2k/dbcsr/actions/workflows/testing-macos.yml) [![Build Status Latest GCC](https://github.com/cp2k/dbcsr/actions/workflows/testing-gcc.yml/badge.svg)](https://github.com/cp2k/dbcsr/actions/workflows/testing-gcc.yml)


[![codecov](https://codecov.io/gh/cp2k/dbcsr/branch/develop/graph/badge.svg)](https://codecov.io/gh/cp2k/dbcsr)
[![Licence](https://img.shields.io/badge/license-GPL%20v2.0-blue.svg)](./LICENSE)
[![GitHub Releases](https://img.shields.io/github/release-pre/cp2k/dbcsr.svg)](https://github.com/cp2k/dbcsr/releases)

DBCSR is a library designed to efficiently perform sparse matrix-matrix multiplication, among other operations.
It is MPI and OpenMP parallel and can exploit Nvidia and AMD GPUs via CUDA and HIP.

<p align="center">
<img src="docs/media/logo/logo.png" width="500">
</p>

## How to Install

Follow the [installation guide](https://cp2k.github.io/dbcsr/develop/page/2-user-guide/1-installation/index.html).

## Documentation

Documentation is [available online](https://cp2k.github.io/dbcsr/) for the latest release.

## How to Cite

To cite DBCSR, use the following paper:

```latex
@article{dbcsr,
	title = {{Sparse Matrix Multiplication: The Distributed Block-Compressed Sparse Row Library}},
	journal = {Parallel Computing},
	volume = {40},
	number = {5-6},
	year = {2014},
	issn = {0167-8191},
	author = {Urban Borstnik and Joost VandeVondele and Valery Weber and Juerg Hutter}
}
```

To cite the DBCSR software library, use:

```latex
@misc{dbcsr-software,
	author = {The CP2K Developers Group},
	title = {{DBCSR: Distributed Block Compressed Sparse Row matrix library}},
	publisher = {GitHub},
	journal = {GitHub repository},
	year = {2022},
	url = {https://github.com/cp2k/dbcsr}
}
```

## Contributing to DBCSR

Your contribution to the project is welcome!
Please see [DBCSR's contribution guidelines](./CONTRIBUTING.md) and this [wiki page](https://github.com/cp2k/dbcsr/wiki/Development). For any help, please notify the other developers.


================================================
FILE: VERSION
================================================
MAJOR = 2
MINOR = 9
PATCH = 1
# A specific DATE (YYYY-MM-DD) fixes an official release; otherwise
# it is considered a development version.
DATE  = 2025-12-19




================================================
FILE: cmake/CheckCompilerSupport.cmake
================================================
include(CheckFortranSourceCompiles)

set(CHECK_PROGRAMS f2008-norm2.f90 f2008-block_construct.f90
                   f2008-contiguous.f90 f95-reshape-order-allocatable.f90)

set(_saved_fortran_flags "${CMAKE_Fortran_FLAGS}")
set(CMAKE_Fortran_FLAGS "")
set(CMAKE_TRY_COMPILE_TARGET_TYPE STATIC_LIBRARY)
foreach (prog ${CHECK_PROGRAMS})
  get_filename_component(prog_ext ${prog} EXT) # get the src extension to pass
                                               # along
  get_filename_component(prog_name ${prog} NAME_WE)

  file(READ "${CMAKE_CURRENT_LIST_DIR}/compiler-tests/${prog}" prog_src)
  CHECK_Fortran_SOURCE_COMPILES("${prog_src}" "${prog_name}" SRC_EXT
                                "${prog_ext}")

  if (NOT ${prog_name})
    message(
      FATAL_ERROR "Your compiler does not support all required F2008 features")
  endif ()
endforeach ()
unset(CMAKE_TRY_COMPILE_TARGET_TYPE)
set(CMAKE_Fortran_FLAGS "${_saved_fortran_flags}")


================================================
FILE: cmake/CompilerConfiguration.cmake
================================================
if (CMAKE_Fortran_COMPILER_ID STREQUAL "GNU")
  set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -ffree-form -std=f2008ts -fimplicit-none -Werror=aliasing -Werror=ampersand -Werror=c-binding-type -Werror=intrinsic-shadow -Werror=intrinsics-std -Werror=line-truncation -Werror=tabs -Werror=target-lifetime -Werror=underflow -Werror=unused-but-set-parameter -Werror=unused-but-set-variable -Werror=unused-variable -Werror=unused-dummy-argument -Werror=conversion -Werror=zerotrip -Wno-maybe-uninitialized -Werror=unused-parameter")
  if (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER_EQUAL 10) # comparison against CXX version rather than GFortran version
    set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -fallow-argument-mismatch") # required for 10+ (MPI wrap)
   else ()
    set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -Werror=argument-mismatch")  # gcc 10+ has this automatically
  endif ()
  if (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER_EQUAL 13) # comparison against CXX version rather than GFortran version
    set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -Wno-error=uninitialized") # false positive (allocatable array)
  endif ()
  include(CheckFortranCompilerFlag)
  check_fortran_compiler_flag("-Wno-error=deprecated-openmp" _fc_has_deprecated_openmp)
  if (_fc_has_deprecated_openmp)
    set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -Wno-error=deprecated-openmp")
  endif ()
  set(CMAKE_Fortran_FLAGS_RELEASE  "-O3 -g -funroll-loops")
  set(CMAKE_Fortran_FLAGS_COVERAGE "-O0 -g --coverage -fno-omit-frame-pointer -fcheck=all,no-array-temps -ffpe-trap=invalid,zero,overflow -fbacktrace -finit-real=snan -finit-integer=-42 -finit-derived -Werror=realloc-lhs -finline-matmul-limit=0 -Werror")
  set(CMAKE_Fortran_FLAGS_DEBUG    "-O2 -ggdb -fno-omit-frame-pointer -fcheck=all -ffpe-trap=invalid,zero,overflow -fbacktrace -finit-real=snan -finit-integer=-42 -finit-derived -finline-matmul-limit=0 -fsanitize=undefined -fsanitize=address -fsanitize-recover=all -Wall -Wextra -Werror -Werror=realloc-lhs -Wno-error=array-temporaries -Wno-error=compare-reals -Wno-error=function-elimination -Wno-error=surprising")
  if ((NOT (USE_MPI)) OR (NOT ("${MPI_Fortran_LIBRARY_VERSION_STRING}" MATCHES "Open MPI")))
    set(CMAKE_Fortran_FLAGS_DEBUG "${CMAKE_Fortran_FLAGS_DEBUG} -fsanitize=leak")
  endif ()
elseif (CMAKE_Fortran_COMPILER_ID STREQUAL "Intel")
  set(CMAKE_Fortran_FLAGS          "${CMAKE_Fortran_FLAGS} -free -stand=f18 -fpp -heap-arrays")
  set(CMAKE_Fortran_FLAGS_RELEASE  "-O3 -g")
  set(CMAKE_Fortran_FLAGS_DEBUG    "-O2 -debug")
elseif (CMAKE_Fortran_COMPILER_ID STREQUAL "IntelLLVM")
  set(CMAKE_Fortran_FLAGS          "${CMAKE_Fortran_FLAGS} -free -stand f18 -fpp -heap-arrays")
  set(CMAKE_Fortran_FLAGS_RELEASE  "-O3 -g")
  set(CMAKE_Fortran_FLAGS_DEBUG    "-O2 -debug")
elseif (CMAKE_Fortran_COMPILER_ID STREQUAL "PGI")
  set(CMAKE_Fortran_FLAGS          "${CMAKE_Fortran_FLAGS} -Mfreeform -Mextend -Mallocatable=03")  # -Mallocatable=03: enable F2003+ assignment semantics
  set(CMAKE_Fortran_FLAGS_RELEASE  "-fast")
  set(CMAKE_Fortran_FLAGS_DEBUG    "-g")
elseif (CMAKE_Fortran_COMPILER_ID STREQUAL "NAG")
  set(CMAKE_Fortran_FLAGS          "${CMAKE_Fortran_FLAGS} -f2008 -free -Warn=reallocation -Warn=subnormal")
  set(CMAKE_Fortran_FLAGS_RELEASE  "-O2")
  set(CMAKE_Fortran_FLAGS_DEBUG    "-g -C")
  if (NOT OpenMP_FOUND)
    set(CMAKE_Fortran_FLAGS_RELEASE  "${CMAKE_Fortran_FLAGS_RELEASE} -gline")  # -gline is only supported without OpenMP
    set(CMAKE_Fortran_FLAGS_DEBUG  "${CMAKE_Fortran_FLAGS_DEBUG} -C=all")  # some checks are not available with OpenMP
  endif ()
elseif (CMAKE_Fortran_COMPILER_ID STREQUAL "Cray")
  set(CMAKE_Fortran_FLAGS          "${CMAKE_Fortran_FLAGS} -f free -M3105 -ME7212")  # -M3105: hide a false-positive warning about modified loop variables due to loop fusing, promote warning 7212 to an error
  set(CMAKE_Fortran_FLAGS_RELEASE  "-O2 -G2")
  set(CMAKE_Fortran_FLAGS_DEBUG    "-G2")
  set(CMAKE_Fortran_MODOUT_FLAG    "-ef")  # override to get lower-case module file names
elseif (CMAKE_Fortran_COMPILER_ID STREQUAL "LLVMFlang")
  set(CMAKE_Fortran_FLAGS          "${CMAKE_Fortran_FLAGS} -ffree-form -std=f2018 -cpp")
  set(CMAKE_Fortran_FLAGS_RELEASE  "-O3")
  set(CMAKE_Fortran_FLAGS_DEBUG    "-O0 -g")
elseif (CMAKE_Fortran_COMPILER_ID STREQUAL "Flang")
  set(CMAKE_Fortran_FLAGS          "${CMAKE_Fortran_FLAGS} -ffree-form -cpp")
  set(CMAKE_Fortran_FLAGS_RELEASE  "-O3")
  set(CMAKE_Fortran_FLAGS_DEBUG    "-O0 -g")
else ()
  message(WARNING "\
Unknown Fortran compiler, trying without any additional (optimization) flags.\n\
You will most likely have to specify extra options for free-form Fortran 2008 for your compiler!\n\
Please open an issue at https://github.com/cp2k/dbcsr/issues with the reported compiler name and the required flags.")
  message("-- CMAKE_Fortran_COMPILER_ID: " ${CMAKE_Fortran_COMPILER_ID})
  message("-- CMAKE_Fortran_COMPILER full path: " ${CMAKE_Fortran_COMPILER})
endif ()

if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
  set(CMAKE_CXX_FLAGS_RELEASE      "-O3 -g -funroll-loops -Wall -Wextra -Werror -Wno-missing-field-initializers")
  set(CMAKE_CXX_FLAGS_COVERAGE     "-O0 -g --coverage -Wall -Wextra -Werror -Wno-missing-field-initializers")
  set(CMAKE_CXX_FLAGS_DEBUG        "-O2 -ggdb -Wall -Wextra -Werror -Wno-missing-field-initializers -fsanitize=undefined -fsanitize=address -fsanitize-recover=all")
  if ((NOT (USE_MPI)) OR (NOT ("${MPI_Fortran_LIBRARY_VERSION_STRING}" MATCHES "Open MPI")))
    set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -fsanitize=leak")
  endif ()
elseif (CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
  set(CMAKE_CXX_FLAGS_RELEASE      "-O3 -funroll-loops")
  set(CMAKE_CXX_FLAGS_COVERAGE     "-O0 -g --coverage")
  set(CMAKE_CXX_FLAGS_DEBUG        "-O0 -g")
elseif (CMAKE_CXX_COMPILER_ID STREQUAL "AppleClang")
  set(CMAKE_CXX_FLAGS_RELEASE      "-O3 -funroll-loops")
  set(CMAKE_CXX_FLAGS_COVERAGE     "-O0 -g --coverage")
  set(CMAKE_CXX_FLAGS_DEBUG        "-O0 -g")
  set(CMAKE_EXE_LINKER_FLAGS_COVERAGE "-lgcov")  # Apple's Clang needs an extra linker flag for coverage builds
elseif (CMAKE_CXX_COMPILER_ID STREQUAL "Intel")
  set(CMAKE_CXX_FLAGS_RELEASE      "-O3 -g")
  set(CMAKE_CXX_FLAGS_DEBUG        "-O0 -debug")
elseif (CMAKE_CXX_COMPILER_ID STREQUAL "IntelLLVM")
  set(CMAKE_CXX_FLAGS_RELEASE      "-O3 -g")
  set(CMAKE_CXX_FLAGS_DEBUG        "-O0 -debug")
elseif (CMAKE_CXX_COMPILER_ID STREQUAL "PGI")
  set(CMAKE_CXX_FLAGS_RELEASE      "-fast")
  set(CMAKE_CXX_FLAGS_DEBUG        "-g")
elseif (CMAKE_CXX_COMPILER_ID STREQUAL "Cray")
  set(CMAKE_CXX_FLAGS_RELEASE      "-O3")
  set(CMAKE_CXX_FLAGS_DEBUG        "-G2")
  if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 9)
    # prevent deallocation failures due to tcmalloc's free with glibc's aligned_alloc, see https://bugzilla.redhat.com/show_bug.cgi?id=1569391
    set(CMAKE_C_FLAGS              "${CMAKE_C_FLAGS} -h system_alloc")
    set(CMAKE_CXX_FLAGS            "${CMAKE_CXX_FLAGS} -h system_alloc")
    set(CMAKE_Fortran_FLAGS        "${CMAKE_Fortran_FLAGS} -h system_alloc")
    # since the detection of the implicitly linked libraries occurs before we can intervene, filter them out again
    list(FILTER CMAKE_C_IMPLICIT_LINK_LIBRARIES EXCLUDE REGEX "tcmalloc")
    list(FILTER CMAKE_Fortran_IMPLICIT_LINK_LIBRARIES EXCLUDE REGEX "tcmalloc")
  endif ()
  # OpenACC support with CCE is EOL: causes https://github.com/cp2k/dbcsr/issues/261
  # eventually check compiler version (similar to -h system_alloc)
  set(CMAKE_C_FLAGS                "${CMAKE_C_FLAGS} -hnoacc -h nomessage=1234")
  set(CMAKE_CXX_FLAGS              "${CMAKE_CXX_FLAGS} -hnoacc -h nomessage=1234")
  set(CMAKE_Fortran_FLAGS          "${CMAKE_Fortran_FLAGS} -hnoacc -M1234")
else ()
  message(WARNING "\
Unknown C++ compiler, trying without any additional (optimization) flags.\n\
You may have to specify flags for C++11/14 support manually.\n\
Please open an issue at https://github.com/cp2k/dbcsr/issues with the reported compiler name and the required flags.")
  message("-- CMAKE_CXX_COMPILER_ID: " ${CMAKE_CXX_COMPILER_ID})
  message("-- CMAKE_CXX_COMPILER full path: " ${CMAKE_CXX_COMPILER})
endif ()

# inherit C flags from CXX
set(CMAKE_C_FLAGS_RELEASE ${CMAKE_CXX_FLAGS_RELEASE})
set(CMAKE_C_FLAGS_COVERAGE ${CMAKE_CXX_FLAGS_COVERAGE})
set(CMAKE_C_FLAGS_DEBUG ${CMAKE_CXX_FLAGS_DEBUG})


================================================
FILE: cmake/CustomTargets.cmake
================================================
# =================================================================================================
# BUILD DISTRIBUTION
set(ARCHIVE_NAME "${CMAKE_PROJECT_NAME}-${dbcsr_VERSION}")
add_custom_target(
  dist
  COMMENT "Building distribution: ${ARCHIVE_NAME}"
  COMMAND ${CMAKE_COMMAND} -E make_directory "${CMAKE_BINARY_DIR}/dist"
  COMMAND git archive-all "${CMAKE_BINARY_DIR}/dist/${ARCHIVE_NAME}.tar.gz"
  COMMAND ${CMAKE_COMMAND} -E echo "SHA512 Digests:"
  COMMAND ${CMAKE_COMMAND} -E sha512sum
          "${CMAKE_BINARY_DIR}/dist/${ARCHIVE_NAME}.tar.gz"
  WORKING_DIRECTORY ${CMAKE_SOURCE_DIR})

# =================================================================================================
# LCOV - COVERAGE REPORTS GENERATION
find_program(
  LCOV_EXE lcov
  DOC "path to the lcov executable (required to generate coverage reports)")

find_program(
  GENHTML_EXE genhtml
  DOC "path to the genhtml executable (required to generate HTML coverage reports)"
)

set(LCOV_ARGS
    ""
    CACHE STRING "specify additional arguments to pass to lcov for cov-info")
add_custom_target(
  cov-info
  COMMAND
    "${LCOV_EXE}" --directory "${CMAKE_BINARY_DIR}" --base-dir
    "${CMAKE_SOURCE_DIR}" --no-external --capture ${LCOV_ARGS} --output-file
    coverage.info
  COMMAND "${LCOV_EXE}" --list coverage.info
  VERBATIM
  BYPRODUCTS coverage.info
  COMMENT "Generate coverage.info using lcov")

add_custom_target(
  cov-genhtml
  COMMAND "${GENHTML_EXE}" coverage.info --output-directory cov-html
  COMMENT
    "Generate an HTML-based coverage report using lcov in ${CMAKE_BINARY_DIR}/cov-html"
  VERBATIM) # Note: this directory will not be cleaned by `cmake --build .. --
            # clean`
add_dependencies(cov-genhtml cov-info)


================================================
FILE: cmake/GetGitRevisionDescription.cmake
================================================
# - Returns a version string from Git
#
# These functions force a re-configure on each git commit so that you can
# trust the values of the variables in your build system.
#
#  get_git_head_revision(<refspecvar> <hashvar> [<additional arguments to git describe> ...])
#
# Returns the refspec and sha hash of the current head revision
#
#  git_describe(<var> [<additional arguments to git describe> ...])
#
# Returns the results of git describe on the source tree, and adjusting
# the output so that it tests false if an error occurs.
#
#  git_get_exact_tag(<var> [<additional arguments to git describe> ...])
#
# Returns the results of git describe --exact-match on the source tree,
# and adjusting the output so that it tests false if there was no exact
# matching tag.
#
#  git_local_changes(<var>)
#
# Returns either "CLEAN" or "DIRTY" with respect to uncommitted changes.
# Uses the return code of "git diff-index --quiet HEAD --".
# Does not regard untracked files.
#
# Requires CMake 2.6 or newer (uses the 'function' command)
#
# Original Author:
# 2009-2010 Ryan Pavlik <rpavlik@iastate.edu> <abiryan@ryand.net>
# http://academic.cleardefinition.com
# Iowa State University HCI Graduate Program/VRAC
#
# Copyright Iowa State University 2009-2010.
# Distributed under the Boost Software License, Version 1.0.
# (See accompanying file LICENSE_1_0.txt or copy at
# http://www.boost.org/LICENSE_1_0.txt)

if(__get_git_revision_description)
	return()
endif()
set(__get_git_revision_description YES)

# We must run the following at "include" time, not at function call time,
# to find the path to this module rather than the path to a calling list file
get_filename_component(_gitdescmoddir ${CMAKE_CURRENT_LIST_FILE} PATH)

function(get_git_head_revision _refspecvar _hashvar)
	set(GIT_PARENT_DIR "${CMAKE_CURRENT_SOURCE_DIR}")
	set(GIT_DIR "${GIT_PARENT_DIR}/.git")
	while(NOT EXISTS "${GIT_DIR}")	# .git dir not found, search parent directories
		set(GIT_PREVIOUS_PARENT "${GIT_PARENT_DIR}")
		get_filename_component(GIT_PARENT_DIR ${GIT_PARENT_DIR} PATH)
		if(GIT_PARENT_DIR STREQUAL GIT_PREVIOUS_PARENT)
			# We have reached the root directory, we are not in git
			set(${_refspecvar} "GITDIR-NOTFOUND" PARENT_SCOPE)
			set(${_hashvar} "GITDIR-NOTFOUND" PARENT_SCOPE)
			return()
		endif()
		set(GIT_DIR "${GIT_PARENT_DIR}/.git")
	endwhile()
	# check if this is a submodule
	if(NOT IS_DIRECTORY ${GIT_DIR})
		file(READ ${GIT_DIR} submodule)
		string(REGEX REPLACE "gitdir: (.*)\n$" "\\1" GIT_DIR_RELATIVE ${submodule})
		get_filename_component(SUBMODULE_DIR ${GIT_DIR} PATH)
		get_filename_component(GIT_DIR ${SUBMODULE_DIR}/${GIT_DIR_RELATIVE} ABSOLUTE)
	endif()
	if(NOT IS_DIRECTORY "${GIT_DIR}")
		file(READ ${GIT_DIR} worktree)
		string(REGEX REPLACE "gitdir: (.*)worktrees(.*)\n$" "\\1" GIT_DIR ${worktree})
	endif()
	set(GIT_DATA "${CMAKE_CURRENT_BINARY_DIR}/CMakeFiles/git-data")
	if(NOT EXISTS "${GIT_DATA}")
		file(MAKE_DIRECTORY "${GIT_DATA}")
	endif()

	if(NOT EXISTS "${GIT_DIR}/HEAD")
		return()
	endif()
	set(HEAD_FILE "${GIT_DATA}/HEAD")
	configure_file("${GIT_DIR}/HEAD" "${HEAD_FILE}" COPYONLY)

	configure_file("${_gitdescmoddir}/GetGitRevisionDescription.cmake.in"
		"${GIT_DATA}/grabRef.cmake"
		@ONLY)
	include("${GIT_DATA}/grabRef.cmake")

	set(${_refspecvar} "${HEAD_REF}" PARENT_SCOPE)
	set(${_hashvar} "${HEAD_HASH}" PARENT_SCOPE)
endfunction()

function(git_describe _var)
	if(NOT GIT_FOUND)
		find_package(Git QUIET)
	endif()
	get_git_head_revision(refspec hash)
	if(NOT GIT_FOUND)
		set(${_var} "GIT-NOTFOUND" PARENT_SCOPE)
		return()
	endif()
	if(NOT hash)
		set(${_var} "HEAD-HASH-NOTFOUND" PARENT_SCOPE)
		return()
	endif()

	# TODO sanitize
	#if(("${ARGN}" MATCHES "&&") OR
	#	(ARGN MATCHES "||") OR
	#	(ARGN MATCHES "\\;"))
	#	message("Please report the following error to the project!")
	#	message(FATAL_ERROR "Looks like someone's doing something nefarious with git_describe! Passed arguments ${ARGN}")
	#endif()

	#message(STATUS "Arguments to execute_process: ${ARGN}")

	execute_process(COMMAND
		"${GIT_EXECUTABLE}"
		describe
		${hash}
		${ARGN}
		WORKING_DIRECTORY
		"${CMAKE_CURRENT_SOURCE_DIR}"
		RESULT_VARIABLE
		res
		OUTPUT_VARIABLE
		out
		ERROR_QUIET
		OUTPUT_STRIP_TRAILING_WHITESPACE)
	if(NOT res EQUAL 0)
		set(out "${out}-${res}-NOTFOUND")
	endif()

	set(${_var} "${out}" PARENT_SCOPE)
endfunction()

function(git_get_exact_tag _var)
	git_describe(out --exact-match ${ARGN})
	set(${_var} "${out}" PARENT_SCOPE)
endfunction()

function(git_local_changes _var)
	if(NOT GIT_FOUND)
		find_package(Git QUIET)
	endif()
	get_git_head_revision(refspec hash)
	if(NOT GIT_FOUND)
		set(${_var} "GIT-NOTFOUND" PARENT_SCOPE)
		return()
	endif()
	if(NOT hash)
		set(${_var} "HEAD-HASH-NOTFOUND" PARENT_SCOPE)
		return()
	endif()

	execute_process(COMMAND
		"${GIT_EXECUTABLE}"
		diff-index --quiet HEAD --
		WORKING_DIRECTORY
		"${CMAKE_CURRENT_SOURCE_DIR}"
		RESULT_VARIABLE
		res
		OUTPUT_VARIABLE
		out
		ERROR_QUIET
		OUTPUT_STRIP_TRAILING_WHITESPACE)
	if(res EQUAL 0)
		set(${_var} "CLEAN" PARENT_SCOPE)
	else()
		set(${_var} "DIRTY" PARENT_SCOPE)
	endif()
endfunction()


================================================
FILE: cmake/GetGitRevisionDescription.cmake.in
================================================
#
# Internal file for GetGitRevisionDescription.cmake
#
# Requires CMake 2.6 or newer (uses the 'function' command)
#
# Original Author:
# 2009-2010 Ryan Pavlik <rpavlik@iastate.edu> <abiryan@ryand.net>
# http://academic.cleardefinition.com
# Iowa State University HCI Graduate Program/VRAC
#
# Copyright Iowa State University 2009-2010.
# Distributed under the Boost Software License, Version 1.0.
# (See accompanying file LICENSE_1_0.txt or copy at
# http://www.boost.org/LICENSE_1_0.txt)

set(HEAD_HASH)

file(READ "@HEAD_FILE@" HEAD_CONTENTS LIMIT 1024)

string(STRIP "${HEAD_CONTENTS}" HEAD_CONTENTS)
if(HEAD_CONTENTS MATCHES "ref")
	# named branch
	string(REPLACE "ref: " "" HEAD_REF "${HEAD_CONTENTS}")
	if(EXISTS "@GIT_DIR@/${HEAD_REF}")
		configure_file("@GIT_DIR@/${HEAD_REF}" "@GIT_DATA@/head-ref" COPYONLY)
	else()
		configure_file("@GIT_DIR@/packed-refs" "@GIT_DATA@/packed-refs" COPYONLY)
		file(READ "@GIT_DATA@/packed-refs" PACKED_REFS)
		if(${PACKED_REFS} MATCHES "([0-9a-z]*) ${HEAD_REF}")
			set(HEAD_HASH "${CMAKE_MATCH_1}")
		endif()
	endif()
else()
	# detached HEAD
	configure_file("@GIT_DIR@/HEAD" "@GIT_DATA@/head-ref" COPYONLY)
endif()

if(NOT HEAD_HASH)
	file(READ "@GIT_DATA@/head-ref" HEAD_HASH LIMIT 1024)
	string(STRIP "${HEAD_HASH}" HEAD_HASH)
endif()


================================================
FILE: cmake/compiler-tests/f2008-block_construct.f90
================================================
!--------------------------------------------------------------------------------------------------!
! Copyright (C) by the DBCSR developers group - All rights reserved                                !
! This file is part of the DBCSR library.                                                          !
!                                                                                                  !
! For information on the license, see the LICENSE file.                                            !
! For further information please visit https://dbcsr.cp2k.org                                      !
! SPDX-License-Identifier: GPL-2.0+                                                                !
!--------------------------------------------------------------------------------------------------!

program main
   try: block
      exit try
   end block try
end program


================================================
FILE: cmake/compiler-tests/f2008-contiguous.f90
================================================
!--------------------------------------------------------------------------------------------------!
! Copyright (C) by the DBCSR developers group - All rights reserved                                !
! This file is part of the DBCSR library.                                                          !
!                                                                                                  !
! For information on the license, see the LICENSE file.                                            !
! For further information please visit https://dbcsr.cp2k.org                                      !
! SPDX-License-Identifier: GPL-2.0+                                                                !
!--------------------------------------------------------------------------------------------------!

program main
   implicit none

   ! test whether the compiler supports the CONTIGUOUS keyword
   integer, allocatable, target :: targ(:)
   integer, contiguous, pointer :: ptr(:)

   ! allocated data is always contiguous
   allocate (targ(10))
   ptr => targ

   ! IS_CONTIGUOUS was implemented in gcc-9 and is therefore not tested for yet
end program


================================================
FILE: cmake/compiler-tests/f2008-norm2.f90
================================================
!--------------------------------------------------------------------------------------------------!
! Copyright (C) by the DBCSR developers group - All rights reserved                                !
! This file is part of the DBCSR library.                                                          !
!                                                                                                  !
! For information on the license, see the LICENSE file.                                            !
! For further information please visit https://dbcsr.cp2k.org                                      !
! SPDX-License-Identifier: GPL-2.0+                                                                !
!--------------------------------------------------------------------------------------------------!

program main
   implicit none
   real :: x(2) = [real :: 3, 4]
   if (abs(norm2(x) - 5.) > 1.0D-5) stop 1
end program


================================================
FILE: cmake/compiler-tests/f95-reshape-order-allocatable.f90
================================================
!--------------------------------------------------------------------------------------------------!
! Copyright (C) by the DBCSR developers group - All rights reserved                                !
! This file is part of the DBCSR library.                                                          !
!                                                                                                  !
! For information on the license, see the LICENSE file.                                            !
! For further information please visit https://dbcsr.cp2k.org                                      !
! SPDX-License-Identifier: GPL-2.0+                                                                !
!--------------------------------------------------------------------------------------------------!
program test_reshape
   integer, dimension(4) :: x = [1, 2, 3, 4]
   integer, dimension(:), allocatable :: order

   allocate (order(2))
   order(:) = [2, 1]

   ! PGI <= 19.10 does not accept allocatables for the order parameter
   print *, reshape(x, shape=[2, 2], order=order)
end program


================================================
FILE: cmake/fypp-sources.cmake
================================================
add_custom_target(fypp) # common target for all fypp calls

# Use a system-provided fypp if available, otherwise the bundled one
find_program(
  FYPP_EXECUTABLE fypp
  DOC "The FYPP preprocessor"
  PATHS ../tools/build_utils/fypp/bin)
if (NOT FYPP_EXECUTABLE)
  message(FATAL_ERROR "Failed to find the FYPP preprocessor.")
else ()
  message(STATUS "FYPP preprocessor found.")
endif ()

if ((CMAKE_Fortran_COMPILER_ID STREQUAL "GNU")
    AND (CMAKE_GENERATOR STREQUAL "Ninja")
    AND (CMAKE_VERSION VERSION_GREATER_EQUAL 3.16))
  set(fypp_flags --line-numbering --line-marker-format=gfortran5)
elseif (CMAKE_BUILD_TYPE MATCHES COVERAGE)
  message(
    WARNING
      "Coverage build requested but your environment does not support Line Control directives in Fypp"
  )
  message(
    WARNING
      "You need CMake 3.16+, Ninja (CMake-patched) and gfortran 5+ for this to work!"
  )
  # otherwise the referenced lines in the Coverage report point to either the
  # original (unexpanded files) or to the Fypped sources which may then not be
  # picked up by the postprocessing tools. CMake 3.16+ and Ninja are needed
  # since older CMake (or CMake with make) are unable to parse Line Control
  # directives within line-continued USE stmts, see
  # https://gitlab.kitware.com/cmake/cmake/issues/18188
endif ()

function (ADD_FYPP_SOURCES OUTVAR)
  set(outfiles)

  foreach (f ${ARGN})
    # first we might need to make the input file absolute
    get_filename_component(f "${f}" ABSOLUTE)
    get_filename_component(ext "${f}" EXT)
    # get the relative path of the file to the current source dir
    file(RELATIVE_PATH rf "${CMAKE_CURRENT_SOURCE_DIR}" "${f}")
    # set the output filename of fypped sources
    set(of "${CMAKE_CURRENT_BINARY_DIR}/${rf}")

    # create the output directory if it doesn't exist
    get_filename_component(d "${of}" PATH)
    if (NOT IS_DIRECTORY "${d}")
      file(MAKE_DIRECTORY "${d}")
    endif ()

    if ("${f}" MATCHES "\\.F$")
      # append the output file to the list of outputs
      list(APPEND outfiles "${of}")
      # now add the custom command to generate the output file
      add_custom_command(
        OUTPUT "${of}"
        COMMAND ${Python_EXECUTABLE} ${FYPP_EXECUTABLE} ARGS ${fypp_flags}
                "${f}" "${of}"
        MAIN_DEPENDENCY "${f}"
        VERBATIM)
    elseif ("${f}" MATCHES "\\.h$")
      # append the output file to the list of outputs
      list(APPEND outfiles "${of}")
      # now add the custom command to generate the output file
      add_custom_command(
        OUTPUT "${of}"
        COMMAND ${Python_EXECUTABLE} ${FYPP_EXECUTABLE} ARGS "-F" "${f}" "${of}"
        DEPENDS "${f}")
    else ()
      configure_file("${f}" "${of}" COPYONLY)
    endif ()
  endforeach ()

  # build a custom target to fypp separately (required for example by the doc
  # target)
  add_custom_target("fypp_${OUTVAR}" DEPENDS ${outfiles})
  add_dependencies(fypp "fypp_${OUTVAR}")

  # set the output list in the calling scope
  set(${OUTVAR}
      ${outfiles}
      PARENT_SCOPE)
endfunction ()


================================================
FILE: docs/CMakeLists.txt
================================================
# =================================================================================================
# FORD - DOCUMENTATION GENERATION
find_program(
  FORD_EXE ford
  DOC "path to the ford executable (required to generate the documentation)")

# Copy the FORD project-file into the build directory
set(FORD_PROJECT_FILE "${CMAKE_BINARY_DIR}/DBCSR.md")
configure_file(${CMAKE_SOURCE_DIR}/DBCSR.md "${FORD_PROJECT_FILE}")

# Add the target that generates the documentation with FORD
add_custom_target(
  doc
  COMMENT "Generating API documentation and doc pages"
  COMMAND "${FORD_EXE}" "${FORD_PROJECT_FILE}"
  VERBATIM)

if (BUILD_TESTING)
  add_dependencies(doc doc_copy_tests)
endif ()

if (WITH_C_API AND WITH_EXAMPLES)
  add_dependencies(doc doc_copy_examples)
endif ()

add_dependencies(doc fypp) # only depend on the fypp step to avoid building
                           # everything just for the docs


================================================
FILE: docs/guide/1-DBCSR/index.md
================================================
title: DBCSR

# DBCSR

DBCSR is a sparse matrix library designed to efficiently perform sparse matrix-matrix multiplication, among other operations. It is MPI and OpenMP parallel, and can exploit accelerators.

DBCSR was developed as a part of [CP2K](https://github.com/cp2k/cp2k/), where it provides core functionality for [linear scaling electronic structure theory](https://dx.doi.org/10.1021%2Fct200897x). It is now released as a standalone library for integration in other projects.


================================================
FILE: docs/guide/1-DBCSR/publications.md
================================================
title: Publications

# Publications

- **General overview of the library**

Urban Borstnik, Joost VandeVondele, Valery Weber, and Jürg Hutter. 2014. [Sparse Matrix Multiplication: The Distributed Block-Compressed Sparse Row Library](https://dx.doi.org/10.1016%2Fj.parco.2014.03.012). Parallel Comput. 40, 5-6 (2014), 47–58.

- **Use of one-sided MPI and a 2.5D algorithm**

Alfio Lazzaro, Joost VandeVondele, Jürg Hutter, and Ole Schütt. [Increasing the Efficiency of Sparse Matrix-Matrix Multiplication with a 2.5D Algorithm and One-Sided MPI](https://arxiv.org/abs/1705.10218). In Proceedings of the Platform for Advanced Scientific Computing Conference, PASC ’17, pages 3:1–3:9, New York, NY, USA, 2017. ACM.

- **GPU-Backend**

Ole Schütt, Peter Messmer, Jürg Hutter, and Joost VandeVondele. 2015. [GPU-Accelerated Sparse Matrix–Matrix Multiplication for Linear Scaling Density Functional Theory](https://dx.doi.org/10.1002%2F9781118670712.ch8). John Wiley and Sons, Chapter 8, 173–190.

Ilia Sivkov, Alfio Lazzaro, and Jürg Hutter. [DBCSR: A Library for Dense Matrix Multiplications on Distributed GPU-Accelerated Systems](http://arxiv.org/abs/1910.04796). 2019.


================================================
FILE: docs/guide/2-user-guide/1-installation/1-cmake-build-recipes.md
================================================
title: CMake Build Recipes

# DBCSR CMake Build Recipes

The following are recipes for different combinations of compilers, platforms and libraries.
Unless otherwise noted, the examples assume that after fetching/unpacking DBCSR you created
a directory `build/` inside the DBCSR directory and switched into it using `cd build/`.

The listed examples can usually be combined with other build options like *libxsmm* or *CUDA*
even if the examples are not explicitly given.

The instructions used for building in the Continuous Integration can be found in
the `.ci/` folder or in `.github/workflows/`.

## GNU

### GNU compiler, system MPI and system-provided OpenBLAS

Most Linux systems provide the GNU compiler, a system MPI (OpenMPI or MPICH) using the
GNU compiler as a backend and OpenBLAS for BLAS/LAPACK:

```bash
    cmake ..
```

### GNU compiler, system MPI and Intel MKL

To use Intel MKL together with the GNU compiler and possibly a system MPI,
follow the steps below. They assume that MKL is installed in `/sw/intel/mkl`.

Verified with MKL provided as part of the Intel Parallel Studio XE 2019.5 installed in `/sw/intel`
with an OS-provided GCC 7.4.1 on Linux openSUSE Leap 15.1, using CMake 3.12.0.

1. Make sure the MKL environment is properly loaded:

```bash
       source /sw/intel/mkl/bin/mklvars.sh intel64
```

2. Make sure CMake picks the Intel MKL over any system-provided BLAS library:

```bash
       cmake -DBLA_VENDOR=Intel10_64lp_seq ..
```
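
Before configuring, it can help to confirm that the MKL environment script actually took effect. The following is a minimal sketch, assuming that `mklvars.sh` exports `MKLROOT`; `check_mkl_env` is a hypothetical helper name and not part of DBCSR:

```bash
# Hypothetical helper (not part of DBCSR): fail early if the MKL environment
# was not loaded. Assumes that mklvars.sh exports MKLROOT.
check_mkl_env() {
    if [ -z "${MKLROOT:-}" ]; then
        echo "MKLROOT is not set; source mklvars.sh first" >&2
        return 1
    fi
    echo "Using MKL from ${MKLROOT}"
}
```

Chaining it as `check_mkl_env && cmake -DBLA_VENDOR=Intel10_64lp_seq ..` would then only configure when the environment is loaded.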

## Intel

Instructions for using the Intel compilers or libraries, in various combinations, on non-Cray systems.
For Cray systems, please check further below.

*Note*: in Intel Parallel Studio 2019 there is a potential issue that `mpirun` fails with
the error `OFI addrinfo() failed` on local (non-cluster) installations.
This can be worked around by setting `export I_MPI_FABRICS=shm`.
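
As a small sketch of that workaround, the variable can be given a default without overriding a value that is already set. `use_shm_fabric_if_unset` is a hypothetical helper name used only for illustration:

```bash
# Hypothetical helper: default to the shared-memory fabric on a local machine,
# but keep any non-empty value the user has already chosen.
use_shm_fabric_if_unset() {
    : "${I_MPI_FABRICS:=shm}"   # assign only if unset or empty
    export I_MPI_FABRICS
}
```

Calling the helper once before `mpirun` is enough; a value set earlier in the session is left untouched.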

### Intel MPI, GNU Compiler and system-provided OpenBLAS

Verified with Intel Parallel Studio XE 2019.5 installed in `/sw/intel`
with an OS-provided GCC 7.4.1 on Linux openSUSE Leap 15.1, using CMake 3.12.0.

1. Make sure that the Intel environment is properly loaded:

```bash
       source /sw/intel/bin/compilervars.sh intel64
```

2. Use the Intel-provided MPI compiler wrappers for the GNU toolchain,
   to override CMake's auto-detection which may pick up the system MPI:

```bash
       CC=mpicc FC=mpifc CXX=mpicxx cmake ..
```
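
To see which compiler CMake actually recorded after configuring, one option is to read the entry back from `CMakeCache.txt` in the build directory. This is a minimal sketch; `cmake_cache_value` is a hypothetical helper, and it only relies on cache entries having the form `NAME:TYPE=VALUE`:

```bash
# Hypothetical helper: print the value of one entry from a CMakeCache.txt file.
# Cache entries have the form NAME:TYPE=VALUE.
cmake_cache_value() {
    name="$1"
    cache_file="${2:-CMakeCache.txt}"   # defaults to the current directory
    sed -n "s/^${name}:[A-Z]*=//p" "$cache_file"
}
```

Run from the build directory, `cmake_cache_value CMAKE_Fortran_COMPILER` should print the wrapper passed via `FC`; if it prints something else, the configuration probably did not pick up the intended toolchain.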

### Intel MPI, GNU Compiler and Intel MKL

Verified with Intel Parallel Studio XE 2019.5 installed in `/sw/intel`
with an OS-provided GCC 7.4.1 on Linux openSUSE Leap 15.1, using CMake 3.12.0.

1. Make sure that the Intel environment is properly loaded:

```bash
       source /sw/intel/bin/compilervars.sh intel64
```

2. Use the Intel-provided MPI compiler wrappers for the GNU toolchain:

```bash
       CC=mpicc FC=mpifc CXX=mpicxx cmake -DBLA_VENDOR=Intel10_64lp_seq ..
```

### Intel MPI, Intel Compiler and Intel MKL

Verified with Intel Parallel Studio XE 2019.5 installed in `/sw/intel`
on Linux openSUSE Leap 15.1, using CMake 3.12.0.

1. Make sure that the Intel environment is properly loaded:

```bash
       source /sw/intel/bin/compilervars.sh intel64
```

2. Use the Intel-provided MPI compiler wrappers:

```bash
       CC=mpiicc FC=mpiifort CXX=mpiicpc cmake -DBLA_VENDOR=Intel10_64lp_seq ..
```

## MacOS

Follow what is described in the previous sections.
For GNU, having both Apple's Command Line Tools and a Homebrew-installed GCC can lead to a
conflict over which compiler CMake will use. We therefore suggest specifying GCC explicitly, for example

```bash
    CC=gcc-9 CXX=g++-9 cmake ..
```

where `-9` can be adapted to your version.

### PGI

Please note that you need PGI 19.11 or newer.

Assuming that your `$PATH` is set correctly such that `pgcc`, `pgc++` and `pgfortran` can be found,
run the following to get a DBCSR version without MPI:

```bash
    CC=pgcc CXX=pgc++ FC=pgfortran cmake -DUSE_MPI=OFF ..
```

The `-DUSE_MPI=OFF` is needed here to prevent CMake from picking up any MPI installation, for example from Homebrew.

To build with MPI you need an MPI implementation built for/with the PGI compiler, for example the MPICH
usually bundled with the PGI installation.

Make sure that `$PATH` is correctly set to include `mpicc`, `mpicxx` and `mpifort` from the PGI MPICH installation, then run:

```bash
    CC=mpicc CXX=mpicxx FC=mpifort MPICH_CC=pgcc cmake ..
```

## Cray

Some machines require additional environments to be loaded to either provide
the modules specified below or to be able to properly build with the loaded modules.

Please contact your cluster/datacenter administrator for more information.

Example for the CSCS' Piz Daint:

```bash
    module load daint-mc  # to build for the non-GPU partition
    module load daint-gpu  # to build for the GPU partition
```

*Note*: the `libsci-cray` has different variants for MPI or OpenMP.
When disabling either MPI or OpenMP support in DBCSR you might want to adjust the
selected BLAS/LAPACK library accordingly (e.g. drop the `_mpi`, or `_mp`).
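
The naming pattern behind this note can be sketched in a few lines of Python. The helper `libsci_name` is hypothetical and only mirrors the library names used in the recipes of this section (`sci_<compiler>` plus optional `_mpi` and `_mp` suffixes); whether every variant is actually installed on a given machine is an assumption that should be checked against the loaded `cray-libsci` module.

```python
def libsci_name(compiler, use_mpi=True, use_openmp=True):
    """Compose a linker flag following the naming used in this section.

    Hypothetical helper: `compiler` is one of "cray", "intel" or "gnu",
    as seen in the recipes; the suffixes are dropped when MPI or OpenMP
    support is disabled in DBCSR.
    """
    name = "sci_" + compiler
    if use_mpi:
        name += "_mpi"
    if use_openmp:
        name += "_mp"
    return "-l" + name


if __name__ == "__main__":
    # Print the four combinations for the GNU programming environment.
    for mpi in (True, False):
        for omp in (True, False):
            print(f"USE_MPI={mpi!s:5} USE_OPENMP={omp!s:5} -> {libsci_name('gnu', mpi, omp)}")
```

The resulting flag would then be passed via `-DBLAS_LIBRARIES` and `-DLAPACK_LIBRARIES`, together with the matching `-DUSE_MPI=OFF` or `-DUSE_OPENMP=OFF` option.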

### CCE and libsci-cray

Verified on CSCS' Piz Daint with CCE 10.0.2 and cray-libsci 20.06.1,
using CMake 3.18.4.

1. Make sure that the `PrgEnv-cray` module is loaded:

```bash
       module load PrgEnv-cray
```

2. While the MPI wrapper/compiler will be detected automatically,
   the BLAS/LAPACK libraries must be specified manually:

```bash
       cmake \
         -DCMAKE_SYSTEM_NAME=CrayLinuxEnvironment \
         -DBLAS_LIBRARIES="-lsci_cray_mpi_mp -lhugetlbfs" \
         -DLAPACK_LIBRARIES="-lsci_cray_mpi_mp" \
         ..
```

### Intel Compiler and libsci-cray

Verified on CSCS' Piz Daint with Intel 19.1 and cray-libsci 20.06.1,
using CMake 3.18.4.

1. Make sure that the `PrgEnv-intel` module is loaded:

```bash
       module load PrgEnv-intel
```

2. While the MPI wrapper/compiler will be detected automatically,
   the BLAS/LAPACK libraries must be specified manually:

```bash
       cmake \
         -DCMAKE_SYSTEM_NAME=CrayLinuxEnvironment \
         -DBLAS_LIBRARIES="-lsci_intel_mpi_mp -lhugetlbfs" \
         -DLAPACK_LIBRARIES="-lsci_intel_mpi_mp" \
         ..
```

### GNU Compiler and libsci-cray

Verified on CSCS' Piz Daint with GNU 8.3.0 and cray-libsci 20.06.1,
using CMake 3.18.4.

1. Make sure that the `PrgEnv-gnu` module is loaded:

```bash
       module load PrgEnv-gnu
```

2. While the MPI wrapper/compiler will be detected automatically,
   the BLAS/LAPACK libraries must be specified manually:

```bash
       cmake \
         -DCMAKE_SYSTEM_NAME=CrayLinuxEnvironment \
         -DBLAS_LIBRARIES="-lsci_gnu_mpi_mp -lhugetlbfs" \
         -DLAPACK_LIBRARIES="-lsci_gnu_mpi_mp" \
         ..
```

## Any compiler

### Custom compiler flags

In the DBCSR build system we preload the default compiler flags (especially the ones for Fortran) with flags
required to build the code with a specific compiler, while additional optimization flags are added based on the
CMake build type.

This allows the user to override optimization flags by setting a custom build type and providing optimization flags
for that build type as follows:

```bash
       cmake \
         -DCMAKE_BUILD_TYPE=custom \
         -DCMAKE_C_FLAGS_CUSTOM="-O3 -march=native" \
         -DCMAKE_CXX_FLAGS_CUSTOM="-O3 -march=native" \
         -DCMAKE_Fortran_FLAGS_CUSTOM="-O3 -march=native" \
         ..
```


================================================
FILE: docs/guide/2-user-guide/1-installation/2-supported-compilers.md
================================================
title: Supported Compilers

# Supported compilers

DBCSR uses the Fortran 2008+ standard, which requires up-to-date compilers.
Currently direct testing is done with the following compilers:

* GNU 11.2.0

Since DBCSR is a core library of CP2K, the code gets additional testing on a
wider range of systems and compilers. You can find more information about this
on [CP2K Compiler Support](https://www.cp2k.org/dev:compiler_support).


================================================
FILE: docs/guide/2-user-guide/1-installation/3-using-dbcsr-in-a-cmake-project.md
================================================
title: Using DBCSR in a CMake project

# Using DBCSR in a CMake project

We are providing CMake helper files to easily include DBCSR in any other CMake-based project.
For this you have to build DBCSR using CMake as described above and then also install it.

If you can run commands as root, use:

```bash
    sudo cmake --build . -- install  # will install to /usr/local
```

If you cannot run commands as root, use `-DCMAKE_INSTALL_PREFIX=...` when calling CMake to set
an alternative base installation path for DBCSR instead:

```bash
    cmake -DCMAKE_INSTALL_PREFIX=/my/custom/prefix ..
    cmake --build . -- install
```

In your project's CMake you can then easily search for the DBCSR library:

```cmake
cmake_minimum_required(VERSION 3.22)

enable_language(Fortran C CXX)  # only request the languages you need

find_package(DBCSR 2.0.0 CONFIG REQUIRED)
find_package(MPI)

# for Fortran:
set(CMAKE_Fortran_FLAGS "-std=f2018")  # your Fortran code likely needs to be F2018+ compatible as well
add_executable(dbcsr_example_fortran dbcsr_example.f90)
target_link_libraries(dbcsr_example_fortran DBCSR::dbcsr)

# for C:
add_executable(dbcsr_example_c dbcsr_example.c)
target_link_libraries(dbcsr_example_c DBCSR::dbcsr_c MPI::MPI_C)

# for C++:
add_executable(dbcsr_example_cpp dbcsr_example.cpp)
target_link_libraries(dbcsr_example_cpp DBCSR::dbcsr_c MPI::MPI_CXX)
```

If you installed DBCSR into a custom prefix, you have to make sure that CMake
is able to find the DBCSR CMake configuration:

```bash
    DBCSR_DIR=/my/custom/prefix cmake ..
```


================================================
FILE: docs/guide/2-user-guide/1-installation/4-docker.md
================================================
title: Docker Images

{!./tools/docker/README.md!}



================================================
FILE: docs/guide/2-user-guide/1-installation/index.md
================================================
title: Install

# Install

## Prerequisites

You need:

* [CMake](https://cmake.org/) (3.22+)
* GNU make or Ninja
* Fortran compiler which supports at least Fortran 2008 (including the TS 29113 when using the C-bindings)
* BLAS+LAPACK implementation (reference, OpenBLAS and MKL have been tested. Note: DBCSR linked to OpenBLAS 0.3.6 gives wrong results on Power9 architectures.)
* Python (versions 2.7 and 3.6+ have been tested)

Optional:

* [LIBXSMM](https://github.com/hfp/libxsmm) (1.10+, and `pkg-config`) for Small Matrix Multiplication acceleration
* LAPACK implementation (reference, OpenBLAS-bundled and MKL have been tested), required when building the tests

To build DBCSR's GPU backend:

* CUDA Toolkit (targets NVIDIA GPUs, minimal version required: 5.5) with cuBLAS
    * Host C++ compiler which supports at least C++11 standard
* or HIP compiler (targets NVIDIA or AMD GPUs) and hipBLAS (ROCm 4.5.2 was tested)
    * Host C++ compiler which supports at least C++11 standard
* or OpenCL, i.e., development headers (`opencl-headers`) and the generic loader "ocl-icd" (`ocl-icd-opencl-dev`)
    * Vendor specific OpenCL package, e.g.,
      [Intel Compute Runtime](https://github.com/intel/compute-runtime/releases/latest),
      or CUDA Toolkit (includes OpenCL)
    * Nvidia GPU mode shall be `DEFAULT` (`nvidia-smi -c DEFAULT`) if MPI puts multiple ranks onto a single GPU;
      MPS daemon with GPU mode `EXCLUSIVE_PROCESS` is not an option
    * For the OpenCL backend, a plain C compiler is sufficient (C90 standard)
    * Optionally, `clinfo` (can be useful to show available devices)

DBCSR is tested against the GNU and Intel compilers on Linux systems, and the GNU compiler on macOS systems.
See a list of supported compilers [here](2-supported-compilers.html).

## Get DBCSR

Download either a [release tarball](https://github.com/cp2k/dbcsr/releases) or clone the latest version from Git using:

```bash
git clone --recursive https://github.com/cp2k/dbcsr.git
```

## Build

DBCSR can be compiled in four main variants:

* Serial, i.e., no OpenMP and no MPI
* OpenMP
* MPI
* OpenMP+MPI

In addition, the variants can support accelerators.

Run inside the `dbcsr` directory:

```bash
mkdir build
cd build
cmake ..
make
```

The configuration flags for the CMake command are (default first):

```bash
-DUSE_MPI=<ON|OFF>
-DUSE_OPENMP=<ON|OFF>
-DUSE_SMM=<blas|libxsmm>
-DUSE_ACCEL=<opencl|cuda|hip>
-DWITH_CUDA_PROFILING=<OFF|ON>
-DWITH_HIP_PROFILING=<OFF|ON>
-DWITH_C_API=<ON|OFF>
-DWITH_EXAMPLES=<ON|OFF>
-DWITH_GPU=<P100|K20X|K40|K80|V100|Mi50|Mi100|Mi250|Mi300|Mi350>
-DCMAKE_BUILD_TYPE=<Release|Debug|Coverage>
-DBUILD_TESTING=<ON|OFF>
-DTEST_MPI_RANKS=<2|auto|N>
-DTEST_OMP_THREADS=<2|N>
```
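
When several of these flags are combined, for instance in a build script, the command line can be assembled programmatically. The following is a minimal sketch, not part of DBCSR: the helper `cmake_command` is hypothetical, and the option names and values are taken from the list above.

```python
import shlex


def cmake_command(source_dir="..", **options):
    """Turn keyword options into `-D<NAME>=<VALUE>` arguments for cmake.

    Hypothetical helper for illustration; keyword order is preserved.
    """
    args = ["cmake"]
    for name, value in options.items():
        args.append(f"-D{name}={value}")
    args.append(source_dir)
    return args


if __name__ == "__main__":
    # Example: an MPI+OpenMP build using LIBXSMM for small matrix multiplications.
    cmd = cmake_command(
        USE_MPI="ON",
        USE_OPENMP="ON",
        USE_SMM="libxsmm",
        CMAKE_BUILD_TYPE="Release",
    )
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(cmd))
```

The returned list can also be handed to `subprocess.run` from within the `build` directory instead of being printed.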

When providing a build of LIBXSMM, make sure the `lib` directory is added to the `PKG_CONFIG_PATH` variable prior
to running `cmake`. For example, if LIBXSMM was checked out using Git to your home folder:

```bash
export PKG_CONFIG_PATH="${PKG_CONFIG_PATH}:${HOME}/libxsmm/lib"
```

### CMake Build Recipes

For build recipes on different platforms, make sure to also read the [CMake Build Recipes](1-cmake-build-recipes.html).

### Building with spack

DBCSR and its dependencies can be built with the [spack package manager](https://github.com/spack/spack):

```bash
spack install dbcsr +openmp +mpi +cuda cuda_arch=70
spack install dbcsr +openmp +mpi +rocm amdgpu_target=gfx906
spack install dbcsr +openmp +mpi +opencl ^cuda
```

See `spack info dbcsr` for all supported variants.

### C/C++ Interface

If MPI support is enabled (the default), the C API is automatically built.


================================================
FILE: docs/guide/2-user-guide/2-tests/index.md
================================================
title: Tests

# Tests

## Correctness tests

- [[dbcsr_unittest_1(program)]] (fortran) : test matrix operations: add, multiply and multiply-ghost
- [[dbcsr_unittest_2(program)]] (fortran) : test matrix-multiply with large blocks (block size=100) and rectangular matrices (block size=5)
- [[dbcsr_test_csr_conversions(program)]] (fortran) : test DBCSR to CSR conversion with random matrices
- [[dbcsr_tas_unittest(program)]] (fortran) : unit test for tall-and-skinny matrices
- [[dbcsr_tensor_unittest(program)]] (fortran) : unit test for tensor functionalities
- [dbcsr_tensor_test](../../../sourcefile/dbcsr_tensor_test.cpp.html) (c++) : test the tensor contraction (13|2)x(54|21)=(3|45) 31 and other functions

### GPU-backend correctness tests:

- [[dbcsr_unittest_3(program)]] (fortran) : test matrix-multiply with various block sizes that are run by the libsmm_acc GPU backend if DBCSR is compiled with GPU support
- [libsmm_acc_unittest_multiply](../../../sourcefile/libsmm_acc_unittest_multiply.cpp.html) (c++) : tests all libsmm_acc batch-multiplication kernels
- [libsmm_acc_unittest_transpose](../../../sourcefile/libsmm_acc_unittest_transpose.cpp.html) (c++) : tests all libsmm_acc transpose kernels

## Performance tests

DBCSR performance tests:

- [[dbcsr_performance_driver(program)]] (fortran) : performance tester for matrix operations. The input matrices can be described in an input file in order to test different configurations. See below.

### GPU backend performance tests:

- [libsmm_acc_timer_multiply](../../../sourcefile/libsmm_acc_timer_multiply.cpp.html) (c++) : time all libsmm_acc batch-multiplication kernels

## Running Tests

To run all the tests, use:

```bash
make test
```

Or run individual tests from the `build` directory, as follows:

```bash
srun -N 1 --ntasks-per-core 2 --ntasks-per-node 12 --cpus-per-task 2 ./tests/dbcsr_unittest_1
```

Note that the tests of libsmm_acc (the GPU-backend) do not use MPI since libsmm_acc only operates on-node.

Note that if you are using OpenMP builds, then you have to set the environment variable `OMP_NESTED=false`.

### Input Files for Performance Driver

The test suite comes with a performance driver ([[dbcsr_performance_driver(program)]]), which evaluates the performance of matrix-matrix multiplication in DBCSR.

Input matrices can be specified in an input file, which the commands below pass to the executable on the command line, for example:

a) To run a pure MPI performance test using [n] ranks:

```bash
mpirun -np [n] ./build/tests/dbcsr_perf tests/input.perf 2>&1 | tee perf.log
```

b) To run a hybrid MPI/OpenMP performance test using [n] ranks, each spawning [t] threads:

```bash
export OMP_NUM_THREADS=[t]; mpirun -np [n] ./build/tests/dbcsr_perf tests/input.perf 2>&1 | tee perf.log
```
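
For scans over several rank and thread counts, the two invocations above can be generated from a script. This is a minimal sketch under stated assumptions: the helper `perf_command` is hypothetical, the executable and input paths are the ones used in the commands above, and `OMP_NESTED=false` follows the note in the "Running Tests" section.

```python
import os


def perf_command(ranks, threads=1,
                 exe="./build/tests/dbcsr_perf",
                 input_file="tests/input.perf"):
    """Return the mpirun command and environment for one performance run.

    Hypothetical helper: `ranks` maps to `mpirun -np`, `threads` to
    OMP_NUM_THREADS (threads=1 corresponds to the pure MPI case).
    """
    cmd = ["mpirun", "-np", str(ranks), exe, input_file]
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(threads)
    env["OMP_NESTED"] = "false"
    return cmd, env


if __name__ == "__main__":
    # Show the commands for a small scan; nothing is launched here.
    for ranks, threads in [(4, 1), (2, 2), (1, 4)]:
        cmd, env = perf_command(ranks, threads)
        print(f"OMP_NUM_THREADS={env['OMP_NUM_THREADS']}", " ".join(cmd))
```

To actually launch a run, the pair can be passed on as `subprocess.run(cmd, env=env)`.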

### How to Write Input Files

Examples of input files can be found in `tests/inputs` for different sizes of matrices and different block sizes.

You can also write custom input files: for more information, follow the template in `tests/input.perf`.


================================================
FILE: docs/guide/2-user-guide/3-examples/index.md
================================================
title: Examples

# Examples

- [[dbcsr_example_1(program)]] : how to create a dbcsr matrix (fortran)
- [[dbcsr_example_2(program)]] : how to set a dbcsr matrix (fortran)
- dbcsr_example_3: how to multiply two dbcsr matrices (in fortran: [[dbcsr_example_3(program)]] and in c++: [dbcsr_example_3](../../../sourcefile/dbcsr_example_3.cpp.html))
- [[dbcsr_tensor_example_1(program)]] : how to create a dbcsr tensor (fortran)
    - the example can be run with different parameters, controlling block size, sparsity, verbosity and more
- [dbcsr_tensor_example_2](../../../sourcefile/dbcsr_tensor_example_2.cpp.html): tensor contraction example (cpp)
    - tensor1 x tensor2 = tensor3, (13|2)x(54|21)=(3|45)

## Build

Compile the DBCSR library, using `-DUSE_MPI=ON -DWITH_EXAMPLES=ON`.

The examples require MPI. Furthermore, if you are using threading, MPI_THREAD_FUNNELED mode is required.

## Run

You can run the examples, for instance from the `build` directory, as follows:

```bash
srun -N 1 --ntasks-per-core 2 --ntasks-per-node 12 --cpus-per-task 2 ./examples/dbcsr_example_1
```

### Run tensor examples

How to run (this example and DBCSR for tensors in general):

- Best performance is obtained by running with MPI and one OpenMP thread per rank.
- Ideally, the number of MPI ranks should be composed of small prime factors (e.g. powers of 2).
- For sparse data and heterogeneous block sizes, DBCSR should be run on CPUs with the libxsmm backend.
- For dense data, best performance is obtained by choosing homogeneous block sizes of 64 and by compiling with GPU support.
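
The recommendation on rank counts can be checked before submitting a job. The sketch below is illustrative only: the helper names are hypothetical, and the threshold of 5 for what counts as a "small" prime factor is an assumption, since the text only gives powers of 2 as an example.

```python
def largest_prime_factor(n):
    """Return the largest prime factor of n (1 for n == 1), by trial division."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    largest = 1
    p = 2
    while p * p <= n:
        while n % p == 0:
            largest = p
            n //= p
        p += 1
    if n > 1:
        # Whatever remains after trial division is itself prime.
        largest = n
    return largest


def has_small_prime_factors(ranks, limit=5):
    """True if all prime factors of `ranks` are <= limit (assumed threshold)."""
    return largest_prime_factor(ranks) <= limit


if __name__ == "__main__":
    for ranks in (16, 24, 28, 60, 62, 64):
        verdict = "ok" if has_small_prime_factors(ranks) else "avoid"
        print(f"{ranks:3d} ranks: largest prime factor {largest_prime_factor(ranks)} ({verdict})")
```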


================================================
FILE: docs/guide/2-user-guide/4-gpu/index.md
================================================
title: GPUs

# Introduction

[CP2K](https://github.com/cp2k/cp2k/) was initially enabled for GPUs by means of the DBCSR library. The original development focused on scalability and assumed a `1:1`-relationship between CPUs and GPUs (one CPU-socket drives one GPU). Multi-GPU support requires associating CPU-ranks with the closest GPU (affinity), but usually also means a departure in terms of algorithms (GPU to GPU communication). DBCSR associates ranks with GPUs based on a round-robin scheme using the rank-ID, i.e., GPU-affinity is only achieved with the help of the underlying MPI implementation or support from other runtimes. Aggregating GPU acceleration in as few systems as possible is contrary to the original design of DBCSR (and CP2K at that time). CP2K is a versatile toolbox covering a variety of workloads (input language), which results in several hotspots beyond DBCSR ([status](https://www.cp2k.org/gpu)).

CP2K or DBCSR can scale to thousands of nodes and further benefit from thread-scalability once communication starts to dominate (due to higher total rank-counts). Thread-scalability (OpenMP) in DBCSR, if not CP2K, is not as well developed as process scalability (MPI), i.e., higher rank-counts tend to yield better performance on a smaller number of systems or nodes. With multiple ranks per GPU, context switches and other overhead can negatively impact performance. However, more ranks are needed to best drive the CPU-dominated portion of the code, and hence GPU and in particular multi-GPU acceleration poses a challenge.

CP2K almost exclusively uses double-precision calculations on CPUs and GPUs (along with DBCSR's need for atomic update instructions for GPUs). Consumer-focused GPU offerings often deliver a FLOP-rate ratio between single and double precision of up to `SP:DP = 64:1`, which renders them unsuitable for CP2K, or at least not beneficial when compared to a modest number of CPU cores. Further, GPU acceleration hinges on memory bandwidth rather than compute, which further limits the benefit.
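
To illustrate the round-robin association of ranks with GPUs mentioned above, here is a minimal sketch. It assumes that "round-robin using the rank-ID" amounts to taking the rank modulo the number of visible devices; the function `device_for_rank` is hypothetical and not DBCSR's actual implementation.

```python
def device_for_rank(rank, num_devices):
    """Map an MPI rank to a device index in round-robin fashion (assumed: modulo)."""
    if num_devices < 1:
        raise ValueError("at least one device is required")
    return rank % num_devices


if __name__ == "__main__":
    # Eight ranks on a node with four GPUs: ranks 0 and 4 share GPU 0, etc.
    for rank in range(8):
        print(f"rank {rank} -> GPU {device_for_rank(rank, 4)}")
```

Under this assumption the mapping depends only on the rank-ID. Whether a rank ends up on the GPU closest to its CPU socket is therefore decided by how the MPI launcher places and numbers the ranks, which is why affinity needs support from the MPI implementation or another runtime.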

# CUDA/HIP Backend

Users interested in tuning kernels for the CUDA/HIP backend and LIBSMM_ACC can take a look at the [Developer Guide](../../3-developer-guide/3-programming/2-accelerator-backend/2-libsmm_acc/3-tune.html). Following the guide, [tuned parameters](https://github.com/cp2k/dbcsr/tree/develop/src/acc/libsmm_acc/parameters) can be collected for the desired GPU and potentially submitted for the benefit of others.

# OpenCL Backend

This section shows how to auto-tune a kernel for the OpenCL based LIBSMM library. The process builds a stand-alone driver program which is then driven by an [OpenTuner](https://opentuner.org/) based script guiding the auto-tuning of the desired kernel. The [Developer Guide](../../3-developer-guide/3-programming/2-accelerator-backend/3-libsmm_ocl/1-autotune.html) provides more information, e.g., about constraining execution time or parallelizing the tuning-process as well as how to select and tune an entire set of kernels.

For simplicity, the GNU Compiler is used to build the aforementioned driver program, both DBCSR and LIBXSMM are Git-cloned into the same common directory, e.g., the user's `HOME` directory, and the driver is built for tuning double-precision kernels (DP).

```bash
cd ${HOME}
git clone https://github.com/hfp/libxsmm.git
cd libxsmm
make GNU=1 -j

cd ${HOME}
git clone https://github.com/cp2k/dbcsr.git
cd dbcsr/src/acc/opencl
make
```

Tuning the 23x23x23-kernel is the default case. However, to better illustrate the options, M, N, and K are given explicitly. The `tune_multiply.py` script can, for example, be used interactively and terminated with CTRL-C, which writes a JSON-file with the tuned parameters (note that a file `.tune_multiply-double-23x23x23.json` is quietly written every time a better set of parameters is found) and then aggregates all JSON-files in the directory into a CSV-file (`tune_multiply.csv`).

```bash
cd ${HOME}/dbcsr/src/acc/opencl/smm
./tune_multiply.py 23x23x23
```

Besides interactive termination, the above process also terminates based on OpenTuner's defaults, or it can be constrained by the number of steps (experiments), the time to be spent, or a combination of both. Details can be found in the [Developer Guide](../../3-developer-guide/3-programming/2-accelerator-backend/3-libsmm_ocl/1-autotune.html).

Once the 23x23x23-kernel has been tuned for some time (e.g., 5-10 minutes), the tuned parameters can be incorporated into the backend. The aggregated parameters (`tune_multiply.csv`) are automatically embedded when rebuilding the library and driver.

```bash
cd ${HOME}/dbcsr/src/acc/opencl
make
```

Important kernels can be further tuned (in addition to spending more time for the process) by widening the set of tuned parameters (`--tuning-level` or `-a` with "0" denoting an unrestricted set of tunables).

```bash
cd ${HOME}/dbcsr/src/acc/opencl/smm
./tune_multiply.py 23x23x23 -a 0
```

To "continue" tuning beyond the default level, the previously found parameters must be embedded (by rebuilding the library and driver program) or can be alternatively specified at runtime (`OPENCL_LIBSMM_SMM_PARAMS=/path/to/tune_multiply.csv`).


================================================
FILE: docs/guide/2-user-guide/index.md
================================================
title: User Guide


================================================
FILE: docs/guide/3-developer-guide/1-tooling/index.md
================================================
title: Tooling

# Build System

We support CMake for compilation. See [here](../../2-user-guide/1-installation/index.html) on how to compile and
[here](../../2-user-guide/1-installation/1-cmake-build-recipes.html) for more CMake details.

Compilation is based on the [Fypp](https://github.com/aradi/fypp) meta-programming package, which is available as a submodule.

# CI Setup

DBCSR's CI setup is described in [DBCSR's GitHub wiki](https://github.com/cp2k/dbcsr/wiki/CI-Setup).

# Development

DBCSR's development (Git branching model, commit messages, releases, etc.) is described in [DBCSR's GitHub wiki](https://github.com/cp2k/dbcsr/wiki/Development).



================================================
FILE: docs/guide/3-developer-guide/2-documentation/index.md
================================================
title: Documentation

# Documentation

## Build

To build the documentation you need [FORD](https://github.com/Fortran-FOSS-Programmers/ford).

Afterwards use the `doc` target for the CMake generated Makefile:

```bash
    mkdir build
    cd build
    cmake .. # will look for the `ford` binary
    make doc
```

Note that in order to generate the documentation with examples (recommended), the following options should be activated in CMake (these are the options' default values):

```bash
    cmake -DUSE_MPI=ON -DWITH_EXAMPLES=ON .. # these options are default and recommended.
                                             # If set off, the examples' documentation is not generated.
```

The documentation (HTML format) will be located in `doc/`. To view it, open `doc/index.html` in a browser.

## Add Pages

To add pages to the documentation, write Markdown files and add them to the desired location in `dbcsr/docs/guide`. Note that subfolders of `guide` will only be added to the documentation pages if they contain a file `index.md`. For more information on writing pages, see [Ford's documentation](https://github.com/Fortran-FOSS-Programmers/ford/wiki/Writing-Pages).


================================================
FILE: docs/guide/3-developer-guide/3-programming/1-overview/index.md
================================================
title: Overview

# Code Architecture

![DBCSR code architecture](dbcsr_mm_overview.png)

```
dbcsr/
-- src/
---- acc/: contains all code related to accelerators
---- base/: base routines needed to abstract away some machine/compiler dependent functionality
---- block/: block level routines
---- core/: core matrix data structure
---- data/: data handling
---- dist/: data distribution and message passing
---- mm/: matrix-matrix multiplication
---- mpi/: wrappers of the MPI routines
---- ops/: high level operations
---- tas/: tall-and-skinny matrices
---- tensors/: block-sparse tensor framework
---- utils/: utilities
---- work/
```

# Distribution Scheme

Assume a square 20x20 matrix with 5x5 blocks and a 2x2 processor grid:

![DBCSR distribution over processors](dbcsr_dist.png)

![DBCSR block scheme](dbcsr_blocks.png)
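
The setup above (a 20x20 matrix, 5x5 blocks, a 2x2 processor grid) can be sketched as follows. The sketch assumes that each block row is assigned to a process row and each block column to a process column, and it uses a cyclic assignment purely for illustration; the assignment shown in the figures may differ, and the names `cyclic_dist` and `block_owner` are hypothetical.

```python
def cyclic_dist(nblocks, nprocs):
    """Assign block rows (or columns) to process rows (or columns) cyclically.

    The cyclic pattern is an assumption made for this illustration.
    """
    return [i % nprocs for i in range(nblocks)]


def block_owner(block_row, block_col, row_dist, col_dist):
    """Return the (process row, process column) that holds a given block."""
    return row_dist[block_row], col_dist[block_col]


if __name__ == "__main__":
    matrix_size, block_size = 20, 5
    grid_rows, grid_cols = 2, 2
    nblocks = matrix_size // block_size  # 4 block rows and 4 block columns

    row_dist = cyclic_dist(nblocks, grid_rows)
    col_dist = cyclic_dist(nblocks, grid_cols)

    # Print the owning process coordinates for each of the 4x4 blocks.
    for i in range(nblocks):
        print(" ".join(str(block_owner(i, j, row_dist, col_dist)) for j in range(nblocks)))
```

With this assignment each of the four processes holds four of the sixteen blocks.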

# List of standard compiler flags

* OpenMP flag to enable multi-threaded parallelization, e.g. `-fopenmp` for GNU and Intel compilers.
* Warnings, e.g. `-Werror=aliasing -Werror=ampersand -Werror=c-binding-type -Werror=intrinsic-shadow -Werror=intrinsics-std -Werror=line-truncation -Werror=tabs -Werror=target-lifetime -Werror=underflow -Werror=unused-but-set-variable -Werror=unused-variable -Werror=unused-dummy-argument -Werror=conversion -Werror=zerotrip -Werror=uninitialized -Wno-maybe-uninitialized` for GNU compiler.
* Error checking (only `Coverage` and `Debug` builds), e.g. `-fcheck=all -ffpe-trap=invalid,zero,overflow -fbacktrace -finit-real=snan -finit-integer=-42 -finit-derived -Werror=realloc-lhs -finline-matmul-limit=0` for GNU compiler.

# List of Macros used in the code

| Macro | Explanation | Language |
|-|-|-|
| `__parallel` | Enable MPI runs | Fortran |
| `__USE_MPI_F08` | Enable use of the modern `mpi_f08` module instead of the `mpi` module to reduce interfacing issues | Fortran |
| `__NO_MPI_THREAD_SUPPORT_CHECK` | Workaround for MPI libraries that do not declare they are thread safe (funneled) but you want to use them with OpenMP code anyways | Fortran |
| `__MKL` | Enable use of optimized Intel MKL functions | Fortran |
| `__NO_STATM_ACCESS`, `__STATM_RESIDENT` or `__STATM_TOTAL` | Toggle memory usage reporting between resident memory and total memory. In particular, macOS users must use `-D__NO_STATM_ACCESS` | Fortran |
| `__NO_ABORT` | Avoid calling abort, but STOP instead (useful for coverage testing, and to avoid core dumps on some systems) | Fortran |
| `__LIBXSMM` | Enable [LIBXSMM](https://github.com/hfp/libxsmm/) link for optimized small matrix multiplications on CPU | Fortran |
| `__ACCELERATE` | Must be defined on macOS when Apple's Accelerate framework is used for BLAS and LAPACK (this is due to some interface incompatibilities between Accelerate and reference BLAS/LAPACK) | Fortran |
| `NDEBUG`       | Assertions are stripped ("compiled out"), `NDEBUG` is the ANSI-conforming symbol name (not `__NDEBUG`). Regular release builds may carry assertions for safety | Fortran, C, C++ |
| `__CRAY_PM_ACCEL_ENERGY` or `__CRAY_PM_ENERGY` | Switch on the collection of energy profiling data on Cray systems | Fortran |
| `__DBCSR_ACC` | Enable Accelerator compilation | Fortran, C, C++ |
| `__OPENCL`  | Enable OpenCL acceleration | C |
| `__CUDA_PROFILING`  | Turn on Nvidia Tools Extensions. Requires linking with `-lnvToolsExt` | Fortran, C, C++ |
| `__CUDA` | Enable CUDA acceleration | C, C++ |
| `__HIP`  | Enable HIP acceleration | C, C++ |


================================================
FILE: docs/guide/3-developer-guide/3-programming/2-accelerator-backend/1-code-structure.md
================================================
title: Code Structure

# GPU Backend Code Architecture

```
dbcsr/
-- src/
---- acc/: contains interfaces to ACC and LIBSMM (top-level) as well as backends (subdirectories)
------ cuda/: CUDA backend
------ hip/: HIP backend
------ cuda_hip/: common code for CUDA and HIP
------ libsmm_acc/: small matrix-matrix operations on GPU (can use either cuda or hip interface)
------ opencl/: OpenCL backend
------ opencl/smm/: LIBSMM implementation based on OpenCL
```


================================================
FILE: docs/guide/3-developer-guide/3-programming/2-accelerator-backend/2-libsmm_acc/1-kernels.md
================================================
title: Kernels

![kernel parameters and memory](../../../../../media/images/libsmm_acc_parameters_and_memory.png){ width=50% }

{!./src/acc/libsmm_acc/kernels/README.md!}


================================================
FILE: docs/guide/3-developer-guide/3-programming/2-accelerator-backend/2-libsmm_acc/2-parameters.md
================================================
title: Kernel Parameters

# Kernel Parameters

## Batched Matrix-Matrix Multiplication Kernel Parameters

The batched matrix-matrix multiplication kernels are templated on:

* the characteristic dimensions of the multiplication: `m, n, k`
* between 3 and 7 kernel parameters from (`M`, `N`, `w`, `v`, `threads`, `grouping`, `minblocks`), depending on the algorithm.

## Batched Matrix Transpose Kernel Parameters

The batched transpose kernels are templated on:

* the characteristic dimensions of the transpose: `m, n`


================================================
FILE: docs/guide/3-developer-guide/3-programming/2-accelerator-backend/2-libsmm_acc/3-tune.md
================================================
title: Autotuning Framework

{!./src/acc/libsmm_acc/tune/README.md!}


================================================
FILE: docs/guide/3-developer-guide/3-programming/2-accelerator-backend/2-libsmm_acc/index.md
================================================
title: CUDA/HIP

{!./src/acc/libsmm_acc/README.md!}


================================================
FILE: docs/guide/3-developer-guide/3-programming/2-accelerator-backend/3-libsmm_ocl/1-autotune.md
================================================
title: Autotune

{!./src/acc/opencl/smm/README-autotune.md!}


================================================
FILE: docs/guide/3-developer-guide/3-programming/2-accelerator-backend/3-libsmm_ocl/2-bulktune.md
================================================
title: Parameters

{!./src/acc/opencl/smm/README-bulktune.md!}


================================================
FILE: docs/guide/3-developer-guide/3-programming/2-accelerator-backend/3-libsmm_ocl/index.md
================================================
title: OpenCL

{!./src/acc/opencl/README.md!}

{!./src/acc/opencl/smm/README.md!}


================================================
FILE: docs/guide/3-developer-guide/3-programming/2-accelerator-backend/index.md
================================================
title: Accelerator Backend

{!./src/acc/README.md!}


================================================
FILE: docs/guide/3-developer-guide/3-programming/index.md
================================================
title: Programming


================================================
FILE: docs/guide/3-developer-guide/4-performance/1-insights.md
================================================
title: Insights

# Insights into Performance

## Read Timing & Statistics Reports

At the end of an output file, a report of DBCSR's statistics and timings can be found.

### Statistics

The STATISTICS section of the output file provides some information on matrix-matrix multiplications that were run and their performance characteristics.

Example:

```
-------------------------------------------------------------------------------
-                                                                             -
-                                DBCSR STATISTICS                             -
-                                                                             -
-------------------------------------------------------------------------------
COUNTER                                    TOTAL       BLAS       SMM       ACC
flops    23 x    23 x    23         687272462200       0.0%      0.0%    100.0%
flops inhomo. stacks                           0       0.0%      0.0%      0.0%
flops total                       687.272462E+09       0.0%      0.0%    100.0%
flops max/rank                    687.272462E+09       0.0%      0.0%    100.0%
matmuls inhomo. stacks                         0       0.0%      0.0%      0.0%
matmuls total                           28243300       0.0%      0.0%    100.0%
number of processed stacks                  1600       0.0%      0.0%    100.0%
average stack size                                     0.0       0.0   17652.1
marketing flops                     1.076458E+12
-------------------------------------------------------------------------------
# multiplications                             50
max memory usage/rank              16.650822E+09
# max total images/rank                        1
# max 3D layers                                1
# MPI messages exchanged                       0
MPI messages size (bytes):
 total size                         0.000000E+00
 min size                           0.000000E+00
 max size                           0.000000E+00
 average size                       0.000000E+00
MPI breakdown and total messages size (bytes):
            size <=      128                   0                        0
      128 < size <=     8192                   0                        0
     8192 < size <=    32768                   0                        0
    32768 < size <=   131072                   0                        0
   131072 < size <=  4194304                   0                        0
  4194304 < size <= 16777216                   0                        0
 16777216 < size                               0                        0
-------------------------------------------------------------------------------
```

#### How to Read the Columns

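- `TOTAL`: total value of the counter in that row (flops, number of multiplications, or number of stacks)
- `BLAS`: percentage of that total run on BLAS (this could be CUBLAS or HIPBLAS)
- `SMM`: percentage of that total run on SMM (libsmm or libxsmm, CPU)
- `ACC`: percentage of that total run on ACC (libsmm_acc, DBCSR's GPU-accelerated backend)

In the `average stack size` row, the `BLAS`, `SMM` and `ACC` columns hold the average stack size per backend rather than a percentage.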

#### How to Read the Rows (Counters)

Every time "matrix-matrix multiplication" is mentioned in this section, it refers *not* to the sparse multiplication of large matrices, but to the multiplication of the small dense blocks into which the large sparse matrices are decomposed.

- `flops    23 x    23 x    23`: indicates that batched matrix-matrix multiplication kernels with matrix dimensions (m, n, k) = (23, 23, 23) were run, and reports their flops. If batched multiplications with several different matrix dimensions (m, n, k) were run, each would appear on its own row.
- `flops inhomo. stacks`: flops of so-called "inhomogeneous stacks", i.e. stacks of batched matrix-matrix multiplications in which not all multiplications have the same matrix dimensions (m, n, k).
- `flops total`: total flops for all stacks of matrix-matrix multiplication.
- `flops max/rank`: flops of the MPI rank with the most flops.
- `matmuls inhomo. stacks`: number of matrix-matrix multiplications run in inhomogeneous stacks.
- `matmuls total`: number of matrix-matrix multiplications run in total.
- `number of processed stacks`: number of stacks of batched matrix-matrix multiplication.
- `average stack size`: average over all stacks of the stack size (i.e. the number of matrix-matrix multiplications that a stack contains).
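
The counters in the example report are consistent with one another, which makes for a quick sanity check when reading a report. The sketch below assumes the usual convention of 2·m·n·k floating-point operations per block multiplication (an assumption, although it reproduces the figures of the example exactly); the helper names are made up for illustration.

```python
def flops_per_matmul(m, n, k):
    """Flops of one dense (m x k) * (k x n) block product: one multiply and one add per term."""
    return 2 * m * n * k


def average_stack_size(matmuls_total, stacks):
    """Average number of block multiplications per processed stack."""
    return matmuls_total / stacks


# Values copied from the example STATISTICS report.
matmuls_total = 28243300
stacks = 1600

flops_total = flops_per_matmul(23, 23, 23) * matmuls_total
print(flops_total)  # 687272462200, the `flops total` row
print(round(average_stack_size(matmuls_total, stacks), 1))  # 17652.1, the `average stack size` row
```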

### Timings

Example of the timings section of the output file:

```
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME MAXRANK
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
dbcsr_performance_driver             1  1.0    0.000    0.000  102.563  102.563       0
dbcsr_perf_multiply_low              1  2.0    0.002    0.002  102.563  102.563       0
perf_multiply                        1  3.0    0.003    0.003  102.077  102.077       0
[...]
-------------------------------------------------------------------------------
```

The columns describe:

- `SUBROUTINE`: the name of the Fortran subroutine (or C++ function) timed.
- `CALLS`: number of times the subroutine was called.
- `ASD`: average stack depth: the average number of entries on the call stack when this subroutine is called.
- `SELF TIME`: time spent in the subroutine itself, including any non-timed subroutines it calls.
    - `AVERAGE`: the self time averaged over all MPI ranks,
    - `MAXIMUM`: the maximum self time over all MPI ranks,
    - comparing `AVERAGE` and `MAXIMUM` helps to locate load imbalance or synchronization points.
- `TOTAL TIME`: time spent in the subroutine, including the time spent in the timed subroutines it calls.
    - `AVERAGE`: the total time averaged over all MPI ranks,
    - `MAXIMUM`: the maximum total time over all MPI ranks,
    - comparing `AVERAGE` and `MAXIMUM` helps to locate load imbalance or synchronization points.
- `MAXRANK`: the MPI rank on which the maximum total time was measured.
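
As a rough illustration of how the `AVERAGE` and `MAXIMUM` columns can be compared, the following sketch parses one row of the timings report and computes the ratio of maximum to average total time. The column order is inferred from the two header lines of the example (an assumption), and `TimingRow`, `parse_timing_row` and `imbalance` are hypothetical helper names, not part of DBCSR.

```python
from typing import NamedTuple


class TimingRow(NamedTuple):
    subroutine: str
    calls: int
    asd: float
    self_avg: float
    self_max: float
    total_avg: float
    total_max: float
    maxrank: int


def parse_timing_row(line):
    """Split one whitespace-separated report row into typed fields."""
    name, calls, asd, s_avg, s_max, t_avg, t_max, rank = line.split()
    return TimingRow(name, int(calls), float(asd), float(s_avg), float(s_max),
                     float(t_avg), float(t_max), int(rank))


def imbalance(row):
    """Maximum over average total time; 1.0 means all ranks took equally long."""
    return row.total_max / row.total_avg if row.total_avg > 0 else 1.0


# Row copied from the example TIMING report (single MPI rank, hence no imbalance).
row = parse_timing_row(
    "perf_multiply                        1  3.0    0.003    0.003  102.077  102.077       0"
)
print(row.subroutine, imbalance(row))  # perf_multiply 1.0
```

A ratio well above 1.0 for a subroutine suggests that some ranks wait for others at that point.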

#### Time spent in Just-In-Time (JIT) Compilation

For performance debugging, and to check how much time a program spends on JIT compilation, look for the `jit_kernel_multiply` and `jit_kernel_transpose` entries in the timings report.

#### How to Time a Function

By default, the most important subroutines are timed in DBCSR.

If you want to time a subroutine or function that is not timed already, call `timeset` with a routine name and a handle at the beginning of the function, and `timestop` with the same handle at the end of the function.

For examples, just `grep` for `timeset` and `timestop` in the codebase.

This can be done both in the Fortran code and in the C++ code.


================================================
FILE: docs/guide/3-developer-guide/4-performance/2-just-in-time-compilation.md
================================================
title: JIT

# Just-In-Time (JIT) Compilation

DBCSR's GPU backends rely on templated kernels for batched matrix multiplication and matrix transpose (CUDA/HIP as well as OpenCL backend). If DBCSR were to compile kernels for all possible triplets or MxNxKs (in the case of transposes, for all possible MxNs) ahead-of-time (AOT), it would not only bloat the size of the library but also take a long time to compile the code. Reducing the number of triplets to a "practical set" would either sacrifice performance or limit potential acceleration to certain workloads.

Instead, kernels are generated Just-In-Time (JIT) or "on the fly", i.e., at runtime, as they are requested by the application and workload. For LIBSMM_ACC, the JIT infrastructure is based on [CUDA NVRTC](https://docs.nvidia.com/cuda/nvrtc/), a runtime compilation library for CUDA C++. The OpenCL based LIBSMM relies on the OpenCL runtime library to perform JIT compilation.

Regardless of which runtime is used, and even if JIT compilation only takes on the order of ~500ms per kernel, the compilation time becomes significant when auto-tuning a set of kernels. Extra documentation is therefore provided for both cases (CUDA/HIP and OpenCL) on how to collect tuned parameters and, eventually, how to submit a set of tuned parameters for the benefit of others.
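
To put the ahead-of-time argument in perspective, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that block dimensions range from 1 to 32 and that compiling one kernel ahead of time costs about as much as the ~500ms JIT figure quoted above; neither assumption comes from DBCSR itself, and the function names are hypothetical.

```python
def aot_kernel_count(max_dim):
    """Number of (m, n, k) triplets with 1 <= m, n, k <= max_dim."""
    return max_dim ** 3


def aot_compile_hours(max_dim, seconds_per_kernel=0.5):
    """Serial compile time in hours if every triplet were compiled ahead of time."""
    return aot_kernel_count(max_dim) * seconds_per_kernel / 3600.0


print(aot_kernel_count(32))  # 32768
print(round(aot_compile_hours(32), 2))  # 4.55
```

Even with these modest assumptions the multiplication kernels alone would take hours to build, while a given workload typically requests only a handful of triplets.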

To check how much time a program spends on JIT compilation, look for the `jit_kernel_multiply` and `jit_kernel_transpose` entries that the GPU backends contribute to the [timings report](1-insights.html). This report is printed when the application terminates (final output). The OpenCL backend supports additional [Runtime Settings](../../3-developer-guide/3-programming/2-accelerator-backend/3-opencl-backend.html), e.g., to report compilation time on a per-kernel basis (`ACC_OPENCL_VERBOSE=2`).


================================================
FILE: docs/guide/3-developer-guide/4-performance/index.md
================================================
title: Performance


================================================
FILE: docs/guide/3-developer-guide/index.md
================================================
title: Developer Guide


================================================
FILE: docs/guide/index.md
================================================
title: Guide



================================================
FILE: examples/.gitignore
================================================
*.x


================================================
FILE: examples/CMakeLists.txt
================================================
set(DBCSR_PROGRAM_SRCS_FTN dbcsr_example_1.F dbcsr_example_2.F
                           dbcsr_example_3.F dbcsr_tensor_example_1.F)

set(DBCSR_PROGRAM_SRCS_CPP dbcsr_example_3.cpp dbcsr_tensor_example_2.cpp)

# Compile Fortran examples
foreach (dbcsr_program_src ${DBCSR_PROGRAM_SRCS_FTN})
  get_filename_component(dbcsr_program_name ${dbcsr_program_src} NAME_WE)
  add_executable(${dbcsr_program_name} ${dbcsr_program_src})
  target_link_libraries(${dbcsr_program_name} dbcsr)
  if (OpenMP_FOUND)
    target_link_libraries(${dbcsr_program_name} OpenMP::OpenMP_Fortran)
  endif ()

  # with the Intel compiler CMake 3.12 seems to forget that the source is
  # actually Fortran and needs to be told explicitly:
  if (NOT "${CMAKE_Fortran_COMPILER_ID}" STREQUAL IntelLLVM)
    set_target_properties(${dbcsr_program_name} PROPERTIES LINKER_LANGUAGE
                                                           Fortran)
  endif ()
endforeach ()

# override -Werror for certain translation units
if ((CMAKE_Fortran_COMPILER_ID STREQUAL "GNU")
    AND (CMAKE_Fortran_COMPILER_VERSION VERSION_GREATER_EQUAL 10))
  set_source_files_properties(dbcsr_tensor_example_1.F PROPERTIES COMPILE_FLAGS
                                                                  -Wno-error)
endif ()

# Compile C++ examples
if (WITH_C_API)
  foreach (dbcsr_program_src ${DBCSR_PROGRAM_SRCS_CPP})
    get_filename_component(dbcsr_program_name ${dbcsr_program_src} NAME_WE)
    set(dbcsr_program_name ${dbcsr_program_name}_cpp)
    add_executable(${dbcsr_program_name} ${dbcsr_program_src})
    target_link_libraries(${dbcsr_program_name} dbcsr_c MPI::MPI_CXX)
    if (NOT "${CMAKE_Fortran_COMPILER_ID}" STREQUAL IntelLLVM)
      set_target_properties(${dbcsr_program_name} PROPERTIES LINKER_LANGUAGE
                                                             Fortran)
    endif ()
    if (OpenMP_FOUND)
      target_compile_options(${dbcsr_program_name} PRIVATE ${OpenMP_CXX_FLAGS})
      target_link_libraries(${dbcsr_program_name} OpenMP::OpenMP_Fortran)
    endif ()

    if (CMAKE_CXX_COMPILER_ID STREQUAL "Cray")
      # for recent Cray compiler versions CMake doesn't know
      target_compile_options(${dbcsr_program_name} PRIVATE "-hstd=c++14")
    else ()
      target_compile_features(${dbcsr_program_name} PRIVATE cxx_std_14)
    endif ()
  endforeach ()
endif ()

# =================================== DOCUMENTATION GENERATION
# Copy example source files into the build directory so that their
# documentation can be generated by FORD

set(DBCSR_PROGRAM_SRCS ${DBCSR_PROGRAM_SRCS_FTN} ${DBCSR_PROGRAM_SRCS_CPP})

# Make a list of the copy commands
set(example_copy_commands)
foreach (example ${DBCSR_PROGRAM_SRCS})
  list(
    APPEND
    example_copy_commands
    COMMAND
    ${CMAKE_COMMAND}
    -E
    copy
    ${CMAKE_SOURCE_DIR}/examples/${example}
    ${CMAKE_BINARY_DIR}/examples)
endforeach ()

add_custom_target(
  doc_copy_examples
  COMMENT "Copy examples for documentation generation"
  COMMAND mkdir -p ${CMAKE_BINARY_DIR}/examples ${example_copy_commands}
  VERBATIM)


================================================
FILE: examples/README.md
================================================
# Examples

- [`dbcsr_example_1`](dbcsr_example_1.F): how to create a DBCSR matrix (Fortran)
- [`dbcsr_example_2`](dbcsr_example_2.F): how to fill a DBCSR matrix with blocks (Fortran)
- `dbcsr_example_3`: how to multiply two DBCSR matrices ([Fortran](dbcsr_example_3.F) and [C++](dbcsr_example_3.cpp))
- [`dbcsr_tensor_example_1`](dbcsr_tensor_example_1.F): how to use DBCSR tensors (Fortran)
    - the example can be run with different parameters, controlling block size, sparsity, verbosity and more
- [`dbcsr_tensor_example_2`](dbcsr_tensor_example_2.cpp): tensor contraction example (C++)
    - tensor1 x tensor2 = tensor3, (13|2)x(54|21)=(3|45)
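
For orientation, the contraction pattern `(13|2)x(54|21)=(3|45)` can be written out with plain dense arrays. The sketch below assumes that the digits name tensor indices, that `|` only separates the indices mapped to matrix rows from those mapped to matrix columns, and that the indices shared by both operands (1 and 2) are summed over. This reading is an interpretation of the notation, and the code does not use the DBCSR API.

```python
from itertools import product


def contract(t1, t2, n1, n2, n3, n4, n5):
    """Return t3[i3][i4][i5] = sum over i1, i2 of t1[i1][i2][i3] * t2[i1][i2][i4][i5]."""
    t3 = [[[0.0 for _ in range(n5)] for _ in range(n4)] for _ in range(n3)]
    for i3, i4, i5 in product(range(n3), range(n4), range(n5)):
        t3[i3][i4][i5] = sum(
            t1[i1][i2][i3] * t2[i1][i2][i4][i5]
            for i1, i2 in product(range(n1), range(n2))
        )
    return t3


# Tiny dense tensors filled with ones: every entry of the result equals n1 * n2.
n1, n2, n3, n4, n5 = 2, 3, 2, 2, 2
t1 = [[[1.0] * n3 for _ in range(n2)] for _ in range(n1)]
t2 = [[[[1.0] * n5 for _ in range(n4)] for _ in range(n2)] for _ in range(n1)]
t3 = contract(t1, t2, n1, n2, n3, n4, n5)
print(t3[0][0][0])  # 6.0
```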

## Build

Compile the DBCSR library, using `-DUSE_MPI=ON -DWITH_EXAMPLES=ON`.

The examples require MPI. Furthermore, if threading is enabled, the MPI library must support the `MPI_THREAD_FUNNELED` mode.

## Run

You can run the examples, for instance from the `build` directory, as follows:

```bash
srun -N 1 --ntasks-per-core 2 --ntasks-per-node 12 --cpus-per-task 2 ./examples/dbcsr_example_1
```

### Run tensor examples

How to run (this example and DBCSR for tensors in general):
* best performance is obtained by running with MPI and one OpenMP thread per rank.
* ideally, the number of MPI ranks should be composed of small prime factors (e.g. powers of 2).
* for sparse data and heterogeneous block sizes, DBCSR should be run on CPUs with the libxsmm backend.
* for dense data, best performance is obtained by choosing homogeneous block sizes of 64 and by compiling with GPU support.
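
The recommendation on the number of MPI ranks can be checked with a few lines of code before submitting a job. This is only a convenience sketch: `prime_factors` is a hypothetical helper, and what counts as a "small" factor is left to the reader.

```python
def prime_factors(n):
    """Return the prime factorization of n (n >= 2) as a sorted list, by trial division."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


print(prime_factors(12))  # [2, 2, 3]
print(prime_factors(34))  # [2, 17]
```

With 12 ranks all factors are small, whereas 34 ranks contain the comparatively large prime factor 17.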


================================================
FILE: examples/dbcsr_example_1.F
================================================
!--------------------------------------------------------------------------------------------------!
! Copyright (C) by the DBCSR developers group - All rights reserved                                !
! This file is part of the DBCSR library.                                                          !
!                                                                                                  !
! For information on the license, see the LICENSE file.                                            !
! For further information please visit https://dbcsr.cp2k.org                                      !
! SPDX-License-Identifier: GPL-2.0+                                                                !
!--------------------------------------------------------------------------------------------------!

PROGRAM dbcsr_example_1
   !! DBCSR example 1:
   !! This example shows how to create a DBCSR matrix

   USE mpi
   USE dbcsr_api, ONLY: &
      dbcsr_create, dbcsr_distribution_new, dbcsr_distribution_release, dbcsr_distribution_type, &
      dbcsr_finalize, dbcsr_finalize_lib, dbcsr_init_lib, dbcsr_print, dbcsr_release, &
      dbcsr_type, dbcsr_type_no_symmetry, dbcsr_type_real_8

   IMPLICIT NONE

   TYPE(dbcsr_type)                         :: matrix_a

   INTEGER, DIMENSION(:), POINTER           :: col_blk_sizes, row_blk_sizes
   INTEGER                                  :: group, numnodes, mynode, nblkrows_total, &
                                               nblkcols_total, ierr
   INTEGER, DIMENSION(2)                    :: npdims
   INTEGER, DIMENSION(:), POINTER           :: col_dist, row_dist
   TYPE(dbcsr_distribution_type)            :: dist
   LOGICAL, DIMENSION(2)                    :: period = .TRUE.
!$ INTEGER                                  :: provided_tsl

   !***************************************************************************************

   !
   ! initialize mpi
!$ IF (.FALSE.) THEN
      CALL mpi_init(ierr)
      IF (ierr /= 0) STOP "Error in MPI_Init"
!$ ELSE
!$    CALL mpi_init_thread(MPI_THREAD_FUNNELED, provided_tsl, ierr)
!$    IF (ierr /= 0) STOP "Error in MPI_Init_thread"
!$    IF (provided_tsl .LT. MPI_THREAD_FUNNELED) THEN
!$       STOP "MPI library does not support the requested level of threading (MPI_THREAD_FUNNELED)."
!$    END IF
!$ END IF

   !
   ! setup the mpi environment
   CALL mpi_comm_size(MPI_COMM_WORLD, numnodes, ierr)
   IF (ierr /= 0) STOP "Error in MPI_Comm_size"
   npdims(:) = 0
   CALL mpi_dims_create(numnodes, 2, npdims, ierr)
   IF (ierr /= 0) STOP "Error in MPI_Dims_create"
   CALL mpi_cart_create(MPI_COMM_WORLD, 2, npdims, period, .FALSE., group, ierr)
   IF (ierr /= 0) STOP "Error in MPI_Cart_create"

   CALL mpi_comm_rank(MPI_COMM_WORLD, mynode, ierr)
   IF (ierr /= 0) STOP "Error in MPI_Comm_rank"
   WRITE (*, *) 'mynode ', mynode, ' numnodes', numnodes

   !***************************************************************************************
   !
   ! initialize the DBCSR library
   CALL dbcsr_init_lib(MPI_COMM_WORLD)

   !
   ! the matrix will contain nblkrows_total row blocks and nblkcols_total column blocks
   nblkrows_total = 4
   nblkcols_total = 3

   !
   ! set the block size for each row and column
   ALLOCATE (row_blk_sizes(nblkrows_total), col_blk_sizes(nblkcols_total))
   row_blk_sizes(:) = 2
   col_blk_sizes(:) = 3

   !
   ! set the row and column distributions (despite its name, random_dist assigns
   ! the blocks to the process grid in a deterministic cyclic pattern)
   CALL random_dist(row_dist, nblkrows_total, npdims(1))
   CALL random_dist(col_dist, nblkcols_total, npdims(2))

   !
   ! set the DBCSR distribution object
   CALL dbcsr_distribution_new(dist, group=group, row_dist=row_dist, col_dist=col_dist, reuse_arrays=.TRUE.)

   !
   ! create the DBCSR matrix, i.e. a double precision non symmetric matrix
   ! with nblkrows_total x nblkcols_total blocks and
   ! sizes "sum(row_blk_sizes)" x "sum(col_blk_sizes)", distributed as
   ! specified by the dist object
   CALL dbcsr_create(matrix=matrix_a, &
                     name="this is my matrix a", &
                     dist=dist, &
                     matrix_type=dbcsr_type_no_symmetry, &
                     row_blk_size=row_blk_sizes, &
                     col_blk_size=col_blk_sizes, &
                     data_type=dbcsr_type_real_8, &
                     reuse_arrays=.TRUE.)

   !
   ! finalize the DBCSR matrix
   CALL dbcsr_finalize(matrix_a)

   !
   ! print the *empty* matrix
   CALL dbcsr_print(matrix_a)

   !
   ! release the matrix
   CALL dbcsr_release(matrix_a)

   !
   ! release the distribution
   CALL dbcsr_distribution_release(dist)

   !***************************************************************************************
   !

   ! free comm
   CALL mpi_comm_free(group, ierr)
   IF (ierr /= 0) STOP "Error in MPI_Comm_free"

   ! finalize the DBCSR library
   CALL dbcsr_finalize_lib()

   !
   ! finalize mpi
   CALL mpi_finalize(ierr)
   IF (ierr /= 0) STOP "Error in MPI_finalize"

   !***************************************************************************************

CONTAINS

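   SUBROUTINE random_dist(dist_array, dist_size, nbins)
      !! Assign each of the dist_size blocks to one of the nbins process rows/columns.
      !! Despite the name, the assignment is deterministic: MODULO(nbins - i, nbins).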
      INTEGER, DIMENSION(:), INTENT(out), POINTER        :: dist_array
      INTEGER, INTENT(in)                                :: dist_size, nbins

      INTEGER                                            :: i

      ALLOCATE (dist_array(dist_size))
      DO i = 1, dist_size
         dist_array(i) = MODULO(nbins - i, nbins)
      END DO

   END SUBROUTINE random_dist

END PROGRAM dbcsr_example_1


================================================
FILE: examples/dbcsr_example_2.F
================================================
!--------------------------------------------------------------------------------------------------!
! Copyright (C) by the DBCSR developers group - All rights reserved                                !
! This file is part of the DBCSR library.                                                          !
!                                                                                                  !
! For information on the license, see the LICENSE file.                                            !
! For further information please visit https://dbcsr.cp2k.org                                      !
! SPDX-License-Identifier: GPL-2.0+                                                                !
!--------------------------------------------------------------------------------------------------!

PROGRAM dbcsr_example_2
   !! DBCSR example 2:
   !! This example shows how to set a DBCSR matrix

   USE mpi
   USE dbcsr_api, ONLY: &
      dbcsr_create, dbcsr_distribution_get, dbcsr_distribution_new, dbcsr_distribution_release, &
      dbcsr_distribution_type, dbcsr_finalize, dbcsr_finalize_lib, dbcsr_get_stored_coordinates, &
      dbcsr_init_lib, dbcsr_nblkcols_total, dbcsr_nblkrows_total, dbcsr_print, dbcsr_put_block, &
      dbcsr_release, dbcsr_type, dbcsr_type_no_symmetry, dbcsr_type_real_8

   IMPLICIT NONE

   TYPE(dbcsr_type)                         :: matrix_a

   INTEGER, DIMENSION(:), POINTER           :: col_blk_sizes, row_blk_sizes
   INTEGER                                  :: group, numnodes, mynode, ierr, &
                                               nblkrows_total, nblkcols_total, node_holds_blk, max_nze, nze, &
                                               row, col, row_s, col_s, max_row_size, max_col_size
   INTEGER, DIMENSION(2)                    :: npdims
   INTEGER, DIMENSION(:), POINTER           :: col_dist, row_dist
   TYPE(dbcsr_distribution_type)            :: dist
   REAL(KIND=KIND(0.0D0)), DIMENSION(:), ALLOCATABLE :: values
   LOGICAL, DIMENSION(2)                    :: period = .TRUE.
!$ INTEGER                                  :: provided_tsl

   !***************************************************************************************

   !
   ! initialize mpi
!$ IF (.FALSE.) THEN
      CALL mpi_init(ierr)
      IF (ierr /= 0) STOP "Error in MPI_Init"
!$ ELSE
!$    CALL mpi_init_thread(MPI_THREAD_FUNNELED, provided_tsl, ierr)
!$    IF (ierr /= 0) STOP "Error in MPI_Init_thread"
!$    IF (provided_tsl .LT. MPI_THREAD_FUNNELED) THEN
!$       STOP "MPI library does not support the requested level of threading (MPI_THREAD_FUNNELED)."
!$    END IF
!$ END IF

   !
   ! setup the mpi environment
   CALL mpi_comm_size(MPI_COMM_WORLD, numnodes, ierr)
   IF (ierr /= 0) STOP "Error in MPI_Comm_size"
   npdims(:) = 0
   CALL mpi_dims_create(numnodes, 2, npdims, ierr)
   IF (ierr /= 0) STOP "Error in MPI_Dims_create"
   CALL mpi_cart_create(MPI_COMM_WORLD, 2, npdims, period, .FALSE., group, ierr)
   IF (ierr /= 0) STOP "Error in MPI_Cart_create"

   CALL mpi_comm_rank(MPI_COMM_WORLD, mynode, ierr)
   IF (ierr /= 0) STOP "Error in MPI_Comm_rank"
   WRITE (*, *) 'mynode ', mynode, ' numnodes', numnodes

   !***************************************************************************************
   !
   ! initialize the DBCSR library
   CALL dbcsr_init_lib(MPI_COMM_WORLD)

   !
   ! the matrix will contain nblkrows_total row blocks and nblkcols_total column blocks
   nblkrows_total = 4
   nblkcols_total = 4

   !
   ! set the block size for each row and column
   ALLOCATE (row_blk_sizes(nblkrows_total), col_blk_sizes(nblkcols_total))
   row_blk_sizes(:) = 2
   col_blk_sizes(:) = 2

   !
   ! set the row and column distributions (despite its name, random_dist assigns
   ! the blocks to the process grid in a deterministic cyclic pattern)
   CALL random_dist(row_dist, nblkrows_total, npdims(1))
   CALL random_dist(col_dist, nblkcols_total, npdims(2))

   !
   ! set the DBCSR distribution object
   CALL dbcsr_distribution_new(dist, group=group, row_dist=row_dist, col_dist=col_dist, reuse_arrays=.TRUE.)

   !
   ! create the DBCSR matrix, i.e. a double precision non symmetric matrix
   ! with nblkrows_total x nblkcols_total blocks and
   ! sizes "sum(row_blk_sizes)" x "sum(col_blk_sizes)", distributed as
   ! specified by the dist object
   CALL dbcsr_create(matrix=matrix_a, &
                     name="this is my matrix a", &
                     dist=dist, &
                     matrix_type=dbcsr_type_no_symmetry, &
                     row_blk_size=row_blk_sizes, &
                     col_blk_size=col_blk_sizes, &
                     data_type=dbcsr_type_real_8)

   !
   ! get the id of this node (process) from the distribution
   CALL dbcsr_distribution_get(dist, mynode=mynode)

   !
   ! get the maximum block size of the matrix
   max_row_size = MAXVAL(row_blk_sizes)
   max_col_size = MAXVAL(col_blk_sizes)
   max_nze = max_row_size*max_col_size

   !
   ! allocate a 1d buffer that is needed to put a block
   ! into the matrix (2d buffer can also be used)
   ALLOCATE (values(max_nze))

   !
   ! loop over the blocks, build a tridiagonal matrix
   DO row = 1, dbcsr_nblkrows_total(matrix_a)
      DO col = MAX(row - 1, 1), MIN(row + 1, dbcsr_nblkcols_total(matrix_a))
         !
         ! get the node id that holds this (row, col) block
         row_s = row; col_s = col
         CALL dbcsr_get_stored_coordinates(matrix_a, row_s, col_s, node_holds_blk)
         !
         ! put the block on the right node
         IF (node_holds_blk .EQ. mynode) THEN
            !
            ! get the size of the block
            nze = row_blk_sizes(row_s)*col_blk_sizes(col_s)
            !
            ! fill the buffer with random numbers and put it into the matrix as a block
            CALL RANDOM_NUMBER(values(1:nze))
            CALL dbcsr_put_block(matrix_a, row_s, col_s, values(1:nze))
         END IF
      END DO
   END DO
   DEALLOCATE (values)

   !
   ! finalize the DBCSR matrix
   CALL dbcsr_finalize(matrix_a)

   !
   ! print the matrix
   CALL dbcsr_print(matrix_a)

   !
   ! release the matrix
   CALL dbcsr_release(matrix_a)

   CALL dbcsr_distribution_release(dist)
   DEALLOCATE (row_blk_sizes, col_blk_sizes)

   !***************************************************************************************
   !

   ! free comm
   CALL mpi_comm_free(group, ierr)
   IF (ierr /= 0) STOP "Error in MPI_Comm_free"

   ! finalize the DBCSR library
   CALL dbcsr_finalize_lib()

   !
   ! finalize mpi
   CALL mpi_finalize(ierr)
   IF (ierr /= 0) STOP "Error in MPI_finalize"

   !***************************************************************************************

CONTAINS

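   SUBROUTINE random_dist(dist_array, dist_size, nbins)
      !! Assign each of the dist_size blocks to one of the nbins process rows/columns.
      !! Despite the name, the assignment is deterministic: MODULO(nbins - i, nbins).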
      INTEGER, DIMENSION(:), INTENT(out), POINTER        :: dist_array
      INTEGER, INTENT(in)                                :: dist_size, nbins

      INTEGER                                            :: i

      ALLOCATE (dist_array(dist_size))
      DO i = 1, dist_size
         dist_array(i) = MODULO(nbins - i, nbins)
      END DO

   END SUBROUTINE random_dist

END PROGRAM dbcsr_example_2


================================================
FILE: examples/dbcsr_example_3.F
================================================
!--------------------------------------------------------------------------------------------------!
! Copyright (C) by the DBCSR developers group - All rights reserved                                !
! This file is part of the DBCSR library.                                                          !
!                                                                                                  !
! For information on the license, see the LICENSE file.                                            !
! For further information please visit https://dbcsr.cp2k.org                                      !
! SPDX-License-Identifier: GPL-2.0+                                                                !
!--------------------------------------------------------------------------------------------------!

PROGRAM dbcsr_example_3
   !! DBCSR example 3:
   !! This example shows how to multiply two dbcsr matrices

   USE mpi
   USE dbcsr_api, ONLY: &
      dbcsr_create, dbcsr_distribution_get, dbcsr_distribution_new, dbcsr_distribution_release, &
      dbcsr_distribution_type, dbcsr_finalize, dbcsr_finalize_lib, dbcsr_get_stored_coordinates, &
      dbcsr_init_lib, dbcsr_multiply, dbcsr_nblkcols_total, dbcsr_nblkrows_total, dbcsr_print, &
      dbcsr_put_block, dbcsr_release, dbcsr_type, dbcsr_type_no_symmetry, dbcsr_type_real_8

   IMPLICIT NONE

   TYPE(dbcsr_type)                         :: matrix_a, matrix_b, matrix_c

   INTEGER, DIMENSION(:), POINTER           :: col_blk_sizes, row_blk_sizes
   INTEGER                                  :: group, numnodes, mynode, ierr, &
                                               nblkrows_total, nblkcols_total, &
                                               node_holds_blk, max_nze, nze, row, col, row_s, col_s, &
                                               max_row_size, max_col_size
   INTEGER, DIMENSION(2)                    :: npdims
   INTEGER, DIMENSION(:), POINTER           :: col_dist, row_dist
   TYPE(dbcsr_distribution_type)            :: dist
   REAL(KIND=KIND(0.0D0)), DIMENSION(:), ALLOCATABLE :: values
   LOGICAL, DIMENSION(2)                    :: period = .TRUE.
!$ INTEGER                                  :: provided_tsl

   !***************************************************************************************

   !
   ! initialize mpi
!$ IF (.FALSE.) THEN
      CALL mpi_init(ierr)
      IF (ierr /= 0) STOP "Error in MPI_Init"
!$ ELSE
!$    CALL mpi_init_thread(MPI_THREAD_FUNNELED, provided_tsl, ierr)
!$    IF (ierr /= 0) STOP "Error in MPI_Init_thread"
!$    IF (provided_tsl .LT. MPI_THREAD_FUNNELED) THEN
!$       STOP "MPI library does not support the requested level of threading (MPI_THREAD_FUNNELED)."
!$    END IF
!$ END IF

   !
   ! setup the mpi environment
   CALL mpi_comm_size(MPI_COMM_WORLD, numnodes, ierr)
   IF (ierr /= 0) STOP "Error in MPI_Comm_size"
   npdims(:) = 0
   CALL mpi_dims_create(numnodes, 2, npdims, ierr)
   IF (ierr /= 0) STOP "Error in MPI_Dims_create"
   CALL mpi_cart_create(MPI_COMM_WORLD, 2, npdims, period, .FALSE., group, ierr)
   IF (ierr /= 0) STOP "Error in MPI_Cart_create"

   CALL mpi_comm_rank(MPI_COMM_WORLD, mynode, ierr)
   IF (ierr /= 0) STOP "Error in MPI_Comm_rank"
   WRITE (*, *) 'mynode ', mynode, ' numnodes', numnodes

   !***************************************************************************************
   !
   ! initialize the DBCSR library
   CALL dbcsr_init_lib(MPI_COMM_WORLD)

   !
   ! the matrix will contain nblkrows_total row blocks and nblkcols_total column blocks
   nblkrows_total = 4
   nblkcols_total = 4

   !
   ! set the block size for each row and column
   ALLOCATE (row_blk_sizes(nblkrows_total), col_blk_sizes(nblkcols_total))
   row_blk_sizes(:) = 2
   col_blk_sizes(:) = 2

   !
   ! set the row and column distributions (despite its name, random_dist assigns
   ! the blocks to the process grid in a deterministic cyclic pattern)
   CALL random_dist(row_dist, nblkrows_total, npdims(1))
   CALL random_dist(col_dist, nblkcols_total, npdims(2))

   !
   ! set the dbcsr distribution object
   CALL dbcsr_distribution_new(dist, group=group, row_dist=row_dist, col_dist=col_dist, reuse_arrays=.TRUE.)

   !
   ! create the dbcsr matrices, i.e. a double precision non symmetric matrix
   ! with nblkrows_total x nblkcols_total blocks and
   ! sizes "sum(row_blk_sizes)" x "sum(col_blk_sizes)", distributed as
   ! specified by the dist object
   CALL dbcsr_create(matrix=matrix_a, &
                     name="this is my matrix a", &
                     dist=dist, &
                     matrix_type=dbcsr_type_no_symmetry, &
                     row_blk_size=row_blk_sizes, &
                     col_blk_size=col_blk_sizes, &
                     data_type=dbcsr_type_real_8)

   CALL dbcsr_create(matrix=matrix_b, &
                     name="this is my matrix b", &
                     dist=dist, &
                     matrix_type=dbcsr_type_no_symmetry, &
                     row_blk_size=row_blk_sizes, &
                     col_blk_size=col_blk_sizes, &
                     data_type=dbcsr_type_real_8)

   CALL dbcsr_create(matrix=matrix_c, &
                     name="this is my matrix c", &
                     dist=dist, &
                     matrix_type=dbcsr_type_no_symmetry, &
                     row_blk_size=row_blk_sizes, &
                     col_blk_size=col_blk_sizes, &
                     data_type=dbcsr_type_real_8)

   ! get the maximum block size of the matrix
   max_row_size = MAXVAL(row_blk_sizes)
   max_col_size = MAXVAL(col_blk_sizes)
   max_nze = max_row_size*max_col_size

   !
   ! set up the a matrix
   CALL dbcsr_distribution_get(dist, mynode=mynode)
   ALLOCATE (values(max_nze))
   DO row = 1, dbcsr_nblkrows_total(matrix_a)
      DO col = MAX(row - 1, 1), MIN(row + 1, dbcsr_nblkcols_total(matrix_a))
         row_s = row; col_s = col
         CALL dbcsr_get_stored_coordinates(matrix_a, row_s, col_s, node_holds_blk)
         IF (node_holds_blk .EQ. mynode) THEN
            nze = row_blk_sizes(row_s)*col_blk_sizes(col_s)
            CALL RANDOM_NUMBER(values(1:nze))
            CALL dbcsr_put_block(matrix_a, row_s, col_s, values(1:nze))
         END IF
      END DO
   END DO
   DEALLOCATE (values)

   !
   ! set up the b matrix
   CALL dbcsr_distribution_get(dist, mynode=mynode)
   ALLOCATE (values(max_nze))
   DO row = 1, dbcsr_nblkrows_total(matrix_b)
      DO col = MAX(row - 1, 1), MIN(row + 1, dbcsr_nblkcols_total(matrix_b))
         row_s = row; col_s = col
         CALL dbcsr_get_stored_coordinates(matrix_b, row_s, col_s, node_holds_blk)
         IF (node_holds_blk .EQ. mynode) THEN
            nze = row_blk_sizes(row_s)*col_blk_sizes(col_s)
            CALL RANDOM_NUMBER(values(1:nze))
            CALL dbcsr_put_block(matrix_b, row_s, col_s, values(1:nze))
         END IF
      END DO
   END DO
   DEALLOCATE (values)

   !
   ! finalize the dbcsr matrices
   CALL dbcsr_finalize(matrix_a)
   CALL dbcsr_finalize(matrix_b)
   CALL dbcsr_finalize(matrix_c)

   !
   ! multiply the matrices
   CALL dbcsr_multiply('N', 'N', 1.0D0, matrix_a, matrix_b, 0.0D0, matrix_c)

   !
   ! print the matrices
   CALL dbcsr_print(matrix_a)
   CALL dbcsr_print(matrix_b)
   CALL dbcsr_print(matrix_c)

   !
   ! release the matrices
   CALL dbcsr_release(matrix_a)
   CALL dbcsr_release(matrix_b)
   CALL dbcsr_release(matrix_c)

   CALL dbcsr_distribution_release(dist)
   DEALLOCATE (row_blk_sizes, col_blk_sizes)

   ! free comm
   CALL mpi_comm_free(group, ierr)
   IF (ierr /= 0) STOP "Error in MPI_Comm_free"

   ! finalize the DBCSR library
   CALL dbcsr_finalize_lib()

   !
   ! finalize mpi
   CALL mpi_finalize(ierr)
   IF (ierr /= 0) STOP "Error in MPI_finalize"

   !***************************************************************************************

CONTAINS

   SUBROUTINE random_dist(dist_array, dist_size, nbins)
      INTEGER, DIMENSION(:), INTENT(out), POINTER        :: dist_array
      INTEGER, INTENT(in)                                :: dist_size, nbins

      INTEGER                                            :: i

      ALLOCATE (dist_array(dist_size))
      DO i = 1, dist_size
         dist_array(i) = MODULO(nbins - i, nbins)
      END DO

   END SUBROUTINE random_dist

END PROGRAM dbcsr_example_3
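
Note on `random_dist`: despite its name, the Fortran helper above is deterministic. It maps the 1-based block index `i` to bin `MODULO(nbins - i, nbins)`, so the bins are visited in descending order and the cycle then repeats. The C++ sketch below reproduces that mapping for illustration only; the name `reverse_round_robin` is hypothetical and not part of DBCSR.

```cpp
#include <vector>

// Hypothetical helper (not part of DBCSR): mirrors the Fortran expression
// MODULO(nbins - i, nbins) for 1-based block indices i = 1..dist_size.
std::vector<int> reverse_round_robin(int dist_size, int nbins) {
  std::vector<int> dist(dist_size);
  for (int i = 1; i <= dist_size; ++i) {
    // C++ '%' may yield a negative result when the left operand is negative,
    // while Fortran's MODULO with a positive divisor never does. Adding nbins
    // and reducing again gives the same non-negative bin index.
    dist[i - 1] = ((nbins - i) % nbins + nbins) % nbins;
  }
  return dist;
}
```

For example, five blocks over three bins are assigned to bins 2, 1, 0, 2, 1, whereas the `i % nbins` variant used in the C++ example assigns its 0-based indices in ascending order.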


================================================
FILE: examples/dbcsr_example_3.cpp
================================================
/*------------------------------------------------------------------------------------------------*/
/* Copyright (C) by the DBCSR developers group - All rights reserved                              */
/* This file is part of the DBCSR library.                                                        */
/*                                                                                                */
/* For information on the license, see the LICENSE file.                                          */
/* For further information please visit https://dbcsr.cp2k.org                                    */
/* SPDX-License-Identifier: GPL-2.0+                                                              */
/*------------------------------------------------------------------------------------------------*/

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <random>
#include <vector>

#include <mpi.h>

#include <dbcsr.h>

// Random distribution by using round-robin assignment
// of blocks to processors
std::vector<int> random_dist(int dist_size, int nbins) {
  std::vector<int> dist(dist_size);

  for (int i = 0; i < dist_size; i++) dist[i] = i % nbins;

  return dist;
}

// DBCSR example 3
// This example shows how to multiply two DBCSR matrices
int main(int argc, char* argv[]) {
  // initialize MPI
  MPI_Init(&argc, &argv);

  // setup the mpi environment
  int mpi_size, mpi_rank;
  MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);
  MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);

  // make 2D grid
  int dims[2] = {0};
  MPI_Dims_create(mpi_size, 2, dims);
  int periods[2] = {1, 1};  // periodic in both grid dimensions
  int reorder = 0;
  MPI_Comm group;
  MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, reorder, &group);

  int coord[2];
  MPI_Cart_coords(group, mpi_rank, 2, coord);

  for (int i = 0; i != mpi_size; ++i) {
    if (mpi_rank == i) {
      std::cout << "I'm processor " << mpi_rank << " over " << mpi_size << " proc"
                << ", (" << coord[0] << ", " << coord[1] << ") in the 2D grid" << std::endl;
    }
    MPI_Barrier(MPI_COMM_WORLD);
  }

  // initialize the DBCSR library
  c_dbcsr_init_lib(MPI_COMM_WORLD, nullptr);

  // Total number of blocks
  int nrows_1 = 4;
  int ncols_1 = 5;
  int nrows_2 = 5;
  int ncols_2 = 4;

  // Block sizes
  std::vector<int> row_blk_sizes_1 = {2, 3, 5, 2};
  std::vector<int> col_blk_sizes_1 = {3, 3, 4, 6, 2};
  std::vector<int> row_blk_sizes_2 = col_blk_sizes_1;
  std::vector<int> col_blk_sizes_2 = {5, 2, 5, 3};

  auto row_dist_1 = random_dist(nrows_1, dims[0]);
  auto col_dist_1 = random_dist(ncols_1, dims[1]);
  auto row_dist_2 = random_dist(nrows_2, dims[0]);
  auto col_dist_2 = random_dist(ncols_2, dims[1]);

  dbcsr_distribution dist1 = nullptr;
  dbcsr_distribution dist2 = nullptr;
  dbcsr_distribution dist3 = nullptr;

  // create distributions
  c_dbcsr_distribution_new(&dist1, group, row_dist_1.data(), row_dist_1.size(), col_dist_1.data(), col_dist_1.size());

  c_dbcsr_distribution_new(&dist2, group, row_dist_2.data(), row_dist_2.size(), col_dist_2.data(), col_dist_2.size());

  c_dbcsr_distribution_new(&dist3, group, row_dist_1.data(), row_dist_1.size(), col_dist_2.data(), col_dist_2.size());

  // Reserve the listed blocks that are local to this rank and fill them with random values
  auto fill_matrix = [&](void* matrix, std::vector<int>& irblks, std::vector<int>& icblks) {
    std::vector<double> block;
    std::vector<int> loc_irblks, loc_icblks;

    for (int i = 0; i != (int)irblks.size(); ++i) {
      int blk_proc = -1;
      int ix = irblks[i];
      int jx = icblks[i];
      c_dbcsr_get_stored_coordinates(matrix, ix, jx, &blk_proc);
      if (mpi_rank == blk_proc) {
        loc_irblks.push_back(ix);
        loc_icblks.push_back(jx);
      }
    }

    c_dbcsr_reserve_blocks(matrix, loc_irblks.data(), loc_icblks.data(), loc_irblks.size());

    void* iter = nullptr;
    c_dbcsr_iterator_start(&iter, matrix, nullptr, nullptr, nullptr, nullptr, nullptr);

    while (c_dbcsr_iterator_blocks_left(iter)) {
      int i = -1;
      int j = -1;
      int nblk = -1;
      int rsize = -1;
      int csize = -1;
      bool tr = false;

      double* blk = nullptr;
      c_dbcsr_iterator_next_2d_block_d(iter, &i, &j, &blk, &tr, &nblk, &rsize, &csize, nullptr, nullptr);

      std::generate(blk, blk + rsize * csize, [&]() { return static_cast<double>(std::rand()) / RAND_MAX; });
    }

    c_dbcsr_iterator_stop(&iter);
  };

  dbcsr_matrix matrix_a = nullptr;
  dbcsr_matrix matrix_b = nullptr;
  dbcsr_matrix matrix_c = nullptr;

  c_dbcsr_create_new(&matrix_a, "matrix a", dist1, dbcsr_type_no_symmetry, row_blk_sizes_1.data(), row_blk_sizes_1.size(),
    col_blk_sizes_1.data(), col_blk_sizes_1.size(), nullptr, nullptr, nullptr, nullptr, nullptr, nullptr);

  c_dbcsr_create_new(&matrix_b, "matrix b", dist2, dbcsr_type_no_symmetry, row_blk_sizes_2.data(), row_blk_sizes_2.size(),
    col_blk_sizes_2.data(), col_blk_sizes_2.size(), nullptr, nullptr, nullptr, nullptr, nullptr, nullptr);

  c_dbcsr_create_new(&matrix_c, "matrix c", dist3, dbcsr_type_no_symmetry, row_blk_sizes_1.data(), row_blk_sizes_1.size(),
    col_blk_sizes_2.data(), col_blk_sizes_2.size(), nullptr, nullptr, nullptr, nullptr, nullptr, nullptr);

  // indices of non-zero blocks
  std::vector<int> irblks_1 = {0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3};
  std::vector<int> icblks_1 = {0, 1, 2, 4, 0, 2, 3, 1, 3, 4, 0, 1, 2};

  std::vector<int> irblks_2 = {0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4};
  std::vector<int> icblks_2 = {0, 2, 3, 0, 1, 2, 3, 0, 2, 3, 1, 2, 3, 0, 1, 2, 3};

  std::vector<int> irblks_3 = {0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3};
  std::vector<int> icblks_3 = {0, 1, 2, 3, 0, 2, 3, 1, 2, 3, 0, 1, 2, 3};

  fill_matrix(matrix_a, irblks_1, icblks_1);
  c_dbcsr_finalize(matrix_a);
  fill_matrix(matrix_b, irblks_2, icblks_2);
  c_dbcsr_finalize(matrix_b);
  fill_matrix(matrix_c, irblks_3, icblks_3);
  c_dbcsr_finalize(matrix_c);

  // Compute C = 3.0 * A * B + 2.0 * C
  c_dbcsr_multiply_d('N', 'N', 3.0, matrix_a, matrix_b, 2.0, matrix_c, nullptr, nullptr, nullptr, nullptr, nullptr, nullptr,
    nullptr, nullptr, nullptr);

  c_dbcsr_print(matrix_c);

  // release the matrices
  c_dbcsr_release(&matrix_a);
  c_dbcsr_release(&matrix_b);
  c_dbcsr_release(&matrix_c);

  c_dbcsr_distribution_release(&dist1);
  c_dbcsr_distribution_release(&dist2);
  c_dbcsr_distribution_release(&dist3);

  MPI_Comm_free(&group);

  // finalize the DBCSR library
  c_dbcsr_finalize_lib();

  // finalize MPI
  MPI_Finalize();

  return 0;
}
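
Note on the block patterns: the index lists in the example above fix the block sparsity patterns of A and B. As an illustration, the sketch below derives from those two patterns alone which blocks of the product A*B can be structurally non-zero. The function `product_pattern` is a hypothetical helper, not a DBCSR call, and it makes no statement about how DBCSR stores or filters the result.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical helper (not part of DBCSR): block (i, j) of A*B can be
// non-zero only if some k exists for which A(i, k) and B(k, j) are present.
std::set<std::pair<int, int>> product_pattern(const std::vector<int>& a_rows,
                                              const std::vector<int>& a_cols,
                                              const std::vector<int>& b_rows,
                                              const std::vector<int>& b_cols) {
  std::set<std::pair<int, int>> c_blocks;
  for (std::size_t a = 0; a < a_rows.size(); ++a) {
    for (std::size_t b = 0; b < b_rows.size(); ++b) {
      // The column index of the A block must match the row index of the B block.
      if (a_cols[a] == b_rows[b]) c_blocks.insert({a_rows[a], b_cols[b]});
    }
  }
  return c_blocks;
}
```

Called with `irblks_1`, `icblks_1`, `irblks_2` and `icblks_2` from the example, this returns all 16 blocks of the 4 x 4 block grid, which is two more (blocks (1, 1) and (2, 0)) than the 14 blocks listed in `irblks_3` and `icblks_3`.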


================================================
FILE: examples/dbcsr_tensor_example_1.F
================================================
!--------------------------------------------------------------------------------------------------!
! Copyright (C) by the DBCSR developers group - All rights reserved                                !
! This file is part of the DBCSR library.                                                          !
!                                                                                                  !
! For information on the license, see the LICENSE file.                                            !
! For further information please visit https://dbcsr.cp2k.org                                      !
! SPDX-License-Identifier: GPL-2.0+                                                                !
!--------------------------------------------------------------------------------------------------!

program dbcsr_tensor_example_1
   !! Sparse tensor contraction example
   use mpi
   use dbcsr_api, only: &
      dbcsr_type, dbcsr_distribution_type, dbcsr_init_lib, dbcsr_distribution_new, &
      dbcsr_type_no_symmetry, dbcsr_create, dbcsr_iterator_start, dbcsr_iterator_blocks_left, &
      dbcsr_iterator_stop, dbcsr_iterator_next_block, dbcsr_iterator_type, dbcsr_put_block, &
      dbcsr_reserve_blocks, dbcsr_scalar, dbcsr_finalize_lib, dbcsr_distribution_release, &
      dbcsr_nblkrows_total, dbcsr_type_real_8, dbcsr_release, dbcsr_nblkcols_total, dbcsr_finalize, &
      dbcsr_get_stored_coordinates, dbcsr_get_info, dbcsr_filter, dbcsr_checksum
   use dbcsr_tensor_api, only: &
      dbcsr_t_create, dbcsr_t_copy_matrix_to_tensor, &
      dbcsr_t_pgrid_type, dbcsr_t_type, dbcsr_t_distribution_type, dbcsr_t_nblks_total, &
      dbcsr_t_reserve_blocks, dbcsr_t_iterator_start, dbcsr_t_iterator_blocks_left, &
      dbcsr_t_iterator_next_block, dbcsr_t_iterator_stop, dbcsr_t_default_distvec, dbcsr_t_put_block, &
      dbcsr_t_copy, dbcsr_t_distribution_new, dbcsr_t_distribution_destroy, dbcsr_t_write_blocks, dbcsr_t_contract, &
      dbcsr_t_copy_tensor_to_matrix, dbcsr_t_destroy, dbcsr_t_pgrid_destroy, dbcsr_t_nblks_total, &
      dbcsr_t_pgrid_create, dbcsr_t_iterator_type, dbcsr_t_get_stored_coordinates, dbcsr_t_get_info, dbcsr_t_filter, &
      dbcsr_t_checksum, dbcsr_t_clear, dbcsr_t_batched_contract_init, dbcsr_t_batched_contract_finalize
   use iso_fortran_env, only: &
      output_unit, real64, int64

! --------------------------------------------------------------------------------------------------
! this example implements the sparse tensor contraction (einstein notation)
! c(n,o) = c(n,o) + a(i,j,k) x a(l,m,k) x b(i,l,n) x (b(m,o,j) + b(o,m,j))
!
! the tensors have the following shape and entries:
! a: n x n x 2n: a(i,j,k) = exp(-1/3*alpha*((i-j)**2+(i-k)**2+(j-k)**2))
! b: n x n x n: b(i,j,k) = exp(-1/3*beta*((i-j)**2+(i-k)**2+(j-k)**2))
! c: n x n: c(i,j) = exp(-1/2*gamma*(i-j)**2)
!
! due to the exponential decay of the tensor elements w.r.t. difference between two indices,
! all tensors are sparse. neglect of small tensor elements is controlled by threshold 'filter_eps':
! tensor blocks with frobenius norm < filter_eps are neglected.
!
! block sizes are set randomly in this example to demonstrate a heterogeneous sparsity pattern,
! these should ideally be adapted to the natural sparsity pattern of the problem
! (e.g. blocks corresponding to a set of gaussian basis functions with same exponent)
!
! DBCSR provides two basic operations in terms of which any tensor contraction can be expressed:
! dbcsr_t_contract: contraction of a pair of tensors
! dbcsr_t_copy: copy supporting redistribution and index permutation
!
! by default, DBCSR supports tensors of ranks between 2 and 4.
! higher ranks can be enabled by adapting 'maxrank' in 'dbcsr_tensor.fypp'.
!
! the above contraction is executed in the following order:
! 1) d(i,j,l,m) = a(i,j,k) x a(l,m,k)
! 2) e(j,m,n) = d(i,j,l,m) x b(i,l,n)
! 3) f(j,m,o) = b(m,o,j) + b(o,m,j)
! 4) c(n,o) = c(n,o) + e(j,m,n) x f(j,m,o)
!
! how to run (this example and DBCSR for tensors in general):
! - best performance is obtained by running with mpi and one openmp thread per rank.
! - ideally number of mpi ranks should be composed of small prime factors (e.g. powers of 2).
! - for sparse data & heterogeneous block sizes, DBCSR should be run on CPUs with libxsmm backend.
! - for dense data best performance is obtained by choosing homogeneous block sizes of 64 and by
!   compiling with GPU support.
! --------------------------------------------------------------------------------------------------

! ------ Parameters ------

   ! example type:
   ! - 1: debug (small & verbose)
   ! - 2: default (medium size)
   ! - 3: large (requires parallelism)
   ! - 4: large, batched contraction to reduce memory (does not require parallelism)
   integer, parameter :: example_type = 2

   ! filter threshold (larger value means more sparse but less accurate)
   real(real64), parameter :: filter_eps = 1.0e-08_real64

   ! number of batches in one dimension (to reduce memory footprint)
   integer, parameter :: nbatch = 8

   ! exponents for gaussians
   real(real64) :: alpha, beta, gamma

   ! maximum block size (actual block sizes are random between 1 and this number)
   integer :: max_bsize

   ! tensor size in one dimension (n)
   integer :: nel

   ! tune sparsity by scaling exponent for calculation of tensor elements
   real(real64) :: scale_exp

   ! contract all tensors at once
   logical :: contract_direct

   ! contract in batches (memory saving)
   logical :: contract_batched

   ! verbosity level
   ! 0: essential output
   ! 1: tensor log
   ! 2: verbose tensor log
   ! 3: verbose tensor log and print all tensor data
   integer :: verbosity

   integer :: &
      ierr, numnodes, mynode, node_holds_blk, io_unit, io_unit_dbcsr, ind, row, col, blk, group, &
      i, j, k, l, n, o, i_arr, j_arr, k_arr, l_arr, n_arr, o_arr, blk_size, &
      min_exp, min_exp_ij, min_exp_ik, min_exp_jk, min_exp_il, min_exp_in, min_exp_ln, &
      ibatch, jbatch, lbatch, mbatch
   integer, dimension(:), allocatable :: &
      offset_i, offset_j, offset_l, offset_k, offset_n, tmp, &
      start_batch_i, start_batch_j, start_batch_l, start_batch_m, &
      end_batch_i, end_batch_j, end_batch_l, end_batch_m
   integer, dimension(:), allocatable, target :: &
      blk_ind_1, blk_ind_2, blk_ind_3, &
      blk_size_i, blk_size_j, blk_size_k, blk_size_l, blk_size_m, blk_size_n, blk_size_o, &
      dist_1, dist_2, dist_3, dist_4
   integer, dimension(:, :), allocatable :: bounds_1, bounds_2, bounds_3
   integer, dimension(:), pointer :: &
      row_dist, col_dist, row_blk_size, col_blk_size, row_offset, col_offset
   integer, dimension(2) :: shape_2d, blk_ind_2d, blk_size_2d, blk_offset_2d, pdims_2d
   integer, dimension(3) :: blk_ind_3d, pdims_3d, shape_3d, blk_size_3d, blk_offset_3d
   integer, dimension(4) :: shape_4d, pdims_4d
   integer, dimension(7) :: shape_ijklmno
   integer(int64) :: nflop_sum, nflop
   real(real64) :: cs, t1, t0, time, flop_rate
   real(real64), dimension(:, :), pointer :: blk_values_2d
   real(real64), dimension(:, :, :), allocatable :: blk_values_3d
   logical :: tr
   logical, dimension(2) :: period = .true.
   type(dbcsr_type) :: c_matrix
   type(dbcsr_distribution_type) :: dist_matrix
   type(dbcsr_iterator_type) :: iter_matrix
   type(dbcsr_t_pgrid_type) :: pgrid_3d, pgrid_4d
   type(dbcsr_t_distribution_type) :: dist_tensor
   type(dbcsr_t_type) :: a_ijk, a_lmk, b_iln, c_no, d_ijlm, e_jmn, f_jmo
   type(dbcsr_t_iterator_type) :: iter_tensor

   ! prefactor in exponent for tensor data
   alpha = 1.0_real64; beta = 0.5_real64; gamma = 2.0_real64

   ! parameters for different example types
   select case (example_type)
   case (1)
      nel = 10
      max_bsize = 3
      verbosity = 3
      scale_exp = 10.0_real64
      contract_direct = .true.
      contract_batched = .false.
   case (2)
      nel = 200
      max_bsize = 10
      verbosity = 1
      scale_exp = 0.01_real64
      contract_direct = .true.
      contract_batched = .false.
   case (3)
      nel = 2000
      max_bsize = 10
      verbosity = 1
      scale_exp = 0.01_real64
      contract_direct = .true.
      contract_batched = .false.
   case (4)
      nel = 2000
      max_bsize = 10
      verbosity = 0
      scale_exp = 0.01_real64
      contract_direct = .false.
      contract_batched = .true.
   end select

   alpha = alpha*scale_exp
   beta = beta*scale_exp
   gamma = gamma*scale_exp

   ! initialize mpi
   call mpi_init(ierr)
   if (ierr /= 0) stop "error in mpi_init"

   call mpi_comm_size(mpi_comm_world, numnodes, ierr)
   if (ierr /= 0) stop "error in mpi_comm_size"

   call mpi_comm_rank(mpi_comm_world, mynode, ierr)
   if (ierr /= 0) stop "error in mpi_comm_rank"

   ! initialize DBCSR
   call dbcsr_init_lib(mpi_comm_world)

   ! prepare output
   io_unit_dbcsr = -1
   io_unit = -1
   if (mynode == 0 .and. verbosity > 0) io_unit_dbcsr = output_unit
   if (mynode == 0) io_unit = output_unit

   ! create block sizes
   call random_blk_sizes(nel, shape_ijklmno(1), blk_size_i)
   call random_blk_sizes(nel, shape_ijklmno(2), blk_size_j)
   call random_blk_sizes(2*nel, shape_ijklmno(3), blk_size_k)
   call random_blk_sizes(nel, shape_ijklmno(4), blk_size_l)
   call random_blk_sizes(nel, shape_ijklmno(5), blk_size_m)
   call random_blk_sizes(nel, shape_ijklmno(6), blk_size_n)
   call random_blk_sizes(nel, shape_ijklmno(7), blk_size_o)

! ------ create matrix c[no] ------

   ! shape (number of blocks in each dimension)
   shape_2d = shape_ijklmno(6:7)

   ! set up 2-dimensional process grid
   pdims_2d(:) = 0
   call mpi_dims_create(numnodes, 2, pdims_2d, ierr)
   if (ierr /= 0) stop "error in mpi_dims_create"
   call mpi_cart_create(mpi_comm_world, 2, pdims_2d, period, .false., group, ierr)
   if (ierr /= 0) stop "error in mpi_cart_create"

   ! row and column distribution (mapping blocks in each dimension to process grid coordinate)
   ! this routine creates a load-balanced distribution for heterogeneous block sizes, alternatively
   ! any custom distribution can be used
   allocate (dist_1(shape_2d(1)))
   call dbcsr_t_default_distvec(shape_2d(1), pdims_2d(1), blk_size_n, dist_1)
   allocate (dist_2(shape_2d(2)))
   call dbcsr_t_default_distvec(shape_2d(2), pdims_2d(2), blk_size_o, dist_2)

   ! convert to pointers because DBCSR matrix api only accepts pointers
   row_dist => dist_1
   col_dist => dist_2

   ! create distribution
   call dbcsr_distribution_new(dist_matrix, group=group, row_dist=row_dist, col_dist=col_dist)
   deallocate (dist_1, dist_2)

   ! convert to pointers since DBCSR matrix api only accepts pointers
   row_blk_size => blk_size_n
   col_blk_size => blk_size_o

   ! create DBCSR matrix
   call dbcsr_create(matrix=c_matrix, name="c[n|o]", dist=dist_matrix, matrix_type=dbcsr_type_no_symmetry, &
                     row_blk_size=row_blk_size, col_blk_size=col_blk_size, data_type=dbcsr_type_real_8)

   call dbcsr_distribution_release(dist_matrix)

! ------ fill matrix c[no] ------

   ! reserve non-zero blocks. for performance it is important to first reserve all present blocks
   ! before calculating them and inserting them into DBCSR matrix.
   call dbcsr_get_info(c_matrix, row_blk_offset=row_offset, col_blk_offset=col_offset)

   ind = 0
   allocate (blk_ind_1(0), blk_ind_2(0))
   do row = 1, dbcsr_nblkrows_total(c_matrix)
      do col = 1, dbcsr_nblkcols_total(c_matrix)

         ! only consider blocks that are local to this rank (according to distribution)
         call dbcsr_get_stored_coordinates(c_matrix, row, col, node_holds_blk)
         if (node_holds_blk /= mynode) cycle

         ! calculate largest matrix element to determine an upper bound for block frobenius norm
         ! block is reserved only if this estimate is larger than the filter_eps parameter
         min_exp = block_minabsdiff(row_offset(row), col_offset(col), row_blk_size(row), col_blk_size(col))
         blk_size = row_blk_size(row)*col_blk_size(col)
         if (blk_size*exp(-0.5*gamma*real(min_exp**2)) < filter_eps) cycle

         ind = ind + 1

         ! store index of block to be reserved
         call move_alloc(blk_ind_1, tmp)
         allocate (blk_ind_1(ind))
         blk_ind_1(:ind - 1) = tmp; deallocate (tmp)

         call move_alloc(blk_ind_2, tmp)
         allocate (blk_ind_2(ind))
         blk_ind_2(:ind - 1) = tmp; deallocate (tmp)

         blk_ind_1(ind) = row
         blk_ind_2(ind) = col

      end do
   end do

   ! reserve blocks
   call dbcsr_reserve_blocks(c_matrix, blk_ind_1, blk_ind_2)
   deallocate (blk_ind_1, blk_ind_2)

   ! iterate over reserved matrix blocks to fill them with data
   call dbcsr_iterator_start(iter_matrix, c_matrix)
   do while (dbcsr_iterator_blocks_left(iter_matrix))
      call dbcsr_iterator_next_block(iter_matrix, blk_ind_2d(1), blk_ind_2d(2), blk_values_2d, tr, &
                                     row_size=blk_size_2d(1), col_size=blk_size_2d(2), &
                                     row_offset=blk_offset_2d(1), col_offset=blk_offset_2d(2))
      do n_arr = 1, blk_size_2d(1)
         do o_arr = 1, blk_size_2d(2)
            ! get matrix element index n & o from block offset
            n = n_arr + blk_offset_2d(1) - 1
            o = o_arr + blk_offset_2d(2) - 1
            ! calculate matrix eleme
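
Note on block screening: the reservation loop above skips a block when `blk_size*exp(-0.5*gamma*min_exp**2)` falls below `filter_eps`. Since c(i,j) = exp(-1/2*gamma*(i-j)**2) is largest where |i-j| is smallest, the number of elements in the block times that largest element is an upper bound on the block's Frobenius norm. The C++ sketch below illustrates this test. `block_minabsdiff` is called but not defined in this excerpt, so the version here is an assumed reading of it (the smallest |i-j| over the block's index ranges), and `keep_block` is a hypothetical name.

```cpp
#include <algorithm>
#include <cmath>

// Assumed behaviour (the original routine is not shown in this excerpt):
// smallest |i - j| for i in [row_offset, row_offset + row_size) and
// j in [col_offset, col_offset + col_size).
int block_minabsdiff(int row_offset, int col_offset, int row_size, int col_size) {
  const int row_last = row_offset + row_size - 1;
  const int col_last = col_offset + col_size - 1;
  // Distance between the two index intervals; zero if they overlap.
  return std::max({0, col_offset - row_last, row_offset - col_last});
}

// Hypothetical helper: true if the norm bound does not fall below filter_eps,
// i.e. the block would be reserved.
bool keep_block(int row_offset, int col_offset, int row_size, int col_size,
                double gamma, double filter_eps) {
  const int d = block_minabsdiff(row_offset, col_offset, row_size, col_size);
  const double max_elem = std::exp(-0.5 * gamma * d * d);
  // Element count times largest element bounds the Frobenius norm from above.
  const double bound = static_cast<double>(row_size) * col_size * max_elem;
  return bound >= filter_eps;
}
```

The bound is loose (the Frobenius norm is at most the square root of the element count times the largest element), so the test may reserve blocks that a later filter step would still drop, but under these assumptions it never discards a block whose norm reaches `filter_eps`.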


│   │   ├── test_square_sparse.perf
│   │   ├── test_square_sparse_bigblocks.perf
│   │   └── test_square_sparse_rma.perf
│   ├── libsmm_acc_timer_multiply.cpp.template
│   ├── libsmm_acc_unittest_multiply.cpp.template
│   └── libsmm_acc_unittest_transpose.cpp
└── tools/
    ├── build_libsmm/
    │   ├── COPYRIGHT
    │   ├── README
    │   ├── config/
    │   │   ├── cray.cce
    │   │   ├── cray.gnu
    │   │   ├── cray.intel.libsci
    │   │   ├── cray.intel.mkl
    │   │   ├── cray_mic.intel
    │   │   ├── linux.gnu
    │   │   ├── linux.intel
    │   │   ├── local_libxsmm.gnu
    │   │   ├── mic.intel
    │   │   ├── none.wlm
    │   │   ├── pbs.wlm
    │   │   └── slurm.wlm
    │   ├── config.in
    │   ├── generate
    │   ├── generate.bash
    │   ├── lib_gen.f90
    │   ├── make.gen
    │   ├── multrec_gen.f90
    │   ├── mults.f90
    │   ├── small_gen.f90
    │   └── tiny_gen.f90
    ├── docker/
    │   ├── Dockerfile.build-env-latest-gcc
    │   ├── Dockerfile.build-env-rocm
    │   ├── Dockerfile.build-env-ubuntu
    │   ├── Dockerfile.build-env-ubuntu-cuda
    │   ├── Makefile
    │   ├── README.md
    │   └── lsan.supp
    └── fedora/
        ├── dbcsr.rpmlintrc
        └── dbcsr.spec

SYMBOL INDEX (389 symbols across 58 files)

FILE: .pre-commit/check_header.py
  function mmap_open (line 36) | def mmap_open(name, mode="r"):
  function check_header (line 44) | def check_header(header_dir, files, verbose=False):

FILE: examples/dbcsr_example_3.cpp
  function random_dist (line 24) | std::vector<int> random_dist(int dist_size, int nbins) {
  function main (line 34) | int main(int argc, char* argv[]) {

FILE: examples/dbcsr_tensor_example_2.cpp
  function random_dist (line 27) | std::vector<int> random_dist(int dist_size, int nbins) {
  function printvec (line 35) | void printvec(std::vector<int>& v) {
  function fill_random (line 42) | void fill_random(dbcsr_t_tensor tensor, std::vector<std::vector<int>> nz...
  function main (line 121) | int main(int argc, char* argv[]) {

FILE: src/acc/acc.h
  type c_dbcsr_acc_bool_t (line 27) | typedef int c_dbcsr_acc_bool_t;

FILE: src/acc/acc_bench.c
  function parse_params (line 113) | static void parse_params(int argc, char* argv[], FILE** file, const char...
  function parse_nbytes (line 177) | static size_t parse_nbytes(const char* nbytes, size_t* nelems) {
  function main (line 205) | int main(int argc, char* argv[]) {

FILE: src/acc/acc_bench.h
  function INLINE (line 50) | static INLINE void init_stack(

FILE: src/acc/acc_libsmm.h
  type libsmm_acc_data_t (line 31) | typedef enum libsmm_acc_data_t {

FILE: src/acc/cuda/acc_cuda.cpp
  function nvrtcResult (line 12) | nvrtcResult nvrtcGetLowLevelCode(nvrtcProgram prog, char* code) { return...
  function nvrtcResult (line 14) | nvrtcResult nvrtcGetLowLevelCodeSize(nvrtcProgram prog, size_t* codeSize...
  function CUresult (line 16) | CUresult cuLaunchJITKernel(CUfunction f, unsigned int gridDimX, unsigned...

FILE: src/acc/cuda/dbcsr_cuda_nvtx_cu.cpp
  function cuda_nvtx_range_push_cu (line 33) | int cuda_nvtx_range_push_cu(const char* message) {
  function cuda_nvtx_range_pop_cu (line 57) | int cuda_nvtx_range_pop_cu() {
  function cuda_nvtx_name_osthread_cu (line 63) | void cuda_nvtx_name_osthread_cu(char* name) { nvtxNameOsThread(pthread_s...

FILE: src/acc/cuda_hip/acc_dev.cpp
  function c_dbcsr_acc_get_ndevices (line 24) | int c_dbcsr_acc_get_ndevices(int* n_devices) {
  function c_dbcsr_acc_device_synchronize (line 30) | int c_dbcsr_acc_device_synchronize() {
  function c_dbcsr_acc_set_active_device (line 36) | int c_dbcsr_acc_set_active_device(int device_id) {

FILE: src/acc/cuda_hip/acc_error.cpp
  function acc_error_check (line 22) | int acc_error_check(ACC(Error_t) error) {
  function c_dbcsr_acc_clear_errors (line 30) | void c_dbcsr_acc_clear_errors() { ACC(GetLastError)(); }

FILE: src/acc/cuda_hip/acc_event.cpp
  function c_dbcsr_acc_event_create (line 25) | int c_dbcsr_acc_event_create(void** event_p) {
  function c_dbcsr_acc_event_destroy (line 38) | int c_dbcsr_acc_event_destroy(void* event) {

FILE: src/acc/cuda_hip/acc_init.cpp
  function c_dbcsr_acc_init (line 22) | int c_dbcsr_acc_init() {
  function c_dbcsr_acc_finalize (line 40) | int c_dbcsr_acc_finalize() {

FILE: src/acc/cuda_hip/acc_mem.cpp
  function c_dbcsr_acc_dev_mem_allocate (line 29) | int c_dbcsr_acc_dev_mem_allocate(void** dev_mem, size_t n) {
  function c_dbcsr_acc_dev_mem_deallocate (line 39) | int c_dbcsr_acc_dev_mem_deallocate(void* dev_mem) {
  function c_dbcsr_acc_host_mem_allocate (line 48) | int c_dbcsr_acc_host_mem_allocate(void** host_mem, size_t n, void* strea...
  function c_dbcsr_acc_host_mem_deallocate (line 62) | int c_dbcsr_acc_host_mem_deallocate(void* host_mem, void* stream) {
  function c_dbcsr_acc_dev_mem_set_ptr (line 72) | int c_dbcsr_acc_dev_mem_set_ptr(void** dev_mem, void* other, size_t lb) {
  function c_dbcsr_acc_memcpy_h2d (line 79) | int c_dbcsr_acc_memcpy_h2d(const void* host_mem, void* dev_mem, size_t c...
  function c_dbcsr_acc_memcpy_d2h (line 90) | int c_dbcsr_acc_memcpy_d2h(const void* dev_mem, void* host_mem, size_t c...
  function c_dbcsr_acc_memcpy_d2d (line 103) | int c_dbcsr_acc_memcpy_d2d(const void* devmem_src, void* devmem_dst, siz...
  function c_dbcsr_acc_memset_zero (line 120) | int c_dbcsr_acc_memset_zero(void* dev_mem, size_t offset, size_t length,...
  function c_dbcsr_acc_dev_mem_info (line 139) | int c_dbcsr_acc_dev_mem_info(size_t* free, size_t* avail) {

FILE: src/acc/cuda_hip/acc_stream.cpp
  function c_dbcsr_acc_stream_priority_range (line 30) | int c_dbcsr_acc_stream_priority_range(int* least, int* greatest) {
  function c_dbcsr_acc_stream_create (line 40) | int c_dbcsr_acc_stream_create(void** stream_p, const char* name, int pri...
  function c_dbcsr_acc_stream_destroy (line 69) | int c_dbcsr_acc_stream_destroy(void* stream) {
  function c_dbcsr_acc_stream_sync (line 83) | int c_dbcsr_acc_stream_sync(void* stream) {

FILE: src/acc/cuda_hip/acc_utils.cpp
  function acc_get_gpu_warp_size (line 19) | int acc_get_gpu_warp_size() {

FILE: src/acc/cuda_hip/calculate_norms.cpp
  function __global__ (line 48) | __global__ void calculate_norms_d(
  function c_calculate_norms (line 98) | int c_calculate_norms(

FILE: src/acc/hip/acc_hip.cpp
  function hipError_t (line 13) | hipError_t hipHostAlloc(void** ptr, size_t size, unsigned int flags) { r...
  function hipError_t (line 15) | hipError_t hipFreeHost(void* ptr) { return hipHostFree(ptr); }
  function hiprtcResult (line 18) | hiprtcResult hiprtcGetLowLevelCode(hiprtcProgram prog, char* code) { ret...
  function hiprtcResult (line 20) | hiprtcResult hiprtcGetLowLevelCodeSize(hiprtcProgram prog, size_t* codeS...
  function hipError_t (line 22) | hipError_t hipEventCreate(hipEvent_t* event, unsigned flags) { return hi...
  function hipError_t (line 24) | hipError_t hipStreamCreate(hipStream_t* stream, unsigned int flags) { re...
  function hipError_t (line 26) | hipError_t hipLaunchJITKernel(hipFunction_t f, unsigned int gridDimX, un...

FILE: src/acc/libsmm_acc/generate_kernels.py
  function main (line 50) | def main(kernels_folder: Path):
  function cpp_function_to_string (line 89) | def cpp_function_to_string(cpp_file, kernel_name):

FILE: src/acc/libsmm_acc/generate_parameters.py
  function main (line 20) | def main(gpu_version: str, base_dir: Path):
  function write_parameters_file (line 58) | def write_parameters_file(all_pars, gpu_warp_size):

FILE: src/acc/libsmm_acc/kernels/smm_acc.py
  function compatible_mnk (line 73) | def compatible_mnk(algo, m, n, k):
  function params_dict_to_kernel (line 95) | def params_dict_to_kernel(**params):
  function descr_to_kernel (line 120) | def descr_to_kernel(kernel_descr, source="autotuned"):
  function to_string (line 147) | def to_string(*iterable):
  function to_tuple (line 165) | def to_tuple(*iterable):

FILE: src/acc/libsmm_acc/kernels/smm_acc_common.h
  function __device__ (line 29) | static __device__ double atomicAdd(double* address, double val) {

FILE: src/acc/libsmm_acc/kernels/smm_acc_dnt_base.py
  function round_up_to_nearest_multiple (line 15) | def round_up_to_nearest_multiple(x, step):
  function round_down_to_nearest_multiple (line 20) | def round_down_to_nearest_multiple(x, step):
  class Kernel (line 30) | class Kernel:
    method __repr__ (line 35) | def __repr__(self):
    method can_handle (line 38) | def can_handle(self, m, n, k):
    method include (line 42) | def include(self):
    method name (line 46) | def name(self):
    method autotuned (line 50) | def autotuned(self):
    method as_dict (line 54) | def as_dict(self):
    method as_dict_for_parameters_json (line 58) | def as_dict_for_parameters_json(self):
    method as_dict_for_parameters_h (line 92) | def as_dict_for_parameters_h(self):
    method launcher_code (line 121) | def launcher_code(self, compiler):
    method func_signature (line 153) | def func_signature(self):
    method promising_parameters (line 157) | def promising_parameters(m, n, k, gpu, autotuning):
    method baseline (line 163) | def baseline(m, n, k, gpu, autotuning):
    method parameter_set_distance (line 168) | def parameter_set_distance(cls, par_set1, par_set2):

FILE: src/acc/libsmm_acc/kernels/smm_acc_dnt_largeDB1.h
  function namespace (line 19) | namespace ns_smm_acc_dnt_largeDB1 {

FILE: src/acc/libsmm_acc/kernels/smm_acc_dnt_largeDB1.py
  class Kernel_dnt_largeDB1 (line 14) | class Kernel_dnt_largeDB1(Kernel):
    method __init__ (line 32) | def __init__(
    method func_signature (line 56) | def func_signature(self):
    method promising_parameters (line 65) | def promising_parameters(
    method baseline (line 188) | def baseline(m, n, k, gpu, autotuning):

FILE: src/acc/libsmm_acc/kernels/smm_acc_dnt_largeDB2.h
  function namespace (line 19) | namespace ns_smm_acc_dnt_largeDB2 {

FILE: src/acc/libsmm_acc/kernels/smm_acc_dnt_largeDB2.py
  class Kernel_dnt_largeDB2 (line 14) | class Kernel_dnt_largeDB2(Kernel):
    method __init__ (line 32) | def __init__(
    method func_signature (line 56) | def func_signature(self):
    method promising_parameters (line 65) | def promising_parameters(
    method baseline (line 189) | def baseline(m, n, k, gpu, autotuning):

FILE: src/acc/libsmm_acc/kernels/smm_acc_dnt_medium.py
  class Kernel_dnt_medium (line 14) | class Kernel_dnt_medium(Kernel):
    method __init__ (line 28) | def __init__(
    method func_signature (line 48) | def func_signature(self):
    method promising_parameters (line 57) | def promising_parameters(
    method baseline (line 155) | def baseline(m, n, k, gpu, autotuning):

FILE: src/acc/libsmm_acc/kernels/smm_acc_dnt_small.py
  class Kernel_dnt_small (line 14) | class Kernel_dnt_small(Kernel):
    method __init__ (line 28) | def __init__(
    method func_signature (line 43) | def func_signature(self):
    method promising_parameters (line 52) | def promising_parameters(
    method baseline (line 151) | def baseline(m, n, k, gpu, autotuning):

FILE: src/acc/libsmm_acc/kernels/smm_acc_dnt_tiny.py
  class Kernel_dnt_tiny (line 14) | class Kernel_dnt_tiny(Kernel):
    method __init__ (line 19) | def __init__(self, m, n, k, threads, grouping, minblocks, perf, source):
    method func_signature (line 31) | def func_signature(self):
    method promising_parameters (line 37) | def promising_parameters(
    method baseline (line 119) | def baseline(m, n, k, gpu, autotuning):

FILE: src/acc/libsmm_acc/kernels/smm_acc_predict.py
  function get_max_performances_per_mnk (line 160) | def get_max_performances_per_mnk(data):
  function get_baseline_performances_per_mnk (line 191) | def get_baseline_performances_per_mnk(data, algorithm, gpu, autotuning):
  class PredictiveParameters (line 266) | class PredictiveParameters:
    method __init__ (line 271) | def __init__(
    method get (line 330) | def get(self, feature_name):
    method get_features (line 343) | def get_features(self, feature_names):
    method get_perf_scaled (line 354) | def get_perf_scaled(self):
    method get_size_a (line 372) | def get_size_a(self):
    method get_size_b (line 376) | def get_size_b(self):
    method get_size_c (line 380) | def get_size_c(self):
    method get_mnk_string (line 384) | def get_mnk_string(self):
    method get_mnk (line 391) | def get_mnk(self):
    method get_mxnxk (line 395) | def get_mxnxk(self):
    method get_need_sync (line 401) | def get_need_sync(self):
    method get_nblks (line 410) | def get_nblks(self):
    method get_warps_per_blk (line 416) | def get_warps_per_blk(self):
    method get_nwarps (line 422) | def get_nwarps(self):
    method get_sm_desired (line 426) | def get_sm_desired(self):
    method get_threads_per_blk (line 433) | def get_threads_per_blk(self):
    method get_nthreads (line 437) | def get_nthreads(self):
    method get_ru_param_stack_unroll_factor (line 443) | def get_ru_param_stack_unroll_factor(self):
    method get_n_iter (line 447) | def get_n_iter(self):
    method get_Gflops (line 456) | def get_Gflops(self):
    method get_nblocks_per_sm_lim_blks_warps (line 470) | def get_nblocks_per_sm_lim_blks_warps(self):
    method get_ru_tinysmallmed_unroll_factor_a (line 479) | def get_ru_tinysmallmed_unroll_factor_a(self):
    method get_ru_tinysmallmed_unroll_factor_b (line 483) | def get_ru_tinysmallmed_unroll_factor_b(self):
    method get_ru_tinysmallmed_unroll_factor_a_total (line 487) | def get_ru_tinysmallmed_unroll_factor_a_total(self):
    method get_ru_tinysmallmed_unroll_factor_b_total (line 491) | def get_ru_tinysmallmed_unroll_factor_b_total(self):
    method get_ru_tinysmallmed_unroll_factor_c_total (line 495) | def get_ru_tinysmallmed_unroll_factor_c_total(self):
    method get_ru_tiny_max_parallel_work (line 501) | def get_ru_tiny_max_parallel_work(self):
    method get_ru_tiny_min_threads (line 512) | def get_ru_tiny_min_threads(self):
    method get_ru_tiny_buf_size (line 516) | def get_ru_tiny_buf_size(self):
    method get_ru_tiny_smem_per_block (line 520) | def get_ru_tiny_smem_per_block(self):
    method get_ru_tiny_nblks_per_sm (line 528) | def get_ru_tiny_nblks_per_sm(self):
    method get_ru_tiny_nwarps_per_sm (line 535) | def get_ru_tiny_nwarps_per_sm(self):
    method get_ru_tiny_nsm (line 539) | def get_ru_tiny_nsm(self):
    method get_ru_tiny_ngpu (line 545) | def get_ru_tiny_ngpu(self):
    method get_ru_tiny_occupancy (line 551) | def get_ru_tiny_occupancy(self):
    method get_tiny_estimate_Nmem_shared (line 556) | def get_tiny_estimate_Nmem_shared(self):
    method get_tiny_estimate_Nmem_global (line 568) | def get_tiny_estimate_Nmem_global(self):
    method get_tiny_estimate_Nmem (line 578) | def get_tiny_estimate_Nmem(self):
    method get_tiny_estimate_perf (line 590) | def get_tiny_estimate_perf(self):
    method get_ru_smallmedlarge_cmax (line 603) | def get_ru_smallmedlarge_cmax(self):
    method get_ru_smallmedlarge_rmax (line 606) | def get_ru_smallmedlarge_rmax(self):
    method get_ru_smallmedlarge_T (line 609) | def get_ru_smallmedlarge_T(self):
    method get_ru_smallmedlarge_min_threads (line 612) | def get_ru_smallmedlarge_min_threads(self):
    method get_ru_smallmed_tm_max (line 619) | def get_ru_smallmed_tm_max(self):
    method get_ru_smallmed_tn_max (line 622) | def get_ru_smallmed_tn_max(self):
    method get_ru_smallmed_unroll_factor_c (line 625) | def get_ru_smallmed_unroll_factor_c(self):
    method get_ru_smallmed_loop_matmul (line 629) | def get_ru_smallmed_loop_matmul(self):
    method get_ru_smallmed_max_parallel_work (line 633) | def get_ru_smallmed_max_parallel_work(self):
    method get_ru_smallmed_buf_size (line 645) | def get_ru_smallmed_buf_size(self):
    method get_ru_smallmed_smem_per_block (line 657) | def get_ru_smallmed_smem_per_block(self):
    method get_ru_smallmed_regs_per_thread (line 665) | def get_ru_smallmed_regs_per_thread(self):
    method get_load_unroll_factor_1 (line 678) | def get_load_unroll_factor_1(self):
    method get_load_unroll_factor_2 (line 681) | def get_load_unroll_factor_2(self):
    method get_n_mkloads (line 684) | def get_n_mkloads(self):
    method get_n_knloads (line 689) | def get_n_knloads(self):
    method get_ru_large_Pa (line 696) | def get_ru_large_Pa(self):
    method get_ru_large_Pb (line 700) | def get_ru_large_Pb(self):
    method get_ru_large_Pc (line 704) | def get_ru_large_Pc(self):
    method get_ru_large_unroll_factor_a (line 708) | def get_ru_large_unroll_factor_a(self):
    method get_ru_large_unroll_factor_b (line 713) | def get_ru_large_unroll_factor_b(self):
    method get_ru_large_unroll_factor_c (line 718) | def get_ru_large_unroll_factor_c(self):
    method get_ru_large_loop_matmul (line 723) | def get_ru_large_loop_matmul(self):
    method get_ru_large_max_concurrent_work (line 726) | def get_ru_large_max_concurrent_work(self):
    method get_ru_large_regs_per_thread (line 738) | def get_ru_large_regs_per_thread(self):
    method get_ru_large_n_DB_iter (line 748) | def get_ru_large_n_DB_iter(self):
    method get_ru_large_buf_size (line 752) | def get_ru_large_buf_size(self):
    method get_ru_large_smem_per_block (line 766) | def get_ru_large_smem_per_block(self):

FILE: src/acc/libsmm_acc/libsmm_acc.cpp
  function launch_kernel_from_handle (line 44) | inline int launch_kernel_from_handle(
  function libsmm_acc_process_blas (line 256) | int libsmm_acc_process_blas(const int* param_stack, int stack_size, ACC_...
  function libsmm_acc_process_d (line 281) | int libsmm_acc_process_d(const int* param_stack, int stack_size, ACC_DRV...
  function libsmm_acc_process (line 324) | int libsmm_acc_process(const int* param_stack_host, const int* param_sta...
  function libsmm_acc_transpose_d (line 447) | int libsmm_acc_transpose_d(const int* trs_stack, int offset, int stack_s...
  function libsmm_acc_transpose (line 482) | int libsmm_acc_transpose(const int* trs_stack, int offset, int stack_siz...

FILE: src/acc/libsmm_acc/libsmm_acc.h
  type libsmm_acc_algo (line 28) | enum libsmm_acc_algo { largeDB1 = 1, largeDB2 = 2, medium = 3, small = 4...

FILE: src/acc/libsmm_acc/libsmm_acc_benchmark.cpp
  function libsmm_acc_benchmark_init (line 24) | void libsmm_acc_benchmark_init(libsmm_acc_benchmark_t** handle, benchmar...
  function libsmm_acc_benchmark_finalize (line 81) | void libsmm_acc_benchmark_finalize(libsmm_acc_benchmark_t* handle) {
  function matInit (line 103) | void matInit(double* mat, int mat_n, int x, int y, int seed) {
  function stackInit (line 114) | void stackInit(int* stack, int n_stack, int n_c, int n_a, int n_b, int m...
  function stackInitTransp (line 120) | void stackInitTransp(int* stack, int n_stack, int mat_m, int mat_n) {
  function stackCalc (line 126) | void stackCalc(int* stack, int n_stack, double* mat_c, double* mat_a, do...
  function stackTransp (line 148) | void stackTransp(int* stack, int n_stack, double* mat, double* mat_trs, ...
  function checkSum (line 164) | double checkSum(double* mat_c, int n_c, int mat_m, int mat_n) {
  function checkSumTransp (line 173) | double checkSumTransp(double* mat, int n_stack, int mat_m, int mat_n) {
  function clean_string (line 195) | static void clean_string(char* str_in, char* str_out) {
  function libsmm_acc_benchmark (line 208) | int libsmm_acc_benchmark(
  function libsmm_acc_benchmark_transpose_ (line 324) | int libsmm_acc_benchmark_transpose_(int n_stack, int* stack, int* d_stac...
  function libsmm_acc_benchmark_transpose (line 385) | int libsmm_acc_benchmark_transpose(

FILE: src/acc/libsmm_acc/libsmm_acc_benchmark.h
  type benchmark_mode (line 27) | enum benchmark_mode { test, tune, timing }
  type libsmm_acc_benchmark_t (line 29) | typedef struct {

FILE: src/acc/libsmm_acc/libsmm_acc_init.cpp
  function timeset (line 23) | void timeset(const std::string& routine_name, int& handle) {
  function timestop (line 28) | void timestop(int handle) { c_dbcsr_timestop(&handle); }
  function timeset (line 30) | void timeset(const std::string& routine_name, int& handle) {
  function timestop (line 34) | void timestop(int handle) { (void)(handle); }
  function libsmm_acc_gpu_blas_init (line 38) | int libsmm_acc_gpu_blas_init() {
  function libsmm_acc_init (line 60) | int libsmm_acc_init() {
  function libsmm_acc_finalize (line 76) | int libsmm_acc_finalize() {
  function libsmm_acc_check_gpu_warp_size_consistency (line 94) | int libsmm_acc_check_gpu_warp_size_consistency() {
  function libsmm_acc_is_thread_safe (line 107) | int libsmm_acc_is_thread_safe() {

FILE: src/acc/libsmm_acc/parameters_utils.h
  type std (line 18) | typedef std::array<int, 3> Triplet;
  type std (line 19) | typedef std::array<int, 8> KernelParameters;
  function namespace (line 21) | namespace std {
  function get_libsmm_acc_triplets (line 34) | inline void get_libsmm_acc_triplets(std::vector<Triplet>& v, std::unorde...

FILE: src/acc/libsmm_acc/tune/tune_collect.py
  class awinner (line 30) | class awinner:
  function tune_sort_key (line 37) | def tune_sort_key(path: Path):
  function main (line 48) | def main(tune_dir=Path(".")):
  function process_log (line 99) | def process_log(log_fn: Path, mnk, winners):

FILE: src/acc/libsmm_acc/tune/tune_merge.py
  function main (line 20) | def main(param_fn):

FILE: src/acc/libsmm_acc/tune/tune_setup.py
  function main (line 31) | def main(
  function format_params (line 94) | def format_params(params):
  function get_file_extension_from_compiler (line 120) | def get_file_extension_from_compiler(compiler):
  function gen_benchmark (line 125) | def gen_benchmark(outdir, gpu_properties, autotuning_properties, compile...
  function gen_jobfile (line 255) | def gen_jobfile(outdir, compiler, m, n, k, cpus_per_task, max_num_nodes=0):
  function gen_makefile (line 331) | def gen_makefile(outdir, compiler, arch):
  function gen_collect (line 408) | def gen_collect(outdir: Path, triples):
  function writefile (line 419) | def writefile(fn: Path, content):
  function combinations (line 427) | def combinations(*sizes):

FILE: src/acc/libsmm_acc/tune/tune_submit.py
  function tune_sort_key (line 19) | def tune_sort_key(path: Path):
  function main (line 30) | def main(submit_jobs, num_jobs, tune_dir: Path, sbatch_args):

FILE: src/acc/opencl/acc_opencl.c
  function c_dbcsr_acc_opencl_notify (line 83) | void c_dbcsr_acc_opencl_notify(const char errinfo[], const void* private...
  function c_dbcsr_acc_opencl_order_devices (line 97) | int c_dbcsr_acc_opencl_order_devices(const void* dev_a, const void* dev_...
  function c_dbcsr_acc_opencl_configure (line 155) | void c_dbcsr_acc_opencl_configure(void) {
  function c_dbcsr_acc_init (line 315) | int c_dbcsr_acc_init(void) {
  function LIBXSMM_ATTRIBUTE_CTOR (line 671) | LIBXSMM_ATTRIBUTE_CTOR void c_dbcsr_acc_opencl_init(void) {
  function LIBXSMM_ATTRIBUTE_DTOR (line 682) | LIBXSMM_ATTRIBUTE_DTOR void c_dbcsr_acc_opencl_finalize(void) {
  function c_dbcsr_acc_finalize (line 731) | int c_dbcsr_acc_finalize(void) {
  function c_dbcsr_acc_opencl_use_cmem (line 767) | int c_dbcsr_acc_opencl_use_cmem(const c_dbcsr_acc_opencl_device_t* devin...
  function c_dbcsr_acc_clear_errors (line 776) | void c_dbcsr_acc_clear_errors(void) {}
  function c_dbcsr_acc_get_ndevices (line 779) | int c_dbcsr_acc_get_ndevices(int* ndevices) {
  function c_dbcsr_acc_opencl_device_id (line 808) | int c_dbcsr_acc_opencl_device_id(cl_device_id device, int* device_id, in...
  function c_dbcsr_acc_opencl_device_vendor (line 839) | int c_dbcsr_acc_opencl_device_vendor(cl_device_id device, const char ven...
  function c_dbcsr_acc_opencl_device_uid (line 861) | int c_dbcsr_acc_opencl_device_uid(cl_device_id device, const char devnam...
  function c_dbcsr_acc_opencl_device_name (line 895) | int c_dbcsr_acc_opencl_device_name(
  function c_dbcsr_acc_opencl_device_level (line 920) | int c_dbcsr_acc_opencl_device_level(
  function c_dbcsr_acc_opencl_device_ext (line 976) | int c_dbcsr_acc_opencl_device_ext(cl_device_id device, const char* const...
  function c_dbcsr_acc_opencl_create_context (line 999) | int c_dbcsr_acc_opencl_create_context(cl_device_id active_id, cl_context...
  function c_dbcsr_acc_opencl_set_active_device (line 1056) | int c_dbcsr_acc_opencl_set_active_device(ACC_OPENCL_LOCKTYPE* lock, int ...
  function c_dbcsr_acc_set_active_device (line 1239) | int c_dbcsr_acc_set_active_device(int device_id) {
  function c_dbcsr_acc_opencl_flags_atomics (line 1268) | int c_dbcsr_acc_opencl_flags_atomics(const c_dbcsr_acc_opencl_device_t* ...
  function c_dbcsr_acc_opencl_defines (line 1417) | int c_dbcsr_acc_opencl_defines(const char defines[], char buffer[], size...
  function c_dbcsr_acc_opencl_kernel_flags (line 1442) | int c_dbcsr_acc_opencl_kernel_flags(const char build_params[], const cha...
  function c_dbcsr_acc_opencl_kernel (line 1474) | int c_dbcsr_acc_opencl_kernel(size_t source_kind, const char source[], c...
  function c_dbcsr_acc_opencl_set_kernel_ptr (line 1788) | int c_dbcsr_acc_opencl_set_kernel_ptr(cl_kernel kernel, cl_uint arg_inde...
  function c_dbcsr_acc_opencl_duration (line 1814) | double c_dbcsr_acc_opencl_duration(cl_event event, int* result_code) {
  type c_dbcsr_acc_opencl_hist_t (line 1832) | typedef struct c_dbcsr_acc_opencl_hist_t {
  function c_dbcsr_acc_opencl_hist_create (line 1839) | void c_dbcsr_acc_opencl_hist_create(
  function c_dbcsr_acc_opencl_hist_avg (line 1877) | void c_dbcsr_acc_opencl_hist_avg(double* dst, const double* src) {
  function c_dbcsr_acc_opencl_hist_add (line 1883) | void c_dbcsr_acc_opencl_hist_add(double* dst, const double* src) {
  function c_dbcsr_acc_opencl_hist_set (line 1889) | void c_dbcsr_acc_opencl_hist_set(ACC_OPENCL_LOCKTYPE* lock, void* hist, ...
  function c_dbcsr_acc_opencl_hist_get (line 1927) | void c_dbcsr_acc_opencl_hist_get(
  function c_dbcsr_acc_opencl_hist_print (line 1983) | void c_dbcsr_acc_opencl_hist_print(
  function c_dbcsr_acc_opencl_hist_free (line 2014) | void c_dbcsr_acc_opencl_hist_free(void* hist) {
  function c_dbcsr_acc_opencl_strimatch (line 2051) | int c_dbcsr_acc_opencl_strimatch(const char a[], const char b[], const c...

FILE: src/acc/opencl/acc_opencl.h
  type c_dbcsr_acc_opencl_error_t (line 244) | typedef struct c_dbcsr_acc_opencl_error_t {
  type c_dbcsr_acc_opencl_stream_t (line 250) | typedef struct c_dbcsr_acc_opencl_stream_t {
  type c_dbcsr_acc_opencl_device_t (line 259) | typedef struct c_dbcsr_acc_opencl_device_t {
  type c_dbcsr_acc_event_kind_t (line 299) | typedef enum c_dbcsr_acc_event_kind_t {
  type c_dbcsr_acc_opencl_info_memptr_t (line 307) | typedef struct c_dbcsr_acc_opencl_info_memptr_t {
  type c_dbcsr_acc_opencl_atomic_fp_t (line 313) | typedef enum c_dbcsr_acc_opencl_atomic_fp_t {
  type c_dbcsr_acc_opencl_config_t (line 323) | typedef struct c_dbcsr_acc_opencl_config_t {

FILE: src/acc/opencl/acc_opencl_event.c
  function c_dbcsr_acc_event_create (line 17) | int c_dbcsr_acc_event_create(void** event_p) {
  function c_dbcsr_acc_event_destroy (line 39) | int c_dbcsr_acc_event_destroy(void* event) {
  function c_dbcsr_acc_stream_wait_event (line 69) | int c_dbcsr_acc_stream_wait_event(void* stream, void* event) { /* wait f...
  function c_dbcsr_acc_event_record (line 105) | int c_dbcsr_acc_event_record(void* event, void* stream) {
  function c_dbcsr_acc_event_query (line 144) | int c_dbcsr_acc_event_query(void* event, c_dbcsr_acc_bool_t* has_occurre...
  function c_dbcsr_acc_event_synchronize (line 169) | int c_dbcsr_acc_event_synchronize(void* event) { /* waits on the host-si...

FILE: src/acc/opencl/acc_opencl_mem.c
  function c_dbcsr_acc_opencl_pmalloc_init (line 54) | void c_dbcsr_acc_opencl_pmalloc_init(size_t size, size_t* num, void* poo...
  function c_dbcsr_acc_opencl_pfree (line 73) | void c_dbcsr_acc_opencl_pfree(const void* pointer, void* pool[], size_t*...
  function c_dbcsr_acc_opencl_info_memptr_t (line 82) | c_dbcsr_acc_opencl_info_memptr_t* c_dbcsr_acc_opencl_info_hostptr(const ...
  function c_dbcsr_acc_opencl_info_memptr_t (line 97) | c_dbcsr_acc_opencl_info_memptr_t* c_dbcsr_acc_opencl_info_devptr_modify(
  function c_dbcsr_acc_opencl_info_devptr_lock (line 164) | int c_dbcsr_acc_opencl_info_devptr_lock(c_dbcsr_acc_opencl_info_memptr_t...
  function c_dbcsr_acc_opencl_info_devptr (line 184) | int c_dbcsr_acc_opencl_info_devptr(
  function c_dbcsr_acc_host_mem_deallocate_internal (line 198) | int c_dbcsr_acc_host_mem_deallocate_internal(void* host_ptr, cl_command_...
  function c_dbcsr_acc_host_mem_allocate (line 232) | int c_dbcsr_acc_host_mem_allocate(void** host_mem, size_t nbytes, void* ...
  function c_dbcsr_acc_host_mem_deallocate (line 354) | int c_dbcsr_acc_host_mem_deallocate(void* host_mem, void* stream) {
  function c_dbcsr_acc_memcpy_notify (line 396) | void CL_CALLBACK c_dbcsr_acc_memcpy_notify(cl_event event, cl_int event_...
  function c_dbcsr_acc_dev_mem_allocate (line 437) | int c_dbcsr_acc_dev_mem_allocate(void** dev_mem, size_t nbytes) {
  function c_dbcsr_acc_dev_mem_deallocate (line 564) | int c_dbcsr_acc_dev_mem_deallocate(void* dev_mem) {
  function c_dbcsr_acc_dev_mem_set_ptr (line 615) | int c_dbcsr_acc_dev_mem_set_ptr(void** dev_mem, void* other, size_t offs...
  function c_dbcsr_acc_memcpy_h2d (line 640) | int c_dbcsr_acc_memcpy_h2d(const void* host_mem, void* dev_mem, size_t n...
  function c_dbcsr_acc_opencl_memcpy_d2h (line 722) | int c_dbcsr_acc_opencl_memcpy_d2h(
  function c_dbcsr_acc_memcpy_d2d (line 845) | int c_dbcsr_acc_memcpy_d2d(const void* devmem_src, void* devmem_dst, siz...
  function c_dbcsr_acc_opencl_memset (line 931) | int c_dbcsr_acc_opencl_memset(void* dev_mem, int value, size_t offset, s...
  function c_dbcsr_acc_memset_zero (line 997) | int c_dbcsr_acc_memset_zero(void* dev_mem, size_t offset, size_t nbytes,...
  function c_dbcsr_acc_opencl_info_devmem (line 1002) | int c_dbcsr_acc_opencl_info_devmem(cl_device_id device, size_t* mem_free...
  function c_dbcsr_acc_dev_mem_info (line 1074) | int c_dbcsr_acc_dev_mem_info(size_t* mem_free, size_t* mem_total) {

FILE: src/acc/opencl/acc_opencl_stream.c
  function c_dbcsr_acc_opencl_stream (line 19) | const c_dbcsr_acc_opencl_stream_t* c_dbcsr_acc_opencl_stream(ACC_OPENCL_...
  function c_dbcsr_acc_opencl_stream_default (line 48) | const c_dbcsr_acc_opencl_stream_t* c_dbcsr_acc_opencl_stream_default(voi...
  function c_dbcsr_acc_stream_create (line 56) | int c_dbcsr_acc_stream_create(void** stream_p, const char* name, int pri...
  function c_dbcsr_acc_stream_destroy (line 190) | int c_dbcsr_acc_stream_destroy(void* stream) {
  function c_dbcsr_acc_stream_priority_range (line 221) | int c_dbcsr_acc_stream_priority_range(int* least, int* greatest) {
  function c_dbcsr_acc_stream_sync (line 263) | int c_dbcsr_acc_stream_sync(void* stream) {
  function c_dbcsr_acc_opencl_device_synchronize (line 297) | int c_dbcsr_acc_opencl_device_synchronize(ACC_OPENCL_LOCKTYPE* lock, int...
  function c_dbcsr_acc_device_synchronize (line 322) | int c_dbcsr_acc_device_synchronize(void) {

FILE: src/acc/opencl/common/opencl_atomics.h
  function atomic32_add64_global (line 57) | __attribute__((always_inline)) inline void atomic32_add64_global(GLOBAL_...
  function atomic_add_global_cmpxchg (line 66) | __attribute__((always_inline)) inline void atomic_add_global_cmpxchg(GLO...
  function atomic_add_global_cmpxchg2 (line 93) | __attribute__((always_inline)) inline void atomic_add_global_cmpxchg2(GL...
  function atomic_add_global_xchg (line 116) | __attribute__((always_inline)) inline void atomic_add_global_xchg(GLOBAL...

FILE: src/acc/opencl/smm/opencl_libsmm.c
  function opencl_libsmm_write_trans_params (line 90) | int opencl_libsmm_write_trans_params(FILE* stream, int only_key, const o...
  function opencl_libsmm_write_smm_params (line 116) | int opencl_libsmm_write_smm_params(FILE* stream, int only_key, const ope...
  function opencl_libsmm_read_smm_params (line 152) | int opencl_libsmm_read_smm_params(char* parambuf, opencl_libsmm_smmkey_t...
  function libsmm_acc_init (line 342) | int libsmm_acc_init(void) {
  function libsmm_acc_finalize (line 545) | int libsmm_acc_finalize(void) {
  function libsmm_acc_is_thread_safe (line 600) | c_dbcsr_acc_bool_t libsmm_acc_is_thread_safe(void) {
  function libsmm_acc_transpose (line 634) | int libsmm_acc_transpose(const int* dev_trs_stack, int offset, int stack...
  function libsmm_acc_process_suitable (line 805) | c_dbcsr_acc_bool_t libsmm_acc_process_suitable(
  function opencl_libsmm_acc_set_dbm_launch_fn (line 853) | void opencl_libsmm_acc_set_dbm_launch_fn(opencl_libsmm_acc_dbm_launch_fn...
  function opencl_libsmm_acc_process (line 861) | int opencl_libsmm_acc_process(const int* host_param_stack, const int* de...
  function libsmm_acc_process (line 1246) | int libsmm_acc_process(const int* host_param_stack, const int* dev_param...
  function c_calculate_norms (line 1267) | int c_calculate_norms(const double* mat, int nblks, const int* offsets, ...

FILE: src/acc/opencl/smm/opencl_libsmm.h
  type opencl_libsmm_transkey_t (line 38) | typedef struct opencl_libsmm_transkey_t {
  type opencl_libsmm_trans_t (line 44) | typedef struct opencl_libsmm_trans_t {
  type opencl_libsmm_smmkey_t (line 50) | typedef struct opencl_libsmm_smmkey_t {
  type opencl_libsmm_smm_t (line 58) | typedef struct opencl_libsmm_smm_t {
  type opencl_libsmm_perfest_t (line 67) | typedef struct opencl_libsmm_perfest_t {

FILE: src/acc/opencl/smm/tune_multiply.py
  function start (line 37) | def start(args):
  function env_intvalue (line 58) | def env_intvalue(env, default, lookup=True):
  function ilog2 (line 66) | def ilog2(n):
  class SmmTuner (line 74) | class SmmTuner(MeasurementInterface):
    method __init__ (line 75) | def __init__(self, args):
    method manipulator (line 247) | def manipulator(self):
    method create_param (line 250) | def create_param(
    method launch (line 283) | def launch(self, envs, check=None, nrep=None, verbose=None):
    method seed_configurations (line 305) | def seed_configurations(self):
    method objective (line 327) | def objective(self):
    method environment (line 333) | def environment(self, config):
    method run (line 340) | def run(self, desired_result, input=None, limit=None, message=None, nr...
    method update_jsons (line 406) | def update_jsons(self, filenames):
    method make_csv_record (line 432) | def make_csv_record(self, data, filename):
    method merge_jsons (line 458) | def merge_jsons(self, filenames):
    method rename_dotfile (line 579) | def rename_dotfile(self, dotfile):
    method save_final_config (line 594) | def save_final_config(self, configuration, final=True):
    method handle_sigint (line 659) | def handle_sigint(self, signum, frame):

FILE: src/dbcsr.h
  function c_dbcsr_init_lib (line 77) | inline void c_dbcsr_init_lib(MPI_Comm comm, int* io_unit) {
  function c_dbcsr_distribution_new (line 95) | inline void c_dbcsr_distribution_new(
  function c_dbcsr_get_group (line 292) | inline void c_dbcsr_get_group(dbcsr_matrix c_matrix, MPI_Comm* c_group) {
  function c_dbcsr_distribution_get (line 304) | inline void c_dbcsr_distribution_get(const dbcsr_distribution c_dist, in...
  function c_dbcsr_set (line 358) | inline void c_dbcsr_set(dbcsr_matrix c_matrix, const ${extype}$ c_alpha) {
  function c_dbcsr_add (line 362) | inline void c_dbcsr_add(dbcsr_matrix c_matrix_a, const dbcsr_matrix c_ma...
  function c_dbcsr_scale (line 367) | inline void c_dbcsr_scale(dbcsr_matrix c_matrix_a, const ${extype}$ c_al...
  function c_dbcsr_scale_by_vector (line 371) | inline void c_dbcsr_scale_by_vector(
  function c_dbcsr_multiply (line 376) | inline void c_dbcsr_multiply(char c_transa, char c_transb, const ${extyp...
  function c_dbcsr_add_on_diag (line 384) | inline void c_dbcsr_add_on_diag(dbcsr_matrix c_matrix, const ${extype}$ ...
  function c_dbcsr_set_diag (line 388) | inline void c_dbcsr_set_diag(dbcsr_matrix c_matrix, const ${extype}$* c_...
  function c_dbcsr_get_diag (line 392) | inline void c_dbcsr_get_diag(const dbcsr_matrix c_matrix, ${extype}$* c_...
  function c_dbcsr_trace (line 396) | inline void c_dbcsr_trace(const dbcsr_matrix c_matrix_a, ${extype}$* c_t...
  function c_dbcsr_dot (line 400) | inline void c_dbcsr_dot(const dbcsr_matrix c_matrix_a, const dbcsr_matri...
  function c_dbcsr_get_block_p (line 404) | inline void c_dbcsr_get_block_p(const dbcsr_matrix c_matrix, const int c...
  function c_dbcsr_get_block_p (line 409) | inline void c_dbcsr_get_block_p(const dbcsr_matrix c_matrix, const int c...
  function c_dbcsr_reserve_block2d (line 414) | inline void c_dbcsr_reserve_block2d(dbcsr_matrix c_matrix, const int c_r...
  function c_dbcsr_iterator_next_2d_block (line 420) | inline void c_dbcsr_iterator_next_2d_block(const dbcsr_iterator c_iterat...
  function c_dbcsr_put_block2d (line 427) | inline void c_dbcsr_put_block2d(dbcsr_matrix c_matrix, const int c_row, ...
  function c_dbcsr_get_data (line 432) | inline void c_dbcsr_get_data(const dbcsr_matrix c_matrix, ${extype}$** c...

FILE: src/tensors/dbcsr_tensor.h
  function c_dbcsr_t_distribution_new (line 41) | void c_dbcsr_t_distribution_new(
  function c_dbcsr_t_create_new (line 46) | void c_dbcsr_t_create_new(dbcsr_t_tensor* c_tensor, const char* c_name, ...
  function c_dbcsr_t_contract_${dsuffix}$ (line 74) | void c_dbcsr_t_contract_${dsuffix}$(const ${ctype}
  function c_dbcsr_t_contract_index_${dsuffix}$ (line 82) | void c_dbcsr_t_contract_index_${dsuffix}$(const ${ctype}
  function c_dbcsr_t_reserve_blocks_index (line 100) | void c_dbcsr_t_reserve_blocks_index(const dbcsr_t_tensor c_tensor, const...
  function c_dbcsr_t_get_info (line 127) | void c_dbcsr_t_get_info(const dbcsr_t_tensor c_tensor, const int tensor_...
  function c_dbcsr_t_iterator_next_block (line 225) | inline void c_dbcsr_t_iterator_next_block(
  function c_dbcsr_t_filter (line 233) | inline void c_dbcsr_t_filter(
  function c_dbcsr_t_set (line 238) | inline void c_dbcsr_t_set(const dbcsr_t_tensor c_tensor, const ${ctype}$...
  function c_dbcsr_t_scale (line 242) | inline void c_dbcsr_t_scale(const dbcsr_t_tensor c_tensor, const ${ctype...
  function c_dbcsr_t_get_data_p (line 246) | inline void c_dbcsr_t_get_data_p(const dbcsr_t_tensor c_tensor, ${ctype}...

FILE: tests/dbcsr_acc_test.c
  function main (line 58) | int main(int argc, char* argv[]) {

FILE: tests/dbcsr_tensor_test.cpp
  function get_rand_real (line 33) | T get_rand_real() { return dis(gen); }
  function random_dist (line 35) | std::vector<int> random_dist(int dist_size, int nbins) {
  function printvec (line 43) | void printvec(T& v) {
  function fill_random (line 50) | void fill_random(void* tensor, std::vector<std::vector<int>> nzblocks, s...
  function main (line 127) | int main(int argc, char* argv[]) {

FILE: tests/dbcsr_test.cpp
  function random_dist (line 25) | std::vector<int> random_dist(int dist_size, int nbins) {
  function main (line 34) | int main(int argc, char* argv[]) {

FILE: tests/generate_libsmm_acc_timer_multiply.py
  function format_to_cpp (line 18) | def format_to_cpp(kernels):
  function main (line 30) | def main(

FILE: tests/generate_libsmm_acc_unittest_multiply.py
  function format_to_cpp (line 18) | def format_to_cpp(kernels):
  function main (line 30) | def main(

FILE: tests/libsmm_acc_unittest_transpose.cpp
  function main (line 24) | int main(int argc, char** argv) {

Condensed preview — 395 files, each showing path, character count, and a content snippet. Download the .json file for the full structured content (17,479K chars).

[
  {
    "path": ".ccls",
    "chars": 40,
    "preview": "clang\n%c -std=c17\n%cpp -std=c++17\n-Isrc/"
  },
  {
    "path": ".clang-format",
    "chars": 696,
    "preview": "---\nAlignAfterOpenBracket: DontAlign\nAlignEscapedNewlines: DontAlign\nAlignTrailingComments: false\nAllowShortCaseLabelsO"
  },
  {
    "path": ".cmake-format.py",
    "chars": 80,
    "preview": "# flake8: noqa\nwith section(\"format\"):\n    separate_ctrl_name_with_space = True\n"
  },
  {
    "path": ".codecov.yml",
    "chars": 110,
    "preview": "coverage:\n  precision: 1\n  round: down\n  range: 60..100\ncomment:\n  require_changes: true\n  after_n_builds: 12\n"
  },
  {
    "path": ".fortls",
    "chars": 29,
    "preview": "{\n  \"excl_paths\": [\"build\"]\n}"
  },
  {
    "path": ".git-blame-ignore-revs",
    "chars": 739,
    "preview": "# git commit hashes with whitespace/reformatting changes only\n# Make git-blame use this file by running:\n#   git config "
  },
  {
    "path": ".gitattributes",
    "chars": 128,
    "preview": ".gitattributes export-ignore\n.gitignore export-ignore\n.gitmodules export-ignore\n.travis.yml export-ignore\n.github export"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.md",
    "chars": 839,
    "preview": "---\nname: Bug report\nabout: Create a report to help us improve\n\n---\n\n**Describe the bug**\nA clear and concise descriptio"
  },
  {
    "path": ".github/workflows/doc-generation.yml",
    "chars": 1902,
    "preview": "---\nname: Generating documentation\non:\n  push:\n    branches:\n    - 'develop'\n    tags:\n    - 'v*'\n  workflow_dispatch:\n\n"
  },
  {
    "path": ".github/workflows/docker-build-env.yml",
    "chars": 2965,
    "preview": "---\nname: Publish DBCSR Build Environments to the GitHub Contrainer Registry\n\non:\n  push:\n    branches:\n    - 'develop'\n"
  },
  {
    "path": ".github/workflows/release.yml",
    "chars": 1672,
    "preview": "---\nname: Create release\non:\n  push:\n    tags:\n    - 'v*'\n\njobs:\n  build-and-upload:\n    runs-on: ubuntu-latest\n    cont"
  },
  {
    "path": ".github/workflows/testing-gcc.yml",
    "chars": 860,
    "preview": "---\nname: Testing with latest gcc\non:\n  push:\n    branches:\n    - 'develop'\n  pull_request:\n\njobs:\n  build-and-test:\n   "
  },
  {
    "path": ".github/workflows/testing-linux.yml",
    "chars": 6800,
    "preview": "---\nname: Testing on Linux\non:\n  push:\n    branches:\n    - 'develop'\n  pull_request:\n\njobs:\n  ##########################"
  },
  {
    "path": ".github/workflows/testing-macos.yml",
    "chars": 1539,
    "preview": "---\nname: Testing on macOS\non:\n  push:\n    branches:\n    - 'develop'\n  pull_request:\n\njobs:\n  build-and-test:\n    runs-o"
  },
  {
    "path": ".gitignore",
    "chars": 2609,
    "preview": "# ignore project specific locations & files\n/lib/\n/obj/\n/bin/\n/doc/\n/install/\n*.callgraph\n\n# exclude personal makefile\n/"
  },
  {
    "path": ".gitmodules",
    "chars": 109,
    "preview": "[submodule \"tools/build_utils/fypp\"]\n\tpath = tools/build_utils/fypp\n\turl = https://github.com/aradi/fypp.git\n"
  },
  {
    "path": ".packit.yaml",
    "chars": 983,
    "preview": "specfile_path: tools/fedora/dbcsr.spec\nfiles_to_sync:\n  - src: tools/fedora/\n    dest: ./\n    delete: true\n    filters:\n"
  },
  {
    "path": ".pre-commit/check_header.py",
    "chars": 3939,
    "preview": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n#########################################################################"
  },
  {
    "path": ".pre-commit/clang-format-fypp.sh",
    "chars": 1545,
    "preview": "#!/usr/bin/env bash\n####################################################################################################"
  },
  {
    "path": ".pre-commit/headers/c_cpp.1",
    "chars": 808,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": ".pre-commit/headers/c_cpp.2",
    "chars": 909,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": ".pre-commit/headers/c_cpp.3",
    "chars": 808,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": ".pre-commit/headers/fortran.1",
    "chars": 808,
    "preview": "!--------------------------------------------------------------------------------------------------!\n! Copyright (C) by "
  },
  {
    "path": ".pre-commit/headers/fortran.2",
    "chars": 909,
    "preview": "!--------------------------------------------------------------------------------------------------!\n! Copyright (C) by "
  },
  {
    "path": ".pre-commit/headers/fypp.1",
    "chars": 816,
    "preview": "#!--------------------------------------------------------------------------------------------------!\n#! Copyright (C) b"
  },
  {
    "path": ".pre-commit/headers/script.1",
    "chars": 808,
    "preview": "####################################################################################################\n# Copyright (C) by "
  },
  {
    "path": ".pre-commit/headers/script.2",
    "chars": 808,
    "preview": "####################################################################################################\n# Copyright (C) by "
  },
  {
    "path": ".pre-commit-config.yaml",
    "chars": 1940,
    "preview": "default_language_version:\n    python: python3\n\nexclude: '^tools/(build_utils/fypp)'\nfail_fast: false\nminimum_pre_commit_"
  },
  {
    "path": ".ruff.toml",
    "chars": 61,
    "preview": "select = [\"E\", \"F\", \"B\"]\nline-length = 128\nignore = [\"B905\"]\n"
  },
  {
    "path": "AUTHORS",
    "chars": 1113,
    "preview": "Alfio Lazzaro <alfio.lazzaro@gmail.com>\nAndreas Gloeß <agloess@cp2k.org>\nChristian Pousa <pousa@cp2k.org>\nDorothea Golze"
  },
  {
    "path": "CMakeLists.txt",
    "chars": 13473,
    "preview": "cmake_minimum_required(VERSION 3.22)\n\nset(CMAKE_INTERPROCEDURAL_OPTIMIZATION FALSE FORCE)\n\n# include our cmake snippets\n"
  },
  {
    "path": "CONTRIBUTING.md",
    "chars": 3342,
    "preview": "# Contributing to DBCSR\nThe core of DBCSR is written in Fortran. All other languages must be supported through bindings."
  },
  {
    "path": "DBCSR.md",
    "chars": 2551,
    "preview": "---\nproject: DBCSR\nproject_github: https://github.com/cp2k/dbcsr\nproject_download: https://github.com/cp2k/dbcsr/release"
  },
  {
    "path": "LICENSE",
    "chars": 18109,
    "preview": "                    GNU GENERAL PUBLIC LICENSE\n                       Version 2, June 1991\n\n Copyright (C) 1989, 1991 Fr"
  },
  {
    "path": "README.md",
    "chars": 2333,
    "preview": "# DBCSR: Distributed Block Compressed Sparse Row matrix library\n\n[![Build Status Linux](https://github.com/cp2k/dbcsr/ac"
  },
  {
    "path": "VERSION",
    "chars": 159,
    "preview": "MAJOR = 2\nMINOR = 9\nPATCH = 1\n# A specific DATE (YYYY-MM-DD) fixes an official release, otherwise\n# it is considered Dev"
  },
  {
    "path": "cmake/CheckCompilerSupport.cmake",
    "chars": 943,
    "preview": "include(CheckFortranSourceCompiles)\n\nset(CHECK_PROGRAMS f2008-norm2.f90 f2008-block_construct.f90\n                   f20"
  },
  {
    "path": "cmake/CompilerConfiguration.cmake",
    "chars": 8346,
    "preview": "if (CMAKE_Fortran_COMPILER_ID STREQUAL \"GNU\")\n  set(CMAKE_Fortran_FLAGS \"${CMAKE_Fortran_FLAGS} -ffree-form -std=f2008ts"
  },
  {
    "path": "cmake/CustomTargets.cmake",
    "chars": 1743,
    "preview": "# =================================================================================================\n# BUILD DISTRIBUTION"
  },
  {
    "path": "cmake/GetGitRevisionDescription.cmake",
    "chars": 5123,
    "preview": "# - Returns a version string from Git\n#\n# These functions force a re-configure on each git commit so that you can\n# trus"
  },
  {
    "path": "cmake/GetGitRevisionDescription.cmake.in",
    "chars": 1283,
    "preview": "#\n# Internal file for GetGitRevisionDescription.cmake\n#\n# Requires CMake 2.6 or newer (uses the 'function' command)\n#\n# "
  },
  {
    "path": "cmake/compiler-tests/f2008-block_construct.f90",
    "chars": 880,
    "preview": "!--------------------------------------------------------------------------------------------------!\n! Copyright (C) by "
  },
  {
    "path": "cmake/compiler-tests/f2008-contiguous.f90",
    "chars": 1161,
    "preview": "!--------------------------------------------------------------------------------------------------!\n! Copyright (C) by "
  },
  {
    "path": "cmake/compiler-tests/f2008-norm2.f90",
    "chars": 927,
    "preview": "!--------------------------------------------------------------------------------------------------!\n! Copyright (C) by "
  },
  {
    "path": "cmake/compiler-tests/f95-reshape-order-allocatable.f90",
    "chars": 1100,
    "preview": "!--------------------------------------------------------------------------------------------------!\n! Copyright (C) by "
  },
  {
    "path": "cmake/fypp-sources.cmake",
    "chars": 3054,
    "preview": "add_custom_target(fypp) # common target for all fypp calls\n\n# Use a system-provided fypp if available, otherwise the bun"
  },
  {
    "path": "docs/CMakeLists.txt",
    "chars": 902,
    "preview": "# =================================================================================================\n# FORD - DOCUMENTATI"
  },
  {
    "path": "docs/guide/1-DBCSR/index.md",
    "chars": 488,
    "preview": "title: DBCSR\n\n# DBCSR\n\nDBCSR is a sparse matrix library designed to efficiently perform sparse matrix-matrix multiplicat"
  },
  {
    "path": "docs/guide/1-DBCSR/publications.md",
    "chars": 1170,
    "preview": "title: Publications\n\n# Publications\n\n- **General overview of the library**\n\nUrban Borstnik, Joost VandeVondele, Valery W"
  },
  {
    "path": "docs/guide/2-user-guide/1-installation/1-cmake-build-recipes.md",
    "chars": 7317,
    "preview": "title: CMake Build Recipes\n\n# DBCSR CMake Build Recipes\n\nFollowing are recipes for different combinations of compilers, "
  },
  {
    "path": "docs/guide/2-user-guide/1-installation/2-supported-compilers.md",
    "chars": 432,
    "preview": "title: Supported Compilers\n\n# Supported compilers\n\nDBCSR uses the Fortran 2008+ standard, which requires up-to-date comp"
  },
  {
    "path": "docs/guide/2-user-guide/1-installation/3-using-dbcsr-in-a-cmake-project.md",
    "chars": 1567,
    "preview": "title: Using DBCSR in a CMake project\n\n# Using DBCSR in a CMake project\n\nWe are providing CMake helper files to easily i"
  },
  {
    "path": "docs/guide/2-user-guide/1-installation/4-docker.md",
    "chars": 52,
    "preview": "title: Docker Images\n\n{!./tools/docker/README.md!}\n\n"
  },
  {
    "path": "docs/guide/2-user-guide/1-installation/index.md",
    "chars": 3595,
    "preview": "title: Install\n\n# Install\n\n## Prerequisites\n\nYou need:\n\n* [CMake](https://cmake.org/) (3.22+)\n* GNU make or Ninja\n* Fort"
  },
  {
    "path": "docs/guide/2-user-guide/2-tests/index.md",
    "chars": 3015,
    "preview": "title: Tests\n\n# Tests\n\n## Correctness tests\n\n- [[dbcsr_unittest_1(program)]] (fortran) : test matrix operations: add, mu"
  },
  {
    "path": "docs/guide/2-user-guide/3-examples/index.md",
    "chars": 1567,
    "preview": "title: Examples\n\n# Examples\n\n- [[dbcsr_example_1(program)]] : how to create a dbcsr matrix (fortran)\n- [[dbcsr_example_2"
  },
  {
    "path": "docs/guide/2-user-guide/4-gpu/index.md",
    "chars": 5188,
    "preview": "title: GPUs\n\n# Introduction\n\n[CP2K](https://github.com/cp2k/cp2k/) was initially enabled for GPUs by the means of the DB"
  },
  {
    "path": "docs/guide/2-user-guide/index.md",
    "chars": 18,
    "preview": "title: User Guide\n"
  },
  {
    "path": "docs/guide/3-developer-guide/1-tooling/index.md",
    "chars": 653,
    "preview": "title: Tooling\n\n# Build System\n\nWe support CMake for compilation. See [here](../../2-user-guide/1-installation/index.htm"
  },
  {
    "path": "docs/guide/3-developer-guide/2-documentation/index.md",
    "chars": 1178,
    "preview": "title: Documentation\n\n# Documentation\n\n## Build\n\nTo build the documentation you need [FORD](https://github.com/Fortran-F"
  },
  {
    "path": "docs/guide/3-developer-guide/3-programming/1-overview/index.md",
    "chars": 3432,
    "preview": "title: Overview\n\n# Code Architecture\n\n![DBCSR code architecture](dbcsr_mm_overview.png)\n\n```\ndbcsr/\n-- src/\n---- acc/: c"
  },
  {
    "path": "docs/guide/3-developer-guide/3-programming/2-accelerator-backend/1-code-structure.md",
    "chars": 462,
    "preview": "title: Code Structure\n\n# GPU Backend Code Architecture\n\n```\ndbcsr/\n-- src/\n---- acc/: contains interfaces to ACC and LIB"
  },
  {
    "path": "docs/guide/3-developer-guide/3-programming/2-accelerator-backend/2-libsmm_acc/1-kernels.md",
    "chars": 171,
    "preview": "title: Kernels\n\n![kernel parameters and memory](../../../../../media/images/libsmm_acc_parameters_and_memory.png){ width"
  },
  {
    "path": "docs/guide/3-developer-guide/3-programming/2-accelerator-backend/2-libsmm_acc/2-parameters.md",
    "chars": 516,
    "preview": "title: Kernel Parameters\n\n# Kernel Parameters\n\n## Batched Matrix-Matrix Multiplication Kernel Parameters\n\nThe batched ma"
  },
  {
    "path": "docs/guide/3-developer-guide/3-programming/2-accelerator-backend/2-libsmm_acc/3-tune.md",
    "chars": 69,
    "preview": "title: Autotuning Framework\n\n{!./src/acc/libsmm_acc/tune/README.md!}\n"
  },
  {
    "path": "docs/guide/3-developer-guide/3-programming/2-accelerator-backend/2-libsmm_acc/index.md",
    "chars": 52,
    "preview": "title: CUDA/HIP\n\n{!./src/acc/libsmm_acc/README.md!}\n"
  },
  {
    "path": "docs/guide/3-developer-guide/3-programming/2-accelerator-backend/3-libsmm_ocl/1-autotune.md",
    "chars": 61,
    "preview": "title: Autotune\n\n{!./src/acc/opencl/smm/README-autotune.md!}\n"
  },
  {
    "path": "docs/guide/3-developer-guide/3-programming/2-accelerator-backend/3-libsmm_ocl/2-bulktune.md",
    "chars": 63,
    "preview": "title: Parameters\n\n{!./src/acc/opencl/smm/README-bulktune.md!}\n"
  },
  {
    "path": "docs/guide/3-developer-guide/3-programming/2-accelerator-backend/3-libsmm_ocl/index.md",
    "chars": 82,
    "preview": "title: OpenCL\n\n{!./src/acc/opencl/README.md!}\n\n{!./src/acc/opencl/smm/README.md!}\n"
  },
  {
    "path": "docs/guide/3-developer-guide/3-programming/2-accelerator-backend/index.md",
    "chars": 52,
    "preview": "title: Accelerator Backend\n\n{!./src/acc/README.md!}\n"
  },
  {
    "path": "docs/guide/3-developer-guide/3-programming/index.md",
    "chars": 19,
    "preview": "title: Programming\n"
  },
  {
    "path": "docs/guide/3-developer-guide/4-performance/1-insights.md",
    "chars": 6901,
    "preview": "title: Insights\n\n# Insights into Performance\n\n## Read Timing & Statistics Reports\n\nAt the end of an output file, a repor"
  },
  {
    "path": "docs/guide/3-developer-guide/4-performance/2-just-in-time-compilation.md",
    "chars": 1805,
    "preview": "title: JIT\n\n# Just-In-Time (JIT) Compilation\n\nDBCSR's GPU backends rely on templated kernels for batched matrix multipli"
  },
  {
    "path": "docs/guide/3-developer-guide/4-performance/index.md",
    "chars": 19,
    "preview": "title: Performance\n"
  },
  {
    "path": "docs/guide/3-developer-guide/index.md",
    "chars": 23,
    "preview": "title: Developer Guide\n"
  },
  {
    "path": "docs/guide/index.md",
    "chars": 14,
    "preview": "title: Guide\n\n"
  },
  {
    "path": "examples/.gitignore",
    "chars": 4,
    "preview": "*.x\n"
  },
  {
    "path": "examples/CMakeLists.txt",
    "chars": 3066,
    "preview": "set(DBCSR_PROGRAM_SRCS_FTN dbcsr_example_1.F dbcsr_example_2.F\n                           dbcsr_example_3.F dbcsr_tensor"
  },
  {
    "path": "examples/README.md",
    "chars": 1504,
    "preview": "# Examples\n\n- [`dbcsr_example_1`](dbcsr_example_1.F): how to create a dbcsr matrix (fortran)\n- [`dbcsr_example_2`](dbcsr"
  },
  {
    "path": "examples/dbcsr_example_1.F",
    "chars": 5518,
    "preview": "!--------------------------------------------------------------------------------------------------!\n! Copyright (C) by "
  },
  {
    "path": "examples/dbcsr_example_2.F",
    "chars": 7029,
    "preview": "!--------------------------------------------------------------------------------------------------!\n! Copyright (C) by "
  },
  {
    "path": "examples/dbcsr_example_3.F",
    "chars": 8212,
    "preview": "!--------------------------------------------------------------------------------------------------!\n! Copyright (C) by "
  },
  {
    "path": "examples/dbcsr_example_3.cpp",
    "chars": 6528,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "examples/dbcsr_tensor_example_1.F",
    "chars": 40289,
    "preview": "!--------------------------------------------------------------------------------------------------!\n! Copyright (C) by "
  },
  {
    "path": "examples/dbcsr_tensor_example_2.cpp",
    "chars": 10722,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/.gitignore",
    "chars": 8,
    "preview": "!/dist/\n"
  },
  {
    "path": "src/CMakeLists.txt",
    "chars": 13672,
    "preview": "# =================================================================================================\n# INCLUDE\ninclude(fy"
  },
  {
    "path": "src/PACKAGE",
    "chars": 220,
    "preview": "{\n\"description\": \"Distributed Block Compressed Sparse Row, A sparse matrix library\",\n\"archive\": \"libdbcsr\",\n\"requires\": "
  },
  {
    "path": "src/acc/PACKAGE",
    "chars": 143,
    "preview": "{\n\"description\": \"Generic accelerator API\",\n\"archive\": \"libdbcsr\",\n\"requires\": [\"../base\", \"../core\", \"cuda\", \"hip\", \"op"
  },
  {
    "path": "src/acc/README.md",
    "chars": 3269,
    "preview": "# ACCelerator Interface\n\n## Backends\n\nThe accelerator interface (ACC) consists of ISO_C_BINDING based Fortran code of DB"
  },
  {
    "path": "src/acc/acc.h",
    "chars": 3145,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/acc_bench.c",
    "chars": 26684,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/acc_bench.h",
    "chars": 2988,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/acc_libsmm.h",
    "chars": 3190,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/acc_triplets.sh",
    "chars": 6450,
    "preview": "#!/usr/bin/env bash\n####################################################################################################"
  },
  {
    "path": "src/acc/cuda/Makefile",
    "chars": 9010,
    "preview": "# This Makefile builds the SMM benchmark driver without building DBCSR.\n# It is for testing and comparison with other im"
  },
  {
    "path": "src/acc/cuda/PACKAGE",
    "chars": 104,
    "preview": "{\n\"description\": \"Cuda backend for accelerator API\",\n\"archive\":\"libdbcsr\",\n\"requires\": [\"../../base\"]\n}\n"
  },
  {
    "path": "src/acc/cuda/acc_cuda.cpp",
    "chars": 1898,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/cuda/acc_cuda.h",
    "chars": 4235,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/cuda/dbcsr_cuda_nvtx_cu.cpp",
    "chars": 2542,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/cuda/dbcsr_cuda_profiling.F",
    "chars": 3501,
    "preview": "!--------------------------------------------------------------------------------------------------!\n! Copyright (C) by "
  },
  {
    "path": "src/acc/cuda_hip/PACKAGE",
    "chars": 135,
    "preview": "{\n\"description\": \"CUDA/HIP backend for accelerator API\",\n\"archive\":\"libdbcsr\",\n\"requires\": [\"../../base\", \"..\", \"../cuda"
  },
  {
    "path": "src/acc/cuda_hip/acc_blas.cpp",
    "chars": 2248,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/cuda_hip/acc_blas.h",
    "chars": 1619,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/cuda_hip/acc_dev.cpp",
    "chars": 2036,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/cuda_hip/acc_error.cpp",
    "chars": 1300,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/cuda_hip/acc_error.h",
    "chars": 1035,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/cuda_hip/acc_event.cpp",
    "chars": 3809,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/cuda_hip/acc_init.cpp",
    "chars": 1956,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/cuda_hip/acc_mem.cpp",
    "chars": 5360,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/cuda_hip/acc_stream.cpp",
    "chars": 2994,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/cuda_hip/acc_utils.cpp",
    "chars": 1212,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/cuda_hip/acc_utils.h",
    "chars": 907,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/cuda_hip/calculate_norms.cpp",
    "chars": 4496,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/dbcsr_acc_device.F",
    "chars": 4068,
    "preview": "!--------------------------------------------------------------------------------------------------!\n! Copyright (C) by "
  },
  {
    "path": "src/acc/dbcsr_acc_devmem.F",
    "chars": 21274,
    "preview": "!--------------------------------------------------------------------------------------------------!\n! Copyright (C) by "
  },
  {
    "path": "src/acc/dbcsr_acc_event.F",
    "chars": 8847,
    "preview": "!--------------------------------------------------------------------------------------------------!\n! Copyright (C) by "
  },
  {
    "path": "src/acc/dbcsr_acc_hostmem.F",
    "chars": 9521,
    "preview": "!--------------------------------------------------------------------------------------------------!\n! Copyright (C) by "
  },
  {
    "path": "src/acc/dbcsr_acc_init.F",
    "chars": 2753,
    "preview": "!--------------------------------------------------------------------------------------------------!\n! Copyright (C) by "
  },
  {
    "path": "src/acc/dbcsr_acc_stream.F",
    "chars": 7832,
    "preview": "!--------------------------------------------------------------------------------------------------!\n! Copyright (C) by "
  },
  {
    "path": "src/acc/dbcsr_acc_timings.F",
    "chars": 1999,
    "preview": "!--------------------------------------------------------------------------------------------------!\n! Copyright (C) by "
  },
  {
    "path": "src/acc/hip/PACKAGE",
    "chars": 103,
    "preview": "{\n\"description\": \"HIP backend for accelerator API\",\n\"archive\":\"libdbcsr\",\n\"requires\": [\"../../base\"]\n}\n"
  },
  {
    "path": "src/acc/hip/acc_hip.cpp",
    "chars": 2161,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/hip/acc_hip.h",
    "chars": 4905,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/hip/dbcsr_hip_profiling.F",
    "chars": 1623,
    "preview": "!--------------------------------------------------------------------------------------------------!\n! Copyright (C) by "
  },
  {
    "path": "src/acc/libsmm_acc/.gitignore",
    "chars": 79,
    "preview": "# Files generated at build stage\nparameters.h\nsmm_acc_kernels.h\n*.so\n.with_gpu\n"
  },
  {
    "path": "src/acc/libsmm_acc/CMakeLists.txt",
    "chars": 1416,
    "preview": "set(SMM_ACC_KERNELS\n    kernels/smm_acc_common.h\n    kernels/smm_acc_dnt_largeDB1.h\n    kernels/smm_acc_dnt_largeDB2.h\n "
  },
  {
    "path": "src/acc/libsmm_acc/PACKAGE",
    "chars": 162,
    "preview": "{\n\"description\": \"CUDA/HIP-accelerated library for small matrix multiplications\",\n\"archive\": \"libdbcsr\",\n\"requires\": [\"."
  },
  {
    "path": "src/acc/libsmm_acc/README.md",
    "chars": 4951,
    "preview": "# GPU Accelerated Small Matrix Multiplications\n\n`libsmm_acc` is a **lib**rary for **s**mall **m**atrix-**m**atrix multip"
  },
  {
    "path": "src/acc/libsmm_acc/generate_kernels.py",
    "chars": 5823,
    "preview": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n#########################################################################"
  },
  {
    "path": "src/acc/libsmm_acc/generate_parameters.py",
    "chars": 5692,
    "preview": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n#########################################################################"
  },
  {
    "path": "src/acc/libsmm_acc/kernels/PACKAGE",
    "chars": 92,
    "preview": "{\n\"description\": \"Kernel templates for libsmm_acc\",\n\"archive\": \"libdbcsr\",\n\"requires\": []\n}\n"
  },
  {
    "path": "src/acc/libsmm_acc/kernels/README.md",
    "chars": 2135,
    "preview": "# LIBSMM_ACC Kernels\n\n## Directory Organization\n\n* [`autotuning_properties.json`](https://github.com/cp2k/dbcsr/blob/dev"
  },
  {
    "path": "src/acc/libsmm_acc/kernels/__init__.py",
    "chars": 941,
    "preview": "# -*- coding: utf-8 -*-\n################################################################################################"
  },
  {
    "path": "src/acc/libsmm_acc/kernels/autotuning_properties.json",
    "chars": 408,
    "preview": "{\n    \"info\": {\n        \"header\": \"Autotuning characteristics\",\n        \"stack_size\": \"Number of block multiplications c"
  },
  {
    "path": "src/acc/libsmm_acc/kernels/gpu_properties.json",
    "chars": 10778,
    "preview": "{\n    \"info\": {\n        \"header\": \"Hardware-dependent constraints (GPU P100, compute capability 6.0)\",\n        \"nvidia_s"
  },
  {
    "path": "src/acc/libsmm_acc/kernels/smm_acc.py",
    "chars": 6032,
    "preview": "####################################################################################################\n# Copyright (C) by "
  },
  {
    "path": "src/acc/libsmm_acc/kernels/smm_acc_common.h",
    "chars": 2899,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/libsmm_acc/kernels/smm_acc_dnt_base.py",
    "chars": 7148,
    "preview": "# -*- coding: utf-8 -*-\n################################################################################################"
  },
  {
    "path": "src/acc/libsmm_acc/kernels/smm_acc_dnt_largeDB1.h",
    "chars": 13697,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/libsmm_acc/kernels/smm_acc_dnt_largeDB1.py",
    "chars": 9554,
    "preview": "# -*- coding: utf-8 -*-\n################################################################################################"
  },
  {
    "path": "src/acc/libsmm_acc/kernels/smm_acc_dnt_largeDB2.h",
    "chars": 13830,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/libsmm_acc/kernels/smm_acc_dnt_largeDB2.py",
    "chars": 9535,
    "preview": "# -*- coding: utf-8 -*-\n################################################################################################"
  },
  {
    "path": "src/acc/libsmm_acc/kernels/smm_acc_dnt_medium.h",
    "chars": 18644,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/libsmm_acc/kernels/smm_acc_dnt_medium.py",
    "chars": 8066,
    "preview": "# -*- coding: utf-8 -*-\n################################################################################################"
  },
  {
    "path": "src/acc/libsmm_acc/kernels/smm_acc_dnt_small.h",
    "chars": 9880,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/libsmm_acc/kernels/smm_acc_dnt_small.py",
    "chars": 7540,
    "preview": "# -*- coding: utf-8 -*-\n################################################################################################"
  },
  {
    "path": "src/acc/libsmm_acc/kernels/smm_acc_dnt_tiny.h",
    "chars": 7952,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/libsmm_acc/kernels/smm_acc_dnt_tiny.py",
    "chars": 6060,
    "preview": "# -*- coding: utf-8 -*-\n################################################################################################"
  },
  {
    "path": "src/acc/libsmm_acc/kernels/smm_acc_predict.py",
    "chars": 30723,
    "preview": "####################################################################################################\n# Copyright (C) by "
  },
  {
    "path": "src/acc/libsmm_acc/kernels/smm_acc_transpose.h",
    "chars": 3114,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/libsmm_acc/libcusmm/.gitignore",
    "chars": 16,
    "preview": "cusmm_kernels.h\n"
  },
  {
    "path": "src/acc/libsmm_acc/libcusmm/PACKAGE",
    "chars": 138,
    "preview": "{\n\"description\": \"Cuda accelerated Small Matrix Multiplications\",\n\"archive\": \"libdbcsr\",\n\"requires\": [\"kernels\", \"..\", \""
  },
  {
    "path": "src/acc/libsmm_acc/libsmm_acc.cpp",
    "chars": 21330,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/libsmm_acc/libsmm_acc.h",
    "chars": 2231,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/libsmm_acc/libsmm_acc_benchmark.cpp",
    "chars": 14835,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/libsmm_acc/libsmm_acc_benchmark.h",
    "chars": 3351,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/libsmm_acc/libsmm_acc_init.cpp",
    "chars": 3800,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/libsmm_acc/libsmm_acc_init.h",
    "chars": 1327,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/libsmm_acc/parameters/parameters_H100.json",
    "chars": 519112,
    "preview": "[\n{\"m\": 4, \"n\": 4, \"k\": 4, \"tile_m\": 1, \"tile_n\": 1, \"threads\": 32, \"grouping\": 7, \"minblocks\": 13, \"algorithm\": \"medium"
  },
  {
    "path": "src/acc/libsmm_acc/parameters/parameters_K20X.json",
    "chars": 542668,
    "preview": "[\n{\"m\": 4, \"n\": 4, \"k\": 4, \"threads\": 64, \"grouping\": 16, \"minblocks\": 1, \"algorithm\": \"tiny\", \"perf\": 16.5663, \"source\""
  },
  {
    "path": "src/acc/libsmm_acc/parameters/parameters_K40.json",
    "chars": 3218,
    "preview": "[\n{\"m\": 5, \"n\": 5, \"k\": 5, \"threads\": 64, \"grouping\": 16, \"minblocks\": 1, \"algorithm\": \"tiny\", \"perf\": 30.4899, \"source\""
  },
  {
    "path": "src/acc/libsmm_acc/parameters/parameters_K80.json",
    "chars": 3202,
    "preview": "[\n{\"m\": 5, \"n\": 5, \"k\": 5, \"threads\": 64, \"grouping\": 16, \"minblocks\": 1, \"algorithm\": \"tiny\", \"perf\": 33.4544, \"source\""
  },
  {
    "path": "src/acc/libsmm_acc/parameters/parameters_Mi100.json",
    "chars": 69132,
    "preview": "[\n{\"m\": 3, \"n\": 3, \"k\": 3, \"tile_m\": 1, \"tile_n\": 1, \"threads\": 64, \"grouping\": 6, \"minblocks\": 2, \"algorithm\": \"medium\""
  },
  {
    "path": "src/acc/libsmm_acc/parameters/parameters_Mi250.json",
    "chars": 71118,
    "preview": "[\n{\"m\": 3, \"n\": 3, \"k\": 3, \"tile_m\": 1, \"tile_n\": 1, \"threads\": 64, \"grouping\": 20, \"minblocks\": 8, \"algorithm\": \"medium"
  },
  {
    "path": "src/acc/libsmm_acc/parameters/parameters_Mi300.json",
    "chars": 71118,
    "preview": "[\n{\"m\": 3, \"n\": 3, \"k\": 3, \"tile_m\": 1, \"tile_n\": 1, \"threads\": 64, \"grouping\": 20, \"minblocks\": 8, \"algorithm\": \"medium"
  },
  {
    "path": "src/acc/libsmm_acc/parameters/parameters_Mi350.json",
    "chars": 71118,
    "preview": "[\n{\"m\": 3, \"n\": 3, \"k\": 3, \"tile_m\": 1, \"tile_n\": 1, \"threads\": 64, \"grouping\": 20, \"minblocks\": 8, \"algorithm\": \"medium"
  },
  {
    "path": "src/acc/libsmm_acc/parameters/parameters_Mi50.json",
    "chars": 1711,
    "preview": "[\n{\"m\": 4, \"n\": 4, \"k\": 4, \"tile_m\": 1, \"tile_n\": 1, \"threads\": 64, \"grouping\": 17, \"minblocks\": 13, \"algorithm\": \"mediu"
  },
  {
    "path": "src/acc/libsmm_acc/parameters/parameters_V100.json",
    "chars": 9402714,
    "preview": "[\n{\"m\": 4, \"n\": 4, \"k\": 4, \"tile_m\": 1, \"tile_n\": 1, \"threads\": 32, \"grouping\": 10, \"minblocks\": 1, \"algorithm\": \"medium"
  },
  {
    "path": "src/acc/libsmm_acc/parameters_utils.h",
    "chars": 1737,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/libsmm_acc/tune/.gitignore",
    "chars": 7,
    "preview": "tune_*/"
  },
  {
    "path": "src/acc/libsmm_acc/tune/README.md",
    "chars": 9927,
    "preview": "# Auto-tuning Procedure for Finding Optimal CUDA/HIP Kernel Parameters in `libsmm_acc`\n\nThe performance of the matrix-ma"
  },
  {
    "path": "src/acc/libsmm_acc/tune/archive.sh",
    "chars": 1105,
    "preview": "#!/bin/bash -e\n####################################################################################################\n# Co"
  },
  {
    "path": "src/acc/libsmm_acc/tune/cleanup.sh",
    "chars": 875,
    "preview": "#!/bin/bash -e\n####################################################################################################\n# Co"
  },
  {
    "path": "src/acc/libsmm_acc/tune/requirements.txt",
    "chars": 14,
    "preview": "numpy==1.22.0\n"
  },
  {
    "path": "src/acc/libsmm_acc/tune/tune_collect.py",
    "chars": 5011,
    "preview": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n#########################################################################"
  },
  {
    "path": "src/acc/libsmm_acc/tune/tune_merge.py",
    "chars": 3209,
    "preview": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n#########################################################################"
  },
  {
    "path": "src/acc/libsmm_acc/tune/tune_setup.py",
    "chars": 18207,
    "preview": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n#########################################################################"
  },
  {
    "path": "src/acc/libsmm_acc/tune/tune_submit.py",
    "chars": 4299,
    "preview": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n#########################################################################"
  },
  {
    "path": "src/acc/opencl/Makefile",
    "chars": 10126,
    "preview": "MAKDIR := $(subst //,,$(dir $(firstword $(MAKEFILE_LIST)))/)\nACCDIR := $(MAKDIR)/..\nINCACC := $(wildcard $(MAKDIR)/*.h*)"
  },
  {
    "path": "src/acc/opencl/PACKAGE",
    "chars": 99,
    "preview": "{\n\"description\": \"OpenCL backend for accelerator API\",\n\"archive\": \"libdbcsr\",\n\"requires\": [\"..\"]\n}\n"
  },
  {
    "path": "src/acc/opencl/README.md",
    "chars": 3215,
    "preview": "# Backend\n\nThe OpenCL backend implements the [ACC interface](https://github.com/cp2k/dbcsr/blob/develop/src/acc/acc.h), "
  },
  {
    "path": "src/acc/opencl/acc_getenv.sh",
    "chars": 2111,
    "preview": "#!/usr/bin/env bash\n####################################################################################################"
  },
  {
    "path": "src/acc/opencl/acc_opencl.c",
    "chars": 95761,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/opencl/acc_opencl.h",
    "chars": 20583,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/opencl/acc_opencl.sh",
    "chars": 9047,
    "preview": "#!/usr/bin/env bash\n####################################################################################################"
  },
  {
    "path": "src/acc/opencl/acc_opencl_event.c",
    "chars": 8774,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/opencl/acc_opencl_mem.c",
    "chars": 47665,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/opencl/acc_opencl_stream.c",
    "chars": 14719,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/opencl/common/opencl_atomics.h",
    "chars": 5687,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/opencl/common/opencl_common.h",
    "chars": 2430,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/opencl/smm/.gitignore",
    "chars": 27,
    "preview": "opencl_kernels.h\n.with_gpu\n"
  },
  {
    "path": "src/acc/opencl/smm/CMakeLists.txt",
    "chars": 1883,
    "preview": "set(DBCSR_OPENCL_PARAMS_WITHGPU\n    ${CMAKE_CURRENT_SOURCE_DIR}/params/tune_multiply_${WITH_GPU}.csv)\nset(DBCSR_OPENCL_P"
  },
  {
    "path": "src/acc/opencl/smm/PACKAGE",
    "chars": 133,
    "preview": "{\n\"description\": \"OpenCL-accelerated library for small matrix multiplications\",\n\"archive\": \"libdbcsr\",\n\"requires\": [\"..\""
  },
  {
    "path": "src/acc/opencl/smm/README-autotune.md",
    "chars": 5530,
    "preview": "# Auto Tuning\n\nAuto tuning code for performance is a practical way to find the \"best\" setting for parameterized code (e."
  },
  {
    "path": "src/acc/opencl/smm/README-bulktune.md",
    "chars": 5886,
    "preview": "# Optimized Kernels\n\nOptimized kernel parameters are stored in JSON-files and are automatically summarized into a CSV-fi"
  },
  {
    "path": "src/acc/opencl/smm/README.md",
    "chars": 3563,
    "preview": "# LIBSMM\n\nThe LIBSMM library implements the [ACC LIBSMM interface](https://github.com/cp2k/dbcsr/blob/develop/src/acc/ac"
  },
  {
    "path": "src/acc/opencl/smm/kernels/multiply.cl",
    "chars": 18314,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/opencl/smm/kernels/transpose.cl",
    "chars": 2554,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  },
  {
    "path": "src/acc/opencl/smm/opencl_libsmm.c",
    "chars": 64434,
    "preview": "/*------------------------------------------------------------------------------------------------*/\n/* Copyright (C) by"
  }
]

// ... and 195 more files (download for full content)

