[
  {
    "path": ".gitignore",
    "content": "# Compiled Object files\n*.o\n*.obj\n\n# Other outputs from LLVM\n*.ll\n*.spv\n*.spt\n*.ptx\n*.bc\n\n# Compiled Dynamic libraries\n*.so\n*.dylib\n*.dll\n\n# Compiled Static libraries\n*.a\n*.lib\n__dummy_docs\n# Executables\n*.exe\n\n# Code coverage\n*.lst\n\n# DUB\n.dub\ndocs.json\n__dummy.html\n\n.DS_Store\n"
  },
  {
    "path": "LICENSE.txt",
    "content": "Boost Software License - Version 1.0 - August 17th, 2003\n\nPermission is hereby granted, free of charge, to any person or organization\nobtaining a copy of the software and accompanying documentation covered by\nthis license (the \"Software\") to use, reproduce, display, distribute,\nexecute, and transmit the Software, and to prepare derivative works of the\nSoftware, and to permit third-parties to whom the Software is furnished to\ndo so, all subject to the following:\n\nThe copyright notices in the Software and this entire statement, including\nthe above license grant, this restriction and the following disclaimer,\nmust be included in all copies of the Software, in whole or in part, and\nall derivative works of the Software, unless such copies or derivative\nworks are solely in the form of machine-executable object code generated by\na source language processor.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT\nSHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE\nFOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,\nARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER\nDEALINGS IN THE SOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# dcompute\n\n[![Latest version](https://img.shields.io/dub/v/dcompute.svg)](http://code.dlang.org/packages/dcompute)\n[![Latest version](https://img.shields.io/github/tag/libmir/dcompute.svg?maxAge=3600)](http://code.dlang.org/packages/dcompute)\n[![License](https://img.shields.io/dub/l/dcompute.svg)](http://code.dlang.org/packages/dcompute)\n[![Gitter](https://img.shields.io/gitter/room/libmir/public.svg)](https://gitter.im/libmir/public)\n\n## About\n\nThis project is a set of libraries designed to work with [LDC][1] to \nenable native execution of D on GPUs (and other more exotic targets of OpenCL such as FPGAs DSPs, hereafter just 'GPUs') on the OpenCL and CUDA runtimes. As DCompute depends on developments in LDC for the code generation, a relatively recent LDC is required, use [1.8.0](https://github.com/ldc-developers/ldc/releases/tag/v1.8.0) or newer.\n\nThere are four main parts: \n* [std](https://github.com/libmir/dcompute/tree/master/source/dcompute/std): A library containing standard functionality for targetting GPUs and abstractions over the intrinsics of OpenCL and CUDA.\n* [driver](https://github.com/libmir/dcompute/tree/master/source/dcompute/driver): For handling all the compute API interactions and provide a friendly, easy-to-use, consistent interface. Of course you can always get down to a lower level of interaction if you need to. You can also use this to execute non-D kernels (e.g. OpenCL or CUDA).\n* [kernels](https://github.com/libmir/dcompute/tree/master/source/dcompute/kernels): A set of standard kernels and primitives to cover a large number of use cases and serve as documentation on how (and how not) to use this library.\n* [tests](https://github.com/libmir/dcompute/tree/master/source/dcompute/tests): A framework for testing kernels. The suite is runnable with `dub test` (see `dub.json` for the configuration used).\n\n## Examples\n\n> **Note:** The `@kernel()` syntax requires LDC 1.42 or later. 
If you are using an older version of LDC, please use `@kernel` (without parentheses).\n\nKernel:\n```\n@kernel() void saxpy(GlobalPointer!(float) res,\n                   float alpha,\n                   GlobalPointer!(float) x,\n                   GlobalPointer!(float) y, \n                   size_t N)\n{\n    auto i = GlobalIndex.x;\n    if (i >= N) return;\n    res[i] = alpha*x[i] + y[i];\n}\n```\n\nInvoke with (CUDA):\n```\nq.enqueue!(saxpy)\n    ([N,1,1],[1,1,1]) // Grid & block & optional shared memory\n    (b_res,alpha,b_x,b_y, N); // kernel arguments\n```\nequivalent to the CUDA code\n```\nsaxpy<<<N,1,0,q>>>(b_res,alpha,b_x,b_y, N);\n```\n\nFor more examples and the full code see `source/dcompute/tests`.\n\n## Build Instructions\n\nTo build DCompute you will need:\n* [ldc][1] as the D compiler.\n* a SPIRV capable LLVM (available [here](https://github.com/thewilsonator/llvm/tree/compute)) to build ldc with SPIRV support (required for OpenCL),\n* or LDC built with any LLVM 3.9.1 or greater that has the NVPTX backend enabled, to support CUDA.\n* [dub](https://github.com/dlang/dub), then just run `dub build`.\n\nAlternatively, you can include dcompute as a dependency, as shown below:\n  * add\n    ```json\n\t\"dependencies\": {\n\t\t\"dcompute\": {\n\t\t\t\"version\": \"~>0.1.1\"\n\t\t}\n\t},\n    ```\n    to your `dub.json`. You should also include the following flags under `dflags-ldc`, which are passed to the compiler:\n\t```json\n\t\"dflags-ldc\": [\"-mdcompute-targets=cuda-800\",\"-mdcompute-targets=ocl-300\",\"-version=LDC_DCompute\",\"-oq\"],\n\t```\n\tThese flags tell LDC which targets to generate code for. You can run `ldc2 --help` to look for that flag. Use `ocl-xy0` for OpenCL x.y and `cuda-xy0` for CUDA Compute Capability x.y, so the above flags are for OpenCL 3.0 and CUDA CC 8.0. 
The two flags must be included separately as shown above.\n    * If you get an error saying `Need to use a DCompute enabled compiler`, you likely forgot the `-mdcompute-targets` flags.\n    * Check NVIDIA's [website](https://developer.nvidia.com/cuda-gpus) for your CUDA Compute Capability.\n  * Alternatively, if you use `dub.sdl`, add `dependency \"dcompute\" version=\"~>0.1.1\"` to it and include the dflags.\n\n\nIf you get an error like `Error: unrecognized switch '-mdcompute-targets=cuda-210'`, make sure you are using LDC and not DMD: passing `--compiler=/path/to/ldc2` to dub will force it to use `/path/to/ldc2` as the D compiler.\n\nIf you need to build ldc yourself, you will also need cmake and a dmd-compatible D compiler: [dmd](https://github.com/dlang/dmd), ldmd or gdmd (the latter two available as part of [ldc][1] and [gdc](https://github.com/D-Programming-GDC/GDC) respectively).\n \n## Getting Started\n\nPlease see the [documentation](https://github.com/libmir/dcompute/blob/master/docs/README.md).\n\n## TODO\n\nGenerate OpenCL builtins from [here](https://github.com/KhronosGroup/SPIR-Tools/wiki/SPIR-2.0-built-in-functions)\n\n[1]: https://github.com/ldc-developers/ldc\n\n\n### Our sponsors\n\n[<img src=\"https://raw.githubusercontent.com/libmir/mir-algorithm/master/images/symmetry.png\" height=\"80\" />](http://symmetryinvestments.com/) \t&nbsp; \t&nbsp;\t&nbsp;\t&nbsp;\n[<img src=\"https://raw.githubusercontent.com/libmir/mir-algorithm/master/images/kaleidic.jpeg\" height=\"80\" />](https://github.com/kaleidicassociates)\n"
  },
  {
    "path": "docs/00-prerequsites.md",
    "content": "# Prerequisites\n\nIn order to use DCompute there are a few things you need before you start:\n\n* Capable hardware\n\n* Drivers for said hardware\n\n* LDC: the LLVM D compiler\n\n## Hardware & Drivers\n\nFor NVidia users any GPU with compute capability 2.1 or higher should work, \nalthough the hardware will dictate the available functionality.\nYou'll need to intall the CUDA development tools.\n\nFor everyone else you will need either a CPU or GPU (or other accellerator) \nwith an OpenCL 2.1 or higher device implementation.\n\n## LDC\n\nDue to the fact that DCompute leverages the LLVM NVPTX (for CUDA) & SPIR-V (for OpenCL)\nbackends to generate compute kernel code.\n\nTo see what targets your version of LDC has, execute `ldc2 -version`.\nWe aim to support the most recent releases of LDC, but due to the nature of development\nsome features in DCompute are dependent on features of LDC that may require upgrading your\ncompiler.\n\nIf you wish to be on the bleeding edge we recommend building LLVM & LDC from source. \nBe warned that LLVM has the tendency to break compatibility with LDC so expect that you may\nhave to revert syncing with LLVM. This goes the other way too fixing LDC to be compatible \nwith LLVM trunk will likely break it with a slightly older trunk.\n\n"
  },
  {
    "path": "docs/01-installation.md",
    "content": "Installation\n============\n\nLDC\n---\n\nAs mentioned previously DCompute requires the use of LDC as the D compiler.\nAll [recent releases of LDC](https://github.com/ldc-developers/ldc/releases)\nhave the NVPTX backend enabled for targetting NVidia hardware via CUDA.\n\nTo verify that your LDC build can target both nvptx and spirv backends, you\ncan run `ldc2 --version` and look for `nvptx` and `nvptx64` as well as\n`spirv32` and `spirv64` under Registered targets.\n\nDCompute\n--------\n\nIf you are using dub (highly recommended) then all you need to do is add \n`\"dcompute\": \"~>0.1.1\"` to your dub.json or \n`dependency \"dcompute\" version=\"~>0.1.1\"` to your dub.sdl \ndependencies and you should be good to go and can ignore the rest of this section.\n\nIf you are not using dub DCompute has a few of dependencies that you need to \ninclude:\n\n* [derelict-cl](https://github.com/DerelictOrg/DerelictCL) for OpenCL bindings\n* [bindbc-cuda](https://github.com/badnikhil/bindbc-cuda) for CUDA bindings\n* [derelict-util](https://github.com/DerelictOrg/DerelictUtil) shared library loading utilities used by derelict-cl\n\nConfiguring bindbc-cuda\n-----------------------\n\nUnlike the previous Derelict bindings, `bindbc-cuda` requires you to specify which\nCUDA Driver API version to target via a D version flag in your `dub.json`.\nThis controls which host-side CUDA functions (e.g. 
`cuMemPrefetchAsync`) are available.\n\nAdd the appropriate version to your `dub.json` configuration:\n\n```json\n\"versions\": [\"CUDA_120\"]\n```\n\nSupported version flags: `CUDA_100`, `CUDA_101`, `CUDA_102`, `CUDA_110`, `CUDA_111`,\n`CUDA_112`, `CUDA_118`, `CUDA_120`, `CUDA_122`, `CUDA_124`, `CUDA_130`, `CUDA_132`.\n\nIf no version flag is specified, `bindbc-cuda` defaults to `CUDA_100` (CUDA 10.0).\nChoose the version that matches the CUDA toolkit installed on your system; you can\ncheck yours by running `nvcc --version`.\n\n**Note:** This version flag is independent of the LDC `-mdcompute-targets` flag.\nThe `dflags` target (e.g. `cuda-210`) controls which GPU hardware architecture\nLDC generates PTX code for, while the `versions` flag controls which driver API\nfunctions are available on the host side.\n\nDrivers\n-------\n\nTo utilise the hardware you need drivers that implement OpenCL 2.1 or higher or CUDA.\nPlease consult your hardware vendor's website for drivers.\n\nTODO: add a list.\n"
  },
  {
    "path": "docs/02-hardware.md",
    "content": "Hardware\n========\n\nWriting code for DCompute kernels is a bit different from regular CPU programming.\n\nMost noticable is that you write the kernel as the body of a for loop that is then\nvectorised and run in parallel by the `device`. As a consequence of this, there are\nno sequencing guaruntees and branching is done as vector mask operations. This\nincludes `while` style loops, they will continue until every lane of the vector has\ncompleted the loop.\n\nVirtual function and function pointers are infeasable and therefore not supported,\nthis includes classes and delegates. Alias template parameters still work.\n\nDue to the large number of concurrent threads, it is very easy to end up with a\ndata race, not help by the fact that any synchronisation (fences, atomics) must\nbe done manually. Fences and atomics can be quite expensive.\n\nCPUs\n----\nCaches are present and reasonable in size. Vectors are relatively short. Branch\nprediction is good.\n\nGPUs\n----\n\nCaches may be present but are much smaller relative to the number of threads.\nVectors are generally wider than CPUs. Branch prediction is absent. Top level\ndcache is small, you really dont want to spill your stack. Texture fetch means\nyou can load from nearby in 2D or 3D efficiently.\n\nFPGAs\n-----\n\nInstructions are in hardware, each and every one of them counts: shrinking your\ninstruction count can increase your vector width as vector width is determined\nby the available datapaths. Execution speed is determined by dataflow.\nTiming is very important. You tell a CPU what to do, while you tell an FPGA what to be\n\n"
  },
  {
    "path": "docs/03-kernels.md",
    "content": "Kernels\n=======\n\nAt the heart of DCompute is are the special attributes `@compute` and `@kernel()` from the module `ldc.dcompute`\n\n> **Note:** The `@kernel()` syntax requires LDC 1.42 or later. If you are using an older version of LDC, please use `@kernel` (without parentheses).\n\n`@compute` tell the d compiler that this module should be built to target the device. \n`@compute` takes a single parameter that Indicate wether to target only the device \n(`@compute(CompileFor.deviceOnly)`) or to target host as well (`@compute(CompileFor.hostAndDevice)`).\n\n`@kernel()` specifies that the attached function should be an entry point for the device,\ni.e. you can tell the driver to execute this function on the device, \nwhereas you can't for functions that aren't marked `@kernel()`.\n\nAddress Spaces \n--------------\n\nAlso critical in using DCompute is the notion of address spaced pointers.\nThese are available from the module `ldc.dcompute` in the form of the magic template\n`Pointer!(uint addrspace,T)` which is a pointer to a `T` that resides in the address space `addrspace`. \nthere are 5 address spaces Global, Shared, Constant, Private and Generic.\n\nGlobal is available to all tasks on the device. It is the only address space that the host can both read and write. \n\nShared is memory that is local to a group of threads/work items. \nThreads (or work items in OpenCL speak) are the unit of execution.\n\nConstant memory is memory that is writeable by the host but read only by the device\nand is kind of like read only pages but is has some spacial chaching properties.\n\nPrivate memory is local to a thread and contains its registers and stack. \n\nGeneric is not really an address space but a Generic pointer can point anywhere in \nthe other address spaces and is useful if you are writing library routines that \ndon't know ahead of time where the pointer will point to. 
You could of course just template the address space.\n\nFor more information on this concept just search for documentation on OpenCL and/or CUDA.\n\nThe table below shows the equivalent terms in DCompute, OpenCL and CUDA.\n\n|  DCompute  |  OpenCL      |   CUDA         |\n|------------|--------------|----------------|\n|   Global   | `__global`   | `__device__`   |\n|   Shared   | `__local`    | `__shared__`   |\n|   Constant | `__constant` | `__constant__` |\n|   Private  | `__private`  | `__local__`    |\n|   Generic  | `__generic`  | (no qualifier) |\n\n\nHello World\n-----------\n\nAbout the simplest kernel you can have is shown below (note that `@kernel()` functions MUST return `void` or you'll get errors).\n\n```d\n@compute(CompileFor.deviceOnly) module mykernels;\nimport ldc.dcompute;\n@kernel() void mykernel(GlobalPointer!float a,GlobalPointer!float b, float c)\n{\n    *a = *b + c;\n}\n```\n\nIt's not a very useful kernel because it only assigns to the first element of `a`.\n\nCompile with `ldc2 -mdcompute-targets=ocl-210,cuda-350 -oq` to target OpenCL 2.1 and CUDA SM 3.5.\n\nNon D kernels\n-------------\n\nWhile a major part of DCompute is being able to write kernels in D, there is nothing stopping\nyou using it as a nicer wrapper for kernels written in e.g. OpenCL C or CUDA.\nAll that you need to ensure is that the (mangled) name and signature of the kernel's D declaration match\nits definition in the other language, and you can use it as if it were a D kernel.\n\nFor OpenCL this means declaring the kernels `extern(C)`, for CUDA `extern(C++)` unless the kernel is declared\n`extern \"C\"`, in which case use `extern(C)`. You will also need to alter the build process to compile and link\nthe foreign kernel.\n\nE.g.\nOpenCL:\n```opencl\n__kernel void foo() {}\n```\n\nCUDA:\n```cuda\nextern \"C\" __global__ void foo() {}\n```\n\nD:\n```d\n@compute(CompileFor.deviceOnly)\nmodule bar;\n\nextern(C) @kernel() void foo();\n```\n\n"
  },
  {
    "path": "docs/04-std/00-intro.md",
    "content": "The device standard library\n============================\n\nMuch like the regular standard library the device standard library \n(`dcompute.std.*`) provides implementations of common functions,\nusually implemented as compiler intrinsics.\n"
  },
  {
    "path": "docs/04-std/01-index.md",
    "content": "Index\n=====\n\nTo do anything useful with DCompute a thread needs to know it's index, it's position.\nIf you take a look at `dcompute.std.index` you'll see there are quite a few to choose from.\nMost of the indices are three dimensional and represent offsets in a \"3D\" view of memory.\nOf course not all problems are 3D so the y and z values are not always useful.\n\nIf you come from OpenCL or CUDA the table below should help you familiarise yourself with the different indices available.\n\nIndex Terminology:\n\n| DCompute           | CUDA                        | OpenCL\n|--------------------|-----------------------------|--------\n| GlobalDimension    | `gridDim*blockDim`            | get_global_size()\n| GlobalIndex        | `blockDim*blockIdx+threadIdx` | get_global_id()\n|                    |                             |\n| GroupDimension     | gridDim                     | get_num_groups()\n| GroupIndex         | blockIdx                    | get_group_id()\n|                    |                             |\n| SharedDimension    | blockDim                    | get_local_size()\n| SharedIndex        | threadIdx                   | get_local_id()\n|                    |                             |\n| GlobalIndex.linear | A nasty calculation         | get_global_linear_id()\n| SharedIndex.linear | Ditto                       | get_local_linear_id()\n\nNote:\n\\*Index.{x,y,z} are bounded by \\*Dimension.{x,y,z}\n\nUse SharedIndex's to index Shared Memory and GlobalIndex's to index Global Memory\n\nA Group is the ratio of Global to Shared. GroupDimension is NOT the size of a single\ngroup, (thats SharedDimension) rather it is the number of groups along a given dimension.\nSimilarly GroupIndex is how many units of the SharedDimension along a given dimension.\n\nExtending the previous example to add a constant to an array and assign it to another \n(we could have also used `GlobalIndex.linear`). 
We have:\n\n```d\n@compute(CompileFor.deviceOnly) module mykernels;\nimport ldc.attributes;\nimport ldc.dcompute;\nimport dcompute.std.index;\nalias gf = GlobalPointer!float;\n@kernel() void mykernel(gf a, gf b, float c)\n{\n    auto i = GlobalIndex.x;\n    a[i] = b[i] + c;\n}\n```\n\nCompile it with the same command line as before.\n\nAutoindex\n---------\n\n`AutoIndexed` is a type that automatically indexes a `GlobalPointer` or `SharedPointer`\nto make kernel lambdas nicer to use.\n"
  },
  {
    "path": "docs/05-driver/00-intro.md",
    "content": "Driver\n======\n\nNow that you've successfully written your kernel, how do you execute it?\nThat's the job of the driver.\n\nThe driver (`dcompute.driver`) manages the interactions with the compute APIs\n(OpenCL and CUDA). This doesn't stop you interacting with them directly, it\njust provides you with a consistent and (as much as is possible) a boiler-plate \nfree interface.\n\nAPI objects\n-----------\n\nThere are a number of driver API objects that wrap the underlying compute API \nobjects. They are summarised briefly below. More in depth information is available\nin the corresponding subsection of this chapter.\n\n**Platform:** Represents one implementation of a compute API. You can query object for the\ndevices that are available though it.\n\n**Device:** Represents a unit of execution (e.g. a GPU). Group devices together to form a\ncontext. You can query a large number of properties about performance characteristics\nand available memory.\n\n**Context:** A key API object. You create queues, buffers/images, samplers and programs from it.\n\n**Memory:** Represents a region of memory. An abstract base class of buffers & images.\n\n**Buffers:** Represents a 1,2 or 3D (possibly strided) linear view of memory.\n\n**Images:**  Represents a 1,2 or 3D view of memory whose layout is determined by the format of the\nimage (number and datatype of the channels).\n\n**Programs:** Represents a hunk of code for a context. You can create Kernels from a linked \nprogram (i.e. 
all external dependencies resolved).\n\n**Queue:** Represents a list of work (data transfers & kernel launches) and the graph of their\ndependencies.\n\n**Kernel:** Represents a callable function from a Program and associated function parameters.\nSubmit kernels with supplied parameters to a queue to execute them on the queue's\ncontext's devices.\n\n**Event:** Represents a future return value from executing an asynchronous operation, such\nas a data transfer or kernel launch.\n\n# Running a Kernel\n\nNow, let's run the `mykernel` kernel that we have built up (see `04-std/01-index.md`). Recall\nthat our kernel code should be in a separate file. For our main function, we can have something\nlike the code shown below. This assumes compilation for the CUDA backend. Note that we import our\n`mykernels` module containing our kernel code and the dcompute driver for CUDA.\n\n```d\nimport std.stdio;\nimport ldc.dcompute;\nimport std.algorithm;\nimport std.file;\nimport std.traits;\nimport std.meta;\nimport std.exception : enforce;\nimport std.experimental.allocator;\nimport std.array;\nimport mykernels;\nimport dcompute.driver.cuda;\n\nint main()\n{\n    enum size_t N = 128;\n    float c = 5.0;\n    float[N] res, x;\n    foreach (i; 0 .. N)\n    {\n        x[i] = i;\n    }\n\n    Platform.initialise();\n\n    auto devs = Platform.getDevices(theAllocator);\n    auto dev   = devs[0];\n    auto ctx   = Context(dev); scope(exit) ctx.detach();\n\n    // Change the file to match your GPU.\n    Program.globalProgram = Program.fromFile(\"kernels_cuda800_64.ptx\");\n    auto q = Queue(false);\n\n    Buffer!(float) b_res, b_x;\n    b_res =  Buffer!(float)(res[]); scope(exit) b_res.release();\n    b_x   =  Buffer!(float)(x[]);   scope(exit) b_x.release();\n\n    b_x.copy!(Copy.hostToDevice);\n\n    q.enqueue!(mykernel)\n              ([N,1,1],[1,1,1])\n              (b_res,b_x,c);\n    b_res.copy!(Copy.deviceToHost);\n\n    foreach(i; 0 .. 
N)\n        enforce(res[i] == x[i] + c);\n    writeln(res[]);\n\n    return 0;\n}\n```\nIt is important to change the file path on the `Program.fromFile(\"kernels_cuda800_64.ptx\")` line\nto the PTX file generated by the compilation step. Depending on how you set up dub, it may be in\n`./.dub/obj` or just your project directory. You should verify that your kernels actually show\nup in the PTX file after running `dub build` (it's in plaintext).\n\nWith the above example, we should get a successful run with the integers from 5 to 132 printed, since\nour kernel adds `c`, which is 5 in this case, to the input vector, which holds 0 to 127.\n\nSee `source/dcompute/tests` for examples of a slightly more complicated kernel and of running with the OpenCL driver.\n"
  },
  {
    "path": "docs/README.md",
    "content": "## Welcome to the DCompute documentation!\n\nDCompute is a library that together with LDC is able to make D compile on GPU.\n\nDcompute is split into three sections, a driver, a standard library and a set of prewritten kernels.\n\nThe driver is intended to abstract the (rather unwieldy) compute API of CUDA and OpenCL.\nBut you can still pull all the leavers yourself if you feel the need.\n\nThe standard library contains the set of primitive operations exposed by the compute environment as well as other common operations.\n\nThese docs are designed to help getting started installing & using DCompute. \n\n## Table of Contents\n\n0. Prerequisites to using DCompute\n1. Installing DCompute\n2. Understanding the hardware that DCompute runs on\n3. Writing kernels\n4. The device standard library\n4.1 index\n5. The compute API driver\n\nYou can find the corresponding Readme for each of the listed items in the parent `docs` directory, labelled with names\nstarting with 00 through 05. For the device standard library and compute API driver, look in the \nsubdirectories `04-std` and `05-driver`, respectively. These instructions will help you install and execute\nyour first kernel with DCompute.\n\n## D Basics Refresher\n\nThis guide assumes that the reader is familiar with the basics of D, although anyone \nfamiliar with the C family of languages should be able to understand most of it.\n\nSome of the main differences are listed below:\n\nThe template instansiation operator is binary `!` in contrast to paired angle brackets\nas in C++/C# et el. If `Foo` is a templated struct that takes one type parameter then \n`Foo!int foo;` declares a variable \n\nThere is a third class of template parameters: aliases (the other two being types and values).\n`alias` template parameters can, in addition to holding types and values, can hold _symbols_.\nThese include variables, functions and lambdas. 
`alias`, when used outside of a template parameter\nlist, is equivalent to `using` in C++.\n\n`~` is the concatenation operator, used unsurprisingly to concatenate arrays.\nIt is used widely in string manipulation.\n\nUniform Function Call Syntax (UFCS) allows you to call a free function as if it were a\nmethod of the type of its first argument (e.g. `f(x,y)` can be called as `x.f(y)`).\nThis, together with optional parentheses (`x.f()`, where `f` is a method or UFCS function of `x`,\nmay be written as `x.f`), allows you to write chains of calls `h(g(f(x)))` as `x.f.g.h`.\n\n`class`es are polymorphic reference types. `struct`s are value types. Idiomatic D code\ntends to use structs over classes. Classes are not used at all in DCompute.\n\nThe `.` operator will implicitly follow any pointers, although it will not dereference the last\none in a chain of `.`s. There is no operator `->` or `::`; these are both handled by `.`.\n\n`static if` is D's conditional compilation construct. Code inside a taken branch is compiled\ninto the object file; code inside a branch that is _not_ taken must be syntactically correct, but\nneed not be semantically correct.\n\nFor more information see the [D Wiki](https://wiki.dlang.org/Coming_From).\n"
  },
  {
    "path": "dub.json",
    "content": "{\n    \"name\": \"dcompute\",\n    \"description\": \"Native Heterogeneous Computing for D\",\n    \"copyright\": \"Copyright © 2017, Nicholas Wilson\",\n    \"authors\": [\"Nicholas Wilson\"],\n    \"license\": \"BSL-1.0\",\n    \"dependencies\": {\n        \"derelict-cl\"  : \"~>3.2.0\",\n        \"bindbc-cuda\": \"~>0.1.0\",\n        \"taggedalgebraic\": \"~>0.10.7\"\n    },\n    \"configurations\": [\n        {\n            \"name\": \"library\",\n            \"targetType\": \"library\",\n            \"excludedSourceFiles\": [\"source/dcompute/tests/*\"],\n        },\n        {\n            \"name\": \"unittest\",\n            \"dflags\" : [\"-mdcompute-targets=cuda-210\" ,\"-oq\"],\n            \"targetType\": \"executable\",\n            \"versions\": [\"DComputeTesting\"],\n        },\n        {\n            \"name\": \"test-cuda\",\n            \"dflags\" : [\"--mdcompute-targets=cuda-210\", \"-oq\"],\n            \"targetType\": \"executable\",\n            \"versions\": [\"DComputeTestCUDA\"],\n        },\n        {\n            \"name\": \"test-ocl\",\n            \"dflags\" : [\"--mdcompute-targets=ocl-200\", \"-oq\"],\n            \"targetType\": \"executable\",\n            \"versions\": [\"DComputeTestOpenCL\"],\n        },\n   ]\n}\n"
  },
  {
    "path": "source/dcompute/driver/README.md",
    "content": "dcompute.driver\n===============\n\nContains the abstracted driver interface for dcompute. It contains a delegation \nlayer to the OpenCL (`dcompute.driver.ocl`) and CUDA (`dcompute.driver.cuda`)\ndrivers, code to load the appropriate system drivers and \"get up and running\" in a\nplatform agnoasic manner. Unless you're doing something that absolutely needs \nspecific driver functionality then you should use this API rather than the \nindividual compute APIs.\n\nThe API objects and their equivalets in OpenCL and CUDA are listed in the table \nbelow.\n\n| Dcompute | CUDA        | OpenCL           |\n| -------- | ----        | ------           |\n| Platform | N/A\\*       | cl_platform_id   |\n| Device   | CUdevice    | cl_device_id     |\n| Context  | CUcontext   | cl_context       |\n| Queue    | CUstream    | cl_command_queue |\n| Memory   | CUdeviceptr | cl_mem \\*\\*      |\n| Module   | CUmodule    | cl_program       |\n| Kernel   | CUfunction  | cl_kernel        |\n| Event    | cudaEvent_t | cl_event         |\n\nIn addition there are a few Allocator types that allocate device local and shared \nvirtual memory.\n\n\\* We make CUDA a platform of its own.\n\n\\*\\* includes buffers, images and pipes.\n\n\nPlatform\n--------\n\nPerforms the loading of the driver and handles any global initialisation required (e.g. `cuInit(0)`). Gives access to its devices.\n\n\nDevice\n------\n\nCan query them for information. You create `Context`s from them.\n\n\nContext\n-------\n\nCreate `Queue`s, `Memory` objects, `Module`s from these. By default this stores a\nsingle `Module` along with it, created for all devices that are part of this \ncontext.\n\n\nQueue\n-----\n\nSubmit `Kernel` invocations and `Memory` transfers/maps (returning `Event`s),\n set `Device` affinity.\n\n\nKernel\n------\n\nExtracted from `Module`s. 
You can choose not to use these directly if you wish and\nlet this library do all the API bashing for you straight from the module.\nHowever, you can extract these from `Module`s if you wish to avoid the re-extraction\ncosts.\n\n\nEvent\n-----\n\nReturned as a result of enqueuing something. You can set callbacks on these, or wait\non them. Useful for synchronisation.\n\n"
  },
  {
    "path": "source/dcompute/driver/backend.d",
    "content": "module dcompute.driver.backend;\n\nenum Backend\n{\n    OpenCL120,\n    CUDA650,\n}\n"
  },
  {
    "path": "source/dcompute/driver/cuda/TODO",
    "content": "cuLink.*\ncuIpc.*\ncuTexRef.*\ncuTexObj.*\ncuSurfRef.*\ncuSurfObj.*\n"
  },
  {
    "path": "source/dcompute/driver/cuda/buffer.d",
    "content": "module dcompute.driver.cuda.buffer;\n\nimport dcompute.driver.cuda;\n\nstruct Buffer(T)\n{\n    size_t raw;\n\n\t// Host memory associated with this buffer\n    T[] hostMemory;\n\n    this(size_t elems)\n    {\n        status = cast(Status)cuMemAlloc(&raw,elems * T.sizeof);\n        checkErrors();\n        hostMemory = null;\n    }\n\n    this(T[] arr)\n    {\n        status = cast(Status)cuMemAlloc(&raw,arr.length * T.sizeof);\n        checkErrors();\n        hostMemory = arr;\n    }\n    void copy(Copy c)()\n    {\n        static if (c == Copy.hostToDevice)\n        {\n            status = cast(Status)cuMemcpyHtoD(raw, hostMemory.ptr,hostMemory.length * T.sizeof);\n        }\n        else static if  (c == Copy.deviceToHost)\n        {\n            status = cast(Status)cuMemcpyDtoH(hostMemory.ptr,raw,hostMemory.length * T.sizeof);\n        }\n        checkErrors();\n    }\n    alias hostArgOf(U : GlobalPointer!T) = raw; \n    void release()\n    {\n        status = cast(Status)cuMemFree(raw);\n        checkErrors();\n        raw = 0;\n        hostMemory = null;\n    }\n}\n\nalias bf = Buffer!float;\n"
  },
  {
    "path": "source/dcompute/driver/cuda/context.d",
    "content": "module dcompute.driver.cuda.context;\n\nimport dcompute.driver.cuda;\n\nstruct Context\n{\n    CUcontext raw;\n    this(Device dev, uint flags = 0)\n    {\n        status = cast(Status)cuCtxCreate(&raw, flags,dev.raw);\n        checkErrors();\n    }\n    \n    static void push(Context ctx)\n    {\n        status = cast(Status)cuCtxPushCurrent(ctx.raw);\n        checkErrors();\n    }\n    \n    static Context pop()\n    {\n        Context ret;\n        status = cast(Status)cuCtxPopCurrent(&ret.raw);\n        checkErrors();\n        return ret;\n    }\n    static @property Context current()\n    {\n        Context ret;\n        status = cast(Status)cuCtxGetCurrent(&ret.raw);\n        checkErrors();\n        return ret;\n    }\n    \n    static @property void current(Context ctx)\n    {\n        status = cast(Status)cuCtxSetCurrent(ctx.raw);\n        checkErrors();\n    }\n    \n    static void sync()\n    {\n        status = cast(Status)cuCtxSynchronize();\n        checkErrors();\n    }\n    //CUlimit\n    enum Limit\n    {\n        stackSize,\n        printfFifoSize,\n        mallocHeapSize,\n        deviceRuntimeSyncDepth,\n        deviceRuntimePendingLaunchCount\n    }\n    \n    static @property void limit(Limit what)(size_t lim)\n    {\n        status = cast(Status)cuCtxSetLimit(what,lim);\n        checkErrors();\n    }\n    \n    static @property size_t limit(Limit what)()\n    {\n        size_t ret;\n        status = cast(Status)cuCtxGetLimit(&ret,what);\n        checkErrors();\n        return ret;\n    }\n    //CUfunc_cache\n    enum CacheConfig\n    {\n        preferNone,\n        preferShared,\n        preferL1,\n        preferEqual,\n    }\n    \n    static @property void cacheConfig(CacheConfig cc)\n    {\n        status = cast(Status)cuCtxSetSharedMemConfig(cc);\n        checkErrors();\n    }\n    \n    \n    static @property CacheConfig cacheConfig()\n    {\n        CacheConfig ret;\n        status = 
cast(Status)cuCtxGetSharedMemConfig(cast(int*)&ret);\n        checkErrors();\n        return ret;\n    }\n    \n    @property uint apiVersion()\n    {\n        uint ret;\n        status = cast(Status)cuCtxGetApiVersion(raw,&ret);\n        checkErrors();\n        return ret;\n    }\n    \n    static void getQueuePriorityRange(out int lo, out int hi)\n    {\n        status = cast(Status)cuCtxGetStreamPriorityRange(&lo,&hi);\n        checkErrors();\n    }\n    \n    void detach()\n    {\n        status = cast(Status)cuCtxDetach(raw);\n        checkErrors();\n    }\n}\n"
  },
  {
    "path": "source/dcompute/driver/cuda/device.d",
    "content": "module dcompute.driver.cuda.device;\n\nimport dcompute.driver.cuda;\n\nstruct Device\n{\n    int raw;\n    //struct CUdevprop\n    static struct Info\n    {\n        @(1)  int maxThreadsPerBlock;\n        @(2)  int maxThreadsDimX;\n        @(3)  int maxThreadsDimY;\n        @(4)  int maxThreadsDimZ;\n        @(5)  int maxGridSizeX;\n        @(6)  int maxGridSizeY;\n        @(7)  int maxGridSizeZ;\n        @(8)  int sharedMemPerBlock;\n        @(9)  int totalConstantMemory;\n        @(10) int SIMDWidth; // warp size\n        @(11) int maxPitch;\n        @(12) int regsPerBlock;\n        @(13) int clockRate;\n        @(14) int textureAlign;\n        @(15) int GPUOverlap;\n        @(16) int multiprocessorCount;\n        @(17) int kernelExecTimeout;\n        @(18) int integrated;\n        @(19) int canMapHostMemeory;\n        @(20) int computeMode;\n        @(21) int maxTexture1DWidth;\n        @(22) int maxTexture2DWidth;\n        @(23) int maxTexture2DHeight;\n        @(24) int maxTexture3DWidth;\n        @(25) int maxTexture3DHeight;\n        @(26) int maxTexture3DDepth;\n        @(27) int maxTexture2DLayeredWidth;\n        @(28) int maxTexture2DLayeredHeight;\n        @(29) int maxTexture2DLayeredLayers;\n        @(27) int maxTexture2DArrayWidth;\n        @(28) int maxTexture2DArrayHeight;\n        @(29) int maxTexture2DArrayNumSlices;\n        @(30) int surfaceAlignment;\n        @(31) int concurrentKernels;\n        @(32) int eccEnabled;\n        @(33) int PCIBusID;\n        @(34) int PCIDeviceID;\n        @(35) int tccDriver;\n        @(36) int memoryClockRate;\n        @(37) int globalMemoryBusWidth;\n        @(38) int L2CacheSize;\n        @(39) int maxThreadPerMultiProcessor;\n        @(40) int asyncEngineCount;\n        @(41) int unifiedAddressing;\n        @(42) int maxTexture1DLayeredWidth;\n        @(43) int maxTexture1DLayeredLayers;\n        @(44) int canTex2DGather;\n        @(45) int maxTextrue2DGatherWidth;\n        @(46) int 
maxTextrue2DGatherHeight;\n        @(47) int maxTexture3DWidthAlternative;\n        @(48) int maxTexture3DHeightAlternative;\n        @(49) int maxTexture3DDepthAlternative;\n        @(50) int PICDomainID;\n        @(51) int texturePitchAlignment;\n        @(52) int textureCubemapWidth;\n        @(53) int textureCubemapLayeredWidths;\n        @(54) int textureCubemapLayeredLayers;\n        @(55) int maxSurface1DWidth;\n        @(56) int maxSurface2DWidth;\n        @(57) int maxSurface2DHeight;\n        @(58) int maxSurface3DWidth;\n        @(59) int maxSurface3DHeight;\n        @(60) int maxSurface3DDepth;\n        @(61) int maxSurface1DLayeredWidth;\n        @(62) int maxSurface1DLayeredLayers;\n        @(63) int maxSurface2DLayeredWidth;\n        @(64) int maxSurface2DLayeredHeight;\n        @(65) int maxSurface2DLayeredLayers;\n        @(66) int maxSurfaceCubemapWidth;\n        @(67) int maxSurfaceCubemapLayeredWidth;\n        @(68) int maxSurfaceCubemapLayeredLayers;\n        @(69) int maxTaxture1DLinearWidth;\n        @(70) int maxTaxture2DLinearWidth;\n        @(71) int maxTaxture2DLinearHeight;\n        @(72) int maxTaxture2DLinearPitch;\n        @(73) int maxTaxture2DMipmappedWidth;\n        @(74) int maxTaxture2DMipmappedHeight;\n        @(75) int computeCapabilityMajor;\n        @(76) int computeCapabilityMinor;\n        @(77) int maxTaxture1DMipmappedWidth;\n        @(78) int streamPrioritiesSupported;\n        @(79) int globalL1CacheSupported;\n        @(80) int localL1CacheSupported;\n        @(81) int maxSharedMemoryPerMultiprocessor;\n        @(82) int maxRegistorsPerMultiprocessor;\n        @(83) int managedMemory;\n        @(84) int multiGPUBoard;\n        @(85) int multiGPUBoardGroupID;\n    }\n    \n    @property size_t totalMemory()\n    {\n        size_t ret;\n        status = cast(Status)cuDeviceTotalMem(&ret,raw);\n        checkErrors();\n        return ret;\n    }\n    \n    //char[] name : cuDeviceGetName\n\n    static foreach (mem; 
__traits(allMembers, Info)) {\n        mixin(\n            ` @property int `,\n            mem,\n            ` () { int result; `,\n            ` status = cast(Status)cuDeviceGetAttribute( `,\n            ` &result, `,\n            ` cast(CUdevice_attribute) `,\n             __traits(getAttributes, __traits(getMember, Info, mem))[0].stringof,\n            `, raw); `,\n            ` checkErrors(); `,\n            ` return result; `,\n            ` } `);\n    }\n\n  // Unified Memory capability helpers\n\n    /**\n     * Returns true when the device supports CUDA Managed Memory\n     * (cuMemAllocManaged / UnifiedBuffer).\n     * Requires Compute Capability >= 3.0.\n     * Wraps CU_DEVICE_ATTRIBUTE_MANAGED_MEMORY (attribute 83).\n     */\n    @property bool supportsUnifiedMemory()\n    {\n        return managedMemory != 0;\n    }\n\n    /**\n     * Returns true when the device participates in Unified Virtual\n     * Addressing (UVA) i.e. the same virtual address is valid on\n     * both host and device.  True on all 64-bit CUDA systems with\n     * CC >= 2.0.\n     * Wraps CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING (attribute 41).\n     */\n    @property bool supportsUnifiedAddressing()\n    {\n        return unifiedAddressing != 0;\n    }\n}\n"
  },
  {
    "path": "source/dcompute/driver/cuda/event.d",
    "content": "module dcompute.driver.cuda.event;\n\nimport dcompute.driver.cuda;\n\nstruct Event\n{\n    CUevent raw;\n    \n}\n"
  },
  {
    "path": "source/dcompute/driver/cuda/kernel.d",
    "content": "module dcompute.driver.cuda.kernel;\n\nimport dcompute.driver.cuda;\nstruct Kernel(F) if (is(F==function)|| is(F==void))\n{\n    CUfunction raw;\n    \n    static struct Attributes\n    {\n        @(0) int maxThreadsPerBlock;\n        // in Bytes\n        @(1) int sharedSize;\n        @(2) int constSize;\n        @(3) int localSize;\n        \n        @(4) int numRegs;\n        @(5) int ptxVersion;\n        @(6) int binaryVersion;\n        @(7) int cacheModeCa;\n    }\n\n}\n"
  },
  {
    "path": "source/dcompute/driver/cuda/memory.d",
    "content": "module dcompute.driver.cuda.memory;\n\nimport dcompute.driver.error;\nimport dcompute.driver.cuda;\n\n// void pointer like\nstruct MemoryPointer\n{\n    size_t raw;\n    static MemoryPointer allocate(size_t nbytes)\n    {\n        MemoryPointer ret;\n        status = cast(Status)cuMemAlloc(&ret.raw,nbytes);\n        checkErrors();\n        return ret;\n    }\n    //static MemoryPointer allocatePitch(T)(size_t nbytes)\n\n    Memory addressRange()\n    {\n        Memory ret;\n        status = cast(Status)cuMemGetAddressRange(&ret.ptr.raw,&ret.length,raw);\n        checkErrors();\n        return ret;\n    }\n\n}\n\n// void[] like\nstruct Memory\n{\n    MemoryPointer ptr;\n    size_t length;\n\n    enum CopySource\n    {\n        Host,\n        Device,\n        Array\n    }\n\n    // cuMemcpy and friends\n    // TODO: implement this properly\n    /*\n    template copy(T, CopySource from, CopySource to, int dimentions = 1,\n                  Flag!\"peer\" _peer = No.peer)\n    {\n        auto copy(Memory to)\n        {\n            status = cast(Status)cuMemcpy(to.ptr.raw,ptr.raw,length);\n            checkErrors();\n        }\n    }*/\n\n    // TODO: cuMemset & friends\n\n}\n"
  },
  {
    "path": "source/dcompute/driver/cuda/package.d",
    "content": "module dcompute.driver.cuda;\n\npublic import ldc.dcompute;\npublic import bindbc.cuda;\n\npublic import dcompute.driver.error;\n\npublic import dcompute.driver.cuda.buffer;\npublic import dcompute.driver.cuda.context;\npublic import dcompute.driver.cuda.device;\npublic import dcompute.driver.cuda.event;\npublic import dcompute.driver.cuda.kernel;\npublic import dcompute.driver.cuda.memory;\npublic import dcompute.driver.cuda.platform;\npublic import dcompute.driver.cuda.program;\npublic import dcompute.driver.cuda.queue;\npublic import dcompute.driver.cuda.unified_buffer;\n\nenum Copy\n{\n    hostToDevice,\n    deviceToHost,\n    array,\n}\n\nenum MemoryBankConfig : int\n{\n    default_,\n    fourBytes,\n    eightBytes,\n}\ntemplate HostArgsOf(F) {\n    import std.meta, std.traits;\n    alias HostArgsOf = staticMap!(ReplaceTemplate!(Pointer, Buffer), Parameters!F);\n}\nprivate template ReplaceTemplate(alias needle, alias replacement) {\n    template ReplaceTemplate(T) {\n        static if (is(T : needle!Args, Args...)) {\n            alias ReplaceTemplate = replacement!(Args[1]);\n        } else {\n            alias ReplaceTemplate = T;\n        }\n    }\n}\n"
  },
  {
    "path": "source/dcompute/driver/cuda/platform.d",
    "content": "module dcompute.driver.cuda.platform;\n\nimport dcompute.driver.error;\nimport dcompute.driver.cuda;\nimport std.experimental.allocator.typed;\n\nstruct Platform\n{\n    static void initialise(uint flags =0)\n    {\n        auto support = loadCUDA();\n        if (support == CUDASupport.noLibrary || support == CUDASupport.badLibrary)\n        {\n            status = Status.sharedObjectInitFailed;\n            checkErrors();\n        }\n        status = cast(Status)cuInit(flags);\n        checkErrors();\n    }\n    \n    static Device[] getDevices(A)(A a)\n    {\n        int len;\n        TypedAllocator!(A) allocator;\n        status = cast(Status)cuDeviceGetCount(&len);\n        checkErrors();\n\n        //TODO:\n        //Device[] ret = allocator.makeArray!(Device)(len);\n            Device[] ret = new Device[len];\n        foreach(int i; 0 .. len)\n        {\n            status = cast(Status)cuDeviceGet(&ret[i].raw,i);\n            checkErrors();\n        }\n        return ret;\n    }\n    \n}\n"
  },
  {
    "path": "source/dcompute/driver/cuda/program.d",
    "content": "module dcompute.driver.cuda.program;\n\nimport dcompute.driver.cuda;\n\nimport std.string;\nstruct Program\n{\n    CUmodule raw;\n    \n    Kernel!void getKernelByName(immutable(char)* name)\n    {\n        Kernel!void ret;\n        status = cast(Status)cuModuleGetFunction(&ret.raw,this.raw,name);\n        checkErrors();\n        return ret;\n    }\n    Kernel!(typeof(k)) getKernel(alias k)()\n    {\n        return cast(typeof(return)) getKernelByName(k.mangleof.ptr);\n    }\n    // TODO: Support globals & images. Requires competent compiler. \n    //cuModuleGetGlobal\n    //cuModuleGetTexRef\n    //cuModuleGetSurfRef\n    \n    static Program fromFile(string name)\n    {\n        Program ret;\n        status = cast(Status)cuModuleLoad(&ret.raw,name.toStringz);\n        checkErrors();\n        return ret;\n    }\n\n    static Program fromString(string name)\n    {\n        Program ret;\n        status = cast(Status)cuModuleLoadData(&ret.raw,name.toStringz);\n        checkErrors();\n        return ret;\n    }\n    \n    __gshared static Program globalProgram;\n    //cuModuleLoadDataEx\n    //cuModuleLoadFatBinary\n    \n    void unload()\n    {\n        status = cast(Status)cuModuleUnload(raw);\n        checkErrors();\n    }\n    \n    //TODO: linkstate\n}\n\n\n\n"
  },
  {
    "path": "source/dcompute/driver/cuda/queue.d",
    "content": "// A stream in CUDA speak\nmodule dcompute.driver.cuda.queue;\n\nimport dcompute.driver.cuda;\nstruct Queue\n{\n    CUstream raw;\n    this (bool async)\n    {\n        status = cast(Status)cuStreamCreate(&raw, async ? 1 : 0);\n        checkErrors();\n    }\n    this (bool async, int priority)\n    {\n        status = cast(Status)cuStreamCreateWithPriority(&raw, async ? 1 : 0, priority);\n        checkErrors();\n    }\n    \n    @property bool async()\n    {\n        uint ret;\n        status = cast(Status)cuStreamGetFlags(raw,&ret);\n        checkErrors();\n        return cast(bool) ret;\n    }\n    \n    @property int priority()\n    {\n        int ret;\n        status = cast(Status)cuStreamGetPriority(raw,&ret);\n        checkErrors();\n        return ret;\n    }\n\n    void wait(Event e,uint flags)\n    {\n        status = cast(Status)cuStreamWaitEvent(raw,e.raw,flags);\n        checkErrors();\n    }\n    \n    // cuMemcpy.*Async and friends\n    // TODO: implement this properly\n    /*template copy(T, CopySource from, CopySource to, int dimentions = 1,\n                  Flag!\"peer\" _peer = No.peer)\n    {\n        auto copy(Memory to)\n        {\n            status = cast(Status)cuMemcpy(to.ptr.raw,ptr.raw,length);\n            checkErrors();\n        }\n    }*/\n\n    \n    /*void addCallback(void delegate(Queue,Status) dg)\n    {\n        static CUstreamCallback cb = (void* ,Status void*) =>\n        cuStreamAddCallback\n    }*/\n    \n    auto enqueue(alias k)(uint[3] _grid, uint[3] _block, uint _sharedMem = 0)\n    {\n        static struct Call\n        {\n            Queue q;\n            uint[3] grid, block;\n            uint sharedMem;\n            \n            this(Queue _q,uint[3] _grid, uint[3] _block, uint _sharedMem)\n            {\n                q= _q;\n                grid = _grid;\n                block = _block;\n                sharedMem = _sharedMem;\n            }\n            //TODO integrate evnts into this.\n         
   void opCall(HostArgsOf!(typeof(k)) args)\n            {\n                auto kernel = Program.globalProgram.getKernel!k();\n                void*[typeof(args).length] vargs;\n                foreach(uint i, ref a; args)\n                {\n                    vargs[i] = cast(void*)&a;\n                }\n                \n                status = cast(Status)\n                        cuLaunchKernel(kernel.raw,\n                                       grid[0], grid[1], grid[2],\n                                       block[0],block[1],block[2],\n                                       sharedMem,\n                                       q.raw,\n                                       vargs.ptr,\n                                       null);\n                checkErrors();\n            }\n        }\n        \n        return Call(this,_grid,_block,_sharedMem);\n    }\n}\n"
  },
  {
    "path": "source/dcompute/driver/cuda/unified_buffer.d",
    "content": "/**\n * Unified Memory (Managed Memory) buffer for CUDA.\n *\n * A UnifiedBuffer!T allocates memory that is accessible from both the host\n * (CPU) and the device (GPU) through a single pointer. The CUDA runtime\n * migrates data automatically, so explicit copy!(Copy.hostToDevice) /\n * copy!(Copy.deviceToHost) calls are not needed.\n *\n *\n * Requirements:\n *   - CUDA Compute Capability >= 3.0\n *   - Device.supportsUnifiedMemory == true\n */\nmodule dcompute.driver.cuda.unified_buffer;\n\nimport dcompute.driver.cuda;\n\n// Attach mode — controls which streams can access the managed allocation\n// initially. CU_MEM_ATTACH_GLOBAL makes the buffer immediately visible to\n// all streams (the most common choice). CU_MEM_ATTACH_HOST restricts it to\n// the host until a stream explicitly attaches to it.\n\nenum AttachMode : uint\n{\n    /// Accessible from any CUDA stream (default). Equivalent to\n    /// CU_MEM_ATTACH_GLOBAL.\n    global_ = CU_MEM_ATTACH_GLOBAL,\n\n    /// Initially host-only. 
Use cuStreamAttachMemAsync (not yet wrapped) or\n    /// switch to global_ to make the buffer available on the device.\n    /// Equivalent to CU_MEM_ATTACH_HOST.\n    host = CU_MEM_ATTACH_HOST,\n}\n\nstruct UnifiedBuffer(T)\n{\n    /// Raw CUdeviceptr — also a valid host-side pointer on unified-memory\n    /// capable systems (UVA must be enabled, which is true on all 64-bit CUDA\n    /// systems with CC >= 2.0).\n    size_t raw;\n\n    private size_t _length; // number of T elements\n\n    // ------------------------------------------------------------------\n    // Construction\n    // ------------------------------------------------------------------\n\n    /**\n     * Allocate `elems` uninitialised elements of T in managed memory.\n     *\n     * Params:\n     *   elems = number of elements to allocate\n     *   mode  = attachment scope (default: global_)\n     */\n    @trusted this(size_t elems, AttachMode mode = AttachMode.global_)\n    {\n        status = cast(Status)cuMemAllocManaged(&raw, elems * T.sizeof,\n                                              cast(uint)mode);\n        checkErrors();\n        _length = elems;\n    }\n\n    /**\n     * Allocate and initialise from a host slice.\n     * The contents of `arr` are copied into the managed allocation before\n     * returning, so the caller's original array is no longer needed.\n     *\n     * Params:\n     *   arr  = source host data\n     *   mode = attachment scope (default: global_)\n     */\n    this(T[] arr, AttachMode mode = AttachMode.global_)\n    {\n        this(arr.length, mode);\n        hostSlice[] = arr[];\n    }\n\n    // ------------------------------------------------------------------\n    // Host-side access\n    // ------------------------------------------------------------------\n\n    /**\n     * Returns a D slice backed by the managed allocation.\n     * Valid to read/write on the host at any time when no kernel is\n     * concurrently accessing the same memory.\n     */\n    
@property @trusted T[] hostSlice()\n    {\n        return (cast(T*)raw)[0 .. _length];\n    }\n\n    /// Number of elements.\n    @property size_t length() const { return _length; }\n\n \n    // Device-side hints\n\n    /**\n     * Prefetch this buffer's data to a device asynchronously.\n     *\n     * Initiates memory migration to the specified device prior to kernel execution\n     * to avoid on-demand page migration latency.\n     *\n     * Note: Explicit prefetching requires CUDA 8.0 or higher. On older drivers\n     * where `cuMemPrefetchAsync` is not available, this is a silent no-op —\n     * unified memory still works correctly via demand paging.\n     */\n    @trusted void prefetch(Device dev, Queue q = Queue.init)\n    {\n        if (cuMemPrefetchAsync == null)\n            return;\n\n        status = cast(Status)cuMemPrefetchAsync(cast(CUdeviceptr)raw, _length * T.sizeof, dev.raw, q.raw);\n        checkErrors();\n    }\n\n\n    /// Free the managed allocation.  After this call `raw` and `length`\n    /// are zeroed; accessing `hostSlice` is undefined behaviour.\n    @trusted void release()\n    {\n        status = cast(Status)cuMemFree(raw);\n        checkErrors();\n        raw = 0;\n        _length = 0;\n    }\n\n    /// Satisfies the same hostArgOf alias contract as Buffer!T so that\n    /// HostArgsOf!kernelFn replaces GlobalPointer!T with UnifiedBuffer!T\n    /// transparently.\n    alias hostArgOf(U : GlobalPointer!T) = raw;\n\n    // Implicit conversion to Buffer!T\n\n    /**\n     * Returns a Buffer!T view of this managed allocation.\n     *\n     * This conversion exists so that UnifiedBuffer!T can be passed directly\n     * to Queue.enqueue!() whose opCall signature is fixed at compile-time\n     * to (Buffer!float, ...) via HostArgsOf.  
Because both structs store\n     * `raw` as their first field, CUDA receives the correct CUdeviceptr.\n     *\n     * The Buffer's hostMemory slice is set to hostSlice so that if anyone\n     * accidentally calls copy!() on the returned Buffer, it still touches\n     * the right memory region.\n     */\n    @property Buffer!T asBuffer()\n    {\n        Buffer!T b;\n        b.raw        = raw;\n        b.hostMemory = hostSlice;\n        return b;\n    }\n\n    /// Implicit subtype: UnifiedBuffer!T is accepted wherever Buffer!T is\n    /// expected (e.g. Queue.enqueue!() opCall arguments).\n    alias this = asBuffer;\n}\n"
  },
  {
    "path": "source/dcompute/driver/error.d",
    "content": "/**/\n\nmodule dcompute.driver.error;\n\n// Helpfully OpenCL errors are negative and CUDAs are positive\nenum Status : int {\n    Success = 0,\n    // CUDA Errors.\n    invalidValue                = 1,\n    outOfMemory                 = 2,\n    notInitialized              = 3,\n    deinitialized               = 4,\n    profilerDisabled            = 5,\n    profilerNotInitialized      = 6,\n    profilerAlreadyStarted      = 7,\n    profilerAlradyStopped       = 8,\n    noDevice                    = 100,\n    invalidDevice               = 101,\n    invalidImage                = 200,\n    invalidContext              = 201,\n    contextAlreadyCurrent       = 202,\n    mapFailed                   = 205,\n    unmapFailed                 = 206,\n    arrayIsMapped               = 207,\n    alreadyMapped               = 208,\n    noBinaryForGPU              = 209,\n    alreadyAcquired             = 210,\n    notMapped                   = 211,\n    notMappedAsArray            = 212,\n    notMappedAsPointer          = 213,\n    eccUncorrectable            = 214,\n    unsupportedLimit            = 215,\n    contextAlredyInUse          = 216,\n    peerAccessUnsupported       = 217,\n    invalidPtx                  = 218,\n    invalidGraphicsContext      = 219,\n    nvlinkUncorrectable         = 220,\n    jitCompilerNotFound         = 221,\n    invalidSource               = 300,\n    fileNotFound                = 301,\n    sharedObjectSymbolNotFound  = 302,\n    sharedObjectInitFailed      = 303,\n    operatingSystem             = 304,\n    invalidHandle               = 400,\n    illegalState                = 401,\n    notFound                    = 500,\n    notReady                    = 600,\n    illegalAddress              = 700,\n    launchOutOfResources        = 701,\n    launchTimeout               = 702,\n    launchIncompatibleTexturing = 703,\n    peerAccessAlreadyEnabled    = 704,\n    peerAccessNotEnabled        = 705,\n    primaryContextActive        = 
708,\n    contextIsDestroyed          = 709,\n    assertError                 = 710,\n    tooManyPeers                = 711,\n    hostMemoryAlreadyRegistered = 712,\n    hostMemoryNotRegistered     = 713,\n    hardwareStackError          = 714,\n    illegalInstruction          = 715,\n    misalignedAddress           = 716,\n    invalidAddressSpace         = 717,\n    invalidPC                   = 718,\n    launchFailed                = 719,\n    cooperativeLaunchTooLarge   = 720,\n    notPermitted                = 800,\n    notSupported                = 801,\n    systemNotReady              = 802,\n    systemDriverMismatch        = 803,\n    compatNotSupportedOnDevice  = 804,\n    streamCaptureUnsupported    = 900,\n    streamCaptureInvalidated    = 901,\n    streamCaptureMerge          = 902,\n    streamCaptureUnmatched      = 903,\n    streamCaptureUnjoined       = 904,\n    streamCaptureIsolation      = 905,\n    streamCaptureImplicit       = 906,\n    capturedEvent               = 907,\n    streamCaptureWrongThread    = 908,\n    unknown                     = 999,\n\n    // OpenCL Errors.\n    deviceNotFound                 = -1,\n    deviceNotAvailable             = -2,\n    compilerNotAvailable           = -3,\n    memoryObjectAloocationFailure  = -4,\n    outOfResources                 = -5,\n    outOfHostMemory                = -6,\n    profilingInfomationAvailable   = -7,\n    memoryCopyOverlap              = -8,\n    imageFormatMismatch            = -9,\n    imageFormatNotSupported        = -10,\n    buildProgramFailed             = -11,\n    mapFailure                     = -12,\n    misalignedSubBufferOffset      = -13,\n    errorsInWaitList               = -14,\n    compileProgramFailure          = -15,\n    linkerNotAvailable             = -16,\n    linkerFailure                  = -17,\n    devicePartitionFailure         = -18,\n    kernelArgInfoNotAvailable      = -19,\n    \n    invalidValueCL                 = -30,\n    invalidDeviceType           
   = -31,\n    invalidPlatform                = -32,\n    invalidDeviceCL                = -33,\n    invalidContextCL               = -34,\n    invalidQueueProperties         = -35,\n    invalidQueue                   = -36,\n    invalidHostPointerCL           = -37,\n    invalidMemoryObject            = -38,\n    invalidImageFormatDesctiptor   = -39,\n    invalidImageSize               = -40,\n    invalidSampler                 = -41,\n    invalidBinary                  = -42,\n    invalidBuildOptions            = -43,\n    invalidProgram                 = -44,\n    invalidExecutable              = -45,\n    invalidKernelName              = -46,\n    invalidKernelDefinition        = -47,\n    invalidKernel                  = -48,\n    invalidArgumentIndex           = -49,\n    invalidArgumentValue           = -50,\n    invalidArgumentSize            = -51,\n    invalidKernelArguments         = -52,\n    invalidWorkDimensions          = -53,\n    invaildWorkGroupSize           = -54,\n    invaildWorkItemSize            = -55,\n    invalidGlobalOffest            = -56,\n    invalidEventWaitList           = -57,\n    invalidEvent                   = -58,\n    invalidOperation               = -59,\n    invalidGLObject                = -60,\n    invalidBufferSize              = -61,\n    invalidMipLevel                = -62,\n    invalidGlobalWorkSize          = -63,\n    invalidProperty                = -64,\n    invalidImageDescriptor         = -65,\n    invalidCompilerOptions         = -66,\n    invalidLinkerOptions           = -67,\n    invalidDevicePartitionCount    = -68,\n    \n    invalidGLSharegroupReference   = -1000,\n    platformNotFound               = -1001,\n    invalidD3D10Device             = -1002,\n    invalidD3D10Resource           = -1003,\n    D3D10ResouceAlreadyAcquired    = -1004,\n    D3D10ResourceNotAcquires       = -1005,\n    invalidD3D11Device             = -1006,\n    invalidD3D11Resource           = -1007,\n    D3D11ResourceAlredyAcquired 
   = -1008,\n    D3D11ResourceNotAcquired       = -1009,\n    invalidDX9MediaAdapter         = -1010,\n    invalidDX9MediaSurface         = -1011,\n    DX9MediaSurfaceAlreadyAcquired = -1012,\n    DX9MediaSurfaceNotAcquired     = -1013,\n    \n    devicePartitionFailed          = -1057,\n    invalidPartitionCount          = -1058,\n    invalidPartitionName           = -1059,\n    \n    invalidEGLObject               = -1093,\n    EGLResourceNotAcquired         = -1092,\n}\n\nversion (D_BetterC)\n{\n    void delegate (Status) nothrow @nogc onDriverError = (Status _status) \n    { \n        defaultOnDriverError(_status);\n    };\n    \n    immutable void delegate (Status) nothrow @nogc defaultOnDriverError = \n    (Status _status)\n    {\n        import core.stdc.stdio : fprintf, stderr;\n        import std.conv : to;\n        import std.string : toStringz;\n        fprintf(stderr,\"*** DCompute driver error:%s\\n\",\n               _status.to!(string).toStringz);\n    };\n}\nelse\n{\n    class DComputeDriverException : Exception\n    {\n        this(string msg, string file = __FILE__,\n             size_t line = __LINE__, Throwable next = null)\n        {\n            super(msg, file, line, next);\n        }\n        \n        this(Status err, string file = __FILE__, \n             size_t line = __LINE__, Throwable next = null)\n        {\n            import std.conv : to;\n            super(err.to!string, file, line, next);\n        }\n    }\n    void delegate(Status) onDriverError = (Status _status) \n    {\n        defaultOnDriverError(_status);\n    };\n    immutable void delegate(Status) defaultOnDriverError =\n    (Status _status)\n    {\n        throw new DComputeDriverException(_status);\n    };\n}\n\n// Thread local status\nStatus status;\n\nversion(DComputeIgnoreDriverErrors)\n{\n    void checkErrors() {}\n}\nelse\n{\n    void checkErrors()\n    {\n        if (status) onDriverError(status);\n    }\n\n}\n"
  },
  {
    "path": "source/dcompute/driver/ocl/buffer.d",
    "content": "module dcompute.driver.ocl.buffer;\n\nimport dcompute.driver.ocl;\n\nstruct Buffer(T)\n{\n    cl_mem raw;\n\n    // Host memory associated with this buffer\n    T[] hostMemory;\n    enum CreateType\n    {\n        region =0x1220,\n    }\n    // opSlice clCreateSubBuffer\n}\n"
  },
  {
    "path": "source/dcompute/driver/ocl/context.d",
    "content": "module dcompute.driver.ocl.context;\n\nimport dcompute.driver.ocl;\nimport std.typecons;\n\nimport std.experimental.allocator.typed;\n\nstruct Context\n{\n    cl_context raw;\n    \n    enum Properties\n    {\n        platform        = 0x1084,\n        interopUserSync = 0x1085,\n    }\n    \n    static struct Info\n    {\n        @(0x1080) uint referenceCount;\n        @(0x1081) Device* _devices;\n        @(0x1082) Context.Properties* properties;\n        @(0x1083) uint numDevices;\n        ArrayAccesssor!(_devices,numDevices) devices;\n        // Extensions\n        //@(0x2010) khrTerminate;\n        //@(0x200E) khrMemoryInitialise;\n        //@(0x4014) CONTEXT_D3D10_DEVICE_KHR\n        //@(0x402C) CONTEXT_D3D10_PREFER_SHARED_RESOURCES_KHR\n        //@(0x401D) CONTEXT_D3D11_DEVICE_KHR\n        //@(0x402D) CONTEXT_D3D11_PREFER_SHARED_RESOURCES_KHR\n        //@(0x2025) CONTEXT_ADAPTER_D3D9_KHR\n        //@(0x2026) CONTEXT_ADAPTER_D3D9EX_KHR\n        //@(0x2027) CONTEXT_ADAPTER_DXVA_KHR\n        //@(0x2008) GL_CONTEXT_KHR\n        //@(0x2009) EGL_DISPLAY_KHR\n        //@(0x200A) GLX_DISPLAY_KHR\n        //@(0x200B) WGL_HDC_KHR\n        //@(0x200C) CGL_SHAREGROUP_KHR\n\n    }\n    //mixin(generateGetInfo!(Info,clGetContextInfo));\n    \n    this(Device[] devs,const Properties[] props)\n    {\n        raw = clCreateContext(cast(const cl_context_properties*)props.ptr,\n                              cast(uint)devs.length,cast(const cl_device_id*)devs.ptr,\n                              null,null,\n                              cast(int*)&status);\n        checkErrors();\n    }\n    \n    this(Device.Type type,const Properties[] props)\n    {\n        raw = clCreateContextFromType(cast(const cl_context_properties*)props.ptr,\n                                      cast(cl_device_type)type,\n                                      null,null,\n                                      cast(int*)&status);\n        checkErrors();\n    }\n    void retain()\n    {\n    
    status = cast(Status)clRetainContext(raw);\n        checkErrors();\n    }\n    \n    void release()\n    {\n        status = cast(Status)clReleaseContext(raw);\n        checkErrors();\n    }\n    \n    Queue createQueue(Device dev,Queue.Properties prop)\n    {\n        Queue ret;\n        ret.raw = clCreateCommandQueue(this.raw,\n                                       dev.raw,\n                                       cast(cl_command_queue_properties)prop,\n                                       cast(int*)&status);\n        checkErrors();\n        return ret;\n    }\n    \n    Buffer!T createBuffer(T)(T[] arr,Memory.Flags flags = (Memory.Flags.useHostPointer | Memory.Flags.readWrite))\n    {\n        import std.stdio;\n        Buffer!T ret;\n        auto len = memSize(arr);\n        ret.raw = clCreateBuffer(raw,flags,len,arr.ptr,cast(int*)&status);\n        ret.hostMemory = arr;\n        checkErrors();\n        return ret;\n    }\n    \n    /*Image.Format[] supportedImageFormats(A)(A allocator, Memory.Flags f,Memory.Type t)\n    {\n        //Double call\n        clGetSupportedImageFormats\n    }*/\n    \n    Sampler createSampler(Flag!\"normalisedCoordinates\" f,\n                          Sampler.AddressingMode aMode,\n                          Sampler.FilterMode fMode)\n    {\n        Sampler ret;\n        ret.raw = clCreateSampler(this.raw,\n                                  cast(cl_bool)f,\n                                  cast(cl_addressing_mode)aMode,\n                                  cast(cl_filter_mode)fMode,\n                                  cast(int*)&status);\n        checkErrors();\n        return ret;\n    }\n    \n    /**Program createProgramFromSource(string[][] sources)\n     {\n        clCreateProgramWithSource\n     }\n    */\n    \n    Program createProgramFromSPIR(A)(A a, Device[] devices,ubyte[] spir)\n    {\n        auto allocator = TypedAllocator!(A)(a);\n        auto lengths = allocator.makeArray!(size_t)(devices.length);\n        
lengths[]    = spir.length;\n        auto ptrs  = allocator.makeArray!(ubyte*)(devices.length);\n        ptrs[]       = spir.ptr;\n        Program ret;\n\n        ret.raw = clCreateProgramWithBinary(\n                                this.raw,\n                                cast(uint)devices.length, cast(cl_device_id*)devices.ptr,\n                                lengths.ptr,ptrs.ptr,\n                                null, // TODO report individual errors\n                                cast(int*)&status);\n        allocator.dispose(lengths);\n        allocator.dispose(ptrs);\n        checkErrors();\n        return ret;\n    }\n    Program createProgram(void[] spirv)\n    {\n        Program ret;\n\n        ret.raw = clCreateProgramWithIL(this.raw,\n\t\t\t\t\t\t\t\t\t\tspirv.ptr,\n\t\t\t\t\t\t\t\t\t\tspirv.length,\n\t\t\t\t\t\t\t\t\t\tcast(int*)&status);\n        checkErrors();\n        return ret;\n    }\n    \n    /*Program createProgramFromBuiltinKernels(Device[] devices, string kernelNames)\n    {\n        clCreateProgramWithBuiltInKernels\n    }*/\n}\n"
  },
  {
    "path": "source/dcompute/driver/ocl/device.d",
    "content": "module dcompute.driver.ocl.device;\n\nimport derelict.opencl.cl;\nimport dcompute.driver.ocl;\nimport std.meta: AliasSeq;\n\nstruct Device\n{\n    enum Type : cl_bitfield\n    {\n        default_     = 0x1,\n        CPU         = 0x2,\n        GPU         = 0x4,\n        accelerator = 0x8,\n        custom      = 0x10,\n        all         = 0xFFFFFFFF\n    }\n    \n    enum AffinityDomain : cl_bitfield\n    {\n        numa        = 0x1,\n        l4_Cache    = 0x2,\n        l3_Cache    = 0x4,\n        l2_Cache    = 0x8,\n        l1_Cache    = 0x10,\n        nextPartitionable = 0x20\n    }\n    \n    enum PartitionProperty : long\n    {\n        Equally          = 0x1086,\n        ByCounts         = 0x1087,\n        ByCountsListEnd  = 0,\n        ByAffinityDomain = 0x1088,\n    }\n    \n    enum FPConfig : cl_bitfield\n    {\n        denorm                  = 1 << 0,\n        infNan                  = 1 << 1,\n        roundNearest            = 1 << 2,\n        roundZero               = 1 << 3,\n        rounfInf                = 1 << 4,\n        fma                     = 1 << 5,\n        softFloat               = 1 << 6,\n        correctlyRoundedDivSqrt = 1 << 7,\n    }\n    \n    enum MemoryCacheType : cl_uint\n    {\n        none = 0,\n        readOnly = 1,\n        readWrite = 2,\n    }\n    \n    enum LocalMemoryType : cl_uint\n    {\n        local,\n        global,\n    }\n    \n    enum ExecutionCapabilities : cl_bitfield\n    {\n        kernel,\n        nativeKernel,\n    }\n    \n    static struct Info\n    {\n        @(0x1000) Type type;\n        @(0x1001) uint vendorID;\n        @(0x1002) uint maxComputeUnits;\n        @(0x1003) uint _maxWorkItemDimensions;\n        @(0x1004) size_t maxWorkGroupSize;\n        @(0x1005) size_t* _maxWorkItemSizes;\n        ArrayAccesssor!(_maxWorkItemSizes,_maxWorkItemDimensions) maxWorkItems;\n        @(0x1006) uint preferredVectorWidthByte;\n        @(0x1007) uint preferredVectorWidthShort;\n        @(0x1008) 
uint preferredVectorWidthInt;\n        @(0x1009) uint preferredVectorWidthLong;\n        @(0x100A) uint preferredVectorWidthFloat;\n        @(0x100B) uint preferredVectorWidthDouble;\n        @(0x100C) uint maxClockFrequency;\n        @(0x100D) uint addressBits;\n        @(0x100E) uint maxReadImageArgs;\n        @(0x100F) uint maxWriteImageArgs;\n        @(0x1010) ulong maxMemoryAllocSize;\n        @(0x1011) size_t image2DMaxWidth;\n        @(0x1012) size_t image2DMaxHeight;\n        @(0x1013) size_t image3DMaxWidth;\n        @(0x1014) size_t image3DMaxHeight;\n        @(0x1015) size_t image3DMaxDepth;\n        @(0x1016) bool imageSupport;\n        @(0x1017) size_t maxParameterSize;\n        @(0x1018) uint maxSamplers;\n        @(0x1019) uint memeoryBaseAddressAlign;\n        @(0x101A) uint minDataTypeAlignSize;        // Deprecated in OpenCL 1.2\n        @(0x101B) FPConfig floatFPConfig;\n        @(0x101C) MemoryCacheType GLobalMemoryCacheType;\n        @(0x101D) uint  globalMemoryCachelineSize;\n        @(0x101E) ulong globalMemoryCacheSize;\n        @(0x101F) ulong globalMemorySize;\n        @(0x1020) ulong maxConstantBufferSize;\n        @(0x1021) uint  maxConstantArgs;\n        @(0x1022) LocalMemoryType localMemoryType;\n        @(0x1023) ulong localMemorySize;\n        @(0x1024) bool errorCorrectionSupport;\n        @(0x1025) size_t profilingTimerResolution;\n        @(0x1026) bool endianLittle;\n        @(0x1027) bool available;\n        @(0x1028) bool compilerAvailable;\n        @(0x1029) ExecutionCapabilities executionCapabilities;\n        @(0x102A) Queue.Properties queueProperties;\n        @(0x102B) char* _name;\n        @(0x102C) char* _vendor;\n        @(0x102D) char* _driverVersion;\n        @(0x102E) char* _profile;\n        @(0x102F) char* _deviceVersion;\n        @(0x1030) char* _extensions;\n        \n        StringzAccessor!(_name) name;\n        StringzAccessor!(_vendor) vendor;\n        StringzAccessor!(_driverVersion) driverVersion;\n        
StringzAccessor!(_profile) profile;\n        StringzAccessor!(_deviceVersion) deviceVersion;\n        StringzAccessor!(_extensions) extensions;\n        \n        @(0x1031) Platform platform;\n        @(0x1032) FPConfig doubleFPConfig;\n        @(0x1033) FPConfig halfFPConfig;\n        @(0x1034) uint pefferedVectorWidthHalf;\n        @(0x1035) bool hostUnifiedMemory;\n        @(0x1036) uint nativeVectorWidthByte;\n        @(0x1037) uint nativeVectorWidthShort;\n        @(0x1038) uint nativeVectorWidthInt;\n        @(0x1039) uint nativeVectorWidthLong;\n        @(0x103A) uint nativeVectorWidthFloat;\n        @(0x103B) uint nativeVectorWidthDouble;\n        @(0x103C) uint nativeVectorWidthHalf;\n        @(0x103D) char* _OpenCLCVersion;\n        StringzAccessor!(_OpenCLCVersion) OpenCLCVersion;\n        @(0x103E) bool linkerAvailable;\n        @(0x103F) char* _builtinKernels;\n        StringzAccessor!(_builtinKernels) builtinKernels;\n        @(0x1040) size_t imageMaxBufferSize;\n        @(0x1041) size_t imageMaxArraySize;\n        @(0x1042) Device parentDevice;\n        @(0x1043) uint partitionMaxSubDevices;\n        //@(0x1044) PartitionProperty* _partitionProperties;\n        //ZeroTerminatedArrayAccessor!(_partitionProperties) partitionProperties;\n        @(0x1045) AffinityDomain partitionAffinityDomain;\n        //@(0x1046) PartitionProperty* _partitionType;\n        //ZeroTerminatedArrayAccessor!(_partitionType) partitionType;\n        @(0x1047) uint peferenceCount;\n        @(0x1048) bool prefferedInteropUserSync;\n        @(0x1049) size_t printfBufferSize;\n        \n        // Extensions\n        //@(0x200F) khrTeminateCapability;\n        //@(0x4000) nvComputeCapabilityMajor;\n        //@(0x4001) nvComputeCapabilityMinor;\n        //@(0x4002) nvRegistersPerBlock;\n        //@(0x4003) nvWarpSize;\n        //@(0x4004) nvGPUOverlap;\n        //@(0x4005) nvKerenlExecTimeout;\n        //@(0x4006) nvIntegratedMemory;\n        \n        //@(0x4036) 
amdProfilingTimerOffset\n    }\n    \n    cl_device_id raw;\n\n    mixin(generateGetInfo!(Info,clGetDeviceInfo));\n    \n    //Is this a double-call function? Also, what to do about properties?\n    //The list is zero-terminated and can contain numbers;\n    //see http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clCreateSubDevices.html under the examples\n    /*Device[] createSubDevices(cl.device_partition_property[] properties,\n                                cl.uint_ numSubDevices)\n    {\n        \n    }\n    */\n\n    void retain()\n    {\n        status = cast(Status)clRetainDevice(raw);\n        checkErrors();\n    }\n    \n    void release()\n    {\n        status = cast(Status)clReleaseDevice(raw);\n        checkErrors();\n    }\n    \n}\n"
  },
  {
    "path": "source/dcompute/driver/ocl/event.d",
    "content": "module dcompute.driver.ocl.event;\n\nimport dcompute.driver.ocl;\n\nstruct Event\n{\n    cl_event raw;\n    enum EnqueuedCommand\n    {\n        kernel            = 0x11F0,\n        task              = 0x11F1,\n        nativeKernel      = 0x11F2,\n        bufferRead        = 0x11F3,\n        bufferWrite       = 0x11F4,\n        bufferCopy        = 0x11F5,\n        imageRead         = 0x11F6,\n        imageWrite        = 0x11F7,\n        imageCopy         = 0x11F8,\n        imageToBufferCopy = 0x11F9,\n        bufferToImageCopy = 0x11FA,\n        bufferMap         = 0x11FB,\n        imageMap          = 0x11FC,\n        unmap             = 0x11FD,\n        marker            = 0x11FE,\n        acquireGLObjects  = 0x11FF,\n        releaseGLObjects  = 0x1200,\n        bufferRectRead    = 0x1201,\n        bufferRectWrite   = 0x1202,\n        bufferRectCopy    = 0x1203,\n        user              = 0x1204,\n        barrier           = 0x1205,\n        migrate           = 0x1206,\n        bufferFill        = 0x1207,\n        imageFill         = 0x1208,\n        \n        // Extensions\n        acquireD3D10Objects = 0x4017,\n        releaseD3D10Objects = 0x4018,\n        acquireDX9MediaSurfaces = 0x202B,\n        releaseDX9MediaSurfaces = 0x202C,\n        acquireD3D11Objects = 0x4020,\n        releaseD3D11Objects = 0x4021,\n        GLFenceSyncObject   = 0x200D,\n        EGLFenceSyncObject  = 0x202F,\n        acquireEGLObjects   = 0x202D,\n        releaseEGLObjects   = 0x202E,\n\n    }\n    \n    \n    enum EcexutionStatus\n    {\n        complete  = 0x0,\n        running   = 0x1,\n        submitted = 0x2,\n        queued    = 0x3,\n    }\n    static struct Info\n    {\n        @(0x11D0) Queue queue;\n        @(0x11D1) EnqueuedCommand type;\n        @(0x11D2) uint referenceCount;\n        @(0x11D3) EcexutionStatus status;\n        @(0x11D4) Context context;\n    }\n    //mixin(generateGetInfo!(Info,clGetEventInfo));\n    \n    void retain()\n    {\n        
status = cast(Status)clRetainEvent(raw);\n        checkErrors();\n    }\n    \n    void release()\n    {\n        status = cast(Status)clReleaseEvent(raw);\n        checkErrors();\n    }\n    void wait()\n    {\n        status = cast(Status)clWaitForEvents(1,&raw);\n        checkErrors();\n    }\n}\n\nvoid wait(Event[] e)\n{\n    status = cast(Status)clWaitForEvents(cast(uint)e.length,cast(cl_event*)e.ptr);\n    checkErrors();\n}\n"
  },
  {
    "path": "source/dcompute/driver/ocl/image.d",
    "content": "module dcompute.driver.ocl.image;\n\nimport dcompute.driver.ocl;\nstruct Image\n{\n    cl_mem raw;\n    \n    enum ChannelOrder\n    {\n        r            = 0x10B0,\n        a            = 0x10B1,\n        rg           = 0x10B2,\n        ra           = 0x10B3,\n        rgb          = 0x10B4,\n        rgba         = 0x10B5,\n        bgra         = 0x10B6,\n        argb         = 0x10B7,\n        intesity     = 0x10B8,\n        luminance    = 0x10B9,\n        Rx           = 0x10BA,\n        RGx          = 0x10BB,\n        RGBx         = 0x10BC,\n        depth        = 0x10BD,\n        depthStencil = 0x10BE,\n    }\n    \n    enum  ChannelType\n    {\n        snormInt8      = 0x10D0,\n        snormInt16     = 0x10D1,\n        unormInt8      = 0x10D2,\n        unormInt16     = 0x10D3,\n        uormShort565   = 0x10D4,\n        uormShort555   = 0x10D5,\n        unormInt101010 = 0x10D6,\n        byte_          = 0x10D7,\n        short_         = 0x10D8,\n        int_           = 0x10D9,\n        ubyte_         = 0x10DA,\n        ushort_        = 0x10DB,\n        uint_          = 0x10DC,\n        half_          = 0x10DD,\n        float_         = 0x10DE,\n        unormInt24     = 0x10DF,\n    }\n    static struct Format\n    {\n        ChannelOrder order;\n        ChannelType  type;\n    }\n    static struct Info\n    {\n        @(0x1110) Format format;\n        @(0x1111) size_t elementSize;\n        @(0x1112) size_t rowPitch;\n        @(0x1113) size_t slicePitch;\n        @(0x1114) size_t width;\n        @(0x1115) size_t height;\n        @(0x1116) size_t depth;\n        @(0x1117) size_t arraySize;\n        @(0x1118) Memory memory;\n        @(0x1119) uint mipLevels;\n        @(0x111A) uint samples;\n        \n        // Extensions\n        //@(0x4016) D3D10_SUBRESOURCE_KHR\n        //@(0x401F) D3D11_SUBRESOURCE_KHR\n        //@(0x202A) DX9_MEDIA_PLANE_KHR\n    }\n    //mixin(generateGetInfo!(Info,clGetImageInfo));\n}\n"
  },
  {
    "path": "source/dcompute/driver/ocl/kernel.d",
    "content": "module dcompute.driver.ocl.kernel;\n\nimport dcompute.driver.ocl;\n\nstruct Kernel(F) if (is(F == function) || is(F==void))\n{\n    cl_kernel raw;\n    \n    static struct Info\n    {\n        @(0x1190) immutable char* _name;\n        StringzAccessor!(_name) name;\n        @(0x1191) uint numArgs;\n        @(0x1192) uint referenceCount;\n        @(0x1193) Context context;\n        @(0x1194) Program program;\n        @(0x1195) immutable char* _attributes;\n        StringzAccessor!(_attributes) attributes;\n    }\n    //mixin(generateGetInfo!(Info,clGetKernelInfo));\n    void retain()\n    {\n        status = cast(Status)clRetainKernel(raw);\n        checkErrors();\n    }\n    \n    void release()\n    {\n        status = cast(Status)clReleaseKernel(raw);\n        checkErrors();\n    }\n    \n    void setArg(T)(uint index, T val, const bool isPrivate = false)\n    {\n\t\tstatic if (__traits(hasMember, T, \"raw\")) {\n\t\t\tstatus = cast(Status)clSetKernelArg(this.raw, index, cl_mem.sizeof, (isPrivate ? null : &val.raw));\n\t\t} else {\n\t\t\tstatus = cast(Status)clSetKernelArg(this.raw, index, T.sizeof, (isPrivate ? 
null : &val));\n\t\t}\n        checkErrors();\n    }\n}\n\nstruct Arg\n{\n    Kernel!void kernel;\n    uint argIndex;\n    enum AddressQualifier\n    {\n        global   = 0x119B,\n        local    = 0x119C,\n        constant = 0x119D,\n        private_ = 0x119E,\n    }\n    \n    enum AccessQualifier\n    {\n        readOnly  = 0x11A0,\n        writeOnly = 0x11A1,\n        readWrite = 0x11A2,\n        none      = 0x11A3,\n    }\n    \n    enum TypeQualifier\n    {\n        none     = 0,\n        const_   = 1 << 0,\n        restrict = 1 << 1,\n        volatile = 1 << 2,\n    }\n    \n    static struct Info\n    {\n        @(0x1196) AddressQualifier addressQualifier;\n        @(0x1197) AccessQualifier accessQualifier;\n        @(0x1198) immutable char* _typeName;\n        StringzAccessor!(_typeName) typeName;\n        @(0x1199) TypeQualifier typeQualifier;\n        @(0x119A) immutable char* _name;\n        StringzAccessor!(_name) name;\n    }\n    \n    //mixin(generateGetInfo!(Info,clGetKernelArgInfo,\"kernel.raw,argIndex\"));\n}\n\nstruct WorkGroup\n{\n    Kernel!void kernel;\n    Device device;\n    static struct Info\n    {\n        @(0x11B0) size_t workGroupSize;\n        @(0x11B1) size_t[3] requiredWorkGroupSize;\n        @(0x11B2) ulong localMemorySize;\n        @(0x11B3) size_t preferredWorkGroupSizeMultiple;\n        @(0x11B4) ulong privateMemSize;\n        @(0x11B5) size_t[3] globalWorkSize;\n    }\n    \n    //mixin(generateGetInfo!(Info,clGetKernelWorkGroupInfo,\"kernel.raw,device.raw\"));\n}\n"
  },
  {
    "path": "source/dcompute/driver/ocl/memory.d",
    "content": "module dcompute.driver.ocl.memory;\n\nimport dcompute.driver.ocl;\n\nstruct Memory\n{\n    enum Type\n    {\n        buffer         = 0x10F0,\n        image2D        = 0x10F1,\n        image3D        = 0x10F2,\n        image2Darray   = 0x10F3,\n        image1D        = 0x10F4,\n        image1Darray   = 0x10F5,\n        image1Dbuffer = 0x10F6,\n    }\n    \n    enum Flags\n    {\n        none                = 0,\n        readWrite           = 1 << 0,\n        writeOnly           = 1 << 1,\n        readOnly            = 1 << 2,\n        useHostPointer      = 1 << 3,\n        allocateHostPointer = 1 << 4,\n        copyHostPointer     = 1 << 5,\n        //reserved            1 << 6,\n        hostReadWrite       = 1 << 7,\n        hostReadOnly        = 1 << 8,\n        hostNoAccess        = 1 << 9,\n    }\n    \n    static struct Info\n    {\n        @(0x1100) Type type;\n        @(0x1101) Flags flags;\n        @(0x1102) size_t size;\n        @(0x1103) void* hostPointer;\n        @(0x1104) uint mapCount;\n        @(0x1105) uint referenceCount;\n        @(0x1106) Context context;\n        @(0x1107) Memory associatedMemory;\n        @(0x1108) size_t offset;\n        \n        // Extensions\n        //@(0x4015) D3D10_RESOURCE_KHR\n        //@(0x401E) D3D10_RESOURCE_KHR\n        //@(0x2028) DX9_MEDIA_ADAPTER_TYPE_KHR\n        //@(0x2029) DX9_MEDIA_SURFACE_INFO_KHR\n    }\n    cl_mem raw;\n    \n    //mixin(generateGetInfo!(Info,clGetMemObjectInfo));\n    void retain()\n    {\n        status = cast(Status)clRetainMemObject(raw);\n        checkErrors();\n    }\n    void release()\n    {\n        status = cast(Status)clReleaseMemObject(raw);\n        checkErrors();\n    }\n}\n"
  },
  {
    "path": "source/dcompute/driver/ocl/package.d",
    "content": "module dcompute.driver.ocl;\n\npublic import dcompute.driver.error;\n\npublic import dcompute.driver.ocl.buffer;\npublic import dcompute.driver.ocl.context;\npublic import dcompute.driver.ocl.device;\npublic import dcompute.driver.ocl.event;\npublic import dcompute.driver.ocl.image;\npublic import dcompute.driver.ocl.kernel;\npublic import dcompute.driver.ocl.memory;\npublic import dcompute.driver.ocl.platform;\npublic import dcompute.driver.ocl.program;\npublic import dcompute.driver.ocl.queue;\npublic import dcompute.driver.ocl.raw;\npublic import dcompute.driver.ocl.sampler;\npublic import dcompute.driver.ocl.util;\n"
  },
  {
    "path": "source/dcompute/driver/ocl/platform.d",
    "content": "module dcompute.driver.ocl.platform;\n\nimport dcompute.driver.ocl;\nimport std.experimental.allocator.typed;\nimport std.meta: AliasSeq;\n\nstruct Platform\n{\n\tstatic void initialise()\n\t{\n\t\tDerelictCL.load();\n\t}\n    static struct Info\n    {\n        @(0x0900) immutable(char)* _profile;\n        @(0x0901) immutable(char)* _version_;\n        @(0x0902) immutable(char)* _name;\n        @(0x0903) immutable(char)* _vendor;\n        @(0x0904) immutable(char)* _extensions;\n        StringzAccessor!(_profile) profile;\n        StringzAccessor!(_version_) version_;\n        StringzAccessor!(_name) name;\n        StringzAccessor!(_vendor) vendor;\n        StringzAccessor!(_extensions) extensions;\n        // Extensions\n        //@(0x0920) khrICDSuffix;\n\n    }\n\n    mixin(generateGetInfo!(Info,clGetPlatformInfo));\n\n    cl_platform_id raw;\n    static Platform[] getPlatforms(A)(A a)\n    {\n        auto allocator = TypedAllocator!(A)(a);\n        cl_uint numPlatforms;\n        status = cast(Status)clGetPlatformIDs(0,null,&numPlatforms);\n        checkErrors();\n        cl_platform_id[] ret = allocator.makeArray!(cl_platform_id)(numPlatforms);\n        status = cast(Status)clGetPlatformIDs(numPlatforms,cast(cl_platform_id*)ret.ptr,null);\n        checkErrors();\n        return cast(Platform[])ret;\n    }\n    \n    Device[] getDevices(A)(A a,Device.Type device_type = Device.Type.all)\n    {\n        auto allocator = TypedAllocator!(A)(a);\n        uint numDevices;\n        status = cast(Status)clGetDeviceIDs(\n            raw,\n            cast(cl_device_type)device_type,\n            0,\n            null,\n            &numDevices);\n        \n        auto deviceIDs = allocator.makeArray!cl_device_id(numDevices);\n        \n        status = cast(Status)clGetDeviceIDs(\n            raw,\n            cast(cl_device_type)device_type,\n            numDevices,\n            deviceIDs.ptr,\n            null);\n        \n        return 
cast(Device[])deviceIDs;\n    }\n    \n    // clGetExtensionFunctionAddressForPlatform\n}\n"
  },
  {
    "path": "source/dcompute/driver/ocl/program.d",
    "content": "module dcompute.driver.ocl.program;\n\nimport dcompute.driver.ocl;\nimport std.meta: AliasSeq;\nimport std.string : toStringz;\n\nstruct Program\n{\n    static struct Info\n    {\n        @(0x1160) uint referneceCount;\n        @(0x1161) Context context;\n        \n        @(0x1162) uint _numDevices;\n        @(0x1163) Device* _devices;\n        ArrayAccesssor!(_devices,_numDevices) devices;\n        \n        @(0x1164) char* _source;\n        StringzAccessor!(_source) source;\n        \n        @(0x1165) size_t* _binarySizes;\n        @(0x1166) ubyte** _binaries;\n        @(0x1167) size_t  _numKernels;\n        ArrayAccesssor2D!(_binaries,_binarySizes,_numKernels) binaries;\n        \n        @(0x1168) char* _kernelNames;\n        StringzAccessor!(_kernelNames) kernelNames;\n    }\n    static Program globalProgram;\n    cl_program raw;\n    mixin(generateGetInfo!(Info,clGetProgramInfo));\n    void retain()\n    {\n        status = cast(Status)clRetainProgram(raw);\n        checkErrors();\n    }\n    \n    void release()\n    {\n        status = cast(Status)clReleaseProgram(raw);\n        checkErrors();\n    }\n    void build(Device[] devices, string options)\n    {\n        status = cast(Status)clBuildProgram(raw,\n                                cast(uint)devices.length,cast(cl_device_id*)devices.ptr,\n                                options.toStringz,\n                                null,null);\n        checkErrors();\n    }\n    \n    Kernel!(typeof(sym)) getKernel(alias sym)()\n    {\n        Kernel!void ret = getKernel(sym.mangleof);\n        return cast(typeof(return))ret;\n    }\n    Kernel!void getKernel(string name)\n    {\n        Kernel!void ret;\n        ret.raw = clCreateKernel(this.raw,name.toStringz,cast(int*)&status);\n        checkErrors();\n        return ret;\n    }\n    \n}\n\n\n\nstruct Build\n{\n    Program program;\n    Device  device;\n    enum  BinaryType\n    {\n        none       = 0x0,\n        object     = 0x1,\n       
 library    = 0x2,\n        executable = 0x4,\n    }\n    \n    enum Status\n    {\n        success    =  0,\n        none       = -1,\n        error      = -2,\n        inProgress = -3,\n    }\n    \n    static struct Info\n    {\n        @(0x1181) Status status;\n        @(0x1182) char* _options;\n        StringzAccessor!(_options) options;\n        @(0x1183) char* _log;\n        StringzAccessor!(_log) log;\n        @(0x1184) BinaryType binaryType;\n    }\n    mixin(generateGetInfo!(Info,clGetProgramBuildInfo,\"program.raw,device.raw\"));\n}\n"
  },
  {
    "path": "source/dcompute/driver/ocl/queue.d",
    "content": "module dcompute.driver.ocl.queue;\n\nimport dcompute.driver.ocl;\nimport dcompute.driver.util;\nimport std.typecons;\n\nenum MapBufferFlags\n{\n    read                  = 1 << 0,\n    write                 = 1 << 1,\n    writeInvaildateRegion = 1 << 2,\n}\n\nenum  MemoryMigrationFlags\n{\n    host             = 1 << 0,\n    contentUndefined = 1 << 1,\n}\n\nstruct Queue\n{\n    cl_command_queue raw;\n    // constructed from context\n    \n    enum Properties : cl_bitfield\n    {\n        outOfOrderExecution = 1 << 0,\n        profiling = 1 << 1\n    }\n    static struct Info\n    {\n        @(0x1090) Context context;\n        @(0x1091) Device device;\n        @(0x1092) uint referenceCount;\n        @(0x1093) Properties properties;\n    }\n    \n    //mixin(generateGetInfo!(Info,clGetCommandQueueInfo));\n    \n    void retain()\n    {\n        status = cast(Status)clRetainCommandQueue(raw);\n        checkErrors();\n    }\n    \n    void release()\n    {\n        status = cast(Status)clReleaseCommandQueue(raw);\n        checkErrors();\n    }\n    \n    Event write(T)(Buffer!T buffer, T[] data,\n                   Flag!\"Blocking\" blocking = Yes.Blocking,\n                   size_t offset = 0, const Event[] waitList = null)\n    {\n        Event ret;\n        status = cast(Status)clEnqueueWriteBuffer(this.raw, buffer.raw, cast(cl_bool)blocking, offset,\n                                      data.memSize, cast(void*)data.ptr,\n                                      cast(cl_uint)waitList.length, cast(cl_event*)waitList.ptr,\n                                      &ret.raw);\n        checkErrors();\n        return ret;\n                    \n    }\n    \n    Event read(T)(Buffer!T buffer, T[] data,\n                  Flag!\"Blocking\" blocking = Yes.Blocking,\n                  size_t offset = 0, const Event[] waitList = null)\n    {\n        Event ret;\n        status = cast(Status)clEnqueueReadBuffer(this.raw, buffer.raw, cast(cl_bool)blocking, offset,\n 
                                    data.memSize, cast(void*)data.ptr,\n                                     cast(cl_uint)waitList.length, cast(cl_event*)waitList.ptr,\n                                     &ret.raw);\n        checkErrors();\n        return ret;\n    }\n    \n    auto enqueue(alias k)(const size_t[] globalWorkSize,\n                        const size_t[] globalWorkOffset = null, const size_t[] localWorkSize = null,\n                        const Event[] waitList = null)\n    in\n    {\n        if(globalWorkOffset)\n            assert(globalWorkSize.length == globalWorkOffset.length);\n        if(localWorkSize)\n            assert(globalWorkSize.length == localWorkSize.length);\n    }\n    do\n    {\n        static struct Call\n        {\n            Queue q;\n            const size_t[] globalWorkSize, globalWorkOffset,localWorkSize;\n            const Event[] waitList;\n            this(Queue _q,const size_t[] a, const size_t[] b, const size_t[] c, const Event[] d)\n            {\n                q = _q;\n                globalWorkSize = a;\n                globalWorkOffset = b;\n                localWorkSize = c;\n                waitList = d;\n            }\n            Event opCall(HostArgsOf!(typeof(k)) args)\n            {\n                auto kernel = Program.globalProgram.getKernel!k();\n                foreach(uint i, a; args)\n                {\n                    kernel.setArg(i,a);\n                }\n                Event e;\n                clEnqueueNDRangeKernel(q.raw, kernel.raw,\n                                       cast(uint)globalWorkSize.length,\n                                       globalWorkOffset.ptr, globalWorkSize.ptr, localWorkSize.ptr,\n                                       cast(uint)waitList.length, cast(cl_event*)waitList.ptr,\n                                       &e.raw);\n                kernel.release();\n                return e;\n            }\n        }\n        \n        return 
Call(this,globalWorkSize,globalWorkOffset,localWorkSize,waitList);\n    }\n    \n    Queue flush()\n    {\n        clFlush(this.raw);\n        return this;\n    }\n\n    Queue finish()\n    {\n        clFinish(this.raw);\n        return this;\n    }\n    //TODO: fill, copy, marker, barrier [, rectFill, rect Copy]\n\n    \n}\n"
  },
  {
    "path": "source/dcompute/driver/ocl/raw/enums.d",
    "content": "module dcompute.driver.ocl.raw.enums;\n\nimport dcompute.driver.ocl;\n\nenum //: profiling_info\n{\n    PROFILING_COMMAND_QUEUED = 0x1280,\n    PROFILING_COMMAND_SUBMIT = 0x1281,\n    PROFILING_COMMAND_START  = 0x1282,\n    PROFILING_COMMAND_END    = 0x1283,\n}\n\n// device_partition_property_ext extension\nenum\n{\n    DEVICE_PARTITION_EQUALLY_EXT             = 0x4050,\n    DEVICE_PARTITION_BY_COUNTS_EXT           = 0x4051,\n    DEVICE_PARTITION_BY_NAMES_EXT            = 0x4052,\n    DEVICE_PARTITION_BY_AFFINITY_DOMAIN_EXT  = 0x4053,\n}\n\n// clDeviceGetInfo selectors\nenum\n{\n    DEVICE_PARENT_DEVICE_EXT                 = 0x4054,\n    DEVICE_PARTITION_TYPES_EXT               = 0x4055,\n    DEVICE_AFFINITY_DOMAINS_EXT              = 0x4056,\n    DEVICE_REFERENCE_COUNT_EXT               = 0x4057,\n    DEVICE_PARTITION_STYLE_EXT               = 0x4058,\n}\n\n// AFFINITY_DOMAINs\nenum\n{\n    AFFINITY_DOMAIN_L1_CACHE_EXT             = 0x1,\n    AFFINITY_DOMAIN_L2_CACHE_EXT             = 0x2,\n    AFFINITY_DOMAIN_L3_CACHE_EXT             = 0x3,\n    AFFINITY_DOMAIN_L4_CACHE_EXT             = 0x4,\n    AFFINITY_DOMAIN_NUMA_EXT                 = 0x10,\n    AFFINITY_DOMAIN_NEXT_FISSIONABLE_EXT     = 0x100,\n}\n\n// device_partition_property_ext list terminators\nenum\n{\n    PROPERTIES_LIST_END_EXT          =  0,\n    PARTITION_BY_COUNTS_LIST_END_EXT =  0,\n    PARTITION_BY_NAMES_LIST_END_EXT  =  0 - 1,\n}\n\n\n// egl.h\n\n// gl.h\n\n// gl_object_type\nenum\n{\n    GL_OBJECT_BUFFER                         = 0x2000,\n    GL_OBJECT_TEXTURE2D                      = 0x2001,\n    GL_OBJECT_TEXTURE3D                      = 0x2002,\n    GL_OBJECT_RENDERBUFFER                   = 0x2003,\n    GL_OBJECT_TEXTURE2D_ARRAY                = 0x200E,\n    GL_OBJECT_TEXTURE1D                      = 0x200F,\n    GL_OBJECT_TEXTURE1D_ARRAY                = 0x2010,\n    GL_OBJECT_TEXTURE_BUFFER                 = 0x2011,\n}\n\n// gl_texture_info\nenum\n{\n    
GL_TEXTURE_TARGET                        = 0x2004,\n    GL_MIPMAP_LEVEL                          = 0x2005,\n    GL_NUM_SAMPLES                           = 0x2012,\n}\n\n// gl_context_info\nenum\n{\n    CURRENT_DEVICE_FOR_GL_CONTEXT_KHR        = 0x2006,\n    DEVICES_FOR_GL_CONTEXT_KHR               = 0x2007,\n}\n\n\n// d3d10_device_source_khr\nenum\n{\n    D3D10_DEVICE_KHR                             = 0x4010,\n    D3D10_DXGI_ADAPTER_KHR                       = 0x4011,\n}\n\n// d3d10_device_set_khr\nenum\n{\n    PREFERRED_DEVICES_FOR_D3D10_KHR              = 0x4012,\n    ALL_DEVICES_FOR_D3D10_KHR                    = 0x4013,\n}\n\n// d3d11_device_source_khr\nenum\n{\n    D3D11_DEVICE_KHR                             = 0x4019,\n    D3D11_DXGI_ADAPTER_KHR                       = 0x401A,\n}\n\n// d3d11_device_set_khr\nenum\n{\n    PREFERRED_DEVICES_FOR_D3D11_KHR              = 0x401B,\n    ALL_DEVICES_FOR_D3D11_KHR                    = 0x401C,\n}\n\n// media_adapter_type_khr\nenum\n{\n    ADAPTER_D3D9_KHR                             = 0x2020,\n    ADAPTER_D3D9EX_KHR                           = 0x2021,\n    ADAPTER_DXVA_KHR                             = 0x2022,\n}\n\n// media_adapter_set_khr\nenum\n{\n    PREFERRED_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR  = 0x2023,\n    ALL_DEVICES_FOR_DX9_MEDIA_ADAPTER_KHR        = 0x2024,\n}\n\n"
  },
  {
    "path": "source/dcompute/driver/ocl/raw/functions.d",
    "content": "module dcompute.driver.ocl.raw.functions;\n\n//This is an autogenerated file, do not edit\n\n\nimport dcompute.driver.ocl;\n//nothrow: @nogc:\n\n/*\nauto getEventProfilingInfo(event a, profiling_info b, size_t c, void* d, size_t* e)\n{\n    debug assert(clGetEventProfilingInfo);\n    auto ret = cast(int)clGetEventProfilingInfo(cast(cl_event)a, cast(cl_profiling_info)b, cast(size_t)c, cast(void*)d, cast(size_t*)e);\n    return ret;\n}\n\nauto enqueueCopyBuffer(command_queue a, mem b, mem c, size_t d, size_t e, size_t f, uint g, const(event*) h, event* i)\n{\n    debug assert(clEnqueueCopyBuffer);\n    auto ret = cast(int)clEnqueueCopyBuffer(cast(cl_command_queue)a, cast(cl_mem)b, cast(cl_mem)c, cast(size_t)d, cast(size_t)e, cast(size_t)f, cast(cl_uint)g, cast(const(cl_event*))h, cast(cl_event*)i);\n    return ret;\n}\n\nauto enqueueReadImage(command_queue a, mem b, bool c, const(size_t*) d, const(size_t*) e, size_t f, size_t g, void* h, uint i, const(event*) j, event* k)\n{\n    debug assert(clEnqueueReadImage);\n    auto ret = cast(int)clEnqueueReadImage(cast(cl_command_queue)a, cast(cl_mem)b, cast(cl_bool)c, cast(const(size_t*))d, cast(const(size_t*))e, cast(size_t)f, cast(size_t)g, cast(void*)h, cast(cl_uint)i, cast(const(cl_event*))j, cast(cl_event*)k);\n    return ret;\n}\n\nauto enqueueWriteImage(command_queue a, mem b, bool c, const(size_t*) d, const(size_t*) e, size_t f, size_t g, const(void*) h, uint i, const(event*) j, event* k)\n{\n    debug assert(clEnqueueWriteImage);\n    auto ret = cast(int)clEnqueueWriteImage(cast(cl_command_queue)a, cast(cl_mem)b, cast(cl_bool)c, cast(const(size_t*))d, cast(const(size_t*))e, cast(size_t)f, cast(size_t)g, cast(const(void*))h, cast(cl_uint)i, cast(const(cl_event*))j, cast(cl_event*)k);\n    return ret;\n}\n\nauto enqueueCopyImage(command_queue a, mem b, mem c, const(size_t*) d, const(size_t*) e, const(size_t*) f, uint g, const(event*) h, event* i)\n{\n    debug assert(clEnqueueCopyImage);\n    auto ret 
= cast(int)clEnqueueCopyImage(cast(cl_command_queue)a, cast(cl_mem)b, cast(cl_mem)c, cast(const(size_t*))d, cast(const(size_t*))e, cast(const(size_t*))f, cast(cl_uint)g, cast(const(cl_event*))h, cast(cl_event*)i);\n    return ret;\n}\n\nauto enqueueCopyImageToBuffer(command_queue a, mem b, mem c, const(size_t*) d, const(size_t*) e, size_t f, uint g, const(event*) h, event* i)\n{\n    debug assert(clEnqueueCopyImageToBuffer);\n    auto ret = cast(int)clEnqueueCopyImageToBuffer(cast(cl_command_queue)a, cast(cl_mem)b, cast(cl_mem)c, cast(const(size_t*))d, cast(const(size_t*))e, cast(size_t)f, cast(cl_uint)g, cast(const(cl_event*))h, cast(cl_event*)i);\n    return ret;\n}\n\nauto enqueueCopyBufferToImage(command_queue a, mem b, mem c, size_t d, const(size_t*) e, const(size_t*) f, uint g, const(event*) h, event* i)\n{\n    debug assert(clEnqueueCopyBufferToImage);\n    auto ret = cast(int)clEnqueueCopyBufferToImage(cast(cl_command_queue)a, cast(cl_mem)b, cast(cl_mem)c, cast(size_t)d, cast(const(size_t*))e, cast(const(size_t*))f, cast(cl_uint)g, cast(const(cl_event*))h, cast(cl_event*)i);\n    return ret;\n}\n\nauto enqueueMapBuffer(command_queue a, mem b, bool c, map_flags d, size_t e, size_t f, uint g, const(event*) h, event* i, int* j)\n{\n    debug assert(clEnqueueMapBuffer);\n    auto ret = cast(void*)clEnqueueMapBuffer(cast(cl_command_queue)a, cast(cl_mem)b, cast(cl_bool)c, cast(cl_map_flags)d, cast(size_t)e, cast(size_t)f, cast(cl_uint)g, cast(const(cl_event*))h, cast(cl_event*)i, cast(cl_int*)j);\n    return ret;\n}\n\nauto enqueueMapImage(command_queue a, mem b, bool c, map_flags d, const(size_t*) e, const(size_t*) f, size_t* g, size_t* h, uint i, const(event*) j, event* k, int* l)\n{\n    debug assert(clEnqueueMapImage);\n    auto ret = cast(void*)clEnqueueMapImage(cast(cl_command_queue)a, cast(cl_mem)b, cast(cl_bool)c, cast(cl_map_flags)d, cast(const(size_t*))e, cast(const(size_t*))f, cast(size_t*)g, cast(size_t*)h, cast(cl_uint)i, cast(const(cl_event*))j, 
cast(cl_event*)k, cast(cl_int*)l);\n    return ret;\n}\n\nauto enqueueUnmapMemObject(command_queue a, mem b, void* c, uint d, const(event*) e, event* f)\n{\n    debug assert(clEnqueueUnmapMemObject);\n    auto ret = cast(int)clEnqueueUnmapMemObject(cast(cl_command_queue)a, cast(cl_mem)b, cast(void*)c, cast(cl_uint)d, cast(const(cl_event*))e, cast(cl_event*)f);\n    return ret;\n}\n\nauto enqueueNDRangeKernel(command_queue a, kernel b, uint c, const(size_t*) d, const(size_t*) e, const(size_t*) f, uint g, const(event*) h, event* i)\n{\n    debug assert(clEnqueueNDRangeKernel);\n    auto ret = cast(int)clEnqueueNDRangeKernel(cast(cl_command_queue)a, cast(cl_kernel)b, cast(cl_uint)c, cast(const(size_t*))d, cast(const(size_t*))e, cast(const(size_t*))f, cast(cl_uint)g, cast(const(cl_event*))h, cast(cl_event*)i);\n    return ret;\n}\n\nauto enqueueTask(command_queue a, kernel b, uint c, const(event*) d, event* e)\n{\n    debug assert(clEnqueueTask);\n    auto ret = cast(int)clEnqueueTask(cast(cl_command_queue)a, cast(cl_kernel)b, cast(cl_uint)c, cast(const(cl_event*))d, cast(cl_event*)e);\n    return ret;\n}\n\nextern(System) alias enqueueNativeKernel_FuncAlias = void function(void*);\nauto enqueueNativeKernel(command_queue a, enqueueNativeKernel_FuncAlias b, void* c, size_t d, uint e, const(mem*) f, const(void*)* g, uint h, const(event*) i, event* j)\n{\n    debug assert(clEnqueueNativeKernel);\n    auto ret = cast(int)clEnqueueNativeKernel(cast(cl_command_queue)a, cast(enqueueNativeKernel_FuncAlias)b, cast(void*)c, cast(size_t)d, cast(cl_uint)e, cast(const(cl_mem*))f, cast(const(void*)*)g, cast(cl_uint)h, cast(const(cl_event*))i, cast(cl_event*)j);\n    return ret;\n}\n\nauto setCommandQueueProperty(command_queue a, command_queue_properties b, bool c, command_queue_properties* d)\n{\n    debug assert(clSetCommandQueueProperty);\n    auto ret = cast(int)clSetCommandQueueProperty(cast(cl_command_queue)a, cast(cl_command_queue_properties)b, cast(cl_bool)c, 
cast(cl_command_queue_properties*)d);\n    return ret;\n}\n\nauto createSubBuffer(mem a, mem_flags b, buffer_create_type c, const(void*) d, int* e)\n{\n    debug assert(clCreateSubBuffer);\n    auto ret = cast(mem)clCreateSubBuffer(cast(cl_mem)a, cast(cl_mem_flags)b, cast(cl_buffer_create_type)c, cast(const(void*))d, cast(cl_int*)e);\n    return ret;\n}\n\nextern(System) alias setMemObjectDestructorCallback_FuncAlias = void function(cl_mem, void*);\nauto setMemObjectDestructorCallback(mem a, setMemObjectDestructorCallback_FuncAlias b, void* c)\n{\n    debug assert(clSetMemObjectDestructorCallback);\n    auto ret = cast(int)clSetMemObjectDestructorCallback(cast(cl_mem)a, cast(setMemObjectDestructorCallback_FuncAlias)b, cast(void*)c);\n    return ret;\n}\n\nauto createUserEvent(context a, int* b)\n{\n    debug assert(clCreateUserEvent);\n    auto ret = cast(event)clCreateUserEvent(cast(cl_context)a, cast(cl_int*)b);\n    return ret;\n}\n\nauto setUserEventStatus(event a, int b)\n{\n    debug assert(clSetUserEventStatus);\n    auto ret = cast(int)clSetUserEventStatus(cast(cl_event)a, cast(cl_int)b);\n    return ret;\n}\n\nextern(System) alias setEventCallback_FuncAlias = void function(cl_event, cl_int, void*);\nauto setEventCallback(event a, int b, setEventCallback_FuncAlias c, void* d)\n{\n    debug assert(clSetEventCallback);\n    auto ret = cast(int)clSetEventCallback(cast(cl_event)a, cast(cl_int)b, cast(setEventCallback_FuncAlias)c, cast(void*)d);\n    return ret;\n}\n\nauto enqueueReadBufferRect(command_queue a, mem b, bool c, const(size_t*) d, const(size_t*) e, const(size_t*) f, size_t g, size_t h, size_t i, size_t j, void* k, uint l, const(event*) m, event* n)\n{\n    debug assert(clEnqueueReadBufferRect);\n    auto ret = cast(int)clEnqueueReadBufferRect(cast(cl_command_queue)a, cast(cl_mem)b, cast(cl_bool)c, cast(const(size_t*))d, cast(const(size_t*))e, cast(const(size_t*))f, cast(size_t)g, cast(size_t)h, cast(size_t)i, cast(size_t)j, cast(void*)k, 
cast(cl_uint)l, cast(const(cl_event*))m, cast(cl_event*)n);\n    return ret;\n}\n\nauto enqueueWriteBufferRect(command_queue a, mem b, bool c, const(size_t*) d, const(size_t*) e, const(size_t*) f, size_t g, size_t h, size_t i, size_t j, const(void*) k, uint l, const(event*) m, event* n)\n{\n    debug assert(clEnqueueWriteBufferRect);\n    auto ret = cast(int)clEnqueueWriteBufferRect(cast(cl_command_queue)a, cast(cl_mem)b, cast(cl_bool)c, cast(const(size_t*))d, cast(const(size_t*))e, cast(const(size_t*))f, cast(size_t)g, cast(size_t)h, cast(size_t)i, cast(size_t)j, cast(const(void*))k, cast(cl_uint)l, cast(const(cl_event*))m, cast(cl_event*)n);\n    return ret;\n}\n\nauto enqueueCopyBufferRect(command_queue a, mem b, mem c, const(size_t*) d, const(size_t*) e, const(size_t*) f, size_t g, size_t h, size_t i, size_t j, uint k, const(event*) l, event* m)\n{\n    debug assert(clEnqueueCopyBufferRect);\n    auto ret = cast(int)clEnqueueCopyBufferRect(cast(cl_command_queue)a, cast(cl_mem)b, cast(cl_mem)c, cast(const(size_t*))d, cast(const(size_t*))e, cast(const(size_t*))f, cast(size_t)g, cast(size_t)h, cast(size_t)i, cast(size_t)j, cast(cl_uint)k, cast(const(cl_event*))l, cast(cl_event*)m);\n    return ret;\n}\n\nauto createImage2D(context a, mem_flags b, const(image_format*) c, size_t d, size_t e, size_t f, void* g, int* h)\n{\n    debug assert(clCreateImage2D);\n    auto ret = cast(mem)clCreateImage2D(cast(cl_context)a, cast(cl_mem_flags)b, cast(const(cl_image_format*))c, cast(size_t)d, cast(size_t)e, cast(size_t)f, cast(void*)g, cast(cl_int*)h);\n    return ret;\n}\n\nauto createImage3D(context a, mem_flags b, const(image_format*) c, size_t d, size_t e, size_t f, size_t g, size_t h, void* i, int* j)\n{\n    debug assert(clCreateImage3D);\n    auto ret = cast(mem)clCreateImage3D(cast(cl_context)a, cast(cl_mem_flags)b, cast(const(cl_image_format*))c, cast(size_t)d, cast(size_t)e, cast(size_t)f, cast(size_t)g, cast(size_t)h, cast(void*)i, cast(cl_int*)j);\n    return 
ret;\n}\n\nauto enqueueMarker(command_queue a, event* b)\n{\n    debug assert(clEnqueueMarker);\n    auto ret = cast(int)clEnqueueMarker(cast(cl_command_queue)a, cast(cl_event*)b);\n    return ret;\n}\n\nauto enqueueWaitForEvents(command_queue a, uint b, const(event*) c)\n{\n    debug assert(clEnqueueWaitForEvents);\n    auto ret = cast(int)clEnqueueWaitForEvents(cast(cl_command_queue)a, cast(cl_uint)b, cast(const(cl_event*))c);\n    return ret;\n}\n\nauto enqueueBarrier(command_queue a)\n{\n    debug assert(clEnqueueBarrier);\n    auto ret = cast(int)clEnqueueBarrier(cast(cl_command_queue)a);\n    return ret;\n}\n\nauto unloadCompiler()\n{\n    debug assert(clUnloadCompiler);\n    auto ret = cast(int)clUnloadCompiler();\n    return ret;\n}\n\nauto getExtensionFunctionAddress(const(char*) a)\n{\n    debug assert(clGetExtensionFunctionAddress);\n    auto ret = cast(void*)clGetExtensionFunctionAddress(cast(const(char*))a);\n    return ret;\n}\n\nauto createSubDevices(device_id a, const(device_partition_property*) b, uint c, device_id* d, uint* e)\n{\n    debug assert(clCreateSubDevices);\n    auto ret = cast(int)clCreateSubDevices(cast(cl_device_id)a, cast(const(cl_device_partition_property*))b, cast(cl_uint)c, cast(cl_device_id*)d, cast(cl_uint*)e);\n    return ret;\n}\n\nauto retainDevice(device_id a)\n{\n    debug assert(clRetainDevice);\n    auto ret = cast(int)clRetainDevice(cast(cl_device_id)a);\n    return ret;\n}\n\nauto releaseDevice(device_id a)\n{\n    debug assert(clReleaseDevice);\n    auto ret = cast(int)clReleaseDevice(cast(cl_device_id)a);\n    return ret;\n}\n\nauto createImage(context a, mem_flags b, const(image_format*) c, const(image_desc*) d, void* e, int* f)\n{\n    debug assert(clCreateImage);\n    auto ret = cast(mem)clCreateImage(cast(cl_context)a, cast(cl_mem_flags)b, cast(const(cl_image_format*))c, cast(const(cl_image_desc*))d, cast(void*)e, cast(cl_int*)f);\n    return ret;\n}\n\nextern(System) alias compileProgram_FuncAlias = void 
function(cl_program, void*);\nauto compileProgram(program a, uint b, const(device_id*) c, const(char*) d, uint e, const(program*) f, const(char*)* g, compileProgram_FuncAlias h, void* i)\n{\n    debug assert(clCompileProgram);\n    auto ret = cast(int)clCompileProgram(cast(cl_program)a, cast(cl_uint)b, cast(const(cl_device_id*))c, cast(const(char*))d, cast(cl_uint)e, cast(const(cl_program*))f, cast(const(char*)*)g, cast(compileProgram_FuncAlias)h, cast(void*)i);\n    return ret;\n}\n\nextern(System) alias linkProgram_FuncAlias = void function(cl_program, void*);\nauto linkProgram(context a, uint b, const(device_id*) c, const(char*) d, uint e, const(program*) f, linkProgram_FuncAlias g, void* h, int* i)\n{\n    debug assert(clLinkProgram);\n    auto ret = cast(program)clLinkProgram(cast(cl_context)a, cast(cl_uint)b, cast(const(cl_device_id*))c, cast(const(char*))d, cast(cl_uint)e, cast(const(cl_program*))f, cast(linkProgram_FuncAlias)g, cast(void*)h, cast(cl_int*)i);\n    return ret;\n}\n\nauto unloadPlatformCompiler(platform_id a)\n{\n    debug assert(clUnloadPlatformCompiler);\n    auto ret = cast(int)clUnloadPlatformCompiler(cast(cl_platform_id)a);\n    return ret;\n}\n\nauto enqueueFillBuffer(command_queue a, mem b, const(void*) c, size_t d, size_t e, size_t f, uint g, const(event*) h, event* i)\n{\n    debug assert(clEnqueueFillBuffer);\n    auto ret = cast(int)clEnqueueFillBuffer(cast(cl_command_queue)a, cast(cl_mem)b, cast(const(void*))c, cast(size_t)d, cast(size_t)e, cast(size_t)f, cast(cl_uint)g, cast(const(cl_event*))h, cast(cl_event*)i);\n    return ret;\n}\n\nauto enqueueFillImage(command_queue a, mem b, const(void*) c, const(size_t*) d, const(size_t*) e, uint f, const(event*) g, event* h)\n{\n    debug assert(clEnqueueFillImage);\n    auto ret = cast(int)clEnqueueFillImage(cast(cl_command_queue)a, cast(cl_mem)b, cast(const(void*))c, cast(const(size_t*))d, cast(const(size_t*))e, cast(cl_uint)f, cast(const(cl_event*))g, cast(cl_event*)h);\n    return 
ret;\n}\n\nauto enqueueMigrateMemObjects(command_queue a, uint b, const(mem*) c, mem_migration_flags d, uint e, const(event*) f, event* g)\n{\n    debug assert(clEnqueueMigrateMemObjects);\n    auto ret = cast(int)clEnqueueMigrateMemObjects(cast(cl_command_queue)a, cast(cl_uint)b, cast(const(cl_mem*))c, cast(cl_mem_migration_flags)d, cast(cl_uint)e, cast(const(cl_event*))f, cast(cl_event*)g);\n    return ret;\n}\n\nauto enqueueMarkerWithWaitList(command_queue a, uint b, const(event*) c, event* d)\n{\n    debug assert(clEnqueueMarkerWithWaitList);\n    auto ret = cast(int)clEnqueueMarkerWithWaitList(cast(cl_command_queue)a, cast(cl_uint)b, cast(const(cl_event*))c, cast(cl_event*)d);\n    return ret;\n}\n\nauto enqueueBarrierWithWaitList(command_queue a, uint b, const(event*) c, event* d)\n{\n    debug assert(clEnqueueBarrierWithWaitList);\n    auto ret = cast(int)clEnqueueBarrierWithWaitList(cast(cl_command_queue)a, cast(cl_uint)b, cast(const(cl_event*))c, cast(cl_event*)d);\n    return ret;\n}\n\nauto getExtensionFunctionAddressForPlatform(platform_id a, const(char*) b)\n{\n    debug assert(clGetExtensionFunctionAddressForPlatform);\n    auto ret = cast(void*)clGetExtensionFunctionAddressForPlatform(cast(cl_platform_id)a, cast(const(char*))b);\n    return ret;\n}\n*/\n"
  },
  {
    "path": "source/dcompute/driver/ocl/raw/package.d",
    "content": "module dcompute.driver.ocl.raw;\n\npublic import dcompute.driver.ocl.raw.functions;\npublic import dcompute.driver.ocl.raw.enums;\npublic import derelict.opencl.cl;\n"
  },
  {
    "path": "source/dcompute/driver/ocl/sampler.d",
    "content": "module dcompute.driver.ocl.sampler;\n\nimport dcompute.driver.ocl;\nstruct Sampler\n{\n    enum FilterMode\n    {\n        nearest = 0x1140,\n        linear  = 0x1141,\n    }\n    \n    enum AddressingMode\n    {\n        none           = 0x1130,\n        clampToEdge    = 0x1131,\n        clamp          = 0x1132,\n        repeat         = 0x1133,\n        mirroredRepeat = 0x1134,\n    }\n    static struct Info\n    {\n        @(0x1150) uint referenceCount;\n        @(0x1151) Context context;\n        @(0x1152) bool normalisedCoordinates; // CHECKME is this actually a bool?\n        @(0x1153) AddressingMode addressingMode;\n        @(0x1154) FilterMode filterMode;\n    }\n\n    cl_sampler raw;\n    \n    //mixin(generateGetInfo!(Info,clGetSamplerInfo));\n    void retain()\n    {\n        status = cast(Status)clRetainSampler(raw);\n        checkErrors();\n    }\n    \n    void release()\n    {\n        status = cast(Status)clReleaseSampler(raw);\n        checkErrors();\n    }\n    \n}\n"
  },
  {
    "path": "source/dcompute/driver/ocl/util.d",
    "content": "module dcompute.driver.ocl.util;\n\nimport std.range;\nimport std.meta;\nimport std.traits;\n\n//deal with arrays seperately, in part to avoid any\n//narrow-string idiocy\n@property auto memSize(R)(R r)\nif (is(R : T[], T))\n{\n    static if (is(R : T[], T))\n        return r.length * T.sizeof;\n    else\n        static assert(false);\n}\n\n@property auto memSize(R)(R r)\nif(isInputRange!R && hasLength!R && !is(R : T[], T))\n{\n    return r.length * (ElementType!R).sizeof;\n}\n\nT[Args.length + 1] propertyList(T,Args...)(Args args)\n{\n    T[Args.length + 1] props;\n    foreach(i, arg; args)\n        props[i] = *cast(T*)(&arg);\n    props[$-1] = cast(T)0;\n    return props;\n}\n\nstruct ArrayAccesssor(alias ptr, alias len) {}\n\nstruct StringzAccessor(alias ptr) {}\n\nstruct ZeroTerminatedArrayAccessor(alias ptr) {}\n\nstruct ArrayAccesssor2D(alias ptr, alias lens, alias len) {}\n\n// Returned by ArrayAccesssor2D\nstruct RangeOfArray(T)\n{\n    T**     ptr;\n    size_t* lengths;\n    size_t  length;\n    size_t  index;\n\n    bool empty()\n    {\n        return index == length;\n    }\n\n    @property T[] front()\n    {\n        return ptr[index][0 .. lengths[index]];\n    }\n\n    T[] opIndex(size_t i)\n    {\n        return ptr[i][0 .. 
lengths[i]];\n    }\n    void popFront()\n    {\n        ++index;\n    }\n    \n    @property size_t opDollar() { return length; }\n}\n\nstring generateGetInfo(Info,alias func,string args = \"raw\")()\n{\n    import std.string;\n    return helper!(Info.tupleof).format(func.stringof,args);\n}\n\n// A substitute for fullyQualifiedName to speed up compile time\nprivate template isModule(alias a) {\n    static if (is(a) || is(typeof(a)) || a.stringof.length < 7) {\n        enum isModule = false;\n    } else {\n        enum isModule = a.stringof[0..7] == \"module \";\n    }\n}\n\nprivate template partiallyQualifiedName(alias a) {\n    static if (isModule!a) {\n        enum partiallyQualifiedName = \"\";\n    } else {\n        static if (!isModule!(__traits(parent, a))) {\n            enum prefix = partiallyQualifiedName!(__traits(parent, a)) ~ \".\";\n        } else {\n            enum prefix = \"\";\n        }\n        enum partiallyQualifiedName = prefix ~ __traits(identifier, a);\n    }\n}\n\nprivate template helper(Fields...)\n{\n    static if (Fields.length == 0)\n        enum helper = \"\";\n\n    else static if (is(typeof(Fields[0]) : ArrayAccesssor!(ptr,len),alias ptr,alias len))\n    {\n        enum helper = \"@property \" ~ typeof(*ptr).stringof ~ \"[] \" ~ Fields[0].stringof ~ \"()\\n\" ~\n            \"{\\n\" ~\n            \"    return \" ~ ptr.stringof ~ \"[0 .. \" ~ len.stringof ~\"];\"~\n            \"}\\n\" ~ helper!(Fields[1 .. 
$]);\n    }\n    else static if (is(typeof(Fields[0]) : StringzAccessor!ptr,alias ptr))\n    {\n        enum helper = \"@property char[] \" ~ Fields[0].stringof ~ \"()\\n\" ~\n            \"{\\n\" ~\n            \"    import std.typecons; char[] ret;\" ~\n            \"    size_t len;\" ~\n            \"    %1$s(%2$s,\" ~ __traits(getAttributes, ptr).stringof ~ \"[0], 0, null, &len);\" ~\n            \"    ret.length = len;\" ~\n            \"    %1$s(%2$s,\" ~ __traits(getAttributes, ptr).stringof ~ \"[0], memSize(ret), ret.ptr, null);\" ~\n            \"    return ret;\" ~\n            \"}\\n\" ~ helper!(Fields[1 .. $]);\n    }\n    else static if (is(typeof(Fields[0]) : ArrayAccesssor2D!(ptr,lens,len) , alias ptr, alias lens, alias len))\n    {\n        enum helper = \"@property RangeOfArray!(\" ~ typeof(**ptr).stringof ~ \") \" ~ Fields[0].stringof ~ \"()\\n\" ~\n            \"{\\n\" ~\n            \"   import std.typecons; size_t length; size_t* lengths; \" ~ typeof(ptr).stringof ~ \" ptr;\" ~\n            \"   %1$s(%2$s,\" ~ __traits(getAttributes, len).stringof ~ \"[0],length.sizeof, &length,null);\" ~\n            \"   lengths = (new size_t[length]).ptr; ptr = (new \" ~ typeof(*ptr).stringof ~ \"[length]).ptr;\" ~\n            \"   %1$s(%2$s,\" ~ __traits(getAttributes, lens).stringof ~ \"[0],lengths.sizeof, lengths,null);\" ~\n            \"   if (lengths[length - 1] == 0) length--;\" ~\n            \"   foreach(i; 0 .. length) \\n{\" ~\n            \"       ptr[i] = (new \" ~ typeof(**ptr).stringof ~ \"[lengths[i]]).ptr;\" ~\n            \"   }\\n\" ~\n            \"   %1$s(%2$s,\" ~ __traits(getAttributes, ptr).stringof ~ \"[0], ptr.sizeof, ptr, null);\" ~\n            \"   return typeof(return)(ptr,lengths,length,0);\" ~\n            \"}\\n\" ~ helper!(Fields[1 .. 
$]);\n    }\n    else\n    {\n        static if (is(typeof(Fields[0]) == enum))\n        {\n            enum helper = \"@property \" ~ partiallyQualifiedName!(typeof(Fields[0])) ~ \" \" ~ Fields[0].stringof ~ \"()\\n\" ~\n                \"{\\n\" ~\n                \"    import std.typecons; typeof(return) ret;\" ~\n                \"%1$s(%2$s,\"~ __traits(getAttributes, Fields[0]).stringof ~ \"[0], ret.sizeof, &ret, null);\" ~\n                \"return ret; \" ~ \n                \"}\\n\" ~ helper!(Fields[1 .. $]);\n    \n        }\n        else \n        {\n            enum helper = \"@property \" ~ typeof(Fields[0]).stringof ~ \" \" ~ Fields[0].stringof ~ \"()\\n\" ~\n                \"{\\n\" ~\n                \"    import std.typecons; typeof(return) ret;\" ~\n                \"%1$s(%2$s,\"~ __traits(getAttributes, Fields[0]).stringof ~ \"[0], ret.sizeof, &ret, null);\" ~\n                \"return ret; \" ~ \n                \"}\\n\" ~ helper!(Fields[1 .. $]);\n        }\n    }\n}\n"
  },
  {
    "path": "source/dcompute/driver/util.d",
    "content": "module dcompute.driver.util;\n\nimport std.traits;\nimport std.meta;\nimport ldc.dcompute : Pointer;\nimport dcompute.driver.ocl.buffer : Buffer;\ntemplate HostArgsOf(F)\n{\n    import std.traits;\n    // TODO substitute Pointer!(n,T) with Buffer!T, Image etc.\n    template toBuffer(T)\n    {\n        static if(is(T : Pointer!(n,U), uint n,U))\n            alias toBuffer = Buffer!U;\n        else\n            alias toBuffer = T;\n    }\n    alias HostArgsOf = staticMap!(toBuffer,Parameters!F);\n}\n"
  },
  {
    "path": "source/dcompute/kernels/README.md",
    "content": "Algorithms\n==========\n\nAdjacent\n\nExample use\n===========\n\nIdeally we want to be able to do something like\n```D\nwith(kernelLaunchConfig(...)) //includes the Queue to launch on and any any other info\n    T val = hostrange.array\n            .transfer // transfer to device. parameters in config\n            .exclusice_scan!add\n            .inner_product(someOtherDeviceArray)\n            .mapReduce(map_func,reduce_op)\n            .retrieve;\n```\nand\n\n```D\nwith(kernelLaunchConfig(...)) //includes the Queue device allocator to launch on and any any other info\n    Event e = hostrange.array\n                .transfer // transfer to device. parameters in config\n                .exclusice_scan!add\n                .inner_product(someOtherDeviceArray);\nErrorCode ec = e.waitAndYeild(); //play nice with fibres/threads.\n```\nand have the pipeline be async and return an event/error.\n"
  },
  {
    "path": "source/dcompute/kernels/package.d",
    "content": "module dcompute.kernels;\n/*Adjacent:\n * adjacent!(R,alias e)(R r, R o) where e a is binary op to apply to adjacent elements of R\n *Allocator:\n *Search:\n * upper_bound\n * lower_bound\n * equal\n */"
  },
  {
    "path": "source/dcompute/std/atomic.d",
    "content": "@compute(CompileFor.deviceOnly) module dcompute.std.atomic;\n\nimport ldc.dcompute;\n\nimport cuda = dcompute.std.cuda.atomic;\npublic import dcompute.std.atomic_common : MemoryOrder;\n\nint atomicAddShared(MemoryOrder mo = MemoryOrder.seq_cst)(SharedPointer!int dst, int val)\n{\n\tif(__dcompute_reflect(ReflectTarget.CUDA))\n\t\treturn cuda.atomicAddShared!mo(dst, val);\n\tassert(0);\n}\n\nint atomicAdd(MemoryOrder mo = MemoryOrder.seq_cst)(GlobalPointer!int dst, int val)\n{\n\tif(__dcompute_reflect(ReflectTarget.CUDA))\n\t\treturn cuda.atomicAdd!mo(dst, val);\n\tassert(0);\n}\n/*\n * @brief Atomically exchanges the value at the address with a new value.\n * @param dst The shared memory address (passed as i64).\n * @param newVal The integer value to store (i32).\n * @return The old value that was stored at the address (i32).\n */\nint atomicExchange(MemoryOrder mo = MemoryOrder.seq_cst)\n                  (GlobalPointer!int dst, int newVal)\n{\n    if (__dcompute_reflect(ReflectTarget.CUDA))\n\t\treturn cuda.atomicExchange!mo(dst, newVal);\n\tassert(0);\n}\n\nint atomicExchangeShared(MemoryOrder mo = MemoryOrder.seq_cst)(SharedPointer!int dst, int newVal)\n{\n\tif(__dcompute_reflect(ReflectTarget.CUDA))\n\t\treturn cuda.atomicExchangeShared!mo(dst, newVal);\n\tassert(0);\n}\n\n/*\n *Atomic:\n * T add (GenericPointer!T addr,T val)\n * T sub (GenericPointer!T addr,T val)\n * T xchg(GenericPointer!T addr,T val)\n * T min (GenericPointer!T addr,T val)\n * T max (GenericPointer!T addr,T val)\n * T cas (GenericPointer!T addr,T val)\n * I and (GenericPointer!I addr,I val)\n * I or  (GenericPointer!I addr,I val)\n * I xor (GenericPointer!I addr,I val)\n * I inc (GenericPointer!I addr,I val)\n * I dec (GenericPointer!I addr,I val)\n\n */\n"
  },
  {
    "path": "source/dcompute/std/atomic_common.d",
    "content": "@compute(CompileFor.deviceOnly) module dcompute.std.atomic_common;\n\nimport ldc.dcompute;\n\nenum MemoryOrder {\n\trelaxed, \n\tacquire, \n\trelease, \n\tacq_rel, \n\tseq_cst\n}\n"
  },
  {
    "path": "source/dcompute/std/cuda/atomic.d",
    "content": "@compute(CompileFor.deviceOnly) module dcompute.std.cuda.atomic;\n\nimport ldc.dcompute;\nimport dcompute.std.atomic_common : MemoryOrder;\n\npragma(LDC_inline_ir)\n    R inlineIR(string s, R, P...)(P);\n\nint atomicAddShared(MemoryOrder mo = MemoryOrder.seq_cst)(SharedPointer!int dst, int val)\n{\n\tstatic if (mo == MemoryOrder.relaxed) {\n\t\treturn inlineIR!(`\n\t\t\t%ptr = inttoptr i64 %0 to i32 addrspace(3)*\n\t\t\t%old = atomicrmw add i32 addrspace(3)* %ptr, i32 %1 monotonic\n\t\t\tret i32 %old`, int)(cast(long)dst, cast(int)val);\n\t} else static if (mo == MemoryOrder.acquire) {\n\t\treturn inlineIR!(`\n\t\t\t%ptr = inttoptr i64 %0 to i32 addrspace(3)*\n\t\t\t%old = atomicrmw add i32 addrspace(3)* %ptr, i32 %1 acquire\n\t\t\tret i32 %old`, int)(cast(long)dst, cast(int)val);\n\t} else static if (mo == MemoryOrder.release) {\n\t\treturn inlineIR!(`\n\t\t\t%ptr = inttoptr i64 %0 to i32 addrspace(3)*\n\t\t\t%old = atomicrmw add i32 addrspace(3)* %ptr, i32 %1 release\n\t\t\tret i32 %old`, int)(cast(long)dst, cast(int)val);\n\t} else static if (mo == MemoryOrder.acq_rel) {\n\t\treturn inlineIR!(`\n\t\t\t%ptr = inttoptr i64 %0 to i32 addrspace(3)*\n\t\t\t%old = atomicrmw add i32 addrspace(3)* %ptr, i32 %1 acq_rel\n\t\t\tret i32 %old`, int)(cast(long)dst, cast(int)val);\n\t} else static if (mo == MemoryOrder.seq_cst) {\n\t\treturn inlineIR!(`\n\t\t\t%ptr = inttoptr i64 %0 to i32 addrspace(3)*\n\t\t\t%old = atomicrmw add i32 addrspace(3)* %ptr, i32 %1 seq_cst\n\t\t\tret i32 %old`, int)(cast(long)dst, cast(int)val);\n\t}\n\telse\n\t\tstatic assert(0, \"atomicAddShared doesn't support memoryOrder \" ~mo.stringof);\n}\n\nint atomicAdd(MemoryOrder mo = MemoryOrder.seq_cst)(GlobalPointer!int dst, int val)\n{\n\tstatic if (mo == MemoryOrder.relaxed) {\n\t\treturn inlineIR!(`\n\t\t\t%ptr = inttoptr i64 %0 to i32 addrspace(1)*\n\t\t\t%old = atomicrmw add i32 addrspace(1)* %ptr, i32 %1 monotonic\n\t\t\tret i32 %old`, int)(cast(long)dst, cast(int)val);\n\t} 
else static if (mo == MemoryOrder.acquire) {\n\t\treturn inlineIR!(`\n\t\t\t%ptr = inttoptr i64 %0 to i32 addrspace(1)*\n\t\t\t%old = atomicrmw add i32 addrspace(1)* %ptr, i32 %1 acquire\n\t\t\tret i32 %old`, int)(cast(long)dst, cast(int)val);\n\t} else static if (mo == MemoryOrder.release) {\n\t\treturn inlineIR!(`\n\t\t\t%ptr = inttoptr i64 %0 to i32 addrspace(1)*\n\t\t\t%old = atomicrmw add i32 addrspace(1)* %ptr, i32 %1 release\n\t\t\tret i32 %old`, int)(cast(long)dst, cast(int)val);\n\t} else static if (mo == MemoryOrder.acq_rel) {\n\t\treturn inlineIR!(`\n\t\t\t%ptr = inttoptr i64 %0 to i32 addrspace(1)*\n\t\t\t%old = atomicrmw add i32 addrspace(1)* %ptr, i32 %1 acq_rel\n\t\t\tret i32 %old`, int)(cast(long)dst, cast(int)val);\n\t} else static if (mo == MemoryOrder.seq_cst) {\n\t\treturn inlineIR!(`\n\t\t\t%ptr = inttoptr i64 %0 to i32 addrspace(1)*\n\t\t\t%old = atomicrmw add i32 addrspace(1)* %ptr, i32 %1 seq_cst\n\t\t\tret i32 %old`, int)(cast(long)dst, cast(int)val);\n\t}\n\telse \n\t\tstatic assert(0, \"atomicAdd doesn't support memoryOrder \" ~mo.stringof);\n}\n/*\n * @brief Atomically exchanges the value at the address with a new value.\n * @param dst The global memory address (passed as i64).\n * @param newVal The integer value to store (i32).\n * @return The old value that was stored at the address (i32).\n */\nint atomicExchange(MemoryOrder mo)(GlobalPointer!int dst, int newVal)\n{\n    // The GlobalPointer!int struct is cast to a raw long (i64) to bypass complex LDC type parsing.\n\tstatic if (mo == MemoryOrder.relaxed) {\n\t\treturn inlineIR!(`\n\t\t\t%ptr = inttoptr i64 %0 to i32 addrspace(1)*\n\t\t\t%old = atomicrmw xchg i32 addrspace(1)* %ptr, i32 %1 monotonic\n\t\t\tret i32 %old`, int)(cast(long)dst, newVal);\n\t} else static if (mo == MemoryOrder.acquire) {\n\t\treturn inlineIR!(`\n\t\t\t%ptr = inttoptr i64 %0 to i32 addrspace(1)*\n\t\t\t%old = atomicrmw xchg i32 addrspace(1)* %ptr, i32 %1 acquire\n\t\t\tret i32 %old`, int)(cast(long)dst, 
newVal);\n\t} else static if (mo == MemoryOrder.release) {\n\t\treturn inlineIR!(`\n\t\t\t%ptr = inttoptr i64 %0 to i32 addrspace(1)*\n\t\t\t%old = atomicrmw xchg i32 addrspace(1)* %ptr, i32 %1 release\n\t\t\tret i32 %old`, int)(cast(long)dst, newVal);\n\t} else static if (mo == MemoryOrder.acq_rel) {\n\t\treturn inlineIR!(`\n\t\t\t%ptr = inttoptr i64 %0 to i32 addrspace(1)*\n\t\t\t%old = atomicrmw xchg i32 addrspace(1)* %ptr, i32 %1 acq_rel\n\t\t\tret i32 %old`, int)(cast(long)dst, newVal);\n\t} else static if (mo == MemoryOrder.seq_cst) {\n\t\treturn inlineIR!(`\n\t\t\t%ptr = inttoptr i64 %0 to i32 addrspace(1)*\n\t\t\t%old = atomicrmw xchg i32 addrspace(1)* %ptr, i32 %1 seq_cst\n\t\t\tret i32 %old`, int)(cast(long)dst, newVal);\n\t}\n\telse\n\t\tstatic assert(0, \"atomicExchange doesn't support memoryOrder \" ~mo.stringof);\n}\nint atomicExchangeShared(MemoryOrder mo = MemoryOrder.seq_cst)(SharedPointer!int dst, int newVal)\n{\n    // The SharedPointer!int struct is cast to a raw long (i64) to bypass complex LDC type parsing.\n\tstatic if (mo == MemoryOrder.relaxed) {\n\t\treturn inlineIR!(`\n\t\t\t%ptr = inttoptr i64 %0 to i32 addrspace(3)*\n\t\t\t%old = atomicrmw xchg i32 addrspace(3)* %ptr, i32 %1 monotonic\n\t\t\tret i32 %old`, int)(cast(long)dst, newVal);\n\t} else static if (mo == MemoryOrder.acquire) {\n\t\treturn inlineIR!(`\n\t\t\t%ptr = inttoptr i64 %0 to i32 addrspace(3)*\n\t\t\t%old = atomicrmw xchg i32 addrspace(3)* %ptr, i32 %1 acquire\n\t\t\tret i32 %old`, int)(cast(long)dst, newVal);\n\t} else static if (mo == MemoryOrder.release) {\n\t\treturn inlineIR!(`\n\t\t\t%ptr = inttoptr i64 %0 to i32 addrspace(3)*\n\t\t\t%old = atomicrmw xchg i32 addrspace(3)* %ptr, i32 %1 release\n\t\t\tret i32 %old`, int)(cast(long)dst, newVal);\n\t} else static if (mo == MemoryOrder.acq_rel) {\n\t\treturn inlineIR!(`\n\t\t\t%ptr = inttoptr i64 %0 to i32 addrspace(3)*\n\t\t\t%old = atomicrmw xchg i32 addrspace(3)* %ptr, i32 %1 acq_rel\n\t\t\tret i32 %old`, 
int)(cast(long)dst, newVal);\n\t} else static if (mo == MemoryOrder.seq_cst) {\n\t\treturn inlineIR!(`\n\t\t\t%ptr = inttoptr i64 %0 to i32 addrspace(3)*\n\t\t\t%old = atomicrmw xchg i32 addrspace(3)* %ptr, i32 %1 seq_cst\n\t\t\tret i32 %old`, int)(cast(long)dst, newVal);\n\t}\n\telse\n\t\tstatic assert(0, \"atomicExchangeShared doesn't support memoryOrder \" ~mo.stringof);\n}\n"
  },
  {
    "path": "source/dcompute/std/cuda/index.d",
    "content": "@compute(CompileFor.deviceOnly) module dcompute.std.cuda.index;\n\nimport ldc.dcompute;\npure: nothrow: @nogc:\n//tid = threadId\npragma(LDC_intrinsic, \"llvm.nvvm.read.ptx.sreg.tid.x\")\nuint tid_x();\n\npragma(LDC_intrinsic, \"llvm.nvvm.read.ptx.sreg.tid.y\")\nuint tid_y();\n\npragma(LDC_intrinsic, \"llvm.nvvm.read.ptx.sreg.tid.z\")\nuint tid_z();\n\n//ntid = blockDim\npragma(LDC_intrinsic, \"llvm.nvvm.read.ptx.sreg.ntid.x\")\nuint ntid_x();\n\npragma(LDC_intrinsic, \"llvm.nvvm.read.ptx.sreg.ntid.y\")\nuint ntid_y();\n\npragma(LDC_intrinsic, \"llvm.nvvm.read.ptx.sreg.ntid.z\")\nuint ntid_z();\n\n//ctaid = blockIdx\npragma(LDC_intrinsic, \"llvm.nvvm.read.ptx.sreg.ctaid.x\")\nuint ctaid_x();\n\npragma(LDC_intrinsic, \"llvm.nvvm.read.ptx.sreg.ctaid.y\")\nuint ctaid_y();\n\npragma(LDC_intrinsic, \"llvm.nvvm.read.ptx.sreg.ctaid.z\")\nuint ctaid_z();\n\n//nctaid = gridDim\npragma(LDC_intrinsic, \"llvm.nvvm.read.ptx.sreg.nctaid.x\")\nuint nctaid_x();\n\npragma(LDC_intrinsic, \"llvm.nvvm.read.ptx.sreg.nctaid.y\")\nuint nctaid_y();\n\npragma(LDC_intrinsic, \"llvm.nvvm.read.ptx.sreg.nctaid.z\")\nuint nctaid_z();\n\n//warpsize\npragma(LDC_intrinsic, \"llvm.nvvm.read.ptx.sreg.warpsize\")\nuint warpsize();\n\n\n"
  },
  {
    "path": "source/dcompute/std/cuda/sync.d",
    "content": "@compute(CompileFor.deviceOnly) module dcompute.std.cuda.sync;\n\nimport ldc.dcompute;\nimport ldc.intrinsics;\n\npragma(LDC_intrinsic, \"llvm.nvvm.barrier0\")\nvoid barrier0();\n\nstatic if (LLVM_atleast!21) { // >= LDC 1.42.0(LLVM 21)\n    pragma(LDC_intrinsic, \"llvm.nvvm.barrier.cta.sync.aligned.all\")\n    void barrier_n(int);\n}\n\npragma(LDC_intrinsic, \"llvm.nvvm.barrier0.and\")\nint barrier0_and(int);\n\npragma(LDC_intrinsic, \"llvm.nvvm.barrier0.or\")\nint barrier0_or(int);\n\npragma(LDC_intrinsic, \"llvm.nvvm.barrier0.popc\")\nint barrier0_popc(int);\n\n//block memory barrier\npragma(LDC_intrinsic, \"llvm.nvvm.membar.cta\")\nvoid membar_cta();\n\n//device global\npragma(LDC_intrinsic, \"llvm.nvvm.membar.gl\")\nvoid membar_gl();\n\n//system global\npragma(LDC_intrinsic, \"llvm.nvvm.membar.sys\")\nvoid membar_sys();\n\n"
  },
  {
    "path": "source/dcompute/std/floating.d",
    "content": "@compute(CompileFor.hostAndDevice) module dcompute.std.floating;\n\nimport ldc.dcompute;\n\n/*\n *Intrinsic\n * isfinite\n * isinfinite\n * isnan\n * isnormal\n * signed\n * abs\n * ceil\n * copysign\n * fdim\n * floor\n * fma\n * fract\n * frexp\n * ilogb\n * ldexp\n * min\n * max\n * pow\n * powr\n * powi\n * trunc\n * sqrt\n * rsqrt\n *Standard Transcendentals:\n * acos\n * acosh\n * asin\n * asinh\n * atan\n * atan2\n * atanh\n * cos\n * cosh\n * cospi\n * exp\n * exp2\n * exp10\n * log\n * log2\n * log10\n * sincos\n * sin\n * sinh\n * sinpi\n * tan\n * tanh\n * tanpi\n */\n"
  },
  {
    "path": "source/dcompute/std/index.d",
    "content": "@compute(CompileFor.hostAndDevice) module dcompute.std.index;\n\nimport ldc.dcompute;\n\nprivate import ocl  = dcompute.std.opencl.index;\nprivate import cuda = dcompute.std.cuda.index;\n\n/*\n Index Terminology\n \n DCompute               CUDA                        OpenCL\n GlobalDimension.xyz    gridDim*blockDim            get_global_size()\n GlobalIndex.xyz        blockDim*blockIdx+threadIdx get_global_id()\n \n \n GroupDimension.xyz     gridDim                     get_num_groups()\n GroupIndex.xyz         blockIdx                    get_group_id()\n \n SharedDimension.xyz    blockDim                    get_local_size()\n SharedIndex.xyz        threadIdx                   get_local_id()\n \n GlobalIndex.linear     A nasty calculation         get_global_linear_id()\n SharedIndex.linear     Ditto                       get_local_linear_id()\n \n Notes:\n    *Index.{x,y,z} are bounded by *Dimension.{x,y,z}\n \n    Use SharedIndex to index Shared Memory and GlobalIndex to index Global Memory\n \n    A Group is the ratio of Global to Shared. GroupDimension is NOT the size of a single\n    group (that's SharedDimension); rather, it is the number of groups along e.g.\n    the x dimension. Similarly, GroupIndex is the number of SharedDimension-sized units\n    that precede the current group along a given dimension.\n \n    By default *Index.linear is the linearisation of 3D memory. Use *Index.linear!N where\n    N is 1, 2 or 3 to use a linearisation of ND memory (for e.g. 
efficiency/documentation).\n */\npure: nothrow: @nogc:\n\nstruct GlobalDimension\n{\n    pragma(inline,true);\n    @property static size_t x()()\n    {\n        if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n            return ocl.get_global_size(0);\n        else if(__dcompute_reflect(ReflectTarget.CUDA,0))\n            return cuda.ntid_x()*cuda.nctaid_x();\n        else\n            assert(0);\n    }\n    pragma(inline,true);\n    @property static size_t y()()\n    {\n        if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n            return ocl.get_global_size(1);\n        else if(__dcompute_reflect(ReflectTarget.CUDA,0))\n            return cuda.ntid_y()*cuda.nctaid_y();\n        else\n            assert(0);\n    }\n    pragma(inline,true);\n    @property static size_t z()()\n    {\n        if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n            return ocl.get_global_size(2);\n        else if(__dcompute_reflect(ReflectTarget.CUDA,0))\n            return cuda.ntid_z()*cuda.nctaid_z();\n        else\n            assert(0);\n    }\n}\n\nstruct GlobalIndex\n{\n    pragma(inline,true);\n    @property static size_t x()()\n    {\n        if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n            return ocl.get_global_id(0);\n        else if(__dcompute_reflect(ReflectTarget.CUDA,0))\n            return cuda.ctaid_x()*cuda.ntid_x() + cuda.tid_x();\n        else\n            assert(0);\n    }\n    pragma(inline,true);\n    @property static size_t y()()\n    {\n        if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n            return ocl.get_global_id(1);\n        else if(__dcompute_reflect(ReflectTarget.CUDA,0))\n            return cuda.ctaid_y()*cuda.ntid_y() + cuda.tid_y();\n        else\n            assert(0);\n    }\n    pragma(inline,true);\n    @property static size_t z()()\n    {\n        if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n            return ocl.get_global_id(2);\n        else if(__dcompute_reflect(ReflectTarget.CUDA,0))\n            return 
cuda.ctaid_z()*cuda.ntid_z() + cuda.tid_z();\n        else\n            assert(0);\n    }\n    pragma(inline,true);\n    @property static size_t linearImpl(int dim = 3)()\n    if(dim >= 1 && dim <= 3)\n    {\n        static if (dim == 3)\n            return  (z * GlobalDimension.y * GlobalDimension.x) +\n                    (y * GlobalDimension.x) + x;\n        else static if (dim == 2)\n            return (y * GlobalDimension.x) + x;\n        else\n            return x;\n    }\n    pragma(inline,true);\n    @property static size_t linear(int dim = 3)() if(dim >= 1 && dim <= 3)\n    {\n        //Forward to the intrinsic to help memoisation for the consumer.\n        if(__dcompute_reflect(ReflectTarget.OpenCL,200))\n            return ocl.get_global_linear_id();\n        else if(__dcompute_reflect(ReflectTarget.OpenCL,210))\n            return ocl.get_global_linear_id();\n        else if(__dcompute_reflect(ReflectTarget.OpenCL,220))\n            return ocl.get_global_linear_id();\n        else\n            return linearImpl!dim;\n    }\n}\n\nstruct GroupDimension\n{\n    pragma(inline,true);\n    @property static size_t x()()\n    {\n        if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n            return ocl.get_num_groups(0);\n        else if(__dcompute_reflect(ReflectTarget.CUDA,0))\n            return cuda.nctaid_x();\n        else\n            assert(0);\n    }\n    pragma(inline,true);\n    @property static size_t y()()\n    {\n        if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n            return ocl.get_num_groups(1);\n        else if(__dcompute_reflect(ReflectTarget.CUDA,0))\n            return cuda.nctaid_y();\n        else\n            assert(0);\n    }\n    pragma(inline,true);\n    @property static size_t z()()\n    {\n        if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n            return ocl.get_num_groups(2);\n        else if(__dcompute_reflect(ReflectTarget.CUDA,0))\n            return cuda.nctaid_z();\n        else\n            assert(0);\n 
   }\n}\n\nstruct GroupIndex\n{\n    pragma(inline,true);\n    @property static size_t x()()\n    {\n        if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n            return ocl.get_group_id(0);\n        else if(__dcompute_reflect(ReflectTarget.CUDA,0))\n            return cuda.ctaid_x();\n        else\n            assert(0);\n    }\n    pragma(inline,true);\n    @property static size_t y()()\n    {\n        if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n            return ocl.get_group_id(1);\n        else if(__dcompute_reflect(ReflectTarget.CUDA,0))\n            return cuda.ctaid_y();\n        else\n            assert(0);\n    }\n    pragma(inline,true);\n    @property static size_t z()()\n    {\n        if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n            return ocl.get_group_id(2);\n        else if(__dcompute_reflect(ReflectTarget.CUDA,0))\n            return cuda.ctaid_z();\n        else\n            assert(0);\n    }\n}\n\nstruct SharedDimension\n{\n    pragma(inline,true);\n    @property static size_t x()()\n    {\n        if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n            return ocl.get_local_size(0);\n        else if(__dcompute_reflect(ReflectTarget.CUDA,0))\n            return cuda.ntid_x();\n        else\n            assert(0);\n    }\n    pragma(inline,true);\n    @property static size_t y()()\n    {\n        if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n            return ocl.get_local_size(1);\n        else if(__dcompute_reflect(ReflectTarget.CUDA,0))\n            return cuda.ntid_y();\n        else\n            assert(0);\n\n    }\n    pragma(inline,true);\n    @property static size_t z()()\n    {\n        if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n            return ocl.get_local_size(2);\n        else if(__dcompute_reflect(ReflectTarget.CUDA,0))\n            return cuda.ntid_z();\n        else\n            assert(0);\n    }\n}\n\nstruct SharedIndex\n{\n    pragma(inline,true);\n    @property static size_t x()()\n    {\n    
    if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n            return ocl.get_local_id(0);\n        else if(__dcompute_reflect(ReflectTarget.CUDA,0))\n            return cuda.tid_x();\n        else\n            assert(0);\n    }\n    pragma(inline,true);\n    @property static size_t y()()\n    {\n        if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n            return ocl.get_local_id(1);\n        else if(__dcompute_reflect(ReflectTarget.CUDA,0))\n            return cuda.tid_y();\n        else\n            assert(0);\n    }\n    pragma(inline,true);\n    @property static size_t z()()\n    {\n        if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n            return ocl.get_local_id(2);\n        else if(__dcompute_reflect(ReflectTarget.CUDA,0))\n            return cuda.tid_z();\n        else\n            assert(0);\n    }\n    pragma(inline,true);\n    @property static size_t linearImpl(int dim = 3)()\n    if(dim >= 1 && dim <= 3)\n    {\n        static if (dim == 3)\n            return  (z * SharedDimension.y * SharedDimension.x) +\n                    (y * SharedDimension.x) + x;\n        else static if (dim == 2)\n            return (y * SharedDimension.x) + x;\n        else\n            return x;\n\n    }\n    pragma(inline,true);\n    @property static size_t linear(int dim = 3)() if(dim >= 1 && dim <= 3)\n    {\n        //Forward to the intrinsic to help memoisation for the consumer.\n        if(__dcompute_reflect(ReflectTarget.OpenCL,200))\n            return ocl.get_local_linear_id();\n        else if(__dcompute_reflect(ReflectTarget.OpenCL,210))\n            return ocl.get_local_linear_id();\n        else if(__dcompute_reflect(ReflectTarget.OpenCL,220))\n            return ocl.get_local_linear_id();\n        else\n            return linearImpl!dim;\n        \n    }\n}\n\nprivate import std.traits;\nstruct AutoIndexed(T) //if (isInstanceOf(T,Pointer))\n{\n    T p = void;\n    enum  n = TemplateArgsOf!(T)[0];\n    alias U = TemplateArgsOf!(T)[1];\n    
static assert(n == AddrSpace.Global || n == AddrSpace.Shared);\n    \n    @property U index()\n    {\n        static if (n == AddrSpace.Global)\n            return p[GlobalIndex.linear];\n        else static if (n == AddrSpace.Shared)\n            return p[SharedIndex.linear];\n\n    }\n    \n    @property void index(U t)\n    {\n        static if (n == AddrSpace.Global)\n            p[GlobalIndex.linear] = t;\n        else static if (n == AddrSpace.Shared)\n            p[SharedIndex.linear] = t;\n    }\n    @disable this();\n    alias index this;\n}\n"
  },
  {
    "path": "source/dcompute/std/integer.d",
    "content": "@compute(CompileFor.hostAndDevice) module dcompute.std.integer;\n\nimport ldc.dcompute;\n\n/*\n brev - bit reverse\n sad  - sum of absolute differences\n abs\n min\n max\n add_sat\n sub_sat\n mul_hi\n mul_lo\n mad\n mad_hi\n mad_lo\n mad_hi_sat\n mul24_hi\n mul24_lo\n mad24_hi\n mad24_lo\n mad24_hi_sat\nsm2.0 or higher\n popc   - count the number of set bits\n clz    - count the number of leading zeros\n bfind  - find most significant non-sign bit\n bfe    - bit field extract\n bfi    - bit field insert\n overflow arithmetic\n ctz - count trailing zeros\n rotate\n */\n"
  },
  {
    "path": "source/dcompute/std/memory.d",
    "content": "@compute(CompileFor.hostAndDevice) module dcompute.std.memory;\n\nimport ldc.dcompute;\n\n/*\n *Pointer conversions:\n * *Pointer!T genericPtrTo*(GenericPointer!T ptr)\n * GenericPointer!T *toGenericPtr(*Pointer!T ptr)\n *\n *Shared memory:\n * SharedPointer!T sharedStaticReserve!(T[N])\n * SharedPointer!void sharedDynamicBase();\n * auto sharedIndices!(Ts...) if(isSharedIndex!(Ts...)) // (T, alias length) pairs\n       see http://stackoverflow.com/questions/15435559/use-dynamic-shared-memory-allocation-for-two-different-vectors\n       for what this emulates and why. Memory aligned to A = reduce!max(T.alignof)\n       Returns a tuple of {SharedPointer!(align(A) T), length} \"arrays\"\n */\n"
  },
  {
    "path": "source/dcompute/std/opencl/image.d",
    "content": "@compute(CompileFor.deviceOnly) module dcompute.std.opencl.image;\n\nimport ldc.dcompute;\n//separate module for opaque image type because the backend requires it\npublic import ldc.opencl;\n\ntemplate Image(int dim)\n{\n    static if (dim == 1)\n        alias Image = GlobalPointer!image1d_rw_t;\n    else static if (dim == 2)\n        alias Image = GlobalPointer!image2d_rw_t;\n    else static if (dim == 3)\n        alias Image = GlobalPointer!image3d_rw_t;\n}\ntemplate ReadOnlyImage(int dim)\n{\n    static if (dim == 1)\n        alias ReadOnlyImage = GlobalPointer!image1d_ro_t;\n    else static if (dim == 2)\n        alias ReadOnlyImage = GlobalPointer!image2d_ro_t;\n    else static if (dim == 3)\n        alias ReadOnlyImage = GlobalPointer!image3d_ro_t;\n}\ntemplate WriteOnlyImage(int dim)\n{\n    static if (dim == 1)\n        alias WriteOnlyImage = GlobalPointer!image1d_wo_t;\n    else static if (dim == 2)\n        alias WriteOnlyImage = GlobalPointer!image2d_wo_t;\n    else static if (dim == 3)\n        alias WriteOnlyImage = GlobalPointer!image3d_wo_t;\n}\n/* Sampler\n    A type used to control how elements of an image object are read by read_image\n    Sampler arguments to read_image must be literals.\nCoordinate normalisation\n    CLK_NORMALIZED_COORDS_TRUE,\n    CLK_NORMALIZED_COORDS_FALSE\nAddressing mode\n    CLK_ADDRESS_MIRRORED_REPEAT requires CLK_NORMALIZED_COORDS_TRUE\n        Flip the image coordinate at every integer junction.\n        If normalized coordinates are not used, this addressing mode may\n        generate image coordinates that are undefined.\n        Example: cba|abcd|dcb.\n    CLK_ADDRESS_REPEAT requires CLK_NORMALIZED_COORDS_TRUE\n        out-of-range image coordinates are wrapped to the valid range.\n        If normalized coordinates are not used, this addressing mode may\n        generate image coordinates that are undefined.\n        Example: bcd|abcd|abc.\n    CLK_ADDRESS_CLAMP_TO_EDGE\n        out-of-range image 
coordinates are clamped to the extent.\n        Example: aaa|abcd|ddd.\n    CLK_ADDRESS_CLAMP\n        out-of-range image coordinates will return a border color.\n        This is similar to the GL_ADDRESS_CLAMP_TO_BORDER addressing mode.\n        Example: 000|abcd|000.\n    CLK_ADDRESS_NONE -\n        for this addressing mode the programmer guarantees that the image\n        coordinates used to sample elements of the image refer to a location\n        inside the image; otherwise the results are undefined.\n    For 1D and 2D image arrays, the addressing mode applies only to the x and\n    (x, y) coordinates. The addressing mode for the coordinate which specifies\n    the array index is always CLK_ADDRESS_CLAMP_TO_EDGE\nFilter mode\n    CLK_FILTER_NEAREST\n    CLK_FILTER_LINEAR\n */\nenum SamplerAddressMode : int\n{\n    none = 0,\n    mirroredRepeat = 0x10,\n    repeat  = 0x20,\n    clampToEdge = 0x30,\n    clamp = 0x40,\n}\nenum SamplerFilterMode : int\n{\n    nearest = 0,\n    linear = 0x100\n}\nint samplerInit(bool normalisedCoords, SamplerAddressMode am, SamplerFilterMode fm)()\n{\n    return cast(int)(normalisedCoords | am | fm);\n}\n\nalias Sampler = SharedPointer!sampler_t;\n\npragma(mangle,\"__translate_sampler_initializer\")\n    Sampler makeSampler(int);\n\n// TODO: Image 1d array, Image 1d buffer, Image 2d array depth, Image 2d array\n// Refer to https://github.com/KhronosGroup/SPIR-Tools/wiki/SPIR-2.0-built-in-functions#image-read-and-write-functions\n// for the read/write mangles\n// Alternately use https://godbolt.org with `-target spir -O0 -emit-llvm` and check the IR generated by clang\n\ntemplate read(T) if (is(T == float))\n{\n    // return type\n    alias T4 = __vector(T[4]);\n\n    pragma(mangle,\"_Z11read_imagef11ocl_image1d11ocl_samplerf\")\n        T4 read(GlobalPointer!image1d_rw_t, Sampler, float);\n    pragma(mangle,\"_Z11read_imagef11ocl_image1d11ocl_sampleri\")\n        T4 read(GlobalPointer!image1d_rw_t, Sampler, int);\n\n    
pragma(mangle,\"_Z11read_imagef14ocl_image1d_ro11ocl_samplerf\")\n        T4 read(GlobalPointer!image1d_ro_t, Sampler, float);\n    pragma(mangle,\"_Z11read_imagef14ocl_image1d_ro11ocl_sampleri\")\n        T4 read(GlobalPointer!image1d_ro_t, Sampler, int);\n\n    pragma(mangle,\"_Z11read_imagef11ocl_image2d11ocl_samplerDv2_f\")\n        T4 read(GlobalPointer!image2d_rw_t, Sampler, __vector(float[2]));\n    pragma(mangle,\"_Z11read_imagef11ocl_image2d11ocl_samplerDv2_i\")\n        T4 read(GlobalPointer!image2d_rw_t, Sampler, __vector(int[2]));\n\n    pragma(mangle,\"_Z11read_imagef14ocl_image2d_ro11ocl_samplerDv2_f\")\n        T4 read(GlobalPointer!image2d_ro_t, Sampler, __vector(float[2]));\n    pragma(mangle,\"_Z11read_imagef14ocl_image2d_ro11ocl_samplerDv2_i\")\n        T4 read(GlobalPointer!image2d_ro_t, Sampler, __vector(int[2]));\n\n    pragma(mangle,\"_Z11read_imagef11ocl_image3d11ocl_samplerDv4_f\")\n        T4 read(GlobalPointer!image3d_rw_t, Sampler, __vector(float[4]));\n    pragma(mangle,\"_Z11read_imagef11ocl_image3d11ocl_samplerDv4_i\")\n        T4 read(GlobalPointer!image3d_rw_t, Sampler, __vector(int[4]));\n\n    pragma(mangle,\"_Z11read_imagef14ocl_image3d_ro11ocl_samplerDv4_f\")\n        T4 read(GlobalPointer!image3d_ro_t, Sampler, __vector(float[4]));\n    pragma(mangle,\"_Z11read_imagef14ocl_image3d_ro11ocl_samplerDv4_i\")\n        T4 read(GlobalPointer!image3d_ro_t, Sampler, __vector(int[4]));\n}\n\ntemplate read(T) if (is(T == int))\n{\n    // return type\n    alias T4 = __vector(T[4]);\n    pragma(mangle,\"_Z11read_imagei11ocl_image1d11ocl_samplerf\")\n        T4 read(GlobalPointer!image1d_rw_t, Sampler, float);\n    pragma(mangle,\"_Z11read_imagei11ocl_image1d11ocl_sampleri\")\n        T4 read(GlobalPointer!image1d_rw_t, Sampler, int);\n    \n    pragma(mangle,\"_Z11read_imagei14ocl_image1d_ro11ocl_samplerf\")\n        T4 read(GlobalPointer!image1d_ro_t, Sampler, float);\n    pragma(mangle,\"_Z11read_imagei14ocl_image1d_ro11ocl_sampleri\")\n  
      T4 read(GlobalPointer!image1d_ro_t, Sampler, int);\n\n    pragma(mangle,\"_Z11read_imagei11ocl_image2d11ocl_samplerDv2_f\")\n        T4 read(GlobalPointer!image2d_rw_t, Sampler, __vector(float[2]));\n    pragma(mangle,\"_Z11read_imagei11ocl_image2d11ocl_samplerDv2_i\")\n        T4 read(GlobalPointer!image2d_rw_t, Sampler, __vector(int[2]));\n\n    pragma(mangle,\"_Z11read_imagei14ocl_image2d_ro11ocl_samplerDv2_f\")\n        T4 read(GlobalPointer!image2d_ro_t, Sampler, __vector(float[2]));\n    pragma(mangle,\"_Z11read_imagei14ocl_image2d_ro11ocl_samplerDv2_i\")\n        T4 read(GlobalPointer!image2d_ro_t, Sampler, __vector(int[2]));\n\n    pragma(mangle,\"_Z11read_imagei11ocl_image3d11ocl_samplerDv4_f\")\n        T4 read(GlobalPointer!image3d_rw_t, Sampler, __vector(float[4]));\n    pragma(mangle,\"_Z11read_imagei11ocl_image3d11ocl_samplerDv4_i\")\n        T4 read(GlobalPointer!image3d_rw_t, Sampler, __vector(int[4]));\n\n    pragma(mangle,\"_Z11read_imagei14ocl_image3d_ro11ocl_samplerDv4_f\")\n        T4 read(GlobalPointer!image3d_ro_t, Sampler, __vector(float[4]));\n    pragma(mangle,\"_Z11read_imagei14ocl_image3d_ro11ocl_samplerDv4_i\")\n        T4 read(GlobalPointer!image3d_ro_t, Sampler, __vector(int[4]));\n}\n\ntemplate read(T) if (is(T == uint))\n{\n    // return type\n    alias T4 = __vector(T[4]);\n    pragma(mangle,\"_Z12read_imageui11ocl_image1d11ocl_samplerf\")\n        T4 read(GlobalPointer!image1d_rw_t, Sampler, float);\n    pragma(mangle,\"_Z12read_imageui11ocl_image1d11ocl_sampleri\")\n        T4 read(GlobalPointer!image1d_rw_t, Sampler, int);\n    pragma(mangle,\"_Z12read_imageui14ocl_image1d_ro11ocl_samplerf\")\n        T4 read(GlobalPointer!image1d_ro_t, Sampler, float);\n    pragma(mangle,\"_Z12read_imageui14ocl_image1d_ro11ocl_sampleri\")\n        T4 read(GlobalPointer!image1d_ro_t, Sampler, int);\n\n    pragma(mangle,\"_Z12read_imageui11ocl_image2d11ocl_samplerDv2_f\")\n        T4 read(GlobalPointer!image2d_rw_t, Sampler, 
__vector(float[2]));\n    pragma(mangle,\"_Z12read_imageui11ocl_image2d11ocl_samplerDv2_i\")\n        T4 read(GlobalPointer!image2d_rw_t, Sampler, __vector(int[2]));\n    pragma(mangle,\"_Z12read_imageui14ocl_image2d_ro11ocl_samplerDv2_f\")\n        T4 read(GlobalPointer!image2d_ro_t, Sampler, __vector(float[2]));\n    pragma(mangle,\"_Z12read_imageui14ocl_image2d_ro11ocl_samplerDv2_i\")\n        T4 read(GlobalPointer!image2d_ro_t, Sampler, __vector(int[2]));\n\n    pragma(mangle,\"_Z12read_imageui11ocl_image3d11ocl_samplerDv4_f\")\n        T4 read(GlobalPointer!image3d_rw_t, Sampler, __vector(float[4]));\n    pragma(mangle,\"_Z12read_imageui11ocl_image3d11ocl_samplerDv4_i\")\n        T4 read(GlobalPointer!image3d_rw_t, Sampler, __vector(int[4]));\n    pragma(mangle,\"_Z12read_imageui14ocl_image3d_ro11ocl_samplerDv4_f\")\n        T4 read(GlobalPointer!image3d_ro_t, Sampler, __vector(float[4]));\n    pragma(mangle,\"_Z12read_imageui14ocl_image3d_ro11ocl_samplerDv4_i\")\n        T4 read(GlobalPointer!image3d_ro_t, Sampler, __vector(int[4]));\n}\n\ntemplate write(I) if (is(I==GlobalPointer!image1d_rw_t))\n{\n    pragma(mangle,\"_Z12write_imagef11ocl_image1diDv4_f\")\n        void write(I,int,__vector(float[4]));\n    pragma(mangle,\"_Z12write_imagei11ocl_image1diDv4_i\")\n        void write(I,int,__vector(int[4]));\n    pragma(mangle,\"_Z13write_imageui11ocl_image1diDv4_j\")\n        void write(I,int,__vector(uint[4]));\n}\ntemplate write(I) if (is(I==GlobalPointer!image1d_wo_t))\n{\n    pragma(mangle,\"_Z12write_imagef14ocl_image1d_woiDv4_f\")\n        void write(I,int,__vector(float[4]));\n    pragma(mangle,\"_Z12write_imagei14ocl_image1d_woiDv4_i\")\n        void write(I,int,__vector(int[4]));\n    pragma(mangle,\"_Z13write_imageui14ocl_image1d_woiDv4_j\")\n        void write(I,int,__vector(uint[4]));\n}\n\ntemplate write(I) if (is(I==GlobalPointer!image2d_rw_t))\n{\n    pragma(mangle,\"_Z12write_imagef11ocl_image2dDv2_iDv4_f\")\n        void write(I, __vector(int[2]), 
__vector(float[4]));\n    pragma(mangle,\"_Z12write_imagei11ocl_image2dDv2_iDv4_i\")\n        void write(I, __vector(int[2]), __vector(int[4]));\n    pragma(mangle,\"_Z13write_imageui11ocl_image2dDv2_iDv4_j\")\n        void write(I, __vector(int[2]), __vector(uint[4]));\n}\ntemplate write(I) if (is(I==GlobalPointer!image2d_wo_t))\n{\n    pragma(mangle,\"_Z12write_imagef14ocl_image2d_woDv2_iDv4_f\")\n        void write(I, __vector(int[2]), __vector(float[4]));\n    pragma(mangle,\"_Z12write_imagei14ocl_image2d_woDv2_iDv4_i\")\n        void write(I, __vector(int[2]), __vector(int[4]));\n    pragma(mangle,\"_Z13write_imageui14ocl_image2d_woDv2_iDv4_j\")\n        void write(I, __vector(int[2]), __vector(uint[4]));\n}\n\ntemplate write(I) if (is(I==GlobalPointer!image3d_rw_t))\n{\n    pragma(mangle,\"_Z12write_imagef11ocl_image3dDv4_iDv4_f\")\n        void write(I,__vector(int[4]),__vector(float[4]));\n    pragma(mangle,\"_Z12write_imagei11ocl_image3dDv4_iDv4_i\")\n        void write(I,__vector(int[4]),__vector(int[4]));\n    pragma(mangle,\"_Z13write_imageui11ocl_image3dDv4_iDv4_j\")\n        void write(I,__vector(int[4]),__vector(uint[4]));\n}\ntemplate write(I) if (is(I==GlobalPointer!image3d_wo_t))\n{\n    pragma(mangle,\"_Z12write_imagef14ocl_image3d_woDv4_iDv4_f\")\n        void write(I, __vector(int[4]), __vector(float[4]));\n    pragma(mangle,\"_Z12write_imagei14ocl_image3d_woDv4_iDv4_i\")\n        void write(I, __vector(int[4]), __vector(int[4]));\n    pragma(mangle,\"_Z13write_imageui14ocl_image3d_woDv4_iDv4_j\")\n        void write(I, __vector(int[4]), __vector(uint[4]));\n}\n"
  },
  {
    "path": "source/dcompute/std/opencl/index.d",
    "content": "@compute(CompileFor.deviceOnly) module dcompute.std.opencl.index;\n\nimport ldc.dcompute;\n\npure:\nnothrow:\n@nogc:\n\n// These really ought to be intrinsics, but for some reason they aren't.\n\n/**\n * Returns the number of dimensions in use. This is the\n * value given to the work_dim argument specified in\n * clEnqueueNDRangeKernel.\n * For clEnqueueTask, this returns 1.\n */\npragma(mangle,\"_Z12get_work_dim\")\nuint get_work_dim();\n\n/**\n * Returns the number of global work-items specified for\n * dimension identified by dimindx. This value is given by\n * the global_work_size argument to\n * clEnqueueNDRangeKernel. Valid values of dimindx\n * are 0 to get_work_dim() - 1. For other values of\n * dimindx, get_global_size() returns 1.\n * For clEnqueueTask, this always returns 1.\n */\npragma(mangle,\"_Z15get_global_sizej\")\nsize_t get_global_size(uint dimindx);\n\n/**\n * Returns the unique global work-item ID value for\n * dimension identified by dimindx. The global work-item\n * ID specifies the work-item ID based on the number of\n * global work-items specified to execute the kernel. Valid\n * values of dimindx are 0 to get_work_dim() - 1. For\n * other values of dimindx, get_global_id() returns 0.\n * For clEnqueueTask, this returns 0.\n */\npragma(mangle,\"_Z13get_global_idj\")\nsize_t get_global_id(uint dimindx);\n\n/**\n * Returns the number of local work-items specified in\n * dimension identified by dimindx. This value is given by\n * the local_work_size argument to\n * clEnqueueNDRangeKernel if local_work_size is not\n * NULL; otherwise the OpenCL implementation chooses\n * an appropriate local_work_size value which is returned\n * by this function. Valid values of dimindx are 0 to\n * get_work_dim() - 1. 
For other values of dimindx,\n * get_local_size() returns 1.\n * For clEnqueueTask, this always returns 1.\n */\npragma(mangle,\"_Z14get_local_sizej\")\nsize_t get_local_size(uint dimindx);\n\n/**\n * Returns the unique local work-item ID i.e. a work-item\n * within a specific work-group for dimension identified by\n * dimindx. Valid values of dimindx are 0 to\n * get_work_dim() - 1. For other values of dimindx,\n * get_local_id() returns 0.\n * For clEnqueueTask, this returns 0.\n */\npragma(mangle,\"_Z12get_local_idj\")\n size_t get_local_id(uint dimindx);\n\n/**\n * Returns the number of work-groups that will execute a\n * kernel for dimension identified by dimindx.\n * Valid values of dimindx are 0 to get_work_dim() - 1.\n * For other values of dimindx, get_num_groups () returns\n * 1.\n * For clEnqueueTask, this always returns 1.\n */\npragma(mangle,\"_Z14get_num_groupsj\")\nsize_t get_num_groups(uint dimindx);\n\n/**\n * get_group_id returns the work-group ID which is a\n * number from 0 .. get_num_groups(dimindx) - 1.\n * Valid values of dimindx are 0 to get_work_dim() - 1.\n * For other values, get_group_id() returns 0.\n * For clEnqueueTask, this returns 0.\n */\npragma(mangle,\"_Z12get_group_idj\")\nsize_t get_group_id(uint dimindx);\n\n/**\n * get_global_offset returns the offset values specified in\n * global_work_offset argument to\n * clEnqueueNDRangeKernel.\n * Valid values of dimindx are 0 to get_work_dim() - 1.\n * For other values, get_global_offset() returns 0.\n * For clEnqueueTask, this returns 0.\n */\npragma(mangle,\"_Z17get_global_offsetj\")\nsize_t get_global_offset(uint dimindx);\n\n//pragma(mangle,\"_Z15get_global_sizej\")\n//size_t get_enqueued_local_size(uint);\npragma(mangle,\"_Z20get_global_linear_id\")\nsize_t get_global_linear_id();\npragma(mangle,\"_Z19get_local_linear_id\")\nsize_t get_local_linear_id();\n\n"
  },
  {
    "path": "source/dcompute/std/opencl/math.d",
    "content": "/**\nProvides access to the OpenCL C math functions and constants.\n\nThese functions are only callable from opencl kernels.\nFunctions taking or returning half floats and half float constants are not supported.\nStandards: [6.15.2. Math Functions](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#math-functions)$(BR)\n           [The OpenCL™ C Specification](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html)\nLicense:  [Boost License 1.0](https://boost.org/LICENSE_1_0.txt).\n*/\n\n@compute(CompileFor.deviceOnly)\nmodule dcompute.std.opencl.math;\n\nimport ldc.dcompute;\n\n// Constants\nenum MAXFLOAT  = float.max;\nenum HUGE_VALF = float.infinity;\nenum INFINITY  = float.infinity;\nenum NAN       = float.nan;\nenum HUGE_VAL  = double.infinity;\n\nenum FLT_DIG        = float.dig;\nenum FLT_MANT_DIG   = float.mant_dig;\nenum FLT_MAX_10_EXP = float.max_10_exp;\nenum FLT_MAX_EXP    = float.max_exp;\nenum FLT_MIN_10_EXP = float.min_10_exp;\nenum FLT_MIN_EXP    = float.min_exp;\nenum FLT_RADIX      = 2;\nenum FLT_MAX        = float.max;\nenum FLT_MIN        = float.min_normal;\nenum FLT_EPSILON    = float.epsilon;\n\nenum FP_ILOGB0   = int.min;\nenum FP_ILOGBNAN = int.max;\n\nenum M_E_F        = 2.71828182845904523536028747135266250f;\nenum M_LOG2E_F    = 1.44269504088896340735992468100189214f;\nenum M_LOG10E_F   = 0.434294481903251827651128918916605082f;\nenum M_LN2_F      = 0.693147180559945309417232121458176568f;\nenum M_LN10_F     = 2.30258509299404568401799145468436421f;\nenum M_PI_F       = 3.14159265358979323846264338327950288f;\nenum M_PI_2_F     = 1.57079632679489661923132169163975144f;\nenum M_PI_4_F     = 0.785398163397448309615660845819875721f;\nenum M_1_PI_F     = 0.318309886183790671537767526745028724f;\nenum M_2_PI_F     = 0.636619772367581343075535053490057448f;\nenum M_2_SQRTPI_F = 1.12837916709551257389615890312154517f;\nenum M_SQRT2_F    = 1.41421356237309504880168872420969808f;\nenum 
M_SQRT1_2_F  = 0.707106781186547524400844362104849039f;\n\nenum DBL_DIG        = double.dig;\nenum DBL_MANT_DIG   = double.mant_dig;\nenum DBL_MAX_10_EXP = double.max_10_exp;\nenum DBL_MAX_EXP    = double.max_exp;\nenum DBL_MIN_10_EXP = double.min_10_exp;\nenum DBL_MIN_EXP    = double.min_exp;\nenum DBL_MAX        = double.max;\nenum DBL_MIN        = double.min_normal;\nenum DBL_EPSILON    = double.epsilon;\n\nenum M_E        = 0x1.5bf0a8b145769p+1;\nenum M_LOG2E    = 0x1.71547652b82fep+0;\nenum M_LOG10E   = 0x1.bcb7b1526e50ep-2;\nenum M_LN2      = 0x1.62e42fefa39efp-1;\nenum M_LN10     = 0x1.26bb1bbb55516p+1;\nenum M_PI       = 0x1.921fb54442d18p+1;\nenum M_PI_2     = 0x1.921fb54442d18p+0;\nenum M_PI_4     = 0x1.921fb54442d18p-1;\nenum M_1_PI     = 0x1.45f306dc9c883p-2;\nenum M_2_PI     = 0x1.45f306dc9c883p-1;\nenum M_2_SQRTPI = 0x1.20dd750429b6dp+0;\nenum M_SQRT2    = 0x1.6a09e667f3bcdp+0;\nenum M_SQRT1_2  = 0x1.6a09e667f3bcdp-1;\n\n// acos\npragma(mangle,\"_Z4acosf\")               float       acos(         float);\npragma(mangle,\"_Z4acosDv2_f\")  __vector(float[2])   acos(__vector(float[2]));\npragma(mangle,\"_Z4acosDv3_f\")  __vector(float[3])   acos(__vector(float[3]));\npragma(mangle,\"_Z4acosDv4_f\")  __vector(float[4])   acos(__vector(float[4]));\npragma(mangle,\"_Z4acosDv8_f\")  __vector(float[8])   acos(__vector(float[8]));\npragma(mangle,\"_Z4acosDv16_f\") __vector(float[16])  acos(__vector(float[16]));\npragma(mangle,\"_Z4acosd\")               double      acos(         double);\npragma(mangle,\"_Z4acosDv2_d\")  __vector(double[2])  acos(__vector(double[2]));\npragma(mangle,\"_Z4acosDv3_d\")  __vector(double[3])  acos(__vector(double[3]));\npragma(mangle,\"_Z4acosDv4_d\")  __vector(double[4])  acos(__vector(double[4]));\npragma(mangle,\"_Z4acosDv8_d\")  __vector(double[8])  acos(__vector(double[8]));\npragma(mangle,\"_Z4acosDv16_d\") __vector(double[16]) acos(__vector(double[16]));\n\n// acosh\npragma(mangle,\"_Z5acoshf\")               float       
acosh(         float);\npragma(mangle,\"_Z5acoshDv2_f\")  __vector(float[2])   acosh(__vector(float[2]));\npragma(mangle,\"_Z5acoshDv3_f\")  __vector(float[3])   acosh(__vector(float[3]));\npragma(mangle,\"_Z5acoshDv4_f\")  __vector(float[4])   acosh(__vector(float[4]));\npragma(mangle,\"_Z5acoshDv8_f\")  __vector(float[8])   acosh(__vector(float[8]));\npragma(mangle,\"_Z5acoshDv16_f\") __vector(float[16])  acosh(__vector(float[16]));\npragma(mangle,\"_Z5acoshd\")               double      acosh(         double);\npragma(mangle,\"_Z5acoshDv2_d\")  __vector(double[2])  acosh(__vector(double[2]));\npragma(mangle,\"_Z5acoshDv3_d\")  __vector(double[3])  acosh(__vector(double[3]));\npragma(mangle,\"_Z5acoshDv4_d\")  __vector(double[4])  acosh(__vector(double[4]));\npragma(mangle,\"_Z5acoshDv8_d\")  __vector(double[8])  acosh(__vector(double[8]));\npragma(mangle,\"_Z5acoshDv16_d\") __vector(double[16]) acosh(__vector(double[16]));\n\n// acospi\npragma(mangle,\"_Z6acospif\")               float       acospi(         float);\npragma(mangle,\"_Z6acospiDv2_f\")  __vector(float[2])   acospi(__vector(float[2]));\npragma(mangle,\"_Z6acospiDv3_f\")  __vector(float[3])   acospi(__vector(float[3]));\npragma(mangle,\"_Z6acospiDv4_f\")  __vector(float[4])   acospi(__vector(float[4]));\npragma(mangle,\"_Z6acospiDv8_f\")  __vector(float[8])   acospi(__vector(float[8]));\npragma(mangle,\"_Z6acospiDv16_f\") __vector(float[16])  acospi(__vector(float[16]));\npragma(mangle,\"_Z6acospid\")               double      acospi(         double);\npragma(mangle,\"_Z6acospiDv2_d\")  __vector(double[2])  acospi(__vector(double[2]));\npragma(mangle,\"_Z6acospiDv3_d\")  __vector(double[3])  acospi(__vector(double[3]));\npragma(mangle,\"_Z6acospiDv4_d\")  __vector(double[4])  acospi(__vector(double[4]));\npragma(mangle,\"_Z6acospiDv8_d\")  __vector(double[8])  acospi(__vector(double[8]));\npragma(mangle,\"_Z6acospiDv16_d\") __vector(double[16]) acospi(__vector(double[16]));\n\n// 
asin\npragma(mangle,\"_Z4asinf\")               float       asin(         float);\npragma(mangle,\"_Z4asinDv2_f\")  __vector(float[2])   asin(__vector(float[2]));\npragma(mangle,\"_Z4asinDv3_f\")  __vector(float[3])   asin(__vector(float[3]));\npragma(mangle,\"_Z4asinDv4_f\")  __vector(float[4])   asin(__vector(float[4]));\npragma(mangle,\"_Z4asinDv8_f\")  __vector(float[8])   asin(__vector(float[8]));\npragma(mangle,\"_Z4asinDv16_f\") __vector(float[16])  asin(__vector(float[16]));\npragma(mangle,\"_Z4asind\")               double      asin(         double);\npragma(mangle,\"_Z4asinDv2_d\")  __vector(double[2])  asin(__vector(double[2]));\npragma(mangle,\"_Z4asinDv3_d\")  __vector(double[3])  asin(__vector(double[3]));\npragma(mangle,\"_Z4asinDv4_d\")  __vector(double[4])  asin(__vector(double[4]));\npragma(mangle,\"_Z4asinDv8_d\")  __vector(double[8])  asin(__vector(double[8]));\npragma(mangle,\"_Z4asinDv16_d\") __vector(double[16]) asin(__vector(double[16]));\n\n// asinh\npragma(mangle,\"_Z5asinhf\")               float       asinh(         float);\npragma(mangle,\"_Z5asinhDv2_f\")  __vector(float[2])   asinh(__vector(float[2]));\npragma(mangle,\"_Z5asinhDv3_f\")  __vector(float[3])   asinh(__vector(float[3]));\npragma(mangle,\"_Z5asinhDv4_f\")  __vector(float[4])   asinh(__vector(float[4]));\npragma(mangle,\"_Z5asinhDv8_f\")  __vector(float[8])   asinh(__vector(float[8]));\npragma(mangle,\"_Z5asinhDv16_f\") __vector(float[16])  asinh(__vector(float[16]));\npragma(mangle,\"_Z5asinhd\")               double      asinh(         double);\npragma(mangle,\"_Z5asinhDv2_d\")  __vector(double[2])  asinh(__vector(double[2]));\npragma(mangle,\"_Z5asinhDv3_d\")  __vector(double[3])  asinh(__vector(double[3]));\npragma(mangle,\"_Z5asinhDv4_d\")  __vector(double[4])  asinh(__vector(double[4]));\npragma(mangle,\"_Z5asinhDv8_d\")  __vector(double[8])  asinh(__vector(double[8]));\npragma(mangle,\"_Z5asinhDv16_d\") __vector(double[16]) asinh(__vector(double[16]));\n\n// 
asinpi\npragma(mangle,\"_Z6asinpif\")               float       asinpi(         float);\npragma(mangle,\"_Z6asinpiDv2_f\")  __vector(float[2])   asinpi(__vector(float[2]));\npragma(mangle,\"_Z6asinpiDv3_f\")  __vector(float[3])   asinpi(__vector(float[3]));\npragma(mangle,\"_Z6asinpiDv4_f\")  __vector(float[4])   asinpi(__vector(float[4]));\npragma(mangle,\"_Z6asinpiDv8_f\")  __vector(float[8])   asinpi(__vector(float[8]));\npragma(mangle,\"_Z6asinpiDv16_f\") __vector(float[16])  asinpi(__vector(float[16]));\npragma(mangle,\"_Z6asinpid\")               double      asinpi(         double);\npragma(mangle,\"_Z6asinpiDv2_d\")  __vector(double[2])  asinpi(__vector(double[2]));\npragma(mangle,\"_Z6asinpiDv3_d\")  __vector(double[3])  asinpi(__vector(double[3]));\npragma(mangle,\"_Z6asinpiDv4_d\")  __vector(double[4])  asinpi(__vector(double[4]));\npragma(mangle,\"_Z6asinpiDv8_d\")  __vector(double[8])  asinpi(__vector(double[8]));\npragma(mangle,\"_Z6asinpiDv16_d\") __vector(double[16]) asinpi(__vector(double[16]));\n\n// atan\npragma(mangle,\"_Z4atanf\")               float       atan(         float);\npragma(mangle,\"_Z4atanDv2_f\")  __vector(float[2])   atan(__vector(float[2]));\npragma(mangle,\"_Z4atanDv3_f\")  __vector(float[3])   atan(__vector(float[3]));\npragma(mangle,\"_Z4atanDv4_f\")  __vector(float[4])   atan(__vector(float[4]));\npragma(mangle,\"_Z4atanDv8_f\")  __vector(float[8])   atan(__vector(float[8]));\npragma(mangle,\"_Z4atanDv16_f\") __vector(float[16])  atan(__vector(float[16]));\npragma(mangle,\"_Z4atand\")               double      atan(         double);\npragma(mangle,\"_Z4atanDv2_d\")  __vector(double[2])  atan(__vector(double[2]));\npragma(mangle,\"_Z4atanDv3_d\")  __vector(double[3])  atan(__vector(double[3]));\npragma(mangle,\"_Z4atanDv4_d\")  __vector(double[4])  atan(__vector(double[4]));\npragma(mangle,\"_Z4atanDv8_d\")  __vector(double[8])  atan(__vector(double[8]));\npragma(mangle,\"_Z4atanDv16_d\") __vector(double[16]) 
atan(__vector(double[16]));\n\n// atan2\npragma(mangle,\"_Z5atan2ff\")                float       atan2(         float,                float);\npragma(mangle,\"_Z5atan2Dv2_fS_\")  __vector(float[2])   atan2(__vector(float[2]),   __vector(float[2]));\npragma(mangle,\"_Z5atan2Dv3_fS_\")  __vector(float[3])   atan2(__vector(float[3]),   __vector(float[3]));\npragma(mangle,\"_Z5atan2Dv4_fS_\")  __vector(float[4])   atan2(__vector(float[4]),   __vector(float[4]));\npragma(mangle,\"_Z5atan2Dv8_fS_\")  __vector(float[8])   atan2(__vector(float[8]),   __vector(float[8]));\npragma(mangle,\"_Z5atan2Dv16_fS_\") __vector(float[16])  atan2(__vector(float[16]),  __vector(float[16]));\npragma(mangle,\"_Z5atan2dd\")                double      atan2(         double,              double);\npragma(mangle,\"_Z5atan2Dv2_dS_\")  __vector(double[2])  atan2(__vector(double[2]),  __vector(double[2]));\npragma(mangle,\"_Z5atan2Dv3_dS_\")  __vector(double[3])  atan2(__vector(double[3]),  __vector(double[3]));\npragma(mangle,\"_Z5atan2Dv4_dS_\")  __vector(double[4])  atan2(__vector(double[4]),  __vector(double[4]));\npragma(mangle,\"_Z5atan2Dv8_dS_\")  __vector(double[8])  atan2(__vector(double[8]),  __vector(double[8]));\npragma(mangle,\"_Z5atan2Dv16_dS_\") __vector(double[16]) atan2(__vector(double[16]), __vector(double[16]));\n\n// atanh\npragma(mangle,\"_Z5atanhf\")               float       atanh(         float);\npragma(mangle,\"_Z5atanhDv2_f\")  __vector(float[2])   atanh(__vector(float[2]));\npragma(mangle,\"_Z5atanhDv3_f\")  __vector(float[3])   atanh(__vector(float[3]));\npragma(mangle,\"_Z5atanhDv4_f\")  __vector(float[4])   atanh(__vector(float[4]));\npragma(mangle,\"_Z5atanhDv8_f\")  __vector(float[8])   atanh(__vector(float[8]));\npragma(mangle,\"_Z5atanhDv16_f\") __vector(float[16])  atanh(__vector(float[16]));\npragma(mangle,\"_Z5atanhd\")               double      atanh(         double);\npragma(mangle,\"_Z5atanhDv2_d\")  __vector(double[2])  
atanh(__vector(double[2]));\npragma(mangle,\"_Z5atanhDv3_d\")  __vector(double[3])  atanh(__vector(double[3]));\npragma(mangle,\"_Z5atanhDv4_d\")  __vector(double[4])  atanh(__vector(double[4]));\npragma(mangle,\"_Z5atanhDv8_d\")  __vector(double[8])  atanh(__vector(double[8]));\npragma(mangle,\"_Z5atanhDv16_d\") __vector(double[16]) atanh(__vector(double[16]));\n\n// atanpi\npragma(mangle,\"_Z6atanpif\")               float       atanpi(         float);\npragma(mangle,\"_Z6atanpiDv2_f\")  __vector(float[2])   atanpi(__vector(float[2]));\npragma(mangle,\"_Z6atanpiDv3_f\")  __vector(float[3])   atanpi(__vector(float[3]));\npragma(mangle,\"_Z6atanpiDv4_f\")  __vector(float[4])   atanpi(__vector(float[4]));\npragma(mangle,\"_Z6atanpiDv8_f\")  __vector(float[8])   atanpi(__vector(float[8]));\npragma(mangle,\"_Z6atanpiDv16_f\") __vector(float[16])  atanpi(__vector(float[16]));\npragma(mangle,\"_Z6atanpid\")               double      atanpi(         double);\npragma(mangle,\"_Z6atanpiDv2_d\")  __vector(double[2])  atanpi(__vector(double[2]));\npragma(mangle,\"_Z6atanpiDv3_d\")  __vector(double[3])  atanpi(__vector(double[3]));\npragma(mangle,\"_Z6atanpiDv4_d\")  __vector(double[4])  atanpi(__vector(double[4]));\npragma(mangle,\"_Z6atanpiDv8_d\")  __vector(double[8])  atanpi(__vector(double[8]));\npragma(mangle,\"_Z6atanpiDv16_d\") __vector(double[16]) atanpi(__vector(double[16]));\n\n// atan2pi\npragma(mangle,\"_Z7atan2piff\")                float       atan2pi(         float,                float);\npragma(mangle,\"_Z7atan2piDv2_fS_\")  __vector(float[2])   atan2pi(__vector(float[2]),   __vector(float[2]));\npragma(mangle,\"_Z7atan2piDv3_fS_\")  __vector(float[3])   atan2pi(__vector(float[3]),   __vector(float[3]));\npragma(mangle,\"_Z7atan2piDv4_fS_\")  __vector(float[4])   atan2pi(__vector(float[4]),   __vector(float[4]));\npragma(mangle,\"_Z7atan2piDv8_fS_\")  __vector(float[8])   atan2pi(__vector(float[8]),   
__vector(float[8]));\npragma(mangle,\"_Z7atan2piDv16_fS_\") __vector(float[16])  atan2pi(__vector(float[16]),  __vector(float[16]));\npragma(mangle,\"_Z7atan2pidd\")                double      atan2pi(         double,               double);\npragma(mangle,\"_Z7atan2piDv2_dS_\")  __vector(double[2])  atan2pi(__vector(double[2]),  __vector(double[2]));\npragma(mangle,\"_Z7atan2piDv3_dS_\")  __vector(double[3])  atan2pi(__vector(double[3]),  __vector(double[3]));\npragma(mangle,\"_Z7atan2piDv4_dS_\")  __vector(double[4])  atan2pi(__vector(double[4]),  __vector(double[4]));\npragma(mangle,\"_Z7atan2piDv8_dS_\")  __vector(double[8])  atan2pi(__vector(double[8]),  __vector(double[8]));\npragma(mangle,\"_Z7atan2piDv16_dS_\") __vector(double[16]) atan2pi(__vector(double[16]), __vector(double[16]));\n\n// cbrt\npragma(mangle,\"_Z4cbrtf\")               float       cbrt(         float);\npragma(mangle,\"_Z4cbrtDv2_f\")  __vector(float[2])   cbrt(__vector(float[2]));\npragma(mangle,\"_Z4cbrtDv3_f\")  __vector(float[3])   cbrt(__vector(float[3]));\npragma(mangle,\"_Z4cbrtDv4_f\")  __vector(float[4])   cbrt(__vector(float[4]));\npragma(mangle,\"_Z4cbrtDv8_f\")  __vector(float[8])   cbrt(__vector(float[8]));\npragma(mangle,\"_Z4cbrtDv16_f\") __vector(float[16])  cbrt(__vector(float[16]));\npragma(mangle,\"_Z4cbrtd\")               double      cbrt(         double);\npragma(mangle,\"_Z4cbrtDv2_d\")  __vector(double[2])  cbrt(__vector(double[2]));\npragma(mangle,\"_Z4cbrtDv3_d\")  __vector(double[3])  cbrt(__vector(double[3]));\npragma(mangle,\"_Z4cbrtDv4_d\")  __vector(double[4])  cbrt(__vector(double[4]));\npragma(mangle,\"_Z4cbrtDv8_d\")  __vector(double[8])  cbrt(__vector(double[8]));\npragma(mangle,\"_Z4cbrtDv16_d\") __vector(double[16]) cbrt(__vector(double[16]));\n\n// ceil\npragma(mangle,\"_Z4ceilf\")               float       ceil(         float);\npragma(mangle,\"_Z4ceilDv2_f\")  __vector(float[2])   ceil(__vector(float[2]));\npragma(mangle,\"_Z4ceilDv3_f\")  
__vector(float[3])   ceil(__vector(float[3]));\npragma(mangle,\"_Z4ceilDv4_f\")  __vector(float[4])   ceil(__vector(float[4]));\npragma(mangle,\"_Z4ceilDv8_f\")  __vector(float[8])   ceil(__vector(float[8]));\npragma(mangle,\"_Z4ceilDv16_f\") __vector(float[16])  ceil(__vector(float[16]));\npragma(mangle,\"_Z4ceild\")               double      ceil(         double);\npragma(mangle,\"_Z4ceilDv2_d\")  __vector(double[2])  ceil(__vector(double[2]));\npragma(mangle,\"_Z4ceilDv3_d\")  __vector(double[3])  ceil(__vector(double[3]));\npragma(mangle,\"_Z4ceilDv4_d\")  __vector(double[4])  ceil(__vector(double[4]));\npragma(mangle,\"_Z4ceilDv8_d\")  __vector(double[8])  ceil(__vector(double[8]));\npragma(mangle,\"_Z4ceilDv16_d\") __vector(double[16]) ceil(__vector(double[16]));\n\n// copysign\npragma(mangle,\"_Z8copysignff\")                float       copysign(         float,                float);\npragma(mangle,\"_Z8copysignDv2_fS_\")  __vector(float[2])   copysign(__vector(float[2]),   __vector(float[2]));\npragma(mangle,\"_Z8copysignDv3_fS_\")  __vector(float[3])   copysign(__vector(float[3]),   __vector(float[3]));\npragma(mangle,\"_Z8copysignDv4_fS_\")  __vector(float[4])   copysign(__vector(float[4]),   __vector(float[4]));\npragma(mangle,\"_Z8copysignDv8_fS_\")  __vector(float[8])   copysign(__vector(float[8]),   __vector(float[8]));\npragma(mangle,\"_Z8copysignDv16_fS_\") __vector(float[16])  copysign(__vector(float[16]),  __vector(float[16]));\npragma(mangle,\"_Z8copysigndd\")                double      copysign(         double,               double);\npragma(mangle,\"_Z8copysignDv2_dS_\")  __vector(double[2])  copysign(__vector(double[2]),  __vector(double[2]));\npragma(mangle,\"_Z8copysignDv3_dS_\")  __vector(double[3])  copysign(__vector(double[3]),  __vector(double[3]));\npragma(mangle,\"_Z8copysignDv4_dS_\")  __vector(double[4])  copysign(__vector(double[4]),  __vector(double[4]));\npragma(mangle,\"_Z8copysignDv8_dS_\")  __vector(double[8])  
copysign(__vector(double[8]),  __vector(double[8]));\npragma(mangle,\"_Z8copysignDv16_dS_\") __vector(double[16]) copysign(__vector(double[16]), __vector(double[16]));\n\n// cos\npragma(mangle,\"_Z3cosf\")               float       cos(         float);\npragma(mangle,\"_Z3cosDv2_f\")  __vector(float[2])   cos(__vector(float[2]));\npragma(mangle,\"_Z3cosDv3_f\")  __vector(float[3])   cos(__vector(float[3]));\npragma(mangle,\"_Z3cosDv4_f\")  __vector(float[4])   cos(__vector(float[4]));\npragma(mangle,\"_Z3cosDv8_f\")  __vector(float[8])   cos(__vector(float[8]));\npragma(mangle,\"_Z3cosDv16_f\") __vector(float[16])  cos(__vector(float[16]));\npragma(mangle,\"_Z3cosd\")               double      cos(         double);\npragma(mangle,\"_Z3cosDv2_d\")  __vector(double[2])  cos(__vector(double[2]));\npragma(mangle,\"_Z3cosDv3_d\")  __vector(double[3])  cos(__vector(double[3]));\npragma(mangle,\"_Z3cosDv4_d\")  __vector(double[4])  cos(__vector(double[4]));\npragma(mangle,\"_Z3cosDv8_d\")  __vector(double[8])  cos(__vector(double[8]));\npragma(mangle,\"_Z3cosDv16_d\") __vector(double[16]) cos(__vector(double[16]));\n\n// cosh\npragma(mangle,\"_Z4coshf\")               float       cosh(         float);\npragma(mangle,\"_Z4coshDv2_f\")  __vector(float[2])   cosh(__vector(float[2]));\npragma(mangle,\"_Z4coshDv3_f\")  __vector(float[3])   cosh(__vector(float[3]));\npragma(mangle,\"_Z4coshDv4_f\")  __vector(float[4])   cosh(__vector(float[4]));\npragma(mangle,\"_Z4coshDv8_f\")  __vector(float[8])   cosh(__vector(float[8]));\npragma(mangle,\"_Z4coshDv16_f\") __vector(float[16])  cosh(__vector(float[16]));\npragma(mangle,\"_Z4coshd\")               double      cosh(         double);\npragma(mangle,\"_Z4coshDv2_d\")  __vector(double[2])  cosh(__vector(double[2]));\npragma(mangle,\"_Z4coshDv3_d\")  __vector(double[3])  cosh(__vector(double[3]));\npragma(mangle,\"_Z4coshDv4_d\")  __vector(double[4])  cosh(__vector(double[4]));\npragma(mangle,\"_Z4coshDv8_d\")  __vector(double[8])  
cosh(__vector(double[8]));\npragma(mangle,\"_Z4coshDv16_d\") __vector(double[16]) cosh(__vector(double[16]));\n\n// cospi\npragma(mangle,\"_Z5cospif\")               float       cospi(         float);\npragma(mangle,\"_Z5cospiDv2_f\")  __vector(float[2])   cospi(__vector(float[2]));\npragma(mangle,\"_Z5cospiDv3_f\")  __vector(float[3])   cospi(__vector(float[3]));\npragma(mangle,\"_Z5cospiDv4_f\")  __vector(float[4])   cospi(__vector(float[4]));\npragma(mangle,\"_Z5cospiDv8_f\")  __vector(float[8])   cospi(__vector(float[8]));\npragma(mangle,\"_Z5cospiDv16_f\") __vector(float[16])  cospi(__vector(float[16]));\npragma(mangle,\"_Z5cospid\")               double      cospi(         double);\npragma(mangle,\"_Z5cospiDv2_d\")  __vector(double[2])  cospi(__vector(double[2]));\npragma(mangle,\"_Z5cospiDv3_d\")  __vector(double[3])  cospi(__vector(double[3]));\npragma(mangle,\"_Z5cospiDv4_d\")  __vector(double[4])  cospi(__vector(double[4]));\npragma(mangle,\"_Z5cospiDv8_d\")  __vector(double[8])  cospi(__vector(double[8]));\npragma(mangle,\"_Z5cospiDv16_d\") __vector(double[16]) cospi(__vector(double[16]));\n\n// erfc\npragma(mangle,\"_Z4erfcf\")               float       erfc(         float);\npragma(mangle,\"_Z4erfcDv2_f\")  __vector(float[2])   erfc(__vector(float[2]));\npragma(mangle,\"_Z4erfcDv3_f\")  __vector(float[3])   erfc(__vector(float[3]));\npragma(mangle,\"_Z4erfcDv4_f\")  __vector(float[4])   erfc(__vector(float[4]));\npragma(mangle,\"_Z4erfcDv8_f\")  __vector(float[8])   erfc(__vector(float[8]));\npragma(mangle,\"_Z4erfcDv16_f\") __vector(float[16])  erfc(__vector(float[16]));\npragma(mangle,\"_Z4erfcd\")               double      erfc(         double);\npragma(mangle,\"_Z4erfcDv2_d\")  __vector(double[2])  erfc(__vector(double[2]));\npragma(mangle,\"_Z4erfcDv3_d\")  __vector(double[3])  erfc(__vector(double[3]));\npragma(mangle,\"_Z4erfcDv4_d\")  __vector(double[4])  erfc(__vector(double[4]));\npragma(mangle,\"_Z4erfcDv8_d\")  __vector(double[8])  
erfc(__vector(double[8]));\npragma(mangle,\"_Z4erfcDv16_d\") __vector(double[16]) erfc(__vector(double[16]));\n\n// erf\npragma(mangle,\"_Z3erff\")               float       erf(         float);\npragma(mangle,\"_Z3erfDv2_f\")  __vector(float[2])   erf(__vector(float[2]));\npragma(mangle,\"_Z3erfDv3_f\")  __vector(float[3])   erf(__vector(float[3]));\npragma(mangle,\"_Z3erfDv4_f\")  __vector(float[4])   erf(__vector(float[4]));\npragma(mangle,\"_Z3erfDv8_f\")  __vector(float[8])   erf(__vector(float[8]));\npragma(mangle,\"_Z3erfDv16_f\") __vector(float[16])  erf(__vector(float[16]));\npragma(mangle,\"_Z3erfd\")               double      erf(         double);\npragma(mangle,\"_Z3erfDv2_d\")  __vector(double[2])  erf(__vector(double[2]));\npragma(mangle,\"_Z3erfDv3_d\")  __vector(double[3])  erf(__vector(double[3]));\npragma(mangle,\"_Z3erfDv4_d\")  __vector(double[4])  erf(__vector(double[4]));\npragma(mangle,\"_Z3erfDv8_d\")  __vector(double[8])  erf(__vector(double[8]));\npragma(mangle,\"_Z3erfDv16_d\") __vector(double[16]) erf(__vector(double[16]));\n\n// exp\npragma(mangle,\"_Z3expf\")               float       exp(         float);\npragma(mangle,\"_Z3expDv2_f\")  __vector(float[2])   exp(__vector(float[2]));\npragma(mangle,\"_Z3expDv3_f\")  __vector(float[3])   exp(__vector(float[3]));\npragma(mangle,\"_Z3expDv4_f\")  __vector(float[4])   exp(__vector(float[4]));\npragma(mangle,\"_Z3expDv8_f\")  __vector(float[8])   exp(__vector(float[8]));\npragma(mangle,\"_Z3expDv16_f\") __vector(float[16])  exp(__vector(float[16]));\npragma(mangle,\"_Z3expd\")               double      exp(         double);\npragma(mangle,\"_Z3expDv2_d\")  __vector(double[2])  exp(__vector(double[2]));\npragma(mangle,\"_Z3expDv3_d\")  __vector(double[3])  exp(__vector(double[3]));\npragma(mangle,\"_Z3expDv4_d\")  __vector(double[4])  exp(__vector(double[4]));\npragma(mangle,\"_Z3expDv8_d\")  __vector(double[8])  exp(__vector(double[8]));\npragma(mangle,\"_Z3expDv16_d\") __vector(double[16]) 
exp(__vector(double[16]));\n\n// exp2\npragma(mangle,\"_Z4exp2f\")               float       exp2(         float);\npragma(mangle,\"_Z4exp2Dv2_f\")  __vector(float[2])   exp2(__vector(float[2]));\npragma(mangle,\"_Z4exp2Dv3_f\")  __vector(float[3])   exp2(__vector(float[3]));\npragma(mangle,\"_Z4exp2Dv4_f\")  __vector(float[4])   exp2(__vector(float[4]));\npragma(mangle,\"_Z4exp2Dv8_f\")  __vector(float[8])   exp2(__vector(float[8]));\npragma(mangle,\"_Z4exp2Dv16_f\") __vector(float[16])  exp2(__vector(float[16]));\npragma(mangle,\"_Z4exp2d\")               double      exp2(         double);\npragma(mangle,\"_Z4exp2Dv2_d\")  __vector(double[2])  exp2(__vector(double[2]));\npragma(mangle,\"_Z4exp2Dv3_d\")  __vector(double[3])  exp2(__vector(double[3]));\npragma(mangle,\"_Z4exp2Dv4_d\")  __vector(double[4])  exp2(__vector(double[4]));\npragma(mangle,\"_Z4exp2Dv8_d\")  __vector(double[8])  exp2(__vector(double[8]));\npragma(mangle,\"_Z4exp2Dv16_d\") __vector(double[16]) exp2(__vector(double[16]));\n\n// exp10\npragma(mangle,\"_Z5exp10f\")               float       exp10(         float);\npragma(mangle,\"_Z5exp10Dv2_f\")  __vector(float[2])   exp10(__vector(float[2]));\npragma(mangle,\"_Z5exp10Dv3_f\")  __vector(float[3])   exp10(__vector(float[3]));\npragma(mangle,\"_Z5exp10Dv4_f\")  __vector(float[4])   exp10(__vector(float[4]));\npragma(mangle,\"_Z5exp10Dv8_f\")  __vector(float[8])   exp10(__vector(float[8]));\npragma(mangle,\"_Z5exp10Dv16_f\") __vector(float[16])  exp10(__vector(float[16]));\npragma(mangle,\"_Z5exp10d\")               double      exp10(         double);\npragma(mangle,\"_Z5exp10Dv2_d\")  __vector(double[2])  exp10(__vector(double[2]));\npragma(mangle,\"_Z5exp10Dv3_d\")  __vector(double[3])  exp10(__vector(double[3]));\npragma(mangle,\"_Z5exp10Dv4_d\")  __vector(double[4])  exp10(__vector(double[4]));\npragma(mangle,\"_Z5exp10Dv8_d\")  __vector(double[8])  exp10(__vector(double[8]));\npragma(mangle,\"_Z5exp10Dv16_d\") __vector(double[16]) 
exp10(__vector(double[16]));\n\n// expm1\npragma(mangle,\"_Z5expm1f\")               float       expm1(         float);\npragma(mangle,\"_Z5expm1Dv2_f\")  __vector(float[2])   expm1(__vector(float[2]));\npragma(mangle,\"_Z5expm1Dv3_f\")  __vector(float[3])   expm1(__vector(float[3]));\npragma(mangle,\"_Z5expm1Dv4_f\")  __vector(float[4])   expm1(__vector(float[4]));\npragma(mangle,\"_Z5expm1Dv8_f\")  __vector(float[8])   expm1(__vector(float[8]));\npragma(mangle,\"_Z5expm1Dv16_f\") __vector(float[16])  expm1(__vector(float[16]));\npragma(mangle,\"_Z5expm1d\")               double      expm1(         double);\npragma(mangle,\"_Z5expm1Dv2_d\")  __vector(double[2])  expm1(__vector(double[2]));\npragma(mangle,\"_Z5expm1Dv3_d\")  __vector(double[3])  expm1(__vector(double[3]));\npragma(mangle,\"_Z5expm1Dv4_d\")  __vector(double[4])  expm1(__vector(double[4]));\npragma(mangle,\"_Z5expm1Dv8_d\")  __vector(double[8])  expm1(__vector(double[8]));\npragma(mangle,\"_Z5expm1Dv16_d\") __vector(double[16]) expm1(__vector(double[16]));\n\n// fabs\npragma(mangle,\"_Z4fabsf\")               float       fabs(         float);\npragma(mangle,\"_Z4fabsDv2_f\")  __vector(float[2])   fabs(__vector(float[2]));\npragma(mangle,\"_Z4fabsDv3_f\")  __vector(float[3])   fabs(__vector(float[3]));\npragma(mangle,\"_Z4fabsDv4_f\")  __vector(float[4])   fabs(__vector(float[4]));\npragma(mangle,\"_Z4fabsDv8_f\")  __vector(float[8])   fabs(__vector(float[8]));\npragma(mangle,\"_Z4fabsDv16_f\") __vector(float[16])  fabs(__vector(float[16]));\npragma(mangle,\"_Z4fabsd\")               double      fabs(         double);\npragma(mangle,\"_Z4fabsDv2_d\")  __vector(double[2])  fabs(__vector(double[2]));\npragma(mangle,\"_Z4fabsDv3_d\")  __vector(double[3])  fabs(__vector(double[3]));\npragma(mangle,\"_Z4fabsDv4_d\")  __vector(double[4])  fabs(__vector(double[4]));\npragma(mangle,\"_Z4fabsDv8_d\")  __vector(double[8])  fabs(__vector(double[8]));\npragma(mangle,\"_Z4fabsDv16_d\") __vector(double[16]) 
fabs(__vector(double[16]));\n\n// fdim\npragma(mangle,\"_Z4fdimff\")                float       fdim(         float,                float);\npragma(mangle,\"_Z4fdimDv2_fS_\")  __vector(float[2])   fdim(__vector(float[2]),   __vector(float[2]));\npragma(mangle,\"_Z4fdimDv3_fS_\")  __vector(float[3])   fdim(__vector(float[3]),   __vector(float[3]));\npragma(mangle,\"_Z4fdimDv4_fS_\")  __vector(float[4])   fdim(__vector(float[4]),   __vector(float[4]));\npragma(mangle,\"_Z4fdimDv8_fS_\")  __vector(float[8])   fdim(__vector(float[8]),   __vector(float[8]));\npragma(mangle,\"_Z4fdimDv16_fS_\") __vector(float[16])  fdim(__vector(float[16]),  __vector(float[16]));\npragma(mangle,\"_Z4fdimdd\")                double      fdim(         double,               double);\npragma(mangle,\"_Z4fdimDv2_dS_\")  __vector(double[2])  fdim(__vector(double[2]),  __vector(double[2]));\npragma(mangle,\"_Z4fdimDv3_dS_\")  __vector(double[3])  fdim(__vector(double[3]),  __vector(double[3]));\npragma(mangle,\"_Z4fdimDv4_dS_\")  __vector(double[4])  fdim(__vector(double[4]),  __vector(double[4]));\npragma(mangle,\"_Z4fdimDv8_dS_\")  __vector(double[8])  fdim(__vector(double[8]),  __vector(double[8]));\npragma(mangle,\"_Z4fdimDv16_dS_\") __vector(double[16]) fdim(__vector(double[16]), __vector(double[16]));\n\n// floor\npragma(mangle,\"_Z5floorf\")               float       floor(         float);\npragma(mangle,\"_Z5floorDv2_f\")  __vector(float[2])   floor(__vector(float[2]));\npragma(mangle,\"_Z5floorDv3_f\")  __vector(float[3])   floor(__vector(float[3]));\npragma(mangle,\"_Z5floorDv4_f\")  __vector(float[4])   floor(__vector(float[4]));\npragma(mangle,\"_Z5floorDv8_f\")  __vector(float[8])   floor(__vector(float[8]));\npragma(mangle,\"_Z5floorDv16_f\") __vector(float[16])  floor(__vector(float[16]));\npragma(mangle,\"_Z5floord\")               double      floor(         double);\npragma(mangle,\"_Z5floorDv2_d\")  __vector(double[2])  
floor(__vector(double[2]));\npragma(mangle,\"_Z5floorDv3_d\")  __vector(double[3])  floor(__vector(double[3]));\npragma(mangle,\"_Z5floorDv4_d\")  __vector(double[4])  floor(__vector(double[4]));\npragma(mangle,\"_Z5floorDv8_d\")  __vector(double[8])  floor(__vector(double[8]));\npragma(mangle,\"_Z5floorDv16_d\") __vector(double[16]) floor(__vector(double[16]));\n\n// fma\npragma(mangle,\"_Z3fmafff\")                float      fma(         float,                float,               float);\npragma(mangle,\"_Z3fmaDv2_fS_S_\") __vector(float[2])  fma(__vector(float[2]),  __vector(float[2]),  __vector(float[2]));\npragma(mangle,\"_Z3fmaDv3_fS_S_\") __vector(float[3])  fma(__vector(float[3]),  __vector(float[3]),  __vector(float[3]));\npragma(mangle,\"_Z3fmaDv4_fS_S_\") __vector(float[4])  fma(__vector(float[4]),  __vector(float[4]),  __vector(float[4]));\npragma(mangle,\"_Z3fmaDv8_fS_S_\") __vector(float[8])  fma(__vector(float[8]),  __vector(float[8]),  __vector(float[8]));\npragma(mangle,\"_Z3fmaDv16_fS_S_\")__vector(float[16]) fma(__vector(float[16]), __vector(float[16]), __vector(float[16]));\npragma(mangle,\"_Z3fmaddd\")                double     fma(         double,              double,              double);\npragma(mangle,\"_Z3fmaDv2_dS_S_\") __vector(double[2]) fma(__vector(double[2]), __vector(double[2]), __vector(double[2]));\npragma(mangle,\"_Z3fmaDv3_dS_S_\") __vector(double[3]) fma(__vector(double[3]), __vector(double[3]), __vector(double[3]));\npragma(mangle,\"_Z3fmaDv4_dS_S_\") __vector(double[4]) fma(__vector(double[4]), __vector(double[4]), __vector(double[4]));\npragma(mangle,\"_Z3fmaDv8_dS_S_\") __vector(double[8]) fma(__vector(double[8]), __vector(double[8]), __vector(double[8]));\npragma(mangle,\"_Z3fmaDv16_dS_S_\")__vector(double[16])fma(__vector(double[16]),__vector(double[16]),__vector(double[16]));\n\n// fmax\npragma(mangle,\"_Z4fmaxff\")                float       fmax(         float,                float);\npragma(mangle,\"_Z4fmaxDv2_fS_\")  
__vector(float[2])   fmax(__vector(float[2]),   __vector(float[2]));\npragma(mangle,\"_Z4fmaxDv3_fS_\")  __vector(float[3])   fmax(__vector(float[3]),   __vector(float[3]));\npragma(mangle,\"_Z4fmaxDv4_fS_\")  __vector(float[4])   fmax(__vector(float[4]),   __vector(float[4]));\npragma(mangle,\"_Z4fmaxDv8_fS_\")  __vector(float[8])   fmax(__vector(float[8]),   __vector(float[8]));\npragma(mangle,\"_Z4fmaxDv16_fS_\") __vector(float[16])  fmax(__vector(float[16]),  __vector(float[16]));\npragma(mangle,\"_Z4fmaxDv2_ff\")   __vector(float[2])   fmax(__vector(float[2]),            float);\npragma(mangle,\"_Z4fmaxDv3_ff\")   __vector(float[3])   fmax(__vector(float[3]),            float);\npragma(mangle,\"_Z4fmaxDv4_ff\")   __vector(float[4])   fmax(__vector(float[4]),            float);\npragma(mangle,\"_Z4fmaxDv8_ff\")   __vector(float[8])   fmax(__vector(float[8]),            float);\npragma(mangle,\"_Z4fmaxDv16_ff\")  __vector(float[16])  fmax(__vector(float[16]),           float);\npragma(mangle,\"_Z4fmaxdd\")                double      fmax(         double,               double);\npragma(mangle,\"_Z4fmaxDv2_dS_\")  __vector(double[2])  fmax(__vector(double[2]),  __vector(double[2]));\npragma(mangle,\"_Z4fmaxDv3_dS_\")  __vector(double[3])  fmax(__vector(double[3]),  __vector(double[3]));\npragma(mangle,\"_Z4fmaxDv4_dS_\")  __vector(double[4])  fmax(__vector(double[4]),  __vector(double[4]));\npragma(mangle,\"_Z4fmaxDv8_dS_\")  __vector(double[8])  fmax(__vector(double[8]),  __vector(double[8]));\npragma(mangle,\"_Z4fmaxDv16_dS_\") __vector(double[16]) fmax(__vector(double[16]), __vector(double[16]));\npragma(mangle,\"_Z4fmaxDv2_dd\")   __vector(double[2])  fmax(__vector(double[2]),           double);\npragma(mangle,\"_Z4fmaxDv3_dd\")   __vector(double[3])  fmax(__vector(double[3]),           double);\npragma(mangle,\"_Z4fmaxDv4_dd\")   __vector(double[4])  fmax(__vector(double[4]),           double);\npragma(mangle,\"_Z4fmaxDv8_dd\")   __vector(double[8])  
fmax(__vector(double[8]),           double);\npragma(mangle,\"_Z4fmaxDv16_dd\")  __vector(double[16]) fmax(__vector(double[16]),          double);\n\n// fmin\npragma(mangle,\"_Z4fminff\")                float       fmin(         float,                float);\npragma(mangle,\"_Z4fminDv2_fS_\")  __vector(float[2])   fmin(__vector(float[2]),   __vector(float[2]));\npragma(mangle,\"_Z4fminDv3_fS_\")  __vector(float[3])   fmin(__vector(float[3]),   __vector(float[3]));\npragma(mangle,\"_Z4fminDv4_fS_\")  __vector(float[4])   fmin(__vector(float[4]),   __vector(float[4]));\npragma(mangle,\"_Z4fminDv8_fS_\")  __vector(float[8])   fmin(__vector(float[8]),   __vector(float[8]));\npragma(mangle,\"_Z4fminDv16_fS_\") __vector(float[16])  fmin(__vector(float[16]),  __vector(float[16]));\npragma(mangle,\"_Z4fminDv2_ff\")   __vector(float[2])   fmin(__vector(float[2]),            float);\npragma(mangle,\"_Z4fminDv3_ff\")   __vector(float[3])   fmin(__vector(float[3]),            float);\npragma(mangle,\"_Z4fminDv4_ff\")   __vector(float[4])   fmin(__vector(float[4]),            float);\npragma(mangle,\"_Z4fminDv8_ff\")   __vector(float[8])   fmin(__vector(float[8]),            float);\npragma(mangle,\"_Z4fminDv16_ff\")  __vector(float[16])  fmin(__vector(float[16]),           float);\npragma(mangle,\"_Z4fmindd\")                double      fmin(         double,               double);\npragma(mangle,\"_Z4fminDv2_dS_\")  __vector(double[2])  fmin(__vector(double[2]),  __vector(double[2]));\npragma(mangle,\"_Z4fminDv3_dS_\")  __vector(double[3])  fmin(__vector(double[3]),  __vector(double[3]));\npragma(mangle,\"_Z4fminDv4_dS_\")  __vector(double[4])  fmin(__vector(double[4]),  __vector(double[4]));\npragma(mangle,\"_Z4fminDv8_dS_\")  __vector(double[8])  fmin(__vector(double[8]),  __vector(double[8]));\npragma(mangle,\"_Z4fminDv16_dS_\") __vector(double[16]) fmin(__vector(double[16]), __vector(double[16]));\npragma(mangle,\"_Z4fminDv2_dd\")   __vector(double[2])  
fmin(__vector(double[2]),           double);\npragma(mangle,\"_Z4fminDv3_dd\")   __vector(double[3])  fmin(__vector(double[3]),           double);\npragma(mangle,\"_Z4fminDv4_dd\")   __vector(double[4])  fmin(__vector(double[4]),           double);\npragma(mangle,\"_Z4fminDv8_dd\")   __vector(double[8])  fmin(__vector(double[8]),           double);\npragma(mangle,\"_Z4fminDv16_dd\")  __vector(double[16]) fmin(__vector(double[16]),          double);\n\n// fmod\npragma(mangle,\"_Z4fmodff\")                float       fmod(         float,                float);\npragma(mangle,\"_Z4fmodDv2_fS_\")  __vector(float[2])   fmod(__vector(float[2]),   __vector(float[2]));\npragma(mangle,\"_Z4fmodDv3_fS_\")  __vector(float[3])   fmod(__vector(float[3]),   __vector(float[3]));\npragma(mangle,\"_Z4fmodDv4_fS_\")  __vector(float[4])   fmod(__vector(float[4]),   __vector(float[4]));\npragma(mangle,\"_Z4fmodDv8_fS_\")  __vector(float[8])   fmod(__vector(float[8]),   __vector(float[8]));\npragma(mangle,\"_Z4fmodDv16_fS_\") __vector(float[16])  fmod(__vector(float[16]),  __vector(float[16]));\npragma(mangle,\"_Z4fmoddd\")                double      fmod(         double,               double);\npragma(mangle,\"_Z4fmodDv2_dS_\")  __vector(double[2])  fmod(__vector(double[2]),  __vector(double[2]));\npragma(mangle,\"_Z4fmodDv3_dS_\")  __vector(double[3])  fmod(__vector(double[3]),  __vector(double[3]));\npragma(mangle,\"_Z4fmodDv4_dS_\")  __vector(double[4])  fmod(__vector(double[4]),  __vector(double[4]));\npragma(mangle,\"_Z4fmodDv8_dS_\")  __vector(double[8])  fmod(__vector(double[8]),  __vector(double[8]));\npragma(mangle,\"_Z4fmodDv16_dS_\") __vector(double[16]) fmod(__vector(double[16]), __vector(double[16]));\n\n// fract\npragma(mangle,\"_Z5fractfPU3AS4f\")\n        float       fract(          float,       GenericPointer!(         float));\npragma(mangle,\"_Z5fractDv2_fPU3AS4S_\")\n__vector(float[2])   fract(__vector(float[2]),   
GenericPointer!(__vector(float[2])));\npragma(mangle,\"_Z5fractDv3_fPU3AS4S_\")\n__vector(float[3])   fract(__vector(float[3]),   GenericPointer!(__vector(float[3])));\npragma(mangle,\"_Z5fractDv4_fPU3AS4S_\")\n__vector(float[4])   fract(__vector(float[4]),   GenericPointer!(__vector(float[4])));\npragma(mangle,\"_Z5fractDv8_fPU3AS4S_\")\n__vector(float[8])   fract(__vector(float[8]),   GenericPointer!(__vector(float[8])));\npragma(mangle,\"_Z5fractDv16_fPU3AS4S_\")\n__vector(float[16])  fract(__vector(float[16]),  GenericPointer!(__vector(float[16])));\npragma(mangle,\"_Z5fractdPU3AS4d\")\n        double      fract(          double,      GenericPointer!(         double));\npragma(mangle,\"_Z5fractDv2_dPU3AS4S_\")\n__vector(double[2])  fract(__vector(double[2]),  GenericPointer!(__vector(double[2])));\npragma(mangle,\"_Z5fractDv3_dPU3AS4S_\")\n__vector(double[3])  fract(__vector(double[3]),  GenericPointer!(__vector(double[3])));\npragma(mangle,\"_Z5fractDv4_dPU3AS4S_\")\n__vector(double[4])  fract(__vector(double[4]),  GenericPointer!(__vector(double[4])));\npragma(mangle,\"_Z5fractDv8_dPU3AS4S_\")\n__vector(double[8])  fract(__vector(double[8]),  GenericPointer!(__vector(double[8])));\npragma(mangle,\"_Z5fractDv16_dPU3AS4S_\")\n__vector(double[16]) fract(__vector(double[16]), GenericPointer!(__vector(double[16])));\npragma(mangle,\"_Z5fractfPU3AS1f\")\n        float       fract(          float,       GlobalPointer!(         float));\npragma(mangle,\"_Z5fractDv2_fPU3AS1S_\")\n__vector(float[2])   fract(__vector(float[2]),   GlobalPointer!(__vector(float[2])));\npragma(mangle,\"_Z5fractDv3_fPU3AS1S_\")\n__vector(float[3])   fract(__vector(float[3]),   GlobalPointer!(__vector(float[3])));\npragma(mangle,\"_Z5fractDv4_fPU3AS1S_\")\n__vector(float[4])   fract(__vector(float[4]),   GlobalPointer!(__vector(float[4])));\npragma(mangle,\"_Z5fractDv8_fPU3AS1S_\")\n__vector(float[8])   fract(__vector(float[8]),   
GlobalPointer!(__vector(float[8])));\npragma(mangle,\"_Z5fractDv16_fPU3AS1S_\")\n__vector(float[16])  fract(__vector(float[16]),  GlobalPointer!(__vector(float[16])));\npragma(mangle,\"_Z5fractdPU3AS1d\")\n        double      fract(          double,      GlobalPointer!(         double));\npragma(mangle,\"_Z5fractDv2_dPU3AS1S_\")\n__vector(double[2])  fract(__vector(double[2]),  GlobalPointer!(__vector(double[2])));\npragma(mangle,\"_Z5fractDv3_dPU3AS1S_\")\n__vector(double[3])  fract(__vector(double[3]),  GlobalPointer!(__vector(double[3])));\npragma(mangle,\"_Z5fractDv4_dPU3AS1S_\")\n__vector(double[4])  fract(__vector(double[4]),  GlobalPointer!(__vector(double[4])));\npragma(mangle,\"_Z5fractDv8_dPU3AS1S_\")\n__vector(double[8])  fract(__vector(double[8]),  GlobalPointer!(__vector(double[8])));\npragma(mangle,\"_Z5fractDv16_dPU3AS1S_\")\n__vector(double[16]) fract(__vector(double[16]), GlobalPointer!(__vector(double[16])));\npragma(mangle,\"_Z5fractfPU3AS3f\")\n        float       fract(          float,       SharedPointer!(         float));\npragma(mangle,\"_Z5fractDv2_fPU3AS3S_\")\n__vector(float[2])   fract(__vector(float[2]),   SharedPointer!(__vector(float[2])));\npragma(mangle,\"_Z5fractDv3_fPU3AS3S_\")\n__vector(float[3])   fract(__vector(float[3]),   SharedPointer!(__vector(float[3])));\npragma(mangle,\"_Z5fractDv4_fPU3AS3S_\")\n__vector(float[4])   fract(__vector(float[4]),   SharedPointer!(__vector(float[4])));\npragma(mangle,\"_Z5fractDv8_fPU3AS3S_\")\n__vector(float[8])   fract(__vector(float[8]),   SharedPointer!(__vector(float[8])));\npragma(mangle,\"_Z5fractDv16_fPU3AS3S_\")\n__vector(float[16])  fract(__vector(float[16]),  SharedPointer!(__vector(float[16])));\npragma(mangle,\"_Z5fractdPU3AS3d\")\n        double      fract(          double,      SharedPointer!(         double));\npragma(mangle,\"_Z5fractDv2_dPU3AS3S_\")\n__vector(double[2])  fract(__vector(double[2]),  
SharedPointer!(__vector(double[2])));\npragma(mangle,\"_Z5fractDv3_dPU3AS3S_\")\n__vector(double[3])  fract(__vector(double[3]),  SharedPointer!(__vector(double[3])));\npragma(mangle,\"_Z5fractDv4_dPU3AS3S_\")\n__vector(double[4])  fract(__vector(double[4]),  SharedPointer!(__vector(double[4])));\npragma(mangle,\"_Z5fractDv8_dPU3AS3S_\")\n__vector(double[8])  fract(__vector(double[8]),  SharedPointer!(__vector(double[8])));\npragma(mangle,\"_Z5fractDv16_dPU3AS3S_\")\n__vector(double[16]) fract(__vector(double[16]), SharedPointer!(__vector(double[16])));\npragma(mangle,\"_Z5fractfPf\")\n        float       fract(          float,       PrivatePointer!(         float));\npragma(mangle,\"_Z5fractDv2_fPS_\")\n__vector(float[2])   fract(__vector(float[2]),   PrivatePointer!(__vector(float[2])));\npragma(mangle,\"_Z5fractDv3_fPS_\")\n__vector(float[3])   fract(__vector(float[3]),   PrivatePointer!(__vector(float[3])));\npragma(mangle,\"_Z5fractDv4_fPS_\")\n__vector(float[4])   fract(__vector(float[4]),   PrivatePointer!(__vector(float[4])));\npragma(mangle,\"_Z5fractDv8_fPS_\")\n__vector(float[8])   fract(__vector(float[8]),   PrivatePointer!(__vector(float[8])));\npragma(mangle,\"_Z5fractDv16_fPS_\")\n__vector(float[16])  fract(__vector(float[16]),  PrivatePointer!(__vector(float[16])));\npragma(mangle,\"_Z5fractdPd\")\n        double      fract(          double,      PrivatePointer!(         double));\npragma(mangle,\"_Z5fractDv2_dPS_\")\n__vector(double[2])  fract(__vector(double[2]),  PrivatePointer!(__vector(double[2])));\npragma(mangle,\"_Z5fractDv3_dPS_\")\n__vector(double[3])  fract(__vector(double[3]),  PrivatePointer!(__vector(double[3])));\npragma(mangle,\"_Z5fractDv4_dPS_\")\n__vector(double[4])  fract(__vector(double[4]),  PrivatePointer!(__vector(double[4])));\npragma(mangle,\"_Z5fractDv8_dPS_\")\n__vector(double[8])  fract(__vector(double[8]),  PrivatePointer!(__vector(double[8])));\npragma(mangle,\"_Z5fractDv16_dPS_\")\n__vector(double[16]) 
fract(__vector(double[16]), PrivatePointer!(__vector(double[16])));\n\n// frexp\npragma(mangle,\"_Z5frexpfPU3AS4i\")\n        float       frexp(          float,       GenericPointer!(         int));\npragma(mangle,\"_Z5frexpDv2_fPU3AS4Dv2_i\")\n__vector(float[2])   frexp(__vector(float[2]),   GenericPointer!(__vector(int[2])));\npragma(mangle,\"_Z5frexpDv3_fPU3AS4Dv3_i\")\n__vector(float[3])   frexp(__vector(float[3]),   GenericPointer!(__vector(int[3])));\npragma(mangle,\"_Z5frexpDv4_fPU3AS4Dv4_i\")\n__vector(float[4])   frexp(__vector(float[4]),   GenericPointer!(__vector(int[4])));\npragma(mangle,\"_Z5frexpDv8_fPU3AS4Dv8_i\")\n__vector(float[8])   frexp(__vector(float[8]),   GenericPointer!(__vector(int[8])));\npragma(mangle,\"_Z5frexpDv16_fPU3AS4Dv16_i\")\n__vector(float[16])  frexp(__vector(float[16]),  GenericPointer!(__vector(int[16])));\npragma(mangle,\"_Z5frexpdPU3AS4i\")\n        double      frexp(          double,      GenericPointer!(         int));\npragma(mangle,\"_Z5frexpDv2_dPU3AS4Dv2_i\")\n__vector(double[2])  frexp(__vector(double[2]),  GenericPointer!(__vector(int[2])));\npragma(mangle,\"_Z5frexpDv3_dPU3AS4Dv3_i\")\n__vector(double[3])  frexp(__vector(double[3]),  GenericPointer!(__vector(int[3])));\npragma(mangle,\"_Z5frexpDv4_dPU3AS4Dv4_i\")\n__vector(double[4])  frexp(__vector(double[4]),  GenericPointer!(__vector(int[4])));\npragma(mangle,\"_Z5frexpDv8_dPU3AS4Dv8_i\")\n__vector(double[8])  frexp(__vector(double[8]),  GenericPointer!(__vector(int[8])));\npragma(mangle,\"_Z5frexpDv16_dPU3AS4Dv16_i\")\n__vector(double[16]) frexp(__vector(double[16]), GenericPointer!(__vector(int[16])));\npragma(mangle,\"_Z5frexpfPU3AS1i\")\n        float       frexp(          float,       GlobalPointer!(         int));\npragma(mangle,\"_Z5frexpDv2_fPU3AS1Dv2_i\")\n__vector(float[2])   frexp(__vector(float[2]),   GlobalPointer!(__vector(int[2])));\npragma(mangle,\"_Z5frexpDv3_fPU3AS1Dv3_i\")\n__vector(float[3])   frexp(__vector(float[3]),   
GlobalPointer!(__vector(int[3])));\npragma(mangle,\"_Z5frexpDv4_fPU3AS1Dv4_i\")\n__vector(float[4])   frexp(__vector(float[4]),   GlobalPointer!(__vector(int[4])));\npragma(mangle,\"_Z5frexpDv8_fPU3AS1Dv8_i\")\n__vector(float[8])   frexp(__vector(float[8]),   GlobalPointer!(__vector(int[8])));\npragma(mangle,\"_Z5frexpDv16_fPU3AS1Dv16_i\")\n__vector(float[16])  frexp(__vector(float[16]),  GlobalPointer!(__vector(int[16])));\npragma(mangle,\"_Z5frexpdPU3AS1i\")\n        double      frexp(          double,      GlobalPointer!(         int));\npragma(mangle,\"_Z5frexpDv2_dPU3AS1Dv2_i\")\n__vector(double[2])  frexp(__vector(double[2]),  GlobalPointer!(__vector(int[2])));\npragma(mangle,\"_Z5frexpDv3_dPU3AS1Dv3_i\")\n__vector(double[3])  frexp(__vector(double[3]),  GlobalPointer!(__vector(int[3])));\npragma(mangle,\"_Z5frexpDv4_dPU3AS1Dv4_i\")\n__vector(double[4])  frexp(__vector(double[4]),  GlobalPointer!(__vector(int[4])));\npragma(mangle,\"_Z5frexpDv8_dPU3AS1Dv8_i\")\n__vector(double[8])  frexp(__vector(double[8]),  GlobalPointer!(__vector(int[8])));\npragma(mangle,\"_Z5frexpDv16_dPU3AS1Dv16_i\")\n__vector(double[16]) frexp(__vector(double[16]), GlobalPointer!(__vector(int[16])));\npragma(mangle,\"_Z5frexpfPU3AS3i\")\n        float       frexp(          float,       SharedPointer!(         int));\npragma(mangle,\"_Z5frexpDv2_fPU3AS3Dv2_i\")\n__vector(float[2])   frexp(__vector(float[2]),   SharedPointer!(__vector(int[2])));\npragma(mangle,\"_Z5frexpDv3_fPU3AS3Dv3_i\")\n__vector(float[3])   frexp(__vector(float[3]),   SharedPointer!(__vector(int[3])));\npragma(mangle,\"_Z5frexpDv4_fPU3AS3Dv4_i\")\n__vector(float[4])   frexp(__vector(float[4]),   SharedPointer!(__vector(int[4])));\npragma(mangle,\"_Z5frexpDv8_fPU3AS3Dv8_i\")\n__vector(float[8])   frexp(__vector(float[8]),   SharedPointer!(__vector(int[8])));\npragma(mangle,\"_Z5frexpDv16_fPU3AS3Dv16_i\")\n__vector(float[16])  frexp(__vector(float[16]),  
SharedPointer!(__vector(int[16])));\npragma(mangle,\"_Z5frexpdPU3AS3i\")\n        double      frexp(          double,      SharedPointer!(         int));\npragma(mangle,\"_Z5frexpDv2_dPU3AS3Dv2_i\")\n__vector(double[2])  frexp(__vector(double[2]),  SharedPointer!(__vector(int[2])));\npragma(mangle,\"_Z5frexpDv3_dPU3AS3Dv3_i\")\n__vector(double[3])  frexp(__vector(double[3]),  SharedPointer!(__vector(int[3])));\npragma(mangle,\"_Z5frexpDv4_dPU3AS3Dv4_i\")\n__vector(double[4])  frexp(__vector(double[4]),  SharedPointer!(__vector(int[4])));\npragma(mangle,\"_Z5frexpDv8_dPU3AS3Dv8_i\")\n__vector(double[8])  frexp(__vector(double[8]),  SharedPointer!(__vector(int[8])));\npragma(mangle,\"_Z5frexpDv16_dPU3AS3Dv16_i\")\n__vector(double[16]) frexp(__vector(double[16]), SharedPointer!(__vector(int[16])));\npragma(mangle,\"_Z5frexpfPi\")\n        float       frexp(          float,       PrivatePointer!(         int));\npragma(mangle,\"_Z5frexpDv2_fPDv2_i\")\n__vector(float[2])   frexp(__vector(float[2]),   PrivatePointer!(__vector(int[2])));\npragma(mangle,\"_Z5frexpDv3_fPDv3_i\")\n__vector(float[3])   frexp(__vector(float[3]),   PrivatePointer!(__vector(int[3])));\npragma(mangle,\"_Z5frexpDv4_fPDv4_i\")\n__vector(float[4])   frexp(__vector(float[4]),   PrivatePointer!(__vector(int[4])));\npragma(mangle,\"_Z5frexpDv8_fPDv8_i\")\n__vector(float[8])   frexp(__vector(float[8]),   PrivatePointer!(__vector(int[8])));\npragma(mangle,\"_Z5frexpDv16_fPDv16_i\")\n__vector(float[16])  frexp(__vector(float[16]),  PrivatePointer!(__vector(int[16])));\npragma(mangle,\"_Z5frexpdPi\")\n        double      frexp(          double,      PrivatePointer!(         int));\npragma(mangle,\"_Z5frexpDv2_dPDv2_i\")\n__vector(double[2])  frexp(__vector(double[2]),  PrivatePointer!(__vector(int[2])));\npragma(mangle,\"_Z5frexpDv3_dPDv3_i\")\n__vector(double[3])  frexp(__vector(double[3]),  PrivatePointer!(__vector(int[3])));\npragma(mangle,\"_Z5frexpDv4_dPDv4_i\")\n__vector(double[4])  
frexp(__vector(double[4]),  PrivatePointer!(__vector(int[4])));\npragma(mangle,\"_Z5frexpDv8_dPDv8_i\")\n__vector(double[8])  frexp(__vector(double[8]),  PrivatePointer!(__vector(int[8])));\npragma(mangle,\"_Z5frexpDv16_dPDv16_i\")\n__vector(double[16]) frexp(__vector(double[16]), PrivatePointer!(__vector(int[16])));\n\n// hypot\npragma(mangle,\"_Z5hypotff\")                float       hypot(         float,                float);\npragma(mangle,\"_Z5hypotDv2_fS_\")  __vector(float[2])   hypot(__vector(float[2]),   __vector(float[2]));\npragma(mangle,\"_Z5hypotDv3_fS_\")  __vector(float[3])   hypot(__vector(float[3]),   __vector(float[3]));\npragma(mangle,\"_Z5hypotDv4_fS_\")  __vector(float[4])   hypot(__vector(float[4]),   __vector(float[4]));\npragma(mangle,\"_Z5hypotDv8_fS_\")  __vector(float[8])   hypot(__vector(float[8]),   __vector(float[8]));\npragma(mangle,\"_Z5hypotDv16_fS_\") __vector(float[16])  hypot(__vector(float[16]),  __vector(float[16]));\npragma(mangle,\"_Z5hypotdd\")                double      hypot(         double,               double);\npragma(mangle,\"_Z5hypotDv2_dS_\")  __vector(double[2])  hypot(__vector(double[2]),  __vector(double[2]));\npragma(mangle,\"_Z5hypotDv3_dS_\")  __vector(double[3])  hypot(__vector(double[3]),  __vector(double[3]));\npragma(mangle,\"_Z5hypotDv4_dS_\")  __vector(double[4])  hypot(__vector(double[4]),  __vector(double[4]));\npragma(mangle,\"_Z5hypotDv8_dS_\")  __vector(double[8])  hypot(__vector(double[8]),  __vector(double[8]));\npragma(mangle,\"_Z5hypotDv16_dS_\") __vector(double[16]) hypot(__vector(double[16]), __vector(double[16]));\n\n// ilogb\npragma(mangle,\"_Z5ilogbf\")               int      ilogb(         float);\npragma(mangle,\"_Z5ilogbDv2_f\")  __vector(int[2])  ilogb(__vector(float[2]));\npragma(mangle,\"_Z5ilogbDv3_f\")  __vector(int[3])  ilogb(__vector(float[3]));\npragma(mangle,\"_Z5ilogbDv4_f\")  __vector(int[4])  ilogb(__vector(float[4]));\npragma(mangle,\"_Z5ilogbDv8_f\")  __vector(int[8])  
ilogb(__vector(float[8]));\npragma(mangle,\"_Z5ilogbDv16_f\") __vector(int[16]) ilogb(__vector(float[16]));\npragma(mangle,\"_Z5ilogbd\")               int      ilogb(         double);\npragma(mangle,\"_Z5ilogbDv2_d\")  __vector(int[2])  ilogb(__vector(double[2]));\npragma(mangle,\"_Z5ilogbDv3_d\")  __vector(int[3])  ilogb(__vector(double[3]));\npragma(mangle,\"_Z5ilogbDv4_d\")  __vector(int[4])  ilogb(__vector(double[4]));\npragma(mangle,\"_Z5ilogbDv8_d\")  __vector(int[8])  ilogb(__vector(double[8]));\npragma(mangle,\"_Z5ilogbDv16_d\") __vector(int[16]) ilogb(__vector(double[16]));\n\n// ldexp\npragma(mangle,\"_Z5ldexpfi\")                    float       ldexp(         float,                int);\npragma(mangle,\"_Z5ldexpDv2_fDv2_i\")   __vector(float[2])   ldexp(__vector(float[2]),   __vector(int[2]));\npragma(mangle,\"_Z5ldexpDv3_fDv3_i\")   __vector(float[3])   ldexp(__vector(float[3]),   __vector(int[3]));\npragma(mangle,\"_Z5ldexpDv4_fDv4_i\")   __vector(float[4])   ldexp(__vector(float[4]),   __vector(int[4]));\npragma(mangle,\"_Z5ldexpDv8_fDv8_i\")   __vector(float[8])   ldexp(__vector(float[8]),   __vector(int[8]));\npragma(mangle,\"_Z5ldexpDv16_fDv16_i\") __vector(float[16])  ldexp(__vector(float[16]),  __vector(int[16]));\npragma(mangle,\"_Z5ldexpDv2_fi\")       __vector(float[2])   ldexp(__vector(float[2]),            int);\npragma(mangle,\"_Z5ldexpDv3_fi\")       __vector(float[3])   ldexp(__vector(float[3]),            int);\npragma(mangle,\"_Z5ldexpDv4_fi\")       __vector(float[4])   ldexp(__vector(float[4]),            int);\npragma(mangle,\"_Z5ldexpDv8_fi\")       __vector(float[8])   ldexp(__vector(float[8]),            int);\npragma(mangle,\"_Z5ldexpDv16_fi\")      __vector(float[16])  ldexp(__vector(float[16]),           int);\npragma(mangle,\"_Z5ldexpdi\")                    double      ldexp(         double,               int);\npragma(mangle,\"_Z5ldexpDv2_dDv2_i\")   __vector(double[2])  ldexp(__vector(double[2]),  
__vector(int[2]));\npragma(mangle,\"_Z5ldexpDv3_dDv3_i\")   __vector(double[3])  ldexp(__vector(double[3]),  __vector(int[3]));\npragma(mangle,\"_Z5ldexpDv4_dDv4_i\")   __vector(double[4])  ldexp(__vector(double[4]),  __vector(int[4]));\npragma(mangle,\"_Z5ldexpDv8_dDv8_i\")   __vector(double[8])  ldexp(__vector(double[8]),  __vector(int[8]));\npragma(mangle,\"_Z5ldexpDv16_dDv16_i\") __vector(double[16]) ldexp(__vector(double[16]), __vector(int[16]));\npragma(mangle,\"_Z5ldexpDv2_di\")       __vector(double[2])  ldexp(__vector(double[2]),           int);\npragma(mangle,\"_Z5ldexpDv3_di\")       __vector(double[3])  ldexp(__vector(double[3]),           int);\npragma(mangle,\"_Z5ldexpDv4_di\")       __vector(double[4])  ldexp(__vector(double[4]),           int);\npragma(mangle,\"_Z5ldexpDv8_di\")       __vector(double[8])  ldexp(__vector(double[8]),           int);\npragma(mangle,\"_Z5ldexpDv16_di\")      __vector(double[16]) ldexp(__vector(double[16]),          int);\n\n// lgamma\npragma(mangle,\"_Z6lgammaf\")               float       lgamma(         float);\npragma(mangle,\"_Z6lgammaDv2_f\")  __vector(float[2])   lgamma(__vector(float[2]));\npragma(mangle,\"_Z6lgammaDv3_f\")  __vector(float[3])   lgamma(__vector(float[3]));\npragma(mangle,\"_Z6lgammaDv4_f\")  __vector(float[4])   lgamma(__vector(float[4]));\npragma(mangle,\"_Z6lgammaDv8_f\")  __vector(float[8])   lgamma(__vector(float[8]));\npragma(mangle,\"_Z6lgammaDv16_f\") __vector(float[16])  lgamma(__vector(float[16]));\npragma(mangle,\"_Z6lgammad\")               double      lgamma(         double);\npragma(mangle,\"_Z6lgammaDv2_d\")  __vector(double[2])  lgamma(__vector(double[2]));\npragma(mangle,\"_Z6lgammaDv3_d\")  __vector(double[3])  lgamma(__vector(double[3]));\npragma(mangle,\"_Z6lgammaDv4_d\")  __vector(double[4])  lgamma(__vector(double[4]));\npragma(mangle,\"_Z6lgammaDv8_d\")  __vector(double[8])  lgamma(__vector(double[8]));\npragma(mangle,\"_Z6lgammaDv16_d\") __vector(double[16]) 
lgamma(__vector(double[16]));\n\n// lgamma_r\npragma(mangle,\"_Z8lgamma_rfPU3AS4i\")\n        float       lgamma_r(          float,       GenericPointer!(         int));\npragma(mangle,\"_Z8lgamma_rDv2_fPU3AS4Dv2_i\")\n__vector(float[2])   lgamma_r(__vector(float[2]),   GenericPointer!(__vector(int[2])));\npragma(mangle,\"_Z8lgamma_rDv3_fPU3AS4Dv3_i\")\n__vector(float[3])   lgamma_r(__vector(float[3]),   GenericPointer!(__vector(int[3])));\npragma(mangle,\"_Z8lgamma_rDv4_fPU3AS4Dv4_i\")\n__vector(float[4])   lgamma_r(__vector(float[4]),   GenericPointer!(__vector(int[4])));\npragma(mangle,\"_Z8lgamma_rDv8_fPU3AS4Dv8_i\")\n__vector(float[8])   lgamma_r(__vector(float[8]),   GenericPointer!(__vector(int[8])));\npragma(mangle,\"_Z8lgamma_rDv16_fPU3AS4Dv16_i\")\n__vector(float[16])  lgamma_r(__vector(float[16]),  GenericPointer!(__vector(int[16])));\npragma(mangle,\"_Z8lgamma_rdPU3AS4i\")\n        double      lgamma_r(          double,      GenericPointer!(         int));\npragma(mangle,\"_Z8lgamma_rDv2_dPU3AS4Dv2_i\")\n__vector(double[2])  lgamma_r(__vector(double[2]),  GenericPointer!(__vector(int[2])));\npragma(mangle,\"_Z8lgamma_rDv3_dPU3AS4Dv3_i\")\n__vector(double[3])  lgamma_r(__vector(double[3]),  GenericPointer!(__vector(int[3])));\npragma(mangle,\"_Z8lgamma_rDv4_dPU3AS4Dv4_i\")\n__vector(double[4])  lgamma_r(__vector(double[4]),  GenericPointer!(__vector(int[4])));\npragma(mangle,\"_Z8lgamma_rDv8_dPU3AS4Dv8_i\")\n__vector(double[8])  lgamma_r(__vector(double[8]),  GenericPointer!(__vector(int[8])));\npragma(mangle,\"_Z8lgamma_rDv16_dPU3AS4Dv16_i\")\n__vector(double[16]) lgamma_r(__vector(double[16]), GenericPointer!(__vector(int[16])));\npragma(mangle,\"_Z8lgamma_rfPU3AS1i\")\n        float       lgamma_r(          float,       GlobalPointer!(         int));\npragma(mangle,\"_Z8lgamma_rDv2_fPU3AS1Dv2_i\")\n__vector(float[2])   lgamma_r(__vector(float[2]),   GlobalPointer!(__vector(int[2])));\npragma(mangle,\"_Z8lgamma_rDv3_fPU3AS1Dv3_i\")\n__vector(float[3])  
 lgamma_r(__vector(float[3]),   GlobalPointer!(__vector(int[3])));\npragma(mangle,\"_Z8lgamma_rDv4_fPU3AS1Dv4_i\")\n__vector(float[4])   lgamma_r(__vector(float[4]),   GlobalPointer!(__vector(int[4])));\npragma(mangle,\"_Z8lgamma_rDv8_fPU3AS1Dv8_i\")\n__vector(float[8])   lgamma_r(__vector(float[8]),   GlobalPointer!(__vector(int[8])));\npragma(mangle,\"_Z8lgamma_rDv16_fPU3AS1Dv16_i\")\n__vector(float[16])  lgamma_r(__vector(float[16]),  GlobalPointer!(__vector(int[16])));\npragma(mangle,\"_Z8lgamma_rdPU3AS1i\")\n        double      lgamma_r(          double,      GlobalPointer!(         int));\npragma(mangle,\"_Z8lgamma_rDv2_dPU3AS1Dv2_i\")\n__vector(double[2])  lgamma_r(__vector(double[2]),  GlobalPointer!(__vector(int[2])));\npragma(mangle,\"_Z8lgamma_rDv3_dPU3AS1Dv3_i\")\n__vector(double[3])  lgamma_r(__vector(double[3]),  GlobalPointer!(__vector(int[3])));\npragma(mangle,\"_Z8lgamma_rDv4_dPU3AS1Dv4_i\")\n__vector(double[4])  lgamma_r(__vector(double[4]),  GlobalPointer!(__vector(int[4])));\npragma(mangle,\"_Z8lgamma_rDv8_dPU3AS1Dv8_i\")\n__vector(double[8])  lgamma_r(__vector(double[8]),  GlobalPointer!(__vector(int[8])));\npragma(mangle,\"_Z8lgamma_rDv16_dPU3AS1Dv16_i\")\n__vector(double[16]) lgamma_r(__vector(double[16]), GlobalPointer!(__vector(int[16])));\npragma(mangle,\"_Z8lgamma_rfPU3AS3i\")\n        float       lgamma_r(          float,       SharedPointer!(         int));\npragma(mangle,\"_Z8lgamma_rDv2_fPU3AS3Dv2_i\")\n__vector(float[2])   lgamma_r(__vector(float[2]),   SharedPointer!(__vector(int[2])));\npragma(mangle,\"_Z8lgamma_rDv3_fPU3AS3Dv3_i\")\n__vector(float[3])   lgamma_r(__vector(float[3]),   SharedPointer!(__vector(int[3])));\npragma(mangle,\"_Z8lgamma_rDv4_fPU3AS3Dv4_i\")\n__vector(float[4])   lgamma_r(__vector(float[4]),   SharedPointer!(__vector(int[4])));\npragma(mangle,\"_Z8lgamma_rDv8_fPU3AS3Dv8_i\")\n__vector(float[8])   lgamma_r(__vector(float[8]),   
SharedPointer!(__vector(int[8])));\npragma(mangle,\"_Z8lgamma_rDv16_fPU3AS3Dv16_i\")\n__vector(float[16])  lgamma_r(__vector(float[16]),  SharedPointer!(__vector(int[16])));\npragma(mangle,\"_Z8lgamma_rdPU3AS3i\")\n        double      lgamma_r(          double,      SharedPointer!(         int));\npragma(mangle,\"_Z8lgamma_rDv2_dPU3AS3Dv2_i\")\n__vector(double[2])  lgamma_r(__vector(double[2]),  SharedPointer!(__vector(int[2])));\npragma(mangle,\"_Z8lgamma_rDv3_dPU3AS3Dv3_i\")\n__vector(double[3])  lgamma_r(__vector(double[3]),  SharedPointer!(__vector(int[3])));\npragma(mangle,\"_Z8lgamma_rDv4_dPU3AS3Dv4_i\")\n__vector(double[4])  lgamma_r(__vector(double[4]),  SharedPointer!(__vector(int[4])));\npragma(mangle,\"_Z8lgamma_rDv8_dPU3AS3Dv8_i\")\n__vector(double[8])  lgamma_r(__vector(double[8]),  SharedPointer!(__vector(int[8])));\npragma(mangle,\"_Z8lgamma_rDv16_dPU3AS3Dv16_i\")\n__vector(double[16]) lgamma_r(__vector(double[16]), SharedPointer!(__vector(int[16])));\npragma(mangle,\"_Z8lgamma_rfPi\")\n        float       lgamma_r(          float,       PrivatePointer!(         int));\npragma(mangle,\"_Z8lgamma_rDv2_fPDv2_i\")\n__vector(float[2])   lgamma_r(__vector(float[2]),   PrivatePointer!(__vector(int[2])));\npragma(mangle,\"_Z8lgamma_rDv3_fPDv3_i\")\n__vector(float[3])   lgamma_r(__vector(float[3]),   PrivatePointer!(__vector(int[3])));\npragma(mangle,\"_Z8lgamma_rDv4_fPDv4_i\")\n__vector(float[4])   lgamma_r(__vector(float[4]),   PrivatePointer!(__vector(int[4])));\npragma(mangle,\"_Z8lgamma_rDv8_fPDv8_i\")\n__vector(float[8])   lgamma_r(__vector(float[8]),   PrivatePointer!(__vector(int[8])));\npragma(mangle,\"_Z8lgamma_rDv16_fPDv16_i\")\n__vector(float[16])  lgamma_r(__vector(float[16]),  PrivatePointer!(__vector(int[16])));\npragma(mangle,\"_Z8lgamma_rdPi\")\n        double      lgamma_r(          double,      PrivatePointer!(         int));\npragma(mangle,\"_Z8lgamma_rDv2_dPDv2_i\")\n__vector(double[2])  lgamma_r(__vector(double[2]),  
PrivatePointer!(__vector(int[2])));\npragma(mangle,\"_Z8lgamma_rDv3_dPDv3_i\")\n__vector(double[3])  lgamma_r(__vector(double[3]),  PrivatePointer!(__vector(int[3])));\npragma(mangle,\"_Z8lgamma_rDv4_dPDv4_i\")\n__vector(double[4])  lgamma_r(__vector(double[4]),  PrivatePointer!(__vector(int[4])));\npragma(mangle,\"_Z8lgamma_rDv8_dPDv8_i\")\n__vector(double[8])  lgamma_r(__vector(double[8]),  PrivatePointer!(__vector(int[8])));\npragma(mangle,\"_Z8lgamma_rDv16_dPDv16_i\")\n__vector(double[16]) lgamma_r(__vector(double[16]), PrivatePointer!(__vector(int[16])));\n\n// log\npragma(mangle,\"_Z3logf\")               float       log(         float);\npragma(mangle,\"_Z3logDv2_f\")  __vector(float[2])   log(__vector(float[2]));\npragma(mangle,\"_Z3logDv3_f\")  __vector(float[3])   log(__vector(float[3]));\npragma(mangle,\"_Z3logDv4_f\")  __vector(float[4])   log(__vector(float[4]));\npragma(mangle,\"_Z3logDv8_f\")  __vector(float[8])   log(__vector(float[8]));\npragma(mangle,\"_Z3logDv16_f\") __vector(float[16])  log(__vector(float[16]));\npragma(mangle,\"_Z3logd\")               double      log(         double);\npragma(mangle,\"_Z3logDv2_d\")  __vector(double[2])  log(__vector(double[2]));\npragma(mangle,\"_Z3logDv3_d\")  __vector(double[3])  log(__vector(double[3]));\npragma(mangle,\"_Z3logDv4_d\")  __vector(double[4])  log(__vector(double[4]));\npragma(mangle,\"_Z3logDv8_d\")  __vector(double[8])  log(__vector(double[8]));\npragma(mangle,\"_Z3logDv16_d\") __vector(double[16]) log(__vector(double[16]));\n\n// log2\npragma(mangle,\"_Z4log2f\")               float       log2(         float);\npragma(mangle,\"_Z4log2Dv2_f\")  __vector(float[2])   log2(__vector(float[2]));\npragma(mangle,\"_Z4log2Dv3_f\")  __vector(float[3])   log2(__vector(float[3]));\npragma(mangle,\"_Z4log2Dv4_f\")  __vector(float[4])   log2(__vector(float[4]));\npragma(mangle,\"_Z4log2Dv8_f\")  __vector(float[8])   log2(__vector(float[8]));\npragma(mangle,\"_Z4log2Dv16_f\") __vector(float[16])  
log2(__vector(float[16]));\npragma(mangle,\"_Z4log2d\")               double      log2(         double);\npragma(mangle,\"_Z4log2Dv2_d\")  __vector(double[2])  log2(__vector(double[2]));\npragma(mangle,\"_Z4log2Dv3_d\")  __vector(double[3])  log2(__vector(double[3]));\npragma(mangle,\"_Z4log2Dv4_d\")  __vector(double[4])  log2(__vector(double[4]));\npragma(mangle,\"_Z4log2Dv8_d\")  __vector(double[8])  log2(__vector(double[8]));\npragma(mangle,\"_Z4log2Dv16_d\") __vector(double[16]) log2(__vector(double[16]));\n\n// log10\npragma(mangle,\"_Z5log10f\")               float       log10(         float);\npragma(mangle,\"_Z5log10Dv2_f\")  __vector(float[2])   log10(__vector(float[2]));\npragma(mangle,\"_Z5log10Dv3_f\")  __vector(float[3])   log10(__vector(float[3]));\npragma(mangle,\"_Z5log10Dv4_f\")  __vector(float[4])   log10(__vector(float[4]));\npragma(mangle,\"_Z5log10Dv8_f\")  __vector(float[8])   log10(__vector(float[8]));\npragma(mangle,\"_Z5log10Dv16_f\") __vector(float[16])  log10(__vector(float[16]));\npragma(mangle,\"_Z5log10d\")               double      log10(         double);\npragma(mangle,\"_Z5log10Dv2_d\")  __vector(double[2])  log10(__vector(double[2]));\npragma(mangle,\"_Z5log10Dv3_d\")  __vector(double[3])  log10(__vector(double[3]));\npragma(mangle,\"_Z5log10Dv4_d\")  __vector(double[4])  log10(__vector(double[4]));\npragma(mangle,\"_Z5log10Dv8_d\")  __vector(double[8])  log10(__vector(double[8]));\npragma(mangle,\"_Z5log10Dv16_d\") __vector(double[16]) log10(__vector(double[16]));\n\n// log1p\npragma(mangle,\"_Z5log1pf\")               float       log1p(         float);\npragma(mangle,\"_Z5log1pDv2_f\")  __vector(float[2])   log1p(__vector(float[2]));\npragma(mangle,\"_Z5log1pDv3_f\")  __vector(float[3])   log1p(__vector(float[3]));\npragma(mangle,\"_Z5log1pDv4_f\")  __vector(float[4])   log1p(__vector(float[4]));\npragma(mangle,\"_Z5log1pDv8_f\")  __vector(float[8])   log1p(__vector(float[8]));\npragma(mangle,\"_Z5log1pDv16_f\") 
__vector(float[16])  log1p(__vector(float[16]));\npragma(mangle,\"_Z5log1pd\")               double      log1p(         double);\npragma(mangle,\"_Z5log1pDv2_d\")  __vector(double[2])  log1p(__vector(double[2]));\npragma(mangle,\"_Z5log1pDv3_d\")  __vector(double[3])  log1p(__vector(double[3]));\npragma(mangle,\"_Z5log1pDv4_d\")  __vector(double[4])  log1p(__vector(double[4]));\npragma(mangle,\"_Z5log1pDv8_d\")  __vector(double[8])  log1p(__vector(double[8]));\npragma(mangle,\"_Z5log1pDv16_d\") __vector(double[16]) log1p(__vector(double[16]));\n\n// logb\npragma(mangle,\"_Z4logbf\")               float       logb(         float);\npragma(mangle,\"_Z4logbDv2_f\")  __vector(float[2])   logb(__vector(float[2]));\npragma(mangle,\"_Z4logbDv3_f\")  __vector(float[3])   logb(__vector(float[3]));\npragma(mangle,\"_Z4logbDv4_f\")  __vector(float[4])   logb(__vector(float[4]));\npragma(mangle,\"_Z4logbDv8_f\")  __vector(float[8])   logb(__vector(float[8]));\npragma(mangle,\"_Z4logbDv16_f\") __vector(float[16])  logb(__vector(float[16]));\npragma(mangle,\"_Z4logbd\")               double      logb(         double);\npragma(mangle,\"_Z4logbDv2_d\")  __vector(double[2])  logb(__vector(double[2]));\npragma(mangle,\"_Z4logbDv3_d\")  __vector(double[3])  logb(__vector(double[3]));\npragma(mangle,\"_Z4logbDv4_d\")  __vector(double[4])  logb(__vector(double[4]));\npragma(mangle,\"_Z4logbDv8_d\")  __vector(double[8])  logb(__vector(double[8]));\npragma(mangle,\"_Z4logbDv16_d\") __vector(double[16]) logb(__vector(double[16]));\n\n// mad\npragma(mangle,\"_Z3madfff\")                float      mad(         float,                float,               float);\npragma(mangle,\"_Z3madDv2_fS_S_\") __vector(float[2])  mad(__vector(float[2]),  __vector(float[2]),  __vector(float[2]));\npragma(mangle,\"_Z3madDv3_fS_S_\") __vector(float[3])  mad(__vector(float[3]),  __vector(float[3]),  __vector(float[3]));\npragma(mangle,\"_Z3madDv4_fS_S_\") __vector(float[4])  mad(__vector(float[4]),  
__vector(float[4]),  __vector(float[4]));\npragma(mangle,\"_Z3madDv8_fS_S_\") __vector(float[8])  mad(__vector(float[8]),  __vector(float[8]),  __vector(float[8]));\npragma(mangle,\"_Z3madDv16_fS_S_\")__vector(float[16]) mad(__vector(float[16]), __vector(float[16]), __vector(float[16]));\npragma(mangle,\"_Z3madddd\")                double     mad(         double,              double,              double);\npragma(mangle,\"_Z3madDv2_dS_S_\") __vector(double[2]) mad(__vector(double[2]), __vector(double[2]), __vector(double[2]));\npragma(mangle,\"_Z3madDv3_dS_S_\") __vector(double[3]) mad(__vector(double[3]), __vector(double[3]), __vector(double[3]));\npragma(mangle,\"_Z3madDv4_dS_S_\") __vector(double[4]) mad(__vector(double[4]), __vector(double[4]), __vector(double[4]));\npragma(mangle,\"_Z3madDv8_dS_S_\") __vector(double[8]) mad(__vector(double[8]), __vector(double[8]), __vector(double[8]));\npragma(mangle,\"_Z3madDv16_dS_S_\")__vector(double[16])mad(__vector(double[16]),__vector(double[16]),__vector(double[16]));\n\n// maxmag\npragma(mangle,\"_Z6maxmagff\")                float       maxmag(         float,                float);\npragma(mangle,\"_Z6maxmagDv2_fS_\")  __vector(float[2])   maxmag(__vector(float[2]),   __vector(float[2]));\npragma(mangle,\"_Z6maxmagDv3_fS_\")  __vector(float[3])   maxmag(__vector(float[3]),   __vector(float[3]));\npragma(mangle,\"_Z6maxmagDv4_fS_\")  __vector(float[4])   maxmag(__vector(float[4]),   __vector(float[4]));\npragma(mangle,\"_Z6maxmagDv8_fS_\")  __vector(float[8])   maxmag(__vector(float[8]),   __vector(float[8]));\npragma(mangle,\"_Z6maxmagDv16_fS_\") __vector(float[16])  maxmag(__vector(float[16]),  __vector(float[16]));\npragma(mangle,\"_Z6maxmagdd\")                double      maxmag(         double,               double);\npragma(mangle,\"_Z6maxmagDv2_dS_\")  __vector(double[2])  maxmag(__vector(double[2]),  __vector(double[2]));\npragma(mangle,\"_Z6maxmagDv3_dS_\")  __vector(double[3])  maxmag(__vector(double[3]),  
__vector(double[3]));\npragma(mangle,\"_Z6maxmagDv4_dS_\")  __vector(double[4])  maxmag(__vector(double[4]),  __vector(double[4]));\npragma(mangle,\"_Z6maxmagDv8_dS_\")  __vector(double[8])  maxmag(__vector(double[8]),  __vector(double[8]));\npragma(mangle,\"_Z6maxmagDv16_dS_\") __vector(double[16]) maxmag(__vector(double[16]), __vector(double[16]));\n\n// minmag\npragma(mangle,\"_Z6minmagff\")                float       minmag(         float,                float);\npragma(mangle,\"_Z6minmagDv2_fS_\")  __vector(float[2])   minmag(__vector(float[2]),   __vector(float[2]));\npragma(mangle,\"_Z6minmagDv3_fS_\")  __vector(float[3])   minmag(__vector(float[3]),   __vector(float[3]));\npragma(mangle,\"_Z6minmagDv4_fS_\")  __vector(float[4])   minmag(__vector(float[4]),   __vector(float[4]));\npragma(mangle,\"_Z6minmagDv8_fS_\")  __vector(float[8])   minmag(__vector(float[8]),   __vector(float[8]));\npragma(mangle,\"_Z6minmagDv16_fS_\") __vector(float[16])  minmag(__vector(float[16]),  __vector(float[16]));\npragma(mangle,\"_Z6minmagdd\")                double      minmag(         double,               double);\npragma(mangle,\"_Z6minmagDv2_dS_\")  __vector(double[2])  minmag(__vector(double[2]),  __vector(double[2]));\npragma(mangle,\"_Z6minmagDv3_dS_\")  __vector(double[3])  minmag(__vector(double[3]),  __vector(double[3]));\npragma(mangle,\"_Z6minmagDv4_dS_\")  __vector(double[4])  minmag(__vector(double[4]),  __vector(double[4]));\npragma(mangle,\"_Z6minmagDv8_dS_\")  __vector(double[8])  minmag(__vector(double[8]),  __vector(double[8]));\npragma(mangle,\"_Z6minmagDv16_dS_\") __vector(double[16]) minmag(__vector(double[16]), __vector(double[16]));\n\n// modf\npragma(mangle,\"_Z4modffPU3AS4f\")               float      modf(         float,      GenericPointer!(         float));\npragma(mangle,\"_Z4modfDv2_fPU3AS4S_\") __vector(float[2])  modf(__vector(float[2]),  GenericPointer!(__vector(float[2])));\npragma(mangle,\"_Z4modfDv3_fPU3AS4S_\") __vector(float[3])  
modf(__vector(float[3]),  GenericPointer!(__vector(float[3])));\npragma(mangle,\"_Z4modfDv4_fPU3AS4S_\") __vector(float[4])  modf(__vector(float[4]),  GenericPointer!(__vector(float[4])));\npragma(mangle,\"_Z4modfDv8_fPU3AS4S_\") __vector(float[8])  modf(__vector(float[8]),  GenericPointer!(__vector(float[8])));\npragma(mangle,\"_Z4modfDv16_fPU3AS4S_\")__vector(float[16]) modf(__vector(float[16]), GenericPointer!(__vector(float[16])));\npragma(mangle,\"_Z4modfdPU3AS4d\")               double     modf(         double,     GenericPointer!(         double));\npragma(mangle,\"_Z4modfDv2_dPU3AS4S_\") __vector(double[2]) modf(__vector(double[2]), GenericPointer!(__vector(double[2])));\npragma(mangle,\"_Z4modfDv3_dPU3AS4S_\") __vector(double[3]) modf(__vector(double[3]), GenericPointer!(__vector(double[3])));\npragma(mangle,\"_Z4modfDv4_dPU3AS4S_\") __vector(double[4]) modf(__vector(double[4]), GenericPointer!(__vector(double[4])));\npragma(mangle,\"_Z4modfDv8_dPU3AS4S_\") __vector(double[8]) modf(__vector(double[8]), GenericPointer!(__vector(double[8])));\npragma(mangle,\"_Z4modfDv16_dPU3AS4S_\")__vector(double[16])modf(__vector(double[16]),GenericPointer!(__vector(double[16])));\npragma(mangle,\"_Z4modffPU3AS1f\")               float      modf(         float,      GlobalPointer!(         float));\npragma(mangle,\"_Z4modfDv2_fPU3AS1S_\") __vector(float[2])  modf(__vector(float[2]),  GlobalPointer!(__vector(float[2])));\npragma(mangle,\"_Z4modfDv3_fPU3AS1S_\") __vector(float[3])  modf(__vector(float[3]),  GlobalPointer!(__vector(float[3])));\npragma(mangle,\"_Z4modfDv4_fPU3AS1S_\") __vector(float[4])  modf(__vector(float[4]),  GlobalPointer!(__vector(float[4])));\npragma(mangle,\"_Z4modfDv8_fPU3AS1S_\") __vector(float[8])  modf(__vector(float[8]),  GlobalPointer!(__vector(float[8])));\npragma(mangle,\"_Z4modfDv16_fPU3AS1S_\")__vector(float[16]) modf(__vector(float[16]), GlobalPointer!(__vector(float[16])));\npragma(mangle,\"_Z4modfdPU3AS1d\")               double     
modf(         double,     GlobalPointer!(         double));\npragma(mangle,\"_Z4modfDv2_dPU3AS1S_\") __vector(double[2]) modf(__vector(double[2]), GlobalPointer!(__vector(double[2])));\npragma(mangle,\"_Z4modfDv3_dPU3AS1S_\") __vector(double[3]) modf(__vector(double[3]), GlobalPointer!(__vector(double[3])));\npragma(mangle,\"_Z4modfDv4_dPU3AS1S_\") __vector(double[4]) modf(__vector(double[4]), GlobalPointer!(__vector(double[4])));\npragma(mangle,\"_Z4modfDv8_dPU3AS1S_\") __vector(double[8]) modf(__vector(double[8]), GlobalPointer!(__vector(double[8])));\npragma(mangle,\"_Z4modfDv16_dPU3AS1S_\")__vector(double[16])modf(__vector(double[16]),GlobalPointer!(__vector(double[16])));\npragma(mangle,\"_Z4modffPU3AS3f\")               float      modf(         float,      SharedPointer!(         float));\npragma(mangle,\"_Z4modfDv2_fPU3AS3S_\") __vector(float[2])  modf(__vector(float[2]),  SharedPointer!(__vector(float[2])));\npragma(mangle,\"_Z4modfDv3_fPU3AS3S_\") __vector(float[3])  modf(__vector(float[3]),  SharedPointer!(__vector(float[3])));\npragma(mangle,\"_Z4modfDv4_fPU3AS3S_\") __vector(float[4])  modf(__vector(float[4]),  SharedPointer!(__vector(float[4])));\npragma(mangle,\"_Z4modfDv8_fPU3AS3S_\") __vector(float[8])  modf(__vector(float[8]),  SharedPointer!(__vector(float[8])));\npragma(mangle,\"_Z4modfDv16_fPU3AS3S_\")__vector(float[16]) modf(__vector(float[16]), SharedPointer!(__vector(float[16])));\npragma(mangle,\"_Z4modfdPU3AS3d\")               double     modf(         double,     SharedPointer!(         double));\npragma(mangle,\"_Z4modfDv2_dPU3AS3S_\") __vector(double[2]) modf(__vector(double[2]), SharedPointer!(__vector(double[2])));\npragma(mangle,\"_Z4modfDv3_dPU3AS3S_\") __vector(double[3]) modf(__vector(double[3]), SharedPointer!(__vector(double[3])));\npragma(mangle,\"_Z4modfDv4_dPU3AS3S_\") __vector(double[4]) modf(__vector(double[4]), SharedPointer!(__vector(double[4])));\npragma(mangle,\"_Z4modfDv8_dPU3AS3S_\") __vector(double[8]) 
modf(__vector(double[8]), SharedPointer!(__vector(double[8])));\npragma(mangle,\"_Z4modfDv16_dPU3AS3S_\")__vector(double[16])modf(__vector(double[16]),SharedPointer!(__vector(double[16])));\npragma(mangle,\"_Z4modffPf\")                    float      modf(         float,      PrivatePointer!(         float));\npragma(mangle,\"_Z4modfDv2_fPS_\")      __vector(float[2])  modf(__vector(float[2]),  PrivatePointer!(__vector(float[2])));\npragma(mangle,\"_Z4modfDv3_fPS_\")      __vector(float[3])  modf(__vector(float[3]),  PrivatePointer!(__vector(float[3])));\npragma(mangle,\"_Z4modfDv4_fPS_\")      __vector(float[4])  modf(__vector(float[4]),  PrivatePointer!(__vector(float[4])));\npragma(mangle,\"_Z4modfDv8_fPS_\")      __vector(float[8])  modf(__vector(float[8]),  PrivatePointer!(__vector(float[8])));\npragma(mangle,\"_Z4modfDv16_fPS_\")     __vector(float[16]) modf(__vector(float[16]), PrivatePointer!(__vector(float[16])));\npragma(mangle,\"_Z4modfdPd\")                    double     modf(         double,     PrivatePointer!(         double));\npragma(mangle,\"_Z4modfDv2_dPS_\")      __vector(double[2]) modf(__vector(double[2]), PrivatePointer!(__vector(double[2])));\npragma(mangle,\"_Z4modfDv3_dPS_\")      __vector(double[3]) modf(__vector(double[3]), PrivatePointer!(__vector(double[3])));\npragma(mangle,\"_Z4modfDv4_dPS_\")      __vector(double[4]) modf(__vector(double[4]), PrivatePointer!(__vector(double[4])));\npragma(mangle,\"_Z4modfDv8_dPS_\")      __vector(double[8]) modf(__vector(double[8]), PrivatePointer!(__vector(double[8])));\npragma(mangle,\"_Z4modfDv16_dPS_\")     __vector(double[16])modf(__vector(double[16]),PrivatePointer!(__vector(double[16])));\n\n// nan\npragma(mangle,\"_Z3nanj\")               float       nan(         uint);\npragma(mangle,\"_Z3nanDv2_j\")  __vector(float[2])   nan(__vector(uint[2]));\npragma(mangle,\"_Z3nanDv3_j\")  __vector(float[3])   nan(__vector(uint[3]));\npragma(mangle,\"_Z3nanDv4_j\")  __vector(float[4])   
nan(__vector(uint[4]));\npragma(mangle,\"_Z3nanDv8_j\")  __vector(float[8])   nan(__vector(uint[8]));\npragma(mangle,\"_Z3nanDv16_j\") __vector(float[16])  nan(__vector(uint[16]));\npragma(mangle,\"_Z3nanm\")               double      nan(         ulong);\npragma(mangle,\"_Z3nanDv2_m\")  __vector(double[2])  nan(__vector(ulong[2]));\npragma(mangle,\"_Z3nanDv3_m\")  __vector(double[3])  nan(__vector(ulong[3]));\npragma(mangle,\"_Z3nanDv4_m\")  __vector(double[4])  nan(__vector(ulong[4]));\npragma(mangle,\"_Z3nanDv8_m\")  __vector(double[8])  nan(__vector(ulong[8]));\npragma(mangle,\"_Z3nanDv16_m\") __vector(double[16]) nan(__vector(ulong[16]));\n\n// nextafter\npragma(mangle,\"_Z9nextafterff\")                float       nextafter(         float,                float);\npragma(mangle,\"_Z9nextafterDv2_fS_\")  __vector(float[2])   nextafter(__vector(float[2]),   __vector(float[2]));\npragma(mangle,\"_Z9nextafterDv3_fS_\")  __vector(float[3])   nextafter(__vector(float[3]),   __vector(float[3]));\npragma(mangle,\"_Z9nextafterDv4_fS_\")  __vector(float[4])   nextafter(__vector(float[4]),   __vector(float[4]));\npragma(mangle,\"_Z9nextafterDv8_fS_\")  __vector(float[8])   nextafter(__vector(float[8]),   __vector(float[8]));\npragma(mangle,\"_Z9nextafterDv16_fS_\") __vector(float[16])  nextafter(__vector(float[16]),  __vector(float[16]));\npragma(mangle,\"_Z9nextafterdd\")                double      nextafter(         double,               double);\npragma(mangle,\"_Z9nextafterDv2_dS_\")  __vector(double[2])  nextafter(__vector(double[2]),  __vector(double[2]));\npragma(mangle,\"_Z9nextafterDv3_dS_\")  __vector(double[3])  nextafter(__vector(double[3]),  __vector(double[3]));\npragma(mangle,\"_Z9nextafterDv4_dS_\")  __vector(double[4])  nextafter(__vector(double[4]),  __vector(double[4]));\npragma(mangle,\"_Z9nextafterDv8_dS_\")  __vector(double[8])  nextafter(__vector(double[8]),  __vector(double[8]));\npragma(mangle,\"_Z9nextafterDv16_dS_\") __vector(double[16]) 
nextafter(__vector(double[16]), __vector(double[16]));\n\n// pow\npragma(mangle,\"_Z3powff\")                float       pow(         float,                float);\npragma(mangle,\"_Z3powDv2_fS_\")  __vector(float[2])   pow(__vector(float[2]),   __vector(float[2]));\npragma(mangle,\"_Z3powDv3_fS_\")  __vector(float[3])   pow(__vector(float[3]),   __vector(float[3]));\npragma(mangle,\"_Z3powDv4_fS_\")  __vector(float[4])   pow(__vector(float[4]),   __vector(float[4]));\npragma(mangle,\"_Z3powDv8_fS_\")  __vector(float[8])   pow(__vector(float[8]),   __vector(float[8]));\npragma(mangle,\"_Z3powDv16_fS_\") __vector(float[16])  pow(__vector(float[16]),  __vector(float[16]));\npragma(mangle,\"_Z3powdd\")                double      pow(         double,               double);\npragma(mangle,\"_Z3powDv2_dS_\")  __vector(double[2])  pow(__vector(double[2]),  __vector(double[2]));\npragma(mangle,\"_Z3powDv3_dS_\")  __vector(double[3])  pow(__vector(double[3]),  __vector(double[3]));\npragma(mangle,\"_Z3powDv4_dS_\")  __vector(double[4])  pow(__vector(double[4]),  __vector(double[4]));\npragma(mangle,\"_Z3powDv8_dS_\")  __vector(double[8])  pow(__vector(double[8]),  __vector(double[8]));\npragma(mangle,\"_Z3powDv16_dS_\") __vector(double[16]) pow(__vector(double[16]), __vector(double[16]));\n\n// pown\npragma(mangle,\"_Z4pownfi\")                    float       pown(         float,                int);\npragma(mangle,\"_Z4pownDv2_fDv2_i\")   __vector(float[2])   pown(__vector(float[2]),   __vector(int[2]));\npragma(mangle,\"_Z4pownDv3_fDv3_i\")   __vector(float[3])   pown(__vector(float[3]),   __vector(int[3]));\npragma(mangle,\"_Z4pownDv4_fDv4_i\")   __vector(float[4])   pown(__vector(float[4]),   __vector(int[4]));\npragma(mangle,\"_Z4pownDv8_fDv8_i\")   __vector(float[8])   pown(__vector(float[8]),   __vector(int[8]));\npragma(mangle,\"_Z4pownDv16_fDv16_i\") __vector(float[16])  pown(__vector(float[16]),  __vector(int[16]));\npragma(mangle,\"_Z4powndi\")                    
double      pown(         double,               int);\npragma(mangle,\"_Z4pownDv2_dDv2_i\")   __vector(double[2])  pown(__vector(double[2]),  __vector(int[2]));\npragma(mangle,\"_Z4pownDv3_dDv3_i\")   __vector(double[3])  pown(__vector(double[3]),  __vector(int[3]));\npragma(mangle,\"_Z4pownDv4_dDv4_i\")   __vector(double[4])  pown(__vector(double[4]),  __vector(int[4]));\npragma(mangle,\"_Z4pownDv8_dDv8_i\")   __vector(double[8])  pown(__vector(double[8]),  __vector(int[8]));\npragma(mangle,\"_Z4pownDv16_dDv16_i\") __vector(double[16]) pown(__vector(double[16]), __vector(int[16]));\n\n// powr\npragma(mangle,\"_Z4powrff\")                float       powr(         float,                float);\npragma(mangle,\"_Z4powrDv2_fS_\")  __vector(float[2])   powr(__vector(float[2]),   __vector(float[2]));\npragma(mangle,\"_Z4powrDv3_fS_\")  __vector(float[3])   powr(__vector(float[3]),   __vector(float[3]));\npragma(mangle,\"_Z4powrDv4_fS_\")  __vector(float[4])   powr(__vector(float[4]),   __vector(float[4]));\npragma(mangle,\"_Z4powrDv8_fS_\")  __vector(float[8])   powr(__vector(float[8]),   __vector(float[8]));\npragma(mangle,\"_Z4powrDv16_fS_\") __vector(float[16])  powr(__vector(float[16]),  __vector(float[16]));\npragma(mangle,\"_Z4powrdd\")                double      powr(         double,               double);\npragma(mangle,\"_Z4powrDv2_dS_\")  __vector(double[2])  powr(__vector(double[2]),  __vector(double[2]));\npragma(mangle,\"_Z4powrDv3_dS_\")  __vector(double[3])  powr(__vector(double[3]),  __vector(double[3]));\npragma(mangle,\"_Z4powrDv4_dS_\")  __vector(double[4])  powr(__vector(double[4]),  __vector(double[4]));\npragma(mangle,\"_Z4powrDv8_dS_\")  __vector(double[8])  powr(__vector(double[8]),  __vector(double[8]));\npragma(mangle,\"_Z4powrDv16_dS_\") __vector(double[16]) powr(__vector(double[16]), __vector(double[16]));\n\n// remainder\npragma(mangle,\"_Z9remainderff\")                float       remainder(         float,                
float);\npragma(mangle,\"_Z9remainderDv2_fS_\")  __vector(float[2])   remainder(__vector(float[2]),   __vector(float[2]));\npragma(mangle,\"_Z9remainderDv3_fS_\")  __vector(float[3])   remainder(__vector(float[3]),   __vector(float[3]));\npragma(mangle,\"_Z9remainderDv4_fS_\")  __vector(float[4])   remainder(__vector(float[4]),   __vector(float[4]));\npragma(mangle,\"_Z9remainderDv8_fS_\")  __vector(float[8])   remainder(__vector(float[8]),   __vector(float[8]));\npragma(mangle,\"_Z9remainderDv16_fS_\") __vector(float[16])  remainder(__vector(float[16]),  __vector(float[16]));\npragma(mangle,\"_Z9remainderdd\")                double      remainder(         double,               double);\npragma(mangle,\"_Z9remainderDv2_dS_\")  __vector(double[2])  remainder(__vector(double[2]),  __vector(double[2]));\npragma(mangle,\"_Z9remainderDv3_dS_\")  __vector(double[3])  remainder(__vector(double[3]),  __vector(double[3]));\npragma(mangle,\"_Z9remainderDv4_dS_\")  __vector(double[4])  remainder(__vector(double[4]),  __vector(double[4]));\npragma(mangle,\"_Z9remainderDv8_dS_\")  __vector(double[8])  remainder(__vector(double[8]),  __vector(double[8]));\npragma(mangle,\"_Z9remainderDv16_dS_\") __vector(double[16]) remainder(__vector(double[16]), __vector(double[16]));\n\n// remquo\npragma(mangle,\"_Z6remquoffPU3AS4i\")\n        float       remquo(          float,                float,       GenericPointer!(         int));\npragma(mangle,\"_Z6remquoDv2_fS_PU3AS4Dv2_i\")\n__vector(float[2])   remquo(__vector(float[2]),   __vector(float[2]),   GenericPointer!(__vector(int[2])));\npragma(mangle,\"_Z6remquoDv3_fS_PU3AS4Dv3_i\")\n__vector(float[3])   remquo(__vector(float[3]),   __vector(float[3]),   GenericPointer!(__vector(int[3])));\npragma(mangle,\"_Z6remquoDv4_fS_PU3AS4Dv4_i\")\n__vector(float[4])   remquo(__vector(float[4]),   __vector(float[4]),   GenericPointer!(__vector(int[4])));\npragma(mangle,\"_Z6remquoDv8_fS_PU3AS4Dv8_i\")\n__vector(float[8])   
remquo(__vector(float[8]),   __vector(float[8]),   GenericPointer!(__vector(int[8])));\npragma(mangle,\"_Z6remquoDv16_fS_PU3AS4Dv16_i\")\n__vector(float[16])  remquo(__vector(float[16]),  __vector(float[16]),  GenericPointer!(__vector(int[16])));\npragma(mangle,\"_Z6remquoddPU3AS4i\")\n        double      remquo(          double,               double,      GenericPointer!(         int));\npragma(mangle,\"_Z6remquoDv2_dS_PU3AS4Dv2_i\")\n__vector(double[2])  remquo(__vector(double[2]),  __vector(double[2]),  GenericPointer!(__vector(int[2])));\npragma(mangle,\"_Z6remquoDv3_dS_PU3AS4Dv3_i\")\n__vector(double[3])  remquo(__vector(double[3]),  __vector(double[3]),  GenericPointer!(__vector(int[3])));\npragma(mangle,\"_Z6remquoDv4_dS_PU3AS4Dv4_i\")\n__vector(double[4])  remquo(__vector(double[4]),  __vector(double[4]),  GenericPointer!(__vector(int[4])));\npragma(mangle,\"_Z6remquoDv8_dS_PU3AS4Dv8_i\")\n__vector(double[8])  remquo(__vector(double[8]),  __vector(double[8]),  GenericPointer!(__vector(int[8])));\npragma(mangle,\"_Z6remquoDv16_dS_PU3AS4Dv16_i\")\n__vector(double[16]) remquo(__vector(double[16]), __vector(double[16]), GenericPointer!(__vector(int[16])));\npragma(mangle,\"_Z6remquoffPU3AS1i\")\n        float       remquo(          float,                float,       GlobalPointer!(         int));\npragma(mangle,\"_Z6remquoDv2_fS_PU3AS1Dv2_i\")\n__vector(float[2])   remquo(__vector(float[2]),   __vector(float[2]),   GlobalPointer!(__vector(int[2])));\npragma(mangle,\"_Z6remquoDv3_fS_PU3AS1Dv3_i\")\n__vector(float[3])   remquo(__vector(float[3]),   __vector(float[3]),   GlobalPointer!(__vector(int[3])));\npragma(mangle,\"_Z6remquoDv4_fS_PU3AS1Dv4_i\")\n__vector(float[4])   remquo(__vector(float[4]),   __vector(float[4]),   GlobalPointer!(__vector(int[4])));\npragma(mangle,\"_Z6remquoDv8_fS_PU3AS1Dv8_i\")\n__vector(float[8])   remquo(__vector(float[8]),   __vector(float[8]),   
GlobalPointer!(__vector(int[8])));\npragma(mangle,\"_Z6remquoDv16_fS_PU3AS1Dv16_i\")\n__vector(float[16])  remquo(__vector(float[16]),  __vector(float[16]),  GlobalPointer!(__vector(int[16])));\npragma(mangle,\"_Z6remquoddPU3AS1i\")\n        double      remquo(          double,               double,      GlobalPointer!(         int));\npragma(mangle,\"_Z6remquoDv2_dS_PU3AS1Dv2_i\")\n__vector(double[2])  remquo(__vector(double[2]),  __vector(double[2]),  GlobalPointer!(__vector(int[2])));\npragma(mangle,\"_Z6remquoDv3_dS_PU3AS1Dv3_i\")\n__vector(double[3])  remquo(__vector(double[3]),  __vector(double[3]),  GlobalPointer!(__vector(int[3])));\npragma(mangle,\"_Z6remquoDv4_dS_PU3AS1Dv4_i\")\n__vector(double[4])  remquo(__vector(double[4]),  __vector(double[4]),  GlobalPointer!(__vector(int[4])));\npragma(mangle,\"_Z6remquoDv8_dS_PU3AS1Dv8_i\")\n__vector(double[8])  remquo(__vector(double[8]),  __vector(double[8]),  GlobalPointer!(__vector(int[8])));\npragma(mangle,\"_Z6remquoDv16_dS_PU3AS1Dv16_i\")\n__vector(double[16]) remquo(__vector(double[16]), __vector(double[16]), GlobalPointer!(__vector(int[16])));\npragma(mangle,\"_Z6remquoffPU3AS3i\")\n        float       remquo(          float,                float,       SharedPointer!(         int));\npragma(mangle,\"_Z6remquoDv2_fS_PU3AS3Dv2_i\")\n__vector(float[2])   remquo(__vector(float[2]),   __vector(float[2]),   SharedPointer!(__vector(int[2])));\npragma(mangle,\"_Z6remquoDv3_fS_PU3AS3Dv3_i\")\n__vector(float[3])   remquo(__vector(float[3]),   __vector(float[3]),   SharedPointer!(__vector(int[3])));\npragma(mangle,\"_Z6remquoDv4_fS_PU3AS3Dv4_i\")\n__vector(float[4])   remquo(__vector(float[4]),   __vector(float[4]),   SharedPointer!(__vector(int[4])));\npragma(mangle,\"_Z6remquoDv8_fS_PU3AS3Dv8_i\")\n__vector(float[8])   remquo(__vector(float[8]),   __vector(float[8]),   SharedPointer!(__vector(int[8])));\npragma(mangle,\"_Z6remquoDv16_fS_PU3AS3Dv16_i\")\n__vector(float[16])  remquo(__vector(float[16]),  
__vector(float[16]),  SharedPointer!(__vector(int[16])));\npragma(mangle,\"_Z6remquoddPU3AS3i\")\n        double      remquo(          double,               double,      SharedPointer!(         int));\npragma(mangle,\"_Z6remquoDv2_dS_PU3AS3Dv2_i\")\n__vector(double[2])  remquo(__vector(double[2]),  __vector(double[2]),  SharedPointer!(__vector(int[2])));\npragma(mangle,\"_Z6remquoDv3_dS_PU3AS3Dv3_i\")\n__vector(double[3])  remquo(__vector(double[3]),  __vector(double[3]),  SharedPointer!(__vector(int[3])));\npragma(mangle,\"_Z6remquoDv4_dS_PU3AS3Dv4_i\")\n__vector(double[4])  remquo(__vector(double[4]),  __vector(double[4]),  SharedPointer!(__vector(int[4])));\npragma(mangle,\"_Z6remquoDv8_dS_PU3AS3Dv8_i\")\n__vector(double[8])  remquo(__vector(double[8]),  __vector(double[8]),  SharedPointer!(__vector(int[8])));\npragma(mangle,\"_Z6remquoDv16_dS_PU3AS3Dv16_i\")\n__vector(double[16]) remquo(__vector(double[16]), __vector(double[16]), SharedPointer!(__vector(int[16])));\npragma(mangle,\"_Z6remquoffPi\")\n        float       remquo(          float,                float,       PrivatePointer!(         int));\npragma(mangle,\"_Z6remquoDv2_fS_PDv2_i\")\n__vector(float[2])   remquo(__vector(float[2]),   __vector(float[2]),   PrivatePointer!(__vector(int[2])));\npragma(mangle,\"_Z6remquoDv3_fS_PDv3_i\")\n__vector(float[3])   remquo(__vector(float[3]),   __vector(float[3]),   PrivatePointer!(__vector(int[3])));\npragma(mangle,\"_Z6remquoDv4_fS_PDv4_i\")\n__vector(float[4])   remquo(__vector(float[4]),   __vector(float[4]),   PrivatePointer!(__vector(int[4])));\npragma(mangle,\"_Z6remquoDv8_fS_PDv8_i\")\n__vector(float[8])   remquo(__vector(float[8]),   __vector(float[8]),   PrivatePointer!(__vector(int[8])));\npragma(mangle,\"_Z6remquoDv16_fS_PDv16_i\")\n__vector(float[16])  remquo(__vector(float[16]),  __vector(float[16]),  PrivatePointer!(__vector(int[16])));\npragma(mangle,\"_Z6remquoddPi\")\n        double      remquo(          double,               double,      
PrivatePointer!(         int));\npragma(mangle,\"_Z6remquoDv2_dS_PDv2_i\")\n__vector(double[2])  remquo(__vector(double[2]),  __vector(double[2]),  PrivatePointer!(__vector(int[2])));\npragma(mangle,\"_Z6remquoDv3_dS_PDv3_i\")\n__vector(double[3])  remquo(__vector(double[3]),  __vector(double[3]),  PrivatePointer!(__vector(int[3])));\npragma(mangle,\"_Z6remquoDv4_dS_PDv4_i\")\n__vector(double[4])  remquo(__vector(double[4]),  __vector(double[4]),  PrivatePointer!(__vector(int[4])));\npragma(mangle,\"_Z6remquoDv8_dS_PDv8_i\")\n__vector(double[8])  remquo(__vector(double[8]),  __vector(double[8]),  PrivatePointer!(__vector(int[8])));\npragma(mangle,\"_Z6remquoDv16_dS_PDv16_i\")\n__vector(double[16]) remquo(__vector(double[16]), __vector(double[16]), PrivatePointer!(__vector(int[16])));\n\n// rint\npragma(mangle,\"_Z4rintf\")               float       rint(         float);\npragma(mangle,\"_Z4rintDv2_f\")  __vector(float[2])   rint(__vector(float[2]));\npragma(mangle,\"_Z4rintDv3_f\")  __vector(float[3])   rint(__vector(float[3]));\npragma(mangle,\"_Z4rintDv4_f\")  __vector(float[4])   rint(__vector(float[4]));\npragma(mangle,\"_Z4rintDv8_f\")  __vector(float[8])   rint(__vector(float[8]));\npragma(mangle,\"_Z4rintDv16_f\") __vector(float[16])  rint(__vector(float[16]));\npragma(mangle,\"_Z4rintd\")               double      rint(         double);\npragma(mangle,\"_Z4rintDv2_d\")  __vector(double[2])  rint(__vector(double[2]));\npragma(mangle,\"_Z4rintDv3_d\")  __vector(double[3])  rint(__vector(double[3]));\npragma(mangle,\"_Z4rintDv4_d\")  __vector(double[4])  rint(__vector(double[4]));\npragma(mangle,\"_Z4rintDv8_d\")  __vector(double[8])  rint(__vector(double[8]));\npragma(mangle,\"_Z4rintDv16_d\") __vector(double[16]) rint(__vector(double[16]));\n\n// rootn\npragma(mangle,\"_Z5rootnfi\")                    float       rootn(         float,                int);\npragma(mangle,\"_Z5rootnDv2_fDv2_i\")   __vector(float[2])   rootn(__vector(float[2]),   
__vector(int[2]));\npragma(mangle,\"_Z5rootnDv3_fDv3_i\")   __vector(float[3])   rootn(__vector(float[3]),   __vector(int[3]));\npragma(mangle,\"_Z5rootnDv4_fDv4_i\")   __vector(float[4])   rootn(__vector(float[4]),   __vector(int[4]));\npragma(mangle,\"_Z5rootnDv8_fDv8_i\")   __vector(float[8])   rootn(__vector(float[8]),   __vector(int[8]));\npragma(mangle,\"_Z5rootnDv16_fDv16_i\") __vector(float[16])  rootn(__vector(float[16]),  __vector(int[16]));\npragma(mangle,\"_Z5rootndi\")                    double      rootn(         double,               int);\npragma(mangle,\"_Z5rootnDv2_dDv2_i\")   __vector(double[2])  rootn(__vector(double[2]),  __vector(int[2]));\npragma(mangle,\"_Z5rootnDv3_dDv3_i\")   __vector(double[3])  rootn(__vector(double[3]),  __vector(int[3]));\npragma(mangle,\"_Z5rootnDv4_dDv4_i\")   __vector(double[4])  rootn(__vector(double[4]),  __vector(int[4]));\npragma(mangle,\"_Z5rootnDv8_dDv8_i\")   __vector(double[8])  rootn(__vector(double[8]),  __vector(int[8]));\npragma(mangle,\"_Z5rootnDv16_dDv16_i\") __vector(double[16]) rootn(__vector(double[16]), __vector(int[16]));\n\n// round\npragma(mangle,\"_Z5roundf\")               float       round(         float);\npragma(mangle,\"_Z5roundDv2_f\")  __vector(float[2])   round(__vector(float[2]));\npragma(mangle,\"_Z5roundDv3_f\")  __vector(float[3])   round(__vector(float[3]));\npragma(mangle,\"_Z5roundDv4_f\")  __vector(float[4])   round(__vector(float[4]));\npragma(mangle,\"_Z5roundDv8_f\")  __vector(float[8])   round(__vector(float[8]));\npragma(mangle,\"_Z5roundDv16_f\") __vector(float[16])  round(__vector(float[16]));\npragma(mangle,\"_Z5roundd\")               double      round(         double);\npragma(mangle,\"_Z5roundDv2_d\")  __vector(double[2])  round(__vector(double[2]));\npragma(mangle,\"_Z5roundDv3_d\")  __vector(double[3])  round(__vector(double[3]));\npragma(mangle,\"_Z5roundDv4_d\")  __vector(double[4])  round(__vector(double[4]));\npragma(mangle,\"_Z5roundDv8_d\")  
__vector(double[8])  round(__vector(double[8]));\npragma(mangle,\"_Z5roundDv16_d\") __vector(double[16]) round(__vector(double[16]));\n\n// rsqrt\npragma(mangle,\"_Z5rsqrtf\")               float       rsqrt(         float);\npragma(mangle,\"_Z5rsqrtDv2_f\")  __vector(float[2])   rsqrt(__vector(float[2]));\npragma(mangle,\"_Z5rsqrtDv3_f\")  __vector(float[3])   rsqrt(__vector(float[3]));\npragma(mangle,\"_Z5rsqrtDv4_f\")  __vector(float[4])   rsqrt(__vector(float[4]));\npragma(mangle,\"_Z5rsqrtDv8_f\")  __vector(float[8])   rsqrt(__vector(float[8]));\npragma(mangle,\"_Z5rsqrtDv16_f\") __vector(float[16])  rsqrt(__vector(float[16]));\npragma(mangle,\"_Z5rsqrtd\")               double      rsqrt(         double);\npragma(mangle,\"_Z5rsqrtDv2_d\")  __vector(double[2])  rsqrt(__vector(double[2]));\npragma(mangle,\"_Z5rsqrtDv3_d\")  __vector(double[3])  rsqrt(__vector(double[3]));\npragma(mangle,\"_Z5rsqrtDv4_d\")  __vector(double[4])  rsqrt(__vector(double[4]));\npragma(mangle,\"_Z5rsqrtDv8_d\")  __vector(double[8])  rsqrt(__vector(double[8]));\npragma(mangle,\"_Z5rsqrtDv16_d\") __vector(double[16]) rsqrt(__vector(double[16]));\n\n// sin\npragma(mangle,\"_Z3sinf\")               float       sin(         float);\npragma(mangle,\"_Z3sinDv2_f\")  __vector(float[2])   sin(__vector(float[2]));\npragma(mangle,\"_Z3sinDv3_f\")  __vector(float[3])   sin(__vector(float[3]));\npragma(mangle,\"_Z3sinDv4_f\")  __vector(float[4])   sin(__vector(float[4]));\npragma(mangle,\"_Z3sinDv8_f\")  __vector(float[8])   sin(__vector(float[8]));\npragma(mangle,\"_Z3sinDv16_f\") __vector(float[16])  sin(__vector(float[16]));\npragma(mangle,\"_Z3sind\")               double      sin(         double);\npragma(mangle,\"_Z3sinDv2_d\")  __vector(double[2])  sin(__vector(double[2]));\npragma(mangle,\"_Z3sinDv3_d\")  __vector(double[3])  sin(__vector(double[3]));\npragma(mangle,\"_Z3sinDv4_d\")  __vector(double[4])  sin(__vector(double[4]));\npragma(mangle,\"_Z3sinDv8_d\")  __vector(double[8])  
sin(__vector(double[8]));\npragma(mangle,\"_Z3sinDv16_d\") __vector(double[16]) sin(__vector(double[16]));\n\n// sincos\npragma(mangle,\"_Z6sincosfPU3AS4f\")\n        float       sincos(          float,       GenericPointer!(         float));\npragma(mangle,\"_Z6sincosDv2_fPU3AS4S_\")\n__vector(float[2])   sincos(__vector(float[2]),   GenericPointer!(__vector(float[2])));\npragma(mangle,\"_Z6sincosDv3_fPU3AS4S_\")\n__vector(float[3])   sincos(__vector(float[3]),   GenericPointer!(__vector(float[3])));\npragma(mangle,\"_Z6sincosDv4_fPU3AS4S_\")\n__vector(float[4])   sincos(__vector(float[4]),   GenericPointer!(__vector(float[4])));\npragma(mangle,\"_Z6sincosDv8_fPU3AS4S_\")\n__vector(float[8])   sincos(__vector(float[8]),   GenericPointer!(__vector(float[8])));\npragma(mangle,\"_Z6sincosDv16_fPU3AS4S_\")\n__vector(float[16])  sincos(__vector(float[16]),  GenericPointer!(__vector(float[16])));\npragma(mangle,\"_Z6sincosdPU3AS4d\")\n        double      sincos(          double,      GenericPointer!(         double));\npragma(mangle,\"_Z6sincosDv2_dPU3AS4S_\")\n__vector(double[2])  sincos(__vector(double[2]),  GenericPointer!(__vector(double[2])));\npragma(mangle,\"_Z6sincosDv3_dPU3AS4S_\")\n__vector(double[3])  sincos(__vector(double[3]),  GenericPointer!(__vector(double[3])));\npragma(mangle,\"_Z6sincosDv4_dPU3AS4S_\")\n__vector(double[4])  sincos(__vector(double[4]),  GenericPointer!(__vector(double[4])));\npragma(mangle,\"_Z6sincosDv8_dPU3AS4S_\")\n__vector(double[8])  sincos(__vector(double[8]),  GenericPointer!(__vector(double[8])));\npragma(mangle,\"_Z6sincosDv16_dPU3AS4S_\")\n__vector(double[16]) sincos(__vector(double[16]), GenericPointer!(__vector(double[16])));\npragma(mangle,\"_Z6sincosfPU3AS1f\")\n        float       sincos(          float,       GlobalPointer!(         float));\npragma(mangle,\"_Z6sincosDv2_fPU3AS1S_\")\n__vector(float[2])   sincos(__vector(float[2]),   
GlobalPointer!(__vector(float[2])));\npragma(mangle,\"_Z6sincosDv3_fPU3AS1S_\")\n__vector(float[3])   sincos(__vector(float[3]),   GlobalPointer!(__vector(float[3])));\npragma(mangle,\"_Z6sincosDv4_fPU3AS1S_\")\n__vector(float[4])   sincos(__vector(float[4]),   GlobalPointer!(__vector(float[4])));\npragma(mangle,\"_Z6sincosDv8_fPU3AS1S_\")\n__vector(float[8])   sincos(__vector(float[8]),   GlobalPointer!(__vector(float[8])));\npragma(mangle,\"_Z6sincosDv16_fPU3AS1S_\")\n__vector(float[16])  sincos(__vector(float[16]),  GlobalPointer!(__vector(float[16])));\npragma(mangle,\"_Z6sincosdPU3AS1d\")\n        double      sincos(          double,      GlobalPointer!(         double));\npragma(mangle,\"_Z6sincosDv2_dPU3AS1S_\")\n__vector(double[2])  sincos(__vector(double[2]),  GlobalPointer!(__vector(double[2])));\npragma(mangle,\"_Z6sincosDv3_dPU3AS1S_\")\n__vector(double[3])  sincos(__vector(double[3]),  GlobalPointer!(__vector(double[3])));\npragma(mangle,\"_Z6sincosDv4_dPU3AS1S_\")\n__vector(double[4])  sincos(__vector(double[4]),  GlobalPointer!(__vector(double[4])));\npragma(mangle,\"_Z6sincosDv8_dPU3AS1S_\")\n__vector(double[8])  sincos(__vector(double[8]),  GlobalPointer!(__vector(double[8])));\npragma(mangle,\"_Z6sincosDv16_dPU3AS1S_\")\n__vector(double[16]) sincos(__vector(double[16]), GlobalPointer!(__vector(double[16])));\npragma(mangle,\"_Z6sincosfPU3AS3f\")\n        float       sincos(          float,       SharedPointer!(         float));\npragma(mangle,\"_Z6sincosDv2_fPU3AS3S_\")\n__vector(float[2])   sincos(__vector(float[2]),   SharedPointer!(__vector(float[2])));\npragma(mangle,\"_Z6sincosDv3_fPU3AS3S_\")\n__vector(float[3])   sincos(__vector(float[3]),   SharedPointer!(__vector(float[3])));\npragma(mangle,\"_Z6sincosDv4_fPU3AS3S_\")\n__vector(float[4])   sincos(__vector(float[4]),   SharedPointer!(__vector(float[4])));\npragma(mangle,\"_Z6sincosDv8_fPU3AS3S_\")\n__vector(float[8])   sincos(__vector(float[8]),   
SharedPointer!(__vector(float[8])));\npragma(mangle,\"_Z6sincosDv16_fPU3AS3S_\")\n__vector(float[16])  sincos(__vector(float[16]),  SharedPointer!(__vector(float[16])));\npragma(mangle,\"_Z6sincosdPU3AS3d\")\n        double      sincos(          double,      SharedPointer!(         double));\npragma(mangle,\"_Z6sincosDv2_dPU3AS3S_\")\n__vector(double[2])  sincos(__vector(double[2]),  SharedPointer!(__vector(double[2])));\npragma(mangle,\"_Z6sincosDv3_dPU3AS3S_\")\n__vector(double[3])  sincos(__vector(double[3]),  SharedPointer!(__vector(double[3])));\npragma(mangle,\"_Z6sincosDv4_dPU3AS3S_\")\n__vector(double[4])  sincos(__vector(double[4]),  SharedPointer!(__vector(double[4])));\npragma(mangle,\"_Z6sincosDv8_dPU3AS3S_\")\n__vector(double[8])  sincos(__vector(double[8]),  SharedPointer!(__vector(double[8])));\npragma(mangle,\"_Z6sincosDv16_dPU3AS3S_\")\n__vector(double[16]) sincos(__vector(double[16]), SharedPointer!(__vector(double[16])));\npragma(mangle,\"_Z6sincosfPf\")\n        float       sincos(          float,       PrivatePointer!(         float));\npragma(mangle,\"_Z6sincosDv2_fPS_\")\n__vector(float[2])   sincos(__vector(float[2]),   PrivatePointer!(__vector(float[2])));\npragma(mangle,\"_Z6sincosDv3_fPS_\")\n__vector(float[3])   sincos(__vector(float[3]),   PrivatePointer!(__vector(float[3])));\npragma(mangle,\"_Z6sincosDv4_fPS_\")\n__vector(float[4])   sincos(__vector(float[4]),   PrivatePointer!(__vector(float[4])));\npragma(mangle,\"_Z6sincosDv8_fPS_\")\n__vector(float[8])   sincos(__vector(float[8]),   PrivatePointer!(__vector(float[8])));\npragma(mangle,\"_Z6sincosDv16_fPS_\")\n__vector(float[16])  sincos(__vector(float[16]),  PrivatePointer!(__vector(float[16])));\npragma(mangle,\"_Z6sincosdPd\")\n        double      sincos(          double,      PrivatePointer!(         double));\npragma(mangle,\"_Z6sincosDv2_dPS_\")\n__vector(double[2])  sincos(__vector(double[2]),  
PrivatePointer!(__vector(double[2])));\npragma(mangle,\"_Z6sincosDv3_dPS_\")\n__vector(double[3])  sincos(__vector(double[3]),  PrivatePointer!(__vector(double[3])));\npragma(mangle,\"_Z6sincosDv4_dPS_\")\n__vector(double[4])  sincos(__vector(double[4]),  PrivatePointer!(__vector(double[4])));\npragma(mangle,\"_Z6sincosDv8_dPS_\")\n__vector(double[8])  sincos(__vector(double[8]),  PrivatePointer!(__vector(double[8])));\npragma(mangle,\"_Z6sincosDv16_dPS_\")\n__vector(double[16]) sincos(__vector(double[16]), PrivatePointer!(__vector(double[16])));\n\n// sinh\npragma(mangle,\"_Z4sinhf\")               float       sinh(         float);\npragma(mangle,\"_Z4sinhDv2_f\")  __vector(float[2])   sinh(__vector(float[2]));\npragma(mangle,\"_Z4sinhDv3_f\")  __vector(float[3])   sinh(__vector(float[3]));\npragma(mangle,\"_Z4sinhDv4_f\")  __vector(float[4])   sinh(__vector(float[4]));\npragma(mangle,\"_Z4sinhDv8_f\")  __vector(float[8])   sinh(__vector(float[8]));\npragma(mangle,\"_Z4sinhDv16_f\") __vector(float[16])  sinh(__vector(float[16]));\npragma(mangle,\"_Z4sinhd\")               double      sinh(         double);\npragma(mangle,\"_Z4sinhDv2_d\")  __vector(double[2])  sinh(__vector(double[2]));\npragma(mangle,\"_Z4sinhDv3_d\")  __vector(double[3])  sinh(__vector(double[3]));\npragma(mangle,\"_Z4sinhDv4_d\")  __vector(double[4])  sinh(__vector(double[4]));\npragma(mangle,\"_Z4sinhDv8_d\")  __vector(double[8])  sinh(__vector(double[8]));\npragma(mangle,\"_Z4sinhDv16_d\") __vector(double[16]) sinh(__vector(double[16]));\n\n// sinpi\npragma(mangle,\"_Z5sinpif\")               float       sinpi(         float);\npragma(mangle,\"_Z5sinpiDv2_f\")  __vector(float[2])   sinpi(__vector(float[2]));\npragma(mangle,\"_Z5sinpiDv3_f\")  __vector(float[3])   sinpi(__vector(float[3]));\npragma(mangle,\"_Z5sinpiDv4_f\")  __vector(float[4])   sinpi(__vector(float[4]));\npragma(mangle,\"_Z5sinpiDv8_f\")  __vector(float[8])   sinpi(__vector(float[8]));\npragma(mangle,\"_Z5sinpiDv16_f\") 
__vector(float[16])  sinpi(__vector(float[16]));\npragma(mangle,\"_Z5sinpid\")               double      sinpi(         double);\npragma(mangle,\"_Z5sinpiDv2_d\")  __vector(double[2])  sinpi(__vector(double[2]));\npragma(mangle,\"_Z5sinpiDv3_d\")  __vector(double[3])  sinpi(__vector(double[3]));\npragma(mangle,\"_Z5sinpiDv4_d\")  __vector(double[4])  sinpi(__vector(double[4]));\npragma(mangle,\"_Z5sinpiDv8_d\")  __vector(double[8])  sinpi(__vector(double[8]));\npragma(mangle,\"_Z5sinpiDv16_d\") __vector(double[16]) sinpi(__vector(double[16]));\n\n// sqrt\npragma(mangle,\"_Z4sqrtf\")               float       sqrt(         float);\npragma(mangle,\"_Z4sqrtDv2_f\")  __vector(float[2])   sqrt(__vector(float[2]));\npragma(mangle,\"_Z4sqrtDv3_f\")  __vector(float[3])   sqrt(__vector(float[3]));\npragma(mangle,\"_Z4sqrtDv4_f\")  __vector(float[4])   sqrt(__vector(float[4]));\npragma(mangle,\"_Z4sqrtDv8_f\")  __vector(float[8])   sqrt(__vector(float[8]));\npragma(mangle,\"_Z4sqrtDv16_f\") __vector(float[16])  sqrt(__vector(float[16]));\npragma(mangle,\"_Z4sqrtd\")               double      sqrt(         double);\npragma(mangle,\"_Z4sqrtDv2_d\")  __vector(double[2])  sqrt(__vector(double[2]));\npragma(mangle,\"_Z4sqrtDv3_d\")  __vector(double[3])  sqrt(__vector(double[3]));\npragma(mangle,\"_Z4sqrtDv4_d\")  __vector(double[4])  sqrt(__vector(double[4]));\npragma(mangle,\"_Z4sqrtDv8_d\")  __vector(double[8])  sqrt(__vector(double[8]));\npragma(mangle,\"_Z4sqrtDv16_d\") __vector(double[16]) sqrt(__vector(double[16]));\n\n// tan\npragma(mangle,\"_Z3tanf\")               float       tan(         float);\npragma(mangle,\"_Z3tanDv2_f\")  __vector(float[2])   tan(__vector(float[2]));\npragma(mangle,\"_Z3tanDv3_f\")  __vector(float[3])   tan(__vector(float[3]));\npragma(mangle,\"_Z3tanDv4_f\")  __vector(float[4])   tan(__vector(float[4]));\npragma(mangle,\"_Z3tanDv8_f\")  __vector(float[8])   tan(__vector(float[8]));\npragma(mangle,\"_Z3tanDv16_f\") __vector(float[16])  
tan(__vector(float[16]));\npragma(mangle,\"_Z3tand\")               double      tan(         double);\npragma(mangle,\"_Z3tanDv2_d\")  __vector(double[2])  tan(__vector(double[2]));\npragma(mangle,\"_Z3tanDv3_d\")  __vector(double[3])  tan(__vector(double[3]));\npragma(mangle,\"_Z3tanDv4_d\")  __vector(double[4])  tan(__vector(double[4]));\npragma(mangle,\"_Z3tanDv8_d\")  __vector(double[8])  tan(__vector(double[8]));\npragma(mangle,\"_Z3tanDv16_d\") __vector(double[16]) tan(__vector(double[16]));\n\n// tanh\npragma(mangle,\"_Z4tanhf\")               float       tanh(         float);\npragma(mangle,\"_Z4tanhDv2_f\")  __vector(float[2])   tanh(__vector(float[2]));\npragma(mangle,\"_Z4tanhDv3_f\")  __vector(float[3])   tanh(__vector(float[3]));\npragma(mangle,\"_Z4tanhDv4_f\")  __vector(float[4])   tanh(__vector(float[4]));\npragma(mangle,\"_Z4tanhDv8_f\")  __vector(float[8])   tanh(__vector(float[8]));\npragma(mangle,\"_Z4tanhDv16_f\") __vector(float[16])  tanh(__vector(float[16]));\npragma(mangle,\"_Z4tanhd\")               double      tanh(         double);\npragma(mangle,\"_Z4tanhDv2_d\")  __vector(double[2])  tanh(__vector(double[2]));\npragma(mangle,\"_Z4tanhDv3_d\")  __vector(double[3])  tanh(__vector(double[3]));\npragma(mangle,\"_Z4tanhDv4_d\")  __vector(double[4])  tanh(__vector(double[4]));\npragma(mangle,\"_Z4tanhDv8_d\")  __vector(double[8])  tanh(__vector(double[8]));\npragma(mangle,\"_Z4tanhDv16_d\") __vector(double[16]) tanh(__vector(double[16]));\n\n// tanpi\npragma(mangle,\"_Z5tanpif\")               float       tanpi(         float);\npragma(mangle,\"_Z5tanpiDv2_f\")  __vector(float[2])   tanpi(__vector(float[2]));\npragma(mangle,\"_Z5tanpiDv3_f\")  __vector(float[3])   tanpi(__vector(float[3]));\npragma(mangle,\"_Z5tanpiDv4_f\")  __vector(float[4])   tanpi(__vector(float[4]));\npragma(mangle,\"_Z5tanpiDv8_f\")  __vector(float[8])   tanpi(__vector(float[8]));\npragma(mangle,\"_Z5tanpiDv16_f\") __vector(float[16])  
tanpi(__vector(float[16]));\npragma(mangle,\"_Z5tanpid\")               double      tanpi(         double);\npragma(mangle,\"_Z5tanpiDv2_d\")  __vector(double[2])  tanpi(__vector(double[2]));\npragma(mangle,\"_Z5tanpiDv3_d\")  __vector(double[3])  tanpi(__vector(double[3]));\npragma(mangle,\"_Z5tanpiDv4_d\")  __vector(double[4])  tanpi(__vector(double[4]));\npragma(mangle,\"_Z5tanpiDv8_d\")  __vector(double[8])  tanpi(__vector(double[8]));\npragma(mangle,\"_Z5tanpiDv16_d\") __vector(double[16]) tanpi(__vector(double[16]));\n\n// tgamma\npragma(mangle,\"_Z6tgammaf\")               float       tgamma(         float);\npragma(mangle,\"_Z6tgammaDv2_f\")  __vector(float[2])   tgamma(__vector(float[2]));\npragma(mangle,\"_Z6tgammaDv3_f\")  __vector(float[3])   tgamma(__vector(float[3]));\npragma(mangle,\"_Z6tgammaDv4_f\")  __vector(float[4])   tgamma(__vector(float[4]));\npragma(mangle,\"_Z6tgammaDv8_f\")  __vector(float[8])   tgamma(__vector(float[8]));\npragma(mangle,\"_Z6tgammaDv16_f\") __vector(float[16])  tgamma(__vector(float[16]));\npragma(mangle,\"_Z6tgammad\")               double      tgamma(         double);\npragma(mangle,\"_Z6tgammaDv2_d\")  __vector(double[2])  tgamma(__vector(double[2]));\npragma(mangle,\"_Z6tgammaDv3_d\")  __vector(double[3])  tgamma(__vector(double[3]));\npragma(mangle,\"_Z6tgammaDv4_d\")  __vector(double[4])  tgamma(__vector(double[4]));\npragma(mangle,\"_Z6tgammaDv8_d\")  __vector(double[8])  tgamma(__vector(double[8]));\npragma(mangle,\"_Z6tgammaDv16_d\") __vector(double[16]) tgamma(__vector(double[16]));\n\n// trunc\npragma(mangle,\"_Z5truncf\")               float       trunc(         float);\npragma(mangle,\"_Z5truncDv2_f\")  __vector(float[2])   trunc(__vector(float[2]));\npragma(mangle,\"_Z5truncDv3_f\")  __vector(float[3])   trunc(__vector(float[3]));\npragma(mangle,\"_Z5truncDv4_f\")  __vector(float[4])   trunc(__vector(float[4]));\npragma(mangle,\"_Z5truncDv8_f\")  __vector(float[8])   
trunc(__vector(float[8]));\npragma(mangle,\"_Z5truncDv16_f\") __vector(float[16])  trunc(__vector(float[16]));\npragma(mangle,\"_Z5truncd\")               double      trunc(         double);\npragma(mangle,\"_Z5truncDv2_d\")  __vector(double[2])  trunc(__vector(double[2]));\npragma(mangle,\"_Z5truncDv3_d\")  __vector(double[3])  trunc(__vector(double[3]));\npragma(mangle,\"_Z5truncDv4_d\")  __vector(double[4])  trunc(__vector(double[4]));\npragma(mangle,\"_Z5truncDv8_d\")  __vector(double[8])  trunc(__vector(double[8]));\npragma(mangle,\"_Z5truncDv16_d\") __vector(double[16]) trunc(__vector(double[16]));\n\n// half_cos\npragma(mangle,\"_Z8half_cosf\")               float      half_cos(         float);\npragma(mangle,\"_Z8half_cosDv2_f\")  __vector(float[2])  half_cos(__vector(float[2]));\npragma(mangle,\"_Z8half_cosDv3_f\")  __vector(float[3])  half_cos(__vector(float[3]));\npragma(mangle,\"_Z8half_cosDv4_f\")  __vector(float[4])  half_cos(__vector(float[4]));\npragma(mangle,\"_Z8half_cosDv8_f\")  __vector(float[8])  half_cos(__vector(float[8]));\npragma(mangle,\"_Z8half_cosDv16_f\") __vector(float[16]) half_cos(__vector(float[16]));\n\n// half_divide\npragma(mangle,\"_Z11half_divideff\")                float      half_divide(         float,               float);\npragma(mangle,\"_Z11half_divideDv2_fS_\")  __vector(float[2])  half_divide(__vector(float[2]),  __vector(float[2]));\npragma(mangle,\"_Z11half_divideDv3_fS_\")  __vector(float[3])  half_divide(__vector(float[3]),  __vector(float[3]));\npragma(mangle,\"_Z11half_divideDv4_fS_\")  __vector(float[4])  half_divide(__vector(float[4]),  __vector(float[4]));\npragma(mangle,\"_Z11half_divideDv8_fS_\")  __vector(float[8])  half_divide(__vector(float[8]),  __vector(float[8]));\npragma(mangle,\"_Z11half_divideDv16_fS_\") __vector(float[16]) half_divide(__vector(float[16]), __vector(float[16]));\n\n// half_exp\npragma(mangle,\"_Z8half_expf\")               float      half_exp(         
float);\npragma(mangle,\"_Z8half_expDv2_f\")  __vector(float[2])  half_exp(__vector(float[2]));\npragma(mangle,\"_Z8half_expDv3_f\")  __vector(float[3])  half_exp(__vector(float[3]));\npragma(mangle,\"_Z8half_expDv4_f\")  __vector(float[4])  half_exp(__vector(float[4]));\npragma(mangle,\"_Z8half_expDv8_f\")  __vector(float[8])  half_exp(__vector(float[8]));\npragma(mangle,\"_Z8half_expDv16_f\") __vector(float[16]) half_exp(__vector(float[16]));\n\n// half_exp2\npragma(mangle,\"_Z9half_exp2f\")               float      half_exp2(         float);\npragma(mangle,\"_Z9half_exp2Dv2_f\")  __vector(float[2])  half_exp2(__vector(float[2]));\npragma(mangle,\"_Z9half_exp2Dv3_f\")  __vector(float[3])  half_exp2(__vector(float[3]));\npragma(mangle,\"_Z9half_exp2Dv4_f\")  __vector(float[4])  half_exp2(__vector(float[4]));\npragma(mangle,\"_Z9half_exp2Dv8_f\")  __vector(float[8])  half_exp2(__vector(float[8]));\npragma(mangle,\"_Z9half_exp2Dv16_f\") __vector(float[16]) half_exp2(__vector(float[16]));\n\n// half_exp10\npragma(mangle,\"_Z10half_exp10f\")               float      half_exp10(         float);\npragma(mangle,\"_Z10half_exp10Dv2_f\")  __vector(float[2])  half_exp10(__vector(float[2]));\npragma(mangle,\"_Z10half_exp10Dv3_f\")  __vector(float[3])  half_exp10(__vector(float[3]));\npragma(mangle,\"_Z10half_exp10Dv4_f\")  __vector(float[4])  half_exp10(__vector(float[4]));\npragma(mangle,\"_Z10half_exp10Dv8_f\")  __vector(float[8])  half_exp10(__vector(float[8]));\npragma(mangle,\"_Z10half_exp10Dv16_f\") __vector(float[16]) half_exp10(__vector(float[16]));\n\n// half_log\npragma(mangle,\"_Z8half_logf\")               float      half_log(         float);\npragma(mangle,\"_Z8half_logDv2_f\")  __vector(float[2])  half_log(__vector(float[2]));\npragma(mangle,\"_Z8half_logDv3_f\")  __vector(float[3])  half_log(__vector(float[3]));\npragma(mangle,\"_Z8half_logDv4_f\")  __vector(float[4])  half_log(__vector(float[4]));\npragma(mangle,\"_Z8half_logDv8_f\")  __vector(float[8])  
half_log(__vector(float[8]));\npragma(mangle,\"_Z8half_logDv16_f\") __vector(float[16]) half_log(__vector(float[16]));\n\n// half_log2\npragma(mangle,\"_Z9half_log2f\")               float      half_log2(         float);\npragma(mangle,\"_Z9half_log2Dv2_f\")  __vector(float[2])  half_log2(__vector(float[2]));\npragma(mangle,\"_Z9half_log2Dv3_f\")  __vector(float[3])  half_log2(__vector(float[3]));\npragma(mangle,\"_Z9half_log2Dv4_f\")  __vector(float[4])  half_log2(__vector(float[4]));\npragma(mangle,\"_Z9half_log2Dv8_f\")  __vector(float[8])  half_log2(__vector(float[8]));\npragma(mangle,\"_Z9half_log2Dv16_f\") __vector(float[16]) half_log2(__vector(float[16]));\n\n// half_log10\npragma(mangle,\"_Z10half_log10f\")               float      half_log10(         float);\npragma(mangle,\"_Z10half_log10Dv2_f\")  __vector(float[2])  half_log10(__vector(float[2]));\npragma(mangle,\"_Z10half_log10Dv3_f\")  __vector(float[3])  half_log10(__vector(float[3]));\npragma(mangle,\"_Z10half_log10Dv4_f\")  __vector(float[4])  half_log10(__vector(float[4]));\npragma(mangle,\"_Z10half_log10Dv8_f\")  __vector(float[8])  half_log10(__vector(float[8]));\npragma(mangle,\"_Z10half_log10Dv16_f\") __vector(float[16]) half_log10(__vector(float[16]));\n\n// half_powr\npragma(mangle,\"_Z9half_powrff\")                float      half_powr(         float,               float);\npragma(mangle,\"_Z9half_powrDv2_fS_\")  __vector(float[2])  half_powr(__vector(float[2]),  __vector(float[2]));\npragma(mangle,\"_Z9half_powrDv3_fS_\")  __vector(float[3])  half_powr(__vector(float[3]),  __vector(float[3]));\npragma(mangle,\"_Z9half_powrDv4_fS_\")  __vector(float[4])  half_powr(__vector(float[4]),  __vector(float[4]));\npragma(mangle,\"_Z9half_powrDv8_fS_\")  __vector(float[8])  half_powr(__vector(float[8]),  __vector(float[8]));\npragma(mangle,\"_Z9half_powrDv16_fS_\") __vector(float[16]) half_powr(__vector(float[16]), __vector(float[16]));\n\n// half_recip\npragma(mangle,\"_Z10half_recipf\")             
  float      half_recip(         float);\npragma(mangle,\"_Z10half_recipDv2_f\")  __vector(float[2])  half_recip(__vector(float[2]));\npragma(mangle,\"_Z10half_recipDv3_f\")  __vector(float[3])  half_recip(__vector(float[3]));\npragma(mangle,\"_Z10half_recipDv4_f\")  __vector(float[4])  half_recip(__vector(float[4]));\npragma(mangle,\"_Z10half_recipDv8_f\")  __vector(float[8])  half_recip(__vector(float[8]));\npragma(mangle,\"_Z10half_recipDv16_f\") __vector(float[16]) half_recip(__vector(float[16]));\n\n// half_rsqrt\npragma(mangle,\"_Z10half_rsqrtf\")               float      half_rsqrt(         float);\npragma(mangle,\"_Z10half_rsqrtDv2_f\")  __vector(float[2])  half_rsqrt(__vector(float[2]));\npragma(mangle,\"_Z10half_rsqrtDv3_f\")  __vector(float[3])  half_rsqrt(__vector(float[3]));\npragma(mangle,\"_Z10half_rsqrtDv4_f\")  __vector(float[4])  half_rsqrt(__vector(float[4]));\npragma(mangle,\"_Z10half_rsqrtDv8_f\")  __vector(float[8])  half_rsqrt(__vector(float[8]));\npragma(mangle,\"_Z10half_rsqrtDv16_f\") __vector(float[16]) half_rsqrt(__vector(float[16]));\n\n// half_sin\npragma(mangle,\"_Z8half_sinf\")               float      half_sin(         float);\npragma(mangle,\"_Z8half_sinDv2_f\")  __vector(float[2])  half_sin(__vector(float[2]));\npragma(mangle,\"_Z8half_sinDv3_f\")  __vector(float[3])  half_sin(__vector(float[3]));\npragma(mangle,\"_Z8half_sinDv4_f\")  __vector(float[4])  half_sin(__vector(float[4]));\npragma(mangle,\"_Z8half_sinDv8_f\")  __vector(float[8])  half_sin(__vector(float[8]));\npragma(mangle,\"_Z8half_sinDv16_f\") __vector(float[16]) half_sin(__vector(float[16]));\n\n// half_sqrt\npragma(mangle,\"_Z9half_sqrtf\")               float      half_sqrt(         float);\npragma(mangle,\"_Z9half_sqrtDv2_f\")  __vector(float[2])  half_sqrt(__vector(float[2]));\npragma(mangle,\"_Z9half_sqrtDv3_f\")  __vector(float[3])  half_sqrt(__vector(float[3]));\npragma(mangle,\"_Z9half_sqrtDv4_f\")  __vector(float[4])  
half_sqrt(__vector(float[4]));\npragma(mangle,\"_Z9half_sqrtDv8_f\")  __vector(float[8])  half_sqrt(__vector(float[8]));\npragma(mangle,\"_Z9half_sqrtDv16_f\") __vector(float[16]) half_sqrt(__vector(float[16]));\n\n// half_tan\npragma(mangle,\"_Z8half_tanf\")               float      half_tan(         float);\npragma(mangle,\"_Z8half_tanDv2_f\")  __vector(float[2])  half_tan(__vector(float[2]));\npragma(mangle,\"_Z8half_tanDv3_f\")  __vector(float[3])  half_tan(__vector(float[3]));\npragma(mangle,\"_Z8half_tanDv4_f\")  __vector(float[4])  half_tan(__vector(float[4]));\npragma(mangle,\"_Z8half_tanDv8_f\")  __vector(float[8])  half_tan(__vector(float[8]));\npragma(mangle,\"_Z8half_tanDv16_f\") __vector(float[16]) half_tan(__vector(float[16]));\n\n// native_cos\npragma(mangle,\"_Z10native_cosf\")               float      native_cos(         float);\npragma(mangle,\"_Z10native_cosDv2_f\")  __vector(float[2])  native_cos(__vector(float[2]));\npragma(mangle,\"_Z10native_cosDv3_f\")  __vector(float[3])  native_cos(__vector(float[3]));\npragma(mangle,\"_Z10native_cosDv4_f\")  __vector(float[4])  native_cos(__vector(float[4]));\npragma(mangle,\"_Z10native_cosDv8_f\")  __vector(float[8])  native_cos(__vector(float[8]));\npragma(mangle,\"_Z10native_cosDv16_f\") __vector(float[16]) native_cos(__vector(float[16]));\n\n// native_divide\npragma(mangle,\"_Z13native_divideff\")                float      native_divide(         float,               float);\npragma(mangle,\"_Z13native_divideDv2_fS_\")  __vector(float[2])  native_divide(__vector(float[2]),  __vector(float[2]));\npragma(mangle,\"_Z13native_divideDv3_fS_\")  __vector(float[3])  native_divide(__vector(float[3]),  __vector(float[3]));\npragma(mangle,\"_Z13native_divideDv4_fS_\")  __vector(float[4])  native_divide(__vector(float[4]),  __vector(float[4]));\npragma(mangle,\"_Z13native_divideDv8_fS_\")  __vector(float[8])  native_divide(__vector(float[8]),  __vector(float[8]));\npragma(mangle,\"_Z13native_divideDv16_fS_\") 
__vector(float[16]) native_divide(__vector(float[16]), __vector(float[16]));\n\n// native_exp\npragma(mangle,\"_Z10native_expf\")               float      native_exp(         float);\npragma(mangle,\"_Z10native_expDv2_f\")  __vector(float[2])  native_exp(__vector(float[2]));\npragma(mangle,\"_Z10native_expDv3_f\")  __vector(float[3])  native_exp(__vector(float[3]));\npragma(mangle,\"_Z10native_expDv4_f\")  __vector(float[4])  native_exp(__vector(float[4]));\npragma(mangle,\"_Z10native_expDv8_f\")  __vector(float[8])  native_exp(__vector(float[8]));\npragma(mangle,\"_Z10native_expDv16_f\") __vector(float[16]) native_exp(__vector(float[16]));\n\n// native_exp2\npragma(mangle,\"_Z11native_exp2f\")               float      native_exp2(         float);\npragma(mangle,\"_Z11native_exp2Dv2_f\")  __vector(float[2])  native_exp2(__vector(float[2]));\npragma(mangle,\"_Z11native_exp2Dv3_f\")  __vector(float[3])  native_exp2(__vector(float[3]));\npragma(mangle,\"_Z11native_exp2Dv4_f\")  __vector(float[4])  native_exp2(__vector(float[4]));\npragma(mangle,\"_Z11native_exp2Dv8_f\")  __vector(float[8])  native_exp2(__vector(float[8]));\npragma(mangle,\"_Z11native_exp2Dv16_f\") __vector(float[16]) native_exp2(__vector(float[16]));\n\n// native_exp10\npragma(mangle,\"_Z12native_exp10f\")               float      native_exp10(         float);\npragma(mangle,\"_Z12native_exp10Dv2_f\")  __vector(float[2])  native_exp10(__vector(float[2]));\npragma(mangle,\"_Z12native_exp10Dv3_f\")  __vector(float[3])  native_exp10(__vector(float[3]));\npragma(mangle,\"_Z12native_exp10Dv4_f\")  __vector(float[4])  native_exp10(__vector(float[4]));\npragma(mangle,\"_Z12native_exp10Dv8_f\")  __vector(float[8])  native_exp10(__vector(float[8]));\npragma(mangle,\"_Z12native_exp10Dv16_f\") __vector(float[16]) native_exp10(__vector(float[16]));\n\n// native_log\npragma(mangle,\"_Z10native_logf\")               float      native_log(         float);\npragma(mangle,\"_Z10native_logDv2_f\")  __vector(float[2])  
native_log(__vector(float[2]));\npragma(mangle,\"_Z10native_logDv3_f\")  __vector(float[3])  native_log(__vector(float[3]));\npragma(mangle,\"_Z10native_logDv4_f\")  __vector(float[4])  native_log(__vector(float[4]));\npragma(mangle,\"_Z10native_logDv8_f\")  __vector(float[8])  native_log(__vector(float[8]));\npragma(mangle,\"_Z10native_logDv16_f\") __vector(float[16]) native_log(__vector(float[16]));\n\n// native_log2\npragma(mangle,\"_Z11native_log2f\")               float      native_log2(         float);\npragma(mangle,\"_Z11native_log2Dv2_f\")  __vector(float[2])  native_log2(__vector(float[2]));\npragma(mangle,\"_Z11native_log2Dv3_f\")  __vector(float[3])  native_log2(__vector(float[3]));\npragma(mangle,\"_Z11native_log2Dv4_f\")  __vector(float[4])  native_log2(__vector(float[4]));\npragma(mangle,\"_Z11native_log2Dv8_f\")  __vector(float[8])  native_log2(__vector(float[8]));\npragma(mangle,\"_Z11native_log2Dv16_f\") __vector(float[16]) native_log2(__vector(float[16]));\n\n// native_log10\npragma(mangle,\"_Z12native_log10f\")               float      native_log10(         float);\npragma(mangle,\"_Z12native_log10Dv2_f\")  __vector(float[2])  native_log10(__vector(float[2]));\npragma(mangle,\"_Z12native_log10Dv3_f\")  __vector(float[3])  native_log10(__vector(float[3]));\npragma(mangle,\"_Z12native_log10Dv4_f\")  __vector(float[4])  native_log10(__vector(float[4]));\npragma(mangle,\"_Z12native_log10Dv8_f\")  __vector(float[8])  native_log10(__vector(float[8]));\npragma(mangle,\"_Z12native_log10Dv16_f\") __vector(float[16]) native_log10(__vector(float[16]));\n\n// native_powr\npragma(mangle,\"_Z11native_powrff\")                float      native_powr(         float,               float);\npragma(mangle,\"_Z11native_powrDv2_fS_\")  __vector(float[2])  native_powr(__vector(float[2]),  __vector(float[2]));\npragma(mangle,\"_Z11native_powrDv3_fS_\")  __vector(float[3])  native_powr(__vector(float[3]),  __vector(float[3]));\npragma(mangle,\"_Z11native_powrDv4_fS_\")  
__vector(float[4])  native_powr(__vector(float[4]),  __vector(float[4]));\npragma(mangle,\"_Z11native_powrDv8_fS_\")  __vector(float[8])  native_powr(__vector(float[8]),  __vector(float[8]));\npragma(mangle,\"_Z11native_powrDv16_fS_\") __vector(float[16]) native_powr(__vector(float[16]), __vector(float[16]));\n\n// native_recip\npragma(mangle,\"_Z12native_recipf\")               float      native_recip(         float);\npragma(mangle,\"_Z12native_recipDv2_f\")  __vector(float[2])  native_recip(__vector(float[2]));\npragma(mangle,\"_Z12native_recipDv3_f\")  __vector(float[3])  native_recip(__vector(float[3]));\npragma(mangle,\"_Z12native_recipDv4_f\")  __vector(float[4])  native_recip(__vector(float[4]));\npragma(mangle,\"_Z12native_recipDv8_f\")  __vector(float[8])  native_recip(__vector(float[8]));\npragma(mangle,\"_Z12native_recipDv16_f\") __vector(float[16]) native_recip(__vector(float[16]));\n\n// native_rsqrt\npragma(mangle,\"_Z12native_rsqrtf\")               float      native_rsqrt(         float);\npragma(mangle,\"_Z12native_rsqrtDv2_f\")  __vector(float[2])  native_rsqrt(__vector(float[2]));\npragma(mangle,\"_Z12native_rsqrtDv3_f\")  __vector(float[3])  native_rsqrt(__vector(float[3]));\npragma(mangle,\"_Z12native_rsqrtDv4_f\")  __vector(float[4])  native_rsqrt(__vector(float[4]));\npragma(mangle,\"_Z12native_rsqrtDv8_f\")  __vector(float[8])  native_rsqrt(__vector(float[8]));\npragma(mangle,\"_Z12native_rsqrtDv16_f\") __vector(float[16]) native_rsqrt(__vector(float[16]));\n\n// native_sin\npragma(mangle,\"_Z10native_sinf\")               float      native_sin(         float);\npragma(mangle,\"_Z10native_sinDv2_f\")  __vector(float[2])  native_sin(__vector(float[2]));\npragma(mangle,\"_Z10native_sinDv3_f\")  __vector(float[3])  native_sin(__vector(float[3]));\npragma(mangle,\"_Z10native_sinDv4_f\")  __vector(float[4])  native_sin(__vector(float[4]));\npragma(mangle,\"_Z10native_sinDv8_f\")  __vector(float[8])  
native_sin(__vector(float[8]));\npragma(mangle,\"_Z10native_sinDv16_f\") __vector(float[16]) native_sin(__vector(float[16]));\n\n// native_sqrt\npragma(mangle,\"_Z11native_sqrtf\")               float      native_sqrt(         float);\npragma(mangle,\"_Z11native_sqrtDv2_f\")  __vector(float[2])  native_sqrt(__vector(float[2]));\npragma(mangle,\"_Z11native_sqrtDv3_f\")  __vector(float[3])  native_sqrt(__vector(float[3]));\npragma(mangle,\"_Z11native_sqrtDv4_f\")  __vector(float[4])  native_sqrt(__vector(float[4]));\npragma(mangle,\"_Z11native_sqrtDv8_f\")  __vector(float[8])  native_sqrt(__vector(float[8]));\npragma(mangle,\"_Z11native_sqrtDv16_f\") __vector(float[16]) native_sqrt(__vector(float[16]));\n\n// native_tan\npragma(mangle,\"_Z10native_tanf\")               float      native_tan(         float);\npragma(mangle,\"_Z10native_tanDv2_f\")  __vector(float[2])  native_tan(__vector(float[2]));\npragma(mangle,\"_Z10native_tanDv3_f\")  __vector(float[3])  native_tan(__vector(float[3]));\npragma(mangle,\"_Z10native_tanDv4_f\")  __vector(float[4])  native_tan(__vector(float[4]));\npragma(mangle,\"_Z10native_tanDv8_f\")  __vector(float[8])  native_tan(__vector(float[8]));\npragma(mangle,\"_Z10native_tanDv16_f\") __vector(float[16]) native_tan(__vector(float[16]));\n"
  },
  {
    "path": "source/dcompute/std/opencl/sync.d",
    "content": "/++\nProvides access to the OpenCL C sync functions.\nSee_Also: [6.15.8. Synchronization Functions](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#synchronization-functions)$(BR)\n          [6.15.12.5. Fences](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#fences)$(BR)\n          [6.15.9. Legacy Explicit Memory Fence Functions](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#legacy-mem-fence-functions)\nStandards: [The OpenCL™ C Specification](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html)\nLicense:  [Boost License 1.0](https://boost.org/LICENSE_1_0.txt).\n+/\n@compute(CompileFor.deviceOnly) module dcompute.std.opencl.sync;\n\nimport ldc.dcompute;\nimport ldc.attributes;\n\npure:\nnothrow:\n@nogc:\n\n/// Standards: [6.3.3. Other Built-in Data Types](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#other-built-in-data-types)\nalias cl_mem_fence_flags = uint;\nenum : uint\n{\n    CLK_LOCAL_MEM_FENCE  = 1,\n    CLK_GLOBAL_MEM_FENCE = 2,\n    CLK_IMAGE_MEM_FENCE  = 4,\n}\n\n/// Standards: [6.15.12.4. Memory Scope](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#memory-scope)\nenum memory_scope : uint\n{\n    work_item       = 0,\n    sub_group       = 4,\n    work_group      = 1,\n    scope_device    = 2,\n    all_svm_devices = 3,\n    all_devices     = 3,\n}\n\n/// Standards: [6.15.12.3. Order and Consistency](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#order-and-consistency)\nenum memory_order : uint\n{\n    relaxed = 0,\n    acquire = 2,\n    release = 3,\n    acq_rel = 4,\n    seq_cst = 5,\n}\n\n/// Standards: [6.15.8. Synchronization Functions](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#synchronization-functions)\npragma(mangle, \"_Z7barrierj\")\nvoid barrier(cl_mem_fence_flags);\n\n/// Standards: [6.15.8. 
Synchronization Functions](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#synchronization-functions)\npragma(mangle, \"_Z18work_group_barrierj\")\nvoid work_group_barrier(cl_mem_fence_flags);\n\n/// Standards: [6.15.8. Synchronization Functions](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#synchronization-functions)\npragma(mangle, \"_Z18work_group_barrierj12memory_scope\")\nvoid work_group_barrier(cl_mem_fence_flags, memory_scope);\n\n/// Standards: [6.15.12.5. Fences](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#fences)\npragma(mangle, \"_Z22atomic_work_item_fencej12memory_order12memory_scope\")\nvoid atomic_work_item_fence(cl_mem_fence_flags, memory_order, memory_scope);\n\n/*\nLDC's backend, LLVM, does not support lowering the built-in memory fence functions.\nThese calls to atomic_work_item_fence generate the same OpMemoryBarrier instructions as a native mem_fence call.\nI tried pragma(inline, true), but it leaves an OpStore before the OpMemoryBarrier at -O0,\nand something about it crashes llvm-spirv (I take this as a bad sign), so I didn't use it.\nUsing inlinehint does not inline at -O0 but does inline at -O1 and up. Perfect!\n*/\n/// Contrary to the OpenCL C spec, this implementation of `mem_fence` is not deprecated by OpenCL C 2.0.\n/// Standards: [6.15.9. Legacy Explicit Memory Fence Functions](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#legacy-mem-fence-functions)\n@llvmAttr(\"inlinehint\")\nvoid mem_fence(cl_mem_fence_flags flags)\n{\n    if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n        atomic_work_item_fence(flags, memory_order.acq_rel, memory_scope.work_group);\n}\n\n/// Contrary to the OpenCL C spec this implementation of `read_mem_fence` is not deprecated by OpenCL C 2.0.\n/// Standards: [6.15.9. 
Legacy Explicit Memory Fence Functions](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#legacy-mem-fence-functions)\n@llvmAttr(\"inlinehint\")\nvoid read_mem_fence(cl_mem_fence_flags flags)\n{\n    if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n        atomic_work_item_fence(flags, memory_order.acquire, memory_scope.work_group);\n}\n\n/// Contrary to the OpenCL C spec this implementation of `write_mem_fence` is not deprecated by OpenCL C 2.0.\n/// Standards: [6.15.9. Legacy Explicit Memory Fence Functions](https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#legacy-mem-fence-functions)\n@llvmAttr(\"inlinehint\")\nvoid write_mem_fence(cl_mem_fence_flags flags)\n{\n    if(__dcompute_reflect(ReflectTarget.OpenCL,0))\n        atomic_work_item_fence(flags, memory_order.release, memory_scope.work_group);\n}\n"
  },
  {
    "path": "source/dcompute/std/pack.d",
    "content": "@compute(CompileFor.hostAndDevice) module dcompute.std.pack;\n\nimport ldc.dcompute;\n//Unpacking functions\n/*\nfloat4 unorm4x8_to_float(uint x);\nfloat4 snorm4x8_to_float(uint x);\n\nhalf4  unorm4x8_to_half(uint x);\nhalf4  snorm4x8_to_half(uint x);\n\nfloat4 unorm4x8_srgb_to_float(uint x);\nhalf4  unorm4x8_srgb_to_half(uint x);\n\nfloat2 unorm2x16_to_float(uint x);\nfloat2 snorm2x16_to_float(uint x);\n\nhalf2  unorm2x16_to_half(uint x);\nhalf2  snorm2x16_to_half(uint x);\n\nfloat4 unorm10a2_to_float(uint x);\nfloat3 unorm565_to_float(ushort x);\n\nhalf4  unorm10a2_to_half(uint x);\nhalf3  unorm565_to_half(ushort x);\n\n//Packing functions\nuint float_to_unorm4x8(float4 x);\nuint float_to_snorm4x8(float4 x);\n\nuint half_to_unorm4x8(half4 x);\nuint half_to_snorm4x8(half4 x);\n\nuint float_to_unorm2x16(float2 x);\nuint float_to_snorm2x16(float2 x);\n\nuint half_to_unorm2x16(half2 x);\nuint half_to_snorm2x16(half2 x);\n\nuint   float_to_unorm10a2(float4);\nushort float_to_unorm565(float3);\n\nuint   half_to_unorm10a2(half4);\nushort half_to_unorm565(half3);*/\n"
  },
  {
    "path": "source/dcompute/std/package.d",
    "content": "module dcompute.std;\n\nversion(LDC_DCompute) {}\nelse\n{\n    static assert(false, \"Need to use a DCompute enabled compiler.\");\n}\n\n\npublic import dcompute.std.index;\n"
  },
  {
    "path": "source/dcompute/std/sync.d",
    "content": "@compute(CompileFor.deviceOnly) module dcompute.std.sync;\n\nimport ldc.dcompute;\nimport ldc.intrinsics;\n\nimport ocl  = dcompute.std.opencl.sync;\nimport cuda = dcompute.std.cuda.sync;\n\n//suspends work-item execution until all work-items in the work-group have called the barrier\nvoid barrier()\n{\n    if(__dcompute_reflect(ReflectTarget.OpenCL))\n        ocl.barrier(0);\n    if(__dcompute_reflect(ReflectTarget.CUDA)) {\n        static if (LLVM_atleast!21) { // >= LDC 1.42.0(LLVM 21)\n            cuda.barrier_n(0);\n        } else {\n            cuda.barrier0();\n        }\n    }\n}\n\nvoid local_fence()\n{\n    if(__dcompute_reflect(ReflectTarget.OpenCL))\n        ocl.mem_fence(ocl.CLK_LOCAL_MEM_FENCE);\n    if(__dcompute_reflect(ReflectTarget.CUDA))\n        cuda.membar_cta();\n}\n// A global fence implies a local fence\nvoid global_fence()\n{\n    if(__dcompute_reflect(ReflectTarget.OpenCL))\n        ocl.mem_fence(ocl.CLK_GLOBAL_MEM_FENCE);\n    if(__dcompute_reflect(ReflectTarget.CUDA))\n        cuda.membar_gl();\n}\n\n//TODO: image fence?\n\n\n"
  },
  {
    "path": "source/dcompute/std/warp.d",
    "content": "@compute(CompileFor.deviceOnly) module dcompute.std.warp;\n\nimport ldc.dcompute;\n/*Warp functions\n *Vote:\n * int  any(int pred) - true if any lane's `pred` is true\n * int  all(int pred) - true iff every lane's `pred` is true\n * ulong ballot(int pred) - ith bit is set iff `pred` is true for the ith lane\n *\n *Shuffle:\n * T shuffle(T val, int lane, int width=warpsize)\n * T shuffle_{up,down}(T val, uint lane_delta,int width=warpsize)\n * T shuffle_xor(T val, int lane_mask,int width=warpsize)\n *\n *Reduction:\n * T reduce!op(T val)\n * T inclusive_scan!op(T val)\n * T exclusive_scan!op(T val)\n */\n"
  },
  {
    "path": "source/dcompute/tests/dummykernels.d",
    "content": "@compute(CompileFor.deviceOnly)\nmodule dcompute.tests.dummykernels;\npragma(LDC_no_moduleinfo);\n\nimport ldc.dcompute;\nimport dcompute.std.index;\n\n@kernel() void saxpy(GlobalPointer!(float) res,\n                     float alpha, GlobalPointer!(float) x,\n                     GlobalPointer!(float) y,\n                     size_t N)\n{\n    auto i = GlobalIndex.x;\n    if (i >= N) return;\n    res[i] = alpha*x[i] + y[i];\n}\n\nalias aagf = AutoIndexed!(GlobalPointer!(float));\n\n@kernel() void auto_index_test(aagf a,\n                               aagf b,\n                               aagf c)\n{\n    a = b + c;\n}\n"
  },
  {
    "path": "source/dcompute/tests/main.d",
    "content": "version (DComputeTesting) {\n    version = DComputeTestCUDA;\n}\n\n//import dcompute.tests.test;\n\nimport std.algorithm;\nimport std.stdio;\nimport std.file;\nimport std.traits;\nimport std.meta;\nimport std.exception : enforce;\nimport std.experimental.allocator;\nimport std.array;\nimport std.typecons;\nimport std.conv : to;\nimport std.math.traits : isNaN;\n\nimport dcompute.driver.cuda.unified_buffer;\nimport dcompute.tests.dummykernels : saxpy;\n\nversion(DComputeTestOpenCL)\n    import dcompute.driver.ocl;\nelse version(DComputeTestCUDA)\n    import dcompute.driver.cuda;\nelse\n    static assert(false, \"Need to test something!\");\n\n// Index of OpenCL 2.1 capable platform returned by Platform.getPlatforms\nenum CL_PLATFORM_INDEX = 2;\n\nint main(string[] args)\n{\n    enum size_t N = 128;\n    float alpha = 5.0;\n    float[N] res, x,y;\n    foreach (i; 0 .. N)\n    { \n        x[i] = N - i;\n        y[i] = i * i;\n    }\n\n    version(DComputeTestOpenCL)\n    {\n        Platform.initialise();\n        onDriverError = (Status _status) { throw new DComputeDriverException(_status); };\n        auto platforms = Platform.getPlatforms(theAllocator);\n        auto platform = platforms[CL_PLATFORM_INDEX];\n        DerelictCL.reload(CLVersion.CL21);\n\n        writeln(\"Platforms:\");\n        foreach (i, ref p; platforms)\n        {\n            writefln(\"\\t[%d%1s] %s\", i, (i == CL_PLATFORM_INDEX) ? 
\"*\" : \"\", p.name);\n        }\n        writeln(\"\\tChosen: \", platform.name);\n\n        auto devices  = platform.getDevices(theAllocator);\n        writeln(\"Devices:\");\n        foreach (i, ref d; devices)\n        {\n            writefln(\"\\t[%d] %s\", i, d.name);\n            writefln(\"\\t\\t%s\", d.vendor);\n            writefln(\"\\t\\t(%d)%s\", d.type, d.type);\n            writefln(\"\\t\\t(%d)%s\", d.queueProperties, d.queueProperties);\n            writefln(\"\\t\\t(%d)%s\", d.floatFPConfig, d.floatFPConfig);\n            writefln(\"\\t\\t(%d)%s\", d.GLobalMemoryCacheType, d.GLobalMemoryCacheType);\n            writefln(\"\\t\\t(%d)%s\", d.executionCapabilities, d.executionCapabilities);\n            // writefln(\"\\t\\t%s\", d.OpenCLCVersion);\n            // writefln(\"\\t\\t%s\", d.deviceVersion);\n            // writefln(\"\\t\\t%s\", d.builtinKernels);\n        }\n        writeln(\"\\tChosen: \", devices[0].name);\n\n        auto plist    = propertyList!(Context.Properties)(Context.Properties.platform, platform.raw);\n        writeln(plist);\n        auto ctx      = Context(devices[0 ..1],null /*FIXME: plist[]*/);\n\t    // Change the file to the built OpenCL version.\n        version (Windows) {\n            Program.globalProgram = ctx.createProgram(cast(ubyte[]) read(\"./kernels_ocl200_64.spv\"));\n        } else {\n            Program.globalProgram = ctx.createProgram(cast(ubyte[]) read(\"./.dub/obj/kernels_ocl200_64.spv\"));\n        }\n\n        try\n        {\n            Program.globalProgram.build(devices,\"\");\n        }\n            catch(DComputeDriverException e)\n        {\n            auto b = Build(Program.globalProgram, devices[0]);\n            writeln(b.log);\n        }\n        \n        auto queue    = ctx.createQueue(devices[0],Queue.Properties.outOfOrderExecution);\n\n        Buffer!(float) b_res, b_x, b_y;\n\n        b_res = ctx.createBuffer(res[], Memory.Flags.useHostPointer | Memory.Flags.readWrite);\n        b_x = 
ctx.createBuffer(x[],Memory.Flags.useHostPointer | Memory.Flags.readWrite);\n        b_y = ctx.createBuffer(y[],Memory.Flags.useHostPointer | Memory.Flags.readWrite);\n\n        Event e = queue.enqueue!(saxpy)([N])(b_res,alpha,b_x,b_y, N);\n        e.wait();\n\n        // zero-copy failed\n        if (isNaN(res[0])) {\n            writeln(\"Read buffer from device\");\n            queue.read!(float)(b_res, res);\n        }\n    }\n\n    version(DComputeTestCUDA)\n    {\n        Platform.initialise();\n\t\n        auto devs = Platform.getDevices(theAllocator);\n        auto dev   = devs[0]; \n        auto ctx   = Context(dev); scope(exit) ctx.detach();\n\n        // Change the file to match your GPU.\n        version (Windows) {\n            Program.globalProgram = Program.fromFile(\"./kernels_cuda210_64.ptx\");\n        } else {\n            Program.globalProgram = Program.fromFile(\"./kernels_cuda800_64.ptx\");\n        }\n        auto q = Queue(false);\n\n        Buffer!(float) b_res, b_x, b_y;\n        b_res =  Buffer!(float)(res[]); scope(exit) b_res.release();\n        b_x   =  Buffer!(float)(x[]);   scope(exit) b_x.release();\n        b_y   =  Buffer!(float)(y[]);   scope(exit) b_y.release();\n\n        b_x.copy!(Copy.hostToDevice);\n        b_y.copy!(Copy.hostToDevice);\n\n        q.enqueue!(saxpy)\n                  ([N,1,1],[1,1,1])\n                  (b_res,alpha,b_x,b_y, N);\n        b_res.copy!(Copy.deviceToHost);\n\n        // --- Unified Memory test (runs only when the device supports it) ---\n        if (dev.supportsUnifiedMemory)\n        {\n            writeln(\"\\nDevice supports Unified Memory — running UnifiedBuffer test...\");\n\n            // Allocate managed memory and initialise from host slices.\n            // No explicit H2D copy is needed; the runtime migrates pages.\n            auto ub_x   = UnifiedBuffer!float(x[]);   scope(exit) ub_x.release();\n            auto ub_y   = UnifiedBuffer!float(y[]);   scope(exit) ub_y.release();\n      
      auto ub_res = UnifiedBuffer!float(N);     scope(exit) ub_res.release();\n\n            q.enqueue!(saxpy)\n                      ([N,1,1],[1,1,1])\n                      (ub_res, alpha, ub_x, ub_y, N);\n\n            // Synchronise so that host can safely read results.\n            // (No D2H copy — the host slice is the same allocation.)\n            Context.sync();\n\n            foreach (i; 0 .. N)\n                enforce(ub_res.hostSlice[i] == alpha * x[i] + y[i],\n                        \"Unified Memory verification failed at index \" ~ i.to!string ~ \"!\");\n\n            writeln(\"UnifiedBuffer test PASSED.\");\n        }\n        else\n        {\n            writeln(\"\\nDevice does not support Unified Memory — skipping UnifiedBuffer test.\");\n        }\n    }\n\n    foreach(i; 0 .. N)\n        enforce(res[i] == alpha * x[i] + y[i]);\n    writeln(res[]);\n    return 0;\n}\n\n\n"
  },
  {
    "path": "source/dcompute/tests/test.d",
    "content": "@compute(CompileFor.deviceOnly)\nmodule dcompute.tests.test;\n\nimport ldc.dcompute;\nimport dcompute.std.index;\nimport std.traits;\n\n@kernel()\nvoid map(alias F)(GlobalPointer!(ReturnType!(F)) r, Parameters!F args)\n{\n    r[GlobalIndex.x] = F(args);\n}\n\n"
  }
]