[
  {
    "path": ".gitignore",
    "content": ".vscode\nraw.h\nintrinsics.json\ndata.xml"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2023 Alivan Akbar\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# SIMD Implementation in Golang\n\nThis repository contains an implementation of SIMD (Single Instruction, Multiple Data) operations in Go, specifically targeting the ARM NEON architecture. The goal is to provide optimized parallel processing capabilities for certain computational tasks.\n\n## Future Plans\n\nWe are actively working on extending the SIMD implementation to the x86 architecture as well. The upcoming x86 implementation will provide similar SIMD functionality for parallel processing on x86-based systems.\n\n## Hacks\n\nCalling a C function through CGO has a noticeable per-call overhead, because the Go runtime has to switch from the goroutine stack to a system stack and coordinate with the scheduler on every call.\nIn general, it is a good idea to avoid CGO on performance-critical paths.\nAs a workaround, instead of relying on CGO, we can use the `//go:linkname` directive to call the C code directly, bypassing CGO and getting better performance. In the first benchmark below, `BenchmarkMultSimdCgo` (92213 ns/op) is roughly 47x slower than `BenchmarkMultSimdBypass` (1959 ns/op).\n\n```\ngoos: darwin\ngoarch: arm64\npkg: github.com/alivanz/go-simd/arm/neon\nBenchmarkMultRef-8                131395              9168 ns/op\nBenchmarkMultSimd-8               598742              1954 ns/op\nBenchmarkMultSimdBypass-8         605554              1959 ns/op\nBenchmarkMultSimdFull-8          1816879               661.3 ns/op\nBenchmarkMultSimdCgo-8             13020             92213 ns/op\nPASS\n```\n\n```\ngoos: darwin\ngoarch: arm64\npkg: github.com/alivanz/go-simd/arm/neon\ncpu: Apple M2\nBenchmarkVmulqF32N-8                8848            124616 ns/op        33657.86 MB/s       1422 B/op          0 allocs/op\nBenchmarkVmulqF32C-8                2256            528683 ns/op        7933.49 MB/s        5577 B/op          0 allocs/op\nBenchmarkVmulqF32Ref-8              3630            327995 ns/op        12787.69 MB/s       3466 B/op          0 allocs/op\nPASS\nok      github.com/alivanz/go-simd/arm/neon     5.793s\n```\n\nThe floating-point multiplication benchmarks show large performance differences between implementations:\n\n- `VmulqF32N` (Native): Achieves the highest throughput at about 33.7 GB/s with minimal memory 
allocation (1422 B/op). This implementation uses SIMD instructions directly for optimal performance.\n- `VmulqF32C` (C): Shows the lowest performance at 7.9 GB/s with higher memory allocation (5577 B/op), likely due to the overhead of CGO calls and memory management.\n- `VmulqF32Ref` (Reference): Performs at 12.8 GB/s with moderate memory usage (3466 B/op), serving as a baseline for comparison.\n\nThese results highlight the importance of using native SIMD implementations over CGO-based solutions for performance-critical applications. The native implementation is approximately 2.6x faster than the reference implementation, while the C implementation is about 1.6x slower than the reference.\n\n## Features\n\n- SIMD operations for the ARM NEON architecture.\n- High-performance parallel processing for specific tasks.\n- Uses SIMD instructions to process multiple data elements simultaneously.\n- Supports a range of data types, including integers and floating-point numbers.\n- Modular design for easy integration into existing projects.\n- Well-documented code for understanding and extending the implementation.\n\n## Roadmap\n\n- [x] Implement SIMD operations for the ARM NEON architecture.\n- [ ] Add support for the x86 architecture.\n- [ ] Expand SIMD operations to additional data types.\n- [ ] Optimize performance for specific use cases.\n- [ ] Develop a comprehensive test suite for validation.\n\n## Usage\n\nTo use the SIMD implementations in your project, follow these steps:\n\n1. Import the required packages in your Go code. The `arm` package provides the vector types and the `arm/neon` package provides the operations:\n\n```go\nimport (\n\t\"github.com/alivanz/go-simd/arm\"\n\t\"github.com/alivanz/go-simd/arm/neon\"\n)\n```\n\n2. Use the SIMD functions in your code as needed. 
Example:\n\n```go\npackage main\n\nimport (\n\t\"log\"\n\n\t\"github.com/alivanz/go-simd/arm\"\n\t\"github.com/alivanz/go-simd/arm/neon\"\n)\n\nfunc main() {\n\tvar a, b arm.Int8X8\n\tvar add, mul arm.Int16X8\n\tfor i := 0; i < 8; i++ {\n\t\ta[i] = arm.Int8(i)\n\t\tb[i] = arm.Int8(i * i)\n\t}\n\tlog.Printf(\"a = %+v\", a)\n\tlog.Printf(\"b = %+v\", b)\n\t// Widening add: each pair of int8 lanes is summed into an int16 lane, so the sum cannot overflow.\n\tneon.VaddlS8(&add, &a, &b)\n\t// Widening multiply: each int8 x int8 product is stored in an int16 lane.\n\tneon.VmullS8(&mul, &a, &b)\n\tlog.Printf(\"add = %+v\", add)\n\tlog.Printf(\"mul = %+v\", mul)\n}\n```\n\n## Supported Operations\n\nOnly ARM NEON is supported for now.\n\nRefer to the documentation in each respective file for more details on how to use each operation.\n\n## Contributing\n\nContributions to this project are welcome. To contribute, please follow these steps:\n\n1. Fork the repository.\n2. Create a new branch for your feature or bug fix.\n3. Make your changes and commit them with descriptive messages.\n4. Push your changes to your forked repository.\n5. Submit a pull request to the main repository.\n\nPlease ensure that your code follows the existing code style and includes appropriate tests.\n\n## Acknowledgments\n\n- The ARM NEON architecture documentation for providing valuable insights into SIMD programming techniques.\n- The open-source community for their contributions and inspiration.\n\n## Contact\n\nFor any questions or feedback regarding this repository, please feel free to contact me at [alivan1627@gmail.com](mailto:alivan1627@gmail.com)"
  },
  {
    "path": "arm/generate.go",
    "content": "package arm\n\n//go:generate go run ../generator/arm\n"
  },
  {
    "path": "arm/neon/functions.c",
    "content": "#include <arm_neon.h>\n\nvoid VabaS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vaba_s8(*v0, *v1, *v2); }\nvoid VabaS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vaba_s16(*v0, *v1, *v2); }\nvoid VabaS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vaba_s32(*v0, *v1, *v2); }\nvoid VabaU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vaba_u8(*v0, *v1, *v2); }\nvoid VabaU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = vaba_u16(*v0, *v1, *v2); }\nvoid VabaU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vaba_u32(*v0, *v1, *v2); }\nvoid VabalS8(int16x8_t* r, int16x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vabal_s8(*v0, *v1, *v2); }\nvoid VabalS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vabal_s16(*v0, *v1, *v2); }\nvoid VabalS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vabal_s32(*v0, *v1, *v2); }\nvoid VabalU8(uint16x8_t* r, uint16x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vabal_u8(*v0, *v1, *v2); }\nvoid VabalU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = vabal_u16(*v0, *v1, *v2); }\nvoid VabalU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vabal_u32(*v0, *v1, *v2); }\nvoid VabalHighS8(int16x8_t* r, int16x8_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vabal_high_s8(*v0, *v1, *v2); }\nvoid VabalHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vabal_high_s16(*v0, *v1, *v2); }\nvoid VabalHighS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vabal_high_s32(*v0, *v1, *v2); }\nvoid VabalHighU8(uint16x8_t* r, uint16x8_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vabal_high_u8(*v0, *v1, *v2); }\nvoid VabalHighU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vabal_high_u16(*v0, *v1, *v2); }\nvoid VabalHighU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1, 
uint32x4_t* v2) { *r = vabal_high_u32(*v0, *v1, *v2); }\nvoid VabaqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vabaq_s8(*v0, *v1, *v2); }\nvoid VabaqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vabaq_s16(*v0, *v1, *v2); }\nvoid VabaqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vabaq_s32(*v0, *v1, *v2); }\nvoid VabaqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vabaq_u8(*v0, *v1, *v2); }\nvoid VabaqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vabaq_u16(*v0, *v1, *v2); }\nvoid VabaqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vabaq_u32(*v0, *v1, *v2); }\nvoid VabdS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vabd_s8(*v0, *v1); }\nvoid VabdS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vabd_s16(*v0, *v1); }\nvoid VabdS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vabd_s32(*v0, *v1); }\nvoid VabdU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vabd_u8(*v0, *v1); }\nvoid VabdU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vabd_u16(*v0, *v1); }\nvoid VabdU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vabd_u32(*v0, *v1); }\nvoid VabdF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vabd_f32(*v0, *v1); }\nvoid VabdF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vabd_f64(*v0, *v1); }\nvoid VabddF64(float64_t* r, float64_t* v0, float64_t* v1) { *r = vabdd_f64(*v0, *v1); }\nvoid VabdlS8(int16x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vabdl_s8(*v0, *v1); }\nvoid VabdlS16(int32x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vabdl_s16(*v0, *v1); }\nvoid VabdlS32(int64x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vabdl_s32(*v0, *v1); }\nvoid VabdlU8(uint16x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vabdl_u8(*v0, *v1); }\nvoid VabdlU16(uint32x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vabdl_u16(*v0, *v1); }\nvoid VabdlU32(uint64x2_t* r, uint32x2_t* 
v0, uint32x2_t* v1) { *r = vabdl_u32(*v0, *v1); }\nvoid VabdlHighS8(int16x8_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vabdl_high_s8(*v0, *v1); }\nvoid VabdlHighS16(int32x4_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vabdl_high_s16(*v0, *v1); }\nvoid VabdlHighS32(int64x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vabdl_high_s32(*v0, *v1); }\nvoid VabdlHighU8(uint16x8_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vabdl_high_u8(*v0, *v1); }\nvoid VabdlHighU16(uint32x4_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vabdl_high_u16(*v0, *v1); }\nvoid VabdlHighU32(uint64x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vabdl_high_u32(*v0, *v1); }\nvoid VabdqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vabdq_s8(*v0, *v1); }\nvoid VabdqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vabdq_s16(*v0, *v1); }\nvoid VabdqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vabdq_s32(*v0, *v1); }\nvoid VabdqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vabdq_u8(*v0, *v1); }\nvoid VabdqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vabdq_u16(*v0, *v1); }\nvoid VabdqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vabdq_u32(*v0, *v1); }\nvoid VabdqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vabdq_f32(*v0, *v1); }\nvoid VabdqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vabdq_f64(*v0, *v1); }\nvoid VabdsF32(float32_t* r, float32_t* v0, float32_t* v1) { *r = vabds_f32(*v0, *v1); }\nvoid VabsS8(int8x8_t* r, int8x8_t* v0) { *r = vabs_s8(*v0); }\nvoid VabsS16(int16x4_t* r, int16x4_t* v0) { *r = vabs_s16(*v0); }\nvoid VabsS32(int32x2_t* r, int32x2_t* v0) { *r = vabs_s32(*v0); }\nvoid VabsS64(int64x1_t* r, int64x1_t* v0) { *r = vabs_s64(*v0); }\nvoid VabsF32(float32x2_t* r, float32x2_t* v0) { *r = vabs_f32(*v0); }\nvoid VabsF64(float64x1_t* r, float64x1_t* v0) { *r = vabs_f64(*v0); }\nvoid VabsdS64(int64_t* r, int64_t* v0) { *r = vabsd_s64(*v0); }\nvoid VabsqS8(int8x16_t* r, int8x16_t* v0) { *r = vabsq_s8(*v0); 
}\nvoid VabsqS16(int16x8_t* r, int16x8_t* v0) { *r = vabsq_s16(*v0); }\nvoid VabsqS32(int32x4_t* r, int32x4_t* v0) { *r = vabsq_s32(*v0); }\nvoid VabsqS64(int64x2_t* r, int64x2_t* v0) { *r = vabsq_s64(*v0); }\nvoid VabsqF32(float32x4_t* r, float32x4_t* v0) { *r = vabsq_f32(*v0); }\nvoid VabsqF64(float64x2_t* r, float64x2_t* v0) { *r = vabsq_f64(*v0); }\nvoid VaddS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vadd_s8(*v0, *v1); }\nvoid VaddS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vadd_s16(*v0, *v1); }\nvoid VaddS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vadd_s32(*v0, *v1); }\nvoid VaddS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vadd_s64(*v0, *v1); }\nvoid VaddU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vadd_u8(*v0, *v1); }\nvoid VaddU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vadd_u16(*v0, *v1); }\nvoid VaddU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vadd_u32(*v0, *v1); }\nvoid VaddU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vadd_u64(*v0, *v1); }\nvoid VaddF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vadd_f32(*v0, *v1); }\nvoid VaddF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vadd_f64(*v0, *v1); }\nvoid VaddP16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vadd_p16(*v0, *v1); }\nvoid VaddP64(poly64x1_t* r, poly64x1_t* v0, poly64x1_t* v1) { *r = vadd_p64(*v0, *v1); }\nvoid VaddP8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vadd_p8(*v0, *v1); }\nvoid VadddS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vaddd_s64(*v0, *v1); }\nvoid VadddU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vaddd_u64(*v0, *v1); }\nvoid VaddhnS16(int8x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vaddhn_s16(*v0, *v1); }\nvoid VaddhnS32(int16x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vaddhn_s32(*v0, *v1); }\nvoid VaddhnS64(int32x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vaddhn_s64(*v0, *v1); }\nvoid VaddhnU16(uint8x8_t* r, uint16x8_t* v0, 
uint16x8_t* v1) { *r = vaddhn_u16(*v0, *v1); }\nvoid VaddhnU32(uint16x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vaddhn_u32(*v0, *v1); }\nvoid VaddhnU64(uint32x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vaddhn_u64(*v0, *v1); }\nvoid VaddhnHighS16(int8x16_t* r, int8x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vaddhn_high_s16(*v0, *v1, *v2); }\nvoid VaddhnHighS32(int16x8_t* r, int16x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vaddhn_high_s32(*v0, *v1, *v2); }\nvoid VaddhnHighS64(int32x4_t* r, int32x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = vaddhn_high_s64(*v0, *v1, *v2); }\nvoid VaddhnHighU16(uint8x16_t* r, uint8x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vaddhn_high_u16(*v0, *v1, *v2); }\nvoid VaddhnHighU32(uint16x8_t* r, uint16x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vaddhn_high_u32(*v0, *v1, *v2); }\nvoid VaddhnHighU64(uint32x4_t* r, uint32x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vaddhn_high_u64(*v0, *v1, *v2); }\nvoid VaddlS8(int16x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vaddl_s8(*v0, *v1); }\nvoid VaddlS16(int32x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vaddl_s16(*v0, *v1); }\nvoid VaddlS32(int64x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vaddl_s32(*v0, *v1); }\nvoid VaddlU8(uint16x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vaddl_u8(*v0, *v1); }\nvoid VaddlU16(uint32x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vaddl_u16(*v0, *v1); }\nvoid VaddlU32(uint64x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vaddl_u32(*v0, *v1); }\nvoid VaddlHighS8(int16x8_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vaddl_high_s8(*v0, *v1); }\nvoid VaddlHighS16(int32x4_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vaddl_high_s16(*v0, *v1); }\nvoid VaddlHighS32(int64x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vaddl_high_s32(*v0, *v1); }\nvoid VaddlHighU8(uint16x8_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vaddl_high_u8(*v0, *v1); }\nvoid VaddlHighU16(uint32x4_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vaddl_high_u16(*v0, *v1); }\nvoid 
VaddlHighU32(uint64x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vaddl_high_u32(*v0, *v1); }\nvoid VaddlvS8(int16_t* r, int8x8_t* v0) { *r = vaddlv_s8(*v0); }\nvoid VaddlvS16(int32_t* r, int16x4_t* v0) { *r = vaddlv_s16(*v0); }\nvoid VaddlvS32(int64_t* r, int32x2_t* v0) { *r = vaddlv_s32(*v0); }\nvoid VaddlvU8(uint16_t* r, uint8x8_t* v0) { *r = vaddlv_u8(*v0); }\nvoid VaddlvU16(uint32_t* r, uint16x4_t* v0) { *r = vaddlv_u16(*v0); }\nvoid VaddlvU32(uint64_t* r, uint32x2_t* v0) { *r = vaddlv_u32(*v0); }\nvoid VaddlvqS8(int16_t* r, int8x16_t* v0) { *r = vaddlvq_s8(*v0); }\nvoid VaddlvqS16(int32_t* r, int16x8_t* v0) { *r = vaddlvq_s16(*v0); }\nvoid VaddlvqS32(int64_t* r, int32x4_t* v0) { *r = vaddlvq_s32(*v0); }\nvoid VaddlvqU8(uint16_t* r, uint8x16_t* v0) { *r = vaddlvq_u8(*v0); }\nvoid VaddlvqU16(uint32_t* r, uint16x8_t* v0) { *r = vaddlvq_u16(*v0); }\nvoid VaddlvqU32(uint64_t* r, uint32x4_t* v0) { *r = vaddlvq_u32(*v0); }\nvoid VaddqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vaddq_s8(*v0, *v1); }\nvoid VaddqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vaddq_s16(*v0, *v1); }\nvoid VaddqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vaddq_s32(*v0, *v1); }\nvoid VaddqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vaddq_s64(*v0, *v1); }\nvoid VaddqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vaddq_u8(*v0, *v1); }\nvoid VaddqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vaddq_u16(*v0, *v1); }\nvoid VaddqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vaddq_u32(*v0, *v1); }\nvoid VaddqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vaddq_u64(*v0, *v1); }\nvoid VaddqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vaddq_f32(*v0, *v1); }\nvoid VaddqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vaddq_f64(*v0, *v1); }\nvoid VaddqP128(poly128_t* r, poly128_t* v0, poly128_t* v1) { *r = vaddq_p128(*v0, *v1); }\nvoid VaddqP16(poly16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) 
{ *r = vaddq_p16(*v0, *v1); }\nvoid VaddqP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vaddq_p64(*v0, *v1); }\nvoid VaddqP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vaddq_p8(*v0, *v1); }\nvoid VaddvS8(int8_t* r, int8x8_t* v0) { *r = vaddv_s8(*v0); }\nvoid VaddvS16(int16_t* r, int16x4_t* v0) { *r = vaddv_s16(*v0); }\nvoid VaddvS32(int32_t* r, int32x2_t* v0) { *r = vaddv_s32(*v0); }\nvoid VaddvU8(uint8_t* r, uint8x8_t* v0) { *r = vaddv_u8(*v0); }\nvoid VaddvU16(uint16_t* r, uint16x4_t* v0) { *r = vaddv_u16(*v0); }\nvoid VaddvU32(uint32_t* r, uint32x2_t* v0) { *r = vaddv_u32(*v0); }\nvoid VaddvF32(float32_t* r, float32x2_t* v0) { *r = vaddv_f32(*v0); }\nvoid VaddvqS8(int8_t* r, int8x16_t* v0) { *r = vaddvq_s8(*v0); }\nvoid VaddvqS16(int16_t* r, int16x8_t* v0) { *r = vaddvq_s16(*v0); }\nvoid VaddvqS32(int32_t* r, int32x4_t* v0) { *r = vaddvq_s32(*v0); }\nvoid VaddvqS64(int64_t* r, int64x2_t* v0) { *r = vaddvq_s64(*v0); }\nvoid VaddvqU8(uint8_t* r, uint8x16_t* v0) { *r = vaddvq_u8(*v0); }\nvoid VaddvqU16(uint16_t* r, uint16x8_t* v0) { *r = vaddvq_u16(*v0); }\nvoid VaddvqU32(uint32_t* r, uint32x4_t* v0) { *r = vaddvq_u32(*v0); }\nvoid VaddvqU64(uint64_t* r, uint64x2_t* v0) { *r = vaddvq_u64(*v0); }\nvoid VaddvqF32(float32_t* r, float32x4_t* v0) { *r = vaddvq_f32(*v0); }\nvoid VaddvqF64(float64_t* r, float64x2_t* v0) { *r = vaddvq_f64(*v0); }\nvoid VaddwS8(int16x8_t* r, int16x8_t* v0, int8x8_t* v1) { *r = vaddw_s8(*v0, *v1); }\nvoid VaddwS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1) { *r = vaddw_s16(*v0, *v1); }\nvoid VaddwS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1) { *r = vaddw_s32(*v0, *v1); }\nvoid VaddwU8(uint16x8_t* r, uint16x8_t* v0, uint8x8_t* v1) { *r = vaddw_u8(*v0, *v1); }\nvoid VaddwU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1) { *r = vaddw_u16(*v0, *v1); }\nvoid VaddwU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1) { *r = vaddw_u32(*v0, *v1); }\nvoid VaddwHighS8(int16x8_t* r, int16x8_t* v0, int8x16_t* v1) { *r = 
vaddw_high_s8(*v0, *v1); }\nvoid VaddwHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1) { *r = vaddw_high_s16(*v0, *v1); }\nvoid VaddwHighS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1) { *r = vaddw_high_s32(*v0, *v1); }\nvoid VaddwHighU8(uint16x8_t* r, uint16x8_t* v0, uint8x16_t* v1) { *r = vaddw_high_u8(*v0, *v1); }\nvoid VaddwHighU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1) { *r = vaddw_high_u16(*v0, *v1); }\nvoid VaddwHighU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1) { *r = vaddw_high_u32(*v0, *v1); }\nvoid VaesdqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vaesdq_u8(*v0, *v1); }\nvoid VaeseqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vaeseq_u8(*v0, *v1); }\nvoid VaesimcqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vaesimcq_u8(*v0); }\nvoid VaesmcqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vaesmcq_u8(*v0); }\nvoid VandS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vand_s8(*v0, *v1); }\nvoid VandS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vand_s16(*v0, *v1); }\nvoid VandS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vand_s32(*v0, *v1); }\nvoid VandS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vand_s64(*v0, *v1); }\nvoid VandU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vand_u8(*v0, *v1); }\nvoid VandU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vand_u16(*v0, *v1); }\nvoid VandU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vand_u32(*v0, *v1); }\nvoid VandU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vand_u64(*v0, *v1); }\nvoid VandqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vandq_s8(*v0, *v1); }\nvoid VandqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vandq_s16(*v0, *v1); }\nvoid VandqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vandq_s32(*v0, *v1); }\nvoid VandqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vandq_s64(*v0, *v1); }\nvoid VandqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vandq_u8(*v0, *v1); 
}\nvoid VandqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vandq_u16(*v0, *v1); }\nvoid VandqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vandq_u32(*v0, *v1); }\nvoid VandqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vandq_u64(*v0, *v1); }\nvoid VbcaxqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vbcaxq_s8(*v0, *v1, *v2); }\nvoid VbcaxqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vbcaxq_s16(*v0, *v1, *v2); }\nvoid VbcaxqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vbcaxq_s32(*v0, *v1, *v2); }\nvoid VbcaxqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = vbcaxq_s64(*v0, *v1, *v2); }\nvoid VbcaxqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vbcaxq_u8(*v0, *v1, *v2); }\nvoid VbcaxqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vbcaxq_u16(*v0, *v1, *v2); }\nvoid VbcaxqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vbcaxq_u32(*v0, *v1, *v2); }\nvoid VbcaxqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vbcaxq_u64(*v0, *v1, *v2); }\nvoid VbicS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vbic_s8(*v0, *v1); }\nvoid VbicS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vbic_s16(*v0, *v1); }\nvoid VbicS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vbic_s32(*v0, *v1); }\nvoid VbicS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vbic_s64(*v0, *v1); }\nvoid VbicU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vbic_u8(*v0, *v1); }\nvoid VbicU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vbic_u16(*v0, *v1); }\nvoid VbicU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vbic_u32(*v0, *v1); }\nvoid VbicU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vbic_u64(*v0, *v1); }\nvoid VbicqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vbicq_s8(*v0, *v1); }\nvoid VbicqS16(int16x8_t* r, 
int16x8_t* v0, int16x8_t* v1) { *r = vbicq_s16(*v0, *v1); }\nvoid VbicqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vbicq_s32(*v0, *v1); }\nvoid VbicqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vbicq_s64(*v0, *v1); }\nvoid VbicqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vbicq_u8(*v0, *v1); }\nvoid VbicqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vbicq_u16(*v0, *v1); }\nvoid VbicqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vbicq_u32(*v0, *v1); }\nvoid VbicqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vbicq_u64(*v0, *v1); }\nvoid VbslS8(int8x8_t* r, uint8x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vbsl_s8(*v0, *v1, *v2); }\nvoid VbslS16(int16x4_t* r, uint16x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vbsl_s16(*v0, *v1, *v2); }\nvoid VbslS32(int32x2_t* r, uint32x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vbsl_s32(*v0, *v1, *v2); }\nvoid VbslS64(int64x1_t* r, uint64x1_t* v0, int64x1_t* v1, int64x1_t* v2) { *r = vbsl_s64(*v0, *v1, *v2); }\nvoid VbslU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vbsl_u8(*v0, *v1, *v2); }\nvoid VbslU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = vbsl_u16(*v0, *v1, *v2); }\nvoid VbslU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vbsl_u32(*v0, *v1, *v2); }\nvoid VbslU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1, uint64x1_t* v2) { *r = vbsl_u64(*v0, *v1, *v2); }\nvoid VbslF32(float32x2_t* r, uint32x2_t* v0, float32x2_t* v1, float32x2_t* v2) { *r = vbsl_f32(*v0, *v1, *v2); }\nvoid VbslF64(float64x1_t* r, uint64x1_t* v0, float64x1_t* v1, float64x1_t* v2) { *r = vbsl_f64(*v0, *v1, *v2); }\nvoid VbslP16(poly16x4_t* r, uint16x4_t* v0, poly16x4_t* v1, poly16x4_t* v2) { *r = vbsl_p16(*v0, *v1, *v2); }\nvoid VbslP64(poly64x1_t* r, uint64x1_t* v0, poly64x1_t* v1, poly64x1_t* v2) { *r = vbsl_p64(*v0, *v1, *v2); }\nvoid VbslP8(poly8x8_t* r, uint8x8_t* v0, poly8x8_t* v1, poly8x8_t* v2) { *r = 
vbsl_p8(*v0, *v1, *v2); }\nvoid VbslqS8(int8x16_t* r, uint8x16_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vbslq_s8(*v0, *v1, *v2); }\nvoid VbslqS16(int16x8_t* r, uint16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vbslq_s16(*v0, *v1, *v2); }\nvoid VbslqS32(int32x4_t* r, uint32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vbslq_s32(*v0, *v1, *v2); }\nvoid VbslqS64(int64x2_t* r, uint64x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = vbslq_s64(*v0, *v1, *v2); }\nvoid VbslqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vbslq_u8(*v0, *v1, *v2); }\nvoid VbslqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vbslq_u16(*v0, *v1, *v2); }\nvoid VbslqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vbslq_u32(*v0, *v1, *v2); }\nvoid VbslqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vbslq_u64(*v0, *v1, *v2); }\nvoid VbslqF32(float32x4_t* r, uint32x4_t* v0, float32x4_t* v1, float32x4_t* v2) { *r = vbslq_f32(*v0, *v1, *v2); }\nvoid VbslqF64(float64x2_t* r, uint64x2_t* v0, float64x2_t* v1, float64x2_t* v2) { *r = vbslq_f64(*v0, *v1, *v2); }\nvoid VbslqP16(poly16x8_t* r, uint16x8_t* v0, poly16x8_t* v1, poly16x8_t* v2) { *r = vbslq_p16(*v0, *v1, *v2); }\nvoid VbslqP64(poly64x2_t* r, uint64x2_t* v0, poly64x2_t* v1, poly64x2_t* v2) { *r = vbslq_p64(*v0, *v1, *v2); }\nvoid VbslqP8(poly8x16_t* r, uint8x16_t* v0, poly8x16_t* v1, poly8x16_t* v2) { *r = vbslq_p8(*v0, *v1, *v2); }\nvoid VcaddRot270F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcadd_rot270_f32(*v0, *v1); }\nvoid VcaddRot90F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcadd_rot90_f32(*v0, *v1); }\nvoid VcaddqRot270F32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcaddq_rot270_f32(*v0, *v1); }\nvoid VcaddqRot270F64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcaddq_rot270_f64(*v0, *v1); }\nvoid VcaddqRot90F32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = 
vcaddq_rot90_f32(*v0, *v1); }\nvoid VcaddqRot90F64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcaddq_rot90_f64(*v0, *v1); }\nvoid VcageF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcage_f32(*v0, *v1); }\nvoid VcageF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcage_f64(*v0, *v1); }\nvoid VcagedF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcaged_f64(*v0, *v1); }\nvoid VcageqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcageq_f32(*v0, *v1); }\nvoid VcageqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcageq_f64(*v0, *v1); }\nvoid VcagesF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcages_f32(*v0, *v1); }\nvoid VcagtF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcagt_f32(*v0, *v1); }\nvoid VcagtF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcagt_f64(*v0, *v1); }\nvoid VcagtdF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcagtd_f64(*v0, *v1); }\nvoid VcagtqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcagtq_f32(*v0, *v1); }\nvoid VcagtqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcagtq_f64(*v0, *v1); }\nvoid VcagtsF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcagts_f32(*v0, *v1); }\nvoid VcaleF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcale_f32(*v0, *v1); }\nvoid VcaleF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcale_f64(*v0, *v1); }\nvoid VcaledF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcaled_f64(*v0, *v1); }\nvoid VcaleqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcaleq_f32(*v0, *v1); }\nvoid VcaleqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcaleq_f64(*v0, *v1); }\nvoid VcalesF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcales_f32(*v0, *v1); }\nvoid VcaltF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcalt_f32(*v0, *v1); }\nvoid VcaltF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcalt_f64(*v0, 
*v1); }\nvoid VcaltdF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcaltd_f64(*v0, *v1); }\nvoid VcaltqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcaltq_f32(*v0, *v1); }\nvoid VcaltqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcaltq_f64(*v0, *v1); }\nvoid VcaltsF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcalts_f32(*v0, *v1); }\nvoid VceqS8(uint8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vceq_s8(*v0, *v1); }\nvoid VceqS16(uint16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vceq_s16(*v0, *v1); }\nvoid VceqS32(uint32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vceq_s32(*v0, *v1); }\nvoid VceqS64(uint64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vceq_s64(*v0, *v1); }\nvoid VceqU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vceq_u8(*v0, *v1); }\nvoid VceqU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vceq_u16(*v0, *v1); }\nvoid VceqU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vceq_u32(*v0, *v1); }\nvoid VceqU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vceq_u64(*v0, *v1); }\nvoid VceqF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vceq_f32(*v0, *v1); }\nvoid VceqF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vceq_f64(*v0, *v1); }\nvoid VceqP64(uint64x1_t* r, poly64x1_t* v0, poly64x1_t* v1) { *r = vceq_p64(*v0, *v1); }\nvoid VceqP8(uint8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vceq_p8(*v0, *v1); }\nvoid VceqdS64(uint64_t* r, int64_t* v0, int64_t* v1) { *r = vceqd_s64(*v0, *v1); }\nvoid VceqdU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vceqd_u64(*v0, *v1); }\nvoid VceqdF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vceqd_f64(*v0, *v1); }\nvoid VceqqS8(uint8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vceqq_s8(*v0, *v1); }\nvoid VceqqS16(uint16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vceqq_s16(*v0, *v1); }\nvoid VceqqS32(uint32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vceqq_s32(*v0, *v1); }\nvoid VceqqS64(uint64x2_t* r, 
int64x2_t* v0, int64x2_t* v1) { *r = vceqq_s64(*v0, *v1); }\nvoid VceqqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vceqq_u8(*v0, *v1); }\nvoid VceqqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vceqq_u16(*v0, *v1); }\nvoid VceqqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vceqq_u32(*v0, *v1); }\nvoid VceqqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vceqq_u64(*v0, *v1); }\nvoid VceqqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vceqq_f32(*v0, *v1); }\nvoid VceqqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vceqq_f64(*v0, *v1); }\nvoid VceqqP64(uint64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vceqq_p64(*v0, *v1); }\nvoid VceqqP8(uint8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vceqq_p8(*v0, *v1); }\nvoid VceqsF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vceqs_f32(*v0, *v1); }\nvoid VceqzS8(uint8x8_t* r, int8x8_t* v0) { *r = vceqz_s8(*v0); }\nvoid VceqzS16(uint16x4_t* r, int16x4_t* v0) { *r = vceqz_s16(*v0); }\nvoid VceqzS32(uint32x2_t* r, int32x2_t* v0) { *r = vceqz_s32(*v0); }\nvoid VceqzS64(uint64x1_t* r, int64x1_t* v0) { *r = vceqz_s64(*v0); }\nvoid VceqzU8(uint8x8_t* r, uint8x8_t* v0) { *r = vceqz_u8(*v0); }\nvoid VceqzU16(uint16x4_t* r, uint16x4_t* v0) { *r = vceqz_u16(*v0); }\nvoid VceqzU32(uint32x2_t* r, uint32x2_t* v0) { *r = vceqz_u32(*v0); }\nvoid VceqzU64(uint64x1_t* r, uint64x1_t* v0) { *r = vceqz_u64(*v0); }\nvoid VceqzF32(uint32x2_t* r, float32x2_t* v0) { *r = vceqz_f32(*v0); }\nvoid VceqzF64(uint64x1_t* r, float64x1_t* v0) { *r = vceqz_f64(*v0); }\nvoid VceqzP64(uint64x1_t* r, poly64x1_t* v0) { *r = vceqz_p64(*v0); }\nvoid VceqzP8(uint8x8_t* r, poly8x8_t* v0) { *r = vceqz_p8(*v0); }\nvoid VceqzdS64(uint64_t* r, int64_t* v0) { *r = vceqzd_s64(*v0); }\nvoid VceqzdU64(uint64_t* r, uint64_t* v0) { *r = vceqzd_u64(*v0); }\nvoid VceqzdF64(uint64_t* r, float64_t* v0) { *r = vceqzd_f64(*v0); }\nvoid VceqzqS8(uint8x16_t* r, int8x16_t* v0) { *r = vceqzq_s8(*v0); 
}\nvoid VceqzqS16(uint16x8_t* r, int16x8_t* v0) { *r = vceqzq_s16(*v0); }\nvoid VceqzqS32(uint32x4_t* r, int32x4_t* v0) { *r = vceqzq_s32(*v0); }\nvoid VceqzqS64(uint64x2_t* r, int64x2_t* v0) { *r = vceqzq_s64(*v0); }\nvoid VceqzqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vceqzq_u8(*v0); }\nvoid VceqzqU16(uint16x8_t* r, uint16x8_t* v0) { *r = vceqzq_u16(*v0); }\nvoid VceqzqU32(uint32x4_t* r, uint32x4_t* v0) { *r = vceqzq_u32(*v0); }\nvoid VceqzqU64(uint64x2_t* r, uint64x2_t* v0) { *r = vceqzq_u64(*v0); }\nvoid VceqzqF32(uint32x4_t* r, float32x4_t* v0) { *r = vceqzq_f32(*v0); }\nvoid VceqzqF64(uint64x2_t* r, float64x2_t* v0) { *r = vceqzq_f64(*v0); }\nvoid VceqzqP64(uint64x2_t* r, poly64x2_t* v0) { *r = vceqzq_p64(*v0); }\nvoid VceqzqP8(uint8x16_t* r, poly8x16_t* v0) { *r = vceqzq_p8(*v0); }\nvoid VceqzsF32(uint32_t* r, float32_t* v0) { *r = vceqzs_f32(*v0); }\nvoid VcgeS8(uint8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vcge_s8(*v0, *v1); }\nvoid VcgeS16(uint16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vcge_s16(*v0, *v1); }\nvoid VcgeS32(uint32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vcge_s32(*v0, *v1); }\nvoid VcgeS64(uint64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vcge_s64(*v0, *v1); }\nvoid VcgeU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vcge_u8(*v0, *v1); }\nvoid VcgeU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vcge_u16(*v0, *v1); }\nvoid VcgeU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vcge_u32(*v0, *v1); }\nvoid VcgeU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vcge_u64(*v0, *v1); }\nvoid VcgeF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcge_f32(*v0, *v1); }\nvoid VcgeF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcge_f64(*v0, *v1); }\nvoid VcgedS64(uint64_t* r, int64_t* v0, int64_t* v1) { *r = vcged_s64(*v0, *v1); }\nvoid VcgedU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vcged_u64(*v0, *v1); }\nvoid VcgedF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = 
vcged_f64(*v0, *v1); }\nvoid VcgeqS8(uint8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vcgeq_s8(*v0, *v1); }\nvoid VcgeqS16(uint16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vcgeq_s16(*v0, *v1); }\nvoid VcgeqS32(uint32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vcgeq_s32(*v0, *v1); }\nvoid VcgeqS64(uint64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vcgeq_s64(*v0, *v1); }\nvoid VcgeqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vcgeq_u8(*v0, *v1); }\nvoid VcgeqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vcgeq_u16(*v0, *v1); }\nvoid VcgeqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vcgeq_u32(*v0, *v1); }\nvoid VcgeqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vcgeq_u64(*v0, *v1); }\nvoid VcgeqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcgeq_f32(*v0, *v1); }\nvoid VcgeqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcgeq_f64(*v0, *v1); }\nvoid VcgesF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcges_f32(*v0, *v1); }\nvoid VcgezS8(uint8x8_t* r, int8x8_t* v0) { *r = vcgez_s8(*v0); }\nvoid VcgezS16(uint16x4_t* r, int16x4_t* v0) { *r = vcgez_s16(*v0); }\nvoid VcgezS32(uint32x2_t* r, int32x2_t* v0) { *r = vcgez_s32(*v0); }\nvoid VcgezS64(uint64x1_t* r, int64x1_t* v0) { *r = vcgez_s64(*v0); }\nvoid VcgezF32(uint32x2_t* r, float32x2_t* v0) { *r = vcgez_f32(*v0); }\nvoid VcgezF64(uint64x1_t* r, float64x1_t* v0) { *r = vcgez_f64(*v0); }\nvoid VcgezdS64(uint64_t* r, int64_t* v0) { *r = vcgezd_s64(*v0); }\nvoid VcgezdF64(uint64_t* r, float64_t* v0) { *r = vcgezd_f64(*v0); }\nvoid VcgezqS8(uint8x16_t* r, int8x16_t* v0) { *r = vcgezq_s8(*v0); }\nvoid VcgezqS16(uint16x8_t* r, int16x8_t* v0) { *r = vcgezq_s16(*v0); }\nvoid VcgezqS32(uint32x4_t* r, int32x4_t* v0) { *r = vcgezq_s32(*v0); }\nvoid VcgezqS64(uint64x2_t* r, int64x2_t* v0) { *r = vcgezq_s64(*v0); }\nvoid VcgezqF32(uint32x4_t* r, float32x4_t* v0) { *r = vcgezq_f32(*v0); }\nvoid VcgezqF64(uint64x2_t* r, float64x2_t* v0) { *r = 
vcgezq_f64(*v0); }\nvoid VcgezsF32(uint32_t* r, float32_t* v0) { *r = vcgezs_f32(*v0); }\nvoid VcgtS8(uint8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vcgt_s8(*v0, *v1); }\nvoid VcgtS16(uint16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vcgt_s16(*v0, *v1); }\nvoid VcgtS32(uint32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vcgt_s32(*v0, *v1); }\nvoid VcgtS64(uint64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vcgt_s64(*v0, *v1); }\nvoid VcgtU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vcgt_u8(*v0, *v1); }\nvoid VcgtU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vcgt_u16(*v0, *v1); }\nvoid VcgtU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vcgt_u32(*v0, *v1); }\nvoid VcgtU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vcgt_u64(*v0, *v1); }\nvoid VcgtF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcgt_f32(*v0, *v1); }\nvoid VcgtF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcgt_f64(*v0, *v1); }\nvoid VcgtdS64(uint64_t* r, int64_t* v0, int64_t* v1) { *r = vcgtd_s64(*v0, *v1); }\nvoid VcgtdU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vcgtd_u64(*v0, *v1); }\nvoid VcgtdF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcgtd_f64(*v0, *v1); }\nvoid VcgtqS8(uint8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vcgtq_s8(*v0, *v1); }\nvoid VcgtqS16(uint16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vcgtq_s16(*v0, *v1); }\nvoid VcgtqS32(uint32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vcgtq_s32(*v0, *v1); }\nvoid VcgtqS64(uint64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vcgtq_s64(*v0, *v1); }\nvoid VcgtqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vcgtq_u8(*v0, *v1); }\nvoid VcgtqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vcgtq_u16(*v0, *v1); }\nvoid VcgtqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vcgtq_u32(*v0, *v1); }\nvoid VcgtqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vcgtq_u64(*v0, *v1); }\nvoid VcgtqF32(uint32x4_t* r, float32x4_t* 
v0, float32x4_t* v1) { *r = vcgtq_f32(*v0, *v1); }\nvoid VcgtqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcgtq_f64(*v0, *v1); }\nvoid VcgtsF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcgts_f32(*v0, *v1); }\nvoid VcgtzS8(uint8x8_t* r, int8x8_t* v0) { *r = vcgtz_s8(*v0); }\nvoid VcgtzS16(uint16x4_t* r, int16x4_t* v0) { *r = vcgtz_s16(*v0); }\nvoid VcgtzS32(uint32x2_t* r, int32x2_t* v0) { *r = vcgtz_s32(*v0); }\nvoid VcgtzS64(uint64x1_t* r, int64x1_t* v0) { *r = vcgtz_s64(*v0); }\nvoid VcgtzF32(uint32x2_t* r, float32x2_t* v0) { *r = vcgtz_f32(*v0); }\nvoid VcgtzF64(uint64x1_t* r, float64x1_t* v0) { *r = vcgtz_f64(*v0); }\nvoid VcgtzdS64(uint64_t* r, int64_t* v0) { *r = vcgtzd_s64(*v0); }\nvoid VcgtzdF64(uint64_t* r, float64_t* v0) { *r = vcgtzd_f64(*v0); }\nvoid VcgtzqS8(uint8x16_t* r, int8x16_t* v0) { *r = vcgtzq_s8(*v0); }\nvoid VcgtzqS16(uint16x8_t* r, int16x8_t* v0) { *r = vcgtzq_s16(*v0); }\nvoid VcgtzqS32(uint32x4_t* r, int32x4_t* v0) { *r = vcgtzq_s32(*v0); }\nvoid VcgtzqS64(uint64x2_t* r, int64x2_t* v0) { *r = vcgtzq_s64(*v0); }\nvoid VcgtzqF32(uint32x4_t* r, float32x4_t* v0) { *r = vcgtzq_f32(*v0); }\nvoid VcgtzqF64(uint64x2_t* r, float64x2_t* v0) { *r = vcgtzq_f64(*v0); }\nvoid VcgtzsF32(uint32_t* r, float32_t* v0) { *r = vcgtzs_f32(*v0); }\nvoid VcleS8(uint8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vcle_s8(*v0, *v1); }\nvoid VcleS16(uint16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vcle_s16(*v0, *v1); }\nvoid VcleS32(uint32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vcle_s32(*v0, *v1); }\nvoid VcleS64(uint64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vcle_s64(*v0, *v1); }\nvoid VcleU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vcle_u8(*v0, *v1); }\nvoid VcleU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vcle_u16(*v0, *v1); }\nvoid VcleU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vcle_u32(*v0, *v1); }\nvoid VcleU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vcle_u64(*v0, *v1); 
}\nvoid VcleF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcle_f32(*v0, *v1); }\nvoid VcleF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcle_f64(*v0, *v1); }\nvoid VcledS64(uint64_t* r, int64_t* v0, int64_t* v1) { *r = vcled_s64(*v0, *v1); }\nvoid VcledU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vcled_u64(*v0, *v1); }\nvoid VcledF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcled_f64(*v0, *v1); }\nvoid VcleqS8(uint8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vcleq_s8(*v0, *v1); }\nvoid VcleqS16(uint16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vcleq_s16(*v0, *v1); }\nvoid VcleqS32(uint32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vcleq_s32(*v0, *v1); }\nvoid VcleqS64(uint64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vcleq_s64(*v0, *v1); }\nvoid VcleqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vcleq_u8(*v0, *v1); }\nvoid VcleqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vcleq_u16(*v0, *v1); }\nvoid VcleqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vcleq_u32(*v0, *v1); }\nvoid VcleqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vcleq_u64(*v0, *v1); }\nvoid VcleqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcleq_f32(*v0, *v1); }\nvoid VcleqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcleq_f64(*v0, *v1); }\nvoid VclesF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcles_f32(*v0, *v1); }\nvoid VclezS8(uint8x8_t* r, int8x8_t* v0) { *r = vclez_s8(*v0); }\nvoid VclezS16(uint16x4_t* r, int16x4_t* v0) { *r = vclez_s16(*v0); }\nvoid VclezS32(uint32x2_t* r, int32x2_t* v0) { *r = vclez_s32(*v0); }\nvoid VclezS64(uint64x1_t* r, int64x1_t* v0) { *r = vclez_s64(*v0); }\nvoid VclezF32(uint32x2_t* r, float32x2_t* v0) { *r = vclez_f32(*v0); }\nvoid VclezF64(uint64x1_t* r, float64x1_t* v0) { *r = vclez_f64(*v0); }\nvoid VclezdS64(uint64_t* r, int64_t* v0) { *r = vclezd_s64(*v0); }\nvoid VclezdF64(uint64_t* r, float64_t* v0) { *r = vclezd_f64(*v0); 
}\nvoid VclezqS8(uint8x16_t* r, int8x16_t* v0) { *r = vclezq_s8(*v0); }\nvoid VclezqS16(uint16x8_t* r, int16x8_t* v0) { *r = vclezq_s16(*v0); }\nvoid VclezqS32(uint32x4_t* r, int32x4_t* v0) { *r = vclezq_s32(*v0); }\nvoid VclezqS64(uint64x2_t* r, int64x2_t* v0) { *r = vclezq_s64(*v0); }\nvoid VclezqF32(uint32x4_t* r, float32x4_t* v0) { *r = vclezq_f32(*v0); }\nvoid VclezqF64(uint64x2_t* r, float64x2_t* v0) { *r = vclezq_f64(*v0); }\nvoid VclezsF32(uint32_t* r, float32_t* v0) { *r = vclezs_f32(*v0); }\nvoid VclsS8(int8x8_t* r, int8x8_t* v0) { *r = vcls_s8(*v0); }\nvoid VclsS16(int16x4_t* r, int16x4_t* v0) { *r = vcls_s16(*v0); }\nvoid VclsS32(int32x2_t* r, int32x2_t* v0) { *r = vcls_s32(*v0); }\nvoid VclsU8(int8x8_t* r, uint8x8_t* v0) { *r = vcls_u8(*v0); }\nvoid VclsU16(int16x4_t* r, uint16x4_t* v0) { *r = vcls_u16(*v0); }\nvoid VclsU32(int32x2_t* r, uint32x2_t* v0) { *r = vcls_u32(*v0); }\nvoid VclsqS8(int8x16_t* r, int8x16_t* v0) { *r = vclsq_s8(*v0); }\nvoid VclsqS16(int16x8_t* r, int16x8_t* v0) { *r = vclsq_s16(*v0); }\nvoid VclsqS32(int32x4_t* r, int32x4_t* v0) { *r = vclsq_s32(*v0); }\nvoid VclsqU8(int8x16_t* r, uint8x16_t* v0) { *r = vclsq_u8(*v0); }\nvoid VclsqU16(int16x8_t* r, uint16x8_t* v0) { *r = vclsq_u16(*v0); }\nvoid VclsqU32(int32x4_t* r, uint32x4_t* v0) { *r = vclsq_u32(*v0); }\nvoid VcltS8(uint8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vclt_s8(*v0, *v1); }\nvoid VcltS16(uint16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vclt_s16(*v0, *v1); }\nvoid VcltS32(uint32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vclt_s32(*v0, *v1); }\nvoid VcltS64(uint64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vclt_s64(*v0, *v1); }\nvoid VcltU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vclt_u8(*v0, *v1); }\nvoid VcltU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vclt_u16(*v0, *v1); }\nvoid VcltU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vclt_u32(*v0, *v1); }\nvoid VcltU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = 
vclt_u64(*v0, *v1); }\nvoid VcltF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vclt_f32(*v0, *v1); }\nvoid VcltF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vclt_f64(*v0, *v1); }\nvoid VcltdS64(uint64_t* r, int64_t* v0, int64_t* v1) { *r = vcltd_s64(*v0, *v1); }\nvoid VcltdU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vcltd_u64(*v0, *v1); }\nvoid VcltdF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcltd_f64(*v0, *v1); }\nvoid VcltqS8(uint8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vcltq_s8(*v0, *v1); }\nvoid VcltqS16(uint16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vcltq_s16(*v0, *v1); }\nvoid VcltqS32(uint32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vcltq_s32(*v0, *v1); }\nvoid VcltqS64(uint64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vcltq_s64(*v0, *v1); }\nvoid VcltqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vcltq_u8(*v0, *v1); }\nvoid VcltqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vcltq_u16(*v0, *v1); }\nvoid VcltqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vcltq_u32(*v0, *v1); }\nvoid VcltqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vcltq_u64(*v0, *v1); }\nvoid VcltqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcltq_f32(*v0, *v1); }\nvoid VcltqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcltq_f64(*v0, *v1); }\nvoid VcltsF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vclts_f32(*v0, *v1); }\nvoid VcltzS8(uint8x8_t* r, int8x8_t* v0) { *r = vcltz_s8(*v0); }\nvoid VcltzS16(uint16x4_t* r, int16x4_t* v0) { *r = vcltz_s16(*v0); }\nvoid VcltzS32(uint32x2_t* r, int32x2_t* v0) { *r = vcltz_s32(*v0); }\nvoid VcltzS64(uint64x1_t* r, int64x1_t* v0) { *r = vcltz_s64(*v0); }\nvoid VcltzF32(uint32x2_t* r, float32x2_t* v0) { *r = vcltz_f32(*v0); }\nvoid VcltzF64(uint64x1_t* r, float64x1_t* v0) { *r = vcltz_f64(*v0); }\nvoid VcltzdS64(uint64_t* r, int64_t* v0) { *r = vcltzd_s64(*v0); }\nvoid VcltzdF64(uint64_t* r, float64_t* v0) { 
*r = vcltzd_f64(*v0); }\nvoid VcltzqS8(uint8x16_t* r, int8x16_t* v0) { *r = vcltzq_s8(*v0); }\nvoid VcltzqS16(uint16x8_t* r, int16x8_t* v0) { *r = vcltzq_s16(*v0); }\nvoid VcltzqS32(uint32x4_t* r, int32x4_t* v0) { *r = vcltzq_s32(*v0); }\nvoid VcltzqS64(uint64x2_t* r, int64x2_t* v0) { *r = vcltzq_s64(*v0); }\nvoid VcltzqF32(uint32x4_t* r, float32x4_t* v0) { *r = vcltzq_f32(*v0); }\nvoid VcltzqF64(uint64x2_t* r, float64x2_t* v0) { *r = vcltzq_f64(*v0); }\nvoid VcltzsF32(uint32_t* r, float32_t* v0) { *r = vcltzs_f32(*v0); }\nvoid VclzS8(int8x8_t* r, int8x8_t* v0) { *r = vclz_s8(*v0); }\nvoid VclzS16(int16x4_t* r, int16x4_t* v0) { *r = vclz_s16(*v0); }\nvoid VclzS32(int32x2_t* r, int32x2_t* v0) { *r = vclz_s32(*v0); }\nvoid VclzU8(uint8x8_t* r, uint8x8_t* v0) { *r = vclz_u8(*v0); }\nvoid VclzU16(uint16x4_t* r, uint16x4_t* v0) { *r = vclz_u16(*v0); }\nvoid VclzU32(uint32x2_t* r, uint32x2_t* v0) { *r = vclz_u32(*v0); }\nvoid VclzqS8(int8x16_t* r, int8x16_t* v0) { *r = vclzq_s8(*v0); }\nvoid VclzqS16(int16x8_t* r, int16x8_t* v0) { *r = vclzq_s16(*v0); }\nvoid VclzqS32(int32x4_t* r, int32x4_t* v0) { *r = vclzq_s32(*v0); }\nvoid VclzqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vclzq_u8(*v0); }\nvoid VclzqU16(uint16x8_t* r, uint16x8_t* v0) { *r = vclzq_u16(*v0); }\nvoid VclzqU32(uint32x4_t* r, uint32x4_t* v0) { *r = vclzq_u32(*v0); }\nvoid VcntS8(int8x8_t* r, int8x8_t* v0) { *r = vcnt_s8(*v0); }\nvoid VcntU8(uint8x8_t* r, uint8x8_t* v0) { *r = vcnt_u8(*v0); }\nvoid VcntP8(poly8x8_t* r, poly8x8_t* v0) { *r = vcnt_p8(*v0); }\nvoid VcntqS8(int8x16_t* r, int8x16_t* v0) { *r = vcntq_s8(*v0); }\nvoid VcntqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vcntq_u8(*v0); }\nvoid VcntqP8(poly8x16_t* r, poly8x16_t* v0) { *r = vcntq_p8(*v0); }\nvoid VcombineS8(int8x16_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vcombine_s8(*v0, *v1); }\nvoid VcombineS16(int16x8_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vcombine_s16(*v0, *v1); }\nvoid VcombineS32(int32x4_t* r, int32x2_t* v0, int32x2_t* v1) { *r = 
vcombine_s32(*v0, *v1); }\nvoid VcombineS64(int64x2_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vcombine_s64(*v0, *v1); }\nvoid VcombineU8(uint8x16_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vcombine_u8(*v0, *v1); }\nvoid VcombineU16(uint16x8_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vcombine_u16(*v0, *v1); }\nvoid VcombineU32(uint32x4_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vcombine_u32(*v0, *v1); }\nvoid VcombineU64(uint64x2_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vcombine_u64(*v0, *v1); }\nvoid VcombineF32(float32x4_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcombine_f32(*v0, *v1); }\nvoid VcombineF64(float64x2_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcombine_f64(*v0, *v1); }\nvoid VcombineP16(poly16x8_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vcombine_p16(*v0, *v1); }\nvoid VcombineP64(poly64x2_t* r, poly64x1_t* v0, poly64x1_t* v1) { *r = vcombine_p64(*v0, *v1); }\nvoid VcombineP8(poly8x16_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vcombine_p8(*v0, *v1); }\nvoid VcvtF32S32(float32x2_t* r, int32x2_t* v0) { *r = vcvt_f32_s32(*v0); }\nvoid VcvtF32U32(float32x2_t* r, uint32x2_t* v0) { *r = vcvt_f32_u32(*v0); }\nvoid VcvtF32F64(float32x2_t* r, float64x2_t* v0) { *r = vcvt_f32_f64(*v0); }\nvoid VcvtF64S64(float64x1_t* r, int64x1_t* v0) { *r = vcvt_f64_s64(*v0); }\nvoid VcvtF64U64(float64x1_t* r, uint64x1_t* v0) { *r = vcvt_f64_u64(*v0); }\nvoid VcvtF64F32(float64x2_t* r, float32x2_t* v0) { *r = vcvt_f64_f32(*v0); }\nvoid VcvtHighF32F64(float32x4_t* r, float32x2_t* v0, float64x2_t* v1) { *r = vcvt_high_f32_f64(*v0, *v1); }\nvoid VcvtHighF64F32(float64x2_t* r, float32x4_t* v0) { *r = vcvt_high_f64_f32(*v0); }\nvoid VcvtS32F32(int32x2_t* r, float32x2_t* v0) { *r = vcvt_s32_f32(*v0); }\nvoid VcvtS64F64(int64x1_t* r, float64x1_t* v0) { *r = vcvt_s64_f64(*v0); }\nvoid VcvtU32F32(uint32x2_t* r, float32x2_t* v0) { *r = vcvt_u32_f32(*v0); }\nvoid VcvtU64F64(uint64x1_t* r, float64x1_t* v0) { *r = vcvt_u64_f64(*v0); }\nvoid VcvtaS32F32(int32x2_t* r, 
float32x2_t* v0) { *r = vcvta_s32_f32(*v0); }\nvoid VcvtaS64F64(int64x1_t* r, float64x1_t* v0) { *r = vcvta_s64_f64(*v0); }\nvoid VcvtaU32F32(uint32x2_t* r, float32x2_t* v0) { *r = vcvta_u32_f32(*v0); }\nvoid VcvtaU64F64(uint64x1_t* r, float64x1_t* v0) { *r = vcvta_u64_f64(*v0); }\nvoid VcvtadS64F64(int64_t* r, float64_t* v0) { *r = vcvtad_s64_f64(*v0); }\nvoid VcvtadU64F64(uint64_t* r, float64_t* v0) { *r = vcvtad_u64_f64(*v0); }\nvoid VcvtaqS32F32(int32x4_t* r, float32x4_t* v0) { *r = vcvtaq_s32_f32(*v0); }\nvoid VcvtaqS64F64(int64x2_t* r, float64x2_t* v0) { *r = vcvtaq_s64_f64(*v0); }\nvoid VcvtaqU32F32(uint32x4_t* r, float32x4_t* v0) { *r = vcvtaq_u32_f32(*v0); }\nvoid VcvtaqU64F64(uint64x2_t* r, float64x2_t* v0) { *r = vcvtaq_u64_f64(*v0); }\nvoid VcvtasS32F32(int32_t* r, float32_t* v0) { *r = vcvtas_s32_f32(*v0); }\nvoid VcvtasU32F32(uint32_t* r, float32_t* v0) { *r = vcvtas_u32_f32(*v0); }\nvoid VcvtdF64S64(float64_t* r, int64_t* v0) { *r = vcvtd_f64_s64(*v0); }\nvoid VcvtdF64U64(float64_t* r, uint64_t* v0) { *r = vcvtd_f64_u64(*v0); }\nvoid VcvtdS64F64(int64_t* r, float64_t* v0) { *r = vcvtd_s64_f64(*v0); }\nvoid VcvtdU64F64(uint64_t* r, float64_t* v0) { *r = vcvtd_u64_f64(*v0); }\nvoid VcvtmS32F32(int32x2_t* r, float32x2_t* v0) { *r = vcvtm_s32_f32(*v0); }\nvoid VcvtmS64F64(int64x1_t* r, float64x1_t* v0) { *r = vcvtm_s64_f64(*v0); }\nvoid VcvtmU32F32(uint32x2_t* r, float32x2_t* v0) { *r = vcvtm_u32_f32(*v0); }\nvoid VcvtmU64F64(uint64x1_t* r, float64x1_t* v0) { *r = vcvtm_u64_f64(*v0); }\nvoid VcvtmdS64F64(int64_t* r, float64_t* v0) { *r = vcvtmd_s64_f64(*v0); }\nvoid VcvtmdU64F64(uint64_t* r, float64_t* v0) { *r = vcvtmd_u64_f64(*v0); }\nvoid VcvtmqS32F32(int32x4_t* r, float32x4_t* v0) { *r = vcvtmq_s32_f32(*v0); }\nvoid VcvtmqS64F64(int64x2_t* r, float64x2_t* v0) { *r = vcvtmq_s64_f64(*v0); }\nvoid VcvtmqU32F32(uint32x4_t* r, float32x4_t* v0) { *r = vcvtmq_u32_f32(*v0); }\nvoid VcvtmqU64F64(uint64x2_t* r, float64x2_t* v0) { *r = vcvtmq_u64_f64(*v0); 
}\nvoid VcvtmsS32F32(int32_t* r, float32_t* v0) { *r = vcvtms_s32_f32(*v0); }\nvoid VcvtmsU32F32(uint32_t* r, float32_t* v0) { *r = vcvtms_u32_f32(*v0); }\nvoid VcvtnS32F32(int32x2_t* r, float32x2_t* v0) { *r = vcvtn_s32_f32(*v0); }\nvoid VcvtnS64F64(int64x1_t* r, float64x1_t* v0) { *r = vcvtn_s64_f64(*v0); }\nvoid VcvtnU32F32(uint32x2_t* r, float32x2_t* v0) { *r = vcvtn_u32_f32(*v0); }\nvoid VcvtnU64F64(uint64x1_t* r, float64x1_t* v0) { *r = vcvtn_u64_f64(*v0); }\nvoid VcvtndS64F64(int64_t* r, float64_t* v0) { *r = vcvtnd_s64_f64(*v0); }\nvoid VcvtndU64F64(uint64_t* r, float64_t* v0) { *r = vcvtnd_u64_f64(*v0); }\nvoid VcvtnqS32F32(int32x4_t* r, float32x4_t* v0) { *r = vcvtnq_s32_f32(*v0); }\nvoid VcvtnqS64F64(int64x2_t* r, float64x2_t* v0) { *r = vcvtnq_s64_f64(*v0); }\nvoid VcvtnqU32F32(uint32x4_t* r, float32x4_t* v0) { *r = vcvtnq_u32_f32(*v0); }\nvoid VcvtnqU64F64(uint64x2_t* r, float64x2_t* v0) { *r = vcvtnq_u64_f64(*v0); }\nvoid VcvtnsS32F32(int32_t* r, float32_t* v0) { *r = vcvtns_s32_f32(*v0); }\nvoid VcvtnsU32F32(uint32_t* r, float32_t* v0) { *r = vcvtns_u32_f32(*v0); }\nvoid VcvtpS32F32(int32x2_t* r, float32x2_t* v0) { *r = vcvtp_s32_f32(*v0); }\nvoid VcvtpS64F64(int64x1_t* r, float64x1_t* v0) { *r = vcvtp_s64_f64(*v0); }\nvoid VcvtpU32F32(uint32x2_t* r, float32x2_t* v0) { *r = vcvtp_u32_f32(*v0); }\nvoid VcvtpU64F64(uint64x1_t* r, float64x1_t* v0) { *r = vcvtp_u64_f64(*v0); }\nvoid VcvtpdS64F64(int64_t* r, float64_t* v0) { *r = vcvtpd_s64_f64(*v0); }\nvoid VcvtpdU64F64(uint64_t* r, float64_t* v0) { *r = vcvtpd_u64_f64(*v0); }\nvoid VcvtpqS32F32(int32x4_t* r, float32x4_t* v0) { *r = vcvtpq_s32_f32(*v0); }\nvoid VcvtpqS64F64(int64x2_t* r, float64x2_t* v0) { *r = vcvtpq_s64_f64(*v0); }\nvoid VcvtpqU32F32(uint32x4_t* r, float32x4_t* v0) { *r = vcvtpq_u32_f32(*v0); }\nvoid VcvtpqU64F64(uint64x2_t* r, float64x2_t* v0) { *r = vcvtpq_u64_f64(*v0); }\nvoid VcvtpsS32F32(int32_t* r, float32_t* v0) { *r = vcvtps_s32_f32(*v0); }\nvoid VcvtpsU32F32(uint32_t* r, 
float32_t* v0) { *r = vcvtps_u32_f32(*v0); }\nvoid VcvtqF32S32(float32x4_t* r, int32x4_t* v0) { *r = vcvtq_f32_s32(*v0); }\nvoid VcvtqF32U32(float32x4_t* r, uint32x4_t* v0) { *r = vcvtq_f32_u32(*v0); }\nvoid VcvtqF64S64(float64x2_t* r, int64x2_t* v0) { *r = vcvtq_f64_s64(*v0); }\nvoid VcvtqF64U64(float64x2_t* r, uint64x2_t* v0) { *r = vcvtq_f64_u64(*v0); }\nvoid VcvtqS32F32(int32x4_t* r, float32x4_t* v0) { *r = vcvtq_s32_f32(*v0); }\nvoid VcvtqS64F64(int64x2_t* r, float64x2_t* v0) { *r = vcvtq_s64_f64(*v0); }\nvoid VcvtqU32F32(uint32x4_t* r, float32x4_t* v0) { *r = vcvtq_u32_f32(*v0); }\nvoid VcvtqU64F64(uint64x2_t* r, float64x2_t* v0) { *r = vcvtq_u64_f64(*v0); }\nvoid VcvtsF32S32(float32_t* r, int32_t* v0) { *r = vcvts_f32_s32(*v0); }\nvoid VcvtsF32U32(float32_t* r, uint32_t* v0) { *r = vcvts_f32_u32(*v0); }\nvoid VcvtsS32F32(int32_t* r, float32_t* v0) { *r = vcvts_s32_f32(*v0); }\nvoid VcvtsU32F32(uint32_t* r, float32_t* v0) { *r = vcvts_u32_f32(*v0); }\nvoid VcvtxF32F64(float32x2_t* r, float64x2_t* v0) { *r = vcvtx_f32_f64(*v0); }\nvoid VcvtxHighF32F64(float32x4_t* r, float32x2_t* v0, float64x2_t* v1) { *r = vcvtx_high_f32_f64(*v0, *v1); }\nvoid VcvtxdF32F64(float32_t* r, float64_t* v0) { *r = vcvtxd_f32_f64(*v0); }\nvoid VdivF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vdiv_f32(*v0, *v1); }\nvoid VdivF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vdiv_f64(*v0, *v1); }\nvoid VdivqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vdivq_f32(*v0, *v1); }\nvoid VdivqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vdivq_f64(*v0, *v1); }\nvoid VdotS32(int32x2_t* r, int32x2_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vdot_s32(*v0, *v1, *v2); }\nvoid VdotU32(uint32x2_t* r, uint32x2_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vdot_u32(*v0, *v1, *v2); }\nvoid VdotqS32(int32x4_t* r, int32x4_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vdotq_s32(*v0, *v1, *v2); }\nvoid VdotqU32(uint32x4_t* r, uint32x4_t* v0, uint8x16_t* 
v1, uint8x16_t* v2) { *r = vdotq_u32(*v0, *v1, *v2); }\nvoid VdupNS8(int8x8_t* r, int8_t* v0) { *r = vdup_n_s8(*v0); }\nvoid VdupNS16(int16x4_t* r, int16_t* v0) { *r = vdup_n_s16(*v0); }\nvoid VdupNS32(int32x2_t* r, int32_t* v0) { *r = vdup_n_s32(*v0); }\nvoid VdupNS64(int64x1_t* r, int64_t* v0) { *r = vdup_n_s64(*v0); }\nvoid VdupNU8(uint8x8_t* r, uint8_t* v0) { *r = vdup_n_u8(*v0); }\nvoid VdupNU16(uint16x4_t* r, uint16_t* v0) { *r = vdup_n_u16(*v0); }\nvoid VdupNU32(uint32x2_t* r, uint32_t* v0) { *r = vdup_n_u32(*v0); }\nvoid VdupNU64(uint64x1_t* r, uint64_t* v0) { *r = vdup_n_u64(*v0); }\nvoid VdupNF32(float32x2_t* r, float32_t* v0) { *r = vdup_n_f32(*v0); }\nvoid VdupNF64(float64x1_t* r, float64_t* v0) { *r = vdup_n_f64(*v0); }\nvoid VdupNP16(poly16x4_t* r, poly16_t* v0) { *r = vdup_n_p16(*v0); }\nvoid VdupNP64(poly64x1_t* r, poly64_t* v0) { *r = vdup_n_p64(*v0); }\nvoid VdupNP8(poly8x8_t* r, poly8_t* v0) { *r = vdup_n_p8(*v0); }\nvoid VdupqNS8(int8x16_t* r, int8_t* v0) { *r = vdupq_n_s8(*v0); }\nvoid VdupqNS16(int16x8_t* r, int16_t* v0) { *r = vdupq_n_s16(*v0); }\nvoid VdupqNS32(int32x4_t* r, int32_t* v0) { *r = vdupq_n_s32(*v0); }\nvoid VdupqNS64(int64x2_t* r, int64_t* v0) { *r = vdupq_n_s64(*v0); }\nvoid VdupqNU8(uint8x16_t* r, uint8_t* v0) { *r = vdupq_n_u8(*v0); }\nvoid VdupqNU16(uint16x8_t* r, uint16_t* v0) { *r = vdupq_n_u16(*v0); }\nvoid VdupqNU32(uint32x4_t* r, uint32_t* v0) { *r = vdupq_n_u32(*v0); }\nvoid VdupqNU64(uint64x2_t* r, uint64_t* v0) { *r = vdupq_n_u64(*v0); }\nvoid VdupqNF32(float32x4_t* r, float32_t* v0) { *r = vdupq_n_f32(*v0); }\nvoid VdupqNF64(float64x2_t* r, float64_t* v0) { *r = vdupq_n_f64(*v0); }\nvoid VdupqNP16(poly16x8_t* r, poly16_t* v0) { *r = vdupq_n_p16(*v0); }\nvoid VdupqNP64(poly64x2_t* r, poly64_t* v0) { *r = vdupq_n_p64(*v0); }\nvoid VdupqNP8(poly8x16_t* r, poly8_t* v0) { *r = vdupq_n_p8(*v0); }\nvoid VeorS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = veor_s8(*v0, *v1); }\nvoid VeorS16(int16x4_t* r, int16x4_t* v0, 
int16x4_t* v1) { *r = veor_s16(*v0, *v1); }\nvoid VeorS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = veor_s32(*v0, *v1); }\nvoid VeorS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = veor_s64(*v0, *v1); }\nvoid VeorU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = veor_u8(*v0, *v1); }\nvoid VeorU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = veor_u16(*v0, *v1); }\nvoid VeorU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = veor_u32(*v0, *v1); }\nvoid VeorU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = veor_u64(*v0, *v1); }\nvoid Veor3QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = veor3q_s8(*v0, *v1, *v2); }\nvoid Veor3QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = veor3q_s16(*v0, *v1, *v2); }\nvoid Veor3QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = veor3q_s32(*v0, *v1, *v2); }\nvoid Veor3QS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = veor3q_s64(*v0, *v1, *v2); }\nvoid Veor3QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = veor3q_u8(*v0, *v1, *v2); }\nvoid Veor3QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = veor3q_u16(*v0, *v1, *v2); }\nvoid Veor3QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = veor3q_u32(*v0, *v1, *v2); }\nvoid Veor3QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = veor3q_u64(*v0, *v1, *v2); }\nvoid VeorqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = veorq_s8(*v0, *v1); }\nvoid VeorqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = veorq_s16(*v0, *v1); }\nvoid VeorqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = veorq_s32(*v0, *v1); }\nvoid VeorqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = veorq_s64(*v0, *v1); }\nvoid VeorqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = veorq_u8(*v0, *v1); }\nvoid VeorqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = 
veorq_u16(*v0, *v1); }\nvoid VeorqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = veorq_u32(*v0, *v1); }\nvoid VeorqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = veorq_u64(*v0, *v1); }\nvoid VfmaF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32x2_t* v2) { *r = vfma_f32(*v0, *v1, *v2); }\nvoid VfmaF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1, float64x1_t* v2) { *r = vfma_f64(*v0, *v1, *v2); }\nvoid VfmaNF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32_t* v2) { *r = vfma_n_f32(*v0, *v1, *v2); }\nvoid VfmaNF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1, float64_t* v2) { *r = vfma_n_f64(*v0, *v1, *v2); }\nvoid VfmaqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32x4_t* v2) { *r = vfmaq_f32(*v0, *v1, *v2); }\nvoid VfmaqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1, float64x2_t* v2) { *r = vfmaq_f64(*v0, *v1, *v2); }\nvoid VfmaqNF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32_t* v2) { *r = vfmaq_n_f32(*v0, *v1, *v2); }\nvoid VfmaqNF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1, float64_t* v2) { *r = vfmaq_n_f64(*v0, *v1, *v2); }\nvoid VfmsF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32x2_t* v2) { *r = vfms_f32(*v0, *v1, *v2); }\nvoid VfmsF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1, float64x1_t* v2) { *r = vfms_f64(*v0, *v1, *v2); }\nvoid VfmsNF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32_t* v2) { *r = vfms_n_f32(*v0, *v1, *v2); }\nvoid VfmsNF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1, float64_t* v2) { *r = vfms_n_f64(*v0, *v1, *v2); }\nvoid VfmsqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32x4_t* v2) { *r = vfmsq_f32(*v0, *v1, *v2); }\nvoid VfmsqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1, float64x2_t* v2) { *r = vfmsq_f64(*v0, *v1, *v2); }\nvoid VfmsqNF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32_t* v2) { *r = vfmsq_n_f32(*v0, *v1, *v2); }\nvoid VfmsqNF64(float64x2_t* r, float64x2_t* 
v0, float64x2_t* v1, float64_t* v2) { *r = vfmsq_n_f64(*v0, *v1, *v2); }\nvoid VgetHighS8(int8x8_t* r, int8x16_t* v0) { *r = vget_high_s8(*v0); }\nvoid VgetHighS16(int16x4_t* r, int16x8_t* v0) { *r = vget_high_s16(*v0); }\nvoid VgetHighS32(int32x2_t* r, int32x4_t* v0) { *r = vget_high_s32(*v0); }\nvoid VgetHighS64(int64x1_t* r, int64x2_t* v0) { *r = vget_high_s64(*v0); }\nvoid VgetHighU8(uint8x8_t* r, uint8x16_t* v0) { *r = vget_high_u8(*v0); }\nvoid VgetHighU16(uint16x4_t* r, uint16x8_t* v0) { *r = vget_high_u16(*v0); }\nvoid VgetHighU32(uint32x2_t* r, uint32x4_t* v0) { *r = vget_high_u32(*v0); }\nvoid VgetHighU64(uint64x1_t* r, uint64x2_t* v0) { *r = vget_high_u64(*v0); }\nvoid VgetHighF32(float32x2_t* r, float32x4_t* v0) { *r = vget_high_f32(*v0); }\nvoid VgetHighF64(float64x1_t* r, float64x2_t* v0) { *r = vget_high_f64(*v0); }\nvoid VgetHighP16(poly16x4_t* r, poly16x8_t* v0) { *r = vget_high_p16(*v0); }\nvoid VgetHighP64(poly64x1_t* r, poly64x2_t* v0) { *r = vget_high_p64(*v0); }\nvoid VgetHighP8(poly8x8_t* r, poly8x16_t* v0) { *r = vget_high_p8(*v0); }\nvoid VgetLowS8(int8x8_t* r, int8x16_t* v0) { *r = vget_low_s8(*v0); }\nvoid VgetLowS16(int16x4_t* r, int16x8_t* v0) { *r = vget_low_s16(*v0); }\nvoid VgetLowS32(int32x2_t* r, int32x4_t* v0) { *r = vget_low_s32(*v0); }\nvoid VgetLowS64(int64x1_t* r, int64x2_t* v0) { *r = vget_low_s64(*v0); }\nvoid VgetLowU8(uint8x8_t* r, uint8x16_t* v0) { *r = vget_low_u8(*v0); }\nvoid VgetLowU16(uint16x4_t* r, uint16x8_t* v0) { *r = vget_low_u16(*v0); }\nvoid VgetLowU32(uint32x2_t* r, uint32x4_t* v0) { *r = vget_low_u32(*v0); }\nvoid VgetLowU64(uint64x1_t* r, uint64x2_t* v0) { *r = vget_low_u64(*v0); }\nvoid VgetLowF32(float32x2_t* r, float32x4_t* v0) { *r = vget_low_f32(*v0); }\nvoid VgetLowF64(float64x1_t* r, float64x2_t* v0) { *r = vget_low_f64(*v0); }\nvoid VgetLowP16(poly16x4_t* r, poly16x8_t* v0) { *r = vget_low_p16(*v0); }\nvoid VgetLowP64(poly64x1_t* r, poly64x2_t* v0) { *r = vget_low_p64(*v0); }\nvoid 
VgetLowP8(poly8x8_t* r, poly8x16_t* v0) { *r = vget_low_p8(*v0); }\nvoid VhaddS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vhadd_s8(*v0, *v1); }\nvoid VhaddS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vhadd_s16(*v0, *v1); }\nvoid VhaddS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vhadd_s32(*v0, *v1); }\nvoid VhaddU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vhadd_u8(*v0, *v1); }\nvoid VhaddU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vhadd_u16(*v0, *v1); }\nvoid VhaddU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vhadd_u32(*v0, *v1); }\nvoid VhaddqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vhaddq_s8(*v0, *v1); }\nvoid VhaddqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vhaddq_s16(*v0, *v1); }\nvoid VhaddqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vhaddq_s32(*v0, *v1); }\nvoid VhaddqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vhaddq_u8(*v0, *v1); }\nvoid VhaddqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vhaddq_u16(*v0, *v1); }\nvoid VhaddqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vhaddq_u32(*v0, *v1); }\nvoid VhsubS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vhsub_s8(*v0, *v1); }\nvoid VhsubS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vhsub_s16(*v0, *v1); }\nvoid VhsubS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vhsub_s32(*v0, *v1); }\nvoid VhsubU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vhsub_u8(*v0, *v1); }\nvoid VhsubU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vhsub_u16(*v0, *v1); }\nvoid VhsubU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vhsub_u32(*v0, *v1); }\nvoid VhsubqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vhsubq_s8(*v0, *v1); }\nvoid VhsubqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vhsubq_s16(*v0, *v1); }\nvoid VhsubqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vhsubq_s32(*v0, *v1); }\nvoid VhsubqU8(uint8x16_t* r, uint8x16_t* 
v0, uint8x16_t* v1) { *r = vhsubq_u8(*v0, *v1); }\nvoid VhsubqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vhsubq_u16(*v0, *v1); }\nvoid VhsubqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vhsubq_u32(*v0, *v1); }\nvoid VmaxS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vmax_s8(*v0, *v1); }\nvoid VmaxS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vmax_s16(*v0, *v1); }\nvoid VmaxS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vmax_s32(*v0, *v1); }\nvoid VmaxU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vmax_u8(*v0, *v1); }\nvoid VmaxU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vmax_u16(*v0, *v1); }\nvoid VmaxU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vmax_u32(*v0, *v1); }\nvoid VmaxF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vmax_f32(*v0, *v1); }\nvoid VmaxF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vmax_f64(*v0, *v1); }\nvoid VmaxnmF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vmaxnm_f32(*v0, *v1); }\nvoid VmaxnmF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vmaxnm_f64(*v0, *v1); }\nvoid VmaxnmqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vmaxnmq_f32(*v0, *v1); }\nvoid VmaxnmqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vmaxnmq_f64(*v0, *v1); }\nvoid VmaxnmvF32(float32_t* r, float32x2_t* v0) { *r = vmaxnmv_f32(*v0); }\nvoid VmaxnmvqF32(float32_t* r, float32x4_t* v0) { *r = vmaxnmvq_f32(*v0); }\nvoid VmaxnmvqF64(float64_t* r, float64x2_t* v0) { *r = vmaxnmvq_f64(*v0); }\nvoid VmaxqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vmaxq_s8(*v0, *v1); }\nvoid VmaxqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vmaxq_s16(*v0, *v1); }\nvoid VmaxqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vmaxq_s32(*v0, *v1); }\nvoid VmaxqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vmaxq_u8(*v0, *v1); }\nvoid VmaxqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = 
vmaxq_u16(*v0, *v1); }\nvoid VmaxqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vmaxq_u32(*v0, *v1); }\nvoid VmaxqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vmaxq_f32(*v0, *v1); }\nvoid VmaxqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vmaxq_f64(*v0, *v1); }\nvoid VmaxvS8(int8_t* r, int8x8_t* v0) { *r = vmaxv_s8(*v0); }\nvoid VmaxvS16(int16_t* r, int16x4_t* v0) { *r = vmaxv_s16(*v0); }\nvoid VmaxvS32(int32_t* r, int32x2_t* v0) { *r = vmaxv_s32(*v0); }\nvoid VmaxvU8(uint8_t* r, uint8x8_t* v0) { *r = vmaxv_u8(*v0); }\nvoid VmaxvU16(uint16_t* r, uint16x4_t* v0) { *r = vmaxv_u16(*v0); }\nvoid VmaxvU32(uint32_t* r, uint32x2_t* v0) { *r = vmaxv_u32(*v0); }\nvoid VmaxvF32(float32_t* r, float32x2_t* v0) { *r = vmaxv_f32(*v0); }\nvoid VmaxvqS8(int8_t* r, int8x16_t* v0) { *r = vmaxvq_s8(*v0); }\nvoid VmaxvqS16(int16_t* r, int16x8_t* v0) { *r = vmaxvq_s16(*v0); }\nvoid VmaxvqS32(int32_t* r, int32x4_t* v0) { *r = vmaxvq_s32(*v0); }\nvoid VmaxvqU8(uint8_t* r, uint8x16_t* v0) { *r = vmaxvq_u8(*v0); }\nvoid VmaxvqU16(uint16_t* r, uint16x8_t* v0) { *r = vmaxvq_u16(*v0); }\nvoid VmaxvqU32(uint32_t* r, uint32x4_t* v0) { *r = vmaxvq_u32(*v0); }\nvoid VmaxvqF32(float32_t* r, float32x4_t* v0) { *r = vmaxvq_f32(*v0); }\nvoid VmaxvqF64(float64_t* r, float64x2_t* v0) { *r = vmaxvq_f64(*v0); }\nvoid VminS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vmin_s8(*v0, *v1); }\nvoid VminS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vmin_s16(*v0, *v1); }\nvoid VminS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vmin_s32(*v0, *v1); }\nvoid VminU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vmin_u8(*v0, *v1); }\nvoid VminU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vmin_u16(*v0, *v1); }\nvoid VminU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vmin_u32(*v0, *v1); }\nvoid VminF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vmin_f32(*v0, *v1); }\nvoid VminF64(float64x1_t* r, float64x1_t* v0, 
float64x1_t* v1) { *r = vmin_f64(*v0, *v1); }\nvoid VminnmF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vminnm_f32(*v0, *v1); }\nvoid VminnmF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vminnm_f64(*v0, *v1); }\nvoid VminnmqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vminnmq_f32(*v0, *v1); }\nvoid VminnmqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vminnmq_f64(*v0, *v1); }\nvoid VminnmvF32(float32_t* r, float32x2_t* v0) { *r = vminnmv_f32(*v0); }\nvoid VminnmvqF32(float32_t* r, float32x4_t* v0) { *r = vminnmvq_f32(*v0); }\nvoid VminnmvqF64(float64_t* r, float64x2_t* v0) { *r = vminnmvq_f64(*v0); }\nvoid VminqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vminq_s8(*v0, *v1); }\nvoid VminqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vminq_s16(*v0, *v1); }\nvoid VminqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vminq_s32(*v0, *v1); }\nvoid VminqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vminq_u8(*v0, *v1); }\nvoid VminqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vminq_u16(*v0, *v1); }\nvoid VminqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vminq_u32(*v0, *v1); }\nvoid VminqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vminq_f32(*v0, *v1); }\nvoid VminqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vminq_f64(*v0, *v1); }\nvoid VminvS8(int8_t* r, int8x8_t* v0) { *r = vminv_s8(*v0); }\nvoid VminvS16(int16_t* r, int16x4_t* v0) { *r = vminv_s16(*v0); }\nvoid VminvS32(int32_t* r, int32x2_t* v0) { *r = vminv_s32(*v0); }\nvoid VminvU8(uint8_t* r, uint8x8_t* v0) { *r = vminv_u8(*v0); }\nvoid VminvU16(uint16_t* r, uint16x4_t* v0) { *r = vminv_u16(*v0); }\nvoid VminvU32(uint32_t* r, uint32x2_t* v0) { *r = vminv_u32(*v0); }\nvoid VminvF32(float32_t* r, float32x2_t* v0) { *r = vminv_f32(*v0); }\nvoid VminvqS8(int8_t* r, int8x16_t* v0) { *r = vminvq_s8(*v0); }\nvoid VminvqS16(int16_t* r, int16x8_t* v0) { *r = vminvq_s16(*v0); 
}\nvoid VminvqS32(int32_t* r, int32x4_t* v0) { *r = vminvq_s32(*v0); }\nvoid VminvqU8(uint8_t* r, uint8x16_t* v0) { *r = vminvq_u8(*v0); }\nvoid VminvqU16(uint16_t* r, uint16x8_t* v0) { *r = vminvq_u16(*v0); }\nvoid VminvqU32(uint32_t* r, uint32x4_t* v0) { *r = vminvq_u32(*v0); }\nvoid VminvqF32(float32_t* r, float32x4_t* v0) { *r = vminvq_f32(*v0); }\nvoid VminvqF64(float64_t* r, float64x2_t* v0) { *r = vminvq_f64(*v0); }\nvoid VmlaS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vmla_s8(*v0, *v1, *v2); }\nvoid VmlaS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vmla_s16(*v0, *v1, *v2); }\nvoid VmlaS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vmla_s32(*v0, *v1, *v2); }\nvoid VmlaU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vmla_u8(*v0, *v1, *v2); }\nvoid VmlaU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = vmla_u16(*v0, *v1, *v2); }\nvoid VmlaU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vmla_u32(*v0, *v1, *v2); }\nvoid VmlaF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32x2_t* v2) { *r = vmla_f32(*v0, *v1, *v2); }\nvoid VmlaF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1, float64x1_t* v2) { *r = vmla_f64(*v0, *v1, *v2); }\nvoid VmlaNS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16_t* v2) { *r = vmla_n_s16(*v0, *v1, *v2); }\nvoid VmlaNS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32_t* v2) { *r = vmla_n_s32(*v0, *v1, *v2); }\nvoid VmlaNU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1, uint16_t* v2) { *r = vmla_n_u16(*v0, *v1, *v2); }\nvoid VmlaNU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1, uint32_t* v2) { *r = vmla_n_u32(*v0, *v1, *v2); }\nvoid VmlaNF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32_t* v2) { *r = vmla_n_f32(*v0, *v1, *v2); }\nvoid VmlalS8(int16x8_t* r, int16x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vmlal_s8(*v0, *v1, *v2); }\nvoid VmlalS16(int32x4_t* r, int32x4_t* 
v0, int16x4_t* v1, int16x4_t* v2) { *r = vmlal_s16(*v0, *v1, *v2); }\nvoid VmlalS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vmlal_s32(*v0, *v1, *v2); }\nvoid VmlalU8(uint16x8_t* r, uint16x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vmlal_u8(*v0, *v1, *v2); }\nvoid VmlalU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = vmlal_u16(*v0, *v1, *v2); }\nvoid VmlalU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vmlal_u32(*v0, *v1, *v2); }\nvoid VmlalHighS8(int16x8_t* r, int16x8_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vmlal_high_s8(*v0, *v1, *v2); }\nvoid VmlalHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vmlal_high_s16(*v0, *v1, *v2); }\nvoid VmlalHighS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vmlal_high_s32(*v0, *v1, *v2); }\nvoid VmlalHighU8(uint16x8_t* r, uint16x8_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vmlal_high_u8(*v0, *v1, *v2); }\nvoid VmlalHighU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vmlal_high_u16(*v0, *v1, *v2); }\nvoid VmlalHighU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vmlal_high_u32(*v0, *v1, *v2); }\nvoid VmlalHighNS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16_t* v2) { *r = vmlal_high_n_s16(*v0, *v1, *v2); }\nvoid VmlalHighNS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32_t* v2) { *r = vmlal_high_n_s32(*v0, *v1, *v2); }\nvoid VmlalHighNU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1, uint16_t* v2) { *r = vmlal_high_n_u16(*v0, *v1, *v2); }\nvoid VmlalHighNU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1, uint32_t* v2) { *r = vmlal_high_n_u32(*v0, *v1, *v2); }\nvoid VmlalNS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16_t* v2) { *r = vmlal_n_s16(*v0, *v1, *v2); }\nvoid VmlalNS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32_t* v2) { *r = vmlal_n_s32(*v0, *v1, *v2); }\nvoid VmlalNU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1, 
uint16_t* v2) { *r = vmlal_n_u16(*v0, *v1, *v2); }\nvoid VmlalNU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1, uint32_t* v2) { *r = vmlal_n_u32(*v0, *v1, *v2); }\nvoid VmlaqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vmlaq_s8(*v0, *v1, *v2); }\nvoid VmlaqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vmlaq_s16(*v0, *v1, *v2); }\nvoid VmlaqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vmlaq_s32(*v0, *v1, *v2); }\nvoid VmlaqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vmlaq_u8(*v0, *v1, *v2); }\nvoid VmlaqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vmlaq_u16(*v0, *v1, *v2); }\nvoid VmlaqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vmlaq_u32(*v0, *v1, *v2); }\nvoid VmlaqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32x4_t* v2) { *r = vmlaq_f32(*v0, *v1, *v2); }\nvoid VmlaqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1, float64x2_t* v2) { *r = vmlaq_f64(*v0, *v1, *v2); }\nvoid VmlaqNS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16_t* v2) { *r = vmlaq_n_s16(*v0, *v1, *v2); }\nvoid VmlaqNS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32_t* v2) { *r = vmlaq_n_s32(*v0, *v1, *v2); }\nvoid VmlaqNU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16_t* v2) { *r = vmlaq_n_u16(*v0, *v1, *v2); }\nvoid VmlaqNU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32_t* v2) { *r = vmlaq_n_u32(*v0, *v1, *v2); }\nvoid VmlaqNF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32_t* v2) { *r = vmlaq_n_f32(*v0, *v1, *v2); }\nvoid VmlsS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vmls_s8(*v0, *v1, *v2); }\nvoid VmlsS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vmls_s16(*v0, *v1, *v2); }\nvoid VmlsS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vmls_s32(*v0, *v1, *v2); }\nvoid VmlsU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1, 
uint8x8_t* v2) { *r = vmls_u8(*v0, *v1, *v2); }\nvoid VmlsU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = vmls_u16(*v0, *v1, *v2); }\nvoid VmlsU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vmls_u32(*v0, *v1, *v2); }\nvoid VmlsF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32x2_t* v2) { *r = vmls_f32(*v0, *v1, *v2); }\nvoid VmlsF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1, float64x1_t* v2) { *r = vmls_f64(*v0, *v1, *v2); }\nvoid VmlsNS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16_t* v2) { *r = vmls_n_s16(*v0, *v1, *v2); }\nvoid VmlsNS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32_t* v2) { *r = vmls_n_s32(*v0, *v1, *v2); }\nvoid VmlsNU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1, uint16_t* v2) { *r = vmls_n_u16(*v0, *v1, *v2); }\nvoid VmlsNU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1, uint32_t* v2) { *r = vmls_n_u32(*v0, *v1, *v2); }\nvoid VmlsNF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32_t* v2) { *r = vmls_n_f32(*v0, *v1, *v2); }\nvoid VmlslS8(int16x8_t* r, int16x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vmlsl_s8(*v0, *v1, *v2); }\nvoid VmlslS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vmlsl_s16(*v0, *v1, *v2); }\nvoid VmlslS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vmlsl_s32(*v0, *v1, *v2); }\nvoid VmlslU8(uint16x8_t* r, uint16x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vmlsl_u8(*v0, *v1, *v2); }\nvoid VmlslU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = vmlsl_u16(*v0, *v1, *v2); }\nvoid VmlslU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vmlsl_u32(*v0, *v1, *v2); }\nvoid VmlslHighS8(int16x8_t* r, int16x8_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vmlsl_high_s8(*v0, *v1, *v2); }\nvoid VmlslHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vmlsl_high_s16(*v0, *v1, *v2); }\nvoid VmlslHighS32(int64x2_t* r, int64x2_t* v0, 
int32x4_t* v1, int32x4_t* v2) { *r = vmlsl_high_s32(*v0, *v1, *v2); }\nvoid VmlslHighU8(uint16x8_t* r, uint16x8_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vmlsl_high_u8(*v0, *v1, *v2); }\nvoid VmlslHighU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vmlsl_high_u16(*v0, *v1, *v2); }\nvoid VmlslHighU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vmlsl_high_u32(*v0, *v1, *v2); }\nvoid VmlslHighNS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16_t* v2) { *r = vmlsl_high_n_s16(*v0, *v1, *v2); }\nvoid VmlslHighNS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32_t* v2) { *r = vmlsl_high_n_s32(*v0, *v1, *v2); }\nvoid VmlslHighNU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1, uint16_t* v2) { *r = vmlsl_high_n_u16(*v0, *v1, *v2); }\nvoid VmlslHighNU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1, uint32_t* v2) { *r = vmlsl_high_n_u32(*v0, *v1, *v2); }\nvoid VmlslNS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16_t* v2) { *r = vmlsl_n_s16(*v0, *v1, *v2); }\nvoid VmlslNS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32_t* v2) { *r = vmlsl_n_s32(*v0, *v1, *v2); }\nvoid VmlslNU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1, uint16_t* v2) { *r = vmlsl_n_u16(*v0, *v1, *v2); }\nvoid VmlslNU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1, uint32_t* v2) { *r = vmlsl_n_u32(*v0, *v1, *v2); }\nvoid VmlsqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vmlsq_s8(*v0, *v1, *v2); }\nvoid VmlsqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vmlsq_s16(*v0, *v1, *v2); }\nvoid VmlsqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vmlsq_s32(*v0, *v1, *v2); }\nvoid VmlsqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vmlsq_u8(*v0, *v1, *v2); }\nvoid VmlsqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vmlsq_u16(*v0, *v1, *v2); }\nvoid VmlsqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = 
vmlsq_u32(*v0, *v1, *v2); }\nvoid VmlsqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32x4_t* v2) { *r = vmlsq_f32(*v0, *v1, *v2); }\nvoid VmlsqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1, float64x2_t* v2) { *r = vmlsq_f64(*v0, *v1, *v2); }\nvoid VmlsqNS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16_t* v2) { *r = vmlsq_n_s16(*v0, *v1, *v2); }\nvoid VmlsqNS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32_t* v2) { *r = vmlsq_n_s32(*v0, *v1, *v2); }\nvoid VmlsqNU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16_t* v2) { *r = vmlsq_n_u16(*v0, *v1, *v2); }\nvoid VmlsqNU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32_t* v2) { *r = vmlsq_n_u32(*v0, *v1, *v2); }\nvoid VmlsqNF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32_t* v2) { *r = vmlsq_n_f32(*v0, *v1, *v2); }\nvoid VmmlaqS32(int32x4_t* r, int32x4_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vmmlaq_s32(*v0, *v1, *v2); }\nvoid VmmlaqU32(uint32x4_t* r, uint32x4_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vmmlaq_u32(*v0, *v1, *v2); }\nvoid VmovNS8(int8x8_t* r, int8_t* v0) { *r = vmov_n_s8(*v0); }\nvoid VmovNS16(int16x4_t* r, int16_t* v0) { *r = vmov_n_s16(*v0); }\nvoid VmovNS32(int32x2_t* r, int32_t* v0) { *r = vmov_n_s32(*v0); }\nvoid VmovNS64(int64x1_t* r, int64_t* v0) { *r = vmov_n_s64(*v0); }\nvoid VmovNU8(uint8x8_t* r, uint8_t* v0) { *r = vmov_n_u8(*v0); }\nvoid VmovNU16(uint16x4_t* r, uint16_t* v0) { *r = vmov_n_u16(*v0); }\nvoid VmovNU32(uint32x2_t* r, uint32_t* v0) { *r = vmov_n_u32(*v0); }\nvoid VmovNU64(uint64x1_t* r, uint64_t* v0) { *r = vmov_n_u64(*v0); }\nvoid VmovNF32(float32x2_t* r, float32_t* v0) { *r = vmov_n_f32(*v0); }\nvoid VmovNF64(float64x1_t* r, float64_t* v0) { *r = vmov_n_f64(*v0); }\nvoid VmovNP16(poly16x4_t* r, poly16_t* v0) { *r = vmov_n_p16(*v0); }\nvoid VmovNP64(poly64x1_t* r, poly64_t* v0) { *r = vmov_n_p64(*v0); }\nvoid VmovNP8(poly8x8_t* r, poly8_t* v0) { *r = vmov_n_p8(*v0); }\nvoid VmovlS8(int16x8_t* r, int8x8_t* v0) { *r = 
vmovl_s8(*v0); }\nvoid VmovlS16(int32x4_t* r, int16x4_t* v0) { *r = vmovl_s16(*v0); }\nvoid VmovlS32(int64x2_t* r, int32x2_t* v0) { *r = vmovl_s32(*v0); }\nvoid VmovlU8(uint16x8_t* r, uint8x8_t* v0) { *r = vmovl_u8(*v0); }\nvoid VmovlU16(uint32x4_t* r, uint16x4_t* v0) { *r = vmovl_u16(*v0); }\nvoid VmovlU32(uint64x2_t* r, uint32x2_t* v0) { *r = vmovl_u32(*v0); }\nvoid VmovlHighS8(int16x8_t* r, int8x16_t* v0) { *r = vmovl_high_s8(*v0); }\nvoid VmovlHighS16(int32x4_t* r, int16x8_t* v0) { *r = vmovl_high_s16(*v0); }\nvoid VmovlHighS32(int64x2_t* r, int32x4_t* v0) { *r = vmovl_high_s32(*v0); }\nvoid VmovlHighU8(uint16x8_t* r, uint8x16_t* v0) { *r = vmovl_high_u8(*v0); }\nvoid VmovlHighU16(uint32x4_t* r, uint16x8_t* v0) { *r = vmovl_high_u16(*v0); }\nvoid VmovlHighU32(uint64x2_t* r, uint32x4_t* v0) { *r = vmovl_high_u32(*v0); }\nvoid VmovnS16(int8x8_t* r, int16x8_t* v0) { *r = vmovn_s16(*v0); }\nvoid VmovnS32(int16x4_t* r, int32x4_t* v0) { *r = vmovn_s32(*v0); }\nvoid VmovnS64(int32x2_t* r, int64x2_t* v0) { *r = vmovn_s64(*v0); }\nvoid VmovnU16(uint8x8_t* r, uint16x8_t* v0) { *r = vmovn_u16(*v0); }\nvoid VmovnU32(uint16x4_t* r, uint32x4_t* v0) { *r = vmovn_u32(*v0); }\nvoid VmovnU64(uint32x2_t* r, uint64x2_t* v0) { *r = vmovn_u64(*v0); }\nvoid VmovnHighS16(int8x16_t* r, int8x8_t* v0, int16x8_t* v1) { *r = vmovn_high_s16(*v0, *v1); }\nvoid VmovnHighS32(int16x8_t* r, int16x4_t* v0, int32x4_t* v1) { *r = vmovn_high_s32(*v0, *v1); }\nvoid VmovnHighS64(int32x4_t* r, int32x2_t* v0, int64x2_t* v1) { *r = vmovn_high_s64(*v0, *v1); }\nvoid VmovnHighU16(uint8x16_t* r, uint8x8_t* v0, uint16x8_t* v1) { *r = vmovn_high_u16(*v0, *v1); }\nvoid VmovnHighU32(uint16x8_t* r, uint16x4_t* v0, uint32x4_t* v1) { *r = vmovn_high_u32(*v0, *v1); }\nvoid VmovnHighU64(uint32x4_t* r, uint32x2_t* v0, uint64x2_t* v1) { *r = vmovn_high_u64(*v0, *v1); }\nvoid VmovqNS8(int8x16_t* r, int8_t* v0) { *r = vmovq_n_s8(*v0); }\nvoid VmovqNS16(int16x8_t* r, int16_t* v0) { *r = vmovq_n_s16(*v0); }\nvoid 
VmovqNS32(int32x4_t* r, int32_t* v0) { *r = vmovq_n_s32(*v0); }\nvoid VmovqNS64(int64x2_t* r, int64_t* v0) { *r = vmovq_n_s64(*v0); }\nvoid VmovqNU8(uint8x16_t* r, uint8_t* v0) { *r = vmovq_n_u8(*v0); }\nvoid VmovqNU16(uint16x8_t* r, uint16_t* v0) { *r = vmovq_n_u16(*v0); }\nvoid VmovqNU32(uint32x4_t* r, uint32_t* v0) { *r = vmovq_n_u32(*v0); }\nvoid VmovqNU64(uint64x2_t* r, uint64_t* v0) { *r = vmovq_n_u64(*v0); }\nvoid VmovqNF32(float32x4_t* r, float32_t* v0) { *r = vmovq_n_f32(*v0); }\nvoid VmovqNF64(float64x2_t* r, float64_t* v0) { *r = vmovq_n_f64(*v0); }\nvoid VmovqNP16(poly16x8_t* r, poly16_t* v0) { *r = vmovq_n_p16(*v0); }\nvoid VmovqNP64(poly64x2_t* r, poly64_t* v0) { *r = vmovq_n_p64(*v0); }\nvoid VmovqNP8(poly8x16_t* r, poly8_t* v0) { *r = vmovq_n_p8(*v0); }\nvoid VmulS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vmul_s8(*v0, *v1); }\nvoid VmulS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vmul_s16(*v0, *v1); }\nvoid VmulS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vmul_s32(*v0, *v1); }\nvoid VmulU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vmul_u8(*v0, *v1); }\nvoid VmulU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vmul_u16(*v0, *v1); }\nvoid VmulU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vmul_u32(*v0, *v1); }\nvoid VmulF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vmul_f32(*v0, *v1); }\nvoid VmulF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vmul_f64(*v0, *v1); }\nvoid VmulNS16(int16x4_t* r, int16x4_t* v0, int16_t* v1) { *r = vmul_n_s16(*v0, *v1); }\nvoid VmulNS32(int32x2_t* r, int32x2_t* v0, int32_t* v1) { *r = vmul_n_s32(*v0, *v1); }\nvoid VmulNU16(uint16x4_t* r, uint16x4_t* v0, uint16_t* v1) { *r = vmul_n_u16(*v0, *v1); }\nvoid VmulNU32(uint32x2_t* r, uint32x2_t* v0, uint32_t* v1) { *r = vmul_n_u32(*v0, *v1); }\nvoid VmulNF32(float32x2_t* r, float32x2_t* v0, float32_t* v1) { *r = vmul_n_f32(*v0, *v1); }\nvoid VmulNF64(float64x1_t* r, float64x1_t* v0, float64_t* 
v1) { *r = vmul_n_f64(*v0, *v1); }\nvoid VmulP8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vmul_p8(*v0, *v1); }\nvoid VmullS8(int16x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vmull_s8(*v0, *v1); }\nvoid VmullS16(int32x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vmull_s16(*v0, *v1); }\nvoid VmullS32(int64x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vmull_s32(*v0, *v1); }\nvoid VmullU8(uint16x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vmull_u8(*v0, *v1); }\nvoid VmullU16(uint32x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vmull_u16(*v0, *v1); }\nvoid VmullU32(uint64x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vmull_u32(*v0, *v1); }\nvoid VmullHighS8(int16x8_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vmull_high_s8(*v0, *v1); }\nvoid VmullHighS16(int32x4_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vmull_high_s16(*v0, *v1); }\nvoid VmullHighS32(int64x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vmull_high_s32(*v0, *v1); }\nvoid VmullHighU8(uint16x8_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vmull_high_u8(*v0, *v1); }\nvoid VmullHighU16(uint32x4_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vmull_high_u16(*v0, *v1); }\nvoid VmullHighU32(uint64x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vmull_high_u32(*v0, *v1); }\nvoid VmullHighNS16(int32x4_t* r, int16x8_t* v0, int16_t* v1) { *r = vmull_high_n_s16(*v0, *v1); }\nvoid VmullHighNS32(int64x2_t* r, int32x4_t* v0, int32_t* v1) { *r = vmull_high_n_s32(*v0, *v1); }\nvoid VmullHighNU16(uint32x4_t* r, uint16x8_t* v0, uint16_t* v1) { *r = vmull_high_n_u16(*v0, *v1); }\nvoid VmullHighNU32(uint64x2_t* r, uint32x4_t* v0, uint32_t* v1) { *r = vmull_high_n_u32(*v0, *v1); }\nvoid VmullHighP64(poly128_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vmull_high_p64(*v0, *v1); }\nvoid VmullHighP8(poly16x8_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vmull_high_p8(*v0, *v1); }\nvoid VmullNS16(int32x4_t* r, int16x4_t* v0, int16_t* v1) { *r = vmull_n_s16(*v0, *v1); }\nvoid VmullNS32(int64x2_t* r, int32x2_t* v0, int32_t* v1) { 
*r = vmull_n_s32(*v0, *v1); }\nvoid VmullNU16(uint32x4_t* r, uint16x4_t* v0, uint16_t* v1) { *r = vmull_n_u16(*v0, *v1); }\nvoid VmullNU32(uint64x2_t* r, uint32x2_t* v0, uint32_t* v1) { *r = vmull_n_u32(*v0, *v1); }\nvoid VmullP64(poly128_t* r, poly64_t* v0, poly64_t* v1) { *r = vmull_p64(*v0, *v1); }\nvoid VmullP8(poly16x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vmull_p8(*v0, *v1); }\nvoid VmulqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vmulq_s8(*v0, *v1); }\nvoid VmulqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vmulq_s16(*v0, *v1); }\nvoid VmulqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vmulq_s32(*v0, *v1); }\nvoid VmulqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vmulq_u8(*v0, *v1); }\nvoid VmulqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vmulq_u16(*v0, *v1); }\nvoid VmulqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vmulq_u32(*v0, *v1); }\nvoid VmulqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vmulq_f32(*v0, *v1); }\nvoid VmulqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vmulq_f64(*v0, *v1); }\nvoid VmulqNS16(int16x8_t* r, int16x8_t* v0, int16_t* v1) { *r = vmulq_n_s16(*v0, *v1); }\nvoid VmulqNS32(int32x4_t* r, int32x4_t* v0, int32_t* v1) { *r = vmulq_n_s32(*v0, *v1); }\nvoid VmulqNU16(uint16x8_t* r, uint16x8_t* v0, uint16_t* v1) { *r = vmulq_n_u16(*v0, *v1); }\nvoid VmulqNU32(uint32x4_t* r, uint32x4_t* v0, uint32_t* v1) { *r = vmulq_n_u32(*v0, *v1); }\nvoid VmulqNF32(float32x4_t* r, float32x4_t* v0, float32_t* v1) { *r = vmulq_n_f32(*v0, *v1); }\nvoid VmulqNF64(float64x2_t* r, float64x2_t* v0, float64_t* v1) { *r = vmulq_n_f64(*v0, *v1); }\nvoid VmulqP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vmulq_p8(*v0, *v1); }\nvoid VmulxF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vmulx_f32(*v0, *v1); }\nvoid VmulxF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vmulx_f64(*v0, *v1); }\nvoid VmulxdF64(float64_t* r, float64_t* 
v0, float64_t* v1) { *r = vmulxd_f64(*v0, *v1); }\nvoid VmulxqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vmulxq_f32(*v0, *v1); }\nvoid VmulxqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vmulxq_f64(*v0, *v1); }\nvoid VmulxsF32(float32_t* r, float32_t* v0, float32_t* v1) { *r = vmulxs_f32(*v0, *v1); }\nvoid VmvnS8(int8x8_t* r, int8x8_t* v0) { *r = vmvn_s8(*v0); }\nvoid VmvnS16(int16x4_t* r, int16x4_t* v0) { *r = vmvn_s16(*v0); }\nvoid VmvnS32(int32x2_t* r, int32x2_t* v0) { *r = vmvn_s32(*v0); }\nvoid VmvnU8(uint8x8_t* r, uint8x8_t* v0) { *r = vmvn_u8(*v0); }\nvoid VmvnU16(uint16x4_t* r, uint16x4_t* v0) { *r = vmvn_u16(*v0); }\nvoid VmvnU32(uint32x2_t* r, uint32x2_t* v0) { *r = vmvn_u32(*v0); }\nvoid VmvnP8(poly8x8_t* r, poly8x8_t* v0) { *r = vmvn_p8(*v0); }\nvoid VmvnqS8(int8x16_t* r, int8x16_t* v0) { *r = vmvnq_s8(*v0); }\nvoid VmvnqS16(int16x8_t* r, int16x8_t* v0) { *r = vmvnq_s16(*v0); }\nvoid VmvnqS32(int32x4_t* r, int32x4_t* v0) { *r = vmvnq_s32(*v0); }\nvoid VmvnqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vmvnq_u8(*v0); }\nvoid VmvnqU16(uint16x8_t* r, uint16x8_t* v0) { *r = vmvnq_u16(*v0); }\nvoid VmvnqU32(uint32x4_t* r, uint32x4_t* v0) { *r = vmvnq_u32(*v0); }\nvoid VmvnqP8(poly8x16_t* r, poly8x16_t* v0) { *r = vmvnq_p8(*v0); }\nvoid VnegS8(int8x8_t* r, int8x8_t* v0) { *r = vneg_s8(*v0); }\nvoid VnegS16(int16x4_t* r, int16x4_t* v0) { *r = vneg_s16(*v0); }\nvoid VnegS32(int32x2_t* r, int32x2_t* v0) { *r = vneg_s32(*v0); }\nvoid VnegS64(int64x1_t* r, int64x1_t* v0) { *r = vneg_s64(*v0); }\nvoid VnegF32(float32x2_t* r, float32x2_t* v0) { *r = vneg_f32(*v0); }\nvoid VnegF64(float64x1_t* r, float64x1_t* v0) { *r = vneg_f64(*v0); }\nvoid VnegdS64(int64_t* r, int64_t* v0) { *r = vnegd_s64(*v0); }\nvoid VnegqS8(int8x16_t* r, int8x16_t* v0) { *r = vnegq_s8(*v0); }\nvoid VnegqS16(int16x8_t* r, int16x8_t* v0) { *r = vnegq_s16(*v0); }\nvoid VnegqS32(int32x4_t* r, int32x4_t* v0) { *r = vnegq_s32(*v0); }\nvoid VnegqS64(int64x2_t* r, 
int64x2_t* v0) { *r = vnegq_s64(*v0); }\nvoid VnegqF32(float32x4_t* r, float32x4_t* v0) { *r = vnegq_f32(*v0); }\nvoid VnegqF64(float64x2_t* r, float64x2_t* v0) { *r = vnegq_f64(*v0); }\nvoid VornS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vorn_s8(*v0, *v1); }\nvoid VornS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vorn_s16(*v0, *v1); }\nvoid VornS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vorn_s32(*v0, *v1); }\nvoid VornS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vorn_s64(*v0, *v1); }\nvoid VornU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vorn_u8(*v0, *v1); }\nvoid VornU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vorn_u16(*v0, *v1); }\nvoid VornU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vorn_u32(*v0, *v1); }\nvoid VornU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vorn_u64(*v0, *v1); }\nvoid VornqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vornq_s8(*v0, *v1); }\nvoid VornqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vornq_s16(*v0, *v1); }\nvoid VornqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vornq_s32(*v0, *v1); }\nvoid VornqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vornq_s64(*v0, *v1); }\nvoid VornqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vornq_u8(*v0, *v1); }\nvoid VornqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vornq_u16(*v0, *v1); }\nvoid VornqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vornq_u32(*v0, *v1); }\nvoid VornqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vornq_u64(*v0, *v1); }\nvoid VorrS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vorr_s8(*v0, *v1); }\nvoid VorrS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vorr_s16(*v0, *v1); }\nvoid VorrS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vorr_s32(*v0, *v1); }\nvoid VorrS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vorr_s64(*v0, *v1); }\nvoid VorrU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* 
v1) { *r = vorr_u8(*v0, *v1); }\nvoid VorrU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vorr_u16(*v0, *v1); }\nvoid VorrU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vorr_u32(*v0, *v1); }\nvoid VorrU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vorr_u64(*v0, *v1); }\nvoid VorrqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vorrq_s8(*v0, *v1); }\nvoid VorrqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vorrq_s16(*v0, *v1); }\nvoid VorrqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vorrq_s32(*v0, *v1); }\nvoid VorrqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vorrq_s64(*v0, *v1); }\nvoid VorrqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vorrq_u8(*v0, *v1); }\nvoid VorrqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vorrq_u16(*v0, *v1); }\nvoid VorrqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vorrq_u32(*v0, *v1); }\nvoid VorrqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vorrq_u64(*v0, *v1); }\nvoid VpadalS8(int16x4_t* r, int16x4_t* v0, int8x8_t* v1) { *r = vpadal_s8(*v0, *v1); }\nvoid VpadalS16(int32x2_t* r, int32x2_t* v0, int16x4_t* v1) { *r = vpadal_s16(*v0, *v1); }\nvoid VpadalS32(int64x1_t* r, int64x1_t* v0, int32x2_t* v1) { *r = vpadal_s32(*v0, *v1); }\nvoid VpadalU8(uint16x4_t* r, uint16x4_t* v0, uint8x8_t* v1) { *r = vpadal_u8(*v0, *v1); }\nvoid VpadalU16(uint32x2_t* r, uint32x2_t* v0, uint16x4_t* v1) { *r = vpadal_u16(*v0, *v1); }\nvoid VpadalU32(uint64x1_t* r, uint64x1_t* v0, uint32x2_t* v1) { *r = vpadal_u32(*v0, *v1); }\nvoid VpadalqS8(int16x8_t* r, int16x8_t* v0, int8x16_t* v1) { *r = vpadalq_s8(*v0, *v1); }\nvoid VpadalqS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1) { *r = vpadalq_s16(*v0, *v1); }\nvoid VpadalqS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1) { *r = vpadalq_s32(*v0, *v1); }\nvoid VpadalqU8(uint16x8_t* r, uint16x8_t* v0, uint8x16_t* v1) { *r = vpadalq_u8(*v0, *v1); }\nvoid VpadalqU16(uint32x4_t* r, uint32x4_t* v0, 
uint16x8_t* v1) { *r = vpadalq_u16(*v0, *v1); }\nvoid VpadalqU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1) { *r = vpadalq_u32(*v0, *v1); }\nvoid VpaddS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vpadd_s8(*v0, *v1); }\nvoid VpaddS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vpadd_s16(*v0, *v1); }\nvoid VpaddS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vpadd_s32(*v0, *v1); }\nvoid VpaddU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vpadd_u8(*v0, *v1); }\nvoid VpaddU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vpadd_u16(*v0, *v1); }\nvoid VpaddU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vpadd_u32(*v0, *v1); }\nvoid VpaddF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vpadd_f32(*v0, *v1); }\nvoid VpadddS64(int64_t* r, int64x2_t* v0) { *r = vpaddd_s64(*v0); }\nvoid VpadddU64(uint64_t* r, uint64x2_t* v0) { *r = vpaddd_u64(*v0); }\nvoid VpadddF64(float64_t* r, float64x2_t* v0) { *r = vpaddd_f64(*v0); }\nvoid VpaddlS8(int16x4_t* r, int8x8_t* v0) { *r = vpaddl_s8(*v0); }\nvoid VpaddlS16(int32x2_t* r, int16x4_t* v0) { *r = vpaddl_s16(*v0); }\nvoid VpaddlS32(int64x1_t* r, int32x2_t* v0) { *r = vpaddl_s32(*v0); }\nvoid VpaddlU8(uint16x4_t* r, uint8x8_t* v0) { *r = vpaddl_u8(*v0); }\nvoid VpaddlU16(uint32x2_t* r, uint16x4_t* v0) { *r = vpaddl_u16(*v0); }\nvoid VpaddlU32(uint64x1_t* r, uint32x2_t* v0) { *r = vpaddl_u32(*v0); }\nvoid VpaddlqS8(int16x8_t* r, int8x16_t* v0) { *r = vpaddlq_s8(*v0); }\nvoid VpaddlqS16(int32x4_t* r, int16x8_t* v0) { *r = vpaddlq_s16(*v0); }\nvoid VpaddlqS32(int64x2_t* r, int32x4_t* v0) { *r = vpaddlq_s32(*v0); }\nvoid VpaddlqU8(uint16x8_t* r, uint8x16_t* v0) { *r = vpaddlq_u8(*v0); }\nvoid VpaddlqU16(uint32x4_t* r, uint16x8_t* v0) { *r = vpaddlq_u16(*v0); }\nvoid VpaddlqU32(uint64x2_t* r, uint32x4_t* v0) { *r = vpaddlq_u32(*v0); }\nvoid VpaddqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vpaddq_s8(*v0, *v1); }\nvoid VpaddqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) 
{ *r = vpaddq_s16(*v0, *v1); }\nvoid VpaddqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vpaddq_s32(*v0, *v1); }\nvoid VpaddqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vpaddq_s64(*v0, *v1); }\nvoid VpaddqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vpaddq_u8(*v0, *v1); }\nvoid VpaddqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vpaddq_u16(*v0, *v1); }\nvoid VpaddqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vpaddq_u32(*v0, *v1); }\nvoid VpaddqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vpaddq_u64(*v0, *v1); }\nvoid VpaddqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vpaddq_f32(*v0, *v1); }\nvoid VpaddqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vpaddq_f64(*v0, *v1); }\nvoid VpaddsF32(float32_t* r, float32x2_t* v0) { *r = vpadds_f32(*v0); }\nvoid VpmaxS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vpmax_s8(*v0, *v1); }\nvoid VpmaxS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vpmax_s16(*v0, *v1); }\nvoid VpmaxS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vpmax_s32(*v0, *v1); }\nvoid VpmaxU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vpmax_u8(*v0, *v1); }\nvoid VpmaxU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vpmax_u16(*v0, *v1); }\nvoid VpmaxU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vpmax_u32(*v0, *v1); }\nvoid VpmaxF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vpmax_f32(*v0, *v1); }\nvoid VpmaxnmF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vpmaxnm_f32(*v0, *v1); }\nvoid VpmaxnmqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vpmaxnmq_f32(*v0, *v1); }\nvoid VpmaxnmqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vpmaxnmq_f64(*v0, *v1); }\nvoid VpmaxnmqdF64(float64_t* r, float64x2_t* v0) { *r = vpmaxnmqd_f64(*v0); }\nvoid VpmaxnmsF32(float32_t* r, float32x2_t* v0) { *r = vpmaxnms_f32(*v0); }\nvoid VpmaxqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { 
*r = vpmaxq_s8(*v0, *v1); }\nvoid VpmaxqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vpmaxq_s16(*v0, *v1); }\nvoid VpmaxqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vpmaxq_s32(*v0, *v1); }\nvoid VpmaxqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vpmaxq_u8(*v0, *v1); }\nvoid VpmaxqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vpmaxq_u16(*v0, *v1); }\nvoid VpmaxqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vpmaxq_u32(*v0, *v1); }\nvoid VpmaxqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vpmaxq_f32(*v0, *v1); }\nvoid VpmaxqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vpmaxq_f64(*v0, *v1); }\nvoid VpmaxqdF64(float64_t* r, float64x2_t* v0) { *r = vpmaxqd_f64(*v0); }\nvoid VpmaxsF32(float32_t* r, float32x2_t* v0) { *r = vpmaxs_f32(*v0); }\nvoid VpminS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vpmin_s8(*v0, *v1); }\nvoid VpminS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vpmin_s16(*v0, *v1); }\nvoid VpminS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vpmin_s32(*v0, *v1); }\nvoid VpminU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vpmin_u8(*v0, *v1); }\nvoid VpminU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vpmin_u16(*v0, *v1); }\nvoid VpminU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vpmin_u32(*v0, *v1); }\nvoid VpminF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vpmin_f32(*v0, *v1); }\nvoid VpminnmF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vpminnm_f32(*v0, *v1); }\nvoid VpminnmqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vpminnmq_f32(*v0, *v1); }\nvoid VpminnmqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vpminnmq_f64(*v0, *v1); }\nvoid VpminnmqdF64(float64_t* r, float64x2_t* v0) { *r = vpminnmqd_f64(*v0); }\nvoid VpminnmsF32(float32_t* r, float32x2_t* v0) { *r = vpminnms_f32(*v0); }\nvoid VpminqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vpminq_s8(*v0, 
*v1); }\nvoid VpminqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vpminq_s16(*v0, *v1); }\nvoid VpminqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vpminq_s32(*v0, *v1); }\nvoid VpminqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vpminq_u8(*v0, *v1); }\nvoid VpminqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vpminq_u16(*v0, *v1); }\nvoid VpminqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vpminq_u32(*v0, *v1); }\nvoid VpminqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vpminq_f32(*v0, *v1); }\nvoid VpminqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vpminq_f64(*v0, *v1); }\nvoid VpminqdF64(float64_t* r, float64x2_t* v0) { *r = vpminqd_f64(*v0); }\nvoid VpminsF32(float32_t* r, float32x2_t* v0) { *r = vpmins_f32(*v0); }\nvoid VqabsS8(int8x8_t* r, int8x8_t* v0) { *r = vqabs_s8(*v0); }\nvoid VqabsS16(int16x4_t* r, int16x4_t* v0) { *r = vqabs_s16(*v0); }\nvoid VqabsS32(int32x2_t* r, int32x2_t* v0) { *r = vqabs_s32(*v0); }\nvoid VqabsS64(int64x1_t* r, int64x1_t* v0) { *r = vqabs_s64(*v0); }\nvoid VqabsbS8(int8_t* r, int8_t* v0) { *r = vqabsb_s8(*v0); }\nvoid VqabsdS64(int64_t* r, int64_t* v0) { *r = vqabsd_s64(*v0); }\nvoid VqabshS16(int16_t* r, int16_t* v0) { *r = vqabsh_s16(*v0); }\nvoid VqabsqS8(int8x16_t* r, int8x16_t* v0) { *r = vqabsq_s8(*v0); }\nvoid VqabsqS16(int16x8_t* r, int16x8_t* v0) { *r = vqabsq_s16(*v0); }\nvoid VqabsqS32(int32x4_t* r, int32x4_t* v0) { *r = vqabsq_s32(*v0); }\nvoid VqabsqS64(int64x2_t* r, int64x2_t* v0) { *r = vqabsq_s64(*v0); }\nvoid VqabssS32(int32_t* r, int32_t* v0) { *r = vqabss_s32(*v0); }\nvoid VqaddS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vqadd_s8(*v0, *v1); }\nvoid VqaddS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqadd_s16(*v0, *v1); }\nvoid VqaddS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqadd_s32(*v0, *v1); }\nvoid VqaddS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vqadd_s64(*v0, *v1); }\nvoid 
VqaddU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vqadd_u8(*v0, *v1); }\nvoid VqaddU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vqadd_u16(*v0, *v1); }\nvoid VqaddU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vqadd_u32(*v0, *v1); }\nvoid VqaddU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vqadd_u64(*v0, *v1); }\nvoid VqaddbS8(int8_t* r, int8_t* v0, int8_t* v1) { *r = vqaddb_s8(*v0, *v1); }\nvoid VqaddbU8(uint8_t* r, uint8_t* v0, uint8_t* v1) { *r = vqaddb_u8(*v0, *v1); }\nvoid VqadddS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vqaddd_s64(*v0, *v1); }\nvoid VqadddU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vqaddd_u64(*v0, *v1); }\nvoid VqaddhS16(int16_t* r, int16_t* v0, int16_t* v1) { *r = vqaddh_s16(*v0, *v1); }\nvoid VqaddhU16(uint16_t* r, uint16_t* v0, uint16_t* v1) { *r = vqaddh_u16(*v0, *v1); }\nvoid VqaddqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vqaddq_s8(*v0, *v1); }\nvoid VqaddqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vqaddq_s16(*v0, *v1); }\nvoid VqaddqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vqaddq_s32(*v0, *v1); }\nvoid VqaddqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vqaddq_s64(*v0, *v1); }\nvoid VqaddqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vqaddq_u8(*v0, *v1); }\nvoid VqaddqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vqaddq_u16(*v0, *v1); }\nvoid VqaddqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vqaddq_u32(*v0, *v1); }\nvoid VqaddqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vqaddq_u64(*v0, *v1); }\nvoid VqaddsS32(int32_t* r, int32_t* v0, int32_t* v1) { *r = vqadds_s32(*v0, *v1); }\nvoid VqaddsU32(uint32_t* r, uint32_t* v0, uint32_t* v1) { *r = vqadds_u32(*v0, *v1); }\nvoid VqdmlalS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vqdmlal_s16(*v0, *v1, *v2); }\nvoid VqdmlalS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vqdmlal_s32(*v0, *v1, *v2); 
}\nvoid VqdmlalHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vqdmlal_high_s16(*v0, *v1, *v2); }\nvoid VqdmlalHighS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vqdmlal_high_s32(*v0, *v1, *v2); }\nvoid VqdmlalHighNS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16_t* v2) { *r = vqdmlal_high_n_s16(*v0, *v1, *v2); }\nvoid VqdmlalHighNS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32_t* v2) { *r = vqdmlal_high_n_s32(*v0, *v1, *v2); }\nvoid VqdmlalNS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16_t* v2) { *r = vqdmlal_n_s16(*v0, *v1, *v2); }\nvoid VqdmlalNS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32_t* v2) { *r = vqdmlal_n_s32(*v0, *v1, *v2); }\nvoid VqdmlalhS16(int32_t* r, int32_t* v0, int16_t* v1, int16_t* v2) { *r = vqdmlalh_s16(*v0, *v1, *v2); }\nvoid VqdmlalsS32(int64_t* r, int64_t* v0, int32_t* v1, int32_t* v2) { *r = vqdmlals_s32(*v0, *v1, *v2); }\nvoid VqdmlslS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vqdmlsl_s16(*v0, *v1, *v2); }\nvoid VqdmlslS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vqdmlsl_s32(*v0, *v1, *v2); }\nvoid VqdmlslHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vqdmlsl_high_s16(*v0, *v1, *v2); }\nvoid VqdmlslHighS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vqdmlsl_high_s32(*v0, *v1, *v2); }\nvoid VqdmlslHighNS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16_t* v2) { *r = vqdmlsl_high_n_s16(*v0, *v1, *v2); }\nvoid VqdmlslHighNS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32_t* v2) { *r = vqdmlsl_high_n_s32(*v0, *v1, *v2); }\nvoid VqdmlslNS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16_t* v2) { *r = vqdmlsl_n_s16(*v0, *v1, *v2); }\nvoid VqdmlslNS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32_t* v2) { *r = vqdmlsl_n_s32(*v0, *v1, *v2); }\nvoid VqdmlslhS16(int32_t* r, int32_t* v0, int16_t* v1, int16_t* v2) { *r = vqdmlslh_s16(*v0, *v1, *v2); }\nvoid 
VqdmlslsS32(int64_t* r, int64_t* v0, int32_t* v1, int32_t* v2) { *r = vqdmlsls_s32(*v0, *v1, *v2); }\nvoid VqdmulhS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqdmulh_s16(*v0, *v1); }\nvoid VqdmulhS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqdmulh_s32(*v0, *v1); }\nvoid VqdmulhNS16(int16x4_t* r, int16x4_t* v0, int16_t* v1) { *r = vqdmulh_n_s16(*v0, *v1); }\nvoid VqdmulhNS32(int32x2_t* r, int32x2_t* v0, int32_t* v1) { *r = vqdmulh_n_s32(*v0, *v1); }\nvoid VqdmulhhS16(int16_t* r, int16_t* v0, int16_t* v1) { *r = vqdmulhh_s16(*v0, *v1); }\nvoid VqdmulhqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vqdmulhq_s16(*v0, *v1); }\nvoid VqdmulhqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vqdmulhq_s32(*v0, *v1); }\nvoid VqdmulhqNS16(int16x8_t* r, int16x8_t* v0, int16_t* v1) { *r = vqdmulhq_n_s16(*v0, *v1); }\nvoid VqdmulhqNS32(int32x4_t* r, int32x4_t* v0, int32_t* v1) { *r = vqdmulhq_n_s32(*v0, *v1); }\nvoid VqdmulhsS32(int32_t* r, int32_t* v0, int32_t* v1) { *r = vqdmulhs_s32(*v0, *v1); }\nvoid VqdmullS16(int32x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqdmull_s16(*v0, *v1); }\nvoid VqdmullS32(int64x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqdmull_s32(*v0, *v1); }\nvoid VqdmullHighS16(int32x4_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vqdmull_high_s16(*v0, *v1); }\nvoid VqdmullHighS32(int64x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vqdmull_high_s32(*v0, *v1); }\nvoid VqdmullHighNS16(int32x4_t* r, int16x8_t* v0, int16_t* v1) { *r = vqdmull_high_n_s16(*v0, *v1); }\nvoid VqdmullHighNS32(int64x2_t* r, int32x4_t* v0, int32_t* v1) { *r = vqdmull_high_n_s32(*v0, *v1); }\nvoid VqdmullNS16(int32x4_t* r, int16x4_t* v0, int16_t* v1) { *r = vqdmull_n_s16(*v0, *v1); }\nvoid VqdmullNS32(int64x2_t* r, int32x2_t* v0, int32_t* v1) { *r = vqdmull_n_s32(*v0, *v1); }\nvoid VqdmullhS16(int32_t* r, int16_t* v0, int16_t* v1) { *r = vqdmullh_s16(*v0, *v1); }\nvoid VqdmullsS32(int64_t* r, int32_t* v0, int32_t* v1) { *r = vqdmulls_s32(*v0, *v1); 
}\nvoid VqmovnS16(int8x8_t* r, int16x8_t* v0) { *r = vqmovn_s16(*v0); }\nvoid VqmovnS32(int16x4_t* r, int32x4_t* v0) { *r = vqmovn_s32(*v0); }\nvoid VqmovnS64(int32x2_t* r, int64x2_t* v0) { *r = vqmovn_s64(*v0); }\nvoid VqmovnU16(uint8x8_t* r, uint16x8_t* v0) { *r = vqmovn_u16(*v0); }\nvoid VqmovnU32(uint16x4_t* r, uint32x4_t* v0) { *r = vqmovn_u32(*v0); }\nvoid VqmovnU64(uint32x2_t* r, uint64x2_t* v0) { *r = vqmovn_u64(*v0); }\nvoid VqmovnHighS16(int8x16_t* r, int8x8_t* v0, int16x8_t* v1) { *r = vqmovn_high_s16(*v0, *v1); }\nvoid VqmovnHighS32(int16x8_t* r, int16x4_t* v0, int32x4_t* v1) { *r = vqmovn_high_s32(*v0, *v1); }\nvoid VqmovnHighS64(int32x4_t* r, int32x2_t* v0, int64x2_t* v1) { *r = vqmovn_high_s64(*v0, *v1); }\nvoid VqmovnHighU16(uint8x16_t* r, uint8x8_t* v0, uint16x8_t* v1) { *r = vqmovn_high_u16(*v0, *v1); }\nvoid VqmovnHighU32(uint16x8_t* r, uint16x4_t* v0, uint32x4_t* v1) { *r = vqmovn_high_u32(*v0, *v1); }\nvoid VqmovnHighU64(uint32x4_t* r, uint32x2_t* v0, uint64x2_t* v1) { *r = vqmovn_high_u64(*v0, *v1); }\nvoid VqmovndS64(int32_t* r, int64_t* v0) { *r = vqmovnd_s64(*v0); }\nvoid VqmovndU64(uint32_t* r, uint64_t* v0) { *r = vqmovnd_u64(*v0); }\nvoid VqmovnhS16(int8_t* r, int16_t* v0) { *r = vqmovnh_s16(*v0); }\nvoid VqmovnhU16(uint8_t* r, uint16_t* v0) { *r = vqmovnh_u16(*v0); }\nvoid VqmovnsS32(int16_t* r, int32_t* v0) { *r = vqmovns_s32(*v0); }\nvoid VqmovnsU32(uint16_t* r, uint32_t* v0) { *r = vqmovns_u32(*v0); }\nvoid VqmovunS16(uint8x8_t* r, int16x8_t* v0) { *r = vqmovun_s16(*v0); }\nvoid VqmovunS32(uint16x4_t* r, int32x4_t* v0) { *r = vqmovun_s32(*v0); }\nvoid VqmovunS64(uint32x2_t* r, int64x2_t* v0) { *r = vqmovun_s64(*v0); }\nvoid VqmovunHighS16(uint8x16_t* r, uint8x8_t* v0, int16x8_t* v1) { *r = vqmovun_high_s16(*v0, *v1); }\nvoid VqmovunHighS32(uint16x8_t* r, uint16x4_t* v0, int32x4_t* v1) { *r = vqmovun_high_s32(*v0, *v1); }\nvoid VqmovunHighS64(uint32x4_t* r, uint32x2_t* v0, int64x2_t* v1) { *r = vqmovun_high_s64(*v0, *v1); }\nvoid 
VqmovundS64(uint32_t* r, int64_t* v0) { *r = vqmovund_s64(*v0); }\nvoid VqmovunhS16(uint8_t* r, int16_t* v0) { *r = vqmovunh_s16(*v0); }\nvoid VqmovunsS32(uint16_t* r, int32_t* v0) { *r = vqmovuns_s32(*v0); }\nvoid VqnegS8(int8x8_t* r, int8x8_t* v0) { *r = vqneg_s8(*v0); }\nvoid VqnegS16(int16x4_t* r, int16x4_t* v0) { *r = vqneg_s16(*v0); }\nvoid VqnegS32(int32x2_t* r, int32x2_t* v0) { *r = vqneg_s32(*v0); }\nvoid VqnegS64(int64x1_t* r, int64x1_t* v0) { *r = vqneg_s64(*v0); }\nvoid VqnegbS8(int8_t* r, int8_t* v0) { *r = vqnegb_s8(*v0); }\nvoid VqnegdS64(int64_t* r, int64_t* v0) { *r = vqnegd_s64(*v0); }\nvoid VqneghS16(int16_t* r, int16_t* v0) { *r = vqnegh_s16(*v0); }\nvoid VqnegqS8(int8x16_t* r, int8x16_t* v0) { *r = vqnegq_s8(*v0); }\nvoid VqnegqS16(int16x8_t* r, int16x8_t* v0) { *r = vqnegq_s16(*v0); }\nvoid VqnegqS32(int32x4_t* r, int32x4_t* v0) { *r = vqnegq_s32(*v0); }\nvoid VqnegqS64(int64x2_t* r, int64x2_t* v0) { *r = vqnegq_s64(*v0); }\nvoid VqnegsS32(int32_t* r, int32_t* v0) { *r = vqnegs_s32(*v0); }\nvoid VqrdmlahS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vqrdmlah_s16(*v0, *v1, *v2); }\nvoid VqrdmlahS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vqrdmlah_s32(*v0, *v1, *v2); }\nvoid VqrdmlahhS16(int16_t* r, int16_t* v0, int16_t* v1, int16_t* v2) { *r = vqrdmlahh_s16(*v0, *v1, *v2); }\nvoid VqrdmlahqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vqrdmlahq_s16(*v0, *v1, *v2); }\nvoid VqrdmlahqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vqrdmlahq_s32(*v0, *v1, *v2); }\nvoid VqrdmlahsS32(int32_t* r, int32_t* v0, int32_t* v1, int32_t* v2) { *r = vqrdmlahs_s32(*v0, *v1, *v2); }\nvoid VqrdmlshS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vqrdmlsh_s16(*v0, *v1, *v2); }\nvoid VqrdmlshS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vqrdmlsh_s32(*v0, *v1, *v2); }\nvoid VqrdmlshhS16(int16_t* r, int16_t* v0, int16_t* v1, 
int16_t* v2) { *r = vqrdmlshh_s16(*v0, *v1, *v2); }\nvoid VqrdmlshqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vqrdmlshq_s16(*v0, *v1, *v2); }\nvoid VqrdmlshqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vqrdmlshq_s32(*v0, *v1, *v2); }\nvoid VqrdmlshsS32(int32_t* r, int32_t* v0, int32_t* v1, int32_t* v2) { *r = vqrdmlshs_s32(*v0, *v1, *v2); }\nvoid VqrdmulhS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqrdmulh_s16(*v0, *v1); }\nvoid VqrdmulhS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqrdmulh_s32(*v0, *v1); }\nvoid VqrdmulhNS16(int16x4_t* r, int16x4_t* v0, int16_t* v1) { *r = vqrdmulh_n_s16(*v0, *v1); }\nvoid VqrdmulhNS32(int32x2_t* r, int32x2_t* v0, int32_t* v1) { *r = vqrdmulh_n_s32(*v0, *v1); }\nvoid VqrdmulhhS16(int16_t* r, int16_t* v0, int16_t* v1) { *r = vqrdmulhh_s16(*v0, *v1); }\nvoid VqrdmulhqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vqrdmulhq_s16(*v0, *v1); }\nvoid VqrdmulhqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vqrdmulhq_s32(*v0, *v1); }\nvoid VqrdmulhqNS16(int16x8_t* r, int16x8_t* v0, int16_t* v1) { *r = vqrdmulhq_n_s16(*v0, *v1); }\nvoid VqrdmulhqNS32(int32x4_t* r, int32x4_t* v0, int32_t* v1) { *r = vqrdmulhq_n_s32(*v0, *v1); }\nvoid VqrdmulhsS32(int32_t* r, int32_t* v0, int32_t* v1) { *r = vqrdmulhs_s32(*v0, *v1); }\nvoid VqrshlS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vqrshl_s8(*v0, *v1); }\nvoid VqrshlS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqrshl_s16(*v0, *v1); }\nvoid VqrshlS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqrshl_s32(*v0, *v1); }\nvoid VqrshlS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vqrshl_s64(*v0, *v1); }\nvoid VqrshlU8(uint8x8_t* r, uint8x8_t* v0, int8x8_t* v1) { *r = vqrshl_u8(*v0, *v1); }\nvoid VqrshlU16(uint16x4_t* r, uint16x4_t* v0, int16x4_t* v1) { *r = vqrshl_u16(*v0, *v1); }\nvoid VqrshlU32(uint32x2_t* r, uint32x2_t* v0, int32x2_t* v1) { *r = vqrshl_u32(*v0, *v1); }\nvoid 
VqrshlU64(uint64x1_t* r, uint64x1_t* v0, int64x1_t* v1) { *r = vqrshl_u64(*v0, *v1); }\nvoid VqrshlbS8(int8_t* r, int8_t* v0, int8_t* v1) { *r = vqrshlb_s8(*v0, *v1); }\nvoid VqrshlbU8(uint8_t* r, uint8_t* v0, int8_t* v1) { *r = vqrshlb_u8(*v0, *v1); }\nvoid VqrshldS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vqrshld_s64(*v0, *v1); }\nvoid VqrshldU64(uint64_t* r, uint64_t* v0, int64_t* v1) { *r = vqrshld_u64(*v0, *v1); }\nvoid VqrshlhS16(int16_t* r, int16_t* v0, int16_t* v1) { *r = vqrshlh_s16(*v0, *v1); }\nvoid VqrshlhU16(uint16_t* r, uint16_t* v0, int16_t* v1) { *r = vqrshlh_u16(*v0, *v1); }\nvoid VqrshlqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vqrshlq_s8(*v0, *v1); }\nvoid VqrshlqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vqrshlq_s16(*v0, *v1); }\nvoid VqrshlqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vqrshlq_s32(*v0, *v1); }\nvoid VqrshlqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vqrshlq_s64(*v0, *v1); }\nvoid VqrshlqU8(uint8x16_t* r, uint8x16_t* v0, int8x16_t* v1) { *r = vqrshlq_u8(*v0, *v1); }\nvoid VqrshlqU16(uint16x8_t* r, uint16x8_t* v0, int16x8_t* v1) { *r = vqrshlq_u16(*v0, *v1); }\nvoid VqrshlqU32(uint32x4_t* r, uint32x4_t* v0, int32x4_t* v1) { *r = vqrshlq_u32(*v0, *v1); }\nvoid VqrshlqU64(uint64x2_t* r, uint64x2_t* v0, int64x2_t* v1) { *r = vqrshlq_u64(*v0, *v1); }\nvoid VqrshlsS32(int32_t* r, int32_t* v0, int32_t* v1) { *r = vqrshls_s32(*v0, *v1); }\nvoid VqrshlsU32(uint32_t* r, uint32_t* v0, int32_t* v1) { *r = vqrshls_u32(*v0, *v1); }\nvoid VqshlS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vqshl_s8(*v0, *v1); }\nvoid VqshlS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqshl_s16(*v0, *v1); }\nvoid VqshlS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqshl_s32(*v0, *v1); }\nvoid VqshlS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vqshl_s64(*v0, *v1); }\nvoid VqshlU8(uint8x8_t* r, uint8x8_t* v0, int8x8_t* v1) { *r = vqshl_u8(*v0, *v1); }\nvoid VqshlU16(uint16x4_t* r, 
uint16x4_t* v0, int16x4_t* v1) { *r = vqshl_u16(*v0, *v1); }\nvoid VqshlU32(uint32x2_t* r, uint32x2_t* v0, int32x2_t* v1) { *r = vqshl_u32(*v0, *v1); }\nvoid VqshlU64(uint64x1_t* r, uint64x1_t* v0, int64x1_t* v1) { *r = vqshl_u64(*v0, *v1); }\nvoid VqshlbS8(int8_t* r, int8_t* v0, int8_t* v1) { *r = vqshlb_s8(*v0, *v1); }\nvoid VqshlbU8(uint8_t* r, uint8_t* v0, int8_t* v1) { *r = vqshlb_u8(*v0, *v1); }\nvoid VqshldS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vqshld_s64(*v0, *v1); }\nvoid VqshldU64(uint64_t* r, uint64_t* v0, int64_t* v1) { *r = vqshld_u64(*v0, *v1); }\nvoid VqshlhS16(int16_t* r, int16_t* v0, int16_t* v1) { *r = vqshlh_s16(*v0, *v1); }\nvoid VqshlhU16(uint16_t* r, uint16_t* v0, int16_t* v1) { *r = vqshlh_u16(*v0, *v1); }\nvoid VqshlqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vqshlq_s8(*v0, *v1); }\nvoid VqshlqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vqshlq_s16(*v0, *v1); }\nvoid VqshlqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vqshlq_s32(*v0, *v1); }\nvoid VqshlqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vqshlq_s64(*v0, *v1); }\nvoid VqshlqU8(uint8x16_t* r, uint8x16_t* v0, int8x16_t* v1) { *r = vqshlq_u8(*v0, *v1); }\nvoid VqshlqU16(uint16x8_t* r, uint16x8_t* v0, int16x8_t* v1) { *r = vqshlq_u16(*v0, *v1); }\nvoid VqshlqU32(uint32x4_t* r, uint32x4_t* v0, int32x4_t* v1) { *r = vqshlq_u32(*v0, *v1); }\nvoid VqshlqU64(uint64x2_t* r, uint64x2_t* v0, int64x2_t* v1) { *r = vqshlq_u64(*v0, *v1); }\nvoid VqshlsS32(int32_t* r, int32_t* v0, int32_t* v1) { *r = vqshls_s32(*v0, *v1); }\nvoid VqshlsU32(uint32_t* r, uint32_t* v0, int32_t* v1) { *r = vqshls_u32(*v0, *v1); }\nvoid VqsubS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vqsub_s8(*v0, *v1); }\nvoid VqsubS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqsub_s16(*v0, *v1); }\nvoid VqsubS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqsub_s32(*v0, *v1); }\nvoid VqsubS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vqsub_s64(*v0, 
*v1); }\nvoid VqsubU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vqsub_u8(*v0, *v1); }\nvoid VqsubU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vqsub_u16(*v0, *v1); }\nvoid VqsubU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vqsub_u32(*v0, *v1); }\nvoid VqsubU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vqsub_u64(*v0, *v1); }\nvoid VqsubbS8(int8_t* r, int8_t* v0, int8_t* v1) { *r = vqsubb_s8(*v0, *v1); }\nvoid VqsubbU8(uint8_t* r, uint8_t* v0, uint8_t* v1) { *r = vqsubb_u8(*v0, *v1); }\nvoid VqsubdS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vqsubd_s64(*v0, *v1); }\nvoid VqsubdU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vqsubd_u64(*v0, *v1); }\nvoid VqsubhS16(int16_t* r, int16_t* v0, int16_t* v1) { *r = vqsubh_s16(*v0, *v1); }\nvoid VqsubhU16(uint16_t* r, uint16_t* v0, uint16_t* v1) { *r = vqsubh_u16(*v0, *v1); }\nvoid VqsubqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vqsubq_s8(*v0, *v1); }\nvoid VqsubqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vqsubq_s16(*v0, *v1); }\nvoid VqsubqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vqsubq_s32(*v0, *v1); }\nvoid VqsubqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vqsubq_s64(*v0, *v1); }\nvoid VqsubqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vqsubq_u8(*v0, *v1); }\nvoid VqsubqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vqsubq_u16(*v0, *v1); }\nvoid VqsubqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vqsubq_u32(*v0, *v1); }\nvoid VqsubqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vqsubq_u64(*v0, *v1); }\nvoid VqsubsS32(int32_t* r, int32_t* v0, int32_t* v1) { *r = vqsubs_s32(*v0, *v1); }\nvoid VqsubsU32(uint32_t* r, uint32_t* v0, uint32_t* v1) { *r = vqsubs_u32(*v0, *v1); }\nvoid Vqtbl1S8(int8x8_t* r, int8x16_t* v0, uint8x8_t* v1) { *r = vqtbl1_s8(*v0, *v1); }\nvoid Vqtbl1U8(uint8x8_t* r, uint8x16_t* v0, uint8x8_t* v1) { *r = vqtbl1_u8(*v0, *v1); }\nvoid Vqtbl1P8(poly8x8_t* r, 
poly8x16_t* v0, uint8x8_t* v1) { *r = vqtbl1_p8(*v0, *v1); }\nvoid Vqtbl1QS8(int8x16_t* r, int8x16_t* v0, uint8x16_t* v1) { *r = vqtbl1q_s8(*v0, *v1); }\nvoid Vqtbl1QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vqtbl1q_u8(*v0, *v1); }\nvoid Vqtbl1QP8(poly8x16_t* r, poly8x16_t* v0, uint8x16_t* v1) { *r = vqtbl1q_p8(*v0, *v1); }\nvoid Vqtbl2S8(int8x8_t* r, int8x16x2_t* v0, uint8x8_t* v1) { *r = vqtbl2_s8(*v0, *v1); }\nvoid Vqtbl2U8(uint8x8_t* r, uint8x16x2_t* v0, uint8x8_t* v1) { *r = vqtbl2_u8(*v0, *v1); }\nvoid Vqtbl2P8(poly8x8_t* r, poly8x16x2_t* v0, uint8x8_t* v1) { *r = vqtbl2_p8(*v0, *v1); }\nvoid Vqtbl2QS8(int8x16_t* r, int8x16x2_t* v0, uint8x16_t* v1) { *r = vqtbl2q_s8(*v0, *v1); }\nvoid Vqtbl2QU8(uint8x16_t* r, uint8x16x2_t* v0, uint8x16_t* v1) { *r = vqtbl2q_u8(*v0, *v1); }\nvoid Vqtbl2QP8(poly8x16_t* r, poly8x16x2_t* v0, uint8x16_t* v1) { *r = vqtbl2q_p8(*v0, *v1); }\nvoid Vqtbl3S8(int8x8_t* r, int8x16x3_t* v0, uint8x8_t* v1) { *r = vqtbl3_s8(*v0, *v1); }\nvoid Vqtbl3U8(uint8x8_t* r, uint8x16x3_t* v0, uint8x8_t* v1) { *r = vqtbl3_u8(*v0, *v1); }\nvoid Vqtbl3P8(poly8x8_t* r, poly8x16x3_t* v0, uint8x8_t* v1) { *r = vqtbl3_p8(*v0, *v1); }\nvoid Vqtbl3QS8(int8x16_t* r, int8x16x3_t* v0, uint8x16_t* v1) { *r = vqtbl3q_s8(*v0, *v1); }\nvoid Vqtbl3QU8(uint8x16_t* r, uint8x16x3_t* v0, uint8x16_t* v1) { *r = vqtbl3q_u8(*v0, *v1); }\nvoid Vqtbl3QP8(poly8x16_t* r, poly8x16x3_t* v0, uint8x16_t* v1) { *r = vqtbl3q_p8(*v0, *v1); }\nvoid Vqtbl4S8(int8x8_t* r, int8x16x4_t* v0, uint8x8_t* v1) { *r = vqtbl4_s8(*v0, *v1); }\nvoid Vqtbl4U8(uint8x8_t* r, uint8x16x4_t* v0, uint8x8_t* v1) { *r = vqtbl4_u8(*v0, *v1); }\nvoid Vqtbl4P8(poly8x8_t* r, poly8x16x4_t* v0, uint8x8_t* v1) { *r = vqtbl4_p8(*v0, *v1); }\nvoid Vqtbl4QS8(int8x16_t* r, int8x16x4_t* v0, uint8x16_t* v1) { *r = vqtbl4q_s8(*v0, *v1); }\nvoid Vqtbl4QU8(uint8x16_t* r, uint8x16x4_t* v0, uint8x16_t* v1) { *r = vqtbl4q_u8(*v0, *v1); }\nvoid Vqtbl4QP8(poly8x16_t* r, poly8x16x4_t* v0, uint8x16_t* v1) { *r = 
vqtbl4q_p8(*v0, *v1); }\nvoid Vqtbx1S8(int8x8_t* r, int8x8_t* v0, int8x16_t* v1, uint8x8_t* v2) { *r = vqtbx1_s8(*v0, *v1, *v2); }\nvoid Vqtbx1U8(uint8x8_t* r, uint8x8_t* v0, uint8x16_t* v1, uint8x8_t* v2) { *r = vqtbx1_u8(*v0, *v1, *v2); }\nvoid Vqtbx1P8(poly8x8_t* r, poly8x8_t* v0, poly8x16_t* v1, uint8x8_t* v2) { *r = vqtbx1_p8(*v0, *v1, *v2); }\nvoid Vqtbx1QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1, uint8x16_t* v2) { *r = vqtbx1q_s8(*v0, *v1, *v2); }\nvoid Vqtbx1QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vqtbx1q_u8(*v0, *v1, *v2); }\nvoid Vqtbx1QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1, uint8x16_t* v2) { *r = vqtbx1q_p8(*v0, *v1, *v2); }\nvoid Vqtbx2S8(int8x8_t* r, int8x8_t* v0, int8x16x2_t* v1, uint8x8_t* v2) { *r = vqtbx2_s8(*v0, *v1, *v2); }\nvoid Vqtbx2U8(uint8x8_t* r, uint8x8_t* v0, uint8x16x2_t* v1, uint8x8_t* v2) { *r = vqtbx2_u8(*v0, *v1, *v2); }\nvoid Vqtbx2P8(poly8x8_t* r, poly8x8_t* v0, poly8x16x2_t* v1, uint8x8_t* v2) { *r = vqtbx2_p8(*v0, *v1, *v2); }\nvoid Vqtbx2QS8(int8x16_t* r, int8x16_t* v0, int8x16x2_t* v1, uint8x16_t* v2) { *r = vqtbx2q_s8(*v0, *v1, *v2); }\nvoid Vqtbx2QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16x2_t* v1, uint8x16_t* v2) { *r = vqtbx2q_u8(*v0, *v1, *v2); }\nvoid Vqtbx2QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16x2_t* v1, uint8x16_t* v2) { *r = vqtbx2q_p8(*v0, *v1, *v2); }\nvoid Vqtbx3S8(int8x8_t* r, int8x8_t* v0, int8x16x3_t* v1, uint8x8_t* v2) { *r = vqtbx3_s8(*v0, *v1, *v2); }\nvoid Vqtbx3U8(uint8x8_t* r, uint8x8_t* v0, uint8x16x3_t* v1, uint8x8_t* v2) { *r = vqtbx3_u8(*v0, *v1, *v2); }\nvoid Vqtbx3P8(poly8x8_t* r, poly8x8_t* v0, poly8x16x3_t* v1, uint8x8_t* v2) { *r = vqtbx3_p8(*v0, *v1, *v2); }\nvoid Vqtbx3QS8(int8x16_t* r, int8x16_t* v0, int8x16x3_t* v1, uint8x16_t* v2) { *r = vqtbx3q_s8(*v0, *v1, *v2); }\nvoid Vqtbx3QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16x3_t* v1, uint8x16_t* v2) { *r = vqtbx3q_u8(*v0, *v1, *v2); }\nvoid Vqtbx3QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16x3_t* 
v1, uint8x16_t* v2) { *r = vqtbx3q_p8(*v0, *v1, *v2); }\nvoid Vqtbx4S8(int8x8_t* r, int8x8_t* v0, int8x16x4_t* v1, uint8x8_t* v2) { *r = vqtbx4_s8(*v0, *v1, *v2); }\nvoid Vqtbx4U8(uint8x8_t* r, uint8x8_t* v0, uint8x16x4_t* v1, uint8x8_t* v2) { *r = vqtbx4_u8(*v0, *v1, *v2); }\nvoid Vqtbx4P8(poly8x8_t* r, poly8x8_t* v0, poly8x16x4_t* v1, uint8x8_t* v2) { *r = vqtbx4_p8(*v0, *v1, *v2); }\nvoid Vqtbx4QS8(int8x16_t* r, int8x16_t* v0, int8x16x4_t* v1, uint8x16_t* v2) { *r = vqtbx4q_s8(*v0, *v1, *v2); }\nvoid Vqtbx4QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16x4_t* v1, uint8x16_t* v2) { *r = vqtbx4q_u8(*v0, *v1, *v2); }\nvoid Vqtbx4QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16x4_t* v1, uint8x16_t* v2) { *r = vqtbx4q_p8(*v0, *v1, *v2); }\nvoid VraddhnS16(int8x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vraddhn_s16(*v0, *v1); }\nvoid VraddhnS32(int16x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vraddhn_s32(*v0, *v1); }\nvoid VraddhnS64(int32x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vraddhn_s64(*v0, *v1); }\nvoid VraddhnU16(uint8x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vraddhn_u16(*v0, *v1); }\nvoid VraddhnU32(uint16x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vraddhn_u32(*v0, *v1); }\nvoid VraddhnU64(uint32x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vraddhn_u64(*v0, *v1); }\nvoid VraddhnHighS16(int8x16_t* r, int8x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vraddhn_high_s16(*v0, *v1, *v2); }\nvoid VraddhnHighS32(int16x8_t* r, int16x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vraddhn_high_s32(*v0, *v1, *v2); }\nvoid VraddhnHighS64(int32x4_t* r, int32x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = vraddhn_high_s64(*v0, *v1, *v2); }\nvoid VraddhnHighU16(uint8x16_t* r, uint8x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vraddhn_high_u16(*v0, *v1, *v2); }\nvoid VraddhnHighU32(uint16x8_t* r, uint16x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vraddhn_high_u32(*v0, *v1, *v2); }\nvoid VraddhnHighU64(uint32x4_t* r, uint32x2_t* v0, uint64x2_t* v1, uint64x2_t* 
v2) { *r = vraddhn_high_u64(*v0, *v1, *v2); }\nvoid Vrax1QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vrax1q_u64(*v0, *v1); }\nvoid VrbitS8(int8x8_t* r, int8x8_t* v0) { *r = vrbit_s8(*v0); }\nvoid VrbitU8(uint8x8_t* r, uint8x8_t* v0) { *r = vrbit_u8(*v0); }\nvoid VrbitP8(poly8x8_t* r, poly8x8_t* v0) { *r = vrbit_p8(*v0); }\nvoid VrbitqS8(int8x16_t* r, int8x16_t* v0) { *r = vrbitq_s8(*v0); }\nvoid VrbitqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vrbitq_u8(*v0); }\nvoid VrbitqP8(poly8x16_t* r, poly8x16_t* v0) { *r = vrbitq_p8(*v0); }\nvoid VrecpeU32(uint32x2_t* r, uint32x2_t* v0) { *r = vrecpe_u32(*v0); }\nvoid VrecpeF32(float32x2_t* r, float32x2_t* v0) { *r = vrecpe_f32(*v0); }\nvoid VrecpeF64(float64x1_t* r, float64x1_t* v0) { *r = vrecpe_f64(*v0); }\nvoid VrecpedF64(float64_t* r, float64_t* v0) { *r = vrecped_f64(*v0); }\nvoid VrecpeqU32(uint32x4_t* r, uint32x4_t* v0) { *r = vrecpeq_u32(*v0); }\nvoid VrecpeqF32(float32x4_t* r, float32x4_t* v0) { *r = vrecpeq_f32(*v0); }\nvoid VrecpeqF64(float64x2_t* r, float64x2_t* v0) { *r = vrecpeq_f64(*v0); }\nvoid VrecpesF32(float32_t* r, float32_t* v0) { *r = vrecpes_f32(*v0); }\nvoid VrecpsF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vrecps_f32(*v0, *v1); }\nvoid VrecpsF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vrecps_f64(*v0, *v1); }\nvoid VrecpsdF64(float64_t* r, float64_t* v0, float64_t* v1) { *r = vrecpsd_f64(*v0, *v1); }\nvoid VrecpsqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vrecpsq_f32(*v0, *v1); }\nvoid VrecpsqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vrecpsq_f64(*v0, *v1); }\nvoid VrecpssF32(float32_t* r, float32_t* v0, float32_t* v1) { *r = vrecpss_f32(*v0, *v1); }\nvoid VrecpxdF64(float64_t* r, float64_t* v0) { *r = vrecpxd_f64(*v0); }\nvoid VrecpxsF32(float32_t* r, float32_t* v0) { *r = vrecpxs_f32(*v0); }\nvoid VreinterpretF32S8(float32x2_t* r, int8x8_t* v0) { *r = vreinterpret_f32_s8(*v0); }\nvoid VreinterpretF32S16(float32x2_t* 
r, int16x4_t* v0) { *r = vreinterpret_f32_s16(*v0); }\nvoid VreinterpretF32S32(float32x2_t* r, int32x2_t* v0) { *r = vreinterpret_f32_s32(*v0); }\nvoid VreinterpretF32S64(float32x2_t* r, int64x1_t* v0) { *r = vreinterpret_f32_s64(*v0); }\nvoid VreinterpretF32U8(float32x2_t* r, uint8x8_t* v0) { *r = vreinterpret_f32_u8(*v0); }\nvoid VreinterpretF32U16(float32x2_t* r, uint16x4_t* v0) { *r = vreinterpret_f32_u16(*v0); }\nvoid VreinterpretF32U32(float32x2_t* r, uint32x2_t* v0) { *r = vreinterpret_f32_u32(*v0); }\nvoid VreinterpretF32U64(float32x2_t* r, uint64x1_t* v0) { *r = vreinterpret_f32_u64(*v0); }\nvoid VreinterpretF32F64(float32x2_t* r, float64x1_t* v0) { *r = vreinterpret_f32_f64(*v0); }\nvoid VreinterpretF32P16(float32x2_t* r, poly16x4_t* v0) { *r = vreinterpret_f32_p16(*v0); }\nvoid VreinterpretF32P64(float32x2_t* r, poly64x1_t* v0) { *r = vreinterpret_f32_p64(*v0); }\nvoid VreinterpretF32P8(float32x2_t* r, poly8x8_t* v0) { *r = vreinterpret_f32_p8(*v0); }\nvoid VreinterpretF64S8(float64x1_t* r, int8x8_t* v0) { *r = vreinterpret_f64_s8(*v0); }\nvoid VreinterpretF64S16(float64x1_t* r, int16x4_t* v0) { *r = vreinterpret_f64_s16(*v0); }\nvoid VreinterpretF64S32(float64x1_t* r, int32x2_t* v0) { *r = vreinterpret_f64_s32(*v0); }\nvoid VreinterpretF64S64(float64x1_t* r, int64x1_t* v0) { *r = vreinterpret_f64_s64(*v0); }\nvoid VreinterpretF64U8(float64x1_t* r, uint8x8_t* v0) { *r = vreinterpret_f64_u8(*v0); }\nvoid VreinterpretF64U16(float64x1_t* r, uint16x4_t* v0) { *r = vreinterpret_f64_u16(*v0); }\nvoid VreinterpretF64U32(float64x1_t* r, uint32x2_t* v0) { *r = vreinterpret_f64_u32(*v0); }\nvoid VreinterpretF64U64(float64x1_t* r, uint64x1_t* v0) { *r = vreinterpret_f64_u64(*v0); }\nvoid VreinterpretF64F32(float64x1_t* r, float32x2_t* v0) { *r = vreinterpret_f64_f32(*v0); }\nvoid VreinterpretF64P16(float64x1_t* r, poly16x4_t* v0) { *r = vreinterpret_f64_p16(*v0); }\nvoid VreinterpretF64P64(float64x1_t* r, poly64x1_t* v0) { *r = vreinterpret_f64_p64(*v0); }\nvoid 
VreinterpretF64P8(float64x1_t* r, poly8x8_t* v0) { *r = vreinterpret_f64_p8(*v0); }\nvoid VreinterpretP16S8(poly16x4_t* r, int8x8_t* v0) { *r = vreinterpret_p16_s8(*v0); }\nvoid VreinterpretP16S16(poly16x4_t* r, int16x4_t* v0) { *r = vreinterpret_p16_s16(*v0); }\nvoid VreinterpretP16S32(poly16x4_t* r, int32x2_t* v0) { *r = vreinterpret_p16_s32(*v0); }\nvoid VreinterpretP16S64(poly16x4_t* r, int64x1_t* v0) { *r = vreinterpret_p16_s64(*v0); }\nvoid VreinterpretP16U8(poly16x4_t* r, uint8x8_t* v0) { *r = vreinterpret_p16_u8(*v0); }\nvoid VreinterpretP16U16(poly16x4_t* r, uint16x4_t* v0) { *r = vreinterpret_p16_u16(*v0); }\nvoid VreinterpretP16U32(poly16x4_t* r, uint32x2_t* v0) { *r = vreinterpret_p16_u32(*v0); }\nvoid VreinterpretP16U64(poly16x4_t* r, uint64x1_t* v0) { *r = vreinterpret_p16_u64(*v0); }\nvoid VreinterpretP16F32(poly16x4_t* r, float32x2_t* v0) { *r = vreinterpret_p16_f32(*v0); }\nvoid VreinterpretP16F64(poly16x4_t* r, float64x1_t* v0) { *r = vreinterpret_p16_f64(*v0); }\nvoid VreinterpretP16P64(poly16x4_t* r, poly64x1_t* v0) { *r = vreinterpret_p16_p64(*v0); }\nvoid VreinterpretP16P8(poly16x4_t* r, poly8x8_t* v0) { *r = vreinterpret_p16_p8(*v0); }\nvoid VreinterpretP64S8(poly64x1_t* r, int8x8_t* v0) { *r = vreinterpret_p64_s8(*v0); }\nvoid VreinterpretP64S16(poly64x1_t* r, int16x4_t* v0) { *r = vreinterpret_p64_s16(*v0); }\nvoid VreinterpretP64S32(poly64x1_t* r, int32x2_t* v0) { *r = vreinterpret_p64_s32(*v0); }\nvoid VreinterpretP64S64(poly64x1_t* r, int64x1_t* v0) { *r = vreinterpret_p64_s64(*v0); }\nvoid VreinterpretP64U8(poly64x1_t* r, uint8x8_t* v0) { *r = vreinterpret_p64_u8(*v0); }\nvoid VreinterpretP64U16(poly64x1_t* r, uint16x4_t* v0) { *r = vreinterpret_p64_u16(*v0); }\nvoid VreinterpretP64U32(poly64x1_t* r, uint32x2_t* v0) { *r = vreinterpret_p64_u32(*v0); }\nvoid VreinterpretP64U64(poly64x1_t* r, uint64x1_t* v0) { *r = vreinterpret_p64_u64(*v0); }\nvoid VreinterpretP64F32(poly64x1_t* r, float32x2_t* v0) { *r = vreinterpret_p64_f32(*v0); 
}\nvoid VreinterpretP64F64(poly64x1_t* r, float64x1_t* v0) { *r = vreinterpret_p64_f64(*v0); }\nvoid VreinterpretP64P16(poly64x1_t* r, poly16x4_t* v0) { *r = vreinterpret_p64_p16(*v0); }\nvoid VreinterpretP64P8(poly64x1_t* r, poly8x8_t* v0) { *r = vreinterpret_p64_p8(*v0); }\nvoid VreinterpretP8S8(poly8x8_t* r, int8x8_t* v0) { *r = vreinterpret_p8_s8(*v0); }\nvoid VreinterpretP8S16(poly8x8_t* r, int16x4_t* v0) { *r = vreinterpret_p8_s16(*v0); }\nvoid VreinterpretP8S32(poly8x8_t* r, int32x2_t* v0) { *r = vreinterpret_p8_s32(*v0); }\nvoid VreinterpretP8S64(poly8x8_t* r, int64x1_t* v0) { *r = vreinterpret_p8_s64(*v0); }\nvoid VreinterpretP8U8(poly8x8_t* r, uint8x8_t* v0) { *r = vreinterpret_p8_u8(*v0); }\nvoid VreinterpretP8U16(poly8x8_t* r, uint16x4_t* v0) { *r = vreinterpret_p8_u16(*v0); }\nvoid VreinterpretP8U32(poly8x8_t* r, uint32x2_t* v0) { *r = vreinterpret_p8_u32(*v0); }\nvoid VreinterpretP8U64(poly8x8_t* r, uint64x1_t* v0) { *r = vreinterpret_p8_u64(*v0); }\nvoid VreinterpretP8F32(poly8x8_t* r, float32x2_t* v0) { *r = vreinterpret_p8_f32(*v0); }\nvoid VreinterpretP8F64(poly8x8_t* r, float64x1_t* v0) { *r = vreinterpret_p8_f64(*v0); }\nvoid VreinterpretP8P16(poly8x8_t* r, poly16x4_t* v0) { *r = vreinterpret_p8_p16(*v0); }\nvoid VreinterpretP8P64(poly8x8_t* r, poly64x1_t* v0) { *r = vreinterpret_p8_p64(*v0); }\nvoid VreinterpretS16S8(int16x4_t* r, int8x8_t* v0) { *r = vreinterpret_s16_s8(*v0); }\nvoid VreinterpretS16S32(int16x4_t* r, int32x2_t* v0) { *r = vreinterpret_s16_s32(*v0); }\nvoid VreinterpretS16S64(int16x4_t* r, int64x1_t* v0) { *r = vreinterpret_s16_s64(*v0); }\nvoid VreinterpretS16U8(int16x4_t* r, uint8x8_t* v0) { *r = vreinterpret_s16_u8(*v0); }\nvoid VreinterpretS16U16(int16x4_t* r, uint16x4_t* v0) { *r = vreinterpret_s16_u16(*v0); }\nvoid VreinterpretS16U32(int16x4_t* r, uint32x2_t* v0) { *r = vreinterpret_s16_u32(*v0); }\nvoid VreinterpretS16U64(int16x4_t* r, uint64x1_t* v0) { *r = vreinterpret_s16_u64(*v0); }\nvoid VreinterpretS16F32(int16x4_t* 
r, float32x2_t* v0) { *r = vreinterpret_s16_f32(*v0); }\nvoid VreinterpretS16F64(int16x4_t* r, float64x1_t* v0) { *r = vreinterpret_s16_f64(*v0); }\nvoid VreinterpretS16P16(int16x4_t* r, poly16x4_t* v0) { *r = vreinterpret_s16_p16(*v0); }\nvoid VreinterpretS16P64(int16x4_t* r, poly64x1_t* v0) { *r = vreinterpret_s16_p64(*v0); }\nvoid VreinterpretS16P8(int16x4_t* r, poly8x8_t* v0) { *r = vreinterpret_s16_p8(*v0); }\nvoid VreinterpretS32S8(int32x2_t* r, int8x8_t* v0) { *r = vreinterpret_s32_s8(*v0); }\nvoid VreinterpretS32S16(int32x2_t* r, int16x4_t* v0) { *r = vreinterpret_s32_s16(*v0); }\nvoid VreinterpretS32S64(int32x2_t* r, int64x1_t* v0) { *r = vreinterpret_s32_s64(*v0); }\nvoid VreinterpretS32U8(int32x2_t* r, uint8x8_t* v0) { *r = vreinterpret_s32_u8(*v0); }\nvoid VreinterpretS32U16(int32x2_t* r, uint16x4_t* v0) { *r = vreinterpret_s32_u16(*v0); }\nvoid VreinterpretS32U32(int32x2_t* r, uint32x2_t* v0) { *r = vreinterpret_s32_u32(*v0); }\nvoid VreinterpretS32U64(int32x2_t* r, uint64x1_t* v0) { *r = vreinterpret_s32_u64(*v0); }\nvoid VreinterpretS32F32(int32x2_t* r, float32x2_t* v0) { *r = vreinterpret_s32_f32(*v0); }\nvoid VreinterpretS32F64(int32x2_t* r, float64x1_t* v0) { *r = vreinterpret_s32_f64(*v0); }\nvoid VreinterpretS32P16(int32x2_t* r, poly16x4_t* v0) { *r = vreinterpret_s32_p16(*v0); }\nvoid VreinterpretS32P64(int32x2_t* r, poly64x1_t* v0) { *r = vreinterpret_s32_p64(*v0); }\nvoid VreinterpretS32P8(int32x2_t* r, poly8x8_t* v0) { *r = vreinterpret_s32_p8(*v0); }\nvoid VreinterpretS64S8(int64x1_t* r, int8x8_t* v0) { *r = vreinterpret_s64_s8(*v0); }\nvoid VreinterpretS64S16(int64x1_t* r, int16x4_t* v0) { *r = vreinterpret_s64_s16(*v0); }\nvoid VreinterpretS64S32(int64x1_t* r, int32x2_t* v0) { *r = vreinterpret_s64_s32(*v0); }\nvoid VreinterpretS64U8(int64x1_t* r, uint8x8_t* v0) { *r = vreinterpret_s64_u8(*v0); }\nvoid VreinterpretS64U16(int64x1_t* r, uint16x4_t* v0) { *r = vreinterpret_s64_u16(*v0); }\nvoid VreinterpretS64U32(int64x1_t* r, uint32x2_t* 
v0) { *r = vreinterpret_s64_u32(*v0); }\nvoid VreinterpretS64U64(int64x1_t* r, uint64x1_t* v0) { *r = vreinterpret_s64_u64(*v0); }\nvoid VreinterpretS64F32(int64x1_t* r, float32x2_t* v0) { *r = vreinterpret_s64_f32(*v0); }\nvoid VreinterpretS64F64(int64x1_t* r, float64x1_t* v0) { *r = vreinterpret_s64_f64(*v0); }\nvoid VreinterpretS64P16(int64x1_t* r, poly16x4_t* v0) { *r = vreinterpret_s64_p16(*v0); }\nvoid VreinterpretS64P64(int64x1_t* r, poly64x1_t* v0) { *r = vreinterpret_s64_p64(*v0); }\nvoid VreinterpretS64P8(int64x1_t* r, poly8x8_t* v0) { *r = vreinterpret_s64_p8(*v0); }\nvoid VreinterpretS8S16(int8x8_t* r, int16x4_t* v0) { *r = vreinterpret_s8_s16(*v0); }\nvoid VreinterpretS8S32(int8x8_t* r, int32x2_t* v0) { *r = vreinterpret_s8_s32(*v0); }\nvoid VreinterpretS8S64(int8x8_t* r, int64x1_t* v0) { *r = vreinterpret_s8_s64(*v0); }\nvoid VreinterpretS8U8(int8x8_t* r, uint8x8_t* v0) { *r = vreinterpret_s8_u8(*v0); }\nvoid VreinterpretS8U16(int8x8_t* r, uint16x4_t* v0) { *r = vreinterpret_s8_u16(*v0); }\nvoid VreinterpretS8U32(int8x8_t* r, uint32x2_t* v0) { *r = vreinterpret_s8_u32(*v0); }\nvoid VreinterpretS8U64(int8x8_t* r, uint64x1_t* v0) { *r = vreinterpret_s8_u64(*v0); }\nvoid VreinterpretS8F32(int8x8_t* r, float32x2_t* v0) { *r = vreinterpret_s8_f32(*v0); }\nvoid VreinterpretS8F64(int8x8_t* r, float64x1_t* v0) { *r = vreinterpret_s8_f64(*v0); }\nvoid VreinterpretS8P16(int8x8_t* r, poly16x4_t* v0) { *r = vreinterpret_s8_p16(*v0); }\nvoid VreinterpretS8P64(int8x8_t* r, poly64x1_t* v0) { *r = vreinterpret_s8_p64(*v0); }\nvoid VreinterpretS8P8(int8x8_t* r, poly8x8_t* v0) { *r = vreinterpret_s8_p8(*v0); }\nvoid VreinterpretU16S8(uint16x4_t* r, int8x8_t* v0) { *r = vreinterpret_u16_s8(*v0); }\nvoid VreinterpretU16S16(uint16x4_t* r, int16x4_t* v0) { *r = vreinterpret_u16_s16(*v0); }\nvoid VreinterpretU16S32(uint16x4_t* r, int32x2_t* v0) { *r = vreinterpret_u16_s32(*v0); }\nvoid VreinterpretU16S64(uint16x4_t* r, int64x1_t* v0) { *r = vreinterpret_u16_s64(*v0); 
}\nvoid VreinterpretU16U8(uint16x4_t* r, uint8x8_t* v0) { *r = vreinterpret_u16_u8(*v0); }\nvoid VreinterpretU16U32(uint16x4_t* r, uint32x2_t* v0) { *r = vreinterpret_u16_u32(*v0); }\nvoid VreinterpretU16U64(uint16x4_t* r, uint64x1_t* v0) { *r = vreinterpret_u16_u64(*v0); }\nvoid VreinterpretU16F32(uint16x4_t* r, float32x2_t* v0) { *r = vreinterpret_u16_f32(*v0); }\nvoid VreinterpretU16F64(uint16x4_t* r, float64x1_t* v0) { *r = vreinterpret_u16_f64(*v0); }\nvoid VreinterpretU16P16(uint16x4_t* r, poly16x4_t* v0) { *r = vreinterpret_u16_p16(*v0); }\nvoid VreinterpretU16P64(uint16x4_t* r, poly64x1_t* v0) { *r = vreinterpret_u16_p64(*v0); }\nvoid VreinterpretU16P8(uint16x4_t* r, poly8x8_t* v0) { *r = vreinterpret_u16_p8(*v0); }\nvoid VreinterpretU32S8(uint32x2_t* r, int8x8_t* v0) { *r = vreinterpret_u32_s8(*v0); }\nvoid VreinterpretU32S16(uint32x2_t* r, int16x4_t* v0) { *r = vreinterpret_u32_s16(*v0); }\nvoid VreinterpretU32S32(uint32x2_t* r, int32x2_t* v0) { *r = vreinterpret_u32_s32(*v0); }\nvoid VreinterpretU32S64(uint32x2_t* r, int64x1_t* v0) { *r = vreinterpret_u32_s64(*v0); }\nvoid VreinterpretU32U8(uint32x2_t* r, uint8x8_t* v0) { *r = vreinterpret_u32_u8(*v0); }\nvoid VreinterpretU32U16(uint32x2_t* r, uint16x4_t* v0) { *r = vreinterpret_u32_u16(*v0); }\nvoid VreinterpretU32U64(uint32x2_t* r, uint64x1_t* v0) { *r = vreinterpret_u32_u64(*v0); }\nvoid VreinterpretU32F32(uint32x2_t* r, float32x2_t* v0) { *r = vreinterpret_u32_f32(*v0); }\nvoid VreinterpretU32F64(uint32x2_t* r, float64x1_t* v0) { *r = vreinterpret_u32_f64(*v0); }\nvoid VreinterpretU32P16(uint32x2_t* r, poly16x4_t* v0) { *r = vreinterpret_u32_p16(*v0); }\nvoid VreinterpretU32P64(uint32x2_t* r, poly64x1_t* v0) { *r = vreinterpret_u32_p64(*v0); }\nvoid VreinterpretU32P8(uint32x2_t* r, poly8x8_t* v0) { *r = vreinterpret_u32_p8(*v0); }\nvoid VreinterpretU64S8(uint64x1_t* r, int8x8_t* v0) { *r = vreinterpret_u64_s8(*v0); }\nvoid VreinterpretU64S16(uint64x1_t* r, int16x4_t* v0) { *r = 
vreinterpret_u64_s16(*v0); }\nvoid VreinterpretU64S32(uint64x1_t* r, int32x2_t* v0) { *r = vreinterpret_u64_s32(*v0); }\nvoid VreinterpretU64S64(uint64x1_t* r, int64x1_t* v0) { *r = vreinterpret_u64_s64(*v0); }\nvoid VreinterpretU64U8(uint64x1_t* r, uint8x8_t* v0) { *r = vreinterpret_u64_u8(*v0); }\nvoid VreinterpretU64U16(uint64x1_t* r, uint16x4_t* v0) { *r = vreinterpret_u64_u16(*v0); }\nvoid VreinterpretU64U32(uint64x1_t* r, uint32x2_t* v0) { *r = vreinterpret_u64_u32(*v0); }\nvoid VreinterpretU64F32(uint64x1_t* r, float32x2_t* v0) { *r = vreinterpret_u64_f32(*v0); }\nvoid VreinterpretU64F64(uint64x1_t* r, float64x1_t* v0) { *r = vreinterpret_u64_f64(*v0); }\nvoid VreinterpretU64P16(uint64x1_t* r, poly16x4_t* v0) { *r = vreinterpret_u64_p16(*v0); }\nvoid VreinterpretU64P64(uint64x1_t* r, poly64x1_t* v0) { *r = vreinterpret_u64_p64(*v0); }\nvoid VreinterpretU64P8(uint64x1_t* r, poly8x8_t* v0) { *r = vreinterpret_u64_p8(*v0); }\nvoid VreinterpretU8S8(uint8x8_t* r, int8x8_t* v0) { *r = vreinterpret_u8_s8(*v0); }\nvoid VreinterpretU8S16(uint8x8_t* r, int16x4_t* v0) { *r = vreinterpret_u8_s16(*v0); }\nvoid VreinterpretU8S32(uint8x8_t* r, int32x2_t* v0) { *r = vreinterpret_u8_s32(*v0); }\nvoid VreinterpretU8S64(uint8x8_t* r, int64x1_t* v0) { *r = vreinterpret_u8_s64(*v0); }\nvoid VreinterpretU8U16(uint8x8_t* r, uint16x4_t* v0) { *r = vreinterpret_u8_u16(*v0); }\nvoid VreinterpretU8U32(uint8x8_t* r, uint32x2_t* v0) { *r = vreinterpret_u8_u32(*v0); }\nvoid VreinterpretU8U64(uint8x8_t* r, uint64x1_t* v0) { *r = vreinterpret_u8_u64(*v0); }\nvoid VreinterpretU8F32(uint8x8_t* r, float32x2_t* v0) { *r = vreinterpret_u8_f32(*v0); }\nvoid VreinterpretU8F64(uint8x8_t* r, float64x1_t* v0) { *r = vreinterpret_u8_f64(*v0); }\nvoid VreinterpretU8P16(uint8x8_t* r, poly16x4_t* v0) { *r = vreinterpret_u8_p16(*v0); }\nvoid VreinterpretU8P64(uint8x8_t* r, poly64x1_t* v0) { *r = vreinterpret_u8_p64(*v0); }\nvoid VreinterpretU8P8(uint8x8_t* r, poly8x8_t* v0) { *r = 
vreinterpret_u8_p8(*v0); }\nvoid VreinterpretqF32S8(float32x4_t* r, int8x16_t* v0) { *r = vreinterpretq_f32_s8(*v0); }\nvoid VreinterpretqF32S16(float32x4_t* r, int16x8_t* v0) { *r = vreinterpretq_f32_s16(*v0); }\nvoid VreinterpretqF32S32(float32x4_t* r, int32x4_t* v0) { *r = vreinterpretq_f32_s32(*v0); }\nvoid VreinterpretqF32S64(float32x4_t* r, int64x2_t* v0) { *r = vreinterpretq_f32_s64(*v0); }\nvoid VreinterpretqF32U8(float32x4_t* r, uint8x16_t* v0) { *r = vreinterpretq_f32_u8(*v0); }\nvoid VreinterpretqF32U16(float32x4_t* r, uint16x8_t* v0) { *r = vreinterpretq_f32_u16(*v0); }\nvoid VreinterpretqF32U32(float32x4_t* r, uint32x4_t* v0) { *r = vreinterpretq_f32_u32(*v0); }\nvoid VreinterpretqF32U64(float32x4_t* r, uint64x2_t* v0) { *r = vreinterpretq_f32_u64(*v0); }\nvoid VreinterpretqF32F64(float32x4_t* r, float64x2_t* v0) { *r = vreinterpretq_f32_f64(*v0); }\nvoid VreinterpretqF32P128(float32x4_t* r, poly128_t* v0) { *r = vreinterpretq_f32_p128(*v0); }\nvoid VreinterpretqF32P16(float32x4_t* r, poly16x8_t* v0) { *r = vreinterpretq_f32_p16(*v0); }\nvoid VreinterpretqF32P64(float32x4_t* r, poly64x2_t* v0) { *r = vreinterpretq_f32_p64(*v0); }\nvoid VreinterpretqF32P8(float32x4_t* r, poly8x16_t* v0) { *r = vreinterpretq_f32_p8(*v0); }\nvoid VreinterpretqF64S8(float64x2_t* r, int8x16_t* v0) { *r = vreinterpretq_f64_s8(*v0); }\nvoid VreinterpretqF64S16(float64x2_t* r, int16x8_t* v0) { *r = vreinterpretq_f64_s16(*v0); }\nvoid VreinterpretqF64S32(float64x2_t* r, int32x4_t* v0) { *r = vreinterpretq_f64_s32(*v0); }\nvoid VreinterpretqF64S64(float64x2_t* r, int64x2_t* v0) { *r = vreinterpretq_f64_s64(*v0); }\nvoid VreinterpretqF64U8(float64x2_t* r, uint8x16_t* v0) { *r = vreinterpretq_f64_u8(*v0); }\nvoid VreinterpretqF64U16(float64x2_t* r, uint16x8_t* v0) { *r = vreinterpretq_f64_u16(*v0); }\nvoid VreinterpretqF64U32(float64x2_t* r, uint32x4_t* v0) { *r = vreinterpretq_f64_u32(*v0); }\nvoid VreinterpretqF64U64(float64x2_t* r, uint64x2_t* v0) { *r = 
vreinterpretq_f64_u64(*v0); }\nvoid VreinterpretqF64F32(float64x2_t* r, float32x4_t* v0) { *r = vreinterpretq_f64_f32(*v0); }\nvoid VreinterpretqF64P128(float64x2_t* r, poly128_t* v0) { *r = vreinterpretq_f64_p128(*v0); }\nvoid VreinterpretqF64P16(float64x2_t* r, poly16x8_t* v0) { *r = vreinterpretq_f64_p16(*v0); }\nvoid VreinterpretqF64P64(float64x2_t* r, poly64x2_t* v0) { *r = vreinterpretq_f64_p64(*v0); }\nvoid VreinterpretqF64P8(float64x2_t* r, poly8x16_t* v0) { *r = vreinterpretq_f64_p8(*v0); }\nvoid VreinterpretqP128S8(poly128_t* r, int8x16_t* v0) { *r = vreinterpretq_p128_s8(*v0); }\nvoid VreinterpretqP128S16(poly128_t* r, int16x8_t* v0) { *r = vreinterpretq_p128_s16(*v0); }\nvoid VreinterpretqP128S32(poly128_t* r, int32x4_t* v0) { *r = vreinterpretq_p128_s32(*v0); }\nvoid VreinterpretqP128S64(poly128_t* r, int64x2_t* v0) { *r = vreinterpretq_p128_s64(*v0); }\nvoid VreinterpretqP128U8(poly128_t* r, uint8x16_t* v0) { *r = vreinterpretq_p128_u8(*v0); }\nvoid VreinterpretqP128U16(poly128_t* r, uint16x8_t* v0) { *r = vreinterpretq_p128_u16(*v0); }\nvoid VreinterpretqP128U32(poly128_t* r, uint32x4_t* v0) { *r = vreinterpretq_p128_u32(*v0); }\nvoid VreinterpretqP128U64(poly128_t* r, uint64x2_t* v0) { *r = vreinterpretq_p128_u64(*v0); }\nvoid VreinterpretqP128F32(poly128_t* r, float32x4_t* v0) { *r = vreinterpretq_p128_f32(*v0); }\nvoid VreinterpretqP128F64(poly128_t* r, float64x2_t* v0) { *r = vreinterpretq_p128_f64(*v0); }\nvoid VreinterpretqP128P16(poly128_t* r, poly16x8_t* v0) { *r = vreinterpretq_p128_p16(*v0); }\nvoid VreinterpretqP128P64(poly128_t* r, poly64x2_t* v0) { *r = vreinterpretq_p128_p64(*v0); }\nvoid VreinterpretqP128P8(poly128_t* r, poly8x16_t* v0) { *r = vreinterpretq_p128_p8(*v0); }\nvoid VreinterpretqP16S8(poly16x8_t* r, int8x16_t* v0) { *r = vreinterpretq_p16_s8(*v0); }\nvoid VreinterpretqP16S16(poly16x8_t* r, int16x8_t* v0) { *r = vreinterpretq_p16_s16(*v0); }\nvoid VreinterpretqP16S32(poly16x8_t* r, int32x4_t* v0) { *r = 
vreinterpretq_p16_s32(*v0); }\nvoid VreinterpretqP16S64(poly16x8_t* r, int64x2_t* v0) { *r = vreinterpretq_p16_s64(*v0); }\nvoid VreinterpretqP16U8(poly16x8_t* r, uint8x16_t* v0) { *r = vreinterpretq_p16_u8(*v0); }\nvoid VreinterpretqP16U16(poly16x8_t* r, uint16x8_t* v0) { *r = vreinterpretq_p16_u16(*v0); }\nvoid VreinterpretqP16U32(poly16x8_t* r, uint32x4_t* v0) { *r = vreinterpretq_p16_u32(*v0); }\nvoid VreinterpretqP16U64(poly16x8_t* r, uint64x2_t* v0) { *r = vreinterpretq_p16_u64(*v0); }\nvoid VreinterpretqP16F32(poly16x8_t* r, float32x4_t* v0) { *r = vreinterpretq_p16_f32(*v0); }\nvoid VreinterpretqP16F64(poly16x8_t* r, float64x2_t* v0) { *r = vreinterpretq_p16_f64(*v0); }\nvoid VreinterpretqP16P128(poly16x8_t* r, poly128_t* v0) { *r = vreinterpretq_p16_p128(*v0); }\nvoid VreinterpretqP16P64(poly16x8_t* r, poly64x2_t* v0) { *r = vreinterpretq_p16_p64(*v0); }\nvoid VreinterpretqP16P8(poly16x8_t* r, poly8x16_t* v0) { *r = vreinterpretq_p16_p8(*v0); }\nvoid VreinterpretqP64S8(poly64x2_t* r, int8x16_t* v0) { *r = vreinterpretq_p64_s8(*v0); }\nvoid VreinterpretqP64S16(poly64x2_t* r, int16x8_t* v0) { *r = vreinterpretq_p64_s16(*v0); }\nvoid VreinterpretqP64S32(poly64x2_t* r, int32x4_t* v0) { *r = vreinterpretq_p64_s32(*v0); }\nvoid VreinterpretqP64S64(poly64x2_t* r, int64x2_t* v0) { *r = vreinterpretq_p64_s64(*v0); }\nvoid VreinterpretqP64U8(poly64x2_t* r, uint8x16_t* v0) { *r = vreinterpretq_p64_u8(*v0); }\nvoid VreinterpretqP64U16(poly64x2_t* r, uint16x8_t* v0) { *r = vreinterpretq_p64_u16(*v0); }\nvoid VreinterpretqP64U32(poly64x2_t* r, uint32x4_t* v0) { *r = vreinterpretq_p64_u32(*v0); }\nvoid VreinterpretqP64U64(poly64x2_t* r, uint64x2_t* v0) { *r = vreinterpretq_p64_u64(*v0); }\nvoid VreinterpretqP64F32(poly64x2_t* r, float32x4_t* v0) { *r = vreinterpretq_p64_f32(*v0); }\nvoid VreinterpretqP64F64(poly64x2_t* r, float64x2_t* v0) { *r = vreinterpretq_p64_f64(*v0); }\nvoid VreinterpretqP64P128(poly64x2_t* r, poly128_t* v0) { *r = vreinterpretq_p64_p128(*v0); 
}\nvoid VreinterpretqP64P16(poly64x2_t* r, poly16x8_t* v0) { *r = vreinterpretq_p64_p16(*v0); }\nvoid VreinterpretqP64P8(poly64x2_t* r, poly8x16_t* v0) { *r = vreinterpretq_p64_p8(*v0); }\nvoid VreinterpretqP8S8(poly8x16_t* r, int8x16_t* v0) { *r = vreinterpretq_p8_s8(*v0); }\nvoid VreinterpretqP8S16(poly8x16_t* r, int16x8_t* v0) { *r = vreinterpretq_p8_s16(*v0); }\nvoid VreinterpretqP8S32(poly8x16_t* r, int32x4_t* v0) { *r = vreinterpretq_p8_s32(*v0); }\nvoid VreinterpretqP8S64(poly8x16_t* r, int64x2_t* v0) { *r = vreinterpretq_p8_s64(*v0); }\nvoid VreinterpretqP8U8(poly8x16_t* r, uint8x16_t* v0) { *r = vreinterpretq_p8_u8(*v0); }\nvoid VreinterpretqP8U16(poly8x16_t* r, uint16x8_t* v0) { *r = vreinterpretq_p8_u16(*v0); }\nvoid VreinterpretqP8U32(poly8x16_t* r, uint32x4_t* v0) { *r = vreinterpretq_p8_u32(*v0); }\nvoid VreinterpretqP8U64(poly8x16_t* r, uint64x2_t* v0) { *r = vreinterpretq_p8_u64(*v0); }\nvoid VreinterpretqP8F32(poly8x16_t* r, float32x4_t* v0) { *r = vreinterpretq_p8_f32(*v0); }\nvoid VreinterpretqP8F64(poly8x16_t* r, float64x2_t* v0) { *r = vreinterpretq_p8_f64(*v0); }\nvoid VreinterpretqP8P128(poly8x16_t* r, poly128_t* v0) { *r = vreinterpretq_p8_p128(*v0); }\nvoid VreinterpretqP8P16(poly8x16_t* r, poly16x8_t* v0) { *r = vreinterpretq_p8_p16(*v0); }\nvoid VreinterpretqP8P64(poly8x16_t* r, poly64x2_t* v0) { *r = vreinterpretq_p8_p64(*v0); }\nvoid VreinterpretqS16S8(int16x8_t* r, int8x16_t* v0) { *r = vreinterpretq_s16_s8(*v0); }\nvoid VreinterpretqS16S32(int16x8_t* r, int32x4_t* v0) { *r = vreinterpretq_s16_s32(*v0); }\nvoid VreinterpretqS16S64(int16x8_t* r, int64x2_t* v0) { *r = vreinterpretq_s16_s64(*v0); }\nvoid VreinterpretqS16U8(int16x8_t* r, uint8x16_t* v0) { *r = vreinterpretq_s16_u8(*v0); }\nvoid VreinterpretqS16U16(int16x8_t* r, uint16x8_t* v0) { *r = vreinterpretq_s16_u16(*v0); }\nvoid VreinterpretqS16U32(int16x8_t* r, uint32x4_t* v0) { *r = vreinterpretq_s16_u32(*v0); }\nvoid VreinterpretqS16U64(int16x8_t* r, uint64x2_t* v0) { *r = 
vreinterpretq_s16_u64(*v0); }\nvoid VreinterpretqS16F32(int16x8_t* r, float32x4_t* v0) { *r = vreinterpretq_s16_f32(*v0); }\nvoid VreinterpretqS16F64(int16x8_t* r, float64x2_t* v0) { *r = vreinterpretq_s16_f64(*v0); }\nvoid VreinterpretqS16P128(int16x8_t* r, poly128_t* v0) { *r = vreinterpretq_s16_p128(*v0); }\nvoid VreinterpretqS16P16(int16x8_t* r, poly16x8_t* v0) { *r = vreinterpretq_s16_p16(*v0); }\nvoid VreinterpretqS16P64(int16x8_t* r, poly64x2_t* v0) { *r = vreinterpretq_s16_p64(*v0); }\nvoid VreinterpretqS16P8(int16x8_t* r, poly8x16_t* v0) { *r = vreinterpretq_s16_p8(*v0); }\nvoid VreinterpretqS32S8(int32x4_t* r, int8x16_t* v0) { *r = vreinterpretq_s32_s8(*v0); }\nvoid VreinterpretqS32S16(int32x4_t* r, int16x8_t* v0) { *r = vreinterpretq_s32_s16(*v0); }\nvoid VreinterpretqS32S64(int32x4_t* r, int64x2_t* v0) { *r = vreinterpretq_s32_s64(*v0); }\nvoid VreinterpretqS32U8(int32x4_t* r, uint8x16_t* v0) { *r = vreinterpretq_s32_u8(*v0); }\nvoid VreinterpretqS32U16(int32x4_t* r, uint16x8_t* v0) { *r = vreinterpretq_s32_u16(*v0); }\nvoid VreinterpretqS32U32(int32x4_t* r, uint32x4_t* v0) { *r = vreinterpretq_s32_u32(*v0); }\nvoid VreinterpretqS32U64(int32x4_t* r, uint64x2_t* v0) { *r = vreinterpretq_s32_u64(*v0); }\nvoid VreinterpretqS32F32(int32x4_t* r, float32x4_t* v0) { *r = vreinterpretq_s32_f32(*v0); }\nvoid VreinterpretqS32F64(int32x4_t* r, float64x2_t* v0) { *r = vreinterpretq_s32_f64(*v0); }\nvoid VreinterpretqS32P128(int32x4_t* r, poly128_t* v0) { *r = vreinterpretq_s32_p128(*v0); }\nvoid VreinterpretqS32P16(int32x4_t* r, poly16x8_t* v0) { *r = vreinterpretq_s32_p16(*v0); }\nvoid VreinterpretqS32P64(int32x4_t* r, poly64x2_t* v0) { *r = vreinterpretq_s32_p64(*v0); }\nvoid VreinterpretqS32P8(int32x4_t* r, poly8x16_t* v0) { *r = vreinterpretq_s32_p8(*v0); }\nvoid VreinterpretqS64S8(int64x2_t* r, int8x16_t* v0) { *r = vreinterpretq_s64_s8(*v0); }\nvoid VreinterpretqS64S16(int64x2_t* r, int16x8_t* v0) { *r = vreinterpretq_s64_s16(*v0); }\nvoid 
VreinterpretqS64S32(int64x2_t* r, int32x4_t* v0) { *r = vreinterpretq_s64_s32(*v0); }\nvoid VreinterpretqS64U8(int64x2_t* r, uint8x16_t* v0) { *r = vreinterpretq_s64_u8(*v0); }\nvoid VreinterpretqS64U16(int64x2_t* r, uint16x8_t* v0) { *r = vreinterpretq_s64_u16(*v0); }\nvoid VreinterpretqS64U32(int64x2_t* r, uint32x4_t* v0) { *r = vreinterpretq_s64_u32(*v0); }\nvoid VreinterpretqS64U64(int64x2_t* r, uint64x2_t* v0) { *r = vreinterpretq_s64_u64(*v0); }\nvoid VreinterpretqS64F32(int64x2_t* r, float32x4_t* v0) { *r = vreinterpretq_s64_f32(*v0); }\nvoid VreinterpretqS64F64(int64x2_t* r, float64x2_t* v0) { *r = vreinterpretq_s64_f64(*v0); }\nvoid VreinterpretqS64P128(int64x2_t* r, poly128_t* v0) { *r = vreinterpretq_s64_p128(*v0); }\nvoid VreinterpretqS64P16(int64x2_t* r, poly16x8_t* v0) { *r = vreinterpretq_s64_p16(*v0); }\nvoid VreinterpretqS64P64(int64x2_t* r, poly64x2_t* v0) { *r = vreinterpretq_s64_p64(*v0); }\nvoid VreinterpretqS64P8(int64x2_t* r, poly8x16_t* v0) { *r = vreinterpretq_s64_p8(*v0); }\nvoid VreinterpretqS8S16(int8x16_t* r, int16x8_t* v0) { *r = vreinterpretq_s8_s16(*v0); }\nvoid VreinterpretqS8S32(int8x16_t* r, int32x4_t* v0) { *r = vreinterpretq_s8_s32(*v0); }\nvoid VreinterpretqS8S64(int8x16_t* r, int64x2_t* v0) { *r = vreinterpretq_s8_s64(*v0); }\nvoid VreinterpretqS8U8(int8x16_t* r, uint8x16_t* v0) { *r = vreinterpretq_s8_u8(*v0); }\nvoid VreinterpretqS8U16(int8x16_t* r, uint16x8_t* v0) { *r = vreinterpretq_s8_u16(*v0); }\nvoid VreinterpretqS8U32(int8x16_t* r, uint32x4_t* v0) { *r = vreinterpretq_s8_u32(*v0); }\nvoid VreinterpretqS8U64(int8x16_t* r, uint64x2_t* v0) { *r = vreinterpretq_s8_u64(*v0); }\nvoid VreinterpretqS8F32(int8x16_t* r, float32x4_t* v0) { *r = vreinterpretq_s8_f32(*v0); }\nvoid VreinterpretqS8F64(int8x16_t* r, float64x2_t* v0) { *r = vreinterpretq_s8_f64(*v0); }\nvoid VreinterpretqS8P128(int8x16_t* r, poly128_t* v0) { *r = vreinterpretq_s8_p128(*v0); }\nvoid VreinterpretqS8P16(int8x16_t* r, poly16x8_t* v0) { *r = 
vreinterpretq_s8_p16(*v0); }\nvoid VreinterpretqS8P64(int8x16_t* r, poly64x2_t* v0) { *r = vreinterpretq_s8_p64(*v0); }\nvoid VreinterpretqS8P8(int8x16_t* r, poly8x16_t* v0) { *r = vreinterpretq_s8_p8(*v0); }\nvoid VreinterpretqU16S8(uint16x8_t* r, int8x16_t* v0) { *r = vreinterpretq_u16_s8(*v0); }\nvoid VreinterpretqU16S16(uint16x8_t* r, int16x8_t* v0) { *r = vreinterpretq_u16_s16(*v0); }\nvoid VreinterpretqU16S32(uint16x8_t* r, int32x4_t* v0) { *r = vreinterpretq_u16_s32(*v0); }\nvoid VreinterpretqU16S64(uint16x8_t* r, int64x2_t* v0) { *r = vreinterpretq_u16_s64(*v0); }\nvoid VreinterpretqU16U8(uint16x8_t* r, uint8x16_t* v0) { *r = vreinterpretq_u16_u8(*v0); }\nvoid VreinterpretqU16U32(uint16x8_t* r, uint32x4_t* v0) { *r = vreinterpretq_u16_u32(*v0); }\nvoid VreinterpretqU16U64(uint16x8_t* r, uint64x2_t* v0) { *r = vreinterpretq_u16_u64(*v0); }\nvoid VreinterpretqU16F32(uint16x8_t* r, float32x4_t* v0) { *r = vreinterpretq_u16_f32(*v0); }\nvoid VreinterpretqU16F64(uint16x8_t* r, float64x2_t* v0) { *r = vreinterpretq_u16_f64(*v0); }\nvoid VreinterpretqU16P128(uint16x8_t* r, poly128_t* v0) { *r = vreinterpretq_u16_p128(*v0); }\nvoid VreinterpretqU16P16(uint16x8_t* r, poly16x8_t* v0) { *r = vreinterpretq_u16_p16(*v0); }\nvoid VreinterpretqU16P64(uint16x8_t* r, poly64x2_t* v0) { *r = vreinterpretq_u16_p64(*v0); }\nvoid VreinterpretqU16P8(uint16x8_t* r, poly8x16_t* v0) { *r = vreinterpretq_u16_p8(*v0); }\nvoid VreinterpretqU32S8(uint32x4_t* r, int8x16_t* v0) { *r = vreinterpretq_u32_s8(*v0); }\nvoid VreinterpretqU32S16(uint32x4_t* r, int16x8_t* v0) { *r = vreinterpretq_u32_s16(*v0); }\nvoid VreinterpretqU32S32(uint32x4_t* r, int32x4_t* v0) { *r = vreinterpretq_u32_s32(*v0); }\nvoid VreinterpretqU32S64(uint32x4_t* r, int64x2_t* v0) { *r = vreinterpretq_u32_s64(*v0); }\nvoid VreinterpretqU32U8(uint32x4_t* r, uint8x16_t* v0) { *r = vreinterpretq_u32_u8(*v0); }\nvoid VreinterpretqU32U16(uint32x4_t* r, uint16x8_t* v0) { *r = vreinterpretq_u32_u16(*v0); }\nvoid 
VreinterpretqU32U64(uint32x4_t* r, uint64x2_t* v0) { *r = vreinterpretq_u32_u64(*v0); }\nvoid VreinterpretqU32F32(uint32x4_t* r, float32x4_t* v0) { *r = vreinterpretq_u32_f32(*v0); }\nvoid VreinterpretqU32F64(uint32x4_t* r, float64x2_t* v0) { *r = vreinterpretq_u32_f64(*v0); }\nvoid VreinterpretqU32P128(uint32x4_t* r, poly128_t* v0) { *r = vreinterpretq_u32_p128(*v0); }\nvoid VreinterpretqU32P16(uint32x4_t* r, poly16x8_t* v0) { *r = vreinterpretq_u32_p16(*v0); }\nvoid VreinterpretqU32P64(uint32x4_t* r, poly64x2_t* v0) { *r = vreinterpretq_u32_p64(*v0); }\nvoid VreinterpretqU32P8(uint32x4_t* r, poly8x16_t* v0) { *r = vreinterpretq_u32_p8(*v0); }\nvoid VreinterpretqU64S8(uint64x2_t* r, int8x16_t* v0) { *r = vreinterpretq_u64_s8(*v0); }\nvoid VreinterpretqU64S16(uint64x2_t* r, int16x8_t* v0) { *r = vreinterpretq_u64_s16(*v0); }\nvoid VreinterpretqU64S32(uint64x2_t* r, int32x4_t* v0) { *r = vreinterpretq_u64_s32(*v0); }\nvoid VreinterpretqU64S64(uint64x2_t* r, int64x2_t* v0) { *r = vreinterpretq_u64_s64(*v0); }\nvoid VreinterpretqU64U8(uint64x2_t* r, uint8x16_t* v0) { *r = vreinterpretq_u64_u8(*v0); }\nvoid VreinterpretqU64U16(uint64x2_t* r, uint16x8_t* v0) { *r = vreinterpretq_u64_u16(*v0); }\nvoid VreinterpretqU64U32(uint64x2_t* r, uint32x4_t* v0) { *r = vreinterpretq_u64_u32(*v0); }\nvoid VreinterpretqU64F32(uint64x2_t* r, float32x4_t* v0) { *r = vreinterpretq_u64_f32(*v0); }\nvoid VreinterpretqU64F64(uint64x2_t* r, float64x2_t* v0) { *r = vreinterpretq_u64_f64(*v0); }\nvoid VreinterpretqU64P128(uint64x2_t* r, poly128_t* v0) { *r = vreinterpretq_u64_p128(*v0); }\nvoid VreinterpretqU64P16(uint64x2_t* r, poly16x8_t* v0) { *r = vreinterpretq_u64_p16(*v0); }\nvoid VreinterpretqU64P64(uint64x2_t* r, poly64x2_t* v0) { *r = vreinterpretq_u64_p64(*v0); }\nvoid VreinterpretqU64P8(uint64x2_t* r, poly8x16_t* v0) { *r = vreinterpretq_u64_p8(*v0); }\nvoid VreinterpretqU8S8(uint8x16_t* r, int8x16_t* v0) { *r = vreinterpretq_u8_s8(*v0); }\nvoid VreinterpretqU8S16(uint8x16_t* r, 
int16x8_t* v0) { *r = vreinterpretq_u8_s16(*v0); }\nvoid VreinterpretqU8S32(uint8x16_t* r, int32x4_t* v0) { *r = vreinterpretq_u8_s32(*v0); }\nvoid VreinterpretqU8S64(uint8x16_t* r, int64x2_t* v0) { *r = vreinterpretq_u8_s64(*v0); }\nvoid VreinterpretqU8U16(uint8x16_t* r, uint16x8_t* v0) { *r = vreinterpretq_u8_u16(*v0); }\nvoid VreinterpretqU8U32(uint8x16_t* r, uint32x4_t* v0) { *r = vreinterpretq_u8_u32(*v0); }\nvoid VreinterpretqU8U64(uint8x16_t* r, uint64x2_t* v0) { *r = vreinterpretq_u8_u64(*v0); }\nvoid VreinterpretqU8F32(uint8x16_t* r, float32x4_t* v0) { *r = vreinterpretq_u8_f32(*v0); }\nvoid VreinterpretqU8F64(uint8x16_t* r, float64x2_t* v0) { *r = vreinterpretq_u8_f64(*v0); }\nvoid VreinterpretqU8P128(uint8x16_t* r, poly128_t* v0) { *r = vreinterpretq_u8_p128(*v0); }\nvoid VreinterpretqU8P16(uint8x16_t* r, poly16x8_t* v0) { *r = vreinterpretq_u8_p16(*v0); }\nvoid VreinterpretqU8P64(uint8x16_t* r, poly64x2_t* v0) { *r = vreinterpretq_u8_p64(*v0); }\nvoid VreinterpretqU8P8(uint8x16_t* r, poly8x16_t* v0) { *r = vreinterpretq_u8_p8(*v0); }\nvoid Vrev16S8(int8x8_t* r, int8x8_t* v0) { *r = vrev16_s8(*v0); }\nvoid Vrev16U8(uint8x8_t* r, uint8x8_t* v0) { *r = vrev16_u8(*v0); }\nvoid Vrev16P8(poly8x8_t* r, poly8x8_t* v0) { *r = vrev16_p8(*v0); }\nvoid Vrev16QS8(int8x16_t* r, int8x16_t* v0) { *r = vrev16q_s8(*v0); }\nvoid Vrev16QU8(uint8x16_t* r, uint8x16_t* v0) { *r = vrev16q_u8(*v0); }\nvoid Vrev16QP8(poly8x16_t* r, poly8x16_t* v0) { *r = vrev16q_p8(*v0); }\nvoid Vrev32S8(int8x8_t* r, int8x8_t* v0) { *r = vrev32_s8(*v0); }\nvoid Vrev32S16(int16x4_t* r, int16x4_t* v0) { *r = vrev32_s16(*v0); }\nvoid Vrev32U8(uint8x8_t* r, uint8x8_t* v0) { *r = vrev32_u8(*v0); }\nvoid Vrev32U16(uint16x4_t* r, uint16x4_t* v0) { *r = vrev32_u16(*v0); }\nvoid Vrev32P16(poly16x4_t* r, poly16x4_t* v0) { *r = vrev32_p16(*v0); }\nvoid Vrev32P8(poly8x8_t* r, poly8x8_t* v0) { *r = vrev32_p8(*v0); }\nvoid Vrev32QS8(int8x16_t* r, int8x16_t* v0) { *r = vrev32q_s8(*v0); }\nvoid 
Vrev32QS16(int16x8_t* r, int16x8_t* v0) { *r = vrev32q_s16(*v0); }\nvoid Vrev32QU8(uint8x16_t* r, uint8x16_t* v0) { *r = vrev32q_u8(*v0); }\nvoid Vrev32QU16(uint16x8_t* r, uint16x8_t* v0) { *r = vrev32q_u16(*v0); }\nvoid Vrev32QP16(poly16x8_t* r, poly16x8_t* v0) { *r = vrev32q_p16(*v0); }\nvoid Vrev32QP8(poly8x16_t* r, poly8x16_t* v0) { *r = vrev32q_p8(*v0); }\nvoid Vrev64S8(int8x8_t* r, int8x8_t* v0) { *r = vrev64_s8(*v0); }\nvoid Vrev64S16(int16x4_t* r, int16x4_t* v0) { *r = vrev64_s16(*v0); }\nvoid Vrev64S32(int32x2_t* r, int32x2_t* v0) { *r = vrev64_s32(*v0); }\nvoid Vrev64U8(uint8x8_t* r, uint8x8_t* v0) { *r = vrev64_u8(*v0); }\nvoid Vrev64U16(uint16x4_t* r, uint16x4_t* v0) { *r = vrev64_u16(*v0); }\nvoid Vrev64U32(uint32x2_t* r, uint32x2_t* v0) { *r = vrev64_u32(*v0); }\nvoid Vrev64F32(float32x2_t* r, float32x2_t* v0) { *r = vrev64_f32(*v0); }\nvoid Vrev64P16(poly16x4_t* r, poly16x4_t* v0) { *r = vrev64_p16(*v0); }\nvoid Vrev64P8(poly8x8_t* r, poly8x8_t* v0) { *r = vrev64_p8(*v0); }\nvoid Vrev64QS8(int8x16_t* r, int8x16_t* v0) { *r = vrev64q_s8(*v0); }\nvoid Vrev64QS16(int16x8_t* r, int16x8_t* v0) { *r = vrev64q_s16(*v0); }\nvoid Vrev64QS32(int32x4_t* r, int32x4_t* v0) { *r = vrev64q_s32(*v0); }\nvoid Vrev64QU8(uint8x16_t* r, uint8x16_t* v0) { *r = vrev64q_u8(*v0); }\nvoid Vrev64QU16(uint16x8_t* r, uint16x8_t* v0) { *r = vrev64q_u16(*v0); }\nvoid Vrev64QU32(uint32x4_t* r, uint32x4_t* v0) { *r = vrev64q_u32(*v0); }\nvoid Vrev64QF32(float32x4_t* r, float32x4_t* v0) { *r = vrev64q_f32(*v0); }\nvoid Vrev64QP16(poly16x8_t* r, poly16x8_t* v0) { *r = vrev64q_p16(*v0); }\nvoid Vrev64QP8(poly8x16_t* r, poly8x16_t* v0) { *r = vrev64q_p8(*v0); }\nvoid VrhaddS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vrhadd_s8(*v0, *v1); }\nvoid VrhaddS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vrhadd_s16(*v0, *v1); }\nvoid VrhaddS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vrhadd_s32(*v0, *v1); }\nvoid VrhaddU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { 
*r = vrhadd_u8(*v0, *v1); }\nvoid VrhaddU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vrhadd_u16(*v0, *v1); }\nvoid VrhaddU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vrhadd_u32(*v0, *v1); }\nvoid VrhaddqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vrhaddq_s8(*v0, *v1); }\nvoid VrhaddqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vrhaddq_s16(*v0, *v1); }\nvoid VrhaddqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vrhaddq_s32(*v0, *v1); }\nvoid VrhaddqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vrhaddq_u8(*v0, *v1); }\nvoid VrhaddqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vrhaddq_u16(*v0, *v1); }\nvoid VrhaddqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vrhaddq_u32(*v0, *v1); }\nvoid VrndF32(float32x2_t* r, float32x2_t* v0) { *r = vrnd_f32(*v0); }\nvoid VrndF64(float64x1_t* r, float64x1_t* v0) { *r = vrnd_f64(*v0); }\nvoid Vrnd32XF32(float32x2_t* r, float32x2_t* v0) { *r = vrnd32x_f32(*v0); }\nvoid Vrnd32XF64(float64x1_t* r, float64x1_t* v0) { *r = vrnd32x_f64(*v0); }\nvoid Vrnd32XqF32(float32x4_t* r, float32x4_t* v0) { *r = vrnd32xq_f32(*v0); }\nvoid Vrnd32XqF64(float64x2_t* r, float64x2_t* v0) { *r = vrnd32xq_f64(*v0); }\nvoid Vrnd32ZF32(float32x2_t* r, float32x2_t* v0) { *r = vrnd32z_f32(*v0); }\nvoid Vrnd32ZF64(float64x1_t* r, float64x1_t* v0) { *r = vrnd32z_f64(*v0); }\nvoid Vrnd32ZqF32(float32x4_t* r, float32x4_t* v0) { *r = vrnd32zq_f32(*v0); }\nvoid Vrnd32ZqF64(float64x2_t* r, float64x2_t* v0) { *r = vrnd32zq_f64(*v0); }\nvoid Vrnd64XF32(float32x2_t* r, float32x2_t* v0) { *r = vrnd64x_f32(*v0); }\nvoid Vrnd64XF64(float64x1_t* r, float64x1_t* v0) { *r = vrnd64x_f64(*v0); }\nvoid Vrnd64XqF32(float32x4_t* r, float32x4_t* v0) { *r = vrnd64xq_f32(*v0); }\nvoid Vrnd64XqF64(float64x2_t* r, float64x2_t* v0) { *r = vrnd64xq_f64(*v0); }\nvoid Vrnd64ZF32(float32x2_t* r, float32x2_t* v0) { *r = vrnd64z_f32(*v0); }\nvoid Vrnd64ZF64(float64x1_t* r, float64x1_t* v0) { *r = 
vrnd64z_f64(*v0); }\nvoid Vrnd64ZqF32(float32x4_t* r, float32x4_t* v0) { *r = vrnd64zq_f32(*v0); }\nvoid Vrnd64ZqF64(float64x2_t* r, float64x2_t* v0) { *r = vrnd64zq_f64(*v0); }\nvoid VrndaF32(float32x2_t* r, float32x2_t* v0) { *r = vrnda_f32(*v0); }\nvoid VrndaF64(float64x1_t* r, float64x1_t* v0) { *r = vrnda_f64(*v0); }\nvoid VrndaqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndaq_f32(*v0); }\nvoid VrndaqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndaq_f64(*v0); }\nvoid VrndiF32(float32x2_t* r, float32x2_t* v0) { *r = vrndi_f32(*v0); }\nvoid VrndiF64(float64x1_t* r, float64x1_t* v0) { *r = vrndi_f64(*v0); }\nvoid VrndiqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndiq_f32(*v0); }\nvoid VrndiqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndiq_f64(*v0); }\nvoid VrndmF32(float32x2_t* r, float32x2_t* v0) { *r = vrndm_f32(*v0); }\nvoid VrndmF64(float64x1_t* r, float64x1_t* v0) { *r = vrndm_f64(*v0); }\nvoid VrndmqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndmq_f32(*v0); }\nvoid VrndmqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndmq_f64(*v0); }\nvoid VrndnF32(float32x2_t* r, float32x2_t* v0) { *r = vrndn_f32(*v0); }\nvoid VrndnF64(float64x1_t* r, float64x1_t* v0) { *r = vrndn_f64(*v0); }\nvoid VrndnqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndnq_f32(*v0); }\nvoid VrndnqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndnq_f64(*v0); }\nvoid VrndnsF32(float32_t* r, float32_t* v0) { *r = vrndns_f32(*v0); }\nvoid VrndpF32(float32x2_t* r, float32x2_t* v0) { *r = vrndp_f32(*v0); }\nvoid VrndpF64(float64x1_t* r, float64x1_t* v0) { *r = vrndp_f64(*v0); }\nvoid VrndpqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndpq_f32(*v0); }\nvoid VrndpqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndpq_f64(*v0); }\nvoid VrndqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndq_f32(*v0); }\nvoid VrndqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndq_f64(*v0); }\nvoid VrndxF32(float32x2_t* r, float32x2_t* v0) { *r = vrndx_f32(*v0); }\nvoid VrndxF64(float64x1_t* r, float64x1_t* v0) { 
*r = vrndx_f64(*v0); }\nvoid VrndxqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndxq_f32(*v0); }\nvoid VrndxqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndxq_f64(*v0); }\nvoid VrshlS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vrshl_s8(*v0, *v1); }\nvoid VrshlS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vrshl_s16(*v0, *v1); }\nvoid VrshlS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vrshl_s32(*v0, *v1); }\nvoid VrshlS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vrshl_s64(*v0, *v1); }\nvoid VrshlU8(uint8x8_t* r, uint8x8_t* v0, int8x8_t* v1) { *r = vrshl_u8(*v0, *v1); }\nvoid VrshlU16(uint16x4_t* r, uint16x4_t* v0, int16x4_t* v1) { *r = vrshl_u16(*v0, *v1); }\nvoid VrshlU32(uint32x2_t* r, uint32x2_t* v0, int32x2_t* v1) { *r = vrshl_u32(*v0, *v1); }\nvoid VrshlU64(uint64x1_t* r, uint64x1_t* v0, int64x1_t* v1) { *r = vrshl_u64(*v0, *v1); }\nvoid VrshldS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vrshld_s64(*v0, *v1); }\nvoid VrshldU64(uint64_t* r, uint64_t* v0, int64_t* v1) { *r = vrshld_u64(*v0, *v1); }\nvoid VrshlqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vrshlq_s8(*v0, *v1); }\nvoid VrshlqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vrshlq_s16(*v0, *v1); }\nvoid VrshlqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vrshlq_s32(*v0, *v1); }\nvoid VrshlqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vrshlq_s64(*v0, *v1); }\nvoid VrshlqU8(uint8x16_t* r, uint8x16_t* v0, int8x16_t* v1) { *r = vrshlq_u8(*v0, *v1); }\nvoid VrshlqU16(uint16x8_t* r, uint16x8_t* v0, int16x8_t* v1) { *r = vrshlq_u16(*v0, *v1); }\nvoid VrshlqU32(uint32x4_t* r, uint32x4_t* v0, int32x4_t* v1) { *r = vrshlq_u32(*v0, *v1); }\nvoid VrshlqU64(uint64x2_t* r, uint64x2_t* v0, int64x2_t* v1) { *r = vrshlq_u64(*v0, *v1); }\nvoid VrsqrteU32(uint32x2_t* r, uint32x2_t* v0) { *r = vrsqrte_u32(*v0); }\nvoid VrsqrteF32(float32x2_t* r, float32x2_t* v0) { *r = vrsqrte_f32(*v0); }\nvoid VrsqrteF64(float64x1_t* r, float64x1_t* v0) { *r = 
vrsqrte_f64(*v0); }\nvoid VrsqrtedF64(float64_t* r, float64_t* v0) { *r = vrsqrted_f64(*v0); }\nvoid VrsqrteqU32(uint32x4_t* r, uint32x4_t* v0) { *r = vrsqrteq_u32(*v0); }\nvoid VrsqrteqF32(float32x4_t* r, float32x4_t* v0) { *r = vrsqrteq_f32(*v0); }\nvoid VrsqrteqF64(float64x2_t* r, float64x2_t* v0) { *r = vrsqrteq_f64(*v0); }\nvoid VrsqrtesF32(float32_t* r, float32_t* v0) { *r = vrsqrtes_f32(*v0); }\nvoid VrsqrtsF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vrsqrts_f32(*v0, *v1); }\nvoid VrsqrtsF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vrsqrts_f64(*v0, *v1); }\nvoid VrsqrtsdF64(float64_t* r, float64_t* v0, float64_t* v1) { *r = vrsqrtsd_f64(*v0, *v1); }\nvoid VrsqrtsqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vrsqrtsq_f32(*v0, *v1); }\nvoid VrsqrtsqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vrsqrtsq_f64(*v0, *v1); }\nvoid VrsqrtssF32(float32_t* r, float32_t* v0, float32_t* v1) { *r = vrsqrtss_f32(*v0, *v1); }\nvoid VrsubhnS16(int8x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vrsubhn_s16(*v0, *v1); }\nvoid VrsubhnS32(int16x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vrsubhn_s32(*v0, *v1); }\nvoid VrsubhnS64(int32x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vrsubhn_s64(*v0, *v1); }\nvoid VrsubhnU16(uint8x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vrsubhn_u16(*v0, *v1); }\nvoid VrsubhnU32(uint16x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vrsubhn_u32(*v0, *v1); }\nvoid VrsubhnU64(uint32x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vrsubhn_u64(*v0, *v1); }\nvoid VrsubhnHighS16(int8x16_t* r, int8x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vrsubhn_high_s16(*v0, *v1, *v2); }\nvoid VrsubhnHighS32(int16x8_t* r, int16x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vrsubhn_high_s32(*v0, *v1, *v2); }\nvoid VrsubhnHighS64(int32x4_t* r, int32x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = vrsubhn_high_s64(*v0, *v1, *v2); }\nvoid VrsubhnHighU16(uint8x16_t* r, uint8x8_t* v0, uint16x8_t* v1, 
uint16x8_t* v2) { *r = vrsubhn_high_u16(*v0, *v1, *v2); }\nvoid VrsubhnHighU32(uint16x8_t* r, uint16x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vrsubhn_high_u32(*v0, *v1, *v2); }\nvoid VrsubhnHighU64(uint32x4_t* r, uint32x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vrsubhn_high_u64(*v0, *v1, *v2); }\nvoid Vsha1CqU32(uint32x4_t* r, uint32x4_t* v0, uint32_t* v1, uint32x4_t* v2) { *r = vsha1cq_u32(*v0, *v1, *v2); }\nvoid Vsha1HU32(uint32_t* r, uint32_t* v0) { *r = vsha1h_u32(*v0); }\nvoid Vsha1MqU32(uint32x4_t* r, uint32x4_t* v0, uint32_t* v1, uint32x4_t* v2) { *r = vsha1mq_u32(*v0, *v1, *v2); }\nvoid Vsha1PqU32(uint32x4_t* r, uint32x4_t* v0, uint32_t* v1, uint32x4_t* v2) { *r = vsha1pq_u32(*v0, *v1, *v2); }\nvoid Vsha1Su0QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsha1su0q_u32(*v0, *v1, *v2); }\nvoid Vsha1Su1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vsha1su1q_u32(*v0, *v1); }\nvoid Vsha256H2QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsha256h2q_u32(*v0, *v1, *v2); }\nvoid Vsha256HqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsha256hq_u32(*v0, *v1, *v2); }\nvoid Vsha256Su0QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vsha256su0q_u32(*v0, *v1); }\nvoid Vsha256Su1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsha256su1q_u32(*v0, *v1, *v2); }\nvoid Vsha512H2QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vsha512h2q_u64(*v0, *v1, *v2); }\nvoid Vsha512HqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vsha512hq_u64(*v0, *v1, *v2); }\nvoid Vsha512Su0QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vsha512su0q_u64(*v0, *v1); }\nvoid Vsha512Su1QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vsha512su1q_u64(*v0, *v1, *v2); }\nvoid VshlS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vshl_s8(*v0, *v1); }\nvoid VshlS16(int16x4_t* 
r, int16x4_t* v0, int16x4_t* v1) { *r = vshl_s16(*v0, *v1); }\nvoid VshlS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vshl_s32(*v0, *v1); }\nvoid VshlS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vshl_s64(*v0, *v1); }\nvoid VshlU8(uint8x8_t* r, uint8x8_t* v0, int8x8_t* v1) { *r = vshl_u8(*v0, *v1); }\nvoid VshlU16(uint16x4_t* r, uint16x4_t* v0, int16x4_t* v1) { *r = vshl_u16(*v0, *v1); }\nvoid VshlU32(uint32x2_t* r, uint32x2_t* v0, int32x2_t* v1) { *r = vshl_u32(*v0, *v1); }\nvoid VshlU64(uint64x1_t* r, uint64x1_t* v0, int64x1_t* v1) { *r = vshl_u64(*v0, *v1); }\nvoid VshldS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vshld_s64(*v0, *v1); }\nvoid VshldU64(uint64_t* r, uint64_t* v0, int64_t* v1) { *r = vshld_u64(*v0, *v1); }\nvoid VshlqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vshlq_s8(*v0, *v1); }\nvoid VshlqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vshlq_s16(*v0, *v1); }\nvoid VshlqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vshlq_s32(*v0, *v1); }\nvoid VshlqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vshlq_s64(*v0, *v1); }\nvoid VshlqU8(uint8x16_t* r, uint8x16_t* v0, int8x16_t* v1) { *r = vshlq_u8(*v0, *v1); }\nvoid VshlqU16(uint16x8_t* r, uint16x8_t* v0, int16x8_t* v1) { *r = vshlq_u16(*v0, *v1); }\nvoid VshlqU32(uint32x4_t* r, uint32x4_t* v0, int32x4_t* v1) { *r = vshlq_u32(*v0, *v1); }\nvoid VshlqU64(uint64x2_t* r, uint64x2_t* v0, int64x2_t* v1) { *r = vshlq_u64(*v0, *v1); }\nvoid Vsm3Partw1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsm3partw1q_u32(*v0, *v1, *v2); }\nvoid Vsm3Partw2QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsm3partw2q_u32(*v0, *v1, *v2); }\nvoid Vsm3Ss1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsm3ss1q_u32(*v0, *v1, *v2); }\nvoid Vsm4EkeyqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vsm4ekeyq_u32(*v0, *v1); }\nvoid Vsm4EqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* 
v1) { *r = vsm4eq_u32(*v0, *v1); }\nvoid VsqaddU8(uint8x8_t* r, uint8x8_t* v0, int8x8_t* v1) { *r = vsqadd_u8(*v0, *v1); }\nvoid VsqaddU16(uint16x4_t* r, uint16x4_t* v0, int16x4_t* v1) { *r = vsqadd_u16(*v0, *v1); }\nvoid VsqaddU32(uint32x2_t* r, uint32x2_t* v0, int32x2_t* v1) { *r = vsqadd_u32(*v0, *v1); }\nvoid VsqaddU64(uint64x1_t* r, uint64x1_t* v0, int64x1_t* v1) { *r = vsqadd_u64(*v0, *v1); }\nvoid VsqaddbU8(uint8_t* r, uint8_t* v0, int8_t* v1) { *r = vsqaddb_u8(*v0, *v1); }\nvoid VsqadddU64(uint64_t* r, uint64_t* v0, int64_t* v1) { *r = vsqaddd_u64(*v0, *v1); }\nvoid VsqaddhU16(uint16_t* r, uint16_t* v0, int16_t* v1) { *r = vsqaddh_u16(*v0, *v1); }\nvoid VsqaddqU8(uint8x16_t* r, uint8x16_t* v0, int8x16_t* v1) { *r = vsqaddq_u8(*v0, *v1); }\nvoid VsqaddqU16(uint16x8_t* r, uint16x8_t* v0, int16x8_t* v1) { *r = vsqaddq_u16(*v0, *v1); }\nvoid VsqaddqU32(uint32x4_t* r, uint32x4_t* v0, int32x4_t* v1) { *r = vsqaddq_u32(*v0, *v1); }\nvoid VsqaddqU64(uint64x2_t* r, uint64x2_t* v0, int64x2_t* v1) { *r = vsqaddq_u64(*v0, *v1); }\nvoid VsqaddsU32(uint32_t* r, uint32_t* v0, int32_t* v1) { *r = vsqadds_u32(*v0, *v1); }\nvoid VsqrtF32(float32x2_t* r, float32x2_t* v0) { *r = vsqrt_f32(*v0); }\nvoid VsqrtF64(float64x1_t* r, float64x1_t* v0) { *r = vsqrt_f64(*v0); }\nvoid VsqrtqF32(float32x4_t* r, float32x4_t* v0) { *r = vsqrtq_f32(*v0); }\nvoid VsqrtqF64(float64x2_t* r, float64x2_t* v0) { *r = vsqrtq_f64(*v0); }\nvoid VsubS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vsub_s8(*v0, *v1); }\nvoid VsubS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vsub_s16(*v0, *v1); }\nvoid VsubS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vsub_s32(*v0, *v1); }\nvoid VsubS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vsub_s64(*v0, *v1); }\nvoid VsubU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vsub_u8(*v0, *v1); }\nvoid VsubU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vsub_u16(*v0, *v1); }\nvoid VsubU32(uint32x2_t* r, uint32x2_t* v0, 
uint32x2_t* v1) { *r = vsub_u32(*v0, *v1); }\nvoid VsubU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vsub_u64(*v0, *v1); }\nvoid VsubF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vsub_f32(*v0, *v1); }\nvoid VsubF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vsub_f64(*v0, *v1); }\nvoid VsubdS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vsubd_s64(*v0, *v1); }\nvoid VsubdU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vsubd_u64(*v0, *v1); }\nvoid VsubhnS16(int8x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vsubhn_s16(*v0, *v1); }\nvoid VsubhnS32(int16x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vsubhn_s32(*v0, *v1); }\nvoid VsubhnS64(int32x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vsubhn_s64(*v0, *v1); }\nvoid VsubhnU16(uint8x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vsubhn_u16(*v0, *v1); }\nvoid VsubhnU32(uint16x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vsubhn_u32(*v0, *v1); }\nvoid VsubhnU64(uint32x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vsubhn_u64(*v0, *v1); }\nvoid VsubhnHighS16(int8x16_t* r, int8x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vsubhn_high_s16(*v0, *v1, *v2); }\nvoid VsubhnHighS32(int16x8_t* r, int16x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vsubhn_high_s32(*v0, *v1, *v2); }\nvoid VsubhnHighS64(int32x4_t* r, int32x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = vsubhn_high_s64(*v0, *v1, *v2); }\nvoid VsubhnHighU16(uint8x16_t* r, uint8x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vsubhn_high_u16(*v0, *v1, *v2); }\nvoid VsubhnHighU32(uint16x8_t* r, uint16x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsubhn_high_u32(*v0, *v1, *v2); }\nvoid VsubhnHighU64(uint32x4_t* r, uint32x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vsubhn_high_u64(*v0, *v1, *v2); }\nvoid VsublS8(int16x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vsubl_s8(*v0, *v1); }\nvoid VsublS16(int32x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vsubl_s16(*v0, *v1); }\nvoid VsublS32(int64x2_t* r, int32x2_t* v0, 
int32x2_t* v1) { *r = vsubl_s32(*v0, *v1); }\nvoid VsublU8(uint16x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vsubl_u8(*v0, *v1); }\nvoid VsublU16(uint32x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vsubl_u16(*v0, *v1); }\nvoid VsublU32(uint64x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vsubl_u32(*v0, *v1); }\nvoid VsublHighS8(int16x8_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vsubl_high_s8(*v0, *v1); }\nvoid VsublHighS16(int32x4_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vsubl_high_s16(*v0, *v1); }\nvoid VsublHighS32(int64x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vsubl_high_s32(*v0, *v1); }\nvoid VsublHighU8(uint16x8_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vsubl_high_u8(*v0, *v1); }\nvoid VsublHighU16(uint32x4_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vsubl_high_u16(*v0, *v1); }\nvoid VsublHighU32(uint64x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vsubl_high_u32(*v0, *v1); }\nvoid VsubqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vsubq_s8(*v0, *v1); }\nvoid VsubqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vsubq_s16(*v0, *v1); }\nvoid VsubqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vsubq_s32(*v0, *v1); }\nvoid VsubqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vsubq_s64(*v0, *v1); }\nvoid VsubqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vsubq_u8(*v0, *v1); }\nvoid VsubqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vsubq_u16(*v0, *v1); }\nvoid VsubqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vsubq_u32(*v0, *v1); }\nvoid VsubqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vsubq_u64(*v0, *v1); }\nvoid VsubqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vsubq_f32(*v0, *v1); }\nvoid VsubqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vsubq_f64(*v0, *v1); }\nvoid VsubwS8(int16x8_t* r, int16x8_t* v0, int8x8_t* v1) { *r = vsubw_s8(*v0, *v1); }\nvoid VsubwS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1) { *r = vsubw_s16(*v0, *v1); 
}\nvoid VsubwS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1) { *r = vsubw_s32(*v0, *v1); }\nvoid VsubwU8(uint16x8_t* r, uint16x8_t* v0, uint8x8_t* v1) { *r = vsubw_u8(*v0, *v1); }\nvoid VsubwU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1) { *r = vsubw_u16(*v0, *v1); }\nvoid VsubwU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1) { *r = vsubw_u32(*v0, *v1); }\nvoid VsubwHighS8(int16x8_t* r, int16x8_t* v0, int8x16_t* v1) { *r = vsubw_high_s8(*v0, *v1); }\nvoid VsubwHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1) { *r = vsubw_high_s16(*v0, *v1); }\nvoid VsubwHighS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1) { *r = vsubw_high_s32(*v0, *v1); }\nvoid VsubwHighU8(uint16x8_t* r, uint16x8_t* v0, uint8x16_t* v1) { *r = vsubw_high_u8(*v0, *v1); }\nvoid VsubwHighU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1) { *r = vsubw_high_u16(*v0, *v1); }\nvoid VsubwHighU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1) { *r = vsubw_high_u32(*v0, *v1); }\nvoid Vtbl1S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vtbl1_s8(*v0, *v1); }\nvoid Vtbl1U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vtbl1_u8(*v0, *v1); }\nvoid Vtbl1P8(poly8x8_t* r, poly8x8_t* v0, uint8x8_t* v1) { *r = vtbl1_p8(*v0, *v1); }\nvoid Vtbl2S8(int8x8_t* r, int8x8x2_t* v0, int8x8_t* v1) { *r = vtbl2_s8(*v0, *v1); }\nvoid Vtbl2U8(uint8x8_t* r, uint8x8x2_t* v0, uint8x8_t* v1) { *r = vtbl2_u8(*v0, *v1); }\nvoid Vtbl2P8(poly8x8_t* r, poly8x8x2_t* v0, uint8x8_t* v1) { *r = vtbl2_p8(*v0, *v1); }\nvoid Vtbl3S8(int8x8_t* r, int8x8x3_t* v0, int8x8_t* v1) { *r = vtbl3_s8(*v0, *v1); }\nvoid Vtbl3U8(uint8x8_t* r, uint8x8x3_t* v0, uint8x8_t* v1) { *r = vtbl3_u8(*v0, *v1); }\nvoid Vtbl3P8(poly8x8_t* r, poly8x8x3_t* v0, uint8x8_t* v1) { *r = vtbl3_p8(*v0, *v1); }\nvoid Vtbl4S8(int8x8_t* r, int8x8x4_t* v0, int8x8_t* v1) { *r = vtbl4_s8(*v0, *v1); }\nvoid Vtbl4U8(uint8x8_t* r, uint8x8x4_t* v0, uint8x8_t* v1) { *r = vtbl4_u8(*v0, *v1); }\nvoid Vtbl4P8(poly8x8_t* r, poly8x8x4_t* v0, uint8x8_t* v1) { *r = vtbl4_p8(*v0, 
*v1); }\nvoid Vtbx1S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vtbx1_s8(*v0, *v1, *v2); }\nvoid Vtbx1U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vtbx1_u8(*v0, *v1, *v2); }\nvoid Vtbx1P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1, uint8x8_t* v2) { *r = vtbx1_p8(*v0, *v1, *v2); }\nvoid Vtbx2S8(int8x8_t* r, int8x8_t* v0, int8x8x2_t* v1, int8x8_t* v2) { *r = vtbx2_s8(*v0, *v1, *v2); }\nvoid Vtbx2U8(uint8x8_t* r, uint8x8_t* v0, uint8x8x2_t* v1, uint8x8_t* v2) { *r = vtbx2_u8(*v0, *v1, *v2); }\nvoid Vtbx2P8(poly8x8_t* r, poly8x8_t* v0, poly8x8x2_t* v1, uint8x8_t* v2) { *r = vtbx2_p8(*v0, *v1, *v2); }\nvoid Vtbx3S8(int8x8_t* r, int8x8_t* v0, int8x8x3_t* v1, int8x8_t* v2) { *r = vtbx3_s8(*v0, *v1, *v2); }\nvoid Vtbx3U8(uint8x8_t* r, uint8x8_t* v0, uint8x8x3_t* v1, uint8x8_t* v2) { *r = vtbx3_u8(*v0, *v1, *v2); }\nvoid Vtbx3P8(poly8x8_t* r, poly8x8_t* v0, poly8x8x3_t* v1, uint8x8_t* v2) { *r = vtbx3_p8(*v0, *v1, *v2); }\nvoid Vtbx4S8(int8x8_t* r, int8x8_t* v0, int8x8x4_t* v1, int8x8_t* v2) { *r = vtbx4_s8(*v0, *v1, *v2); }\nvoid Vtbx4U8(uint8x8_t* r, uint8x8_t* v0, uint8x8x4_t* v1, uint8x8_t* v2) { *r = vtbx4_u8(*v0, *v1, *v2); }\nvoid Vtbx4P8(poly8x8_t* r, poly8x8_t* v0, poly8x8x4_t* v1, uint8x8_t* v2) { *r = vtbx4_p8(*v0, *v1, *v2); }\nvoid VtrnS8(int8x8x2_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vtrn_s8(*v0, *v1); }\nvoid VtrnS16(int16x4x2_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vtrn_s16(*v0, *v1); }\nvoid VtrnS32(int32x2x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vtrn_s32(*v0, *v1); }\nvoid VtrnU8(uint8x8x2_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vtrn_u8(*v0, *v1); }\nvoid VtrnU16(uint16x4x2_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vtrn_u16(*v0, *v1); }\nvoid VtrnU32(uint32x2x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vtrn_u32(*v0, *v1); }\nvoid VtrnF32(float32x2x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vtrn_f32(*v0, *v1); }\nvoid Vtrn1S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vtrn1_s8(*v0, 
*v1); }\nvoid Vtrn1S16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vtrn1_s16(*v0, *v1); }\nvoid Vtrn1S32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vtrn1_s32(*v0, *v1); }\nvoid Vtrn1U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vtrn1_u8(*v0, *v1); }\nvoid Vtrn1U16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vtrn1_u16(*v0, *v1); }\nvoid Vtrn1U32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vtrn1_u32(*v0, *v1); }\nvoid Vtrn1F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vtrn1_f32(*v0, *v1); }\nvoid Vtrn1P16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vtrn1_p16(*v0, *v1); }\nvoid Vtrn1P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vtrn1_p8(*v0, *v1); }\nvoid Vtrn1QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vtrn1q_s8(*v0, *v1); }\nvoid Vtrn1QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vtrn1q_s16(*v0, *v1); }\nvoid Vtrn1QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vtrn1q_s32(*v0, *v1); }\nvoid Vtrn1QS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vtrn1q_s64(*v0, *v1); }\nvoid Vtrn1QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vtrn1q_u8(*v0, *v1); }\nvoid Vtrn1QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vtrn1q_u16(*v0, *v1); }\nvoid Vtrn1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vtrn1q_u32(*v0, *v1); }\nvoid Vtrn1QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vtrn1q_u64(*v0, *v1); }\nvoid Vtrn1QF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vtrn1q_f32(*v0, *v1); }\nvoid Vtrn1QF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vtrn1q_f64(*v0, *v1); }\nvoid Vtrn1QP16(poly16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vtrn1q_p16(*v0, *v1); }\nvoid Vtrn1QP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vtrn1q_p64(*v0, *v1); }\nvoid Vtrn1QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vtrn1q_p8(*v0, *v1); }\nvoid Vtrn2S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { 
*r = vtrn2_s8(*v0, *v1); }\nvoid Vtrn2S16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vtrn2_s16(*v0, *v1); }\nvoid Vtrn2S32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vtrn2_s32(*v0, *v1); }\nvoid Vtrn2U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vtrn2_u8(*v0, *v1); }\nvoid Vtrn2U16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vtrn2_u16(*v0, *v1); }\nvoid Vtrn2U32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vtrn2_u32(*v0, *v1); }\nvoid Vtrn2F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vtrn2_f32(*v0, *v1); }\nvoid Vtrn2P16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vtrn2_p16(*v0, *v1); }\nvoid Vtrn2P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vtrn2_p8(*v0, *v1); }\nvoid Vtrn2QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vtrn2q_s8(*v0, *v1); }\nvoid Vtrn2QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vtrn2q_s16(*v0, *v1); }\nvoid Vtrn2QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vtrn2q_s32(*v0, *v1); }\nvoid Vtrn2QS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vtrn2q_s64(*v0, *v1); }\nvoid Vtrn2QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vtrn2q_u8(*v0, *v1); }\nvoid Vtrn2QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vtrn2q_u16(*v0, *v1); }\nvoid Vtrn2QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vtrn2q_u32(*v0, *v1); }\nvoid Vtrn2QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vtrn2q_u64(*v0, *v1); }\nvoid Vtrn2QF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vtrn2q_f32(*v0, *v1); }\nvoid Vtrn2QF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vtrn2q_f64(*v0, *v1); }\nvoid Vtrn2QP16(poly16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vtrn2q_p16(*v0, *v1); }\nvoid Vtrn2QP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vtrn2q_p64(*v0, *v1); }\nvoid Vtrn2QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vtrn2q_p8(*v0, *v1); }\nvoid VtrnP16(poly16x4x2_t* r, 
poly16x4_t* v0, poly16x4_t* v1) { *r = vtrn_p16(*v0, *v1); }\nvoid VtrnP8(poly8x8x2_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vtrn_p8(*v0, *v1); }\nvoid VtrnqS8(int8x16x2_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vtrnq_s8(*v0, *v1); }\nvoid VtrnqS16(int16x8x2_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vtrnq_s16(*v0, *v1); }\nvoid VtrnqS32(int32x4x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vtrnq_s32(*v0, *v1); }\nvoid VtrnqU8(uint8x16x2_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vtrnq_u8(*v0, *v1); }\nvoid VtrnqU16(uint16x8x2_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vtrnq_u16(*v0, *v1); }\nvoid VtrnqU32(uint32x4x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vtrnq_u32(*v0, *v1); }\nvoid VtrnqF32(float32x4x2_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vtrnq_f32(*v0, *v1); }\nvoid VtrnqP16(poly16x8x2_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vtrnq_p16(*v0, *v1); }\nvoid VtrnqP8(poly8x16x2_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vtrnq_p8(*v0, *v1); }\nvoid VtstS8(uint8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vtst_s8(*v0, *v1); }\nvoid VtstS16(uint16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vtst_s16(*v0, *v1); }\nvoid VtstS32(uint32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vtst_s32(*v0, *v1); }\nvoid VtstS64(uint64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vtst_s64(*v0, *v1); }\nvoid VtstU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vtst_u8(*v0, *v1); }\nvoid VtstU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vtst_u16(*v0, *v1); }\nvoid VtstU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vtst_u32(*v0, *v1); }\nvoid VtstU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vtst_u64(*v0, *v1); }\nvoid VtstP16(uint16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vtst_p16(*v0, *v1); }\nvoid VtstP64(uint64x1_t* r, poly64x1_t* v0, poly64x1_t* v1) { *r = vtst_p64(*v0, *v1); }\nvoid VtstP8(uint8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vtst_p8(*v0, *v1); }\nvoid VtstdS64(uint64_t* r, int64_t* v0, int64_t* 
v1) { *r = vtstd_s64(*v0, *v1); }\nvoid VtstdU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vtstd_u64(*v0, *v1); }\nvoid VtstqS8(uint8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vtstq_s8(*v0, *v1); }\nvoid VtstqS16(uint16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vtstq_s16(*v0, *v1); }\nvoid VtstqS32(uint32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vtstq_s32(*v0, *v1); }\nvoid VtstqS64(uint64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vtstq_s64(*v0, *v1); }\nvoid VtstqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vtstq_u8(*v0, *v1); }\nvoid VtstqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vtstq_u16(*v0, *v1); }\nvoid VtstqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vtstq_u32(*v0, *v1); }\nvoid VtstqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vtstq_u64(*v0, *v1); }\nvoid VtstqP16(uint16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vtstq_p16(*v0, *v1); }\nvoid VtstqP64(uint64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vtstq_p64(*v0, *v1); }\nvoid VtstqP8(uint8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vtstq_p8(*v0, *v1); }\nvoid VuqaddS8(int8x8_t* r, int8x8_t* v0, uint8x8_t* v1) { *r = vuqadd_s8(*v0, *v1); }\nvoid VuqaddS16(int16x4_t* r, int16x4_t* v0, uint16x4_t* v1) { *r = vuqadd_s16(*v0, *v1); }\nvoid VuqaddS32(int32x2_t* r, int32x2_t* v0, uint32x2_t* v1) { *r = vuqadd_s32(*v0, *v1); }\nvoid VuqaddS64(int64x1_t* r, int64x1_t* v0, uint64x1_t* v1) { *r = vuqadd_s64(*v0, *v1); }\nvoid VuqaddbS8(int8_t* r, int8_t* v0, uint8_t* v1) { *r = vuqaddb_s8(*v0, *v1); }\nvoid VuqadddS64(int64_t* r, int64_t* v0, uint64_t* v1) { *r = vuqaddd_s64(*v0, *v1); }\nvoid VuqaddhS16(int16_t* r, int16_t* v0, uint16_t* v1) { *r = vuqaddh_s16(*v0, *v1); }\nvoid VuqaddqS8(int8x16_t* r, int8x16_t* v0, uint8x16_t* v1) { *r = vuqaddq_s8(*v0, *v1); }\nvoid VuqaddqS16(int16x8_t* r, int16x8_t* v0, uint16x8_t* v1) { *r = vuqaddq_s16(*v0, *v1); }\nvoid VuqaddqS32(int32x4_t* r, int32x4_t* v0, uint32x4_t* v1) { *r = 
vuqaddq_s32(*v0, *v1); }\nvoid VuqaddqS64(int64x2_t* r, int64x2_t* v0, uint64x2_t* v1) { *r = vuqaddq_s64(*v0, *v1); }\nvoid VuqaddsS32(int32_t* r, int32_t* v0, uint32_t* v1) { *r = vuqadds_s32(*v0, *v1); }\nvoid VusdotS32(int32x2_t* r, int32x2_t* v0, uint8x8_t* v1, int8x8_t* v2) { *r = vusdot_s32(*v0, *v1, *v2); }\nvoid VusdotqS32(int32x4_t* r, int32x4_t* v0, uint8x16_t* v1, int8x16_t* v2) { *r = vusdotq_s32(*v0, *v1, *v2); }\nvoid VusmmlaqS32(int32x4_t* r, int32x4_t* v0, uint8x16_t* v1, int8x16_t* v2) { *r = vusmmlaq_s32(*v0, *v1, *v2); }\nvoid VuzpS8(int8x8x2_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vuzp_s8(*v0, *v1); }\nvoid VuzpS16(int16x4x2_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vuzp_s16(*v0, *v1); }\nvoid VuzpS32(int32x2x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vuzp_s32(*v0, *v1); }\nvoid VuzpU8(uint8x8x2_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vuzp_u8(*v0, *v1); }\nvoid VuzpU16(uint16x4x2_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vuzp_u16(*v0, *v1); }\nvoid VuzpU32(uint32x2x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vuzp_u32(*v0, *v1); }\nvoid VuzpF32(float32x2x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vuzp_f32(*v0, *v1); }\nvoid Vuzp1S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vuzp1_s8(*v0, *v1); }\nvoid Vuzp1S16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vuzp1_s16(*v0, *v1); }\nvoid Vuzp1S32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vuzp1_s32(*v0, *v1); }\nvoid Vuzp1U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vuzp1_u8(*v0, *v1); }\nvoid Vuzp1U16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vuzp1_u16(*v0, *v1); }\nvoid Vuzp1U32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vuzp1_u32(*v0, *v1); }\nvoid Vuzp1F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vuzp1_f32(*v0, *v1); }\nvoid Vuzp1P16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vuzp1_p16(*v0, *v1); }\nvoid Vuzp1P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vuzp1_p8(*v0, *v1); }\nvoid 
Vuzp1QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vuzp1q_s8(*v0, *v1); }\nvoid Vuzp1QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vuzp1q_s16(*v0, *v1); }\nvoid Vuzp1QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vuzp1q_s32(*v0, *v1); }\nvoid Vuzp1QS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vuzp1q_s64(*v0, *v1); }\nvoid Vuzp1QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vuzp1q_u8(*v0, *v1); }\nvoid Vuzp1QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vuzp1q_u16(*v0, *v1); }\nvoid Vuzp1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vuzp1q_u32(*v0, *v1); }\nvoid Vuzp1QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vuzp1q_u64(*v0, *v1); }\nvoid Vuzp1QF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vuzp1q_f32(*v0, *v1); }\nvoid Vuzp1QF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vuzp1q_f64(*v0, *v1); }\nvoid Vuzp1QP16(poly16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vuzp1q_p16(*v0, *v1); }\nvoid Vuzp1QP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vuzp1q_p64(*v0, *v1); }\nvoid Vuzp1QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vuzp1q_p8(*v0, *v1); }\nvoid Vuzp2S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vuzp2_s8(*v0, *v1); }\nvoid Vuzp2S16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vuzp2_s16(*v0, *v1); }\nvoid Vuzp2S32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vuzp2_s32(*v0, *v1); }\nvoid Vuzp2U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vuzp2_u8(*v0, *v1); }\nvoid Vuzp2U16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vuzp2_u16(*v0, *v1); }\nvoid Vuzp2U32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vuzp2_u32(*v0, *v1); }\nvoid Vuzp2F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vuzp2_f32(*v0, *v1); }\nvoid Vuzp2P16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vuzp2_p16(*v0, *v1); }\nvoid Vuzp2P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = 
vuzp2_p8(*v0, *v1); }\nvoid Vuzp2QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vuzp2q_s8(*v0, *v1); }\nvoid Vuzp2QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vuzp2q_s16(*v0, *v1); }\nvoid Vuzp2QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vuzp2q_s32(*v0, *v1); }\nvoid Vuzp2QS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vuzp2q_s64(*v0, *v1); }\nvoid Vuzp2QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vuzp2q_u8(*v0, *v1); }\nvoid Vuzp2QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vuzp2q_u16(*v0, *v1); }\nvoid Vuzp2QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vuzp2q_u32(*v0, *v1); }\nvoid Vuzp2QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vuzp2q_u64(*v0, *v1); }\nvoid Vuzp2QF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vuzp2q_f32(*v0, *v1); }\nvoid Vuzp2QF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vuzp2q_f64(*v0, *v1); }\nvoid Vuzp2QP16(poly16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vuzp2q_p16(*v0, *v1); }\nvoid Vuzp2QP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vuzp2q_p64(*v0, *v1); }\nvoid Vuzp2QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vuzp2q_p8(*v0, *v1); }\nvoid VuzpP16(poly16x4x2_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vuzp_p16(*v0, *v1); }\nvoid VuzpP8(poly8x8x2_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vuzp_p8(*v0, *v1); }\nvoid VuzpqS8(int8x16x2_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vuzpq_s8(*v0, *v1); }\nvoid VuzpqS16(int16x8x2_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vuzpq_s16(*v0, *v1); }\nvoid VuzpqS32(int32x4x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vuzpq_s32(*v0, *v1); }\nvoid VuzpqU8(uint8x16x2_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vuzpq_u8(*v0, *v1); }\nvoid VuzpqU16(uint16x8x2_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vuzpq_u16(*v0, *v1); }\nvoid VuzpqU32(uint32x4x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vuzpq_u32(*v0, *v1); }\nvoid VuzpqF32(float32x4x2_t* r, 
float32x4_t* v0, float32x4_t* v1) { *r = vuzpq_f32(*v0, *v1); }\nvoid VuzpqP16(poly16x8x2_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vuzpq_p16(*v0, *v1); }\nvoid VuzpqP8(poly8x16x2_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vuzpq_p8(*v0, *v1); }\nvoid VzipS8(int8x8x2_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vzip_s8(*v0, *v1); }\nvoid VzipS16(int16x4x2_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vzip_s16(*v0, *v1); }\nvoid VzipS32(int32x2x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vzip_s32(*v0, *v1); }\nvoid VzipU8(uint8x8x2_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vzip_u8(*v0, *v1); }\nvoid VzipU16(uint16x4x2_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vzip_u16(*v0, *v1); }\nvoid VzipU32(uint32x2x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vzip_u32(*v0, *v1); }\nvoid VzipF32(float32x2x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vzip_f32(*v0, *v1); }\nvoid Vzip1S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vzip1_s8(*v0, *v1); }\nvoid Vzip1S16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vzip1_s16(*v0, *v1); }\nvoid Vzip1S32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vzip1_s32(*v0, *v1); }\nvoid Vzip1U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vzip1_u8(*v0, *v1); }\nvoid Vzip1U16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vzip1_u16(*v0, *v1); }\nvoid Vzip1U32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vzip1_u32(*v0, *v1); }\nvoid Vzip1F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vzip1_f32(*v0, *v1); }\nvoid Vzip1P16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vzip1_p16(*v0, *v1); }\nvoid Vzip1P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vzip1_p8(*v0, *v1); }\nvoid Vzip1QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vzip1q_s8(*v0, *v1); }\nvoid Vzip1QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vzip1q_s16(*v0, *v1); }\nvoid Vzip1QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vzip1q_s32(*v0, *v1); }\nvoid Vzip1QS64(int64x2_t* r, int64x2_t* 
v0, int64x2_t* v1) { *r = vzip1q_s64(*v0, *v1); }\nvoid Vzip1QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vzip1q_u8(*v0, *v1); }\nvoid Vzip1QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vzip1q_u16(*v0, *v1); }\nvoid Vzip1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vzip1q_u32(*v0, *v1); }\nvoid Vzip1QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vzip1q_u64(*v0, *v1); }\nvoid Vzip1QF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vzip1q_f32(*v0, *v1); }\nvoid Vzip1QF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vzip1q_f64(*v0, *v1); }\nvoid Vzip1QP16(poly16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vzip1q_p16(*v0, *v1); }\nvoid Vzip1QP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vzip1q_p64(*v0, *v1); }\nvoid Vzip1QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vzip1q_p8(*v0, *v1); }\nvoid Vzip2S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vzip2_s8(*v0, *v1); }\nvoid Vzip2S16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vzip2_s16(*v0, *v1); }\nvoid Vzip2S32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vzip2_s32(*v0, *v1); }\nvoid Vzip2U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vzip2_u8(*v0, *v1); }\nvoid Vzip2U16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vzip2_u16(*v0, *v1); }\nvoid Vzip2U32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vzip2_u32(*v0, *v1); }\nvoid Vzip2F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vzip2_f32(*v0, *v1); }\nvoid Vzip2P16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vzip2_p16(*v0, *v1); }\nvoid Vzip2P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vzip2_p8(*v0, *v1); }\nvoid Vzip2QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vzip2q_s8(*v0, *v1); }\nvoid Vzip2QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vzip2q_s16(*v0, *v1); }\nvoid Vzip2QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vzip2q_s32(*v0, *v1); }\nvoid 
Vzip2QS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vzip2q_s64(*v0, *v1); }\nvoid Vzip2QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vzip2q_u8(*v0, *v1); }\nvoid Vzip2QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vzip2q_u16(*v0, *v1); }\nvoid Vzip2QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vzip2q_u32(*v0, *v1); }\nvoid Vzip2QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vzip2q_u64(*v0, *v1); }\nvoid Vzip2QF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vzip2q_f32(*v0, *v1); }\nvoid Vzip2QF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vzip2q_f64(*v0, *v1); }\nvoid Vzip2QP16(poly16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vzip2q_p16(*v0, *v1); }\nvoid Vzip2QP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vzip2q_p64(*v0, *v1); }\nvoid Vzip2QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vzip2q_p8(*v0, *v1); }\nvoid VzipP16(poly16x4x2_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vzip_p16(*v0, *v1); }\nvoid VzipP8(poly8x8x2_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vzip_p8(*v0, *v1); }\nvoid VzipqS8(int8x16x2_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vzipq_s8(*v0, *v1); }\nvoid VzipqS16(int16x8x2_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vzipq_s16(*v0, *v1); }\nvoid VzipqS32(int32x4x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vzipq_s32(*v0, *v1); }\nvoid VzipqU8(uint8x16x2_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vzipq_u8(*v0, *v1); }\nvoid VzipqU16(uint16x8x2_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vzipq_u16(*v0, *v1); }\nvoid VzipqU32(uint32x4x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vzipq_u32(*v0, *v1); }\nvoid VzipqF32(float32x4x2_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vzipq_f32(*v0, *v1); }\nvoid VzipqP16(poly16x8x2_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vzipq_p16(*v0, *v1); }\nvoid VzipqP8(poly8x16x2_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vzipq_p8(*v0, *v1); }\n"
  },
  {
    "path": "arm/neon/functions.go",
    "content": "package neon\n\nimport (\n\t\"github.com/alivanz/go-simd/arm\"\n)\n\n/*\n#include <arm_neon.h>\n*/\nimport \"C\"\n\n// Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.\n//\n//go:linkname VabaS8 VabaS8\n//go:noescape\nfunc VabaS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8, v2 *arm.Int8X8)\n\n// Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.\n//\n//go:linkname VabaS16 VabaS16\n//go:noescape\nfunc VabaS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16X4)\n\n// Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.\n//\n//go:linkname VabaS32 VabaS32\n//go:noescape\nfunc VabaS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32X2)\n\n// Unsigned Absolute difference and Accumulate. 
This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.\n//\n//go:linkname VabaU8 VabaU8\n//go:noescape\nfunc VabaU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)\n\n// Unsigned Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.\n//\n//go:linkname VabaU16 VabaU16\n//go:noescape\nfunc VabaU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4)\n\n// Unsigned Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.\n//\n//go:linkname VabaU32 VabaU32\n//go:noescape\nfunc VabaU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2)\n\n// Signed Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VabalS8 VabalS8\n//go:noescape\nfunc VabalS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X8, v2 *arm.Int8X8)\n\n// Signed Absolute difference and Accumulate Long. 
This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VabalS16 VabalS16\n//go:noescape\nfunc VabalS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16X4)\n\n// Signed Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VabalS32 VabalS32\n//go:noescape\nfunc VabalS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32X2)\n\n// Unsigned Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VabalU8 VabalU8\n//go:noescape\nfunc VabalU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)\n\n// Unsigned Absolute difference and Accumulate Long. 
This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VabalU16 VabalU16\n//go:noescape\nfunc VabalU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4)\n\n// Unsigned Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VabalU32 VabalU32\n//go:noescape\nfunc VabalU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2)\n\n// Signed Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VabalHighS8 VabalHighS8\n//go:noescape\nfunc VabalHighS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X16, v2 *arm.Int8X16)\n\n// Signed Absolute difference and Accumulate Long. 
This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VabalHighS16 VabalHighS16\n//go:noescape\nfunc VabalHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16X8)\n\n// Signed Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VabalHighS32 VabalHighS32\n//go:noescape\nfunc VabalHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32X4)\n\n// Unsigned Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VabalHighU8 VabalHighU8\n//go:noescape\nfunc VabalHighU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X16, v2 *arm.Uint8X16)\n\n// Unsigned Absolute difference and Accumulate Long. 
This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VabalHighU16 VabalHighU16\n//go:noescape\nfunc VabalHighU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8, v2 *arm.Uint16X8)\n\n// Unsigned Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VabalHighU32 VabalHighU32\n//go:noescape\nfunc VabalHighU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4, v2 *arm.Uint32X4)\n\n// Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.\n//\n//go:linkname VabaqS8 VabaqS8\n//go:noescape\nfunc VabaqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16, v2 *arm.Int8X16)\n\n// Signed Absolute difference and Accumulate. 
This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.\n//\n//go:linkname VabaqS16 VabaqS16\n//go:noescape\nfunc VabaqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8)\n\n// Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.\n//\n//go:linkname VabaqS32 VabaqS32\n//go:noescape\nfunc VabaqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32X4)\n\n// Unsigned Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.\n//\n//go:linkname VabaqU8 VabaqU8\n//go:noescape\nfunc VabaqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16)\n\n// Unsigned Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.\n//\n//go:linkname VabaqU16 VabaqU16\n//go:noescape\nfunc VabaqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)\n\n// Unsigned Absolute difference and Accumulate. 
This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.\n//\n//go:linkname VabaqU32 VabaqU32\n//go:noescape\nfunc VabaqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)\n\n// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdS8 VabdS8\n//go:noescape\nfunc VabdS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdS16 VabdS16\n//go:noescape\nfunc VabdS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdS32 VabdS32\n//go:noescape\nfunc VabdS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Unsigned Absolute Difference (vector). 
This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdU8 VabdU8\n//go:noescape\nfunc VabdU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdU16 VabdU16\n//go:noescape\nfunc VabdU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdU32 VabdU32\n//go:noescape\nfunc VabdU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdF32 VabdF32\n//go:noescape\nfunc VabdF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Absolute Difference (vector). 
This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdF64 VabdF64\n//go:noescape\nfunc VabdF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabddF64 VabddF64\n//go:noescape\nfunc VabddF64(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64)\n\n// Signed Absolute Difference Long. This instruction subtracts the vector elements of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the results into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VabdlS8 VabdlS8\n//go:noescape\nfunc VabdlS8(r *arm.Int16X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Signed Absolute Difference Long. This instruction subtracts the vector elements of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the results into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. 
The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VabdlS16 VabdlS16\n//go:noescape\nfunc VabdlS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed Absolute Difference Long. This instruction subtracts the vector elements of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the results into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VabdlS32 VabdlS32\n//go:noescape\nfunc VabdlS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Unsigned Absolute Difference Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VabdlU8 VabdlU8\n//go:noescape\nfunc VabdlU8(r *arm.Uint16X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Unsigned Absolute Difference Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VabdlU16 VabdlU16\n//go:noescape\nfunc VabdlU16(r *arm.Uint32X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Unsigned Absolute Difference Long. 
This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VabdlU32 VabdlU32\n//go:noescape\nfunc VabdlU32(r *arm.Uint64X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Signed Absolute Difference Long. This instruction subtracts the vector elements of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the results into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VabdlHighS8 VabdlHighS8\n//go:noescape\nfunc VabdlHighS8(r *arm.Int16X8, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Signed Absolute Difference Long. This instruction subtracts the vector elements of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the results into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VabdlHighS16 VabdlHighS16\n//go:noescape\nfunc VabdlHighS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed Absolute Difference Long. 
This instruction subtracts the vector elements of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the results into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VabdlHighS32 VabdlHighS32\n//go:noescape\nfunc VabdlHighS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Unsigned Absolute Difference Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VabdlHighU8 VabdlHighU8\n//go:noescape\nfunc VabdlHighU8(r *arm.Uint16X8, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Unsigned Absolute Difference Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VabdlHighU16 VabdlHighU16\n//go:noescape\nfunc VabdlHighU16(r *arm.Uint32X4, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Unsigned Absolute Difference Long. 
This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VabdlHighU32 VabdlHighU32\n//go:noescape\nfunc VabdlHighU32(r *arm.Uint64X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdqS8 VabdqS8\n//go:noescape\nfunc VabdqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdqS16 VabdqS16\n//go:noescape\nfunc VabdqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdqS32 VabdqS32\n//go:noescape\nfunc VabdqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Unsigned Absolute Difference (vector). 
This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdqU8 VabdqU8\n//go:noescape\nfunc VabdqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdqU16 VabdqU16\n//go:noescape\nfunc VabdqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdqU32 VabdqU32\n//go:noescape\nfunc VabdqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdqF32 VabdqF32\n//go:noescape\nfunc VabdqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Absolute Difference (vector). 
This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdqF64 VabdqF64\n//go:noescape\nfunc VabdqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdsF32 VabdsF32\n//go:noescape\nfunc VabdsF32(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32)\n\n// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsS8 VabsS8\n//go:noescape\nfunc VabsS8(r *arm.Int8X8, v0 *arm.Int8X8)\n\n// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsS16 VabsS16\n//go:noescape\nfunc VabsS16(r *arm.Int16X4, v0 *arm.Int16X4)\n\n// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsS32 VabsS32\n//go:noescape\nfunc VabsS32(r *arm.Int32X2, v0 *arm.Int32X2)\n\n// Absolute value (vector). 
This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsS64 VabsS64\n//go:noescape\nfunc VabsS64(r *arm.Int64X1, v0 *arm.Int64X1)\n\n// Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsF32 VabsF32\n//go:noescape\nfunc VabsF32(r *arm.Float32X2, v0 *arm.Float32X2)\n\n// Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsF64 VabsF64\n//go:noescape\nfunc VabsF64(r *arm.Float64X1, v0 *arm.Float64X1)\n\n// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsdS64 VabsdS64\n//go:noescape\nfunc VabsdS64(r *arm.Int64, v0 *arm.Int64)\n\n// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsqS8 VabsqS8\n//go:noescape\nfunc VabsqS8(r *arm.Int8X16, v0 *arm.Int8X16)\n\n// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsqS16 VabsqS16\n//go:noescape\nfunc VabsqS16(r *arm.Int16X8, v0 *arm.Int16X8)\n\n// Absolute value (vector). 
This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsqS32 VabsqS32\n//go:noescape\nfunc VabsqS32(r *arm.Int32X4, v0 *arm.Int32X4)\n\n// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsqS64 VabsqS64\n//go:noescape\nfunc VabsqS64(r *arm.Int64X2, v0 *arm.Int64X2)\n\n// Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsqF32 VabsqF32\n//go:noescape\nfunc VabsqF32(r *arm.Float32X4, v0 *arm.Float32X4)\n\n// Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsqF64 VabsqF64\n//go:noescape\nfunc VabsqF64(r *arm.Float64X2, v0 *arm.Float64X2)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddS8 VaddS8\n//go:noescape\nfunc VaddS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddS16 VaddS16\n//go:noescape\nfunc VaddS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Add (vector). 
This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddS32 VaddS32\n//go:noescape\nfunc VaddS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddS64 VaddS64\n//go:noescape\nfunc VaddS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddU8 VaddU8\n//go:noescape\nfunc VaddU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddU16 VaddU16\n//go:noescape\nfunc VaddU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddU32 VaddU32\n//go:noescape\nfunc VaddU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddU64 VaddU64\n//go:noescape\nfunc VaddU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)\n\n// Floating-point Add (vector). 
This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VaddF32 VaddF32\n//go:noescape\nfunc VaddF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VaddF64 VaddF64\n//go:noescape\nfunc VaddF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Bitwise Exclusive OR (vector). Polynomial addition is carry-less, so it reduces to a bitwise XOR. This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers and places the result in the destination SIMD&FP register.\n//\n//go:linkname VaddP16 VaddP16\n//go:noescape\nfunc VaddP16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)\n\n// Bitwise Exclusive OR (vector). Polynomial addition is carry-less, so it reduces to a bitwise XOR. This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers and places the result in the destination SIMD&FP register.\n//\n//go:linkname VaddP64 VaddP64\n//go:noescape\nfunc VaddP64(r *arm.Poly64X1, v0 *arm.Poly64X1, v1 *arm.Poly64X1)\n\n// Bitwise Exclusive OR (vector). Polynomial addition is carry-less, so it reduces to a bitwise XOR. This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers and places the result in the destination SIMD&FP register.\n//\n//go:linkname VaddP8 VaddP8\n//go:noescape\nfunc VaddP8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)\n\n// Add (vector). 
This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VadddS64 VadddS64\n//go:noescape\nfunc VadddS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VadddU64 VadddU64\n//go:noescape\nfunc VadddU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)\n\n// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VaddhnS16 VaddhnS16\n//go:noescape\nfunc VaddhnS16(r *arm.Int8X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VaddhnS32 VaddhnS32\n//go:noescape\nfunc VaddhnS32(r *arm.Int16X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VaddhnS64 VaddhnS64\n//go:noescape\nfunc VaddhnS64(r *arm.Int32X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Add returning High Narrow. 
This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VaddhnU16 VaddhnU16\n//go:noescape\nfunc VaddhnU16(r *arm.Uint8X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VaddhnU32 VaddhnU32\n//go:noescape\nfunc VaddhnU32(r *arm.Uint16X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VaddhnU64 VaddhnU64\n//go:noescape\nfunc VaddhnU64(r *arm.Uint32X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VaddhnHighS16 VaddhnHighS16\n//go:noescape\nfunc VaddhnHighS16(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int16X8, v2 *arm.Int16X8)\n\n// Add returning High Narrow. 
This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VaddhnHighS32 VaddhnHighS32\n//go:noescape\nfunc VaddhnHighS32(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int32X4, v2 *arm.Int32X4)\n\n// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VaddhnHighS64 VaddhnHighS64\n//go:noescape\nfunc VaddhnHighS64(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int64X2, v2 *arm.Int64X2)\n\n// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VaddhnHighU16 VaddhnHighU16\n//go:noescape\nfunc VaddhnHighU16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)\n\n// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VaddhnHighU32 VaddhnHighU32\n//go:noescape\nfunc VaddhnHighU32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)\n\n// Add returning High Narrow. 
This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VaddhnHighU64 VaddhnHighU64\n//go:noescape\nfunc VaddhnHighU64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)\n\n// Signed Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.\n//\n//go:linkname VaddlS8 VaddlS8\n//go:noescape\nfunc VaddlS8(r *arm.Int16X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Signed Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.\n//\n//go:linkname VaddlS16 VaddlS16\n//go:noescape\nfunc VaddlS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. 
All the values in this instruction are signed integer values.\n//\n//go:linkname VaddlS32 VaddlS32\n//go:noescape\nfunc VaddlS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Unsigned Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VaddlU8 VaddlU8\n//go:noescape\nfunc VaddlU8(r *arm.Uint16X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Unsigned Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VaddlU16 VaddlU16\n//go:noescape\nfunc VaddlU16(r *arm.Uint32X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Unsigned Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VaddlU32 VaddlU32\n//go:noescape\nfunc VaddlU32(r *arm.Uint64X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Signed Add Long (vector). 
This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.\n//\n//go:linkname VaddlHighS8 VaddlHighS8\n//go:noescape\nfunc VaddlHighS8(r *arm.Int16X8, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Signed Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.\n//\n//go:linkname VaddlHighS16 VaddlHighS16\n//go:noescape\nfunc VaddlHighS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.\n//\n//go:linkname VaddlHighS32 VaddlHighS32\n//go:noescape\nfunc VaddlHighS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Unsigned Add Long (vector). 
This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VaddlHighU8 VaddlHighU8\n//go:noescape\nfunc VaddlHighU8(r *arm.Uint16X8, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Unsigned Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VaddlHighU16 VaddlHighU16\n//go:noescape\nfunc VaddlHighU16(r *arm.Uint32X4, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Unsigned Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VaddlHighU32 VaddlHighU32\n//go:noescape\nfunc VaddlHighU32(r *arm.Uint64X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Signed Add Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. 
All the values in this instruction are signed integer values.\n//\n//go:linkname VaddlvS8 VaddlvS8\n//go:noescape\nfunc VaddlvS8(r *arm.Int16, v0 *arm.Int8X8)\n\n// Signed Add Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are signed integer values.\n//\n//go:linkname VaddlvS16 VaddlvS16\n//go:noescape\nfunc VaddlvS16(r *arm.Int32, v0 *arm.Int16X4)\n\n// Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. With a two-element source there is only one pair, so the result is the widened sum of both elements, written to r.\n//\n//go:linkname VaddlvS32 VaddlvS32\n//go:noescape\nfunc VaddlvS32(r *arm.Int64, v0 *arm.Int32X2)\n\n// Unsigned sum Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VaddlvU8 VaddlvU8\n//go:noescape\nfunc VaddlvU8(r *arm.Uint16, v0 *arm.Uint8X8)\n\n// Unsigned sum Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VaddlvU16 VaddlvU16\n//go:noescape\nfunc VaddlvU16(r *arm.Uint32, v0 *arm.Uint16X4)\n\n// Unsigned Add Long Pairwise. 
This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. With a two-element source there is only one pair, so the result is the widened sum of both elements, written to r.\n//\n//go:linkname VaddlvU32 VaddlvU32\n//go:noescape\nfunc VaddlvU32(r *arm.Uint64, v0 *arm.Uint32X2)\n\n// Signed Add Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are signed integer values.\n//\n//go:linkname VaddlvqS8 VaddlvqS8\n//go:noescape\nfunc VaddlvqS8(r *arm.Int16, v0 *arm.Int8X16)\n\n// Signed Add Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are signed integer values.\n//\n//go:linkname VaddlvqS16 VaddlvqS16\n//go:noescape\nfunc VaddlvqS16(r *arm.Int32, v0 *arm.Int16X8)\n\n// Signed Add Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are signed integer values.\n//\n//go:linkname VaddlvqS32 VaddlvqS32\n//go:noescape\nfunc VaddlvqS32(r *arm.Int64, v0 *arm.Int32X4)\n\n// Unsigned sum Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. 
All the values in this instruction are unsigned integer values.\n//\n//go:linkname VaddlvqU8 VaddlvqU8\n//go:noescape\nfunc VaddlvqU8(r *arm.Uint16, v0 *arm.Uint8X16)\n\n// Unsigned sum Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VaddlvqU16 VaddlvqU16\n//go:noescape\nfunc VaddlvqU16(r *arm.Uint32, v0 *arm.Uint16X8)\n\n// Unsigned sum Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VaddlvqU32 VaddlvqU32\n//go:noescape\nfunc VaddlvqU32(r *arm.Uint64, v0 *arm.Uint32X4)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddqS8 VaddqS8\n//go:noescape\nfunc VaddqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddqS16 VaddqS16\n//go:noescape\nfunc VaddqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddqS32 VaddqS32\n//go:noescape\nfunc VaddqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Add (vector). 
This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddqS64 VaddqS64\n//go:noescape\nfunc VaddqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddqU8 VaddqU8\n//go:noescape\nfunc VaddqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddqU16 VaddqU16\n//go:noescape\nfunc VaddqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddqU32 VaddqU32\n//go:noescape\nfunc VaddqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddqU64 VaddqU64\n//go:noescape\nfunc VaddqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VaddqF32 VaddqF32\n//go:noescape\nfunc VaddqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Add (vector). 
This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VaddqF64 VaddqF64\n//go:noescape\nfunc VaddqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Bitwise Exclusive OR (vector). Adding polynomials over GF(2) produces no carries, so polynomial vector addition is a bitwise Exclusive OR. This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VaddqP128 VaddqP128\n//go:noescape\nfunc VaddqP128(r *arm.Poly128, v0 *arm.Poly128, v1 *arm.Poly128)\n\n// Bitwise Exclusive OR (vector). Adding polynomials over GF(2) produces no carries, so polynomial vector addition is a bitwise Exclusive OR. This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VaddqP16 VaddqP16\n//go:noescape\nfunc VaddqP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)\n\n// Bitwise Exclusive OR (vector). Adding polynomials over GF(2) produces no carries, so polynomial vector addition is a bitwise Exclusive OR. This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VaddqP64 VaddqP64\n//go:noescape\nfunc VaddqP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)\n\n// Bitwise Exclusive OR (vector). Adding polynomials over GF(2) produces no carries, so polynomial vector addition is a bitwise Exclusive OR. This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VaddqP8 VaddqP8\n//go:noescape\nfunc VaddqP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)\n\n// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.\n//\n//go:linkname VaddvS8 VaddvS8\n//go:noescape\nfunc VaddvS8(r *arm.Int8, v0 *arm.Int8X8)\n\n// Add across Vector. 
This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.\n//\n//go:linkname VaddvS16 VaddvS16\n//go:noescape\nfunc VaddvS16(r *arm.Int16, v0 *arm.Int16X4)\n\n// Add across vector\n//\n//go:linkname VaddvS32 VaddvS32\n//go:noescape\nfunc VaddvS32(r *arm.Int32, v0 *arm.Int32X2)\n\n// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.\n//\n//go:linkname VaddvU8 VaddvU8\n//go:noescape\nfunc VaddvU8(r *arm.Uint8, v0 *arm.Uint8X8)\n\n// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.\n//\n//go:linkname VaddvU16 VaddvU16\n//go:noescape\nfunc VaddvU16(r *arm.Uint16, v0 *arm.Uint16X4)\n\n// Add across vector\n//\n//go:linkname VaddvU32 VaddvU32\n//go:noescape\nfunc VaddvU32(r *arm.Uint32, v0 *arm.Uint32X2)\n\n// Floating-point add across vector\n//\n//go:linkname VaddvF32 VaddvF32\n//go:noescape\nfunc VaddvF32(r *arm.Float32, v0 *arm.Float32X2)\n\n// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.\n//\n//go:linkname VaddvqS8 VaddvqS8\n//go:noescape\nfunc VaddvqS8(r *arm.Int8, v0 *arm.Int8X16)\n\n// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.\n//\n//go:linkname VaddvqS16 VaddvqS16\n//go:noescape\nfunc VaddvqS16(r *arm.Int16, v0 *arm.Int16X8)\n\n// Add across Vector. 
This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.\n//\n//go:linkname VaddvqS32 VaddvqS32\n//go:noescape\nfunc VaddvqS32(r *arm.Int32, v0 *arm.Int32X4)\n\n// Add across vector\n//\n//go:linkname VaddvqS64 VaddvqS64\n//go:noescape\nfunc VaddvqS64(r *arm.Int64, v0 *arm.Int64X2)\n\n// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.\n//\n//go:linkname VaddvqU8 VaddvqU8\n//go:noescape\nfunc VaddvqU8(r *arm.Uint8, v0 *arm.Uint8X16)\n\n// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.\n//\n//go:linkname VaddvqU16 VaddvqU16\n//go:noescape\nfunc VaddvqU16(r *arm.Uint16, v0 *arm.Uint16X8)\n\n// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.\n//\n//go:linkname VaddvqU32 VaddvqU32\n//go:noescape\nfunc VaddvqU32(r *arm.Uint32, v0 *arm.Uint32X4)\n\n// Add across vector\n//\n//go:linkname VaddvqU64 VaddvqU64\n//go:noescape\nfunc VaddvqU64(r *arm.Uint64, v0 *arm.Uint64X2)\n\n// Floating-point add across vector\n//\n//go:linkname VaddvqF32 VaddvqF32\n//go:noescape\nfunc VaddvqF32(r *arm.Float32, v0 *arm.Float32X4)\n\n// Floating-point add across vector\n//\n//go:linkname VaddvqF64 VaddvqF64\n//go:noescape\nfunc VaddvqF64(r *arm.Float64, v0 *arm.Float64X2)\n\n// Signed Add Wide. 
This instruction adds vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register.\n//\n//go:linkname VaddwS8 VaddwS8\n//go:noescape\nfunc VaddwS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X8)\n\n// Signed Add Wide. This instruction adds vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register.\n//\n//go:linkname VaddwS16 VaddwS16\n//go:noescape\nfunc VaddwS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4)\n\n// Signed Add Wide. This instruction adds vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register.\n//\n//go:linkname VaddwS32 VaddwS32\n//go:noescape\nfunc VaddwS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2)\n\n// Unsigned Add Wide. This instruction adds the vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VaddwU8 VaddwU8\n//go:noescape\nfunc VaddwU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X8)\n\n// Unsigned Add Wide. 
This instruction adds the vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VaddwU16 VaddwU16\n//go:noescape\nfunc VaddwU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4)\n\n// Unsigned Add Wide. This instruction adds the vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VaddwU32 VaddwU32\n//go:noescape\nfunc VaddwU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2)\n\n// Signed Add Wide. This instruction adds vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register.\n//\n//go:linkname VaddwHighS8 VaddwHighS8\n//go:noescape\nfunc VaddwHighS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X16)\n\n// Signed Add Wide. 
This instruction adds vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register.\n//\n//go:linkname VaddwHighS16 VaddwHighS16\n//go:noescape\nfunc VaddwHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8)\n\n// Signed Add Wide. This instruction adds vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register.\n//\n//go:linkname VaddwHighS32 VaddwHighS32\n//go:noescape\nfunc VaddwHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4)\n\n// Unsigned Add Wide. This instruction adds the vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VaddwHighU8 VaddwHighU8\n//go:noescape\nfunc VaddwHighU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X16)\n\n// Unsigned Add Wide. This instruction adds the vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. 
All the values in this instruction are unsigned integer values.\n//\n//go:linkname VaddwHighU16 VaddwHighU16\n//go:noescape\nfunc VaddwHighU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8)\n\n// Unsigned Add Wide. This instruction adds the vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VaddwHighU32 VaddwHighU32\n//go:noescape\nfunc VaddwHighU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4)\n\n// AES single round decryption.\n//\n//go:linkname VaesdqU8 VaesdqU8\n//go:noescape\nfunc VaesdqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// AES single round encryption.\n//\n//go:linkname VaeseqU8 VaeseqU8\n//go:noescape\nfunc VaeseqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// AES inverse mix columns.\n//\n//go:linkname VaesimcqU8 VaesimcqU8\n//go:noescape\nfunc VaesimcqU8(r *arm.Uint8X16, v0 *arm.Uint8X16)\n\n// AES mix columns.\n//\n//go:linkname VaesmcqU8 VaesmcqU8\n//go:noescape\nfunc VaesmcqU8(r *arm.Uint8X16, v0 *arm.Uint8X16)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandS8 VandS8\n//go:noescape\nfunc VandS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Bitwise AND (vector). 
This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandS16 VandS16\n//go:noescape\nfunc VandS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandS32 VandS32\n//go:noescape\nfunc VandS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandS64 VandS64\n//go:noescape\nfunc VandS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandU8 VandU8\n//go:noescape\nfunc VandU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandU16 VandU16\n//go:noescape\nfunc VandU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandU32 VandU32\n//go:noescape\nfunc VandU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandU64 VandU64\n//go:noescape\nfunc VandU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)\n\n// Bitwise AND (vector). 
This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandqS8 VandqS8\n//go:noescape\nfunc VandqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandqS16 VandqS16\n//go:noescape\nfunc VandqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandqS32 VandqS32\n//go:noescape\nfunc VandqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandqS64 VandqS64\n//go:noescape\nfunc VandqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandqU8 VandqU8\n//go:noescape\nfunc VandqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandqU16 VandqU16\n//go:noescape\nfunc VandqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Bitwise AND (vector). 
This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandqU32 VandqU32\n//go:noescape\nfunc VandqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandqU64 VandqU64\n//go:noescape\nfunc VandqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbcaxqS8 VbcaxqS8\n//go:noescape\nfunc VbcaxqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16, v2 *arm.Int8X16)\n\n// Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbcaxqS16 VbcaxqS16\n//go:noescape\nfunc VbcaxqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8)\n\n// Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbcaxqS32 VbcaxqS32\n//go:noescape\nfunc VbcaxqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 
*arm.Int32X4)\n\n// Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbcaxqS64 VbcaxqS64\n//go:noescape\nfunc VbcaxqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2, v2 *arm.Int64X2)\n\n// Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbcaxqU8 VbcaxqU8\n//go:noescape\nfunc VbcaxqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16)\n\n// Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbcaxqU16 VbcaxqU16\n//go:noescape\nfunc VbcaxqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)\n\n// Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbcaxqU32 VbcaxqU32\n//go:noescape\nfunc VbcaxqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)\n\n// Bit Clear and Exclusive OR 
performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbcaxqU64 VbcaxqU64\n//go:noescape\nfunc VbcaxqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicS8 VbicS8\n//go:noescape\nfunc VbicS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicS16 VbicS16\n//go:noescape\nfunc VbicS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicS32 VbicS32\n//go:noescape\nfunc VbicS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicS64 VbicS64\n//go:noescape\nfunc VbicS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)\n\n// Bitwise bit Clear (vector, register). 
This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicU8 VbicU8\n//go:noescape\nfunc VbicU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicU16 VbicU16\n//go:noescape\nfunc VbicU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicU32 VbicU32\n//go:noescape\nfunc VbicU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicU64 VbicU64\n//go:noescape\nfunc VbicU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicqS8 VbicqS8\n//go:noescape\nfunc VbicqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Bitwise bit Clear (vector, register). 
This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicqS16 VbicqS16\n//go:noescape\nfunc VbicqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicqS32 VbicqS32\n//go:noescape\nfunc VbicqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicqS64 VbicqS64\n//go:noescape\nfunc VbicqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicqU8 VbicqU8\n//go:noescape\nfunc VbicqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicqU16 VbicqU16\n//go:noescape\nfunc VbicqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Bitwise bit Clear (vector, register). 
This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicqU32 VbicqU32\n//go:noescape\nfunc VbicqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicqU64 VbicqU64\n//go:noescape\nfunc VbicqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslS8 VbslS8\n//go:noescape\nfunc VbslS8(r *arm.Int8X8, v0 *arm.Uint8X8, v1 *arm.Int8X8, v2 *arm.Int8X8)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslS16 VbslS16\n//go:noescape\nfunc VbslS16(r *arm.Int16X4, v0 *arm.Uint16X4, v1 *arm.Int16X4, v2 *arm.Int16X4)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslS32 VbslS32\n//go:noescape\nfunc VbslS32(r *arm.Int32X2, v0 *arm.Uint32X2, v1 *arm.Int32X2, v2 *arm.Int32X2)\n\n// Bitwise Select. 
This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslS64 VbslS64\n//go:noescape\nfunc VbslS64(r *arm.Int64X1, v0 *arm.Uint64X1, v1 *arm.Int64X1, v2 *arm.Int64X1)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslU8 VbslU8\n//go:noescape\nfunc VbslU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslU16 VbslU16\n//go:noescape\nfunc VbslU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslU32 VbslU32\n//go:noescape\nfunc VbslU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslU64 VbslU64\n//go:noescape\nfunc VbslU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1, v2 *arm.Uint64X1)\n\n// Bitwise Select. 
This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslF32 VbslF32\n//go:noescape\nfunc VbslF32(r *arm.Float32X2, v0 *arm.Uint32X2, v1 *arm.Float32X2, v2 *arm.Float32X2)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslF64 VbslF64\n//go:noescape\nfunc VbslF64(r *arm.Float64X1, v0 *arm.Uint64X1, v1 *arm.Float64X1, v2 *arm.Float64X1)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslP16 VbslP16\n//go:noescape\nfunc VbslP16(r *arm.Poly16X4, v0 *arm.Uint16X4, v1 *arm.Poly16X4, v2 *arm.Poly16X4)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslP64 VbslP64\n//go:noescape\nfunc VbslP64(r *arm.Poly64X1, v0 *arm.Uint64X1, v1 *arm.Poly64X1, v2 *arm.Poly64X1)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslP8 VbslP8\n//go:noescape\nfunc VbslP8(r *arm.Poly8X8, v0 *arm.Uint8X8, v1 *arm.Poly8X8, v2 *arm.Poly8X8)\n\n// Bitwise Select. 
This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslqS8 VbslqS8\n//go:noescape\nfunc VbslqS8(r *arm.Int8X16, v0 *arm.Uint8X16, v1 *arm.Int8X16, v2 *arm.Int8X16)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslqS16 VbslqS16\n//go:noescape\nfunc VbslqS16(r *arm.Int16X8, v0 *arm.Uint16X8, v1 *arm.Int16X8, v2 *arm.Int16X8)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslqS32 VbslqS32\n//go:noescape\nfunc VbslqS32(r *arm.Int32X4, v0 *arm.Uint32X4, v1 *arm.Int32X4, v2 *arm.Int32X4)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslqS64 VbslqS64\n//go:noescape\nfunc VbslqS64(r *arm.Int64X2, v0 *arm.Uint64X2, v1 *arm.Int64X2, v2 *arm.Int64X2)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslqU8 VbslqU8\n//go:noescape\nfunc VbslqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16)\n\n// Bitwise Select. 
This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslqU16 VbslqU16\n//go:noescape\nfunc VbslqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslqU32 VbslqU32\n//go:noescape\nfunc VbslqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslqU64 VbslqU64\n//go:noescape\nfunc VbslqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslqF32 VbslqF32\n//go:noescape\nfunc VbslqF32(r *arm.Float32X4, v0 *arm.Uint32X4, v1 *arm.Float32X4, v2 *arm.Float32X4)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslqF64 VbslqF64\n//go:noescape\nfunc VbslqF64(r *arm.Float64X2, v0 *arm.Uint64X2, v1 *arm.Float64X2, v2 *arm.Float64X2)\n\n// Bitwise Select. 
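This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslqP16 VbslqP16\n//go:noescape\nfunc VbslqP16(r *arm.Poly16X8, v0 *arm.Uint16X8, v1 *arm.Poly16X8, v2 *arm.Poly16X8)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslqP64 VbslqP64\n//go:noescape\nfunc VbslqP64(r *arm.Poly64X2, v0 *arm.Uint64X2, v1 *arm.Poly64X2, v2 *arm.Poly64X2)\n\n// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.\n//\n//go:linkname VbslqP8 VbslqP8\n//go:noescape\nfunc VbslqP8(r *arm.Poly8X16, v0 *arm.Uint8X16, v1 *arm.Poly8X16, v2 *arm.Poly8X16)\n\n// Floating-point Complex Add with 270-degree rotation. Each pair of adjacent elements holds one complex number (real part in the lower-numbered element, imaginary part in the higher-numbered one). The complex number from the second source is rotated counterclockwise by 270 degrees and added to the one from the first source: r.re = v0.re + v1.im, r.im = v0.im - v1.re.\n//\n//go:linkname VcaddRot270F32 VcaddRot270F32\n//go:noescape\nfunc VcaddRot270F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Complex Add with 90-degree rotation. Each pair of adjacent elements holds one complex number (real part in the lower-numbered element, imaginary part in the higher-numbered one). The complex number from the second source is rotated counterclockwise by 90 degrees and added to the one from the first source: r.re = v0.re - v1.im, r.im = v0.im + v1.re.\n//\n//go:linkname VcaddRot90F32 VcaddRot90F32\n//go:noescape\nfunc VcaddRot90F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Complex Add with 270-degree rotation. Each pair of adjacent elements holds one complex number (real part in the lower-numbered element, imaginary part in the higher-numbered one). The complex number from the second source is rotated counterclockwise by 270 degrees and added to the one from the first source: r.re = v0.re + v1.im, r.im = v0.im - v1.re.\n//\n//go:linkname VcaddqRot270F32 VcaddqRot270F32\n//go:noescape\nfunc VcaddqRot270F32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Complex Add with 270-degree rotation. Each pair of adjacent elements holds one complex number (real part in the lower-numbered element, imaginary part in the higher-numbered one). The complex number from the second source is rotated counterclockwise by 270 degrees and added to the one from the first source: r.re = v0.re + v1.im, r.im = v0.im - v1.re.\n//\n//go:linkname VcaddqRot270F64 VcaddqRot270F64\n//go:noescape\nfunc VcaddqRot270F64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point Complex Add with 90-degree rotation. Each pair of adjacent elements holds one complex number (real part in the lower-numbered element, imaginary part in the higher-numbered one). The complex number from the second source is rotated counterclockwise by 90 degrees and added to the one from the first source: r.re = v0.re - v1.im, r.im = v0.im + v1.re.\n//\n//go:linkname VcaddqRot90F32 VcaddqRot90F32\n//go:noescape\nfunc VcaddqRot90F32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n
// Floating-point Complex Add with 90-degree rotation. Each pair of adjacent elements holds one complex number (real part in the lower-numbered element, imaginary part in the higher-numbered one). The complex number from the second source is rotated counterclockwise by 90 degrees and added to the one from the first source: r.re = v0.re - v1.im, r.im = v0.im + v1.re.\n//\n//go:linkname VcaddqRot90F64 VcaddqRot90F64\n//go:noescape\nfunc VcaddqRot90F64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcageF32 VcageF32\n//go:noescape\nfunc VcageF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcageF64 VcageF64\n//go:noescape\nfunc VcageF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Floating-point Absolute Compare Greater than or Equal (vector). 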
This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcagedF64 VcagedF64\n//go:noescape\nfunc VcagedF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)\n\n// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcageqF32 VcageqF32\n//go:noescape\nfunc VcageqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Absolute Compare Greater than or Equal (vector). 
This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcageqF64 VcageqF64\n//go:noescape\nfunc VcageqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcagesF32 VcagesF32\n//go:noescape\nfunc VcagesF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)\n\n// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcagtF32 VcagtF32\n//go:noescape\nfunc VcagtF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Absolute Compare Greater than (vector). 
This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcagtF64 VcagtF64\n//go:noescape\nfunc VcagtF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcagtdF64 VcagtdF64\n//go:noescape\nfunc VcagtdF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)\n\n// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcagtqF32 VcagtqF32\n//go:noescape\nfunc VcagtqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Absolute Compare Greater than (vector). 
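This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcagtqF64 VcagtqF64\n//go:noescape\nfunc VcagtqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcagtsF32 VcagtsF32\n//go:noescape\nfunc VcagtsF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)\n\n// Floating-point Absolute Compare Less than or Equal. This compares the absolute value of each floating-point value in the first source with the absolute value of the corresponding floating-point value in the second source and, if the first value is less than or equal to the second value, sets every bit of the corresponding result element to one; otherwise it sets every bit of that element to zero.\n//\n//go:linkname VcaleF32 VcaleF32\n//go:noescape\nfunc VcaleF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Absolute Compare Less than or Equal. This compares the absolute value of each floating-point value in the first source with the absolute value of the corresponding floating-point value in the second source and, if the first value is less than or equal to the second value, sets every bit of the corresponding result element to one; otherwise it sets every bit of that element to zero.\n//\n//go:linkname VcaleF64 VcaleF64\n//go:noescape\nfunc VcaleF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Floating-point Absolute Compare Less than or Equal. This compares the absolute value of each floating-point value in the first source with the absolute value of the corresponding floating-point value in the second source and, if the first value is less than or equal to the second value, sets every bit of the corresponding result element to one; otherwise it sets every bit of that element to zero.\n//\n//go:linkname VcaledF64 VcaledF64\n//go:noescape\nfunc VcaledF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)\n\n// Floating-point Absolute Compare Less than or Equal. This compares the absolute value of each floating-point value in the first source with the absolute value of the corresponding floating-point value in the second source and, if the first value is less than or equal to the second value, sets every bit of the corresponding result element to one; otherwise it sets every bit of that element to zero.\n//\n//go:linkname VcaleqF32 VcaleqF32\n//go:noescape\nfunc VcaleqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n
// Floating-point Absolute Compare Less than or Equal. This compares the absolute value of each floating-point value in the first source with the absolute value of the corresponding floating-point value in the second source and, if the first value is less than or equal to the second value, sets every bit of the corresponding result element to one; otherwise it sets every bit of that element to zero.\n//\n//go:linkname VcaleqF64 VcaleqF64\n//go:noescape\nfunc VcaleqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point Absolute Compare Less than or Equal. This compares the absolute value of each floating-point value in the first source with the absolute value of the corresponding floating-point value in the second source and, if the first value is less than or equal to the second value, sets every bit of the corresponding result element to one; otherwise it sets every bit of that element to zero.\n//\n//go:linkname VcalesF32 VcalesF32\n//go:noescape\nfunc VcalesF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)\n\n// Floating-point Absolute Compare Less than. This compares the absolute value of each floating-point value in the first source with the absolute value of the corresponding floating-point value in the second source and, if the first value is less than the second value, sets every bit of the corresponding result element to one; otherwise it sets every bit of that element to zero.\n//\n//go:linkname VcaltF32 VcaltF32\n//go:noescape\nfunc VcaltF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Absolute Compare Less than. This compares the absolute value of each floating-point value in the first source with the absolute value of the corresponding floating-point value in the second source and, if the first value is less than the second value, sets every bit of the corresponding result element to one; otherwise it sets every bit of that element to zero.\n//\n//go:linkname VcaltF64 VcaltF64\n//go:noescape\nfunc VcaltF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Floating-point Absolute Compare Less than. This compares the absolute value of each floating-point value in the first source with the absolute value of the corresponding floating-point value in the second source and, if the first value is less than the second value, sets every bit of the corresponding result element to one; otherwise it sets every bit of that element to zero.\n//\n//go:linkname VcaltdF64 VcaltdF64\n//go:noescape\nfunc VcaltdF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)\n\n// Floating-point Absolute Compare Less than. This compares the absolute value of each floating-point value in the first source with the absolute value of the corresponding floating-point value in the second source and, if the first value is less than the second value, sets every bit of the corresponding result element to one; otherwise it sets every bit of that element to zero.\n//\n//go:linkname VcaltqF32 VcaltqF32\n//go:noescape\nfunc VcaltqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Absolute Compare Less than. This compares the absolute value of each floating-point value in the first source with the absolute value of the corresponding floating-point value in the second source and, if the first value is less than the second value, sets every bit of the corresponding result element to one; otherwise it sets every bit of that element to zero.\n//\n//go:linkname VcaltqF64 VcaltqF64\n//go:noescape\nfunc VcaltqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point Absolute Compare Less than. This compares the absolute value of each floating-point value in the first source with the absolute value of the corresponding floating-point value in the second source and, if the first value is less than the second value, sets every bit of the corresponding result element to one; otherwise it sets every bit of that element to zero.\n//\n//go:linkname VcaltsF32 VcaltsF32\n//go:noescape\nfunc VcaltsF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqS8 VceqS8\n//go:noescape\nfunc VceqS8(r *arm.Uint8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Compare bitwise Equal (vector). 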
This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqS16 VceqS16\n//go:noescape\nfunc VceqS16(r *arm.Uint16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqS32 VceqS32\n//go:noescape\nfunc VceqS32(r *arm.Uint32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqS64 VceqS64\n//go:noescape\nfunc VceqS64(r *arm.Uint64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)\n\n// Compare bitwise Equal (vector). 
This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqU8 VceqU8\n//go:noescape\nfunc VceqU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqU16 VceqU16\n//go:noescape\nfunc VceqU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqU32 VceqU32\n//go:noescape\nfunc VceqU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Compare bitwise Equal (vector). 
This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqU64 VceqU64\n//go:noescape\nfunc VceqU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)\n\n// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqF32 VceqF32\n//go:noescape\nfunc VceqF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqF64 VceqF64\n//go:noescape\nfunc VceqF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Compare bitwise Equal (vector). 
This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqP64 VceqP64\n//go:noescape\nfunc VceqP64(r *arm.Uint64X1, v0 *arm.Poly64X1, v1 *arm.Poly64X1)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqP8 VceqP8\n//go:noescape\nfunc VceqP8(r *arm.Uint8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqdS64 VceqdS64\n//go:noescape\nfunc VceqdS64(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64)\n\n// Compare bitwise Equal (vector). 
This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqdU64 VceqdU64\n//go:noescape\nfunc VceqdU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)\n\n// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqdF64 VceqdF64\n//go:noescape\nfunc VceqdF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqS8 VceqqS8\n//go:noescape\nfunc VceqqS8(r *arm.Uint8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Compare bitwise Equal (vector). 
This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqS16 VceqqS16\n//go:noescape\nfunc VceqqS16(r *arm.Uint16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqS32 VceqqS32\n//go:noescape\nfunc VceqqS32(r *arm.Uint32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqS64 VceqqS64\n//go:noescape\nfunc VceqqS64(r *arm.Uint64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Compare bitwise Equal (vector). 
This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqU8 VceqqU8\n//go:noescape\nfunc VceqqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqU16 VceqqU16\n//go:noescape\nfunc VceqqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqU32 VceqqU32\n//go:noescape\nfunc VceqqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Compare bitwise Equal (vector). 
This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqU64 VceqqU64\n//go:noescape\nfunc VceqqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqF32 VceqqF32\n//go:noescape\nfunc VceqqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqF64 VceqqF64\n//go:noescape\nfunc VceqqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Compare bitwise Equal (vector). 
This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqP64 VceqqP64\n//go:noescape\nfunc VceqqP64(r *arm.Uint64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqP8 VceqqP8\n//go:noescape\nfunc VceqqP8(r *arm.Uint8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)\n\n// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqsF32 VceqsF32\n//go:noescape\nfunc VceqsF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)\n\n// Compare bitwise Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzS8 VceqzS8\n//go:noescape\nfunc VceqzS8(r *arm.Uint8X8, v0 *arm.Int8X8)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzS16 VceqzS16\n//go:noescape\nfunc VceqzS16(r *arm.Uint16X4, v0 *arm.Int16X4)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzS32 VceqzS32\n//go:noescape\nfunc VceqzS32(r *arm.Uint32X2, v0 *arm.Int32X2)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzS64 VceqzS64\n//go:noescape\nfunc VceqzS64(r *arm.Uint64X1, v0 *arm.Int64X1)\n\n// Compare bitwise Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzU8 VceqzU8\n//go:noescape\nfunc VceqzU8(r *arm.Uint8X8, v0 *arm.Uint8X8)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzU16 VceqzU16\n//go:noescape\nfunc VceqzU16(r *arm.Uint16X4, v0 *arm.Uint16X4)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzU32 VceqzU32\n//go:noescape\nfunc VceqzU32(r *arm.Uint32X2, v0 *arm.Uint32X2)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzU64 VceqzU64\n//go:noescape\nfunc VceqzU64(r *arm.Uint64X1, v0 *arm.Uint64X1)\n\n// Floating-point Compare Equal to zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzF32 VceqzF32\n//go:noescape\nfunc VceqzF32(r *arm.Uint32X2, v0 *arm.Float32X2)\n\n// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzF64 VceqzF64\n//go:noescape\nfunc VceqzF64(r *arm.Uint64X1, v0 *arm.Float64X1)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzP64 VceqzP64\n//go:noescape\nfunc VceqzP64(r *arm.Uint64X1, v0 *arm.Poly64X1)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzP8 VceqzP8\n//go:noescape\nfunc VceqzP8(r *arm.Uint8X8, v0 *arm.Poly8X8)\n\n// Compare bitwise Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzdS64 VceqzdS64\n//go:noescape\nfunc VceqzdS64(r *arm.Uint64, v0 *arm.Int64)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzdU64 VceqzdU64\n//go:noescape\nfunc VceqzdU64(r *arm.Uint64, v0 *arm.Uint64)\n\n// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzdF64 VceqzdF64\n//go:noescape\nfunc VceqzdF64(r *arm.Uint64, v0 *arm.Float64)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqS8 VceqzqS8\n//go:noescape\nfunc VceqzqS8(r *arm.Uint8X16, v0 *arm.Int8X16)\n\n// Compare bitwise Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqS16 VceqzqS16\n//go:noescape\nfunc VceqzqS16(r *arm.Uint16X8, v0 *arm.Int16X8)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqS32 VceqzqS32\n//go:noescape\nfunc VceqzqS32(r *arm.Uint32X4, v0 *arm.Int32X4)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqS64 VceqzqS64\n//go:noescape\nfunc VceqzqS64(r *arm.Uint64X2, v0 *arm.Int64X2)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqU8 VceqzqU8\n//go:noescape\nfunc VceqzqU8(r *arm.Uint8X16, v0 *arm.Uint8X16)\n\n// Compare bitwise Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqU16 VceqzqU16\n//go:noescape\nfunc VceqzqU16(r *arm.Uint16X8, v0 *arm.Uint16X8)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqU32 VceqzqU32\n//go:noescape\nfunc VceqzqU32(r *arm.Uint32X4, v0 *arm.Uint32X4)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqU64 VceqzqU64\n//go:noescape\nfunc VceqzqU64(r *arm.Uint64X2, v0 *arm.Uint64X2)\n\n// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqF32 VceqzqF32\n//go:noescape\nfunc VceqzqF32(r *arm.Uint32X4, v0 *arm.Float32X4)\n\n// Floating-point Compare Equal to zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqF64 VceqzqF64\n//go:noescape\nfunc VceqzqF64(r *arm.Uint64X2, v0 *arm.Float64X2)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqP64 VceqzqP64\n//go:noescape\nfunc VceqzqP64(r *arm.Uint64X2, v0 *arm.Poly64X2)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqP8 VceqzqP8\n//go:noescape\nfunc VceqzqP8(r *arm.Uint8X16, v0 *arm.Poly8X16)\n\n// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzsF32 VceqzsF32\n//go:noescape\nfunc VceqzsF32(r *arm.Uint32, v0 *arm.Float32)\n\n// Compare signed Greater than or Equal (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeS8 VcgeS8\n//go:noescape\nfunc VcgeS8(r *arm.Uint8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeS16 VcgeS16\n//go:noescape\nfunc VcgeS16(r *arm.Uint16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeS32 VcgeS32\n//go:noescape\nfunc VcgeS32(r *arm.Uint32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Compare signed Greater than or Equal (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeS64 VcgeS64\n//go:noescape\nfunc VcgeS64(r *arm.Uint64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)\n\n// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeU8 VcgeU8\n//go:noescape\nfunc VcgeU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeU16 VcgeU16\n//go:noescape\nfunc VcgeU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Compare unsigned Higher or Same (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeU32 VcgeU32\n//go:noescape\nfunc VcgeU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeU64 VcgeU64\n//go:noescape\nfunc VcgeU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)\n\n// Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeF32 VcgeF32\n//go:noescape\nfunc VcgeF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Compare Greater than or Equal (vector). 
This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeF64 VcgeF64\n//go:noescape\nfunc VcgeF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgedS64 VcgedS64\n//go:noescape\nfunc VcgedS64(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64)\n\n// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgedU64 VcgedU64\n//go:noescape\nfunc VcgedU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)\n\n// Floating-point Compare Greater than or Equal (vector). 
This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgedF64 VcgedF64\n//go:noescape\nfunc VcgedF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)\n\n// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeqS8 VcgeqS8\n//go:noescape\nfunc VcgeqS8(r *arm.Uint8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeqS16 VcgeqS16\n//go:noescape\nfunc VcgeqS16(r *arm.Uint16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Compare signed Greater than or Equal (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeqS32 VcgeqS32\n//go:noescape\nfunc VcgeqS32(r *arm.Uint32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeqS64 VcgeqS64\n//go:noescape\nfunc VcgeqS64(r *arm.Uint64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeqU8 VcgeqU8\n//go:noescape\nfunc VcgeqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Compare unsigned Higher or Same (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeqU16 VcgeqU16\n//go:noescape\nfunc VcgeqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeqU32 VcgeqU32\n//go:noescape\nfunc VcgeqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeqU64 VcgeqU64\n//go:noescape\nfunc VcgeqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Floating-point Compare Greater than or Equal (vector). 
This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeqF32 VcgeqF32\n//go:noescape\nfunc VcgeqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeqF64 VcgeqF64\n//go:noescape\nfunc VcgeqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgesF32 VcgesF32\n//go:noescape\nfunc VcgesF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)\n\n// Compare signed Greater than or Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezS8 VcgezS8\n//go:noescape\nfunc VcgezS8(r *arm.Uint8X8, v0 *arm.Int8X8)\n\n// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezS16 VcgezS16\n//go:noescape\nfunc VcgezS16(r *arm.Uint16X4, v0 *arm.Int16X4)\n\n// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezS32 VcgezS32\n//go:noescape\nfunc VcgezS32(r *arm.Uint32X2, v0 *arm.Int32X2)\n\n// Compare signed Greater than or Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezS64 VcgezS64\n//go:noescape\nfunc VcgezS64(r *arm.Uint64X1, v0 *arm.Int64X1)\n\n// Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezF32 VcgezF32\n//go:noescape\nfunc VcgezF32(r *arm.Uint32X2, v0 *arm.Float32X2)\n\n// Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezF64 VcgezF64\n//go:noescape\nfunc VcgezF64(r *arm.Uint64X1, v0 *arm.Float64X1)\n\n// Compare signed Greater than or Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezdS64 VcgezdS64\n//go:noescape\nfunc VcgezdS64(r *arm.Uint64, v0 *arm.Int64)\n\n// Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezdF64 VcgezdF64\n//go:noescape\nfunc VcgezdF64(r *arm.Uint64, v0 *arm.Float64)\n\n// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezqS8 VcgezqS8\n//go:noescape\nfunc VcgezqS8(r *arm.Uint8X16, v0 *arm.Int8X16)\n\n// Compare signed Greater than or Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezqS16 VcgezqS16\n//go:noescape\nfunc VcgezqS16(r *arm.Uint16X8, v0 *arm.Int16X8)\n\n// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezqS32 VcgezqS32\n//go:noescape\nfunc VcgezqS32(r *arm.Uint32X4, v0 *arm.Int32X4)\n\n// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezqS64 VcgezqS64\n//go:noescape\nfunc VcgezqS64(r *arm.Uint64X2, v0 *arm.Int64X2)\n\n// Floating-point Compare Greater than or Equal to zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezqF32 VcgezqF32\n//go:noescape\nfunc VcgezqF32(r *arm.Uint32X4, v0 *arm.Float32X4)\n\n// Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezqF64 VcgezqF64\n//go:noescape\nfunc VcgezqF64(r *arm.Uint64X2, v0 *arm.Float64X2)\n\n// Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezsF32 VcgezsF32\n//go:noescape\nfunc VcgezsF32(r *arm.Uint32, v0 *arm.Float32)\n\n// Compare signed Greater than (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtS8 VcgtS8\n//go:noescape\nfunc VcgtS8(r *arm.Uint8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtS16 VcgtS16\n//go:noescape\nfunc VcgtS16(r *arm.Uint16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtS32 VcgtS32\n//go:noescape\nfunc VcgtS32(r *arm.Uint32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Compare signed Greater than (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtS64 VcgtS64\n//go:noescape\nfunc VcgtS64(r *arm.Uint64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)\n\n// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtU8 VcgtU8\n//go:noescape\nfunc VcgtU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtU16 VcgtU16\n//go:noescape\nfunc VcgtU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Compare unsigned Higher (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtU32 VcgtU32\n//go:noescape\nfunc VcgtU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtU64 VcgtU64\n//go:noescape\nfunc VcgtU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)\n\n// Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtF32 VcgtF32\n//go:noescape\nfunc VcgtF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Compare Greater than (vector). 
This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtF64 VcgtF64\n//go:noescape\nfunc VcgtF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtdS64 VcgtdS64\n//go:noescape\nfunc VcgtdS64(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64)\n\n// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtdU64 VcgtdU64\n//go:noescape\nfunc VcgtdU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)\n\n// Floating-point Compare Greater than (vector). 
This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtdF64 VcgtdF64\n//go:noescape\nfunc VcgtdF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)\n\n// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtqS8 VcgtqS8\n//go:noescape\nfunc VcgtqS8(r *arm.Uint8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtqS16 VcgtqS16\n//go:noescape\nfunc VcgtqS16(r *arm.Uint16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Compare signed Greater than (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtqS32 VcgtqS32\n//go:noescape\nfunc VcgtqS32(r *arm.Uint32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtqS64 VcgtqS64\n//go:noescape\nfunc VcgtqS64(r *arm.Uint64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtqU8 VcgtqU8\n//go:noescape\nfunc VcgtqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Compare unsigned Higher (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtqU16 VcgtqU16\n//go:noescape\nfunc VcgtqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtqU32 VcgtqU32\n//go:noescape\nfunc VcgtqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtqU64 VcgtqU64\n//go:noescape\nfunc VcgtqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Floating-point Compare Greater than (vector). 
This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtqF32 VcgtqF32\n//go:noescape\nfunc VcgtqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtqF64 VcgtqF64\n//go:noescape\nfunc VcgtqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtsF32 VcgtsF32\n//go:noescape\nfunc VcgtsF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)\n\n// Compare signed Greater than zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzS8 VcgtzS8\n//go:noescape\nfunc VcgtzS8(r *arm.Uint8X8, v0 *arm.Int8X8)\n\n// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzS16 VcgtzS16\n//go:noescape\nfunc VcgtzS16(r *arm.Uint16X4, v0 *arm.Int16X4)\n\n// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzS32 VcgtzS32\n//go:noescape\nfunc VcgtzS32(r *arm.Uint32X2, v0 *arm.Int32X2)\n\n// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzS64 VcgtzS64\n//go:noescape\nfunc VcgtzS64(r *arm.Uint64X1, v0 *arm.Int64X1)\n\n// Floating-point Compare Greater than zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzF32 VcgtzF32\n//go:noescape\nfunc VcgtzF32(r *arm.Uint32X2, v0 *arm.Float32X2)\n\n// Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzF64 VcgtzF64\n//go:noescape\nfunc VcgtzF64(r *arm.Uint64X1, v0 *arm.Float64X1)\n\n// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzdS64 VcgtzdS64\n//go:noescape\nfunc VcgtzdS64(r *arm.Uint64, v0 *arm.Int64)\n\n// Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzdF64 VcgtzdF64\n//go:noescape\nfunc VcgtzdF64(r *arm.Uint64, v0 *arm.Float64)\n\n// Compare signed Greater than zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzqS8 VcgtzqS8\n//go:noescape\nfunc VcgtzqS8(r *arm.Uint8X16, v0 *arm.Int8X16)\n\n// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzqS16 VcgtzqS16\n//go:noescape\nfunc VcgtzqS16(r *arm.Uint16X8, v0 *arm.Int16X8)\n\n// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzqS32 VcgtzqS32\n//go:noescape\nfunc VcgtzqS32(r *arm.Uint32X4, v0 *arm.Int32X4)\n\n// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzqS64 VcgtzqS64\n//go:noescape\nfunc VcgtzqS64(r *arm.Uint64X2, v0 *arm.Int64X2)\n\n// Floating-point Compare Greater than zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzqF32 VcgtzqF32\n//go:noescape\nfunc VcgtzqF32(r *arm.Uint32X4, v0 *arm.Float32X4)\n\n// Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzqF64 VcgtzqF64\n//go:noescape\nfunc VcgtzqF64(r *arm.Uint64X2, v0 *arm.Float64X2)\n\n// Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzsF32 VcgtzsF32\n//go:noescape\nfunc VcgtzsF32(r *arm.Uint32, v0 *arm.Float32)\n\n// Compare signed less than or equal\n//\n//go:linkname VcleS8 VcleS8\n//go:noescape\nfunc VcleS8(r *arm.Uint8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Compare signed less than or equal\n//\n//go:linkname VcleS16 VcleS16\n//go:noescape\nfunc VcleS16(r *arm.Uint16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Compare signed less than or equal\n//\n//go:linkname VcleS32 VcleS32\n//go:noescape\nfunc VcleS32(r *arm.Uint32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Compare signed less than or equal\n//\n//go:linkname VcleS64 VcleS64\n//go:noescape\nfunc VcleS64(r *arm.Uint64X1, v0 *arm.Int64X1, v1 
*arm.Int64X1)\n\n// Compare unsigned less than or equal\n//\n//go:linkname VcleU8 VcleU8\n//go:noescape\nfunc VcleU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Compare unsigned less than or equal\n//\n//go:linkname VcleU16 VcleU16\n//go:noescape\nfunc VcleU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Compare unsigned less than or equal\n//\n//go:linkname VcleU32 VcleU32\n//go:noescape\nfunc VcleU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Compare unsigned less than or equal\n//\n//go:linkname VcleU64 VcleU64\n//go:noescape\nfunc VcleU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)\n\n// Floating-point compare less than or equal\n//\n//go:linkname VcleF32 VcleF32\n//go:noescape\nfunc VcleF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point compare less than or equal\n//\n//go:linkname VcleF64 VcleF64\n//go:noescape\nfunc VcleF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Compare signed less than or equal\n//\n//go:linkname VcledS64 VcledS64\n//go:noescape\nfunc VcledS64(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64)\n\n// Compare unsigned less than or equal\n//\n//go:linkname VcledU64 VcledU64\n//go:noescape\nfunc VcledU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)\n\n// Floating-point compare less than or equal\n//\n//go:linkname VcledF64 VcledF64\n//go:noescape\nfunc VcledF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)\n\n// Compare signed less than or equal\n//\n//go:linkname VcleqS8 VcleqS8\n//go:noescape\nfunc VcleqS8(r *arm.Uint8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Compare signed less than or equal\n//\n//go:linkname VcleqS16 VcleqS16\n//go:noescape\nfunc VcleqS16(r *arm.Uint16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Compare signed less than or equal\n//\n//go:linkname VcleqS32 VcleqS32\n//go:noescape\nfunc VcleqS32(r *arm.Uint32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Compare signed less than or equal\n//\n//go:linkname VcleqS64 
VcleqS64\n//go:noescape\nfunc VcleqS64(r *arm.Uint64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Compare unsigned less than or equal\n//\n//go:linkname VcleqU8 VcleqU8\n//go:noescape\nfunc VcleqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Compare unsigned less than or equal\n//\n//go:linkname VcleqU16 VcleqU16\n//go:noescape\nfunc VcleqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Compare unsigned less than or equal\n//\n//go:linkname VcleqU32 VcleqU32\n//go:noescape\nfunc VcleqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Compare unsigned less than or equal\n//\n//go:linkname VcleqU64 VcleqU64\n//go:noescape\nfunc VcleqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Floating-point compare less than or equal\n//\n//go:linkname VcleqF32 VcleqF32\n//go:noescape\nfunc VcleqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point compare less than or equal\n//\n//go:linkname VcleqF64 VcleqF64\n//go:noescape\nfunc VcleqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point compare less than or equal\n//\n//go:linkname VclesF32 VclesF32\n//go:noescape\nfunc VclesF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)\n\n// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezS8 VclezS8\n//go:noescape\nfunc VclezS8(r *arm.Uint8X8, v0 *arm.Int8X8)\n\n// Compare signed Less than or Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezS16 VclezS16\n//go:noescape\nfunc VclezS16(r *arm.Uint16X4, v0 *arm.Int16X4)\n\n// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezS32 VclezS32\n//go:noescape\nfunc VclezS32(r *arm.Uint32X2, v0 *arm.Int32X2)\n\n// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezS64 VclezS64\n//go:noescape\nfunc VclezS64(r *arm.Uint64X1, v0 *arm.Int64X1)\n\n// Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezF32 VclezF32\n//go:noescape\nfunc VclezF32(r *arm.Uint32X2, v0 *arm.Float32X2)\n\n// Floating-point Compare Less than or Equal to zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezF64 VclezF64\n//go:noescape\nfunc VclezF64(r *arm.Uint64X1, v0 *arm.Float64X1)\n\n// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezdS64 VclezdS64\n//go:noescape\nfunc VclezdS64(r *arm.Uint64, v0 *arm.Int64)\n\n// Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezdF64 VclezdF64\n//go:noescape\nfunc VclezdF64(r *arm.Uint64, v0 *arm.Float64)\n\n// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezqS8 VclezqS8\n//go:noescape\nfunc VclezqS8(r *arm.Uint8X16, v0 *arm.Int8X16)\n\n// Compare signed Less than or Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezqS16 VclezqS16\n//go:noescape\nfunc VclezqS16(r *arm.Uint16X8, v0 *arm.Int16X8)\n\n// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezqS32 VclezqS32\n//go:noescape\nfunc VclezqS32(r *arm.Uint32X4, v0 *arm.Int32X4)\n\n// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezqS64 VclezqS64\n//go:noescape\nfunc VclezqS64(r *arm.Uint64X2, v0 *arm.Int64X2)\n\n// Floating-point Compare Less than or Equal to zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezqF32 VclezqF32\n//go:noescape\nfunc VclezqF32(r *arm.Uint32X4, v0 *arm.Float32X4)\n\n// Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezqF64 VclezqF64\n//go:noescape\nfunc VclezqF64(r *arm.Uint64X2, v0 *arm.Float64X2)\n\n// Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezsF32 VclezsF32\n//go:noescape\nfunc VclezsF32(r *arm.Uint32, v0 *arm.Float32)\n\n// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsS8 VclsS8\n//go:noescape\nfunc VclsS8(r *arm.Int8X8, v0 *arm.Int8X8)\n\n// Count Leading Sign bits (vector). 
This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsS16 VclsS16\n//go:noescape\nfunc VclsS16(r *arm.Int16X4, v0 *arm.Int16X4)\n\n// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsS32 VclsS32\n//go:noescape\nfunc VclsS32(r *arm.Int32X2, v0 *arm.Int32X2)\n\n// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsU8 VclsU8\n//go:noescape\nfunc VclsU8(r *arm.Int8X8, v0 *arm.Uint8X8)\n\n// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsU16 VclsU16\n//go:noescape\nfunc VclsU16(r *arm.Int16X4, v0 *arm.Uint16X4)\n\n// Count Leading Sign bits (vector). 
This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsU32 VclsU32\n//go:noescape\nfunc VclsU32(r *arm.Int32X2, v0 *arm.Uint32X2)\n\n// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsqS8 VclsqS8\n//go:noescape\nfunc VclsqS8(r *arm.Int8X16, v0 *arm.Int8X16)\n\n// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsqS16 VclsqS16\n//go:noescape\nfunc VclsqS16(r *arm.Int16X8, v0 *arm.Int16X8)\n\n// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsqS32 VclsqS32\n//go:noescape\nfunc VclsqS32(r *arm.Int32X4, v0 *arm.Int32X4)\n\n// Count Leading Sign bits (vector). 
This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsqU8 VclsqU8\n//go:noescape\nfunc VclsqU8(r *arm.Int8X16, v0 *arm.Uint8X16)\n\n// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsqU16 VclsqU16\n//go:noescape\nfunc VclsqU16(r *arm.Int16X8, v0 *arm.Uint16X8)\n\n// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. 
The count does not include the most significant bit itself.\n//\n//go:linkname VclsqU32 VclsqU32\n//go:noescape\nfunc VclsqU32(r *arm.Int32X4, v0 *arm.Uint32X4)\n\n// Compare signed less than\n//\n//go:linkname VcltS8 VcltS8\n//go:noescape\nfunc VcltS8(r *arm.Uint8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Compare signed less than\n//\n//go:linkname VcltS16 VcltS16\n//go:noescape\nfunc VcltS16(r *arm.Uint16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Compare signed less than\n//\n//go:linkname VcltS32 VcltS32\n//go:noescape\nfunc VcltS32(r *arm.Uint32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Compare signed less than\n//\n//go:linkname VcltS64 VcltS64\n//go:noescape\nfunc VcltS64(r *arm.Uint64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)\n\n// Compare unsigned less than\n//\n//go:linkname VcltU8 VcltU8\n//go:noescape\nfunc VcltU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Compare unsigned less than\n//\n//go:linkname VcltU16 VcltU16\n//go:noescape\nfunc VcltU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Compare unsigned less than\n//\n//go:linkname VcltU32 VcltU32\n//go:noescape\nfunc VcltU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Compare unsigned less than\n//\n//go:linkname VcltU64 VcltU64\n//go:noescape\nfunc VcltU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)\n\n// Floating-point compare less than\n//\n//go:linkname VcltF32 VcltF32\n//go:noescape\nfunc VcltF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point compare less than\n//\n//go:linkname VcltF64 VcltF64\n//go:noescape\nfunc VcltF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Compare signed less than\n//\n//go:linkname VcltdS64 VcltdS64\n//go:noescape\nfunc VcltdS64(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64)\n\n// Compare unsigned less than\n//\n//go:linkname VcltdU64 VcltdU64\n//go:noescape\nfunc VcltdU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)\n\n// Floating-point compare less than\n//\n//go:linkname 
VcltdF64 VcltdF64\n//go:noescape\nfunc VcltdF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)\n\n// Compare signed less than\n//\n//go:linkname VcltqS8 VcltqS8\n//go:noescape\nfunc VcltqS8(r *arm.Uint8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Compare signed less than\n//\n//go:linkname VcltqS16 VcltqS16\n//go:noescape\nfunc VcltqS16(r *arm.Uint16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Compare signed less than\n//\n//go:linkname VcltqS32 VcltqS32\n//go:noescape\nfunc VcltqS32(r *arm.Uint32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Compare signed less than\n//\n//go:linkname VcltqS64 VcltqS64\n//go:noescape\nfunc VcltqS64(r *arm.Uint64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Compare unsigned less than\n//\n//go:linkname VcltqU8 VcltqU8\n//go:noescape\nfunc VcltqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Compare unsigned less than\n//\n//go:linkname VcltqU16 VcltqU16\n//go:noescape\nfunc VcltqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Compare unsigned less than\n//\n//go:linkname VcltqU32 VcltqU32\n//go:noescape\nfunc VcltqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Compare unsigned less than\n//\n//go:linkname VcltqU64 VcltqU64\n//go:noescape\nfunc VcltqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Floating-point compare less than\n//\n//go:linkname VcltqF32 VcltqF32\n//go:noescape\nfunc VcltqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point compare less than\n//\n//go:linkname VcltqF64 VcltqF64\n//go:noescape\nfunc VcltqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point compare less than\n//\n//go:linkname VcltsF32 VcltsF32\n//go:noescape\nfunc VcltsF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)\n\n// Compare signed Less than zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzS8 VcltzS8\n//go:noescape\nfunc VcltzS8(r *arm.Uint8X8, v0 *arm.Int8X8)\n\n// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzS16 VcltzS16\n//go:noescape\nfunc VcltzS16(r *arm.Uint16X4, v0 *arm.Int16X4)\n\n// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzS32 VcltzS32\n//go:noescape\nfunc VcltzS32(r *arm.Uint32X2, v0 *arm.Int32X2)\n\n// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzS64 VcltzS64\n//go:noescape\nfunc VcltzS64(r *arm.Uint64X1, v0 *arm.Int64X1)\n\n// Floating-point Compare Less than zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzF32 VcltzF32\n//go:noescape\nfunc VcltzF32(r *arm.Uint32X2, v0 *arm.Float32X2)\n\n// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzF64 VcltzF64\n//go:noescape\nfunc VcltzF64(r *arm.Uint64X1, v0 *arm.Float64X1)\n\n// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzdS64 VcltzdS64\n//go:noescape\nfunc VcltzdS64(r *arm.Uint64, v0 *arm.Int64)\n\n// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzdF64 VcltzdF64\n//go:noescape\nfunc VcltzdF64(r *arm.Uint64, v0 *arm.Float64)\n\n// Compare signed Less than zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzqS8 VcltzqS8\n//go:noescape\nfunc VcltzqS8(r *arm.Uint8X16, v0 *arm.Int8X16)\n\n// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzqS16 VcltzqS16\n//go:noescape\nfunc VcltzqS16(r *arm.Uint16X8, v0 *arm.Int16X8)\n\n// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzqS32 VcltzqS32\n//go:noescape\nfunc VcltzqS32(r *arm.Uint32X4, v0 *arm.Int32X4)\n\n// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzqS64 VcltzqS64\n//go:noescape\nfunc VcltzqS64(r *arm.Uint64X2, v0 *arm.Int64X2)\n\n// Floating-point Compare Less than zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzqF32 VcltzqF32\n//go:noescape\nfunc VcltzqF32(r *arm.Uint32X4, v0 *arm.Float32X4)\n\n// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzqF64 VcltzqF64\n//go:noescape\nfunc VcltzqF64(r *arm.Uint64X2, v0 *arm.Float64X2)\n\n// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzsF32 VcltzsF32\n//go:noescape\nfunc VcltzsF32(r *arm.Uint32, v0 *arm.Float32)\n\n// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzS8 VclzS8\n//go:noescape\nfunc VclzS8(r *arm.Int8X8, v0 *arm.Int8X8)\n\n// Count Leading Zero bits (vector). 
This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzS16 VclzS16\n//go:noescape\nfunc VclzS16(r *arm.Int16X4, v0 *arm.Int16X4)\n\n// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzS32 VclzS32\n//go:noescape\nfunc VclzS32(r *arm.Int32X2, v0 *arm.Int32X2)\n\n// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzU8 VclzU8\n//go:noescape\nfunc VclzU8(r *arm.Uint8X8, v0 *arm.Uint8X8)\n\n// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzU16 VclzU16\n//go:noescape\nfunc VclzU16(r *arm.Uint16X4, v0 *arm.Uint16X4)\n\n// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzU32 VclzU32\n//go:noescape\nfunc VclzU32(r *arm.Uint32X2, v0 *arm.Uint32X2)\n\n// Count Leading Zero bits (vector). 
This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzqS8 VclzqS8\n//go:noescape\nfunc VclzqS8(r *arm.Int8X16, v0 *arm.Int8X16)\n\n// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzqS16 VclzqS16\n//go:noescape\nfunc VclzqS16(r *arm.Int16X8, v0 *arm.Int16X8)\n\n// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzqS32 VclzqS32\n//go:noescape\nfunc VclzqS32(r *arm.Int32X4, v0 *arm.Int32X4)\n\n// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzqU8 VclzqU8\n//go:noescape\nfunc VclzqU8(r *arm.Uint8X16, v0 *arm.Uint8X16)\n\n// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzqU16 VclzqU16\n//go:noescape\nfunc VclzqU16(r *arm.Uint16X8, v0 *arm.Uint16X8)\n\n// Count Leading Zero bits (vector). 
This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzqU32 VclzqU32\n//go:noescape\nfunc VclzqU32(r *arm.Uint32X4, v0 *arm.Uint32X4)\n\n// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VcntS8 VcntS8\n//go:noescape\nfunc VcntS8(r *arm.Int8X8, v0 *arm.Int8X8)\n\n// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VcntU8 VcntU8\n//go:noescape\nfunc VcntU8(r *arm.Uint8X8, v0 *arm.Uint8X8)\n\n// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VcntP8 VcntP8\n//go:noescape\nfunc VcntP8(r *arm.Poly8X8, v0 *arm.Poly8X8)\n\n// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VcntqS8 VcntqS8\n//go:noescape\nfunc VcntqS8(r *arm.Int8X16, v0 *arm.Int8X16)\n\n// Population Count per byte. 
This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VcntqU8 VcntqU8\n//go:noescape\nfunc VcntqU8(r *arm.Uint8X16, v0 *arm.Uint8X16)\n\n// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VcntqP8 VcntqP8\n//go:noescape\nfunc VcntqP8(r *arm.Poly8X16, v0 *arm.Poly8X16)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineS8 VcombineS8\n//go:noescape\nfunc VcombineS8(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineS16 VcombineS16\n//go:noescape\nfunc VcombineS16(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineS32 VcombineS32\n//go:noescape\nfunc VcombineS32(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineS64 VcombineS64\n//go:noescape\nfunc VcombineS64(r *arm.Int64X2, v0 *arm.Int64X1, v1 *arm.Int64X1)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineU8 VcombineU8\n//go:noescape\nfunc VcombineU8(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineU16 VcombineU16\n//go:noescape\nfunc VcombineU16(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineU32 VcombineU32\n//go:noescape\nfunc VcombineU32(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Join two smaller vectors into a single 
larger vector\n//\n//go:linkname VcombineU64 VcombineU64\n//go:noescape\nfunc VcombineU64(r *arm.Uint64X2, v0 *arm.Uint64X1, v1 *arm.Uint64X1)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineF32 VcombineF32\n//go:noescape\nfunc VcombineF32(r *arm.Float32X4, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineF64 VcombineF64\n//go:noescape\nfunc VcombineF64(r *arm.Float64X2, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineP16 VcombineP16\n//go:noescape\nfunc VcombineP16(r *arm.Poly16X8, v0 *arm.Poly16X4, v1 *arm.Poly16X4)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineP64 VcombineP64\n//go:noescape\nfunc VcombineP64(r *arm.Poly64X2, v0 *arm.Poly64X1, v1 *arm.Poly64X1)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineP8 VcombineP8\n//go:noescape\nfunc VcombineP8(r *arm.Poly8X16, v0 *arm.Poly8X8, v1 *arm.Poly8X8)\n\n// Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtF32S32 VcvtF32S32\n//go:noescape\nfunc VcvtF32S32(r *arm.Float32X2, v0 *arm.Int32X2)\n\n// Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtF32U32 VcvtF32U32\n//go:noescape\nfunc VcvtF32U32(r *arm.Float32X2, v0 *arm.Uint32X2)\n\n// Floating-point Convert to lower precision Narrow (vector). 
This instruction reads each vector element in the SIMD&FP source register, converts each value to half the precision of the source element, and writes the resulting vector to the lower half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. The rounding mode is determined by the FPCR.\n//\n//go:linkname VcvtF32F64 VcvtF32F64\n//go:noescape\nfunc VcvtF32F64(r *arm.Float32X2, v0 *arm.Float64X2)\n\n// Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtF64S64 VcvtF64S64\n//go:noescape\nfunc VcvtF64S64(r *arm.Float64X1, v0 *arm.Int64X1)\n\n// Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtF64U64 VcvtF64U64\n//go:noescape\nfunc VcvtF64U64(r *arm.Float64X1, v0 *arm.Uint64X1)\n\n// Floating-point Convert to higher precision Long (vector). This instruction reads each element in a vector in the SIMD&FP source register, converts each value to double the precision of the source element using the rounding mode that is determined by the FPCR, and writes each result to the equivalent element of the vector in the SIMD&FP destination register.\n//\n//go:linkname VcvtF64F32 VcvtF64F32\n//go:noescape\nfunc VcvtF64F32(r *arm.Float64X2, v0 *arm.Float32X2)\n\n// Floating-point Convert to lower precision Narrow (vector). 
This instruction reads each vector element in the SIMD&FP source register v1, converts each value to half the precision of the source element, and writes the resulting vector to the upper half of the destination SIMD&FP register; the lower half holds the elements of v0. The destination vector elements are half as long as the source vector elements. The rounding mode is determined by the FPCR.\n//\n//go:linkname VcvtHighF32F64 VcvtHighF32F64\n//go:noescape\nfunc VcvtHighF32F64(r *arm.Float32X4, v0 *arm.Float32X2, v1 *arm.Float64X2)\n\n// Floating-point Convert to higher precision Long (vector). This instruction reads each element in the upper half of the vector in the SIMD&FP source register, converts each value to double the precision of the source element using the rounding mode that is determined by the FPCR, and writes each result to the equivalent element of the vector in the SIMD&FP destination register.\n//\n//go:linkname VcvtHighF64F32 VcvtHighF64F32\n//go:noescape\nfunc VcvtHighF64F32(r *arm.Float64X2, v0 *arm.Float32X4)\n\n// Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtS32F32 VcvtS32F32\n//go:noescape\nfunc VcvtS32F32(r *arm.Int32X2, v0 *arm.Float32X2)\n\n// Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtS64F64 VcvtS64F64\n//go:noescape\nfunc VcvtS64F64(r *arm.Int64X1, v0 *arm.Float64X1)\n\n// Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). 
This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtU32F32 VcvtU32F32\n//go:noescape\nfunc VcvtU32F32(r *arm.Uint32X2, v0 *arm.Float32X2)\n\n// Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtU64F64 VcvtU64F64\n//go:noescape\nfunc VcvtU64F64(r *arm.Uint64X1, v0 *arm.Float64X1)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtaS32F32 VcvtaS32F32\n//go:noescape\nfunc VcvtaS32F32(r *arm.Int32X2, v0 *arm.Float32X2)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtaS64F64 VcvtaS64F64\n//go:noescape\nfunc VcvtaS64F64(r *arm.Int64X1, v0 *arm.Float64X1)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). 
This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtaU32F32 VcvtaU32F32\n//go:noescape\nfunc VcvtaU32F32(r *arm.Uint32X2, v0 *arm.Float32X2)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtaU64F64 VcvtaU64F64\n//go:noescape\nfunc VcvtaU64F64(r *arm.Uint64X1, v0 *arm.Float64X1)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtadS64F64 VcvtadS64F64\n//go:noescape\nfunc VcvtadS64F64(r *arm.Int64, v0 *arm.Float64)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtadU64F64 VcvtadU64F64\n//go:noescape\nfunc VcvtadU64F64(r *arm.Uint64, v0 *arm.Float64)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). 
This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtaqS32F32 VcvtaqS32F32\n//go:noescape\nfunc VcvtaqS32F32(r *arm.Int32X4, v0 *arm.Float32X4)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtaqS64F64 VcvtaqS64F64\n//go:noescape\nfunc VcvtaqS64F64(r *arm.Int64X2, v0 *arm.Float64X2)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtaqU32F32 VcvtaqU32F32\n//go:noescape\nfunc VcvtaqU32F32(r *arm.Uint32X4, v0 *arm.Float32X4)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtaqU64F64 VcvtaqU64F64\n//go:noescape\nfunc VcvtaqU64F64(r *arm.Uint64X2, v0 *arm.Float64X2)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). 
This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtasS32F32 VcvtasS32F32\n//go:noescape\nfunc VcvtasS32F32(r *arm.Int32, v0 *arm.Float32)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtasU32F32 VcvtasU32F32\n//go:noescape\nfunc VcvtasU32F32(r *arm.Uint32, v0 *arm.Float32)\n\n// Signed integer Convert to Floating-point (vector). This instruction converts a scalar or each element in a vector from a signed integer value to a floating-point value using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtdF64S64 VcvtdF64S64\n//go:noescape\nfunc VcvtdF64S64(r *arm.Float64, v0 *arm.Int64)\n\n// Unsigned integer Convert to Floating-point (vector). This instruction converts a scalar or each element in a vector from an unsigned integer value to a floating-point value using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtdF64U64 VcvtdF64U64\n//go:noescape\nfunc VcvtdF64U64(r *arm.Float64, v0 *arm.Uint64)\n\n// Floating-point Convert to Signed integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtdS64F64 VcvtdS64F64\n//go:noescape\nfunc VcvtdS64F64(r *arm.Int64, v0 *arm.Float64)\n\n// Floating-point Convert to Unsigned integer, rounding toward Zero (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtdU64F64 VcvtdU64F64\n//go:noescape\nfunc VcvtdU64F64(r *arm.Uint64, v0 *arm.Float64)\n\n// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmS32F32 VcvtmS32F32\n//go:noescape\nfunc VcvtmS32F32(r *arm.Int32X2, v0 *arm.Float32X2)\n\n// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmS64F64 VcvtmS64F64\n//go:noescape\nfunc VcvtmS64F64(r *arm.Int64X1, v0 *arm.Float64X1)\n\n// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmU32F32 VcvtmU32F32\n//go:noescape\nfunc VcvtmU32F32(r *arm.Uint32X2, v0 *arm.Float32X2)\n\n// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmU64F64 VcvtmU64F64\n//go:noescape\nfunc VcvtmU64F64(r *arm.Uint64X1, v0 *arm.Float64X1)\n\n// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmdS64F64 VcvtmdS64F64\n//go:noescape\nfunc VcvtmdS64F64(r *arm.Int64, v0 *arm.Float64)\n\n// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmdU64F64 VcvtmdU64F64\n//go:noescape\nfunc VcvtmdU64F64(r *arm.Uint64, v0 *arm.Float64)\n\n// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmqS32F32 VcvtmqS32F32\n//go:noescape\nfunc VcvtmqS32F32(r *arm.Int32X4, v0 *arm.Float32X4)\n\n// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmqS64F64 VcvtmqS64F64\n//go:noescape\nfunc VcvtmqS64F64(r *arm.Int64X2, v0 *arm.Float64X2)\n\n// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmqU32F32 VcvtmqU32F32\n//go:noescape\nfunc VcvtmqU32F32(r *arm.Uint32X4, v0 *arm.Float32X4)\n\n// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmqU64F64 VcvtmqU64F64\n//go:noescape\nfunc VcvtmqU64F64(r *arm.Uint64X2, v0 *arm.Float64X2)\n\n// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmsS32F32 VcvtmsS32F32\n//go:noescape\nfunc VcvtmsS32F32(r *arm.Int32, v0 *arm.Float32)\n\n// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmsU32F32 VcvtmsU32F32\n//go:noescape\nfunc VcvtmsU32F32(r *arm.Uint32, v0 *arm.Float32)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtnS32F32 VcvtnS32F32\n//go:noescape\nfunc VcvtnS32F32(r *arm.Int32X2, v0 *arm.Float32X2)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtnS64F64 VcvtnS64F64\n//go:noescape\nfunc VcvtnS64F64(r *arm.Int64X1, v0 *arm.Float64X1)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtnU32F32 VcvtnU32F32\n//go:noescape\nfunc VcvtnU32F32(r *arm.Uint32X2, v0 *arm.Float32X2)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtnU64F64 VcvtnU64F64\n//go:noescape\nfunc VcvtnU64F64(r *arm.Uint64X1, v0 *arm.Float64X1)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtndS64F64 VcvtndS64F64\n//go:noescape\nfunc VcvtndS64F64(r *arm.Int64, v0 *arm.Float64)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtndU64F64 VcvtndU64F64\n//go:noescape\nfunc VcvtndU64F64(r *arm.Uint64, v0 *arm.Float64)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtnqS32F32 VcvtnqS32F32\n//go:noescape\nfunc VcvtnqS32F32(r *arm.Int32X4, v0 *arm.Float32X4)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtnqS64F64 VcvtnqS64F64\n//go:noescape\nfunc VcvtnqS64F64(r *arm.Int64X2, v0 *arm.Float64X2)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtnqU32F32 VcvtnqU32F32\n//go:noescape\nfunc VcvtnqU32F32(r *arm.Uint32X4, v0 *arm.Float32X4)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtnqU64F64 VcvtnqU64F64\n//go:noescape\nfunc VcvtnqU64F64(r *arm.Uint64X2, v0 *arm.Float64X2)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtnsS32F32 VcvtnsS32F32\n//go:noescape\nfunc VcvtnsS32F32(r *arm.Int32, v0 *arm.Float32)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtnsU32F32 VcvtnsU32F32\n//go:noescape\nfunc VcvtnsU32F32(r *arm.Uint32, v0 *arm.Float32)\n\n// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpS32F32 VcvtpS32F32\n//go:noescape\nfunc VcvtpS32F32(r *arm.Int32X2, v0 *arm.Float32X2)\n\n// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpS64F64 VcvtpS64F64\n//go:noescape\nfunc VcvtpS64F64(r *arm.Int64X1, v0 *arm.Float64X1)\n\n// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpU32F32 VcvtpU32F32\n//go:noescape\nfunc VcvtpU32F32(r *arm.Uint32X2, v0 *arm.Float32X2)\n\n// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpU64F64 VcvtpU64F64\n//go:noescape\nfunc VcvtpU64F64(r *arm.Uint64X1, v0 *arm.Float64X1)\n\n// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpdS64F64 VcvtpdS64F64\n//go:noescape\nfunc VcvtpdS64F64(r *arm.Int64, v0 *arm.Float64)\n\n// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpdU64F64 VcvtpdU64F64\n//go:noescape\nfunc VcvtpdU64F64(r *arm.Uint64, v0 *arm.Float64)\n\n// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpqS32F32 VcvtpqS32F32\n//go:noescape\nfunc VcvtpqS32F32(r *arm.Int32X4, v0 *arm.Float32X4)\n\n// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpqS64F64 VcvtpqS64F64\n//go:noescape\nfunc VcvtpqS64F64(r *arm.Int64X2, v0 *arm.Float64X2)\n\n// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpqU32F32 VcvtpqU32F32\n//go:noescape\nfunc VcvtpqU32F32(r *arm.Uint32X4, v0 *arm.Float32X4)\n\n// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpqU64F64 VcvtpqU64F64\n//go:noescape\nfunc VcvtpqU64F64(r *arm.Uint64X2, v0 *arm.Float64X2)\n\n// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpsS32F32 VcvtpsS32F32\n//go:noescape\nfunc VcvtpsS32F32(r *arm.Int32, v0 *arm.Float32)\n\n// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpsU32F32 VcvtpsU32F32\n//go:noescape\nfunc VcvtpsU32F32(r *arm.Uint32, v0 *arm.Float32)\n\n// Signed integer Convert to Floating-point (vector). This instruction converts a scalar or each element in a vector from a signed integer value to a floating-point value using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtqF32S32 VcvtqF32S32\n//go:noescape\nfunc VcvtqF32S32(r *arm.Float32X4, v0 *arm.Int32X4)\n\n// Unsigned integer Convert to Floating-point (vector). This instruction converts a scalar or each element in a vector from an unsigned integer value to a floating-point value using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtqF32U32 VcvtqF32U32\n//go:noescape\nfunc VcvtqF32U32(r *arm.Float32X4, v0 *arm.Uint32X4)\n\n// Signed integer Convert to Floating-point (vector). This instruction converts a scalar or each element in a vector from a signed integer value to a floating-point value using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtqF64S64 VcvtqF64S64\n//go:noescape\nfunc VcvtqF64S64(r *arm.Float64X2, v0 *arm.Int64X2)\n\n// Unsigned integer Convert to Floating-point (vector). This instruction converts a scalar or each element in a vector from an unsigned integer value to a floating-point value using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtqF64U64 VcvtqF64U64\n//go:noescape\nfunc VcvtqF64U64(r *arm.Float64X2, v0 *arm.Uint64X2)\n\n// Floating-point Convert to Signed integer, rounding toward Zero (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtqS32F32 VcvtqS32F32\n//go:noescape\nfunc VcvtqS32F32(r *arm.Int32X4, v0 *arm.Float32X4)\n\n// Floating-point Convert to Signed integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtqS64F64 VcvtqS64F64\n//go:noescape\nfunc VcvtqS64F64(r *arm.Int64X2, v0 *arm.Float64X2)\n\n// Floating-point Convert to Unsigned integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtqU32F32 VcvtqU32F32\n//go:noescape\nfunc VcvtqU32F32(r *arm.Uint32X4, v0 *arm.Float32X4)\n\n// Floating-point Convert to Unsigned integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtqU64F64 VcvtqU64F64\n//go:noescape\nfunc VcvtqU64F64(r *arm.Uint64X2, v0 *arm.Float64X2)\n\n// Signed integer Convert to Floating-point (vector). 
This instruction converts a scalar or each element in a vector from a signed integer value to a floating-point value using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtsF32S32 VcvtsF32S32\n//go:noescape\nfunc VcvtsF32S32(r *arm.Float32, v0 *arm.Int32)\n\n// Unsigned integer Convert to Floating-point (vector). This instruction converts a scalar or each element in a vector from an unsigned integer value to a floating-point value using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtsF32U32 VcvtsF32U32\n//go:noescape\nfunc VcvtsF32U32(r *arm.Float32, v0 *arm.Uint32)\n\n// Floating-point Convert to Signed integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtsS32F32 VcvtsS32F32\n//go:noescape\nfunc VcvtsS32F32(r *arm.Int32, v0 *arm.Float32)\n\n// Floating-point Convert to Unsigned integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtsU32F32 VcvtsU32F32\n//go:noescape\nfunc VcvtsU32F32(r *arm.Uint32, v0 *arm.Float32)\n\n// Floating-point Convert to lower precision Narrow, rounding to odd (vector). 
This instruction reads each vector element in the source SIMD&FP register, narrows each value to half the precision of the source element using the Round to Odd rounding mode, writes the result to a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VcvtxF32F64 VcvtxF32F64\n//go:noescape\nfunc VcvtxF32F64(r *arm.Float32X2, v0 *arm.Float64X2)\n\n// Floating-point Convert to lower precision Narrow, rounding to odd (vector). This instruction reads each vector element in the source SIMD&FP register, narrows each value to half the precision of the source element using the Round to Odd rounding mode, writes the result to a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VcvtxHighF32F64 VcvtxHighF32F64\n//go:noescape\nfunc VcvtxHighF32F64(r *arm.Float32X4, v0 *arm.Float32X2, v1 *arm.Float64X2)\n\n// Floating-point Convert to lower precision Narrow, rounding to odd (vector). This instruction reads each vector element in the source SIMD&FP register, narrows each value to half the precision of the source element using the Round to Odd rounding mode, writes the result to a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VcvtxdF32F64 VcvtxdF32F64\n//go:noescape\nfunc VcvtxdF32F64(r *arm.Float32, v0 *arm.Float64)\n\n// Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VdivF32 VdivF32\n//go:noescape\nfunc VdivF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Divide (vector). 
This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VdivF64 VdivF64\n//go:noescape\nfunc VdivF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VdivqF32 VdivqF32\n//go:noescape\nfunc VdivqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VdivqF64 VdivqF64\n//go:noescape\nfunc VdivqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Dot Product signed arithmetic (vector). This instruction performs the dot product of the four signed 8-bit elements in each 32-bit element of the first source register with the four signed 8-bit elements of the corresponding 32-bit element in the second source register, accumulating the result into the corresponding 32-bit element of the destination register.\n//\n//go:linkname VdotS32 VdotS32\n//go:noescape\nfunc VdotS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int8X8, v2 *arm.Int8X8)\n\n// Dot Product unsigned arithmetic (vector). 
This instruction performs the dot product of the four unsigned 8-bit elements in each 32-bit element of the first source register with the four unsigned 8-bit elements of the corresponding 32-bit element in the second source register, accumulating the result into the corresponding 32-bit element of the destination register.\n//\n//go:linkname VdotU32 VdotU32\n//go:noescape\nfunc VdotU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint8X8, v2 *arm.Uint8X8)\n\n// Dot Product signed arithmetic (vector). This instruction performs the dot product of the four signed 8-bit elements in each 32-bit element of the first source register with the four signed 8-bit elements of the corresponding 32-bit element in the second source register, accumulating the result into the corresponding 32-bit element of the destination register.\n//\n//go:linkname VdotqS32 VdotqS32\n//go:noescape\nfunc VdotqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int8X16, v2 *arm.Int8X16)\n\n// Dot Product unsigned arithmetic (vector). This instruction performs the dot product of the four unsigned 8-bit elements in each 32-bit element of the first source register with the four unsigned 8-bit elements of the corresponding 32-bit element in the second source register, accumulating the result into the corresponding 32-bit element of the destination register.\n//\n//go:linkname VdotqU32 VdotqU32\n//go:noescape\nfunc VdotqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint8X16, v2 *arm.Uint8X16)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupNS8 VdupNS8\n//go:noescape\nfunc VdupNS8(r *arm.Int8X8, v0 *arm.Int8)\n\n// Duplicate general-purpose register to vector. 
This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupNS16 VdupNS16\n//go:noescape\nfunc VdupNS16(r *arm.Int16X4, v0 *arm.Int16)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupNS32 VdupNS32\n//go:noescape\nfunc VdupNS32(r *arm.Int32X2, v0 *arm.Int32)\n\n// Insert vector element from general-purpose register. This instruction copies the contents of the source general-purpose register to the specified vector element in the destination SIMD&FP register.\n//\n//go:linkname VdupNS64 VdupNS64\n//go:noescape\nfunc VdupNS64(r *arm.Int64X1, v0 *arm.Int64)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupNU8 VdupNU8\n//go:noescape\nfunc VdupNU8(r *arm.Uint8X8, v0 *arm.Uint8)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupNU16 VdupNU16\n//go:noescape\nfunc VdupNU16(r *arm.Uint16X4, v0 *arm.Uint16)\n\n// Duplicate general-purpose register to vector. 
This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupNU32 VdupNU32\n//go:noescape\nfunc VdupNU32(r *arm.Uint32X2, v0 *arm.Uint32)\n\n// Insert vector element from general-purpose register. This instruction copies the contents of the source general-purpose register to the specified vector element in the destination SIMD&FP register.\n//\n//go:linkname VdupNU64 VdupNU64\n//go:noescape\nfunc VdupNU64(r *arm.Uint64X1, v0 *arm.Uint64)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupNF32 VdupNF32\n//go:noescape\nfunc VdupNF32(r *arm.Float32X2, v0 *arm.Float32)\n\n// Insert vector element from general-purpose register. This instruction copies the contents of the source general-purpose register to the specified vector element in the destination SIMD&FP register.\n//\n//go:linkname VdupNF64 VdupNF64\n//go:noescape\nfunc VdupNF64(r *arm.Float64X1, v0 *arm.Float64)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupNP16 VdupNP16\n//go:noescape\nfunc VdupNP16(r *arm.Poly16X4, v0 *arm.Poly16)\n\n// Insert vector element from general-purpose register. This instruction copies the contents of the source general-purpose register to the specified vector element in the destination SIMD&FP register.\n//\n//go:linkname VdupNP64 VdupNP64\n//go:noescape\nfunc VdupNP64(r *arm.Poly64X1, v0 *arm.Poly64)\n\n// Duplicate general-purpose register to vector. 
This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupNP8 VdupNP8\n//go:noescape\nfunc VdupNP8(r *arm.Poly8X8, v0 *arm.Poly8)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNS8 VdupqNS8\n//go:noescape\nfunc VdupqNS8(r *arm.Int8X16, v0 *arm.Int8)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNS16 VdupqNS16\n//go:noescape\nfunc VdupqNS16(r *arm.Int16X8, v0 *arm.Int16)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNS32 VdupqNS32\n//go:noescape\nfunc VdupqNS32(r *arm.Int32X4, v0 *arm.Int32)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNS64 VdupqNS64\n//go:noescape\nfunc VdupqNS64(r *arm.Int64X2, v0 *arm.Int64)\n\n// Duplicate general-purpose register to vector. 
This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNU8 VdupqNU8\n//go:noescape\nfunc VdupqNU8(r *arm.Uint8X16, v0 *arm.Uint8)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNU16 VdupqNU16\n//go:noescape\nfunc VdupqNU16(r *arm.Uint16X8, v0 *arm.Uint16)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNU32 VdupqNU32\n//go:noescape\nfunc VdupqNU32(r *arm.Uint32X4, v0 *arm.Uint32)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNU64 VdupqNU64\n//go:noescape\nfunc VdupqNU64(r *arm.Uint64X2, v0 *arm.Uint64)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNF32 VdupqNF32\n//go:noescape\nfunc VdupqNF32(r *arm.Float32X4, v0 *arm.Float32)\n\n// Duplicate general-purpose register to vector. 
This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNF64 VdupqNF64\n//go:noescape\nfunc VdupqNF64(r *arm.Float64X2, v0 *arm.Float64)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNP16 VdupqNP16\n//go:noescape\nfunc VdupqNP16(r *arm.Poly16X8, v0 *arm.Poly16)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNP64 VdupqNP64\n//go:noescape\nfunc VdupqNP64(r *arm.Poly64X2, v0 *arm.Poly64)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNP8 VdupqNP8\n//go:noescape\nfunc VdupqNP8(r *arm.Poly8X16, v0 *arm.Poly8)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorS8 VeorS8\n//go:noescape\nfunc VeorS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Bitwise Exclusive OR (vector). 
This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorS16 VeorS16\n//go:noescape\nfunc VeorS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorS32 VeorS32\n//go:noescape\nfunc VeorS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorS64 VeorS64\n//go:noescape\nfunc VeorS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorU8 VeorU8\n//go:noescape\nfunc VeorU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorU16 VeorU16\n//go:noescape\nfunc VeorU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorU32 VeorU32\n//go:noescape\nfunc VeorU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Bitwise Exclusive OR (vector). 
This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorU64 VeorU64\n//go:noescape\nfunc VeorU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)\n\n// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname Veor3QS8 Veor3QS8\n//go:noescape\nfunc Veor3QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16, v2 *arm.Int8X16)\n\n// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname Veor3QS16 Veor3QS16\n//go:noescape\nfunc Veor3QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8)\n\n// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname Veor3QS32 Veor3QS32\n//go:noescape\nfunc Veor3QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32X4)\n\n// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname Veor3QS64 Veor3QS64\n//go:noescape\nfunc Veor3QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2, v2 *arm.Int64X2)\n\n// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname Veor3QU8 Veor3QU8\n//go:noescape\nfunc Veor3QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16)\n\n// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP 
register.\n//\n//go:linkname Veor3QU16 Veor3QU16\n//go:noescape\nfunc Veor3QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)\n\n// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname Veor3QU32 Veor3QU32\n//go:noescape\nfunc Veor3QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)\n\n// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname Veor3QU64 Veor3QU64\n//go:noescape\nfunc Veor3QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorqS8 VeorqS8\n//go:noescape\nfunc VeorqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorqS16 VeorqS16\n//go:noescape\nfunc VeorqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorqS32 VeorqS32\n//go:noescape\nfunc VeorqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Bitwise Exclusive OR (vector). 
This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorqS64 VeorqS64\n//go:noescape\nfunc VeorqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorqU8 VeorqU8\n//go:noescape\nfunc VeorqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorqU16 VeorqU16\n//go:noescape\nfunc VeorqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorqU32 VeorqU32\n//go:noescape\nfunc VeorqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorqU64 VeorqU64\n//go:noescape\nfunc VeorqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Floating-point fused Multiply-Add to accumulator (vector). 
This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VfmaF32 VfmaF32\n//go:noescape\nfunc VfmaF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32X2)\n\n// Floating-point fused Multiply-Add (scalar). This instruction multiplies the values of the first two SIMD&FP source registers, adds the product to the value of the third SIMD&FP source register, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VfmaF64 VfmaF64\n//go:noescape\nfunc VfmaF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1, v2 *arm.Float64X1)\n\n// Floating-point fused Multiply-Add to accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VfmaNF32 VfmaNF32\n//go:noescape\nfunc VfmaNF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32)\n\n// Floating-point fused Multiply-Add (scalar). This instruction multiplies the values of the first two SIMD&FP source registers, adds the product to the value of the third SIMD&FP source register, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VfmaNF64 VfmaNF64\n//go:noescape\nfunc VfmaNF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1, v2 *arm.Float64)\n\n// Floating-point fused Multiply-Add to accumulator (vector). 
This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VfmaqF32 VfmaqF32\n//go:noescape\nfunc VfmaqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32X4)\n\n// Floating-point fused Multiply-Add to accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VfmaqF64 VfmaqF64\n//go:noescape\nfunc VfmaqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2, v2 *arm.Float64X2)\n\n// Floating-point fused Multiply-Add to accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VfmaqNF32 VfmaqNF32\n//go:noescape\nfunc VfmaqNF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32)\n\n// Floating-point fused Multiply-Add to accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VfmaqNF64 VfmaqNF64\n//go:noescape\nfunc VfmaqNF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2, v2 *arm.Float64)\n\n// Floating-point fused Multiply-Subtract from accumulator (vector). 
This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, negates the product, adds the result to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VfmsF32 VfmsF32\n//go:noescape\nfunc VfmsF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32X2)\n\n// Floating-point Fused Multiply-Subtract (scalar). This instruction multiplies the values of the first two SIMD&FP source registers, negates the product, adds that to the value of the third SIMD&FP source register, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VfmsF64 VfmsF64\n//go:noescape\nfunc VfmsF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1, v2 *arm.Float64X1)\n\n// Floating-point fused Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, negates the product, adds the result to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VfmsNF32 VfmsNF32\n//go:noescape\nfunc VfmsNF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32)\n\n// Floating-point Fused Multiply-Subtract (scalar). This instruction multiplies the values of the first two SIMD&FP source registers, negates the product, adds that to the value of the third SIMD&FP source register, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VfmsNF64 VfmsNF64\n//go:noescape\nfunc VfmsNF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1, v2 *arm.Float64)\n\n// Floating-point fused Multiply-Subtract from accumulator (vector). 
This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, negates the product, adds the result to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VfmsqF32 VfmsqF32\n//go:noescape\nfunc VfmsqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32X4)\n\n// Floating-point fused Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, negates the product, adds the result to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VfmsqF64 VfmsqF64\n//go:noescape\nfunc VfmsqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2, v2 *arm.Float64X2)\n\n// Floating-point fused Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, negates the product, adds the result to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VfmsqNF32 VfmsqNF32\n//go:noescape\nfunc VfmsqNF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32)\n\n// Floating-point fused Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, negates the product, adds the result to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VfmsqNF64 VfmsqNF64\n//go:noescape\nfunc VfmsqNF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2, v2 *arm.Float64)\n\n// Duplicate vector element to vector or scalar. 
This instruction copies the upper 64-bit half (element D[1]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the high half of the vector.\n//\n//go:linkname VgetHighS8 VgetHighS8\n//go:noescape\nfunc VgetHighS8(r *arm.Int8X8, v0 *arm.Int8X16)\n\n// Duplicate vector element to vector or scalar. This instruction copies the upper 64-bit half (element D[1]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the high half of the vector.\n//\n//go:linkname VgetHighS16 VgetHighS16\n//go:noescape\nfunc VgetHighS16(r *arm.Int16X4, v0 *arm.Int16X8)\n\n// Duplicate vector element to vector or scalar. This instruction copies the upper 64-bit half (element D[1]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the high half of the vector.\n//\n//go:linkname VgetHighS32 VgetHighS32\n//go:noescape\nfunc VgetHighS32(r *arm.Int32X2, v0 *arm.Int32X4)\n\n// Duplicate vector element to vector or scalar. This instruction copies the upper 64-bit half (element D[1]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the high half of the vector.\n//\n//go:linkname VgetHighS64 VgetHighS64\n//go:noescape\nfunc VgetHighS64(r *arm.Int64X1, v0 *arm.Int64X2)\n\n// Duplicate vector element to vector or scalar. This instruction copies the upper 64-bit half (element D[1]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the high half of the vector.\n//\n//go:linkname VgetHighU8 VgetHighU8\n//go:noescape\nfunc VgetHighU8(r *arm.Uint8X8, v0 *arm.Uint8X16)\n\n// Duplicate vector element to vector or scalar. 
This instruction copies the upper 64-bit half (element D[1]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the high half of the vector.\n//\n//go:linkname VgetHighU16 VgetHighU16\n//go:noescape\nfunc VgetHighU16(r *arm.Uint16X4, v0 *arm.Uint16X8)\n\n// Duplicate vector element to vector or scalar. This instruction copies the upper 64-bit half (element D[1]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the high half of the vector.\n//\n//go:linkname VgetHighU32 VgetHighU32\n//go:noescape\nfunc VgetHighU32(r *arm.Uint32X2, v0 *arm.Uint32X4)\n\n// Duplicate vector element to vector or scalar. This instruction copies the upper 64-bit half (element D[1]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the high half of the vector.\n//\n//go:linkname VgetHighU64 VgetHighU64\n//go:noescape\nfunc VgetHighU64(r *arm.Uint64X1, v0 *arm.Uint64X2)\n\n// Duplicate vector element to vector or scalar. This instruction copies the upper 64-bit half (element D[1]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the high half of the vector.\n//\n//go:linkname VgetHighF32 VgetHighF32\n//go:noescape\nfunc VgetHighF32(r *arm.Float32X2, v0 *arm.Float32X4)\n\n// Duplicate vector element to vector or scalar. This instruction copies the upper 64-bit half (element D[1]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the high half of the vector.\n//\n//go:linkname VgetHighF64 VgetHighF64\n//go:noescape\nfunc VgetHighF64(r *arm.Float64X1, v0 *arm.Float64X2)\n\n// Duplicate vector element to vector or scalar. 
This instruction copies the upper 64-bit half (element D[1]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the high half of the vector.\n//\n//go:linkname VgetHighP16 VgetHighP16\n//go:noescape\nfunc VgetHighP16(r *arm.Poly16X4, v0 *arm.Poly16X8)\n\n// Duplicate vector element to vector or scalar. This instruction copies the upper 64-bit half (element D[1]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the high half of the vector.\n//\n//go:linkname VgetHighP64 VgetHighP64\n//go:noescape\nfunc VgetHighP64(r *arm.Poly64X1, v0 *arm.Poly64X2)\n\n// Duplicate vector element to vector or scalar. This instruction copies the upper 64-bit half (element D[1]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the high half of the vector.\n//\n//go:linkname VgetHighP8 VgetHighP8\n//go:noescape\nfunc VgetHighP8(r *arm.Poly8X8, v0 *arm.Poly8X16)\n\n// Duplicate vector element to vector or scalar. This instruction copies the lower 64-bit half (element D[0]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the low half of the vector.\n//\n//go:linkname VgetLowS8 VgetLowS8\n//go:noescape\nfunc VgetLowS8(r *arm.Int8X8, v0 *arm.Int8X16)\n\n// Duplicate vector element to vector or scalar. This instruction copies the lower 64-bit half (element D[0]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the low half of the vector.\n//\n//go:linkname VgetLowS16 VgetLowS16\n//go:noescape\nfunc VgetLowS16(r *arm.Int16X4, v0 *arm.Int16X8)\n\n// Duplicate vector element to vector or scalar. 
This instruction copies the lower 64-bit half (element D[0]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the low half of the vector.\n//\n//go:linkname VgetLowS32 VgetLowS32\n//go:noescape\nfunc VgetLowS32(r *arm.Int32X2, v0 *arm.Int32X4)\n\n// Duplicate vector element to vector or scalar. This instruction copies the lower 64-bit half (element D[0]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the low half of the vector.\n//\n//go:linkname VgetLowS64 VgetLowS64\n//go:noescape\nfunc VgetLowS64(r *arm.Int64X1, v0 *arm.Int64X2)\n\n// Duplicate vector element to vector or scalar. This instruction copies the lower 64-bit half (element D[0]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the low half of the vector.\n//\n//go:linkname VgetLowU8 VgetLowU8\n//go:noescape\nfunc VgetLowU8(r *arm.Uint8X8, v0 *arm.Uint8X16)\n\n// Duplicate vector element to vector or scalar. This instruction copies the lower 64-bit half (element D[0]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the low half of the vector.\n//\n//go:linkname VgetLowU16 VgetLowU16\n//go:noescape\nfunc VgetLowU16(r *arm.Uint16X4, v0 *arm.Uint16X8)\n\n// Duplicate vector element to vector or scalar. This instruction copies the lower 64-bit half (element D[0]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the low half of the vector.\n//\n//go:linkname VgetLowU32 VgetLowU32\n//go:noescape\nfunc VgetLowU32(r *arm.Uint32X2, v0 *arm.Uint32X4)\n\n// Duplicate vector element to vector or scalar. 
This instruction copies the lower 64-bit half (element D[0]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the low half of the vector.\n//\n//go:linkname VgetLowU64 VgetLowU64\n//go:noescape\nfunc VgetLowU64(r *arm.Uint64X1, v0 *arm.Uint64X2)\n\n// Duplicate vector element to vector or scalar. This instruction copies the lower 64-bit half (element D[0]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the low half of the vector.\n//\n//go:linkname VgetLowF32 VgetLowF32\n//go:noescape\nfunc VgetLowF32(r *arm.Float32X2, v0 *arm.Float32X4)\n\n// Duplicate vector element to vector or scalar. This instruction copies the lower 64-bit half (element D[0]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the low half of the vector.\n//\n//go:linkname VgetLowF64 VgetLowF64\n//go:noescape\nfunc VgetLowF64(r *arm.Float64X1, v0 *arm.Float64X2)\n\n// Duplicate vector element to vector or scalar. This instruction copies the lower 64-bit half (element D[0]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the low half of the vector.\n//\n//go:linkname VgetLowP16 VgetLowP16\n//go:noescape\nfunc VgetLowP16(r *arm.Poly16X4, v0 *arm.Poly16X8)\n\n// Duplicate vector element to vector or scalar. This instruction copies the lower 64-bit half (element D[0]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the low half of the vector.\n//\n//go:linkname VgetLowP64 VgetLowP64\n//go:noescape\nfunc VgetLowP64(r *arm.Poly64X1, v0 *arm.Poly64X2)\n\n// Duplicate vector element to vector or scalar. 
This instruction copies the lower 64-bit half (element D[0]) of the 128-bit source SIMD&FP register into the 64-bit destination SIMD&FP register, returning the low half of the vector.\n//\n//go:linkname VgetLowP8 VgetLowP8\n//go:noescape\nfunc VgetLowP8(r *arm.Poly8X8, v0 *arm.Poly8X16)\n\n// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddS8 VhaddS8\n//go:noescape\nfunc VhaddS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddS16 VhaddS16\n//go:noescape\nfunc VhaddS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddS32 VhaddS32\n//go:noescape\nfunc VhaddS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddU8 VhaddU8\n//go:noescape\nfunc VhaddU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Unsigned Halving Add. 
This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddU16 VhaddU16\n//go:noescape\nfunc VhaddU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddU32 VhaddU32\n//go:noescape\nfunc VhaddU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddqS8 VhaddqS8\n//go:noescape\nfunc VhaddqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddqS16 VhaddqS16\n//go:noescape\nfunc VhaddqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddqS32 VhaddqS32\n//go:noescape\nfunc VhaddqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Unsigned Halving Add. 
This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddqU8 VhaddqU8\n//go:noescape\nfunc VhaddqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddqU16 VhaddqU16\n//go:noescape\nfunc VhaddqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddqU32 VhaddqU32\n//go:noescape\nfunc VhaddqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubS8 VhsubS8\n//go:noescape\nfunc VhsubS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Signed Halving Subtract. 
This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubS16 VhsubS16\n//go:noescape\nfunc VhsubS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubS32 VhsubS32\n//go:noescape\nfunc VhsubS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubU8 VhsubU8\n//go:noescape\nfunc VhsubU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubU16 VhsubU16\n//go:noescape\nfunc VhsubU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Unsigned Halving Subtract. 
This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubU32 VhsubU32\n//go:noescape\nfunc VhsubU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubqS8 VhsubqS8\n//go:noescape\nfunc VhsubqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubqS16 VhsubqS16\n//go:noescape\nfunc VhsubqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubqS32 VhsubqS32\n//go:noescape\nfunc VhsubqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Unsigned Halving Subtract. 
This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubqU8 VhsubqU8\n//go:noescape\nfunc VhsubqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubqU16 VhsubqU16\n//go:noescape\nfunc VhsubqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubqU32 VhsubqU32\n//go:noescape\nfunc VhsubqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxS8 VmaxS8\n//go:noescape\nfunc VmaxS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Signed Maximum (vector). 
This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxS16 VmaxS16\n//go:noescape\nfunc VmaxS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxS32 VmaxS32\n//go:noescape\nfunc VmaxS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxU8 VmaxU8\n//go:noescape\nfunc VmaxU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxU16 VmaxU16\n//go:noescape\nfunc VmaxU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxU32 VmaxU32\n//go:noescape\nfunc VmaxU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Floating-point Maximum (vector). 
This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxF32 VmaxF32\n//go:noescape\nfunc VmaxF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxF64 VmaxF64\n//go:noescape\nfunc VmaxF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxnmF32 VmaxnmF32\n//go:noescape\nfunc VmaxnmF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxnmF64 VmaxnmF64\n//go:noescape\nfunc VmaxnmF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxnmqF32 VmaxnmqF32\n//go:noescape\nfunc VmaxnmqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Maximum Number (vector). 
This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxnmqF64 VmaxnmqF64\n//go:noescape\nfunc VmaxnmqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VmaxnmvF32 VmaxnmvF32\n//go:noescape\nfunc VmaxnmvF32(r *arm.Float32, v0 *arm.Float32X2)\n\n// Floating-point Maximum Number across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VmaxnmvqF32 VmaxnmvqF32\n//go:noescape\nfunc VmaxnmvqF32(r *arm.Float32, v0 *arm.Float32X4)\n\n// Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VmaxnmvqF64 VmaxnmvqF64\n//go:noescape\nfunc VmaxnmvqF64(r *arm.Float64, v0 *arm.Float64X2)\n\n// Signed Maximum (vector). 
This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxqS8 VmaxqS8\n//go:noescape\nfunc VmaxqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxqS16 VmaxqS16\n//go:noescape\nfunc VmaxqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxqS32 VmaxqS32\n//go:noescape\nfunc VmaxqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxqU8 VmaxqU8\n//go:noescape\nfunc VmaxqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxqU16 VmaxqU16\n//go:noescape\nfunc VmaxqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Unsigned Maximum (vector). 
This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxqU32 VmaxqU32\n//go:noescape\nfunc VmaxqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxqF32 VmaxqF32\n//go:noescape\nfunc VmaxqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxqF64 VmaxqF64\n//go:noescape\nfunc VmaxqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VmaxvS8 VmaxvS8\n//go:noescape\nfunc VmaxvS8(r *arm.Int8, v0 *arm.Int8X8)\n\n// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VmaxvS16 VmaxvS16\n//go:noescape\nfunc VmaxvS16(r *arm.Int16, v0 *arm.Int16X4)\n\n// Signed Maximum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxvS32 VmaxvS32\n//go:noescape\nfunc VmaxvS32(r *arm.Int32, v0 *arm.Int32X2)\n\n// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmaxvU8 VmaxvU8\n//go:noescape\nfunc VmaxvU8(r *arm.Uint8, v0 *arm.Uint8X8)\n\n// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmaxvU16 VmaxvU16\n//go:noescape\nfunc VmaxvU16(r *arm.Uint16, v0 *arm.Uint16X4)\n\n// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxvU32 VmaxvU32\n//go:noescape\nfunc VmaxvU32(r *arm.Uint32, v0 *arm.Uint32X2)\n\n// Floating-point Maximum Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VmaxvF32 VmaxvF32\n//go:noescape\nfunc VmaxvF32(r *arm.Float32, v0 *arm.Float32X2)\n\n// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VmaxvqS8 VmaxvqS8\n//go:noescape\nfunc VmaxvqS8(r *arm.Int8, v0 *arm.Int8X16)\n\n// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VmaxvqS16 VmaxvqS16\n//go:noescape\nfunc VmaxvqS16(r *arm.Int16, v0 *arm.Int16X8)\n\n// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VmaxvqS32 VmaxvqS32\n//go:noescape\nfunc VmaxvqS32(r *arm.Int32, v0 *arm.Int32X4)\n\n// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. 
All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmaxvqU8 VmaxvqU8\n//go:noescape\nfunc VmaxvqU8(r *arm.Uint8, v0 *arm.Uint8X16)\n\n// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmaxvqU16 VmaxvqU16\n//go:noescape\nfunc VmaxvqU16(r *arm.Uint16, v0 *arm.Uint16X8)\n\n// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmaxvqU32 VmaxvqU32\n//go:noescape\nfunc VmaxvqU32(r *arm.Uint32, v0 *arm.Uint32X4)\n\n// Floating-point Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VmaxvqF32 VmaxvqF32\n//go:noescape\nfunc VmaxvqF32(r *arm.Float32, v0 *arm.Float32X4)\n\n// Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VmaxvqF64 VmaxvqF64\n//go:noescape\nfunc VmaxvqF64(r *arm.Float64, v0 *arm.Float64X2)\n\n// Signed Minimum (vector). 
This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminS8 VminS8\n//go:noescape\nfunc VminS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminS16 VminS16\n//go:noescape\nfunc VminS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminS32 VminS32\n//go:noescape\nfunc VminS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminU8 VminU8\n//go:noescape\nfunc VminU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminU16 VminU16\n//go:noescape\nfunc VminU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Unsigned Minimum (vector). 
This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminU32 VminU32\n//go:noescape\nfunc VminU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminF32 VminF32\n//go:noescape\nfunc VminF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminF64 VminF64\n//go:noescape\nfunc VminF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminnmF32 VminnmF32\n//go:noescape\nfunc VminnmF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminnmF64 VminnmF64\n//go:noescape\nfunc VminnmF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Floating-point Minimum Number (vector). 
This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminnmqF32 VminnmqF32\n//go:noescape\nfunc VminnmqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminnmqF64 VminnmqF64\n//go:noescape\nfunc VminnmqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VminnmvF32 VminnmvF32\n//go:noescape\nfunc VminnmvF32(r *arm.Float32, v0 *arm.Float32X2)\n\n// Floating-point Minimum Number across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VminnmvqF32 VminnmvqF32\n//go:noescape\nfunc VminnmvqF32(r *arm.Float32, v0 *arm.Float32X4)\n\n// Floating-point Minimum Number Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VminnmvqF64 VminnmvqF64\n//go:noescape\nfunc VminnmvqF64(r *arm.Float64, v0 *arm.Float64X2)\n\n// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminqS8 VminqS8\n//go:noescape\nfunc VminqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminqS16 VminqS16\n//go:noescape\nfunc VminqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminqS32 VminqS32\n//go:noescape\nfunc VminqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Unsigned Minimum (vector). 
This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminqU8 VminqU8\n//go:noescape\nfunc VminqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminqU16 VminqU16\n//go:noescape\nfunc VminqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminqU32 VminqU32\n//go:noescape\nfunc VminqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminqF32 VminqF32\n//go:noescape\nfunc VminqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminqF64 VminqF64\n//go:noescape\nfunc VminqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Signed Minimum across Vector. 
This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VminvS8 VminvS8\n//go:noescape\nfunc VminvS8(r *arm.Int8, v0 *arm.Int8X8)\n\n// Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VminvS16 VminvS16\n//go:noescape\nfunc VminvS16(r *arm.Int16, v0 *arm.Int16X4)\n\n// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminvS32 VminvS32\n//go:noescape\nfunc VminvS32(r *arm.Int32, v0 *arm.Int32X2)\n\n// Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VminvU8 VminvU8\n//go:noescape\nfunc VminvU8(r *arm.Uint8, v0 *arm.Uint8X8)\n\n// Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VminvU16 VminvU16\n//go:noescape\nfunc VminvU16(r *arm.Uint16, v0 *arm.Uint16X4)\n\n// Unsigned Minimum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminvU32 VminvU32\n//go:noescape\nfunc VminvU32(r *arm.Uint32, v0 *arm.Uint32X2)\n\n// Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VminvF32 VminvF32\n//go:noescape\nfunc VminvF32(r *arm.Float32, v0 *arm.Float32X2)\n\n// Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VminvqS8 VminvqS8\n//go:noescape\nfunc VminvqS8(r *arm.Int8, v0 *arm.Int8X16)\n\n// Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VminvqS16 VminvqS16\n//go:noescape\nfunc VminvqS16(r *arm.Int16, v0 *arm.Int16X8)\n\n// Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. 
All the values in this instruction are signed integer values.\n//\n//go:linkname VminvqS32 VminvqS32\n//go:noescape\nfunc VminvqS32(r *arm.Int32, v0 *arm.Int32X4)\n\n// Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VminvqU8 VminvqU8\n//go:noescape\nfunc VminvqU8(r *arm.Uint8, v0 *arm.Uint8X16)\n\n// Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VminvqU16 VminvqU16\n//go:noescape\nfunc VminvqU16(r *arm.Uint16, v0 *arm.Uint16X8)\n\n// Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VminvqU32 VminvqU32\n//go:noescape\nfunc VminvqU32(r *arm.Uint32, v0 *arm.Uint32X4)\n\n// Floating-point Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VminvqF32 VminvqF32\n//go:noescape\nfunc VminvqF32(r *arm.Float32, v0 *arm.Float32X4)\n\n// Floating-point Minimum of Pair of elements (scalar). 
This instruction compares the two vector elements in the source SIMD&FP register and writes the smaller of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VminvqF64 VminvqF64\n//go:noescape\nfunc VminvqF64(r *arm.Float64, v0 *arm.Float64X2)\n\n// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlaS8 VmlaS8\n//go:noescape\nfunc VmlaS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8, v2 *arm.Int8X8)\n\n// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlaS16 VmlaS16\n//go:noescape\nfunc VmlaS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16X4)\n\n// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlaS32 VmlaS32\n//go:noescape\nfunc VmlaS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32X2)\n\n// Multiply-Add to accumulator (vector). 
This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlaU8 VmlaU8\n//go:noescape\nfunc VmlaU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)\n\n// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlaU16 VmlaU16\n//go:noescape\nfunc VmlaU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4)\n\n// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlaU32 VmlaU32\n//go:noescape\nfunc VmlaU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2)\n\n// Floating-point multiply-add to accumulator\n//\n//go:linkname VmlaF32 VmlaF32\n//go:noescape\nfunc VmlaF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32X2)\n\n// Floating-point multiply-add to accumulator\n//\n//go:linkname VmlaF64 VmlaF64\n//go:noescape\nfunc VmlaF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1, v2 *arm.Float64X1)\n\n// Vector multiply accumulate with scalar\n//\n//go:linkname VmlaNS16 VmlaNS16\n//go:noescape\nfunc VmlaNS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16)\n\n// Vector multiply accumulate with scalar\n//\n//go:linkname VmlaNS32 VmlaNS32\n//go:noescape\nfunc VmlaNS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32)\n\n// Vector multiply accumulate with scalar\n//\n//go:linkname VmlaNU16 VmlaNU16\n//go:noescape\nfunc VmlaNU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4, v2 *arm.Uint16)\n\n// Vector multiply 
accumulate with scalar\n//\n//go:linkname VmlaNU32 VmlaNU32\n//go:noescape\nfunc VmlaNU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2, v2 *arm.Uint32)\n\n// Vector multiply accumulate with scalar\n//\n//go:linkname VmlaNF32 VmlaNF32\n//go:noescape\nfunc VmlaNF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32)\n\n// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlalS8 VmlalS8\n//go:noescape\nfunc VmlalS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X8, v2 *arm.Int8X8)\n\n// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlalS16 VmlalS16\n//go:noescape\nfunc VmlalS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16X4)\n\n// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlalS32 VmlalS32\n//go:noescape\nfunc VmlalS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32X2)\n\n// Unsigned Multiply-Add Long (vector). 
This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlalU8 VmlalU8\n//go:noescape\nfunc VmlalU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)\n\n// Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlalU16 VmlalU16\n//go:noescape\nfunc VmlalU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4)\n\n// Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlalU32 VmlalU32\n//go:noescape\nfunc VmlalU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2)\n\n// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. 
The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlalHighS8 VmlalHighS8\n//go:noescape\nfunc VmlalHighS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X16, v2 *arm.Int8X16)\n\n// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlalHighS16 VmlalHighS16\n//go:noescape\nfunc VmlalHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16X8)\n\n// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlalHighS32 VmlalHighS32\n//go:noescape\nfunc VmlalHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32X4)\n\n// Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlalHighU8 VmlalHighU8\n//go:noescape\nfunc VmlalHighU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X16, v2 *arm.Uint8X16)\n\n// Unsigned Multiply-Add Long (vector). 
This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlalHighU16 VmlalHighU16\n//go:noescape\nfunc VmlalHighU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8, v2 *arm.Uint16X8)\n\n// Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlalHighU32 VmlalHighU32\n//go:noescape\nfunc VmlalHighU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4, v2 *arm.Uint32X4)\n\n// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlalHighNS16 VmlalHighNS16\n//go:noescape\nfunc VmlalHighNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16)\n\n// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. 
The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlalHighNS32 VmlalHighNS32\n//go:noescape\nfunc VmlalHighNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32)\n\n// Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlalHighNU16 VmlalHighNU16\n//go:noescape\nfunc VmlalHighNU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8, v2 *arm.Uint16)\n\n// Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. 
The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlalHighNU32 VmlalHighNU32\n//go:noescape\nfunc VmlalHighNU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4, v2 *arm.Uint32)\n\n// Vector widening multiply accumulate with scalar\n//\n//go:linkname VmlalNS16 VmlalNS16\n//go:noescape\nfunc VmlalNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16)\n\n// Vector widening multiply accumulate with scalar\n//\n//go:linkname VmlalNS32 VmlalNS32\n//go:noescape\nfunc VmlalNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32)\n\n// Vector widening multiply accumulate with scalar\n//\n//go:linkname VmlalNU16 VmlalNU16\n//go:noescape\nfunc VmlalNU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4, v2 *arm.Uint16)\n\n// Vector widening multiply accumulate with scalar\n//\n//go:linkname VmlalNU32 VmlalNU32\n//go:noescape\nfunc VmlalNU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2, v2 *arm.Uint32)\n\n// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlaqS8 VmlaqS8\n//go:noescape\nfunc VmlaqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16, v2 *arm.Int8X16)\n\n// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlaqS16 VmlaqS16\n//go:noescape\nfunc VmlaqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8)\n\n// Multiply-Add to accumulator (vector). 
This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlaqS32 VmlaqS32\n//go:noescape\nfunc VmlaqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32X4)\n\n// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlaqU8 VmlaqU8\n//go:noescape\nfunc VmlaqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16)\n\n// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlaqU16 VmlaqU16\n//go:noescape\nfunc VmlaqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)\n\n// Multiply-Add to accumulator (vector). 
This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlaqU32 VmlaqU32\n//go:noescape\nfunc VmlaqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)\n\n// Floating-point multiply-add to accumulator\n//\n//go:linkname VmlaqF32 VmlaqF32\n//go:noescape\nfunc VmlaqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32X4)\n\n// Floating-point multiply-add to accumulator\n//\n//go:linkname VmlaqF64 VmlaqF64\n//go:noescape\nfunc VmlaqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2, v2 *arm.Float64X2)\n\n// Vector multiply accumulate with scalar\n//\n//go:linkname VmlaqNS16 VmlaqNS16\n//go:noescape\nfunc VmlaqNS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16)\n\n// Vector multiply accumulate with scalar\n//\n//go:linkname VmlaqNS32 VmlaqNS32\n//go:noescape\nfunc VmlaqNS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32)\n\n// Vector multiply accumulate with scalar\n//\n//go:linkname VmlaqNU16 VmlaqNU16\n//go:noescape\nfunc VmlaqNU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16)\n\n// Vector multiply accumulate with scalar\n//\n//go:linkname VmlaqNU32 VmlaqNU32\n//go:noescape\nfunc VmlaqNU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32)\n\n// Vector multiply accumulate with scalar\n//\n//go:linkname VmlaqNF32 VmlaqNF32\n//go:noescape\nfunc VmlaqNF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32)\n\n// Multiply-Subtract from accumulator (vector). 
This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlsS8 VmlsS8\n//go:noescape\nfunc VmlsS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8, v2 *arm.Int8X8)\n\n// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlsS16 VmlsS16\n//go:noescape\nfunc VmlsS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16X4)\n\n// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlsS32 VmlsS32\n//go:noescape\nfunc VmlsS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32X2)\n\n// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlsU8 VmlsU8\n//go:noescape\nfunc VmlsU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)\n\n// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlsU16 VmlsU16\n//go:noescape\nfunc VmlsU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4)\n\n// Multiply-Subtract from accumulator (vector). 
This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlsU32 VmlsU32\n//go:noescape\nfunc VmlsU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2)\n\n// Multiply-subtract from accumulator\n//\n//go:linkname VmlsF32 VmlsF32\n//go:noescape\nfunc VmlsF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32X2)\n\n// Multiply-subtract from accumulator\n//\n//go:linkname VmlsF64 VmlsF64\n//go:noescape\nfunc VmlsF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1, v2 *arm.Float64X1)\n\n// Vector multiply subtract with scalar\n//\n//go:linkname VmlsNS16 VmlsNS16\n//go:noescape\nfunc VmlsNS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16)\n\n// Vector multiply subtract with scalar\n//\n//go:linkname VmlsNS32 VmlsNS32\n//go:noescape\nfunc VmlsNS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32)\n\n// Vector multiply subtract with scalar\n//\n//go:linkname VmlsNU16 VmlsNU16\n//go:noescape\nfunc VmlsNU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4, v2 *arm.Uint16)\n\n// Vector multiply subtract with scalar\n//\n//go:linkname VmlsNU32 VmlsNU32\n//go:noescape\nfunc VmlsNU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2, v2 *arm.Uint32)\n\n// Vector multiply subtract with scalar\n//\n//go:linkname VmlsNF32 VmlsNF32\n//go:noescape\nfunc VmlsNF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32)\n\n// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. 
The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlslS8 VmlslS8\n//go:noescape\nfunc VmlslS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X8, v2 *arm.Int8X8)\n\n// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlslS16 VmlslS16\n//go:noescape\nfunc VmlslS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16X4)\n\n// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlslS32 VmlslS32\n//go:noescape\nfunc VmlslS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32X2)\n\n// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmlslU8 VmlslU8\n//go:noescape\nfunc VmlslU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)\n\n// Unsigned Multiply-Subtract Long (vector). 
This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmlslU16 VmlslU16\n//go:noescape\nfunc VmlslU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4)\n\n// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmlslU32 VmlslU32\n//go:noescape\nfunc VmlslU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2)\n\n// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlslHighS8 VmlslHighS8\n//go:noescape\nfunc VmlslHighS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X16, v2 *arm.Int8X16)\n\n// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. 
The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlslHighS16 VmlslHighS16\n//go:noescape\nfunc VmlslHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16X8)\n\n// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlslHighS32 VmlslHighS32\n//go:noescape\nfunc VmlslHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32X4)\n\n// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmlslHighU8 VmlslHighU8\n//go:noescape\nfunc VmlslHighU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X16, v2 *arm.Uint8X16)\n\n// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmlslHighU16 VmlslHighU16\n//go:noescape\nfunc VmlslHighU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8, v2 *arm.Uint16X8)\n\n// Unsigned Multiply-Subtract Long (vector). 
This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmlslHighU32 VmlslHighU32\n//go:noescape\nfunc VmlslHighU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4, v2 *arm.Uint32X4)\n\n// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlslHighNS16 VmlslHighNS16\n//go:noescape\nfunc VmlslHighNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16)\n\n// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmlslHighNS32 VmlslHighNS32\n//go:noescape\nfunc VmlslHighNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32)\n\n// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. 
All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmlslHighNU16 VmlslHighNU16\n//go:noescape\nfunc VmlslHighNU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8, v2 *arm.Uint16)\n\n// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmlslHighNU32 VmlslHighNU32\n//go:noescape\nfunc VmlslHighNU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4, v2 *arm.Uint32)\n\n// Vector widening multiply subtract with scalar\n//\n//go:linkname VmlslNS16 VmlslNS16\n//go:noescape\nfunc VmlslNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16)\n\n// Vector widening multiply subtract with scalar\n//\n//go:linkname VmlslNS32 VmlslNS32\n//go:noescape\nfunc VmlslNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32)\n\n// Vector widening multiply subtract with scalar\n//\n//go:linkname VmlslNU16 VmlslNU16\n//go:noescape\nfunc VmlslNU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4, v2 *arm.Uint16)\n\n// Vector widening multiply subtract with scalar\n//\n//go:linkname VmlslNU32 VmlslNU32\n//go:noescape\nfunc VmlslNU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2, v2 *arm.Uint32)\n\n// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlsqS8 VmlsqS8\n//go:noescape\nfunc VmlsqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16, v2 *arm.Int8X16)\n\n// Multiply-Subtract from accumulator (vector). 
This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlsqS16 VmlsqS16\n//go:noescape\nfunc VmlsqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8)\n\n// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlsqS32 VmlsqS32\n//go:noescape\nfunc VmlsqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32X4)\n\n// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlsqU8 VmlsqU8\n//go:noescape\nfunc VmlsqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16)\n\n// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlsqU16 VmlsqU16\n//go:noescape\nfunc VmlsqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)\n\n// Multiply-Subtract from accumulator (vector). 
This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VmlsqU32 VmlsqU32\n//go:noescape\nfunc VmlsqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)\n\n// Multiply-subtract from accumulator\n//\n//go:linkname VmlsqF32 VmlsqF32\n//go:noescape\nfunc VmlsqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32X4)\n\n// Multiply-subtract from accumulator\n//\n//go:linkname VmlsqF64 VmlsqF64\n//go:noescape\nfunc VmlsqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2, v2 *arm.Float64X2)\n\n// Vector multiply subtract with scalar\n//\n//go:linkname VmlsqNS16 VmlsqNS16\n//go:noescape\nfunc VmlsqNS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16)\n\n// Vector multiply subtract with scalar\n//\n//go:linkname VmlsqNS32 VmlsqNS32\n//go:noescape\nfunc VmlsqNS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32)\n\n// Vector multiply subtract with scalar\n//\n//go:linkname VmlsqNU16 VmlsqNU16\n//go:noescape\nfunc VmlsqNU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16)\n\n// Vector multiply subtract with scalar\n//\n//go:linkname VmlsqNU32 VmlsqNU32\n//go:noescape\nfunc VmlsqNU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32)\n\n// Vector multiply subtract with scalar\n//\n//go:linkname VmlsqNF32 VmlsqNF32\n//go:noescape\nfunc VmlsqNF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32)\n\n// Signed 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of signed 8-bit integer values in the first source vector by the 8x2 matrix of signed 8-bit integer values in the second source vector. The resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the destination vector. 
This is equivalent to performing an 8-way dot product per destination element.\n//\n//go:linkname VmmlaqS32 VmmlaqS32\n//go:noescape\nfunc VmmlaqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int8X16, v2 *arm.Int8X16)\n\n// Unsigned 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of unsigned 8-bit integer values in the first source vector by the 8x2 matrix of unsigned 8-bit integer values in the second source vector. The resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the destination vector. This is equivalent to performing an 8-way dot product per destination element.\n//\n//go:linkname VmmlaqU32 VmmlaqU32\n//go:noescape\nfunc VmmlaqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint8X16, v2 *arm.Uint8X16)\n\n// Duplicate scalar to vector. This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovNS8 VmovNS8\n//go:noescape\nfunc VmovNS8(r *arm.Int8X8, v0 *arm.Int8)\n\n// Duplicate scalar to vector. This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovNS16 VmovNS16\n//go:noescape\nfunc VmovNS16(r *arm.Int16X4, v0 *arm.Int16)\n\n// Duplicate scalar to vector. This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovNS32 VmovNS32\n//go:noescape\nfunc VmovNS32(r *arm.Int32X2, v0 *arm.Int32)\n\n// Duplicate scalar to vector. 
This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovNS64 VmovNS64\n//go:noescape\nfunc VmovNS64(r *arm.Int64X1, v0 *arm.Int64)\n\n// Duplicate scalar to vector. This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovNU8 VmovNU8\n//go:noescape\nfunc VmovNU8(r *arm.Uint8X8, v0 *arm.Uint8)\n\n// Duplicate scalar to vector. This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovNU16 VmovNU16\n//go:noescape\nfunc VmovNU16(r *arm.Uint16X4, v0 *arm.Uint16)\n\n// Duplicate scalar to vector. This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovNU32 VmovNU32\n//go:noescape\nfunc VmovNU32(r *arm.Uint32X2, v0 *arm.Uint32)\n\n// Duplicate scalar to vector. This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovNU64 VmovNU64\n//go:noescape\nfunc VmovNU64(r *arm.Uint64X1, v0 *arm.Uint64)\n\n// Duplicate scalar to vector. 
This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovNF32 VmovNF32\n//go:noescape\nfunc VmovNF32(r *arm.Float32X2, v0 *arm.Float32)\n\n// Duplicate scalar to vector. This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovNF64 VmovNF64\n//go:noescape\nfunc VmovNF64(r *arm.Float64X1, v0 *arm.Float64)\n\n// Duplicate scalar to vector. This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovNP16 VmovNP16\n//go:noescape\nfunc VmovNP16(r *arm.Poly16X4, v0 *arm.Poly16)\n\n// Duplicate scalar to vector. This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovNP64 VmovNP64\n//go:noescape\nfunc VmovNP64(r *arm.Poly64X1, v0 *arm.Poly64)\n\n// Duplicate scalar to vector. 
This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovNP8 VmovNP8\n//go:noescape\nfunc VmovNP8(r *arm.Poly8X8, v0 *arm.Poly8)\n\n// Vector move long. This instruction widens each element of v0 to twice its original width by sign extension and writes the result to r.\n//\n//go:linkname VmovlS8 VmovlS8\n//go:noescape\nfunc VmovlS8(r *arm.Int16X8, v0 *arm.Int8X8)\n\n// Vector move long. This instruction widens each element of v0 to twice its original width by sign extension and writes the result to r.\n//\n//go:linkname VmovlS16 VmovlS16\n//go:noescape\nfunc VmovlS16(r *arm.Int32X4, v0 *arm.Int16X4)\n\n// Vector move long. This instruction widens each element of v0 to twice its original width by sign extension and writes the result to r.\n//\n//go:linkname VmovlS32 VmovlS32\n//go:noescape\nfunc VmovlS32(r *arm.Int64X2, v0 *arm.Int32X2)\n\n// Vector move long. This instruction widens each element of v0 to twice its original width by zero extension and writes the result to r.\n//\n//go:linkname VmovlU8 VmovlU8\n//go:noescape\nfunc VmovlU8(r *arm.Uint16X8, v0 *arm.Uint8X8)\n\n// Vector move long. This instruction widens each element of v0 to twice its original width by zero extension and writes the result to r.\n//\n//go:linkname VmovlU16 VmovlU16\n//go:noescape\nfunc VmovlU16(r *arm.Uint32X4, v0 *arm.Uint16X4)\n\n// Vector move long. This instruction widens each element of v0 to twice its original width by zero extension and writes the result to r.\n//\n//go:linkname VmovlU32 VmovlU32\n//go:noescape\nfunc VmovlU32(r *arm.Uint64X2, v0 *arm.Uint32X2)\n\n// Vector move long (upper half). 
This instruction widens each element in the upper half of v0 to twice its original width by sign extension and writes the result to r.\n//\n//go:linkname VmovlHighS8 VmovlHighS8\n//go:noescape\nfunc VmovlHighS8(r *arm.Int16X8, v0 *arm.Int8X16)\n\n// Vector move long (upper half). This instruction widens each element in the upper half of v0 to twice its original width by sign extension and writes the result to r.\n//\n//go:linkname VmovlHighS16 VmovlHighS16\n//go:noescape\nfunc VmovlHighS16(r *arm.Int32X4, v0 *arm.Int16X8)\n\n// Vector move long (upper half). This instruction widens each element in the upper half of v0 to twice its original width by sign extension and writes the result to r.\n//\n//go:linkname VmovlHighS32 VmovlHighS32\n//go:noescape\nfunc VmovlHighS32(r *arm.Int64X2, v0 *arm.Int32X4)\n\n// Vector move long (upper half). This instruction widens each element in the upper half of v0 to twice its original width by zero extension and writes the result to r.\n//\n//go:linkname VmovlHighU8 VmovlHighU8\n//go:noescape\nfunc VmovlHighU8(r *arm.Uint16X8, v0 *arm.Uint8X16)\n\n// Vector move long (upper half). This instruction widens each element in the upper half of v0 to twice its original width by zero extension and writes the result to r.\n//\n//go:linkname VmovlHighU16 VmovlHighU16\n//go:noescape\nfunc VmovlHighU16(r *arm.Uint32X4, v0 *arm.Uint16X8)\n\n// Vector move long (upper half). This instruction widens each element in the upper half of v0 to twice its original width by zero extension and writes the result to r.\n//\n//go:linkname VmovlHighU32 VmovlHighU32\n//go:noescape\nfunc VmovlHighU32(r *arm.Uint64X2, v0 *arm.Uint32X4)\n\n// Extract Narrow. 
This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VmovnS16 VmovnS16\n//go:noescape\nfunc VmovnS16(r *arm.Int8X8, v0 *arm.Int16X8)\n\n// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VmovnS32 VmovnS32\n//go:noescape\nfunc VmovnS32(r *arm.Int16X4, v0 *arm.Int32X4)\n\n// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VmovnS64 VmovnS64\n//go:noescape\nfunc VmovnS64(r *arm.Int32X2, v0 *arm.Int64X2)\n\n// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VmovnU16 VmovnU16\n//go:noescape\nfunc VmovnU16(r *arm.Uint8X8, v0 *arm.Uint16X8)\n\n// Extract Narrow. 
This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VmovnU32 VmovnU32\n//go:noescape\nfunc VmovnU32(r *arm.Uint16X4, v0 *arm.Uint32X4)\n\n// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VmovnU64 VmovnU64\n//go:noescape\nfunc VmovnU64(r *arm.Uint32X2, v0 *arm.Uint64X2)\n\n// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VmovnHighS16 VmovnHighS16\n//go:noescape\nfunc VmovnHighS16(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int16X8)\n\n// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VmovnHighS32 VmovnHighS32\n//go:noescape\nfunc VmovnHighS32(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int32X4)\n\n// Extract Narrow. 
This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VmovnHighS64 VmovnHighS64\n//go:noescape\nfunc VmovnHighS64(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int64X2)\n\n// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VmovnHighU16 VmovnHighU16\n//go:noescape\nfunc VmovnHighU16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint16X8)\n\n// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VmovnHighU32 VmovnHighU32\n//go:noescape\nfunc VmovnHighU32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint32X4)\n\n// Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VmovnHighU64 VmovnHighU64\n//go:noescape\nfunc VmovnHighU64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint64X2)\n\n// Duplicate scalar to vector. 
This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovqNS8 VmovqNS8\n//go:noescape\nfunc VmovqNS8(r *arm.Int8X16, v0 *arm.Int8)\n\n// Duplicate scalar to vector. This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovqNS16 VmovqNS16\n//go:noescape\nfunc VmovqNS16(r *arm.Int16X8, v0 *arm.Int16)\n\n// Duplicate scalar to vector. This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovqNS32 VmovqNS32\n//go:noescape\nfunc VmovqNS32(r *arm.Int32X4, v0 *arm.Int32)\n\n// Duplicate scalar to vector. This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovqNS64 VmovqNS64\n//go:noescape\nfunc VmovqNS64(r *arm.Int64X2, v0 *arm.Int64)\n\n// Duplicate scalar to vector. This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovqNU8 VmovqNU8\n//go:noescape\nfunc VmovqNU8(r *arm.Uint8X16, v0 *arm.Uint8)\n\n// Duplicate scalar to vector. 
This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovqNU16 VmovqNU16\n//go:noescape\nfunc VmovqNU16(r *arm.Uint16X8, v0 *arm.Uint16)\n\n// Duplicate scalar to vector. This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovqNU32 VmovqNU32\n//go:noescape\nfunc VmovqNU32(r *arm.Uint32X4, v0 *arm.Uint32)\n\n// Duplicate scalar to vector. This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovqNU64 VmovqNU64\n//go:noescape\nfunc VmovqNU64(r *arm.Uint64X2, v0 *arm.Uint64)\n\n// Duplicate scalar to vector. This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovqNF32 VmovqNF32\n//go:noescape\nfunc VmovqNF32(r *arm.Float32X4, v0 *arm.Float32)\n\n// Duplicate scalar to vector. This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovqNF64 VmovqNF64\n//go:noescape\nfunc VmovqNF64(r *arm.Float64X2, v0 *arm.Float64)\n\n// Duplicate scalar to vector. 
This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovqNP16 VmovqNP16\n//go:noescape\nfunc VmovqNP16(r *arm.Poly16X8, v0 *arm.Poly16)\n\n// Duplicate scalar to vector. This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovqNP64 VmovqNP64\n//go:noescape\nfunc VmovqNP64(r *arm.Poly64X2, v0 *arm.Poly64)\n\n// Duplicate scalar to vector. This instruction copies the scalar value v0 into each element of the destination vector r.\n//\n//go:linkname VmovqNP8 VmovqNP8\n//go:noescape\nfunc VmovqNP8(r *arm.Poly8X16, v0 *arm.Poly8)\n\n// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulS8 VmulS8\n//go:noescape\nfunc VmulS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulS16 VmulS16\n//go:noescape\nfunc VmulS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulS32 VmulS32\n//go:noescape\nfunc VmulS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Multiply (vector). 
This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulU8 VmulU8\n//go:noescape\nfunc VmulU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulU16 VmulU16\n//go:noescape\nfunc VmulU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulU32 VmulU32\n//go:noescape\nfunc VmulU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulF32 VmulF32\n//go:noescape\nfunc VmulF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Multiply (vector). 
This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulF64 VmulF64\n//go:noescape\nfunc VmulF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Vector multiply by scalar\n//\n//go:linkname VmulNS16 VmulNS16\n//go:noescape\nfunc VmulNS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16)\n\n// Vector multiply by scalar\n//\n//go:linkname VmulNS32 VmulNS32\n//go:noescape\nfunc VmulNS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32)\n\n// Vector multiply by scalar\n//\n//go:linkname VmulNU16 VmulNU16\n//go:noescape\nfunc VmulNU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16)\n\n// Vector multiply by scalar\n//\n//go:linkname VmulNU32 VmulNU32\n//go:noescape\nfunc VmulNU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32)\n\n// Vector multiply by scalar\n//\n//go:linkname VmulNF32 VmulNF32\n//go:noescape\nfunc VmulNF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32)\n\n// Vector multiply by scalar\n//\n//go:linkname VmulNF64 VmulNF64\n//go:noescape\nfunc VmulNF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64)\n\n// Polynomial Multiply. This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulP8 VmulP8\n//go:noescape\nfunc VmulP8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)\n\n// Signed Multiply Long (vector). 
This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmullS8 VmullS8\n//go:noescape\nfunc VmullS8(r *arm.Int16X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmullS16 VmullS16\n//go:noescape\nfunc VmullS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmullS32 VmullS32\n//go:noescape\nfunc VmullS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Unsigned Multiply long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmullU8 VmullU8\n//go:noescape\nfunc VmullU8(r *arm.Uint16X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Unsigned Multiply long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. 
All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmullU16 VmullU16\n//go:noescape\nfunc VmullU16(r *arm.Uint32X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Unsigned Multiply long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmullU32 VmullU32\n//go:noescape\nfunc VmullU32(r *arm.Uint64X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmullHighS8 VmullHighS8\n//go:noescape\nfunc VmullHighS8(r *arm.Int16X8, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmullHighS16 VmullHighS16\n//go:noescape\nfunc VmullHighS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmullHighS32 VmullHighS32\n//go:noescape\nfunc VmullHighS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Unsigned Multiply long (vector). 
This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmullHighU8 VmullHighU8\n//go:noescape\nfunc VmullHighU8(r *arm.Uint16X8, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Unsigned Multiply long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmullHighU16 VmullHighU16\n//go:noescape\nfunc VmullHighU16(r *arm.Uint32X4, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Unsigned Multiply long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmullHighU32 VmullHighU32\n//go:noescape\nfunc VmullHighU32(r *arm.Uint64X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Signed Multiply Long (by scalar, upper half). This instruction multiplies each signed integer element in the upper half of v0 by the scalar v1, places the double-width results in a vector, and writes the vector to r.\n//\n//go:linkname VmullHighNS16 VmullHighNS16\n//go:noescape\nfunc VmullHighNS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16)\n\n// Signed Multiply Long (by scalar, upper half). 
This instruction multiplies each signed integer element in the upper half of v0 by the scalar v1, places the double-width results in a vector, and writes the vector to r.\n//\n//go:linkname VmullHighNS32 VmullHighNS32\n//go:noescape\nfunc VmullHighNS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32)\n\n// Unsigned Multiply Long (by scalar, upper half). This instruction multiplies each unsigned integer element in the upper half of v0 by the scalar v1, places the double-width results in a vector, and writes the vector to r.\n//\n//go:linkname VmullHighNU16 VmullHighNU16\n//go:noescape\nfunc VmullHighNU16(r *arm.Uint32X4, v0 *arm.Uint16X8, v1 *arm.Uint16)\n\n// Unsigned Multiply Long (by scalar, upper half). This instruction multiplies each unsigned integer element in the upper half of v0 by the scalar v1, places the double-width results in a vector, and writes the vector to r.\n//\n//go:linkname VmullHighNU32 VmullHighNU32\n//go:noescape\nfunc VmullHighNU32(r *arm.Uint64X2, v0 *arm.Uint32X4, v1 *arm.Uint32)\n\n// Polynomial Multiply Long. This instruction multiplies corresponding elements in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmullHighP64 VmullHighP64\n//go:noescape\nfunc VmullHighP64(r *arm.Poly128, v0 *arm.Poly64X2, v1 *arm.Poly64X2)\n\n// Polynomial Multiply Long. 
This instruction multiplies corresponding elements in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmullHighP8 VmullHighP8\n//go:noescape\nfunc VmullHighP8(r *arm.Poly16X8, v0 *arm.Poly8X16, v1 *arm.Poly8X16)\n\n// Vector long multiply with scalar\n//\n//go:linkname VmullNS16 VmullNS16\n//go:noescape\nfunc VmullNS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16)\n\n// Vector long multiply with scalar\n//\n//go:linkname VmullNS32 VmullNS32\n//go:noescape\nfunc VmullNS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32)\n\n// Vector long multiply with scalar\n//\n//go:linkname VmullNU16 VmullNU16\n//go:noescape\nfunc VmullNU16(r *arm.Uint32X4, v0 *arm.Uint16X4, v1 *arm.Uint16)\n\n// Vector long multiply with scalar\n//\n//go:linkname VmullNU32 VmullNU32\n//go:noescape\nfunc VmullNU32(r *arm.Uint64X2, v0 *arm.Uint32X2, v1 *arm.Uint32)\n\n// Polynomial Multiply Long. This instruction multiplies corresponding elements in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmullP64 VmullP64\n//go:noescape\nfunc VmullP64(r *arm.Poly128, v0 *arm.Poly64, v1 *arm.Poly64)\n\n// Polynomial Multiply Long. This instruction multiplies corresponding elements in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. 
The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VmullP8 VmullP8\n//go:noescape\nfunc VmullP8(r *arm.Poly16X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)\n\n// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulqS8 VmulqS8\n//go:noescape\nfunc VmulqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulqS16 VmulqS16\n//go:noescape\nfunc VmulqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulqS32 VmulqS32\n//go:noescape\nfunc VmulqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulqU8 VmulqU8\n//go:noescape\nfunc VmulqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulqU16 VmulqU16\n//go:noescape\nfunc VmulqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Multiply (vector). 
This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulqU32 VmulqU32\n//go:noescape\nfunc VmulqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulqF32 VmulqF32\n//go:noescape\nfunc VmulqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulqF64 VmulqF64\n//go:noescape\nfunc VmulqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Vector multiply by scalar\n//\n//go:linkname VmulqNS16 VmulqNS16\n//go:noescape\nfunc VmulqNS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16)\n\n// Vector multiply by scalar\n//\n//go:linkname VmulqNS32 VmulqNS32\n//go:noescape\nfunc VmulqNS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32)\n\n// Vector multiply by scalar\n//\n//go:linkname VmulqNU16 VmulqNU16\n//go:noescape\nfunc VmulqNU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16)\n\n// Vector multiply by scalar\n//\n//go:linkname VmulqNU32 VmulqNU32\n//go:noescape\nfunc VmulqNU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32)\n\n// Vector multiply by scalar\n//\n//go:linkname VmulqNF32 VmulqNF32\n//go:noescape\nfunc VmulqNF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32)\n\n// Floating-point Multiply (vector). 
This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulqNF64 VmulqNF64\n//go:noescape\nfunc VmulqNF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64)\n\n// Polynomial Multiply. This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulqP8 VmulqP8\n//go:noescape\nfunc VmulqP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)\n\n// Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulxF32 VmulxF32\n//go:noescape\nfunc VmulxF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulxF64 VmulxF64\n//go:noescape\nfunc VmulxF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulxdF64 VmulxdF64\n//go:noescape\nfunc VmulxdF64(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64)\n\n// Floating-point Multiply extended. 
This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulxqF32 VmulxqF32\n//go:noescape\nfunc VmulxqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulxqF64 VmulxqF64\n//go:noescape\nfunc VmulxqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulxsF32 VmulxsF32\n//go:noescape\nfunc VmulxsF32(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnS8 VmvnS8\n//go:noescape\nfunc VmvnS8(r *arm.Int8X8, v0 *arm.Int8X8)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnS16 VmvnS16\n//go:noescape\nfunc VmvnS16(r *arm.Int16X4, v0 *arm.Int16X4)\n\n// Bitwise NOT (vector). 
This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnS32 VmvnS32\n//go:noescape\nfunc VmvnS32(r *arm.Int32X2, v0 *arm.Int32X2)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnU8 VmvnU8\n//go:noescape\nfunc VmvnU8(r *arm.Uint8X8, v0 *arm.Uint8X8)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnU16 VmvnU16\n//go:noescape\nfunc VmvnU16(r *arm.Uint16X4, v0 *arm.Uint16X4)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnU32 VmvnU32\n//go:noescape\nfunc VmvnU32(r *arm.Uint32X2, v0 *arm.Uint32X2)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnP8 VmvnP8\n//go:noescape\nfunc VmvnP8(r *arm.Poly8X8, v0 *arm.Poly8X8)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnqS8 VmvnqS8\n//go:noescape\nfunc VmvnqS8(r *arm.Int8X16, v0 *arm.Int8X16)\n\n// Bitwise NOT (vector). 
This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnqS16 VmvnqS16\n//go:noescape\nfunc VmvnqS16(r *arm.Int16X8, v0 *arm.Int16X8)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnqS32 VmvnqS32\n//go:noescape\nfunc VmvnqS32(r *arm.Int32X4, v0 *arm.Int32X4)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnqU8 VmvnqU8\n//go:noescape\nfunc VmvnqU8(r *arm.Uint8X16, v0 *arm.Uint8X16)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnqU16 VmvnqU16\n//go:noescape\nfunc VmvnqU16(r *arm.Uint16X8, v0 *arm.Uint16X8)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnqU32 VmvnqU32\n//go:noescape\nfunc VmvnqU32(r *arm.Uint32X4, v0 *arm.Uint32X4)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnqP8 VmvnqP8\n//go:noescape\nfunc VmvnqP8(r *arm.Poly8X16, v0 *arm.Poly8X16)\n\n// Negate (vector). 
This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegS8 VnegS8\n//go:noescape\nfunc VnegS8(r *arm.Int8X8, v0 *arm.Int8X8)\n\n// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegS16 VnegS16\n//go:noescape\nfunc VnegS16(r *arm.Int16X4, v0 *arm.Int16X4)\n\n// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegS32 VnegS32\n//go:noescape\nfunc VnegS32(r *arm.Int32X2, v0 *arm.Int32X2)\n\n// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegS64 VnegS64\n//go:noescape\nfunc VnegS64(r *arm.Int64X1, v0 *arm.Int64X1)\n\n// Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegF32 VnegF32\n//go:noescape\nfunc VnegF32(r *arm.Float32X2, v0 *arm.Float32X2)\n\n// Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegF64 VnegF64\n//go:noescape\nfunc VnegF64(r *arm.Float64X1, v0 *arm.Float64X1)\n\n// Negate (vector). 
This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegdS64 VnegdS64\n//go:noescape\nfunc VnegdS64(r *arm.Int64, v0 *arm.Int64)\n\n// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegqS8 VnegqS8\n//go:noescape\nfunc VnegqS8(r *arm.Int8X16, v0 *arm.Int8X16)\n\n// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegqS16 VnegqS16\n//go:noescape\nfunc VnegqS16(r *arm.Int16X8, v0 *arm.Int16X8)\n\n// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegqS32 VnegqS32\n//go:noescape\nfunc VnegqS32(r *arm.Int32X4, v0 *arm.Int32X4)\n\n// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegqS64 VnegqS64\n//go:noescape\nfunc VnegqS64(r *arm.Int64X2, v0 *arm.Int64X2)\n\n// Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegqF32 VnegqF32\n//go:noescape\nfunc VnegqF32(r *arm.Float32X4, v0 *arm.Float32X4)\n\n// Floating-point Negate (vector). 
This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegqF64 VnegqF64\n//go:noescape\nfunc VnegqF64(r *arm.Float64X2, v0 *arm.Float64X2)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornS8 VornS8\n//go:noescape\nfunc VornS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornS16 VornS16\n//go:noescape\nfunc VornS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornS32 VornS32\n//go:noescape\nfunc VornS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornS64 VornS64\n//go:noescape\nfunc VornS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornU8 VornU8\n//go:noescape\nfunc VornU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Bitwise inclusive OR NOT (vector). 
This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornU16 VornU16\n//go:noescape\nfunc VornU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornU32 VornU32\n//go:noescape\nfunc VornU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornU64 VornU64\n//go:noescape\nfunc VornU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornqS8 VornqS8\n//go:noescape\nfunc VornqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornqS16 VornqS16\n//go:noescape\nfunc VornqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornqS32 VornqS32\n//go:noescape\nfunc VornqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Bitwise inclusive OR NOT (vector). 
This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornqS64 VornqS64\n//go:noescape\nfunc VornqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornqU8 VornqU8\n//go:noescape\nfunc VornqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornqU16 VornqU16\n//go:noescape\nfunc VornqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornqU32 VornqU32\n//go:noescape\nfunc VornqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornqU64 VornqU64\n//go:noescape\nfunc VornqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrS8 VorrS8\n//go:noescape\nfunc VorrS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Bitwise inclusive OR (vector, register). 
This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrS16 VorrS16\n//go:noescape\nfunc VorrS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrS32 VorrS32\n//go:noescape\nfunc VorrS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrS64 VorrS64\n//go:noescape\nfunc VorrS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrU8 VorrU8\n//go:noescape\nfunc VorrU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrU16 VorrU16\n//go:noescape\nfunc VorrU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrU32 VorrU32\n//go:noescape\nfunc VorrU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Bitwise inclusive OR (vector, register). 
This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrU64 VorrU64\n//go:noescape\nfunc VorrU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrqS8 VorrqS8\n//go:noescape\nfunc VorrqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrqS16 VorrqS16\n//go:noescape\nfunc VorrqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrqS32 VorrqS32\n//go:noescape\nfunc VorrqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrqS64 VorrqS64\n//go:noescape\nfunc VorrqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrqU8 VorrqU8\n//go:noescape\nfunc VorrqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Bitwise inclusive OR (vector, register). 
This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrqU16 VorrqU16\n//go:noescape\nfunc VorrqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrqU32 VorrqU32\n//go:noescape\nfunc VorrqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrqU64 VorrqU64\n//go:noescape\nfunc VorrqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Signed Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpadalS8 VpadalS8\n//go:noescape\nfunc VpadalS8(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int8X8)\n\n// Signed Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpadalS16 VpadalS16\n//go:noescape\nfunc VpadalS16(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int16X4)\n\n// Signed Add and Accumulate Long Pairwise. 
This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpadalS32 VpadalS32\n//go:noescape\nfunc VpadalS32(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int32X2)\n\n// Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpadalU8 VpadalU8\n//go:noescape\nfunc VpadalU8(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint8X8)\n\n// Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpadalU16 VpadalU16\n//go:noescape\nfunc VpadalU16(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint16X4)\n\n// Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpadalU32 VpadalU32\n//go:noescape\nfunc VpadalU32(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint32X2)\n\n// Signed Add and Accumulate Long Pairwise. 
This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpadalqS8 VpadalqS8\n//go:noescape\nfunc VpadalqS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X16)\n\n// Signed Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpadalqS16 VpadalqS16\n//go:noescape\nfunc VpadalqS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8)\n\n// Signed Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpadalqS32 VpadalqS32\n//go:noescape\nfunc VpadalqS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4)\n\n// Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpadalqU8 VpadalqU8\n//go:noescape\nfunc VpadalqU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X16)\n\n// Unsigned Add and Accumulate Long Pairwise. 
This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpadalqU16 VpadalqU16\n//go:noescape\nfunc VpadalqU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8)\n\n// Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpadalqU32 VpadalqU32\n//go:noescape\nfunc VpadalqU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4)\n\n// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddS8 VpaddS8\n//go:noescape\nfunc VpaddS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddS16 VpaddS16\n//go:noescape\nfunc VpaddS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Add Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddS32 VpaddS32\n//go:noescape\nfunc VpaddS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddU8 VpaddU8\n//go:noescape\nfunc VpaddU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddU16 VpaddU16\n//go:noescape\nfunc VpaddU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Add Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddU32 VpaddU32\n//go:noescape\nfunc VpaddU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpaddF32 VpaddF32\n//go:noescape\nfunc VpaddF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Add Pair of elements (scalar). This instruction adds the two vector elements in the source SIMD&FP register and writes the scalar result into the destination SIMD&FP register.\n//\n//go:linkname VpadddS64 VpadddS64\n//go:noescape\nfunc VpadddS64(r *arm.Int64, v0 *arm.Int64X2)\n\n// Add Pair of elements (scalar). 
This instruction adds the two vector elements in the source SIMD&FP register and writes the scalar result into the destination SIMD&FP register.\n//\n//go:linkname VpadddU64 VpadddU64\n//go:noescape\nfunc VpadddU64(r *arm.Uint64, v0 *arm.Uint64X2)\n\n// Floating-point Add Pair of elements (scalar). This instruction adds the two floating-point vector elements in the source SIMD&FP register and writes the scalar result into the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpadddF64 VpadddF64\n//go:noescape\nfunc VpadddF64(r *arm.Float64, v0 *arm.Float64X2)\n\n// Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpaddlS8 VpaddlS8\n//go:noescape\nfunc VpaddlS8(r *arm.Int16X4, v0 *arm.Int8X8)\n\n// Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpaddlS16 VpaddlS16\n//go:noescape\nfunc VpaddlS16(r *arm.Int32X2, v0 *arm.Int16X4)\n\n// Signed Add Long Pairwise. 
This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpaddlS32 VpaddlS32\n//go:noescape\nfunc VpaddlS32(r *arm.Int64X1, v0 *arm.Int32X2)\n\n// Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpaddlU8 VpaddlU8\n//go:noescape\nfunc VpaddlU8(r *arm.Uint16X4, v0 *arm.Uint8X8)\n\n// Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpaddlU16 VpaddlU16\n//go:noescape\nfunc VpaddlU16(r *arm.Uint32X2, v0 *arm.Uint16X4)\n\n// Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpaddlU32 VpaddlU32\n//go:noescape\nfunc VpaddlU32(r *arm.Uint64X1, v0 *arm.Uint32X2)\n\n// Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. 
The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpaddlqS8 VpaddlqS8\n//go:noescape\nfunc VpaddlqS8(r *arm.Int16X8, v0 *arm.Int8X16)\n\n// Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpaddlqS16 VpaddlqS16\n//go:noescape\nfunc VpaddlqS16(r *arm.Int32X4, v0 *arm.Int16X8)\n\n// Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpaddlqS32 VpaddlqS32\n//go:noescape\nfunc VpaddlqS32(r *arm.Int64X2, v0 *arm.Int32X4)\n\n// Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpaddlqU8 VpaddlqU8\n//go:noescape\nfunc VpaddlqU8(r *arm.Uint16X8, v0 *arm.Uint8X16)\n\n// Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpaddlqU16 VpaddlqU16\n//go:noescape\nfunc VpaddlqU16(r *arm.Uint32X4, v0 *arm.Uint16X8)\n\n// Unsigned Add Long Pairwise. 
This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VpaddlqU32 VpaddlqU32\n//go:noescape\nfunc VpaddlqU32(r *arm.Uint64X2, v0 *arm.Uint32X4)\n\n// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddqS8 VpaddqS8\n//go:noescape\nfunc VpaddqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddqS16 VpaddqS16\n//go:noescape\nfunc VpaddqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddqS32 VpaddqS32\n//go:noescape\nfunc VpaddqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Add Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddqS64 VpaddqS64\n//go:noescape\nfunc VpaddqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddqU8 VpaddqU8\n//go:noescape\nfunc VpaddqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddqU16 VpaddqU16\n//go:noescape\nfunc VpaddqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Add Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddqU32 VpaddqU32\n//go:noescape\nfunc VpaddqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddqU64 VpaddqU64\n//go:noescape\nfunc VpaddqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpaddqF32 VpaddqF32\n//go:noescape\nfunc VpaddqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Add Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpaddqF64 VpaddqF64\n//go:noescape\nfunc VpaddqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point Add Pair of elements (scalar). This instruction adds the two floating-point vector elements in the source SIMD&FP register and writes the scalar result into the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpaddsF32 VpaddsF32\n//go:noescape\nfunc VpaddsF32(r *arm.Float32, v0 *arm.Float32X2)\n\n// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxS8 VpmaxS8\n//go:noescape\nfunc VpmaxS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Signed Maximum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxS16 VpmaxS16\n//go:noescape\nfunc VpmaxS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxS32 VpmaxS32\n//go:noescape\nfunc VpmaxS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxU8 VpmaxU8\n//go:noescape\nfunc VpmaxU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Unsigned Maximum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxU16 VpmaxU16\n//go:noescape\nfunc VpmaxU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxU32 VpmaxU32\n//go:noescape\nfunc VpmaxU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpmaxF32 VpmaxF32\n//go:noescape\nfunc VpmaxF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Maximum Number Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpmaxnmF32 VpmaxnmF32\n//go:noescape\nfunc VpmaxnmF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpmaxnmqF32 VpmaxnmqF32\n//go:noescape\nfunc VpmaxnmqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpmaxnmqF64 VpmaxnmqF64\n//go:noescape\nfunc VpmaxnmqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point Maximum Number of Pair of elements (scalar). 
This instruction writes the larger of the two vector elements in the source SIMD&FP register to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpmaxnmqdF64 VpmaxnmqdF64\n//go:noescape\nfunc VpmaxnmqdF64(r *arm.Float64, v0 *arm.Float64X2)\n\n// Floating-point Maximum Number of Pair of elements (scalar). This instruction writes the larger of the two vector elements in the source SIMD&FP register to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpmaxnmsF32 VpmaxnmsF32\n//go:noescape\nfunc VpmaxnmsF32(r *arm.Float32, v0 *arm.Float32X2)\n\n// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxqS8 VpmaxqS8\n//go:noescape\nfunc VpmaxqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Signed Maximum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxqS16 VpmaxqS16\n//go:noescape\nfunc VpmaxqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxqS32 VpmaxqS32\n//go:noescape\nfunc VpmaxqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxqU8 VpmaxqU8\n//go:noescape\nfunc VpmaxqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Unsigned Maximum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxqU16 VpmaxqU16\n//go:noescape\nfunc VpmaxqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxqU32 VpmaxqU32\n//go:noescape\nfunc VpmaxqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpmaxqF32 VpmaxqF32\n//go:noescape\nfunc VpmaxqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Maximum Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpmaxqF64 VpmaxqF64\n//go:noescape\nfunc VpmaxqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point Maximum of Pair of elements (scalar). This instruction writes the larger of the two vector elements in the source SIMD&FP register to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpmaxqdF64 VpmaxqdF64\n//go:noescape\nfunc VpmaxqdF64(r *arm.Float64, v0 *arm.Float64X2)\n\n// Floating-point Maximum of Pair of elements (scalar). This instruction writes the larger of the two vector elements in the source SIMD&FP register to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpmaxsF32 VpmaxsF32\n//go:noescape\nfunc VpmaxsF32(r *arm.Float32, v0 *arm.Float32X2)\n\n// Signed Minimum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminS8 VpminS8\n//go:noescape\nfunc VpminS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminS16 VpminS16\n//go:noescape\nfunc VpminS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminS32 VpminS32\n//go:noescape\nfunc VpminS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Unsigned Minimum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminU8 VpminU8\n//go:noescape\nfunc VpminU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminU16 VpminU16\n//go:noescape\nfunc VpminU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminU32 VpminU32\n//go:noescape\nfunc VpminU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. 
All the values in this instruction are floating-point values.\n//\n//go:linkname VpminF32 VpminF32\n//go:noescape\nfunc VpminF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpminnmF32 VpminnmF32\n//go:noescape\nfunc VpminnmF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpminnmqF32 VpminnmqF32\n//go:noescape\nfunc VpminnmqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. 
All the values in this instruction are floating-point values.\n//\n//go:linkname VpminnmqF64 VpminnmqF64\n//go:noescape\nfunc VpminnmqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point Minimum Number of Pair of elements (scalar). This instruction writes the smaller of the two vector elements in the source SIMD&FP register to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpminnmqdF64 VpminnmqdF64\n//go:noescape\nfunc VpminnmqdF64(r *arm.Float64, v0 *arm.Float64X2)\n\n// Floating-point Minimum Number of Pair of elements (scalar). This instruction writes the smaller of the two vector elements in the source SIMD&FP register to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpminnmsF32 VpminnmsF32\n//go:noescape\nfunc VpminnmsF32(r *arm.Float32, v0 *arm.Float32X2)\n\n// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminqS8 VpminqS8\n//go:noescape\nfunc VpminqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Signed Minimum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smaller of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminqS16 VpminqS16\n//go:noescape\nfunc VpminqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smaller of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminqS32 VpminqS32\n//go:noescape\nfunc VpminqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smaller of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminqU8 VpminqU8\n//go:noescape\nfunc VpminqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Unsigned Minimum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smaller of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminqU16 VpminqU16\n//go:noescape\nfunc VpminqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smaller of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminqU32 VpminqU32\n//go:noescape\nfunc VpminqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpminqF32 VpminqF32\n//go:noescape\nfunc VpminqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Minimum Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpminqF64 VpminqF64\n//go:noescape\nfunc VpminqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point Minimum of Pair of elements (scalar). This instruction compares the two vector elements in the source SIMD&FP register and writes the smaller of the two values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpminqdF64 VpminqdF64\n//go:noescape\nfunc VpminqdF64(r *arm.Float64, v0 *arm.Float64X2)\n\n// Floating-point Minimum of Pair of elements (scalar). This instruction compares the two vector elements in the source SIMD&FP register and writes the smaller of the two values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpminsF32 VpminsF32\n//go:noescape\nfunc VpminsF32(r *arm.Float32, v0 *arm.Float32X2)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. 
All the values in this instruction are signed integer values. A result that overflows (the absolute value of the most negative input) saturates to the largest positive value.\n//\n//go:linkname VqabsS8 VqabsS8\n//go:noescape\nfunc VqabsS8(r *arm.Int8X8, v0 *arm.Int8X8)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. A result that overflows (the absolute value of the most negative input) saturates to the largest positive value.\n//\n//go:linkname VqabsS16 VqabsS16\n//go:noescape\nfunc VqabsS16(r *arm.Int16X4, v0 *arm.Int16X4)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. A result that overflows (the absolute value of the most negative input) saturates to the largest positive value.\n//\n//go:linkname VqabsS32 VqabsS32\n//go:noescape\nfunc VqabsS32(r *arm.Int32X2, v0 *arm.Int32X2)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. A result that overflows (the absolute value of the most negative input) saturates to the largest positive value.\n//\n//go:linkname VqabsS64 VqabsS64\n//go:noescape\nfunc VqabsS64(r *arm.Int64X1, v0 *arm.Int64X1)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. A result that overflows (the absolute value of the most negative input) saturates to the largest positive value.\n//\n//go:linkname VqabsbS8 VqabsbS8\n//go:noescape\nfunc VqabsbS8(r *arm.Int8, v0 *arm.Int8)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. 
All the values in this instruction are signed integer values. A result that overflows (the absolute value of the most negative input) saturates to the largest positive value.\n//\n//go:linkname VqabsdS64 VqabsdS64\n//go:noescape\nfunc VqabsdS64(r *arm.Int64, v0 *arm.Int64)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. A result that overflows (the absolute value of the most negative input) saturates to the largest positive value.\n//\n//go:linkname VqabshS16 VqabshS16\n//go:noescape\nfunc VqabshS16(r *arm.Int16, v0 *arm.Int16)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. A result that overflows (the absolute value of the most negative input) saturates to the largest positive value.\n//\n//go:linkname VqabsqS8 VqabsqS8\n//go:noescape\nfunc VqabsqS8(r *arm.Int8X16, v0 *arm.Int8X16)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. A result that overflows (the absolute value of the most negative input) saturates to the largest positive value.\n//\n//go:linkname VqabsqS16 VqabsqS16\n//go:noescape\nfunc VqabsqS16(r *arm.Int16X8, v0 *arm.Int16X8)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. A result that overflows (the absolute value of the most negative input) saturates to the largest positive value.\n//\n//go:linkname VqabsqS32 VqabsqS32\n//go:noescape\nfunc VqabsqS32(r *arm.Int32X4, v0 *arm.Int32X4)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. 
All the values in this instruction are signed integer values. A result that overflows (the absolute value of the most negative input) saturates to the largest positive value.\n//\n//go:linkname VqabsqS64 VqabsqS64\n//go:noescape\nfunc VqabsqS64(r *arm.Int64X2, v0 *arm.Int64X2)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. A result that overflows (the absolute value of the most negative input) saturates to the largest positive value.\n//\n//go:linkname VqabssS32 VqabssS32\n//go:noescape\nfunc VqabssS32(r *arm.Int32, v0 *arm.Int32)\n\n// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddS8 VqaddS8\n//go:noescape\nfunc VqaddS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddS16 VqaddS16\n//go:noescape\nfunc VqaddS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddS32 VqaddS32\n//go:noescape\nfunc VqaddS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddS64 VqaddS64\n//go:noescape\nfunc VqaddS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)\n\n// Unsigned saturating Add. 
This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddU8 VqaddU8\n//go:noescape\nfunc VqaddU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddU16 VqaddU16\n//go:noescape\nfunc VqaddU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddU32 VqaddU32\n//go:noescape\nfunc VqaddU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddU64 VqaddU64\n//go:noescape\nfunc VqaddU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)\n\n// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddbS8 VqaddbS8\n//go:noescape\nfunc VqaddbS8(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8)\n\n// Unsigned saturating Add. 
This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddbU8 VqaddbU8\n//go:noescape\nfunc VqaddbU8(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8)\n\n// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqadddS64 VqadddS64\n//go:noescape\nfunc VqadddS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64)\n\n// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqadddU64 VqadddU64\n//go:noescape\nfunc VqadddU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)\n\n// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddhS16 VqaddhS16\n//go:noescape\nfunc VqaddhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16)\n\n// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddhU16 VqaddhU16\n//go:noescape\nfunc VqaddhU16(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16)\n\n// Signed saturating Add. 
This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddqS8 VqaddqS8\n//go:noescape\nfunc VqaddqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddqS16 VqaddqS16\n//go:noescape\nfunc VqaddqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddqS32 VqaddqS32\n//go:noescape\nfunc VqaddqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddqS64 VqaddqS64\n//go:noescape\nfunc VqaddqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddqU8 VqaddqU8\n//go:noescape\nfunc VqaddqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Unsigned saturating Add. 
This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddqU16 VqaddqU16\n//go:noescape\nfunc VqaddqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddqU32 VqaddqU32\n//go:noescape\nfunc VqaddqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddqU64 VqaddqU64\n//go:noescape\nfunc VqaddqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddsS32 VqaddsS32\n//go:noescape\nfunc VqaddsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32)\n\n// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddsU32 VqaddsU32\n//go:noescape\nfunc VqaddsU32(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32)\n\n// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. 
The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VqdmlalS16 VqdmlalS16\n//go:noescape\nfunc VqdmlalS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16X4)\n\n// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VqdmlalS32 VqdmlalS32\n//go:noescape\nfunc VqdmlalS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32X2)\n\n// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VqdmlalHighS16 VqdmlalHighS16\n//go:noescape\nfunc VqdmlalHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16X8)\n\n// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VqdmlalHighS32 VqdmlalHighS32\n//go:noescape\nfunc VqdmlalHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32X4)\n\n// Signed saturating Doubling Multiply-Add Long. 
This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VqdmlalHighNS16 VqdmlalHighNS16\n//go:noescape\nfunc VqdmlalHighNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16)\n\n// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VqdmlalHighNS32 VqdmlalHighNS32\n//go:noescape\nfunc VqdmlalHighNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32)\n\n// Vector widening saturating doubling multiply accumulate with scalar\n//\n//go:linkname VqdmlalNS16 VqdmlalNS16\n//go:noescape\nfunc VqdmlalNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16)\n\n// Vector widening saturating doubling multiply accumulate with scalar\n//\n//go:linkname VqdmlalNS32 VqdmlalNS32\n//go:noescape\nfunc VqdmlalNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32)\n\n// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. 
The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VqdmlalhS16 VqdmlalhS16\n//go:noescape\nfunc VqdmlalhS16(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int16, v2 *arm.Int16)\n\n// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VqdmlalsS32 VqdmlalsS32\n//go:noescape\nfunc VqdmlalsS32(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int32, v2 *arm.Int32)\n\n// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VqdmlslS16 VqdmlslS16\n//go:noescape\nfunc VqdmlslS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16X4)\n\n// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VqdmlslS32 VqdmlslS32\n//go:noescape\nfunc VqdmlslS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32X2)\n\n// Signed saturating Doubling Multiply-Subtract Long. 
This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VqdmlslHighS16 VqdmlslHighS16\n//go:noescape\nfunc VqdmlslHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16X8)\n\n// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VqdmlslHighS32 VqdmlslHighS32\n//go:noescape\nfunc VqdmlslHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32X4)\n\n// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VqdmlslHighNS16 VqdmlslHighNS16\n//go:noescape\nfunc VqdmlslHighNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16)\n\n// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. 
The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VqdmlslHighNS32 VqdmlslHighNS32\n//go:noescape\nfunc VqdmlslHighNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32)\n\n// Vector widening saturating doubling multiply subtract with scalar\n//\n//go:linkname VqdmlslNS16 VqdmlslNS16\n//go:noescape\nfunc VqdmlslNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16)\n\n// Vector widening saturating doubling multiply subtract with scalar\n//\n//go:linkname VqdmlslNS32 VqdmlslNS32\n//go:noescape\nfunc VqdmlslNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32)\n\n// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VqdmlslhS16 VqdmlslhS16\n//go:noescape\nfunc VqdmlslhS16(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int16, v2 *arm.Int16)\n\n// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.\n//\n//go:linkname VqdmlslsS32 VqdmlslsS32\n//go:noescape\nfunc VqdmlslsS32(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int32, v2 *arm.Int32)\n\n// Signed saturating Doubling Multiply returning High half. 
This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqdmulhS16 VqdmulhS16\n//go:noescape\nfunc VqdmulhS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqdmulhS32 VqdmulhS32\n//go:noescape\nfunc VqdmulhS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Vector saturating doubling multiply high with scalar\n//\n//go:linkname VqdmulhNS16 VqdmulhNS16\n//go:noescape\nfunc VqdmulhNS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16)\n\n// Vector saturating doubling multiply high with scalar\n//\n//go:linkname VqdmulhNS32 VqdmulhNS32\n//go:noescape\nfunc VqdmulhNS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32)\n\n// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqdmulhhS16 VqdmulhhS16\n//go:noescape\nfunc VqdmulhhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16)\n\n// Signed saturating Doubling Multiply returning High half. 
This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqdmulhqS16 VqdmulhqS16\n//go:noescape\nfunc VqdmulhqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqdmulhqS32 VqdmulhqS32\n//go:noescape\nfunc VqdmulhqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Vector saturating doubling multiply high with scalar\n//\n//go:linkname VqdmulhqNS16 VqdmulhqNS16\n//go:noescape\nfunc VqdmulhqNS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16)\n\n// Vector saturating doubling multiply high with scalar\n//\n//go:linkname VqdmulhqNS32 VqdmulhqNS32\n//go:noescape\nfunc VqdmulhqNS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32)\n\n// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqdmulhsS32 VqdmulhsS32\n//go:noescape\nfunc VqdmulhsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32)\n\n// Signed saturating Doubling Multiply Long. 
This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqdmullS16 VqdmullS16\n//go:noescape\nfunc VqdmullS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqdmullS32 VqdmullS32\n//go:noescape\nfunc VqdmullS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqdmullHighS16 VqdmullHighS16\n//go:noescape\nfunc VqdmullHighS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqdmullHighS32 VqdmullHighS32\n//go:noescape\nfunc VqdmullHighS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Signed saturating Doubling Multiply Long. 
This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqdmullHighNS16 VqdmullHighNS16\n//go:noescape\nfunc VqdmullHighNS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16)\n\n// Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqdmullHighNS32 VqdmullHighNS32\n//go:noescape\nfunc VqdmullHighNS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32)\n\n// Vector saturating doubling long multiply with scalar\n//\n//go:linkname VqdmullNS16 VqdmullNS16\n//go:noescape\nfunc VqdmullNS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16)\n\n// Vector saturating doubling long multiply with scalar\n//\n//go:linkname VqdmullNS32 VqdmullNS32\n//go:noescape\nfunc VqdmullNS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32)\n\n// Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqdmullhS16 VqdmullhS16\n//go:noescape\nfunc VqdmullhS16(r *arm.Int32, v0 *arm.Int16, v1 *arm.Int16)\n\n// Signed saturating Doubling Multiply Long. 
This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqdmullsS32 VqdmullsS32\n//go:noescape\nfunc VqdmullsS32(r *arm.Int64, v0 *arm.Int32, v1 *arm.Int32)\n\n// Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.\n//\n//go:linkname VqmovnS16 VqmovnS16\n//go:noescape\nfunc VqmovnS16(r *arm.Int8X8, v0 *arm.Int16X8)\n\n// Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.\n//\n//go:linkname VqmovnS32 VqmovnS32\n//go:noescape\nfunc VqmovnS32(r *arm.Int16X4, v0 *arm.Int32X4)\n\n// Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.\n//\n//go:linkname VqmovnS64 VqmovnS64\n//go:noescape\nfunc VqmovnS64(r *arm.Int32X2, v0 *arm.Int64X2)\n\n// Unsigned saturating extract Narrow. 
This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VqmovnU16 VqmovnU16\n//go:noescape\nfunc VqmovnU16(r *arm.Uint8X8, v0 *arm.Uint16X8)\n\n// Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VqmovnU32 VqmovnU32\n//go:noescape\nfunc VqmovnU32(r *arm.Uint16X4, v0 *arm.Uint32X4)\n\n// Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VqmovnU64 VqmovnU64\n//go:noescape\nfunc VqmovnU64(r *arm.Uint32X2, v0 *arm.Uint64X2)\n\n// Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.\n//\n//go:linkname VqmovnHighS16 VqmovnHighS16\n//go:noescape\nfunc VqmovnHighS16(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int16X8)\n\n// Signed saturating extract Narrow. 
This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.\n//\n//go:linkname VqmovnHighS32 VqmovnHighS32\n//go:noescape\nfunc VqmovnHighS32(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int32X4)\n\n// Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.\n//\n//go:linkname VqmovnHighS64 VqmovnHighS64\n//go:noescape\nfunc VqmovnHighS64(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int64X2)\n\n// Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VqmovnHighU16 VqmovnHighU16\n//go:noescape\nfunc VqmovnHighU16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint16X8)\n\n// Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. 
All the values in this instruction are unsigned integer values.\n//\n//go:linkname VqmovnHighU32 VqmovnHighU32\n//go:noescape\nfunc VqmovnHighU32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint32X4)\n\n// Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VqmovnHighU64 VqmovnHighU64\n//go:noescape\nfunc VqmovnHighU64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint64X2)\n\n// Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.\n//\n//go:linkname VqmovndS64 VqmovndS64\n//go:noescape\nfunc VqmovndS64(r *arm.Int32, v0 *arm.Int64)\n\n// Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VqmovndU64 VqmovndU64\n//go:noescape\nfunc VqmovndU64(r *arm.Uint32, v0 *arm.Uint64)\n\n// Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. 
All the values in this instruction are signed integer values.\n//\n//go:linkname VqmovnhS16 VqmovnhS16\n//go:noescape\nfunc VqmovnhS16(r *arm.Int8, v0 *arm.Int16)\n\n// Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VqmovnhU16 VqmovnhU16\n//go:noescape\nfunc VqmovnhU16(r *arm.Uint8, v0 *arm.Uint16)\n\n// Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.\n//\n//go:linkname VqmovnsS32 VqmovnsS32\n//go:noescape\nfunc VqmovnsS32(r *arm.Int16, v0 *arm.Int32)\n\n// Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VqmovnsU32 VqmovnsU32\n//go:noescape\nfunc VqmovnsU32(r *arm.Uint16, v0 *arm.Uint32)\n\n// Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. 
The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VqmovunS16 VqmovunS16\n//go:noescape\nfunc VqmovunS16(r *arm.Uint8X8, v0 *arm.Int16X8)\n\n// Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VqmovunS32 VqmovunS32\n//go:noescape\nfunc VqmovunS32(r *arm.Uint16X4, v0 *arm.Int32X4)\n\n// Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VqmovunS64 VqmovunS64\n//go:noescape\nfunc VqmovunS64(r *arm.Uint32X2, v0 *arm.Int64X2)\n\n// Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VqmovunHighS16 VqmovunHighS16\n//go:noescape\nfunc VqmovunHighS16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Int16X8)\n\n// Signed saturating extract Unsigned Narrow. 
This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VqmovunHighS32 VqmovunHighS32\n//go:noescape\nfunc VqmovunHighS32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Int32X4)\n\n// Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VqmovunHighS64 VqmovunHighS64\n//go:noescape\nfunc VqmovunHighS64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Int64X2)\n\n// Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VqmovundS64 VqmovundS64\n//go:noescape\nfunc VqmovundS64(r *arm.Uint32, v0 *arm.Int64)\n\n// Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. 
The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VqmovunhS16 VqmovunhS16\n//go:noescape\nfunc VqmovunhS16(r *arm.Uint8, v0 *arm.Int16)\n\n// Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.\n//\n//go:linkname VqmovunsS32 VqmovunsS32\n//go:noescape\nfunc VqmovunsS32(r *arm.Uint16, v0 *arm.Int32)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegS8 VqnegS8\n//go:noescape\nfunc VqnegS8(r *arm.Int8X8, v0 *arm.Int8X8)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegS16 VqnegS16\n//go:noescape\nfunc VqnegS16(r *arm.Int16X4, v0 *arm.Int16X4)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegS32 VqnegS32\n//go:noescape\nfunc VqnegS32(r *arm.Int32X2, v0 *arm.Int32X2)\n\n// Signed saturating Negate. 
This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegS64 VqnegS64\n//go:noescape\nfunc VqnegS64(r *arm.Int64X1, v0 *arm.Int64X1)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegbS8 VqnegbS8\n//go:noescape\nfunc VqnegbS8(r *arm.Int8, v0 *arm.Int8)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegdS64 VqnegdS64\n//go:noescape\nfunc VqnegdS64(r *arm.Int64, v0 *arm.Int64)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqneghS16 VqneghS16\n//go:noescape\nfunc VqneghS16(r *arm.Int16, v0 *arm.Int16)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegqS8 VqnegqS8\n//go:noescape\nfunc VqnegqS8(r *arm.Int8X16, v0 *arm.Int8X16)\n\n// Signed saturating Negate. 
This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegqS16 VqnegqS16\n//go:noescape\nfunc VqnegqS16(r *arm.Int16X8, v0 *arm.Int16X8)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegqS32 VqnegqS32\n//go:noescape\nfunc VqnegqS32(r *arm.Int32X4, v0 *arm.Int32X4)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegqS64 VqnegqS64\n//go:noescape\nfunc VqnegqS64(r *arm.Int64X2, v0 *arm.Int64X2)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegsS32 VqnegsS32\n//go:noescape\nfunc VqnegsS32(r *arm.Int32, v0 *arm.Int32)\n\n// Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD&FP register. 
The results are rounded.\n//\n//go:linkname VqrdmlahS16 VqrdmlahS16\n//go:noescape\nfunc VqrdmlahS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16X4)\n\n// Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD&FP register. The results are rounded.\n//\n//go:linkname VqrdmlahS32 VqrdmlahS32\n//go:noescape\nfunc VqrdmlahS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32X2)\n\n// Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD&FP register. The results are rounded.\n//\n//go:linkname VqrdmlahhS16 VqrdmlahhS16\n//go:noescape\nfunc VqrdmlahhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, v2 *arm.Int16)\n\n// Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD&FP register. 
The results are rounded.\n//\n//go:linkname VqrdmlahqS16 VqrdmlahqS16\n//go:noescape\nfunc VqrdmlahqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8)\n\n// Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD&FP register. The results are rounded.\n//\n//go:linkname VqrdmlahqS32 VqrdmlahqS32\n//go:noescape\nfunc VqrdmlahqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32X4)\n\n// Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD&FP register. The results are rounded.\n//\n//go:linkname VqrdmlahsS32 VqrdmlahsS32\n//go:noescape\nfunc VqrdmlahsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, v2 *arm.Int32)\n\n// Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD&FP register. 
The results are rounded.\n//\n//go:linkname VqrdmlshS16 VqrdmlshS16\n//go:noescape\nfunc VqrdmlshS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16X4)\n\n// Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD&FP register. The results are rounded.\n//\n//go:linkname VqrdmlshS32 VqrdmlshS32\n//go:noescape\nfunc VqrdmlshS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32X2)\n\n// Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD&FP register. The results are rounded.\n//\n//go:linkname VqrdmlshhS16 VqrdmlshhS16\n//go:noescape\nfunc VqrdmlshhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, v2 *arm.Int16)\n\n// Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD&FP register. 
The results are rounded.\n//\n//go:linkname VqrdmlshqS16 VqrdmlshqS16\n//go:noescape\nfunc VqrdmlshqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8)\n\n// Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD&FP register. The results are rounded.\n//\n//go:linkname VqrdmlshqS32 VqrdmlshqS32\n//go:noescape\nfunc VqrdmlshqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32X4)\n\n// Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD&FP register. The results are rounded.\n//\n//go:linkname VqrdmlshsS32 VqrdmlshsS32\n//go:noescape\nfunc VqrdmlshsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, v2 *arm.Int32)\n\n// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded.\n//\n//go:linkname VqrdmulhS16 VqrdmulhS16\n//go:noescape\nfunc VqrdmulhS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed saturating Rounding Doubling Multiply returning High half. 
This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded.\n//\n//go:linkname VqrdmulhS32 VqrdmulhS32\n//go:noescape\nfunc VqrdmulhS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Vector saturating rounding doubling multiply high with scalar\n//\n//go:linkname VqrdmulhNS16 VqrdmulhNS16\n//go:noescape\nfunc VqrdmulhNS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16)\n\n// Vector saturating rounding doubling multiply high with scalar\n//\n//go:linkname VqrdmulhNS32 VqrdmulhNS32\n//go:noescape\nfunc VqrdmulhNS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32)\n\n// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded.\n//\n//go:linkname VqrdmulhhS16 VqrdmulhhS16\n//go:noescape\nfunc VqrdmulhhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16)\n\n// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded.\n//\n//go:linkname VqrdmulhqS16 VqrdmulhqS16\n//go:noescape\nfunc VqrdmulhqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed saturating Rounding Doubling Multiply returning High half. 
This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded.\n//\n//go:linkname VqrdmulhqS32 VqrdmulhqS32\n//go:noescape\nfunc VqrdmulhqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Vector saturating rounding doubling multiply high with scalar\n//\n//go:linkname VqrdmulhqNS16 VqrdmulhqNS16\n//go:noescape\nfunc VqrdmulhqNS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16)\n\n// Vector saturating rounding doubling multiply high with scalar\n//\n//go:linkname VqrdmulhqNS32 VqrdmulhqNS32\n//go:noescape\nfunc VqrdmulhqNS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32)\n\n// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded.\n//\n//go:linkname VqrdmulhsS32 VqrdmulhsS32\n//go:noescape\nfunc VqrdmulhsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32)\n\n// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlS8 VqrshlS8\n//go:noescape\nfunc VqrshlS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Signed saturating Rounding Shift Left (register). 
This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlS16 VqrshlS16\n//go:noescape\nfunc VqrshlS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlS32 VqrshlS32\n//go:noescape\nfunc VqrshlS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlS64 VqrshlS64\n//go:noescape\nfunc VqrshlS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)\n\n// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlU8 VqrshlU8\n//go:noescape\nfunc VqrshlU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Int8X8)\n\n// Unsigned saturating Rounding Shift Left (register). 
This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlU16 VqrshlU16\n//go:noescape\nfunc VqrshlU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Int16X4)\n\n// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlU32 VqrshlU32\n//go:noescape\nfunc VqrshlU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Int32X2)\n\n// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlU64 VqrshlU64\n//go:noescape\nfunc VqrshlU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Int64X1)\n\n// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlbS8 VqrshlbS8\n//go:noescape\nfunc VqrshlbS8(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8)\n\n// Unsigned saturating Rounding Shift Left (register). 
This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlbU8 VqrshlbU8\n//go:noescape\nfunc VqrshlbU8(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Int8)\n\n// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshldS64 VqrshldS64\n//go:noescape\nfunc VqrshldS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64)\n\n// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshldU64 VqrshldU64\n//go:noescape\nfunc VqrshldU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Int64)\n\n// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlhS16 VqrshlhS16\n//go:noescape\nfunc VqrshlhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16)\n\n// Unsigned saturating Rounding Shift Left (register). 
This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlhU16 VqrshlhU16\n//go:noescape\nfunc VqrshlhU16(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Int16)\n\n// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlqS8 VqrshlqS8\n//go:noescape\nfunc VqrshlqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlqS16 VqrshlqS16\n//go:noescape\nfunc VqrshlqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlqS32 VqrshlqS32\n//go:noescape\nfunc VqrshlqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Signed saturating Rounding Shift Left (register). 
This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlqS64 VqrshlqS64\n//go:noescape\nfunc VqrshlqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlqU8 VqrshlqU8\n//go:noescape\nfunc VqrshlqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Int8X16)\n\n// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlqU16 VqrshlqU16\n//go:noescape\nfunc VqrshlqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Int16X8)\n\n// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlqU32 VqrshlqU32\n//go:noescape\nfunc VqrshlqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Int32X4)\n\n// Unsigned saturating Rounding Shift Left (register). 
This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlqU64 VqrshlqU64\n//go:noescape\nfunc VqrshlqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Int64X2)\n\n// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlsS32 VqrshlsS32\n//go:noescape\nfunc VqrshlsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32)\n\n// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlsU32 VqrshlsU32\n//go:noescape\nfunc VqrshlsU32(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Int32)\n\n// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlS8 VqshlS8\n//go:noescape\nfunc VqshlS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Signed saturating Shift Left (register). 
This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlS16 VqshlS16\n//go:noescape\nfunc VqshlS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlS32 VqshlS32\n//go:noescape\nfunc VqshlS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlS64 VqshlS64\n//go:noescape\nfunc VqshlS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)\n\n// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlU8 VqshlU8\n//go:noescape\nfunc VqshlU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Int8X8)\n\n// Unsigned saturating Shift Left (register). 
This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlU16 VqshlU16\n//go:noescape\nfunc VqshlU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Int16X4)\n\n// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlU32 VqshlU32\n//go:noescape\nfunc VqshlU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Int32X2)\n\n// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlU64 VqshlU64\n//go:noescape\nfunc VqshlU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Int64X1)\n\n// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlbS8 VqshlbS8\n//go:noescape\nfunc VqshlbS8(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8)\n\n// Unsigned saturating Shift Left (register). 
This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlbU8 VqshlbU8\n//go:noescape\nfunc VqshlbU8(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Int8)\n\n// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshldS64 VqshldS64\n//go:noescape\nfunc VqshldS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64)\n\n// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshldU64 VqshldU64\n//go:noescape\nfunc VqshldU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Int64)\n\n// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlhS16 VqshlhS16\n//go:noescape\nfunc VqshlhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16)\n\n// Unsigned saturating Shift Left (register). 
This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlhU16 VqshlhU16\n//go:noescape\nfunc VqshlhU16(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Int16)\n\n// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlqS8 VqshlqS8\n//go:noescape\nfunc VqshlqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlqS16 VqshlqS16\n//go:noescape\nfunc VqshlqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlqS32 VqshlqS32\n//go:noescape\nfunc VqshlqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Signed saturating Shift Left (register). 
This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlqS64 VqshlqS64\n//go:noescape\nfunc VqshlqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlqU8 VqshlqU8\n//go:noescape\nfunc VqshlqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Int8X16)\n\n// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlqU16 VqshlqU16\n//go:noescape\nfunc VqshlqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Int16X8)\n\n// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlqU32 VqshlqU32\n//go:noescape\nfunc VqshlqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Int32X4)\n\n// Unsigned saturating Shift Left (register). 
This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlqU64 VqshlqU64\n//go:noescape\nfunc VqshlqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Int64X2)\n\n// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlsS32 VqshlsS32\n//go:noescape\nfunc VqshlsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32)\n\n// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlsU32 VqshlsU32\n//go:noescape\nfunc VqshlsU32(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Int32)\n\n// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubS8 VqsubS8\n//go:noescape\nfunc VqsubS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Signed saturating Subtract. 
This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubS16 VqsubS16\n//go:noescape\nfunc VqsubS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubS32 VqsubS32\n//go:noescape\nfunc VqsubS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubS64 VqsubS64\n//go:noescape\nfunc VqsubS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)\n\n// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubU8 VqsubU8\n//go:noescape\nfunc VqsubU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Unsigned saturating Subtract. 
This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubU16 VqsubU16\n//go:noescape\nfunc VqsubU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubU32 VqsubU32\n//go:noescape\nfunc VqsubU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubU64 VqsubU64\n//go:noescape\nfunc VqsubU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)\n\n// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubbS8 VqsubbS8\n//go:noescape\nfunc VqsubbS8(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8)\n\n// Unsigned saturating Subtract. 
This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubbU8 VqsubbU8\n//go:noescape\nfunc VqsubbU8(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8)\n\n// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubdS64 VqsubdS64\n//go:noescape\nfunc VqsubdS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64)\n\n// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubdU64 VqsubdU64\n//go:noescape\nfunc VqsubdU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)\n\n// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubhS16 VqsubhS16\n//go:noescape\nfunc VqsubhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16)\n\n// Unsigned saturating Subtract. 
This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubhU16 VqsubhU16\n//go:noescape\nfunc VqsubhU16(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16)\n\n// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubqS8 VqsubqS8\n//go:noescape\nfunc VqsubqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubqS16 VqsubqS16\n//go:noescape\nfunc VqsubqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubqS32 VqsubqS32\n//go:noescape\nfunc VqsubqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Signed saturating Subtract. 
This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubqS64 VqsubqS64\n//go:noescape\nfunc VqsubqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubqU8 VqsubqU8\n//go:noescape\nfunc VqsubqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubqU16 VqsubqU16\n//go:noescape\nfunc VqsubqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubqU32 VqsubqU32\n//go:noescape\nfunc VqsubqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Unsigned saturating Subtract. 
This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubqU64 VqsubqU64\n//go:noescape\nfunc VqsubqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubsS32 VqsubsS32\n//go:noescape\nfunc VqsubsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32)\n\n// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubsU32 VqsubsU32\n//go:noescape\nfunc VqsubsU32(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl1S8 Vqtbl1S8\n//go:noescape\nfunc Vqtbl1S8(r *arm.Int8X8, v0 *arm.Int8X16, v1 *arm.Uint8X8)\n\n// Table vector Lookup. 
This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl1U8 Vqtbl1U8\n//go:noescape\nfunc Vqtbl1U8(r *arm.Uint8X8, v0 *arm.Uint8X16, v1 *arm.Uint8X8)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl1P8 Vqtbl1P8\n//go:noescape\nfunc Vqtbl1P8(r *arm.Poly8X8, v0 *arm.Poly8X16, v1 *arm.Uint8X8)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. 
If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl1QS8 Vqtbl1QS8\n//go:noescape\nfunc Vqtbl1QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Uint8X16)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl1QU8 Vqtbl1QU8\n//go:noescape\nfunc Vqtbl1QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl1QP8 Vqtbl1QP8\n//go:noescape\nfunc Vqtbl1QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Uint8X16)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. 
If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl2S8 Vqtbl2S8\n//go:noescape\nfunc Vqtbl2S8(r *arm.Int8X8, v0 *arm.Int8X16X2, v1 *arm.Uint8X8)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl2U8 Vqtbl2U8\n//go:noescape\nfunc Vqtbl2U8(r *arm.Uint8X8, v0 *arm.Uint8X16X2, v1 *arm.Uint8X8)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl2P8 Vqtbl2P8\n//go:noescape\nfunc Vqtbl2P8(r *arm.Poly8X8, v0 *arm.Poly8X16X2, v1 *arm.Uint8X8)\n\n// Table vector Lookup. 
This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl2QS8 Vqtbl2QS8\n//go:noescape\nfunc Vqtbl2QS8(r *arm.Int8X16, v0 *arm.Int8X16X2, v1 *arm.Uint8X16)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl2QU8 Vqtbl2QU8\n//go:noescape\nfunc Vqtbl2QU8(r *arm.Uint8X16, v0 *arm.Uint8X16X2, v1 *arm.Uint8X16)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. 
If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl2QP8 Vqtbl2QP8\n//go:noescape\nfunc Vqtbl2QP8(r *arm.Poly8X16, v0 *arm.Poly8X16X2, v1 *arm.Uint8X16)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl3S8 Vqtbl3S8\n//go:noescape\nfunc Vqtbl3S8(r *arm.Int8X8, v0 *arm.Int8X16X3, v1 *arm.Uint8X8)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl3U8 Vqtbl3U8\n//go:noescape\nfunc Vqtbl3U8(r *arm.Uint8X8, v0 *arm.Uint8X16X3, v1 *arm.Uint8X8)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. 
If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl3P8 Vqtbl3P8\n//go:noescape\nfunc Vqtbl3P8(r *arm.Poly8X8, v0 *arm.Poly8X16X3, v1 *arm.Uint8X8)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl3QS8 Vqtbl3QS8\n//go:noescape\nfunc Vqtbl3QS8(r *arm.Int8X16, v0 *arm.Int8X16X3, v1 *arm.Uint8X16)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl3QU8 Vqtbl3QU8\n//go:noescape\nfunc Vqtbl3QU8(r *arm.Uint8X16, v0 *arm.Uint8X16X3, v1 *arm.Uint8X16)\n\n// Table vector Lookup. 
This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl3QP8 Vqtbl3QP8\n//go:noescape\nfunc Vqtbl3QP8(r *arm.Poly8X16, v0 *arm.Poly8X16X3, v1 *arm.Uint8X16)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl4S8 Vqtbl4S8\n//go:noescape\nfunc Vqtbl4S8(r *arm.Int8X8, v0 *arm.Int8X16X4, v1 *arm.Uint8X8)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. 
If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl4U8 Vqtbl4U8\n//go:noescape\nfunc Vqtbl4U8(r *arm.Uint8X8, v0 *arm.Uint8X16X4, v1 *arm.Uint8X8)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl4P8 Vqtbl4P8\n//go:noescape\nfunc Vqtbl4P8(r *arm.Poly8X8, v0 *arm.Poly8X16X4, v1 *arm.Uint8X8)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl4QS8 Vqtbl4QS8\n//go:noescape\nfunc Vqtbl4QS8(r *arm.Int8X16, v0 *arm.Int8X16X4, v1 *arm.Uint8X16)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. 
If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl4QU8 Vqtbl4QU8\n//go:noescape\nfunc Vqtbl4QU8(r *arm.Uint8X16, v0 *arm.Uint8X16X4, v1 *arm.Uint8X16)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl4QP8 Vqtbl4QP8\n//go:noescape\nfunc Vqtbl4QP8(r *arm.Poly8X16, v0 *arm.Poly8X16X4, v1 *arm.Uint8X16)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx1S8 Vqtbx1S8\n//go:noescape\nfunc Vqtbx1S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X16, v2 *arm.Uint8X8)\n\n// Table vector lookup extension. 
This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx1U8 Vqtbx1U8\n//go:noescape\nfunc Vqtbx1U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X16, v2 *arm.Uint8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx1P8 Vqtbx1P8\n//go:noescape\nfunc Vqtbx1P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X16, v2 *arm.Uint8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. 
If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx1QS8 Vqtbx1QS8\n//go:noescape\nfunc Vqtbx1QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16, v2 *arm.Uint8X16)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx1QU8 Vqtbx1QU8\n//go:noescape\nfunc Vqtbx1QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx1QP8 Vqtbx1QP8\n//go:noescape\nfunc Vqtbx1QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16, v2 *arm.Uint8X16)\n\n// Table vector lookup extension. 
This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx2S8 Vqtbx2S8\n//go:noescape\nfunc Vqtbx2S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X16X2, v2 *arm.Uint8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx2U8 Vqtbx2U8\n//go:noescape\nfunc Vqtbx2U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X16X2, v2 *arm.Uint8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. 
If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx2P8 Vqtbx2P8\n//go:noescape\nfunc Vqtbx2P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X16X2, v2 *arm.Uint8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx2QS8 Vqtbx2QS8\n//go:noescape\nfunc Vqtbx2QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16X2, v2 *arm.Uint8X16)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx2QU8 Vqtbx2QU8\n//go:noescape\nfunc Vqtbx2QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16X2, v2 *arm.Uint8X16)\n\n// Table vector lookup extension. 
This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx2QP8 Vqtbx2QP8\n//go:noescape\nfunc Vqtbx2QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16X2, v2 *arm.Uint8X16)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx3S8 Vqtbx3S8\n//go:noescape\nfunc Vqtbx3S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X16X3, v2 *arm.Uint8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. 
If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx3U8 Vqtbx3U8\n//go:noescape\nfunc Vqtbx3U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X16X3, v2 *arm.Uint8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx3P8 Vqtbx3P8\n//go:noescape\nfunc Vqtbx3P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X16X3, v2 *arm.Uint8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx3QS8 Vqtbx3QS8\n//go:noescape\nfunc Vqtbx3QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16X3, v2 *arm.Uint8X16)\n\n// Table vector lookup extension. 
This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx3QU8 Vqtbx3QU8\n//go:noescape\nfunc Vqtbx3QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16X3, v2 *arm.Uint8X16)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx3QP8 Vqtbx3QP8\n//go:noescape\nfunc Vqtbx3QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16X3, v2 *arm.Uint8X16)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. 
If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx4S8 Vqtbx4S8\n//go:noescape\nfunc Vqtbx4S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X16X4, v2 *arm.Uint8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx4U8 Vqtbx4U8\n//go:noescape\nfunc Vqtbx4U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X16X4, v2 *arm.Uint8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx4P8 Vqtbx4P8\n//go:noescape\nfunc Vqtbx4P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X16X4, v2 *arm.Uint8X8)\n\n// Table vector lookup extension. 
This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx4QS8 Vqtbx4QS8\n//go:noescape\nfunc Vqtbx4QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16X4, v2 *arm.Uint8X16)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx4QU8 Vqtbx4QU8\n//go:noescape\nfunc Vqtbx4QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16X4, v2 *arm.Uint8X16)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. 
If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbx4QP8 Vqtbx4QP8\n//go:noescape\nfunc Vqtbx4QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16X4, v2 *arm.Uint8X16)\n\n// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VraddhnS16 VraddhnS16\n//go:noescape\nfunc VraddhnS16(r *arm.Int8X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VraddhnS32 VraddhnS32\n//go:noescape\nfunc VraddhnS32(r *arm.Int16X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VraddhnS64 VraddhnS64\n//go:noescape\nfunc VraddhnS64(r *arm.Int32X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Rounding Add returning High Narrow. 
This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VraddhnU16 VraddhnU16\n//go:noescape\nfunc VraddhnU16(r *arm.Uint8X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VraddhnU32 VraddhnU32\n//go:noescape\nfunc VraddhnU32(r *arm.Uint16X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VraddhnU64 VraddhnU64\n//go:noescape\nfunc VraddhnU64(r *arm.Uint32X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VraddhnHighS16 VraddhnHighS16\n//go:noescape\nfunc VraddhnHighS16(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int16X8, v2 *arm.Int16X8)\n\n// Rounding Add returning High Narrow. 
This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VraddhnHighS32 VraddhnHighS32\n//go:noescape\nfunc VraddhnHighS32(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int32X4, v2 *arm.Int32X4)\n\n// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VraddhnHighS64 VraddhnHighS64\n//go:noescape\nfunc VraddhnHighS64(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int64X2, v2 *arm.Int64X2)\n\n// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VraddhnHighU16 VraddhnHighU16\n//go:noescape\nfunc VraddhnHighU16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)\n\n// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VraddhnHighU32 VraddhnHighU32\n//go:noescape\nfunc VraddhnHighU32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)\n\n// Rounding Add returning High Narrow. 
This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VraddhnHighU64 VraddhnHighU64\n//go:noescape\nfunc VraddhnHighU64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)\n\n// Rotate and Exclusive OR rotates each 64-bit element of the 128-bit vector in a source SIMD&FP register left by 1, performs a bitwise exclusive OR of the resulting 128-bit vector and the vector in another source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname Vrax1QU64 Vrax1QU64\n//go:noescape\nfunc Vrax1QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrbitS8 VrbitS8\n//go:noescape\nfunc VrbitS8(r *arm.Int8X8, v0 *arm.Int8X8)\n\n// Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrbitU8 VrbitU8\n//go:noescape\nfunc VrbitU8(r *arm.Uint8X8, v0 *arm.Uint8X8)\n\n// Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrbitP8 VrbitP8\n//go:noescape\nfunc VrbitP8(r *arm.Poly8X8, v0 *arm.Poly8X8)\n\n// Reverse Bit order (vector). 
This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrbitqS8 VrbitqS8\n//go:noescape\nfunc VrbitqS8(r *arm.Int8X16, v0 *arm.Int8X16)\n\n// Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrbitqU8 VrbitqU8\n//go:noescape\nfunc VrbitqU8(r *arm.Uint8X16, v0 *arm.Uint8X16)\n\n// Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrbitqP8 VrbitqP8\n//go:noescape\nfunc VrbitqP8(r *arm.Poly8X16, v0 *arm.Poly8X16)\n\n// Unsigned Reciprocal Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse for the unsigned integer value, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpeU32 VrecpeU32\n//go:noescape\nfunc VrecpeU32(r *arm.Uint32X2, v0 *arm.Uint32X2)\n\n// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpeF32 VrecpeF32\n//go:noescape\nfunc VrecpeF32(r *arm.Float32X2, v0 *arm.Float32X2)\n\n// Floating-point Reciprocal Estimate. 
This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpeF64 VrecpeF64\n//go:noescape\nfunc VrecpeF64(r *arm.Float64X1, v0 *arm.Float64X1)\n\n// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpedF64 VrecpedF64\n//go:noescape\nfunc VrecpedF64(r *arm.Float64, v0 *arm.Float64)\n\n// Unsigned Reciprocal Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse for the unsigned integer value, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpeqU32 VrecpeqU32\n//go:noescape\nfunc VrecpeqU32(r *arm.Uint32X4, v0 *arm.Uint32X4)\n\n// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpeqF32 VrecpeqF32\n//go:noescape\nfunc VrecpeqF32(r *arm.Float32X4, v0 *arm.Float32X4)\n\n// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpeqF64 VrecpeqF64\n//go:noescape\nfunc VrecpeqF64(r *arm.Float64X2, v0 *arm.Float64X2)\n\n// Floating-point Reciprocal Estimate. 
This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpesF32 VrecpesF32\n//go:noescape\nfunc VrecpesF32(r *arm.Float32, v0 *arm.Float32)\n\n// Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpsF32 VrecpsF32\n//go:noescape\nfunc VrecpsF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpsF64 VrecpsF64\n//go:noescape\nfunc VrecpsF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpsdF64 VrecpsdF64\n//go:noescape\nfunc VrecpsdF64(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64)\n\n// Floating-point Reciprocal Step. 
This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpsqF32 VrecpsqF32\n//go:noescape\nfunc VrecpsqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpsqF64 VrecpsqF64\n//go:noescape\nfunc VrecpsqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpssF32 VrecpssF32\n//go:noescape\nfunc VrecpssF32(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32)\n\n// Floating-point Reciprocal exponent (scalar). This instruction finds an approximate reciprocal exponent for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpxdF64 VrecpxdF64\n//go:noescape\nfunc VrecpxdF64(r *arm.Float64, v0 *arm.Float64)\n\n// Floating-point Reciprocal exponent (scalar). 
This instruction finds an approximate reciprocal exponent for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpxsF32 VrecpxsF32\n//go:noescape\nfunc VrecpxsF32(r *arm.Float32, v0 *arm.Float32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF32S8 VreinterpretF32S8\n//go:noescape\nfunc VreinterpretF32S8(r *arm.Float32X2, v0 *arm.Int8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF32S16 VreinterpretF32S16\n//go:noescape\nfunc VreinterpretF32S16(r *arm.Float32X2, v0 *arm.Int16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF32S32 VreinterpretF32S32\n//go:noescape\nfunc VreinterpretF32S32(r *arm.Float32X2, v0 *arm.Int32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF32S64 VreinterpretF32S64\n//go:noescape\nfunc VreinterpretF32S64(r *arm.Float32X2, v0 *arm.Int64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF32U8 VreinterpretF32U8\n//go:noescape\nfunc VreinterpretF32U8(r *arm.Float32X2, v0 *arm.Uint8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF32U16 VreinterpretF32U16\n//go:noescape\nfunc VreinterpretF32U16(r *arm.Float32X2, v0 *arm.Uint16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF32U32 VreinterpretF32U32\n//go:noescape\nfunc VreinterpretF32U32(r *arm.Float32X2, v0 *arm.Uint32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF32U64 VreinterpretF32U64\n//go:noescape\nfunc VreinterpretF32U64(r *arm.Float32X2, v0 *arm.Uint64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF32F64 VreinterpretF32F64\n//go:noescape\nfunc VreinterpretF32F64(r *arm.Float32X2, v0 *arm.Float64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF32P16 VreinterpretF32P16\n//go:noescape\nfunc VreinterpretF32P16(r 
*arm.Float32X2, v0 *arm.Poly16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF32P64 VreinterpretF32P64\n//go:noescape\nfunc VreinterpretF32P64(r *arm.Float32X2, v0 *arm.Poly64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF32P8 VreinterpretF32P8\n//go:noescape\nfunc VreinterpretF32P8(r *arm.Float32X2, v0 *arm.Poly8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF64S8 VreinterpretF64S8\n//go:noescape\nfunc VreinterpretF64S8(r *arm.Float64X1, v0 *arm.Int8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF64S16 VreinterpretF64S16\n//go:noescape\nfunc VreinterpretF64S16(r *arm.Float64X1, v0 *arm.Int16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF64S32 VreinterpretF64S32\n//go:noescape\nfunc VreinterpretF64S32(r *arm.Float64X1, v0 *arm.Int32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF64S64 VreinterpretF64S64\n//go:noescape\nfunc VreinterpretF64S64(r *arm.Float64X1, v0 *arm.Int64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF64U8 VreinterpretF64U8\n//go:noescape\nfunc VreinterpretF64U8(r *arm.Float64X1, v0 *arm.Uint8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF64U16 VreinterpretF64U16\n//go:noescape\nfunc VreinterpretF64U16(r *arm.Float64X1, v0 *arm.Uint16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF64U32 VreinterpretF64U32\n//go:noescape\nfunc VreinterpretF64U32(r *arm.Float64X1, v0 *arm.Uint32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF64U64 VreinterpretF64U64\n//go:noescape\nfunc VreinterpretF64U64(r *arm.Float64X1, v0 *arm.Uint64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF64F32 VreinterpretF64F32\n//go:noescape\nfunc VreinterpretF64F32(r *arm.Float64X1, v0 *arm.Float32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF64P16 
VreinterpretF64P16\n//go:noescape\nfunc VreinterpretF64P16(r *arm.Float64X1, v0 *arm.Poly16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF64P64 VreinterpretF64P64\n//go:noescape\nfunc VreinterpretF64P64(r *arm.Float64X1, v0 *arm.Poly64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF64P8 VreinterpretF64P8\n//go:noescape\nfunc VreinterpretF64P8(r *arm.Float64X1, v0 *arm.Poly8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP16S8 VreinterpretP16S8\n//go:noescape\nfunc VreinterpretP16S8(r *arm.Poly16X4, v0 *arm.Int8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP16S16 VreinterpretP16S16\n//go:noescape\nfunc VreinterpretP16S16(r *arm.Poly16X4, v0 *arm.Int16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP16S32 VreinterpretP16S32\n//go:noescape\nfunc VreinterpretP16S32(r *arm.Poly16X4, v0 *arm.Int32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP16S64 VreinterpretP16S64\n//go:noescape\nfunc VreinterpretP16S64(r *arm.Poly16X4, v0 *arm.Int64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP16U8 VreinterpretP16U8\n//go:noescape\nfunc VreinterpretP16U8(r *arm.Poly16X4, v0 *arm.Uint8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP16U16 VreinterpretP16U16\n//go:noescape\nfunc VreinterpretP16U16(r *arm.Poly16X4, v0 *arm.Uint16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP16U32 VreinterpretP16U32\n//go:noescape\nfunc VreinterpretP16U32(r *arm.Poly16X4, v0 *arm.Uint32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP16U64 VreinterpretP16U64\n//go:noescape\nfunc VreinterpretP16U64(r *arm.Poly16X4, v0 *arm.Uint64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP16F32 VreinterpretP16F32\n//go:noescape\nfunc VreinterpretP16F32(r *arm.Poly16X4, v0 *arm.Float32X2)\n\n// Vector reinterpret 
cast operation\n//\n//go:linkname VreinterpretP16F64 VreinterpretP16F64\n//go:noescape\nfunc VreinterpretP16F64(r *arm.Poly16X4, v0 *arm.Float64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP16P64 VreinterpretP16P64\n//go:noescape\nfunc VreinterpretP16P64(r *arm.Poly16X4, v0 *arm.Poly64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP16P8 VreinterpretP16P8\n//go:noescape\nfunc VreinterpretP16P8(r *arm.Poly16X4, v0 *arm.Poly8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP64S8 VreinterpretP64S8\n//go:noescape\nfunc VreinterpretP64S8(r *arm.Poly64X1, v0 *arm.Int8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP64S16 VreinterpretP64S16\n//go:noescape\nfunc VreinterpretP64S16(r *arm.Poly64X1, v0 *arm.Int16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP64S32 VreinterpretP64S32\n//go:noescape\nfunc VreinterpretP64S32(r *arm.Poly64X1, v0 *arm.Int32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP64S64 VreinterpretP64S64\n//go:noescape\nfunc VreinterpretP64S64(r *arm.Poly64X1, v0 *arm.Int64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP64U8 VreinterpretP64U8\n//go:noescape\nfunc VreinterpretP64U8(r *arm.Poly64X1, v0 *arm.Uint8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP64U16 VreinterpretP64U16\n//go:noescape\nfunc VreinterpretP64U16(r *arm.Poly64X1, v0 *arm.Uint16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP64U32 VreinterpretP64U32\n//go:noescape\nfunc VreinterpretP64U32(r *arm.Poly64X1, v0 *arm.Uint32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP64U64 VreinterpretP64U64\n//go:noescape\nfunc VreinterpretP64U64(r *arm.Poly64X1, v0 *arm.Uint64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP64F32 VreinterpretP64F32\n//go:noescape\nfunc VreinterpretP64F32(r *arm.Poly64X1, v0 
*arm.Float32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP64F64 VreinterpretP64F64\n//go:noescape\nfunc VreinterpretP64F64(r *arm.Poly64X1, v0 *arm.Float64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP64P16 VreinterpretP64P16\n//go:noescape\nfunc VreinterpretP64P16(r *arm.Poly64X1, v0 *arm.Poly16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP64P8 VreinterpretP64P8\n//go:noescape\nfunc VreinterpretP64P8(r *arm.Poly64X1, v0 *arm.Poly8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP8S8 VreinterpretP8S8\n//go:noescape\nfunc VreinterpretP8S8(r *arm.Poly8X8, v0 *arm.Int8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP8S16 VreinterpretP8S16\n//go:noescape\nfunc VreinterpretP8S16(r *arm.Poly8X8, v0 *arm.Int16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP8S32 VreinterpretP8S32\n//go:noescape\nfunc VreinterpretP8S32(r *arm.Poly8X8, v0 *arm.Int32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP8S64 VreinterpretP8S64\n//go:noescape\nfunc VreinterpretP8S64(r *arm.Poly8X8, v0 *arm.Int64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP8U8 VreinterpretP8U8\n//go:noescape\nfunc VreinterpretP8U8(r *arm.Poly8X8, v0 *arm.Uint8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP8U16 VreinterpretP8U16\n//go:noescape\nfunc VreinterpretP8U16(r *arm.Poly8X8, v0 *arm.Uint16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP8U32 VreinterpretP8U32\n//go:noescape\nfunc VreinterpretP8U32(r *arm.Poly8X8, v0 *arm.Uint32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP8U64 VreinterpretP8U64\n//go:noescape\nfunc VreinterpretP8U64(r *arm.Poly8X8, v0 *arm.Uint64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP8F32 VreinterpretP8F32\n//go:noescape\nfunc VreinterpretP8F32(r 
*arm.Poly8X8, v0 *arm.Float32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP8F64 VreinterpretP8F64\n//go:noescape\nfunc VreinterpretP8F64(r *arm.Poly8X8, v0 *arm.Float64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP8P16 VreinterpretP8P16\n//go:noescape\nfunc VreinterpretP8P16(r *arm.Poly8X8, v0 *arm.Poly16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretP8P64 VreinterpretP8P64\n//go:noescape\nfunc VreinterpretP8P64(r *arm.Poly8X8, v0 *arm.Poly64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS16S8 VreinterpretS16S8\n//go:noescape\nfunc VreinterpretS16S8(r *arm.Int16X4, v0 *arm.Int8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS16S32 VreinterpretS16S32\n//go:noescape\nfunc VreinterpretS16S32(r *arm.Int16X4, v0 *arm.Int32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS16S64 VreinterpretS16S64\n//go:noescape\nfunc VreinterpretS16S64(r *arm.Int16X4, v0 *arm.Int64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS16U8 VreinterpretS16U8\n//go:noescape\nfunc VreinterpretS16U8(r *arm.Int16X4, v0 *arm.Uint8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS16U16 VreinterpretS16U16\n//go:noescape\nfunc VreinterpretS16U16(r *arm.Int16X4, v0 *arm.Uint16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS16U32 VreinterpretS16U32\n//go:noescape\nfunc VreinterpretS16U32(r *arm.Int16X4, v0 *arm.Uint32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS16U64 VreinterpretS16U64\n//go:noescape\nfunc VreinterpretS16U64(r *arm.Int16X4, v0 *arm.Uint64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS16F32 VreinterpretS16F32\n//go:noescape\nfunc VreinterpretS16F32(r *arm.Int16X4, v0 *arm.Float32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS16F64 
VreinterpretS16F64\n//go:noescape\nfunc VreinterpretS16F64(r *arm.Int16X4, v0 *arm.Float64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS16P16 VreinterpretS16P16\n//go:noescape\nfunc VreinterpretS16P16(r *arm.Int16X4, v0 *arm.Poly16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS16P64 VreinterpretS16P64\n//go:noescape\nfunc VreinterpretS16P64(r *arm.Int16X4, v0 *arm.Poly64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS16P8 VreinterpretS16P8\n//go:noescape\nfunc VreinterpretS16P8(r *arm.Int16X4, v0 *arm.Poly8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS32S8 VreinterpretS32S8\n//go:noescape\nfunc VreinterpretS32S8(r *arm.Int32X2, v0 *arm.Int8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS32S16 VreinterpretS32S16\n//go:noescape\nfunc VreinterpretS32S16(r *arm.Int32X2, v0 *arm.Int16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS32S64 VreinterpretS32S64\n//go:noescape\nfunc VreinterpretS32S64(r *arm.Int32X2, v0 *arm.Int64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS32U8 VreinterpretS32U8\n//go:noescape\nfunc VreinterpretS32U8(r *arm.Int32X2, v0 *arm.Uint8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS32U16 VreinterpretS32U16\n//go:noescape\nfunc VreinterpretS32U16(r *arm.Int32X2, v0 *arm.Uint16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS32U32 VreinterpretS32U32\n//go:noescape\nfunc VreinterpretS32U32(r *arm.Int32X2, v0 *arm.Uint32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS32U64 VreinterpretS32U64\n//go:noescape\nfunc VreinterpretS32U64(r *arm.Int32X2, v0 *arm.Uint64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS32F32 VreinterpretS32F32\n//go:noescape\nfunc VreinterpretS32F32(r *arm.Int32X2, v0 *arm.Float32X2)\n\n// Vector reinterpret cast 
operation\n//\n//go:linkname VreinterpretS32F64 VreinterpretS32F64\n//go:noescape\nfunc VreinterpretS32F64(r *arm.Int32X2, v0 *arm.Float64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS32P16 VreinterpretS32P16\n//go:noescape\nfunc VreinterpretS32P16(r *arm.Int32X2, v0 *arm.Poly16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS32P64 VreinterpretS32P64\n//go:noescape\nfunc VreinterpretS32P64(r *arm.Int32X2, v0 *arm.Poly64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS32P8 VreinterpretS32P8\n//go:noescape\nfunc VreinterpretS32P8(r *arm.Int32X2, v0 *arm.Poly8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS64S8 VreinterpretS64S8\n//go:noescape\nfunc VreinterpretS64S8(r *arm.Int64X1, v0 *arm.Int8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS64S16 VreinterpretS64S16\n//go:noescape\nfunc VreinterpretS64S16(r *arm.Int64X1, v0 *arm.Int16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS64S32 VreinterpretS64S32\n//go:noescape\nfunc VreinterpretS64S32(r *arm.Int64X1, v0 *arm.Int32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS64U8 VreinterpretS64U8\n//go:noescape\nfunc VreinterpretS64U8(r *arm.Int64X1, v0 *arm.Uint8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS64U16 VreinterpretS64U16\n//go:noescape\nfunc VreinterpretS64U16(r *arm.Int64X1, v0 *arm.Uint16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS64U32 VreinterpretS64U32\n//go:noescape\nfunc VreinterpretS64U32(r *arm.Int64X1, v0 *arm.Uint32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS64U64 VreinterpretS64U64\n//go:noescape\nfunc VreinterpretS64U64(r *arm.Int64X1, v0 *arm.Uint64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS64F32 VreinterpretS64F32\n//go:noescape\nfunc VreinterpretS64F32(r *arm.Int64X1, v0 
*arm.Float32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS64F64 VreinterpretS64F64\n//go:noescape\nfunc VreinterpretS64F64(r *arm.Int64X1, v0 *arm.Float64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS64P16 VreinterpretS64P16\n//go:noescape\nfunc VreinterpretS64P16(r *arm.Int64X1, v0 *arm.Poly16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS64P64 VreinterpretS64P64\n//go:noescape\nfunc VreinterpretS64P64(r *arm.Int64X1, v0 *arm.Poly64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS64P8 VreinterpretS64P8\n//go:noescape\nfunc VreinterpretS64P8(r *arm.Int64X1, v0 *arm.Poly8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS8S16 VreinterpretS8S16\n//go:noescape\nfunc VreinterpretS8S16(r *arm.Int8X8, v0 *arm.Int16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS8S32 VreinterpretS8S32\n//go:noescape\nfunc VreinterpretS8S32(r *arm.Int8X8, v0 *arm.Int32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS8S64 VreinterpretS8S64\n//go:noescape\nfunc VreinterpretS8S64(r *arm.Int8X8, v0 *arm.Int64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS8U8 VreinterpretS8U8\n//go:noescape\nfunc VreinterpretS8U8(r *arm.Int8X8, v0 *arm.Uint8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS8U16 VreinterpretS8U16\n//go:noescape\nfunc VreinterpretS8U16(r *arm.Int8X8, v0 *arm.Uint16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS8U32 VreinterpretS8U32\n//go:noescape\nfunc VreinterpretS8U32(r *arm.Int8X8, v0 *arm.Uint32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS8U64 VreinterpretS8U64\n//go:noescape\nfunc VreinterpretS8U64(r *arm.Int8X8, v0 *arm.Uint64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS8F32 VreinterpretS8F32\n//go:noescape\nfunc VreinterpretS8F32(r 
*arm.Int8X8, v0 *arm.Float32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS8F64 VreinterpretS8F64\n//go:noescape\nfunc VreinterpretS8F64(r *arm.Int8X8, v0 *arm.Float64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS8P16 VreinterpretS8P16\n//go:noescape\nfunc VreinterpretS8P16(r *arm.Int8X8, v0 *arm.Poly16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS8P64 VreinterpretS8P64\n//go:noescape\nfunc VreinterpretS8P64(r *arm.Int8X8, v0 *arm.Poly64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS8P8 VreinterpretS8P8\n//go:noescape\nfunc VreinterpretS8P8(r *arm.Int8X8, v0 *arm.Poly8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU16S8 VreinterpretU16S8\n//go:noescape\nfunc VreinterpretU16S8(r *arm.Uint16X4, v0 *arm.Int8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU16S16 VreinterpretU16S16\n//go:noescape\nfunc VreinterpretU16S16(r *arm.Uint16X4, v0 *arm.Int16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU16S32 VreinterpretU16S32\n//go:noescape\nfunc VreinterpretU16S32(r *arm.Uint16X4, v0 *arm.Int32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU16S64 VreinterpretU16S64\n//go:noescape\nfunc VreinterpretU16S64(r *arm.Uint16X4, v0 *arm.Int64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU16U8 VreinterpretU16U8\n//go:noescape\nfunc VreinterpretU16U8(r *arm.Uint16X4, v0 *arm.Uint8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU16U32 VreinterpretU16U32\n//go:noescape\nfunc VreinterpretU16U32(r *arm.Uint16X4, v0 *arm.Uint32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU16U64 VreinterpretU16U64\n//go:noescape\nfunc VreinterpretU16U64(r *arm.Uint16X4, v0 *arm.Uint64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU16F32 
VreinterpretU16F32\n//go:noescape\nfunc VreinterpretU16F32(r *arm.Uint16X4, v0 *arm.Float32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU16F64 VreinterpretU16F64\n//go:noescape\nfunc VreinterpretU16F64(r *arm.Uint16X4, v0 *arm.Float64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU16P16 VreinterpretU16P16\n//go:noescape\nfunc VreinterpretU16P16(r *arm.Uint16X4, v0 *arm.Poly16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU16P64 VreinterpretU16P64\n//go:noescape\nfunc VreinterpretU16P64(r *arm.Uint16X4, v0 *arm.Poly64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU16P8 VreinterpretU16P8\n//go:noescape\nfunc VreinterpretU16P8(r *arm.Uint16X4, v0 *arm.Poly8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU32S8 VreinterpretU32S8\n//go:noescape\nfunc VreinterpretU32S8(r *arm.Uint32X2, v0 *arm.Int8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU32S16 VreinterpretU32S16\n//go:noescape\nfunc VreinterpretU32S16(r *arm.Uint32X2, v0 *arm.Int16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU32S32 VreinterpretU32S32\n//go:noescape\nfunc VreinterpretU32S32(r *arm.Uint32X2, v0 *arm.Int32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU32S64 VreinterpretU32S64\n//go:noescape\nfunc VreinterpretU32S64(r *arm.Uint32X2, v0 *arm.Int64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU32U8 VreinterpretU32U8\n//go:noescape\nfunc VreinterpretU32U8(r *arm.Uint32X2, v0 *arm.Uint8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU32U16 VreinterpretU32U16\n//go:noescape\nfunc VreinterpretU32U16(r *arm.Uint32X2, v0 *arm.Uint16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU32U64 VreinterpretU32U64\n//go:noescape\nfunc VreinterpretU32U64(r *arm.Uint32X2, v0 *arm.Uint64X1)\n\n// Vector reinterpret 
cast operation\n//\n//go:linkname VreinterpretU32F32 VreinterpretU32F32\n//go:noescape\nfunc VreinterpretU32F32(r *arm.Uint32X2, v0 *arm.Float32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU32F64 VreinterpretU32F64\n//go:noescape\nfunc VreinterpretU32F64(r *arm.Uint32X2, v0 *arm.Float64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU32P16 VreinterpretU32P16\n//go:noescape\nfunc VreinterpretU32P16(r *arm.Uint32X2, v0 *arm.Poly16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU32P64 VreinterpretU32P64\n//go:noescape\nfunc VreinterpretU32P64(r *arm.Uint32X2, v0 *arm.Poly64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU32P8 VreinterpretU32P8\n//go:noescape\nfunc VreinterpretU32P8(r *arm.Uint32X2, v0 *arm.Poly8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU64S8 VreinterpretU64S8\n//go:noescape\nfunc VreinterpretU64S8(r *arm.Uint64X1, v0 *arm.Int8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU64S16 VreinterpretU64S16\n//go:noescape\nfunc VreinterpretU64S16(r *arm.Uint64X1, v0 *arm.Int16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU64S32 VreinterpretU64S32\n//go:noescape\nfunc VreinterpretU64S32(r *arm.Uint64X1, v0 *arm.Int32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU64S64 VreinterpretU64S64\n//go:noescape\nfunc VreinterpretU64S64(r *arm.Uint64X1, v0 *arm.Int64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU64U8 VreinterpretU64U8\n//go:noescape\nfunc VreinterpretU64U8(r *arm.Uint64X1, v0 *arm.Uint8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU64U16 VreinterpretU64U16\n//go:noescape\nfunc VreinterpretU64U16(r *arm.Uint64X1, v0 *arm.Uint16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU64U32 VreinterpretU64U32\n//go:noescape\nfunc VreinterpretU64U32(r 
*arm.Uint64X1, v0 *arm.Uint32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU64F32 VreinterpretU64F32\n//go:noescape\nfunc VreinterpretU64F32(r *arm.Uint64X1, v0 *arm.Float32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU64F64 VreinterpretU64F64\n//go:noescape\nfunc VreinterpretU64F64(r *arm.Uint64X1, v0 *arm.Float64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU64P16 VreinterpretU64P16\n//go:noescape\nfunc VreinterpretU64P16(r *arm.Uint64X1, v0 *arm.Poly16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU64P64 VreinterpretU64P64\n//go:noescape\nfunc VreinterpretU64P64(r *arm.Uint64X1, v0 *arm.Poly64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU64P8 VreinterpretU64P8\n//go:noescape\nfunc VreinterpretU64P8(r *arm.Uint64X1, v0 *arm.Poly8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU8S8 VreinterpretU8S8\n//go:noescape\nfunc VreinterpretU8S8(r *arm.Uint8X8, v0 *arm.Int8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU8S16 VreinterpretU8S16\n//go:noescape\nfunc VreinterpretU8S16(r *arm.Uint8X8, v0 *arm.Int16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU8S32 VreinterpretU8S32\n//go:noescape\nfunc VreinterpretU8S32(r *arm.Uint8X8, v0 *arm.Int32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU8S64 VreinterpretU8S64\n//go:noescape\nfunc VreinterpretU8S64(r *arm.Uint8X8, v0 *arm.Int64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU8U16 VreinterpretU8U16\n//go:noescape\nfunc VreinterpretU8U16(r *arm.Uint8X8, v0 *arm.Uint16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU8U32 VreinterpretU8U32\n//go:noescape\nfunc VreinterpretU8U32(r *arm.Uint8X8, v0 *arm.Uint32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU8U64 
VreinterpretU8U64\n//go:noescape\nfunc VreinterpretU8U64(r *arm.Uint8X8, v0 *arm.Uint64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU8F32 VreinterpretU8F32\n//go:noescape\nfunc VreinterpretU8F32(r *arm.Uint8X8, v0 *arm.Float32X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU8F64 VreinterpretU8F64\n//go:noescape\nfunc VreinterpretU8F64(r *arm.Uint8X8, v0 *arm.Float64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU8P16 VreinterpretU8P16\n//go:noescape\nfunc VreinterpretU8P16(r *arm.Uint8X8, v0 *arm.Poly16X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU8P64 VreinterpretU8P64\n//go:noescape\nfunc VreinterpretU8P64(r *arm.Uint8X8, v0 *arm.Poly64X1)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU8P8 VreinterpretU8P8\n//go:noescape\nfunc VreinterpretU8P8(r *arm.Uint8X8, v0 *arm.Poly8X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF32S8 VreinterpretqF32S8\n//go:noescape\nfunc VreinterpretqF32S8(r *arm.Float32X4, v0 *arm.Int8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF32S16 VreinterpretqF32S16\n//go:noescape\nfunc VreinterpretqF32S16(r *arm.Float32X4, v0 *arm.Int16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF32S32 VreinterpretqF32S32\n//go:noescape\nfunc VreinterpretqF32S32(r *arm.Float32X4, v0 *arm.Int32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF32S64 VreinterpretqF32S64\n//go:noescape\nfunc VreinterpretqF32S64(r *arm.Float32X4, v0 *arm.Int64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF32U8 VreinterpretqF32U8\n//go:noescape\nfunc VreinterpretqF32U8(r *arm.Float32X4, v0 *arm.Uint8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF32U16 VreinterpretqF32U16\n//go:noescape\nfunc VreinterpretqF32U16(r *arm.Float32X4, v0 *arm.Uint16X8)\n\n// Vector reinterpret 
cast operation\n//\n//go:linkname VreinterpretqF32U32 VreinterpretqF32U32\n//go:noescape\nfunc VreinterpretqF32U32(r *arm.Float32X4, v0 *arm.Uint32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF32U64 VreinterpretqF32U64\n//go:noescape\nfunc VreinterpretqF32U64(r *arm.Float32X4, v0 *arm.Uint64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF32F64 VreinterpretqF32F64\n//go:noescape\nfunc VreinterpretqF32F64(r *arm.Float32X4, v0 *arm.Float64X2)\n\n// vreinterpretq_f32_p128\n//\n//go:linkname VreinterpretqF32P128 VreinterpretqF32P128\n//go:noescape\nfunc VreinterpretqF32P128(r *arm.Float32X4, v0 *arm.Poly128)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF32P16 VreinterpretqF32P16\n//go:noescape\nfunc VreinterpretqF32P16(r *arm.Float32X4, v0 *arm.Poly16X8)\n\n// vreinterpretq_f32_p64\n//\n//go:linkname VreinterpretqF32P64 VreinterpretqF32P64\n//go:noescape\nfunc VreinterpretqF32P64(r *arm.Float32X4, v0 *arm.Poly64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF32P8 VreinterpretqF32P8\n//go:noescape\nfunc VreinterpretqF32P8(r *arm.Float32X4, v0 *arm.Poly8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF64S8 VreinterpretqF64S8\n//go:noescape\nfunc VreinterpretqF64S8(r *arm.Float64X2, v0 *arm.Int8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF64S16 VreinterpretqF64S16\n//go:noescape\nfunc VreinterpretqF64S16(r *arm.Float64X2, v0 *arm.Int16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF64S32 VreinterpretqF64S32\n//go:noescape\nfunc VreinterpretqF64S32(r *arm.Float64X2, v0 *arm.Int32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF64S64 VreinterpretqF64S64\n//go:noescape\nfunc VreinterpretqF64S64(r *arm.Float64X2, v0 *arm.Int64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF64U8 VreinterpretqF64U8\n//go:noescape\nfunc 
VreinterpretqF64U8(r *arm.Float64X2, v0 *arm.Uint8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF64U16 VreinterpretqF64U16\n//go:noescape\nfunc VreinterpretqF64U16(r *arm.Float64X2, v0 *arm.Uint16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF64U32 VreinterpretqF64U32\n//go:noescape\nfunc VreinterpretqF64U32(r *arm.Float64X2, v0 *arm.Uint32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF64U64 VreinterpretqF64U64\n//go:noescape\nfunc VreinterpretqF64U64(r *arm.Float64X2, v0 *arm.Uint64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF64F32 VreinterpretqF64F32\n//go:noescape\nfunc VreinterpretqF64F32(r *arm.Float64X2, v0 *arm.Float32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF64P128 VreinterpretqF64P128\n//go:noescape\nfunc VreinterpretqF64P128(r *arm.Float64X2, v0 *arm.Poly128)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF64P16 VreinterpretqF64P16\n//go:noescape\nfunc VreinterpretqF64P16(r *arm.Float64X2, v0 *arm.Poly16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF64P64 VreinterpretqF64P64\n//go:noescape\nfunc VreinterpretqF64P64(r *arm.Float64X2, v0 *arm.Poly64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF64P8 VreinterpretqF64P8\n//go:noescape\nfunc VreinterpretqF64P8(r *arm.Float64X2, v0 *arm.Poly8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP128S8 VreinterpretqP128S8\n//go:noescape\nfunc VreinterpretqP128S8(r *arm.Poly128, v0 *arm.Int8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP128S16 VreinterpretqP128S16\n//go:noescape\nfunc VreinterpretqP128S16(r *arm.Poly128, v0 *arm.Int16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP128S32 VreinterpretqP128S32\n//go:noescape\nfunc VreinterpretqP128S32(r *arm.Poly128, v0 *arm.Int32X4)\n\n// 
Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP128S64 VreinterpretqP128S64\n//go:noescape\nfunc VreinterpretqP128S64(r *arm.Poly128, v0 *arm.Int64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP128U8 VreinterpretqP128U8\n//go:noescape\nfunc VreinterpretqP128U8(r *arm.Poly128, v0 *arm.Uint8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP128U16 VreinterpretqP128U16\n//go:noescape\nfunc VreinterpretqP128U16(r *arm.Poly128, v0 *arm.Uint16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP128U32 VreinterpretqP128U32\n//go:noescape\nfunc VreinterpretqP128U32(r *arm.Poly128, v0 *arm.Uint32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP128U64 VreinterpretqP128U64\n//go:noescape\nfunc VreinterpretqP128U64(r *arm.Poly128, v0 *arm.Uint64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP128F32 VreinterpretqP128F32\n//go:noescape\nfunc VreinterpretqP128F32(r *arm.Poly128, v0 *arm.Float32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP128F64 VreinterpretqP128F64\n//go:noescape\nfunc VreinterpretqP128F64(r *arm.Poly128, v0 *arm.Float64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP128P16 VreinterpretqP128P16\n//go:noescape\nfunc VreinterpretqP128P16(r *arm.Poly128, v0 *arm.Poly16X8)\n\n// vreinterpretq_p128_p64\n//\n//go:linkname VreinterpretqP128P64 VreinterpretqP128P64\n//go:noescape\nfunc VreinterpretqP128P64(r *arm.Poly128, v0 *arm.Poly64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP128P8 VreinterpretqP128P8\n//go:noescape\nfunc VreinterpretqP128P8(r *arm.Poly128, v0 *arm.Poly8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP16S8 VreinterpretqP16S8\n//go:noescape\nfunc VreinterpretqP16S8(r *arm.Poly16X8, v0 *arm.Int8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP16S16 
VreinterpretqP16S16\n//go:noescape\nfunc VreinterpretqP16S16(r *arm.Poly16X8, v0 *arm.Int16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP16S32 VreinterpretqP16S32\n//go:noescape\nfunc VreinterpretqP16S32(r *arm.Poly16X8, v0 *arm.Int32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP16S64 VreinterpretqP16S64\n//go:noescape\nfunc VreinterpretqP16S64(r *arm.Poly16X8, v0 *arm.Int64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP16U8 VreinterpretqP16U8\n//go:noescape\nfunc VreinterpretqP16U8(r *arm.Poly16X8, v0 *arm.Uint8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP16U16 VreinterpretqP16U16\n//go:noescape\nfunc VreinterpretqP16U16(r *arm.Poly16X8, v0 *arm.Uint16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP16U32 VreinterpretqP16U32\n//go:noescape\nfunc VreinterpretqP16U32(r *arm.Poly16X8, v0 *arm.Uint32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP16U64 VreinterpretqP16U64\n//go:noescape\nfunc VreinterpretqP16U64(r *arm.Poly16X8, v0 *arm.Uint64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP16F32 VreinterpretqP16F32\n//go:noescape\nfunc VreinterpretqP16F32(r *arm.Poly16X8, v0 *arm.Float32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP16F64 VreinterpretqP16F64\n//go:noescape\nfunc VreinterpretqP16F64(r *arm.Poly16X8, v0 *arm.Float64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP16P128 VreinterpretqP16P128\n//go:noescape\nfunc VreinterpretqP16P128(r *arm.Poly16X8, v0 *arm.Poly128)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP16P64 VreinterpretqP16P64\n//go:noescape\nfunc VreinterpretqP16P64(r *arm.Poly16X8, v0 *arm.Poly64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP16P8 VreinterpretqP16P8\n//go:noescape\nfunc VreinterpretqP16P8(r *arm.Poly16X8, 
v0 *arm.Poly8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP64S8 VreinterpretqP64S8\n//go:noescape\nfunc VreinterpretqP64S8(r *arm.Poly64X2, v0 *arm.Int8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP64S16 VreinterpretqP64S16\n//go:noescape\nfunc VreinterpretqP64S16(r *arm.Poly64X2, v0 *arm.Int16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP64S32 VreinterpretqP64S32\n//go:noescape\nfunc VreinterpretqP64S32(r *arm.Poly64X2, v0 *arm.Int32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP64S64 VreinterpretqP64S64\n//go:noescape\nfunc VreinterpretqP64S64(r *arm.Poly64X2, v0 *arm.Int64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP64U8 VreinterpretqP64U8\n//go:noescape\nfunc VreinterpretqP64U8(r *arm.Poly64X2, v0 *arm.Uint8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP64U16 VreinterpretqP64U16\n//go:noescape\nfunc VreinterpretqP64U16(r *arm.Poly64X2, v0 *arm.Uint16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP64U32 VreinterpretqP64U32\n//go:noescape\nfunc VreinterpretqP64U32(r *arm.Poly64X2, v0 *arm.Uint32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP64U64 VreinterpretqP64U64\n//go:noescape\nfunc VreinterpretqP64U64(r *arm.Poly64X2, v0 *arm.Uint64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP64F32 VreinterpretqP64F32\n//go:noescape\nfunc VreinterpretqP64F32(r *arm.Poly64X2, v0 *arm.Float32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP64F64 VreinterpretqP64F64\n//go:noescape\nfunc VreinterpretqP64F64(r *arm.Poly64X2, v0 *arm.Float64X2)\n\n// vreinterpretq_p64_p128\n//\n//go:linkname VreinterpretqP64P128 VreinterpretqP64P128\n//go:noescape\nfunc VreinterpretqP64P128(r *arm.Poly64X2, v0 *arm.Poly128)\n\n// Vector reinterpret cast operation\n//\n//go:linkname 
VreinterpretqP64P16 VreinterpretqP64P16\n//go:noescape\nfunc VreinterpretqP64P16(r *arm.Poly64X2, v0 *arm.Poly16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP64P8 VreinterpretqP64P8\n//go:noescape\nfunc VreinterpretqP64P8(r *arm.Poly64X2, v0 *arm.Poly8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP8S8 VreinterpretqP8S8\n//go:noescape\nfunc VreinterpretqP8S8(r *arm.Poly8X16, v0 *arm.Int8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP8S16 VreinterpretqP8S16\n//go:noescape\nfunc VreinterpretqP8S16(r *arm.Poly8X16, v0 *arm.Int16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP8S32 VreinterpretqP8S32\n//go:noescape\nfunc VreinterpretqP8S32(r *arm.Poly8X16, v0 *arm.Int32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP8S64 VreinterpretqP8S64\n//go:noescape\nfunc VreinterpretqP8S64(r *arm.Poly8X16, v0 *arm.Int64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP8U8 VreinterpretqP8U8\n//go:noescape\nfunc VreinterpretqP8U8(r *arm.Poly8X16, v0 *arm.Uint8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP8U16 VreinterpretqP8U16\n//go:noescape\nfunc VreinterpretqP8U16(r *arm.Poly8X16, v0 *arm.Uint16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP8U32 VreinterpretqP8U32\n//go:noescape\nfunc VreinterpretqP8U32(r *arm.Poly8X16, v0 *arm.Uint32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP8U64 VreinterpretqP8U64\n//go:noescape\nfunc VreinterpretqP8U64(r *arm.Poly8X16, v0 *arm.Uint64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP8F32 VreinterpretqP8F32\n//go:noescape\nfunc VreinterpretqP8F32(r *arm.Poly8X16, v0 *arm.Float32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP8F64 VreinterpretqP8F64\n//go:noescape\nfunc VreinterpretqP8F64(r *arm.Poly8X16, v0 
*arm.Float64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP8P128 VreinterpretqP8P128\n//go:noescape\nfunc VreinterpretqP8P128(r *arm.Poly8X16, v0 *arm.Poly128)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP8P16 VreinterpretqP8P16\n//go:noescape\nfunc VreinterpretqP8P16(r *arm.Poly8X16, v0 *arm.Poly16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqP8P64 VreinterpretqP8P64\n//go:noescape\nfunc VreinterpretqP8P64(r *arm.Poly8X16, v0 *arm.Poly64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS16S8 VreinterpretqS16S8\n//go:noescape\nfunc VreinterpretqS16S8(r *arm.Int16X8, v0 *arm.Int8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS16S32 VreinterpretqS16S32\n//go:noescape\nfunc VreinterpretqS16S32(r *arm.Int16X8, v0 *arm.Int32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS16S64 VreinterpretqS16S64\n//go:noescape\nfunc VreinterpretqS16S64(r *arm.Int16X8, v0 *arm.Int64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS16U8 VreinterpretqS16U8\n//go:noescape\nfunc VreinterpretqS16U8(r *arm.Int16X8, v0 *arm.Uint8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS16U16 VreinterpretqS16U16\n//go:noescape\nfunc VreinterpretqS16U16(r *arm.Int16X8, v0 *arm.Uint16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS16U32 VreinterpretqS16U32\n//go:noescape\nfunc VreinterpretqS16U32(r *arm.Int16X8, v0 *arm.Uint32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS16U64 VreinterpretqS16U64\n//go:noescape\nfunc VreinterpretqS16U64(r *arm.Int16X8, v0 *arm.Uint64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS16F32 VreinterpretqS16F32\n//go:noescape\nfunc VreinterpretqS16F32(r *arm.Int16X8, v0 *arm.Float32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS16F64 
VreinterpretqS16F64\n//go:noescape\nfunc VreinterpretqS16F64(r *arm.Int16X8, v0 *arm.Float64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS16P128 VreinterpretqS16P128\n//go:noescape\nfunc VreinterpretqS16P128(r *arm.Int16X8, v0 *arm.Poly128)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS16P16 VreinterpretqS16P16\n//go:noescape\nfunc VreinterpretqS16P16(r *arm.Int16X8, v0 *arm.Poly16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS16P64 VreinterpretqS16P64\n//go:noescape\nfunc VreinterpretqS16P64(r *arm.Int16X8, v0 *arm.Poly64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS16P8 VreinterpretqS16P8\n//go:noescape\nfunc VreinterpretqS16P8(r *arm.Int16X8, v0 *arm.Poly8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS32S8 VreinterpretqS32S8\n//go:noescape\nfunc VreinterpretqS32S8(r *arm.Int32X4, v0 *arm.Int8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS32S16 VreinterpretqS32S16\n//go:noescape\nfunc VreinterpretqS32S16(r *arm.Int32X4, v0 *arm.Int16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS32S64 VreinterpretqS32S64\n//go:noescape\nfunc VreinterpretqS32S64(r *arm.Int32X4, v0 *arm.Int64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS32U8 VreinterpretqS32U8\n//go:noescape\nfunc VreinterpretqS32U8(r *arm.Int32X4, v0 *arm.Uint8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS32U16 VreinterpretqS32U16\n//go:noescape\nfunc VreinterpretqS32U16(r *arm.Int32X4, v0 *arm.Uint16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS32U32 VreinterpretqS32U32\n//go:noescape\nfunc VreinterpretqS32U32(r *arm.Int32X4, v0 *arm.Uint32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS32U64 VreinterpretqS32U64\n//go:noescape\nfunc VreinterpretqS32U64(r *arm.Int32X4, v0 
*arm.Uint64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS32F32 VreinterpretqS32F32\n//go:noescape\nfunc VreinterpretqS32F32(r *arm.Int32X4, v0 *arm.Float32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS32F64 VreinterpretqS32F64\n//go:noescape\nfunc VreinterpretqS32F64(r *arm.Int32X4, v0 *arm.Float64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS32P128 VreinterpretqS32P128\n//go:noescape\nfunc VreinterpretqS32P128(r *arm.Int32X4, v0 *arm.Poly128)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS32P16 VreinterpretqS32P16\n//go:noescape\nfunc VreinterpretqS32P16(r *arm.Int32X4, v0 *arm.Poly16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS32P64 VreinterpretqS32P64\n//go:noescape\nfunc VreinterpretqS32P64(r *arm.Int32X4, v0 *arm.Poly64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS32P8 VreinterpretqS32P8\n//go:noescape\nfunc VreinterpretqS32P8(r *arm.Int32X4, v0 *arm.Poly8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS64S8 VreinterpretqS64S8\n//go:noescape\nfunc VreinterpretqS64S8(r *arm.Int64X2, v0 *arm.Int8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS64S16 VreinterpretqS64S16\n//go:noescape\nfunc VreinterpretqS64S16(r *arm.Int64X2, v0 *arm.Int16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS64S32 VreinterpretqS64S32\n//go:noescape\nfunc VreinterpretqS64S32(r *arm.Int64X2, v0 *arm.Int32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS64U8 VreinterpretqS64U8\n//go:noescape\nfunc VreinterpretqS64U8(r *arm.Int64X2, v0 *arm.Uint8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS64U16 VreinterpretqS64U16\n//go:noescape\nfunc VreinterpretqS64U16(r *arm.Int64X2, v0 *arm.Uint16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS64U32 
VreinterpretqS64U32\n//go:noescape\nfunc VreinterpretqS64U32(r *arm.Int64X2, v0 *arm.Uint32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS64U64 VreinterpretqS64U64\n//go:noescape\nfunc VreinterpretqS64U64(r *arm.Int64X2, v0 *arm.Uint64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS64F32 VreinterpretqS64F32\n//go:noescape\nfunc VreinterpretqS64F32(r *arm.Int64X2, v0 *arm.Float32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS64F64 VreinterpretqS64F64\n//go:noescape\nfunc VreinterpretqS64F64(r *arm.Int64X2, v0 *arm.Float64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS64P128 VreinterpretqS64P128\n//go:noescape\nfunc VreinterpretqS64P128(r *arm.Int64X2, v0 *arm.Poly128)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS64P16 VreinterpretqS64P16\n//go:noescape\nfunc VreinterpretqS64P16(r *arm.Int64X2, v0 *arm.Poly16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS64P64 VreinterpretqS64P64\n//go:noescape\nfunc VreinterpretqS64P64(r *arm.Int64X2, v0 *arm.Poly64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS64P8 VreinterpretqS64P8\n//go:noescape\nfunc VreinterpretqS64P8(r *arm.Int64X2, v0 *arm.Poly8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS8S16 VreinterpretqS8S16\n//go:noescape\nfunc VreinterpretqS8S16(r *arm.Int8X16, v0 *arm.Int16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS8S32 VreinterpretqS8S32\n//go:noescape\nfunc VreinterpretqS8S32(r *arm.Int8X16, v0 *arm.Int32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS8S64 VreinterpretqS8S64\n//go:noescape\nfunc VreinterpretqS8S64(r *arm.Int8X16, v0 *arm.Int64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS8U8 VreinterpretqS8U8\n//go:noescape\nfunc VreinterpretqS8U8(r *arm.Int8X16, v0 *arm.Uint8X16)\n\n// 
Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS8U16 VreinterpretqS8U16\n//go:noescape\nfunc VreinterpretqS8U16(r *arm.Int8X16, v0 *arm.Uint16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS8U32 VreinterpretqS8U32\n//go:noescape\nfunc VreinterpretqS8U32(r *arm.Int8X16, v0 *arm.Uint32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS8U64 VreinterpretqS8U64\n//go:noescape\nfunc VreinterpretqS8U64(r *arm.Int8X16, v0 *arm.Uint64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS8F32 VreinterpretqS8F32\n//go:noescape\nfunc VreinterpretqS8F32(r *arm.Int8X16, v0 *arm.Float32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS8F64 VreinterpretqS8F64\n//go:noescape\nfunc VreinterpretqS8F64(r *arm.Int8X16, v0 *arm.Float64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS8P128 VreinterpretqS8P128\n//go:noescape\nfunc VreinterpretqS8P128(r *arm.Int8X16, v0 *arm.Poly128)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS8P16 VreinterpretqS8P16\n//go:noescape\nfunc VreinterpretqS8P16(r *arm.Int8X16, v0 *arm.Poly16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS8P64 VreinterpretqS8P64\n//go:noescape\nfunc VreinterpretqS8P64(r *arm.Int8X16, v0 *arm.Poly64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS8P8 VreinterpretqS8P8\n//go:noescape\nfunc VreinterpretqS8P8(r *arm.Int8X16, v0 *arm.Poly8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU16S8 VreinterpretqU16S8\n//go:noescape\nfunc VreinterpretqU16S8(r *arm.Uint16X8, v0 *arm.Int8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU16S16 VreinterpretqU16S16\n//go:noescape\nfunc VreinterpretqU16S16(r *arm.Uint16X8, v0 *arm.Int16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU16S32 VreinterpretqU16S32\n//go:noescape\nfunc 
VreinterpretqU16S32(r *arm.Uint16X8, v0 *arm.Int32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU16S64 VreinterpretqU16S64\n//go:noescape\nfunc VreinterpretqU16S64(r *arm.Uint16X8, v0 *arm.Int64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU16U8 VreinterpretqU16U8\n//go:noescape\nfunc VreinterpretqU16U8(r *arm.Uint16X8, v0 *arm.Uint8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU16U32 VreinterpretqU16U32\n//go:noescape\nfunc VreinterpretqU16U32(r *arm.Uint16X8, v0 *arm.Uint32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU16U64 VreinterpretqU16U64\n//go:noescape\nfunc VreinterpretqU16U64(r *arm.Uint16X8, v0 *arm.Uint64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU16F32 VreinterpretqU16F32\n//go:noescape\nfunc VreinterpretqU16F32(r *arm.Uint16X8, v0 *arm.Float32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU16F64 VreinterpretqU16F64\n//go:noescape\nfunc VreinterpretqU16F64(r *arm.Uint16X8, v0 *arm.Float64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU16P128 VreinterpretqU16P128\n//go:noescape\nfunc VreinterpretqU16P128(r *arm.Uint16X8, v0 *arm.Poly128)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU16P16 VreinterpretqU16P16\n//go:noescape\nfunc VreinterpretqU16P16(r *arm.Uint16X8, v0 *arm.Poly16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU16P64 VreinterpretqU16P64\n//go:noescape\nfunc VreinterpretqU16P64(r *arm.Uint16X8, v0 *arm.Poly64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU16P8 VreinterpretqU16P8\n//go:noescape\nfunc VreinterpretqU16P8(r *arm.Uint16X8, v0 *arm.Poly8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU32S8 VreinterpretqU32S8\n//go:noescape\nfunc VreinterpretqU32S8(r *arm.Uint32X4, v0 *arm.Int8X16)\n\n// Vector reinterpret 
cast operation\n//\n//go:linkname VreinterpretqU32S16 VreinterpretqU32S16\n//go:noescape\nfunc VreinterpretqU32S16(r *arm.Uint32X4, v0 *arm.Int16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU32S32 VreinterpretqU32S32\n//go:noescape\nfunc VreinterpretqU32S32(r *arm.Uint32X4, v0 *arm.Int32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU32S64 VreinterpretqU32S64\n//go:noescape\nfunc VreinterpretqU32S64(r *arm.Uint32X4, v0 *arm.Int64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU32U8 VreinterpretqU32U8\n//go:noescape\nfunc VreinterpretqU32U8(r *arm.Uint32X4, v0 *arm.Uint8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU32U16 VreinterpretqU32U16\n//go:noescape\nfunc VreinterpretqU32U16(r *arm.Uint32X4, v0 *arm.Uint16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU32U64 VreinterpretqU32U64\n//go:noescape\nfunc VreinterpretqU32U64(r *arm.Uint32X4, v0 *arm.Uint64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU32F32 VreinterpretqU32F32\n//go:noescape\nfunc VreinterpretqU32F32(r *arm.Uint32X4, v0 *arm.Float32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU32F64 VreinterpretqU32F64\n//go:noescape\nfunc VreinterpretqU32F64(r *arm.Uint32X4, v0 *arm.Float64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU32P128 VreinterpretqU32P128\n//go:noescape\nfunc VreinterpretqU32P128(r *arm.Uint32X4, v0 *arm.Poly128)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU32P16 VreinterpretqU32P16\n//go:noescape\nfunc VreinterpretqU32P16(r *arm.Uint32X4, v0 *arm.Poly16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU32P64 VreinterpretqU32P64\n//go:noescape\nfunc VreinterpretqU32P64(r *arm.Uint32X4, v0 *arm.Poly64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU32P8 
VreinterpretqU32P8\n//go:noescape\nfunc VreinterpretqU32P8(r *arm.Uint32X4, v0 *arm.Poly8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU64S8 VreinterpretqU64S8\n//go:noescape\nfunc VreinterpretqU64S8(r *arm.Uint64X2, v0 *arm.Int8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU64S16 VreinterpretqU64S16\n//go:noescape\nfunc VreinterpretqU64S16(r *arm.Uint64X2, v0 *arm.Int16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU64S32 VreinterpretqU64S32\n//go:noescape\nfunc VreinterpretqU64S32(r *arm.Uint64X2, v0 *arm.Int32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU64S64 VreinterpretqU64S64\n//go:noescape\nfunc VreinterpretqU64S64(r *arm.Uint64X2, v0 *arm.Int64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU64U8 VreinterpretqU64U8\n//go:noescape\nfunc VreinterpretqU64U8(r *arm.Uint64X2, v0 *arm.Uint8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU64U16 VreinterpretqU64U16\n//go:noescape\nfunc VreinterpretqU64U16(r *arm.Uint64X2, v0 *arm.Uint16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU64U32 VreinterpretqU64U32\n//go:noescape\nfunc VreinterpretqU64U32(r *arm.Uint64X2, v0 *arm.Uint32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU64F32 VreinterpretqU64F32\n//go:noescape\nfunc VreinterpretqU64F32(r *arm.Uint64X2, v0 *arm.Float32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU64F64 VreinterpretqU64F64\n//go:noescape\nfunc VreinterpretqU64F64(r *arm.Uint64X2, v0 *arm.Float64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU64P128 VreinterpretqU64P128\n//go:noescape\nfunc VreinterpretqU64P128(r *arm.Uint64X2, v0 *arm.Poly128)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU64P16 VreinterpretqU64P16\n//go:noescape\nfunc VreinterpretqU64P16(r *arm.Uint64X2, v0 
*arm.Poly16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU64P64 VreinterpretqU64P64\n//go:noescape\nfunc VreinterpretqU64P64(r *arm.Uint64X2, v0 *arm.Poly64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU64P8 VreinterpretqU64P8\n//go:noescape\nfunc VreinterpretqU64P8(r *arm.Uint64X2, v0 *arm.Poly8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU8S8 VreinterpretqU8S8\n//go:noescape\nfunc VreinterpretqU8S8(r *arm.Uint8X16, v0 *arm.Int8X16)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU8S16 VreinterpretqU8S16\n//go:noescape\nfunc VreinterpretqU8S16(r *arm.Uint8X16, v0 *arm.Int16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU8S32 VreinterpretqU8S32\n//go:noescape\nfunc VreinterpretqU8S32(r *arm.Uint8X16, v0 *arm.Int32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU8S64 VreinterpretqU8S64\n//go:noescape\nfunc VreinterpretqU8S64(r *arm.Uint8X16, v0 *arm.Int64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU8U16 VreinterpretqU8U16\n//go:noescape\nfunc VreinterpretqU8U16(r *arm.Uint8X16, v0 *arm.Uint16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU8U32 VreinterpretqU8U32\n//go:noescape\nfunc VreinterpretqU8U32(r *arm.Uint8X16, v0 *arm.Uint32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU8U64 VreinterpretqU8U64\n//go:noescape\nfunc VreinterpretqU8U64(r *arm.Uint8X16, v0 *arm.Uint64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU8F32 VreinterpretqU8F32\n//go:noescape\nfunc VreinterpretqU8F32(r *arm.Uint8X16, v0 *arm.Float32X4)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU8F64 VreinterpretqU8F64\n//go:noescape\nfunc VreinterpretqU8F64(r *arm.Uint8X16, v0 *arm.Float64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU8P128 
VreinterpretqU8P128\n//go:noescape\nfunc VreinterpretqU8P128(r *arm.Uint8X16, v0 *arm.Poly128)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU8P16 VreinterpretqU8P16\n//go:noescape\nfunc VreinterpretqU8P16(r *arm.Uint8X16, v0 *arm.Poly16X8)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU8P64 VreinterpretqU8P64\n//go:noescape\nfunc VreinterpretqU8P64(r *arm.Uint8X16, v0 *arm.Poly64X2)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU8P8 VreinterpretqU8P8\n//go:noescape\nfunc VreinterpretqU8P8(r *arm.Uint8X16, v0 *arm.Poly8X16)\n\n// Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev16S8 Vrev16S8\n//go:noescape\nfunc Vrev16S8(r *arm.Int8X8, v0 *arm.Int8X8)\n\n// Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev16U8 Vrev16U8\n//go:noescape\nfunc Vrev16U8(r *arm.Uint8X8, v0 *arm.Uint8X8)\n\n// Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev16P8 Vrev16P8\n//go:noescape\nfunc Vrev16P8(r *arm.Poly8X8, v0 *arm.Poly8X8)\n\n// Reverse elements in 16-bit halfwords (vector). 
This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev16QS8 Vrev16QS8\n//go:noescape\nfunc Vrev16QS8(r *arm.Int8X16, v0 *arm.Int8X16)\n\n// Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev16QU8 Vrev16QU8\n//go:noescape\nfunc Vrev16QU8(r *arm.Uint8X16, v0 *arm.Uint8X16)\n\n// Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev16QP8 Vrev16QP8\n//go:noescape\nfunc Vrev16QP8(r *arm.Poly8X16, v0 *arm.Poly8X16)\n\n// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev32S8 Vrev32S8\n//go:noescape\nfunc Vrev32S8(r *arm.Int8X8, v0 *arm.Int8X8)\n\n// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev32S16 Vrev32S16\n//go:noescape\nfunc Vrev32S16(r *arm.Int16X4, v0 *arm.Int16X4)\n\n// Reverse elements in 32-bit words (vector). 
This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev32U8 Vrev32U8\n//go:noescape\nfunc Vrev32U8(r *arm.Uint8X8, v0 *arm.Uint8X8)\n\n// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev32U16 Vrev32U16\n//go:noescape\nfunc Vrev32U16(r *arm.Uint16X4, v0 *arm.Uint16X4)\n\n// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev32P16 Vrev32P16\n//go:noescape\nfunc Vrev32P16(r *arm.Poly16X4, v0 *arm.Poly16X4)\n\n// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev32P8 Vrev32P8\n//go:noescape\nfunc Vrev32P8(r *arm.Poly8X8, v0 *arm.Poly8X8)\n\n// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev32QS8 Vrev32QS8\n//go:noescape\nfunc Vrev32QS8(r *arm.Int8X16, v0 *arm.Int8X16)\n\n// Reverse elements in 32-bit words (vector). 
This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev32QS16 Vrev32QS16\n//go:noescape\nfunc Vrev32QS16(r *arm.Int16X8, v0 *arm.Int16X8)\n\n// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev32QU8 Vrev32QU8\n//go:noescape\nfunc Vrev32QU8(r *arm.Uint8X16, v0 *arm.Uint8X16)\n\n// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev32QU16 Vrev32QU16\n//go:noescape\nfunc Vrev32QU16(r *arm.Uint16X8, v0 *arm.Uint16X8)\n\n// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev32QP16 Vrev32QP16\n//go:noescape\nfunc Vrev32QP16(r *arm.Poly16X8, v0 *arm.Poly16X8)\n\n// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev32QP8 Vrev32QP8\n//go:noescape\nfunc Vrev32QP8(r *arm.Poly8X16, v0 *arm.Poly8X16)\n\n// Reverse elements in 64-bit doublewords (vector). 
This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64S8 Vrev64S8\n//go:noescape\nfunc Vrev64S8(r *arm.Int8X8, v0 *arm.Int8X8)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64S16 Vrev64S16\n//go:noescape\nfunc Vrev64S16(r *arm.Int16X4, v0 *arm.Int16X4)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64S32 Vrev64S32\n//go:noescape\nfunc Vrev64S32(r *arm.Int32X2, v0 *arm.Int32X2)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64U8 Vrev64U8\n//go:noescape\nfunc Vrev64U8(r *arm.Uint8X8, v0 *arm.Uint8X8)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64U16 Vrev64U16\n//go:noescape\nfunc Vrev64U16(r *arm.Uint16X4, v0 *arm.Uint16X4)\n\n// Reverse elements in 64-bit doublewords (vector). 
This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64U32 Vrev64U32\n//go:noescape\nfunc Vrev64U32(r *arm.Uint32X2, v0 *arm.Uint32X2)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64F32 Vrev64F32\n//go:noescape\nfunc Vrev64F32(r *arm.Float32X2, v0 *arm.Float32X2)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64P16 Vrev64P16\n//go:noescape\nfunc Vrev64P16(r *arm.Poly16X4, v0 *arm.Poly16X4)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64P8 Vrev64P8\n//go:noescape\nfunc Vrev64P8(r *arm.Poly8X8, v0 *arm.Poly8X8)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64QS8 Vrev64QS8\n//go:noescape\nfunc Vrev64QS8(r *arm.Int8X16, v0 *arm.Int8X16)\n\n// Reverse elements in 64-bit doublewords (vector). 
This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64QS16 Vrev64QS16\n//go:noescape\nfunc Vrev64QS16(r *arm.Int16X8, v0 *arm.Int16X8)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64QS32 Vrev64QS32\n//go:noescape\nfunc Vrev64QS32(r *arm.Int32X4, v0 *arm.Int32X4)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64QU8 Vrev64QU8\n//go:noescape\nfunc Vrev64QU8(r *arm.Uint8X16, v0 *arm.Uint8X16)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64QU16 Vrev64QU16\n//go:noescape\nfunc Vrev64QU16(r *arm.Uint16X8, v0 *arm.Uint16X8)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64QU32 Vrev64QU32\n//go:noescape\nfunc Vrev64QU32(r *arm.Uint32X4, v0 *arm.Uint32X4)\n\n// Reverse elements in 64-bit doublewords (vector). 
This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64QF32 Vrev64QF32\n//go:noescape\nfunc Vrev64QF32(r *arm.Float32X4, v0 *arm.Float32X4)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64QP16 Vrev64QP16\n//go:noescape\nfunc Vrev64QP16(r *arm.Poly16X8, v0 *arm.Poly16X8)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64QP8 Vrev64QP8\n//go:noescape\nfunc Vrev64QP8(r *arm.Poly8X16, v0 *arm.Poly8X16)\n\n// Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded, so each lane yields (v0 + v1 + 1) >> 1.\n//\n//go:linkname VrhaddS8 VrhaddS8\n//go:noescape\nfunc VrhaddS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded, so each lane yields (v0 + v1 + 1) >> 1.\n//\n//go:linkname VrhaddS16 VrhaddS16\n//go:noescape\nfunc VrhaddS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed Rounding Halving Add. 
This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded, so each lane yields (v0 + v1 + 1) >> 1.\n//\n//go:linkname VrhaddS32 VrhaddS32\n//go:noescape\nfunc VrhaddS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded, so each lane yields (v0 + v1 + 1) >> 1.\n//\n//go:linkname VrhaddU8 VrhaddU8\n//go:noescape\nfunc VrhaddU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded, so each lane yields (v0 + v1 + 1) >> 1.\n//\n//go:linkname VrhaddU16 VrhaddU16\n//go:noescape\nfunc VrhaddU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded, so each lane yields (v0 + v1 + 1) >> 1.\n//\n//go:linkname VrhaddU32 VrhaddU32\n//go:noescape\nfunc VrhaddU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. 
The results are rounded, so each lane yields (v0 + v1 + 1) >> 1.\n//\n//go:linkname VrhaddqS8 VrhaddqS8\n//go:noescape\nfunc VrhaddqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Signed Rounding Halving Add. 
This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded, so each lane yields (v0 + v1 + 1) >> 1.\n//\n//go:linkname VrhaddqS16 VrhaddqS16\n//go:noescape\nfunc VrhaddqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded, so each lane yields (v0 + v1 + 1) >> 1.\n//\n//go:linkname VrhaddqS32 VrhaddqS32\n//go:noescape\nfunc VrhaddqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded, so each lane yields (v0 + v1 + 1) >> 1.\n//\n//go:linkname VrhaddqU8 VrhaddqU8\n//go:noescape\nfunc VrhaddqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded, so each lane yields (v0 + v1 + 1) >> 1.\n//\n//go:linkname VrhaddqU16 VrhaddqU16\n//go:noescape\nfunc VrhaddqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. 
The results are rounded, so each lane yields (v0 + v1 + 1) >> 1.\n//\n//go:linkname VrhaddqU32 VrhaddqU32\n//go:noescape\nfunc VrhaddqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Floating-point Round to Integral, toward Zero (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndF32 VrndF32\n//go:noescape\nfunc VrndF32(r *arm.Float32X2, v0 *arm.Float32X2)\n\n// Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndF64 VrndF64\n//go:noescape\nfunc VrndF64(r *arm.Float64X1, v0 *arm.Float64X1)\n\n// Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd32XF32 Vrnd32XF32\n//go:noescape\nfunc Vrnd32XF32(r *arm.Float32X2, v0 *arm.Float32X2)\n\n// Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd32XF64 Vrnd32XF64\n//go:noescape\nfunc Vrnd32XF64(r *arm.Float64X1, v0 *arm.Float64X1)\n\n// Floating-point Round to 32-bit Integer, using current rounding mode (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd32XqF32 Vrnd32XqF32\n//go:noescape\nfunc Vrnd32XqF32(r *arm.Float32X4, v0 *arm.Float32X4)\n\n// Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd32XqF64 Vrnd32XqF64\n//go:noescape\nfunc Vrnd32XqF64(r *arm.Float64X2, v0 *arm.Float64X2)\n\n// Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd32ZF32 Vrnd32ZF32\n//go:noescape\nfunc Vrnd32ZF32(r *arm.Float32X2, v0 *arm.Float32X2)\n\n// Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd32ZF64 Vrnd32ZF64\n//go:noescape\nfunc Vrnd32ZF64(r *arm.Float64X1, v0 *arm.Float64X1)\n\n// Floating-point Round to 32-bit Integer toward Zero (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd32ZqF32 Vrnd32ZqF32\n//go:noescape\nfunc Vrnd32ZqF32(r *arm.Float32X4, v0 *arm.Float32X4)\n\n// Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd32ZqF64 Vrnd32ZqF64\n//go:noescape\nfunc Vrnd32ZqF64(r *arm.Float64X2, v0 *arm.Float64X2)\n\n// Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd64XF32 Vrnd64XF32\n//go:noescape\nfunc Vrnd64XF32(r *arm.Float32X2, v0 *arm.Float32X2)\n\n// Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd64XF64 Vrnd64XF64\n//go:noescape\nfunc Vrnd64XF64(r *arm.Float64X1, v0 *arm.Float64X1)\n\n// Floating-point Round to 64-bit Integer, using current rounding mode (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd64XqF32 Vrnd64XqF32\n//go:noescape\nfunc Vrnd64XqF32(r *arm.Float32X4, v0 *arm.Float32X4)\n\n// Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd64XqF64 Vrnd64XqF64\n//go:noescape\nfunc Vrnd64XqF64(r *arm.Float64X2, v0 *arm.Float64X2)\n\n// Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd64ZF32 Vrnd64ZF32\n//go:noescape\nfunc Vrnd64ZF32(r *arm.Float32X2, v0 *arm.Float32X2)\n\n// Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd64ZF64 Vrnd64ZF64\n//go:noescape\nfunc Vrnd64ZF64(r *arm.Float64X1, v0 *arm.Float64X1)\n\n// Floating-point Round to 64-bit Integer toward Zero (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd64ZqF32 Vrnd64ZqF32\n//go:noescape\nfunc Vrnd64ZqF32(r *arm.Float32X4, v0 *arm.Float32X4)\n\n// Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd64ZqF64 Vrnd64ZqF64\n//go:noescape\nfunc Vrnd64ZqF64(r *arm.Float64X2, v0 *arm.Float64X2)\n\n// Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndaF32 VrndaF32\n//go:noescape\nfunc VrndaF32(r *arm.Float32X2, v0 *arm.Float32X2)\n\n// Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndaF64 VrndaF64\n//go:noescape\nfunc VrndaF64(r *arm.Float64X1, v0 *arm.Float64X1)\n\n// Floating-point Round to Integral, to nearest with ties to Away (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndaqF32 VrndaqF32\n//go:noescape\nfunc VrndaqF32(r *arm.Float32X4, v0 *arm.Float32X4)\n\n// Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndaqF64 VrndaqF64\n//go:noescape\nfunc VrndaqF64(r *arm.Float64X2, v0 *arm.Float64X2)\n\n// Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndiF32 VrndiF32\n//go:noescape\nfunc VrndiF32(r *arm.Float32X2, v0 *arm.Float32X2)\n\n// Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndiF64 VrndiF64\n//go:noescape\nfunc VrndiF64(r *arm.Float64X1, v0 *arm.Float64X1)\n\n// Floating-point Round to Integral, using current rounding mode (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndiqF32 VrndiqF32\n//go:noescape\nfunc VrndiqF32(r *arm.Float32X4, v0 *arm.Float32X4)\n\n// Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndiqF64 VrndiqF64\n//go:noescape\nfunc VrndiqF64(r *arm.Float64X2, v0 *arm.Float64X2)\n\n// Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndmF32 VrndmF32\n//go:noescape\nfunc VrndmF32(r *arm.Float32X2, v0 *arm.Float32X2)\n\n// Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndmF64 VrndmF64\n//go:noescape\nfunc VrndmF64(r *arm.Float64X1, v0 *arm.Float64X1)\n\n// Floating-point Round to Integral, toward Minus infinity (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndmqF32 VrndmqF32\n//go:noescape\nfunc VrndmqF32(r *arm.Float32X4, v0 *arm.Float32X4)\n\n// Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndmqF64 VrndmqF64\n//go:noescape\nfunc VrndmqF64(r *arm.Float64X2, v0 *arm.Float64X2)\n\n// Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndnF32 VrndnF32\n//go:noescape\nfunc VrndnF32(r *arm.Float32X2, v0 *arm.Float32X2)\n\n// Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndnF64 VrndnF64\n//go:noescape\nfunc VrndnF64(r *arm.Float64X1, v0 *arm.Float64X1)\n\n// Floating-point Round to Integral, to nearest with ties to even (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndnqF32 VrndnqF32\n//go:noescape\nfunc VrndnqF32(r *arm.Float32X4, v0 *arm.Float32X4)\n\n// Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndnqF64 VrndnqF64\n//go:noescape\nfunc VrndnqF64(r *arm.Float64X2, v0 *arm.Float64X2)\n\n// Floating-point Round to Integral, to nearest with ties to even (scalar). This instruction rounds the floating-point value in the SIMD&FP source register to an integral floating-point value of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndnsF32 VrndnsF32\n//go:noescape\nfunc VrndnsF32(r *arm.Float32, v0 *arm.Float32)\n\n// Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndpF32 VrndpF32\n//go:noescape\nfunc VrndpF32(r *arm.Float32X2, v0 *arm.Float32X2)\n\n// Floating-point Round to Integral, toward Plus infinity (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndpF64 VrndpF64\n//go:noescape\nfunc VrndpF64(r *arm.Float64X1, v0 *arm.Float64X1)\n\n// Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndpqF32 VrndpqF32\n//go:noescape\nfunc VrndpqF32(r *arm.Float32X4, v0 *arm.Float32X4)\n\n// Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndpqF64 VrndpqF64\n//go:noescape\nfunc VrndpqF64(r *arm.Float64X2, v0 *arm.Float64X2)\n\n// Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndqF32 VrndqF32\n//go:noescape\nfunc VrndqF32(r *arm.Float32X4, v0 *arm.Float32X4)\n\n// Floating-point Round to Integral, toward Zero (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndqF64 VrndqF64\n//go:noescape\nfunc VrndqF64(r *arm.Float64X2, v0 *arm.Float64X2)\n\n// Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndxF32 VrndxF32\n//go:noescape\nfunc VrndxF32(r *arm.Float32X2, v0 *arm.Float32X2)\n\n// Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndxF64 VrndxF64\n//go:noescape\nfunc VrndxF64(r *arm.Float64X1, v0 *arm.Float64X1)\n\n// Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndxqF32 VrndxqF32\n//go:noescape\nfunc VrndxqF32(r *arm.Float32X4, v0 *arm.Float32X4)\n\n// Floating-point Round to Integral exact, using current rounding mode (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndxqF64 VrndxqF64\n//go:noescape\nfunc VrndxqF64(r *arm.Float64X2, v0 *arm.Float64X2)\n\n// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlS8 VrshlS8\n//go:noescape\nfunc VrshlS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlS16 VrshlS16\n//go:noescape\nfunc VrshlS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlS32 VrshlS32\n//go:noescape\nfunc VrshlS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Signed Rounding Shift Left (register). 
This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlS64 VrshlS64\n//go:noescape\nfunc VrshlS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)\n\n// Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlU8 VrshlU8\n//go:noescape\nfunc VrshlU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Int8X8)\n\n// Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlU16 VrshlU16\n//go:noescape\nfunc VrshlU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Int16X4)\n\n// Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlU32 VrshlU32\n//go:noescape\nfunc VrshlU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Int32X2)\n\n// Unsigned Rounding Shift Left (register). 
This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlU64 VrshlU64\n//go:noescape\nfunc VrshlU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Int64X1)\n\n// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshldS64 VrshldS64\n//go:noescape\nfunc VrshldS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64)\n\n// Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshldU64 VrshldU64\n//go:noescape\nfunc VrshldU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Int64)\n\n// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlqS8 VrshlqS8\n//go:noescape\nfunc VrshlqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Signed Rounding Shift Left (register). 
This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlqS16 VrshlqS16\n//go:noescape\nfunc VrshlqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlqS32 VrshlqS32\n//go:noescape\nfunc VrshlqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlqS64 VrshlqS64\n//go:noescape\nfunc VrshlqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlqU8 VrshlqU8\n//go:noescape\nfunc VrshlqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Int8X16)\n\n// Unsigned Rounding Shift Left (register). 
This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlqU16 VrshlqU16\n//go:noescape\nfunc VrshlqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Int16X8)\n\n// Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlqU32 VrshlqU32\n//go:noescape\nfunc VrshlqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Int32X4)\n\n// Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlqU64 VrshlqU64\n//go:noescape\nfunc VrshlqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Int64X2)\n\n// Unsigned Reciprocal Square Root Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse square root for each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VrsqrteU32 VrsqrteU32\n//go:noescape\nfunc VrsqrteU32(r *arm.Uint32X2, v0 *arm.Uint32X2)\n\n// Floating-point Reciprocal Square Root Estimate. 
This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrteF32 VrsqrteF32\n//go:noescape\nfunc VrsqrteF32(r *arm.Float32X2, v0 *arm.Float32X2)\n\n// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrteF64 VrsqrteF64\n//go:noescape\nfunc VrsqrteF64(r *arm.Float64X1, v0 *arm.Float64X1)\n\n// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrtedF64 VrsqrtedF64\n//go:noescape\nfunc VrsqrtedF64(r *arm.Float64, v0 *arm.Float64)\n\n// Unsigned Reciprocal Square Root Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse square root for each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VrsqrteqU32 VrsqrteqU32\n//go:noescape\nfunc VrsqrteqU32(r *arm.Uint32X4, v0 *arm.Uint32X4)\n\n// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrteqF32 VrsqrteqF32\n//go:noescape\nfunc VrsqrteqF32(r *arm.Float32X4, v0 *arm.Float32X4)\n\n// Floating-point Reciprocal Square Root Estimate. 
This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrteqF64 VrsqrteqF64\n//go:noescape\nfunc VrsqrteqF64(r *arm.Float64X2, v0 *arm.Float64X2)\n\n// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrtesF32 VrsqrtesF32\n//go:noescape\nfunc VrsqrtesF32(r *arm.Float32, v0 *arm.Float32)\n\n// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrtsF32 VrsqrtsF32\n//go:noescape\nfunc VrsqrtsF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrtsF64 VrsqrtsF64\n//go:noescape\nfunc VrsqrtsF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Floating-point Reciprocal Square Root Step. 
This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrtsdF64 VrsqrtsdF64\n//go:noescape\nfunc VrsqrtsdF64(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64)\n\n// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrtsqF32 VrsqrtsqF32\n//go:noescape\nfunc VrsqrtsqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrtsqF64 VrsqrtsqF64\n//go:noescape\nfunc VrsqrtsqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrtssF32 VrsqrtssF32\n//go:noescape\nfunc VrsqrtssF32(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32)\n\n// Rounding Subtract returning High Narrow. 
This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VrsubhnS16 VrsubhnS16\n//go:noescape\nfunc VrsubhnS16(r *arm.Int8X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VrsubhnS32 VrsubhnS32\n//go:noescape\nfunc VrsubhnS32(r *arm.Int16X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VrsubhnS64 VrsubhnS64\n//go:noescape\nfunc VrsubhnS64(r *arm.Int32X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VrsubhnU16 VrsubhnU16\n//go:noescape\nfunc VrsubhnU16(r *arm.Uint8X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Rounding Subtract returning High Narrow. 
This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VrsubhnU32 VrsubhnU32\n//go:noescape\nfunc VrsubhnU32(r *arm.Uint16X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VrsubhnU64 VrsubhnU64\n//go:noescape\nfunc VrsubhnU64(r *arm.Uint32X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VrsubhnHighS16 VrsubhnHighS16\n//go:noescape\nfunc VrsubhnHighS16(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int16X8, v2 *arm.Int16X8)\n\n// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VrsubhnHighS32 VrsubhnHighS32\n//go:noescape\nfunc VrsubhnHighS32(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int32X4, v2 *arm.Int32X4)\n\n// Rounding Subtract returning High Narrow. 
This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VrsubhnHighS64 VrsubhnHighS64\n//go:noescape\nfunc VrsubhnHighS64(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int64X2, v2 *arm.Int64X2)\n\n// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VrsubhnHighU16 VrsubhnHighU16\n//go:noescape\nfunc VrsubhnHighU16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)\n\n// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VrsubhnHighU32 VrsubhnHighU32\n//go:noescape\nfunc VrsubhnHighU32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)\n\n// Rounding Subtract returning High Narrow. 
This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.\n//\n//go:linkname VrsubhnHighU64 VrsubhnHighU64\n//go:noescape\nfunc VrsubhnHighU64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)\n\n// SHA1 hash update (choose).\n//\n//go:linkname Vsha1CqU32 Vsha1CqU32\n//go:noescape\nfunc Vsha1CqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32, v2 *arm.Uint32X4)\n\n// SHA1 fixed rotate.\n//\n//go:linkname Vsha1HU32 Vsha1HU32\n//go:noescape\nfunc Vsha1HU32(r *arm.Uint32, v0 *arm.Uint32)\n\n// SHA1 hash update (majority).\n//\n//go:linkname Vsha1MqU32 Vsha1MqU32\n//go:noescape\nfunc Vsha1MqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32, v2 *arm.Uint32X4)\n\n// SHA1 hash update (parity).\n//\n//go:linkname Vsha1PqU32 Vsha1PqU32\n//go:noescape\nfunc Vsha1PqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32, v2 *arm.Uint32X4)\n\n// SHA1 schedule update 0.\n//\n//go:linkname Vsha1Su0QU32 Vsha1Su0QU32\n//go:noescape\nfunc Vsha1Su0QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)\n\n// SHA1 schedule update 1.\n//\n//go:linkname Vsha1Su1QU32 Vsha1Su1QU32\n//go:noescape\nfunc Vsha1Su1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// SHA256 hash update (part 2).\n//\n//go:linkname Vsha256H2QU32 Vsha256H2QU32\n//go:noescape\nfunc Vsha256H2QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)\n\n// SHA256 hash update (part 1).\n//\n//go:linkname Vsha256HqU32 Vsha256HqU32\n//go:noescape\nfunc Vsha256HqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)\n\n// SHA256 schedule update 0.\n//\n//go:linkname Vsha256Su0QU32 Vsha256Su0QU32\n//go:noescape\nfunc Vsha256Su0QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 
*arm.Uint32X4)\n\n// SHA256 schedule update 1.\n//\n//go:linkname Vsha256Su1QU32 Vsha256Su1QU32\n//go:noescape\nfunc Vsha256Su1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)\n\n// SHA512 Hash update part 2 takes the values from the three 128-bit source SIMD&FP registers and produces a 128-bit output value that combines the sigma0 and majority functions of two iterations of the SHA512 computation. It returns this value to the destination SIMD&FP register.\n//\n//go:linkname Vsha512H2QU64 Vsha512H2QU64\n//go:noescape\nfunc Vsha512H2QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)\n\n// SHA512 Hash update part 1 takes the values from the three 128-bit source SIMD&FP registers and produces a 128-bit output value that combines the sigma1 and chi functions of two iterations of the SHA512 computation. It returns this value to the destination SIMD&FP register.\n//\n//go:linkname Vsha512HqU64 Vsha512HqU64\n//go:noescape\nfunc Vsha512HqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)\n\n// SHA512 Schedule Update 0 takes the values from the two 128-bit source SIMD&FP registers and produces a 128-bit output value that combines the gamma0 functions of two iterations of the SHA512 schedule update that are performed after the first 16 iterations within a block. It returns this value to the destination SIMD&FP register.\n//\n//go:linkname Vsha512Su0QU64 Vsha512Su0QU64\n//go:noescape\nfunc Vsha512Su0QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// SHA512 Schedule Update 1 takes the values from the three source SIMD&FP registers and produces a 128-bit output value that combines the gamma1 functions of two iterations of the SHA512 schedule update that are performed after the first 16 iterations within a block. 
It returns this value to the destination SIMD&FP register.\n//\n//go:linkname Vsha512Su1QU64 Vsha512Su1QU64\n//go:noescape\nfunc Vsha512Su1QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)\n\n// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlS8 VshlS8\n//go:noescape\nfunc VshlS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlS16 VshlS16\n//go:noescape\nfunc VshlS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlS32 VshlS32\n//go:noescape\nfunc VshlS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Signed Shift Left (register). 
This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlS64 VshlS64\n//go:noescape\nfunc VshlS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)\n\n// Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlU8 VshlU8\n//go:noescape\nfunc VshlU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Int8X8)\n\n// Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlU16 VshlU16\n//go:noescape\nfunc VshlU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Int16X4)\n\n// Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlU32 VshlU32\n//go:noescape\nfunc VshlU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Int32X2)\n\n// Unsigned Shift Left (register). 
This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlU64 VshlU64\n//go:noescape\nfunc VshlU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Int64X1)\n\n// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshldS64 VshldS64\n//go:noescape\nfunc VshldS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64)\n\n// Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshldU64 VshldU64\n//go:noescape\nfunc VshldU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Int64)\n\n// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlqS8 VshlqS8\n//go:noescape\nfunc VshlqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Signed Shift Left (register). 
This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlqS16 VshlqS16\n//go:noescape\nfunc VshlqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlqS32 VshlqS32\n//go:noescape\nfunc VshlqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlqS64 VshlqS64\n//go:noescape\nfunc VshlqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlqU8 VshlqU8\n//go:noescape\nfunc VshlqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Int8X16)\n\n// Unsigned Shift Left (register). 
This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlqU16 VshlqU16\n//go:noescape\nfunc VshlqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Int16X8)\n\n// Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlqU32 VshlqU32\n//go:noescape\nfunc VshlqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Int32X4)\n\n// Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlqU64 VshlqU64\n//go:noescape\nfunc VshlqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Int64X2)\n\n// SM3PARTW1 takes three 128-bit vectors from the three source SIMD&FP registers and returns a 128-bit result in the destination SIMD&FP register. The result is obtained by a three-way exclusive OR of the elements within the input vectors with some fixed rotations; see the Operation pseudocode in the Arm Architecture Reference Manual for more information.\n//\n//go:linkname Vsm3Partw1QU32 Vsm3Partw1QU32\n//go:noescape\nfunc Vsm3Partw1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)\n\n// SM3PARTW2 takes three 128-bit vectors from three source SIMD&FP registers and returns a 128-bit result in the destination SIMD&FP register. 
The result is obtained by a three-way exclusive OR of the elements within the input vectors with some fixed rotations; see the Operation pseudocode in the Arm Architecture Reference Manual for more information.\n//\n//go:linkname Vsm3Partw2QU32 Vsm3Partw2QU32\n//go:noescape\nfunc Vsm3Partw2QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)\n\n// SM3SS1 rotates the top 32 bits of the 128-bit vector in the first source SIMD&FP register left by 12, and adds that 32-bit value to the two other 32-bit values held in the top 32 bits of each of the 128-bit vectors in the second and third source SIMD&FP registers, rotating this result left by 7 and writing the final result into the top 32 bits of the vector in the destination SIMD&FP register, with the bottom 96 bits of the vector being written to 0.\n//\n//go:linkname Vsm3Ss1QU32 Vsm3Ss1QU32\n//go:noescape\nfunc Vsm3Ss1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)\n\n// SM4 Key takes an input as a 128-bit vector from the first source SIMD&FP register and a 128-bit constant from the second SIMD&FP register. It derives four iterations of the output key, in accordance with the SM4 standard, returning the 128-bit result to the destination SIMD&FP register.\n//\n//go:linkname Vsm4EkeyqU32 Vsm4EkeyqU32\n//go:noescape\nfunc Vsm4EkeyqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// SM4 Encode takes input data as a 128-bit vector from the first source SIMD&FP register, and four iterations of the round key held as the elements of the 128-bit vector in the second source SIMD&FP register. It encrypts the data by four rounds, in accordance with the SM4 standard, returning the 128-bit result to the destination SIMD&FP register.\n//\n//go:linkname Vsm4EqU32 Vsm4EqU32\n//go:noescape\nfunc Vsm4EqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Unsigned saturating Accumulate of Signed value. 
This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VsqaddU8 VsqaddU8\n//go:noescape\nfunc VsqaddU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Int8X8)\n\n// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VsqaddU16 VsqaddU16\n//go:noescape\nfunc VsqaddU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Int16X4)\n\n// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VsqaddU32 VsqaddU32\n//go:noescape\nfunc VsqaddU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Int32X2)\n\n// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VsqaddU64 VsqaddU64\n//go:noescape\nfunc VsqaddU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Int64X1)\n\n// Unsigned saturating Accumulate of Signed value. 
This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VsqaddbU8 VsqaddbU8\n//go:noescape\nfunc VsqaddbU8(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Int8)\n\n// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VsqadddU64 VsqadddU64\n//go:noescape\nfunc VsqadddU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Int64)\n\n// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VsqaddhU16 VsqaddhU16\n//go:noescape\nfunc VsqaddhU16(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Int16)\n\n// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VsqaddqU8 VsqaddqU8\n//go:noescape\nfunc VsqaddqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Int8X16)\n\n// Unsigned saturating Accumulate of Signed value. 
This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VsqaddqU16 VsqaddqU16\n//go:noescape\nfunc VsqaddqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Int16X8)\n\n// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VsqaddqU32 VsqaddqU32\n//go:noescape\nfunc VsqaddqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Int32X4)\n\n// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VsqaddqU64 VsqaddqU64\n//go:noescape\nfunc VsqaddqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Int64X2)\n\n// Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register.\n//\n//go:linkname VsqaddsU32 VsqaddsU32\n//go:noescape\nfunc VsqaddsU32(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Int32)\n\n// Floating-point Square Root (vector). 
This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsqrtF32 VsqrtF32\n//go:noescape\nfunc VsqrtF32(r *arm.Float32X2, v0 *arm.Float32X2)\n\n// Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsqrtF64 VsqrtF64\n//go:noescape\nfunc VsqrtF64(r *arm.Float64X1, v0 *arm.Float64X1)\n\n// Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsqrtqF32 VsqrtqF32\n//go:noescape\nfunc VsqrtqF32(r *arm.Float32X4, v0 *arm.Float32X4)\n\n// Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsqrtqF64 VsqrtqF64\n//go:noescape\nfunc VsqrtqF64(r *arm.Float64X2, v0 *arm.Float64X2)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubS8 VsubS8\n//go:noescape\nfunc VsubS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Subtract (vector). 
This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubS16 VsubS16\n//go:noescape\nfunc VsubS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubS32 VsubS32\n//go:noescape\nfunc VsubS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubS64 VsubS64\n//go:noescape\nfunc VsubS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubU8 VsubU8\n//go:noescape\nfunc VsubU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubU16 VsubU16\n//go:noescape\nfunc VsubU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Subtract (vector). 
This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubU32 VsubU32\n//go:noescape\nfunc VsubU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubU64 VsubU64\n//go:noescape\nfunc VsubU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)\n\n// Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubF32 VsubF32\n//go:noescape\nfunc VsubF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubF64 VsubF64\n//go:noescape\nfunc VsubF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)\n\n// Subtract (vector). 
This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubdS64 VsubdS64\n//go:noescape\nfunc VsubdS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubdU64 VsubdU64\n//go:noescape\nfunc VsubdU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)\n\n// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VsubhnS16 VsubhnS16\n//go:noescape\nfunc VsubhnS16(r *arm.Int8X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VsubhnS32 VsubhnS32\n//go:noescape\nfunc VsubhnS32(r *arm.Int16X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Subtract returning High Narrow. 
This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VsubhnS64 VsubhnS64\n//go:noescape\nfunc VsubhnS64(r *arm.Int32X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VsubhnU16 VsubhnU16\n//go:noescape\nfunc VsubhnU16(r *arm.Uint8X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VsubhnU32 VsubhnU32\n//go:noescape\nfunc VsubhnU32(r *arm.Uint16X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. 
All the values in this instruction are unsigned integer values.\n//\n//go:linkname VsubhnU64 VsubhnU64\n//go:noescape\nfunc VsubhnU64(r *arm.Uint32X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VsubhnHighS16 VsubhnHighS16\n//go:noescape\nfunc VsubhnHighS16(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int16X8, v2 *arm.Int16X8)\n\n// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VsubhnHighS32 VsubhnHighS32\n//go:noescape\nfunc VsubhnHighS32(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int32X4, v2 *arm.Int32X4)\n\n// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VsubhnHighS64 VsubhnHighS64\n//go:noescape\nfunc VsubhnHighS64(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int64X2, v2 *arm.Int64X2)\n\n// Subtract returning High Narrow. 
This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VsubhnHighU16 VsubhnHighU16\n//go:noescape\nfunc VsubhnHighU16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)\n\n// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VsubhnHighU32 VsubhnHighU32\n//go:noescape\nfunc VsubhnHighU32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)\n\n// Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VsubhnHighU64 VsubhnHighU64\n//go:noescape\nfunc VsubhnHighU64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)\n\n// Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. 
All the values in this instruction are signed integer values. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VsublS8 VsublS8\n//go:noescape\nfunc VsublS8(r *arm.Int16X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VsublS16 VsublS16\n//go:noescape\nfunc VsublS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VsublS32 VsublS32\n//go:noescape\nfunc VsublS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Unsigned Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VsublU8 VsublU8\n//go:noescape\nfunc VsublU8(r *arm.Uint16X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Unsigned Subtract Long. 
This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VsublU16 VsublU16\n//go:noescape\nfunc VsublU16(r *arm.Uint32X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Unsigned Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VsublU32 VsublU32\n//go:noescape\nfunc VsublU32(r *arm.Uint64X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VsublHighS8 VsublHighS8\n//go:noescape\nfunc VsublHighS8(r *arm.Int16X8, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Signed Subtract Long. 
This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VsublHighS16 VsublHighS16\n//go:noescape\nfunc VsublHighS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VsublHighS32 VsublHighS32\n//go:noescape\nfunc VsublHighS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Unsigned Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VsublHighU8 VsublHighU8\n//go:noescape\nfunc VsublHighU8(r *arm.Uint16X8, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Unsigned Subtract Long. 
This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VsublHighU16 VsublHighU16\n//go:noescape\nfunc VsublHighU16(r *arm.Uint32X4, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Unsigned Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The destination vector elements are twice as long as the source vector elements.\n//\n//go:linkname VsublHighU32 VsublHighU32\n//go:noescape\nfunc VsublHighU32(r *arm.Uint64X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubqS8 VsubqS8\n//go:noescape\nfunc VsubqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubqS16 VsubqS16\n//go:noescape\nfunc VsubqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Subtract (vector). 
This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubqS32 VsubqS32\n//go:noescape\nfunc VsubqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubqS64 VsubqS64\n//go:noescape\nfunc VsubqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubqU8 VsubqU8\n//go:noescape\nfunc VsubqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubqU16 VsubqU16\n//go:noescape\nfunc VsubqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubqU32 VsubqU32\n//go:noescape\nfunc VsubqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Subtract (vector). 
This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubqU64 VsubqU64\n//go:noescape\nfunc VsubqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubqF32 VsubqF32\n//go:noescape\nfunc VsubqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubqF64 VsubqF64\n//go:noescape\nfunc VsubqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Signed Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer values.\n//\n//go:linkname VsubwS8 VsubwS8\n//go:noescape\nfunc VsubwS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X8)\n\n// Signed Subtract Wide. 
This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer values.\n//\n//go:linkname VsubwS16 VsubwS16\n//go:noescape\nfunc VsubwS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4)\n\n// Signed Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer values.\n//\n//go:linkname VsubwS32 VsubwS32\n//go:noescape\nfunc VsubwS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2)\n\n// Unsigned Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VsubwU8 VsubwU8\n//go:noescape\nfunc VsubwU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X8)\n\n// Unsigned Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VsubwU16 VsubwU16\n//go:noescape\nfunc VsubwU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4)\n\n// Unsigned Subtract Wide. 
This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VsubwU32 VsubwU32\n//go:noescape\nfunc VsubwU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2)\n\n// Signed Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer values.\n//\n//go:linkname VsubwHighS8 VsubwHighS8\n//go:noescape\nfunc VsubwHighS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X16)\n\n// Signed Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer values.\n//\n//go:linkname VsubwHighS16 VsubwHighS16\n//go:noescape\nfunc VsubwHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8)\n\n// Signed Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer values.\n//\n//go:linkname VsubwHighS32 VsubwHighS32\n//go:noescape\nfunc VsubwHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4)\n\n// Unsigned Subtract Wide. 
This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VsubwHighU8 VsubwHighU8\n//go:noescape\nfunc VsubwHighU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X16)\n\n// Unsigned Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VsubwHighU16 VsubwHighU16\n//go:noescape\nfunc VsubwHighU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8)\n\n// Unsigned Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VsubwHighU32 VsubwHighU32\n//go:noescape\nfunc VsubwHighU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. 
If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbl1S8 Vtbl1S8\n//go:noescape\nfunc Vtbl1S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbl1U8 Vtbl1U8\n//go:noescape\nfunc Vtbl1U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbl1P8 Vtbl1P8\n//go:noescape\nfunc Vtbl1P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Uint8X8)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. 
If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbl2S8 Vtbl2S8\n//go:noescape\nfunc Vtbl2S8(r *arm.Int8X8, v0 *arm.Int8X8X2, v1 *arm.Int8X8)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbl2U8 Vtbl2U8\n//go:noescape\nfunc Vtbl2U8(r *arm.Uint8X8, v0 *arm.Uint8X8X2, v1 *arm.Uint8X8)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbl2P8 Vtbl2P8\n//go:noescape\nfunc Vtbl2P8(r *arm.Poly8X8, v0 *arm.Poly8X8X2, v1 *arm.Uint8X8)\n\n// Table vector Lookup. 
This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbl3S8 Vtbl3S8\n//go:noescape\nfunc Vtbl3S8(r *arm.Int8X8, v0 *arm.Int8X8X3, v1 *arm.Int8X8)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbl3U8 Vtbl3U8\n//go:noescape\nfunc Vtbl3U8(r *arm.Uint8X8, v0 *arm.Uint8X8X3, v1 *arm.Uint8X8)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. 
If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbl3P8 Vtbl3P8\n//go:noescape\nfunc Vtbl3P8(r *arm.Poly8X8, v0 *arm.Poly8X8X3, v1 *arm.Uint8X8)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbl4S8 Vtbl4S8\n//go:noescape\nfunc Vtbl4S8(r *arm.Int8X8, v0 *arm.Int8X8X4, v1 *arm.Int8X8)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbl4U8 Vtbl4U8\n//go:noescape\nfunc Vtbl4U8(r *arm.Uint8X8, v0 *arm.Uint8X8X4, v1 *arm.Uint8X8)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. 
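If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbl4P8 Vtbl4P8\n//go:noescape\nfunc Vtbl4P8(r *arm.Poly8X8, v0 *arm.Poly8X8X4, v1 *arm.Uint8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbx1S8 Vtbx1S8\n//go:noescape\nfunc Vtbx1S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8, v2 *arm.Int8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbx1U8 Vtbx1U8\n//go:noescape\nfunc Vtbx1U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbx1P8 Vtbx1P8\n//go:noescape\nfunc Vtbx1P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8, v2 *arm.Uint8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbx2S8 Vtbx2S8\n//go:noescape\nfunc Vtbx2S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8X2, v2 *arm.Int8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. 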
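If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbx2U8 Vtbx2U8\n//go:noescape\nfunc Vtbx2U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8X2, v2 *arm.Uint8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbx2P8 Vtbx2P8\n//go:noescape\nfunc Vtbx2P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8X2, v2 *arm.Uint8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbx3S8 Vtbx3S8\n//go:noescape\nfunc Vtbx3S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8X3, v2 *arm.Int8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbx3U8 Vtbx3U8\n//go:noescape\nfunc Vtbx3U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8X3, v2 *arm.Uint8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbx3P8 Vtbx3P8\n//go:noescape\nfunc Vtbx3P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8X3, v2 *arm.Uint8X8)\n\n// Table vector lookup extension. 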
This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbx4S8 Vtbx4S8\n//go:noescape\nfunc Vtbx4S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8X4, v2 *arm.Int8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbx4U8 Vtbx4U8\n//go:noescape\nfunc Vtbx4U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8X4, v2 *arm.Uint8X8)\n\n// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. 
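If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbx4P8 Vtbx4P8\n//go:noescape\nfunc Vtbx4P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8X4, v2 *arm.Uint8X8)\n\n// Transpose elements. Treats the elements of the two source vectors as elements of 2x2 matrices and transposes the matrices. The first vector of the result holds the TRN1 output (even-numbered elements) and the second holds the TRN2 output (odd-numbered elements).\n//\n//go:linkname VtrnS8 VtrnS8\n//go:noescape\nfunc VtrnS8(r *arm.Int8X8X2, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Transpose elements. Treats the elements of the two source vectors as elements of 2x2 matrices and transposes the matrices. The first vector of the result holds the TRN1 output (even-numbered elements) and the second holds the TRN2 output (odd-numbered elements).\n//\n//go:linkname VtrnS16 VtrnS16\n//go:noescape\nfunc VtrnS16(r *arm.Int16X4X2, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Transpose elements. Treats the elements of the two source vectors as elements of 2x2 matrices and transposes the matrices. The first vector of the result holds the TRN1 output (even-numbered elements) and the second holds the TRN2 output (odd-numbered elements).\n//\n//go:linkname VtrnS32 VtrnS32\n//go:noescape\nfunc VtrnS32(r *arm.Int32X2X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Transpose elements. Treats the elements of the two source vectors as elements of 2x2 matrices and transposes the matrices. The first vector of the result holds the TRN1 output (even-numbered elements) and the second holds the TRN2 output (odd-numbered elements).\n//\n//go:linkname VtrnU8 VtrnU8\n//go:noescape\nfunc VtrnU8(r *arm.Uint8X8X2, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Transpose elements. Treats the elements of the two source vectors as elements of 2x2 matrices and transposes the matrices. The first vector of the result holds the TRN1 output (even-numbered elements) and the second holds the TRN2 output (odd-numbered elements).\n//\n//go:linkname VtrnU16 VtrnU16\n//go:noescape\nfunc VtrnU16(r *arm.Uint16X4X2, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Transpose elements. Treats the elements of the two source vectors as elements of 2x2 matrices and transposes the matrices. The first vector of the result holds the TRN1 output (even-numbered elements) and the second holds the TRN2 output (odd-numbered elements).\n//\n//go:linkname VtrnU32 VtrnU32\n//go:noescape\nfunc VtrnU32(r *arm.Uint32X2X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Transpose elements. Treats the elements of the two source vectors as elements of 2x2 matrices and transposes the matrices. The first vector of the result holds the TRN1 output (even-numbered elements) and the second holds the TRN2 output (odd-numbered elements).\n//\n//go:linkname VtrnF32 VtrnF32\n//go:noescape\nfunc VtrnF32(r *arm.Float32X2X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1S8 Vtrn1S8\n//go:noescape\nfunc Vtrn1S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Transpose vectors (primary). 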
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1S16 Vtrn1S16\n//go:noescape\nfunc Vtrn1S16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1S32 Vtrn1S32\n//go:noescape\nfunc Vtrn1S32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1U8 Vtrn1U8\n//go:noescape\nfunc Vtrn1U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Transpose vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1U16 Vtrn1U16\n//go:noescape\nfunc Vtrn1U16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1U32 Vtrn1U32\n//go:noescape\nfunc Vtrn1U32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1F32 Vtrn1F32\n//go:noescape\nfunc Vtrn1F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Transpose vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1P16 Vtrn1P16\n//go:noescape\nfunc Vtrn1P16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1P8 Vtrn1P8\n//go:noescape\nfunc Vtrn1P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QS8 Vtrn1QS8\n//go:noescape\nfunc Vtrn1QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Transpose vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QS16 Vtrn1QS16\n//go:noescape\nfunc Vtrn1QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QS32 Vtrn1QS32\n//go:noescape\nfunc Vtrn1QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QS64 Vtrn1QS64\n//go:noescape\nfunc Vtrn1QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Transpose vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QU8 Vtrn1QU8\n//go:noescape\nfunc Vtrn1QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QU16 Vtrn1QU16\n//go:noescape\nfunc Vtrn1QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QU32 Vtrn1QU32\n//go:noescape\nfunc Vtrn1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Transpose vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QU64 Vtrn1QU64\n//go:noescape\nfunc Vtrn1QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QF32 Vtrn1QF32\n//go:noescape\nfunc Vtrn1QF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QF64 Vtrn1QF64\n//go:noescape\nfunc Vtrn1QF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Transpose vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QP16 Vtrn1QP16\n//go:noescape\nfunc Vtrn1QP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QP64 Vtrn1QP64\n//go:noescape\nfunc Vtrn1QP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QP8 Vtrn1QP8\n//go:noescape\nfunc Vtrn1QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)\n\n// Transpose vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2S8 Vtrn2S8\n//go:noescape\nfunc Vtrn2S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2S16 Vtrn2S16\n//go:noescape\nfunc Vtrn2S16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2S32 Vtrn2S32\n//go:noescape\nfunc Vtrn2S32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Transpose vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2U8 Vtrn2U8\n//go:noescape\nfunc Vtrn2U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2U16 Vtrn2U16\n//go:noescape\nfunc Vtrn2U16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2U32 Vtrn2U32\n//go:noescape\nfunc Vtrn2U32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Transpose vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2F32 Vtrn2F32\n//go:noescape\nfunc Vtrn2F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2P16 Vtrn2P16\n//go:noescape\nfunc Vtrn2P16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2P8 Vtrn2P8\n//go:noescape\nfunc Vtrn2P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)\n\n// Transpose vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QS8 Vtrn2QS8\n//go:noescape\nfunc Vtrn2QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QS16 Vtrn2QS16\n//go:noescape\nfunc Vtrn2QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QS32 Vtrn2QS32\n//go:noescape\nfunc Vtrn2QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Transpose vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QS64 Vtrn2QS64\n//go:noescape\nfunc Vtrn2QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QU8 Vtrn2QU8\n//go:noescape\nfunc Vtrn2QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QU16 Vtrn2QU16\n//go:noescape\nfunc Vtrn2QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Transpose vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QU32 Vtrn2QU32\n//go:noescape\nfunc Vtrn2QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QU64 Vtrn2QU64\n//go:noescape\nfunc Vtrn2QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QF32 Vtrn2QF32\n//go:noescape\nfunc Vtrn2QF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Transpose vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QF64 Vtrn2QF64\n//go:noescape\nfunc Vtrn2QF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QP16 Vtrn2QP16\n//go:noescape\nfunc Vtrn2QP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QP64 Vtrn2QP64\n//go:noescape\nfunc Vtrn2QP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)\n\n// Transpose vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QP8 Vtrn2QP8\n//go:noescape\nfunc Vtrn2QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)\n\n// Transpose elements\n//\n//go:linkname VtrnP16 VtrnP16\n//go:noescape\nfunc VtrnP16(r *arm.Poly16X4X2, v0 *arm.Poly16X4, v1 *arm.Poly16X4)\n\n// Transpose elements\n//\n//go:linkname VtrnP8 VtrnP8\n//go:noescape\nfunc VtrnP8(r *arm.Poly8X8X2, v0 *arm.Poly8X8, v1 *arm.Poly8X8)\n\n// Transpose elements\n//\n//go:linkname VtrnqS8 VtrnqS8\n//go:noescape\nfunc VtrnqS8(r *arm.Int8X16X2, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Transpose elements\n//\n//go:linkname VtrnqS16 VtrnqS16\n//go:noescape\nfunc VtrnqS16(r *arm.Int16X8X2, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Transpose elements\n//\n//go:linkname VtrnqS32 VtrnqS32\n//go:noescape\nfunc VtrnqS32(r *arm.Int32X4X2, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Transpose elements\n//\n//go:linkname VtrnqU8 VtrnqU8\n//go:noescape\nfunc VtrnqU8(r *arm.Uint8X16X2, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Transpose elements\n//\n//go:linkname VtrnqU16 VtrnqU16\n//go:noescape\nfunc VtrnqU16(r *arm.Uint16X8X2, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Transpose elements\n//\n//go:linkname VtrnqU32 VtrnqU32\n//go:noescape\nfunc VtrnqU32(r *arm.Uint32X4X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Transpose elements\n//\n//go:linkname VtrnqF32 VtrnqF32\n//go:noescape\nfunc VtrnqF32(r *arm.Float32X4X2, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Transpose elements\n//\n//go:linkname VtrnqP16 VtrnqP16\n//go:noescape\nfunc VtrnqP16(r 
*arm.Poly16X8X2, v0 *arm.Poly16X8, v1 *arm.Poly16X8)\n\n// Transpose elements\n//\n//go:linkname VtrnqP8 VtrnqP8\n//go:noescape\nfunc VtrnqP8(r *arm.Poly8X16X2, v0 *arm.Poly8X16, v1 *arm.Poly8X16)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstS8 VtstS8\n//go:noescape\nfunc VtstS8(r *arm.Uint8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstS16 VtstS16\n//go:noescape\nfunc VtstS16(r *arm.Uint16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstS32 VtstS32\n//go:noescape\nfunc VtstS32(r *arm.Uint32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Compare bitwise Test bits nonzero (vector). 
This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstS64 VtstS64\n//go:noescape\nfunc VtstS64(r *arm.Uint64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstU8 VtstU8\n//go:noescape\nfunc VtstU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstU16 VtstU16\n//go:noescape\nfunc VtstU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Compare bitwise Test bits nonzero (vector). 
This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstU32 VtstU32\n//go:noescape\nfunc VtstU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstU64 VtstU64\n//go:noescape\nfunc VtstU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstP16 VtstP16\n//go:noescape\nfunc VtstP16(r *arm.Uint16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstP64 VtstP64\n//go:noescape\nfunc VtstP64(r *arm.Uint64X1, v0 *arm.Poly64X1, v1 *arm.Poly64X1)\n\n// Compare bitwise Test bits nonzero (vector). 
This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstP8 VtstP8\n//go:noescape\nfunc VtstP8(r *arm.Uint8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstdS64 VtstdS64\n//go:noescape\nfunc VtstdS64(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstdU64 VtstdU64\n//go:noescape\nfunc VtstdU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)\n\n// Compare bitwise Test bits nonzero (vector). 
This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstqS8 VtstqS8\n//go:noescape\nfunc VtstqS8(r *arm.Uint8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstqS16 VtstqS16\n//go:noescape\nfunc VtstqS16(r *arm.Uint16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstqS32 VtstqS32\n//go:noescape\nfunc VtstqS32(r *arm.Uint32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Compare bitwise Test bits nonzero (vector). 
This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstqS64 VtstqS64\n//go:noescape\nfunc VtstqS64(r *arm.Uint64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstqU8 VtstqU8\n//go:noescape\nfunc VtstqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstqU16 VtstqU16\n//go:noescape\nfunc VtstqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Compare bitwise Test bits nonzero (vector). 
This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstqU32 VtstqU32\n//go:noescape\nfunc VtstqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstqU64 VtstqU64\n//go:noescape\nfunc VtstqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstqP16 VtstqP16\n//go:noescape\nfunc VtstqP16(r *arm.Uint16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstqP64 VtstqP64\n//go:noescape\nfunc VtstqP64(r *arm.Uint64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)\n\n// Compare bitwise Test bits nonzero (vector). 
This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstqP8 VtstqP8\n//go:noescape\nfunc VtstqP8(r *arm.Uint8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)\n\n// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.\n//\n//go:linkname VuqaddS8 VuqaddS8\n//go:noescape\nfunc VuqaddS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Uint8X8)\n\n// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.\n//\n//go:linkname VuqaddS16 VuqaddS16\n//go:noescape\nfunc VuqaddS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Uint16X4)\n\n// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.\n//\n//go:linkname VuqaddS32 VuqaddS32\n//go:noescape\nfunc VuqaddS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Uint32X2)\n\n// Signed saturating Accumulate of Unsigned value. 
This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.\n//\n//go:linkname VuqaddS64 VuqaddS64\n//go:noescape\nfunc VuqaddS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Uint64X1)\n\n// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.\n//\n//go:linkname VuqaddbS8 VuqaddbS8\n//go:noescape\nfunc VuqaddbS8(r *arm.Int8, v0 *arm.Int8, v1 *arm.Uint8)\n\n// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.\n//\n//go:linkname VuqadddS64 VuqadddS64\n//go:noescape\nfunc VuqadddS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Uint64)\n\n// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.\n//\n//go:linkname VuqaddhS16 VuqaddhS16\n//go:noescape\nfunc VuqaddhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Uint16)\n\n// Signed saturating Accumulate of Unsigned value. 
This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.\n//\n//go:linkname VuqaddqS8 VuqaddqS8\n//go:noescape\nfunc VuqaddqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Uint8X16)\n\n// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.\n//\n//go:linkname VuqaddqS16 VuqaddqS16\n//go:noescape\nfunc VuqaddqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Uint16X8)\n\n// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.\n//\n//go:linkname VuqaddqS32 VuqaddqS32\n//go:noescape\nfunc VuqaddqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Uint32X4)\n\n// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.\n//\n//go:linkname VuqaddqS64 VuqaddqS64\n//go:noescape\nfunc VuqaddqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Uint64X2)\n\n// Signed saturating Accumulate of Unsigned value. 
This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.\n//\n//go:linkname VuqaddsS32 VuqaddsS32\n//go:noescape\nfunc VuqaddsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Uint32)\n\n// Dot Product vector form with unsigned and signed integers. This instruction performs the dot product of the four unsigned 8-bit integer values in each 32-bit element of the first source register with the four signed 8-bit integer values in the corresponding 32-bit element of the second source register, accumulating the result into the corresponding 32-bit element of the destination register.\n//\n//go:linkname VusdotS32 VusdotS32\n//go:noescape\nfunc VusdotS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Uint8X8, v2 *arm.Int8X8)\n\n// Dot Product vector form with unsigned and signed integers. This instruction performs the dot product of the four unsigned 8-bit integer values in each 32-bit element of the first source register with the four signed 8-bit integer values in the corresponding 32-bit element of the second source register, accumulating the result into the corresponding 32-bit element of the destination register.\n//\n//go:linkname VusdotqS32 VusdotqS32\n//go:noescape\nfunc VusdotqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Uint8X16, v2 *arm.Int8X16)\n\n// Unsigned and signed 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of unsigned 8-bit integer values in the first source vector by the 8x2 matrix of signed 8-bit integer values in the second source vector. The resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the destination vector. 
This is equivalent to performing an 8-way dot product per destination element.\n//\n//go:linkname VusmmlaqS32 VusmmlaqS32\n//go:noescape\nfunc VusmmlaqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Uint8X16, v2 *arm.Int8X16)\n\n// Unzip vectors (primary and secondary). This operation reads the vector elements of the two source registers and returns a pair of vectors: the first vector of the result holds the even-numbered elements, starting at zero, of the first source followed by those of the second source (the UZP1 result), and the second vector holds the odd-numbered elements of the first source followed by those of the second source (the UZP2 result).\n//\n//go:linkname VuzpS8 VuzpS8\n//go:noescape\nfunc VuzpS8(r *arm.Int8X8X2, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Unzip vectors (primary and secondary). This operation reads the vector elements of the two source registers and returns a pair of vectors: the first vector of the result holds the even-numbered elements, starting at zero, of the first source followed by those of the second source (the UZP1 result), and the second vector holds the odd-numbered elements of the first source followed by those of the second source (the UZP2 result).\n//\n//go:linkname VuzpS16 VuzpS16\n//go:noescape\nfunc VuzpS16(r *arm.Int16X4X2, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Unzip vectors (primary and secondary). This operation reads the vector elements of the two source registers and returns a pair of vectors: the first vector of the result holds the even-numbered elements, starting at zero, of the first source followed by those of the second source (the UZP1 result), and the second vector holds the odd-numbered elements of the first source followed by those of the second source (the UZP2 result).\n//\n//go:linkname VuzpS32 VuzpS32\n//go:noescape\nfunc VuzpS32(r *arm.Int32X2X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Unzip vectors (primary and secondary). 
This operation reads the vector elements of the two source registers and returns a pair of vectors: the first vector of the result holds the even-numbered elements, starting at zero, of the first source followed by those of the second source (the UZP1 result), and the second vector holds the odd-numbered elements of the first source followed by those of the second source (the UZP2 result).\n//\n//go:linkname VuzpU8 VuzpU8\n//go:noescape\nfunc VuzpU8(r *arm.Uint8X8X2, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Unzip vectors (primary and secondary). This operation reads the vector elements of the two source registers and returns a pair of vectors: the first vector of the result holds the even-numbered elements, starting at zero, of the first source followed by those of the second source (the UZP1 result), and the second vector holds the odd-numbered elements of the first source followed by those of the second source (the UZP2 result).\n//\n//go:linkname VuzpU16 VuzpU16\n//go:noescape\nfunc VuzpU16(r *arm.Uint16X4X2, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Unzip vectors (primary and secondary). This operation reads the vector elements of the two source registers and returns a pair of vectors: the first vector of the result holds the even-numbered elements, starting at zero, of the first source followed by those of the second source (the UZP1 result), and the second vector holds the odd-numbered elements of the first source followed by those of the second source (the UZP2 result).\n//\n//go:linkname VuzpU32 VuzpU32\n//go:noescape\nfunc VuzpU32(r *arm.Uint32X2X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Unzip vectors (primary and secondary). 
This operation reads the vector elements of the two source registers and returns a pair of vectors: the first vector of the result holds the even-numbered elements, starting at zero, of the first source followed by those of the second source (the UZP1 result), and the second vector holds the odd-numbered elements of the first source followed by those of the second source (the UZP2 result).\n//\n//go:linkname VuzpF32 VuzpF32\n//go:noescape\nfunc VuzpF32(r *arm.Float32X2X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1S8 Vuzp1S8\n//go:noescape\nfunc Vuzp1S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1S16 Vuzp1S16\n//go:noescape\nfunc Vuzp1S16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Unzip vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1S32 Vuzp1S32\n//go:noescape\nfunc Vuzp1S32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1U8 Vuzp1U8\n//go:noescape\nfunc Vuzp1U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1U16 Vuzp1U16\n//go:noescape\nfunc Vuzp1U16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Unzip vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1U32 Vuzp1U32\n//go:noescape\nfunc Vuzp1U32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1F32 Vuzp1F32\n//go:noescape\nfunc Vuzp1F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1P16 Vuzp1P16\n//go:noescape\nfunc Vuzp1P16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)\n\n// Unzip vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1P8 Vuzp1P8\n//go:noescape\nfunc Vuzp1P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QS8 Vuzp1QS8\n//go:noescape\nfunc Vuzp1QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QS16 Vuzp1QS16\n//go:noescape\nfunc Vuzp1QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Unzip vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QS32 Vuzp1QS32\n//go:noescape\nfunc Vuzp1QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QS64 Vuzp1QS64\n//go:noescape\nfunc Vuzp1QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QU8 Vuzp1QU8\n//go:noescape\nfunc Vuzp1QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Unzip vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QU16 Vuzp1QU16\n//go:noescape\nfunc Vuzp1QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QU32 Vuzp1QU32\n//go:noescape\nfunc Vuzp1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QU64 Vuzp1QU64\n//go:noescape\nfunc Vuzp1QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Unzip vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QF32 Vuzp1QF32\n//go:noescape\nfunc Vuzp1QF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QF64 Vuzp1QF64\n//go:noescape\nfunc Vuzp1QF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QP16 Vuzp1QP16\n//go:noescape\nfunc Vuzp1QP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)\n\n// Unzip vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QP64 Vuzp1QP64\n//go:noescape\nfunc Vuzp1QP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QP8 Vuzp1QP8\n//go:noescape\nfunc Vuzp1QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2S8 Vuzp2S8\n//go:noescape\nfunc Vuzp2S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Unzip vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2S16 Vuzp2S16\n//go:noescape\nfunc Vuzp2S16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2S32 Vuzp2S32\n//go:noescape\nfunc Vuzp2S32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2U8 Vuzp2U8\n//go:noescape\nfunc Vuzp2U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Unzip vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2U16 Vuzp2U16\n//go:noescape\nfunc Vuzp2U16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2U32 Vuzp2U32\n//go:noescape\nfunc Vuzp2U32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2F32 Vuzp2F32\n//go:noescape\nfunc Vuzp2F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Unzip vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2P16 Vuzp2P16\n//go:noescape\nfunc Vuzp2P16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2P8 Vuzp2P8\n//go:noescape\nfunc Vuzp2P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QS8 Vuzp2QS8\n//go:noescape\nfunc Vuzp2QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Unzip vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QS16 Vuzp2QS16\n//go:noescape\nfunc Vuzp2QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QS32 Vuzp2QS32\n//go:noescape\nfunc Vuzp2QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QS64 Vuzp2QS64\n//go:noescape\nfunc Vuzp2QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Unzip vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QU8 Vuzp2QU8\n//go:noescape\nfunc Vuzp2QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QU16 Vuzp2QU16\n//go:noescape\nfunc Vuzp2QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QU32 Vuzp2QU32\n//go:noescape\nfunc Vuzp2QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Unzip vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QU64 Vuzp2QU64\n//go:noescape\nfunc Vuzp2QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QF32 Vuzp2QF32\n//go:noescape\nfunc Vuzp2QF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QF64 Vuzp2QF64\n//go:noescape\nfunc Vuzp2QF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Unzip vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QP16 Vuzp2QP16\n//go:noescape\nfunc Vuzp2QP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QP64 Vuzp2QP64\n//go:noescape\nfunc Vuzp2QP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QP8 Vuzp2QP8\n//go:noescape\nfunc Vuzp2QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)\n\n// Unzip vectors. 
This intrinsic de-interleaves the two source vectors: the even-numbered elements of v0 followed by the even-numbered elements of v1 are written to the first vector of r (the UZP1 result), and the odd-numbered elements of v0 followed by the odd-numbered elements of v1 are written to the second vector of r (the UZP2 result).\n//\n//go:linkname VuzpP16 VuzpP16\n//go:noescape\nfunc VuzpP16(r *arm.Poly16X4X2, v0 *arm.Poly16X4, v1 *arm.Poly16X4)\n\n// Unzip vectors. This intrinsic de-interleaves the two source vectors: the even-numbered elements of v0 followed by the even-numbered elements of v1 are written to the first vector of r (the UZP1 result), and the odd-numbered elements of v0 followed by the odd-numbered elements of v1 are written to the second vector of r (the UZP2 result).\n//\n//go:linkname VuzpP8 VuzpP8\n//go:noescape\nfunc VuzpP8(r *arm.Poly8X8X2, v0 *arm.Poly8X8, v1 *arm.Poly8X8)\n\n// Unzip vectors. This intrinsic de-interleaves the two source vectors: the even-numbered elements of v0 followed by the even-numbered elements of v1 are written to the first vector of r (the UZP1 result), and the odd-numbered elements of v0 followed by the odd-numbered elements of v1 are written to the second vector of r (the UZP2 result).\n//\n//go:linkname VuzpqS8 VuzpqS8\n//go:noescape\nfunc VuzpqS8(r *arm.Int8X16X2, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Unzip vectors. 
This intrinsic de-interleaves the two source vectors: the even-numbered elements of v0 followed by the even-numbered elements of v1 are written to the first vector of r (the UZP1 result), and the odd-numbered elements of v0 followed by the odd-numbered elements of v1 are written to the second vector of r (the UZP2 result).\n//\n//go:linkname VuzpqS16 VuzpqS16\n//go:noescape\nfunc VuzpqS16(r *arm.Int16X8X2, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Unzip vectors. This intrinsic de-interleaves the two source vectors: the even-numbered elements of v0 followed by the even-numbered elements of v1 are written to the first vector of r (the UZP1 result), and the odd-numbered elements of v0 followed by the odd-numbered elements of v1 are written to the second vector of r (the UZP2 result).\n//\n//go:linkname VuzpqS32 VuzpqS32\n//go:noescape\nfunc VuzpqS32(r *arm.Int32X4X2, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Unzip vectors. This intrinsic de-interleaves the two source vectors: the even-numbered elements of v0 followed by the even-numbered elements of v1 are written to the first vector of r (the UZP1 result), and the odd-numbered elements of v0 followed by the odd-numbered elements of v1 are written to the second vector of r (the UZP2 result).\n//\n//go:linkname VuzpqU8 VuzpqU8\n//go:noescape\nfunc VuzpqU8(r *arm.Uint8X16X2, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Unzip vectors. 
This intrinsic de-interleaves the two source vectors: the even-numbered elements of v0 followed by the even-numbered elements of v1 are written to the first vector of r (the UZP1 result), and the odd-numbered elements of v0 followed by the odd-numbered elements of v1 are written to the second vector of r (the UZP2 result).\n//\n//go:linkname VuzpqU16 VuzpqU16\n//go:noescape\nfunc VuzpqU16(r *arm.Uint16X8X2, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Unzip vectors. This intrinsic de-interleaves the two source vectors: the even-numbered elements of v0 followed by the even-numbered elements of v1 are written to the first vector of r (the UZP1 result), and the odd-numbered elements of v0 followed by the odd-numbered elements of v1 are written to the second vector of r (the UZP2 result).\n//\n//go:linkname VuzpqU32 VuzpqU32\n//go:noescape\nfunc VuzpqU32(r *arm.Uint32X4X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Unzip vectors. This intrinsic de-interleaves the two source vectors: the even-numbered elements of v0 followed by the even-numbered elements of v1 are written to the first vector of r (the UZP1 result), and the odd-numbered elements of v0 followed by the odd-numbered elements of v1 are written to the second vector of r (the UZP2 result).\n//\n//go:linkname VuzpqF32 VuzpqF32\n//go:noescape\nfunc VuzpqF32(r *arm.Float32X4X2, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Unzip vectors. 
This intrinsic de-interleaves the two source vectors: the even-numbered elements of v0 followed by the even-numbered elements of v1 are written to the first vector of r (the UZP1 result), and the odd-numbered elements of v0 followed by the odd-numbered elements of v1 are written to the second vector of r (the UZP2 result).\n//\n//go:linkname VuzpqP16 VuzpqP16\n//go:noescape\nfunc VuzpqP16(r *arm.Poly16X8X2, v0 *arm.Poly16X8, v1 *arm.Poly16X8)\n\n// Unzip vectors. This intrinsic de-interleaves the two source vectors: the even-numbered elements of v0 followed by the even-numbered elements of v1 are written to the first vector of r (the UZP1 result), and the odd-numbered elements of v0 followed by the odd-numbered elements of v1 are written to the second vector of r (the UZP2 result).\n//\n//go:linkname VuzpqP8 VuzpqP8\n//go:noescape\nfunc VuzpqP8(r *arm.Poly8X16X2, v0 *arm.Poly8X16, v1 *arm.Poly8X16)\n\n// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs and interleaves the pairs. The lower half of the interleaved result (the ZIP1 result) is written to the first vector of r, and the upper half (the ZIP2 result) is written to the second vector of r.\n//\n//go:linkname VzipS8 VzipS8\n//go:noescape\nfunc VzipS8(r *arm.Int8X8X2, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs and interleaves the pairs. The lower half of the interleaved result (the ZIP1 result) is written to the first vector of r, and the upper half (the ZIP2 result) is written to the second vector of r.\n//\n//go:linkname VzipS16 VzipS16\n//go:noescape\nfunc VzipS16(r *arm.Int16X4X2, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs and interleaves the pairs. 
The lower half of the interleaved result (the ZIP1 result) is written to the first vector of r, and the upper half (the ZIP2 result) is written to the second vector of r.\n//\n//go:linkname VzipS32 VzipS32\n//go:noescape\nfunc VzipS32(r *arm.Int32X2X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Interleave two vectors. 
This intrinsic reads corresponding elements from the two source vectors as pairs and interleaves the pairs. The lower half of the interleaved result (the ZIP1 result) is written to the first vector of r, and the upper half (the ZIP2 result) is written to the second vector of r.\n//\n//go:linkname VzipU8 VzipU8\n//go:noescape\nfunc VzipU8(r *arm.Uint8X8X2, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs and interleaves the pairs. The lower half of the interleaved result (the ZIP1 result) is written to the first vector of r, and the upper half (the ZIP2 result) is written to the second vector of r.\n//\n//go:linkname VzipU16 VzipU16\n//go:noescape\nfunc VzipU16(r *arm.Uint16X4X2, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs and interleaves the pairs. The lower half of the interleaved result (the ZIP1 result) is written to the first vector of r, and the upper half (the ZIP2 result) is written to the second vector of r.\n//\n//go:linkname VzipU32 VzipU32\n//go:noescape\nfunc VzipU32(r *arm.Uint32X2X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs and interleaves the pairs. The lower half of the interleaved result (the ZIP1 result) is written to the first vector of r, and the upper half (the ZIP2 result) is written to the second vector of r.\n//\n//go:linkname VzipF32 VzipF32\n//go:noescape\nfunc VzipF32(r *arm.Float32X2X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. 
The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1S8 Vzip1S8\n//go:noescape\nfunc Vzip1S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. 
The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1S16 Vzip1S16\n//go:noescape\nfunc Vzip1S16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1S32 Vzip1S32\n//go:noescape\nfunc Vzip1S32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1U8 Vzip1U8\n//go:noescape\nfunc Vzip1U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1U16 Vzip1U16\n//go:noescape\nfunc Vzip1U16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Zip vectors (primary). 
This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1U32 Vzip1U32\n//go:noescape\nfunc Vzip1U32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1F32 Vzip1F32\n//go:noescape\nfunc Vzip1F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1P16 Vzip1P16\n//go:noescape\nfunc Vzip1P16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. 
The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1P8 Vzip1P8\n//go:noescape\nfunc Vzip1P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QS8 Vzip1QS8\n//go:noescape\nfunc Vzip1QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QS16 Vzip1QS16\n//go:noescape\nfunc Vzip1QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QS32 Vzip1QS32\n//go:noescape\nfunc Vzip1QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Zip vectors (primary). 
This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QS64 Vzip1QS64\n//go:noescape\nfunc Vzip1QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QU8 Vzip1QU8\n//go:noescape\nfunc Vzip1QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QU16 Vzip1QU16\n//go:noescape\nfunc Vzip1QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. 
The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QU32 Vzip1QU32\n//go:noescape\nfunc Vzip1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QU64 Vzip1QU64\n//go:noescape\nfunc Vzip1QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QF32 Vzip1QF32\n//go:noescape\nfunc Vzip1QF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QF64 Vzip1QF64\n//go:noescape\nfunc Vzip1QF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Zip vectors (primary). 
This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QP16 Vzip1QP16\n//go:noescape\nfunc Vzip1QP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QP64 Vzip1QP64\n//go:noescape\nfunc Vzip1QP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QP8 Vzip1QP8\n//go:noescape\nfunc Vzip1QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. 
The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2S8 Vzip2S8\n//go:noescape\nfunc Vzip2S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2S16 Vzip2S16\n//go:noescape\nfunc Vzip2S16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2S32 Vzip2S32\n//go:noescape\nfunc Vzip2S32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2U8 Vzip2U8\n//go:noescape\nfunc Vzip2U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)\n\n// Zip vectors (secondary). 
This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2U16 Vzip2U16\n//go:noescape\nfunc Vzip2U16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2U32 Vzip2U32\n//go:noescape\nfunc Vzip2U32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2F32 Vzip2F32\n//go:noescape\nfunc Vzip2F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. 
The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2P16 Vzip2P16\n//go:noescape\nfunc Vzip2P16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2P8 Vzip2P8\n//go:noescape\nfunc Vzip2P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QS8 Vzip2QS8\n//go:noescape\nfunc Vzip2QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QS16 Vzip2QS16\n//go:noescape\nfunc Vzip2QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Zip vectors (secondary). 
This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QS32 Vzip2QS32\n//go:noescape\nfunc Vzip2QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QS64 Vzip2QS64\n//go:noescape\nfunc Vzip2QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QU8 Vzip2QU8\n//go:noescape\nfunc Vzip2QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. 
The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QU16 Vzip2QU16\n//go:noescape\nfunc Vzip2QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QU32 Vzip2QU32\n//go:noescape\nfunc Vzip2QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QU64 Vzip2QU64\n//go:noescape\nfunc Vzip2QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QF32 Vzip2QF32\n//go:noescape\nfunc Vzip2QF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Zip vectors (secondary). 
This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QF64 Vzip2QF64\n//go:noescape\nfunc Vzip2QF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QP16 Vzip2QP16\n//go:noescape\nfunc Vzip2QP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QP64 Vzip2QP64\n//go:noescape\nfunc Vzip2QP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. 
The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QP8 Vzip2QP8\n//go:noescape\nfunc Vzip2QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)\n\n// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.\n//\n//go:linkname VzipP16 VzipP16\n//go:noescape\nfunc VzipP16(r *arm.Poly16X4X2, v0 *arm.Poly16X4, v1 *arm.Poly16X4)\n\n// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.\n//\n//go:linkname VzipP8 VzipP8\n//go:noescape\nfunc VzipP8(r *arm.Poly8X8X2, v0 *arm.Poly8X8, v1 *arm.Poly8X8)\n\n// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.\n//\n//go:linkname VzipqS8 VzipqS8\n//go:noescape\nfunc VzipqS8(r *arm.Int8X16X2, v0 *arm.Int8X16, v1 *arm.Int8X16)\n\n// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.\n//\n//go:linkname VzipqS16 VzipqS16\n//go:noescape\nfunc VzipqS16(r *arm.Int16X8X2, v0 *arm.Int16X8, v1 *arm.Int16X8)\n\n// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.\n//\n//go:linkname VzipqS32 VzipqS32\n//go:noescape\nfunc VzipqS32(r *arm.Int32X4X2, v0 *arm.Int32X4, v1 *arm.Int32X4)\n\n// Interleave two vectors. 
This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.\n//\n//go:linkname VzipqU8 VzipqU8\n//go:noescape\nfunc VzipqU8(r *arm.Uint8X16X2, v0 *arm.Uint8X16, v1 *arm.Uint8X16)\n\n// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.\n//\n//go:linkname VzipqU16 VzipqU16\n//go:noescape\nfunc VzipqU16(r *arm.Uint16X8X2, v0 *arm.Uint16X8, v1 *arm.Uint16X8)\n\n// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.\n//\n//go:linkname VzipqU32 VzipqU32\n//go:noescape\nfunc VzipqU32(r *arm.Uint32X4X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4)\n\n// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.\n//\n//go:linkname VzipqF32 VzipqF32\n//go:noescape\nfunc VzipqF32(r *arm.Float32X4X2, v0 *arm.Float32X4, v1 *arm.Float32X4)\n\n// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.\n//\n//go:linkname VzipqP16 VzipqP16\n//go:noescape\nfunc VzipqP16(r *arm.Poly16X8X2, v0 *arm.Poly16X8, v1 *arm.Poly16X8)\n\n// Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector.\n//\n//go:linkname VzipqP8 VzipqP8\n//go:noescape\nfunc VzipqP8(r *arm.Poly8X16X2, v0 *arm.Poly8X16, v1 *arm.Poly8X16)\n"
  },
  {
    "path": "arm/neon/functions_bypass.go",
    "content": "package neon\n\n/*\n#include <arm_neon.h>\nvoid vmulS8_bypass(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vmul_s8(*v0, *v1); }\n// Multiply n int8 elements eight lanes at a time. Only whole 8-element blocks\n// are processed, so the loop never reads or writes past the end of the buffers;\n// any trailing n % 8 elements are left untouched.\nvoid vmulS8_full(int8_t* r, int8_t* v0, int8_t* v1, int n) {\n\tfor (int i = 0; i + 8 <= n; i += 8) {\n\t\tvst1_s8(r + i, vmul_s8(vld1_s8(v0 + i), vld1_s8(v1 + i)));\n\t}\n}\n*/\nimport \"C\"\nimport \"github.com/alivanz/go-simd/arm\"\n\n//go:linkname vmulS8_bypass vmulS8_bypass\n//go:noescape\nfunc vmulS8_bypass(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)\n\n// vmulS8_full stores v0[i] * v1[i] into r[i] for every whole 8-element block of the first n elements.\n//\n//go:linkname vmulS8_full vmulS8_full\n//go:noescape\nfunc vmulS8_full(r *int8, v0 *int8, v1 *int8, n int)\n"
  },
  {
    "path": "arm/neon/functions_cgo.go",
    "content": "package neon\n\n/*\n#cgo CFLAGS: -march=armv8.5-a+crypto+i8mm\n#include <arm_neon.h>\n*/\nimport \"C\"\n\ntype int8x8 = C.int8x8_t\n\nfunc vmulS8_cgo(r, v0, v1 *int8x8) {\n\t*r = C.vmul_s8(*v0, *v1)\n}\n"
  },
  {
    "path": "arm/neon/functions_test.go",
    "content": "package neon\n\nimport (\n\t\"math/rand\"\n\t\"reflect\"\n\t\"runtime\"\n\t\"testing\"\n\t\"unsafe\"\n\n\t\"github.com/alivanz/go-simd/arm\"\n)\n\nfunc TestMult(t *testing.T) {\n\tvar (\n\t\ta      = arm.Int8X8{0, 1, 2, 3, 4, 5, 6, 7}\n\t\tb      = arm.Int8X8{7, 6, 5, 4, 3, 2, 1, 0}\n\t\tr      = arm.Int8X8{0, 6, 10, 12, 12, 10, 6, 0}\n\t\tresult arm.Int8X8\n\t)\n\tVmulS8(&result, &a, &b)\n\tif !reflect.DeepEqual(result, r) {\n\t\tt.Fatal(result)\n\t}\n}\n\nfunc TestMultFull(t *testing.T) {\n\tconst N = 1024 * 16\n\tvar (\n\t\ta      [N]int8\n\t\tb      [N]int8\n\t\tref    [N]int8\n\t\tresult [N]int8\n\t)\n\tfor i := 0; i < N; i++ {\n\t\ta[i] = int8(rand.Int())\n\t\tb[i] = int8(rand.Int())\n\t\tref[i] = a[i] * b[i]\n\t}\n\tvmulS8_full(&result[0], &a[0], &b[0], N)\n\tif !reflect.DeepEqual(result, ref) {\n\t\tt.Fail()\n\t}\n}\n\nfunc BenchmarkMultRef(t *testing.B) {\n\tconst N = 1024 * 16\n\tvar (\n\t\ta      [N]int8\n\t\tb      [N]int8\n\t\tresult [N]int8\n\t)\n\tfor j := range a[:] {\n\t\ta[j] = int8(rand.Int())\n\t\tb[j] = int8(rand.Int())\n\t}\n\tt.ResetTimer()\n\tfor i := 0; i < t.N; i++ {\n\t\tfor j := 0; j < N; j++ {\n\t\t\tresult[j] = a[j] * b[j]\n\t\t}\n\t}\n\truntime.KeepAlive(&result)\n}\n\nfunc BenchmarkMultSimd(t *testing.B) {\n\tconst N = 1024 * 16\n\tvar (\n\t\ta      [N]int8\n\t\tb      [N]int8\n\t\tresult [N]int8\n\t)\n\tfor i := 0; i < t.N; i++ {\n\t\tfor j := 0; j < N; j += 8 {\n\t\t\tVmulS8(\n\t\t\t\t(*arm.Int8X8)(unsafe.Pointer(&result[j])),\n\t\t\t\t(*arm.Int8X8)(unsafe.Pointer(&a[j])),\n\t\t\t\t(*arm.Int8X8)(unsafe.Pointer(&b[j])),\n\t\t\t)\n\t\t}\n\t}\n}\n\nfunc BenchmarkMultSimdBypass(t *testing.B) {\n\tconst N = 1024 * 16\n\tvar (\n\t\ta      [N]int8\n\t\tb      [N]int8\n\t\tresult [N]int8\n\t)\n\tfor i := 0; i < t.N; i++ {\n\t\tfor j := 0; j < N; j += 8 
{\n\t\t\tvmulS8_bypass(\n\t\t\t\t(*arm.Int8X8)(unsafe.Pointer(&result[j])),\n\t\t\t\t(*arm.Int8X8)(unsafe.Pointer(&a[j])),\n\t\t\t\t(*arm.Int8X8)(unsafe.Pointer(&b[j])),\n\t\t\t)\n\t\t}\n\t}\n}\n\nfunc BenchmarkMultSimdFull(t *testing.B) {\n\tconst N = 1024 * 16\n\tvar (\n\t\ta      [N]int8\n\t\tb      [N]int8\n\t\tresult [N]int8\n\t)\n\tfor i := 0; i < t.N; i++ {\n\t\tvmulS8_full(\n\t\t\t&result[0],\n\t\t\t&a[0],\n\t\t\t&b[0],\n\t\t\tN,\n\t\t)\n\t}\n}\n\nfunc BenchmarkMultSimdCgo(t *testing.B) {\n\tconst N = 1024 * 16\n\tvar (\n\t\ta      [N]int8\n\t\tb      [N]int8\n\t\tresult [N]int8\n\t)\n\tfor i := 0; i < t.N; i++ {\n\t\tfor j := 0; j < N; j += 8 {\n\t\t\tvmulS8_cgo(\n\t\t\t\t(*int8x8)(unsafe.Pointer(&result[j])),\n\t\t\t\t(*int8x8)(unsafe.Pointer(&a[j])),\n\t\t\t\t(*int8x8)(unsafe.Pointer(&b[j])),\n\t\t\t)\n\t\t}\n\t}\n}\n"
  },
  {
    "path": "arm/neon/loops.c",
    "content": "#include <arm_neon.h>\n\n#define save(dst, src) *dst = src\n#define load(src) (*src)\n#define LOOP1(name, rtype, itype, f, set, load, rstep, istep) \\\n    void name(rtype *r, itype *v, int32_t n)                  \\\n    {                                                         \\\n        while (n >= rstep)                                    \\\n        {                                                     \\\n            set(r, f(load(v)));                               \\\n            r += rstep;                                       \\\n            n -= rstep;                                       \\\n            v += istep;                                       \\\n        }                                                     \\\n    }\n\nLOOP1(VabsS8N, int8_t, int8_t, vabs_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP1(VabsS16N, int16_t, int16_t, vabs_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP1(VabsS32N, int32_t, int32_t, vabs_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP1(VabsS64N, int64_t, int64_t, vabs_s64, vst1_s64, vld1_s64, 1, 1)\nLOOP1(VabsF32N, float32_t, float32_t, vabs_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP1(VabsF64N, float64_t, float64_t, vabs_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP1(VabsdS64N, int64_t, int64_t, vabsd_s64, save, load, 1, 1)\nLOOP1(VabsqS8N, int8_t, int8_t, vabsq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP1(VabsqS16N, int16_t, int16_t, vabsq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP1(VabsqS32N, int32_t, int32_t, vabsq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP1(VabsqS64N, int64_t, int64_t, vabsq_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP1(VabsqF32N, float32_t, float32_t, vabsq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP1(VabsqF64N, float64_t, float64_t, vabsq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP1(VaddvS8N, int8_t, int8_t, vaddv_s8, save, vld1_s8, 1, 8)\nLOOP1(VaddvS16N, int16_t, int16_t, vaddv_s16, save, vld1_s16, 1, 4)\nLOOP1(VaddvS32N, int32_t, int32_t, vaddv_s32, save, vld1_s32, 1, 2)\nLOOP1(VaddvU8N, uint8_t, uint8_t, vaddv_u8, save, vld1_u8, 1, 
8)\nLOOP1(VaddvU16N, uint16_t, uint16_t, vaddv_u16, save, vld1_u16, 1, 4)\nLOOP1(VaddvU32N, uint32_t, uint32_t, vaddv_u32, save, vld1_u32, 1, 2)\nLOOP1(VaddvF32N, float32_t, float32_t, vaddv_f32, save, vld1_f32, 1, 2)\nLOOP1(VaddvqS8N, int8_t, int8_t, vaddvq_s8, save, vld1q_s8, 1, 16)\nLOOP1(VaddvqS16N, int16_t, int16_t, vaddvq_s16, save, vld1q_s16, 1, 8)\nLOOP1(VaddvqS32N, int32_t, int32_t, vaddvq_s32, save, vld1q_s32, 1, 4)\nLOOP1(VaddvqS64N, int64_t, int64_t, vaddvq_s64, save, vld1q_s64, 1, 2)\nLOOP1(VaddvqU8N, uint8_t, uint8_t, vaddvq_u8, save, vld1q_u8, 1, 16)\nLOOP1(VaddvqU16N, uint16_t, uint16_t, vaddvq_u16, save, vld1q_u16, 1, 8)\nLOOP1(VaddvqU32N, uint32_t, uint32_t, vaddvq_u32, save, vld1q_u32, 1, 4)\nLOOP1(VaddvqU64N, uint64_t, uint64_t, vaddvq_u64, save, vld1q_u64, 1, 2)\nLOOP1(VaddvqF32N, float32_t, float32_t, vaddvq_f32, save, vld1q_f32, 1, 4)\nLOOP1(VaddvqF64N, float64_t, float64_t, vaddvq_f64, save, vld1q_f64, 1, 2)\nLOOP1(VaesimcqU8N, uint8_t, uint8_t, vaesimcq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP1(VaesmcqU8N, uint8_t, uint8_t, vaesmcq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP1(VceqzS8N, uint8_t, int8_t, vceqz_s8, vst1_u8, vld1_s8, 8, 8)\nLOOP1(VceqzS16N, uint16_t, int16_t, vceqz_s16, vst1_u16, vld1_s16, 4, 4)\nLOOP1(VceqzS32N, uint32_t, int32_t, vceqz_s32, vst1_u32, vld1_s32, 2, 2)\nLOOP1(VceqzS64N, uint64_t, int64_t, vceqz_s64, vst1_u64, vld1_s64, 1, 1)\nLOOP1(VceqzU8N, uint8_t, uint8_t, vceqz_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP1(VceqzU16N, uint16_t, uint16_t, vceqz_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP1(VceqzU32N, uint32_t, uint32_t, vceqz_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP1(VceqzU64N, uint64_t, uint64_t, vceqz_u64, vst1_u64, vld1_u64, 1, 1)\nLOOP1(VceqzF32N, uint32_t, float32_t, vceqz_f32, vst1_u32, vld1_f32, 2, 2)\nLOOP1(VceqzF64N, uint64_t, float64_t, vceqz_f64, vst1_u64, vld1_f64, 1, 1)\nLOOP1(VceqzdS64N, uint64_t, int64_t, vceqzd_s64, save, load, 1, 1)\nLOOP1(VceqzdU64N, uint64_t, uint64_t, vceqzd_u64, save, load, 1, 1)\nLOOP1(VceqzdF64N, 
uint64_t, float64_t, vceqzd_f64, save, load, 1, 1)\nLOOP1(VceqzqS8N, uint8_t, int8_t, vceqzq_s8, vst1q_u8, vld1q_s8, 16, 16)\nLOOP1(VceqzqS16N, uint16_t, int16_t, vceqzq_s16, vst1q_u16, vld1q_s16, 8, 8)\nLOOP1(VceqzqS32N, uint32_t, int32_t, vceqzq_s32, vst1q_u32, vld1q_s32, 4, 4)\nLOOP1(VceqzqS64N, uint64_t, int64_t, vceqzq_s64, vst1q_u64, vld1q_s64, 2, 2)\nLOOP1(VceqzqU8N, uint8_t, uint8_t, vceqzq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP1(VceqzqU16N, uint16_t, uint16_t, vceqzq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP1(VceqzqU32N, uint32_t, uint32_t, vceqzq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP1(VceqzqU64N, uint64_t, uint64_t, vceqzq_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP1(VceqzqF32N, uint32_t, float32_t, vceqzq_f32, vst1q_u32, vld1q_f32, 4, 4)\nLOOP1(VceqzqF64N, uint64_t, float64_t, vceqzq_f64, vst1q_u64, vld1q_f64, 2, 2)\nLOOP1(VceqzsF32N, uint32_t, float32_t, vceqzs_f32, save, load, 1, 1)\nLOOP1(VcgezS8N, uint8_t, int8_t, vcgez_s8, vst1_u8, vld1_s8, 8, 8)\nLOOP1(VcgezS16N, uint16_t, int16_t, vcgez_s16, vst1_u16, vld1_s16, 4, 4)\nLOOP1(VcgezS32N, uint32_t, int32_t, vcgez_s32, vst1_u32, vld1_s32, 2, 2)\nLOOP1(VcgezS64N, uint64_t, int64_t, vcgez_s64, vst1_u64, vld1_s64, 1, 1)\nLOOP1(VcgezF32N, uint32_t, float32_t, vcgez_f32, vst1_u32, vld1_f32, 2, 2)\nLOOP1(VcgezF64N, uint64_t, float64_t, vcgez_f64, vst1_u64, vld1_f64, 1, 1)\nLOOP1(VcgezdS64N, uint64_t, int64_t, vcgezd_s64, save, load, 1, 1)\nLOOP1(VcgezdF64N, uint64_t, float64_t, vcgezd_f64, save, load, 1, 1)\nLOOP1(VcgezqS8N, uint8_t, int8_t, vcgezq_s8, vst1q_u8, vld1q_s8, 16, 16)\nLOOP1(VcgezqS16N, uint16_t, int16_t, vcgezq_s16, vst1q_u16, vld1q_s16, 8, 8)\nLOOP1(VcgezqS32N, uint32_t, int32_t, vcgezq_s32, vst1q_u32, vld1q_s32, 4, 4)\nLOOP1(VcgezqS64N, uint64_t, int64_t, vcgezq_s64, vst1q_u64, vld1q_s64, 2, 2)\nLOOP1(VcgezqF32N, uint32_t, float32_t, vcgezq_f32, vst1q_u32, vld1q_f32, 4, 4)\nLOOP1(VcgezqF64N, uint64_t, float64_t, vcgezq_f64, vst1q_u64, vld1q_f64, 2, 2)\nLOOP1(VcgezsF32N, uint32_t, float32_t, 
vcgezs_f32, save, load, 1, 1)\nLOOP1(VcgtzS8N, uint8_t, int8_t, vcgtz_s8, vst1_u8, vld1_s8, 8, 8)\nLOOP1(VcgtzS16N, uint16_t, int16_t, vcgtz_s16, vst1_u16, vld1_s16, 4, 4)\nLOOP1(VcgtzS32N, uint32_t, int32_t, vcgtz_s32, vst1_u32, vld1_s32, 2, 2)\nLOOP1(VcgtzS64N, uint64_t, int64_t, vcgtz_s64, vst1_u64, vld1_s64, 1, 1)\nLOOP1(VcgtzF32N, uint32_t, float32_t, vcgtz_f32, vst1_u32, vld1_f32, 2, 2)\nLOOP1(VcgtzF64N, uint64_t, float64_t, vcgtz_f64, vst1_u64, vld1_f64, 1, 1)\nLOOP1(VcgtzdS64N, uint64_t, int64_t, vcgtzd_s64, save, load, 1, 1)\nLOOP1(VcgtzdF64N, uint64_t, float64_t, vcgtzd_f64, save, load, 1, 1)\nLOOP1(VcgtzqS8N, uint8_t, int8_t, vcgtzq_s8, vst1q_u8, vld1q_s8, 16, 16)\nLOOP1(VcgtzqS16N, uint16_t, int16_t, vcgtzq_s16, vst1q_u16, vld1q_s16, 8, 8)\nLOOP1(VcgtzqS32N, uint32_t, int32_t, vcgtzq_s32, vst1q_u32, vld1q_s32, 4, 4)\nLOOP1(VcgtzqS64N, uint64_t, int64_t, vcgtzq_s64, vst1q_u64, vld1q_s64, 2, 2)\nLOOP1(VcgtzqF32N, uint32_t, float32_t, vcgtzq_f32, vst1q_u32, vld1q_f32, 4, 4)\nLOOP1(VcgtzqF64N, uint64_t, float64_t, vcgtzq_f64, vst1q_u64, vld1q_f64, 2, 2)\nLOOP1(VcgtzsF32N, uint32_t, float32_t, vcgtzs_f32, save, load, 1, 1)\nLOOP1(VclezS8N, uint8_t, int8_t, vclez_s8, vst1_u8, vld1_s8, 8, 8)\nLOOP1(VclezS16N, uint16_t, int16_t, vclez_s16, vst1_u16, vld1_s16, 4, 4)\nLOOP1(VclezS32N, uint32_t, int32_t, vclez_s32, vst1_u32, vld1_s32, 2, 2)\nLOOP1(VclezS64N, uint64_t, int64_t, vclez_s64, vst1_u64, vld1_s64, 1, 1)\nLOOP1(VclezF32N, uint32_t, float32_t, vclez_f32, vst1_u32, vld1_f32, 2, 2)\nLOOP1(VclezF64N, uint64_t, float64_t, vclez_f64, vst1_u64, vld1_f64, 1, 1)\nLOOP1(VclezdS64N, uint64_t, int64_t, vclezd_s64, save, load, 1, 1)\nLOOP1(VclezdF64N, uint64_t, float64_t, vclezd_f64, save, load, 1, 1)\nLOOP1(VclezqS8N, uint8_t, int8_t, vclezq_s8, vst1q_u8, vld1q_s8, 16, 16)\nLOOP1(VclezqS16N, uint16_t, int16_t, vclezq_s16, vst1q_u16, vld1q_s16, 8, 8)\nLOOP1(VclezqS32N, uint32_t, int32_t, vclezq_s32, vst1q_u32, vld1q_s32, 4, 4)\nLOOP1(VclezqS64N, uint64_t, int64_t, 
vclezq_s64, vst1q_u64, vld1q_s64, 2, 2)\nLOOP1(VclezqF32N, uint32_t, float32_t, vclezq_f32, vst1q_u32, vld1q_f32, 4, 4)\nLOOP1(VclezqF64N, uint64_t, float64_t, vclezq_f64, vst1q_u64, vld1q_f64, 2, 2)\nLOOP1(VclezsF32N, uint32_t, float32_t, vclezs_f32, save, load, 1, 1)\nLOOP1(VclsS8N, int8_t, int8_t, vcls_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP1(VclsS16N, int16_t, int16_t, vcls_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP1(VclsS32N, int32_t, int32_t, vcls_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP1(VclsU8N, int8_t, uint8_t, vcls_u8, vst1_s8, vld1_u8, 8, 8)\nLOOP1(VclsU16N, int16_t, uint16_t, vcls_u16, vst1_s16, vld1_u16, 4, 4)\nLOOP1(VclsU32N, int32_t, uint32_t, vcls_u32, vst1_s32, vld1_u32, 2, 2)\nLOOP1(VclsqS8N, int8_t, int8_t, vclsq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP1(VclsqS16N, int16_t, int16_t, vclsq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP1(VclsqS32N, int32_t, int32_t, vclsq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP1(VclsqU8N, int8_t, uint8_t, vclsq_u8, vst1q_s8, vld1q_u8, 16, 16)\nLOOP1(VclsqU16N, int16_t, uint16_t, vclsq_u16, vst1q_s16, vld1q_u16, 8, 8)\nLOOP1(VclsqU32N, int32_t, uint32_t, vclsq_u32, vst1q_s32, vld1q_u32, 4, 4)\nLOOP1(VcltzS8N, uint8_t, int8_t, vcltz_s8, vst1_u8, vld1_s8, 8, 8)\nLOOP1(VcltzS16N, uint16_t, int16_t, vcltz_s16, vst1_u16, vld1_s16, 4, 4)\nLOOP1(VcltzS32N, uint32_t, int32_t, vcltz_s32, vst1_u32, vld1_s32, 2, 2)\nLOOP1(VcltzS64N, uint64_t, int64_t, vcltz_s64, vst1_u64, vld1_s64, 1, 1)\nLOOP1(VcltzF32N, uint32_t, float32_t, vcltz_f32, vst1_u32, vld1_f32, 2, 2)\nLOOP1(VcltzF64N, uint64_t, float64_t, vcltz_f64, vst1_u64, vld1_f64, 1, 1)\nLOOP1(VcltzdS64N, uint64_t, int64_t, vcltzd_s64, save, load, 1, 1)\nLOOP1(VcltzdF64N, uint64_t, float64_t, vcltzd_f64, save, load, 1, 1)\nLOOP1(VcltzqS8N, uint8_t, int8_t, vcltzq_s8, vst1q_u8, vld1q_s8, 16, 16)\nLOOP1(VcltzqS16N, uint16_t, int16_t, vcltzq_s16, vst1q_u16, vld1q_s16, 8, 8)\nLOOP1(VcltzqS32N, uint32_t, int32_t, vcltzq_s32, vst1q_u32, vld1q_s32, 4, 4)\nLOOP1(VcltzqS64N, uint64_t, int64_t, vcltzq_s64, 
vst1q_u64, vld1q_s64, 2, 2)\nLOOP1(VcltzqF32N, uint32_t, float32_t, vcltzq_f32, vst1q_u32, vld1q_f32, 4, 4)\nLOOP1(VcltzqF64N, uint64_t, float64_t, vcltzq_f64, vst1q_u64, vld1q_f64, 2, 2)\nLOOP1(VcltzsF32N, uint32_t, float32_t, vcltzs_f32, save, load, 1, 1)\nLOOP1(VclzS8N, int8_t, int8_t, vclz_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP1(VclzS16N, int16_t, int16_t, vclz_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP1(VclzS32N, int32_t, int32_t, vclz_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP1(VclzU8N, uint8_t, uint8_t, vclz_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP1(VclzU16N, uint16_t, uint16_t, vclz_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP1(VclzU32N, uint32_t, uint32_t, vclz_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP1(VclzqS8N, int8_t, int8_t, vclzq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP1(VclzqS16N, int16_t, int16_t, vclzq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP1(VclzqS32N, int32_t, int32_t, vclzq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP1(VclzqU8N, uint8_t, uint8_t, vclzq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP1(VclzqU16N, uint16_t, uint16_t, vclzq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP1(VclzqU32N, uint32_t, uint32_t, vclzq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP1(VcntS8N, int8_t, int8_t, vcnt_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP1(VcntU8N, uint8_t, uint8_t, vcnt_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP1(VcntqS8N, int8_t, int8_t, vcntq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP1(VcntqU8N, uint8_t, uint8_t, vcntq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP1(VcvtF32S32N, float32_t, int32_t, vcvt_f32_s32, vst1_f32, vld1_s32, 2, 2)\nLOOP1(VcvtF32U32N, float32_t, uint32_t, vcvt_f32_u32, vst1_f32, vld1_u32, 2, 2)\nLOOP1(VcvtF64S64N, float64_t, int64_t, vcvt_f64_s64, vst1_f64, vld1_s64, 1, 1)\nLOOP1(VcvtF64U64N, float64_t, uint64_t, vcvt_f64_u64, vst1_f64, vld1_u64, 1, 1)\nLOOP1(VcvtS32F32N, int32_t, float32_t, vcvt_s32_f32, vst1_s32, vld1_f32, 2, 2)\nLOOP1(VcvtS64F64N, int64_t, float64_t, vcvt_s64_f64, vst1_s64, vld1_f64, 1, 1)\nLOOP1(VcvtU32F32N, uint32_t, float32_t, vcvt_u32_f32, vst1_u32, vld1_f32, 2, 2)\nLOOP1(VcvtU64F64N, uint64_t, 
float64_t, vcvt_u64_f64, vst1_u64, vld1_f64, 1, 1)\nLOOP1(VcvtaS32F32N, int32_t, float32_t, vcvta_s32_f32, vst1_s32, vld1_f32, 2, 2)\nLOOP1(VcvtaS64F64N, int64_t, float64_t, vcvta_s64_f64, vst1_s64, vld1_f64, 1, 1)\nLOOP1(VcvtaU32F32N, uint32_t, float32_t, vcvta_u32_f32, vst1_u32, vld1_f32, 2, 2)\nLOOP1(VcvtaU64F64N, uint64_t, float64_t, vcvta_u64_f64, vst1_u64, vld1_f64, 1, 1)\nLOOP1(VcvtadS64F64N, int64_t, float64_t, vcvtad_s64_f64, save, load, 1, 1)\nLOOP1(VcvtadU64F64N, uint64_t, float64_t, vcvtad_u64_f64, save, load, 1, 1)\nLOOP1(VcvtaqS32F32N, int32_t, float32_t, vcvtaq_s32_f32, vst1q_s32, vld1q_f32, 4, 4)\nLOOP1(VcvtaqS64F64N, int64_t, float64_t, vcvtaq_s64_f64, vst1q_s64, vld1q_f64, 2, 2)\nLOOP1(VcvtaqU32F32N, uint32_t, float32_t, vcvtaq_u32_f32, vst1q_u32, vld1q_f32, 4, 4)\nLOOP1(VcvtaqU64F64N, uint64_t, float64_t, vcvtaq_u64_f64, vst1q_u64, vld1q_f64, 2, 2)\nLOOP1(VcvtasS32F32N, int32_t, float32_t, vcvtas_s32_f32, save, load, 1, 1)\nLOOP1(VcvtasU32F32N, uint32_t, float32_t, vcvtas_u32_f32, save, load, 1, 1)\nLOOP1(VcvtdF64S64N, float64_t, int64_t, vcvtd_f64_s64, save, load, 1, 1)\nLOOP1(VcvtdF64U64N, float64_t, uint64_t, vcvtd_f64_u64, save, load, 1, 1)\nLOOP1(VcvtdS64F64N, int64_t, float64_t, vcvtd_s64_f64, save, load, 1, 1)\nLOOP1(VcvtdU64F64N, uint64_t, float64_t, vcvtd_u64_f64, save, load, 1, 1)\nLOOP1(VcvtmS32F32N, int32_t, float32_t, vcvtm_s32_f32, vst1_s32, vld1_f32, 2, 2)\nLOOP1(VcvtmS64F64N, int64_t, float64_t, vcvtm_s64_f64, vst1_s64, vld1_f64, 1, 1)\nLOOP1(VcvtmU32F32N, uint32_t, float32_t, vcvtm_u32_f32, vst1_u32, vld1_f32, 2, 2)\nLOOP1(VcvtmU64F64N, uint64_t, float64_t, vcvtm_u64_f64, vst1_u64, vld1_f64, 1, 1)\nLOOP1(VcvtmdS64F64N, int64_t, float64_t, vcvtmd_s64_f64, save, load, 1, 1)\nLOOP1(VcvtmdU64F64N, uint64_t, float64_t, vcvtmd_u64_f64, save, load, 1, 1)\nLOOP1(VcvtmqS32F32N, int32_t, float32_t, vcvtmq_s32_f32, vst1q_s32, vld1q_f32, 4, 4)\nLOOP1(VcvtmqS64F64N, int64_t, float64_t, vcvtmq_s64_f64, vst1q_s64, vld1q_f64, 2, 
2)\nLOOP1(VcvtmqU32F32N, uint32_t, float32_t, vcvtmq_u32_f32, vst1q_u32, vld1q_f32, 4, 4)\nLOOP1(VcvtmqU64F64N, uint64_t, float64_t, vcvtmq_u64_f64, vst1q_u64, vld1q_f64, 2, 2)\nLOOP1(VcvtmsS32F32N, int32_t, float32_t, vcvtms_s32_f32, save, load, 1, 1)\nLOOP1(VcvtmsU32F32N, uint32_t, float32_t, vcvtms_u32_f32, save, load, 1, 1)\nLOOP1(VcvtnS32F32N, int32_t, float32_t, vcvtn_s32_f32, vst1_s32, vld1_f32, 2, 2)\nLOOP1(VcvtnS64F64N, int64_t, float64_t, vcvtn_s64_f64, vst1_s64, vld1_f64, 1, 1)\nLOOP1(VcvtnU32F32N, uint32_t, float32_t, vcvtn_u32_f32, vst1_u32, vld1_f32, 2, 2)\nLOOP1(VcvtnU64F64N, uint64_t, float64_t, vcvtn_u64_f64, vst1_u64, vld1_f64, 1, 1)\nLOOP1(VcvtndS64F64N, int64_t, float64_t, vcvtnd_s64_f64, save, load, 1, 1)\nLOOP1(VcvtndU64F64N, uint64_t, float64_t, vcvtnd_u64_f64, save, load, 1, 1)\nLOOP1(VcvtnqS32F32N, int32_t, float32_t, vcvtnq_s32_f32, vst1q_s32, vld1q_f32, 4, 4)\nLOOP1(VcvtnqS64F64N, int64_t, float64_t, vcvtnq_s64_f64, vst1q_s64, vld1q_f64, 2, 2)\nLOOP1(VcvtnqU32F32N, uint32_t, float32_t, vcvtnq_u32_f32, vst1q_u32, vld1q_f32, 4, 4)\nLOOP1(VcvtnqU64F64N, uint64_t, float64_t, vcvtnq_u64_f64, vst1q_u64, vld1q_f64, 2, 2)\nLOOP1(VcvtnsS32F32N, int32_t, float32_t, vcvtns_s32_f32, save, load, 1, 1)\nLOOP1(VcvtnsU32F32N, uint32_t, float32_t, vcvtns_u32_f32, save, load, 1, 1)\nLOOP1(VcvtpS32F32N, int32_t, float32_t, vcvtp_s32_f32, vst1_s32, vld1_f32, 2, 2)\nLOOP1(VcvtpS64F64N, int64_t, float64_t, vcvtp_s64_f64, vst1_s64, vld1_f64, 1, 1)\nLOOP1(VcvtpU32F32N, uint32_t, float32_t, vcvtp_u32_f32, vst1_u32, vld1_f32, 2, 2)\nLOOP1(VcvtpU64F64N, uint64_t, float64_t, vcvtp_u64_f64, vst1_u64, vld1_f64, 1, 1)\nLOOP1(VcvtpdS64F64N, int64_t, float64_t, vcvtpd_s64_f64, save, load, 1, 1)\nLOOP1(VcvtpdU64F64N, uint64_t, float64_t, vcvtpd_u64_f64, save, load, 1, 1)\nLOOP1(VcvtpqS32F32N, int32_t, float32_t, vcvtpq_s32_f32, vst1q_s32, vld1q_f32, 4, 4)\nLOOP1(VcvtpqS64F64N, int64_t, float64_t, vcvtpq_s64_f64, vst1q_s64, vld1q_f64, 2, 2)\nLOOP1(VcvtpqU32F32N, uint32_t, 
float32_t, vcvtpq_u32_f32, vst1q_u32, vld1q_f32, 4, 4)\nLOOP1(VcvtpqU64F64N, uint64_t, float64_t, vcvtpq_u64_f64, vst1q_u64, vld1q_f64, 2, 2)\nLOOP1(VcvtpsS32F32N, int32_t, float32_t, vcvtps_s32_f32, save, load, 1, 1)\nLOOP1(VcvtpsU32F32N, uint32_t, float32_t, vcvtps_u32_f32, save, load, 1, 1)\nLOOP1(VcvtqF32S32N, float32_t, int32_t, vcvtq_f32_s32, vst1q_f32, vld1q_s32, 4, 4)\nLOOP1(VcvtqF32U32N, float32_t, uint32_t, vcvtq_f32_u32, vst1q_f32, vld1q_u32, 4, 4)\nLOOP1(VcvtqF64S64N, float64_t, int64_t, vcvtq_f64_s64, vst1q_f64, vld1q_s64, 2, 2)\nLOOP1(VcvtqF64U64N, float64_t, uint64_t, vcvtq_f64_u64, vst1q_f64, vld1q_u64, 2, 2)\nLOOP1(VcvtqS32F32N, int32_t, float32_t, vcvtq_s32_f32, vst1q_s32, vld1q_f32, 4, 4)\nLOOP1(VcvtqS64F64N, int64_t, float64_t, vcvtq_s64_f64, vst1q_s64, vld1q_f64, 2, 2)\nLOOP1(VcvtqU32F32N, uint32_t, float32_t, vcvtq_u32_f32, vst1q_u32, vld1q_f32, 4, 4)\nLOOP1(VcvtqU64F64N, uint64_t, float64_t, vcvtq_u64_f64, vst1q_u64, vld1q_f64, 2, 2)\nLOOP1(VcvtsF32S32N, float32_t, int32_t, vcvts_f32_s32, save, load, 1, 1)\nLOOP1(VcvtsF32U32N, float32_t, uint32_t, vcvts_f32_u32, save, load, 1, 1)\nLOOP1(VcvtsS32F32N, int32_t, float32_t, vcvts_s32_f32, save, load, 1, 1)\nLOOP1(VcvtsU32F32N, uint32_t, float32_t, vcvts_u32_f32, save, load, 1, 1)\nLOOP1(VdupNS8N, int8_t, int8_t, vdup_n_s8, vst1_s8, load, 8, 1)\nLOOP1(VdupNS16N, int16_t, int16_t, vdup_n_s16, vst1_s16, load, 4, 1)\nLOOP1(VdupNS32N, int32_t, int32_t, vdup_n_s32, vst1_s32, load, 2, 1)\nLOOP1(VdupNS64N, int64_t, int64_t, vdup_n_s64, vst1_s64, load, 1, 1)\nLOOP1(VdupNU8N, uint8_t, uint8_t, vdup_n_u8, vst1_u8, load, 8, 1)\nLOOP1(VdupNU16N, uint16_t, uint16_t, vdup_n_u16, vst1_u16, load, 4, 1)\nLOOP1(VdupNU32N, uint32_t, uint32_t, vdup_n_u32, vst1_u32, load, 2, 1)\nLOOP1(VdupNU64N, uint64_t, uint64_t, vdup_n_u64, vst1_u64, load, 1, 1)\nLOOP1(VdupNF32N, float32_t, float32_t, vdup_n_f32, vst1_f32, load, 2, 1)\nLOOP1(VdupNF64N, float64_t, float64_t, vdup_n_f64, vst1_f64, load, 1, 1)\nLOOP1(VdupqNS8N, 
int8_t, int8_t, vdupq_n_s8, vst1q_s8, load, 16, 1)\nLOOP1(VdupqNS16N, int16_t, int16_t, vdupq_n_s16, vst1q_s16, load, 8, 1)\nLOOP1(VdupqNS32N, int32_t, int32_t, vdupq_n_s32, vst1q_s32, load, 4, 1)\nLOOP1(VdupqNS64N, int64_t, int64_t, vdupq_n_s64, vst1q_s64, load, 2, 1)\nLOOP1(VdupqNU8N, uint8_t, uint8_t, vdupq_n_u8, vst1q_u8, load, 16, 1)\nLOOP1(VdupqNU16N, uint16_t, uint16_t, vdupq_n_u16, vst1q_u16, load, 8, 1)\nLOOP1(VdupqNU32N, uint32_t, uint32_t, vdupq_n_u32, vst1q_u32, load, 4, 1)\nLOOP1(VdupqNU64N, uint64_t, uint64_t, vdupq_n_u64, vst1q_u64, load, 2, 1)\nLOOP1(VdupqNF32N, float32_t, float32_t, vdupq_n_f32, vst1q_f32, load, 4, 1)\nLOOP1(VdupqNF64N, float64_t, float64_t, vdupq_n_f64, vst1q_f64, load, 2, 1)\nLOOP1(VgetHighS8N, int8_t, int8_t, vget_high_s8, vst1_s8, vld1q_s8, 8, 16)\nLOOP1(VgetHighS16N, int16_t, int16_t, vget_high_s16, vst1_s16, vld1q_s16, 4, 8)\nLOOP1(VgetHighS32N, int32_t, int32_t, vget_high_s32, vst1_s32, vld1q_s32, 2, 4)\nLOOP1(VgetHighS64N, int64_t, int64_t, vget_high_s64, vst1_s64, vld1q_s64, 1, 2)\nLOOP1(VgetHighU8N, uint8_t, uint8_t, vget_high_u8, vst1_u8, vld1q_u8, 8, 16)\nLOOP1(VgetHighU16N, uint16_t, uint16_t, vget_high_u16, vst1_u16, vld1q_u16, 4, 8)\nLOOP1(VgetHighU32N, uint32_t, uint32_t, vget_high_u32, vst1_u32, vld1q_u32, 2, 4)\nLOOP1(VgetHighU64N, uint64_t, uint64_t, vget_high_u64, vst1_u64, vld1q_u64, 1, 2)\nLOOP1(VgetHighF32N, float32_t, float32_t, vget_high_f32, vst1_f32, vld1q_f32, 2, 4)\nLOOP1(VgetHighF64N, float64_t, float64_t, vget_high_f64, vst1_f64, vld1q_f64, 1, 2)\nLOOP1(VgetLowS8N, int8_t, int8_t, vget_low_s8, vst1_s8, vld1q_s8, 8, 16)\nLOOP1(VgetLowS16N, int16_t, int16_t, vget_low_s16, vst1_s16, vld1q_s16, 4, 8)\nLOOP1(VgetLowS32N, int32_t, int32_t, vget_low_s32, vst1_s32, vld1q_s32, 2, 4)\nLOOP1(VgetLowS64N, int64_t, int64_t, vget_low_s64, vst1_s64, vld1q_s64, 1, 2)\nLOOP1(VgetLowU8N, uint8_t, uint8_t, vget_low_u8, vst1_u8, vld1q_u8, 8, 16)\nLOOP1(VgetLowU16N, uint16_t, uint16_t, vget_low_u16, vst1_u16, vld1q_u16, 
4, 8)\nLOOP1(VgetLowU32N, uint32_t, uint32_t, vget_low_u32, vst1_u32, vld1q_u32, 2, 4)\nLOOP1(VgetLowU64N, uint64_t, uint64_t, vget_low_u64, vst1_u64, vld1q_u64, 1, 2)\nLOOP1(VgetLowF32N, float32_t, float32_t, vget_low_f32, vst1_f32, vld1q_f32, 2, 4)\nLOOP1(VgetLowF64N, float64_t, float64_t, vget_low_f64, vst1_f64, vld1q_f64, 1, 2)\nLOOP1(VmaxnmvF32N, float32_t, float32_t, vmaxnmv_f32, save, vld1_f32, 1, 2)\nLOOP1(VmaxnmvqF32N, float32_t, float32_t, vmaxnmvq_f32, save, vld1q_f32, 1, 4)\nLOOP1(VmaxnmvqF64N, float64_t, float64_t, vmaxnmvq_f64, save, vld1q_f64, 1, 2)\nLOOP1(VmaxvS8N, int8_t, int8_t, vmaxv_s8, save, vld1_s8, 1, 8)\nLOOP1(VmaxvS16N, int16_t, int16_t, vmaxv_s16, save, vld1_s16, 1, 4)\nLOOP1(VmaxvS32N, int32_t, int32_t, vmaxv_s32, save, vld1_s32, 1, 2)\nLOOP1(VmaxvU8N, uint8_t, uint8_t, vmaxv_u8, save, vld1_u8, 1, 8)\nLOOP1(VmaxvU16N, uint16_t, uint16_t, vmaxv_u16, save, vld1_u16, 1, 4)\nLOOP1(VmaxvU32N, uint32_t, uint32_t, vmaxv_u32, save, vld1_u32, 1, 2)\nLOOP1(VmaxvF32N, float32_t, float32_t, vmaxv_f32, save, vld1_f32, 1, 2)\nLOOP1(VmaxvqS8N, int8_t, int8_t, vmaxvq_s8, save, vld1q_s8, 1, 16)\nLOOP1(VmaxvqS16N, int16_t, int16_t, vmaxvq_s16, save, vld1q_s16, 1, 8)\nLOOP1(VmaxvqS32N, int32_t, int32_t, vmaxvq_s32, save, vld1q_s32, 1, 4)\nLOOP1(VmaxvqU8N, uint8_t, uint8_t, vmaxvq_u8, save, vld1q_u8, 1, 16)\nLOOP1(VmaxvqU16N, uint16_t, uint16_t, vmaxvq_u16, save, vld1q_u16, 1, 8)\nLOOP1(VmaxvqU32N, uint32_t, uint32_t, vmaxvq_u32, save, vld1q_u32, 1, 4)\nLOOP1(VmaxvqF32N, float32_t, float32_t, vmaxvq_f32, save, vld1q_f32, 1, 4)\nLOOP1(VmaxvqF64N, float64_t, float64_t, vmaxvq_f64, save, vld1q_f64, 1, 2)\nLOOP1(VminnmvF32N, float32_t, float32_t, vminnmv_f32, save, vld1_f32, 1, 2)\nLOOP1(VminnmvqF32N, float32_t, float32_t, vminnmvq_f32, save, vld1q_f32, 1, 4)\nLOOP1(VminnmvqF64N, float64_t, float64_t, vminnmvq_f64, save, vld1q_f64, 1, 2)\nLOOP1(VminvS8N, int8_t, int8_t, vminv_s8, save, vld1_s8, 1, 8)\nLOOP1(VminvS16N, int16_t, int16_t, vminv_s16, save, vld1_s16, 
1, 4)\nLOOP1(VminvS32N, int32_t, int32_t, vminv_s32, save, vld1_s32, 1, 2)\nLOOP1(VminvU8N, uint8_t, uint8_t, vminv_u8, save, vld1_u8, 1, 8)\nLOOP1(VminvU16N, uint16_t, uint16_t, vminv_u16, save, vld1_u16, 1, 4)\nLOOP1(VminvU32N, uint32_t, uint32_t, vminv_u32, save, vld1_u32, 1, 2)\nLOOP1(VminvF32N, float32_t, float32_t, vminv_f32, save, vld1_f32, 1, 2)\nLOOP1(VminvqS8N, int8_t, int8_t, vminvq_s8, save, vld1q_s8, 1, 16)\nLOOP1(VminvqS16N, int16_t, int16_t, vminvq_s16, save, vld1q_s16, 1, 8)\nLOOP1(VminvqS32N, int32_t, int32_t, vminvq_s32, save, vld1q_s32, 1, 4)\nLOOP1(VminvqU8N, uint8_t, uint8_t, vminvq_u8, save, vld1q_u8, 1, 16)\nLOOP1(VminvqU16N, uint16_t, uint16_t, vminvq_u16, save, vld1q_u16, 1, 8)\nLOOP1(VminvqU32N, uint32_t, uint32_t, vminvq_u32, save, vld1q_u32, 1, 4)\nLOOP1(VminvqF32N, float32_t, float32_t, vminvq_f32, save, vld1q_f32, 1, 4)\nLOOP1(VminvqF64N, float64_t, float64_t, vminvq_f64, save, vld1q_f64, 1, 2)\nLOOP1(VmovNS8N, int8_t, int8_t, vmov_n_s8, vst1_s8, load, 8, 1)\nLOOP1(VmovNS16N, int16_t, int16_t, vmov_n_s16, vst1_s16, load, 4, 1)\nLOOP1(VmovNS32N, int32_t, int32_t, vmov_n_s32, vst1_s32, load, 2, 1)\nLOOP1(VmovNS64N, int64_t, int64_t, vmov_n_s64, vst1_s64, load, 1, 1)\nLOOP1(VmovNU8N, uint8_t, uint8_t, vmov_n_u8, vst1_u8, load, 8, 1)\nLOOP1(VmovNU16N, uint16_t, uint16_t, vmov_n_u16, vst1_u16, load, 4, 1)\nLOOP1(VmovNU32N, uint32_t, uint32_t, vmov_n_u32, vst1_u32, load, 2, 1)\nLOOP1(VmovNU64N, uint64_t, uint64_t, vmov_n_u64, vst1_u64, load, 1, 1)\nLOOP1(VmovNF32N, float32_t, float32_t, vmov_n_f32, vst1_f32, load, 2, 1)\nLOOP1(VmovNF64N, float64_t, float64_t, vmov_n_f64, vst1_f64, load, 1, 1)\nLOOP1(VmovqNS8N, int8_t, int8_t, vmovq_n_s8, vst1q_s8, load, 16, 1)\nLOOP1(VmovqNS16N, int16_t, int16_t, vmovq_n_s16, vst1q_s16, load, 8, 1)\nLOOP1(VmovqNS32N, int32_t, int32_t, vmovq_n_s32, vst1q_s32, load, 4, 1)\nLOOP1(VmovqNS64N, int64_t, int64_t, vmovq_n_s64, vst1q_s64, load, 2, 1)\nLOOP1(VmovqNU8N, uint8_t, uint8_t, vmovq_n_u8, vst1q_u8, load, 16, 
1)\nLOOP1(VmovqNU16N, uint16_t, uint16_t, vmovq_n_u16, vst1q_u16, load, 8, 1)\nLOOP1(VmovqNU32N, uint32_t, uint32_t, vmovq_n_u32, vst1q_u32, load, 4, 1)\nLOOP1(VmovqNU64N, uint64_t, uint64_t, vmovq_n_u64, vst1q_u64, load, 2, 1)\nLOOP1(VmovqNF32N, float32_t, float32_t, vmovq_n_f32, vst1q_f32, load, 4, 1)\nLOOP1(VmovqNF64N, float64_t, float64_t, vmovq_n_f64, vst1q_f64, load, 2, 1)\nLOOP1(VmvnS8N, int8_t, int8_t, vmvn_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP1(VmvnS16N, int16_t, int16_t, vmvn_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP1(VmvnS32N, int32_t, int32_t, vmvn_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP1(VmvnU8N, uint8_t, uint8_t, vmvn_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP1(VmvnU16N, uint16_t, uint16_t, vmvn_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP1(VmvnU32N, uint32_t, uint32_t, vmvn_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP1(VmvnqS8N, int8_t, int8_t, vmvnq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP1(VmvnqS16N, int16_t, int16_t, vmvnq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP1(VmvnqS32N, int32_t, int32_t, vmvnq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP1(VmvnqU8N, uint8_t, uint8_t, vmvnq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP1(VmvnqU16N, uint16_t, uint16_t, vmvnq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP1(VmvnqU32N, uint32_t, uint32_t, vmvnq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP1(VnegS8N, int8_t, int8_t, vneg_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP1(VnegS16N, int16_t, int16_t, vneg_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP1(VnegS32N, int32_t, int32_t, vneg_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP1(VnegS64N, int64_t, int64_t, vneg_s64, vst1_s64, vld1_s64, 1, 1)\nLOOP1(VnegF32N, float32_t, float32_t, vneg_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP1(VnegF64N, float64_t, float64_t, vneg_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP1(VnegdS64N, int64_t, int64_t, vnegd_s64, save, load, 1, 1)\nLOOP1(VnegqS8N, int8_t, int8_t, vnegq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP1(VnegqS16N, int16_t, int16_t, vnegq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP1(VnegqS32N, int32_t, int32_t, vnegq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP1(VnegqS64N, int64_t, int64_t, 
vnegq_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP1(VnegqF32N, float32_t, float32_t, vnegq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP1(VnegqF64N, float64_t, float64_t, vnegq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP1(VpadddS64N, int64_t, int64_t, vpaddd_s64, save, vld1q_s64, 1, 2)\nLOOP1(VpadddU64N, uint64_t, uint64_t, vpaddd_u64, save, vld1q_u64, 1, 2)\nLOOP1(VpadddF64N, float64_t, float64_t, vpaddd_f64, save, vld1q_f64, 1, 2)\nLOOP1(VpaddsF32N, float32_t, float32_t, vpadds_f32, save, vld1_f32, 1, 2)\nLOOP1(VpmaxnmqdF64N, float64_t, float64_t, vpmaxnmqd_f64, save, vld1q_f64, 1, 2)\nLOOP1(VpmaxnmsF32N, float32_t, float32_t, vpmaxnms_f32, save, vld1_f32, 1, 2)\nLOOP1(VpmaxqdF64N, float64_t, float64_t, vpmaxqd_f64, save, vld1q_f64, 1, 2)\nLOOP1(VpmaxsF32N, float32_t, float32_t, vpmaxs_f32, save, vld1_f32, 1, 2)\nLOOP1(VpminnmqdF64N, float64_t, float64_t, vpminnmqd_f64, save, vld1q_f64, 1, 2)\nLOOP1(VpminnmsF32N, float32_t, float32_t, vpminnms_f32, save, vld1_f32, 1, 2)\nLOOP1(VpminqdF64N, float64_t, float64_t, vpminqd_f64, save, vld1q_f64, 1, 2)\nLOOP1(VpminsF32N, float32_t, float32_t, vpmins_f32, save, vld1_f32, 1, 2)\nLOOP1(VqabsS8N, int8_t, int8_t, vqabs_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP1(VqabsS16N, int16_t, int16_t, vqabs_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP1(VqabsS32N, int32_t, int32_t, vqabs_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP1(VqabsS64N, int64_t, int64_t, vqabs_s64, vst1_s64, vld1_s64, 1, 1)\nLOOP1(VqabsbS8N, int8_t, int8_t, vqabsb_s8, save, load, 1, 1)\nLOOP1(VqabsdS64N, int64_t, int64_t, vqabsd_s64, save, load, 1, 1)\nLOOP1(VqabshS16N, int16_t, int16_t, vqabsh_s16, save, load, 1, 1)\nLOOP1(VqabsqS8N, int8_t, int8_t, vqabsq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP1(VqabsqS16N, int16_t, int16_t, vqabsq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP1(VqabsqS32N, int32_t, int32_t, vqabsq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP1(VqabsqS64N, int64_t, int64_t, vqabsq_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP1(VqabssS32N, int32_t, int32_t, vqabss_s32, save, load, 1, 1)\nLOOP1(VqnegS8N, 
int8_t, int8_t, vqneg_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP1(VqnegS16N, int16_t, int16_t, vqneg_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP1(VqnegS32N, int32_t, int32_t, vqneg_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP1(VqnegS64N, int64_t, int64_t, vqneg_s64, vst1_s64, vld1_s64, 1, 1)\nLOOP1(VqnegbS8N, int8_t, int8_t, vqnegb_s8, save, load, 1, 1)\nLOOP1(VqnegdS64N, int64_t, int64_t, vqnegd_s64, save, load, 1, 1)\nLOOP1(VqneghS16N, int16_t, int16_t, vqnegh_s16, save, load, 1, 1)\nLOOP1(VqnegqS8N, int8_t, int8_t, vqnegq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP1(VqnegqS16N, int16_t, int16_t, vqnegq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP1(VqnegqS32N, int32_t, int32_t, vqnegq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP1(VqnegqS64N, int64_t, int64_t, vqnegq_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP1(VqnegsS32N, int32_t, int32_t, vqnegs_s32, save, load, 1, 1)\nLOOP1(VrbitS8N, int8_t, int8_t, vrbit_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP1(VrbitU8N, uint8_t, uint8_t, vrbit_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP1(VrbitqS8N, int8_t, int8_t, vrbitq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP1(VrbitqU8N, uint8_t, uint8_t, vrbitq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP1(VrecpeU32N, uint32_t, uint32_t, vrecpe_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP1(VrecpeF32N, float32_t, float32_t, vrecpe_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP1(VrecpeF64N, float64_t, float64_t, vrecpe_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP1(VrecpedF64N, float64_t, float64_t, vrecped_f64, save, load, 1, 1)\nLOOP1(VrecpeqU32N, uint32_t, uint32_t, vrecpeq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP1(VrecpeqF32N, float32_t, float32_t, vrecpeq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP1(VrecpeqF64N, float64_t, float64_t, vrecpeq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP1(VrecpesF32N, float32_t, float32_t, vrecpes_f32, save, load, 1, 1)\nLOOP1(VrecpxdF64N, float64_t, float64_t, vrecpxd_f64, save, load, 1, 1)\nLOOP1(VrecpxsF32N, float32_t, float32_t, vrecpxs_f32, save, load, 1, 1)\nLOOP1(VreinterpretF32S32N, float32_t, int32_t, vreinterpret_f32_s32, vst1_f32, vld1_s32, 2, 
2)\nLOOP1(VreinterpretF32U32N, float32_t, uint32_t, vreinterpret_f32_u32, vst1_f32, vld1_u32, 2, 2)\nLOOP1(VreinterpretF64S64N, float64_t, int64_t, vreinterpret_f64_s64, vst1_f64, vld1_s64, 1, 1)\nLOOP1(VreinterpretF64U64N, float64_t, uint64_t, vreinterpret_f64_u64, vst1_f64, vld1_u64, 1, 1)\nLOOP1(VreinterpretS16U16N, int16_t, uint16_t, vreinterpret_s16_u16, vst1_s16, vld1_u16, 4, 4)\nLOOP1(VreinterpretS32U32N, int32_t, uint32_t, vreinterpret_s32_u32, vst1_s32, vld1_u32, 2, 2)\nLOOP1(VreinterpretS32F32N, int32_t, float32_t, vreinterpret_s32_f32, vst1_s32, vld1_f32, 2, 2)\nLOOP1(VreinterpretS64U64N, int64_t, uint64_t, vreinterpret_s64_u64, vst1_s64, vld1_u64, 1, 1)\nLOOP1(VreinterpretS64F64N, int64_t, float64_t, vreinterpret_s64_f64, vst1_s64, vld1_f64, 1, 1)\nLOOP1(VreinterpretS8U8N, int8_t, uint8_t, vreinterpret_s8_u8, vst1_s8, vld1_u8, 8, 8)\nLOOP1(VreinterpretU16S16N, uint16_t, int16_t, vreinterpret_u16_s16, vst1_u16, vld1_s16, 4, 4)\nLOOP1(VreinterpretU32S32N, uint32_t, int32_t, vreinterpret_u32_s32, vst1_u32, vld1_s32, 2, 2)\nLOOP1(VreinterpretU32F32N, uint32_t, float32_t, vreinterpret_u32_f32, vst1_u32, vld1_f32, 2, 2)\nLOOP1(VreinterpretU64S64N, uint64_t, int64_t, vreinterpret_u64_s64, vst1_u64, vld1_s64, 1, 1)\nLOOP1(VreinterpretU64F64N, uint64_t, float64_t, vreinterpret_u64_f64, vst1_u64, vld1_f64, 1, 1)\nLOOP1(VreinterpretU8S8N, uint8_t, int8_t, vreinterpret_u8_s8, vst1_u8, vld1_s8, 8, 8)\nLOOP1(VreinterpretqF32S32N, float32_t, int32_t, vreinterpretq_f32_s32, vst1q_f32, vld1q_s32, 4, 4)\nLOOP1(VreinterpretqF32U32N, float32_t, uint32_t, vreinterpretq_f32_u32, vst1q_f32, vld1q_u32, 4, 4)\nLOOP1(VreinterpretqF64S64N, float64_t, int64_t, vreinterpretq_f64_s64, vst1q_f64, vld1q_s64, 2, 2)\nLOOP1(VreinterpretqF64U64N, float64_t, uint64_t, vreinterpretq_f64_u64, vst1q_f64, vld1q_u64, 2, 2)\nLOOP1(VreinterpretqS16U16N, int16_t, uint16_t, vreinterpretq_s16_u16, vst1q_s16, vld1q_u16, 8, 8)\nLOOP1(VreinterpretqS32U32N, int32_t, uint32_t, vreinterpretq_s32_u32, 
vst1q_s32, vld1q_u32, 4, 4)\nLOOP1(VreinterpretqS32F32N, int32_t, float32_t, vreinterpretq_s32_f32, vst1q_s32, vld1q_f32, 4, 4)\nLOOP1(VreinterpretqS64U64N, int64_t, uint64_t, vreinterpretq_s64_u64, vst1q_s64, vld1q_u64, 2, 2)\nLOOP1(VreinterpretqS64F64N, int64_t, float64_t, vreinterpretq_s64_f64, vst1q_s64, vld1q_f64, 2, 2)\nLOOP1(VreinterpretqS8U8N, int8_t, uint8_t, vreinterpretq_s8_u8, vst1q_s8, vld1q_u8, 16, 16)\nLOOP1(VreinterpretqU16S16N, uint16_t, int16_t, vreinterpretq_u16_s16, vst1q_u16, vld1q_s16, 8, 8)\nLOOP1(VreinterpretqU32S32N, uint32_t, int32_t, vreinterpretq_u32_s32, vst1q_u32, vld1q_s32, 4, 4)\nLOOP1(VreinterpretqU32F32N, uint32_t, float32_t, vreinterpretq_u32_f32, vst1q_u32, vld1q_f32, 4, 4)\nLOOP1(VreinterpretqU64S64N, uint64_t, int64_t, vreinterpretq_u64_s64, vst1q_u64, vld1q_s64, 2, 2)\nLOOP1(VreinterpretqU64F64N, uint64_t, float64_t, vreinterpretq_u64_f64, vst1q_u64, vld1q_f64, 2, 2)\nLOOP1(VreinterpretqU8S8N, uint8_t, int8_t, vreinterpretq_u8_s8, vst1q_u8, vld1q_s8, 16, 16)\nLOOP1(Vrev16S8N, int8_t, int8_t, vrev16_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP1(Vrev16U8N, uint8_t, uint8_t, vrev16_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP1(Vrev16QS8N, int8_t, int8_t, vrev16q_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP1(Vrev16QU8N, uint8_t, uint8_t, vrev16q_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP1(Vrev32S8N, int8_t, int8_t, vrev32_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP1(Vrev32S16N, int16_t, int16_t, vrev32_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP1(Vrev32U8N, uint8_t, uint8_t, vrev32_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP1(Vrev32U16N, uint16_t, uint16_t, vrev32_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP1(Vrev32QS8N, int8_t, int8_t, vrev32q_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP1(Vrev32QS16N, int16_t, int16_t, vrev32q_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP1(Vrev32QU8N, uint8_t, uint8_t, vrev32q_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP1(Vrev32QU16N, uint16_t, uint16_t, vrev32q_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP1(Vrev64S8N, int8_t, int8_t, vrev64_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP1(Vrev64S16N, 
int16_t, int16_t, vrev64_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP1(Vrev64S32N, int32_t, int32_t, vrev64_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP1(Vrev64U8N, uint8_t, uint8_t, vrev64_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP1(Vrev64U16N, uint16_t, uint16_t, vrev64_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP1(Vrev64U32N, uint32_t, uint32_t, vrev64_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP1(Vrev64F32N, float32_t, float32_t, vrev64_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP1(Vrev64QS8N, int8_t, int8_t, vrev64q_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP1(Vrev64QS16N, int16_t, int16_t, vrev64q_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP1(Vrev64QS32N, int32_t, int32_t, vrev64q_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP1(Vrev64QU8N, uint8_t, uint8_t, vrev64q_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP1(Vrev64QU16N, uint16_t, uint16_t, vrev64q_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP1(Vrev64QU32N, uint32_t, uint32_t, vrev64q_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP1(Vrev64QF32N, float32_t, float32_t, vrev64q_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP1(VrndF32N, float32_t, float32_t, vrnd_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP1(VrndF64N, float64_t, float64_t, vrnd_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP1(Vrnd32XF32N, float32_t, float32_t, vrnd32x_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP1(Vrnd32XF64N, float64_t, float64_t, vrnd32x_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP1(Vrnd32XqF32N, float32_t, float32_t, vrnd32xq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP1(Vrnd32XqF64N, float64_t, float64_t, vrnd32xq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP1(Vrnd32ZF32N, float32_t, float32_t, vrnd32z_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP1(Vrnd32ZF64N, float64_t, float64_t, vrnd32z_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP1(Vrnd32ZqF32N, float32_t, float32_t, vrnd32zq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP1(Vrnd32ZqF64N, float64_t, float64_t, vrnd32zq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP1(Vrnd64XF32N, float32_t, float32_t, vrnd64x_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP1(Vrnd64XF64N, float64_t, float64_t, vrnd64x_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP1(Vrnd64XqF32N, float32_t, 
float32_t, vrnd64xq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP1(Vrnd64XqF64N, float64_t, float64_t, vrnd64xq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP1(Vrnd64ZF32N, float32_t, float32_t, vrnd64z_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP1(Vrnd64ZF64N, float64_t, float64_t, vrnd64z_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP1(Vrnd64ZqF32N, float32_t, float32_t, vrnd64zq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP1(Vrnd64ZqF64N, float64_t, float64_t, vrnd64zq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP1(VrndaF32N, float32_t, float32_t, vrnda_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP1(VrndaF64N, float64_t, float64_t, vrnda_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP1(VrndaqF32N, float32_t, float32_t, vrndaq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP1(VrndaqF64N, float64_t, float64_t, vrndaq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP1(VrndiF32N, float32_t, float32_t, vrndi_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP1(VrndiF64N, float64_t, float64_t, vrndi_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP1(VrndiqF32N, float32_t, float32_t, vrndiq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP1(VrndiqF64N, float64_t, float64_t, vrndiq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP1(VrndmF32N, float32_t, float32_t, vrndm_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP1(VrndmF64N, float64_t, float64_t, vrndm_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP1(VrndmqF32N, float32_t, float32_t, vrndmq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP1(VrndmqF64N, float64_t, float64_t, vrndmq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP1(VrndnF32N, float32_t, float32_t, vrndn_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP1(VrndnF64N, float64_t, float64_t, vrndn_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP1(VrndnqF32N, float32_t, float32_t, vrndnq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP1(VrndnqF64N, float64_t, float64_t, vrndnq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP1(VrndnsF32N, float32_t, float32_t, vrndns_f32, save, load, 1, 1)\nLOOP1(VrndpF32N, float32_t, float32_t, vrndp_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP1(VrndpF64N, float64_t, float64_t, vrndp_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP1(VrndpqF32N, float32_t, float32_t, 
vrndpq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP1(VrndpqF64N, float64_t, float64_t, vrndpq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP1(VrndqF32N, float32_t, float32_t, vrndq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP1(VrndqF64N, float64_t, float64_t, vrndq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP1(VrndxF32N, float32_t, float32_t, vrndx_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP1(VrndxF64N, float64_t, float64_t, vrndx_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP1(VrndxqF32N, float32_t, float32_t, vrndxq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP1(VrndxqF64N, float64_t, float64_t, vrndxq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP1(VrsqrteU32N, uint32_t, uint32_t, vrsqrte_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP1(VrsqrteF32N, float32_t, float32_t, vrsqrte_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP1(VrsqrteF64N, float64_t, float64_t, vrsqrte_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP1(VrsqrtedF64N, float64_t, float64_t, vrsqrted_f64, save, load, 1, 1)\nLOOP1(VrsqrteqU32N, uint32_t, uint32_t, vrsqrteq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP1(VrsqrteqF32N, float32_t, float32_t, vrsqrteq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP1(VrsqrteqF64N, float64_t, float64_t, vrsqrteq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP1(VrsqrtesF32N, float32_t, float32_t, vrsqrtes_f32, save, load, 1, 1)\nLOOP1(Vsha1HU32N, uint32_t, uint32_t, vsha1h_u32, save, load, 1, 1)\nLOOP1(VsqrtF32N, float32_t, float32_t, vsqrt_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP1(VsqrtF64N, float64_t, float64_t, vsqrt_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP1(VsqrtqF32N, float32_t, float32_t, vsqrtq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP1(VsqrtqF64N, float64_t, float64_t, vsqrtq_f64, vst1q_f64, vld1q_f64, 2, 2)\n\n#define LOOP2(name, rtype, itype, f, set, load, rstep, istep) \\\n    void name(rtype *r, itype *v1, itype *v2, int32_t n)      \\\n    {                                                         \\\n        while (n >= rstep)                                    \\\n        {                                                     \\\n            set(r, f(load(v1), load(v2))); 
                   \\\n            r += rstep;                                       \\\n            n -= rstep;                                       \\\n            v1 += istep;                                      \\\n            v2 += istep;                                      \\\n        }                                                     \\\n    }\n\nLOOP2(VabdS8N, int8_t, int8_t, vabd_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VabdS16N, int16_t, int16_t, vabd_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VabdS32N, int32_t, int32_t, vabd_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VabdU8N, uint8_t, uint8_t, vabd_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VabdU16N, uint16_t, uint16_t, vabd_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VabdU32N, uint32_t, uint32_t, vabd_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VabdF32N, float32_t, float32_t, vabd_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(VabdF64N, float64_t, float64_t, vabd_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP2(VabddF64N, float64_t, float64_t, vabdd_f64, save, load, 1, 1)\nLOOP2(VabdqS8N, int8_t, int8_t, vabdq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VabdqS16N, int16_t, int16_t, vabdq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VabdqS32N, int32_t, int32_t, vabdq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VabdqU8N, uint8_t, uint8_t, vabdq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VabdqU16N, uint16_t, uint16_t, vabdq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VabdqU32N, uint32_t, uint32_t, vabdq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VabdqF32N, float32_t, float32_t, vabdq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(VabdqF64N, float64_t, float64_t, vabdq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(VabdsF32N, float32_t, float32_t, vabds_f32, save, load, 1, 1)\nLOOP2(VaddS8N, int8_t, int8_t, vadd_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VaddS16N, int16_t, int16_t, vadd_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VaddS32N, int32_t, int32_t, vadd_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VaddS64N, int64_t, int64_t, vadd_s64, vst1_s64, vld1_s64, 1, 1)\nLOOP2(VaddU8N, uint8_t, uint8_t, vadd_u8, 
vst1_u8, vld1_u8, 8, 8)\nLOOP2(VaddU16N, uint16_t, uint16_t, vadd_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VaddU32N, uint32_t, uint32_t, vadd_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VaddU64N, uint64_t, uint64_t, vadd_u64, vst1_u64, vld1_u64, 1, 1)\nLOOP2(VaddF32N, float32_t, float32_t, vadd_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(VaddF64N, float64_t, float64_t, vadd_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP2(VadddS64N, int64_t, int64_t, vaddd_s64, save, load, 1, 1)\nLOOP2(VadddU64N, uint64_t, uint64_t, vaddd_u64, save, load, 1, 1)\nLOOP2(VaddqS8N, int8_t, int8_t, vaddq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VaddqS16N, int16_t, int16_t, vaddq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VaddqS32N, int32_t, int32_t, vaddq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VaddqS64N, int64_t, int64_t, vaddq_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP2(VaddqU8N, uint8_t, uint8_t, vaddq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VaddqU16N, uint16_t, uint16_t, vaddq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VaddqU32N, uint32_t, uint32_t, vaddq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VaddqU64N, uint64_t, uint64_t, vaddq_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(VaddqF32N, float32_t, float32_t, vaddq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(VaddqF64N, float64_t, float64_t, vaddq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(VaesdqU8N, uint8_t, uint8_t, vaesdq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VaeseqU8N, uint8_t, uint8_t, vaeseq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VandS8N, int8_t, int8_t, vand_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VandS16N, int16_t, int16_t, vand_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VandS32N, int32_t, int32_t, vand_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VandS64N, int64_t, int64_t, vand_s64, vst1_s64, vld1_s64, 1, 1)\nLOOP2(VandU8N, uint8_t, uint8_t, vand_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VandU16N, uint16_t, uint16_t, vand_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VandU32N, uint32_t, uint32_t, vand_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VandU64N, uint64_t, uint64_t, vand_u64, vst1_u64, vld1_u64, 1, 
1)\nLOOP2(VandqS8N, int8_t, int8_t, vandq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VandqS16N, int16_t, int16_t, vandq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VandqS32N, int32_t, int32_t, vandq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VandqS64N, int64_t, int64_t, vandq_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP2(VandqU8N, uint8_t, uint8_t, vandq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VandqU16N, uint16_t, uint16_t, vandq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VandqU32N, uint32_t, uint32_t, vandq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VandqU64N, uint64_t, uint64_t, vandq_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(VbicS8N, int8_t, int8_t, vbic_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VbicS16N, int16_t, int16_t, vbic_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VbicS32N, int32_t, int32_t, vbic_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VbicS64N, int64_t, int64_t, vbic_s64, vst1_s64, vld1_s64, 1, 1)\nLOOP2(VbicU8N, uint8_t, uint8_t, vbic_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VbicU16N, uint16_t, uint16_t, vbic_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VbicU32N, uint32_t, uint32_t, vbic_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VbicU64N, uint64_t, uint64_t, vbic_u64, vst1_u64, vld1_u64, 1, 1)\nLOOP2(VbicqS8N, int8_t, int8_t, vbicq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VbicqS16N, int16_t, int16_t, vbicq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VbicqS32N, int32_t, int32_t, vbicq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VbicqS64N, int64_t, int64_t, vbicq_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP2(VbicqU8N, uint8_t, uint8_t, vbicq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VbicqU16N, uint16_t, uint16_t, vbicq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VbicqU32N, uint32_t, uint32_t, vbicq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VbicqU64N, uint64_t, uint64_t, vbicq_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(VcaddRot270F32N, float32_t, float32_t, vcadd_rot270_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(VcaddRot90F32N, float32_t, float32_t, vcadd_rot90_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(VcaddqRot270F32N, float32_t, float32_t, 
vcaddq_rot270_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(VcaddqRot270F64N, float64_t, float64_t, vcaddq_rot270_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(VcaddqRot90F32N, float32_t, float32_t, vcaddq_rot90_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(VcaddqRot90F64N, float64_t, float64_t, vcaddq_rot90_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(VcageF32N, uint32_t, float32_t, vcage_f32, vst1_u32, vld1_f32, 2, 2)\nLOOP2(VcageF64N, uint64_t, float64_t, vcage_f64, vst1_u64, vld1_f64, 1, 1)\nLOOP2(VcagedF64N, uint64_t, float64_t, vcaged_f64, save, load, 1, 1)\nLOOP2(VcageqF32N, uint32_t, float32_t, vcageq_f32, vst1q_u32, vld1q_f32, 4, 4)\nLOOP2(VcageqF64N, uint64_t, float64_t, vcageq_f64, vst1q_u64, vld1q_f64, 2, 2)\nLOOP2(VcagesF32N, uint32_t, float32_t, vcages_f32, save, load, 1, 1)\nLOOP2(VcagtF32N, uint32_t, float32_t, vcagt_f32, vst1_u32, vld1_f32, 2, 2)\nLOOP2(VcagtF64N, uint64_t, float64_t, vcagt_f64, vst1_u64, vld1_f64, 1, 1)\nLOOP2(VcagtdF64N, uint64_t, float64_t, vcagtd_f64, save, load, 1, 1)\nLOOP2(VcagtqF32N, uint32_t, float32_t, vcagtq_f32, vst1q_u32, vld1q_f32, 4, 4)\nLOOP2(VcagtqF64N, uint64_t, float64_t, vcagtq_f64, vst1q_u64, vld1q_f64, 2, 2)\nLOOP2(VcagtsF32N, uint32_t, float32_t, vcagts_f32, save, load, 1, 1)\nLOOP2(VcaleF32N, uint32_t, float32_t, vcale_f32, vst1_u32, vld1_f32, 2, 2)\nLOOP2(VcaleF64N, uint64_t, float64_t, vcale_f64, vst1_u64, vld1_f64, 1, 1)\nLOOP2(VcaledF64N, uint64_t, float64_t, vcaled_f64, save, load, 1, 1)\nLOOP2(VcaleqF32N, uint32_t, float32_t, vcaleq_f32, vst1q_u32, vld1q_f32, 4, 4)\nLOOP2(VcaleqF64N, uint64_t, float64_t, vcaleq_f64, vst1q_u64, vld1q_f64, 2, 2)\nLOOP2(VcalesF32N, uint32_t, float32_t, vcales_f32, save, load, 1, 1)\nLOOP2(VcaltF32N, uint32_t, float32_t, vcalt_f32, vst1_u32, vld1_f32, 2, 2)\nLOOP2(VcaltF64N, uint64_t, float64_t, vcalt_f64, vst1_u64, vld1_f64, 1, 1)\nLOOP2(VcaltdF64N, uint64_t, float64_t, vcaltd_f64, save, load, 1, 1)\nLOOP2(VcaltqF32N, uint32_t, float32_t, vcaltq_f32, vst1q_u32, vld1q_f32, 4, 
4)\nLOOP2(VcaltqF64N, uint64_t, float64_t, vcaltq_f64, vst1q_u64, vld1q_f64, 2, 2)\nLOOP2(VcaltsF32N, uint32_t, float32_t, vcalts_f32, save, load, 1, 1)\nLOOP2(VceqS8N, uint8_t, int8_t, vceq_s8, vst1_u8, vld1_s8, 8, 8)\nLOOP2(VceqS16N, uint16_t, int16_t, vceq_s16, vst1_u16, vld1_s16, 4, 4)\nLOOP2(VceqS32N, uint32_t, int32_t, vceq_s32, vst1_u32, vld1_s32, 2, 2)\nLOOP2(VceqS64N, uint64_t, int64_t, vceq_s64, vst1_u64, vld1_s64, 1, 1)\nLOOP2(VceqU8N, uint8_t, uint8_t, vceq_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VceqU16N, uint16_t, uint16_t, vceq_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VceqU32N, uint32_t, uint32_t, vceq_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VceqU64N, uint64_t, uint64_t, vceq_u64, vst1_u64, vld1_u64, 1, 1)\nLOOP2(VceqF32N, uint32_t, float32_t, vceq_f32, vst1_u32, vld1_f32, 2, 2)\nLOOP2(VceqF64N, uint64_t, float64_t, vceq_f64, vst1_u64, vld1_f64, 1, 1)\nLOOP2(VceqdS64N, uint64_t, int64_t, vceqd_s64, save, load, 1, 1)\nLOOP2(VceqdU64N, uint64_t, uint64_t, vceqd_u64, save, load, 1, 1)\nLOOP2(VceqdF64N, uint64_t, float64_t, vceqd_f64, save, load, 1, 1)\nLOOP2(VceqqS8N, uint8_t, int8_t, vceqq_s8, vst1q_u8, vld1q_s8, 16, 16)\nLOOP2(VceqqS16N, uint16_t, int16_t, vceqq_s16, vst1q_u16, vld1q_s16, 8, 8)\nLOOP2(VceqqS32N, uint32_t, int32_t, vceqq_s32, vst1q_u32, vld1q_s32, 4, 4)\nLOOP2(VceqqS64N, uint64_t, int64_t, vceqq_s64, vst1q_u64, vld1q_s64, 2, 2)\nLOOP2(VceqqU8N, uint8_t, uint8_t, vceqq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VceqqU16N, uint16_t, uint16_t, vceqq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VceqqU32N, uint32_t, uint32_t, vceqq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VceqqU64N, uint64_t, uint64_t, vceqq_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(VceqqF32N, uint32_t, float32_t, vceqq_f32, vst1q_u32, vld1q_f32, 4, 4)\nLOOP2(VceqqF64N, uint64_t, float64_t, vceqq_f64, vst1q_u64, vld1q_f64, 2, 2)\nLOOP2(VceqsF32N, uint32_t, float32_t, vceqs_f32, save, load, 1, 1)\nLOOP2(VcgeS8N, uint8_t, int8_t, vcge_s8, vst1_u8, vld1_s8, 8, 8)\nLOOP2(VcgeS16N, uint16_t, int16_t, 
vcge_s16, vst1_u16, vld1_s16, 4, 4)\nLOOP2(VcgeS32N, uint32_t, int32_t, vcge_s32, vst1_u32, vld1_s32, 2, 2)\nLOOP2(VcgeS64N, uint64_t, int64_t, vcge_s64, vst1_u64, vld1_s64, 1, 1)\nLOOP2(VcgeU8N, uint8_t, uint8_t, vcge_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VcgeU16N, uint16_t, uint16_t, vcge_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VcgeU32N, uint32_t, uint32_t, vcge_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VcgeU64N, uint64_t, uint64_t, vcge_u64, vst1_u64, vld1_u64, 1, 1)\nLOOP2(VcgeF32N, uint32_t, float32_t, vcge_f32, vst1_u32, vld1_f32, 2, 2)\nLOOP2(VcgeF64N, uint64_t, float64_t, vcge_f64, vst1_u64, vld1_f64, 1, 1)\nLOOP2(VcgedS64N, uint64_t, int64_t, vcged_s64, save, load, 1, 1)\nLOOP2(VcgedU64N, uint64_t, uint64_t, vcged_u64, save, load, 1, 1)\nLOOP2(VcgedF64N, uint64_t, float64_t, vcged_f64, save, load, 1, 1)\nLOOP2(VcgeqS8N, uint8_t, int8_t, vcgeq_s8, vst1q_u8, vld1q_s8, 16, 16)\nLOOP2(VcgeqS16N, uint16_t, int16_t, vcgeq_s16, vst1q_u16, vld1q_s16, 8, 8)\nLOOP2(VcgeqS32N, uint32_t, int32_t, vcgeq_s32, vst1q_u32, vld1q_s32, 4, 4)\nLOOP2(VcgeqS64N, uint64_t, int64_t, vcgeq_s64, vst1q_u64, vld1q_s64, 2, 2)\nLOOP2(VcgeqU8N, uint8_t, uint8_t, vcgeq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VcgeqU16N, uint16_t, uint16_t, vcgeq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VcgeqU32N, uint32_t, uint32_t, vcgeq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VcgeqU64N, uint64_t, uint64_t, vcgeq_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(VcgeqF32N, uint32_t, float32_t, vcgeq_f32, vst1q_u32, vld1q_f32, 4, 4)\nLOOP2(VcgeqF64N, uint64_t, float64_t, vcgeq_f64, vst1q_u64, vld1q_f64, 2, 2)\nLOOP2(VcgesF32N, uint32_t, float32_t, vcges_f32, save, load, 1, 1)\nLOOP2(VcgtS8N, uint8_t, int8_t, vcgt_s8, vst1_u8, vld1_s8, 8, 8)\nLOOP2(VcgtS16N, uint16_t, int16_t, vcgt_s16, vst1_u16, vld1_s16, 4, 4)\nLOOP2(VcgtS32N, uint32_t, int32_t, vcgt_s32, vst1_u32, vld1_s32, 2, 2)\nLOOP2(VcgtS64N, uint64_t, int64_t, vcgt_s64, vst1_u64, vld1_s64, 1, 1)\nLOOP2(VcgtU8N, uint8_t, uint8_t, vcgt_u8, vst1_u8, vld1_u8, 8, 
8)\nLOOP2(VcgtU16N, uint16_t, uint16_t, vcgt_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VcgtU32N, uint32_t, uint32_t, vcgt_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VcgtU64N, uint64_t, uint64_t, vcgt_u64, vst1_u64, vld1_u64, 1, 1)\nLOOP2(VcgtF32N, uint32_t, float32_t, vcgt_f32, vst1_u32, vld1_f32, 2, 2)\nLOOP2(VcgtF64N, uint64_t, float64_t, vcgt_f64, vst1_u64, vld1_f64, 1, 1)\nLOOP2(VcgtdS64N, uint64_t, int64_t, vcgtd_s64, save, load, 1, 1)\nLOOP2(VcgtdU64N, uint64_t, uint64_t, vcgtd_u64, save, load, 1, 1)\nLOOP2(VcgtdF64N, uint64_t, float64_t, vcgtd_f64, save, load, 1, 1)\nLOOP2(VcgtqS8N, uint8_t, int8_t, vcgtq_s8, vst1q_u8, vld1q_s8, 16, 16)\nLOOP2(VcgtqS16N, uint16_t, int16_t, vcgtq_s16, vst1q_u16, vld1q_s16, 8, 8)\nLOOP2(VcgtqS32N, uint32_t, int32_t, vcgtq_s32, vst1q_u32, vld1q_s32, 4, 4)\nLOOP2(VcgtqS64N, uint64_t, int64_t, vcgtq_s64, vst1q_u64, vld1q_s64, 2, 2)\nLOOP2(VcgtqU8N, uint8_t, uint8_t, vcgtq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VcgtqU16N, uint16_t, uint16_t, vcgtq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VcgtqU32N, uint32_t, uint32_t, vcgtq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VcgtqU64N, uint64_t, uint64_t, vcgtq_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(VcgtqF32N, uint32_t, float32_t, vcgtq_f32, vst1q_u32, vld1q_f32, 4, 4)\nLOOP2(VcgtqF64N, uint64_t, float64_t, vcgtq_f64, vst1q_u64, vld1q_f64, 2, 2)\nLOOP2(VcgtsF32N, uint32_t, float32_t, vcgts_f32, save, load, 1, 1)\nLOOP2(VcleS8N, uint8_t, int8_t, vcle_s8, vst1_u8, vld1_s8, 8, 8)\nLOOP2(VcleS16N, uint16_t, int16_t, vcle_s16, vst1_u16, vld1_s16, 4, 4)\nLOOP2(VcleS32N, uint32_t, int32_t, vcle_s32, vst1_u32, vld1_s32, 2, 2)\nLOOP2(VcleS64N, uint64_t, int64_t, vcle_s64, vst1_u64, vld1_s64, 1, 1)\nLOOP2(VcleU8N, uint8_t, uint8_t, vcle_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VcleU16N, uint16_t, uint16_t, vcle_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VcleU32N, uint32_t, uint32_t, vcle_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VcleU64N, uint64_t, uint64_t, vcle_u64, vst1_u64, vld1_u64, 1, 1)\nLOOP2(VcleF32N, uint32_t, 
float32_t, vcle_f32, vst1_u32, vld1_f32, 2, 2)\nLOOP2(VcleF64N, uint64_t, float64_t, vcle_f64, vst1_u64, vld1_f64, 1, 1)\nLOOP2(VcledS64N, uint64_t, int64_t, vcled_s64, save, load, 1, 1)\nLOOP2(VcledU64N, uint64_t, uint64_t, vcled_u64, save, load, 1, 1)\nLOOP2(VcledF64N, uint64_t, float64_t, vcled_f64, save, load, 1, 1)\nLOOP2(VcleqS8N, uint8_t, int8_t, vcleq_s8, vst1q_u8, vld1q_s8, 16, 16)\nLOOP2(VcleqS16N, uint16_t, int16_t, vcleq_s16, vst1q_u16, vld1q_s16, 8, 8)\nLOOP2(VcleqS32N, uint32_t, int32_t, vcleq_s32, vst1q_u32, vld1q_s32, 4, 4)\nLOOP2(VcleqS64N, uint64_t, int64_t, vcleq_s64, vst1q_u64, vld1q_s64, 2, 2)\nLOOP2(VcleqU8N, uint8_t, uint8_t, vcleq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VcleqU16N, uint16_t, uint16_t, vcleq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VcleqU32N, uint32_t, uint32_t, vcleq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VcleqU64N, uint64_t, uint64_t, vcleq_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(VcleqF32N, uint32_t, float32_t, vcleq_f32, vst1q_u32, vld1q_f32, 4, 4)\nLOOP2(VcleqF64N, uint64_t, float64_t, vcleq_f64, vst1q_u64, vld1q_f64, 2, 2)\nLOOP2(VclesF32N, uint32_t, float32_t, vcles_f32, save, load, 1, 1)\nLOOP2(VcltS8N, uint8_t, int8_t, vclt_s8, vst1_u8, vld1_s8, 8, 8)\nLOOP2(VcltS16N, uint16_t, int16_t, vclt_s16, vst1_u16, vld1_s16, 4, 4)\nLOOP2(VcltS32N, uint32_t, int32_t, vclt_s32, vst1_u32, vld1_s32, 2, 2)\nLOOP2(VcltS64N, uint64_t, int64_t, vclt_s64, vst1_u64, vld1_s64, 1, 1)\nLOOP2(VcltU8N, uint8_t, uint8_t, vclt_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VcltU16N, uint16_t, uint16_t, vclt_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VcltU32N, uint32_t, uint32_t, vclt_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VcltU64N, uint64_t, uint64_t, vclt_u64, vst1_u64, vld1_u64, 1, 1)\nLOOP2(VcltF32N, uint32_t, float32_t, vclt_f32, vst1_u32, vld1_f32, 2, 2)\nLOOP2(VcltF64N, uint64_t, float64_t, vclt_f64, vst1_u64, vld1_f64, 1, 1)\nLOOP2(VcltdS64N, uint64_t, int64_t, vcltd_s64, save, load, 1, 1)\nLOOP2(VcltdU64N, uint64_t, uint64_t, vcltd_u64, save, load, 1, 
1)\nLOOP2(VcltdF64N, uint64_t, float64_t, vcltd_f64, save, load, 1, 1)\nLOOP2(VcltqS8N, uint8_t, int8_t, vcltq_s8, vst1q_u8, vld1q_s8, 16, 16)\nLOOP2(VcltqS16N, uint16_t, int16_t, vcltq_s16, vst1q_u16, vld1q_s16, 8, 8)\nLOOP2(VcltqS32N, uint32_t, int32_t, vcltq_s32, vst1q_u32, vld1q_s32, 4, 4)\nLOOP2(VcltqS64N, uint64_t, int64_t, vcltq_s64, vst1q_u64, vld1q_s64, 2, 2)\nLOOP2(VcltqU8N, uint8_t, uint8_t, vcltq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VcltqU16N, uint16_t, uint16_t, vcltq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VcltqU32N, uint32_t, uint32_t, vcltq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VcltqU64N, uint64_t, uint64_t, vcltq_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(VcltqF32N, uint32_t, float32_t, vcltq_f32, vst1q_u32, vld1q_f32, 4, 4)\nLOOP2(VcltqF64N, uint64_t, float64_t, vcltq_f64, vst1q_u64, vld1q_f64, 2, 2)\nLOOP2(VcltsF32N, uint32_t, float32_t, vclts_f32, save, load, 1, 1)\nLOOP2(VcombineS8N, int8_t, int8_t, vcombine_s8, vst1q_s8, vld1_s8, 16, 8)\nLOOP2(VcombineS16N, int16_t, int16_t, vcombine_s16, vst1q_s16, vld1_s16, 8, 4)\nLOOP2(VcombineS32N, int32_t, int32_t, vcombine_s32, vst1q_s32, vld1_s32, 4, 2)\nLOOP2(VcombineS64N, int64_t, int64_t, vcombine_s64, vst1q_s64, vld1_s64, 2, 1)\nLOOP2(VcombineU8N, uint8_t, uint8_t, vcombine_u8, vst1q_u8, vld1_u8, 16, 8)\nLOOP2(VcombineU16N, uint16_t, uint16_t, vcombine_u16, vst1q_u16, vld1_u16, 8, 4)\nLOOP2(VcombineU32N, uint32_t, uint32_t, vcombine_u32, vst1q_u32, vld1_u32, 4, 2)\nLOOP2(VcombineU64N, uint64_t, uint64_t, vcombine_u64, vst1q_u64, vld1_u64, 2, 1)\nLOOP2(VcombineF32N, float32_t, float32_t, vcombine_f32, vst1q_f32, vld1_f32, 4, 2)\nLOOP2(VcombineF64N, float64_t, float64_t, vcombine_f64, vst1q_f64, vld1_f64, 2, 1)\nLOOP2(VdivF32N, float32_t, float32_t, vdiv_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(VdivF64N, float64_t, float64_t, vdiv_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP2(VdivqF32N, float32_t, float32_t, vdivq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(VdivqF64N, float64_t, float64_t, vdivq_f64, vst1q_f64, 
vld1q_f64, 2, 2)\nLOOP2(VeorS8N, int8_t, int8_t, veor_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VeorS16N, int16_t, int16_t, veor_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VeorS32N, int32_t, int32_t, veor_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VeorS64N, int64_t, int64_t, veor_s64, vst1_s64, vld1_s64, 1, 1)\nLOOP2(VeorU8N, uint8_t, uint8_t, veor_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VeorU16N, uint16_t, uint16_t, veor_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VeorU32N, uint32_t, uint32_t, veor_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VeorU64N, uint64_t, uint64_t, veor_u64, vst1_u64, vld1_u64, 1, 1)\nLOOP2(VeorqS8N, int8_t, int8_t, veorq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VeorqS16N, int16_t, int16_t, veorq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VeorqS32N, int32_t, int32_t, veorq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VeorqS64N, int64_t, int64_t, veorq_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP2(VeorqU8N, uint8_t, uint8_t, veorq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VeorqU16N, uint16_t, uint16_t, veorq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VeorqU32N, uint32_t, uint32_t, veorq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VeorqU64N, uint64_t, uint64_t, veorq_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(VhaddS8N, int8_t, int8_t, vhadd_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VhaddS16N, int16_t, int16_t, vhadd_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VhaddS32N, int32_t, int32_t, vhadd_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VhaddU8N, uint8_t, uint8_t, vhadd_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VhaddU16N, uint16_t, uint16_t, vhadd_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VhaddU32N, uint32_t, uint32_t, vhadd_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VhaddqS8N, int8_t, int8_t, vhaddq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VhaddqS16N, int16_t, int16_t, vhaddq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VhaddqS32N, int32_t, int32_t, vhaddq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VhaddqU8N, uint8_t, uint8_t, vhaddq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VhaddqU16N, uint16_t, uint16_t, vhaddq_u16, vst1q_u16, vld1q_u16, 8, 
8)\nLOOP2(VhaddqU32N, uint32_t, uint32_t, vhaddq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VhsubS8N, int8_t, int8_t, vhsub_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VhsubS16N, int16_t, int16_t, vhsub_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VhsubS32N, int32_t, int32_t, vhsub_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VhsubU8N, uint8_t, uint8_t, vhsub_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VhsubU16N, uint16_t, uint16_t, vhsub_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VhsubU32N, uint32_t, uint32_t, vhsub_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VhsubqS8N, int8_t, int8_t, vhsubq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VhsubqS16N, int16_t, int16_t, vhsubq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VhsubqS32N, int32_t, int32_t, vhsubq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VhsubqU8N, uint8_t, uint8_t, vhsubq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VhsubqU16N, uint16_t, uint16_t, vhsubq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VhsubqU32N, uint32_t, uint32_t, vhsubq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VmaxS8N, int8_t, int8_t, vmax_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VmaxS16N, int16_t, int16_t, vmax_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VmaxS32N, int32_t, int32_t, vmax_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VmaxU8N, uint8_t, uint8_t, vmax_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VmaxU16N, uint16_t, uint16_t, vmax_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VmaxU32N, uint32_t, uint32_t, vmax_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VmaxF32N, float32_t, float32_t, vmax_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(VmaxF64N, float64_t, float64_t, vmax_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP2(VmaxnmF32N, float32_t, float32_t, vmaxnm_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(VmaxnmF64N, float64_t, float64_t, vmaxnm_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP2(VmaxnmqF32N, float32_t, float32_t, vmaxnmq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(VmaxnmqF64N, float64_t, float64_t, vmaxnmq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(VmaxqS8N, int8_t, int8_t, vmaxq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VmaxqS16N, int16_t, int16_t, vmaxq_s16, vst1q_s16, 
vld1q_s16, 8, 8)\nLOOP2(VmaxqS32N, int32_t, int32_t, vmaxq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VmaxqU8N, uint8_t, uint8_t, vmaxq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VmaxqU16N, uint16_t, uint16_t, vmaxq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VmaxqU32N, uint32_t, uint32_t, vmaxq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VmaxqF32N, float32_t, float32_t, vmaxq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(VmaxqF64N, float64_t, float64_t, vmaxq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(VminS8N, int8_t, int8_t, vmin_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VminS16N, int16_t, int16_t, vmin_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VminS32N, int32_t, int32_t, vmin_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VminU8N, uint8_t, uint8_t, vmin_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VminU16N, uint16_t, uint16_t, vmin_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VminU32N, uint32_t, uint32_t, vmin_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VminF32N, float32_t, float32_t, vmin_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(VminF64N, float64_t, float64_t, vmin_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP2(VminnmF32N, float32_t, float32_t, vminnm_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(VminnmF64N, float64_t, float64_t, vminnm_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP2(VminnmqF32N, float32_t, float32_t, vminnmq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(VminnmqF64N, float64_t, float64_t, vminnmq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(VminqS8N, int8_t, int8_t, vminq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VminqS16N, int16_t, int16_t, vminq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VminqS32N, int32_t, int32_t, vminq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VminqU8N, uint8_t, uint8_t, vminq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VminqU16N, uint16_t, uint16_t, vminq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VminqU32N, uint32_t, uint32_t, vminq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VminqF32N, float32_t, float32_t, vminq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(VminqF64N, float64_t, float64_t, vminq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(VmulS8N, 
int8_t, int8_t, vmul_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VmulS16N, int16_t, int16_t, vmul_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VmulS32N, int32_t, int32_t, vmul_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VmulU8N, uint8_t, uint8_t, vmul_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VmulU16N, uint16_t, uint16_t, vmul_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VmulU32N, uint32_t, uint32_t, vmul_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VmulF32N, float32_t, float32_t, vmul_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(VmulF64N, float64_t, float64_t, vmul_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP2(VmulqS8N, int8_t, int8_t, vmulq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VmulqS16N, int16_t, int16_t, vmulq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VmulqS32N, int32_t, int32_t, vmulq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VmulqU8N, uint8_t, uint8_t, vmulq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VmulqU16N, uint16_t, uint16_t, vmulq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VmulqU32N, uint32_t, uint32_t, vmulq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VmulqF32N, float32_t, float32_t, vmulq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(VmulqF64N, float64_t, float64_t, vmulq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(VmulxF32N, float32_t, float32_t, vmulx_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(VmulxF64N, float64_t, float64_t, vmulx_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP2(VmulxdF64N, float64_t, float64_t, vmulxd_f64, save, load, 1, 1)\nLOOP2(VmulxqF32N, float32_t, float32_t, vmulxq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(VmulxqF64N, float64_t, float64_t, vmulxq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(VmulxsF32N, float32_t, float32_t, vmulxs_f32, save, load, 1, 1)\nLOOP2(VornS8N, int8_t, int8_t, vorn_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VornS16N, int16_t, int16_t, vorn_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VornS32N, int32_t, int32_t, vorn_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VornS64N, int64_t, int64_t, vorn_s64, vst1_s64, vld1_s64, 1, 1)\nLOOP2(VornU8N, uint8_t, uint8_t, vorn_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VornU16N, uint16_t, uint16_t, 
vorn_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VornU32N, uint32_t, uint32_t, vorn_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VornU64N, uint64_t, uint64_t, vorn_u64, vst1_u64, vld1_u64, 1, 1)\nLOOP2(VornqS8N, int8_t, int8_t, vornq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VornqS16N, int16_t, int16_t, vornq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VornqS32N, int32_t, int32_t, vornq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VornqS64N, int64_t, int64_t, vornq_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP2(VornqU8N, uint8_t, uint8_t, vornq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VornqU16N, uint16_t, uint16_t, vornq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VornqU32N, uint32_t, uint32_t, vornq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VornqU64N, uint64_t, uint64_t, vornq_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(VorrS8N, int8_t, int8_t, vorr_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VorrS16N, int16_t, int16_t, vorr_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VorrS32N, int32_t, int32_t, vorr_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VorrS64N, int64_t, int64_t, vorr_s64, vst1_s64, vld1_s64, 1, 1)\nLOOP2(VorrU8N, uint8_t, uint8_t, vorr_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VorrU16N, uint16_t, uint16_t, vorr_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VorrU32N, uint32_t, uint32_t, vorr_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VorrU64N, uint64_t, uint64_t, vorr_u64, vst1_u64, vld1_u64, 1, 1)\nLOOP2(VorrqS8N, int8_t, int8_t, vorrq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VorrqS16N, int16_t, int16_t, vorrq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VorrqS32N, int32_t, int32_t, vorrq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VorrqS64N, int64_t, int64_t, vorrq_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP2(VorrqU8N, uint8_t, uint8_t, vorrq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VorrqU16N, uint16_t, uint16_t, vorrq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VorrqU32N, uint32_t, uint32_t, vorrq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VorrqU64N, uint64_t, uint64_t, vorrq_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(VpaddS8N, int8_t, int8_t, vpadd_s8, vst1_s8, 
vld1_s8, 8, 8)\nLOOP2(VpaddS16N, int16_t, int16_t, vpadd_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VpaddS32N, int32_t, int32_t, vpadd_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VpaddU8N, uint8_t, uint8_t, vpadd_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VpaddU16N, uint16_t, uint16_t, vpadd_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VpaddU32N, uint32_t, uint32_t, vpadd_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VpaddF32N, float32_t, float32_t, vpadd_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(VpaddqS8N, int8_t, int8_t, vpaddq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VpaddqS16N, int16_t, int16_t, vpaddq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VpaddqS32N, int32_t, int32_t, vpaddq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VpaddqS64N, int64_t, int64_t, vpaddq_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP2(VpaddqU8N, uint8_t, uint8_t, vpaddq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VpaddqU16N, uint16_t, uint16_t, vpaddq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VpaddqU32N, uint32_t, uint32_t, vpaddq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VpaddqU64N, uint64_t, uint64_t, vpaddq_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(VpaddqF32N, float32_t, float32_t, vpaddq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(VpaddqF64N, float64_t, float64_t, vpaddq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(VpmaxS8N, int8_t, int8_t, vpmax_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VpmaxS16N, int16_t, int16_t, vpmax_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VpmaxS32N, int32_t, int32_t, vpmax_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VpmaxU8N, uint8_t, uint8_t, vpmax_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VpmaxU16N, uint16_t, uint16_t, vpmax_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VpmaxU32N, uint32_t, uint32_t, vpmax_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VpmaxF32N, float32_t, float32_t, vpmax_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(VpmaxnmF32N, float32_t, float32_t, vpmaxnm_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(VpmaxnmqF32N, float32_t, float32_t, vpmaxnmq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(VpmaxnmqF64N, float64_t, float64_t, vpmaxnmq_f64, vst1q_f64, vld1q_f64, 2, 
2)\nLOOP2(VpmaxqS8N, int8_t, int8_t, vpmaxq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VpmaxqS16N, int16_t, int16_t, vpmaxq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VpmaxqS32N, int32_t, int32_t, vpmaxq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VpmaxqU8N, uint8_t, uint8_t, vpmaxq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VpmaxqU16N, uint16_t, uint16_t, vpmaxq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VpmaxqU32N, uint32_t, uint32_t, vpmaxq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VpmaxqF32N, float32_t, float32_t, vpmaxq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(VpmaxqF64N, float64_t, float64_t, vpmaxq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(VpminS8N, int8_t, int8_t, vpmin_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VpminS16N, int16_t, int16_t, vpmin_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VpminS32N, int32_t, int32_t, vpmin_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VpminU8N, uint8_t, uint8_t, vpmin_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VpminU16N, uint16_t, uint16_t, vpmin_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VpminU32N, uint32_t, uint32_t, vpmin_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VpminF32N, float32_t, float32_t, vpmin_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(VpminnmF32N, float32_t, float32_t, vpminnm_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(VpminnmqF32N, float32_t, float32_t, vpminnmq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(VpminnmqF64N, float64_t, float64_t, vpminnmq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(VpminqS8N, int8_t, int8_t, vpminq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VpminqS16N, int16_t, int16_t, vpminq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VpminqS32N, int32_t, int32_t, vpminq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VpminqU8N, uint8_t, uint8_t, vpminq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VpminqU16N, uint16_t, uint16_t, vpminq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VpminqU32N, uint32_t, uint32_t, vpminq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VpminqF32N, float32_t, float32_t, vpminq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(VpminqF64N, float64_t, float64_t, vpminq_f64, vst1q_f64, 
vld1q_f64, 2, 2)\nLOOP2(VqaddS8N, int8_t, int8_t, vqadd_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VqaddS16N, int16_t, int16_t, vqadd_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VqaddS32N, int32_t, int32_t, vqadd_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VqaddS64N, int64_t, int64_t, vqadd_s64, vst1_s64, vld1_s64, 1, 1)\nLOOP2(VqaddU8N, uint8_t, uint8_t, vqadd_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VqaddU16N, uint16_t, uint16_t, vqadd_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VqaddU32N, uint32_t, uint32_t, vqadd_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VqaddU64N, uint64_t, uint64_t, vqadd_u64, vst1_u64, vld1_u64, 1, 1)\nLOOP2(VqaddbS8N, int8_t, int8_t, vqaddb_s8, save, load, 1, 1)\nLOOP2(VqaddbU8N, uint8_t, uint8_t, vqaddb_u8, save, load, 1, 1)\nLOOP2(VqadddS64N, int64_t, int64_t, vqaddd_s64, save, load, 1, 1)\nLOOP2(VqadddU64N, uint64_t, uint64_t, vqaddd_u64, save, load, 1, 1)\nLOOP2(VqaddhS16N, int16_t, int16_t, vqaddh_s16, save, load, 1, 1)\nLOOP2(VqaddhU16N, uint16_t, uint16_t, vqaddh_u16, save, load, 1, 1)\nLOOP2(VqaddqS8N, int8_t, int8_t, vqaddq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VqaddqS16N, int16_t, int16_t, vqaddq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VqaddqS32N, int32_t, int32_t, vqaddq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VqaddqS64N, int64_t, int64_t, vqaddq_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP2(VqaddqU8N, uint8_t, uint8_t, vqaddq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VqaddqU16N, uint16_t, uint16_t, vqaddq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VqaddqU32N, uint32_t, uint32_t, vqaddq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VqaddqU64N, uint64_t, uint64_t, vqaddq_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(VqaddsS32N, int32_t, int32_t, vqadds_s32, save, load, 1, 1)\nLOOP2(VqaddsU32N, uint32_t, uint32_t, vqadds_u32, save, load, 1, 1)\nLOOP2(VqdmulhS16N, int16_t, int16_t, vqdmulh_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VqdmulhS32N, int32_t, int32_t, vqdmulh_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VqdmulhhS16N, int16_t, int16_t, vqdmulhh_s16, save, load, 1, 1)\nLOOP2(VqdmulhqS16N, 
int16_t, int16_t, vqdmulhq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VqdmulhqS32N, int32_t, int32_t, vqdmulhq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VqdmulhsS32N, int32_t, int32_t, vqdmulhs_s32, save, load, 1, 1)\nLOOP2(VqrdmulhS16N, int16_t, int16_t, vqrdmulh_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VqrdmulhS32N, int32_t, int32_t, vqrdmulh_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VqrdmulhhS16N, int16_t, int16_t, vqrdmulhh_s16, save, load, 1, 1)\nLOOP2(VqrdmulhqS16N, int16_t, int16_t, vqrdmulhq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VqrdmulhqS32N, int32_t, int32_t, vqrdmulhq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VqrdmulhsS32N, int32_t, int32_t, vqrdmulhs_s32, save, load, 1, 1)\nLOOP2(VqrshlS8N, int8_t, int8_t, vqrshl_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VqrshlS16N, int16_t, int16_t, vqrshl_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VqrshlS32N, int32_t, int32_t, vqrshl_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VqrshlS64N, int64_t, int64_t, vqrshl_s64, vst1_s64, vld1_s64, 1, 1)\nLOOP2(VqrshlbS8N, int8_t, int8_t, vqrshlb_s8, save, load, 1, 1)\nLOOP2(VqrshldS64N, int64_t, int64_t, vqrshld_s64, save, load, 1, 1)\nLOOP2(VqrshlhS16N, int16_t, int16_t, vqrshlh_s16, save, load, 1, 1)\nLOOP2(VqrshlqS8N, int8_t, int8_t, vqrshlq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VqrshlqS16N, int16_t, int16_t, vqrshlq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VqrshlqS32N, int32_t, int32_t, vqrshlq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VqrshlqS64N, int64_t, int64_t, vqrshlq_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP2(VqrshlsS32N, int32_t, int32_t, vqrshls_s32, save, load, 1, 1)\nLOOP2(VqshlS8N, int8_t, int8_t, vqshl_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VqshlS16N, int16_t, int16_t, vqshl_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VqshlS32N, int32_t, int32_t, vqshl_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VqshlS64N, int64_t, int64_t, vqshl_s64, vst1_s64, vld1_s64, 1, 1)\nLOOP2(VqshlbS8N, int8_t, int8_t, vqshlb_s8, save, load, 1, 1)\nLOOP2(VqshldS64N, int64_t, int64_t, vqshld_s64, save, load, 1, 1)\nLOOP2(VqshlhS16N, 
int16_t, int16_t, vqshlh_s16, save, load, 1, 1)\nLOOP2(VqshlqS8N, int8_t, int8_t, vqshlq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VqshlqS16N, int16_t, int16_t, vqshlq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VqshlqS32N, int32_t, int32_t, vqshlq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VqshlqS64N, int64_t, int64_t, vqshlq_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP2(VqshlsS32N, int32_t, int32_t, vqshls_s32, save, load, 1, 1)\nLOOP2(VqsubS8N, int8_t, int8_t, vqsub_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VqsubS16N, int16_t, int16_t, vqsub_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VqsubS32N, int32_t, int32_t, vqsub_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VqsubS64N, int64_t, int64_t, vqsub_s64, vst1_s64, vld1_s64, 1, 1)\nLOOP2(VqsubU8N, uint8_t, uint8_t, vqsub_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VqsubU16N, uint16_t, uint16_t, vqsub_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VqsubU32N, uint32_t, uint32_t, vqsub_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VqsubU64N, uint64_t, uint64_t, vqsub_u64, vst1_u64, vld1_u64, 1, 1)\nLOOP2(VqsubbS8N, int8_t, int8_t, vqsubb_s8, save, load, 1, 1)\nLOOP2(VqsubbU8N, uint8_t, uint8_t, vqsubb_u8, save, load, 1, 1)\nLOOP2(VqsubdS64N, int64_t, int64_t, vqsubd_s64, save, load, 1, 1)\nLOOP2(VqsubdU64N, uint64_t, uint64_t, vqsubd_u64, save, load, 1, 1)\nLOOP2(VqsubhS16N, int16_t, int16_t, vqsubh_s16, save, load, 1, 1)\nLOOP2(VqsubhU16N, uint16_t, uint16_t, vqsubh_u16, save, load, 1, 1)\nLOOP2(VqsubqS8N, int8_t, int8_t, vqsubq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VqsubqS16N, int16_t, int16_t, vqsubq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VqsubqS32N, int32_t, int32_t, vqsubq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VqsubqS64N, int64_t, int64_t, vqsubq_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP2(VqsubqU8N, uint8_t, uint8_t, vqsubq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VqsubqU16N, uint16_t, uint16_t, vqsubq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VqsubqU32N, uint32_t, uint32_t, vqsubq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VqsubqU64N, uint64_t, uint64_t, vqsubq_u64, vst1q_u64, 
vld1q_u64, 2, 2)\nLOOP2(VqsubsS32N, int32_t, int32_t, vqsubs_s32, save, load, 1, 1)\nLOOP2(VqsubsU32N, uint32_t, uint32_t, vqsubs_u32, save, load, 1, 1)\nLOOP2(Vqtbl1QU8N, uint8_t, uint8_t, vqtbl1q_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(Vrax1QU64N, uint64_t, uint64_t, vrax1q_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(VrecpsF32N, float32_t, float32_t, vrecps_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(VrecpsF64N, float64_t, float64_t, vrecps_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP2(VrecpsdF64N, float64_t, float64_t, vrecpsd_f64, save, load, 1, 1)\nLOOP2(VrecpsqF32N, float32_t, float32_t, vrecpsq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(VrecpsqF64N, float64_t, float64_t, vrecpsq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(VrecpssF32N, float32_t, float32_t, vrecpss_f32, save, load, 1, 1)\nLOOP2(VrhaddS8N, int8_t, int8_t, vrhadd_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VrhaddS16N, int16_t, int16_t, vrhadd_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VrhaddS32N, int32_t, int32_t, vrhadd_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VrhaddU8N, uint8_t, uint8_t, vrhadd_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VrhaddU16N, uint16_t, uint16_t, vrhadd_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VrhaddU32N, uint32_t, uint32_t, vrhadd_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VrhaddqS8N, int8_t, int8_t, vrhaddq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VrhaddqS16N, int16_t, int16_t, vrhaddq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VrhaddqS32N, int32_t, int32_t, vrhaddq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VrhaddqU8N, uint8_t, uint8_t, vrhaddq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VrhaddqU16N, uint16_t, uint16_t, vrhaddq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VrhaddqU32N, uint32_t, uint32_t, vrhaddq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VrshlS8N, int8_t, int8_t, vrshl_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VrshlS16N, int16_t, int16_t, vrshl_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VrshlS32N, int32_t, int32_t, vrshl_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VrshlS64N, int64_t, int64_t, vrshl_s64, vst1_s64, vld1_s64, 1, 1)\nLOOP2(VrshldS64N, 
int64_t, int64_t, vrshld_s64, save, load, 1, 1)\nLOOP2(VrshlqS8N, int8_t, int8_t, vrshlq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VrshlqS16N, int16_t, int16_t, vrshlq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VrshlqS32N, int32_t, int32_t, vrshlq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VrshlqS64N, int64_t, int64_t, vrshlq_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP2(VrsqrtsF32N, float32_t, float32_t, vrsqrts_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(VrsqrtsF64N, float64_t, float64_t, vrsqrts_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP2(VrsqrtsdF64N, float64_t, float64_t, vrsqrtsd_f64, save, load, 1, 1)\nLOOP2(VrsqrtsqF32N, float32_t, float32_t, vrsqrtsq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(VrsqrtsqF64N, float64_t, float64_t, vrsqrtsq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(VrsqrtssF32N, float32_t, float32_t, vrsqrtss_f32, save, load, 1, 1)\nLOOP2(Vsha1Su1QU32N, uint32_t, uint32_t, vsha1su1q_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(Vsha256Su0QU32N, uint32_t, uint32_t, vsha256su0q_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(Vsha512Su0QU64N, uint64_t, uint64_t, vsha512su0q_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(VshlS8N, int8_t, int8_t, vshl_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VshlS16N, int16_t, int16_t, vshl_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VshlS32N, int32_t, int32_t, vshl_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VshlS64N, int64_t, int64_t, vshl_s64, vst1_s64, vld1_s64, 1, 1)\nLOOP2(VshldS64N, int64_t, int64_t, vshld_s64, save, load, 1, 1)\nLOOP2(VshlqS8N, int8_t, int8_t, vshlq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VshlqS16N, int16_t, int16_t, vshlq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VshlqS32N, int32_t, int32_t, vshlq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VshlqS64N, int64_t, int64_t, vshlq_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP2(Vsm4EkeyqU32N, uint32_t, uint32_t, vsm4ekeyq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(Vsm4EqU32N, uint32_t, uint32_t, vsm4eq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VsubS8N, int8_t, int8_t, vsub_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(VsubS16N, int16_t, int16_t, 
vsub_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(VsubS32N, int32_t, int32_t, vsub_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(VsubS64N, int64_t, int64_t, vsub_s64, vst1_s64, vld1_s64, 1, 1)\nLOOP2(VsubU8N, uint8_t, uint8_t, vsub_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VsubU16N, uint16_t, uint16_t, vsub_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VsubU32N, uint32_t, uint32_t, vsub_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VsubU64N, uint64_t, uint64_t, vsub_u64, vst1_u64, vld1_u64, 1, 1)\nLOOP2(VsubF32N, float32_t, float32_t, vsub_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(VsubF64N, float64_t, float64_t, vsub_f64, vst1_f64, vld1_f64, 1, 1)\nLOOP2(VsubdS64N, int64_t, int64_t, vsubd_s64, save, load, 1, 1)\nLOOP2(VsubdU64N, uint64_t, uint64_t, vsubd_u64, save, load, 1, 1)\nLOOP2(VsubqS8N, int8_t, int8_t, vsubq_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(VsubqS16N, int16_t, int16_t, vsubq_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(VsubqS32N, int32_t, int32_t, vsubq_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(VsubqS64N, int64_t, int64_t, vsubq_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP2(VsubqU8N, uint8_t, uint8_t, vsubq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VsubqU16N, uint16_t, uint16_t, vsubq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VsubqU32N, uint32_t, uint32_t, vsubq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VsubqU64N, uint64_t, uint64_t, vsubq_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(VsubqF32N, float32_t, float32_t, vsubq_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(VsubqF64N, float64_t, float64_t, vsubq_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(Vtbl1S8N, int8_t, int8_t, vtbl1_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(Vtbl1U8N, uint8_t, uint8_t, vtbl1_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(Vtrn1S8N, int8_t, int8_t, vtrn1_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(Vtrn1S16N, int16_t, int16_t, vtrn1_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(Vtrn1S32N, int32_t, int32_t, vtrn1_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(Vtrn1U8N, uint8_t, uint8_t, vtrn1_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(Vtrn1U16N, uint16_t, uint16_t, vtrn1_u16, vst1_u16, vld1_u16, 4, 
4)\nLOOP2(Vtrn1U32N, uint32_t, uint32_t, vtrn1_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(Vtrn1F32N, float32_t, float32_t, vtrn1_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(Vtrn1QS8N, int8_t, int8_t, vtrn1q_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(Vtrn1QS16N, int16_t, int16_t, vtrn1q_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(Vtrn1QS32N, int32_t, int32_t, vtrn1q_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(Vtrn1QS64N, int64_t, int64_t, vtrn1q_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP2(Vtrn1QU8N, uint8_t, uint8_t, vtrn1q_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(Vtrn1QU16N, uint16_t, uint16_t, vtrn1q_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(Vtrn1QU32N, uint32_t, uint32_t, vtrn1q_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(Vtrn1QU64N, uint64_t, uint64_t, vtrn1q_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(Vtrn1QF32N, float32_t, float32_t, vtrn1q_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(Vtrn1QF64N, float64_t, float64_t, vtrn1q_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(Vtrn2S8N, int8_t, int8_t, vtrn2_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(Vtrn2S16N, int16_t, int16_t, vtrn2_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(Vtrn2S32N, int32_t, int32_t, vtrn2_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(Vtrn2U8N, uint8_t, uint8_t, vtrn2_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(Vtrn2U16N, uint16_t, uint16_t, vtrn2_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(Vtrn2U32N, uint32_t, uint32_t, vtrn2_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(Vtrn2F32N, float32_t, float32_t, vtrn2_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(Vtrn2QS8N, int8_t, int8_t, vtrn2q_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(Vtrn2QS16N, int16_t, int16_t, vtrn2q_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(Vtrn2QS32N, int32_t, int32_t, vtrn2q_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(Vtrn2QS64N, int64_t, int64_t, vtrn2q_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP2(Vtrn2QU8N, uint8_t, uint8_t, vtrn2q_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(Vtrn2QU16N, uint16_t, uint16_t, vtrn2q_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(Vtrn2QU32N, uint32_t, uint32_t, vtrn2q_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(Vtrn2QU64N, 
uint64_t, uint64_t, vtrn2q_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(Vtrn2QF32N, float32_t, float32_t, vtrn2q_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(Vtrn2QF64N, float64_t, float64_t, vtrn2q_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(VtstS8N, uint8_t, int8_t, vtst_s8, vst1_u8, vld1_s8, 8, 8)\nLOOP2(VtstS16N, uint16_t, int16_t, vtst_s16, vst1_u16, vld1_s16, 4, 4)\nLOOP2(VtstS32N, uint32_t, int32_t, vtst_s32, vst1_u32, vld1_s32, 2, 2)\nLOOP2(VtstS64N, uint64_t, int64_t, vtst_s64, vst1_u64, vld1_s64, 1, 1)\nLOOP2(VtstU8N, uint8_t, uint8_t, vtst_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(VtstU16N, uint16_t, uint16_t, vtst_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(VtstU32N, uint32_t, uint32_t, vtst_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(VtstU64N, uint64_t, uint64_t, vtst_u64, vst1_u64, vld1_u64, 1, 1)\nLOOP2(VtstdS64N, uint64_t, int64_t, vtstd_s64, save, load, 1, 1)\nLOOP2(VtstdU64N, uint64_t, uint64_t, vtstd_u64, save, load, 1, 1)\nLOOP2(VtstqS8N, uint8_t, int8_t, vtstq_s8, vst1q_u8, vld1q_s8, 16, 16)\nLOOP2(VtstqS16N, uint16_t, int16_t, vtstq_s16, vst1q_u16, vld1q_s16, 8, 8)\nLOOP2(VtstqS32N, uint32_t, int32_t, vtstq_s32, vst1q_u32, vld1q_s32, 4, 4)\nLOOP2(VtstqS64N, uint64_t, int64_t, vtstq_s64, vst1q_u64, vld1q_s64, 2, 2)\nLOOP2(VtstqU8N, uint8_t, uint8_t, vtstq_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(VtstqU16N, uint16_t, uint16_t, vtstq_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(VtstqU32N, uint32_t, uint32_t, vtstq_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(VtstqU64N, uint64_t, uint64_t, vtstq_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(Vuzp1S8N, int8_t, int8_t, vuzp1_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(Vuzp1S16N, int16_t, int16_t, vuzp1_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(Vuzp1S32N, int32_t, int32_t, vuzp1_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(Vuzp1U8N, uint8_t, uint8_t, vuzp1_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(Vuzp1U16N, uint16_t, uint16_t, vuzp1_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(Vuzp1U32N, uint32_t, uint32_t, vuzp1_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(Vuzp1F32N, float32_t, float32_t, 
vuzp1_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(Vuzp1QS8N, int8_t, int8_t, vuzp1q_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(Vuzp1QS16N, int16_t, int16_t, vuzp1q_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(Vuzp1QS32N, int32_t, int32_t, vuzp1q_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(Vuzp1QS64N, int64_t, int64_t, vuzp1q_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP2(Vuzp1QU8N, uint8_t, uint8_t, vuzp1q_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(Vuzp1QU16N, uint16_t, uint16_t, vuzp1q_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(Vuzp1QU32N, uint32_t, uint32_t, vuzp1q_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(Vuzp1QU64N, uint64_t, uint64_t, vuzp1q_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(Vuzp1QF32N, float32_t, float32_t, vuzp1q_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(Vuzp1QF64N, float64_t, float64_t, vuzp1q_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(Vuzp2S8N, int8_t, int8_t, vuzp2_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(Vuzp2S16N, int16_t, int16_t, vuzp2_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(Vuzp2S32N, int32_t, int32_t, vuzp2_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(Vuzp2U8N, uint8_t, uint8_t, vuzp2_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(Vuzp2U16N, uint16_t, uint16_t, vuzp2_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(Vuzp2U32N, uint32_t, uint32_t, vuzp2_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(Vuzp2F32N, float32_t, float32_t, vuzp2_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(Vuzp2QS8N, int8_t, int8_t, vuzp2q_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(Vuzp2QS16N, int16_t, int16_t, vuzp2q_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(Vuzp2QS32N, int32_t, int32_t, vuzp2q_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(Vuzp2QS64N, int64_t, int64_t, vuzp2q_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP2(Vuzp2QU8N, uint8_t, uint8_t, vuzp2q_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(Vuzp2QU16N, uint16_t, uint16_t, vuzp2q_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(Vuzp2QU32N, uint32_t, uint32_t, vuzp2q_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(Vuzp2QU64N, uint64_t, uint64_t, vuzp2q_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(Vuzp2QF32N, float32_t, float32_t, vuzp2q_f32, 
vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(Vuzp2QF64N, float64_t, float64_t, vuzp2q_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(Vzip1S8N, int8_t, int8_t, vzip1_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(Vzip1S16N, int16_t, int16_t, vzip1_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(Vzip1S32N, int32_t, int32_t, vzip1_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(Vzip1U8N, uint8_t, uint8_t, vzip1_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(Vzip1U16N, uint16_t, uint16_t, vzip1_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(Vzip1U32N, uint32_t, uint32_t, vzip1_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(Vzip1F32N, float32_t, float32_t, vzip1_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(Vzip1QS8N, int8_t, int8_t, vzip1q_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(Vzip1QS16N, int16_t, int16_t, vzip1q_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(Vzip1QS32N, int32_t, int32_t, vzip1q_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(Vzip1QS64N, int64_t, int64_t, vzip1q_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP2(Vzip1QU8N, uint8_t, uint8_t, vzip1q_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(Vzip1QU16N, uint16_t, uint16_t, vzip1q_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(Vzip1QU32N, uint32_t, uint32_t, vzip1q_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(Vzip1QU64N, uint64_t, uint64_t, vzip1q_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(Vzip1QF32N, float32_t, float32_t, vzip1q_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(Vzip1QF64N, float64_t, float64_t, vzip1q_f64, vst1q_f64, vld1q_f64, 2, 2)\nLOOP2(Vzip2S8N, int8_t, int8_t, vzip2_s8, vst1_s8, vld1_s8, 8, 8)\nLOOP2(Vzip2S16N, int16_t, int16_t, vzip2_s16, vst1_s16, vld1_s16, 4, 4)\nLOOP2(Vzip2S32N, int32_t, int32_t, vzip2_s32, vst1_s32, vld1_s32, 2, 2)\nLOOP2(Vzip2U8N, uint8_t, uint8_t, vzip2_u8, vst1_u8, vld1_u8, 8, 8)\nLOOP2(Vzip2U16N, uint16_t, uint16_t, vzip2_u16, vst1_u16, vld1_u16, 4, 4)\nLOOP2(Vzip2U32N, uint32_t, uint32_t, vzip2_u32, vst1_u32, vld1_u32, 2, 2)\nLOOP2(Vzip2F32N, float32_t, float32_t, vzip2_f32, vst1_f32, vld1_f32, 2, 2)\nLOOP2(Vzip2QS8N, int8_t, int8_t, vzip2q_s8, vst1q_s8, vld1q_s8, 16, 16)\nLOOP2(Vzip2QS16N, 
int16_t, int16_t, vzip2q_s16, vst1q_s16, vld1q_s16, 8, 8)\nLOOP2(Vzip2QS32N, int32_t, int32_t, vzip2q_s32, vst1q_s32, vld1q_s32, 4, 4)\nLOOP2(Vzip2QS64N, int64_t, int64_t, vzip2q_s64, vst1q_s64, vld1q_s64, 2, 2)\nLOOP2(Vzip2QU8N, uint8_t, uint8_t, vzip2q_u8, vst1q_u8, vld1q_u8, 16, 16)\nLOOP2(Vzip2QU16N, uint16_t, uint16_t, vzip2q_u16, vst1q_u16, vld1q_u16, 8, 8)\nLOOP2(Vzip2QU32N, uint32_t, uint32_t, vzip2q_u32, vst1q_u32, vld1q_u32, 4, 4)\nLOOP2(Vzip2QU64N, uint64_t, uint64_t, vzip2q_u64, vst1q_u64, vld1q_u64, 2, 2)\nLOOP2(Vzip2QF32N, float32_t, float32_t, vzip2q_f32, vst1q_f32, vld1q_f32, 4, 4)\nLOOP2(Vzip2QF64N, float64_t, float64_t, vzip2q_f64, vst1q_f64, vld1q_f64, 2, 2)\n"
  },
  {
    "path": "arm/neon/loops.go",
    "content": "package neon\n\nimport (\n\t\"github.com/alivanz/go-simd/arm\"\n)\n\n/*\n#include <arm_neon.h>\n*/\nimport \"C\"\n\n// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdS8N VabdS8N\n//go:noescape\nfunc VabdS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdS16N VabdS16N\n//go:noescape\nfunc VabdS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdS32N VabdS32N\n//go:noescape\nfunc VabdS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdU8N VabdU8N\n//go:noescape\nfunc VabdU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unsigned Absolute Difference (vector). 
This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdU16N VabdU16N\n//go:noescape\nfunc VabdU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdU32N VabdU32N\n//go:noescape\nfunc VabdU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdF32N VabdF32N\n//go:noescape\nfunc VabdF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdF64N VabdF64N\n//go:noescape\nfunc VabdF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Absolute Difference (vector). 
This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabddF64N VabddF64N\n//go:noescape\nfunc VabddF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdqS8N VabdqS8N\n//go:noescape\nfunc VabdqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdqS16N VabdqS16N\n//go:noescape\nfunc VabdqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdqS32N VabdqS32N\n//go:noescape\nfunc VabdqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unsigned Absolute Difference (vector). 
This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdqU8N VabdqU8N\n//go:noescape\nfunc VabdqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdqU16N VabdqU16N\n//go:noescape\nfunc VabdqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdqU32N VabdqU32N\n//go:noescape\nfunc VabdqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdqF32N VabdqF32N\n//go:noescape\nfunc VabdqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Absolute Difference (vector). 
This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdqF64N VabdqF64N\n//go:noescape\nfunc VabdqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabdsF32N VabdsF32N\n//go:noescape\nfunc VabdsF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsS8N VabsS8N\n//go:noescape\nfunc VabsS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsS16N VabsS16N\n//go:noescape\nfunc VabsS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsS32N VabsS32N\n//go:noescape\nfunc VabsS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Absolute value (vector). 
This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsS64N VabsS64N\n//go:noescape\nfunc VabsS64N(r *arm.Int64, v0 *arm.Int64, n int32)\n\n// Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsF32N VabsF32N\n//go:noescape\nfunc VabsF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsF64N VabsF64N\n//go:noescape\nfunc VabsF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsdS64N VabsdS64N\n//go:noescape\nfunc VabsdS64N(r *arm.Int64, v0 *arm.Int64, n int32)\n\n// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsqS8N VabsqS8N\n//go:noescape\nfunc VabsqS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Absolute value (vector). 
This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsqS16N VabsqS16N\n//go:noescape\nfunc VabsqS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsqS32N VabsqS32N\n//go:noescape\nfunc VabsqS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsqS64N VabsqS64N\n//go:noescape\nfunc VabsqS64N(r *arm.Int64, v0 *arm.Int64, n int32)\n\n// Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsqF32N VabsqF32N\n//go:noescape\nfunc VabsqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VabsqF64N VabsqF64N\n//go:noescape\nfunc VabsqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddS8N VaddS8N\n//go:noescape\nfunc VaddS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Add (vector). 
This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddS16N VaddS16N\n//go:noescape\nfunc VaddS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddS32N VaddS32N\n//go:noescape\nfunc VaddS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddS64N VaddS64N\n//go:noescape\nfunc VaddS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddU8N VaddU8N\n//go:noescape\nfunc VaddU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddU16N VaddU16N\n//go:noescape\nfunc VaddU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddU32N VaddU32N\n//go:noescape\nfunc VaddU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Add (vector). 
This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddU64N VaddU64N\n//go:noescape\nfunc VaddU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VaddF32N VaddF32N\n//go:noescape\nfunc VaddF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VaddF64N VaddF64N\n//go:noescape\nfunc VaddF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VadddS64N VadddS64N\n//go:noescape\nfunc VadddS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VadddU64N VadddU64N\n//go:noescape\nfunc VadddU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Add (vector). 
This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddqS8N VaddqS8N\n//go:noescape\nfunc VaddqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddqS16N VaddqS16N\n//go:noescape\nfunc VaddqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddqS32N VaddqS32N\n//go:noescape\nfunc VaddqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddqS64N VaddqS64N\n//go:noescape\nfunc VaddqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddqU8N VaddqU8N\n//go:noescape\nfunc VaddqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddqU16N VaddqU16N\n//go:noescape\nfunc VaddqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Add (vector). 
This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddqU32N VaddqU32N\n//go:noescape\nfunc VaddqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VaddqU64N VaddqU64N\n//go:noescape\nfunc VaddqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VaddqF32N VaddqF32N\n//go:noescape\nfunc VaddqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VaddqF64N VaddqF64N\n//go:noescape\nfunc VaddqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.\n//\n//go:linkname VaddvS8N VaddvS8N\n//go:noescape\nfunc VaddvS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Add across Vector. 
This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.\n//\n//go:linkname VaddvS16N VaddvS16N\n//go:noescape\nfunc VaddvS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Add across vector\n//\n//go:linkname VaddvS32N VaddvS32N\n//go:noescape\nfunc VaddvS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.\n//\n//go:linkname VaddvU8N VaddvU8N\n//go:noescape\nfunc VaddvU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.\n//\n//go:linkname VaddvU16N VaddvU16N\n//go:noescape\nfunc VaddvU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Add across vector\n//\n//go:linkname VaddvU32N VaddvU32N\n//go:noescape\nfunc VaddvU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Floating-point add across vector\n//\n//go:linkname VaddvF32N VaddvF32N\n//go:noescape\nfunc VaddvF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.\n//\n//go:linkname VaddvqS8N VaddvqS8N\n//go:noescape\nfunc VaddvqS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.\n//\n//go:linkname VaddvqS16N VaddvqS16N\n//go:noescape\nfunc VaddvqS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Add across Vector. 
This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.\n//\n//go:linkname VaddvqS32N VaddvqS32N\n//go:noescape\nfunc VaddvqS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Add across vector\n//\n//go:linkname VaddvqS64N VaddvqS64N\n//go:noescape\nfunc VaddvqS64N(r *arm.Int64, v0 *arm.Int64, n int32)\n\n// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.\n//\n//go:linkname VaddvqU8N VaddvqU8N\n//go:noescape\nfunc VaddvqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.\n//\n//go:linkname VaddvqU16N VaddvqU16N\n//go:noescape\nfunc VaddvqU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.\n//\n//go:linkname VaddvqU32N VaddvqU32N\n//go:noescape\nfunc VaddvqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Add across vector\n//\n//go:linkname VaddvqU64N VaddvqU64N\n//go:noescape\nfunc VaddvqU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)\n\n// Floating-point add across vector\n//\n//go:linkname VaddvqF32N VaddvqF32N\n//go:noescape\nfunc VaddvqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point add across vector\n//\n//go:linkname VaddvqF64N VaddvqF64N\n//go:noescape\nfunc VaddvqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// AES single round decryption.\n//\n//go:linkname VaesdqU8N VaesdqU8N\n//go:noescape\nfunc VaesdqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// AES single round encryption.\n//\n//go:linkname VaeseqU8N VaeseqU8N\n//go:noescape\nfunc VaeseqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 
*arm.Uint8, n int32)\n\n// AES inverse mix columns.\n//\n//go:linkname VaesimcqU8N VaesimcqU8N\n//go:noescape\nfunc VaesimcqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// AES mix columns.\n//\n//go:linkname VaesmcqU8N VaesmcqU8N\n//go:noescape\nfunc VaesmcqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandS8N VandS8N\n//go:noescape\nfunc VandS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandS16N VandS16N\n//go:noescape\nfunc VandS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandS32N VandS32N\n//go:noescape\nfunc VandS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandS64N VandS64N\n//go:noescape\nfunc VandS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandU8N VandU8N\n//go:noescape\nfunc VandU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Bitwise AND (vector). 
This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandU16N VandU16N\n//go:noescape\nfunc VandU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandU32N VandU32N\n//go:noescape\nfunc VandU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandU64N VandU64N\n//go:noescape\nfunc VandU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandqS8N VandqS8N\n//go:noescape\nfunc VandqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandqS16N VandqS16N\n//go:noescape\nfunc VandqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandqS32N VandqS32N\n//go:noescape\nfunc VandqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Bitwise AND (vector). 
This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandqS64N VandqS64N\n//go:noescape\nfunc VandqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandqU8N VandqU8N\n//go:noescape\nfunc VandqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandqU16N VandqU16N\n//go:noescape\nfunc VandqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandqU32N VandqU32N\n//go:noescape\nfunc VandqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VandqU64N VandqU64N\n//go:noescape\nfunc VandqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicS8N VbicS8N\n//go:noescape\nfunc VbicS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Bitwise bit Clear (vector, register). 
This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicS16N VbicS16N\n//go:noescape\nfunc VbicS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicS32N VbicS32N\n//go:noescape\nfunc VbicS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicS64N VbicS64N\n//go:noescape\nfunc VbicS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicU8N VbicU8N\n//go:noescape\nfunc VbicU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicU16N VbicU16N\n//go:noescape\nfunc VbicU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Bitwise bit Clear (vector, register). 
This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicU32N VbicU32N\n//go:noescape\nfunc VbicU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicU64N VbicU64N\n//go:noescape\nfunc VbicU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicqS8N VbicqS8N\n//go:noescape\nfunc VbicqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicqS16N VbicqS16N\n//go:noescape\nfunc VbicqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicqS32N VbicqS32N\n//go:noescape\nfunc VbicqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Bitwise bit Clear (vector, register). 
This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicqS64N VbicqS64N\n//go:noescape\nfunc VbicqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicqU8N VbicqU8N\n//go:noescape\nfunc VbicqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicqU16N VbicqU16N\n//go:noescape\nfunc VbicqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicqU32N VbicqU32N\n//go:noescape\nfunc VbicqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Bitwise bit Clear (vector, register). 
This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VbicqU64N VbicqU64N\n//go:noescape\nfunc VbicqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point Complex Add.\n//\n//go:linkname VcaddRot270F32N VcaddRot270F32N\n//go:noescape\nfunc VcaddRot270F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Complex Add.\n//\n//go:linkname VcaddRot90F32N VcaddRot90F32N\n//go:noescape\nfunc VcaddRot90F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Complex Add.\n//\n//go:linkname VcaddqRot270F32N VcaddqRot270F32N\n//go:noescape\nfunc VcaddqRot270F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Complex Add.\n//\n//go:linkname VcaddqRot270F64N VcaddqRot270F64N\n//go:noescape\nfunc VcaddqRot270F64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Complex Add.\n//\n//go:linkname VcaddqRot90F32N VcaddqRot90F32N\n//go:noescape\nfunc VcaddqRot90F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Complex Add.\n//\n//go:linkname VcaddqRot90F64N VcaddqRot90F64N\n//go:noescape\nfunc VcaddqRot90F64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Absolute Compare Greater than or Equal (vector). 
This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcageF32N VcageF32N\n//go:noescape\nfunc VcageF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcageF64N VcageF64N\n//go:noescape\nfunc VcageF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Absolute Compare Greater than or Equal (vector). 
This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcagedF64N VcagedF64N\n//go:noescape\nfunc VcagedF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcageqF32N VcageqF32N\n//go:noescape\nfunc VcageqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Absolute Compare Greater than or Equal (vector). 
This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcageqF64N VcageqF64N\n//go:noescape\nfunc VcageqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcagesF32N VcagesF32N\n//go:noescape\nfunc VcagesF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Absolute Compare Greater than (vector). 
This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcagtF32N VcagtF32N\n//go:noescape\nfunc VcagtF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcagtF64N VcagtF64N\n//go:noescape\nfunc VcagtF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcagtdF64N VcagtdF64N\n//go:noescape\nfunc VcagtdF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Absolute Compare Greater than (vector). 
This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcagtqF32N VcagtqF32N\n//go:noescape\nfunc VcagtqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcagtqF64N VcagtqF64N\n//go:noescape\nfunc VcagtqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Absolute Compare Greater than (vector). 
This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcagtsF32N VcagtsF32N\n//go:noescape\nfunc VcagtsF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point absolute compare less than or equal\n//\n//go:linkname VcaleF32N VcaleF32N\n//go:noescape\nfunc VcaleF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point absolute compare less than or equal\n//\n//go:linkname VcaleF64N VcaleF64N\n//go:noescape\nfunc VcaleF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point absolute compare less than or equal\n//\n//go:linkname VcaledF64N VcaledF64N\n//go:noescape\nfunc VcaledF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point absolute compare less than or equal\n//\n//go:linkname VcaleqF32N VcaleqF32N\n//go:noescape\nfunc VcaleqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point absolute compare less than or equal\n//\n//go:linkname VcaleqF64N VcaleqF64N\n//go:noescape\nfunc VcaleqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point absolute compare less than or equal\n//\n//go:linkname VcalesF32N VcalesF32N\n//go:noescape\nfunc VcalesF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point absolute compare less than\n//\n//go:linkname VcaltF32N VcaltF32N\n//go:noescape\nfunc VcaltF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point absolute compare less than\n//\n//go:linkname VcaltF64N VcaltF64N\n//go:noescape\nfunc 
VcaltF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point absolute compare less than\n//\n//go:linkname VcaltdF64N VcaltdF64N\n//go:noescape\nfunc VcaltdF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point absolute compare less than\n//\n//go:linkname VcaltqF32N VcaltqF32N\n//go:noescape\nfunc VcaltqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point absolute compare less than\n//\n//go:linkname VcaltqF64N VcaltqF64N\n//go:noescape\nfunc VcaltqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point absolute compare less than\n//\n//go:linkname VcaltsF32N VcaltsF32N\n//go:noescape\nfunc VcaltsF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqS8N VceqS8N\n//go:noescape\nfunc VceqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqS16N VceqS16N\n//go:noescape\nfunc VceqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Compare bitwise Equal (vector). 
This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqS32N VceqS32N\n//go:noescape\nfunc VceqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqS64N VceqS64N\n//go:noescape\nfunc VceqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqU8N VceqU8N\n//go:noescape\nfunc VceqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Compare bitwise Equal (vector). 
This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqU16N VceqU16N\n//go:noescape\nfunc VceqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqU32N VceqU32N\n//go:noescape\nfunc VceqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqU64N VceqU64N\n//go:noescape\nfunc VceqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point Compare Equal (vector). 
This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqF32N VceqF32N\n//go:noescape\nfunc VceqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqF64N VceqF64N\n//go:noescape\nfunc VceqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqdS64N VceqdS64N\n//go:noescape\nfunc VceqdS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Compare bitwise Equal (vector). 
This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqdU64N VceqdU64N\n//go:noescape\nfunc VceqdU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqdF64N VceqdF64N\n//go:noescape\nfunc VceqdF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqS8N VceqqS8N\n//go:noescape\nfunc VceqqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Compare bitwise Equal (vector). 
This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqS16N VceqqS16N\n//go:noescape\nfunc VceqqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqS32N VceqqS32N\n//go:noescape\nfunc VceqqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqS64N VceqqS64N\n//go:noescape\nfunc VceqqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Compare bitwise Equal (vector). 
This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqU8N VceqqU8N\n//go:noescape\nfunc VceqqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqU16N VceqqU16N\n//go:noescape\nfunc VceqqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqU32N VceqqU32N\n//go:noescape\nfunc VceqqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Compare bitwise Equal (vector). 
This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqU64N VceqqU64N\n//go:noescape\nfunc VceqqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqF32N VceqqF32N\n//go:noescape\nfunc VceqqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqqF64N VceqqF64N\n//go:noescape\nfunc VceqqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Compare Equal (vector). 
This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqsF32N VceqsF32N\n//go:noescape\nfunc VceqsF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzS8N VceqzS8N\n//go:noescape\nfunc VceqzS8N(r *arm.Uint8, v0 *arm.Int8, n int32)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzS16N VceqzS16N\n//go:noescape\nfunc VceqzS16N(r *arm.Uint16, v0 *arm.Int16, n int32)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzS32N VceqzS32N\n//go:noescape\nfunc VceqzS32N(r *arm.Uint32, v0 *arm.Int32, n int32)\n\n// Compare bitwise Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzS64N VceqzS64N\n//go:noescape\nfunc VceqzS64N(r *arm.Uint64, v0 *arm.Int64, n int32)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzU8N VceqzU8N\n//go:noescape\nfunc VceqzU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzU16N VceqzU16N\n//go:noescape\nfunc VceqzU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzU32N VceqzU32N\n//go:noescape\nfunc VceqzU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Compare bitwise Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzU64N VceqzU64N\n//go:noescape\nfunc VceqzU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)\n\n// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzF32N VceqzF32N\n//go:noescape\nfunc VceqzF32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzF64N VceqzF64N\n//go:noescape\nfunc VceqzF64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzdS64N VceqzdS64N\n//go:noescape\nfunc VceqzdS64N(r *arm.Uint64, v0 *arm.Int64, n int32)\n\n// Compare bitwise Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzdU64N VceqzdU64N\n//go:noescape\nfunc VceqzdU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)\n\n// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzdF64N VceqzdF64N\n//go:noescape\nfunc VceqzdF64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqS8N VceqzqS8N\n//go:noescape\nfunc VceqzqS8N(r *arm.Uint8, v0 *arm.Int8, n int32)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqS16N VceqzqS16N\n//go:noescape\nfunc VceqzqS16N(r *arm.Uint16, v0 *arm.Int16, n int32)\n\n// Compare bitwise Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqS32N VceqzqS32N\n//go:noescape\nfunc VceqzqS32N(r *arm.Uint32, v0 *arm.Int32, n int32)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqS64N VceqzqS64N\n//go:noescape\nfunc VceqzqS64N(r *arm.Uint64, v0 *arm.Int64, n int32)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqU8N VceqzqU8N\n//go:noescape\nfunc VceqzqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqU16N VceqzqU16N\n//go:noescape\nfunc VceqzqU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Compare bitwise Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqU32N VceqzqU32N\n//go:noescape\nfunc VceqzqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqU64N VceqzqU64N\n//go:noescape\nfunc VceqzqU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)\n\n// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqF32N VceqzqF32N\n//go:noescape\nfunc VceqzqF32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzqF64N VceqzqF64N\n//go:noescape\nfunc VceqzqF64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Floating-point Compare Equal to zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VceqzsF32N VceqzsF32N\n//go:noescape\nfunc VceqzsF32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeS8N VcgeS8N\n//go:noescape\nfunc VcgeS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeS16N VcgeS16N\n//go:noescape\nfunc VcgeS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Compare signed Greater than or Equal (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeS32N VcgeS32N\n//go:noescape\nfunc VcgeS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeS64N VcgeS64N\n//go:noescape\nfunc VcgeS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeU8N VcgeU8N\n//go:noescape\nfunc VcgeU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Compare unsigned Higher or Same (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeU16N VcgeU16N\n//go:noescape\nfunc VcgeU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeU32N VcgeU32N\n//go:noescape\nfunc VcgeU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeU64N VcgeU64N\n//go:noescape\nfunc VcgeU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point Compare Greater than or Equal (vector). 
This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeF32N VcgeF32N\n//go:noescape\nfunc VcgeF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeF64N VcgeF64N\n//go:noescape\nfunc VcgeF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgedS64N VcgedS64N\n//go:noescape\nfunc VcgedS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Compare unsigned Higher or Same (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgedU64N VcgedU64N\n//go:noescape\nfunc VcgedU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgedF64N VcgedF64N\n//go:noescape\nfunc VcgedF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeqS8N VcgeqS8N\n//go:noescape\nfunc VcgeqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Compare signed Greater than or Equal (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeqS16N VcgeqS16N\n//go:noescape\nfunc VcgeqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeqS32N VcgeqS32N\n//go:noescape\nfunc VcgeqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeqS64N VcgeqS64N\n//go:noescape\nfunc VcgeqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Compare unsigned Higher or Same (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeqU8N VcgeqU8N\n//go:noescape\nfunc VcgeqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeqU16N VcgeqU16N\n//go:noescape\nfunc VcgeqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeqU32N VcgeqU32N\n//go:noescape\nfunc VcgeqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Compare unsigned Higher or Same (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeqU64N VcgeqU64N\n//go:noescape\nfunc VcgeqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeqF32N VcgeqF32N\n//go:noescape\nfunc VcgeqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgeqF64N VcgeqF64N\n//go:noescape\nfunc VcgeqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Compare Greater than or Equal (vector). 
This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgesF32N VcgesF32N\n//go:noescape\nfunc VcgesF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezS8N VcgezS8N\n//go:noescape\nfunc VcgezS8N(r *arm.Uint8, v0 *arm.Int8, n int32)\n\n// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezS16N VcgezS16N\n//go:noescape\nfunc VcgezS16N(r *arm.Uint16, v0 *arm.Int16, n int32)\n\n// Compare signed Greater than or Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezS32N VcgezS32N\n//go:noescape\nfunc VcgezS32N(r *arm.Uint32, v0 *arm.Int32, n int32)\n\n// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezS64N VcgezS64N\n//go:noescape\nfunc VcgezS64N(r *arm.Uint64, v0 *arm.Int64, n int32)\n\n// Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezF32N VcgezF32N\n//go:noescape\nfunc VcgezF32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Compare Greater than or Equal to zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezF64N VcgezF64N\n//go:noescape\nfunc VcgezF64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezdS64N VcgezdS64N\n//go:noescape\nfunc VcgezdS64N(r *arm.Uint64, v0 *arm.Int64, n int32)\n\n// Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezdF64N VcgezdF64N\n//go:noescape\nfunc VcgezdF64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Compare signed Greater than or Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezqS8N VcgezqS8N\n//go:noescape\nfunc VcgezqS8N(r *arm.Uint8, v0 *arm.Int8, n int32)\n\n// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezqS16N VcgezqS16N\n//go:noescape\nfunc VcgezqS16N(r *arm.Uint16, v0 *arm.Int16, n int32)\n\n// Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezqS32N VcgezqS32N\n//go:noescape\nfunc VcgezqS32N(r *arm.Uint32, v0 *arm.Int32, n int32)\n\n// Compare signed Greater than or Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezqS64N VcgezqS64N\n//go:noescape\nfunc VcgezqS64N(r *arm.Uint64, v0 *arm.Int64, n int32)\n\n// Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezqF32N VcgezqF32N\n//go:noescape\nfunc VcgezqF32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezqF64N VcgezqF64N\n//go:noescape\nfunc VcgezqF64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Floating-point Compare Greater than or Equal to zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgezsF32N VcgezsF32N\n//go:noescape\nfunc VcgezsF32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtS8N VcgtS8N\n//go:noescape\nfunc VcgtS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtS16N VcgtS16N\n//go:noescape\nfunc VcgtS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Compare signed Greater than (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtS32N VcgtS32N\n//go:noescape\nfunc VcgtS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtS64N VcgtS64N\n//go:noescape\nfunc VcgtS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtU8N VcgtU8N\n//go:noescape\nfunc VcgtU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Compare unsigned Higher (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtU16N VcgtU16N\n//go:noescape\nfunc VcgtU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtU32N VcgtU32N\n//go:noescape\nfunc VcgtU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtU64N VcgtU64N\n//go:noescape\nfunc VcgtU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point Compare Greater than (vector). 
This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtF32N VcgtF32N\n//go:noescape\nfunc VcgtF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtF64N VcgtF64N\n//go:noescape\nfunc VcgtF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtdS64N VcgtdS64N\n//go:noescape\nfunc VcgtdS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Compare unsigned Higher (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtdU64N VcgtdU64N\n//go:noescape\nfunc VcgtdU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtdF64N VcgtdF64N\n//go:noescape\nfunc VcgtdF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtqS8N VcgtqS8N\n//go:noescape\nfunc VcgtqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Compare signed Greater than (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtqS16N VcgtqS16N\n//go:noescape\nfunc VcgtqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtqS32N VcgtqS32N\n//go:noescape\nfunc VcgtqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtqS64N VcgtqS64N\n//go:noescape\nfunc VcgtqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Compare unsigned Higher (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtqU8N VcgtqU8N\n//go:noescape\nfunc VcgtqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtqU16N VcgtqU16N\n//go:noescape\nfunc VcgtqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtqU32N VcgtqU32N\n//go:noescape\nfunc VcgtqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Compare unsigned Higher (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtqU64N VcgtqU64N\n//go:noescape\nfunc VcgtqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtqF32N VcgtqF32N\n//go:noescape\nfunc VcgtqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtqF64N VcgtqF64N\n//go:noescape\nfunc VcgtqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Compare Greater than (vector). 
This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtsF32N VcgtsF32N\n//go:noescape\nfunc VcgtsF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzS8N VcgtzS8N\n//go:noescape\nfunc VcgtzS8N(r *arm.Uint8, v0 *arm.Int8, n int32)\n\n// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzS16N VcgtzS16N\n//go:noescape\nfunc VcgtzS16N(r *arm.Uint16, v0 *arm.Int16, n int32)\n\n// Compare signed Greater than zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzS32N VcgtzS32N\n//go:noescape\nfunc VcgtzS32N(r *arm.Uint32, v0 *arm.Int32, n int32)\n\n// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzS64N VcgtzS64N\n//go:noescape\nfunc VcgtzS64N(r *arm.Uint64, v0 *arm.Int64, n int32)\n\n// Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzF32N VcgtzF32N\n//go:noescape\nfunc VcgtzF32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzF64N VcgtzF64N\n//go:noescape\nfunc VcgtzF64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Compare signed Greater than zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzdS64N VcgtzdS64N\n//go:noescape\nfunc VcgtzdS64N(r *arm.Uint64, v0 *arm.Int64, n int32)\n\n// Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzdF64N VcgtzdF64N\n//go:noescape\nfunc VcgtzdF64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzqS8N VcgtzqS8N\n//go:noescape\nfunc VcgtzqS8N(r *arm.Uint8, v0 *arm.Int8, n int32)\n\n// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzqS16N VcgtzqS16N\n//go:noescape\nfunc VcgtzqS16N(r *arm.Uint16, v0 *arm.Int16, n int32)\n\n// Compare signed Greater than zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzqS32N VcgtzqS32N\n//go:noescape\nfunc VcgtzqS32N(r *arm.Uint32, v0 *arm.Int32, n int32)\n\n// Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzqS64N VcgtzqS64N\n//go:noescape\nfunc VcgtzqS64N(r *arm.Uint64, v0 *arm.Int64, n int32)\n\n// Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzqF32N VcgtzqF32N\n//go:noescape\nfunc VcgtzqF32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzqF64N VcgtzqF64N\n//go:noescape\nfunc VcgtzqF64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Floating-point Compare Greater than zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcgtzsF32N VcgtzsF32N\n//go:noescape\nfunc VcgtzsF32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Compare signed less than or equal\n//\n//go:linkname VcleS8N VcleS8N\n//go:noescape\nfunc VcleS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Compare signed less than or equal\n//\n//go:linkname VcleS16N VcleS16N\n//go:noescape\nfunc VcleS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Compare signed less than or equal\n//\n//go:linkname VcleS32N VcleS32N\n//go:noescape\nfunc VcleS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Compare signed less than or equal\n//\n//go:linkname VcleS64N VcleS64N\n//go:noescape\nfunc VcleS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Compare unsigned less than or equal\n//\n//go:linkname VcleU8N VcleU8N\n//go:noescape\nfunc VcleU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Compare unsigned less than or equal\n//\n//go:linkname VcleU16N VcleU16N\n//go:noescape\nfunc VcleU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Compare unsigned less than or equal\n//\n//go:linkname VcleU32N VcleU32N\n//go:noescape\nfunc VcleU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Compare unsigned less than or equal\n//\n//go:linkname VcleU64N VcleU64N\n//go:noescape\nfunc VcleU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point compare less than or equal\n//\n//go:linkname VcleF32N VcleF32N\n//go:noescape\nfunc VcleF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point compare less than or equal\n//\n//go:linkname VcleF64N 
VcleF64N\n//go:noescape\nfunc VcleF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Compare signed less than or equal\n//\n//go:linkname VcledS64N VcledS64N\n//go:noescape\nfunc VcledS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Compare unsigned less than or equal\n//\n//go:linkname VcledU64N VcledU64N\n//go:noescape\nfunc VcledU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point compare less than or equal\n//\n//go:linkname VcledF64N VcledF64N\n//go:noescape\nfunc VcledF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Compare signed less than or equal\n//\n//go:linkname VcleqS8N VcleqS8N\n//go:noescape\nfunc VcleqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Compare signed less than or equal\n//\n//go:linkname VcleqS16N VcleqS16N\n//go:noescape\nfunc VcleqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Compare signed less than or equal\n//\n//go:linkname VcleqS32N VcleqS32N\n//go:noescape\nfunc VcleqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Compare signed less than or equal\n//\n//go:linkname VcleqS64N VcleqS64N\n//go:noescape\nfunc VcleqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Compare unsigned less than or equal\n//\n//go:linkname VcleqU8N VcleqU8N\n//go:noescape\nfunc VcleqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Compare unsigned less than or equal\n//\n//go:linkname VcleqU16N VcleqU16N\n//go:noescape\nfunc VcleqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Compare unsigned less than or equal\n//\n//go:linkname VcleqU32N VcleqU32N\n//go:noescape\nfunc VcleqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Compare unsigned less than or equal\n//\n//go:linkname VcleqU64N VcleqU64N\n//go:noescape\nfunc VcleqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point compare less than or equal\n//\n//go:linkname VcleqF32N 
VcleqF32N\n//go:noescape\nfunc VcleqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point compare less than or equal\n//\n//go:linkname VcleqF64N VcleqF64N\n//go:noescape\nfunc VcleqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point compare less than or equal\n//\n//go:linkname VclesF32N VclesF32N\n//go:noescape\nfunc VclesF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezS8N VclezS8N\n//go:noescape\nfunc VclezS8N(r *arm.Uint8, v0 *arm.Int8, n int32)\n\n// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezS16N VclezS16N\n//go:noescape\nfunc VclezS16N(r *arm.Uint16, v0 *arm.Int16, n int32)\n\n// Compare signed Less than or Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezS32N VclezS32N\n//go:noescape\nfunc VclezS32N(r *arm.Uint32, v0 *arm.Int32, n int32)\n\n// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezS64N VclezS64N\n//go:noescape\nfunc VclezS64N(r *arm.Uint64, v0 *arm.Int64, n int32)\n\n// Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezF32N VclezF32N\n//go:noescape\nfunc VclezF32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Compare Less than or Equal to zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezF64N VclezF64N\n//go:noescape\nfunc VclezF64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezdS64N VclezdS64N\n//go:noescape\nfunc VclezdS64N(r *arm.Uint64, v0 *arm.Int64, n int32)\n\n// Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezdF64N VclezdF64N\n//go:noescape\nfunc VclezdF64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Compare signed Less than or Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezqS8N VclezqS8N\n//go:noescape\nfunc VclezqS8N(r *arm.Uint8, v0 *arm.Int8, n int32)\n\n// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezqS16N VclezqS16N\n//go:noescape\nfunc VclezqS16N(r *arm.Uint16, v0 *arm.Int16, n int32)\n\n// Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezqS32N VclezqS32N\n//go:noescape\nfunc VclezqS32N(r *arm.Uint32, v0 *arm.Int32, n int32)\n\n// Compare signed Less than or Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezqS64N VclezqS64N\n//go:noescape\nfunc VclezqS64N(r *arm.Uint64, v0 *arm.Int64, n int32)\n\n// Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezqF32N VclezqF32N\n//go:noescape\nfunc VclezqF32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezqF64N VclezqF64N\n//go:noescape\nfunc VclezqF64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Floating-point Compare Less than or Equal to zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VclezsF32N VclezsF32N\n//go:noescape\nfunc VclezsF32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsS8N VclsS8N\n//go:noescape\nfunc VclsS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsS16N VclsS16N\n//go:noescape\nfunc VclsS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsS32N VclsS32N\n//go:noescape\nfunc VclsS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Count Leading Sign bits (vector). 
This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsU8N VclsU8N\n//go:noescape\nfunc VclsU8N(r *arm.Int8, v0 *arm.Uint8, n int32)\n\n// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsU16N VclsU16N\n//go:noescape\nfunc VclsU16N(r *arm.Int16, v0 *arm.Uint16, n int32)\n\n// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsU32N VclsU32N\n//go:noescape\nfunc VclsU32N(r *arm.Int32, v0 *arm.Uint32, n int32)\n\n// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsqS8N VclsqS8N\n//go:noescape\nfunc VclsqS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Count Leading Sign bits (vector). 
This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsqS16N VclsqS16N\n//go:noescape\nfunc VclsqS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsqS32N VclsqS32N\n//go:noescape\nfunc VclsqS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsqU8N VclsqU8N\n//go:noescape\nfunc VclsqU8N(r *arm.Int8, v0 *arm.Uint8, n int32)\n\n// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsqU16N VclsqU16N\n//go:noescape\nfunc VclsqU16N(r *arm.Int16, v0 *arm.Uint16, n int32)\n\n// Count Leading Sign bits (vector). 
This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.\n//\n//go:linkname VclsqU32N VclsqU32N\n//go:noescape\nfunc VclsqU32N(r *arm.Int32, v0 *arm.Uint32, n int32)\n\n// Compare signed less than\n//\n//go:linkname VcltS8N VcltS8N\n//go:noescape\nfunc VcltS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Compare signed less than\n//\n//go:linkname VcltS16N VcltS16N\n//go:noescape\nfunc VcltS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Compare signed less than\n//\n//go:linkname VcltS32N VcltS32N\n//go:noescape\nfunc VcltS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Compare signed less than\n//\n//go:linkname VcltS64N VcltS64N\n//go:noescape\nfunc VcltS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Compare unsigned less than\n//\n//go:linkname VcltU8N VcltU8N\n//go:noescape\nfunc VcltU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Compare unsigned less than\n//\n//go:linkname VcltU16N VcltU16N\n//go:noescape\nfunc VcltU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Compare unsigned less than\n//\n//go:linkname VcltU32N VcltU32N\n//go:noescape\nfunc VcltU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Compare unsigned less than\n//\n//go:linkname VcltU64N VcltU64N\n//go:noescape\nfunc VcltU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point compare less than\n//\n//go:linkname VcltF32N VcltF32N\n//go:noescape\nfunc VcltF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point compare less than\n//\n//go:linkname VcltF64N VcltF64N\n//go:noescape\nfunc VcltF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n 
int32)\n\n// Compare signed less than\n//\n//go:linkname VcltdS64N VcltdS64N\n//go:noescape\nfunc VcltdS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Compare unsigned less than\n//\n//go:linkname VcltdU64N VcltdU64N\n//go:noescape\nfunc VcltdU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point compare less than\n//\n//go:linkname VcltdF64N VcltdF64N\n//go:noescape\nfunc VcltdF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Compare signed less than\n//\n//go:linkname VcltqS8N VcltqS8N\n//go:noescape\nfunc VcltqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Compare signed less than\n//\n//go:linkname VcltqS16N VcltqS16N\n//go:noescape\nfunc VcltqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Compare signed less than\n//\n//go:linkname VcltqS32N VcltqS32N\n//go:noescape\nfunc VcltqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Compare signed less than\n//\n//go:linkname VcltqS64N VcltqS64N\n//go:noescape\nfunc VcltqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Compare unsigned less than\n//\n//go:linkname VcltqU8N VcltqU8N\n//go:noescape\nfunc VcltqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Compare unsigned less than\n//\n//go:linkname VcltqU16N VcltqU16N\n//go:noescape\nfunc VcltqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Compare unsigned less than\n//\n//go:linkname VcltqU32N VcltqU32N\n//go:noescape\nfunc VcltqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Compare unsigned less than\n//\n//go:linkname VcltqU64N VcltqU64N\n//go:noescape\nfunc VcltqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point compare less than\n//\n//go:linkname VcltqF32N VcltqF32N\n//go:noescape\nfunc VcltqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point compare less than\n//\n//go:linkname VcltqF64N VcltqF64N\n//go:noescape\nfunc 
VcltqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point compare less than\n//\n//go:linkname VcltsF32N VcltsF32N\n//go:noescape\nfunc VcltsF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzS8N VcltzS8N\n//go:noescape\nfunc VcltzS8N(r *arm.Uint8, v0 *arm.Int8, n int32)\n\n// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzS16N VcltzS16N\n//go:noescape\nfunc VcltzS16N(r *arm.Uint16, v0 *arm.Int16, n int32)\n\n// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzS32N VcltzS32N\n//go:noescape\nfunc VcltzS32N(r *arm.Uint32, v0 *arm.Int32, n int32)\n\n// Compare signed Less than zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzS64N VcltzS64N\n//go:noescape\nfunc VcltzS64N(r *arm.Uint64, v0 *arm.Int64, n int32)\n\n// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzF32N VcltzF32N\n//go:noescape\nfunc VcltzF32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzF64N VcltzF64N\n//go:noescape\nfunc VcltzF64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzdS64N VcltzdS64N\n//go:noescape\nfunc VcltzdS64N(r *arm.Uint64, v0 *arm.Int64, n int32)\n\n// Floating-point Compare Less than zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzdF64N VcltzdF64N\n//go:noescape\nfunc VcltzdF64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzqS8N VcltzqS8N\n//go:noescape\nfunc VcltzqS8N(r *arm.Uint8, v0 *arm.Int8, n int32)\n\n// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzqS16N VcltzqS16N\n//go:noescape\nfunc VcltzqS16N(r *arm.Uint16, v0 *arm.Int16, n int32)\n\n// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzqS32N VcltzqS32N\n//go:noescape\nfunc VcltzqS32N(r *arm.Uint32, v0 *arm.Int32, n int32)\n\n// Compare signed Less than zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzqS64N VcltzqS64N\n//go:noescape\nfunc VcltzqS64N(r *arm.Uint64, v0 *arm.Int64, n int32)\n\n// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzqF32N VcltzqF32N\n//go:noescape\nfunc VcltzqF32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzqF64N VcltzqF64N\n//go:noescape\nfunc VcltzqF64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VcltzsF32N VcltzsF32N\n//go:noescape\nfunc VcltzsF32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Count Leading Zero bits (vector). 
This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzS8N VclzS8N\n//go:noescape\nfunc VclzS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzS16N VclzS16N\n//go:noescape\nfunc VclzS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzS32N VclzS32N\n//go:noescape\nfunc VclzS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzU8N VclzU8N\n//go:noescape\nfunc VclzU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzU16N VclzU16N\n//go:noescape\nfunc VclzU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Count Leading Zero bits (vector). 
This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzU32N VclzU32N\n//go:noescape\nfunc VclzU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzqS8N VclzqS8N\n//go:noescape\nfunc VclzqS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzqS16N VclzqS16N\n//go:noescape\nfunc VclzqS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzqS32N VclzqS32N\n//go:noescape\nfunc VclzqS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzqU8N VclzqU8N\n//go:noescape\nfunc VclzqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Count Leading Zero bits (vector). 
This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzqU16N VclzqU16N\n//go:noescape\nfunc VclzqU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VclzqU32N VclzqU32N\n//go:noescape\nfunc VclzqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VcntS8N VcntS8N\n//go:noescape\nfunc VcntS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VcntU8N VcntU8N\n//go:noescape\nfunc VcntU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VcntqS8N VcntqS8N\n//go:noescape\nfunc VcntqS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Population Count per byte. 
This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VcntqU8N VcntqU8N\n//go:noescape\nfunc VcntqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineS8N VcombineS8N\n//go:noescape\nfunc VcombineS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineS16N VcombineS16N\n//go:noescape\nfunc VcombineS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineS32N VcombineS32N\n//go:noescape\nfunc VcombineS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineS64N VcombineS64N\n//go:noescape\nfunc VcombineS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineU8N VcombineU8N\n//go:noescape\nfunc VcombineU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineU16N VcombineU16N\n//go:noescape\nfunc VcombineU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineU32N VcombineU32N\n//go:noescape\nfunc VcombineU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineU64N VcombineU64N\n//go:noescape\nfunc VcombineU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineF32N VcombineF32N\n//go:noescape\nfunc VcombineF32N(r 
*arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Join two smaller vectors into a single larger vector\n//\n//go:linkname VcombineF64N VcombineF64N\n//go:noescape\nfunc VcombineF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Signed integer Convert to Floating-point (vector). This instruction converts each element in a vector from signed integer to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtF32S32N VcvtF32S32N\n//go:noescape\nfunc VcvtF32S32N(r *arm.Float32, v0 *arm.Int32, n int32)\n\n// Unsigned integer Convert to Floating-point (vector). This instruction converts each element in a vector from unsigned integer to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtF32U32N VcvtF32U32N\n//go:noescape\nfunc VcvtF32U32N(r *arm.Float32, v0 *arm.Uint32, n int32)\n\n// Signed integer Convert to Floating-point (vector). This instruction converts each element in a vector from signed integer to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtF64S64N VcvtF64S64N\n//go:noescape\nfunc VcvtF64S64N(r *arm.Float64, v0 *arm.Int64, n int32)\n\n// Unsigned integer Convert to Floating-point (vector). This instruction converts each element in a vector from unsigned integer to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtF64U64N VcvtF64U64N\n//go:noescape\nfunc VcvtF64U64N(r *arm.Float64, v0 *arm.Uint64, n int32)\n\n// Floating-point Convert to Signed integer, rounding toward Zero (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtS32F32N VcvtS32F32N\n//go:noescape\nfunc VcvtS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Signed integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtS64F64N VcvtS64F64N\n//go:noescape\nfunc VcvtS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtU32F32N VcvtU32F32N\n//go:noescape\nfunc VcvtU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtU64F64N VcvtU64F64N\n//go:noescape\nfunc VcvtU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). 
This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtaS32F32N VcvtaS32F32N\n//go:noescape\nfunc VcvtaS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtaS64F64N VcvtaS64F64N\n//go:noescape\nfunc VcvtaS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtaU32F32N VcvtaU32F32N\n//go:noescape\nfunc VcvtaU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtaU64F64N VcvtaU64F64N\n//go:noescape\nfunc VcvtaU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). 
This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtadS64F64N VcvtadS64F64N\n//go:noescape\nfunc VcvtadS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtadU64F64N VcvtadU64F64N\n//go:noescape\nfunc VcvtadU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtaqS32F32N VcvtaqS32F32N\n//go:noescape\nfunc VcvtaqS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtaqS64F64N VcvtaqS64F64N\n//go:noescape\nfunc VcvtaqS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). 
This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtaqU32F32N VcvtaqU32F32N\n//go:noescape\nfunc VcvtaqU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtaqU64F64N VcvtaqU64F64N\n//go:noescape\nfunc VcvtaqU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtasS32F32N VcvtasS32F32N\n//go:noescape\nfunc VcvtasS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtasU32F32N VcvtasU32F32N\n//go:noescape\nfunc VcvtasU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Signed integer Convert to Floating-point (vector). 
This instruction converts each element in a vector from signed integer to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtdF64S64N VcvtdF64S64N\n//go:noescape\nfunc VcvtdF64S64N(r *arm.Float64, v0 *arm.Int64, n int32)\n\n// Unsigned integer Convert to Floating-point (vector). This instruction converts each element in a vector from unsigned integer to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtdF64U64N VcvtdF64U64N\n//go:noescape\nfunc VcvtdF64U64N(r *arm.Float64, v0 *arm.Uint64, n int32)\n\n// Floating-point Convert to Signed integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtdS64F64N VcvtdS64F64N\n//go:noescape\nfunc VcvtdS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtdU64F64N VcvtdU64F64N\n//go:noescape\nfunc VcvtdU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmS32F32N VcvtmS32F32N\n//go:noescape\nfunc VcvtmS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmS64F64N VcvtmS64F64N\n//go:noescape\nfunc VcvtmS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmU32F32N VcvtmU32F32N\n//go:noescape\nfunc VcvtmU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmU64F64N VcvtmU64F64N\n//go:noescape\nfunc VcvtmU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmdS64F64N VcvtmdS64F64N\n//go:noescape\nfunc VcvtmdS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmdU64F64N VcvtmdU64F64N\n//go:noescape\nfunc VcvtmdU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmqS32F32N VcvtmqS32F32N\n//go:noescape\nfunc VcvtmqS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmqS64F64N VcvtmqS64F64N\n//go:noescape\nfunc VcvtmqS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmqU32F32N VcvtmqU32F32N\n//go:noescape\nfunc VcvtmqU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmqU64F64N VcvtmqU64F64N\n//go:noescape\nfunc VcvtmqU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmsS32F32N VcvtmsS32F32N\n//go:noescape\nfunc VcvtmsS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtmsU32F32N VcvtmsU32F32N\n//go:noescape\nfunc VcvtmsU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtnS32F32N VcvtnS32F32N\n//go:noescape\nfunc VcvtnS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtnS64F64N VcvtnS64F64N\n//go:noescape\nfunc VcvtnS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtnU32F32N VcvtnU32F32N\n//go:noescape\nfunc VcvtnU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtnU64F64N VcvtnU64F64N\n//go:noescape\nfunc VcvtnU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtndS64F64N VcvtndS64F64N\n//go:noescape\nfunc VcvtndS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtndU64F64N VcvtndU64F64N\n//go:noescape\nfunc VcvtndU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtnqS32F32N VcvtnqS32F32N\n//go:noescape\nfunc VcvtnqS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtnqS64F64N VcvtnqS64F64N\n//go:noescape\nfunc VcvtnqS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtnqU32F32N VcvtnqU32F32N\n//go:noescape\nfunc VcvtnqU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtnqU64F64N VcvtnqU64F64N\n//go:noescape\nfunc VcvtnqU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtnsS32F32N VcvtnsS32F32N\n//go:noescape\nfunc VcvtnsS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtnsU32F32N VcvtnsU32F32N\n//go:noescape\nfunc VcvtnsU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpS32F32N VcvtpS32F32N\n//go:noescape\nfunc VcvtpS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpS64F64N VcvtpS64F64N\n//go:noescape\nfunc VcvtpS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpU32F32N VcvtpU32F32N\n//go:noescape\nfunc VcvtpU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpU64F64N VcvtpU64F64N\n//go:noescape\nfunc VcvtpU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpdS64F64N VcvtpdS64F64N\n//go:noescape\nfunc VcvtpdS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpdU64F64N VcvtpdU64F64N\n//go:noescape\nfunc VcvtpdU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpqS32F32N VcvtpqS32F32N\n//go:noescape\nfunc VcvtpqS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpqS64F64N VcvtpqS64F64N\n//go:noescape\nfunc VcvtpqS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpqU32F32N VcvtpqU32F32N\n//go:noescape\nfunc VcvtpqU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpqU64F64N VcvtpqU64F64N\n//go:noescape\nfunc VcvtpqU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpsS32F32N VcvtpsS32F32N\n//go:noescape\nfunc VcvtpsS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtpsU32F32N VcvtpsU32F32N\n//go:noescape\nfunc VcvtpsU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Signed fixed-point Convert to Floating-point (vector). 
This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtqF32S32N VcvtqF32S32N\n//go:noescape\nfunc VcvtqF32S32N(r *arm.Float32, v0 *arm.Int32, n int32)\n\n// Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtqF32U32N VcvtqF32U32N\n//go:noescape\nfunc VcvtqF32U32N(r *arm.Float32, v0 *arm.Uint32, n int32)\n\n// Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtqF64S64N VcvtqF64S64N\n//go:noescape\nfunc VcvtqF64S64N(r *arm.Float64, v0 *arm.Int64, n int32)\n\n// Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtqF64U64N VcvtqF64U64N\n//go:noescape\nfunc VcvtqF64U64N(r *arm.Float64, v0 *arm.Uint64, n int32)\n\n// Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtqS32F32N VcvtqS32F32N\n//go:noescape\nfunc VcvtqS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). 
This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtqS64F64N VcvtqS64F64N\n//go:noescape\nfunc VcvtqS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)\n\n// Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the general-purpose destination register.\n//\n//go:linkname VcvtqU32F32N VcvtqU32F32N\n//go:noescape\nfunc VcvtqU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the general-purpose destination register.\n//\n//go:linkname VcvtqU64F64N VcvtqU64F64N\n//go:noescape\nfunc VcvtqU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtsF32S32N VcvtsF32S32N\n//go:noescape\nfunc VcvtsF32S32N(r *arm.Float32, v0 *arm.Int32, n int32)\n\n// Unsigned fixed-point Convert to Floating-point (vector). 
This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtsF32U32N VcvtsF32U32N\n//go:noescape\nfunc VcvtsF32U32N(r *arm.Float32, v0 *arm.Uint32, n int32)\n\n// Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VcvtsS32F32N VcvtsS32F32N\n//go:noescape\nfunc VcvtsS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)\n\n// Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the general-purpose destination register.\n//\n//go:linkname VcvtsU32F32N VcvtsU32F32N\n//go:noescape\nfunc VcvtsU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VdivF32N VdivF32N\n//go:noescape\nfunc VdivF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Divide (vector). 
This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VdivF64N VdivF64N\n//go:noescape\nfunc VdivF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VdivqF32N VdivqF32N\n//go:noescape\nfunc VdivqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VdivqF64N VdivqF64N\n//go:noescape\nfunc VdivqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupNS8N VdupNS8N\n//go:noescape\nfunc VdupNS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Duplicate general-purpose register to vector. 
This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupNS16N VdupNS16N\n//go:noescape\nfunc VdupNS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupNS32N VdupNS32N\n//go:noescape\nfunc VdupNS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Insert vector element from general-purpose register. This instruction copies the contents of the source general-purpose register to the specified vector element in the destination SIMD&FP register.\n//\n//go:linkname VdupNS64N VdupNS64N\n//go:noescape\nfunc VdupNS64N(r *arm.Int64, v0 *arm.Int64, n int32)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupNU8N VdupNU8N\n//go:noescape\nfunc VdupNU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupNU16N VdupNU16N\n//go:noescape\nfunc VdupNU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Duplicate general-purpose register to vector. 
This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupNU32N VdupNU32N\n//go:noescape\nfunc VdupNU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Insert vector element from general-purpose register. This instruction copies the contents of the source general-purpose register to the specified vector element in the destination SIMD&FP register.\n//\n//go:linkname VdupNU64N VdupNU64N\n//go:noescape\nfunc VdupNU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupNF32N VdupNF32N\n//go:noescape\nfunc VdupNF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Insert vector element from general-purpose register. This instruction copies the contents of the source general-purpose register to the specified vector element in the destination SIMD&FP register.\n//\n//go:linkname VdupNF64N VdupNF64N\n//go:noescape\nfunc VdupNF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNS8N VdupqNS8N\n//go:noescape\nfunc VdupqNS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Duplicate general-purpose register to vector. 
This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNS16N VdupqNS16N\n//go:noescape\nfunc VdupqNS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNS32N VdupqNS32N\n//go:noescape\nfunc VdupqNS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNS64N VdupqNS64N\n//go:noescape\nfunc VdupqNS64N(r *arm.Int64, v0 *arm.Int64, n int32)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNU8N VdupqNU8N\n//go:noescape\nfunc VdupqNU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNU16N VdupqNU16N\n//go:noescape\nfunc VdupqNU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Duplicate general-purpose register to vector. 
This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNU32N VdupqNU32N\n//go:noescape\nfunc VdupqNU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNU64N VdupqNU64N\n//go:noescape\nfunc VdupqNU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNF32N VdupqNF32N\n//go:noescape\nfunc VdupqNF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VdupqNF64N VdupqNF64N\n//go:noescape\nfunc VdupqNF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorS8N VeorS8N\n//go:noescape\nfunc VeorS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Bitwise Exclusive OR (vector). 
This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorS16N VeorS16N\n//go:noescape\nfunc VeorS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorS32N VeorS32N\n//go:noescape\nfunc VeorS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorS64N VeorS64N\n//go:noescape\nfunc VeorS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorU8N VeorU8N\n//go:noescape\nfunc VeorU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorU16N VeorU16N\n//go:noescape\nfunc VeorU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorU32N VeorU32N\n//go:noescape\nfunc VeorU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Bitwise Exclusive OR (vector). 
This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorU64N VeorU64N\n//go:noescape\nfunc VeorU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorqS8N VeorqS8N\n//go:noescape\nfunc VeorqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorqS16N VeorqS16N\n//go:noescape\nfunc VeorqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorqS32N VeorqS32N\n//go:noescape\nfunc VeorqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorqS64N VeorqS64N\n//go:noescape\nfunc VeorqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorqU8N VeorqU8N\n//go:noescape\nfunc VeorqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Bitwise Exclusive OR (vector). 
This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorqU16N VeorqU16N\n//go:noescape\nfunc VeorqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorqU32N VeorqU32N\n//go:noescape\nfunc VeorqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.\n//\n//go:linkname VeorqU64N VeorqU64N\n//go:noescape\nfunc VeorqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Get high half of vector. This intrinsic extracts the upper half (the most significant 64 bits) of the 128-bit source vector and writes it to the destination as a 64-bit vector.\n//\n//go:linkname VgetHighS8N VgetHighS8N\n//go:noescape\nfunc VgetHighS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Get high half of vector. This intrinsic extracts the upper half (the most significant 64 bits) of the 128-bit source vector and writes it to the destination as a 64-bit vector.\n//\n//go:linkname VgetHighS16N VgetHighS16N\n//go:noescape\nfunc VgetHighS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Get high half of vector. 
This intrinsic extracts the upper half (the most significant 64 bits) of the 128-bit source vector and writes it to the destination as a 64-bit vector.\n//\n//go:linkname VgetHighS32N VgetHighS32N\n//go:noescape\nfunc VgetHighS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Get high half of vector. This intrinsic extracts the upper half (the most significant 64 bits) of the 128-bit source vector and writes it to the destination as a 64-bit vector.\n//\n//go:linkname VgetHighS64N VgetHighS64N\n//go:noescape\nfunc VgetHighS64N(r *arm.Int64, v0 *arm.Int64, n int32)\n\n// Get high half of vector. This intrinsic extracts the upper half (the most significant 64 bits) of the 128-bit source vector and writes it to the destination as a 64-bit vector.\n//\n//go:linkname VgetHighU8N VgetHighU8N\n//go:noescape\nfunc VgetHighU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Get high half of vector. This intrinsic extracts the upper half (the most significant 64 bits) of the 128-bit source vector and writes it to the destination as a 64-bit vector.\n//\n//go:linkname VgetHighU16N VgetHighU16N\n//go:noescape\nfunc VgetHighU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Get high half of vector. This intrinsic extracts the upper half (the most significant 64 bits) of the 128-bit source vector and writes it to the destination as a 64-bit vector.\n//\n//go:linkname VgetHighU32N VgetHighU32N\n//go:noescape\nfunc VgetHighU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Get high half of vector. 
This intrinsic extracts the upper half (the most significant 64 bits) of the 128-bit source vector and writes it to the destination as a 64-bit vector.\n//\n//go:linkname VgetHighU64N VgetHighU64N\n//go:noescape\nfunc VgetHighU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)\n\n// Get high half of vector. This intrinsic extracts the upper half (the most significant 64 bits) of the 128-bit source vector and writes it to the destination as a 64-bit vector.\n//\n//go:linkname VgetHighF32N VgetHighF32N\n//go:noescape\nfunc VgetHighF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Get high half of vector. This intrinsic extracts the upper half (the most significant 64 bits) of the 128-bit source vector and writes it to the destination as a 64-bit vector.\n//\n//go:linkname VgetHighF64N VgetHighF64N\n//go:noescape\nfunc VgetHighF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Get low half of vector. This intrinsic extracts the lower half (the least significant 64 bits) of the 128-bit source vector and writes it to the destination as a 64-bit vector.\n//\n//go:linkname VgetLowS8N VgetLowS8N\n//go:noescape\nfunc VgetLowS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Get low half of vector. This intrinsic extracts the lower half (the least significant 64 bits) of the 128-bit source vector and writes it to the destination as a 64-bit vector.\n//\n//go:linkname VgetLowS16N VgetLowS16N\n//go:noescape\nfunc VgetLowS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Get low half of vector. 
This intrinsic extracts the lower half (the least significant 64 bits) of the 128-bit source vector and writes it to the destination as a 64-bit vector.\n//\n//go:linkname VgetLowS32N VgetLowS32N\n//go:noescape\nfunc VgetLowS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Get low half of vector. This intrinsic extracts the lower half (the least significant 64 bits) of the 128-bit source vector and writes it to the destination as a 64-bit vector.\n//\n//go:linkname VgetLowS64N VgetLowS64N\n//go:noescape\nfunc VgetLowS64N(r *arm.Int64, v0 *arm.Int64, n int32)\n\n// Get low half of vector. This intrinsic extracts the lower half (the least significant 64 bits) of the 128-bit source vector and writes it to the destination as a 64-bit vector.\n//\n//go:linkname VgetLowU8N VgetLowU8N\n//go:noescape\nfunc VgetLowU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Get low half of vector. This intrinsic extracts the lower half (the least significant 64 bits) of the 128-bit source vector and writes it to the destination as a 64-bit vector.\n//\n//go:linkname VgetLowU16N VgetLowU16N\n//go:noescape\nfunc VgetLowU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Get low half of vector. This intrinsic extracts the lower half (the least significant 64 bits) of the 128-bit source vector and writes it to the destination as a 64-bit vector.\n//\n//go:linkname VgetLowU32N VgetLowU32N\n//go:noescape\nfunc VgetLowU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Get low half of vector. 
This intrinsic extracts the lower half (the least significant 64 bits) of the 128-bit source vector and writes it to the destination as a 64-bit vector.\n//\n//go:linkname VgetLowU64N VgetLowU64N\n//go:noescape\nfunc VgetLowU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)\n\n// Get low half of vector. This intrinsic extracts the lower half (the least significant 64 bits) of the 128-bit source vector and writes it to the destination as a 64-bit vector.\n//\n//go:linkname VgetLowF32N VgetLowF32N\n//go:noescape\nfunc VgetLowF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Get low half of vector. This intrinsic extracts the lower half (the least significant 64 bits) of the 128-bit source vector and writes it to the destination as a 64-bit vector.\n//\n//go:linkname VgetLowF64N VgetLowF64N\n//go:noescape\nfunc VgetLowF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddS8N VhaddS8N\n//go:noescape\nfunc VhaddS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddS16N VhaddS16N\n//go:noescape\nfunc VhaddS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed Halving Add. 
This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddS32N VhaddS32N\n//go:noescape\nfunc VhaddS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddU8N VhaddU8N\n//go:noescape\nfunc VhaddU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddU16N VhaddU16N\n//go:noescape\nfunc VhaddU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddU32N VhaddU32N\n//go:noescape\nfunc VhaddU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddqS8N VhaddqS8N\n//go:noescape\nfunc VhaddqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed Halving Add. 
This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddqS16N VhaddqS16N\n//go:noescape\nfunc VhaddqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddqS32N VhaddqS32N\n//go:noescape\nfunc VhaddqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddqU8N VhaddqU8N\n//go:noescape\nfunc VhaddqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddqU16N VhaddqU16N\n//go:noescape\nfunc VhaddqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhaddqU32N VhaddqU32N\n//go:noescape\nfunc VhaddqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Signed Halving Subtract. 
This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubS8N VhsubS8N\n//go:noescape\nfunc VhsubS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubS16N VhsubS16N\n//go:noescape\nfunc VhsubS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubS32N VhsubS32N\n//go:noescape\nfunc VhsubS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubU8N VhsubU8N\n//go:noescape\nfunc VhsubU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unsigned Halving Subtract. 
This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubU16N VhsubU16N\n//go:noescape\nfunc VhsubU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubU32N VhsubU32N\n//go:noescape\nfunc VhsubU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubqS8N VhsubqS8N\n//go:noescape\nfunc VhsubqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubqS16N VhsubqS16N\n//go:noescape\nfunc VhsubqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed Halving Subtract. 
This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubqS32N VhsubqS32N\n//go:noescape\nfunc VhsubqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubqU8N VhsubqU8N\n//go:noescape\nfunc VhsubqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubqU16N VhsubqU16N\n//go:noescape\nfunc VhsubqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VhsubqU32N VhsubqU32N\n//go:noescape\nfunc VhsubqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Signed Maximum (vector). 
This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxS8N VmaxS8N\n//go:noescape\nfunc VmaxS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxS16N VmaxS16N\n//go:noescape\nfunc VmaxS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxS32N VmaxS32N\n//go:noescape\nfunc VmaxS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxU8N VmaxU8N\n//go:noescape\nfunc VmaxU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxU16N VmaxU16N\n//go:noescape\nfunc VmaxU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unsigned Maximum (vector). 
This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxU32N VmaxU32N\n//go:noescape\nfunc VmaxU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxF32N VmaxF32N\n//go:noescape\nfunc VmaxF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxF64N VmaxF64N\n//go:noescape\nfunc VmaxF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxnmF32N VmaxnmF32N\n//go:noescape\nfunc VmaxnmF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxnmF64N VmaxnmF64N\n//go:noescape\nfunc VmaxnmF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Maximum Number (vector). 
This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxnmqF32N VmaxnmqF32N\n//go:noescape\nfunc VmaxnmqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxnmqF64N VmaxnmqF64N\n//go:noescape\nfunc VmaxnmqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VmaxnmvF32N VmaxnmvF32N\n//go:noescape\nfunc VmaxnmvF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Maximum Number across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VmaxnmvqF32N VmaxnmvqF32N\n//go:noescape\nfunc VmaxnmvqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Maximum Number Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VmaxnmvqF64N VmaxnmvqF64N\n//go:noescape\nfunc VmaxnmvqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxqS8N VmaxqS8N\n//go:noescape\nfunc VmaxqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxqS16N VmaxqS16N\n//go:noescape\nfunc VmaxqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxqS32N VmaxqS32N\n//go:noescape\nfunc VmaxqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unsigned Maximum (vector). 
This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxqU8N VmaxqU8N\n//go:noescape\nfunc VmaxqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxqU16N VmaxqU16N\n//go:noescape\nfunc VmaxqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxqU32N VmaxqU32N\n//go:noescape\nfunc VmaxqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxqF32N VmaxqF32N\n//go:noescape\nfunc VmaxqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxqF64N VmaxqF64N\n//go:noescape\nfunc VmaxqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Signed Maximum across Vector. 
This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VmaxvS8N VmaxvS8N\n//go:noescape\nfunc VmaxvS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VmaxvS16N VmaxvS16N\n//go:noescape\nfunc VmaxvS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxvS32N VmaxvS32N\n//go:noescape\nfunc VmaxvS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmaxvU8N VmaxvU8N\n//go:noescape\nfunc VmaxvU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. 
All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmaxvU16N VmaxvU16N\n//go:noescape\nfunc VmaxvU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmaxvU32N VmaxvU32N\n//go:noescape\nfunc VmaxvU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VmaxvF32N VmaxvF32N\n//go:noescape\nfunc VmaxvF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VmaxvqS8N VmaxvqS8N\n//go:noescape\nfunc VmaxvqS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. 
All the values in this instruction are signed integer values.\n//\n//go:linkname VmaxvqS16N VmaxvqS16N\n//go:noescape\nfunc VmaxvqS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VmaxvqS32N VmaxvqS32N\n//go:noescape\nfunc VmaxvqS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmaxvqU8N VmaxvqU8N\n//go:noescape\nfunc VmaxvqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmaxvqU16N VmaxvqU16N\n//go:noescape\nfunc VmaxvqU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VmaxvqU32N VmaxvqU32N\n//go:noescape\nfunc VmaxvqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Floating-point Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. 
All the values in this instruction are floating-point values.\n//\n//go:linkname VmaxvqF32N VmaxvqF32N\n//go:noescape\nfunc VmaxvqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VmaxvqF64N VmaxvqF64N\n//go:noescape\nfunc VmaxvqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminS8N VminS8N\n//go:noescape\nfunc VminS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminS16N VminS16N\n//go:noescape\nfunc VminS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminS32N VminS32N\n//go:noescape\nfunc VminS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unsigned Minimum (vector). 
This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminU8N VminU8N\n//go:noescape\nfunc VminU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminU16N VminU16N\n//go:noescape\nfunc VminU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminU32N VminU32N\n//go:noescape\nfunc VminU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminF32N VminF32N\n//go:noescape\nfunc VminF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminF64N VminF64N\n//go:noescape\nfunc VminF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Minimum Number (vector). 
This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminnmF32N VminnmF32N\n//go:noescape\nfunc VminnmF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminnmF64N VminnmF64N\n//go:noescape\nfunc VminnmF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminnmqF32N VminnmqF32N\n//go:noescape\nfunc VminnmqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminnmqF64N VminnmqF64N\n//go:noescape\nfunc VminnmqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smaller of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. 
All the values in this instruction are floating-point values.\n//\n//go:linkname VminnmvF32N VminnmvF32N\n//go:noescape\nfunc VminnmvF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Minimum Number across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VminnmvqF32N VminnmvqF32N\n//go:noescape\nfunc VminnmvqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smaller of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VminnmvqF64N VminnmvqF64N\n//go:noescape\nfunc VminnmvqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminqS8N VminqS8N\n//go:noescape\nfunc VminqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed Minimum (vector). 
This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminqS16N VminqS16N\n//go:noescape\nfunc VminqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminqS32N VminqS32N\n//go:noescape\nfunc VminqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminqU8N VminqU8N\n//go:noescape\nfunc VminqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminqU16N VminqU16N\n//go:noescape\nfunc VminqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminqU32N VminqU32N\n//go:noescape\nfunc VminqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Floating-point minimum (vector). 
This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminqF32N VminqF32N\n//go:noescape\nfunc VminqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminqF64N VminqF64N\n//go:noescape\nfunc VminqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VminvS8N VminvS8N\n//go:noescape\nfunc VminvS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VminvS16N VminvS16N\n//go:noescape\nfunc VminvS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Signed Minimum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smaller of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminvS32N VminvS32N\n//go:noescape\nfunc VminvS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VminvU8N VminvU8N\n//go:noescape\nfunc VminvU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VminvU16N VminvU16N\n//go:noescape\nfunc VminvU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smaller of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VminvU32N VminvU32N\n//go:noescape\nfunc VminvU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Floating-point Minimum Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VminvF32N VminvF32N\n//go:noescape\nfunc VminvF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VminvqS8N VminvqS8N\n//go:noescape\nfunc VminvqS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VminvqS16N VminvqS16N\n//go:noescape\nfunc VminvqS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VminvqS32N VminvqS32N\n//go:noescape\nfunc VminvqS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. 
All the values in this instruction are unsigned integer values.\n//\n//go:linkname VminvqU8N VminvqU8N\n//go:noescape\nfunc VminvqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VminvqU16N VminvqU16N\n//go:noescape\nfunc VminvqU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VminvqU32N VminvqU32N\n//go:noescape\nfunc VminvqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Floating-point Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VminvqF32N VminvqF32N\n//go:noescape\nfunc VminvqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VminvqF64N VminvqF64N\n//go:noescape\nfunc VminvqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Duplicate vector element to vector or scalar. 
This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VmovNS8N VmovNS8N\n//go:noescape\nfunc VmovNS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VmovNS16N VmovNS16N\n//go:noescape\nfunc VmovNS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VmovNS32N VmovNS32N\n//go:noescape\nfunc VmovNS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VmovNS64N VmovNS64N\n//go:noescape\nfunc VmovNS64N(r *arm.Int64, v0 *arm.Int64, n int32)\n\n// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VmovNU8N VmovNU8N\n//go:noescape\nfunc VmovNU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Duplicate vector element to vector or scalar. 
This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VmovNU16N VmovNU16N\n//go:noescape\nfunc VmovNU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VmovNU32N VmovNU32N\n//go:noescape\nfunc VmovNU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VmovNU64N VmovNU64N\n//go:noescape\nfunc VmovNU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)\n\n// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VmovNF32N VmovNF32N\n//go:noescape\nfunc VmovNF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VmovNF64N VmovNF64N\n//go:noescape\nfunc VmovNF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Duplicate vector element to vector or scalar. 
This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VmovqNS8N VmovqNS8N\n//go:noescape\nfunc VmovqNS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VmovqNS16N VmovqNS16N\n//go:noescape\nfunc VmovqNS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VmovqNS32N VmovqNS32N\n//go:noescape\nfunc VmovqNS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VmovqNS64N VmovqNS64N\n//go:noescape\nfunc VmovqNS64N(r *arm.Int64, v0 *arm.Int64, n int32)\n\n// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VmovqNU8N VmovqNU8N\n//go:noescape\nfunc VmovqNU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Duplicate vector element to vector or scalar. 
This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VmovqNU16N VmovqNU16N\n//go:noescape\nfunc VmovqNU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VmovqNU32N VmovqNU32N\n//go:noescape\nfunc VmovqNU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VmovqNU64N VmovqNU64N\n//go:noescape\nfunc VmovqNU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)\n\n// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VmovqNF32N VmovqNF32N\n//go:noescape\nfunc VmovqNF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VmovqNF64N VmovqNF64N\n//go:noescape\nfunc VmovqNF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Multiply (vector). 
This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulS8N VmulS8N\n//go:noescape\nfunc VmulS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulS16N VmulS16N\n//go:noescape\nfunc VmulS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulS32N VmulS32N\n//go:noescape\nfunc VmulS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulU8N VmulU8N\n//go:noescape\nfunc VmulU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulU16N VmulU16N\n//go:noescape\nfunc VmulU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Multiply (vector). 
This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulU32N VmulU32N\n//go:noescape\nfunc VmulU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulF32N VmulF32N\n//go:noescape\nfunc VmulF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulF64N VmulF64N\n//go:noescape\nfunc VmulF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulqS8N VmulqS8N\n//go:noescape\nfunc VmulqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulqS16N VmulqS16N\n//go:noescape\nfunc VmulqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Multiply (vector). 
This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulqS32N VmulqS32N\n//go:noescape\nfunc VmulqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulqU8N VmulqU8N\n//go:noescape\nfunc VmulqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulqU16N VmulqU16N\n//go:noescape\nfunc VmulqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulqU32N VmulqU32N\n//go:noescape\nfunc VmulqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulqF32N VmulqF32N\n//go:noescape\nfunc VmulqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Multiply (vector). 
This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulqF64N VmulqF64N\n//go:noescape\nfunc VmulqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulxF32N VmulxF32N\n//go:noescape\nfunc VmulxF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulxF64N VmulxF64N\n//go:noescape\nfunc VmulxF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulxdF64N VmulxdF64N\n//go:noescape\nfunc VmulxdF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulxqF32N VmulxqF32N\n//go:noescape\nfunc VmulxqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Multiply extended. 
This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulxqF64N VmulxqF64N\n//go:noescape\nfunc VmulxqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmulxsF32N VmulxsF32N\n//go:noescape\nfunc VmulxsF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnS8N VmvnS8N\n//go:noescape\nfunc VmvnS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnS16N VmvnS16N\n//go:noescape\nfunc VmvnS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnS32N VmvnS32N\n//go:noescape\nfunc VmvnS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Bitwise NOT (vector). 
This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnU8N VmvnU8N\n//go:noescape\nfunc VmvnU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnU16N VmvnU16N\n//go:noescape\nfunc VmvnU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnU32N VmvnU32N\n//go:noescape\nfunc VmvnU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnqS8N VmvnqS8N\n//go:noescape\nfunc VmvnqS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnqS16N VmvnqS16N\n//go:noescape\nfunc VmvnqS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnqS32N VmvnqS32N\n//go:noescape\nfunc VmvnqS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Bitwise NOT (vector). 
This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnqU8N VmvnqU8N\n//go:noescape\nfunc VmvnqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnqU16N VmvnqU16N\n//go:noescape\nfunc VmvnqU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VmvnqU32N VmvnqU32N\n//go:noescape\nfunc VmvnqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegS8N VnegS8N\n//go:noescape\nfunc VnegS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegS16N VnegS16N\n//go:noescape\nfunc VnegS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegS32N VnegS32N\n//go:noescape\nfunc VnegS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Negate (vector). 
This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegS64N VnegS64N\n//go:noescape\nfunc VnegS64N(r *arm.Int64, v0 *arm.Int64, n int32)\n\n// Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegF32N VnegF32N\n//go:noescape\nfunc VnegF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegF64N VnegF64N\n//go:noescape\nfunc VnegF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegdS64N VnegdS64N\n//go:noescape\nfunc VnegdS64N(r *arm.Int64, v0 *arm.Int64, n int32)\n\n// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegqS8N VnegqS8N\n//go:noescape\nfunc VnegqS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegqS16N VnegqS16N\n//go:noescape\nfunc VnegqS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Negate (vector). 
This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegqS32N VnegqS32N\n//go:noescape\nfunc VnegqS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegqS64N VnegqS64N\n//go:noescape\nfunc VnegqS64N(r *arm.Int64, v0 *arm.Int64, n int32)\n\n// Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegqF32N VnegqF32N\n//go:noescape\nfunc VnegqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VnegqF64N VnegqF64N\n//go:noescape\nfunc VnegqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornS8N VornS8N\n//go:noescape\nfunc VornS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornS16N VornS16N\n//go:noescape\nfunc VornS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Bitwise inclusive OR NOT (vector). 
This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornS32N VornS32N\n//go:noescape\nfunc VornS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornS64N VornS64N\n//go:noescape\nfunc VornS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornU8N VornU8N\n//go:noescape\nfunc VornU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornU16N VornU16N\n//go:noescape\nfunc VornU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornU32N VornU32N\n//go:noescape\nfunc VornU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornU64N VornU64N\n//go:noescape\nfunc VornU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Bitwise inclusive OR NOT (vector). 
This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornqS8N VornqS8N\n//go:noescape\nfunc VornqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornqS16N VornqS16N\n//go:noescape\nfunc VornqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornqS32N VornqS32N\n//go:noescape\nfunc VornqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornqS64N VornqS64N\n//go:noescape\nfunc VornqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornqU8N VornqU8N\n//go:noescape\nfunc VornqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornqU16N VornqU16N\n//go:noescape\nfunc VornqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Bitwise inclusive OR NOT (vector). 
This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornqU32N VornqU32N\n//go:noescape\nfunc VornqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VornqU64N VornqU64N\n//go:noescape\nfunc VornqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrS8N VorrS8N\n//go:noescape\nfunc VorrS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrS16N VorrS16N\n//go:noescape\nfunc VorrS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrS32N VorrS32N\n//go:noescape\nfunc VorrS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrS64N VorrS64N\n//go:noescape\nfunc VorrS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Bitwise inclusive OR (vector, register). 
This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrU8N VorrU8N\n//go:noescape\nfunc VorrU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrU16N VorrU16N\n//go:noescape\nfunc VorrU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrU32N VorrU32N\n//go:noescape\nfunc VorrU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrU64N VorrU64N\n//go:noescape\nfunc VorrU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrqS8N VorrqS8N\n//go:noescape\nfunc VorrqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrqS16N VorrqS16N\n//go:noescape\nfunc VorrqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Bitwise inclusive OR (vector, register). 
This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrqS32N VorrqS32N\n//go:noescape\nfunc VorrqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrqS64N VorrqS64N\n//go:noescape\nfunc VorrqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrqU8N VorrqU8N\n//go:noescape\nfunc VorrqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrqU16N VorrqU16N\n//go:noescape\nfunc VorrqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrqU32N VorrqU32N\n//go:noescape\nfunc VorrqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname VorrqU64N VorrqU64N\n//go:noescape\nfunc VorrqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Add Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddS8N VpaddS8N\n//go:noescape\nfunc VpaddS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddS16N VpaddS16N\n//go:noescape\nfunc VpaddS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddS32N VpaddS32N\n//go:noescape\nfunc VpaddS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Add Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddU8N VpaddU8N\n//go:noescape\nfunc VpaddU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddU16N VpaddU16N\n//go:noescape\nfunc VpaddU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddU32N VpaddU32N\n//go:noescape\nfunc VpaddU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. 
All the values in this instruction are floating-point values.\n//\n//go:linkname VpaddF32N VpaddF32N\n//go:noescape\nfunc VpaddF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Add Pair of elements (scalar). This instruction adds two vector elements in the source SIMD&FP register and writes the scalar result into the destination SIMD&FP register.\n//\n//go:linkname VpadddS64N VpadddS64N\n//go:noescape\nfunc VpadddS64N(r *arm.Int64, v0 *arm.Int64, n int32)\n\n// Add Pair of elements (scalar). This instruction adds two vector elements in the source SIMD&FP register and writes the scalar result into the destination SIMD&FP register.\n//\n//go:linkname VpadddU64N VpadddU64N\n//go:noescape\nfunc VpadddU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)\n\n// Floating-point Add Pair of elements (scalar). This instruction adds two floating-point vector elements in the source SIMD&FP register and writes the scalar result into the destination SIMD&FP register.\n//\n//go:linkname VpadddF64N VpadddF64N\n//go:noescape\nfunc VpadddF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Add Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddqS8N VpaddqS8N\n//go:noescape\nfunc VpaddqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddqS16N VpaddqS16N\n//go:noescape\nfunc VpaddqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddqS32N VpaddqS32N\n//go:noescape\nfunc VpaddqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Add Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddqS64N VpaddqS64N\n//go:noescape\nfunc VpaddqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddqU8N VpaddqU8N\n//go:noescape\nfunc VpaddqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddqU16N VpaddqU16N\n//go:noescape\nfunc VpaddqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Add Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddqU32N VpaddqU32N\n//go:noescape\nfunc VpaddqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpaddqU64N VpaddqU64N\n//go:noescape\nfunc VpaddqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpaddqF32N VpaddqF32N\n//go:noescape\nfunc VpaddqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Add Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpaddqF64N VpaddqF64N\n//go:noescape\nfunc VpaddqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Add Pair of elements (scalar). This instruction adds two floating-point vector elements in the source SIMD&FP register and writes the scalar result into the destination SIMD&FP register.\n//\n//go:linkname VpaddsF32N VpaddsF32N\n//go:noescape\nfunc VpaddsF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxS8N VpmaxS8N\n//go:noescape\nfunc VpmaxS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed Maximum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxS16N VpmaxS16N\n//go:noescape\nfunc VpmaxS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxS32N VpmaxS32N\n//go:noescape\nfunc VpmaxS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxU8N VpmaxU8N\n//go:noescape\nfunc VpmaxU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unsigned Maximum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxU16N VpmaxU16N\n//go:noescape\nfunc VpmaxU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxU32N VpmaxU32N\n//go:noescape\nfunc VpmaxU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpmaxF32N VpmaxF32N\n//go:noescape\nfunc VpmaxF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Maximum Number Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpmaxnmF32N VpmaxnmF32N\n//go:noescape\nfunc VpmaxnmF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpmaxnmqF32N VpmaxnmqF32N\n//go:noescape\nfunc VpmaxnmqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpmaxnmqF64N VpmaxnmqF64N\n//go:noescape\nfunc VpmaxnmqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Maximum Number of Pair of elements (scalar). 
This instruction compares the two floating-point elements of the source SIMD&FP register and writes the larger value as a scalar to the destination SIMD&FP register.\n//\n//go:linkname VpmaxnmqdF64N VpmaxnmqdF64N\n//go:noescape\nfunc VpmaxnmqdF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Maximum Number of Pair of elements (scalar). This instruction compares the two floating-point elements of the source SIMD&FP register and writes the larger value as a scalar to the destination SIMD&FP register.\n//\n//go:linkname VpmaxnmsF32N VpmaxnmsF32N\n//go:noescape\nfunc VpmaxnmsF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxqS8N VpmaxqS8N\n//go:noescape\nfunc VpmaxqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed Maximum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxqS16N VpmaxqS16N\n//go:noescape\nfunc VpmaxqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxqS32N VpmaxqS32N\n//go:noescape\nfunc VpmaxqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxqU8N VpmaxqU8N\n//go:noescape\nfunc VpmaxqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unsigned Maximum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxqU16N VpmaxqU16N\n//go:noescape\nfunc VpmaxqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpmaxqU32N VpmaxqU32N\n//go:noescape\nfunc VpmaxqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpmaxqF32N VpmaxqF32N\n//go:noescape\nfunc VpmaxqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Maximum Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpmaxqF64N VpmaxqF64N\n//go:noescape\nfunc VpmaxqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Maximum of Pair of elements (scalar). This instruction compares the two floating-point elements of the source SIMD&FP register and writes the larger value as a scalar to the destination SIMD&FP register.\n//\n//go:linkname VpmaxqdF64N VpmaxqdF64N\n//go:noescape\nfunc VpmaxqdF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Maximum of Pair of elements (scalar). This instruction compares the two floating-point elements of the source SIMD&FP register and writes the larger value as a scalar to the destination SIMD&FP register.\n//\n//go:linkname VpmaxsF32N VpmaxsF32N\n//go:noescape\nfunc VpmaxsF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Signed Minimum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminS8N VpminS8N\n//go:noescape\nfunc VpminS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminS16N VpminS16N\n//go:noescape\nfunc VpminS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminS32N VpminS32N\n//go:noescape\nfunc VpminS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unsigned Minimum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminU8N VpminU8N\n//go:noescape\nfunc VpminU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminU16N VpminU16N\n//go:noescape\nfunc VpminU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminU32N VpminU32N\n//go:noescape\nfunc VpminU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Floating-point Minimum Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpminF32N VpminF32N\n//go:noescape\nfunc VpminF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpminnmF32N VpminnmF32N\n//go:noescape\nfunc VpminnmF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpminnmqF32N VpminnmqF32N\n//go:noescape\nfunc VpminnmqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Minimum Number Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpminnmqF64N VpminnmqF64N\n//go:noescape\nfunc VpminnmqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Minimum Number of Pair of elements (scalar). This instruction compares the two floating-point elements of the source SIMD&FP register and writes the smaller value as a scalar to the destination SIMD&FP register.\n//\n//go:linkname VpminnmqdF64N VpminnmqdF64N\n//go:noescape\nfunc VpminnmqdF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Minimum Number of Pair of elements (scalar). This instruction compares the two floating-point elements of the source SIMD&FP register and writes the smaller value as a scalar to the destination SIMD&FP register.\n//\n//go:linkname VpminnmsF32N VpminnmsF32N\n//go:noescape\nfunc VpminnmsF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Signed Minimum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminqS8N VpminqS8N\n//go:noescape\nfunc VpminqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminqS16N VpminqS16N\n//go:noescape\nfunc VpminqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminqS32N VpminqS32N\n//go:noescape\nfunc VpminqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unsigned Minimum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminqU8N VpminqU8N\n//go:noescape\nfunc VpminqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminqU16N VpminqU16N\n//go:noescape\nfunc VpminqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VpminqU32N VpminqU32N\n//go:noescape\nfunc VpminqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Floating-point Minimum Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpminqF32N VpminqF32N\n//go:noescape\nfunc VpminqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.\n//\n//go:linkname VpminqF64N VpminqF64N\n//go:noescape\nfunc VpminqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Minimum of Pair of elements (scalar). This instruction compares the two floating-point elements of the source SIMD&FP register and writes the smaller value as a scalar to the destination SIMD&FP register.\n//\n//go:linkname VpminqdF64N VpminqdF64N\n//go:noescape\nfunc VpminqdF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Minimum of Pair of elements (scalar). 
This instruction compares the two floating-point elements of the source SIMD&FP register and writes the smaller value as a scalar to the destination SIMD&FP register.\n//\n//go:linkname VpminsF32N VpminsF32N\n//go:noescape\nfunc VpminsF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. If a result overflows, it is saturated; for example, the absolute value of the most negative element is returned as the maximum positive value of the element type.\n//\n//go:linkname VqabsS8N VqabsS8N\n//go:noescape\nfunc VqabsS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. If a result overflows, it is saturated; for example, the absolute value of the most negative element is returned as the maximum positive value of the element type.\n//\n//go:linkname VqabsS16N VqabsS16N\n//go:noescape\nfunc VqabsS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. If a result overflows, it is saturated; for example, the absolute value of the most negative element is returned as the maximum positive value of the element type.\n//\n//go:linkname VqabsS32N VqabsS32N\n//go:noescape\nfunc VqabsS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Signed saturating Absolute value. 
This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. 
All the values in this instruction are signed integer values. If a result overflows, it is saturated; for example, the absolute value of the most negative element is returned as the maximum positive value of the element type.\n//\n//go:linkname VqabsS64N VqabsS64N\n//go:noescape\nfunc VqabsS64N(r *arm.Int64, v0 *arm.Int64, n int32)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. If a result overflows, it is saturated; for example, the absolute value of the most negative element is returned as the maximum positive value of the element type.\n//\n//go:linkname VqabsbS8N VqabsbS8N\n//go:noescape\nfunc VqabsbS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. If a result overflows, it is saturated; for example, the absolute value of the most negative element is returned as the maximum positive value of the element type.\n//\n//go:linkname VqabsdS64N VqabsdS64N\n//go:noescape\nfunc VqabsdS64N(r *arm.Int64, v0 *arm.Int64, n int32)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. If a result overflows, it is saturated; for example, the absolute value of the most negative element is returned as the maximum positive value of the element type.\n//\n//go:linkname VqabshS16N VqabshS16N\n//go:noescape\nfunc VqabshS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Signed saturating Absolute value. 
This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. If a result overflows, it is saturated; for example, the absolute value of the most negative element is returned as the maximum positive value of the element type.\n//\n//go:linkname VqabsqS8N VqabsqS8N\n//go:noescape\nfunc VqabsqS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. 
All the values in this instruction are signed integer values. If a result overflows, it is saturated; for example, the absolute value of the most negative element is returned as the maximum positive value of the element type.\n//\n//go:linkname VqabsqS16N VqabsqS16N\n//go:noescape\nfunc VqabsqS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. If a result overflows, it is saturated; for example, the absolute value of the most negative element is returned as the maximum positive value of the element type.\n//\n//go:linkname VqabsqS32N VqabsqS32N\n//go:noescape\nfunc VqabsqS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. If a result overflows, it is saturated; for example, the absolute value of the most negative element is returned as the maximum positive value of the element type.\n//\n//go:linkname VqabsqS64N VqabsqS64N\n//go:noescape\nfunc VqabsqS64N(r *arm.Int64, v0 *arm.Int64, n int32)\n\n// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. If a result overflows, it is saturated; for example, the absolute value of the most negative element is returned as the maximum positive value of the element type.\n//\n//go:linkname VqabssS32N VqabssS32N\n//go:noescape\nfunc VqabssS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Signed saturating Add. 
This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. Results that overflow are saturated to the range of the element type.\n//\n//go:linkname VqaddS8N VqaddS8N\n//go:noescape\nfunc VqaddS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed saturating Add. 
This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. Results that overflow are saturated to the range of the element type.\n//\n//go:linkname VqaddS16N VqaddS16N\n//go:noescape\nfunc VqaddS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. Results that overflow are saturated to the range of the element type.\n//\n//go:linkname VqaddS32N VqaddS32N\n//go:noescape\nfunc VqaddS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. Results that overflow are saturated to the range of the element type.\n//\n//go:linkname VqaddS64N VqaddS64N\n//go:noescape\nfunc VqaddS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. Results that overflow are saturated to the range of the element type.\n//\n//go:linkname VqaddU8N VqaddU8N\n//go:noescape\nfunc VqaddU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. Results that overflow are saturated to the range of the element type.\n//\n//go:linkname VqaddU16N VqaddU16N\n//go:noescape\nfunc VqaddU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unsigned saturating Add. 
This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. Results that overflow are saturated to the range of the element type.\n//\n//go:linkname VqaddU32N VqaddU32N\n//go:noescape\nfunc VqaddU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. Results that overflow are saturated to the range of the element type.\n//\n//go:linkname VqaddU64N VqaddU64N\n//go:noescape\nfunc VqaddU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. Results that overflow are saturated to the range of the element type.\n//\n//go:linkname VqaddbS8N VqaddbS8N\n//go:noescape\nfunc VqaddbS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. Results that overflow are saturated to the range of the element type.\n//\n//go:linkname VqaddbU8N VqaddbU8N\n//go:noescape\nfunc VqaddbU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. Results that overflow are saturated to the range of the element type.\n//\n//go:linkname VqadddS64N VqadddS64N\n//go:noescape\nfunc VqadddS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Unsigned saturating Add. 
This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqadddU64N VqadddU64N\n//go:noescape\nfunc VqadddU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddhS16N VqaddhS16N\n//go:noescape\nfunc VqaddhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddhU16N VqaddhU16N\n//go:noescape\nfunc VqaddhU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddqS8N VqaddqS8N\n//go:noescape\nfunc VqaddqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddqS16N VqaddqS16N\n//go:noescape\nfunc VqaddqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed saturating Add. 
This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddqS32N VqaddqS32N\n//go:noescape\nfunc VqaddqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddqS64N VqaddqS64N\n//go:noescape\nfunc VqaddqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddqU8N VqaddqU8N\n//go:noescape\nfunc VqaddqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddqU16N VqaddqU16N\n//go:noescape\nfunc VqaddqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddqU32N VqaddqU32N\n//go:noescape\nfunc VqaddqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Unsigned saturating Add. 
This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddqU64N VqaddqU64N\n//go:noescape\nfunc VqaddqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddsS32N VqaddsS32N\n//go:noescape\nfunc VqaddsS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqaddsU32N VqaddsU32N\n//go:noescape\nfunc VqaddsU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqdmulhS16N VqdmulhS16N\n//go:noescape\nfunc VqdmulhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqdmulhS32N VqdmulhS32N\n//go:noescape\nfunc VqdmulhS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Signed saturating Doubling Multiply returning High half. 
This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqdmulhhS16N VqdmulhhS16N\n//go:noescape\nfunc VqdmulhhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqdmulhqS16N VqdmulhqS16N\n//go:noescape\nfunc VqdmulhqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqdmulhqS32N VqdmulhqS32N\n//go:noescape\nfunc VqdmulhqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqdmulhsS32N VqdmulhsS32N\n//go:noescape\nfunc VqdmulhsS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. 
All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegS8N VqnegS8N\n//go:noescape\nfunc VqnegS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegS16N VqnegS16N\n//go:noescape\nfunc VqnegS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegS32N VqnegS32N\n//go:noescape\nfunc VqnegS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegS64N VqnegS64N\n//go:noescape\nfunc VqnegS64N(r *arm.Int64, v0 *arm.Int64, n int32)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegbS8N VqnegbS8N\n//go:noescape\nfunc VqnegbS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. 
All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegdS64N VqnegdS64N\n//go:noescape\nfunc VqnegdS64N(r *arm.Int64, v0 *arm.Int64, n int32)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqneghS16N VqneghS16N\n//go:noescape\nfunc VqneghS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegqS8N VqnegqS8N\n//go:noescape\nfunc VqnegqS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegqS16N VqnegqS16N\n//go:noescape\nfunc VqnegqS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegqS32N VqnegqS32N\n//go:noescape\nfunc VqnegqS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. 
All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegqS64N VqnegqS64N\n//go:noescape\nfunc VqnegqS64N(r *arm.Int64, v0 *arm.Int64, n int32)\n\n// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.\n//\n//go:linkname VqnegsS32N VqnegsS32N\n//go:noescape\nfunc VqnegsS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded.\n//\n//go:linkname VqrdmulhS16N VqrdmulhS16N\n//go:noescape\nfunc VqrdmulhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded.\n//\n//go:linkname VqrdmulhS32N VqrdmulhS32N\n//go:noescape\nfunc VqrdmulhS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Signed saturating Rounding Doubling Multiply returning High half. 
This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded.\n//\n//go:linkname VqrdmulhhS16N VqrdmulhhS16N\n//go:noescape\nfunc VqrdmulhhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded.\n//\n//go:linkname VqrdmulhqS16N VqrdmulhqS16N\n//go:noescape\nfunc VqrdmulhqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded.\n//\n//go:linkname VqrdmulhqS32N VqrdmulhqS32N\n//go:noescape\nfunc VqrdmulhqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded.\n//\n//go:linkname VqrdmulhsS32N VqrdmulhsS32N\n//go:noescape\nfunc VqrdmulhsS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Signed saturating Rounding Shift Left (register). 
This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlS8N VqrshlS8N\n//go:noescape\nfunc VqrshlS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlS16N VqrshlS16N\n//go:noescape\nfunc VqrshlS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlS32N VqrshlS32N\n//go:noescape\nfunc VqrshlS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlS64N VqrshlS64N\n//go:noescape\nfunc VqrshlS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Signed saturating Rounding Shift Left (register). 
This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlbS8N VqrshlbS8N\n//go:noescape\nfunc VqrshlbS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshldS64N VqrshldS64N\n//go:noescape\nfunc VqrshldS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlhS16N VqrshlhS16N\n//go:noescape\nfunc VqrshlhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlqS8N VqrshlqS8N\n//go:noescape\nfunc VqrshlqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed saturating Rounding Shift Left (register). 
This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlqS16N VqrshlqS16N\n//go:noescape\nfunc VqrshlqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlqS32N VqrshlqS32N\n//go:noescape\nfunc VqrshlqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlqS64N VqrshlqS64N\n//go:noescape\nfunc VqrshlqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqrshlsS32N VqrshlsS32N\n//go:noescape\nfunc VqrshlsS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Signed saturating Shift Left (register). 
This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlS8N VqshlS8N\n//go:noescape\nfunc VqshlS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlS16N VqshlS16N\n//go:noescape\nfunc VqshlS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlS32N VqshlS32N\n//go:noescape\nfunc VqshlS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlS64N VqshlS64N\n//go:noescape\nfunc VqshlS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Signed saturating Shift Left (register). 
This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlbS8N VqshlbS8N\n//go:noescape\nfunc VqshlbS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshldS64N VqshldS64N\n//go:noescape\nfunc VqshldS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlhS16N VqshlhS16N\n//go:noescape\nfunc VqshlhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlqS8N VqshlqS8N\n//go:noescape\nfunc VqshlqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed saturating Shift Left (register). 
This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlqS16N VqshlqS16N\n//go:noescape\nfunc VqshlqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlqS32N VqshlqS32N\n//go:noescape\nfunc VqshlqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlqS64N VqshlqS64N\n//go:noescape\nfunc VqshlqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqshlsS32N VqshlsS32N\n//go:noescape\nfunc VqshlsS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Signed saturating Subtract. 
This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubS8N VqsubS8N\n//go:noescape\nfunc VqsubS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubS16N VqsubS16N\n//go:noescape\nfunc VqsubS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubS32N VqsubS32N\n//go:noescape\nfunc VqsubS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubS64N VqsubS64N\n//go:noescape\nfunc VqsubS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Unsigned saturating Subtract. 
This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubU8N VqsubU8N\n//go:noescape\nfunc VqsubU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubU16N VqsubU16N\n//go:noescape\nfunc VqsubU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubU32N VqsubU32N\n//go:noescape\nfunc VqsubU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubU64N VqsubU64N\n//go:noescape\nfunc VqsubU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Signed saturating Subtract. 
This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubbS8N VqsubbS8N\n//go:noescape\nfunc VqsubbS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubbU8N VqsubbU8N\n//go:noescape\nfunc VqsubbU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubdS64N VqsubdS64N\n//go:noescape\nfunc VqsubdS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubdU64N VqsubdU64N\n//go:noescape\nfunc VqsubdU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Signed saturating Subtract. 
This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubhS16N VqsubhS16N\n//go:noescape\nfunc VqsubhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubhU16N VqsubhU16N\n//go:noescape\nfunc VqsubhU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubqS8N VqsubqS8N\n//go:noescape\nfunc VqsubqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubqS16N VqsubqS16N\n//go:noescape\nfunc VqsubqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed saturating Subtract. 
This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubqS32N VqsubqS32N\n//go:noescape\nfunc VqsubqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubqS64N VqsubqS64N\n//go:noescape\nfunc VqsubqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubqU8N VqsubqU8N\n//go:noescape\nfunc VqsubqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubqU16N VqsubqU16N\n//go:noescape\nfunc VqsubqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unsigned saturating Subtract. 
This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubqU32N VqsubqU32N\n//go:noescape\nfunc VqsubqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubqU64N VqsubqU64N\n//go:noescape\nfunc VqsubqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubsS32N VqsubsS32N\n//go:noescape\nfunc VqsubsS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VqsubsU32N VqsubsU32N\n//go:noescape\nfunc VqsubsU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. 
If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vqtbl1QU8N Vqtbl1QU8N\n//go:noescape\nfunc Vqtbl1QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Rotate and Exclusive OR rotates each 64-bit element of the 128-bit vector in a source SIMD&FP register left by 1, performs a bitwise exclusive OR of the resulting 128-bit vector and the vector in another source SIMD&FP register, and writes the result to the destination SIMD&FP register.\n//\n//go:linkname Vrax1QU64N Vrax1QU64N\n//go:noescape\nfunc Vrax1QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrbitS8N VrbitS8N\n//go:noescape\nfunc VrbitS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrbitU8N VrbitU8N\n//go:noescape\nfunc VrbitU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrbitqS8N VrbitqS8N\n//go:noescape\nfunc VrbitqS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Reverse Bit order (vector). 
This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrbitqU8N VrbitqU8N\n//go:noescape\nfunc VrbitqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Unsigned Reciprocal Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse for the unsigned integer value, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpeU32N VrecpeU32N\n//go:noescape\nfunc VrecpeU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpeF32N VrecpeF32N\n//go:noescape\nfunc VrecpeF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpeF64N VrecpeF64N\n//go:noescape\nfunc VrecpeF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpedF64N VrecpedF64N\n//go:noescape\nfunc VrecpedF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Unsigned Reciprocal Estimate. 
This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse for the unsigned integer value, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpeqU32N VrecpeqU32N\n//go:noescape\nfunc VrecpeqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpeqF32N VrecpeqF32N\n//go:noescape\nfunc VrecpeqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpeqF64N VrecpeqF64N\n//go:noescape\nfunc VrecpeqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpesF32N VrecpesF32N\n//go:noescape\nfunc VrecpesF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpsF32N VrecpsF32N\n//go:noescape\nfunc VrecpsF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Reciprocal Step. 
This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpsF64N VrecpsF64N\n//go:noescape\nfunc VrecpsF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpsdF64N VrecpsdF64N\n//go:noescape\nfunc VrecpsdF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpsqF32N VrecpsqF32N\n//go:noescape\nfunc VrecpsqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpsqF64N VrecpsqF64N\n//go:noescape\nfunc VrecpsqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Reciprocal Step. 
This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpssF32N VrecpssF32N\n//go:noescape\nfunc VrecpssF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Reciprocal exponent (scalar). This instruction finds an approximate reciprocal exponent for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpxdF64N VrecpxdF64N\n//go:noescape\nfunc VrecpxdF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Reciprocal exponent (scalar). This instruction finds an approximate reciprocal exponent for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrecpxsF32N VrecpxsF32N\n//go:noescape\nfunc VrecpxsF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF32S32N VreinterpretF32S32N\n//go:noescape\nfunc VreinterpretF32S32N(r *arm.Float32, v0 *arm.Int32, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF32U32N VreinterpretF32U32N\n//go:noescape\nfunc VreinterpretF32U32N(r *arm.Float32, v0 *arm.Uint32, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF64S64N VreinterpretF64S64N\n//go:noescape\nfunc VreinterpretF64S64N(r *arm.Float64, v0 *arm.Int64, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretF64U64N VreinterpretF64U64N\n//go:noescape\nfunc VreinterpretF64U64N(r *arm.Float64, v0 *arm.Uint64, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS16U16N VreinterpretS16U16N\n//go:noescape\nfunc 
VreinterpretS16U16N(r *arm.Int16, v0 *arm.Uint16, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS32U32N VreinterpretS32U32N\n//go:noescape\nfunc VreinterpretS32U32N(r *arm.Int32, v0 *arm.Uint32, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS32F32N VreinterpretS32F32N\n//go:noescape\nfunc VreinterpretS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS64U64N VreinterpretS64U64N\n//go:noescape\nfunc VreinterpretS64U64N(r *arm.Int64, v0 *arm.Uint64, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS64F64N VreinterpretS64F64N\n//go:noescape\nfunc VreinterpretS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretS8U8N VreinterpretS8U8N\n//go:noescape\nfunc VreinterpretS8U8N(r *arm.Int8, v0 *arm.Uint8, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU16S16N VreinterpretU16S16N\n//go:noescape\nfunc VreinterpretU16S16N(r *arm.Uint16, v0 *arm.Int16, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU32S32N VreinterpretU32S32N\n//go:noescape\nfunc VreinterpretU32S32N(r *arm.Uint32, v0 *arm.Int32, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU32F32N VreinterpretU32F32N\n//go:noescape\nfunc VreinterpretU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU64S64N VreinterpretU64S64N\n//go:noescape\nfunc VreinterpretU64S64N(r *arm.Uint64, v0 *arm.Int64, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU64F64N VreinterpretU64F64N\n//go:noescape\nfunc VreinterpretU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretU8S8N VreinterpretU8S8N\n//go:noescape\nfunc VreinterpretU8S8N(r *arm.Uint8, v0 
*arm.Int8, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF32S32N VreinterpretqF32S32N\n//go:noescape\nfunc VreinterpretqF32S32N(r *arm.Float32, v0 *arm.Int32, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF32U32N VreinterpretqF32U32N\n//go:noescape\nfunc VreinterpretqF32U32N(r *arm.Float32, v0 *arm.Uint32, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF64S64N VreinterpretqF64S64N\n//go:noescape\nfunc VreinterpretqF64S64N(r *arm.Float64, v0 *arm.Int64, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqF64U64N VreinterpretqF64U64N\n//go:noescape\nfunc VreinterpretqF64U64N(r *arm.Float64, v0 *arm.Uint64, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS16U16N VreinterpretqS16U16N\n//go:noescape\nfunc VreinterpretqS16U16N(r *arm.Int16, v0 *arm.Uint16, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS32U32N VreinterpretqS32U32N\n//go:noescape\nfunc VreinterpretqS32U32N(r *arm.Int32, v0 *arm.Uint32, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS32F32N VreinterpretqS32F32N\n//go:noescape\nfunc VreinterpretqS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS64U64N VreinterpretqS64U64N\n//go:noescape\nfunc VreinterpretqS64U64N(r *arm.Int64, v0 *arm.Uint64, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS64F64N VreinterpretqS64F64N\n//go:noescape\nfunc VreinterpretqS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqS8U8N VreinterpretqS8U8N\n//go:noescape\nfunc VreinterpretqS8U8N(r *arm.Int8, v0 *arm.Uint8, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU16S16N VreinterpretqU16S16N\n//go:noescape\nfunc VreinterpretqU16S16N(r *arm.Uint16, 
v0 *arm.Int16, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU32S32N VreinterpretqU32S32N\n//go:noescape\nfunc VreinterpretqU32S32N(r *arm.Uint32, v0 *arm.Int32, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU32F32N VreinterpretqU32F32N\n//go:noescape\nfunc VreinterpretqU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU64S64N VreinterpretqU64S64N\n//go:noescape\nfunc VreinterpretqU64S64N(r *arm.Uint64, v0 *arm.Int64, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU64F64N VreinterpretqU64F64N\n//go:noescape\nfunc VreinterpretqU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)\n\n// Vector reinterpret cast operation\n//\n//go:linkname VreinterpretqU8S8N VreinterpretqU8S8N\n//go:noescape\nfunc VreinterpretqU8S8N(r *arm.Uint8, v0 *arm.Int8, n int32)\n\n// Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev16S8N Vrev16S8N\n//go:noescape\nfunc Vrev16S8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev16U8N Vrev16U8N\n//go:noescape\nfunc Vrev16U8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Reverse elements in 16-bit halfwords (vector). 
This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev16QS8N Vrev16QS8N\n//go:noescape\nfunc Vrev16QS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev16QU8N Vrev16QU8N\n//go:noescape\nfunc Vrev16QU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev32S8N Vrev32S8N\n//go:noescape\nfunc Vrev32S8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev32S16N Vrev32S16N\n//go:noescape\nfunc Vrev32S16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev32U8N Vrev32U8N\n//go:noescape\nfunc Vrev32U8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Reverse elements in 32-bit words (vector). 
This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev32U16N Vrev32U16N\n//go:noescape\nfunc Vrev32U16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev32QS8N Vrev32QS8N\n//go:noescape\nfunc Vrev32QS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev32QS16N Vrev32QS16N\n//go:noescape\nfunc Vrev32QS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev32QU8N Vrev32QU8N\n//go:noescape\nfunc Vrev32QU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev32QU16N Vrev32QU16N\n//go:noescape\nfunc Vrev32QU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Reverse elements in 64-bit doublewords (vector). 
This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64S8N Vrev64S8N\n//go:noescape\nfunc Vrev64S8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64S16N Vrev64S16N\n//go:noescape\nfunc Vrev64S16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64S32N Vrev64S32N\n//go:noescape\nfunc Vrev64S32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64U8N Vrev64U8N\n//go:noescape\nfunc Vrev64U8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64U16N Vrev64U16N\n//go:noescape\nfunc Vrev64U16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Reverse elements in 64-bit doublewords (vector). 
This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64U32N Vrev64U32N\n//go:noescape\nfunc Vrev64U32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64F32N Vrev64F32N\n//go:noescape\nfunc Vrev64F32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64QS8N Vrev64QS8N\n//go:noescape\nfunc Vrev64QS8N(r *arm.Int8, v0 *arm.Int8, n int32)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64QS16N Vrev64QS16N\n//go:noescape\nfunc Vrev64QS16N(r *arm.Int16, v0 *arm.Int16, n int32)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64QS32N Vrev64QS32N\n//go:noescape\nfunc Vrev64QS32N(r *arm.Int32, v0 *arm.Int32, n int32)\n\n// Reverse elements in 64-bit doublewords (vector). 
This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64QU8N Vrev64QU8N\n//go:noescape\nfunc Vrev64QU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64QU16N Vrev64QU16N\n//go:noescape\nfunc Vrev64QU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64QU32N Vrev64QU32N\n//go:noescape\nfunc Vrev64QU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vrev64QF32N Vrev64QF32N\n//go:noescape\nfunc Vrev64QF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded: each element is (a + b + 1) >> 1, computed without intermediate overflow.\n//\n//go:linkname VrhaddS8N VrhaddS8N\n//go:noescape\nfunc VrhaddS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed Rounding Halving Add. 
This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded: each element is (a + b + 1) >> 1, computed without intermediate overflow.\n//\n//go:linkname VrhaddS16N VrhaddS16N\n//go:noescape\nfunc VrhaddS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded: each element is (a + b + 1) >> 1, computed without intermediate overflow.\n//\n//go:linkname VrhaddS32N VrhaddS32N\n//go:noescape\nfunc VrhaddS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded: each element is (a + b + 1) >> 1, computed without intermediate overflow.\n//\n//go:linkname VrhaddU8N VrhaddU8N\n//go:noescape\nfunc VrhaddU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded: each element is (a + b + 1) >> 1, computed without intermediate overflow.\n//\n//go:linkname VrhaddU16N VrhaddU16N\n//go:noescape\nfunc VrhaddU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded: each element is (a + b + 1) >> 1, computed without intermediate overflow.\n//\n//go:linkname VrhaddU32N VrhaddU32N\n//go:noescape\nfunc VrhaddU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Signed Rounding Halving Add. 
This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded: each element is (a + b + 1) >> 1, computed without intermediate overflow.\n//\n//go:linkname VrhaddqS8N VrhaddqS8N\n//go:noescape\nfunc VrhaddqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded: each element is (a + b + 1) >> 1, computed without intermediate overflow.\n//\n//go:linkname VrhaddqS16N VrhaddqS16N\n//go:noescape\nfunc VrhaddqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded: each element is (a + b + 1) >> 1, computed without intermediate overflow.\n//\n//go:linkname VrhaddqS32N VrhaddqS32N\n//go:noescape\nfunc VrhaddqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded: each element is (a + b + 1) >> 1, computed without intermediate overflow.\n//\n//go:linkname VrhaddqU8N VrhaddqU8N\n//go:noescape\nfunc VrhaddqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded: each element is (a + b + 1) >> 1, computed without intermediate overflow.\n//\n//go:linkname VrhaddqU16N VrhaddqU16N\n//go:noescape\nfunc VrhaddqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unsigned Rounding Halving Add. 
This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. The results are rounded: each element is (a + b + 1) >> 1, computed without intermediate overflow.\n//\n//go:linkname VrhaddqU32N VrhaddqU32N\n//go:noescape\nfunc VrhaddqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndF32N VrndF32N\n//go:noescape\nfunc VrndF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndF64N VrndF64N\n//go:noescape\nfunc VrndF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd32XF32N Vrnd32XF32N\n//go:noescape\nfunc Vrnd32XF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to 32-bit Integer, using current rounding mode (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd32XF64N Vrnd32XF64N\n//go:noescape\nfunc Vrnd32XF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd32XqF32N Vrnd32XqF32N\n//go:noescape\nfunc Vrnd32XqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd32XqF64N Vrnd32XqF64N\n//go:noescape\nfunc Vrnd32XqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd32ZF32N Vrnd32ZF32N\n//go:noescape\nfunc Vrnd32ZF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to 32-bit Integer toward Zero (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd32ZF64N Vrnd32ZF64N\n//go:noescape\nfunc Vrnd32ZF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd32ZqF32N Vrnd32ZqF32N\n//go:noescape\nfunc Vrnd32ZqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd32ZqF64N Vrnd32ZqF64N\n//go:noescape\nfunc Vrnd32ZqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd64XF32N Vrnd64XF32N\n//go:noescape\nfunc Vrnd64XF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to 64-bit Integer, using current rounding mode (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd64XF64N Vrnd64XF64N\n//go:noescape\nfunc Vrnd64XF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd64XqF32N Vrnd64XqF32N\n//go:noescape\nfunc Vrnd64XqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd64XqF64N Vrnd64XqF64N\n//go:noescape\nfunc Vrnd64XqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd64ZF32N Vrnd64ZF32N\n//go:noescape\nfunc Vrnd64ZF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to 64-bit Integer toward Zero (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd64ZF64N Vrnd64ZF64N\n//go:noescape\nfunc Vrnd64ZF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd64ZqF32N Vrnd64ZqF32N\n//go:noescape\nfunc Vrnd64ZqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname Vrnd64ZqF64N Vrnd64ZqF64N\n//go:noescape\nfunc Vrnd64ZqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndaF32N VrndaF32N\n//go:noescape\nfunc VrndaF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to Integral, to nearest with ties to Away (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndaF64N VrndaF64N\n//go:noescape\nfunc VrndaF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndaqF32N VrndaqF32N\n//go:noescape\nfunc VrndaqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndaqF64N VrndaqF64N\n//go:noescape\nfunc VrndaqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndiF32N VrndiF32N\n//go:noescape\nfunc VrndiF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to Integral, using current rounding mode (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndiF64N VrndiF64N\n//go:noescape\nfunc VrndiF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndiqF32N VrndiqF32N\n//go:noescape\nfunc VrndiqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndiqF64N VrndiqF64N\n//go:noescape\nfunc VrndiqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndmF32N VrndmF32N\n//go:noescape\nfunc VrndmF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to Integral, toward Minus infinity (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndmF64N VrndmF64N\n//go:noescape\nfunc VrndmF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndmqF32N VrndmqF32N\n//go:noescape\nfunc VrndmqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndmqF64N VrndmqF64N\n//go:noescape\nfunc VrndmqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndnF32N VrndnF32N\n//go:noescape\nfunc VrndnF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to Integral, to nearest with ties to even (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndnF64N VrndnF64N\n//go:noescape\nfunc VrndnF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndnqF32N VrndnqF32N\n//go:noescape\nfunc VrndnqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndnqF64N VrndnqF64N\n//go:noescape\nfunc VrndnqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndnsF32N VrndnsF32N\n//go:noescape\nfunc VrndnsF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to Integral, toward Plus infinity (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndpF32N VrndpF32N\n//go:noescape\nfunc VrndpF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndpF64N VrndpF64N\n//go:noescape\nfunc VrndpF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndpqF32N VrndpqF32N\n//go:noescape\nfunc VrndpqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndpqF64N VrndpqF64N\n//go:noescape\nfunc VrndpqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to Integral, toward Zero (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndqF32N VrndqF32N\n//go:noescape\nfunc VrndqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndqF64N VrndqF64N\n//go:noescape\nfunc VrndqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndxF32N VrndxF32N\n//go:noescape\nfunc VrndxF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndxF64N VrndxF64N\n//go:noescape\nfunc VrndxF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Round to Integral exact, using current rounding mode (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndxqF32N VrndxqF32N\n//go:noescape\nfunc VrndxqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.\n//\n//go:linkname VrndxqF64N VrndxqF64N\n//go:noescape\nfunc VrndxqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlS8N VrshlS8N\n//go:noescape\nfunc VrshlS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlS16N VrshlS16N\n//go:noescape\nfunc VrshlS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed Rounding Shift Left (register). 
This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlS32N VrshlS32N\n//go:noescape\nfunc VrshlS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlS64N VrshlS64N\n//go:noescape\nfunc VrshlS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshldS64N VrshldS64N\n//go:noescape\nfunc VrshldS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlqS8N VrshlqS8N\n//go:noescape\nfunc VrshlqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed Rounding Shift Left (register). 
This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlqS16N VrshlqS16N\n//go:noescape\nfunc VrshlqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlqS32N VrshlqS32N\n//go:noescape\nfunc VrshlqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrshlqS64N VrshlqS64N\n//go:noescape\nfunc VrshlqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Unsigned Reciprocal Square Root Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse square root for each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VrsqrteU32N VrsqrteU32N\n//go:noescape\nfunc VrsqrteU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Floating-point Reciprocal Square Root Estimate. 
This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrteF32N VrsqrteF32N\n//go:noescape\nfunc VrsqrteF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrteF64N VrsqrteF64N\n//go:noescape\nfunc VrsqrteF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrtedF64N VrsqrtedF64N\n//go:noescape\nfunc VrsqrtedF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Unsigned Reciprocal Square Root Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse square root for each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.\n//\n//go:linkname VrsqrteqU32N VrsqrteqU32N\n//go:noescape\nfunc VrsqrteqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrteqF32N VrsqrteqF32N\n//go:noescape\nfunc VrsqrteqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Reciprocal Square Root Estimate. 
This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrteqF64N VrsqrteqF64N\n//go:noescape\nfunc VrsqrteqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrtesF32N VrsqrtesF32N\n//go:noescape\nfunc VrsqrtesF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrtsF32N VrsqrtsF32N\n//go:noescape\nfunc VrsqrtsF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrtsF64N VrsqrtsF64N\n//go:noescape\nfunc VrsqrtsF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Reciprocal Square Root Step. 
This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrtsdF64N VrsqrtsdF64N\n//go:noescape\nfunc VrsqrtsdF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrtsqF32N VrsqrtsqF32N\n//go:noescape\nfunc VrsqrtsqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrtsqF64N VrsqrtsqF64N\n//go:noescape\nfunc VrsqrtsqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Floating-point Reciprocal Square Root Step. 
This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VrsqrtssF32N VrsqrtssF32N\n//go:noescape\nfunc VrsqrtssF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// SHA1 fixed rotate.\n//\n//go:linkname Vsha1HU32N Vsha1HU32N\n//go:noescape\nfunc Vsha1HU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)\n\n// SHA1 schedule update 1.\n//\n//go:linkname Vsha1Su1QU32N Vsha1Su1QU32N\n//go:noescape\nfunc Vsha1Su1QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// SHA256 schedule update 0.\n//\n//go:linkname Vsha256Su0QU32N Vsha256Su0QU32N\n//go:noescape\nfunc Vsha256Su0QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// SHA512 Schedule Update 0 takes the values from the two 128-bit source SIMD&FP registers and produces a 128-bit output value that combines the gamma0 functions of two iterations of the SHA512 schedule update that are performed after the first 16 iterations within a block. It returns this value to the destination SIMD&FP register.\n//\n//go:linkname Vsha512Su0QU64N Vsha512Su0QU64N\n//go:noescape\nfunc Vsha512Su0QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlS8N VshlS8N\n//go:noescape\nfunc VshlS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed Shift Left (register). 
This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlS16N VshlS16N\n//go:noescape\nfunc VshlS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlS32N VshlS32N\n//go:noescape\nfunc VshlS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlS64N VshlS64N\n//go:noescape\nfunc VshlS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshldS64N VshldS64N\n//go:noescape\nfunc VshldS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Signed Shift Left (register). 
This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlqS8N VshlqS8N\n//go:noescape\nfunc VshlqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlqS16N VshlqS16N\n//go:noescape\nfunc VshlqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlqS32N VshlqS32N\n//go:noescape\nfunc VshlqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Signed Shift Left (register). 
This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VshlqS64N VshlqS64N\n//go:noescape\nfunc VshlqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// SM4 Key takes an input as a 128-bit vector from the first source SIMD&FP register and a 128-bit constant from the second SIMD&FP register. It derives four iterations of the output key, in accordance with the SM4 standard, returning the 128-bit result to the destination SIMD&FP register.\n//\n//go:linkname Vsm4EkeyqU32N Vsm4EkeyqU32N\n//go:noescape\nfunc Vsm4EkeyqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// SM4 Encode takes input data as a 128-bit vector from the first source SIMD&FP register, and four iterations of the round key held as the elements of the 128-bit vector in the second source SIMD&FP register. It encrypts the data by four rounds, in accordance with the SM4 standard, returning the 128-bit result to the destination SIMD&FP register.\n//\n//go:linkname Vsm4EqU32N Vsm4EqU32N\n//go:noescape\nfunc Vsm4EqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsqrtF32N VsqrtF32N\n//go:noescape\nfunc VsqrtF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Square Root (vector). 
This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsqrtF64N VsqrtF64N\n//go:noescape\nfunc VsqrtF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsqrtqF32N VsqrtqF32N\n//go:noescape\nfunc VsqrtqF32N(r *arm.Float32, v0 *arm.Float32, n int32)\n\n// Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsqrtqF64N VsqrtqF64N\n//go:noescape\nfunc VsqrtqF64N(r *arm.Float64, v0 *arm.Float64, n int32)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubS8N VsubS8N\n//go:noescape\nfunc VsubS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubS16N VsubS16N\n//go:noescape\nfunc VsubS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Subtract (vector). 
This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubS32N VsubS32N\n//go:noescape\nfunc VsubS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubS64N VsubS64N\n//go:noescape\nfunc VsubS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubU8N VsubU8N\n//go:noescape\nfunc VsubU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubU16N VsubU16N\n//go:noescape\nfunc VsubU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubU32N VsubU32N\n//go:noescape\nfunc VsubU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Subtract (vector). 
This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubU64N VsubU64N\n//go:noescape\nfunc VsubU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubF32N VsubF32N\n//go:noescape\nfunc VsubF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubF64N VsubF64N\n//go:noescape\nfunc VsubF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubdS64N VsubdS64N\n//go:noescape\nfunc VsubdS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Subtract (vector). 
This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubdU64N VsubdU64N\n//go:noescape\nfunc VsubdU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubqS8N VsubqS8N\n//go:noescape\nfunc VsubqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubqS16N VsubqS16N\n//go:noescape\nfunc VsubqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubqS32N VsubqS32N\n//go:noescape\nfunc VsubqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Subtract (vector). 
This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubqS64N VsubqS64N\n//go:noescape\nfunc VsubqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubqU8N VsubqU8N\n//go:noescape\nfunc VsubqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubqU16N VsubqU16N\n//go:noescape\nfunc VsubqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubqU32N VsubqU32N\n//go:noescape\nfunc VsubqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Subtract (vector). 
This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubqU64N VsubqU64N\n//go:noescape\nfunc VsubqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubqF32N VsubqF32N\n//go:noescape\nfunc VsubqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname VsubqF64N VsubqF64N\n//go:noescape\nfunc VsubqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbl1S8N Vtbl1S8N\n//go:noescape\nfunc Vtbl1S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Table vector Lookup. 
This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.\n//\n//go:linkname Vtbl1U8N Vtbl1U8N\n//go:noescape\nfunc Vtbl1U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1S8N Vtrn1S8N\n//go:noescape\nfunc Vtrn1S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1S16N Vtrn1S16N\n//go:noescape\nfunc Vtrn1S16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Transpose vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1S32N Vtrn1S32N\n//go:noescape\nfunc Vtrn1S32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1U8N Vtrn1U8N\n//go:noescape\nfunc Vtrn1U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1U16N Vtrn1U16N\n//go:noescape\nfunc Vtrn1U16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Transpose vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1U32N Vtrn1U32N\n//go:noescape\nfunc Vtrn1U32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1F32N Vtrn1F32N\n//go:noescape\nfunc Vtrn1F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QS8N Vtrn1QS8N\n//go:noescape\nfunc Vtrn1QS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Transpose vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QS16N Vtrn1QS16N\n//go:noescape\nfunc Vtrn1QS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QS32N Vtrn1QS32N\n//go:noescape\nfunc Vtrn1QS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QS64N Vtrn1QS64N\n//go:noescape\nfunc Vtrn1QS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Transpose vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QU8N Vtrn1QU8N\n//go:noescape\nfunc Vtrn1QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QU16N Vtrn1QU16N\n//go:noescape\nfunc Vtrn1QU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QU32N Vtrn1QU32N\n//go:noescape\nfunc Vtrn1QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Transpose vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QU64N Vtrn1QU64N\n//go:noescape\nfunc Vtrn1QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QF32N Vtrn1QF32N\n//go:noescape\nfunc Vtrn1QF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn1QF64N Vtrn1QF64N\n//go:noescape\nfunc Vtrn1QF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Transpose vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2S8N Vtrn2S8N\n//go:noescape\nfunc Vtrn2S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2S16N Vtrn2S16N\n//go:noescape\nfunc Vtrn2S16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2S32N Vtrn2S32N\n//go:noescape\nfunc Vtrn2S32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Transpose vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2U8N Vtrn2U8N\n//go:noescape\nfunc Vtrn2U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2U16N Vtrn2U16N\n//go:noescape\nfunc Vtrn2U16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2U32N Vtrn2U32N\n//go:noescape\nfunc Vtrn2U32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Transpose vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2F32N Vtrn2F32N\n//go:noescape\nfunc Vtrn2F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QS8N Vtrn2QS8N\n//go:noescape\nfunc Vtrn2QS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QS16N Vtrn2QS16N\n//go:noescape\nfunc Vtrn2QS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Transpose vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QS32N Vtrn2QS32N\n//go:noescape\nfunc Vtrn2QS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QS64N Vtrn2QS64N\n//go:noescape\nfunc Vtrn2QS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QU8N Vtrn2QU8N\n//go:noescape\nfunc Vtrn2QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Transpose vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QU16N Vtrn2QU16N\n//go:noescape\nfunc Vtrn2QU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QU32N Vtrn2QU32N\n//go:noescape\nfunc Vtrn2QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QU64N Vtrn2QU64N\n//go:noescape\nfunc Vtrn2QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Transpose vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QF32N Vtrn2QF32N\n//go:noescape\nfunc Vtrn2QF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.\n//\n//go:linkname Vtrn2QF64N Vtrn2QF64N\n//go:noescape\nfunc Vtrn2QF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstS8N VtstS8N\n//go:noescape\nfunc VtstS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Compare bitwise Test bits nonzero (vector). 
This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstS16N VtstS16N\n//go:noescape\nfunc VtstS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstS32N VtstS32N\n//go:noescape\nfunc VtstS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstS64N VtstS64N\n//go:noescape\nfunc VtstS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Compare bitwise Test bits nonzero (vector). 
This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstU8N VtstU8N\n//go:noescape\nfunc VtstU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstU16N VtstU16N\n//go:noescape\nfunc VtstU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstU32N VtstU32N\n//go:noescape\nfunc VtstU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Compare bitwise Test bits nonzero (vector). 
This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstU64N VtstU64N\n//go:noescape\nfunc VtstU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstdS64N VtstdS64N\n//go:noescape\nfunc VtstdS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstdU64N VtstdU64N\n//go:noescape\nfunc VtstdU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Compare bitwise Test bits nonzero (vector). 
This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstqS8N VtstqS8N\n//go:noescape\nfunc VtstqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstqS16N VtstqS16N\n//go:noescape\nfunc VtstqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstqS32N VtstqS32N\n//go:noescape\nfunc VtstqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Compare bitwise Test bits nonzero (vector). 
This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstqS64N VtstqS64N\n//go:noescape\nfunc VtstqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstqU8N VtstqU8N\n//go:noescape\nfunc VtstqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstqU16N VtstqU16N\n//go:noescape\nfunc VtstqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Compare bitwise Test bits nonzero (vector). 
This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstqU32N VtstqU32N\n//go:noescape\nfunc VtstqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.\n//\n//go:linkname VtstqU64N VtstqU64N\n//go:noescape\nfunc VtstqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1S8N Vuzp1S8N\n//go:noescape\nfunc Vuzp1S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Unzip vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1S16N Vuzp1S16N\n//go:noescape\nfunc Vuzp1S16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1S32N Vuzp1S32N\n//go:noescape\nfunc Vuzp1S32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1U8N Vuzp1U8N\n//go:noescape\nfunc Vuzp1U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unzip vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1U16N Vuzp1U16N\n//go:noescape\nfunc Vuzp1U16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1U32N Vuzp1U32N\n//go:noescape\nfunc Vuzp1U32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1F32N Vuzp1F32N\n//go:noescape\nfunc Vuzp1F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Unzip vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QS8N Vuzp1QS8N\n//go:noescape\nfunc Vuzp1QS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QS16N Vuzp1QS16N\n//go:noescape\nfunc Vuzp1QS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QS32N Vuzp1QS32N\n//go:noescape\nfunc Vuzp1QS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unzip vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QS64N Vuzp1QS64N\n//go:noescape\nfunc Vuzp1QS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QU8N Vuzp1QU8N\n//go:noescape\nfunc Vuzp1QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QU16N Vuzp1QU16N\n//go:noescape\nfunc Vuzp1QU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unzip vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QU32N Vuzp1QU32N\n//go:noescape\nfunc Vuzp1QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QU64N Vuzp1QU64N\n//go:noescape\nfunc Vuzp1QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QF32N Vuzp1QF32N\n//go:noescape\nfunc Vuzp1QF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Unzip vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp1QF64N Vuzp1QF64N\n//go:noescape\nfunc Vuzp1QF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2S8N Vuzp2S8N\n//go:noescape\nfunc Vuzp2S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2S16N Vuzp2S16N\n//go:noescape\nfunc Vuzp2S16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Unzip vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2S32N Vuzp2S32N\n//go:noescape\nfunc Vuzp2S32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2U8N Vuzp2U8N\n//go:noescape\nfunc Vuzp2U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2U16N Vuzp2U16N\n//go:noescape\nfunc Vuzp2U16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unzip vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2U32N Vuzp2U32N\n//go:noescape\nfunc Vuzp2U32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2F32N Vuzp2F32N\n//go:noescape\nfunc Vuzp2F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QS8N Vuzp2QS8N\n//go:noescape\nfunc Vuzp2QS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Unzip vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QS16N Vuzp2QS16N\n//go:noescape\nfunc Vuzp2QS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QS32N Vuzp2QS32N\n//go:noescape\nfunc Vuzp2QS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QS64N Vuzp2QS64N\n//go:noescape\nfunc Vuzp2QS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Unzip vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QU8N Vuzp2QU8N\n//go:noescape\nfunc Vuzp2QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QU16N Vuzp2QU16N\n//go:noescape\nfunc Vuzp2QU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QU32N Vuzp2QU32N\n//go:noescape\nfunc Vuzp2QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Unzip vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QU64N Vuzp2QU64N\n//go:noescape\nfunc Vuzp2QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QF32N Vuzp2QF32N\n//go:noescape\nfunc Vuzp2QF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.\n//\n//go:linkname Vuzp2QF64N Vuzp2QF64N\n//go:noescape\nfunc Vuzp2QF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. 
The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1S8N Vzip1S8N\n//go:noescape\nfunc Vzip1S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1S16N Vzip1S16N\n//go:noescape\nfunc Vzip1S16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1S32N Vzip1S32N\n//go:noescape\nfunc Vzip1S32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1U8N Vzip1U8N\n//go:noescape\nfunc Vzip1U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Zip vectors (primary). 
This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1U16N Vzip1U16N\n//go:noescape\nfunc Vzip1U16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1U32N Vzip1U32N\n//go:noescape\nfunc Vzip1U32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1F32N Vzip1F32N\n//go:noescape\nfunc Vzip1F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. 
The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QS8N Vzip1QS8N\n//go:noescape\nfunc Vzip1QS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QS16N Vzip1QS16N\n//go:noescape\nfunc Vzip1QS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QS32N Vzip1QS32N\n//go:noescape\nfunc Vzip1QS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QS64N Vzip1QS64N\n//go:noescape\nfunc Vzip1QS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Zip vectors (primary). 
This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QU8N Vzip1QU8N\n//go:noescape\nfunc Vzip1QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QU16N Vzip1QU16N\n//go:noescape\nfunc Vzip1QU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QU32N Vzip1QU32N\n//go:noescape\nfunc Vzip1QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. 
The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QU64N Vzip1QU64N\n//go:noescape\nfunc Vzip1QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QF32N Vzip1QF32N\n//go:noescape\nfunc Vzip1QF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip1QF64N Vzip1QF64N\n//go:noescape\nfunc Vzip1QF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2S8N Vzip2S8N\n//go:noescape\nfunc Vzip2S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Zip vectors (secondary). 
This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2S16N Vzip2S16N\n//go:noescape\nfunc Vzip2S16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2S32N Vzip2S32N\n//go:noescape\nfunc Vzip2S32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2U8N Vzip2U8N\n//go:noescape\nfunc Vzip2U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. 
The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2U16N Vzip2U16N\n//go:noescape\nfunc Vzip2U16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2U32N Vzip2U32N\n//go:noescape\nfunc Vzip2U32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2F32N Vzip2F32N\n//go:noescape\nfunc Vzip2F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QS8N Vzip2QS8N\n//go:noescape\nfunc Vzip2QS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)\n\n// Zip vectors (secondary). 
This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QS16N Vzip2QS16N\n//go:noescape\nfunc Vzip2QS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QS32N Vzip2QS32N\n//go:noescape\nfunc Vzip2QS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QS64N Vzip2QS64N\n//go:noescape\nfunc Vzip2QS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. 
The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QU8N Vzip2QU8N\n//go:noescape\nfunc Vzip2QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QU16N Vzip2QU16N\n//go:noescape\nfunc Vzip2QU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QU32N Vzip2QU32N\n//go:noescape\nfunc Vzip2QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QU64N Vzip2QU64N\n//go:noescape\nfunc Vzip2QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)\n\n// Zip vectors (secondary). 
This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QF32N Vzip2QF32N\n//go:noescape\nfunc Vzip2QF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)\n\n// Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register.\n//\n//go:linkname Vzip2QF64N Vzip2QF64N\n//go:noescape\nfunc Vzip2QF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)\n"
  },
  {
    "path": "arm/neon/loops_test.go",
    "content": "package neon\n\nimport (\n\t\"math/rand\"\n\t\"reflect\"\n\t\"testing\"\n\t\"unsafe\"\n\n\t\"github.com/alivanz/go-simd/arm\"\n)\n\nfunc TestVabsS32N(t *testing.T) {\n\tconst N = 1024 * 16\n\tvar (\n\t\tr   = make([]arm.Int32, N)\n\t\tv   = make([]arm.Int32, N)\n\t\tref = make([]arm.Int32, N)\n\t)\n\tfor i := 0; i < N; i++ {\n\t\tr[i] = arm.Int32(int32(rand.Int()))\n\t\tv[i] = arm.Int32(int32(rand.Int()))\n\t\tif v[i] < 0 {\n\t\t\tref[i] = -v[i]\n\t\t} else {\n\t\t\tref[i] = v[i]\n\t\t}\n\t}\n\tVabsS32N(&r[0], &v[0], N)\n\tif !reflect.DeepEqual(r, ref) {\n\t\tt.Fatal(r)\n\t}\n}\n\nfunc TestVmulqF32N(t *testing.T) {\n\tconst N = 1024 * 16\n\tvar (\n\t\tr   = make([]arm.Float32, N)\n\t\tv1  = make([]arm.Float32, N)\n\t\tv2  = make([]arm.Float32, N)\n\t\tref = make([]arm.Float32, N)\n\t)\n\tfor i := 0; i < N; i++ {\n\t\tv1[i] = arm.Float32(rand.Float32())\n\t\tv2[i] = arm.Float32(rand.Float32())\n\t\tref[i] = v1[i] * v2[i]\n\t}\n\tVmulqF32N(&r[0], &v1[0], &v2[0], N)\n\tif !reflect.DeepEqual(r, ref) {\n\t\tt.Fatal(r)\n\t}\n}\n\n// this benchmark runs the whole loop in C (one call per iteration)\nfunc BenchmarkVmulqF32N(b *testing.B) {\n\tconst N = 1024 * 1024\n\tvar (\n\t\tr  = make([]arm.Float32, N)\n\t\tv1 = make([]arm.Float32, N)\n\t\tv2 = make([]arm.Float32, N)\n\t)\n\tb.SetBytes(N * 4)\n\tfor i := int32(0); i < N; i++ {\n\t\tv1[i] = arm.Float32(rand.Float32())\n\t\tv2[i] = arm.Float32(rand.Float32())\n\t}\n\t// exclude setup from the timing and allocation stats\n\tb.ResetTimer()\n\tfor i := 0; i < b.N; i++ {\n\t\tVmulqF32N(&r[0], &v1[0], &v2[0], N)\n\t}\n}\n\n// this benchmark calls the C code once per 4-element vector\nfunc BenchmarkVmulqF32C(b *testing.B) {\n\tconst N = 1024 * 1024\n\tvar (\n\t\tr  = make([]arm.Float32, N)\n\t\tv1 = make([]arm.Float32, N)\n\t\tv2 = make([]arm.Float32, N)\n\t)\n\tb.SetBytes(N * 4)\n\tfor i := int32(0); i < N; i++ {\n\t\tv1[i] = arm.Float32(rand.Float32())\n\t\tv2[i] = arm.Float32(rand.Float32())\n\t}\n\t// exclude setup from the timing and allocation stats\n\tb.ResetTimer()\n\tfor i := 0; i < b.N; i++ {\n\t\tfor j := int32(0); j < N; j += 4 {\n\t\t\tVmulqF32(\n\t\t\t\t(*arm.Float32X4)(unsafe.Pointer(&r[j])),\n\t\t\t\t(*arm.Float32X4)(unsafe.Pointer(&v1[j])),\n\t\t\t\t(*arm.Float32X4)(unsafe.Pointer(&v2[j])),\n\t\t\t)\n\t\t}\n\t}\n}\n\n// this benchmark is the pure Go reference implementation\nfunc BenchmarkVmulqF32Ref(b *testing.B) {\n\tconst N = 1024 * 1024\n\tvar (\n\t\tr  = make([]arm.Float32, N)\n\t\tv1 = make([]arm.Float32, N)\n\t\tv2 = make([]arm.Float32, N)\n\t)\n\tb.SetBytes(N * 4)\n\tfor i := int32(0); i < N; i++ {\n\t\tv1[i] = arm.Float32(rand.Float32())\n\t\tv2[i] = arm.Float32(rand.Float32())\n\t}\n\t// exclude setup from the timing and allocation stats\n\tb.ResetTimer()\n\tfor i := 0; i < b.N; i++ {\n\t\tfor j := int32(0); j < N; j++ {\n\t\t\tr[j] = v1[j] * v2[j]\n\t\t}\n\t}\n}\n"
  },
  {
    "path": "arm/types.go",
    "content": "package arm\n\n/*\n#include <arm_neon.h>\n*/\nimport \"C\"\n\n// typedef float float32_t;\ntype Float32 = C.float32_t\n\n// typedef __attribute__((neon_vector_type(2))) float32_t float32x2_t;\ntype Float32X2 = C.float32x2_t\n\n// typedef struct float32x2x2_t { float32x2_t val[2];} float32x2x2_t;\ntype Float32X2X2 = C.float32x2x2_t\n\n// typedef __attribute__((neon_vector_type(4))) float32_t float32x4_t;\ntype Float32X4 = C.float32x4_t\n\n// typedef struct float32x4x2_t { float32x4_t val[2];} float32x4x2_t;\ntype Float32X4X2 = C.float32x4x2_t\n\n// typedef double float64_t;\ntype Float64 = C.float64_t\n\n// typedef __attribute__((neon_vector_type(1))) float64_t float64x1_t;\ntype Float64X1 = C.float64x1_t\n\n// typedef __attribute__((neon_vector_type(2))) float64_t float64x2_t;\ntype Float64X2 = C.float64x2_t\n\n// typedef short int16_t;\ntype Int16 = C.int16_t\n\n// typedef __attribute__((neon_vector_type(4))) int16_t int16x4_t;\ntype Int16X4 = C.int16x4_t\n\n// typedef struct int16x4x2_t { int16x4_t val[2];} int16x4x2_t;\ntype Int16X4X2 = C.int16x4x2_t\n\n// typedef __attribute__((neon_vector_type(8))) int16_t int16x8_t;\ntype Int16X8 = C.int16x8_t\n\n// typedef struct int16x8x2_t { int16x8_t val[2];} int16x8x2_t;\ntype Int16X8X2 = C.int16x8x2_t\n\n// typedef int int32_t;\ntype Int32 = C.int32_t\n\n// typedef __attribute__((neon_vector_type(2))) int32_t int32x2_t;\ntype Int32X2 = C.int32x2_t\n\n// typedef struct int32x2x2_t { int32x2_t val[2];} int32x2x2_t;\ntype Int32X2X2 = C.int32x2x2_t\n\n// typedef __attribute__((neon_vector_type(4))) int32_t int32x4_t;\ntype Int32X4 = C.int32x4_t\n\n// typedef struct int32x4x2_t { int32x4_t val[2];} int32x4x2_t;\ntype Int32X4X2 = C.int32x4x2_t\n\n// typedef longlong int64_t;\ntype Int64 = C.int64_t\n\n// typedef __attribute__((neon_vector_type(1))) int64_t int64x1_t;\ntype Int64X1 = C.int64x1_t\n\n// typedef __attribute__((neon_vector_type(2))) int64_t int64x2_t;\ntype Int64X2 = C.int64x2_t\n\n// typedef 
signed char int8_t;\ntype Int8 = C.int8_t\n\n// typedef __attribute__((neon_vector_type(16))) int8_t int8x16_t;\ntype Int8X16 = C.int8x16_t\n\n// typedef struct int8x16x2_t { int8x16_t val[2];} int8x16x2_t;\ntype Int8X16X2 = C.int8x16x2_t\n\n// typedef struct int8x16x3_t { int8x16_t val[3];} int8x16x3_t;\ntype Int8X16X3 = C.int8x16x3_t\n\n// typedef struct int8x16x4_t { int8x16_t val[4];} int8x16x4_t;\ntype Int8X16X4 = C.int8x16x4_t\n\n// typedef __attribute__((neon_vector_type(8))) int8_t int8x8_t;\ntype Int8X8 = C.int8x8_t\n\n// typedef struct int8x8x2_t { int8x8_t val[2];} int8x8x2_t;\ntype Int8X8X2 = C.int8x8x2_t\n\n// typedef struct int8x8x3_t { int8x8_t val[3];} int8x8x3_t;\ntype Int8X8X3 = C.int8x8x3_t\n\n// typedef struct int8x8x4_t { int8x8_t val[4];} int8x8x4_t;\ntype Int8X8X4 = C.int8x8x4_t\n\n// typedef __uint128_t poly128_t;\ntype Poly128 = C.poly128_t\n\n// typedef uint16_t poly16_t;\ntype Poly16 = C.poly16_t\n\n// typedef __attribute__((neon_polyvector_type(4))) poly16_t poly16x4_t;\ntype Poly16X4 = C.poly16x4_t\n\n// typedef struct poly16x4x2_t { poly16x4_t val[2];} poly16x4x2_t;\ntype Poly16X4X2 = C.poly16x4x2_t\n\n// typedef __attribute__((neon_polyvector_type(8))) poly16_t poly16x8_t;\ntype Poly16X8 = C.poly16x8_t\n\n// typedef struct poly16x8x2_t { poly16x8_t val[2];} poly16x8x2_t;\ntype Poly16X8X2 = C.poly16x8x2_t\n\n// typedef uint64_t poly64_t;\ntype Poly64 = C.poly64_t\n\n// typedef __attribute__((neon_polyvector_type(1))) poly64_t poly64x1_t;\ntype Poly64X1 = C.poly64x1_t\n\n// typedef __attribute__((neon_polyvector_type(2))) poly64_t poly64x2_t;\ntype Poly64X2 = C.poly64x2_t\n\n// typedef uint8_t poly8_t;\ntype Poly8 = C.poly8_t\n\n// typedef __attribute__((neon_polyvector_type(16))) poly8_t poly8x16_t;\ntype Poly8X16 = C.poly8x16_t\n\n// typedef struct poly8x16x2_t { poly8x16_t val[2];} poly8x16x2_t;\ntype Poly8X16X2 = C.poly8x16x2_t\n\n// typedef struct poly8x16x3_t { poly8x16_t val[3];} poly8x16x3_t;\ntype Poly8X16X3 = 
C.poly8x16x3_t\n\n// typedef struct poly8x16x4_t { poly8x16_t val[4];} poly8x16x4_t;\ntype Poly8X16X4 = C.poly8x16x4_t\n\n// typedef __attribute__((neon_polyvector_type(8))) poly8_t poly8x8_t;\ntype Poly8X8 = C.poly8x8_t\n\n// typedef struct poly8x8x2_t { poly8x8_t val[2];} poly8x8x2_t;\ntype Poly8X8X2 = C.poly8x8x2_t\n\n// typedef struct poly8x8x3_t { poly8x8_t val[3];} poly8x8x3_t;\ntype Poly8X8X3 = C.poly8x8x3_t\n\n// typedef struct poly8x8x4_t { poly8x8_t val[4];} poly8x8x4_t;\ntype Poly8X8X4 = C.poly8x8x4_t\n\n// typedef ushort uint16_t;\ntype Uint16 = C.uint16_t\n\n// typedef __attribute__((neon_vector_type(4))) uint16_t uint16x4_t;\ntype Uint16X4 = C.uint16x4_t\n\n// typedef struct uint16x4x2_t { uint16x4_t val[2];} uint16x4x2_t;\ntype Uint16X4X2 = C.uint16x4x2_t\n\n// typedef __attribute__((neon_vector_type(8))) uint16_t uint16x8_t;\ntype Uint16X8 = C.uint16x8_t\n\n// typedef struct uint16x8x2_t { uint16x8_t val[2];} uint16x8x2_t;\ntype Uint16X8X2 = C.uint16x8x2_t\n\n// typedef uint uint32_t;\ntype Uint32 = C.uint32_t\n\n// typedef __attribute__((neon_vector_type(2))) uint32_t uint32x2_t;\ntype Uint32X2 = C.uint32x2_t\n\n// typedef struct uint32x2x2_t { uint32x2_t val[2];} uint32x2x2_t;\ntype Uint32X2X2 = C.uint32x2x2_t\n\n// typedef __attribute__((neon_vector_type(4))) uint32_t uint32x4_t;\ntype Uint32X4 = C.uint32x4_t\n\n// typedef struct uint32x4x2_t { uint32x4_t val[2];} uint32x4x2_t;\ntype Uint32X4X2 = C.uint32x4x2_t\n\n// typedef ulonglong uint64_t;\ntype Uint64 = C.uint64_t\n\n// typedef __attribute__((neon_vector_type(1))) uint64_t uint64x1_t;\ntype Uint64X1 = C.uint64x1_t\n\n// typedef __attribute__((neon_vector_type(2))) uint64_t uint64x2_t;\ntype Uint64X2 = C.uint64x2_t\n\n// typedef uchar uint8_t;\ntype Uint8 = C.uint8_t\n\n// typedef __attribute__((neon_vector_type(16))) uint8_t uint8x16_t;\ntype Uint8X16 = C.uint8x16_t\n\n// typedef struct uint8x16x2_t { uint8x16_t val[2];} uint8x16x2_t;\ntype Uint8X16X2 = C.uint8x16x2_t\n\n// typedef struct 
uint8x16x3_t { uint8x16_t val[3];} uint8x16x3_t;\ntype Uint8X16X3 = C.uint8x16x3_t\n\n// typedef struct uint8x16x4_t { uint8x16_t val[4];} uint8x16x4_t;\ntype Uint8X16X4 = C.uint8x16x4_t\n\n// typedef __attribute__((neon_vector_type(8))) uint8_t uint8x8_t;\ntype Uint8X8 = C.uint8x8_t\n\n// typedef struct uint8x8x2_t { uint8x8_t val[2];} uint8x8x2_t;\ntype Uint8X8X2 = C.uint8x8x2_t\n\n// typedef struct uint8x8x3_t { uint8x8_t val[3];} uint8x8x3_t;\ntype Uint8X8X3 = C.uint8x8x3_t\n\n// typedef struct uint8x8x4_t { uint8x8_t val[4];} uint8x8x4_t;\ntype Uint8X8X4 = C.uint8x8x4_t\n"
  },
  {
    "path": "example/neon/main.go",
    "content": "package main\n\nimport (\n\t\"log\"\n\n\t\"github.com/alivanz/go-simd/arm\"\n\t\"github.com/alivanz/go-simd/arm/neon\"\n)\n\nfunc main() {\n\tvar a, b arm.Int8X8\n\tvar add, mul arm.Int16X8\n\tfor i := 0; i < 8; i++ {\n\t\ta[i] = arm.Int8(i)\n\t\tb[i] = arm.Int8(i * i)\n\t}\n\tlog.Printf(\"a = %+v\", a)\n\tlog.Printf(\"b = %+v\", b)\n\tneon.VaddlS8(&add, &a, &b)\n\tneon.VmullS8(&mul, &a, &b)\n\tlog.Printf(\"add = %+v\", add)\n\tlog.Printf(\"mul = %+v\", mul)\n}\n"
  },
  {
    "path": "example/sse2/main.go",
    "content": "package main\n\nimport (\n\t\"log\"\n\n\t\"github.com/alivanz/go-simd/x86\"\n)\n\nfunc main() {\n\ta := x86.MmSetrEpi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)\n\tb := x86.MmSetrEpi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)\n\tadd := x86.MmAddEpi8(a, b)\n\tlog.Print(a)\n\tlog.Print(b)\n\tlog.Print(add)\n}\n"
  },
  {
    "path": "generator/arm/arm.go",
    "content": "package main\n\nimport (\n\t\"encoding/json\"\n\t\"os\"\n\n\t\"github.com/alivanz/go-simd/generator/utils\"\n)\n\ntype ArmIntrinsics []ArmIntrinsic\n\ntype ArmIntrinsic struct {\n\tName        string `json:\"name\"`\n\tDescription string `json:\"description\"`\n}\n\nfunc GetIntrinsics() (ArmIntrinsics, error) {\n\tif err := utils.Download(\n\t\t\"intrinsics.json\",\n\t\t\"https://developer.arm.com/architectures/instruction-sets/intrinsics/data/intrinsics.json\",\n\t); err != nil {\n\t\treturn nil, err\n\t}\n\tf, err := os.Open(\"intrinsics.json\")\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\tdefer f.Close()\n\tvar intrins ArmIntrinsics\n\tif err := json.NewDecoder(f).Decode(&intrins); err != nil {\n\t\treturn nil, err\n\t}\n\treturn intrins, nil\n}\n\nfunc (intrins ArmIntrinsics) Find(s string) *ArmIntrinsic {\n\tfor _, intrin := range intrins {\n\t\tif intrin.Name == s {\n\t\t\treturn &intrin\n\t\t}\n\t}\n\treturn nil\n}\n"
  },
  {
    "path": "generator/arm/main.go",
    "content": "package main\n\nimport (\n\t\"bytes\"\n\t\"fmt\"\n\t\"io\"\n\t\"log\"\n\t\"os\"\n\t\"os/exec\"\n\t\"sort\"\n\t\"strconv\"\n\t\"strings\"\n\n\t\"github.com/alivanz/go-simd/generator/scanner\"\n\t\"github.com/alivanz/go-simd/generator/types\"\n\t\"github.com/alivanz/go-simd/generator/utils\"\n\t\"github.com/alivanz/go-simd/generator/writer\"\n\t\"github.com/iancoleman/strcase\"\n)\n\nfunc Source() ([]byte, error) {\n\tcmd := exec.Command(\"clang\", \"-E\", \"-\")\n\tcmd.Stdin = bytes.NewBufferString(strings.Join(writer.Includes([]string{\n\t\t\"arm_neon.h\",\n\t}), \"\\n\"))\n\tcmd.Stderr = os.Stderr\n\treturn cmd.Output()\n}\n\nfunc main() {\n\tsrc, err := Source()\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\t// write raw\n\tif err := writer.WriteToFile(\"raw.h\", func(w io.Writer) error {\n\t\t_, err := w.Write(src)\n\t\treturn err\n\t}); err != nil {\n\t\tlog.Fatal(err)\n\t}\n\t// scan\n\tresult, err := scanner.Scan(src)\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\t// filter functions\n\tresult.Functions = utils.Filter(result.Functions, func(fn types.Function) bool {\n\t\tif strings.HasPrefix(fn.Name, \"vbf\") {\n\t\t\treturn false\n\t\t}\n\t\tif strings.Contains(fn.Name, \"bf16\") {\n\t\t\treturn false\n\t\t}\n\t\treturn true\n\t})\n\t// filter types\n\tmtype := make(map[string]bool)\n\tfor _, fn := range result.Functions {\n\t\tif fn.Return != nil {\n\t\t\tmtype[fn.Return.Name] = true\n\t\t}\n\t\tfor _, arg := range fn.Args {\n\t\t\tmtype[arg.Name] = true\n\t\t}\n\t}\n\tresult.Types = utils.Filter(result.Types, func(t types.Type) bool {\n\t\treturn mtype[t.Name]\n\t})\n\t// sort functions\n\tsort.Slice(result.Functions, func(i, j int) bool {\n\t\tg0, i0, _ := sortGroup(result.Functions[i].Name)\n\t\tg1, i1, _ := sortGroup(result.Functions[j].Name)\n\t\tif g0 != g1 {\n\t\t\treturn g0 < g1\n\t\t}\n\t\treturn i0 < i1\n\t})\n\t// sort types\n\tsort.Slice(result.Types, func(i, j int) bool {\n\t\treturn result.Types[i].Name < 
result.Types[j].Name\n\t})\n\t// write types\n\tif err := writer.WriteToFile(\"types.go\", func(w io.Writer) error {\n\t\tif err := writer.Package(w, \"arm\"); err != nil {\n\t\t\treturn err\n\t\t}\n\t\tif err := writer.ImportC(w, func(w io.Writer) error {\n\t\t\t_, err := io.WriteString(w, strings.Join(writer.Includes([]string{\n\t\t\t\t\"arm_neon.h\",\n\t\t\t}), \"\\n\"))\n\t\t\treturn err\n\t\t}); err != nil {\n\t\t\treturn err\n\t\t}\n\t\tif err := writer.Types(w, result.Types); err != nil {\n\t\t\treturn err\n\t\t}\n\t\treturn nil\n\t}); err != nil {\n\t\tlog.Fatal(err)\n\t}\n\t// patch intrinsics info\n\tintrins, err := GetIntrinsics()\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\tfor i, fn := range result.Functions {\n\t\tif info := intrins.Find(fn.Name); info != nil {\n\t\t\tresult.Functions[i].Comment = info.Description\n\t\t}\n\t}\n\t// write C\n\tif err := writer.WriteToFile(\"neon/functions.c\", func(w io.Writer) error {\n\t\tif _, err := io.WriteString(w, \"#include <arm_neon.h>\\n\\n\"); err != nil {\n\t\t\treturn err\n\t\t}\n\t\tfor _, fn := range result.Functions {\n\t\t\tif fn.Blacklisted() {\n\t\t\t\tcontinue\n\t\t\t}\n\t\t\tif err := writer.RewriteC(w, fn); err != nil {\n\t\t\t\treturn err\n\t\t\t}\n\t\t}\n\t\treturn nil\n\t}); err != nil {\n\t\tlog.Fatal(err)\n\t}\n\t// write functions\n\tif err := writer.WriteToFile(\"neon/functions.go\", func(w io.Writer) error {\n\t\tif err := writer.Package(w, \"neon\"); err != nil {\n\t\t\treturn err\n\t\t}\n\t\tif err := writer.Import(w, []string{\n\t\t\t\"github.com/alivanz/go-simd/arm\",\n\t\t}); err != nil {\n\t\t\treturn err\n\t\t}\n\t\tif err := writer.ImportC(w, func(w io.Writer) error {\n\t\t\tif _, err := io.WriteString(w, \"#include <arm_neon.h>\"); err != nil {\n\t\t\t\treturn err\n\t\t\t}\n\t\t\treturn nil\n\t\t}); err != nil {\n\t\t\treturn err\n\t\t}\n\t\tfor _, fn := range result.Functions {\n\t\t\tif fn.Blacklisted() {\n\t\t\t\tcontinue\n\t\t\t}\n\t\t\twriter.DeclareFuncBypass(w, fn, 
\"arm\")\n\t\t}\n\t\treturn nil\n\t}); err != nil {\n\t\tlog.Fatal(err)\n\t}\n\t// C loops\n\tvar (\n\t\tloops = make(map[string]bool)\n\t)\n\tif err := writer.WriteToFile(\"neon/loops.c\", func(w io.Writer) error {\n\t\tif _, err := io.WriteString(w, \"#include <arm_neon.h>\\n\\n\"); err != nil {\n\t\t\treturn err\n\t\t}\n\t\tif _, err := io.WriteString(w, \"#define save(dst, src) *dst = src\\n\"); err != nil {\n\t\t\treturn err\n\t\t}\n\t\tif _, err := io.WriteString(w, \"#define load(src) (*src)\\n\"); err != nil {\n\t\t\treturn err\n\t\t}\n\t\tif _, err := io.WriteString(w, `#define LOOP1(name, rtype, itype, f, set, load, rstep, istep) \\\n    void name(rtype *r, itype *v, int32_t n)                  \\\n    {                                                         \\\n        while (n >= rstep)                                    \\\n        {                                                     \\\n            set(r, f(load(v)));                               \\\n            r += rstep;                                       \\\n            n -= rstep;                                       \\\n            v += istep;                                       \\\n        }                                                     \\\n    }\n\n`); err != nil {\n\t\t\treturn err\n\t\t}\n\t\tfor _, fn := range result.Functions {\n\t\t\tif fn.Blacklisted() {\n\t\t\t\tcontinue\n\t\t\t}\n\t\t\tif len(fn.Args) != 1 {\n\t\t\t\tcontinue\n\t\t\t}\n\t\t\tog, o0, o1 := parseType(fn.Return.Name)\n\t\t\tif og == \"\" {\n\t\t\t\tcontinue\n\t\t\t}\n\t\t\tig, i0, i1 := parseType(fn.Args[0].Name)\n\t\t\tif ig == \"\" {\n\t\t\t\tcontinue\n\t\t\t}\n\t\t\tif o0 != i0 {\n\t\t\t\tcontinue\n\t\t\t}\n\t\t\tvar rq, iq string\n\t\t\tif o0*o1 == 128 {\n\t\t\t\trq = \"q\"\n\t\t\t}\n\t\t\tif i0*i1 == 128 {\n\t\t\t\tiq = \"q\"\n\t\t\t}\n\t\t\tif o1 == -1 {\n\t\t\t\to1 = 1\n\t\t\t}\n\t\t\tif i1 == -1 {\n\t\t\t\ti1 = 1\n\t\t\t}\n\t\t\tgroup, _, suffix := sortGroup(fn.Name)\n\t\t\trg, r0, _ := 
parseType(fn.Return.Name)\n\t\t\tif rg == \"\" {\n\t\t\t\tcontinue\n\t\t\t}\n\t\t\tio.WriteString(w,\n\t\t\t\tfmt.Sprintf(\n\t\t\t\t\t\"LOOP1(%s, %s, %s, %s, %s, %s, %d, %d)\\n\",\n\t\t\t\t\tstrcase.ToCamel(group+suffix+\"N\"),\n\t\t\t\t\tfmt.Sprintf(\"%s%d_t\", rg, r0),\n\t\t\t\t\tfmt.Sprintf(\"%s%d_t\", ig, i0),\n\t\t\t\t\tfn.Name,\n\t\t\t\t\tsetter(fn.Return.Name, \"save\", fmt.Sprintf(\"vst1%s_%s%d\", rq, typeShort[rg], r0)),\n\t\t\t\t\tsetter(fn.Args[0].Name, \"load\", fmt.Sprintf(\"vld1%s_%s%d\", iq, typeShort[ig], i0)),\n\t\t\t\t\to1,\n\t\t\t\t\ti1,\n\t\t\t\t),\n\t\t\t)\n\t\t\tloops[fn.Name] = true\n\t\t}\n\t\tio.WriteString(w, \"\\n\")\n\t\tif _, err := io.WriteString(w, `#define LOOP2(name, rtype, itype, f, set, load, rstep, istep) \\\n    void name(rtype *r, itype *v1, itype *v2, int32_t n)      \\\n    {                                                         \\\n        while (n >= rstep)                                    \\\n        {                                                     \\\n            set(r, f(load(v1), load(v2)));                    \\\n            r += rstep;                                       \\\n            n -= rstep;                                       \\\n            v1 += istep;                                      \\\n            v2 += istep;                                      \\\n        }                                                     \\\n    }\n\n`); err != nil {\n\t\t\treturn err\n\t\t}\n\t\tfor _, fn := range result.Functions {\n\t\t\tif fn.Blacklisted() {\n\t\t\t\tcontinue\n\t\t\t}\n\t\t\tif len(fn.Args) != 2 {\n\t\t\t\tcontinue\n\t\t\t}\n\t\t\tif fn.Args[0].Name != fn.Args[1].Name {\n\t\t\t\tcontinue\n\t\t\t}\n\t\t\tog, o0, o1 := parseType(fn.Return.Name)\n\t\t\tif og == \"\" {\n\t\t\t\tcontinue\n\t\t\t}\n\t\t\tig, i0, i1 := parseType(fn.Args[0].Name)\n\t\t\tif ig == \"\" {\n\t\t\t\tcontinue\n\t\t\t}\n\t\t\tif o0 != i0 {\n\t\t\t\tcontinue\n\t\t\t}\n\t\t\tvar rq, iq string\n\t\t\tif o0*o1 == 128 
{\n\t\t\t\trq = \"q\"\n\t\t\t}\n\t\t\tif i0*i1 == 128 {\n\t\t\t\tiq = \"q\"\n\t\t\t}\n\t\t\tif o1 == -1 {\n\t\t\t\to1 = 1\n\t\t\t}\n\t\t\tif i1 == -1 {\n\t\t\t\ti1 = 1\n\t\t\t}\n\t\t\tgroup, _, suffix := sortGroup(fn.Name)\n\t\t\trg, r0, _ := parseType(fn.Return.Name)\n\t\t\tif rg == \"\" {\n\t\t\t\tcontinue\n\t\t\t}\n\t\t\tio.WriteString(w,\n\t\t\t\tfmt.Sprintf(\n\t\t\t\t\t\"LOOP2(%s, %s, %s, %s, %s, %s, %d, %d)\\n\",\n\t\t\t\t\tstrcase.ToCamel(group+suffix+\"N\"),\n\t\t\t\t\tfmt.Sprintf(\"%s%d_t\", rg, r0),\n\t\t\t\t\tfmt.Sprintf(\"%s%d_t\", ig, i0),\n\t\t\t\t\tfn.Name,\n\t\t\t\t\tsetter(fn.Return.Name, \"save\", fmt.Sprintf(\"vst1%s_%s%d\", rq, typeShort[rg], r0)),\n\t\t\t\t\tsetter(fn.Args[0].Name, \"load\", fmt.Sprintf(\"vld1%s_%s%d\", iq, typeShort[ig], i0)),\n\t\t\t\t\to1,\n\t\t\t\t\ti1,\n\t\t\t\t),\n\t\t\t)\n\t\t\tloops[fn.Name] = true\n\t\t}\n\t\treturn nil\n\t}); err != nil {\n\t\tlog.Fatal(err)\n\t}\n\t// loop functions\n\tif err := writer.WriteToFile(\"neon/loops.go\", func(w io.Writer) error {\n\t\tif err := writer.Package(w, \"neon\"); err != nil {\n\t\t\treturn err\n\t\t}\n\t\tif err := writer.Import(w, []string{\n\t\t\t\"github.com/alivanz/go-simd/arm\",\n\t\t}); err != nil {\n\t\t\treturn err\n\t\t}\n\t\tif err := writer.ImportC(w, func(w io.Writer) error {\n\t\t\tif _, err := io.WriteString(w, \"#include <arm_neon.h>\"); err != nil {\n\t\t\t\treturn err\n\t\t\t}\n\t\t\treturn nil\n\t\t}); err != nil {\n\t\t\treturn err\n\t\t}\n\t\tfor _, fn := range result.Functions {\n\t\t\tif !loops[fn.Name] {\n\t\t\t\tcontinue\n\t\t\t}\n\t\t\t// add suffix\n\t\t\tfn.Name += \"N\"\n\t\t\t// write\n\t\t\tfmt.Fprintf(w, \"\\n\")\n\t\t\tif len(fn.Comment) > 0 {\n\t\t\t\tfmt.Fprintf(w, \"// %s\\n\", fn.Comment)\n\t\t\t} else {\n\t\t\t\tfmt.Fprintf(w, \"// %s\\n\", fn.Name)\n\t\t\t}\n\t\t\tfmt.Fprintf(w, \"//\\n\")\n\t\t\tfmt.Fprintf(w, \"//go:linkname %s %s\\n\", strcase.ToCamel(fn.Name), strcase.ToCamel(fn.Name))\n\t\t\tfmt.Fprintf(w, 
\"//go:noescape\\n\")\n\t\t\tfmt.Fprintf(w, \"func %s(\", strcase.ToCamel(fn.Name))\n\t\t\tif fn.Return != nil {\n\t\t\t\tvar parts = strings.SplitN(strings.TrimSuffix(fn.Return.Name, \"_t\"), \"x\", 2)\n\t\t\t\tfmt.Fprintf(w, \"r *arm.%s, \", strcase.ToCamel(parts[0]))\n\t\t\t}\n\t\t\tfmt.Fprintf(w, \"%s, n int32)\\n\", strings.Join(utils.Transform(fn.Args, func(i int, t types.Type) string {\n\t\t\t\tvar parts = strings.SplitN(strings.TrimSuffix(t.Name, \"_t\"), \"x\", 2)\n\t\t\t\treturn fmt.Sprintf(\"v%d *arm.%s\", i, strcase.ToCamel(parts[0]))\n\t\t\t}), \", \"))\n\t\t}\n\t\treturn nil\n\t}); err != nil {\n\t\tlog.Fatal(err)\n\t}\n}\n\nfunc setter(t string, direct string, def string) string {\n\t_, _, r1 := parseType(t)\n\tif r1 == -1 {\n\t\treturn direct\n\t}\n\treturn def\n}\n\nfunc parseType(t string) (string, int, int) {\n\tvar (\n\t\tgroup string\n\t)\n\tt = strings.TrimSuffix(t, \"_t\")\n\tif strings.HasPrefix(t, \"uint\") {\n\t\tgroup = \"uint\"\n\t\tt = t[4:]\n\t} else if strings.HasPrefix(t, \"int\") {\n\t\tgroup = \"int\"\n\t\tt = t[3:]\n\t} else if strings.HasPrefix(t, \"float\") {\n\t\tgroup = \"float\"\n\t\tt = t[5:]\n\t}\n\tparts := strings.Split(t, \"x\")\n\tswitch len(parts) {\n\tcase 1:\n\t\tw, err := strconv.ParseUint(parts[0], 10, 32)\n\t\tif err != nil {\n\t\t\treturn \"\", 0, 0\n\t\t}\n\t\treturn group, int(w), -1\n\tcase 2:\n\t\tw, err := strconv.ParseUint(parts[0], 10, 32)\n\t\tif err != nil {\n\t\t\treturn \"\", 0, 0\n\t\t}\n\t\th, err := strconv.ParseUint(parts[1], 10, 32)\n\t\tif err != nil {\n\t\t\treturn \"\", 0, 0\n\t\t}\n\t\treturn group, int(w), int(h)\n\t}\n\treturn \"\", 0, 0\n}\n\nvar (\n\ttypeShort = map[string]string{\n\t\t\"uint\":    \"u\",\n\t\t\"uint8\":   \"u8\",\n\t\t\"uint16\":  \"u16\",\n\t\t\"uint32\":  \"u32\",\n\t\t\"uint64\":  \"u64\",\n\t\t\"int\":     \"s\",\n\t\t\"int8\":    \"s8\",\n\t\t\"int16\":   \"s16\",\n\t\t\"int32\":   \"s32\",\n\t\t\"int64\":   \"s64\",\n\t\t\"float\":   \"f\",\n\t\t\"float32\": 
\"f32\",\n\t\t\"float64\": \"f64\",\n\t}\n)\n"
  },
  {
    "path": "generator/arm/sort.go",
    "content": "package main\n\nimport \"strings\"\n\nvar (\n\tsuffixOrder = []string{\n\t\t\"_s8\",\n\t\t\"_s16\",\n\t\t\"_s32\",\n\t\t\"_s64\",\n\t\t\"_u8\",\n\t\t\"_u16\",\n\t\t\"_u32\",\n\t\t\"_u64\",\n\t\t\"_f32\",\n\t\t\"_f64\",\n\t}\n)\n\nfunc sortGroup(name string) (string, int, string) {\n\tvar (\n\t\tgroup  = name\n\t\tindex  = -1\n\t\tsuffix = \"\"\n\t)\n\tfor i, s := range suffixOrder {\n\t\tif strings.HasSuffix(name, s) {\n\t\t\tgroup = strings.TrimSuffix(name, s)\n\t\t\tindex = i\n\t\t\tsuffix = s\n\t\t}\n\t}\n\treturn group, index, suffix\n}\n"
  },
  {
    "path": "generator/scanner/scan.go",
    "content": "package scanner\n\nimport (\n\t\"bytes\"\n\t\"regexp\"\n\n\t\"github.com/alivanz/go-simd/generator/types\"\n\t\"github.com/alivanz/go-simd/generator/utils\"\n)\n\nvar (\n\tname             = `(\\w+?)`\n\targs             = `([\\w\\s,_]*?)`\n\tattr             = `(?:__attribute__\\(\\(` + `([\\w\\s,\\(\\)\"]+?)` + `\\)\\))`\n\tregTypedefSimple = regexp.MustCompile(`typedef\\s+` + attr + `?[\\w\\s]+? ` + name + `\\s*` + attr + `?;`)\n\tregTypedefStruct = regexp.MustCompile(`typedef struct \\w+? {.+?}\\s*?` + name + `;`)\n\tregFunction      = regexp.MustCompile(name + `\\s+` + attr + `?\\s*` + name + `\\s*\\(` + args + `\\)` + `\\s*` + `{.*?}`)\n\tregArg           = regexp.MustCompile(`\\s*(([\\w\\s]+)\\s(?:\\w+))`)\n\tregWhitespace    = regexp.MustCompile(`\\s+`)\n\tregComma         = regexp.MustCompile(`\\s*,\\s*`)\n\tregLongLong      = regexp.MustCompile(`long\\s+long`)\n\tregULongLong     = regexp.MustCompile(`unsigned\\s+long\\s+long`)\n\tregUlong         = regexp.MustCompile(`unsigned\\s+long`)\n\tregUint          = regexp.MustCompile(`unsigned\\s+int`)\n\tregUshort        = regexp.MustCompile(`unsigned\\s+short`)\n\tregUchar         = regexp.MustCompile(`unsigned\\s+char`)\n)\n\ntype ScanResult struct {\n\tTypes     []types.Type\n\tFunctions []types.Function\n}\n\nfunc Scan(raw []byte) (*ScanResult, error) {\n\tvar buf bytes.Buffer\n\t// drop preprocessor line markers (lines starting with #); keep a space between\n\t// the remaining lines so tokens on adjacent lines are not glued together\n\tfor _, line := range bytes.Split(raw, []byte(\"\\n\")) {\n\t\tif !bytes.HasPrefix(line, []byte(\"#\")) {\n\t\t\tbuf.Write(line)\n\t\t\tbuf.WriteByte(' ')\n\t\t}\n\t}\n\t// collapse runs of whitespace into a single space\n\traw = regWhitespace.ReplaceAll(buf.Bytes(), []byte(\" \"))\n\t// normalize multi-word C type names, longest spellings first\n\traw = regULongLong.ReplaceAll(raw, []byte(\"ulonglong\"))\n\traw = regLongLong.ReplaceAll(raw, []byte(\"longlong\"))\n\traw = regUlong.ReplaceAll(raw, []byte(\"ulong\"))\n\traw = regUint.ReplaceAll(raw, []byte(\"uint\"))\n\traw = regUshort.ReplaceAll(raw, []byte(\"ushort\"))\n\traw = regUchar.ReplaceAll(raw, []byte(\"uchar\"))\n\ts := 
string(raw)\n\tvar result ScanResult\n\t// types\n\tresult.Types = utils.Merge(\n\t\tutils.Transform(\n\t\t\tregTypedefSimple.FindAllStringSubmatch(s, -1),\n\t\t\tfunc(i int, e []string) types.Type {\n\t\t\t\treturn types.Type{\n\t\t\t\t\tName:       e[2],\n\t\t\t\t\tFull:       e[0],\n\t\t\t\t\tAttributes: commaSplit(e[1], e[3]),\n\t\t\t\t}\n\t\t\t},\n\t\t),\n\t\tutils.Transform(\n\t\t\tregTypedefStruct.FindAllStringSubmatch(s, -1),\n\t\t\tfunc(i int, e []string) types.Type {\n\t\t\t\treturn types.Type{\n\t\t\t\t\tName: e[1],\n\t\t\t\t\tFull: e[0],\n\t\t\t\t}\n\t\t\t},\n\t\t),\n\t)\n\t// functions\n\tresult.Functions = utils.Transform(\n\t\tregFunction.FindAllStringSubmatch(s, -1),\n\t\tfunc(i int, match []string) types.Function {\n\t\t\tvar args []types.Type\n\t\t\tfor _, arg := range regArg.FindAllStringSubmatch(match[4], -1) {\n\t\t\t\tif arg[2] == \"void\" {\n\t\t\t\t\tcontinue\n\t\t\t\t}\n\t\t\t\targs = append(args, types.Type{\n\t\t\t\t\tName: arg[2],\n\t\t\t\t\tFull: arg[1],\n\t\t\t\t})\n\t\t\t}\n\t\t\tvar ret *types.Type\n\t\t\tif match[1] != \"void\" {\n\t\t\t\tret = &types.Type{\n\t\t\t\t\tName: match[1],\n\t\t\t\t\tFull: match[1],\n\t\t\t\t}\n\t\t\t}\n\t\t\treturn types.Function{\n\t\t\t\tName:      match[3],\n\t\t\t\tAttribute: match[2],\n\t\t\t\tReturn:    ret,\n\t\t\t\tArgs:      args,\n\t\t\t}\n\t\t},\n\t)\n\treturn &result, nil\n}\n"
  },
  {
    "path": "generator/scanner/scan_test.go",
    "content": "package scanner\n\nimport (\n\t\"reflect\"\n\t\"regexp\"\n\t\"testing\"\n\n\t\"github.com/alivanz/go-simd/generator/types\"\n)\n\nfunc TestAttribute(t *testing.T) {\n\treg := regexp.MustCompile(attr + \";\")\n\tresult := reg.FindAllString(`\n\t\t__attribute__((__vector_size__(32), __aligned__(32)));\n\t\t__attribute__((neon_vector_type(8)));\n\t`, -1)\n\tref := []string{\n\t\t\"__attribute__((__vector_size__(32), __aligned__(32)));\",\n\t\t\"__attribute__((neon_vector_type(8)));\",\n\t}\n\tt.Log(result)\n\tt.Log(ref)\n\tif !reflect.DeepEqual(result, ref) {\n\t\tt.Fail()\n\t}\n}\n\nfunc TestScan(t *testing.T) {\n\tresult, err := Scan([]byte(`\n\t\ttypedef char int8_t;\n\t\ttypedef __attribute__((neon_vector_type(8))) int8_t int8x8_t;\n\t\ttypedef double __m256d __attribute__((__vector_size__(32), __aligned__(32)));\n\t\ttypedef struct int32x4x3_t {\n\t\t\tint32x4_t val[3];\n\t\t} int32x4x3_t;\n\t\tint func(int a, int b, int c) { return a+b+c; }\n\t\tstatic __inline__ __m128 __attribute__((__always_inline__, __nodebug__, __target__(\"mmx, sse\"), __min_vector_width__(128))) _mm_move_ss(__m128 __a, __m128 __b) { __a[0] = __b[0]; return __a; }\n\t\tstatic __inline__ long long __attribute__((__always_inline__, __nodebug__, __target__(\"mmx\"), __min_vector_width__(64))) _mm_cvtm64_si64(__m64 __m) { return 1; }\n\t\tvoid lolo(int a, long long b) { }\n\t\tvoid vovo(void) { }\n\t`))\n\tif err != nil {\n\t\tt.Fatal(err)\n\t}\n\tref := &ScanResult{\n\t\tTypes: []types.Type{\n\t\t\t{\n\t\t\t\tName: \"int8_t\",\n\t\t\t\tFull: \"typedef char int8_t;\",\n\t\t\t},\n\t\t\t{\n\t\t\t\tName:       \"int8x8_t\",\n\t\t\t\tFull:       \"typedef __attribute__((neon_vector_type(8))) int8_t int8x8_t;\",\n\t\t\t\tAttributes: []string{\"neon_vector_type(8)\"},\n\t\t\t},\n\t\t\t{\n\t\t\t\tName:       \"__m256d\",\n\t\t\t\tFull:       \"typedef double __m256d __attribute__((__vector_size__(32), __aligned__(32)));\",\n\t\t\t\tAttributes: []string{\"__vector_size__(32)\", 
\"__aligned__(32)\"},\n\t\t\t},\n\t\t\t{\n\t\t\t\tName: \"int32x4x3_t\",\n\t\t\t\tFull: \"typedef struct int32x4x3_t { int32x4_t val[3]; } int32x4x3_t;\",\n\t\t\t},\n\t\t},\n\t\tFunctions: []types.Function{\n\t\t\t{\n\t\t\t\tName: \"func\",\n\t\t\t\tReturn: &types.Type{\n\t\t\t\t\tName: \"int\",\n\t\t\t\t\tFull: \"int\",\n\t\t\t\t},\n\t\t\t\tArgs: []types.Type{\n\t\t\t\t\t{\n\t\t\t\t\t\tName: \"int\",\n\t\t\t\t\t\tFull: \"int a\",\n\t\t\t\t\t}, {\n\t\t\t\t\t\tName: \"int\",\n\t\t\t\t\t\tFull: \"int b\",\n\t\t\t\t\t}, {\n\t\t\t\t\t\tName: \"int\",\n\t\t\t\t\t\tFull: \"int c\",\n\t\t\t\t\t},\n\t\t\t\t},\n\t\t\t},\n\t\t\t{\n\t\t\t\tName:      \"_mm_move_ss\",\n\t\t\t\tAttribute: `__always_inline__, __nodebug__, __target__(\"mmx, sse\"), __min_vector_width__(128)`,\n\t\t\t\tReturn: &types.Type{\n\t\t\t\t\tName: \"__m128\",\n\t\t\t\t\tFull: \"__m128\",\n\t\t\t\t},\n\t\t\t\tArgs: []types.Type{\n\t\t\t\t\t{\n\t\t\t\t\t\tName: \"__m128\",\n\t\t\t\t\t\tFull: \"__m128 __a\",\n\t\t\t\t\t},\n\t\t\t\t\t{\n\t\t\t\t\t\tName: \"__m128\",\n\t\t\t\t\t\tFull: \"__m128 __b\",\n\t\t\t\t\t},\n\t\t\t\t},\n\t\t\t},\n\t\t\t{\n\t\t\t\tName:      \"_mm_cvtm64_si64\",\n\t\t\t\tAttribute: `__always_inline__, __nodebug__, __target__(\"mmx\"), __min_vector_width__(64)`,\n\t\t\t\tReturn: &types.Type{\n\t\t\t\t\tName: \"longlong\",\n\t\t\t\t\tFull: \"longlong\",\n\t\t\t\t},\n\t\t\t\tArgs: []types.Type{\n\t\t\t\t\t{\n\t\t\t\t\t\tName: \"__m64\",\n\t\t\t\t\t\tFull: \"__m64 __m\",\n\t\t\t\t\t},\n\t\t\t\t},\n\t\t\t},\n\t\t\t{\n\t\t\t\tName: \"lolo\",\n\t\t\t\tArgs: []types.Type{\n\t\t\t\t\t{\n\t\t\t\t\t\tName: \"int\",\n\t\t\t\t\t\tFull: \"int a\",\n\t\t\t\t\t}, {\n\t\t\t\t\t\tName: \"longlong\",\n\t\t\t\t\t\tFull: \"longlong b\",\n\t\t\t\t\t},\n\t\t\t\t},\n\t\t\t}, {\n\t\t\t\tName: \"vovo\",\n\t\t\t},\n\t\t},\n\t}\n\tt.Logf(\"%+v\", result.Functions[4].Return)\n\tif !reflect.DeepEqual(result.Types, ref.Types) {\n\t\tt.Logf(\"%+v\", result.Types)\n\t\tt.Logf(\"%+v\", 
ref.Types)\n\t\tt.Fatal()\n\t}\n\tif !reflect.DeepEqual(result, ref) {\n\t\tt.Logf(\"%+v\", result.Functions)\n\t\tt.Logf(\"%+v\", ref.Functions)\n\t\tt.Fatal()\n\t}\n}\n"
  },
  {
    "path": "generator/scanner/util.go",
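    "content": "package scanner\n\nimport \"strings\"\n\n// commaSplit splits each comma-separated attribute list in ss and concatenates\n// the results, trimming surrounding whitespace. Empty inputs contribute nothing.\nfunc commaSplit(ss ...string) []string {\n\tswitch len(ss) {\n\tcase 0:\n\t\treturn nil\n\tcase 1:\n\t\ts := strings.TrimSpace(regWhitespace.ReplaceAllString(ss[0], \" \"))\n\t\tif len(s) == 0 {\n\t\t\treturn nil\n\t\t}\n\t\treturn regComma.Split(s, -1)\n\tdefault:\n\t\treturn append(commaSplit(ss[0]), commaSplit(ss[1:]...)...)\n\t}\n}\n"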
  },
  {
    "path": "generator/types/function.go",
    "content": "package types\n\nimport (\n\t\"regexp\"\n\t\"strings\"\n)\n\ntype Function struct {\n\tName      string\n\tArgs      []Type\n\tReturn    *Type\n\tAttribute string\n\tComment   string\n}\n\ntype Arg struct {\n\tName string\n\tType string\n}\n\nvar (\n\tregTarget = regexp.MustCompile(`__target__\\(\"([a-z0-9\\s,]+)\"\\)`)\n)\n\nfunc (f *Function) Target() string {\n\tmatch := regTarget.FindStringSubmatch(f.Attribute)\n\tif match == nil {\n\t\treturn \"\"\n\t}\n\treturn match[1]\n}\n\nfunc (fn *Function) Blacklisted() bool {\n\tfor _, blacklist := range []string{\n\t\t\"f16\",\n\t\t\"vcmla\",\n\t\t\"__extension__\",\n\t} {\n\t\tif strings.Contains(fn.Name, blacklist) {\n\t\t\treturn true\n\t\t}\n\t}\n\treturn false\n}\n"
  },
  {
    "path": "generator/types/type.go",
    "content": "package types\n\nimport (\n\t\"strings\"\n\n\t\"github.com/iancoleman/strcase\"\n)\n\ntype Type struct {\n\tName       string\n\tFull       string\n\tAttributes []string\n}\n\nfunc (t *Type) C() string {\n\tswitch t.Name {\n\tcase \"longlong\":\n\t\treturn \"long long\"\n\tcase \"ulonglong\":\n\t\treturn \"unsigned long long\"\n\tcase \"ulong\":\n\t\treturn \"unsigned long\"\n\tcase \"uint\":\n\t\treturn \"unsigned int\"\n\tcase \"ushort\":\n\t\treturn \"unsigned short\"\n\tcase \"uchar\":\n\t\treturn \"unsigned char\"\n\tdefault:\n\t\treturn t.Name\n\t}\n}\n\nfunc (t *Type) CGO() string {\n\tif !strings.Contains(t.Name, \" \") {\n\t\treturn t.Name\n\t}\n\ts := strings.Replace(t.Name, \"unsigned\", \"u\", -1)\n\ts = strings.Replace(s, \" \", \"\", -1)\n\treturn s\n}\n\nfunc (t *Type) Go(pkg string) string {\n\ts := strings.TrimSuffix(string(t.Name), \"_t\")\n\ts = strcase.ToCamel(s)\n\tif len(pkg) > 0 {\n\t\treturn pkg + \".\" + s\n\t}\n\treturn s\n}\n\nfunc (t *Type) Blacklisted() bool {\n\tfor _, blacklist := range []string{\n\t\t\"__darwin\",\n\t\t\"__int\",\n\t\t\"__uint\",\n\t\t\"__mm_storeh\",\n\t\t\"_tile\",\n\t\t\"_aligned\",\n\t\t// float16\n\t\t\"float16\",\n\t\t\"f16\",\n\t\t\"v8bf\",\n\t\t\"v8hf\",\n\t\t\"m128h\",\n\t\t\"m128bh\",\n\t\t// windows?\n\t\t\"crt\",\n\t\t\"_pi_\",\n\t\t\"mbstate_t\",\n\t} {\n\t\tif strings.Contains(t.Name, blacklist) {\n\t\t\treturn true\n\t\t}\n\t}\n\treturn false\n}\n"
  },
  {
    "path": "generator/utils/download.go",
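    "content": "package utils\n\nimport (\n\t\"fmt\"\n\t\"io\"\n\t\"net/http\"\n\t\"os\"\n)\n\n// Download fetches url into dst. It does nothing if dst already exists.\nfunc Download(dst, url string) error {\n\tif _, err := os.Stat(dst); err == nil {\n\t\treturn nil // already cached\n\t} else if !os.IsNotExist(err) {\n\t\treturn err\n\t}\n\tresp, err := http.Get(url)\n\tif err != nil {\n\t\treturn err\n\t}\n\tdefer resp.Body.Close()\n\t// do not cache an error page as if it were the requested file\n\tif resp.StatusCode != http.StatusOK {\n\t\treturn fmt.Errorf(\"download %s: unexpected status %s\", url, resp.Status)\n\t}\n\tf, err := os.OpenFile(dst, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o644)\n\tif err != nil {\n\t\treturn err\n\t}\n\tif _, err := io.Copy(f, resp.Body); err != nil {\n\t\tf.Close()\n\t\tos.Remove(dst) // do not leave a partial file that would be treated as cached\n\t\treturn err\n\t}\n\treturn f.Close()\n}\n"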
  },
  {
    "path": "generator/utils/filter.go",
    "content": "package utils\n\nfunc Filter[T any](arr []T, fn func(e T) bool) []T {\n\tout := make([]T, 0, len(arr))\n\tfor _, e := range arr {\n\t\tif !fn(e) {\n\t\t\tcontinue\n\t\t}\n\t\tout = append(out, e)\n\t}\n\treturn out\n}\n"
  },
  {
    "path": "generator/utils/slice.go",
    "content": "package utils\n\nfunc Transform[A, B any](arr []A, fn func(i int, e A) B) []B {\n\tif arr == nil {\n\t\treturn nil\n\t}\n\tout := make([]B, len(arr))\n\tfor i, e := range arr {\n\t\tout[i] = fn(i, e)\n\t}\n\treturn out\n}\n\nfunc Merge[T any](lists ...[]T) []T {\n\tvar out []T\n\tfor _, l := range lists {\n\t\tout = append(out, l...)\n\t}\n\treturn out\n}\n"
  },
  {
    "path": "generator/writer/cgo.go",
    "content": "package writer\n\nimport (\n\t\"fmt\"\n\t\"strings\"\n)\n\nfunc Cflags(flags []string) string {\n\treturn fmt.Sprintf(\"#cgo CFLAGS: %s\", strings.Join(flags, \" \"))\n}\n\nfunc Includes(headers []string) []string {\n\tout := make([]string, len(headers))\n\tfor i, h := range headers {\n\t\tout[i] = fmt.Sprintf(\"#include <%s>\", h)\n\t}\n\treturn out\n}\n"
  },
  {
    "path": "generator/writer/function.go",
    "content": "package writer\n\nimport (\n\t\"fmt\"\n\t\"io\"\n\t\"strings\"\n\n\t\"github.com/alivanz/go-simd/generator/types\"\n\t\"github.com/alivanz/go-simd/generator/utils\"\n\t\"github.com/iancoleman/strcase\"\n)\n\nfunc DeclareFunc(w io.Writer, f types.Function, typePkg string) error {\n\tfmt.Fprintf(w, \"\\n\")\n\tif len(f.Comment) > 0 {\n\t\tfmt.Fprintf(w, \"// %s\\n\", f.Comment)\n\t} else {\n\t\tfmt.Fprintf(w, \"// %s\\n\", f.Name)\n\t}\n\t// if len(f.Attribute) > 0 {\n\t// \tfmt.Fprintf(w, \"// %s\\n\", f.Attribute)\n\t// }\n\tfmt.Fprintf(w, \"func %s(\", strcase.ToCamel(f.Name))\n\tfmt.Fprintf(w, \"%s\", strings.Join(utils.Transform(f.Args, func(i int, t types.Type) string {\n\t\treturn fmt.Sprintf(\"v%d %s\", i, t.Go(typePkg))\n\t}), \", \"))\n\tif f.Return == nil {\n\t\tfmt.Fprintf(w, \") {\\n\")\n\t} else {\n\t\tfmt.Fprintf(w, \") %s {\\n\", f.Return.Go(typePkg))\n\t}\n\tif f.Return == nil {\n\t\tfmt.Fprintf(w, \"\\tC.%s(%s)\\n\", f.Name, strings.Join(utils.Transform(f.Args, func(i int, t types.Type) string {\n\t\t\treturn fmt.Sprintf(\"v%d\", i)\n\t\t}), \", \"))\n\t} else {\n\t\tfmt.Fprintf(w, \"\\treturn C.%s(%s)\\n\", f.Name, strings.Join(utils.Transform(f.Args, func(i int, t types.Type) string {\n\t\t\treturn fmt.Sprintf(\"v%d\", i)\n\t\t}), \", \"))\n\t}\n\tfmt.Fprintf(w, \"}\\n\")\n\treturn nil\n}\n\nfunc DeclareFuncBypass(w io.Writer, f types.Function, typePkg string) error {\n\tfmt.Fprintf(w, \"\\n\")\n\tif len(f.Comment) > 0 {\n\t\tfmt.Fprintf(w, \"// %s\\n\", f.Comment)\n\t} else {\n\t\tfmt.Fprintf(w, \"// %s\\n\", f.Name)\n\t}\n\tfmt.Fprintf(w, \"//\\n\")\n\tfmt.Fprintf(w, \"//go:linkname %s %s\\n\", strcase.ToCamel(f.Name), strcase.ToCamel(f.Name))\n\tfmt.Fprintf(w, \"//go:noescape\\n\")\n\tfmt.Fprintf(w, \"func %s(\", strcase.ToCamel(f.Name))\n\tif f.Return != nil {\n\t\tfmt.Fprintf(w, \"r *%s, \", f.Return.Go(typePkg))\n\t}\n\tfmt.Fprintf(w, \"%s)\\n\", strings.Join(utils.Transform(f.Args, func(i int, t types.Type) string 
{\n\t\treturn fmt.Sprintf(\"v%d *%s\", i, t.Go(typePkg))\n\t}), \", \"))\n\treturn nil\n}\n\nfunc RewriteC(w io.Writer, f types.Function) error {\n\tvar cargs []string\n\tif f.Return != nil {\n\t\tcargs = append(cargs, fmt.Sprintf(\"%s* r\", f.Return.C()))\n\t}\n\tfor i, t := range f.Args {\n\t\tcargs = append(cargs, fmt.Sprintf(\"%s* v%d\", t.C(), i))\n\t}\n\tfmt.Fprintf(w, \"void %s(%s) { \",\n\t\tstrcase.ToCamel(f.Name),\n\t\tstrings.Join(cargs, \", \"),\n\t)\n\tif f.Return != nil {\n\t\tfmt.Fprintf(w, \"*r = \")\n\t}\n\tfmt.Fprintf(w, \"%s(%s); }\\n\", f.Name, strings.Join(utils.Transform(f.Args, func(i int, t types.Type) string {\n\t\treturn fmt.Sprintf(\"*v%d\", i)\n\t}), \", \"))\n\treturn nil\n}\n"
  },
  {
    "path": "generator/writer/package.go",
    "content": "package writer\n\nimport (\n\t\"fmt\"\n\t\"io\"\n\t\"strings\"\n\n\t\"github.com/alivanz/go-simd/generator/types\"\n)\n\nfunc Package(w io.Writer, pkg string) error {\n\t_, err := fmt.Fprintf(w, \"package %s\\n\", pkg)\n\treturn err\n}\n\nfunc Import(w io.Writer, pkgs []string) error {\n\tif len(pkgs) == 0 {\n\t\treturn nil\n\t}\n\t_, err := fmt.Fprintf(w, \"\\nimport (\\n\\t\\\"%s\\\"\\n)\\n\", strings.Join(pkgs, \"\\\"\\n\\t\\\"\"))\n\treturn err\n}\n\nfunc ImportC(w io.Writer, fn func(w io.Writer) error) error {\n\tif _, err := fmt.Fprintf(w, \"\\n/*\\n\"); err != nil {\n\t\treturn err\n\t}\n\tif err := fn(w); err != nil {\n\t\treturn err\n\t}\n\tif _, err := fmt.Fprintf(w, \"\\n*/\\nimport \\\"C\\\"\\n\"); err != nil {\n\t\treturn err\n\t}\n\treturn nil\n}\n\nfunc Types(w io.Writer, types []types.Type) error {\n\tfor _, t := range types {\n\t\tif t.Blacklisted() {\n\t\t\tcontinue\n\t\t}\n\t\tif err := DeclareType(w, t); err != nil {\n\t\t\treturn err\n\t\t}\n\t}\n\treturn nil\n}\n\nfunc Funcs(w io.Writer, funcs []types.Function, typePkg string) error {\n\tfor _, fn := range funcs {\n\t\tif fn.Blacklisted() {\n\t\t\tcontinue\n\t\t}\n\t\tif err := DeclareFunc(w, fn, typePkg); err != nil {\n\t\t\treturn err\n\t\t}\n\t}\n\treturn nil\n}\n"
  },
  {
    "path": "generator/writer/package_test.go",
    "content": "package writer\n\nimport (\n\t\"bytes\"\n\t\"io\"\n\t\"strings\"\n\t\"testing\"\n)\n\nfunc TestPackage(t *testing.T) {\n\tvar buf bytes.Buffer\n\tPackage(&buf, \"abc\")\n\tif buf.String() != \"package abc\\n\" {\n\t\tt.Fatal(buf.String())\n\t}\n}\n\nfunc TestImport(t *testing.T) {\n\tvar buf bytes.Buffer\n\tImport(&buf, []string{\n\t\t\"pkg1\",\n\t\t\"pkg2\",\n\t\t\"pkg3\",\n\t})\n\tif buf.String() != `\nimport (\n\t\"pkg1\"\n\t\"pkg2\"\n\t\"pkg3\"\n)\n` {\n\t\tt.Fatal(buf.String())\n\t}\n}\n\nfunc TestImportC(t *testing.T) {\n\tvar buf bytes.Buffer\n\tImportC(&buf, func(w io.Writer) error {\n\t\tio.WriteString(w, strings.Join([]string{\n\t\t\t`#include <abc.h>`,\n\t\t\t`#include <def.h>`,\n\t\t}, \"\\n\"))\n\t\treturn nil\n\t})\n\tref := `\n/*\n#include <abc.h>\n#include <def.h>\n*/\nimport \"C\"\n`\n\tif buf.String() != ref {\n\t\tt.Fatal(buf.String())\n\t}\n}\n"
  },
  {
    "path": "generator/writer/type.go",
    "content": "package writer\n\nimport (\n\t\"fmt\"\n\t\"io\"\n\n\t\"github.com/alivanz/go-simd/generator/types\"\n)\n\nfunc DeclareType(w io.Writer, t types.Type) error {\n\tvar err error\n\tif len(t.Full) > 0 {\n\t\t_, err = fmt.Fprintf(w, \"\\n// %s\\ntype %s = C.%s\\n\", t.Full, t.Go(\"\"), t.CGO())\n\t} else {\n\t\t_, err = fmt.Fprintf(w, \"\\ntype %s = C.%s\\n\", t.Go(\"\"), t.CGO())\n\t}\n\treturn err\n}\n"
  },
  {
    "path": "generator/writer/writer.go",
    "content": "package writer\n\nimport (\n\t\"io\"\n\t\"os\"\n\t\"path/filepath\"\n)\n\n// WriteToFile creates (or truncates) dst, creating parent directories as needed,\n// and passes the open file to fn. An empty dst is a no-op.\nfunc WriteToFile(dst string, fn func(w io.Writer) error) error {\n\tif len(dst) == 0 {\n\t\treturn nil\n\t}\n\tdst, err := filepath.Abs(dst)\n\tif err != nil {\n\t\treturn err\n\t}\n\tif err := os.MkdirAll(filepath.Dir(dst), os.ModePerm); err != nil {\n\t\treturn err\n\t}\n\tf, err := os.OpenFile(dst, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o644)\n\tif err != nil {\n\t\treturn err\n\t}\n\tif err := fn(f); err != nil {\n\t\tf.Close()\n\t\treturn err\n\t}\n\t// report close failures instead of silently dropping them\n\treturn f.Close()\n}\n"
  },
  {
    "path": "generator/x86/info.go",
    "content": "package main\n\nimport (\n\t\"bytes\"\n\t\"os\"\n\t\"regexp\"\n\n\t\"github.com/alivanz/go-simd/generator/utils\"\n)\n\ntype Intrinsic struct {\n\tName        string\n\tCpuID       string\n\tDescription string\n\tOperation   string\n}\n\nvar (\n\tregIntrinsic   = regexp.MustCompile(`<intrinsic .+?</intrinsic>`)\n\tregName        = regexp.MustCompile(`name=\"(.+?)\"`)\n\tregDescription = regexp.MustCompile(`<description>(.+?)</description>`)\n\tregCpuID       = regexp.MustCompile(`<CPUID>(.+?)</CPUID>`)\n)\n\nfunc GetIntrinsic() ([]*Intrinsic, error) {\n\tif err := utils.Download(\n\t\t\"data.xml\",\n\t\t\"https://www.intel.com/content/dam/develop/public/us/en/include/intrinsics-guide/data-3-6-6.xml\",\n\t); err != nil {\n\t\treturn nil, err\n\t}\n\traw, err := os.ReadFile(\"data.xml\")\n\tif err != nil {\n\t\treturn nil, err\n\t}\n\traw = bytes.ReplaceAll(raw, []byte(\"\\n\"), []byte(\"\"))\n\tintrins := regIntrinsic.FindAll(raw, -1)\n\tout := make([]*Intrinsic, len(intrins))\n\tfor i, part := range intrins {\n\t\tvar intrin Intrinsic\n\t\tif match := regName.FindSubmatch(part); match != nil {\n\t\t\tintrin.Name = string(match[1])\n\t\t}\n\t\tif match := regDescription.FindSubmatch(part); match != nil {\n\t\t\tintrin.Description = string(match[1])\n\t\t}\n\t\tif match := regCpuID.FindSubmatch(part); match != nil {\n\t\t\tintrin.CpuID = string(match[1])\n\t\t}\n\t\tout[i] = &intrin\n\t}\n\treturn out, nil\n}\n"
  },
  {
    "path": "generator/x86/main.go",
    "content": "package main\n\nimport (\n\t\"bytes\"\n\t\"fmt\"\n\t\"io\"\n\t\"log\"\n\t\"os\"\n\t\"os/exec\"\n\t\"regexp\"\n\t\"strings\"\n\n\t\"github.com/alivanz/go-simd/generator/scanner\"\n\t\"github.com/alivanz/go-simd/generator/types\"\n\t\"github.com/alivanz/go-simd/generator/utils\"\n\t\"github.com/alivanz/go-simd/generator/writer\"\n)\n\nvar (\n\tregComma = regexp.MustCompile(`\\s*,\\s*`)\n)\n\nfunc main() {\n\t// generate\n\tcmd := exec.Command(\"clang\", \"-march=native\", \"-E\", \"-\")\n\tcmd.Stdin = bytes.NewBufferString(strings.Join([]string{\n\t\t\"#include <immintrin.h>\",\n\t}, \"\\n\"))\n\tcmd.Stderr = os.Stderr\n\tsrc, err := cmd.Output()\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\t// raw\n\tif err := writer.WriteToFile(\"raw.h\", func(w io.Writer) error {\n\t\t_, err := w.Write(src)\n\t\treturn err\n\t}); err != nil {\n\t\tlog.Fatal(err)\n\t}\n\t// scan\n\tresult, err := scanner.Scan(src)\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\t// filter functions\n\tmfunc := make(map[string]bool)\n\tresult.Functions = utils.Filter(result.Functions, func(fn types.Function) bool {\n\t\tif mfunc[fn.Name] {\n\t\t\treturn false\n\t\t}\n\t\tif len(fn.Target()) == 0 {\n\t\t\treturn false\n\t\t}\n\t\tmfunc[fn.Name] = true\n\t\treturn true\n\t})\n\t// filter types\n\tmtype := make(map[string]bool)\n\tfor _, fn := range result.Functions {\n\t\tif fn.Return != nil {\n\t\t\tmtype[fn.Return.Name] = true\n\t\t\t// append type\n\t\t\tresult.Types = append(result.Types, *fn.Return)\n\t\t}\n\t\tfor _, arg := range fn.Args {\n\t\t\tmtype[arg.Name] = true\n\t\t\tresult.Types = append(result.Types, arg)\n\t\t}\n\t}\n\tresult.Types = utils.Filter(result.Types, func(t types.Type) bool {\n\t\tif !mtype[t.Name] {\n\t\t\treturn false\n\t\t}\n\t\t// remove dup\n\t\tdelete(mtype, t.Name)\n\t\treturn true\n\t})\n\t// types\n\tif err := writer.WriteToFile(\"types.go\", func(w io.Writer) error {\n\t\tif err := writer.Package(w, \"x86\"); err != nil {\n\t\t\treturn 
err\n\t\t}\n\t\tif err := writer.ImportC(w, func(w io.Writer) error {\n\t\t\t_, err := fmt.Fprintf(w, \"#include <immintrin.h>\")\n\t\t\treturn err\n\t\t}); err != nil {\n\t\t\treturn err\n\t\t}\n\t\tif err := writer.Types(w, result.Types); err != nil {\n\t\t\treturn err\n\t\t}\n\t\treturn nil\n\t}); err != nil {\n\t\tlog.Fatal(err)\n\t}\n\t// patch funcs\n\tintrins, err := GetIntrinsic()\n\tif err != nil {\n\t\tlog.Fatal(err)\n\t}\n\tmintrin := make(map[string]*Intrinsic)\n\tfor _, intrin := range intrins {\n\t\tmintrin[intrin.Name] = intrin\n\t}\n\tfor i, fn := range result.Functions {\n\t\tif intrin, found := mintrin[fn.Name]; found {\n\t\t\tresult.Functions[i].Comment = intrin.Description\n\t\t}\n\t}\n\t// group funcs by target\n\tmf := make(map[string][]types.Function)\n\tfor _, fn := range result.Functions {\n\t\ttarget := fn.Target()\n\t\tmf[target] = append(mf[target], fn)\n\t}\n\t// funcs\n\tfor target, funcs := range mf {\n\t\ttarget = regComma.ReplaceAllString(target, \"_\")\n\t\tcname := fmt.Sprintf(\"%s/functions.c\", target)\n\t\tfname := fmt.Sprintf(\"%s/functions.go\", target)\n\t\t// write C\n\t\tif err := writer.WriteToFile(cname, func(w io.Writer) error {\n\t\t\tif _, err := io.WriteString(w, \"#include <immintrin.h>\\n\\n\"); err != nil {\n\t\t\t\treturn err\n\t\t\t}\n\t\t\tfor _, fn := range funcs {\n\t\t\t\tif fn.Blacklisted() {\n\t\t\t\t\tcontinue\n\t\t\t\t}\n\t\t\t\tif err := writer.RewriteC(w, fn); err != nil {\n\t\t\t\t\treturn err\n\t\t\t\t}\n\t\t\t}\n\t\t\treturn nil\n\t\t}); err != nil {\n\t\t\tlog.Fatal(err)\n\t\t}\n\t\t// write Go\n\t\tif err := writer.WriteToFile(fname, func(w io.Writer) error {\n\t\t\tif err := writer.Package(w, target); err != nil {\n\t\t\t\treturn err\n\t\t\t}\n\t\t\tif err := writer.Import(w, []string{\n\t\t\t\t\"github.com/alivanz/go-simd/x86\",\n\t\t\t}); err != nil {\n\t\t\t\treturn err\n\t\t\t}\n\t\t\tif err := writer.ImportC(w, 
func(w io.Writer) error {\n\t\t\t\tfeats := strings.Split(target, \"_\")\n\t\t\t\tif len(feats) > 0 {\n\t\t\t\t\tfmt.Fprintf(w, \"#cgo CFLAGS: %s\\n\", strings.Join(utils.Transform(feats, func(i int, feat string) string {\n\t\t\t\t\t\treturn \"-m\" + feat\n\t\t\t\t\t}), \" \"))\n\t\t\t\t}\n\t\t\t\tfmt.Fprintf(w, \"#include <immintrin.h>\")\n\t\t\t\treturn err\n\t\t\t}); err != nil {\n\t\t\t\treturn err\n\t\t\t}\n\t\t\tfor _, fn := range funcs {\n\t\t\t\tif fn.Blacklisted() {\n\t\t\t\t\tcontinue\n\t\t\t\t}\n\t\t\t\tif err := writer.DeclareFuncBypass(w, fn, \"x86\"); err != nil {\n\t\t\t\t\treturn err\n\t\t\t\t}\n\t\t\t}\n\t\t\treturn nil\n\t\t}); err != nil {\n\t\t\tlog.Fatal(err)\n\t\t}\n\t}\n}\n"
  },
  {
    "path": "go.mod",
    "content": "module github.com/alivanz/go-simd\n\ngo 1.20\n\nrequire github.com/iancoleman/strcase v0.2.0\n"
  },
  {
    "path": "go.sum",
    "content": "github.com/iancoleman/strcase v0.2.0 h1:05I4QRnGpI0m37iZQRuskXh+w77mr6Z41lwQzuHLwW0=\ngithub.com/iancoleman/strcase v0.2.0/go.mod h1:iwCmte+B7n89clKwxIoIXy/HfoL7AsD47ZCWhYzw7ho=\n"
  },
  {
    "path": "x86/aes/functions.c",
    "content": "#include <immintrin.h>\n\nvoid MmAesencSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_aesenc_si128(*v0, *v1); }\nvoid MmAesenclastSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_aesenclast_si128(*v0, *v1); }\nvoid MmAesdecSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_aesdec_si128(*v0, *v1); }\nvoid MmAesdeclastSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_aesdeclast_si128(*v0, *v1); }\nvoid MmAesimcSi128(__m128i* r, __m128i* v0) { *r = _mm_aesimc_si128(*v0); }\n"
  },
  {
    "path": "x86/aes/functions.go",
    "content": "package aes\n\nimport (\n\t\"github.com/alivanz/go-simd/x86\"\n)\n\n/*\n#cgo CFLAGS: -maes\n#include <immintrin.h>\n*/\nimport \"C\"\n\n// Perform one round of an AES encryption flow on data (state) in \"a\" using the round key in \"RoundKey\", and store the result in \"dst\".\n//\n//go:linkname MmAesencSi128 MmAesencSi128\n//go:noescape\nfunc MmAesencSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Perform the last round of an AES encryption flow on data (state) in \"a\" using the round key in \"RoundKey\", and store the result in \"dst\".\n//\n//go:linkname MmAesenclastSi128 MmAesenclastSi128\n//go:noescape\nfunc MmAesenclastSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Perform one round of an AES decryption flow on data (state) in \"a\" using the round key in \"RoundKey\", and store the result in \"dst\".\n//\n//go:linkname MmAesdecSi128 MmAesdecSi128\n//go:noescape\nfunc MmAesdecSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Perform the last round of an AES decryption flow on data (state) in \"a\" using the round key in \"RoundKey\", and store the result in \"dst\".\n//\n//go:linkname MmAesdeclastSi128 MmAesdeclastSi128\n//go:noescape\nfunc MmAesdeclastSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Perform the InvMixColumns transformation on \"a\" and store the result in \"dst\".\n//\n//go:linkname MmAesimcSi128 MmAesimcSi128\n//go:noescape\nfunc MmAesimcSi128(r *x86.M128I, v0 *x86.M128I)\n"
  },
  {
    "path": "x86/avx/functions.c",
    "content": "#include <immintrin.h>\n\nvoid Mm256AddPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_add_pd(*v0, *v1); }\nvoid Mm256AddPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_add_ps(*v0, *v1); }\nvoid Mm256SubPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_sub_pd(*v0, *v1); }\nvoid Mm256SubPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_sub_ps(*v0, *v1); }\nvoid Mm256AddsubPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_addsub_pd(*v0, *v1); }\nvoid Mm256AddsubPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_addsub_ps(*v0, *v1); }\nvoid Mm256DivPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_div_pd(*v0, *v1); }\nvoid Mm256DivPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_div_ps(*v0, *v1); }\nvoid Mm256MaxPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_max_pd(*v0, *v1); }\nvoid Mm256MaxPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_max_ps(*v0, *v1); }\nvoid Mm256MinPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_min_pd(*v0, *v1); }\nvoid Mm256MinPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_min_ps(*v0, *v1); }\nvoid Mm256MulPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_mul_pd(*v0, *v1); }\nvoid Mm256MulPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_mul_ps(*v0, *v1); }\nvoid Mm256SqrtPd(__m256d* r, __m256d* v0) { *r = _mm256_sqrt_pd(*v0); }\nvoid Mm256SqrtPs(__m256* r, __m256* v0) { *r = _mm256_sqrt_ps(*v0); }\nvoid Mm256RsqrtPs(__m256* r, __m256* v0) { *r = _mm256_rsqrt_ps(*v0); }\nvoid Mm256RcpPs(__m256* r, __m256* v0) { *r = _mm256_rcp_ps(*v0); }\nvoid Mm256AndPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_and_pd(*v0, *v1); }\nvoid Mm256AndPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_and_ps(*v0, *v1); }\nvoid Mm256AndnotPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_andnot_pd(*v0, *v1); }\nvoid Mm256AndnotPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_andnot_ps(*v0, *v1); }\nvoid Mm256OrPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = 
_mm256_or_pd(*v0, *v1); }\nvoid Mm256OrPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_or_ps(*v0, *v1); }\nvoid Mm256XorPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_xor_pd(*v0, *v1); }\nvoid Mm256XorPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_xor_ps(*v0, *v1); }\nvoid Mm256HaddPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_hadd_pd(*v0, *v1); }\nvoid Mm256HaddPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_hadd_ps(*v0, *v1); }\nvoid Mm256HsubPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_hsub_pd(*v0, *v1); }\nvoid Mm256HsubPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_hsub_ps(*v0, *v1); }\nvoid MmPermutevarPd(__m128d* r, __m128d* v0, __m128i* v1) { *r = _mm_permutevar_pd(*v0, *v1); }\nvoid Mm256PermutevarPd(__m256d* r, __m256d* v0, __m256i* v1) { *r = _mm256_permutevar_pd(*v0, *v1); }\nvoid MmPermutevarPs(__m128* r, __m128* v0, __m128i* v1) { *r = _mm_permutevar_ps(*v0, *v1); }\nvoid Mm256PermutevarPs(__m256* r, __m256* v0, __m256i* v1) { *r = _mm256_permutevar_ps(*v0, *v1); }\nvoid Mm256BlendvPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_blendv_pd(*v0, *v1, *v2); }\nvoid Mm256BlendvPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_blendv_ps(*v0, *v1, *v2); }\nvoid Mm256Cvtepi32Pd(__m256d* r, __m128i* v0) { *r = _mm256_cvtepi32_pd(*v0); }\nvoid Mm256Cvtepi32Ps(__m256* r, __m256i* v0) { *r = _mm256_cvtepi32_ps(*v0); }\nvoid Mm256CvtpdPs(__m128* r, __m256d* v0) { *r = _mm256_cvtpd_ps(*v0); }\nvoid Mm256CvtpsEpi32(__m256i* r, __m256* v0) { *r = _mm256_cvtps_epi32(*v0); }\nvoid Mm256CvtpsPd(__m256d* r, __m128* v0) { *r = _mm256_cvtps_pd(*v0); }\nvoid Mm256CvttpdEpi32(__m128i* r, __m256d* v0) { *r = _mm256_cvttpd_epi32(*v0); }\nvoid Mm256CvtpdEpi32(__m128i* r, __m256d* v0) { *r = _mm256_cvtpd_epi32(*v0); }\nvoid Mm256CvttpsEpi32(__m256i* r, __m256* v0) { *r = _mm256_cvttps_epi32(*v0); }\nvoid Mm256CvtsdF64(double* r, __m256d* v0) { *r = _mm256_cvtsd_f64(*v0); }\nvoid 
Mm256Cvtsi256Si32(int* r, __m256i* v0) { *r = _mm256_cvtsi256_si32(*v0); }\nvoid Mm256CvtssF32(float* r, __m256* v0) { *r = _mm256_cvtss_f32(*v0); }\nvoid Mm256MovehdupPs(__m256* r, __m256* v0) { *r = _mm256_movehdup_ps(*v0); }\nvoid Mm256MoveldupPs(__m256* r, __m256* v0) { *r = _mm256_moveldup_ps(*v0); }\nvoid Mm256MovedupPd(__m256d* r, __m256d* v0) { *r = _mm256_movedup_pd(*v0); }\nvoid Mm256UnpackhiPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_unpackhi_pd(*v0, *v1); }\nvoid Mm256UnpackloPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_unpacklo_pd(*v0, *v1); }\nvoid Mm256UnpackhiPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_unpackhi_ps(*v0, *v1); }\nvoid Mm256UnpackloPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_unpacklo_ps(*v0, *v1); }\nvoid MmTestzPd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_testz_pd(*v0, *v1); }\nvoid MmTestcPd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_testc_pd(*v0, *v1); }\nvoid MmTestnzcPd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_testnzc_pd(*v0, *v1); }\nvoid MmTestzPs(int* r, __m128* v0, __m128* v1) { *r = _mm_testz_ps(*v0, *v1); }\nvoid MmTestcPs(int* r, __m128* v0, __m128* v1) { *r = _mm_testc_ps(*v0, *v1); }\nvoid MmTestnzcPs(int* r, __m128* v0, __m128* v1) { *r = _mm_testnzc_ps(*v0, *v1); }\nvoid Mm256TestzPd(int* r, __m256d* v0, __m256d* v1) { *r = _mm256_testz_pd(*v0, *v1); }\nvoid Mm256TestcPd(int* r, __m256d* v0, __m256d* v1) { *r = _mm256_testc_pd(*v0, *v1); }\nvoid Mm256TestnzcPd(int* r, __m256d* v0, __m256d* v1) { *r = _mm256_testnzc_pd(*v0, *v1); }\nvoid Mm256TestzPs(int* r, __m256* v0, __m256* v1) { *r = _mm256_testz_ps(*v0, *v1); }\nvoid Mm256TestcPs(int* r, __m256* v0, __m256* v1) { *r = _mm256_testc_ps(*v0, *v1); }\nvoid Mm256TestnzcPs(int* r, __m256* v0, __m256* v1) { *r = _mm256_testnzc_ps(*v0, *v1); }\nvoid Mm256TestzSi256(int* r, __m256i* v0, __m256i* v1) { *r = _mm256_testz_si256(*v0, *v1); }\nvoid Mm256TestcSi256(int* r, __m256i* v0, __m256i* v1) { *r = _mm256_testc_si256(*v0, *v1); 
}\nvoid Mm256TestnzcSi256(int* r, __m256i* v0, __m256i* v1) { *r = _mm256_testnzc_si256(*v0, *v1); }\nvoid Mm256MovemaskPd(int* r, __m256d* v0) { *r = _mm256_movemask_pd(*v0); }\nvoid Mm256MovemaskPs(int* r, __m256* v0) { *r = _mm256_movemask_ps(*v0); }\nvoid Mm256Zeroall() { _mm256_zeroall(); }\nvoid Mm256Zeroupper() { _mm256_zeroupper(); }\nvoid Mm256UndefinedPd(__m256d* r) { *r = _mm256_undefined_pd(); }\nvoid Mm256UndefinedPs(__m256* r) { *r = _mm256_undefined_ps(); }\nvoid Mm256UndefinedSi256(__m256i* r) { *r = _mm256_undefined_si256(); }\nvoid Mm256SetPd(__m256d* r, double* v0, double* v1, double* v2, double* v3) { *r = _mm256_set_pd(*v0, *v1, *v2, *v3); }\nvoid Mm256SetPs(__m256* r, float* v0, float* v1, float* v2, float* v3, float* v4, float* v5, float* v6, float* v7) { *r = _mm256_set_ps(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); }\nvoid Mm256SetEpi32(__m256i* r, int* v0, int* v1, int* v2, int* v3, int* v4, int* v5, int* v6, int* v7) { *r = _mm256_set_epi32(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); }\nvoid Mm256SetEpi16(__m256i* r, short* v0, short* v1, short* v2, short* v3, short* v4, short* v5, short* v6, short* v7, short* v8, short* v9, short* v10, short* v11, short* v12, short* v13, short* v14, short* v15) { *r = _mm256_set_epi16(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7, *v8, *v9, *v10, *v11, *v12, *v13, *v14, *v15); }\nvoid Mm256SetEpi8(__m256i* r, char* v0, char* v1, char* v2, char* v3, char* v4, char* v5, char* v6, char* v7, char* v8, char* v9, char* v10, char* v11, char* v12, char* v13, char* v14, char* v15, char* v16, char* v17, char* v18, char* v19, char* v20, char* v21, char* v22, char* v23, char* v24, char* v25, char* v26, char* v27, char* v28, char* v29, char* v30, char* v31) { *r = _mm256_set_epi8(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7, *v8, *v9, *v10, *v11, *v12, *v13, *v14, *v15, *v16, *v17, *v18, *v19, *v20, *v21, *v22, *v23, *v24, *v25, *v26, *v27, *v28, *v29, *v30, *v31); }\nvoid Mm256SetEpi64X(__m256i* r, long long* v0, long long* v1, long 
long* v2, long long* v3) { *r = _mm256_set_epi64x(*v0, *v1, *v2, *v3); }\nvoid Mm256SetrPd(__m256d* r, double* v0, double* v1, double* v2, double* v3) { *r = _mm256_setr_pd(*v0, *v1, *v2, *v3); }\nvoid Mm256SetrPs(__m256* r, float* v0, float* v1, float* v2, float* v3, float* v4, float* v5, float* v6, float* v7) { *r = _mm256_setr_ps(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); }\nvoid Mm256SetrEpi32(__m256i* r, int* v0, int* v1, int* v2, int* v3, int* v4, int* v5, int* v6, int* v7) { *r = _mm256_setr_epi32(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); }\nvoid Mm256SetrEpi16(__m256i* r, short* v0, short* v1, short* v2, short* v3, short* v4, short* v5, short* v6, short* v7, short* v8, short* v9, short* v10, short* v11, short* v12, short* v13, short* v14, short* v15) { *r = _mm256_setr_epi16(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7, *v8, *v9, *v10, *v11, *v12, *v13, *v14, *v15); }\nvoid Mm256SetrEpi8(__m256i* r, char* v0, char* v1, char* v2, char* v3, char* v4, char* v5, char* v6, char* v7, char* v8, char* v9, char* v10, char* v11, char* v12, char* v13, char* v14, char* v15, char* v16, char* v17, char* v18, char* v19, char* v20, char* v21, char* v22, char* v23, char* v24, char* v25, char* v26, char* v27, char* v28, char* v29, char* v30, char* v31) { *r = _mm256_setr_epi8(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7, *v8, *v9, *v10, *v11, *v12, *v13, *v14, *v15, *v16, *v17, *v18, *v19, *v20, *v21, *v22, *v23, *v24, *v25, *v26, *v27, *v28, *v29, *v30, *v31); }\nvoid Mm256SetrEpi64X(__m256i* r, long long* v0, long long* v1, long long* v2, long long* v3) { *r = _mm256_setr_epi64x(*v0, *v1, *v2, *v3); }\nvoid Mm256Set1Pd(__m256d* r, double* v0) { *r = _mm256_set1_pd(*v0); }\nvoid Mm256Set1Ps(__m256* r, float* v0) { *r = _mm256_set1_ps(*v0); }\nvoid Mm256Set1Epi32(__m256i* r, int* v0) { *r = _mm256_set1_epi32(*v0); }\nvoid Mm256Set1Epi16(__m256i* r, short* v0) { *r = _mm256_set1_epi16(*v0); }\nvoid Mm256Set1Epi8(__m256i* r, char* v0) { *r = _mm256_set1_epi8(*v0); }\nvoid 
Mm256Set1Epi64X(__m256i* r, long long* v0) { *r = _mm256_set1_epi64x(*v0); }\nvoid Mm256SetzeroPd(__m256d* r) { *r = _mm256_setzero_pd(); }\nvoid Mm256SetzeroPs(__m256* r) { *r = _mm256_setzero_ps(); }\nvoid Mm256SetzeroSi256(__m256i* r) { *r = _mm256_setzero_si256(); }\nvoid Mm256CastpdPs(__m256* r, __m256d* v0) { *r = _mm256_castpd_ps(*v0); }\nvoid Mm256CastpdSi256(__m256i* r, __m256d* v0) { *r = _mm256_castpd_si256(*v0); }\nvoid Mm256CastpsPd(__m256d* r, __m256* v0) { *r = _mm256_castps_pd(*v0); }\nvoid Mm256CastpsSi256(__m256i* r, __m256* v0) { *r = _mm256_castps_si256(*v0); }\nvoid Mm256Castsi256Ps(__m256* r, __m256i* v0) { *r = _mm256_castsi256_ps(*v0); }\nvoid Mm256Castsi256Pd(__m256d* r, __m256i* v0) { *r = _mm256_castsi256_pd(*v0); }\nvoid Mm256Castpd256Pd128(__m128d* r, __m256d* v0) { *r = _mm256_castpd256_pd128(*v0); }\nvoid Mm256Castps256Ps128(__m128* r, __m256* v0) { *r = _mm256_castps256_ps128(*v0); }\nvoid Mm256Castsi256Si128(__m128i* r, __m256i* v0) { *r = _mm256_castsi256_si128(*v0); }\nvoid Mm256Castpd128Pd256(__m256d* r, __m128d* v0) { *r = _mm256_castpd128_pd256(*v0); }\nvoid Mm256Castps128Ps256(__m256* r, __m128* v0) { *r = _mm256_castps128_ps256(*v0); }\nvoid Mm256Castsi128Si256(__m256i* r, __m128i* v0) { *r = _mm256_castsi128_si256(*v0); }\nvoid Mm256Zextpd128Pd256(__m256d* r, __m128d* v0) { *r = _mm256_zextpd128_pd256(*v0); }\nvoid Mm256Zextps128Ps256(__m256* r, __m128* v0) { *r = _mm256_zextps128_ps256(*v0); }\nvoid Mm256Zextsi128Si256(__m256i* r, __m128i* v0) { *r = _mm256_zextsi128_si256(*v0); }\nvoid Mm256SetM128(__m256* r, __m128* v0, __m128* v1) { *r = _mm256_set_m128(*v0, *v1); }\nvoid Mm256SetM128D(__m256d* r, __m128d* v0, __m128d* v1) { *r = _mm256_set_m128d(*v0, *v1); }\nvoid Mm256SetM128I(__m256i* r, __m128i* v0, __m128i* v1) { *r = _mm256_set_m128i(*v0, *v1); }\nvoid Mm256SetrM128(__m256* r, __m128* v0, __m128* v1) { *r = _mm256_setr_m128(*v0, *v1); }\nvoid Mm256SetrM128D(__m256d* r, __m128d* v0, __m128d* v1) { *r = 
_mm256_setr_m128d(*v0, *v1); }\nvoid Mm256SetrM128I(__m256i* r, __m128i* v0, __m128i* v1) { *r = _mm256_setr_m128i(*v0, *v1); }\n"
  },
  {
    "path": "x86/avx/functions.go",
    "content": "package avx\n\nimport (\n\t\"github.com/alivanz/go-simd/x86\"\n)\n\n/*\n#cgo CFLAGS: -mavx\n#include <immintrin.h>\n*/\nimport \"C\"\n\n// Add packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256AddPd Mm256AddPd\n//go:noescape\nfunc Mm256AddPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)\n\n// Add packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256AddPs Mm256AddPs\n//go:noescape\nfunc Mm256AddPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)\n\n// Subtract packed double-precision (64-bit) floating-point elements in \"b\" from packed double-precision (64-bit) floating-point elements in \"a\", and store the results in \"dst\".\n//\n//go:linkname Mm256SubPd Mm256SubPd\n//go:noescape\nfunc Mm256SubPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)\n\n// Subtract packed single-precision (32-bit) floating-point elements in \"b\" from packed single-precision (32-bit) floating-point elements in \"a\", and store the results in \"dst\".\n//\n//go:linkname Mm256SubPs Mm256SubPs\n//go:noescape\nfunc Mm256SubPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)\n\n// Alternatively add and subtract packed double-precision (64-bit) floating-point elements in \"a\" to/from packed elements in \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256AddsubPd Mm256AddsubPd\n//go:noescape\nfunc Mm256AddsubPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)\n\n// Alternatively add and subtract packed single-precision (32-bit) floating-point elements in \"a\" to/from packed elements in \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256AddsubPs Mm256AddsubPs\n//go:noescape\nfunc Mm256AddsubPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)\n\n// Divide packed double-precision (64-bit) floating-point elements in \"a\" by packed elements in \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256DivPd 
Mm256DivPd\n//go:noescape\nfunc Mm256DivPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)\n\n// Divide packed single-precision (32-bit) floating-point elements in \"a\" by packed elements in \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256DivPs Mm256DivPs\n//go:noescape\nfunc Mm256DivPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)\n\n// Compare packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", and store packed maximum values in \"dst\".\n//\n//go:linkname Mm256MaxPd Mm256MaxPd\n//go:noescape\nfunc Mm256MaxPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)\n\n// Compare packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", and store packed maximum values in \"dst\".\n//\n//go:linkname Mm256MaxPs Mm256MaxPs\n//go:noescape\nfunc Mm256MaxPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)\n\n// Compare packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", and store packed minimum values in \"dst\".\n//\n//go:linkname Mm256MinPd Mm256MinPd\n//go:noescape\nfunc Mm256MinPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)\n\n// Compare packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", and store packed minimum values in \"dst\". 
\n//\n//go:linkname Mm256MinPs Mm256MinPs\n//go:noescape\nfunc Mm256MinPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)\n\n// Multiply packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256MulPd Mm256MulPd\n//go:noescape\nfunc Mm256MulPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)\n\n// Multiply packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256MulPs Mm256MulPs\n//go:noescape\nfunc Mm256MulPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)\n\n// Compute the square root of packed double-precision (64-bit) floating-point elements in \"a\", and store the results in \"dst\".\n//\n//go:linkname Mm256SqrtPd Mm256SqrtPd\n//go:noescape\nfunc Mm256SqrtPd(r *x86.M256D, v0 *x86.M256D)\n\n// Compute the square root of packed single-precision (32-bit) floating-point elements in \"a\", and store the results in \"dst\".\n//\n//go:linkname Mm256SqrtPs Mm256SqrtPs\n//go:noescape\nfunc Mm256SqrtPs(r *x86.M256, v0 *x86.M256)\n\n// Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in \"a\", and store the results in \"dst\". The maximum relative error for this approximation is less than 1.5*2^-12.\n//\n//go:linkname Mm256RsqrtPs Mm256RsqrtPs\n//go:noescape\nfunc Mm256RsqrtPs(r *x86.M256, v0 *x86.M256)\n\n// Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in \"a\", and store the results in \"dst\". 
The maximum relative error for this approximation is less than 1.5*2^-12.\n//\n//go:linkname Mm256RcpPs Mm256RcpPs\n//go:noescape\nfunc Mm256RcpPs(r *x86.M256, v0 *x86.M256)\n\n// Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256AndPd Mm256AndPd\n//go:noescape\nfunc Mm256AndPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)\n\n// Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256AndPs Mm256AndPs\n//go:noescape\nfunc Mm256AndPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)\n\n// Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in \"a\" and then AND with \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256AndnotPd Mm256AndnotPd\n//go:noescape\nfunc Mm256AndnotPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)\n\n// Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in \"a\" and then AND with \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256AndnotPs Mm256AndnotPs\n//go:noescape\nfunc Mm256AndnotPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)\n\n// Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256OrPd Mm256OrPd\n//go:noescape\nfunc Mm256OrPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)\n\n// Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256OrPs Mm256OrPs\n//go:noescape\nfunc Mm256OrPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)\n\n// Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256XorPd Mm256XorPd\n//go:noescape\nfunc Mm256XorPd(r *x86.M256D, 
v0 *x86.M256D, v1 *x86.M256D)\n\n// Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256XorPs Mm256XorPs\n//go:noescape\nfunc Mm256XorPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)\n\n// Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in \"a\" and \"b\", and pack the results in \"dst\".\n//\n//go:linkname Mm256HaddPd Mm256HaddPd\n//go:noescape\nfunc Mm256HaddPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)\n\n// Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in \"a\" and \"b\", and pack the results in \"dst\".\n//\n//go:linkname Mm256HaddPs Mm256HaddPs\n//go:noescape\nfunc Mm256HaddPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)\n\n// Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements in \"a\" and \"b\", and pack the results in \"dst\".\n//\n//go:linkname Mm256HsubPd Mm256HsubPd\n//go:noescape\nfunc Mm256HsubPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)\n\n// Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point elements in \"a\" and \"b\", and pack the results in \"dst\".\n//\n//go:linkname Mm256HsubPs Mm256HsubPs\n//go:noescape\nfunc Mm256HsubPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)\n\n// Shuffle double-precision (64-bit) floating-point elements in \"a\" using the control in \"b\", and store the results in \"dst\".\n//\n//go:linkname MmPermutevarPd MmPermutevarPd\n//go:noescape\nfunc MmPermutevarPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128I)\n\n// Shuffle double-precision (64-bit) floating-point elements in \"a\" within 128-bit lanes using the control in \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256PermutevarPd Mm256PermutevarPd\n//go:noescape\nfunc Mm256PermutevarPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256I)\n\n// Shuffle single-precision (32-bit) floating-point elements in \"a\" using the control in 
\"b\", and store the results in \"dst\".\n//\n//go:linkname MmPermutevarPs MmPermutevarPs\n//go:noescape\nfunc MmPermutevarPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128I)\n\n// Shuffle single-precision (32-bit) floating-point elements in \"a\" within 128-bit lanes using the control in \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256PermutevarPs Mm256PermutevarPs\n//go:noescape\nfunc Mm256PermutevarPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256I)\n\n// Blend packed double-precision (64-bit) floating-point elements from \"a\" and \"b\" using \"mask\", and store the results in \"dst\".\n//\n//go:linkname Mm256BlendvPd Mm256BlendvPd\n//go:noescape\nfunc Mm256BlendvPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D)\n\n// Blend packed single-precision (32-bit) floating-point elements from \"a\" and \"b\" using \"mask\", and store the results in \"dst\".\n//\n//go:linkname Mm256BlendvPs Mm256BlendvPs\n//go:noescape\nfunc Mm256BlendvPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256)\n\n// Convert packed signed 32-bit integers in \"a\" to packed double-precision (64-bit) floating-point elements, and store the results in \"dst\".\n//\n//go:linkname Mm256Cvtepi32Pd Mm256Cvtepi32Pd\n//go:noescape\nfunc Mm256Cvtepi32Pd(r *x86.M256D, v0 *x86.M128I)\n\n// Convert packed signed 32-bit integers in \"a\" to packed single-precision (32-bit) floating-point elements, and store the results in \"dst\".\n//\n//go:linkname Mm256Cvtepi32Ps Mm256Cvtepi32Ps\n//go:noescape\nfunc Mm256Cvtepi32Ps(r *x86.M256, v0 *x86.M256I)\n\n// Convert packed double-precision (64-bit) floating-point elements in \"a\" to packed single-precision (32-bit) floating-point elements, and store the results in \"dst\".\n//\n//go:linkname Mm256CvtpdPs Mm256CvtpdPs\n//go:noescape\nfunc Mm256CvtpdPs(r *x86.M128, v0 *x86.M256D)\n\n// Convert packed single-precision (32-bit) floating-point elements in \"a\" to packed 32-bit integers, and store the results in \"dst\".\n//\n//go:linkname 
Mm256CvtpsEpi32 Mm256CvtpsEpi32\n//go:noescape\nfunc Mm256CvtpsEpi32(r *x86.M256I, v0 *x86.M256)\n\n// Convert packed single-precision (32-bit) floating-point elements in \"a\" to packed double-precision (64-bit) floating-point elements, and store the results in \"dst\".\n//\n//go:linkname Mm256CvtpsPd Mm256CvtpsPd\n//go:noescape\nfunc Mm256CvtpsPd(r *x86.M256D, v0 *x86.M128)\n\n// Convert packed double-precision (64-bit) floating-point elements in \"a\" to packed 32-bit integers with truncation, and store the results in \"dst\".\n//\n//go:linkname Mm256CvttpdEpi32 Mm256CvttpdEpi32\n//go:noescape\nfunc Mm256CvttpdEpi32(r *x86.M128I, v0 *x86.M256D)\n\n// Convert packed double-precision (64-bit) floating-point elements in \"a\" to packed 32-bit integers, and store the results in \"dst\".\n//\n//go:linkname Mm256CvtpdEpi32 Mm256CvtpdEpi32\n//go:noescape\nfunc Mm256CvtpdEpi32(r *x86.M128I, v0 *x86.M256D)\n\n// Convert packed single-precision (32-bit) floating-point elements in \"a\" to packed 32-bit integers with truncation, and store the results in \"dst\".\n//\n//go:linkname Mm256CvttpsEpi32 Mm256CvttpsEpi32\n//go:noescape\nfunc Mm256CvttpsEpi32(r *x86.M256I, v0 *x86.M256)\n\n// Copy the lower double-precision (64-bit) floating-point element of \"a\" to \"dst\".\n//\n//go:linkname Mm256CvtsdF64 Mm256CvtsdF64\n//go:noescape\nfunc Mm256CvtsdF64(r *x86.Double, v0 *x86.M256D)\n\n// Copy the lower 32-bit integer in \"a\" to \"dst\".\n//\n//go:linkname Mm256Cvtsi256Si32 Mm256Cvtsi256Si32\n//go:noescape\nfunc Mm256Cvtsi256Si32(r *x86.Int, v0 *x86.M256I)\n\n// Copy the lower single-precision (32-bit) floating-point element of \"a\" to \"dst\".\n//\n//go:linkname Mm256CvtssF32 Mm256CvtssF32\n//go:noescape\nfunc Mm256CvtssF32(r *x86.Float, v0 *x86.M256)\n\n// Duplicate odd-indexed single-precision (32-bit) floating-point elements from \"a\", and store the results in \"dst\".\n//\n//go:linkname Mm256MovehdupPs Mm256MovehdupPs\n//go:noescape\nfunc Mm256MovehdupPs(r *x86.M256, v0 
*x86.M256)\n\n// Duplicate even-indexed single-precision (32-bit) floating-point elements from \"a\", and store the results in \"dst\".\n//\n//go:linkname Mm256MoveldupPs Mm256MoveldupPs\n//go:noescape\nfunc Mm256MoveldupPs(r *x86.M256, v0 *x86.M256)\n\n// Duplicate even-indexed double-precision (64-bit) floating-point elements from \"a\", and store the results in \"dst\".\n//\n//go:linkname Mm256MovedupPd Mm256MovedupPd\n//go:noescape\nfunc Mm256MovedupPd(r *x86.M256D, v0 *x86.M256D)\n\n// Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256UnpackhiPd Mm256UnpackhiPd\n//go:noescape\nfunc Mm256UnpackhiPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)\n\n// Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256UnpackloPd Mm256UnpackloPd\n//go:noescape\nfunc Mm256UnpackloPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)\n\n// Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256UnpackhiPs Mm256UnpackhiPs\n//go:noescape\nfunc Mm256UnpackhiPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)\n\n// Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256UnpackloPs Mm256UnpackloPs\n//go:noescape\nfunc Mm256UnpackloPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)\n\n// Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in \"a\" and \"b\", producing an intermediate 128-bit value, and set \"ZF\" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set \"ZF\" to 0. 
Compute the bitwise NOT of \"a\" and then AND with \"b\", producing an intermediate value, and set \"CF\" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set \"CF\" to 0. Return the \"ZF\" value.\n//\n//go:linkname MmTestzPd MmTestzPd\n//go:noescape\nfunc MmTestzPd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in \"a\" and \"b\", producing an intermediate 128-bit value, and set \"ZF\" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set \"ZF\" to 0. Compute the bitwise NOT of \"a\" and then AND with \"b\", producing an intermediate value, and set \"CF\" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set \"CF\" to 0. Return the \"CF\" value.\n//\n//go:linkname MmTestcPd MmTestcPd\n//go:noescape\nfunc MmTestcPd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in \"a\" and \"b\", producing an intermediate 128-bit value, and set \"ZF\" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set \"ZF\" to 0. Compute the bitwise NOT of \"a\" and then AND with \"b\", producing an intermediate value, and set \"CF\" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set \"CF\" to 0. Return 1 if both the \"ZF\" and \"CF\" values are zero, otherwise return 0.\n//\n//go:linkname MmTestnzcPd MmTestnzcPd\n//go:noescape\nfunc MmTestnzcPd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in \"a\" and \"b\", producing an intermediate 128-bit value, and set \"ZF\" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set \"ZF\" to 0. 
Compute the bitwise NOT of \"a\" and then AND with \"b\", producing an intermediate value, and set \"CF\" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set \"CF\" to 0. Return the \"ZF\" value.\n//\n//go:linkname MmTestzPs MmTestzPs\n//go:noescape\nfunc MmTestzPs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)\n\n// Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in \"a\" and \"b\", producing an intermediate 128-bit value, and set \"ZF\" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set \"ZF\" to 0. Compute the bitwise NOT of \"a\" and then AND with \"b\", producing an intermediate value, and set \"CF\" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set \"CF\" to 0. Return the \"CF\" value.\n//\n//go:linkname MmTestcPs MmTestcPs\n//go:noescape\nfunc MmTestcPs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)\n\n// Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in \"a\" and \"b\", producing an intermediate 128-bit value, and set \"ZF\" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set \"ZF\" to 0. Compute the bitwise NOT of \"a\" and then AND with \"b\", producing an intermediate value, and set \"CF\" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set \"CF\" to 0. Return 1 if both the \"ZF\" and \"CF\" values are zero, otherwise return 0.\n//\n//go:linkname MmTestnzcPs MmTestnzcPs\n//go:noescape\nfunc MmTestnzcPs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)\n\n// Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in \"a\" and \"b\", producing an intermediate 256-bit value, and set \"ZF\" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set \"ZF\" to 0. 
Compute the bitwise NOT of \"a\" and then AND with \"b\", producing an intermediate value, and set \"CF\" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set \"CF\" to 0. Return the \"ZF\" value.\n//\n//go:linkname Mm256TestzPd Mm256TestzPd\n//go:noescape\nfunc Mm256TestzPd(r *x86.Int, v0 *x86.M256D, v1 *x86.M256D)\n\n// Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in \"a\" and \"b\", producing an intermediate 256-bit value, and set \"ZF\" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set \"ZF\" to 0. Compute the bitwise NOT of \"a\" and then AND with \"b\", producing an intermediate value, and set \"CF\" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set \"CF\" to 0. Return the \"CF\" value.\n//\n//go:linkname Mm256TestcPd Mm256TestcPd\n//go:noescape\nfunc Mm256TestcPd(r *x86.Int, v0 *x86.M256D, v1 *x86.M256D)\n\n// Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in \"a\" and \"b\", producing an intermediate 256-bit value, and set \"ZF\" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set \"ZF\" to 0. Compute the bitwise NOT of \"a\" and then AND with \"b\", producing an intermediate value, and set \"CF\" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set \"CF\" to 0. Return 1 if both the \"ZF\" and \"CF\" values are zero, otherwise return 0.\n//\n//go:linkname Mm256TestnzcPd Mm256TestnzcPd\n//go:noescape\nfunc Mm256TestnzcPd(r *x86.Int, v0 *x86.M256D, v1 *x86.M256D)\n\n// Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in \"a\" and \"b\", producing an intermediate 256-bit value, and set \"ZF\" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set \"ZF\" to 0. 
Compute the bitwise NOT of \"a\" and then AND with \"b\", producing an intermediate value, and set \"CF\" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set \"CF\" to 0. Return the \"ZF\" value.\n//\n//go:linkname Mm256TestzPs Mm256TestzPs\n//go:noescape\nfunc Mm256TestzPs(r *x86.Int, v0 *x86.M256, v1 *x86.M256)\n\n// Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in \"a\" and \"b\", producing an intermediate 256-bit value, and set \"ZF\" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set \"ZF\" to 0. Compute the bitwise NOT of \"a\" and then AND with \"b\", producing an intermediate value, and set \"CF\" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set \"CF\" to 0. Return the \"CF\" value.\n//\n//go:linkname Mm256TestcPs Mm256TestcPs\n//go:noescape\nfunc Mm256TestcPs(r *x86.Int, v0 *x86.M256, v1 *x86.M256)\n\n// Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in \"a\" and \"b\", producing an intermediate 256-bit value, and set \"ZF\" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set \"ZF\" to 0. Compute the bitwise NOT of \"a\" and then AND with \"b\", producing an intermediate value, and set \"CF\" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set \"CF\" to 0. Return 1 if both the \"ZF\" and \"CF\" values are zero, otherwise return 0.\n//\n//go:linkname Mm256TestnzcPs Mm256TestnzcPs\n//go:noescape\nfunc Mm256TestnzcPs(r *x86.Int, v0 *x86.M256, v1 *x86.M256)\n\n// Compute the bitwise AND of 256 bits (representing integer data) in \"a\" and \"b\", and set \"ZF\" to 1 if the result is zero, otherwise set \"ZF\" to 0. Compute the bitwise NOT of \"a\" and then AND with \"b\", and set \"CF\" to 1 if the result is zero, otherwise set \"CF\" to 0. 
Return the \"ZF\" value.\n//\n//go:linkname Mm256TestzSi256 Mm256TestzSi256\n//go:noescape\nfunc Mm256TestzSi256(r *x86.Int, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compute the bitwise AND of 256 bits (representing integer data) in \"a\" and \"b\", and set \"ZF\" to 1 if the result is zero, otherwise set \"ZF\" to 0. Compute the bitwise NOT of \"a\" and then AND with \"b\", and set \"CF\" to 1 if the result is zero, otherwise set \"CF\" to 0. Return the \"CF\" value.\n//\n//go:linkname Mm256TestcSi256 Mm256TestcSi256\n//go:noescape\nfunc Mm256TestcSi256(r *x86.Int, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compute the bitwise AND of 256 bits (representing integer data) in \"a\" and \"b\", and set \"ZF\" to 1 if the result is zero, otherwise set \"ZF\" to 0. Compute the bitwise NOT of \"a\" and then AND with \"b\", and set \"CF\" to 1 if the result is zero, otherwise set \"CF\" to 0. Return 1 if both the \"ZF\" and \"CF\" values are zero, otherwise return 0.\n//\n//go:linkname Mm256TestnzcSi256 Mm256TestnzcSi256\n//go:noescape\nfunc Mm256TestnzcSi256(r *x86.Int, v0 *x86.M256I, v1 *x86.M256I)\n\n// Set each bit of mask \"dst\" based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in \"a\".\n//\n//go:linkname Mm256MovemaskPd Mm256MovemaskPd\n//go:noescape\nfunc Mm256MovemaskPd(r *x86.Int, v0 *x86.M256D)\n\n// Set each bit of mask \"dst\" based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in \"a\".\n//\n//go:linkname Mm256MovemaskPs Mm256MovemaskPs\n//go:noescape\nfunc Mm256MovemaskPs(r *x86.Int, v0 *x86.M256)\n\n// Zero the contents of all XMM or YMM registers.\n//\n//go:linkname Mm256Zeroall Mm256Zeroall\n//go:noescape\nfunc Mm256Zeroall()\n\n// Zero the upper 128 bits of all YMM registers; the lower 128 bits of the registers are unmodified.\n//\n//go:linkname Mm256Zeroupper Mm256Zeroupper\n//go:noescape\nfunc Mm256Zeroupper()\n\n// Return vector of type __m256d 
with undefined elements.\n//\n//go:linkname Mm256UndefinedPd Mm256UndefinedPd\n//go:noescape\nfunc Mm256UndefinedPd(r *x86.M256D)\n\n// Return vector of type __m256 with undefined elements.\n//\n//go:linkname Mm256UndefinedPs Mm256UndefinedPs\n//go:noescape\nfunc Mm256UndefinedPs(r *x86.M256)\n\n// Return vector of type __m256i with undefined elements.\n//\n//go:linkname Mm256UndefinedSi256 Mm256UndefinedSi256\n//go:noescape\nfunc Mm256UndefinedSi256(r *x86.M256I)\n\n// Set packed double-precision (64-bit) floating-point elements in \"dst\" with the supplied values.\n//\n//go:linkname Mm256SetPd Mm256SetPd\n//go:noescape\nfunc Mm256SetPd(r *x86.M256D, v0 *x86.Double, v1 *x86.Double, v2 *x86.Double, v3 *x86.Double)\n\n// Set packed single-precision (32-bit) floating-point elements in \"dst\" with the supplied values.\n//\n//go:linkname Mm256SetPs Mm256SetPs\n//go:noescape\nfunc Mm256SetPs(r *x86.M256, v0 *x86.Float, v1 *x86.Float, v2 *x86.Float, v3 *x86.Float, v4 *x86.Float, v5 *x86.Float, v6 *x86.Float, v7 *x86.Float)\n\n// Set packed 32-bit integers in \"dst\" with the supplied values.\n//\n//go:linkname Mm256SetEpi32 Mm256SetEpi32\n//go:noescape\nfunc Mm256SetEpi32(r *x86.M256I, v0 *x86.Int, v1 *x86.Int, v2 *x86.Int, v3 *x86.Int, v4 *x86.Int, v5 *x86.Int, v6 *x86.Int, v7 *x86.Int)\n\n// Set packed 16-bit integers in \"dst\" with the supplied values.\n//\n//go:linkname Mm256SetEpi16 Mm256SetEpi16\n//go:noescape\nfunc Mm256SetEpi16(r *x86.M256I, v0 *x86.Short, v1 *x86.Short, v2 *x86.Short, v3 *x86.Short, v4 *x86.Short, v5 *x86.Short, v6 *x86.Short, v7 *x86.Short, v8 *x86.Short, v9 *x86.Short, v10 *x86.Short, v11 *x86.Short, v12 *x86.Short, v13 *x86.Short, v14 *x86.Short, v15 *x86.Short)\n\n// Set packed 8-bit integers in \"dst\" with the supplied values.\n//\n//go:linkname Mm256SetEpi8 Mm256SetEpi8\n//go:noescape\nfunc Mm256SetEpi8(r *x86.M256I, v0 *x86.Char, v1 *x86.Char, v2 *x86.Char, v3 *x86.Char, v4 *x86.Char, v5 *x86.Char, v6 *x86.Char, v7 *x86.Char, v8 
*x86.Char, v9 *x86.Char, v10 *x86.Char, v11 *x86.Char, v12 *x86.Char, v13 *x86.Char, v14 *x86.Char, v15 *x86.Char, v16 *x86.Char, v17 *x86.Char, v18 *x86.Char, v19 *x86.Char, v20 *x86.Char, v21 *x86.Char, v22 *x86.Char, v23 *x86.Char, v24 *x86.Char, v25 *x86.Char, v26 *x86.Char, v27 *x86.Char, v28 *x86.Char, v29 *x86.Char, v30 *x86.Char, v31 *x86.Char)\n\n// Set packed 64-bit integers in \"dst\" with the supplied values.\n//\n//go:linkname Mm256SetEpi64X Mm256SetEpi64X\n//go:noescape\nfunc Mm256SetEpi64X(r *x86.M256I, v0 *x86.Longlong, v1 *x86.Longlong, v2 *x86.Longlong, v3 *x86.Longlong)\n\n// Set packed double-precision (64-bit) floating-point elements in \"dst\" with the supplied values in reverse order.\n//\n//go:linkname Mm256SetrPd Mm256SetrPd\n//go:noescape\nfunc Mm256SetrPd(r *x86.M256D, v0 *x86.Double, v1 *x86.Double, v2 *x86.Double, v3 *x86.Double)\n\n// Set packed single-precision (32-bit) floating-point elements in \"dst\" with the supplied values in reverse order.\n//\n//go:linkname Mm256SetrPs Mm256SetrPs\n//go:noescape\nfunc Mm256SetrPs(r *x86.M256, v0 *x86.Float, v1 *x86.Float, v2 *x86.Float, v3 *x86.Float, v4 *x86.Float, v5 *x86.Float, v6 *x86.Float, v7 *x86.Float)\n\n// Set packed 32-bit integers in \"dst\" with the supplied values in reverse order.\n//\n//go:linkname Mm256SetrEpi32 Mm256SetrEpi32\n//go:noescape\nfunc Mm256SetrEpi32(r *x86.M256I, v0 *x86.Int, v1 *x86.Int, v2 *x86.Int, v3 *x86.Int, v4 *x86.Int, v5 *x86.Int, v6 *x86.Int, v7 *x86.Int)\n\n// Set packed 16-bit integers in \"dst\" with the supplied values in reverse order.\n//\n//go:linkname Mm256SetrEpi16 Mm256SetrEpi16\n//go:noescape\nfunc Mm256SetrEpi16(r *x86.M256I, v0 *x86.Short, v1 *x86.Short, v2 *x86.Short, v3 *x86.Short, v4 *x86.Short, v5 *x86.Short, v6 *x86.Short, v7 *x86.Short, v8 *x86.Short, v9 *x86.Short, v10 *x86.Short, v11 *x86.Short, v12 *x86.Short, v13 *x86.Short, v14 *x86.Short, v15 *x86.Short)\n\n// Set packed 8-bit integers in \"dst\" with the supplied values in 
reverse order.\n//\n//go:linkname Mm256SetrEpi8 Mm256SetrEpi8\n//go:noescape\nfunc Mm256SetrEpi8(r *x86.M256I, v0 *x86.Char, v1 *x86.Char, v2 *x86.Char, v3 *x86.Char, v4 *x86.Char, v5 *x86.Char, v6 *x86.Char, v7 *x86.Char, v8 *x86.Char, v9 *x86.Char, v10 *x86.Char, v11 *x86.Char, v12 *x86.Char, v13 *x86.Char, v14 *x86.Char, v15 *x86.Char, v16 *x86.Char, v17 *x86.Char, v18 *x86.Char, v19 *x86.Char, v20 *x86.Char, v21 *x86.Char, v22 *x86.Char, v23 *x86.Char, v24 *x86.Char, v25 *x86.Char, v26 *x86.Char, v27 *x86.Char, v28 *x86.Char, v29 *x86.Char, v30 *x86.Char, v31 *x86.Char)\n\n// Set packed 64-bit integers in \"dst\" with the supplied values in reverse order.\n//\n//go:linkname Mm256SetrEpi64X Mm256SetrEpi64X\n//go:noescape\nfunc Mm256SetrEpi64X(r *x86.M256I, v0 *x86.Longlong, v1 *x86.Longlong, v2 *x86.Longlong, v3 *x86.Longlong)\n\n// Broadcast double-precision (64-bit) floating-point value \"a\" to all elements of \"dst\".\n//\n//go:linkname Mm256Set1Pd Mm256Set1Pd\n//go:noescape\nfunc Mm256Set1Pd(r *x86.M256D, v0 *x86.Double)\n\n// Broadcast single-precision (32-bit) floating-point value \"a\" to all elements of \"dst\".\n//\n//go:linkname Mm256Set1Ps Mm256Set1Ps\n//go:noescape\nfunc Mm256Set1Ps(r *x86.M256, v0 *x86.Float)\n\n// Broadcast 32-bit integer \"a\" to all elements of \"dst\". This intrinsic may generate the \"vpbroadcastd\" instruction.\n//\n//go:linkname Mm256Set1Epi32 Mm256Set1Epi32\n//go:noescape\nfunc Mm256Set1Epi32(r *x86.M256I, v0 *x86.Int)\n\n// Broadcast 16-bit integer \"a\" to all elements of \"dst\". This intrinsic may generate the \"vpbroadcastw\" instruction.\n//\n//go:linkname Mm256Set1Epi16 Mm256Set1Epi16\n//go:noescape\nfunc Mm256Set1Epi16(r *x86.M256I, v0 *x86.Short)\n\n// Broadcast 8-bit integer \"a\" to all elements of \"dst\". This intrinsic may generate the \"vpbroadcastb\" instruction.\n//\n//go:linkname Mm256Set1Epi8 Mm256Set1Epi8\n//go:noescape\nfunc Mm256Set1Epi8(r *x86.M256I, v0 *x86.Char)\n\n// Broadcast 64-bit integer \"a\" to all elements of \"dst\". 
This intrinsic may generate the \"vpbroadcastq\" instruction.\n//\n//go:linkname Mm256Set1Epi64X Mm256Set1Epi64X\n//go:noescape\nfunc Mm256Set1Epi64X(r *x86.M256I, v0 *x86.Longlong)\n\n// Return vector of type __m256d with all elements set to zero.\n//\n//go:linkname Mm256SetzeroPd Mm256SetzeroPd\n//go:noescape\nfunc Mm256SetzeroPd(r *x86.M256D)\n\n// Return vector of type __m256 with all elements set to zero.\n//\n//go:linkname Mm256SetzeroPs Mm256SetzeroPs\n//go:noescape\nfunc Mm256SetzeroPs(r *x86.M256)\n\n// Return vector of type __m256i with all elements set to zero.\n//\n//go:linkname Mm256SetzeroSi256 Mm256SetzeroSi256\n//go:noescape\nfunc Mm256SetzeroSi256(r *x86.M256I)\n\n// Cast vector of type __m256d to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname Mm256CastpdPs Mm256CastpdPs\n//go:noescape\nfunc Mm256CastpdPs(r *x86.M256, v0 *x86.M256D)\n\n// Cast vector of type __m256d to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname Mm256CastpdSi256 Mm256CastpdSi256\n//go:noescape\nfunc Mm256CastpdSi256(r *x86.M256I, v0 *x86.M256D)\n\n// Cast vector of type __m256 to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname Mm256CastpsPd Mm256CastpsPd\n//go:noescape\nfunc Mm256CastpsPd(r *x86.M256D, v0 *x86.M256)\n\n// Cast vector of type __m256 to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname Mm256CastpsSi256 Mm256CastpsSi256\n//go:noescape\nfunc Mm256CastpsSi256(r *x86.M256I, v0 *x86.M256)\n\n// Cast vector of type __m256i to type __m256. 
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname Mm256Castsi256Ps Mm256Castsi256Ps\n//go:noescape\nfunc Mm256Castsi256Ps(r *x86.M256, v0 *x86.M256I)\n\n// Cast vector of type __m256i to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname Mm256Castsi256Pd Mm256Castsi256Pd\n//go:noescape\nfunc Mm256Castsi256Pd(r *x86.M256D, v0 *x86.M256I)\n\n// Cast vector of type __m256d to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname Mm256Castpd256Pd128 Mm256Castpd256Pd128\n//go:noescape\nfunc Mm256Castpd256Pd128(r *x86.M128D, v0 *x86.M256D)\n\n// Cast vector of type __m256 to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname Mm256Castps256Ps128 Mm256Castps256Ps128\n//go:noescape\nfunc Mm256Castps256Ps128(r *x86.M128, v0 *x86.M256)\n\n// Cast vector of type __m256i to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname Mm256Castsi256Si128 Mm256Castsi256Si128\n//go:noescape\nfunc Mm256Castsi256Si128(r *x86.M128I, v0 *x86.M256I)\n\n// Cast vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname Mm256Castpd128Pd256 Mm256Castpd128Pd256\n//go:noescape\nfunc Mm256Castpd128Pd256(r *x86.M256D, v0 *x86.M128D)\n\n// Cast vector of type __m128 to type __m256; the upper 128 bits of the result are undefined. 
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname Mm256Castps128Ps256 Mm256Castps128Ps256\n//go:noescape\nfunc Mm256Castps128Ps256(r *x86.M256, v0 *x86.M128)\n\n// Cast vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname Mm256Castsi128Si256 Mm256Castsi128Si256\n//go:noescape\nfunc Mm256Castsi128Si256(r *x86.M256I, v0 *x86.M128I)\n\n// Cast vector of type __m128d to type __m256d; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname Mm256Zextpd128Pd256 Mm256Zextpd128Pd256\n//go:noescape\nfunc Mm256Zextpd128Pd256(r *x86.M256D, v0 *x86.M128D)\n\n// Cast vector of type __m128 to type __m256; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname Mm256Zextps128Ps256 Mm256Zextps128Ps256\n//go:noescape\nfunc Mm256Zextps128Ps256(r *x86.M256, v0 *x86.M128)\n\n// Cast vector of type __m128i to type __m256i; the upper 128 bits of the result are zeroed. 
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname Mm256Zextsi128Si256 Mm256Zextsi128Si256\n//go:noescape\nfunc Mm256Zextsi128Si256(r *x86.M256I, v0 *x86.M128I)\n\n// Set packed __m256 vector \"dst\" with the supplied values: \"v0\" goes to the upper 128 bits and \"v1\" to the lower 128 bits.\n//\n//go:linkname Mm256SetM128 Mm256SetM128\n//go:noescape\nfunc Mm256SetM128(r *x86.M256, v0 *x86.M128, v1 *x86.M128)\n\n// Set packed __m256d vector \"dst\" with the supplied values: \"v0\" goes to the upper 128 bits and \"v1\" to the lower 128 bits.\n//\n//go:linkname Mm256SetM128D Mm256SetM128D\n//go:noescape\nfunc Mm256SetM128D(r *x86.M256D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Set packed __m256i vector \"dst\" with the supplied values: \"v0\" goes to the upper 128 bits and \"v1\" to the lower 128 bits.\n//\n//go:linkname Mm256SetM128I Mm256SetM128I\n//go:noescape\nfunc Mm256SetM128I(r *x86.M256I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Set packed __m256 vector \"dst\" with the supplied values in reverse order: \"v0\" goes to the lower 128 bits and \"v1\" to the upper 128 bits.\n//\n//go:linkname Mm256SetrM128 Mm256SetrM128\n//go:noescape\nfunc Mm256SetrM128(r *x86.M256, v0 *x86.M128, v1 *x86.M128)\n\n// Set packed __m256d vector \"dst\" with the supplied values in reverse order: \"v0\" goes to the lower 128 bits and \"v1\" to the upper 128 bits.\n//\n//go:linkname Mm256SetrM128D Mm256SetrM128D\n//go:noescape\nfunc Mm256SetrM128D(r *x86.M256D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Set packed __m256i vector \"dst\" with the supplied values in reverse order: \"v0\" goes to the lower 128 bits and \"v1\" to the upper 128 bits.\n//\n//go:linkname Mm256SetrM128I Mm256SetrM128I\n//go:noescape\nfunc Mm256SetrM128I(r *x86.M256I, v0 *x86.M128I, v1 *x86.M128I)\n"
  },
  {
    "path": "x86/avx2/functions.c",
    "content": "#include <immintrin.h>\n\nvoid Mm256AbsEpi8(__m256i* r, __m256i* v0) { *r = _mm256_abs_epi8(*v0); }\nvoid Mm256AbsEpi16(__m256i* r, __m256i* v0) { *r = _mm256_abs_epi16(*v0); }\nvoid Mm256AbsEpi32(__m256i* r, __m256i* v0) { *r = _mm256_abs_epi32(*v0); }\nvoid Mm256PacksEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_packs_epi16(*v0, *v1); }\nvoid Mm256PacksEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_packs_epi32(*v0, *v1); }\nvoid Mm256PackusEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_packus_epi16(*v0, *v1); }\nvoid Mm256PackusEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_packus_epi32(*v0, *v1); }\nvoid Mm256AddEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_add_epi8(*v0, *v1); }\nvoid Mm256AddEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_add_epi16(*v0, *v1); }\nvoid Mm256AddEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_add_epi32(*v0, *v1); }\nvoid Mm256AddEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_add_epi64(*v0, *v1); }\nvoid Mm256AddsEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_adds_epi8(*v0, *v1); }\nvoid Mm256AddsEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_adds_epi16(*v0, *v1); }\nvoid Mm256AddsEpu8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_adds_epu8(*v0, *v1); }\nvoid Mm256AddsEpu16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_adds_epu16(*v0, *v1); }\nvoid Mm256AndSi256(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_and_si256(*v0, *v1); }\nvoid Mm256AndnotSi256(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_andnot_si256(*v0, *v1); }\nvoid Mm256AvgEpu8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_avg_epu8(*v0, *v1); }\nvoid Mm256AvgEpu16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_avg_epu16(*v0, *v1); }\nvoid Mm256BlendvEpi8(__m256i* r, __m256i* v0, __m256i* v1, __m256i* v2) { *r = _mm256_blendv_epi8(*v0, *v1, *v2); }\nvoid Mm256CmpeqEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = 
_mm256_cmpeq_epi8(*v0, *v1); }\nvoid Mm256CmpeqEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpeq_epi16(*v0, *v1); }\nvoid Mm256CmpeqEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpeq_epi32(*v0, *v1); }\nvoid Mm256CmpeqEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpeq_epi64(*v0, *v1); }\nvoid Mm256CmpgtEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpgt_epi8(*v0, *v1); }\nvoid Mm256CmpgtEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpgt_epi16(*v0, *v1); }\nvoid Mm256CmpgtEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpgt_epi32(*v0, *v1); }\nvoid Mm256CmpgtEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpgt_epi64(*v0, *v1); }\nvoid Mm256HaddEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_hadd_epi16(*v0, *v1); }\nvoid Mm256HaddEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_hadd_epi32(*v0, *v1); }\nvoid Mm256HaddsEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_hadds_epi16(*v0, *v1); }\nvoid Mm256HsubEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_hsub_epi16(*v0, *v1); }\nvoid Mm256HsubEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_hsub_epi32(*v0, *v1); }\nvoid Mm256HsubsEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_hsubs_epi16(*v0, *v1); }\nvoid Mm256MaddubsEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_maddubs_epi16(*v0, *v1); }\nvoid Mm256MaddEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_madd_epi16(*v0, *v1); }\nvoid Mm256MaxEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_max_epi8(*v0, *v1); }\nvoid Mm256MaxEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_max_epi16(*v0, *v1); }\nvoid Mm256MaxEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_max_epi32(*v0, *v1); }\nvoid Mm256MaxEpu8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_max_epu8(*v0, *v1); }\nvoid Mm256MaxEpu16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_max_epu16(*v0, *v1); }\nvoid 
Mm256MaxEpu32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_max_epu32(*v0, *v1); }\nvoid Mm256MinEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_min_epi8(*v0, *v1); }\nvoid Mm256MinEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_min_epi16(*v0, *v1); }\nvoid Mm256MinEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_min_epi32(*v0, *v1); }\nvoid Mm256MinEpu8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_min_epu8(*v0, *v1); }\nvoid Mm256MinEpu16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_min_epu16(*v0, *v1); }\nvoid Mm256MinEpu32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_min_epu32(*v0, *v1); }\nvoid Mm256MovemaskEpi8(int* r, __m256i* v0) { *r = _mm256_movemask_epi8(*v0); }\nvoid Mm256Cvtepi8Epi16(__m256i* r, __m128i* v0) { *r = _mm256_cvtepi8_epi16(*v0); }\nvoid Mm256Cvtepi8Epi32(__m256i* r, __m128i* v0) { *r = _mm256_cvtepi8_epi32(*v0); }\nvoid Mm256Cvtepi8Epi64(__m256i* r, __m128i* v0) { *r = _mm256_cvtepi8_epi64(*v0); }\nvoid Mm256Cvtepi16Epi32(__m256i* r, __m128i* v0) { *r = _mm256_cvtepi16_epi32(*v0); }\nvoid Mm256Cvtepi16Epi64(__m256i* r, __m128i* v0) { *r = _mm256_cvtepi16_epi64(*v0); }\nvoid Mm256Cvtepi32Epi64(__m256i* r, __m128i* v0) { *r = _mm256_cvtepi32_epi64(*v0); }\nvoid Mm256Cvtepu8Epi16(__m256i* r, __m128i* v0) { *r = _mm256_cvtepu8_epi16(*v0); }\nvoid Mm256Cvtepu8Epi32(__m256i* r, __m128i* v0) { *r = _mm256_cvtepu8_epi32(*v0); }\nvoid Mm256Cvtepu8Epi64(__m256i* r, __m128i* v0) { *r = _mm256_cvtepu8_epi64(*v0); }\nvoid Mm256Cvtepu16Epi32(__m256i* r, __m128i* v0) { *r = _mm256_cvtepu16_epi32(*v0); }\nvoid Mm256Cvtepu16Epi64(__m256i* r, __m128i* v0) { *r = _mm256_cvtepu16_epi64(*v0); }\nvoid Mm256Cvtepu32Epi64(__m256i* r, __m128i* v0) { *r = _mm256_cvtepu32_epi64(*v0); }\nvoid Mm256MulEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_mul_epi32(*v0, *v1); }\nvoid Mm256MulhrsEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_mulhrs_epi16(*v0, *v1); }\nvoid Mm256MulhiEpu16(__m256i* r, 
__m256i* v0, __m256i* v1) { *r = _mm256_mulhi_epu16(*v0, *v1); }\nvoid Mm256MulhiEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_mulhi_epi16(*v0, *v1); }\nvoid Mm256MulloEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_mullo_epi16(*v0, *v1); }\nvoid Mm256MulloEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_mullo_epi32(*v0, *v1); }\nvoid Mm256MulEpu32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_mul_epu32(*v0, *v1); }\nvoid Mm256OrSi256(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_or_si256(*v0, *v1); }\nvoid Mm256SadEpu8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sad_epu8(*v0, *v1); }\nvoid Mm256ShuffleEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_shuffle_epi8(*v0, *v1); }\nvoid Mm256SignEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sign_epi8(*v0, *v1); }\nvoid Mm256SignEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sign_epi16(*v0, *v1); }\nvoid Mm256SignEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sign_epi32(*v0, *v1); }\nvoid Mm256SlliEpi16(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_slli_epi16(*v0, *v1); }\nvoid Mm256SllEpi16(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_sll_epi16(*v0, *v1); }\nvoid Mm256SlliEpi32(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_slli_epi32(*v0, *v1); }\nvoid Mm256SllEpi32(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_sll_epi32(*v0, *v1); }\nvoid Mm256SlliEpi64(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_slli_epi64(*v0, *v1); }\nvoid Mm256SllEpi64(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_sll_epi64(*v0, *v1); }\nvoid Mm256SraiEpi16(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_srai_epi16(*v0, *v1); }\nvoid Mm256SraEpi16(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_sra_epi16(*v0, *v1); }\nvoid Mm256SraiEpi32(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_srai_epi32(*v0, *v1); }\nvoid Mm256SraEpi32(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_sra_epi32(*v0, *v1); }\nvoid 
Mm256SrliEpi16(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_srli_epi16(*v0, *v1); }\nvoid Mm256SrlEpi16(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_srl_epi16(*v0, *v1); }\nvoid Mm256SrliEpi32(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_srli_epi32(*v0, *v1); }\nvoid Mm256SrlEpi32(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_srl_epi32(*v0, *v1); }\nvoid Mm256SrliEpi64(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_srli_epi64(*v0, *v1); }\nvoid Mm256SrlEpi64(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_srl_epi64(*v0, *v1); }\nvoid Mm256SubEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sub_epi8(*v0, *v1); }\nvoid Mm256SubEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sub_epi16(*v0, *v1); }\nvoid Mm256SubEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sub_epi32(*v0, *v1); }\nvoid Mm256SubEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sub_epi64(*v0, *v1); }\nvoid Mm256SubsEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_subs_epi8(*v0, *v1); }\nvoid Mm256SubsEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_subs_epi16(*v0, *v1); }\nvoid Mm256SubsEpu8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_subs_epu8(*v0, *v1); }\nvoid Mm256SubsEpu16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_subs_epu16(*v0, *v1); }\nvoid Mm256UnpackhiEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpackhi_epi8(*v0, *v1); }\nvoid Mm256UnpackhiEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpackhi_epi16(*v0, *v1); }\nvoid Mm256UnpackhiEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpackhi_epi32(*v0, *v1); }\nvoid Mm256UnpackhiEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpackhi_epi64(*v0, *v1); }\nvoid Mm256UnpackloEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpacklo_epi8(*v0, *v1); }\nvoid Mm256UnpackloEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpacklo_epi16(*v0, *v1); }\nvoid Mm256UnpackloEpi32(__m256i* r, __m256i* v0, 
__m256i* v1) { *r = _mm256_unpacklo_epi32(*v0, *v1); }\nvoid Mm256UnpackloEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpacklo_epi64(*v0, *v1); }\nvoid Mm256XorSi256(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_xor_si256(*v0, *v1); }\nvoid MmBroadcastssPs(__m128* r, __m128* v0) { *r = _mm_broadcastss_ps(*v0); }\nvoid MmBroadcastsdPd(__m128d* r, __m128d* v0) { *r = _mm_broadcastsd_pd(*v0); }\nvoid Mm256BroadcastssPs(__m256* r, __m128* v0) { *r = _mm256_broadcastss_ps(*v0); }\nvoid Mm256BroadcastsdPd(__m256d* r, __m128d* v0) { *r = _mm256_broadcastsd_pd(*v0); }\nvoid Mm256Broadcastsi128Si256(__m256i* r, __m128i* v0) { *r = _mm256_broadcastsi128_si256(*v0); }\nvoid Mm256BroadcastbEpi8(__m256i* r, __m128i* v0) { *r = _mm256_broadcastb_epi8(*v0); }\nvoid Mm256BroadcastwEpi16(__m256i* r, __m128i* v0) { *r = _mm256_broadcastw_epi16(*v0); }\nvoid Mm256BroadcastdEpi32(__m256i* r, __m128i* v0) { *r = _mm256_broadcastd_epi32(*v0); }\nvoid Mm256BroadcastqEpi64(__m256i* r, __m128i* v0) { *r = _mm256_broadcastq_epi64(*v0); }\nvoid MmBroadcastbEpi8(__m128i* r, __m128i* v0) { *r = _mm_broadcastb_epi8(*v0); }\nvoid MmBroadcastwEpi16(__m128i* r, __m128i* v0) { *r = _mm_broadcastw_epi16(*v0); }\nvoid MmBroadcastdEpi32(__m128i* r, __m128i* v0) { *r = _mm_broadcastd_epi32(*v0); }\nvoid MmBroadcastqEpi64(__m128i* r, __m128i* v0) { *r = _mm_broadcastq_epi64(*v0); }\nvoid Mm256Permutevar8X32Epi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_permutevar8x32_epi32(*v0, *v1); }\nvoid Mm256Permutevar8X32Ps(__m256* r, __m256* v0, __m256i* v1) { *r = _mm256_permutevar8x32_ps(*v0, *v1); }\nvoid Mm256SllvEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sllv_epi32(*v0, *v1); }\nvoid MmSllvEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sllv_epi32(*v0, *v1); }\nvoid Mm256SllvEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sllv_epi64(*v0, *v1); }\nvoid MmSllvEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sllv_epi64(*v0, *v1); }\nvoid 
Mm256SravEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_srav_epi32(*v0, *v1); }\nvoid MmSravEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_srav_epi32(*v0, *v1); }\nvoid Mm256SrlvEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_srlv_epi32(*v0, *v1); }\nvoid MmSrlvEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_srlv_epi32(*v0, *v1); }\nvoid Mm256SrlvEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_srlv_epi64(*v0, *v1); }\nvoid MmSrlvEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_srlv_epi64(*v0, *v1); }\n"
  },
  {
    "path": "x86/avx2/functions.go",
    "content": "package avx2\n\nimport (\n\t\"github.com/alivanz/go-simd/x86\"\n)\n\n/*\n#cgo CFLAGS: -mavx2\n#include <immintrin.h>\n*/\nimport \"C\"\n\n// Compute the absolute value of packed signed 8-bit integers in \"a\", and store the unsigned results in \"dst\".\n//\n//go:linkname Mm256AbsEpi8 Mm256AbsEpi8\n//go:noescape\nfunc Mm256AbsEpi8(r *x86.M256I, v0 *x86.M256I)\n\n// Compute the absolute value of packed signed 16-bit integers in \"a\", and store the unsigned results in \"dst\".\n//\n//go:linkname Mm256AbsEpi16 Mm256AbsEpi16\n//go:noescape\nfunc Mm256AbsEpi16(r *x86.M256I, v0 *x86.M256I)\n\n// Compute the absolute value of packed signed 32-bit integers in \"a\", and store the unsigned results in \"dst\".\n//\n//go:linkname Mm256AbsEpi32 Mm256AbsEpi32\n//go:noescape\nfunc Mm256AbsEpi32(r *x86.M256I, v0 *x86.M256I)\n\n// Convert packed signed 16-bit integers from \"a\" and \"b\" to packed 8-bit integers using signed saturation, and store the results in \"dst\".\n//\n//go:linkname Mm256PacksEpi16 Mm256PacksEpi16\n//go:noescape\nfunc Mm256PacksEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Convert packed signed 32-bit integers from \"a\" and \"b\" to packed 16-bit integers using signed saturation, and store the results in \"dst\".\n//\n//go:linkname Mm256PacksEpi32 Mm256PacksEpi32\n//go:noescape\nfunc Mm256PacksEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Convert packed signed 16-bit integers from \"a\" and \"b\" to packed 8-bit integers using unsigned saturation, and store the results in \"dst\".\n//\n//go:linkname Mm256PackusEpi16 Mm256PackusEpi16\n//go:noescape\nfunc Mm256PackusEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Convert packed signed 32-bit integers from \"a\" and \"b\" to packed 16-bit integers using unsigned saturation, and store the results in \"dst\".\n//\n//go:linkname Mm256PackusEpi32 Mm256PackusEpi32\n//go:noescape\nfunc Mm256PackusEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Add packed 8-bit 
integers in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256AddEpi8 Mm256AddEpi8\n//go:noescape\nfunc Mm256AddEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Add packed 16-bit integers in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256AddEpi16 Mm256AddEpi16\n//go:noescape\nfunc Mm256AddEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Add packed 32-bit integers in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256AddEpi32 Mm256AddEpi32\n//go:noescape\nfunc Mm256AddEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Add packed 64-bit integers in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256AddEpi64 Mm256AddEpi64\n//go:noescape\nfunc Mm256AddEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Add packed 8-bit integers in \"a\" and \"b\" using saturation, and store the results in \"dst\".\n//\n//go:linkname Mm256AddsEpi8 Mm256AddsEpi8\n//go:noescape\nfunc Mm256AddsEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Add packed 16-bit integers in \"a\" and \"b\" using saturation, and store the results in \"dst\".\n//\n//go:linkname Mm256AddsEpi16 Mm256AddsEpi16\n//go:noescape\nfunc Mm256AddsEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Add packed unsigned 8-bit integers in \"a\" and \"b\" using saturation, and store the results in \"dst\".\n//\n//go:linkname Mm256AddsEpu8 Mm256AddsEpu8\n//go:noescape\nfunc Mm256AddsEpu8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Add packed unsigned 16-bit integers in \"a\" and \"b\" using saturation, and store the results in \"dst\".\n//\n//go:linkname Mm256AddsEpu16 Mm256AddsEpu16\n//go:noescape\nfunc Mm256AddsEpu16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compute the bitwise AND of 256 bits (representing integer data) in \"a\" and \"b\", and store the result in \"dst\".\n//\n//go:linkname Mm256AndSi256 Mm256AndSi256\n//go:noescape\nfunc Mm256AndSi256(r *x86.M256I, v0 
*x86.M256I, v1 *x86.M256I)\n\n// Compute the bitwise NOT of 256 bits (representing integer data) in \"a\" and then AND with \"b\", and store the result in \"dst\".\n//\n//go:linkname Mm256AndnotSi256 Mm256AndnotSi256\n//go:noescape\nfunc Mm256AndnotSi256(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Average packed unsigned 8-bit integers in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256AvgEpu8 Mm256AvgEpu8\n//go:noescape\nfunc Mm256AvgEpu8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Average packed unsigned 16-bit integers in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256AvgEpu16 Mm256AvgEpu16\n//go:noescape\nfunc Mm256AvgEpu16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Blend packed 8-bit integers from \"a\" and \"b\" using \"mask\", and store the results in \"dst\".\n//\n//go:linkname Mm256BlendvEpi8 Mm256BlendvEpi8\n//go:noescape\nfunc Mm256BlendvEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I, v2 *x86.M256I)\n\n// Compare packed 8-bit integers in \"a\" and \"b\" for equality, and store the results in \"dst\".\n//\n//go:linkname Mm256CmpeqEpi8 Mm256CmpeqEpi8\n//go:noescape\nfunc Mm256CmpeqEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compare packed 16-bit integers in \"a\" and \"b\" for equality, and store the results in \"dst\".\n//\n//go:linkname Mm256CmpeqEpi16 Mm256CmpeqEpi16\n//go:noescape\nfunc Mm256CmpeqEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compare packed 32-bit integers in \"a\" and \"b\" for equality, and store the results in \"dst\".\n//\n//go:linkname Mm256CmpeqEpi32 Mm256CmpeqEpi32\n//go:noescape\nfunc Mm256CmpeqEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compare packed 64-bit integers in \"a\" and \"b\" for equality, and store the results in \"dst\".\n//\n//go:linkname Mm256CmpeqEpi64 Mm256CmpeqEpi64\n//go:noescape\nfunc Mm256CmpeqEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compare packed signed 8-bit integers in \"a\" and 
\"b\" for greater-than, and store the results in \"dst\".\n//\n//go:linkname Mm256CmpgtEpi8 Mm256CmpgtEpi8\n//go:noescape\nfunc Mm256CmpgtEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compare packed signed 16-bit integers in \"a\" and \"b\" for greater-than, and store the results in \"dst\".\n//\n//go:linkname Mm256CmpgtEpi16 Mm256CmpgtEpi16\n//go:noescape\nfunc Mm256CmpgtEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compare packed signed 32-bit integers in \"a\" and \"b\" for greater-than, and store the results in \"dst\".\n//\n//go:linkname Mm256CmpgtEpi32 Mm256CmpgtEpi32\n//go:noescape\nfunc Mm256CmpgtEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compare packed signed 64-bit integers in \"a\" and \"b\" for greater-than, and store the results in \"dst\".\n//\n//go:linkname Mm256CmpgtEpi64 Mm256CmpgtEpi64\n//go:noescape\nfunc Mm256CmpgtEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Horizontally add adjacent pairs of 16-bit integers in \"a\" and \"b\", and pack the signed 16-bit results in \"dst\".\n//\n//go:linkname Mm256HaddEpi16 Mm256HaddEpi16\n//go:noescape\nfunc Mm256HaddEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Horizontally add adjacent pairs of 32-bit integers in \"a\" and \"b\", and pack the signed 32-bit results in \"dst\".\n//\n//go:linkname Mm256HaddEpi32 Mm256HaddEpi32\n//go:noescape\nfunc Mm256HaddEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Horizontally add adjacent pairs of signed 16-bit integers in \"a\" and \"b\" using saturation, and pack the signed 16-bit results in \"dst\".\n//\n//go:linkname Mm256HaddsEpi16 Mm256HaddsEpi16\n//go:noescape\nfunc Mm256HaddsEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Horizontally subtract adjacent pairs of 16-bit integers in \"a\" and \"b\", and pack the signed 16-bit results in \"dst\".\n//\n//go:linkname Mm256HsubEpi16 Mm256HsubEpi16\n//go:noescape\nfunc Mm256HsubEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Horizontally subtract 
adjacent pairs of 32-bit integers in \"a\" and \"b\", and pack the signed 32-bit results in \"dst\".\n//\n//go:linkname Mm256HsubEpi32 Mm256HsubEpi32\n//go:noescape\nfunc Mm256HsubEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Horizontally subtract adjacent pairs of signed 16-bit integers in \"a\" and \"b\" using saturation, and pack the signed 16-bit results in \"dst\".\n//\n//go:linkname Mm256HsubsEpi16 Mm256HsubsEpi16\n//go:noescape\nfunc Mm256HsubsEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Vertically multiply each unsigned 8-bit integer from \"a\" with the corresponding signed 8-bit integer from \"b\", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in \"dst\".\n//\n//go:linkname Mm256MaddubsEpi16 Mm256MaddubsEpi16\n//go:noescape\nfunc Mm256MaddubsEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Multiply packed signed 16-bit integers in \"a\" and \"b\", producing intermediate signed 32-bit integers. 
Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in \"dst\".\n//\n//go:linkname Mm256MaddEpi16 Mm256MaddEpi16\n//go:noescape\nfunc Mm256MaddEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compare packed signed 8-bit integers in \"a\" and \"b\", and store packed maximum values in \"dst\".\n//\n//go:linkname Mm256MaxEpi8 Mm256MaxEpi8\n//go:noescape\nfunc Mm256MaxEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compare packed signed 16-bit integers in \"a\" and \"b\", and store packed maximum values in \"dst\".\n//\n//go:linkname Mm256MaxEpi16 Mm256MaxEpi16\n//go:noescape\nfunc Mm256MaxEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compare packed signed 32-bit integers in \"a\" and \"b\", and store packed maximum values in \"dst\".\n//\n//go:linkname Mm256MaxEpi32 Mm256MaxEpi32\n//go:noescape\nfunc Mm256MaxEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compare packed unsigned 8-bit integers in \"a\" and \"b\", and store packed maximum values in \"dst\".\n//\n//go:linkname Mm256MaxEpu8 Mm256MaxEpu8\n//go:noescape\nfunc Mm256MaxEpu8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compare packed unsigned 16-bit integers in \"a\" and \"b\", and store packed maximum values in \"dst\".\n//\n//go:linkname Mm256MaxEpu16 Mm256MaxEpu16\n//go:noescape\nfunc Mm256MaxEpu16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compare packed unsigned 32-bit integers in \"a\" and \"b\", and store packed maximum values in \"dst\".\n//\n//go:linkname Mm256MaxEpu32 Mm256MaxEpu32\n//go:noescape\nfunc Mm256MaxEpu32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compare packed signed 8-bit integers in \"a\" and \"b\", and store packed minimum values in \"dst\".\n//\n//go:linkname Mm256MinEpi8 Mm256MinEpi8\n//go:noescape\nfunc Mm256MinEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compare packed signed 16-bit integers in \"a\" and \"b\", and store packed minimum values in \"dst\".\n//\n//go:linkname Mm256MinEpi16 
Mm256MinEpi16\n//go:noescape\nfunc Mm256MinEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compare packed signed 32-bit integers in \"a\" and \"b\", and store packed minimum values in \"dst\".\n//\n//go:linkname Mm256MinEpi32 Mm256MinEpi32\n//go:noescape\nfunc Mm256MinEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compare packed unsigned 8-bit integers in \"a\" and \"b\", and store packed minimum values in \"dst\".\n//\n//go:linkname Mm256MinEpu8 Mm256MinEpu8\n//go:noescape\nfunc Mm256MinEpu8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compare packed unsigned 16-bit integers in \"a\" and \"b\", and store packed minimum values in \"dst\".\n//\n//go:linkname Mm256MinEpu16 Mm256MinEpu16\n//go:noescape\nfunc Mm256MinEpu16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compare packed unsigned 32-bit integers in \"a\" and \"b\", and store packed minimum values in \"dst\".\n//\n//go:linkname Mm256MinEpu32 Mm256MinEpu32\n//go:noescape\nfunc Mm256MinEpu32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Create mask from the most significant bit of each 8-bit element in \"a\", and store the result in \"dst\".\n//\n//go:linkname Mm256MovemaskEpi8 Mm256MovemaskEpi8\n//go:noescape\nfunc Mm256MovemaskEpi8(r *x86.Int, v0 *x86.M256I)\n\n// Sign extend packed 8-bit integers in \"a\" to packed 16-bit integers, and store the results in \"dst\".\n//\n//go:linkname Mm256Cvtepi8Epi16 Mm256Cvtepi8Epi16\n//go:noescape\nfunc Mm256Cvtepi8Epi16(r *x86.M256I, v0 *x86.M128I)\n\n// Sign extend packed 8-bit integers in \"a\" to packed 32-bit integers, and store the results in \"dst\".\n//\n//go:linkname Mm256Cvtepi8Epi32 Mm256Cvtepi8Epi32\n//go:noescape\nfunc Mm256Cvtepi8Epi32(r *x86.M256I, v0 *x86.M128I)\n\n// Sign extend packed 8-bit integers in the low 8 bytes of \"a\" to packed 64-bit integers, and store the results in \"dst\".\n//\n//go:linkname Mm256Cvtepi8Epi64 Mm256Cvtepi8Epi64\n//go:noescape\nfunc Mm256Cvtepi8Epi64(r *x86.M256I, v0 *x86.M128I)\n\n// Sign 
extend packed 16-bit integers in \"a\" to packed 32-bit integers, and store the results in \"dst\".\n//\n//go:linkname Mm256Cvtepi16Epi32 Mm256Cvtepi16Epi32\n//go:noescape\nfunc Mm256Cvtepi16Epi32(r *x86.M256I, v0 *x86.M128I)\n\n// Sign extend packed 16-bit integers in \"a\" to packed 64-bit integers, and store the results in \"dst\".\n//\n//go:linkname Mm256Cvtepi16Epi64 Mm256Cvtepi16Epi64\n//go:noescape\nfunc Mm256Cvtepi16Epi64(r *x86.M256I, v0 *x86.M128I)\n\n// Sign extend packed 32-bit integers in \"a\" to packed 64-bit integers, and store the results in \"dst\".\n//\n//go:linkname Mm256Cvtepi32Epi64 Mm256Cvtepi32Epi64\n//go:noescape\nfunc Mm256Cvtepi32Epi64(r *x86.M256I, v0 *x86.M128I)\n\n// Zero extend packed unsigned 8-bit integers in \"a\" to packed 16-bit integers, and store the results in \"dst\".\n//\n//go:linkname Mm256Cvtepu8Epi16 Mm256Cvtepu8Epi16\n//go:noescape\nfunc Mm256Cvtepu8Epi16(r *x86.M256I, v0 *x86.M128I)\n\n// Zero extend packed unsigned 8-bit integers in \"a\" to packed 32-bit integers, and store the results in \"dst\".\n//\n//go:linkname Mm256Cvtepu8Epi32 Mm256Cvtepu8Epi32\n//go:noescape\nfunc Mm256Cvtepu8Epi32(r *x86.M256I, v0 *x86.M128I)\n\n// Zero extend packed unsigned 8-bit integers in the low 8 bytes of \"a\" to packed 64-bit integers, and store the results in \"dst\".\n//\n//go:linkname Mm256Cvtepu8Epi64 Mm256Cvtepu8Epi64\n//go:noescape\nfunc Mm256Cvtepu8Epi64(r *x86.M256I, v0 *x86.M128I)\n\n// Zero extend packed unsigned 16-bit integers in \"a\" to packed 32-bit integers, and store the results in \"dst\".\n//\n//go:linkname Mm256Cvtepu16Epi32 Mm256Cvtepu16Epi32\n//go:noescape\nfunc Mm256Cvtepu16Epi32(r *x86.M256I, v0 *x86.M128I)\n\n// Zero extend packed unsigned 16-bit integers in \"a\" to packed 64-bit integers, and store the results in \"dst\".\n//\n//go:linkname Mm256Cvtepu16Epi64 Mm256Cvtepu16Epi64\n//go:noescape\nfunc Mm256Cvtepu16Epi64(r *x86.M256I, v0 *x86.M128I)\n\n// Zero extend packed unsigned 32-bit integers in \"a\" to 
packed 64-bit integers, and store the results in \"dst\".\n//\n//go:linkname Mm256Cvtepu32Epi64 Mm256Cvtepu32Epi64\n//go:noescape\nfunc Mm256Cvtepu32Epi64(r *x86.M256I, v0 *x86.M128I)\n\n// Multiply the low signed 32-bit integers from each packed 64-bit element in \"a\" and \"b\", and store the signed 64-bit results in \"dst\".\n//\n//go:linkname Mm256MulEpi32 Mm256MulEpi32\n//go:noescape\nfunc Mm256MulEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Multiply packed signed 16-bit integers in \"a\" and \"b\", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to \"dst\".\n//\n//go:linkname Mm256MulhrsEpi16 Mm256MulhrsEpi16\n//go:noescape\nfunc Mm256MulhrsEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Multiply the packed unsigned 16-bit integers in \"a\" and \"b\", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in \"dst\".\n//\n//go:linkname Mm256MulhiEpu16 Mm256MulhiEpu16\n//go:noescape\nfunc Mm256MulhiEpu16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Multiply the packed signed 16-bit integers in \"a\" and \"b\", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in \"dst\".\n//\n//go:linkname Mm256MulhiEpi16 Mm256MulhiEpi16\n//go:noescape\nfunc Mm256MulhiEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Multiply the packed signed 16-bit integers in \"a\" and \"b\", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in \"dst\".\n//\n//go:linkname Mm256MulloEpi16 Mm256MulloEpi16\n//go:noescape\nfunc Mm256MulloEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Multiply the packed signed 32-bit integers in \"a\" and \"b\", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in \"dst\".\n//\n//go:linkname Mm256MulloEpi32 Mm256MulloEpi32\n//go:noescape\nfunc 
Mm256MulloEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Multiply the low unsigned 32-bit integers from each packed 64-bit element in \"a\" and \"b\", and store the unsigned 64-bit results in \"dst\".\n//\n//go:linkname Mm256MulEpu32 Mm256MulEpu32\n//go:noescape\nfunc Mm256MulEpu32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compute the bitwise OR of 256 bits (representing integer data) in \"a\" and \"b\", and store the result in \"dst\".\n//\n//go:linkname Mm256OrSi256 Mm256OrSi256\n//go:noescape\nfunc Mm256OrSi256(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compute the absolute differences of packed unsigned 8-bit integers in \"a\" and \"b\", then horizontally sum each consecutive 8 differences to produce four unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in \"dst\".\n//\n//go:linkname Mm256SadEpu8 Mm256SadEpu8\n//go:noescape\nfunc Mm256SadEpu8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Shuffle 8-bit integers in \"a\" within 128-bit lanes according to shuffle control mask in the corresponding 8-bit element of \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256ShuffleEpi8 Mm256ShuffleEpi8\n//go:noescape\nfunc Mm256ShuffleEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Negate packed signed 8-bit integers in \"a\" when the corresponding signed 8-bit integer in \"b\" is negative, and store the results in \"dst\". Elements in \"dst\" are zeroed out when the corresponding element in \"b\" is zero.\n//\n//go:linkname Mm256SignEpi8 Mm256SignEpi8\n//go:noescape\nfunc Mm256SignEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Negate packed signed 16-bit integers in \"a\" when the corresponding signed 16-bit integer in \"b\" is negative, and store the results in \"dst\". 
Elements in \"dst\" are zeroed out when the corresponding element in \"b\" is zero.\n//\n//go:linkname Mm256SignEpi16 Mm256SignEpi16\n//go:noescape\nfunc Mm256SignEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Negate packed signed 32-bit integers in \"a\" when the corresponding signed 32-bit integer in \"b\" is negative, and store the results in \"dst\". Elements in \"dst\" are zeroed out when the corresponding element in \"b\" is zero.\n//\n//go:linkname Mm256SignEpi32 Mm256SignEpi32\n//go:noescape\nfunc Mm256SignEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Shift packed 16-bit integers in \"a\" left by \"imm8\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname Mm256SlliEpi16 Mm256SlliEpi16\n//go:noescape\nfunc Mm256SlliEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int)\n\n// Shift packed 16-bit integers in \"a\" left by \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname Mm256SllEpi16 Mm256SllEpi16\n//go:noescape\nfunc Mm256SllEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I)\n\n// Shift packed 32-bit integers in \"a\" left by \"imm8\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname Mm256SlliEpi32 Mm256SlliEpi32\n//go:noescape\nfunc Mm256SlliEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int)\n\n// Shift packed 32-bit integers in \"a\" left by \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname Mm256SllEpi32 Mm256SllEpi32\n//go:noescape\nfunc Mm256SllEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I)\n\n// Shift packed 64-bit integers in \"a\" left by \"imm8\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname Mm256SlliEpi64 Mm256SlliEpi64\n//go:noescape\nfunc Mm256SlliEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int)\n\n// Shift packed 64-bit integers in \"a\" left by \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname Mm256SllEpi64 
Mm256SllEpi64\n//go:noescape\nfunc Mm256SllEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I)\n\n// Shift packed 16-bit integers in \"a\" right by \"imm8\" while shifting in sign bits, and store the results in \"dst\".\n//\n//go:linkname Mm256SraiEpi16 Mm256SraiEpi16\n//go:noescape\nfunc Mm256SraiEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int)\n\n// Shift packed 16-bit integers in \"a\" right by \"count\" while shifting in sign bits, and store the results in \"dst\".\n//\n//go:linkname Mm256SraEpi16 Mm256SraEpi16\n//go:noescape\nfunc Mm256SraEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I)\n\n// Shift packed 32-bit integers in \"a\" right by \"imm8\" while shifting in sign bits, and store the results in \"dst\".\n//\n//go:linkname Mm256SraiEpi32 Mm256SraiEpi32\n//go:noescape\nfunc Mm256SraiEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int)\n\n// Shift packed 32-bit integers in \"a\" right by \"count\" while shifting in sign bits, and store the results in \"dst\".\n//\n//go:linkname Mm256SraEpi32 Mm256SraEpi32\n//go:noescape\nfunc Mm256SraEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I)\n\n// Shift packed 16-bit integers in \"a\" right by \"imm8\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname Mm256SrliEpi16 Mm256SrliEpi16\n//go:noescape\nfunc Mm256SrliEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int)\n\n// Shift packed 16-bit integers in \"a\" right by \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname Mm256SrlEpi16 Mm256SrlEpi16\n//go:noescape\nfunc Mm256SrlEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I)\n\n// Shift packed 32-bit integers in \"a\" right by \"imm8\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname Mm256SrliEpi32 Mm256SrliEpi32\n//go:noescape\nfunc Mm256SrliEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int)\n\n// Shift packed 32-bit integers in \"a\" right by \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname Mm256SrlEpi32 
Mm256SrlEpi32\n//go:noescape\nfunc Mm256SrlEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I)\n\n// Shift packed 64-bit integers in \"a\" right by \"imm8\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname Mm256SrliEpi64 Mm256SrliEpi64\n//go:noescape\nfunc Mm256SrliEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int)\n\n// Shift packed 64-bit integers in \"a\" right by \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname Mm256SrlEpi64 Mm256SrlEpi64\n//go:noescape\nfunc Mm256SrlEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I)\n\n// Subtract packed 8-bit integers in \"b\" from packed 8-bit integers in \"a\", and store the results in \"dst\".\n//\n//go:linkname Mm256SubEpi8 Mm256SubEpi8\n//go:noescape\nfunc Mm256SubEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Subtract packed 16-bit integers in \"b\" from packed 16-bit integers in \"a\", and store the results in \"dst\".\n//\n//go:linkname Mm256SubEpi16 Mm256SubEpi16\n//go:noescape\nfunc Mm256SubEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Subtract packed 32-bit integers in \"b\" from packed 32-bit integers in \"a\", and store the results in \"dst\".\n//\n//go:linkname Mm256SubEpi32 Mm256SubEpi32\n//go:noescape\nfunc Mm256SubEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Subtract packed 64-bit integers in \"b\" from packed 64-bit integers in \"a\", and store the results in \"dst\".\n//\n//go:linkname Mm256SubEpi64 Mm256SubEpi64\n//go:noescape\nfunc Mm256SubEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Subtract packed signed 8-bit integers in \"b\" from packed 8-bit integers in \"a\" using saturation, and store the results in \"dst\".\n//\n//go:linkname Mm256SubsEpi8 Mm256SubsEpi8\n//go:noescape\nfunc Mm256SubsEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Subtract packed signed 16-bit integers in \"b\" from packed 16-bit integers in \"a\" using saturation, and store the results in \"dst\".\n//\n//go:linkname 
Mm256SubsEpi16 Mm256SubsEpi16\n//go:noescape\nfunc Mm256SubsEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Subtract packed unsigned 8-bit integers in \"b\" from packed unsigned 8-bit integers in \"a\" using saturation, and store the results in \"dst\".\n//\n//go:linkname Mm256SubsEpu8 Mm256SubsEpu8\n//go:noescape\nfunc Mm256SubsEpu8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Subtract packed unsigned 16-bit integers in \"b\" from packed unsigned 16-bit integers in \"a\" using saturation, and store the results in \"dst\".\n//\n//go:linkname Mm256SubsEpu16 Mm256SubsEpu16\n//go:noescape\nfunc Mm256SubsEpu16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Unpack and interleave 8-bit integers from the high half of each 128-bit lane in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256UnpackhiEpi8 Mm256UnpackhiEpi8\n//go:noescape\nfunc Mm256UnpackhiEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Unpack and interleave 16-bit integers from the high half of each 128-bit lane in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256UnpackhiEpi16 Mm256UnpackhiEpi16\n//go:noescape\nfunc Mm256UnpackhiEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Unpack and interleave 32-bit integers from the high half of each 128-bit lane in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256UnpackhiEpi32 Mm256UnpackhiEpi32\n//go:noescape\nfunc Mm256UnpackhiEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Unpack and interleave 64-bit integers from the high half of each 128-bit lane in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256UnpackhiEpi64 Mm256UnpackhiEpi64\n//go:noescape\nfunc Mm256UnpackhiEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Unpack and interleave 8-bit integers from the low half of each 128-bit lane in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256UnpackloEpi8 Mm256UnpackloEpi8\n//go:noescape\nfunc 
Mm256UnpackloEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Unpack and interleave 16-bit integers from the low half of each 128-bit lane in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256UnpackloEpi16 Mm256UnpackloEpi16\n//go:noescape\nfunc Mm256UnpackloEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Unpack and interleave 32-bit integers from the low half of each 128-bit lane in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256UnpackloEpi32 Mm256UnpackloEpi32\n//go:noescape\nfunc Mm256UnpackloEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Unpack and interleave 64-bit integers from the low half of each 128-bit lane in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname Mm256UnpackloEpi64 Mm256UnpackloEpi64\n//go:noescape\nfunc Mm256UnpackloEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Compute the bitwise XOR of 256 bits (representing integer data) in \"a\" and \"b\", and store the result in \"dst\".\n//\n//go:linkname Mm256XorSi256 Mm256XorSi256\n//go:noescape\nfunc Mm256XorSi256(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Broadcast the low single-precision (32-bit) floating-point element from \"a\" to all elements of \"dst\".\n//\n//go:linkname MmBroadcastssPs MmBroadcastssPs\n//go:noescape\nfunc MmBroadcastssPs(r *x86.M128, v0 *x86.M128)\n\n// Broadcast the low double-precision (64-bit) floating-point element from \"a\" to all elements of \"dst\".\n//\n//go:linkname MmBroadcastsdPd MmBroadcastsdPd\n//go:noescape\nfunc MmBroadcastsdPd(r *x86.M128D, v0 *x86.M128D)\n\n// Broadcast the low single-precision (32-bit) floating-point element from \"a\" to all elements of \"dst\".\n//\n//go:linkname Mm256BroadcastssPs Mm256BroadcastssPs\n//go:noescape\nfunc Mm256BroadcastssPs(r *x86.M256, v0 *x86.M128)\n\n// Broadcast the low double-precision (64-bit) floating-point element from \"a\" to all elements of \"dst\".\n//\n//go:linkname Mm256BroadcastsdPd 
Mm256BroadcastsdPd\n//go:noescape\nfunc Mm256BroadcastsdPd(r *x86.M256D, v0 *x86.M128D)\n\n// Broadcast 128 bits of integer data from \"a\" to all 128-bit lanes in \"dst\".\n//\n//go:linkname Mm256Broadcastsi128Si256 Mm256Broadcastsi128Si256\n//go:noescape\nfunc Mm256Broadcastsi128Si256(r *x86.M256I, v0 *x86.M128I)\n\n// Broadcast the low packed 8-bit integer from \"a\" to all elements of \"dst\".\n//\n//go:linkname Mm256BroadcastbEpi8 Mm256BroadcastbEpi8\n//go:noescape\nfunc Mm256BroadcastbEpi8(r *x86.M256I, v0 *x86.M128I)\n\n// Broadcast the low packed 16-bit integer from \"a\" to all elements of \"dst\".\n//\n//go:linkname Mm256BroadcastwEpi16 Mm256BroadcastwEpi16\n//go:noescape\nfunc Mm256BroadcastwEpi16(r *x86.M256I, v0 *x86.M128I)\n\n// Broadcast the low packed 32-bit integer from \"a\" to all elements of \"dst\".\n//\n//go:linkname Mm256BroadcastdEpi32 Mm256BroadcastdEpi32\n//go:noescape\nfunc Mm256BroadcastdEpi32(r *x86.M256I, v0 *x86.M128I)\n\n// Broadcast the low packed 64-bit integer from \"a\" to all elements of \"dst\".\n//\n//go:linkname Mm256BroadcastqEpi64 Mm256BroadcastqEpi64\n//go:noescape\nfunc Mm256BroadcastqEpi64(r *x86.M256I, v0 *x86.M128I)\n\n// Broadcast the low packed 8-bit integer from \"a\" to all elements of \"dst\".\n//\n//go:linkname MmBroadcastbEpi8 MmBroadcastbEpi8\n//go:noescape\nfunc MmBroadcastbEpi8(r *x86.M128I, v0 *x86.M128I)\n\n// Broadcast the low packed 16-bit integer from \"a\" to all elements of \"dst\".\n//\n//go:linkname MmBroadcastwEpi16 MmBroadcastwEpi16\n//go:noescape\nfunc MmBroadcastwEpi16(r *x86.M128I, v0 *x86.M128I)\n\n// Broadcast the low packed 32-bit integer from \"a\" to all elements of \"dst\".\n//\n//go:linkname MmBroadcastdEpi32 MmBroadcastdEpi32\n//go:noescape\nfunc MmBroadcastdEpi32(r *x86.M128I, v0 *x86.M128I)\n\n// Broadcast the low packed 64-bit integer from \"a\" to all elements of \"dst\".\n//\n//go:linkname MmBroadcastqEpi64 MmBroadcastqEpi64\n//go:noescape\nfunc MmBroadcastqEpi64(r *x86.M128I, v0 
*x86.M128I)\n\n// Shuffle 32-bit integers in \"a\" across lanes using the corresponding index in \"idx\", and store the results in \"dst\".\n//\n//go:linkname Mm256Permutevar8X32Epi32 Mm256Permutevar8X32Epi32\n//go:noescape\nfunc Mm256Permutevar8X32Epi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Shuffle single-precision (32-bit) floating-point elements in \"a\" across lanes using the corresponding index in \"idx\".\n//\n//go:linkname Mm256Permutevar8X32Ps Mm256Permutevar8X32Ps\n//go:noescape\nfunc Mm256Permutevar8X32Ps(r *x86.M256, v0 *x86.M256, v1 *x86.M256I)\n\n// Shift packed 32-bit integers in \"a\" left by the amount specified by the corresponding element in \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname Mm256SllvEpi32 Mm256SllvEpi32\n//go:noescape\nfunc Mm256SllvEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Shift packed 32-bit integers in \"a\" left by the amount specified by the corresponding element in \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSllvEpi32 MmSllvEpi32\n//go:noescape\nfunc MmSllvEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Shift packed 64-bit integers in \"a\" left by the amount specified by the corresponding element in \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname Mm256SllvEpi64 Mm256SllvEpi64\n//go:noescape\nfunc Mm256SllvEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Shift packed 64-bit integers in \"a\" left by the amount specified by the corresponding element in \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSllvEpi64 MmSllvEpi64\n//go:noescape\nfunc MmSllvEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Shift packed 32-bit integers in \"a\" right by the amount specified by the corresponding element in \"count\" while shifting in sign bits, and store the results in \"dst\".\n//\n//go:linkname Mm256SravEpi32 
Mm256SravEpi32\n//go:noescape\nfunc Mm256SravEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Shift packed 32-bit integers in \"a\" right by the amount specified by the corresponding element in \"count\" while shifting in sign bits, and store the results in \"dst\".\n//\n//go:linkname MmSravEpi32 MmSravEpi32\n//go:noescape\nfunc MmSravEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Shift packed 32-bit integers in \"a\" right by the amount specified by the corresponding element in \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname Mm256SrlvEpi32 Mm256SrlvEpi32\n//go:noescape\nfunc Mm256SrlvEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Shift packed 32-bit integers in \"a\" right by the amount specified by the corresponding element in \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSrlvEpi32 MmSrlvEpi32\n//go:noescape\nfunc MmSrlvEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Shift packed 64-bit integers in \"a\" right by the amount specified by the corresponding element in \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname Mm256SrlvEpi64 Mm256SrlvEpi64\n//go:noescape\nfunc Mm256SrlvEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I)\n\n// Shift packed 64-bit integers in \"a\" right by the amount specified by the corresponding element in \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSrlvEpi64 MmSrlvEpi64\n//go:noescape\nfunc MmSrlvEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n"
  },
  {
    "path": "x86/bmi/functions.c",
    "content": "#include <immintrin.h>\n\nvoid AndnU32(unsigned int* r, unsigned int* v0, unsigned int* v1) { *r = __andn_u32(*v0, *v1); }\nvoid BextrU32(unsigned int* r, unsigned int* v0, unsigned int* v1, unsigned int* v2) { *r = _bextr_u32(*v0, *v1, *v2); }\nvoid Bextr2U32(unsigned int* r, unsigned int* v0, unsigned int* v1) { *r = _bextr2_u32(*v0, *v1); }\nvoid BlsiU32(unsigned int* r, unsigned int* v0) { *r = __blsi_u32(*v0); }\nvoid BlsmskU32(unsigned int* r, unsigned int* v0) { *r = __blsmsk_u32(*v0); }\nvoid BlsrU32(unsigned int* r, unsigned int* v0) { *r = __blsr_u32(*v0); }\nvoid AndnU64(unsigned long long* r, unsigned long long* v0, unsigned long long* v1) { *r = __andn_u64(*v0, *v1); }\nvoid BextrU64(unsigned long long* r, unsigned long long* v0, unsigned int* v1, unsigned int* v2) { *r = _bextr_u64(*v0, *v1, *v2); }\nvoid Bextr2U64(unsigned long long* r, unsigned long long* v0, unsigned long long* v1) { *r = _bextr2_u64(*v0, *v1); }\nvoid BlsiU64(unsigned long long* r, unsigned long long* v0) { *r = __blsi_u64(*v0); }\nvoid BlsmskU64(unsigned long long* r, unsigned long long* v0) { *r = __blsmsk_u64(*v0); }\nvoid BlsrU64(unsigned long long* r, unsigned long long* v0) { *r = __blsr_u64(*v0); }\n"
  },
  {
    "path": "x86/bmi/functions.go",
    "content": "package bmi\n\nimport (\n\t\"github.com/alivanz/go-simd/x86\"\n)\n\n/*\n#cgo CFLAGS: -mbmi\n#include <immintrin.h>\n*/\nimport \"C\"\n\n// Compute the bitwise NOT of unsigned 32-bit integer \"a\" and then AND with \"b\", and store the result in \"dst\".\n//\n//go:linkname AndnU32 AndnU32\n//go:noescape\nfunc AndnU32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint)\n\n// Extract contiguous bits from unsigned 32-bit integer \"a\", and store the result in \"dst\". Extract the number of bits specified by \"len\", starting at the bit specified by \"start\".\n//\n//go:linkname BextrU32 BextrU32\n//go:noescape\nfunc BextrU32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint, v2 *x86.Uint)\n\n// Extract contiguous bits from unsigned 32-bit integer \"a\", and store the result in \"dst\". Extract the number of bits specified by bits 15:8 of \"control\", starting at the bit specified by bits 0:7 of \"control\".\n//\n//go:linkname Bextr2U32 Bextr2U32\n//go:noescape\nfunc Bextr2U32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint)\n\n// Extract the lowest set bit from unsigned 32-bit integer \"a\" and set the corresponding bit in \"dst\". All other bits in \"dst\" are zeroed, and all bits are zeroed if no bits are set in \"a\".\n//\n//go:linkname BlsiU32 BlsiU32\n//go:noescape\nfunc BlsiU32(r *x86.Uint, v0 *x86.Uint)\n\n// Set all the lower bits of \"dst\" up to and including the lowest set bit in unsigned 32-bit integer \"a\".\n//\n//go:linkname BlsmskU32 BlsmskU32\n//go:noescape\nfunc BlsmskU32(r *x86.Uint, v0 *x86.Uint)\n\n// Copy all bits from unsigned 32-bit integer \"a\" to \"dst\", and reset (set to 0) the bit in \"dst\" that corresponds to the lowest set bit in \"a\".\n//\n//go:linkname BlsrU32 BlsrU32\n//go:noescape\nfunc BlsrU32(r *x86.Uint, v0 *x86.Uint)\n\n// Compute the bitwise NOT of unsigned 64-bit integer \"a\" and then AND with \"b\", and store the result in \"dst\".\n//\n//go:linkname AndnU64 AndnU64\n//go:noescape\nfunc AndnU64(r 
*x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Ulonglong)\n\n// Extract contiguous bits from unsigned 64-bit integer \"a\", and store the result in \"dst\". Extract the number of bits specified by \"len\", starting at the bit specified by \"start\".\n//\n//go:linkname BextrU64 BextrU64\n//go:noescape\nfunc BextrU64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Uint, v2 *x86.Uint)\n\n// Extract contiguous bits from unsigned 64-bit integer \"a\", and store the result in \"dst\". Extract the number of bits specified by bits 15:8 of \"control\", starting at the bit specified by bits 0:7 of \"control\".\n//\n//go:linkname Bextr2U64 Bextr2U64\n//go:noescape\nfunc Bextr2U64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Ulonglong)\n\n// Extract the lowest set bit from unsigned 64-bit integer \"a\" and set the corresponding bit in \"dst\". All other bits in \"dst\" are zeroed, and all bits are zeroed if no bits are set in \"a\".\n//\n//go:linkname BlsiU64 BlsiU64\n//go:noescape\nfunc BlsiU64(r *x86.Ulonglong, v0 *x86.Ulonglong)\n\n// Set all the lower bits of \"dst\" up to and including the lowest set bit in unsigned 64-bit integer \"a\".\n//\n//go:linkname BlsmskU64 BlsmskU64\n//go:noescape\nfunc BlsmskU64(r *x86.Ulonglong, v0 *x86.Ulonglong)\n\n// Copy all bits from unsigned 64-bit integer \"a\" to \"dst\", and reset (set to 0) the bit in \"dst\" that corresponds to the lowest set bit in \"a\".\n//\n//go:linkname BlsrU64 BlsrU64\n//go:noescape\nfunc BlsrU64(r *x86.Ulonglong, v0 *x86.Ulonglong)\n"
  },
  {
    "path": "x86/bmi2/functions.c",
    "content": "#include <immintrin.h>\n\nvoid BzhiU32(unsigned int* r, unsigned int* v0, unsigned int* v1) { *r = _bzhi_u32(*v0, *v1); }\nvoid PdepU32(unsigned int* r, unsigned int* v0, unsigned int* v1) { *r = _pdep_u32(*v0, *v1); }\nvoid PextU32(unsigned int* r, unsigned int* v0, unsigned int* v1) { *r = _pext_u32(*v0, *v1); }\nvoid BzhiU64(unsigned long long* r, unsigned long long* v0, unsigned long long* v1) { *r = _bzhi_u64(*v0, *v1); }\nvoid PdepU64(unsigned long long* r, unsigned long long* v0, unsigned long long* v1) { *r = _pdep_u64(*v0, *v1); }\nvoid PextU64(unsigned long long* r, unsigned long long* v0, unsigned long long* v1) { *r = _pext_u64(*v0, *v1); }\n"
  },
  {
    "path": "x86/bmi2/functions.go",
    "content": "package bmi2\n\nimport (\n\t\"github.com/alivanz/go-simd/x86\"\n)\n\n/*\n#cgo CFLAGS: -mbmi2\n#include <immintrin.h>\n*/\nimport \"C\"\n\n// Copy all bits from unsigned 32-bit integer \"a\" to \"dst\", and reset (set to 0) the high bits in \"dst\" starting at \"index\".\n//\n//go:linkname BzhiU32 BzhiU32\n//go:noescape\nfunc BzhiU32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint)\n\n// Deposit contiguous low bits from unsigned 32-bit integer \"a\" to \"dst\" at the corresponding bit locations specified by \"mask\"; all other bits in \"dst\" are set to zero.\n//\n//go:linkname PdepU32 PdepU32\n//go:noescape\nfunc PdepU32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint)\n\n// Extract bits from unsigned 32-bit integer \"a\" at the corresponding bit locations specified by \"mask\" to contiguous low bits in \"dst\"; the remaining upper bits in \"dst\" are set to zero.\n//\n//go:linkname PextU32 PextU32\n//go:noescape\nfunc PextU32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint)\n\n// Copy all bits from unsigned 64-bit integer \"a\" to \"dst\", and reset (set to 0) the high bits in \"dst\" starting at \"index\".\n//\n//go:linkname BzhiU64 BzhiU64\n//go:noescape\nfunc BzhiU64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Ulonglong)\n\n// Deposit contiguous low bits from unsigned 64-bit integer \"a\" to \"dst\" at the corresponding bit locations specified by \"mask\"; all other bits in \"dst\" are set to zero.\n//\n//go:linkname PdepU64 PdepU64\n//go:noescape\nfunc PdepU64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Ulonglong)\n\n// Extract bits from unsigned 64-bit integer \"a\" at the corresponding bit locations specified by \"mask\" to contiguous low bits in \"dst\"; the remaining upper bits in \"dst\" are set to zero.\n//\n//go:linkname PextU64 PextU64\n//go:noescape\nfunc PextU64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Ulonglong)\n"
  },
  {
    "path": "x86/crc32/functions.c",
    "content": "#include <immintrin.h>\n\nvoid MmCrc32U8(unsigned int* r, unsigned int* v0, unsigned char* v1) { *r = _mm_crc32_u8(*v0, *v1); }\nvoid MmCrc32U16(unsigned int* r, unsigned int* v0, unsigned short* v1) { *r = _mm_crc32_u16(*v0, *v1); }\nvoid MmCrc32U32(unsigned int* r, unsigned int* v0, unsigned int* v1) { *r = _mm_crc32_u32(*v0, *v1); }\nvoid MmCrc32U64(unsigned long long* r, unsigned long long* v0, unsigned long long* v1) { *r = _mm_crc32_u64(*v0, *v1); }\n"
  },
  {
    "path": "x86/crc32/functions.go",
    "content": "package crc32\n\nimport (\n\t\"github.com/alivanz/go-simd/x86\"\n)\n\n/*\n#cgo CFLAGS: -mcrc32\n#include <immintrin.h>\n*/\nimport \"C\"\n\n// Starting with the initial value in \"crc\", accumulates a CRC32 value for unsigned 8-bit integer \"v\", and stores the result in \"dst\".\n//\n//go:linkname MmCrc32U8 MmCrc32U8\n//go:noescape\nfunc MmCrc32U8(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uchar)\n\n// Starting with the initial value in \"crc\", accumulates a CRC32 value for unsigned 16-bit integer \"v\", and stores the result in \"dst\".\n//\n//go:linkname MmCrc32U16 MmCrc32U16\n//go:noescape\nfunc MmCrc32U16(r *x86.Uint, v0 *x86.Uint, v1 *x86.Ushort)\n\n// Starting with the initial value in \"crc\", accumulates a CRC32 value for unsigned 32-bit integer \"v\", and stores the result in \"dst\".\n//\n//go:linkname MmCrc32U32 MmCrc32U32\n//go:noescape\nfunc MmCrc32U32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint)\n\n// Starting with the initial value in \"crc\", accumulates a CRC32 value for unsigned 64-bit integer \"v\", and stores the result in \"dst\".\n//\n//go:linkname MmCrc32U64 MmCrc32U64\n//go:noescape\nfunc MmCrc32U64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Ulonglong)\n"
  },
  {
    "path": "x86/f16c/functions.c",
    "content": "#include <immintrin.h>\n\nvoid CvtshSs(float* r, unsigned short* v0) { *r = _cvtsh_ss(*v0); }\nvoid MmCvtphPs(__m128* r, __m128i* v0) { *r = _mm_cvtph_ps(*v0); }\nvoid Mm256CvtphPs(__m256* r, __m128i* v0) { *r = _mm256_cvtph_ps(*v0); }\n"
  },
  {
    "path": "x86/f16c/functions.go",
    "content": "package f16c\n\nimport (\n\t\"github.com/alivanz/go-simd/x86\"\n)\n\n/*\n#cgo CFLAGS: -mf16c\n#include <immintrin.h>\n*/\nimport \"C\"\n\n// Convert the half-precision (16-bit) floating-point value \"a\" to a single-precision (32-bit) floating-point value, and store the result in \"dst\".\n//\n//go:linkname CvtshSs CvtshSs\n//go:noescape\nfunc CvtshSs(r *x86.Float, v0 *x86.Ushort)\n\n// Convert packed half-precision (16-bit) floating-point elements in \"a\" to packed single-precision (32-bit) floating-point elements, and store the results in \"dst\".\n//\n//go:linkname MmCvtphPs MmCvtphPs\n//go:noescape\nfunc MmCvtphPs(r *x86.M128, v0 *x86.M128I)\n\n// Convert packed half-precision (16-bit) floating-point elements in \"a\" to packed single-precision (32-bit) floating-point elements, and store the results in \"dst\".\n//\n//go:linkname Mm256CvtphPs Mm256CvtphPs\n//go:noescape\nfunc Mm256CvtphPs(r *x86.M256, v0 *x86.M128I)\n"
  },
  {
    "path": "x86/fma/functions.c",
    "content": "#include <immintrin.h>\n\nvoid MmFmaddPs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fmadd_ps(*v0, *v1, *v2); }\nvoid MmFmaddPd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fmadd_pd(*v0, *v1, *v2); }\nvoid MmFmaddSs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fmadd_ss(*v0, *v1, *v2); }\nvoid MmFmaddSd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fmadd_sd(*v0, *v1, *v2); }\nvoid MmFmsubPs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fmsub_ps(*v0, *v1, *v2); }\nvoid MmFmsubPd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fmsub_pd(*v0, *v1, *v2); }\nvoid MmFmsubSs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fmsub_ss(*v0, *v1, *v2); }\nvoid MmFmsubSd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fmsub_sd(*v0, *v1, *v2); }\nvoid MmFnmaddPs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fnmadd_ps(*v0, *v1, *v2); }\nvoid MmFnmaddPd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fnmadd_pd(*v0, *v1, *v2); }\nvoid MmFnmaddSs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fnmadd_ss(*v0, *v1, *v2); }\nvoid MmFnmaddSd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fnmadd_sd(*v0, *v1, *v2); }\nvoid MmFnmsubPs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fnmsub_ps(*v0, *v1, *v2); }\nvoid MmFnmsubPd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fnmsub_pd(*v0, *v1, *v2); }\nvoid MmFnmsubSs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fnmsub_ss(*v0, *v1, *v2); }\nvoid MmFnmsubSd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fnmsub_sd(*v0, *v1, *v2); }\nvoid MmFmaddsubPs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fmaddsub_ps(*v0, *v1, *v2); }\nvoid MmFmaddsubPd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fmaddsub_pd(*v0, *v1, *v2); }\nvoid MmFmsubaddPs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = 
_mm_fmsubadd_ps(*v0, *v1, *v2); }\nvoid MmFmsubaddPd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fmsubadd_pd(*v0, *v1, *v2); }\nvoid Mm256FmaddPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_fmadd_ps(*v0, *v1, *v2); }\nvoid Mm256FmaddPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_fmadd_pd(*v0, *v1, *v2); }\nvoid Mm256FmsubPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_fmsub_ps(*v0, *v1, *v2); }\nvoid Mm256FmsubPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_fmsub_pd(*v0, *v1, *v2); }\nvoid Mm256FnmaddPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_fnmadd_ps(*v0, *v1, *v2); }\nvoid Mm256FnmaddPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_fnmadd_pd(*v0, *v1, *v2); }\nvoid Mm256FnmsubPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_fnmsub_ps(*v0, *v1, *v2); }\nvoid Mm256FnmsubPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_fnmsub_pd(*v0, *v1, *v2); }\nvoid Mm256FmaddsubPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_fmaddsub_ps(*v0, *v1, *v2); }\nvoid Mm256FmaddsubPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_fmaddsub_pd(*v0, *v1, *v2); }\nvoid Mm256FmsubaddPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_fmsubadd_ps(*v0, *v1, *v2); }\nvoid Mm256FmsubaddPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_fmsubadd_pd(*v0, *v1, *v2); }\n"
  },
  {
    "path": "x86/fma/functions.go",
    "content": "package fma\n\nimport (\n\t\"github.com/alivanz/go-simd/x86\"\n)\n\n/*\n#cgo CFLAGS: -mfma\n#include <immintrin.h>\n*/\nimport \"C\"\n\n// Multiply packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", add the intermediate result to packed elements in \"c\", and store the results in \"dst\".\n//\n//go:linkname MmFmaddPs MmFmaddPs\n//go:noescape\nfunc MmFmaddPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128)\n\n// Multiply packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", add the intermediate result to packed elements in \"c\", and store the results in \"dst\".\n//\n//go:linkname MmFmaddPd MmFmaddPd\n//go:noescape\nfunc MmFmaddPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D)\n\n// Multiply the lower single-precision (32-bit) floating-point elements in \"a\" and \"b\", and add the intermediate result to the lower element in \"c\". Store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmFmaddSs MmFmaddSs\n//go:noescape\nfunc MmFmaddSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128)\n\n// Multiply the lower double-precision (64-bit) floating-point elements in \"a\" and \"b\", and add the intermediate result to the lower element in \"c\". 
Store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmFmaddSd MmFmaddSd\n//go:noescape\nfunc MmFmaddSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D)\n\n// Multiply packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", subtract packed elements in \"c\" from the intermediate result, and store the results in \"dst\".\n//\n//go:linkname MmFmsubPs MmFmsubPs\n//go:noescape\nfunc MmFmsubPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128)\n\n// Multiply packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", subtract packed elements in \"c\" from the intermediate result, and store the results in \"dst\".\n//\n//go:linkname MmFmsubPd MmFmsubPd\n//go:noescape\nfunc MmFmsubPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D)\n\n// Multiply the lower single-precision (32-bit) floating-point elements in \"a\" and \"b\", and subtract the lower element in \"c\" from the intermediate result. Store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmFmsubSs MmFmsubSs\n//go:noescape\nfunc MmFmsubSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128)\n\n// Multiply the lower double-precision (64-bit) floating-point elements in \"a\" and \"b\", and subtract the lower element in \"c\" from the intermediate result. 
Store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmFmsubSd MmFmsubSd\n//go:noescape\nfunc MmFmsubSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D)\n\n// Multiply packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", add the negated intermediate result to packed elements in \"c\", and store the results in \"dst\".\n//\n//go:linkname MmFnmaddPs MmFnmaddPs\n//go:noescape\nfunc MmFnmaddPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128)\n\n// Multiply packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", add the negated intermediate result to packed elements in \"c\", and store the results in \"dst\".\n//\n//go:linkname MmFnmaddPd MmFnmaddPd\n//go:noescape\nfunc MmFnmaddPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D)\n\n// Multiply the lower single-precision (32-bit) floating-point elements in \"a\" and \"b\", and add the negated intermediate result to the lower element in \"c\". Store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmFnmaddSs MmFnmaddSs\n//go:noescape\nfunc MmFnmaddSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128)\n\n// Multiply the lower double-precision (64-bit) floating-point elements in \"a\" and \"b\", and add the negated intermediate result to the lower element in \"c\". 
Store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmFnmaddSd MmFnmaddSd\n//go:noescape\nfunc MmFnmaddSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D)\n\n// Multiply packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", subtract packed elements in \"c\" from the negated intermediate result, and store the results in \"dst\".\n//\n//go:linkname MmFnmsubPs MmFnmsubPs\n//go:noescape\nfunc MmFnmsubPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128)\n\n// Multiply packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", subtract packed elements in \"c\" from the negated intermediate result, and store the results in \"dst\".\n//\n//go:linkname MmFnmsubPd MmFnmsubPd\n//go:noescape\nfunc MmFnmsubPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D)\n\n// Multiply the lower single-precision (32-bit) floating-point elements in \"a\" and \"b\", and subtract the lower element in \"c\" from the negated intermediate result. Store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmFnmsubSs MmFnmsubSs\n//go:noescape\nfunc MmFnmsubSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128)\n\n// Multiply the lower double-precision (64-bit) floating-point elements in \"a\" and \"b\", and subtract the lower element in \"c\" from the negated intermediate result. 
Store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmFnmsubSd MmFnmsubSd\n//go:noescape\nfunc MmFnmsubSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D)\n\n// Multiply packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", alternately add and subtract packed elements in \"c\" to/from the intermediate result, and store the results in \"dst\".\n//\n//go:linkname MmFmaddsubPs MmFmaddsubPs\n//go:noescape\nfunc MmFmaddsubPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128)\n\n// Multiply packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", alternately add and subtract packed elements in \"c\" to/from the intermediate result, and store the results in \"dst\".\n//\n//go:linkname MmFmaddsubPd MmFmaddsubPd\n//go:noescape\nfunc MmFmaddsubPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D)\n\n// Multiply packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", alternately subtract and add packed elements in \"c\" from/to the intermediate result, and store the results in \"dst\".\n//\n//go:linkname MmFmsubaddPs MmFmsubaddPs\n//go:noescape\nfunc MmFmsubaddPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128)\n\n// Multiply packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", alternately subtract and add packed elements in \"c\" from/to the intermediate result, and store the results in \"dst\".\n//\n//go:linkname MmFmsubaddPd MmFmsubaddPd\n//go:noescape\nfunc MmFmsubaddPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D)\n\n// Multiply packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", add the intermediate result to packed elements in \"c\", and store the results in \"dst\".\n//\n//go:linkname Mm256FmaddPs Mm256FmaddPs\n//go:noescape\nfunc Mm256FmaddPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256)\n\n// Multiply packed double-precision (64-bit) floating-point elements in 
\"a\" and \"b\", add the intermediate result to packed elements in \"c\", and store the results in \"dst\".\n//\n//go:linkname Mm256FmaddPd Mm256FmaddPd\n//go:noescape\nfunc Mm256FmaddPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D)\n\n// Multiply packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", subtract packed elements in \"c\" from the intermediate result, and store the results in \"dst\".\n//\n//go:linkname Mm256FmsubPs Mm256FmsubPs\n//go:noescape\nfunc Mm256FmsubPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256)\n\n// Multiply packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", subtract packed elements in \"c\" from the intermediate result, and store the results in \"dst\".\n//\n//go:linkname Mm256FmsubPd Mm256FmsubPd\n//go:noescape\nfunc Mm256FmsubPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D)\n\n// Multiply packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", add the negated intermediate result to packed elements in \"c\", and store the results in \"dst\".\n//\n//go:linkname Mm256FnmaddPs Mm256FnmaddPs\n//go:noescape\nfunc Mm256FnmaddPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256)\n\n// Multiply packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", add the negated intermediate result to packed elements in \"c\", and store the results in \"dst\".\n//\n//go:linkname Mm256FnmaddPd Mm256FnmaddPd\n//go:noescape\nfunc Mm256FnmaddPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D)\n\n// Multiply packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", subtract packed elements in \"c\" from the negated intermediate result, and store the results in \"dst\".\n//\n//go:linkname Mm256FnmsubPs Mm256FnmsubPs\n//go:noescape\nfunc Mm256FnmsubPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256)\n\n// Multiply packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", subtract packed elements in \"c\" from the negated intermediate result, 
and store the results in \"dst\".\n//\n//go:linkname Mm256FnmsubPd Mm256FnmsubPd\n//go:noescape\nfunc Mm256FnmsubPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D)\n\n// Multiply packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", alternately add and subtract packed elements in \"c\" to/from the intermediate result, and store the results in \"dst\".\n//\n//go:linkname Mm256FmaddsubPs Mm256FmaddsubPs\n//go:noescape\nfunc Mm256FmaddsubPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256)\n\n// Multiply packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", alternately add and subtract packed elements in \"c\" to/from the intermediate result, and store the results in \"dst\".\n//\n//go:linkname Mm256FmaddsubPd Mm256FmaddsubPd\n//go:noescape\nfunc Mm256FmaddsubPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D)\n\n// Multiply packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", alternately subtract and add packed elements in \"c\" from/to the intermediate result, and store the results in \"dst\".\n//\n//go:linkname Mm256FmsubaddPs Mm256FmsubaddPs\n//go:noescape\nfunc Mm256FmsubaddPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256)\n\n// Multiply packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", alternately subtract and add packed elements in \"c\" from/to the intermediate result, and store the results in \"dst\".\n//\n//go:linkname Mm256FmsubaddPd Mm256FmsubaddPd\n//go:noescape\nfunc Mm256FmsubaddPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D)\n"
  },
  {
    "path": "x86/fsgsbase/functions.c",
    "content": "#include <immintrin.h>\n\nvoid ReadfsbaseU32(unsigned int* r) { *r = _readfsbase_u32(); }\nvoid ReadfsbaseU64(unsigned long long* r) { *r = _readfsbase_u64(); }\nvoid ReadgsbaseU32(unsigned int* r) { *r = _readgsbase_u32(); }\nvoid ReadgsbaseU64(unsigned long long* r) { *r = _readgsbase_u64(); }\nvoid WritefsbaseU32(unsigned int* v0) { _writefsbase_u32(*v0); }\nvoid WritefsbaseU64(unsigned long long* v0) { _writefsbase_u64(*v0); }\nvoid WritegsbaseU32(unsigned int* v0) { _writegsbase_u32(*v0); }\nvoid WritegsbaseU64(unsigned long long* v0) { _writegsbase_u64(*v0); }\n"
  },
  {
    "path": "x86/fsgsbase/functions.go",
    "content": "package fsgsbase\n\nimport (\n\t\"github.com/alivanz/go-simd/x86\"\n)\n\n/*\n#cgo CFLAGS: -mfsgsbase\n#include <immintrin.h>\n*/\nimport \"C\"\n\n// Read the FS segment base register and store the 32-bit result in \"dst\".\n//\n//go:linkname ReadfsbaseU32 ReadfsbaseU32\n//go:noescape\nfunc ReadfsbaseU32(r *x86.Uint)\n\n// Read the FS segment base register and store the 64-bit result in \"dst\".\n//\n//go:linkname ReadfsbaseU64 ReadfsbaseU64\n//go:noescape\nfunc ReadfsbaseU64(r *x86.Ulonglong)\n\n// Read the GS segment base register and store the 32-bit result in \"dst\".\n//\n//go:linkname ReadgsbaseU32 ReadgsbaseU32\n//go:noescape\nfunc ReadgsbaseU32(r *x86.Uint)\n\n// Read the GS segment base register and store the 64-bit result in \"dst\".\n//\n//go:linkname ReadgsbaseU64 ReadgsbaseU64\n//go:noescape\nfunc ReadgsbaseU64(r *x86.Ulonglong)\n\n// Write the unsigned 32-bit integer \"a\" to the FS segment base register.\n//\n//go:linkname WritefsbaseU32 WritefsbaseU32\n//go:noescape\nfunc WritefsbaseU32(v0 *x86.Uint)\n\n// Write the unsigned 64-bit integer \"a\" to the FS segment base register.\n//\n//go:linkname WritefsbaseU64 WritefsbaseU64\n//go:noescape\nfunc WritefsbaseU64(v0 *x86.Ulonglong)\n\n// Write the unsigned 32-bit integer \"a\" to the GS segment base register.\n//\n//go:linkname WritegsbaseU32 WritegsbaseU32\n//go:noescape\nfunc WritegsbaseU32(v0 *x86.Uint)\n\n// Write the unsigned 64-bit integer \"a\" to the GS segment base register.\n//\n//go:linkname WritegsbaseU64 WritegsbaseU64\n//go:noescape\nfunc WritegsbaseU64(v0 *x86.Ulonglong)\n"
  },
  {
    "path": "x86/generate.go",
    "content": "package x86\n\n//go:generate go run ../generator/x86\n"
  },
  {
    "path": "x86/lzcnt/functions.c",
    "content": "#include <immintrin.h>\n\nvoid Lzcnt32(unsigned int* r, unsigned int* v0) { *r = __lzcnt32(*v0); }\nvoid LzcntU32(unsigned int* r, unsigned int* v0) { *r = _lzcnt_u32(*v0); }\nvoid LzcntU64(unsigned long long* r, unsigned long long* v0) { *r = _lzcnt_u64(*v0); }\n"
  },
  {
    "path": "x86/lzcnt/functions.go",
    "content": "package lzcnt\n\nimport (\n\t\"github.com/alivanz/go-simd/x86\"\n)\n\n/*\n#cgo CFLAGS: -mlzcnt\n#include <immintrin.h>\n*/\nimport \"C\"\n\n// Count the number of leading zero bits in unsigned 32-bit integer \"a\", and return that count in \"dst\". This is equivalent to LzcntU32.\n//\n//go:linkname Lzcnt32 Lzcnt32\n//go:noescape\nfunc Lzcnt32(r *x86.Uint, v0 *x86.Uint)\n\n// Count the number of leading zero bits in unsigned 32-bit integer \"a\", and return that count in \"dst\".\n//\n//go:linkname LzcntU32 LzcntU32\n//go:noescape\nfunc LzcntU32(r *x86.Uint, v0 *x86.Uint)\n\n// Count the number of leading zero bits in unsigned 64-bit integer \"a\", and return that count in \"dst\".\n//\n//go:linkname LzcntU64 LzcntU64\n//go:noescape\nfunc LzcntU64(r *x86.Ulonglong, v0 *x86.Ulonglong)\n"
  },
  {
    "path": "x86/mmx/functions.c",
    "content": "#include <immintrin.h>\n\nvoid MmEmpty() { _mm_empty(); }\nvoid MmCvtsi32Si64(__m64* r, int* v0) { *r = _mm_cvtsi32_si64(*v0); }\nvoid MmCvtsi64Si32(int* r, __m64* v0) { *r = _mm_cvtsi64_si32(*v0); }\nvoid MmCvtsi64M64(__m64* r, long long* v0) { *r = _mm_cvtsi64_m64(*v0); }\nvoid MmCvtm64Si64(long long* r, __m64* v0) { *r = _mm_cvtm64_si64(*v0); }\nvoid MmPacksPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_packs_pi16(*v0, *v1); }\nvoid MmPacksPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_packs_pi32(*v0, *v1); }\nvoid MmPacksPu16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_packs_pu16(*v0, *v1); }\nvoid MmUnpackhiPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_unpackhi_pi8(*v0, *v1); }\nvoid MmUnpackhiPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_unpackhi_pi16(*v0, *v1); }\nvoid MmUnpackhiPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_unpackhi_pi32(*v0, *v1); }\nvoid MmUnpackloPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_unpacklo_pi8(*v0, *v1); }\nvoid MmUnpackloPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_unpacklo_pi16(*v0, *v1); }\nvoid MmUnpackloPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_unpacklo_pi32(*v0, *v1); }\nvoid MmAddPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_add_pi8(*v0, *v1); }\nvoid MmAddPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_add_pi16(*v0, *v1); }\nvoid MmAddPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_add_pi32(*v0, *v1); }\nvoid MmAddsPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_adds_pi8(*v0, *v1); }\nvoid MmAddsPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_adds_pi16(*v0, *v1); }\nvoid MmAddsPu8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_adds_pu8(*v0, *v1); }\nvoid MmAddsPu16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_adds_pu16(*v0, *v1); }\nvoid MmSubPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sub_pi8(*v0, *v1); }\nvoid MmSubPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sub_pi16(*v0, *v1); }\nvoid MmSubPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sub_pi32(*v0, *v1); }\nvoid MmSubsPi8(__m64* r, 
__m64* v0, __m64* v1) { *r = _mm_subs_pi8(*v0, *v1); }\nvoid MmSubsPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_subs_pi16(*v0, *v1); }\nvoid MmSubsPu8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_subs_pu8(*v0, *v1); }\nvoid MmSubsPu16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_subs_pu16(*v0, *v1); }\nvoid MmMaddPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_madd_pi16(*v0, *v1); }\nvoid MmMulhiPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_mulhi_pi16(*v0, *v1); }\nvoid MmMulloPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_mullo_pi16(*v0, *v1); }\nvoid MmSllPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sll_pi16(*v0, *v1); }\nvoid MmSlliPi16(__m64* r, __m64* v0, int* v1) { *r = _mm_slli_pi16(*v0, *v1); }\nvoid MmSllPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sll_pi32(*v0, *v1); }\nvoid MmSlliPi32(__m64* r, __m64* v0, int* v1) { *r = _mm_slli_pi32(*v0, *v1); }\nvoid MmSllSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sll_si64(*v0, *v1); }\nvoid MmSlliSi64(__m64* r, __m64* v0, int* v1) { *r = _mm_slli_si64(*v0, *v1); }\nvoid MmSraPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sra_pi16(*v0, *v1); }\nvoid MmSraiPi16(__m64* r, __m64* v0, int* v1) { *r = _mm_srai_pi16(*v0, *v1); }\nvoid MmSraPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sra_pi32(*v0, *v1); }\nvoid MmSraiPi32(__m64* r, __m64* v0, int* v1) { *r = _mm_srai_pi32(*v0, *v1); }\nvoid MmSrlPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_srl_pi16(*v0, *v1); }\nvoid MmSrliPi16(__m64* r, __m64* v0, int* v1) { *r = _mm_srli_pi16(*v0, *v1); }\nvoid MmSrlPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_srl_pi32(*v0, *v1); }\nvoid MmSrliPi32(__m64* r, __m64* v0, int* v1) { *r = _mm_srli_pi32(*v0, *v1); }\nvoid MmSrlSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_srl_si64(*v0, *v1); }\nvoid MmSrliSi64(__m64* r, __m64* v0, int* v1) { *r = _mm_srli_si64(*v0, *v1); }\nvoid MmAndSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_and_si64(*v0, *v1); }\nvoid MmAndnotSi64(__m64* r, __m64* v0, __m64* v1) { *r = 
_mm_andnot_si64(*v0, *v1); }\nvoid MmOrSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_or_si64(*v0, *v1); }\nvoid MmXorSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_xor_si64(*v0, *v1); }\nvoid MmCmpeqPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_cmpeq_pi8(*v0, *v1); }\nvoid MmCmpeqPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_cmpeq_pi16(*v0, *v1); }\nvoid MmCmpeqPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_cmpeq_pi32(*v0, *v1); }\nvoid MmCmpgtPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_cmpgt_pi8(*v0, *v1); }\nvoid MmCmpgtPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_cmpgt_pi16(*v0, *v1); }\nvoid MmCmpgtPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_cmpgt_pi32(*v0, *v1); }\nvoid MmSetzeroSi64(__m64* r) { *r = _mm_setzero_si64(); }\nvoid MmSetPi32(__m64* r, int* v0, int* v1) { *r = _mm_set_pi32(*v0, *v1); }\nvoid MmSetPi16(__m64* r, short* v0, short* v1, short* v2, short* v3) { *r = _mm_set_pi16(*v0, *v1, *v2, *v3); }\nvoid MmSetPi8(__m64* r, char* v0, char* v1, char* v2, char* v3, char* v4, char* v5, char* v6, char* v7) { *r = _mm_set_pi8(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); }\nvoid MmSet1Pi32(__m64* r, int* v0) { *r = _mm_set1_pi32(*v0); }\nvoid MmSet1Pi16(__m64* r, short* v0) { *r = _mm_set1_pi16(*v0); }\nvoid MmSet1Pi8(__m64* r, char* v0) { *r = _mm_set1_pi8(*v0); }\nvoid MmSetrPi32(__m64* r, int* v0, int* v1) { *r = _mm_setr_pi32(*v0, *v1); }\nvoid MmSetrPi16(__m64* r, short* v0, short* v1, short* v2, short* v3) { *r = _mm_setr_pi16(*v0, *v1, *v2, *v3); }\nvoid MmSetrPi8(__m64* r, char* v0, char* v1, char* v2, char* v3, char* v4, char* v5, char* v6, char* v7) { *r = _mm_setr_pi8(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); }\n"
  },
  {
    "path": "x86/mmx/functions.go",
    "content": "package mmx\n\nimport (\n\t\"github.com/alivanz/go-simd/x86\"\n)\n\n/*\n#cgo CFLAGS: -mmmx\n#include <immintrin.h>\n*/\nimport \"C\"\n\n// Empty the MMX state, which marks the x87 FPU registers as available for use by x87 instructions. This instruction must be used at the end of all MMX technology procedures.\n//\n//go:linkname MmEmpty MmEmpty\n//go:noescape\nfunc MmEmpty()\n\n// Copy 32-bit integer \"a\" to the lower elements of \"dst\", and zero the upper element of \"dst\".\n//\n//go:linkname MmCvtsi32Si64 MmCvtsi32Si64\n//go:noescape\nfunc MmCvtsi32Si64(r *x86.M64, v0 *x86.Int)\n\n// Copy the lower 32-bit integer in \"a\" to \"dst\".\n//\n//go:linkname MmCvtsi64Si32 MmCvtsi64Si32\n//go:noescape\nfunc MmCvtsi64Si32(r *x86.Int, v0 *x86.M64)\n\n// Copy 64-bit integer \"a\" to \"dst\".\n//\n//go:linkname MmCvtsi64M64 MmCvtsi64M64\n//go:noescape\nfunc MmCvtsi64M64(r *x86.M64, v0 *x86.Longlong)\n\n// Copy 64-bit integer \"a\" to \"dst\".\n//\n//go:linkname MmCvtm64Si64 MmCvtm64Si64\n//go:noescape\nfunc MmCvtm64Si64(r *x86.Longlong, v0 *x86.M64)\n\n// Convert packed signed 16-bit integers from \"a\" and \"b\" to packed 8-bit integers using signed saturation, and store the results in \"dst\".\n//\n//go:linkname MmPacksPi16 MmPacksPi16\n//go:noescape\nfunc MmPacksPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Convert packed signed 32-bit integers from \"a\" and \"b\" to packed 16-bit integers using signed saturation, and store the results in \"dst\".\n//\n//go:linkname MmPacksPi32 MmPacksPi32\n//go:noescape\nfunc MmPacksPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Convert packed signed 16-bit integers from \"a\" and \"b\" to packed 8-bit integers using unsigned saturation, and store the results in \"dst\".\n//\n//go:linkname MmPacksPu16 MmPacksPu16\n//go:noescape\nfunc MmPacksPu16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Unpack and interleave 8-bit integers from the high half of \"a\" and \"b\", and store the results in 
\"dst\".\n//\n//go:linkname MmUnpackhiPi8 MmUnpackhiPi8\n//go:noescape\nfunc MmUnpackhiPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Unpack and interleave 16-bit integers from the high half of \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmUnpackhiPi16 MmUnpackhiPi16\n//go:noescape\nfunc MmUnpackhiPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Unpack and interleave 32-bit integers from the high half of \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmUnpackhiPi32 MmUnpackhiPi32\n//go:noescape\nfunc MmUnpackhiPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Unpack and interleave 8-bit integers from the low half of \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmUnpackloPi8 MmUnpackloPi8\n//go:noescape\nfunc MmUnpackloPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Unpack and interleave 16-bit integers from the low half of \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmUnpackloPi16 MmUnpackloPi16\n//go:noescape\nfunc MmUnpackloPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Unpack and interleave 32-bit integers from the low half of \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmUnpackloPi32 MmUnpackloPi32\n//go:noescape\nfunc MmUnpackloPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Add packed 8-bit integers in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmAddPi8 MmAddPi8\n//go:noescape\nfunc MmAddPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Add packed 16-bit integers in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmAddPi16 MmAddPi16\n//go:noescape\nfunc MmAddPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Add packed 32-bit integers in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmAddPi32 MmAddPi32\n//go:noescape\nfunc MmAddPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Add packed signed 8-bit integers in \"a\" and \"b\" using saturation, and store the 
results in \"dst\".\n//\n//go:linkname MmAddsPi8 MmAddsPi8\n//go:noescape\nfunc MmAddsPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Add packed signed 16-bit integers in \"a\" and \"b\" using saturation, and store the results in \"dst\".\n//\n//go:linkname MmAddsPi16 MmAddsPi16\n//go:noescape\nfunc MmAddsPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Add packed unsigned 8-bit integers in \"a\" and \"b\" using saturation, and store the results in \"dst\".\n//\n//go:linkname MmAddsPu8 MmAddsPu8\n//go:noescape\nfunc MmAddsPu8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Add packed unsigned 16-bit integers in \"a\" and \"b\" using saturation, and store the results in \"dst\".\n//\n//go:linkname MmAddsPu16 MmAddsPu16\n//go:noescape\nfunc MmAddsPu16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Subtract packed 8-bit integers in \"b\" from packed 8-bit integers in \"a\", and store the results in \"dst\".\n//\n//go:linkname MmSubPi8 MmSubPi8\n//go:noescape\nfunc MmSubPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Subtract packed 16-bit integers in \"b\" from packed 16-bit integers in \"a\", and store the results in \"dst\".\n//\n//go:linkname MmSubPi16 MmSubPi16\n//go:noescape\nfunc MmSubPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Subtract packed 32-bit integers in \"b\" from packed 32-bit integers in \"a\", and store the results in \"dst\".\n//\n//go:linkname MmSubPi32 MmSubPi32\n//go:noescape\nfunc MmSubPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Subtract packed signed 8-bit integers in \"b\" from packed 8-bit integers in \"a\" using saturation, and store the results in \"dst\".\n//\n//go:linkname MmSubsPi8 MmSubsPi8\n//go:noescape\nfunc MmSubsPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Subtract packed signed 16-bit integers in \"b\" from packed 16-bit integers in \"a\" using saturation, and store the results in \"dst\".\n//\n//go:linkname MmSubsPi16 MmSubsPi16\n//go:noescape\nfunc MmSubsPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Subtract packed 
unsigned 8-bit integers in \"b\" from packed unsigned 8-bit integers in \"a\" using saturation, and store the results in \"dst\".\n//\n//go:linkname MmSubsPu8 MmSubsPu8\n//go:noescape\nfunc MmSubsPu8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Subtract packed unsigned 16-bit integers in \"b\" from packed unsigned 16-bit integers in \"a\" using saturation, and store the results in \"dst\".\n//\n//go:linkname MmSubsPu16 MmSubsPu16\n//go:noescape\nfunc MmSubsPu16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Multiply packed signed 16-bit integers in \"a\" and \"b\", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in \"dst\".\n//\n//go:linkname MmMaddPi16 MmMaddPi16\n//go:noescape\nfunc MmMaddPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Multiply the packed signed 16-bit integers in \"a\" and \"b\", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in \"dst\".\n//\n//go:linkname MmMulhiPi16 MmMulhiPi16\n//go:noescape\nfunc MmMulhiPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Multiply the packed 16-bit integers in \"a\" and \"b\", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in \"dst\".\n//\n//go:linkname MmMulloPi16 MmMulloPi16\n//go:noescape\nfunc MmMulloPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Shift packed 16-bit integers in \"a\" left by \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSllPi16 MmSllPi16\n//go:noescape\nfunc MmSllPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Shift packed 16-bit integers in \"a\" left by \"imm8\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSlliPi16 MmSlliPi16\n//go:noescape\nfunc MmSlliPi16(r *x86.M64, v0 *x86.M64, v1 *x86.Int)\n\n// Shift packed 32-bit integers in \"a\" left by \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname 
MmSllPi32 MmSllPi32\n//go:noescape\nfunc MmSllPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Shift packed 32-bit integers in \"a\" left by \"imm8\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSlliPi32 MmSlliPi32\n//go:noescape\nfunc MmSlliPi32(r *x86.M64, v0 *x86.M64, v1 *x86.Int)\n\n// Shift 64-bit integer \"a\" left by \"count\" while shifting in zeros, and store the result in \"dst\".\n//\n//go:linkname MmSllSi64 MmSllSi64\n//go:noescape\nfunc MmSllSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Shift 64-bit integer \"a\" left by \"imm8\" while shifting in zeros, and store the result in \"dst\".\n//\n//go:linkname MmSlliSi64 MmSlliSi64\n//go:noescape\nfunc MmSlliSi64(r *x86.M64, v0 *x86.M64, v1 *x86.Int)\n\n// Shift packed 16-bit integers in \"a\" right by \"count\" while shifting in sign bits, and store the results in \"dst\".\n//\n//go:linkname MmSraPi16 MmSraPi16\n//go:noescape\nfunc MmSraPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Shift packed 16-bit integers in \"a\" right by \"imm8\" while shifting in sign bits, and store the results in \"dst\".\n//\n//go:linkname MmSraiPi16 MmSraiPi16\n//go:noescape\nfunc MmSraiPi16(r *x86.M64, v0 *x86.M64, v1 *x86.Int)\n\n// Shift packed 32-bit integers in \"a\" right by \"count\" while shifting in sign bits, and store the results in \"dst\".\n//\n//go:linkname MmSraPi32 MmSraPi32\n//go:noescape\nfunc MmSraPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Shift packed 32-bit integers in \"a\" right by \"imm8\" while shifting in sign bits, and store the results in \"dst\".\n//\n//go:linkname MmSraiPi32 MmSraiPi32\n//go:noescape\nfunc MmSraiPi32(r *x86.M64, v0 *x86.M64, v1 *x86.Int)\n\n// Shift packed 16-bit integers in \"a\" right by \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSrlPi16 MmSrlPi16\n//go:noescape\nfunc MmSrlPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Shift packed 16-bit integers in \"a\" right by \"imm8\" while 
shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSrliPi16 MmSrliPi16\n//go:noescape\nfunc MmSrliPi16(r *x86.M64, v0 *x86.M64, v1 *x86.Int)\n\n// Shift packed 32-bit integers in \"a\" right by \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSrlPi32 MmSrlPi32\n//go:noescape\nfunc MmSrlPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Shift packed 32-bit integers in \"a\" right by \"imm8\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSrliPi32 MmSrliPi32\n//go:noescape\nfunc MmSrliPi32(r *x86.M64, v0 *x86.M64, v1 *x86.Int)\n\n// Shift 64-bit integer \"a\" right by \"count\" while shifting in zeros, and store the result in \"dst\".\n//\n//go:linkname MmSrlSi64 MmSrlSi64\n//go:noescape\nfunc MmSrlSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Shift 64-bit integer \"a\" right by \"imm8\" while shifting in zeros, and store the result in \"dst\".\n//\n//go:linkname MmSrliSi64 MmSrliSi64\n//go:noescape\nfunc MmSrliSi64(r *x86.M64, v0 *x86.M64, v1 *x86.Int)\n\n// Compute the bitwise AND of 64 bits (representing integer data) in \"a\" and \"b\", and store the result in \"dst\".\n//\n//go:linkname MmAndSi64 MmAndSi64\n//go:noescape\nfunc MmAndSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Compute the bitwise NOT of 64 bits (representing integer data) in \"a\" and then AND with \"b\", and store the result in \"dst\".\n//\n//go:linkname MmAndnotSi64 MmAndnotSi64\n//go:noescape\nfunc MmAndnotSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Compute the bitwise OR of 64 bits (representing integer data) in \"a\" and \"b\", and store the result in \"dst\".\n//\n//go:linkname MmOrSi64 MmOrSi64\n//go:noescape\nfunc MmOrSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Compute the bitwise XOR of 64 bits (representing integer data) in \"a\" and \"b\", and store the result in \"dst\".\n//\n//go:linkname MmXorSi64 MmXorSi64\n//go:noescape\nfunc MmXorSi64(r *x86.M64, v0 *x86.M64, v1 
*x86.M64)\n\n// Compare packed 8-bit integers in \"a\" and \"b\" for equality, and store the results in \"dst\".\n//\n//go:linkname MmCmpeqPi8 MmCmpeqPi8\n//go:noescape\nfunc MmCmpeqPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Compare packed 16-bit integers in \"a\" and \"b\" for equality, and store the results in \"dst\".\n//\n//go:linkname MmCmpeqPi16 MmCmpeqPi16\n//go:noescape\nfunc MmCmpeqPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Compare packed 32-bit integers in \"a\" and \"b\" for equality, and store the results in \"dst\".\n//\n//go:linkname MmCmpeqPi32 MmCmpeqPi32\n//go:noescape\nfunc MmCmpeqPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Compare packed signed 8-bit integers in \"a\" and \"b\" for greater-than, and store the results in \"dst\".\n//\n//go:linkname MmCmpgtPi8 MmCmpgtPi8\n//go:noescape\nfunc MmCmpgtPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Compare packed signed 16-bit integers in \"a\" and \"b\" for greater-than, and store the results in \"dst\".\n//\n//go:linkname MmCmpgtPi16 MmCmpgtPi16\n//go:noescape\nfunc MmCmpgtPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Compare packed signed 32-bit integers in \"a\" and \"b\" for greater-than, and store the results in \"dst\".\n//\n//go:linkname MmCmpgtPi32 MmCmpgtPi32\n//go:noescape\nfunc MmCmpgtPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Return vector of type __m64 with all elements set to zero.\n//\n//go:linkname MmSetzeroSi64 MmSetzeroSi64\n//go:noescape\nfunc MmSetzeroSi64(r *x86.M64)\n\n// Set packed 32-bit integers in \"dst\" with the supplied values.\n//\n//go:linkname MmSetPi32 MmSetPi32\n//go:noescape\nfunc MmSetPi32(r *x86.M64, v0 *x86.Int, v1 *x86.Int)\n\n// Set packed 16-bit integers in \"dst\" with the supplied values.\n//\n//go:linkname MmSetPi16 MmSetPi16\n//go:noescape\nfunc MmSetPi16(r *x86.M64, v0 *x86.Short, v1 *x86.Short, v2 *x86.Short, v3 *x86.Short)\n\n// Set packed 8-bit integers in \"dst\" with the supplied values.\n//\n//go:linkname MmSetPi8 MmSetPi8\n//go:noescape\nfunc MmSetPi8(r *x86.M64, v0 *x86.Char, v1 *x86.Char, v2 *x86.Char, v3 *x86.Char, v4 *x86.Char, v5 *x86.Char, v6 *x86.Char, v7 *x86.Char)\n\n// Broadcast 32-bit integer \"a\" to all elements of \"dst\".\n//\n//go:linkname MmSet1Pi32 MmSet1Pi32\n//go:noescape\nfunc MmSet1Pi32(r *x86.M64, v0 *x86.Int)\n\n// Broadcast 16-bit integer \"a\" to all elements of \"dst\".\n//\n//go:linkname MmSet1Pi16 MmSet1Pi16\n//go:noescape\nfunc MmSet1Pi16(r *x86.M64, v0 *x86.Short)\n\n// Broadcast 8-bit integer \"a\" to all elements of \"dst\".\n//\n//go:linkname MmSet1Pi8 MmSet1Pi8\n//go:noescape\nfunc MmSet1Pi8(r *x86.M64, v0 *x86.Char)\n\n// Set packed 32-bit integers in \"dst\" with the supplied values in reverse order.\n//\n//go:linkname MmSetrPi32 MmSetrPi32\n//go:noescape\nfunc MmSetrPi32(r *x86.M64, v0 *x86.Int, v1 *x86.Int)\n\n// Set packed 16-bit integers in \"dst\" with the supplied values in reverse order.\n//\n//go:linkname MmSetrPi16 MmSetrPi16\n//go:noescape\nfunc MmSetrPi16(r *x86.M64, v0 *x86.Short, v1 *x86.Short, v2 *x86.Short, v3 *x86.Short)\n\n// Set packed 8-bit integers in \"dst\" with the supplied values in reverse order.\n//\n//go:linkname MmSetrPi8 MmSetrPi8\n//go:noescape\nfunc MmSetrPi8(r *x86.M64, v0 *x86.Char, v1 *x86.Char, v2 *x86.Char, v3 *x86.Char, v4 *x86.Char, v5 *x86.Char, v6 *x86.Char, v7 *x86.Char)\n"
  },
  {
    "path": "x86/mmx_sse/functions.c",
    "content": "#include <immintrin.h>\n\nvoid MmCvtpsPi32(__m64* r, __m128* v0) { *r = _mm_cvtps_pi32(*v0); }\nvoid MmCvtPs2Pi(__m64* r, __m128* v0) { *r = _mm_cvt_ps2pi(*v0); }\nvoid MmCvttpsPi32(__m64* r, __m128* v0) { *r = _mm_cvttps_pi32(*v0); }\nvoid MmCvttPs2Pi(__m64* r, __m128* v0) { *r = _mm_cvtt_ps2pi(*v0); }\nvoid MmCvtpi32Ps(__m128* r, __m128* v0, __m64* v1) { *r = _mm_cvtpi32_ps(*v0, *v1); }\nvoid MmCvtPi2Ps(__m128* r, __m128* v0, __m64* v1) { *r = _mm_cvt_pi2ps(*v0, *v1); }\nvoid MmMaxPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_max_pi16(*v0, *v1); }\nvoid MmMaxPu8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_max_pu8(*v0, *v1); }\nvoid MmMinPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_min_pi16(*v0, *v1); }\nvoid MmMinPu8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_min_pu8(*v0, *v1); }\nvoid MmMovemaskPi8(int* r, __m64* v0) { *r = _mm_movemask_pi8(*v0); }\nvoid MmMulhiPu16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_mulhi_pu16(*v0, *v1); }\nvoid MmAvgPu8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_avg_pu8(*v0, *v1); }\nvoid MmAvgPu16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_avg_pu16(*v0, *v1); }\nvoid MmSadPu8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sad_pu8(*v0, *v1); }\nvoid MmCvtpi16Ps(__m128* r, __m64* v0) { *r = _mm_cvtpi16_ps(*v0); }\nvoid MmCvtpu16Ps(__m128* r, __m64* v0) { *r = _mm_cvtpu16_ps(*v0); }\nvoid MmCvtpi8Ps(__m128* r, __m64* v0) { *r = _mm_cvtpi8_ps(*v0); }\nvoid MmCvtpu8Ps(__m128* r, __m64* v0) { *r = _mm_cvtpu8_ps(*v0); }\nvoid MmCvtpi32X2Ps(__m128* r, __m64* v0, __m64* v1) { *r = _mm_cvtpi32x2_ps(*v0, *v1); }\nvoid MmCvtpsPi16(__m64* r, __m128* v0) { *r = _mm_cvtps_pi16(*v0); }\nvoid MmCvtpsPi8(__m64* r, __m128* v0) { *r = _mm_cvtps_pi8(*v0); }\n"
  },
  {
    "path": "x86/mmx_sse/functions.go",
    "content": "package mmx_sse\n\nimport (\n\t\"github.com/alivanz/go-simd/x86\"\n)\n\n/*\n#cgo CFLAGS: -mmmx -msse\n#include <immintrin.h>\n*/\nimport \"C\"\n\n// Convert packed single-precision (32-bit) floating-point elements in \"a\" to packed 32-bit integers, and store the results in \"dst\".\n//\n//go:linkname MmCvtpsPi32 MmCvtpsPi32\n//go:noescape\nfunc MmCvtpsPi32(r *x86.M64, v0 *x86.M128)\n\n// Convert packed single-precision (32-bit) floating-point elements in \"a\" to packed 32-bit integers, and store the results in \"dst\".\n//\n//go:linkname MmCvtPs2Pi MmCvtPs2Pi\n//go:noescape\nfunc MmCvtPs2Pi(r *x86.M64, v0 *x86.M128)\n\n// Convert packed single-precision (32-bit) floating-point elements in \"a\" to packed 32-bit integers with truncation, and store the results in \"dst\".\n//\n//go:linkname MmCvttpsPi32 MmCvttpsPi32\n//go:noescape\nfunc MmCvttpsPi32(r *x86.M64, v0 *x86.M128)\n\n// Convert packed single-precision (32-bit) floating-point elements in \"a\" to packed 32-bit integers with truncation, and store the results in \"dst\".\n//\n//go:linkname MmCvttPs2Pi MmCvttPs2Pi\n//go:noescape\nfunc MmCvttPs2Pi(r *x86.M64, v0 *x86.M128)\n\n// Convert packed 32-bit integers in \"b\" to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements of \"dst\", and copy the upper 2 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmCvtpi32Ps MmCvtpi32Ps\n//go:noescape\nfunc MmCvtpi32Ps(r *x86.M128, v0 *x86.M128, v1 *x86.M64)\n\n// Convert packed signed 32-bit integers in \"b\" to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements of \"dst\", and copy the upper 2 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmCvtPi2Ps MmCvtPi2Ps\n//go:noescape\nfunc MmCvtPi2Ps(r *x86.M128, v0 *x86.M128, v1 *x86.M64)\n\n// Compare packed signed 16-bit integers in \"a\" and \"b\", and store packed maximum values in 
\"dst\".\n//\n//go:linkname MmMaxPi16 MmMaxPi16\n//go:noescape\nfunc MmMaxPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Compare packed unsigned 8-bit integers in \"a\" and \"b\", and store packed maximum values in \"dst\".\n//\n//go:linkname MmMaxPu8 MmMaxPu8\n//go:noescape\nfunc MmMaxPu8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Compare packed signed 16-bit integers in \"a\" and \"b\", and store packed minimum values in \"dst\".\n//\n//go:linkname MmMinPi16 MmMinPi16\n//go:noescape\nfunc MmMinPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Compare packed unsigned 8-bit integers in \"a\" and \"b\", and store packed minimum values in \"dst\".\n//\n//go:linkname MmMinPu8 MmMinPu8\n//go:noescape\nfunc MmMinPu8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Create mask from the most significant bit of each 8-bit element in \"a\", and store the result in \"dst\".\n//\n//go:linkname MmMovemaskPi8 MmMovemaskPi8\n//go:noescape\nfunc MmMovemaskPi8(r *x86.Int, v0 *x86.M64)\n\n// Multiply the packed unsigned 16-bit integers in \"a\" and \"b\", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in \"dst\".\n//\n//go:linkname MmMulhiPu16 MmMulhiPu16\n//go:noescape\nfunc MmMulhiPu16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Average packed unsigned 8-bit integers in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmAvgPu8 MmAvgPu8\n//go:noescape\nfunc MmAvgPu8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Average packed unsigned 16-bit integers in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmAvgPu16 MmAvgPu16\n//go:noescape\nfunc MmAvgPu16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Compute the absolute differences of packed unsigned 8-bit integers in \"a\" and \"b\", then horizontally sum each consecutive 8 differences to produce four unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of \"dst\".\n//\n//go:linkname MmSadPu8 
MmSadPu8\n//go:noescape\nfunc MmSadPu8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Convert packed 16-bit integers in \"a\" to packed single-precision (32-bit) floating-point elements, and store the results in \"dst\".\n//\n//go:linkname MmCvtpi16Ps MmCvtpi16Ps\n//go:noescape\nfunc MmCvtpi16Ps(r *x86.M128, v0 *x86.M64)\n\n// Convert packed unsigned 16-bit integers in \"a\" to packed single-precision (32-bit) floating-point elements, and store the results in \"dst\".\n//\n//go:linkname MmCvtpu16Ps MmCvtpu16Ps\n//go:noescape\nfunc MmCvtpu16Ps(r *x86.M128, v0 *x86.M64)\n\n// Convert the lower packed 8-bit integers in \"a\" to packed single-precision (32-bit) floating-point elements, and store the results in \"dst\".\n//\n//go:linkname MmCvtpi8Ps MmCvtpi8Ps\n//go:noescape\nfunc MmCvtpi8Ps(r *x86.M128, v0 *x86.M64)\n\n// Convert the lower packed unsigned 8-bit integers in \"a\" to packed single-precision (32-bit) floating-point elements, and store the results in \"dst\".\n//\n//go:linkname MmCvtpu8Ps MmCvtpu8Ps\n//go:noescape\nfunc MmCvtpu8Ps(r *x86.M128, v0 *x86.M64)\n\n// Convert packed signed 32-bit integers in \"a\" to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements of \"dst\", then covert the packed signed 32-bit integers in \"b\" to single-precision (32-bit) floating-point element, and store the results in the upper 2 elements of \"dst\".\n//\n//go:linkname MmCvtpi32X2Ps MmCvtpi32X2Ps\n//go:noescape\nfunc MmCvtpi32X2Ps(r *x86.M128, v0 *x86.M64, v1 *x86.M64)\n\n// Convert packed single-precision (32-bit) floating-point elements in \"a\" to packed 16-bit integers, and store the results in \"dst\". 
Note: this intrinsic will generate 0x7FFF, rather than 0x8000, for input values between 0x7FFF and 0x7FFFFFFF.\n//\n//go:linkname MmCvtpsPi16 MmCvtpsPi16\n//go:noescape\nfunc MmCvtpsPi16(r *x86.M64, v0 *x86.M128)\n\n// Convert packed single-precision (32-bit) floating-point elements in \"a\" to packed 8-bit integers, and store the results in lower 4 elements of \"dst\". Note: this intrinsic will generate 0x7F, rather than 0x80, for input values between 0x7F and 0x7FFFFFFF.\n//\n//go:linkname MmCvtpsPi8 MmCvtpsPi8\n//go:noescape\nfunc MmCvtpsPi8(r *x86.M64, v0 *x86.M128)\n"
  },
  {
    "path": "x86/mmx_sse2/functions.c",
    "content": "#include <immintrin.h>\n\nvoid MmCvtpdPi32(__m64* r, __m128d* v0) { *r = _mm_cvtpd_pi32(*v0); }\nvoid MmCvttpdPi32(__m64* r, __m128d* v0) { *r = _mm_cvttpd_pi32(*v0); }\nvoid MmCvtpi32Pd(__m128d* r, __m64* v0) { *r = _mm_cvtpi32_pd(*v0); }\nvoid MmAddSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_add_si64(*v0, *v1); }\nvoid MmMulSu32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_mul_su32(*v0, *v1); }\nvoid MmSubSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sub_si64(*v0, *v1); }\n"
  },
  {
    "path": "x86/mmx_sse2/functions.go",
    "content": "package mmx_sse2\n\nimport (\n\t\"github.com/alivanz/go-simd/x86\"\n)\n\n/*\n#cgo CFLAGS: -mmmx -msse2\n#include <immintrin.h>\n*/\nimport \"C\"\n\n// Convert packed double-precision (64-bit) floating-point elements in \"a\" to packed 32-bit integers, and store the results in \"dst\".\n//\n//go:linkname MmCvtpdPi32 MmCvtpdPi32\n//go:noescape\nfunc MmCvtpdPi32(r *x86.M64, v0 *x86.M128D)\n\n// Convert packed double-precision (64-bit) floating-point elements in \"a\" to packed 32-bit integers with truncation, and store the results in \"dst\".\n//\n//go:linkname MmCvttpdPi32 MmCvttpdPi32\n//go:noescape\nfunc MmCvttpdPi32(r *x86.M64, v0 *x86.M128D)\n\n// Convert packed signed 32-bit integers in \"a\" to packed double-precision (64-bit) floating-point elements, and store the results in \"dst\".\n//\n//go:linkname MmCvtpi32Pd MmCvtpi32Pd\n//go:noescape\nfunc MmCvtpi32Pd(r *x86.M128D, v0 *x86.M64)\n\n// Add 64-bit integers \"a\" and \"b\", and store the result in \"dst\".\n//\n//go:linkname MmAddSi64 MmAddSi64\n//go:noescape\nfunc MmAddSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Multiply the low unsigned 32-bit integers from \"a\" and \"b\", and store the unsigned 64-bit result in \"dst\".\n//\n//go:linkname MmMulSu32 MmMulSu32\n//go:noescape\nfunc MmMulSu32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Subtract 64-bit integer \"b\" from 64-bit integer \"a\", and store the result in \"dst\".\n//\n//go:linkname MmSubSi64 MmSubSi64\n//go:noescape\nfunc MmSubSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n"
  },
  {
    "path": "x86/mmx_ssse3/functions.c",
    "content": "#include <immintrin.h>\n\nvoid MmAbsPi8(__m64* r, __m64* v0) { *r = _mm_abs_pi8(*v0); }\nvoid MmAbsPi16(__m64* r, __m64* v0) { *r = _mm_abs_pi16(*v0); }\nvoid MmAbsPi32(__m64* r, __m64* v0) { *r = _mm_abs_pi32(*v0); }\nvoid MmHaddPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_hadd_pi16(*v0, *v1); }\nvoid MmHaddPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_hadd_pi32(*v0, *v1); }\nvoid MmHaddsPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_hadds_pi16(*v0, *v1); }\nvoid MmHsubPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_hsub_pi16(*v0, *v1); }\nvoid MmHsubPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_hsub_pi32(*v0, *v1); }\nvoid MmHsubsPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_hsubs_pi16(*v0, *v1); }\nvoid MmMaddubsPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_maddubs_pi16(*v0, *v1); }\nvoid MmMulhrsPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_mulhrs_pi16(*v0, *v1); }\nvoid MmShufflePi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_shuffle_pi8(*v0, *v1); }\nvoid MmSignPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sign_pi8(*v0, *v1); }\nvoid MmSignPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sign_pi16(*v0, *v1); }\nvoid MmSignPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sign_pi32(*v0, *v1); }\n"
  },
  {
    "path": "x86/mmx_ssse3/functions.go",
    "content": "package mmx_ssse3\n\nimport (\n\t\"github.com/alivanz/go-simd/x86\"\n)\n\n/*\n#cgo CFLAGS: -mmmx -mssse3\n#include <immintrin.h>\n*/\nimport \"C\"\n\n// Compute the absolute value of packed signed 8-bit integers in \"a\", and store the unsigned results in \"dst\".\n//\n//go:linkname MmAbsPi8 MmAbsPi8\n//go:noescape\nfunc MmAbsPi8(r *x86.M64, v0 *x86.M64)\n\n// Compute the absolute value of packed signed 16-bit integers in \"a\", and store the unsigned results in \"dst\".\n//\n//go:linkname MmAbsPi16 MmAbsPi16\n//go:noescape\nfunc MmAbsPi16(r *x86.M64, v0 *x86.M64)\n\n// Compute the absolute value of packed signed 32-bit integers in \"a\", and store the unsigned results in \"dst\".\n//\n//go:linkname MmAbsPi32 MmAbsPi32\n//go:noescape\nfunc MmAbsPi32(r *x86.M64, v0 *x86.M64)\n\n// Horizontally add adjacent pairs of 16-bit integers in \"a\" and \"b\", and pack the signed 16-bit results in \"dst\".\n//\n//go:linkname MmHaddPi16 MmHaddPi16\n//go:noescape\nfunc MmHaddPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Horizontally add adjacent pairs of 32-bit integers in \"a\" and \"b\", and pack the signed 32-bit results in \"dst\".\n//\n//go:linkname MmHaddPi32 MmHaddPi32\n//go:noescape\nfunc MmHaddPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Horizontally add adjacent pairs of signed 16-bit integers in \"a\" and \"b\" using saturation, and pack the signed 16-bit results in \"dst\".\n//\n//go:linkname MmHaddsPi16 MmHaddsPi16\n//go:noescape\nfunc MmHaddsPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Horizontally subtract adjacent pairs of 16-bit integers in \"a\" and \"b\", and pack the signed 16-bit results in \"dst\".\n//\n//go:linkname MmHsubPi16 MmHsubPi16\n//go:noescape\nfunc MmHsubPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Horizontally subtract adjacent pairs of 32-bit integers in \"a\" and \"b\", and pack the signed 32-bit results in \"dst\".\n//\n//go:linkname MmHsubPi32 MmHsubPi32\n//go:noescape\nfunc MmHsubPi32(r *x86.M64, v0 *x86.M64, 
v1 *x86.M64)\n\n// Horizontally subtract adjacent pairs of signed 16-bit integers in \"a\" and \"b\" using saturation, and pack the signed 16-bit results in \"dst\".\n//\n//go:linkname MmHsubsPi16 MmHsubsPi16\n//go:noescape\nfunc MmHsubsPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Vertically multiply each unsigned 8-bit integer from \"a\" with the corresponding signed 8-bit integer from \"b\", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in \"dst\".\n//\n//go:linkname MmMaddubsPi16 MmMaddubsPi16\n//go:noescape\nfunc MmMaddubsPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Multiply packed signed 16-bit integers in \"a\" and \"b\", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to \"dst\".\n//\n//go:linkname MmMulhrsPi16 MmMulhrsPi16\n//go:noescape\nfunc MmMulhrsPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Shuffle packed 8-bit integers in \"a\" according to the shuffle control mask in the corresponding 8-bit element of \"b\", and store the results in \"dst\".\n//\n//go:linkname MmShufflePi8 MmShufflePi8\n//go:noescape\nfunc MmShufflePi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Negate packed 8-bit integers in \"a\" when the corresponding signed 8-bit integer in \"b\" is negative, and store the results in \"dst\". Elements in \"dst\" are zeroed out when the corresponding element in \"b\" is zero.\n//\n//go:linkname MmSignPi8 MmSignPi8\n//go:noescape\nfunc MmSignPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Negate packed 16-bit integers in \"a\" when the corresponding signed 16-bit integer in \"b\" is negative, and store the results in \"dst\". 
Elements in \"dst\" are zeroed out when the corresponding element in \"b\" is zero.\n//\n//go:linkname MmSignPi16 MmSignPi16\n//go:noescape\nfunc MmSignPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n\n// Negate packed 32-bit integers in \"a\" when the corresponding signed 32-bit integer in \"b\" is negative, and store the results in \"dst\". Elements in \"dst\" are zeroed out when the corresponding element in \"b\" is zero.\n//\n//go:linkname MmSignPi32 MmSignPi32\n//go:noescape\nfunc MmSignPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64)\n"
  },
  {
    "path": "x86/popcnt/functions.c",
    "content": "#include <immintrin.h>\n\nvoid MmPopcntU32(int* r, unsigned int* v0) { *r = _mm_popcnt_u32(*v0); }\nvoid MmPopcntU64(long long* r, unsigned long long* v0) { *r = _mm_popcnt_u64(*v0); }\n"
  },
  {
    "path": "x86/popcnt/functions.go",
    "content": "package popcnt\n\nimport (\n\t\"github.com/alivanz/go-simd/x86\"\n)\n\n/*\n#cgo CFLAGS: -mpopcnt\n#include <immintrin.h>\n*/\nimport \"C\"\n\n// Count the number of bits set to 1 in unsigned 32-bit integer \"a\", and return that count in \"dst\".\n//\n//go:linkname MmPopcntU32 MmPopcntU32\n//go:noescape\nfunc MmPopcntU32(r *x86.Int, v0 *x86.Uint)\n\n// Count the number of bits set to 1 in unsigned 64-bit integer \"a\", and return that count in \"dst\".\n//\n//go:linkname MmPopcntU64 MmPopcntU64\n//go:noescape\nfunc MmPopcntU64(r *x86.Longlong, v0 *x86.Ulonglong)\n"
  },
  {
    "path": "x86/sse/functions.c",
    "content": "#include <immintrin.h>\n\nvoid MmAddSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_add_ss(*v0, *v1); }\nvoid MmAddPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_add_ps(*v0, *v1); }\nvoid MmSubSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_sub_ss(*v0, *v1); }\nvoid MmSubPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_sub_ps(*v0, *v1); }\nvoid MmMulSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_mul_ss(*v0, *v1); }\nvoid MmMulPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_mul_ps(*v0, *v1); }\nvoid MmDivSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_div_ss(*v0, *v1); }\nvoid MmDivPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_div_ps(*v0, *v1); }\nvoid MmSqrtSs(__m128* r, __m128* v0) { *r = _mm_sqrt_ss(*v0); }\nvoid MmSqrtPs(__m128* r, __m128* v0) { *r = _mm_sqrt_ps(*v0); }\nvoid MmRcpSs(__m128* r, __m128* v0) { *r = _mm_rcp_ss(*v0); }\nvoid MmRcpPs(__m128* r, __m128* v0) { *r = _mm_rcp_ps(*v0); }\nvoid MmRsqrtSs(__m128* r, __m128* v0) { *r = _mm_rsqrt_ss(*v0); }\nvoid MmRsqrtPs(__m128* r, __m128* v0) { *r = _mm_rsqrt_ps(*v0); }\nvoid MmMinSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_min_ss(*v0, *v1); }\nvoid MmMinPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_min_ps(*v0, *v1); }\nvoid MmMaxSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_max_ss(*v0, *v1); }\nvoid MmMaxPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_max_ps(*v0, *v1); }\nvoid MmAndPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_and_ps(*v0, *v1); }\nvoid MmAndnotPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_andnot_ps(*v0, *v1); }\nvoid MmOrPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_or_ps(*v0, *v1); }\nvoid MmXorPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_xor_ps(*v0, *v1); }\nvoid MmCmpeqSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpeq_ss(*v0, *v1); }\nvoid MmCmpeqPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpeq_ps(*v0, *v1); }\nvoid MmCmpltSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmplt_ss(*v0, *v1); }\nvoid MmCmpltPs(__m128* r, 
__m128* v0, __m128* v1) { *r = _mm_cmplt_ps(*v0, *v1); }\nvoid MmCmpleSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmple_ss(*v0, *v1); }\nvoid MmCmplePs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmple_ps(*v0, *v1); }\nvoid MmCmpgtSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpgt_ss(*v0, *v1); }\nvoid MmCmpgtPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpgt_ps(*v0, *v1); }\nvoid MmCmpgeSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpge_ss(*v0, *v1); }\nvoid MmCmpgePs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpge_ps(*v0, *v1); }\nvoid MmCmpneqSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpneq_ss(*v0, *v1); }\nvoid MmCmpneqPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpneq_ps(*v0, *v1); }\nvoid MmCmpnltSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpnlt_ss(*v0, *v1); }\nvoid MmCmpnltPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpnlt_ps(*v0, *v1); }\nvoid MmCmpnleSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpnle_ss(*v0, *v1); }\nvoid MmCmpnlePs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpnle_ps(*v0, *v1); }\nvoid MmCmpngtSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpngt_ss(*v0, *v1); }\nvoid MmCmpngtPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpngt_ps(*v0, *v1); }\nvoid MmCmpngeSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpnge_ss(*v0, *v1); }\nvoid MmCmpngePs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpnge_ps(*v0, *v1); }\nvoid MmCmpordSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpord_ss(*v0, *v1); }\nvoid MmCmpordPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpord_ps(*v0, *v1); }\nvoid MmCmpunordSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpunord_ss(*v0, *v1); }\nvoid MmCmpunordPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpunord_ps(*v0, *v1); }\nvoid MmComieqSs(int* r, __m128* v0, __m128* v1) { *r = _mm_comieq_ss(*v0, *v1); }\nvoid MmComiltSs(int* r, __m128* v0, __m128* v1) { *r = _mm_comilt_ss(*v0, *v1); }\nvoid MmComileSs(int* r, __m128* v0, __m128* v1) { *r = 
_mm_comile_ss(*v0, *v1); }\nvoid MmComigtSs(int* r, __m128* v0, __m128* v1) { *r = _mm_comigt_ss(*v0, *v1); }\nvoid MmComigeSs(int* r, __m128* v0, __m128* v1) { *r = _mm_comige_ss(*v0, *v1); }\nvoid MmComineqSs(int* r, __m128* v0, __m128* v1) { *r = _mm_comineq_ss(*v0, *v1); }\nvoid MmUcomieqSs(int* r, __m128* v0, __m128* v1) { *r = _mm_ucomieq_ss(*v0, *v1); }\nvoid MmUcomiltSs(int* r, __m128* v0, __m128* v1) { *r = _mm_ucomilt_ss(*v0, *v1); }\nvoid MmUcomileSs(int* r, __m128* v0, __m128* v1) { *r = _mm_ucomile_ss(*v0, *v1); }\nvoid MmUcomigtSs(int* r, __m128* v0, __m128* v1) { *r = _mm_ucomigt_ss(*v0, *v1); }\nvoid MmUcomigeSs(int* r, __m128* v0, __m128* v1) { *r = _mm_ucomige_ss(*v0, *v1); }\nvoid MmUcomineqSs(int* r, __m128* v0, __m128* v1) { *r = _mm_ucomineq_ss(*v0, *v1); }\nvoid MmCvtssSi32(int* r, __m128* v0) { *r = _mm_cvtss_si32(*v0); }\nvoid MmCvtSs2Si(int* r, __m128* v0) { *r = _mm_cvt_ss2si(*v0); }\nvoid MmCvtssSi64(long long* r, __m128* v0) { *r = _mm_cvtss_si64(*v0); }\nvoid MmCvttssSi32(int* r, __m128* v0) { *r = _mm_cvttss_si32(*v0); }\nvoid MmCvttSs2Si(int* r, __m128* v0) { *r = _mm_cvtt_ss2si(*v0); }\nvoid MmCvttssSi64(long long* r, __m128* v0) { *r = _mm_cvttss_si64(*v0); }\nvoid MmCvtsi32Ss(__m128* r, __m128* v0, int* v1) { *r = _mm_cvtsi32_ss(*v0, *v1); }\nvoid MmCvtSi2Ss(__m128* r, __m128* v0, int* v1) { *r = _mm_cvt_si2ss(*v0, *v1); }\nvoid MmCvtsi64Ss(__m128* r, __m128* v0, long long* v1) { *r = _mm_cvtsi64_ss(*v0, *v1); }\nvoid MmCvtssF32(float* r, __m128* v0) { *r = _mm_cvtss_f32(*v0); }\nvoid MmUndefinedPs(__m128* r) { *r = _mm_undefined_ps(); }\nvoid MmSetSs(__m128* r, float* v0) { *r = _mm_set_ss(*v0); }\nvoid MmSet1Ps(__m128* r, float* v0) { *r = _mm_set1_ps(*v0); }\nvoid MmSetPs1(__m128* r, float* v0) { *r = _mm_set_ps1(*v0); }\nvoid MmSetPs(__m128* r, float* v0, float* v1, float* v2, float* v3) { *r = _mm_set_ps(*v0, *v1, *v2, *v3); }\nvoid MmSetrPs(__m128* r, float* v0, float* v1, float* v2, float* v3) { *r = _mm_setr_ps(*v0, *v1, 
*v2, *v3); }\nvoid MmSetzeroPs(__m128* r) { *r = _mm_setzero_ps(); }\nvoid MmUnpackhiPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_unpackhi_ps(*v0, *v1); }\nvoid MmUnpackloPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_unpacklo_ps(*v0, *v1); }\nvoid MmMoveSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_move_ss(*v0, *v1); }\nvoid MmMovehlPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_movehl_ps(*v0, *v1); }\nvoid MmMovelhPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_movelh_ps(*v0, *v1); }\nvoid MmMovemaskPs(int* r, __m128* v0) { *r = _mm_movemask_ps(*v0); }\n"
  },
  {
    "path": "x86/sse/functions.go",
    "content": "package sse\n\nimport (\n\t\"github.com/alivanz/go-simd/x86\"\n)\n\n/*\n#cgo CFLAGS: -msse\n#include <immintrin.h>\n*/\nimport \"C\"\n\n// Add the lower single-precision (32-bit) floating-point element in \"a\" and \"b\", store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmAddSs MmAddSs\n//go:noescape\nfunc MmAddSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Add packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmAddPs MmAddPs\n//go:noescape\nfunc MmAddPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Subtract the lower single-precision (32-bit) floating-point element in \"b\" from the lower single-precision (32-bit) floating-point element in \"a\", store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmSubSs MmSubSs\n//go:noescape\nfunc MmSubSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Subtract packed single-precision (32-bit) floating-point elements in \"b\" from packed single-precision (32-bit) floating-point elements in \"a\", and store the results in \"dst\".\n//\n//go:linkname MmSubPs MmSubPs\n//go:noescape\nfunc MmSubPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Multiply the lower single-precision (32-bit) floating-point element in \"a\" and \"b\", store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmMulSs MmMulSs\n//go:noescape\nfunc MmMulSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Multiply packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmMulPs MmMulPs\n//go:noescape\nfunc MmMulPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Divide the lower single-precision (32-bit) 
floating-point element in \"a\" by the lower single-precision (32-bit) floating-point element in \"b\", store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmDivSs MmDivSs\n//go:noescape\nfunc MmDivSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Divide packed single-precision (32-bit) floating-point elements in \"a\" by packed elements in \"b\", and store the results in \"dst\".\n//\n//go:linkname MmDivPs MmDivPs\n//go:noescape\nfunc MmDivPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compute the square root of the lower single-precision (32-bit) floating-point element in \"a\", store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmSqrtSs MmSqrtSs\n//go:noescape\nfunc MmSqrtSs(r *x86.M128, v0 *x86.M128)\n\n// Compute the square root of packed single-precision (32-bit) floating-point elements in \"a\", and store the results in \"dst\".\n//\n//go:linkname MmSqrtPs MmSqrtPs\n//go:noescape\nfunc MmSqrtPs(r *x86.M128, v0 *x86.M128)\n\n// Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in \"a\", store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\". The maximum relative error for this approximation is less than 1.5*2^-12.\n//\n//go:linkname MmRcpSs MmRcpSs\n//go:noescape\nfunc MmRcpSs(r *x86.M128, v0 *x86.M128)\n\n// Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in \"a\", and store the results in \"dst\". 
The maximum relative error for this approximation is less than 1.5*2^-12.\n//\n//go:linkname MmRcpPs MmRcpPs\n//go:noescape\nfunc MmRcpPs(r *x86.M128, v0 *x86.M128)\n\n// Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in \"a\", store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\". The maximum relative error for this approximation is less than 1.5*2^-12.\n//\n//go:linkname MmRsqrtSs MmRsqrtSs\n//go:noescape\nfunc MmRsqrtSs(r *x86.M128, v0 *x86.M128)\n\n// Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in \"a\", and store the results in \"dst\". The maximum relative error for this approximation is less than 1.5*2^-12.\n//\n//go:linkname MmRsqrtPs MmRsqrtPs\n//go:noescape\nfunc MmRsqrtPs(r *x86.M128, v0 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point elements in \"a\" and \"b\", store the minimum value in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\". If either element is NaN, or both are zero (of either sign), the element from \"b\" is returned.\n//\n//go:linkname MmMinSs MmMinSs\n//go:noescape\nfunc MmMinSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", and store packed minimum values in \"dst\". For each pair, if either element is NaN, or both are zero (of either sign), the element from \"b\" is returned.\n//\n//go:linkname MmMinPs MmMinPs\n//go:noescape\nfunc MmMinPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point elements in \"a\" and \"b\", store the maximum value in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\". 
If either element is NaN, or both are zero (of either sign), the element from \"b\" is returned.\n//\n//go:linkname MmMaxSs MmMaxSs\n//go:noescape\nfunc MmMaxSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", and store packed maximum values in \"dst\". For each pair, if either element is NaN, or both are zero (of either sign), the element from \"b\" is returned.\n//\n//go:linkname MmMaxPs MmMaxPs\n//go:noescape\nfunc MmMaxPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmAndPs MmAndPs\n//go:noescape\nfunc MmAndPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in \"a\" and then AND with \"b\", and store the results in \"dst\".\n//\n//go:linkname MmAndnotPs MmAndnotPs\n//go:noescape\nfunc MmAndnotPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmOrPs MmOrPs\n//go:noescape\nfunc MmOrPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmXorPs MmXorPs\n//go:noescape\nfunc MmXorPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point elements in \"a\" and \"b\" for equality, store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmCmpeqSs MmCmpeqSs\n//go:noescape\nfunc MmCmpeqSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare packed single-precision (32-bit) floating-point elements in \"a\" and \"b\" for equality, and store the results in \"dst\".\n//\n//go:linkname MmCmpeqPs MmCmpeqPs\n//go:noescape\nfunc MmCmpeqPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// 
Compare the lower single-precision (32-bit) floating-point elements in \"a\" and \"b\" for less-than, store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmCmpltSs MmCmpltSs\n//go:noescape\nfunc MmCmpltSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare packed single-precision (32-bit) floating-point elements in \"a\" and \"b\" for less-than, and store the results in \"dst\".\n//\n//go:linkname MmCmpltPs MmCmpltPs\n//go:noescape\nfunc MmCmpltPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point elements in \"a\" and \"b\" for less-than-or-equal, store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmCmpleSs MmCmpleSs\n//go:noescape\nfunc MmCmpleSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare packed single-precision (32-bit) floating-point elements in \"a\" and \"b\" for less-than-or-equal, and store the results in \"dst\".\n//\n//go:linkname MmCmplePs MmCmplePs\n//go:noescape\nfunc MmCmplePs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point elements in \"a\" and \"b\" for greater-than, store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmCmpgtSs MmCmpgtSs\n//go:noescape\nfunc MmCmpgtSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare packed single-precision (32-bit) floating-point elements in \"a\" and \"b\" for greater-than, and store the results in \"dst\".\n//\n//go:linkname MmCmpgtPs MmCmpgtPs\n//go:noescape\nfunc MmCmpgtPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point elements in \"a\" and \"b\" for greater-than-or-equal, store the result in the lower element of \"dst\", and copy 
the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmCmpgeSs MmCmpgeSs\n//go:noescape\nfunc MmCmpgeSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare packed single-precision (32-bit) floating-point elements in \"a\" and \"b\" for greater-than-or-equal, and store the results in \"dst\".\n//\n//go:linkname MmCmpgePs MmCmpgePs\n//go:noescape\nfunc MmCmpgePs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point elements in \"a\" and \"b\" for not-equal, store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmCmpneqSs MmCmpneqSs\n//go:noescape\nfunc MmCmpneqSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare packed single-precision (32-bit) floating-point elements in \"a\" and \"b\" for not-equal, and store the results in \"dst\".\n//\n//go:linkname MmCmpneqPs MmCmpneqPs\n//go:noescape\nfunc MmCmpneqPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point elements in \"a\" and \"b\" for not-less-than, store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmCmpnltSs MmCmpnltSs\n//go:noescape\nfunc MmCmpnltSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare packed single-precision (32-bit) floating-point elements in \"a\" and \"b\" for not-less-than, and store the results in \"dst\".\n//\n//go:linkname MmCmpnltPs MmCmpnltPs\n//go:noescape\nfunc MmCmpnltPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point elements in \"a\" and \"b\" for not-less-than-or-equal, store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmCmpnleSs MmCmpnleSs\n//go:noescape\nfunc MmCmpnleSs(r 
*x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare packed single-precision (32-bit) floating-point elements in \"a\" and \"b\" for not-less-than-or-equal, and store the results in \"dst\".\n//\n//go:linkname MmCmpnlePs MmCmpnlePs\n//go:noescape\nfunc MmCmpnlePs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point elements in \"a\" and \"b\" for not-greater-than, store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmCmpngtSs MmCmpngtSs\n//go:noescape\nfunc MmCmpngtSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare packed single-precision (32-bit) floating-point elements in \"a\" and \"b\" for not-greater-than, and store the results in \"dst\".\n//\n//go:linkname MmCmpngtPs MmCmpngtPs\n//go:noescape\nfunc MmCmpngtPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point elements in \"a\" and \"b\" for not-greater-than-or-equal, store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmCmpngeSs MmCmpngeSs\n//go:noescape\nfunc MmCmpngeSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare packed single-precision (32-bit) floating-point elements in \"a\" and \"b\" for not-greater-than-or-equal, and store the results in \"dst\".\n//\n//go:linkname MmCmpngePs MmCmpngePs\n//go:noescape\nfunc MmCmpngePs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point elements in \"a\" and \"b\" to see if neither is NaN, store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmCmpordSs MmCmpordSs\n//go:noescape\nfunc MmCmpordSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare packed single-precision (32-bit) floating-point 
elements in \"a\" and \"b\" to see if neither is NaN, and store the results in \"dst\".\n//\n//go:linkname MmCmpordPs MmCmpordPs\n//go:noescape\nfunc MmCmpordPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point elements in \"a\" and \"b\" to see if either is NaN, store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmCmpunordSs MmCmpunordSs\n//go:noescape\nfunc MmCmpunordSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare packed single-precision (32-bit) floating-point elements in \"a\" and \"b\" to see if either is NaN, and store the results in \"dst\".\n//\n//go:linkname MmCmpunordPs MmCmpunordPs\n//go:noescape\nfunc MmCmpunordPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point element in \"a\" and \"b\" for equality, and return the boolean result (0 or 1).\n//\n//go:linkname MmComieqSs MmComieqSs\n//go:noescape\nfunc MmComieqSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point element in \"a\" and \"b\" for less-than, and return the boolean result (0 or 1).\n//\n//go:linkname MmComiltSs MmComiltSs\n//go:noescape\nfunc MmComiltSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point element in \"a\" and \"b\" for less-than-or-equal, and return the boolean result (0 or 1).\n//\n//go:linkname MmComileSs MmComileSs\n//go:noescape\nfunc MmComileSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point element in \"a\" and \"b\" for greater-than, and return the boolean result (0 or 1).\n//\n//go:linkname MmComigtSs MmComigtSs\n//go:noescape\nfunc MmComigtSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point element in \"a\" and \"b\" for 
greater-than-or-equal, and return the boolean result (0 or 1).\n//\n//go:linkname MmComigeSs MmComigeSs\n//go:noescape\nfunc MmComigeSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point element in \"a\" and \"b\" for not-equal, and return the boolean result (0 or 1).\n//\n//go:linkname MmComineqSs MmComineqSs\n//go:noescape\nfunc MmComineqSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point element in \"a\" and \"b\" for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.\n//\n//go:linkname MmUcomieqSs MmUcomieqSs\n//go:noescape\nfunc MmUcomieqSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point element in \"a\" and \"b\" for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.\n//\n//go:linkname MmUcomiltSs MmUcomiltSs\n//go:noescape\nfunc MmUcomiltSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point element in \"a\" and \"b\" for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.\n//\n//go:linkname MmUcomileSs MmUcomileSs\n//go:noescape\nfunc MmUcomileSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point element in \"a\" and \"b\" for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.\n//\n//go:linkname MmUcomigtSs MmUcomigtSs\n//go:noescape\nfunc MmUcomigtSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point element in \"a\" and \"b\" for greater-than-or-equal, and return the boolean result (0 or 1). 
This instruction will not signal an exception for QNaNs.\n//\n//go:linkname MmUcomigeSs MmUcomigeSs\n//go:noescape\nfunc MmUcomigeSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)\n\n// Compare the lower single-precision (32-bit) floating-point element in \"a\" and \"b\" for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.\n//\n//go:linkname MmUcomineqSs MmUcomineqSs\n//go:noescape\nfunc MmUcomineqSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128)\n\n// Convert the lower single-precision (32-bit) floating-point element in \"a\" to a 32-bit integer, and store the result in \"dst\".\n//\n//go:linkname MmCvtssSi32 MmCvtssSi32\n//go:noescape\nfunc MmCvtssSi32(r *x86.Int, v0 *x86.M128)\n\n// Convert the lower single-precision (32-bit) floating-point element in \"a\" to a 32-bit integer, and store the result in \"dst\".\n//\n//go:linkname MmCvtSs2Si MmCvtSs2Si\n//go:noescape\nfunc MmCvtSs2Si(r *x86.Int, v0 *x86.M128)\n\n// Convert the lower single-precision (32-bit) floating-point element in \"a\" to a 64-bit integer, and store the result in \"dst\".\n//\n//go:linkname MmCvtssSi64 MmCvtssSi64\n//go:noescape\nfunc MmCvtssSi64(r *x86.Longlong, v0 *x86.M128)\n\n// Convert the lower single-precision (32-bit) floating-point element in \"a\" to a 32-bit integer with truncation, and store the result in \"dst\".\n//\n//go:linkname MmCvttssSi32 MmCvttssSi32\n//go:noescape\nfunc MmCvttssSi32(r *x86.Int, v0 *x86.M128)\n\n// Convert the lower single-precision (32-bit) floating-point element in \"a\" to a 32-bit integer with truncation, and store the result in \"dst\".\n//\n//go:linkname MmCvttSs2Si MmCvttSs2Si\n//go:noescape\nfunc MmCvttSs2Si(r *x86.Int, v0 *x86.M128)\n\n// Convert the lower single-precision (32-bit) floating-point element in \"a\" to a 64-bit integer with truncation, and store the result in \"dst\".\n//\n//go:linkname MmCvttssSi64 MmCvttssSi64\n//go:noescape\nfunc MmCvttssSi64(r *x86.Longlong, v0 *x86.M128)\n\n// Convert 
the signed 32-bit integer \"b\" to a single-precision (32-bit) floating-point element, store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmCvtsi32Ss MmCvtsi32Ss\n//go:noescape\nfunc MmCvtsi32Ss(r *x86.M128, v0 *x86.M128, v1 *x86.Int)\n\n// Convert the signed 32-bit integer \"b\" to a single-precision (32-bit) floating-point element, store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmCvtSi2Ss MmCvtSi2Ss\n//go:noescape\nfunc MmCvtSi2Ss(r *x86.M128, v0 *x86.M128, v1 *x86.Int)\n\n// Convert the signed 64-bit integer \"b\" to a single-precision (32-bit) floating-point element, store the result in the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmCvtsi64Ss MmCvtsi64Ss\n//go:noescape\nfunc MmCvtsi64Ss(r *x86.M128, v0 *x86.M128, v1 *x86.Longlong)\n\n// Copy the lower single-precision (32-bit) floating-point element of \"a\" to \"dst\".\n//\n//go:linkname MmCvtssF32 MmCvtssF32\n//go:noescape\nfunc MmCvtssF32(r *x86.Float, v0 *x86.M128)\n\n// Return vector of type __m128 with undefined elements.\n//\n//go:linkname MmUndefinedPs MmUndefinedPs\n//go:noescape\nfunc MmUndefinedPs(r *x86.M128)\n\n// Copy single-precision (32-bit) floating-point element \"a\" to the lower element of \"dst\", and zero the upper 3 elements.\n//\n//go:linkname MmSetSs MmSetSs\n//go:noescape\nfunc MmSetSs(r *x86.M128, v0 *x86.Float)\n\n// Broadcast single-precision (32-bit) floating-point value \"a\" to all elements of \"dst\".\n//\n//go:linkname MmSet1Ps MmSet1Ps\n//go:noescape\nfunc MmSet1Ps(r *x86.M128, v0 *x86.Float)\n\n// Broadcast single-precision (32-bit) floating-point value \"a\" to all elements of \"dst\".\n//\n//go:linkname MmSetPs1 MmSetPs1\n//go:noescape\nfunc MmSetPs1(r *x86.M128, v0 *x86.Float)\n\n// Set 
packed single-precision (32-bit) floating-point elements in \"dst\" with the supplied values.\n//\n//go:linkname MmSetPs MmSetPs\n//go:noescape\nfunc MmSetPs(r *x86.M128, v0 *x86.Float, v1 *x86.Float, v2 *x86.Float, v3 *x86.Float)\n\n// Set packed single-precision (32-bit) floating-point elements in \"dst\" with the supplied values in reverse order.\n//\n//go:linkname MmSetrPs MmSetrPs\n//go:noescape\nfunc MmSetrPs(r *x86.M128, v0 *x86.Float, v1 *x86.Float, v2 *x86.Float, v3 *x86.Float)\n\n// Return vector of type __m128 with all elements set to zero.\n//\n//go:linkname MmSetzeroPs MmSetzeroPs\n//go:noescape\nfunc MmSetzeroPs(r *x86.M128)\n\n// Unpack and interleave single-precision (32-bit) floating-point elements from the high half of \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmUnpackhiPs MmUnpackhiPs\n//go:noescape\nfunc MmUnpackhiPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Unpack and interleave single-precision (32-bit) floating-point elements from the low half of \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmUnpackloPs MmUnpackloPs\n//go:noescape\nfunc MmUnpackloPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Move the lower single-precision (32-bit) floating-point element from \"b\" to the lower element of \"dst\", and copy the upper 3 packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmMoveSs MmMoveSs\n//go:noescape\nfunc MmMoveSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Move the upper 2 single-precision (32-bit) floating-point elements from \"b\" to the lower 2 elements of \"dst\", and copy the upper 2 elements from \"a\" to the upper 2 elements of \"dst\".\n//\n//go:linkname MmMovehlPs MmMovehlPs\n//go:noescape\nfunc MmMovehlPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Move the lower 2 single-precision (32-bit) floating-point elements from \"b\" to the upper 2 elements of \"dst\", and copy the lower 2 elements from \"a\" to the lower 2 elements of 
\"dst\".\n//\n//go:linkname MmMovelhPs MmMovelhPs\n//go:noescape\nfunc MmMovelhPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Set each bit of mask \"dst\" based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in \"a\".\n//\n//go:linkname MmMovemaskPs MmMovemaskPs\n//go:noescape\nfunc MmMovemaskPs(r *x86.Int, v0 *x86.M128)\n"
  },
  {
    "path": "x86/sse2/functions.c",
    "content": "#include <immintrin.h>\n\nvoid MmAddSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_add_sd(*v0, *v1); }\nvoid MmAddPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_add_pd(*v0, *v1); }\nvoid MmSubSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_sub_sd(*v0, *v1); }\nvoid MmSubPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_sub_pd(*v0, *v1); }\nvoid MmMulSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_mul_sd(*v0, *v1); }\nvoid MmMulPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_mul_pd(*v0, *v1); }\nvoid MmDivSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_div_sd(*v0, *v1); }\nvoid MmDivPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_div_pd(*v0, *v1); }\nvoid MmSqrtSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_sqrt_sd(*v0, *v1); }\nvoid MmSqrtPd(__m128d* r, __m128d* v0) { *r = _mm_sqrt_pd(*v0); }\nvoid MmMinSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_min_sd(*v0, *v1); }\nvoid MmMinPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_min_pd(*v0, *v1); }\nvoid MmMaxSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_max_sd(*v0, *v1); }\nvoid MmMaxPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_max_pd(*v0, *v1); }\nvoid MmAndPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_and_pd(*v0, *v1); }\nvoid MmAndnotPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_andnot_pd(*v0, *v1); }\nvoid MmOrPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_or_pd(*v0, *v1); }\nvoid MmXorPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_xor_pd(*v0, *v1); }\nvoid MmCmpeqPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpeq_pd(*v0, *v1); }\nvoid MmCmpltPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmplt_pd(*v0, *v1); }\nvoid MmCmplePd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmple_pd(*v0, *v1); }\nvoid MmCmpgtPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpgt_pd(*v0, *v1); }\nvoid MmCmpgePd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpge_pd(*v0, *v1); }\nvoid MmCmpordPd(__m128d* r, __m128d* 
v0, __m128d* v1) { *r = _mm_cmpord_pd(*v0, *v1); }\nvoid MmCmpunordPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpunord_pd(*v0, *v1); }\nvoid MmCmpneqPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpneq_pd(*v0, *v1); }\nvoid MmCmpnltPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpnlt_pd(*v0, *v1); }\nvoid MmCmpnlePd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpnle_pd(*v0, *v1); }\nvoid MmCmpngtPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpngt_pd(*v0, *v1); }\nvoid MmCmpngePd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpnge_pd(*v0, *v1); }\nvoid MmCmpeqSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpeq_sd(*v0, *v1); }\nvoid MmCmpltSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmplt_sd(*v0, *v1); }\nvoid MmCmpleSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmple_sd(*v0, *v1); }\nvoid MmCmpgtSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpgt_sd(*v0, *v1); }\nvoid MmCmpgeSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpge_sd(*v0, *v1); }\nvoid MmCmpordSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpord_sd(*v0, *v1); }\nvoid MmCmpunordSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpunord_sd(*v0, *v1); }\nvoid MmCmpneqSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpneq_sd(*v0, *v1); }\nvoid MmCmpnltSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpnlt_sd(*v0, *v1); }\nvoid MmCmpnleSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpnle_sd(*v0, *v1); }\nvoid MmCmpngtSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpngt_sd(*v0, *v1); }\nvoid MmCmpngeSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpnge_sd(*v0, *v1); }\nvoid MmComieqSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_comieq_sd(*v0, *v1); }\nvoid MmComiltSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_comilt_sd(*v0, *v1); }\nvoid MmComileSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_comile_sd(*v0, *v1); }\nvoid MmComigtSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_comigt_sd(*v0, *v1); }\nvoid 
MmComigeSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_comige_sd(*v0, *v1); }\nvoid MmComineqSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_comineq_sd(*v0, *v1); }\nvoid MmUcomieqSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_ucomieq_sd(*v0, *v1); }\nvoid MmUcomiltSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_ucomilt_sd(*v0, *v1); }\nvoid MmUcomileSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_ucomile_sd(*v0, *v1); }\nvoid MmUcomigtSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_ucomigt_sd(*v0, *v1); }\nvoid MmUcomigeSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_ucomige_sd(*v0, *v1); }\nvoid MmUcomineqSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_ucomineq_sd(*v0, *v1); }\nvoid MmCvtpdPs(__m128* r, __m128d* v0) { *r = _mm_cvtpd_ps(*v0); }\nvoid MmCvtpsPd(__m128d* r, __m128* v0) { *r = _mm_cvtps_pd(*v0); }\nvoid MmCvtepi32Pd(__m128d* r, __m128i* v0) { *r = _mm_cvtepi32_pd(*v0); }\nvoid MmCvtpdEpi32(__m128i* r, __m128d* v0) { *r = _mm_cvtpd_epi32(*v0); }\nvoid MmCvtsdSi32(int* r, __m128d* v0) { *r = _mm_cvtsd_si32(*v0); }\nvoid MmCvtsdSs(__m128* r, __m128* v0, __m128d* v1) { *r = _mm_cvtsd_ss(*v0, *v1); }\nvoid MmCvtsi32Sd(__m128d* r, __m128d* v0, int* v1) { *r = _mm_cvtsi32_sd(*v0, *v1); }\nvoid MmCvtssSd(__m128d* r, __m128d* v0, __m128* v1) { *r = _mm_cvtss_sd(*v0, *v1); }\nvoid MmCvttpdEpi32(__m128i* r, __m128d* v0) { *r = _mm_cvttpd_epi32(*v0); }\nvoid MmCvttsdSi32(int* r, __m128d* v0) { *r = _mm_cvttsd_si32(*v0); }\nvoid MmCvtsdF64(double* r, __m128d* v0) { *r = _mm_cvtsd_f64(*v0); }\nvoid MmUndefinedPd(__m128d* r) { *r = _mm_undefined_pd(); }\nvoid MmSetSd(__m128d* r, double* v0) { *r = _mm_set_sd(*v0); }\nvoid MmSet1Pd(__m128d* r, double* v0) { *r = _mm_set1_pd(*v0); }\nvoid MmSetPd1(__m128d* r, double* v0) { *r = _mm_set_pd1(*v0); }\nvoid MmSetPd(__m128d* r, double* v0, double* v1) { *r = _mm_set_pd(*v0, *v1); }\nvoid MmSetrPd(__m128d* r, double* v0, double* v1) { *r = _mm_setr_pd(*v0, *v1); }\nvoid MmSetzeroPd(__m128d* r) { *r = _mm_setzero_pd(); 
}\nvoid MmMoveSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_move_sd(*v0, *v1); }\nvoid MmAddEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_add_epi8(*v0, *v1); }\nvoid MmAddEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_add_epi16(*v0, *v1); }\nvoid MmAddEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_add_epi32(*v0, *v1); }\nvoid MmAddEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_add_epi64(*v0, *v1); }\nvoid MmAddsEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_adds_epi8(*v0, *v1); }\nvoid MmAddsEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_adds_epi16(*v0, *v1); }\nvoid MmAddsEpu8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_adds_epu8(*v0, *v1); }\nvoid MmAddsEpu16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_adds_epu16(*v0, *v1); }\nvoid MmAvgEpu8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_avg_epu8(*v0, *v1); }\nvoid MmAvgEpu16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_avg_epu16(*v0, *v1); }\nvoid MmMaddEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_madd_epi16(*v0, *v1); }\nvoid MmMaxEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_max_epi16(*v0, *v1); }\nvoid MmMaxEpu8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_max_epu8(*v0, *v1); }\nvoid MmMinEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_min_epi16(*v0, *v1); }\nvoid MmMinEpu8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_min_epu8(*v0, *v1); }\nvoid MmMulhiEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_mulhi_epi16(*v0, *v1); }\nvoid MmMulhiEpu16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_mulhi_epu16(*v0, *v1); }\nvoid MmMulloEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_mullo_epi16(*v0, *v1); }\nvoid MmMulEpu32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_mul_epu32(*v0, *v1); }\nvoid MmSadEpu8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sad_epu8(*v0, *v1); }\nvoid MmSubEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sub_epi8(*v0, *v1); }\nvoid MmSubEpi16(__m128i* r, __m128i* 
v0, __m128i* v1) { *r = _mm_sub_epi16(*v0, *v1); }\nvoid MmSubEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sub_epi32(*v0, *v1); }\nvoid MmSubEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sub_epi64(*v0, *v1); }\nvoid MmSubsEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_subs_epi8(*v0, *v1); }\nvoid MmSubsEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_subs_epi16(*v0, *v1); }\nvoid MmSubsEpu8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_subs_epu8(*v0, *v1); }\nvoid MmSubsEpu16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_subs_epu16(*v0, *v1); }\nvoid MmAndSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_and_si128(*v0, *v1); }\nvoid MmAndnotSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_andnot_si128(*v0, *v1); }\nvoid MmOrSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_or_si128(*v0, *v1); }\nvoid MmXorSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_xor_si128(*v0, *v1); }\nvoid MmSlliEpi16(__m128i* r, __m128i* v0, int* v1) { *r = _mm_slli_epi16(*v0, *v1); }\nvoid MmSllEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sll_epi16(*v0, *v1); }\nvoid MmSlliEpi32(__m128i* r, __m128i* v0, int* v1) { *r = _mm_slli_epi32(*v0, *v1); }\nvoid MmSllEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sll_epi32(*v0, *v1); }\nvoid MmSlliEpi64(__m128i* r, __m128i* v0, int* v1) { *r = _mm_slli_epi64(*v0, *v1); }\nvoid MmSllEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sll_epi64(*v0, *v1); }\nvoid MmSraiEpi16(__m128i* r, __m128i* v0, int* v1) { *r = _mm_srai_epi16(*v0, *v1); }\nvoid MmSraEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sra_epi16(*v0, *v1); }\nvoid MmSraiEpi32(__m128i* r, __m128i* v0, int* v1) { *r = _mm_srai_epi32(*v0, *v1); }\nvoid MmSraEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sra_epi32(*v0, *v1); }\nvoid MmSrliEpi16(__m128i* r, __m128i* v0, int* v1) { *r = _mm_srli_epi16(*v0, *v1); }\nvoid MmSrlEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_srl_epi16(*v0, 
*v1); }\nvoid MmSrliEpi32(__m128i* r, __m128i* v0, int* v1) { *r = _mm_srli_epi32(*v0, *v1); }\nvoid MmSrlEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_srl_epi32(*v0, *v1); }\nvoid MmSrliEpi64(__m128i* r, __m128i* v0, int* v1) { *r = _mm_srli_epi64(*v0, *v1); }\nvoid MmSrlEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_srl_epi64(*v0, *v1); }\nvoid MmCmpeqEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmpeq_epi8(*v0, *v1); }\nvoid MmCmpeqEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmpeq_epi16(*v0, *v1); }\nvoid MmCmpeqEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmpeq_epi32(*v0, *v1); }\nvoid MmCmpgtEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmpgt_epi8(*v0, *v1); }\nvoid MmCmpgtEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmpgt_epi16(*v0, *v1); }\nvoid MmCmpgtEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmpgt_epi32(*v0, *v1); }\nvoid MmCmpltEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmplt_epi8(*v0, *v1); }\nvoid MmCmpltEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmplt_epi16(*v0, *v1); }\nvoid MmCmpltEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmplt_epi32(*v0, *v1); }\nvoid MmCvtsi64Sd(__m128d* r, __m128d* v0, long long* v1) { *r = _mm_cvtsi64_sd(*v0, *v1); }\nvoid MmCvtsdSi64(long long* r, __m128d* v0) { *r = _mm_cvtsd_si64(*v0); }\nvoid MmCvttsdSi64(long long* r, __m128d* v0) { *r = _mm_cvttsd_si64(*v0); }\nvoid MmCvtepi32Ps(__m128* r, __m128i* v0) { *r = _mm_cvtepi32_ps(*v0); }\nvoid MmCvtpsEpi32(__m128i* r, __m128* v0) { *r = _mm_cvtps_epi32(*v0); }\nvoid MmCvttpsEpi32(__m128i* r, __m128* v0) { *r = _mm_cvttps_epi32(*v0); }\nvoid MmCvtsi32Si128(__m128i* r, int* v0) { *r = _mm_cvtsi32_si128(*v0); }\nvoid MmCvtsi64Si128(__m128i* r, long long* v0) { *r = _mm_cvtsi64_si128(*v0); }\nvoid MmCvtsi128Si32(int* r, __m128i* v0) { *r = _mm_cvtsi128_si32(*v0); }\nvoid MmCvtsi128Si64(long long* r, __m128i* v0) { *r = _mm_cvtsi128_si64(*v0); }\nvoid 
MmUndefinedSi128(__m128i* r) { *r = _mm_undefined_si128(); }\nvoid MmSetEpi64X(__m128i* r, long long* v0, long long* v1) { *r = _mm_set_epi64x(*v0, *v1); }\nvoid MmSetEpi64(__m128i* r, __m64* v0, __m64* v1) { *r = _mm_set_epi64(*v0, *v1); }\nvoid MmSetEpi32(__m128i* r, int* v0, int* v1, int* v2, int* v3) { *r = _mm_set_epi32(*v0, *v1, *v2, *v3); }\nvoid MmSetEpi16(__m128i* r, short* v0, short* v1, short* v2, short* v3, short* v4, short* v5, short* v6, short* v7) { *r = _mm_set_epi16(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); }\nvoid MmSetEpi8(__m128i* r, char* v0, char* v1, char* v2, char* v3, char* v4, char* v5, char* v6, char* v7, char* v8, char* v9, char* v10, char* v11, char* v12, char* v13, char* v14, char* v15) { *r = _mm_set_epi8(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7, *v8, *v9, *v10, *v11, *v12, *v13, *v14, *v15); }\nvoid MmSet1Epi64X(__m128i* r, long long* v0) { *r = _mm_set1_epi64x(*v0); }\nvoid MmSet1Epi64(__m128i* r, __m64* v0) { *r = _mm_set1_epi64(*v0); }\nvoid MmSet1Epi32(__m128i* r, int* v0) { *r = _mm_set1_epi32(*v0); }\nvoid MmSet1Epi16(__m128i* r, short* v0) { *r = _mm_set1_epi16(*v0); }\nvoid MmSet1Epi8(__m128i* r, char* v0) { *r = _mm_set1_epi8(*v0); }\nvoid MmSetrEpi64(__m128i* r, __m64* v0, __m64* v1) { *r = _mm_setr_epi64(*v0, *v1); }\nvoid MmSetrEpi32(__m128i* r, int* v0, int* v1, int* v2, int* v3) { *r = _mm_setr_epi32(*v0, *v1, *v2, *v3); }\nvoid MmSetrEpi16(__m128i* r, short* v0, short* v1, short* v2, short* v3, short* v4, short* v5, short* v6, short* v7) { *r = _mm_setr_epi16(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); }\nvoid MmSetrEpi8(__m128i* r, char* v0, char* v1, char* v2, char* v3, char* v4, char* v5, char* v6, char* v7, char* v8, char* v9, char* v10, char* v11, char* v12, char* v13, char* v14, char* v15) { *r = _mm_setr_epi8(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7, *v8, *v9, *v10, *v11, *v12, *v13, *v14, *v15); }\nvoid MmSetzeroSi128(__m128i* r) { *r = _mm_setzero_si128(); }\nvoid MmPacksEpi16(__m128i* r, __m128i* v0, __m128i* v1) { 
*r = _mm_packs_epi16(*v0, *v1); }\nvoid MmPacksEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_packs_epi32(*v0, *v1); }\nvoid MmPackusEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_packus_epi16(*v0, *v1); }\nvoid MmMovemaskEpi8(int* r, __m128i* v0) { *r = _mm_movemask_epi8(*v0); }\nvoid MmUnpackhiEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpackhi_epi8(*v0, *v1); }\nvoid MmUnpackhiEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpackhi_epi16(*v0, *v1); }\nvoid MmUnpackhiEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpackhi_epi32(*v0, *v1); }\nvoid MmUnpackhiEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpackhi_epi64(*v0, *v1); }\nvoid MmUnpackloEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpacklo_epi8(*v0, *v1); }\nvoid MmUnpackloEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpacklo_epi16(*v0, *v1); }\nvoid MmUnpackloEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpacklo_epi32(*v0, *v1); }\nvoid MmUnpackloEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpacklo_epi64(*v0, *v1); }\nvoid MmMovepi64Pi64(__m64* r, __m128i* v0) { *r = _mm_movepi64_pi64(*v0); }\nvoid MmMovpi64Epi64(__m128i* r, __m64* v0) { *r = _mm_movpi64_epi64(*v0); }\nvoid MmMoveEpi64(__m128i* r, __m128i* v0) { *r = _mm_move_epi64(*v0); }\nvoid MmUnpackhiPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_unpackhi_pd(*v0, *v1); }\nvoid MmUnpackloPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_unpacklo_pd(*v0, *v1); }\nvoid MmMovemaskPd(int* r, __m128d* v0) { *r = _mm_movemask_pd(*v0); }\nvoid MmCastpdPs(__m128* r, __m128d* v0) { *r = _mm_castpd_ps(*v0); }\nvoid MmCastpdSi128(__m128i* r, __m128d* v0) { *r = _mm_castpd_si128(*v0); }\nvoid MmCastpsPd(__m128d* r, __m128* v0) { *r = _mm_castps_pd(*v0); }\nvoid MmCastpsSi128(__m128i* r, __m128* v0) { *r = _mm_castps_si128(*v0); }\nvoid MmCastsi128Ps(__m128* r, __m128i* v0) { *r = _mm_castsi128_ps(*v0); }\nvoid MmCastsi128Pd(__m128d* r, __m128i* v0) { *r = 
_mm_castsi128_pd(*v0); }\n"
  },
  {
    "path": "x86/sse2/functions.go",
    "content": "package sse2\n\nimport (\n\t\"github.com/alivanz/go-simd/x86\"\n)\n\n/*\n#cgo CFLAGS: -msse2\n#include <immintrin.h>\n*/\nimport \"C\"\n\n// Add the lower double-precision (64-bit) floating-point element in \"a\" and \"b\", store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmAddSd MmAddSd\n//go:noescape\nfunc MmAddSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Add packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmAddPd MmAddPd\n//go:noescape\nfunc MmAddPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Subtract the lower double-precision (64-bit) floating-point element in \"b\" from the lower double-precision (64-bit) floating-point element in \"a\", store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmSubSd MmSubSd\n//go:noescape\nfunc MmSubSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Subtract packed double-precision (64-bit) floating-point elements in \"b\" from packed double-precision (64-bit) floating-point elements in \"a\", and store the results in \"dst\".\n//\n//go:linkname MmSubPd MmSubPd\n//go:noescape\nfunc MmSubPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Multiply the lower double-precision (64-bit) floating-point element in \"a\" and \"b\", store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmMulSd MmMulSd\n//go:noescape\nfunc MmMulSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Multiply packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmMulPd MmMulPd\n//go:noescape\nfunc MmMulPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Divide the lower double-precision (64-bit) floating-point 
element in \"a\" by the lower double-precision (64-bit) floating-point element in \"b\", store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmDivSd MmDivSd\n//go:noescape\nfunc MmDivSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Divide packed double-precision (64-bit) floating-point elements in \"a\" by packed elements in \"b\", and store the results in \"dst\".\n//\n//go:linkname MmDivPd MmDivPd\n//go:noescape\nfunc MmDivPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compute the square root of the lower double-precision (64-bit) floating-point element in \"b\", store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmSqrtSd MmSqrtSd\n//go:noescape\nfunc MmSqrtSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compute the square root of packed double-precision (64-bit) floating-point elements in \"a\", and store the results in \"dst\".\n//\n//go:linkname MmSqrtPd MmSqrtPd\n//go:noescape\nfunc MmSqrtPd(r *x86.M128D, v0 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point elements in \"a\" and \"b\", store the minimum value in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\". If either value is NaN, or both values are zero (of either sign), the value from \"b\" is returned.\n//\n//go:linkname MmMinSd MmMinSd\n//go:noescape\nfunc MmMinSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", and store packed minimum values in \"dst\". If either value is NaN, or both values are zero (of either sign), the value from \"b\" is returned.\n//\n//go:linkname MmMinPd MmMinPd\n//go:noescape\nfunc MmMinPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point elements in \"a\" and \"b\", store the maximum value in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\". 
If either value is NaN, or both values are zero (of either sign), the value from \"b\" is returned.\n//\n//go:linkname MmMaxSd MmMaxSd\n//go:noescape\nfunc MmMaxSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", and store packed maximum values in \"dst\". If either value is NaN, or both values are zero (of either sign), the value from \"b\" is returned.\n//\n//go:linkname MmMaxPd MmMaxPd\n//go:noescape\nfunc MmMaxPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmAndPd MmAndPd\n//go:noescape\nfunc MmAndPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in \"a\" and then AND with \"b\", and store the results in \"dst\".\n//\n//go:linkname MmAndnotPd MmAndnotPd\n//go:noescape\nfunc MmAndnotPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmOrPd MmOrPd\n//go:noescape\nfunc MmOrPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmXorPd MmXorPd\n//go:noescape\nfunc MmXorPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare packed double-precision (64-bit) floating-point elements in \"a\" and \"b\" for equality, and store the results in \"dst\".\n//\n//go:linkname MmCmpeqPd MmCmpeqPd\n//go:noescape\nfunc MmCmpeqPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare packed double-precision (64-bit) floating-point elements in \"a\" and \"b\" for less-than, and store the results in \"dst\".\n//\n//go:linkname MmCmpltPd MmCmpltPd\n//go:noescape\nfunc MmCmpltPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare packed double-precision (64-bit) floating-point elements in \"a\" 
and \"b\" for less-than-or-equal, and store the results in \"dst\".\n//\n//go:linkname MmCmplePd MmCmplePd\n//go:noescape\nfunc MmCmplePd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare packed double-precision (64-bit) floating-point elements in \"a\" and \"b\" for greater-than, and store the results in \"dst\".\n//\n//go:linkname MmCmpgtPd MmCmpgtPd\n//go:noescape\nfunc MmCmpgtPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare packed double-precision (64-bit) floating-point elements in \"a\" and \"b\" for greater-than-or-equal, and store the results in \"dst\".\n//\n//go:linkname MmCmpgePd MmCmpgePd\n//go:noescape\nfunc MmCmpgePd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare packed double-precision (64-bit) floating-point elements in \"a\" and \"b\" to see if neither is NaN, and store the results in \"dst\".\n//\n//go:linkname MmCmpordPd MmCmpordPd\n//go:noescape\nfunc MmCmpordPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare packed double-precision (64-bit) floating-point elements in \"a\" and \"b\" to see if either is NaN, and store the results in \"dst\".\n//\n//go:linkname MmCmpunordPd MmCmpunordPd\n//go:noescape\nfunc MmCmpunordPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare packed double-precision (64-bit) floating-point elements in \"a\" and \"b\" for not-equal, and store the results in \"dst\".\n//\n//go:linkname MmCmpneqPd MmCmpneqPd\n//go:noescape\nfunc MmCmpneqPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare packed double-precision (64-bit) floating-point elements in \"a\" and \"b\" for not-less-than, and store the results in \"dst\".\n//\n//go:linkname MmCmpnltPd MmCmpnltPd\n//go:noescape\nfunc MmCmpnltPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare packed double-precision (64-bit) floating-point elements in \"a\" and \"b\" for not-less-than-or-equal, and store the results in \"dst\".\n//\n//go:linkname MmCmpnlePd MmCmpnlePd\n//go:noescape\nfunc MmCmpnlePd(r *x86.M128D, v0 
*x86.M128D, v1 *x86.M128D)\n\n// Compare packed double-precision (64-bit) floating-point elements in \"a\" and \"b\" for not-greater-than, and store the results in \"dst\".\n//\n//go:linkname MmCmpngtPd MmCmpngtPd\n//go:noescape\nfunc MmCmpngtPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare packed double-precision (64-bit) floating-point elements in \"a\" and \"b\" for not-greater-than-or-equal, and store the results in \"dst\".\n//\n//go:linkname MmCmpngePd MmCmpngePd\n//go:noescape\nfunc MmCmpngePd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point elements in \"a\" and \"b\" for equality, store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmCmpeqSd MmCmpeqSd\n//go:noescape\nfunc MmCmpeqSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point elements in \"a\" and \"b\" for less-than, store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmCmpltSd MmCmpltSd\n//go:noescape\nfunc MmCmpltSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point elements in \"a\" and \"b\" for less-than-or-equal, store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmCmpleSd MmCmpleSd\n//go:noescape\nfunc MmCmpleSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point elements in \"a\" and \"b\" for greater-than, store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmCmpgtSd MmCmpgtSd\n//go:noescape\nfunc MmCmpgtSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) 
floating-point elements in \"a\" and \"b\" for greater-than-or-equal, store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmCmpgeSd MmCmpgeSd\n//go:noescape\nfunc MmCmpgeSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point elements in \"a\" and \"b\" to see if neither is NaN, store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmCmpordSd MmCmpordSd\n//go:noescape\nfunc MmCmpordSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point elements in \"a\" and \"b\" to see if either is NaN, store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmCmpunordSd MmCmpunordSd\n//go:noescape\nfunc MmCmpunordSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point elements in \"a\" and \"b\" for not-equal, store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmCmpneqSd MmCmpneqSd\n//go:noescape\nfunc MmCmpneqSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point elements in \"a\" and \"b\" for not-less-than, store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmCmpnltSd MmCmpnltSd\n//go:noescape\nfunc MmCmpnltSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point elements in \"a\" and \"b\" for not-less-than-or-equal, store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmCmpnleSd 
MmCmpnleSd\n//go:noescape\nfunc MmCmpnleSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point elements in \"a\" and \"b\" for not-greater-than, store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmCmpngtSd MmCmpngtSd\n//go:noescape\nfunc MmCmpngtSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point elements in \"a\" and \"b\" for not-greater-than-or-equal, store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmCmpngeSd MmCmpngeSd\n//go:noescape\nfunc MmCmpngeSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point element in \"a\" and \"b\" for equality, and return the boolean result (0 or 1).\n//\n//go:linkname MmComieqSd MmComieqSd\n//go:noescape\nfunc MmComieqSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point element in \"a\" and \"b\" for less-than, and return the boolean result (0 or 1).\n//\n//go:linkname MmComiltSd MmComiltSd\n//go:noescape\nfunc MmComiltSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point element in \"a\" and \"b\" for less-than-or-equal, and return the boolean result (0 or 1).\n//\n//go:linkname MmComileSd MmComileSd\n//go:noescape\nfunc MmComileSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point element in \"a\" and \"b\" for greater-than, and return the boolean result (0 or 1).\n//\n//go:linkname MmComigtSd MmComigtSd\n//go:noescape\nfunc MmComigtSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point element in \"a\" and \"b\" for greater-than-or-equal, and return the 
boolean result (0 or 1).\n//\n//go:linkname MmComigeSd MmComigeSd\n//go:noescape\nfunc MmComigeSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point element in \"a\" and \"b\" for not-equal, and return the boolean result (0 or 1).\n//\n//go:linkname MmComineqSd MmComineqSd\n//go:noescape\nfunc MmComineqSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point element in \"a\" and \"b\" for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.\n//\n//go:linkname MmUcomieqSd MmUcomieqSd\n//go:noescape\nfunc MmUcomieqSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point element in \"a\" and \"b\" for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.\n//\n//go:linkname MmUcomiltSd MmUcomiltSd\n//go:noescape\nfunc MmUcomiltSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point element in \"a\" and \"b\" for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.\n//\n//go:linkname MmUcomileSd MmUcomileSd\n//go:noescape\nfunc MmUcomileSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point element in \"a\" and \"b\" for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.\n//\n//go:linkname MmUcomigtSd MmUcomigtSd\n//go:noescape\nfunc MmUcomigtSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point element in \"a\" and \"b\" for greater-than-or-equal, and return the boolean result (0 or 1). 
This instruction will not signal an exception for QNaNs.\n//\n//go:linkname MmUcomigeSd MmUcomigeSd\n//go:noescape\nfunc MmUcomigeSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)\n\n// Compare the lower double-precision (64-bit) floating-point element in \"a\" and \"b\" for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs.\n//\n//go:linkname MmUcomineqSd MmUcomineqSd\n//go:noescape\nfunc MmUcomineqSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D)\n\n// Convert packed double-precision (64-bit) floating-point elements in \"a\" to packed single-precision (32-bit) floating-point elements, and store the results in \"dst\".\n//\n//go:linkname MmCvtpdPs MmCvtpdPs\n//go:noescape\nfunc MmCvtpdPs(r *x86.M128, v0 *x86.M128D)\n\n// Convert packed single-precision (32-bit) floating-point elements in \"a\" to packed double-precision (64-bit) floating-point elements, and store the results in \"dst\".\n//\n//go:linkname MmCvtpsPd MmCvtpsPd\n//go:noescape\nfunc MmCvtpsPd(r *x86.M128D, v0 *x86.M128)\n\n// Convert packed signed 32-bit integers in \"a\" to packed double-precision (64-bit) floating-point elements, and store the results in \"dst\".\n//\n//go:linkname MmCvtepi32Pd MmCvtepi32Pd\n//go:noescape\nfunc MmCvtepi32Pd(r *x86.M128D, v0 *x86.M128I)\n\n// Convert packed double-precision (64-bit) floating-point elements in \"a\" to packed 32-bit integers, and store the results in \"dst\".\n//\n//go:linkname MmCvtpdEpi32 MmCvtpdEpi32\n//go:noescape\nfunc MmCvtpdEpi32(r *x86.M128I, v0 *x86.M128D)\n\n// Convert the lower double-precision (64-bit) floating-point element in \"a\" to a 32-bit integer, and store the result in \"dst\".\n//\n//go:linkname MmCvtsdSi32 MmCvtsdSi32\n//go:noescape\nfunc MmCvtsdSi32(r *x86.Int, v0 *x86.M128D)\n\n// Convert the lower double-precision (64-bit) floating-point element in \"b\" to a single-precision (32-bit) floating-point element, store the result in the lower element of \"dst\", and copy the upper 3 
packed elements from \"a\" to the upper elements of \"dst\".\n//\n//go:linkname MmCvtsdSs MmCvtsdSs\n//go:noescape\nfunc MmCvtsdSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128D)\n\n// Convert the signed 32-bit integer \"b\" to a double-precision (64-bit) floating-point element, store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmCvtsi32Sd MmCvtsi32Sd\n//go:noescape\nfunc MmCvtsi32Sd(r *x86.M128D, v0 *x86.M128D, v1 *x86.Int)\n\n// Convert the lower single-precision (32-bit) floating-point element in \"b\" to a double-precision (64-bit) floating-point element, store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmCvtssSd MmCvtssSd\n//go:noescape\nfunc MmCvtssSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128)\n\n// Convert packed double-precision (64-bit) floating-point elements in \"a\" to packed 32-bit integers with truncation, and store the results in \"dst\".\n//\n//go:linkname MmCvttpdEpi32 MmCvttpdEpi32\n//go:noescape\nfunc MmCvttpdEpi32(r *x86.M128I, v0 *x86.M128D)\n\n// Convert the lower double-precision (64-bit) floating-point element in \"a\" to a 32-bit integer with truncation, and store the result in \"dst\".\n//\n//go:linkname MmCvttsdSi32 MmCvttsdSi32\n//go:noescape\nfunc MmCvttsdSi32(r *x86.Int, v0 *x86.M128D)\n\n// Copy the lower double-precision (64-bit) floating-point element of \"a\" to \"dst\".\n//\n//go:linkname MmCvtsdF64 MmCvtsdF64\n//go:noescape\nfunc MmCvtsdF64(r *x86.Double, v0 *x86.M128D)\n\n// Return vector of type __m128d with undefined elements.\n//\n//go:linkname MmUndefinedPd MmUndefinedPd\n//go:noescape\nfunc MmUndefinedPd(r *x86.M128D)\n\n// Copy double-precision (64-bit) floating-point element \"a\" to the lower element of \"dst\", and zero the upper element.\n//\n//go:linkname MmSetSd MmSetSd\n//go:noescape\nfunc MmSetSd(r *x86.M128D, v0 *x86.Double)\n\n// Broadcast 
double-precision (64-bit) floating-point value \"a\" to all elements of \"dst\".\n//\n//go:linkname MmSet1Pd MmSet1Pd\n//go:noescape\nfunc MmSet1Pd(r *x86.M128D, v0 *x86.Double)\n\n// Broadcast double-precision (64-bit) floating-point value \"a\" to all elements of \"dst\".\n//\n//go:linkname MmSetPd1 MmSetPd1\n//go:noescape\nfunc MmSetPd1(r *x86.M128D, v0 *x86.Double)\n\n// Set packed double-precision (64-bit) floating-point elements in \"dst\" with the supplied values.\n//\n//go:linkname MmSetPd MmSetPd\n//go:noescape\nfunc MmSetPd(r *x86.M128D, v0 *x86.Double, v1 *x86.Double)\n\n// Set packed double-precision (64-bit) floating-point elements in \"dst\" with the supplied values in reverse order.\n//\n//go:linkname MmSetrPd MmSetrPd\n//go:noescape\nfunc MmSetrPd(r *x86.M128D, v0 *x86.Double, v1 *x86.Double)\n\n// Return vector of type __m128d with all elements set to zero.\n//\n//go:linkname MmSetzeroPd MmSetzeroPd\n//go:noescape\nfunc MmSetzeroPd(r *x86.M128D)\n\n// Move the lower double-precision (64-bit) floating-point element from \"b\" to the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmMoveSd MmMoveSd\n//go:noescape\nfunc MmMoveSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Add packed 8-bit integers in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmAddEpi8 MmAddEpi8\n//go:noescape\nfunc MmAddEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Add packed 16-bit integers in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmAddEpi16 MmAddEpi16\n//go:noescape\nfunc MmAddEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Add packed 32-bit integers in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmAddEpi32 MmAddEpi32\n//go:noescape\nfunc MmAddEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Add packed 64-bit integers in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmAddEpi64 
MmAddEpi64\n//go:noescape\nfunc MmAddEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Add packed signed 8-bit integers in \"a\" and \"b\" using saturation, and store the results in \"dst\".\n//\n//go:linkname MmAddsEpi8 MmAddsEpi8\n//go:noescape\nfunc MmAddsEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Add packed signed 16-bit integers in \"a\" and \"b\" using saturation, and store the results in \"dst\".\n//\n//go:linkname MmAddsEpi16 MmAddsEpi16\n//go:noescape\nfunc MmAddsEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Add packed unsigned 8-bit integers in \"a\" and \"b\" using saturation, and store the results in \"dst\".\n//\n//go:linkname MmAddsEpu8 MmAddsEpu8\n//go:noescape\nfunc MmAddsEpu8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Add packed unsigned 16-bit integers in \"a\" and \"b\" using saturation, and store the results in \"dst\".\n//\n//go:linkname MmAddsEpu16 MmAddsEpu16\n//go:noescape\nfunc MmAddsEpu16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Average packed unsigned 8-bit integers in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmAvgEpu8 MmAvgEpu8\n//go:noescape\nfunc MmAvgEpu8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Average packed unsigned 16-bit integers in \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmAvgEpu16 MmAvgEpu16\n//go:noescape\nfunc MmAvgEpu16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Multiply packed signed 16-bit integers in \"a\" and \"b\", producing intermediate signed 32-bit integers. 
Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in \"dst\".\n//\n//go:linkname MmMaddEpi16 MmMaddEpi16\n//go:noescape\nfunc MmMaddEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Compare packed signed 16-bit integers in \"a\" and \"b\", and store packed maximum values in \"dst\".\n//\n//go:linkname MmMaxEpi16 MmMaxEpi16\n//go:noescape\nfunc MmMaxEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Compare packed unsigned 8-bit integers in \"a\" and \"b\", and store packed maximum values in \"dst\".\n//\n//go:linkname MmMaxEpu8 MmMaxEpu8\n//go:noescape\nfunc MmMaxEpu8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Compare packed signed 16-bit integers in \"a\" and \"b\", and store packed minimum values in \"dst\".\n//\n//go:linkname MmMinEpi16 MmMinEpi16\n//go:noescape\nfunc MmMinEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Compare packed unsigned 8-bit integers in \"a\" and \"b\", and store packed minimum values in \"dst\".\n//\n//go:linkname MmMinEpu8 MmMinEpu8\n//go:noescape\nfunc MmMinEpu8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Multiply the packed signed 16-bit integers in \"a\" and \"b\", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in \"dst\".\n//\n//go:linkname MmMulhiEpi16 MmMulhiEpi16\n//go:noescape\nfunc MmMulhiEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Multiply the packed unsigned 16-bit integers in \"a\" and \"b\", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in \"dst\".\n//\n//go:linkname MmMulhiEpu16 MmMulhiEpu16\n//go:noescape\nfunc MmMulhiEpu16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Multiply the packed 16-bit integers in \"a\" and \"b\", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in \"dst\".\n//\n//go:linkname MmMulloEpi16 MmMulloEpi16\n//go:noescape\nfunc MmMulloEpi16(r *x86.M128I, v0 *x86.M128I, v1 
*x86.M128I)\n\n// Multiply the low unsigned 32-bit integers from each packed 64-bit element in \"a\" and \"b\", and store the unsigned 64-bit results in \"dst\".\n//\n//go:linkname MmMulEpu32 MmMulEpu32\n//go:noescape\nfunc MmMulEpu32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Compute the absolute differences of packed unsigned 8-bit integers in \"a\" and \"b\", then horizontally sum each consecutive 8 differences to produce two unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in \"dst\".\n//\n//go:linkname MmSadEpu8 MmSadEpu8\n//go:noescape\nfunc MmSadEpu8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Subtract packed 8-bit integers in \"b\" from packed 8-bit integers in \"a\", and store the results in \"dst\".\n//\n//go:linkname MmSubEpi8 MmSubEpi8\n//go:noescape\nfunc MmSubEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Subtract packed 16-bit integers in \"b\" from packed 16-bit integers in \"a\", and store the results in \"dst\".\n//\n//go:linkname MmSubEpi16 MmSubEpi16\n//go:noescape\nfunc MmSubEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Subtract packed 32-bit integers in \"b\" from packed 32-bit integers in \"a\", and store the results in \"dst\".\n//\n//go:linkname MmSubEpi32 MmSubEpi32\n//go:noescape\nfunc MmSubEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Subtract packed 64-bit integers in \"b\" from packed 64-bit integers in \"a\", and store the results in \"dst\".\n//\n//go:linkname MmSubEpi64 MmSubEpi64\n//go:noescape\nfunc MmSubEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Subtract packed signed 8-bit integers in \"b\" from packed signed 8-bit integers in \"a\" using saturation, and store the results in \"dst\".\n//\n//go:linkname MmSubsEpi8 MmSubsEpi8\n//go:noescape\nfunc MmSubsEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Subtract packed signed 16-bit integers in \"b\" from packed signed 16-bit integers in \"a\" using saturation, and store the results 
in \"dst\".\n//\n//go:linkname MmSubsEpi16 MmSubsEpi16\n//go:noescape\nfunc MmSubsEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Subtract packed unsigned 8-bit integers in \"b\" from packed unsigned 8-bit integers in \"a\" using saturation, and store the results in \"dst\".\n//\n//go:linkname MmSubsEpu8 MmSubsEpu8\n//go:noescape\nfunc MmSubsEpu8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Subtract packed unsigned 16-bit integers in \"b\" from packed unsigned 16-bit integers in \"a\" using saturation, and store the results in \"dst\".\n//\n//go:linkname MmSubsEpu16 MmSubsEpu16\n//go:noescape\nfunc MmSubsEpu16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Compute the bitwise AND of 128 bits (representing integer data) in \"a\" and \"b\", and store the result in \"dst\".\n//\n//go:linkname MmAndSi128 MmAndSi128\n//go:noescape\nfunc MmAndSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Compute the bitwise NOT of 128 bits (representing integer data) in \"a\" and then AND with \"b\", and store the result in \"dst\".\n//\n//go:linkname MmAndnotSi128 MmAndnotSi128\n//go:noescape\nfunc MmAndnotSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Compute the bitwise OR of 128 bits (representing integer data) in \"a\" and \"b\", and store the result in \"dst\".\n//\n//go:linkname MmOrSi128 MmOrSi128\n//go:noescape\nfunc MmOrSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Compute the bitwise XOR of 128 bits (representing integer data) in \"a\" and \"b\", and store the result in \"dst\".\n//\n//go:linkname MmXorSi128 MmXorSi128\n//go:noescape\nfunc MmXorSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Shift packed 16-bit integers in \"a\" left by \"imm8\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSlliEpi16 MmSlliEpi16\n//go:noescape\nfunc MmSlliEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int)\n\n// Shift packed 16-bit integers in \"a\" left by \"count\" while shifting in zeros, and store the 
results in \"dst\".\n//\n//go:linkname MmSllEpi16 MmSllEpi16\n//go:noescape\nfunc MmSllEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Shift packed 32-bit integers in \"a\" left by \"imm8\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSlliEpi32 MmSlliEpi32\n//go:noescape\nfunc MmSlliEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int)\n\n// Shift packed 32-bit integers in \"a\" left by \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSllEpi32 MmSllEpi32\n//go:noescape\nfunc MmSllEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Shift packed 64-bit integers in \"a\" left by \"imm8\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSlliEpi64 MmSlliEpi64\n//go:noescape\nfunc MmSlliEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int)\n\n// Shift packed 64-bit integers in \"a\" left by \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSllEpi64 MmSllEpi64\n//go:noescape\nfunc MmSllEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Shift packed 16-bit integers in \"a\" right by \"imm8\" while shifting in sign bits, and store the results in \"dst\".\n//\n//go:linkname MmSraiEpi16 MmSraiEpi16\n//go:noescape\nfunc MmSraiEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int)\n\n// Shift packed 16-bit integers in \"a\" right by \"count\" while shifting in sign bits, and store the results in \"dst\".\n//\n//go:linkname MmSraEpi16 MmSraEpi16\n//go:noescape\nfunc MmSraEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Shift packed 32-bit integers in \"a\" right by \"imm8\" while shifting in sign bits, and store the results in \"dst\".\n//\n//go:linkname MmSraiEpi32 MmSraiEpi32\n//go:noescape\nfunc MmSraiEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int)\n\n// Shift packed 32-bit integers in \"a\" right by \"count\" while shifting in sign bits, and store the results in \"dst\".\n//\n//go:linkname MmSraEpi32 
MmSraEpi32\n//go:noescape\nfunc MmSraEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Shift packed 16-bit integers in \"a\" right by \"imm8\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSrliEpi16 MmSrliEpi16\n//go:noescape\nfunc MmSrliEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int)\n\n// Shift packed 16-bit integers in \"a\" right by \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSrlEpi16 MmSrlEpi16\n//go:noescape\nfunc MmSrlEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Shift packed 32-bit integers in \"a\" right by \"imm8\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSrliEpi32 MmSrliEpi32\n//go:noescape\nfunc MmSrliEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int)\n\n// Shift packed 32-bit integers in \"a\" right by \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSrlEpi32 MmSrlEpi32\n//go:noescape\nfunc MmSrlEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Shift packed 64-bit integers in \"a\" right by \"imm8\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSrliEpi64 MmSrliEpi64\n//go:noescape\nfunc MmSrliEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int)\n\n// Shift packed 64-bit integers in \"a\" right by \"count\" while shifting in zeros, and store the results in \"dst\".\n//\n//go:linkname MmSrlEpi64 MmSrlEpi64\n//go:noescape\nfunc MmSrlEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Compare packed 8-bit integers in \"a\" and \"b\" for equality, and store the results in \"dst\".\n//\n//go:linkname MmCmpeqEpi8 MmCmpeqEpi8\n//go:noescape\nfunc MmCmpeqEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Compare packed 16-bit integers in \"a\" and \"b\" for equality, and store the results in \"dst\".\n//\n//go:linkname MmCmpeqEpi16 MmCmpeqEpi16\n//go:noescape\nfunc MmCmpeqEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Compare packed 32-bit 
integers in \"a\" and \"b\" for equality, and store the results in \"dst\".\n//\n//go:linkname MmCmpeqEpi32 MmCmpeqEpi32\n//go:noescape\nfunc MmCmpeqEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Compare packed signed 8-bit integers in \"a\" and \"b\" for greater-than, and store the results in \"dst\".\n//\n//go:linkname MmCmpgtEpi8 MmCmpgtEpi8\n//go:noescape\nfunc MmCmpgtEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Compare packed signed 16-bit integers in \"a\" and \"b\" for greater-than, and store the results in \"dst\".\n//\n//go:linkname MmCmpgtEpi16 MmCmpgtEpi16\n//go:noescape\nfunc MmCmpgtEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Compare packed signed 32-bit integers in \"a\" and \"b\" for greater-than, and store the results in \"dst\".\n//\n//go:linkname MmCmpgtEpi32 MmCmpgtEpi32\n//go:noescape\nfunc MmCmpgtEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Compare packed signed 8-bit integers in \"a\" and \"b\" for less-than, and store the results in \"dst\". Note: This intrinsic emits the pcmpgtb instruction with the order of the operands switched.\n//\n//go:linkname MmCmpltEpi8 MmCmpltEpi8\n//go:noescape\nfunc MmCmpltEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Compare packed signed 16-bit integers in \"a\" and \"b\" for less-than, and store the results in \"dst\". Note: This intrinsic emits the pcmpgtw instruction with the order of the operands switched.\n//\n//go:linkname MmCmpltEpi16 MmCmpltEpi16\n//go:noescape\nfunc MmCmpltEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Compare packed signed 32-bit integers in \"a\" and \"b\" for less-than, and store the results in \"dst\". 
Note: This intrinsic emits the pcmpgtd instruction with the order of the operands switched.\n//\n//go:linkname MmCmpltEpi32 MmCmpltEpi32\n//go:noescape\nfunc MmCmpltEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Convert the signed 64-bit integer \"b\" to a double-precision (64-bit) floating-point element, store the result in the lower element of \"dst\", and copy the upper element from \"a\" to the upper element of \"dst\".\n//\n//go:linkname MmCvtsi64Sd MmCvtsi64Sd\n//go:noescape\nfunc MmCvtsi64Sd(r *x86.M128D, v0 *x86.M128D, v1 *x86.Longlong)\n\n// Convert the lower double-precision (64-bit) floating-point element in \"a\" to a 64-bit integer, and store the result in \"dst\".\n//\n//go:linkname MmCvtsdSi64 MmCvtsdSi64\n//go:noescape\nfunc MmCvtsdSi64(r *x86.Longlong, v0 *x86.M128D)\n\n// Convert the lower double-precision (64-bit) floating-point element in \"a\" to a 64-bit integer with truncation, and store the result in \"dst\".\n//\n//go:linkname MmCvttsdSi64 MmCvttsdSi64\n//go:noescape\nfunc MmCvttsdSi64(r *x86.Longlong, v0 *x86.M128D)\n\n// Convert packed signed 32-bit integers in \"a\" to packed single-precision (32-bit) floating-point elements, and store the results in \"dst\".\n//\n//go:linkname MmCvtepi32Ps MmCvtepi32Ps\n//go:noescape\nfunc MmCvtepi32Ps(r *x86.M128, v0 *x86.M128I)\n\n// Convert packed single-precision (32-bit) floating-point elements in \"a\" to packed 32-bit integers, and store the results in \"dst\".\n//\n//go:linkname MmCvtpsEpi32 MmCvtpsEpi32\n//go:noescape\nfunc MmCvtpsEpi32(r *x86.M128I, v0 *x86.M128)\n\n// Convert packed single-precision (32-bit) floating-point elements in \"a\" to packed 32-bit integers with truncation, and store the results in \"dst\".\n//\n//go:linkname MmCvttpsEpi32 MmCvttpsEpi32\n//go:noescape\nfunc MmCvttpsEpi32(r *x86.M128I, v0 *x86.M128)\n\n// Copy 32-bit integer \"a\" to the lower elements of \"dst\", and zero the upper elements of \"dst\".\n//\n//go:linkname MmCvtsi32Si128 
MmCvtsi32Si128\n//go:noescape\nfunc MmCvtsi32Si128(r *x86.M128I, v0 *x86.Int)\n\n// Copy 64-bit integer \"a\" to the lower element of \"dst\", and zero the upper element.\n//\n//go:linkname MmCvtsi64Si128 MmCvtsi64Si128\n//go:noescape\nfunc MmCvtsi64Si128(r *x86.M128I, v0 *x86.Longlong)\n\n// Copy the lower 32-bit integer in \"a\" to \"dst\".\n//\n//go:linkname MmCvtsi128Si32 MmCvtsi128Si32\n//go:noescape\nfunc MmCvtsi128Si32(r *x86.Int, v0 *x86.M128I)\n\n// Copy the lower 64-bit integer in \"a\" to \"dst\".\n//\n//go:linkname MmCvtsi128Si64 MmCvtsi128Si64\n//go:noescape\nfunc MmCvtsi128Si64(r *x86.Longlong, v0 *x86.M128I)\n\n// Return vector of type __m128i with undefined elements.\n//\n//go:linkname MmUndefinedSi128 MmUndefinedSi128\n//go:noescape\nfunc MmUndefinedSi128(r *x86.M128I)\n\n// Set packed 64-bit integers in \"dst\" with the supplied values.\n//\n//go:linkname MmSetEpi64X MmSetEpi64X\n//go:noescape\nfunc MmSetEpi64X(r *x86.M128I, v0 *x86.Longlong, v1 *x86.Longlong)\n\n// Set packed 64-bit integers in \"dst\" with the supplied values.\n//\n//go:linkname MmSetEpi64 MmSetEpi64\n//go:noescape\nfunc MmSetEpi64(r *x86.M128I, v0 *x86.M64, v1 *x86.M64)\n\n// Set packed 32-bit integers in \"dst\" with the supplied values.\n//\n//go:linkname MmSetEpi32 MmSetEpi32\n//go:noescape\nfunc MmSetEpi32(r *x86.M128I, v0 *x86.Int, v1 *x86.Int, v2 *x86.Int, v3 *x86.Int)\n\n// Set packed 16-bit integers in \"dst\" with the supplied values.\n//\n//go:linkname MmSetEpi16 MmSetEpi16\n//go:noescape\nfunc MmSetEpi16(r *x86.M128I, v0 *x86.Short, v1 *x86.Short, v2 *x86.Short, v3 *x86.Short, v4 *x86.Short, v5 *x86.Short, v6 *x86.Short, v7 *x86.Short)\n\n// Set packed 8-bit integers in \"dst\" with the supplied values.\n//\n//go:linkname MmSetEpi8 MmSetEpi8\n//go:noescape\nfunc MmSetEpi8(r *x86.M128I, v0 *x86.Char, v1 *x86.Char, v2 *x86.Char, v3 *x86.Char, v4 *x86.Char, v5 *x86.Char, v6 *x86.Char, v7 *x86.Char, v8 *x86.Char, v9 *x86.Char, v10 *x86.Char, v11 *x86.Char, v12 
*x86.Char, v13 *x86.Char, v14 *x86.Char, v15 *x86.Char)\n\n// Broadcast 64-bit integer \"a\" to all elements of \"dst\". This intrinsic may generate \"vpbroadcastq\".\n//\n//go:linkname MmSet1Epi64X MmSet1Epi64X\n//go:noescape\nfunc MmSet1Epi64X(r *x86.M128I, v0 *x86.Longlong)\n\n// Broadcast 64-bit integer \"a\" to all elements of \"dst\".\n//\n//go:linkname MmSet1Epi64 MmSet1Epi64\n//go:noescape\nfunc MmSet1Epi64(r *x86.M128I, v0 *x86.M64)\n\n// Broadcast 32-bit integer \"a\" to all elements of \"dst\". This intrinsic may generate \"vpbroadcastd\".\n//\n//go:linkname MmSet1Epi32 MmSet1Epi32\n//go:noescape\nfunc MmSet1Epi32(r *x86.M128I, v0 *x86.Int)\n\n// Broadcast 16-bit integer \"a\" to all elements of \"dst\". This intrinsic may generate \"vpbroadcastw\".\n//\n//go:linkname MmSet1Epi16 MmSet1Epi16\n//go:noescape\nfunc MmSet1Epi16(r *x86.M128I, v0 *x86.Short)\n\n// Broadcast 8-bit integer \"a\" to all elements of \"dst\". This intrinsic may generate \"vpbroadcastb\".\n//\n//go:linkname MmSet1Epi8 MmSet1Epi8\n//go:noescape\nfunc MmSet1Epi8(r *x86.M128I, v0 *x86.Char)\n\n// Set packed 64-bit integers in \"dst\" with the supplied values in reverse order.\n//\n//go:linkname MmSetrEpi64 MmSetrEpi64\n//go:noescape\nfunc MmSetrEpi64(r *x86.M128I, v0 *x86.M64, v1 *x86.M64)\n\n// Set packed 32-bit integers in \"dst\" with the supplied values in reverse order.\n//\n//go:linkname MmSetrEpi32 MmSetrEpi32\n//go:noescape\nfunc MmSetrEpi32(r *x86.M128I, v0 *x86.Int, v1 *x86.Int, v2 *x86.Int, v3 *x86.Int)\n\n// Set packed 16-bit integers in \"dst\" with the supplied values in reverse order.\n//\n//go:linkname MmSetrEpi16 MmSetrEpi16\n//go:noescape\nfunc MmSetrEpi16(r *x86.M128I, v0 *x86.Short, v1 *x86.Short, v2 *x86.Short, v3 *x86.Short, v4 *x86.Short, v5 *x86.Short, v6 *x86.Short, v7 *x86.Short)\n\n// Set packed 8-bit integers in \"dst\" with the supplied values in reverse order.\n//\n//go:linkname MmSetrEpi8 MmSetrEpi8\n//go:noescape\nfunc MmSetrEpi8(r *x86.M128I, v0 
*x86.Char, v1 *x86.Char, v2 *x86.Char, v3 *x86.Char, v4 *x86.Char, v5 *x86.Char, v6 *x86.Char, v7 *x86.Char, v8 *x86.Char, v9 *x86.Char, v10 *x86.Char, v11 *x86.Char, v12 *x86.Char, v13 *x86.Char, v14 *x86.Char, v15 *x86.Char)\n\n// Return vector of type __m128i with all elements set to zero.\n//\n//go:linkname MmSetzeroSi128 MmSetzeroSi128\n//go:noescape\nfunc MmSetzeroSi128(r *x86.M128I)\n\n// Convert packed signed 16-bit integers from \"a\" and \"b\" to packed 8-bit integers using signed saturation, and store the results in \"dst\".\n//\n//go:linkname MmPacksEpi16 MmPacksEpi16\n//go:noescape\nfunc MmPacksEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Convert packed signed 32-bit integers from \"a\" and \"b\" to packed 16-bit integers using signed saturation, and store the results in \"dst\".\n//\n//go:linkname MmPacksEpi32 MmPacksEpi32\n//go:noescape\nfunc MmPacksEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Convert packed signed 16-bit integers from \"a\" and \"b\" to packed 8-bit integers using unsigned saturation, and store the results in \"dst\".\n//\n//go:linkname MmPackusEpi16 MmPackusEpi16\n//go:noescape\nfunc MmPackusEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Create mask from the most significant bit of each 8-bit element in \"a\", and store the result in \"dst\".\n//\n//go:linkname MmMovemaskEpi8 MmMovemaskEpi8\n//go:noescape\nfunc MmMovemaskEpi8(r *x86.Int, v0 *x86.M128I)\n\n// Unpack and interleave 8-bit integers from the high half of \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmUnpackhiEpi8 MmUnpackhiEpi8\n//go:noescape\nfunc MmUnpackhiEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Unpack and interleave 16-bit integers from the high half of \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmUnpackhiEpi16 MmUnpackhiEpi16\n//go:noescape\nfunc MmUnpackhiEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Unpack and interleave 32-bit integers from the high half 
of \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmUnpackhiEpi32 MmUnpackhiEpi32\n//go:noescape\nfunc MmUnpackhiEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Unpack and interleave 64-bit integers from the high half of \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmUnpackhiEpi64 MmUnpackhiEpi64\n//go:noescape\nfunc MmUnpackhiEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Unpack and interleave 8-bit integers from the low half of \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmUnpackloEpi8 MmUnpackloEpi8\n//go:noescape\nfunc MmUnpackloEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Unpack and interleave 16-bit integers from the low half of \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmUnpackloEpi16 MmUnpackloEpi16\n//go:noescape\nfunc MmUnpackloEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Unpack and interleave 32-bit integers from the low half of \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmUnpackloEpi32 MmUnpackloEpi32\n//go:noescape\nfunc MmUnpackloEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Unpack and interleave 64-bit integers from the low half of \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmUnpackloEpi64 MmUnpackloEpi64\n//go:noescape\nfunc MmUnpackloEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Copy the lower 64-bit integer in \"a\" to \"dst\".\n//\n//go:linkname MmMovepi64Pi64 MmMovepi64Pi64\n//go:noescape\nfunc MmMovepi64Pi64(r *x86.M64, v0 *x86.M128I)\n\n// Copy the 64-bit integer \"a\" to the lower element of \"dst\", and zero the upper element.\n//\n//go:linkname MmMovpi64Epi64 MmMovpi64Epi64\n//go:noescape\nfunc MmMovpi64Epi64(r *x86.M128I, v0 *x86.M64)\n\n// Copy the lower 64-bit integer in \"a\" to the lower element of \"dst\", and zero the upper element.\n//\n//go:linkname MmMoveEpi64 MmMoveEpi64\n//go:noescape\nfunc MmMoveEpi64(r *x86.M128I, v0 
*x86.M128I)\n\n// Unpack and interleave double-precision (64-bit) floating-point elements from the high half of \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmUnpackhiPd MmUnpackhiPd\n//go:noescape\nfunc MmUnpackhiPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Unpack and interleave double-precision (64-bit) floating-point elements from the low half of \"a\" and \"b\", and store the results in \"dst\".\n//\n//go:linkname MmUnpackloPd MmUnpackloPd\n//go:noescape\nfunc MmUnpackloPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Set each bit of mask \"dst\" based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in \"a\".\n//\n//go:linkname MmMovemaskPd MmMovemaskPd\n//go:noescape\nfunc MmMovemaskPd(r *x86.Int, v0 *x86.M128D)\n\n// Cast vector of type __m128d to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname MmCastpdPs MmCastpdPs\n//go:noescape\nfunc MmCastpdPs(r *x86.M128, v0 *x86.M128D)\n\n// Cast vector of type __m128d to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname MmCastpdSi128 MmCastpdSi128\n//go:noescape\nfunc MmCastpdSi128(r *x86.M128I, v0 *x86.M128D)\n\n// Cast vector of type __m128 to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname MmCastpsPd MmCastpsPd\n//go:noescape\nfunc MmCastpsPd(r *x86.M128D, v0 *x86.M128)\n\n// Cast vector of type __m128 to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname MmCastpsSi128 MmCastpsSi128\n//go:noescape\nfunc MmCastpsSi128(r *x86.M128I, v0 *x86.M128)\n\n// Cast vector of type __m128i to type __m128. 
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname MmCastsi128Ps MmCastsi128Ps\n//go:noescape\nfunc MmCastsi128Ps(r *x86.M128, v0 *x86.M128I)\n\n// Cast vector of type __m128i to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.\n//\n//go:linkname MmCastsi128Pd MmCastsi128Pd\n//go:noescape\nfunc MmCastsi128Pd(r *x86.M128D, v0 *x86.M128I)\n"
  },
  {
    "path": "x86/sse3/functions.c",
    "content": "#include <immintrin.h>\n\nvoid MmAddsubPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_addsub_ps(*v0, *v1); }\nvoid MmHaddPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_hadd_ps(*v0, *v1); }\nvoid MmHsubPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_hsub_ps(*v0, *v1); }\nvoid MmMovehdupPs(__m128* r, __m128* v0) { *r = _mm_movehdup_ps(*v0); }\nvoid MmMoveldupPs(__m128* r, __m128* v0) { *r = _mm_moveldup_ps(*v0); }\nvoid MmAddsubPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_addsub_pd(*v0, *v1); }\nvoid MmHaddPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_hadd_pd(*v0, *v1); }\nvoid MmHsubPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_hsub_pd(*v0, *v1); }\nvoid MmMovedupPd(__m128d* r, __m128d* v0) { *r = _mm_movedup_pd(*v0); }\nvoid MmMwait(unsigned* v0, unsigned* v1) { _mm_mwait(*v0, *v1); }\n"
  },
  {
    "path": "x86/sse3/functions.go",
    "content": "package sse3\n\nimport (\n\t\"github.com/alivanz/go-simd/x86\"\n)\n\n/*\n#cgo CFLAGS: -msse3\n#include <immintrin.h>\n*/\nimport \"C\"\n\n// Alternately add and subtract packed single-precision (32-bit) floating-point elements in \"a\" to/from packed elements in \"b\", and store the results in \"dst\".\n//\n//go:linkname MmAddsubPs MmAddsubPs\n//go:noescape\nfunc MmAddsubPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in \"a\" and \"b\", and pack the results in \"dst\".\n//\n//go:linkname MmHaddPs MmHaddPs\n//go:noescape\nfunc MmHaddPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point elements in \"a\" and \"b\", and pack the results in \"dst\".\n//\n//go:linkname MmHsubPs MmHsubPs\n//go:noescape\nfunc MmHsubPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128)\n\n// Duplicate odd-indexed single-precision (32-bit) floating-point elements from \"a\", and store the results in \"dst\".\n//\n//go:linkname MmMovehdupPs MmMovehdupPs\n//go:noescape\nfunc MmMovehdupPs(r *x86.M128, v0 *x86.M128)\n\n// Duplicate even-indexed single-precision (32-bit) floating-point elements from \"a\", and store the results in \"dst\".\n//\n//go:linkname MmMoveldupPs MmMoveldupPs\n//go:noescape\nfunc MmMoveldupPs(r *x86.M128, v0 *x86.M128)\n\n// Alternately add and subtract packed double-precision (64-bit) floating-point elements in \"a\" to/from packed elements in \"b\", and store the results in \"dst\".\n//\n//go:linkname MmAddsubPd MmAddsubPd\n//go:noescape\nfunc MmAddsubPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in \"a\" and \"b\", and pack the results in \"dst\".\n//\n//go:linkname MmHaddPd MmHaddPd\n//go:noescape\nfunc MmHaddPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Horizontally subtract adjacent pairs of 
double-precision (64-bit) floating-point elements in \"a\" and \"b\", and pack the results in \"dst\".\n//\n//go:linkname MmHsubPd MmHsubPd\n//go:noescape\nfunc MmHsubPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)\n\n// Duplicate the low double-precision (64-bit) floating-point element from \"a\", and store the results in \"dst\".\n//\n//go:linkname MmMovedupPd MmMovedupPd\n//go:noescape\nfunc MmMovedupPd(r *x86.M128D, v0 *x86.M128D)\n\n// Hint to the processor that it can enter an implementation-dependent-optimized state while waiting for an event or store operation to the address range specified by MONITOR.\n//\n//go:linkname MmMwait MmMwait\n//go:noescape\nfunc MmMwait(v0 *x86.Unsigned, v1 *x86.Unsigned)\n"
  },
  {
    "path": "x86/ssse3/functions.c",
    "content": "#include <immintrin.h>\n\nvoid MmAbsEpi8(__m128i* r, __m128i* v0) { *r = _mm_abs_epi8(*v0); }\nvoid MmAbsEpi16(__m128i* r, __m128i* v0) { *r = _mm_abs_epi16(*v0); }\nvoid MmAbsEpi32(__m128i* r, __m128i* v0) { *r = _mm_abs_epi32(*v0); }\nvoid MmHaddEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_hadd_epi16(*v0, *v1); }\nvoid MmHaddEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_hadd_epi32(*v0, *v1); }\nvoid MmHaddsEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_hadds_epi16(*v0, *v1); }\nvoid MmHsubEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_hsub_epi16(*v0, *v1); }\nvoid MmHsubEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_hsub_epi32(*v0, *v1); }\nvoid MmHsubsEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_hsubs_epi16(*v0, *v1); }\nvoid MmMaddubsEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_maddubs_epi16(*v0, *v1); }\nvoid MmMulhrsEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_mulhrs_epi16(*v0, *v1); }\nvoid MmShuffleEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_shuffle_epi8(*v0, *v1); }\nvoid MmSignEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sign_epi8(*v0, *v1); }\nvoid MmSignEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sign_epi16(*v0, *v1); }\nvoid MmSignEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sign_epi32(*v0, *v1); }\n"
  },
  {
    "path": "x86/ssse3/functions.go",
    "content": "package ssse3\n\nimport (\n\t\"github.com/alivanz/go-simd/x86\"\n)\n\n/*\n#cgo CFLAGS: -mssse3\n#include <immintrin.h>\n*/\nimport \"C\"\n\n// Compute the absolute value of packed signed 8-bit integers in \"a\", and store the unsigned results in \"dst\".\n//\n//go:linkname MmAbsEpi8 MmAbsEpi8\n//go:noescape\nfunc MmAbsEpi8(r *x86.M128I, v0 *x86.M128I)\n\n// Compute the absolute value of packed signed 16-bit integers in \"a\", and store the unsigned results in \"dst\".\n//\n//go:linkname MmAbsEpi16 MmAbsEpi16\n//go:noescape\nfunc MmAbsEpi16(r *x86.M128I, v0 *x86.M128I)\n\n// Compute the absolute value of packed signed 32-bit integers in \"a\", and store the unsigned results in \"dst\".\n//\n//go:linkname MmAbsEpi32 MmAbsEpi32\n//go:noescape\nfunc MmAbsEpi32(r *x86.M128I, v0 *x86.M128I)\n\n// Horizontally add adjacent pairs of 16-bit integers in \"a\" and \"b\", and pack the signed 16-bit results in \"dst\".\n//\n//go:linkname MmHaddEpi16 MmHaddEpi16\n//go:noescape\nfunc MmHaddEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Horizontally add adjacent pairs of 32-bit integers in \"a\" and \"b\", and pack the signed 32-bit results in \"dst\".\n//\n//go:linkname MmHaddEpi32 MmHaddEpi32\n//go:noescape\nfunc MmHaddEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Horizontally add adjacent pairs of signed 16-bit integers in \"a\" and \"b\" using saturation, and pack the signed 16-bit results in \"dst\".\n//\n//go:linkname MmHaddsEpi16 MmHaddsEpi16\n//go:noescape\nfunc MmHaddsEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Horizontally subtract adjacent pairs of 16-bit integers in \"a\" and \"b\", and pack the signed 16-bit results in \"dst\".\n//\n//go:linkname MmHsubEpi16 MmHsubEpi16\n//go:noescape\nfunc MmHsubEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Horizontally subtract adjacent pairs of 32-bit integers in \"a\" and \"b\", and pack the signed 32-bit results in \"dst\".\n//\n//go:linkname MmHsubEpi32 
MmHsubEpi32\n//go:noescape\nfunc MmHsubEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Horizontally subtract adjacent pairs of signed 16-bit integers in \"a\" and \"b\" using saturation, and pack the signed 16-bit results in \"dst\".\n//\n//go:linkname MmHsubsEpi16 MmHsubsEpi16\n//go:noescape\nfunc MmHsubsEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Vertically multiply each unsigned 8-bit integer from \"a\" with the corresponding signed 8-bit integer from \"b\", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in \"dst\".\n//\n//go:linkname MmMaddubsEpi16 MmMaddubsEpi16\n//go:noescape\nfunc MmMaddubsEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Multiply packed signed 16-bit integers in \"a\" and \"b\", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to \"dst\".\n//\n//go:linkname MmMulhrsEpi16 MmMulhrsEpi16\n//go:noescape\nfunc MmMulhrsEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Shuffle packed 8-bit integers in \"a\" according to the shuffle control mask in the corresponding 8-bit element of \"b\", and store the results in \"dst\".\n//\n//go:linkname MmShuffleEpi8 MmShuffleEpi8\n//go:noescape\nfunc MmShuffleEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Negate packed 8-bit integers in \"a\" when the corresponding signed 8-bit integer in \"b\" is negative, and store the results in \"dst\". Elements in \"dst\" are zeroed out when the corresponding element in \"b\" is zero.\n//\n//go:linkname MmSignEpi8 MmSignEpi8\n//go:noescape\nfunc MmSignEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Negate packed 16-bit integers in \"a\" when the corresponding signed 16-bit integer in \"b\" is negative, and store the results in \"dst\". 
Elements in \"dst\" are zeroed out when the corresponding element in \"b\" is zero.\n//\n//go:linkname MmSignEpi16 MmSignEpi16\n//go:noescape\nfunc MmSignEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n\n// Negate packed 32-bit integers in \"a\" when the corresponding signed 32-bit integer in \"b\" is negative, and store the results in \"dst\". Elements in \"dst\" are zeroed out when the corresponding element in \"b\" is zero.\n//\n//go:linkname MmSignEpi32 MmSignEpi32\n//go:noescape\nfunc MmSignEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)\n"
  },
  {
    "path": "x86/types.go",
    "content": "package x86\n\n/*\n#include <immintrin.h>\n*/\nimport \"C\"\n\n// typedef long long __m64 __attribute__((__vector_size__(8), __aligned__(8)));\ntype M64 = C.__m64\n\n// typedef float __m128 __attribute__((__vector_size__(16), __aligned__(16)));\ntype M128 = C.__m128\n\n// typedef double __m128d __attribute__((__vector_size__(16), __aligned__(16)));\ntype M128D = C.__m128d\n\n// typedef long long __m128i __attribute__((__vector_size__(16), __aligned__(16)));\ntype M128I = C.__m128i\n\n// typedef double __m256d __attribute__((__vector_size__(32), __aligned__(32)));\ntype M256D = C.__m256d\n\n// typedef long long __m256i __attribute__((__vector_size__(32), __aligned__(32)));\ntype M256I = C.__m256i\n\n// unsigned int\ntype Uint = C.uint\n\n// unsigned char\ntype Uchar = C.uchar\n\n// unsigned short\ntype Ushort = C.ushort\n\n// unsigned long long\ntype Ulonglong = C.ulonglong\n\n// int\ntype Int = C.int\n\n// long long\ntype Longlong = C.longlong\n\n// short\ntype Short = C.short\n\n// char\ntype Char = C.char\n\n// float\ntype Float = C.float\n\n// double\ntype Double = C.double\n\n// unsigned\ntype Unsigned = C.unsigned\n\n// __m256\ntype M256 = C.__m256\n"
  }
]