Repository: alivanz/go-simd
Branch: main
Commit: f3b5f7d73797
Files: 76
Total size: 1.9 MB

Directory structure:
gitextract_6cse9yuu/
├── .gitignore
├── LICENSE
├── README.md
├── arm/
│   ├── generate.go
│   ├── neon/
│   │   ├── functions.c
│   │   ├── functions.go
│   │   ├── functions_bypass.go
│   │   ├── functions_cgo.go
│   │   ├── functions_test.go
│   │   ├── loops.c
│   │   ├── loops.go
│   │   └── loops_test.go
│   └── types.go
├── example/
│   ├── neon/
│   │   └── main.go
│   └── sse2/
│       └── main.go
├── generator/
│   ├── arm/
│   │   ├── arm.go
│   │   ├── main.go
│   │   └── sort.go
│   ├── scanner/
│   │   ├── scan.go
│   │   ├── scan_test.go
│   │   └── util.go
│   ├── types/
│   │   ├── function.go
│   │   └── type.go
│   ├── utils/
│   │   ├── download.go
│   │   ├── filter.go
│   │   └── slice.go
│   ├── writer/
│   │   ├── cgo.go
│   │   ├── function.go
│   │   ├── package.go
│   │   ├── package_test.go
│   │   ├── type.go
│   │   └── writer.go
│   └── x86/
│       ├── info.go
│       └── main.go
├── go.mod
├── go.sum
└── x86/
    ├── aes/
    │   ├── functions.c
    │   └── functions.go
    ├── avx/
    │   ├── functions.c
    │   └── functions.go
    ├── avx2/
    │   ├── functions.c
    │   └── functions.go
    ├── bmi/
    │   ├── functions.c
    │   └── functions.go
    ├── bmi2/
    │   ├── functions.c
    │   └── functions.go
    ├── crc32/
    │   ├── functions.c
    │   └── functions.go
    ├── f16c/
    │   ├── functions.c
    │   └── functions.go
    ├── fma/
    │   ├── functions.c
    │   └── functions.go
    ├── fsgsbase/
    │   ├── functions.c
    │   └── functions.go
    ├── generate.go
    ├── lzcnt/
    │   ├── functions.c
    │   └── functions.go
    ├── mmx/
    │   ├── functions.c
    │   └── functions.go
    ├── mmx_sse/
    │   ├── functions.c
    │   └── functions.go
    ├── mmx_sse2/
    │   ├── functions.c
    │   └── functions.go
    ├── mmx_ssse3/
    │   ├── functions.c
    │   └── functions.go
    ├── popcnt/
    │   ├── functions.c
    │   └── functions.go
    ├── sse/
    │   ├── functions.c
    │   └── functions.go
    ├── sse2/
    │   ├── functions.c
    │   └── functions.go
    ├── sse3/
    │   ├── functions.c
    │   └── functions.go
    ├── ssse3/
    │   ├── functions.c
    │   └── functions.go
    └── types.go

================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
.vscode
raw.h
intrinsics.json
data.xml

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2023 Alivan Akbar

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

================================================
FILE: README.md
================================================
# SIMD Implementation in Golang

This repository contains an implementation of SIMD (Single Instruction, Multiple Data) operations in Go, specifically targeting ARM NEON architecture. The goal is to provide optimized parallel processing capabilities for certain computational tasks.

## Future Plans

We are actively working on expanding the SIMD implementation to support x86 architecture as well. The upcoming x86 implementation will provide similar SIMD functionalities for parallel processing on x86-based systems.

## Hacks

When we call a C function through CGO, there is some overhead due to Go's design, so avoiding CGO is generally a good idea. But I found a hack: instead of relying on CGO, we can use the `linkname` directive to call the C code directly, bypass CGO, and get better performance.

```
goos: darwin
goarch: arm64
pkg: github.com/alivanz/go-simd/arm/neon
BenchmarkMultRef-8             131395       9168 ns/op
BenchmarkMultSimd-8            598742       1954 ns/op
BenchmarkMultSimdBypass-8      605554       1959 ns/op
BenchmarkMultSimdFull-8       1816879      661.3 ns/op
BenchmarkMultSimdCgo-8          13020      92213 ns/op
PASS
```

```
goos: darwin
goarch: arm64
pkg: github.com/alivanz/go-simd/arm/neon
cpu: Apple M2
BenchmarkVmulqF32N-8      8848    124616 ns/op    33657.86 MB/s    1422 B/op    0 allocs/op
BenchmarkVmulqF32C-8      2256    528683 ns/op     7933.49 MB/s    5577 B/op    0 allocs/op
BenchmarkVmulqF32Ref-8    3630    327995 ns/op    12787.69 MB/s    3466 B/op    0 allocs/op
PASS
ok      github.com/alivanz/go-simd/arm/neon    5.793s
```

The floating-point multiplication benchmarks demonstrate significant performance differences between implementations:

- `VmulqF32N` (Native): Achieves the highest throughput at 33.7 GB/s with the lowest reported memory usage (1422 B/op, 0 allocs/op). This implementation leverages direct SIMD instructions for optimal performance.
- `VmulqF32C` (C): Shows the lowest throughput at 7.9 GB/s with the highest reported memory usage (5577 B/op, 0 allocs/op), likely due to the overhead of CGO calls and memory management.
- `VmulqF32Ref` (Reference): Performs at 12.8 GB/s with moderate memory usage (3466 B/op, 0 allocs/op), serving as a baseline for comparison.

These results highlight the importance of using native SIMD implementations over CGO-based solutions for performance-critical applications. The native implementation is approximately 2.6x faster than the reference implementation, while the C implementation is about 1.6x slower than the reference.

## Features

- SIMD operations for ARM NEON architecture.
- High-performance parallel processing for specific tasks.
- Utilizes the power of SIMD instructions to process multiple data elements simultaneously.
- Supports a range of data types, including integers and floating-point numbers.
- Modular design for easy integration into existing projects.
- Well-documented code for understanding and extending the implementation.

## Roadmap

- [x] Implement SIMD operations for ARM NEON architecture.
- [ ] Add support for x86 architecture.
- [ ] Expand SIMD operations for additional data types.
- [ ] Optimize performance for specific use cases.
- [ ] Develop comprehensive test suite for validation.

## Usage

To use the SIMD implementations in your project, follow these steps:

1. Import the required packages in your Go code:

```go
import (
	"github.com/alivanz/go-simd/arm"
	"github.com/alivanz/go-simd/arm/neon"
)
```

2. Use the SIMD functions in your code as needed.

Example:

```go
package main

import (
	"log"

	"github.com/alivanz/go-simd/arm"
	"github.com/alivanz/go-simd/arm/neon"
)

func main() {
	// Two 8-lane int8 input vectors and two 8-lane int16 result vectors.
	var a, b arm.Int8X8
	var add, mul arm.Int16X8
	for i := 0; i < 8; i++ {
		a[i] = arm.Int8(i)
		b[i] = arm.Int8(i * i)
	}
	log.Printf("a = %+v", a)
	log.Printf("b = %+v", b)
	// Widening add and widening multiply: int8 lanes in, int16 lanes out.
	neon.VaddlS8(&add, &a, &b)
	neon.VmullS8(&mul, &a, &b)
	log.Printf("add = %+v", add)
	log.Printf("mul = %+v", mul)
}
```

## Supported Operations

Only ARM NEON is supported for now.

Refer to the documentation in each respective file for more details on how to use each operation.

## Contributing

Contributions to this project are welcome. To contribute, please follow these steps:

1. Fork the repository.
2. Create a new branch for your feature or bug fix.
3. Make your changes and commit them with descriptive messages.
4. Push your changes to your forked repository.
5. Submit a pull request to the main repository.

Please ensure that your code follows the existing code style and includes appropriate tests.

## Acknowledgments

- The ARM NEON architecture documentation for providing valuable insights into SIMD programming techniques.
- The open-source community for their contributions and inspiration.
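
## Appendix: Scalar Reference for the Usage Example

The two calls in the Usage example, `neon.VaddlS8` and `neon.VmullS8`, wrap the NEON widening intrinsics `vaddl_s8` and `vmull_s8`: each pair of `int8` lanes is promoted to `int16` before the operation, so results that would overflow `int8` are preserved. The following is a minimal scalar sketch of that lane-wise behaviour in plain Go, useful as a cross-check on machines without NEON. The types `int8x8` and `int16x8` and the helpers `addlS8` and `mullS8` are hypothetical stand-ins written for this illustration; they are not part of the library.

```go
package main

import "fmt"

// int8x8 and int16x8 are hypothetical stand-ins for the library's
// arm.Int8X8 and arm.Int16X8 vector types (assumed here to be plain arrays).
type int8x8 [8]int8
type int16x8 [8]int16

// addlS8 mirrors the lane-wise result of vaddl_s8: every int8 lane is
// widened to int16 before the addition, so the sum cannot wrap around.
func addlS8(a, b int8x8) int16x8 {
	var r int16x8
	for i := range r {
		r[i] = int16(a[i]) + int16(b[i])
	}
	return r
}

// mullS8 mirrors the lane-wise result of vmull_s8: every int8 lane is
// widened to int16 before the multiplication.
func mullS8(a, b int8x8) int16x8 {
	var r int16x8
	for i := range r {
		r[i] = int16(a[i]) * int16(b[i])
	}
	return r
}

func main() {
	// Same inputs as the Usage example: a[i] = i, b[i] = i*i.
	var a, b int8x8
	for i := 0; i < 8; i++ {
		a[i] = int8(i)
		b[i] = int8(i * i)
	}
	fmt.Println("add =", addlS8(a, b)) // add = [0 2 6 12 20 30 42 56]
	fmt.Println("mul =", mullS8(a, b)) // mul = [0 1 8 27 64 125 216 343]
}
```

The last lanes of `mul` (216 and 343) exceed the `int8` range, which is why the result vector uses 16-bit lanes.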

## Contact

For any questions or feedback regarding this repository, please feel free to contact me at [alivan1627@gmail.com](mailto:alivan1627@gmail.com).

================================================
FILE: arm/generate.go
================================================
package arm

//go:generate go run ../generator/arm

================================================
FILE: arm/neon/functions.c
================================================
#include <arm_neon.h>

void VabaS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vaba_s8(*v0, *v1, *v2); }
void VabaS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vaba_s16(*v0, *v1, *v2); }
void VabaS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vaba_s32(*v0, *v1, *v2); }
void VabaU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vaba_u8(*v0, *v1, *v2); }
void VabaU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = vaba_u16(*v0, *v1, *v2); }
void VabaU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vaba_u32(*v0, *v1, *v2); }
void VabalS8(int16x8_t* r, int16x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vabal_s8(*v0, *v1, *v2); }
void VabalS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vabal_s16(*v0, *v1, *v2); }
void VabalS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vabal_s32(*v0, *v1, *v2); }
void VabalU8(uint16x8_t* r, uint16x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vabal_u8(*v0, *v1, *v2); }
void VabalU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = vabal_u16(*v0, *v1, *v2); }
void VabalU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vabal_u32(*v0, *v1, *v2); }
void VabalHighS8(int16x8_t* r, int16x8_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vabal_high_s8(*v0, *v1, *v2); }
void VabalHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vabal_high_s16(*v0, *v1, *v2); }
void
VabalHighS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vabal_high_s32(*v0, *v1, *v2); } void VabalHighU8(uint16x8_t* r, uint16x8_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vabal_high_u8(*v0, *v1, *v2); } void VabalHighU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vabal_high_u16(*v0, *v1, *v2); } void VabalHighU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vabal_high_u32(*v0, *v1, *v2); } void VabaqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vabaq_s8(*v0, *v1, *v2); } void VabaqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vabaq_s16(*v0, *v1, *v2); } void VabaqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vabaq_s32(*v0, *v1, *v2); } void VabaqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vabaq_u8(*v0, *v1, *v2); } void VabaqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vabaq_u16(*v0, *v1, *v2); } void VabaqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vabaq_u32(*v0, *v1, *v2); } void VabdS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vabd_s8(*v0, *v1); } void VabdS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vabd_s16(*v0, *v1); } void VabdS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vabd_s32(*v0, *v1); } void VabdU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vabd_u8(*v0, *v1); } void VabdU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vabd_u16(*v0, *v1); } void VabdU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vabd_u32(*v0, *v1); } void VabdF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vabd_f32(*v0, *v1); } void VabdF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vabd_f64(*v0, *v1); } void VabddF64(float64_t* r, float64_t* v0, float64_t* v1) { *r = vabdd_f64(*v0, *v1); } void VabdlS8(int16x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vabdl_s8(*v0, *v1); } 
void VabdlS16(int32x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vabdl_s16(*v0, *v1); } void VabdlS32(int64x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vabdl_s32(*v0, *v1); } void VabdlU8(uint16x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vabdl_u8(*v0, *v1); } void VabdlU16(uint32x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vabdl_u16(*v0, *v1); } void VabdlU32(uint64x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vabdl_u32(*v0, *v1); } void VabdlHighS8(int16x8_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vabdl_high_s8(*v0, *v1); } void VabdlHighS16(int32x4_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vabdl_high_s16(*v0, *v1); } void VabdlHighS32(int64x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vabdl_high_s32(*v0, *v1); } void VabdlHighU8(uint16x8_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vabdl_high_u8(*v0, *v1); } void VabdlHighU16(uint32x4_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vabdl_high_u16(*v0, *v1); } void VabdlHighU32(uint64x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vabdl_high_u32(*v0, *v1); } void VabdqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vabdq_s8(*v0, *v1); } void VabdqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vabdq_s16(*v0, *v1); } void VabdqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vabdq_s32(*v0, *v1); } void VabdqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vabdq_u8(*v0, *v1); } void VabdqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vabdq_u16(*v0, *v1); } void VabdqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vabdq_u32(*v0, *v1); } void VabdqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vabdq_f32(*v0, *v1); } void VabdqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vabdq_f64(*v0, *v1); } void VabdsF32(float32_t* r, float32_t* v0, float32_t* v1) { *r = vabds_f32(*v0, *v1); } void VabsS8(int8x8_t* r, int8x8_t* v0) { *r = vabs_s8(*v0); } void VabsS16(int16x4_t* r, int16x4_t* v0) { *r = vabs_s16(*v0); } void VabsS32(int32x2_t* r, 
int32x2_t* v0) { *r = vabs_s32(*v0); } void VabsS64(int64x1_t* r, int64x1_t* v0) { *r = vabs_s64(*v0); } void VabsF32(float32x2_t* r, float32x2_t* v0) { *r = vabs_f32(*v0); } void VabsF64(float64x1_t* r, float64x1_t* v0) { *r = vabs_f64(*v0); } void VabsdS64(int64_t* r, int64_t* v0) { *r = vabsd_s64(*v0); } void VabsqS8(int8x16_t* r, int8x16_t* v0) { *r = vabsq_s8(*v0); } void VabsqS16(int16x8_t* r, int16x8_t* v0) { *r = vabsq_s16(*v0); } void VabsqS32(int32x4_t* r, int32x4_t* v0) { *r = vabsq_s32(*v0); } void VabsqS64(int64x2_t* r, int64x2_t* v0) { *r = vabsq_s64(*v0); } void VabsqF32(float32x4_t* r, float32x4_t* v0) { *r = vabsq_f32(*v0); } void VabsqF64(float64x2_t* r, float64x2_t* v0) { *r = vabsq_f64(*v0); } void VaddS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vadd_s8(*v0, *v1); } void VaddS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vadd_s16(*v0, *v1); } void VaddS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vadd_s32(*v0, *v1); } void VaddS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vadd_s64(*v0, *v1); } void VaddU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vadd_u8(*v0, *v1); } void VaddU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vadd_u16(*v0, *v1); } void VaddU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vadd_u32(*v0, *v1); } void VaddU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vadd_u64(*v0, *v1); } void VaddF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vadd_f32(*v0, *v1); } void VaddF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vadd_f64(*v0, *v1); } void VaddP16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vadd_p16(*v0, *v1); } void VaddP64(poly64x1_t* r, poly64x1_t* v0, poly64x1_t* v1) { *r = vadd_p64(*v0, *v1); } void VaddP8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vadd_p8(*v0, *v1); } void VadddS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vaddd_s64(*v0, *v1); } void VadddU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { 
*r = vaddd_u64(*v0, *v1); } void VaddhnS16(int8x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vaddhn_s16(*v0, *v1); } void VaddhnS32(int16x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vaddhn_s32(*v0, *v1); } void VaddhnS64(int32x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vaddhn_s64(*v0, *v1); } void VaddhnU16(uint8x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vaddhn_u16(*v0, *v1); } void VaddhnU32(uint16x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vaddhn_u32(*v0, *v1); } void VaddhnU64(uint32x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vaddhn_u64(*v0, *v1); } void VaddhnHighS16(int8x16_t* r, int8x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vaddhn_high_s16(*v0, *v1, *v2); } void VaddhnHighS32(int16x8_t* r, int16x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vaddhn_high_s32(*v0, *v1, *v2); } void VaddhnHighS64(int32x4_t* r, int32x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = vaddhn_high_s64(*v0, *v1, *v2); } void VaddhnHighU16(uint8x16_t* r, uint8x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vaddhn_high_u16(*v0, *v1, *v2); } void VaddhnHighU32(uint16x8_t* r, uint16x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vaddhn_high_u32(*v0, *v1, *v2); } void VaddhnHighU64(uint32x4_t* r, uint32x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vaddhn_high_u64(*v0, *v1, *v2); } void VaddlS8(int16x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vaddl_s8(*v0, *v1); } void VaddlS16(int32x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vaddl_s16(*v0, *v1); } void VaddlS32(int64x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vaddl_s32(*v0, *v1); } void VaddlU8(uint16x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vaddl_u8(*v0, *v1); } void VaddlU16(uint32x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vaddl_u16(*v0, *v1); } void VaddlU32(uint64x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vaddl_u32(*v0, *v1); } void VaddlHighS8(int16x8_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vaddl_high_s8(*v0, *v1); } void VaddlHighS16(int32x4_t* r, int16x8_t* v0, int16x8_t* v1) { *r = 
vaddl_high_s16(*v0, *v1); } void VaddlHighS32(int64x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vaddl_high_s32(*v0, *v1); } void VaddlHighU8(uint16x8_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vaddl_high_u8(*v0, *v1); } void VaddlHighU16(uint32x4_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vaddl_high_u16(*v0, *v1); } void VaddlHighU32(uint64x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vaddl_high_u32(*v0, *v1); } void VaddlvS8(int16_t* r, int8x8_t* v0) { *r = vaddlv_s8(*v0); } void VaddlvS16(int32_t* r, int16x4_t* v0) { *r = vaddlv_s16(*v0); } void VaddlvS32(int64_t* r, int32x2_t* v0) { *r = vaddlv_s32(*v0); } void VaddlvU8(uint16_t* r, uint8x8_t* v0) { *r = vaddlv_u8(*v0); } void VaddlvU16(uint32_t* r, uint16x4_t* v0) { *r = vaddlv_u16(*v0); } void VaddlvU32(uint64_t* r, uint32x2_t* v0) { *r = vaddlv_u32(*v0); } void VaddlvqS8(int16_t* r, int8x16_t* v0) { *r = vaddlvq_s8(*v0); } void VaddlvqS16(int32_t* r, int16x8_t* v0) { *r = vaddlvq_s16(*v0); } void VaddlvqS32(int64_t* r, int32x4_t* v0) { *r = vaddlvq_s32(*v0); } void VaddlvqU8(uint16_t* r, uint8x16_t* v0) { *r = vaddlvq_u8(*v0); } void VaddlvqU16(uint32_t* r, uint16x8_t* v0) { *r = vaddlvq_u16(*v0); } void VaddlvqU32(uint64_t* r, uint32x4_t* v0) { *r = vaddlvq_u32(*v0); } void VaddqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vaddq_s8(*v0, *v1); } void VaddqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vaddq_s16(*v0, *v1); } void VaddqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vaddq_s32(*v0, *v1); } void VaddqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vaddq_s64(*v0, *v1); } void VaddqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vaddq_u8(*v0, *v1); } void VaddqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vaddq_u16(*v0, *v1); } void VaddqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vaddq_u32(*v0, *v1); } void VaddqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vaddq_u64(*v0, *v1); } void VaddqF32(float32x4_t* r, 
float32x4_t* v0, float32x4_t* v1) { *r = vaddq_f32(*v0, *v1); } void VaddqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vaddq_f64(*v0, *v1); } void VaddqP128(poly128_t* r, poly128_t* v0, poly128_t* v1) { *r = vaddq_p128(*v0, *v1); } void VaddqP16(poly16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vaddq_p16(*v0, *v1); } void VaddqP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vaddq_p64(*v0, *v1); } void VaddqP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vaddq_p8(*v0, *v1); } void VaddvS8(int8_t* r, int8x8_t* v0) { *r = vaddv_s8(*v0); } void VaddvS16(int16_t* r, int16x4_t* v0) { *r = vaddv_s16(*v0); } void VaddvS32(int32_t* r, int32x2_t* v0) { *r = vaddv_s32(*v0); } void VaddvU8(uint8_t* r, uint8x8_t* v0) { *r = vaddv_u8(*v0); } void VaddvU16(uint16_t* r, uint16x4_t* v0) { *r = vaddv_u16(*v0); } void VaddvU32(uint32_t* r, uint32x2_t* v0) { *r = vaddv_u32(*v0); } void VaddvF32(float32_t* r, float32x2_t* v0) { *r = vaddv_f32(*v0); } void VaddvqS8(int8_t* r, int8x16_t* v0) { *r = vaddvq_s8(*v0); } void VaddvqS16(int16_t* r, int16x8_t* v0) { *r = vaddvq_s16(*v0); } void VaddvqS32(int32_t* r, int32x4_t* v0) { *r = vaddvq_s32(*v0); } void VaddvqS64(int64_t* r, int64x2_t* v0) { *r = vaddvq_s64(*v0); } void VaddvqU8(uint8_t* r, uint8x16_t* v0) { *r = vaddvq_u8(*v0); } void VaddvqU16(uint16_t* r, uint16x8_t* v0) { *r = vaddvq_u16(*v0); } void VaddvqU32(uint32_t* r, uint32x4_t* v0) { *r = vaddvq_u32(*v0); } void VaddvqU64(uint64_t* r, uint64x2_t* v0) { *r = vaddvq_u64(*v0); } void VaddvqF32(float32_t* r, float32x4_t* v0) { *r = vaddvq_f32(*v0); } void VaddvqF64(float64_t* r, float64x2_t* v0) { *r = vaddvq_f64(*v0); } void VaddwS8(int16x8_t* r, int16x8_t* v0, int8x8_t* v1) { *r = vaddw_s8(*v0, *v1); } void VaddwS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1) { *r = vaddw_s16(*v0, *v1); } void VaddwS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1) { *r = vaddw_s32(*v0, *v1); } void VaddwU8(uint16x8_t* r, uint16x8_t* v0, uint8x8_t* v1) { *r = 
vaddw_u8(*v0, *v1); } void VaddwU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1) { *r = vaddw_u16(*v0, *v1); } void VaddwU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1) { *r = vaddw_u32(*v0, *v1); } void VaddwHighS8(int16x8_t* r, int16x8_t* v0, int8x16_t* v1) { *r = vaddw_high_s8(*v0, *v1); } void VaddwHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1) { *r = vaddw_high_s16(*v0, *v1); } void VaddwHighS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1) { *r = vaddw_high_s32(*v0, *v1); } void VaddwHighU8(uint16x8_t* r, uint16x8_t* v0, uint8x16_t* v1) { *r = vaddw_high_u8(*v0, *v1); } void VaddwHighU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1) { *r = vaddw_high_u16(*v0, *v1); } void VaddwHighU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1) { *r = vaddw_high_u32(*v0, *v1); } void VaesdqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vaesdq_u8(*v0, *v1); } void VaeseqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vaeseq_u8(*v0, *v1); } void VaesimcqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vaesimcq_u8(*v0); } void VaesmcqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vaesmcq_u8(*v0); } void VandS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vand_s8(*v0, *v1); } void VandS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vand_s16(*v0, *v1); } void VandS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vand_s32(*v0, *v1); } void VandS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vand_s64(*v0, *v1); } void VandU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vand_u8(*v0, *v1); } void VandU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vand_u16(*v0, *v1); } void VandU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vand_u32(*v0, *v1); } void VandU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vand_u64(*v0, *v1); } void VandqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vandq_s8(*v0, *v1); } void VandqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vandq_s16(*v0, *v1); } void 
VandqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vandq_s32(*v0, *v1); } void VandqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vandq_s64(*v0, *v1); } void VandqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vandq_u8(*v0, *v1); } void VandqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vandq_u16(*v0, *v1); } void VandqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vandq_u32(*v0, *v1); } void VandqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vandq_u64(*v0, *v1); } void VbcaxqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vbcaxq_s8(*v0, *v1, *v2); } void VbcaxqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vbcaxq_s16(*v0, *v1, *v2); } void VbcaxqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vbcaxq_s32(*v0, *v1, *v2); } void VbcaxqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = vbcaxq_s64(*v0, *v1, *v2); } void VbcaxqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vbcaxq_u8(*v0, *v1, *v2); } void VbcaxqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vbcaxq_u16(*v0, *v1, *v2); } void VbcaxqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vbcaxq_u32(*v0, *v1, *v2); } void VbcaxqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vbcaxq_u64(*v0, *v1, *v2); } void VbicS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vbic_s8(*v0, *v1); } void VbicS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vbic_s16(*v0, *v1); } void VbicS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vbic_s32(*v0, *v1); } void VbicS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vbic_s64(*v0, *v1); } void VbicU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vbic_u8(*v0, *v1); } void VbicU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vbic_u16(*v0, *v1); } void VbicU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { 
*r = vbic_u32(*v0, *v1); } void VbicU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vbic_u64(*v0, *v1); } void VbicqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vbicq_s8(*v0, *v1); } void VbicqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vbicq_s16(*v0, *v1); } void VbicqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vbicq_s32(*v0, *v1); } void VbicqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vbicq_s64(*v0, *v1); } void VbicqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vbicq_u8(*v0, *v1); } void VbicqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vbicq_u16(*v0, *v1); } void VbicqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vbicq_u32(*v0, *v1); } void VbicqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vbicq_u64(*v0, *v1); } void VbslS8(int8x8_t* r, uint8x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vbsl_s8(*v0, *v1, *v2); } void VbslS16(int16x4_t* r, uint16x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vbsl_s16(*v0, *v1, *v2); } void VbslS32(int32x2_t* r, uint32x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vbsl_s32(*v0, *v1, *v2); } void VbslS64(int64x1_t* r, uint64x1_t* v0, int64x1_t* v1, int64x1_t* v2) { *r = vbsl_s64(*v0, *v1, *v2); } void VbslU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vbsl_u8(*v0, *v1, *v2); } void VbslU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = vbsl_u16(*v0, *v1, *v2); } void VbslU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vbsl_u32(*v0, *v1, *v2); } void VbslU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1, uint64x1_t* v2) { *r = vbsl_u64(*v0, *v1, *v2); } void VbslF32(float32x2_t* r, uint32x2_t* v0, float32x2_t* v1, float32x2_t* v2) { *r = vbsl_f32(*v0, *v1, *v2); } void VbslF64(float64x1_t* r, uint64x1_t* v0, float64x1_t* v1, float64x1_t* v2) { *r = vbsl_f64(*v0, *v1, *v2); } void VbslP16(poly16x4_t* r, uint16x4_t* v0, poly16x4_t* v1, poly16x4_t* v2) { *r = 
vbsl_p16(*v0, *v1, *v2); } void VbslP64(poly64x1_t* r, uint64x1_t* v0, poly64x1_t* v1, poly64x1_t* v2) { *r = vbsl_p64(*v0, *v1, *v2); } void VbslP8(poly8x8_t* r, uint8x8_t* v0, poly8x8_t* v1, poly8x8_t* v2) { *r = vbsl_p8(*v0, *v1, *v2); } void VbslqS8(int8x16_t* r, uint8x16_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vbslq_s8(*v0, *v1, *v2); } void VbslqS16(int16x8_t* r, uint16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vbslq_s16(*v0, *v1, *v2); } void VbslqS32(int32x4_t* r, uint32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vbslq_s32(*v0, *v1, *v2); } void VbslqS64(int64x2_t* r, uint64x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = vbslq_s64(*v0, *v1, *v2); } void VbslqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vbslq_u8(*v0, *v1, *v2); } void VbslqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vbslq_u16(*v0, *v1, *v2); } void VbslqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vbslq_u32(*v0, *v1, *v2); } void VbslqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vbslq_u64(*v0, *v1, *v2); } void VbslqF32(float32x4_t* r, uint32x4_t* v0, float32x4_t* v1, float32x4_t* v2) { *r = vbslq_f32(*v0, *v1, *v2); } void VbslqF64(float64x2_t* r, uint64x2_t* v0, float64x2_t* v1, float64x2_t* v2) { *r = vbslq_f64(*v0, *v1, *v2); } void VbslqP16(poly16x8_t* r, uint16x8_t* v0, poly16x8_t* v1, poly16x8_t* v2) { *r = vbslq_p16(*v0, *v1, *v2); } void VbslqP64(poly64x2_t* r, uint64x2_t* v0, poly64x2_t* v1, poly64x2_t* v2) { *r = vbslq_p64(*v0, *v1, *v2); } void VbslqP8(poly8x16_t* r, uint8x16_t* v0, poly8x16_t* v1, poly8x16_t* v2) { *r = vbslq_p8(*v0, *v1, *v2); } void VcaddRot270F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcadd_rot270_f32(*v0, *v1); } void VcaddRot90F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcadd_rot90_f32(*v0, *v1); } void VcaddqRot270F32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcaddq_rot270_f32(*v0, 
*v1); } void VcaddqRot270F64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcaddq_rot270_f64(*v0, *v1); } void VcaddqRot90F32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcaddq_rot90_f32(*v0, *v1); } void VcaddqRot90F64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcaddq_rot90_f64(*v0, *v1); } void VcageF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcage_f32(*v0, *v1); } void VcageF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcage_f64(*v0, *v1); } void VcagedF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcaged_f64(*v0, *v1); } void VcageqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcageq_f32(*v0, *v1); } void VcageqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcageq_f64(*v0, *v1); } void VcagesF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcages_f32(*v0, *v1); } void VcagtF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcagt_f32(*v0, *v1); } void VcagtF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcagt_f64(*v0, *v1); } void VcagtdF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcagtd_f64(*v0, *v1); } void VcagtqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcagtq_f32(*v0, *v1); } void VcagtqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcagtq_f64(*v0, *v1); } void VcagtsF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcagts_f32(*v0, *v1); } void VcaleF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcale_f32(*v0, *v1); } void VcaleF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcale_f64(*v0, *v1); } void VcaledF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcaled_f64(*v0, *v1); } void VcaleqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcaleq_f32(*v0, *v1); } void VcaleqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcaleq_f64(*v0, *v1); } void VcalesF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcales_f32(*v0, *v1); } void 
VcaltF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcalt_f32(*v0, *v1); } void VcaltF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcalt_f64(*v0, *v1); } void VcaltdF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcaltd_f64(*v0, *v1); } void VcaltqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcaltq_f32(*v0, *v1); } void VcaltqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcaltq_f64(*v0, *v1); } void VcaltsF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcalts_f32(*v0, *v1); } void VceqS8(uint8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vceq_s8(*v0, *v1); } void VceqS16(uint16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vceq_s16(*v0, *v1); } void VceqS32(uint32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vceq_s32(*v0, *v1); } void VceqS64(uint64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vceq_s64(*v0, *v1); } void VceqU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vceq_u8(*v0, *v1); } void VceqU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vceq_u16(*v0, *v1); } void VceqU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vceq_u32(*v0, *v1); } void VceqU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vceq_u64(*v0, *v1); } void VceqF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vceq_f32(*v0, *v1); } void VceqF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vceq_f64(*v0, *v1); } void VceqP64(uint64x1_t* r, poly64x1_t* v0, poly64x1_t* v1) { *r = vceq_p64(*v0, *v1); } void VceqP8(uint8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vceq_p8(*v0, *v1); } void VceqdS64(uint64_t* r, int64_t* v0, int64_t* v1) { *r = vceqd_s64(*v0, *v1); } void VceqdU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vceqd_u64(*v0, *v1); } void VceqdF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vceqd_f64(*v0, *v1); } void VceqqS8(uint8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vceqq_s8(*v0, *v1); } void VceqqS16(uint16x8_t* r, int16x8_t* v0, int16x8_t* v1) 
{ *r = vceqq_s16(*v0, *v1); } void VceqqS32(uint32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vceqq_s32(*v0, *v1); } void VceqqS64(uint64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vceqq_s64(*v0, *v1); } void VceqqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vceqq_u8(*v0, *v1); } void VceqqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vceqq_u16(*v0, *v1); } void VceqqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vceqq_u32(*v0, *v1); } void VceqqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vceqq_u64(*v0, *v1); } void VceqqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vceqq_f32(*v0, *v1); } void VceqqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vceqq_f64(*v0, *v1); } void VceqqP64(uint64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vceqq_p64(*v0, *v1); } void VceqqP8(uint8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vceqq_p8(*v0, *v1); } void VceqsF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vceqs_f32(*v0, *v1); } void VceqzS8(uint8x8_t* r, int8x8_t* v0) { *r = vceqz_s8(*v0); } void VceqzS16(uint16x4_t* r, int16x4_t* v0) { *r = vceqz_s16(*v0); } void VceqzS32(uint32x2_t* r, int32x2_t* v0) { *r = vceqz_s32(*v0); } void VceqzS64(uint64x1_t* r, int64x1_t* v0) { *r = vceqz_s64(*v0); } void VceqzU8(uint8x8_t* r, uint8x8_t* v0) { *r = vceqz_u8(*v0); } void VceqzU16(uint16x4_t* r, uint16x4_t* v0) { *r = vceqz_u16(*v0); } void VceqzU32(uint32x2_t* r, uint32x2_t* v0) { *r = vceqz_u32(*v0); } void VceqzU64(uint64x1_t* r, uint64x1_t* v0) { *r = vceqz_u64(*v0); } void VceqzF32(uint32x2_t* r, float32x2_t* v0) { *r = vceqz_f32(*v0); } void VceqzF64(uint64x1_t* r, float64x1_t* v0) { *r = vceqz_f64(*v0); } void VceqzP64(uint64x1_t* r, poly64x1_t* v0) { *r = vceqz_p64(*v0); } void VceqzP8(uint8x8_t* r, poly8x8_t* v0) { *r = vceqz_p8(*v0); } void VceqzdS64(uint64_t* r, int64_t* v0) { *r = vceqzd_s64(*v0); } void VceqzdU64(uint64_t* r, uint64_t* v0) { *r = vceqzd_u64(*v0); } void 
VceqzdF64(uint64_t* r, float64_t* v0) { *r = vceqzd_f64(*v0); } void VceqzqS8(uint8x16_t* r, int8x16_t* v0) { *r = vceqzq_s8(*v0); } void VceqzqS16(uint16x8_t* r, int16x8_t* v0) { *r = vceqzq_s16(*v0); } void VceqzqS32(uint32x4_t* r, int32x4_t* v0) { *r = vceqzq_s32(*v0); } void VceqzqS64(uint64x2_t* r, int64x2_t* v0) { *r = vceqzq_s64(*v0); } void VceqzqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vceqzq_u8(*v0); } void VceqzqU16(uint16x8_t* r, uint16x8_t* v0) { *r = vceqzq_u16(*v0); } void VceqzqU32(uint32x4_t* r, uint32x4_t* v0) { *r = vceqzq_u32(*v0); } void VceqzqU64(uint64x2_t* r, uint64x2_t* v0) { *r = vceqzq_u64(*v0); } void VceqzqF32(uint32x4_t* r, float32x4_t* v0) { *r = vceqzq_f32(*v0); } void VceqzqF64(uint64x2_t* r, float64x2_t* v0) { *r = vceqzq_f64(*v0); } void VceqzqP64(uint64x2_t* r, poly64x2_t* v0) { *r = vceqzq_p64(*v0); } void VceqzqP8(uint8x16_t* r, poly8x16_t* v0) { *r = vceqzq_p8(*v0); } void VceqzsF32(uint32_t* r, float32_t* v0) { *r = vceqzs_f32(*v0); } void VcgeS8(uint8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vcge_s8(*v0, *v1); } void VcgeS16(uint16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vcge_s16(*v0, *v1); } void VcgeS32(uint32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vcge_s32(*v0, *v1); } void VcgeS64(uint64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vcge_s64(*v0, *v1); } void VcgeU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vcge_u8(*v0, *v1); } void VcgeU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vcge_u16(*v0, *v1); } void VcgeU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vcge_u32(*v0, *v1); } void VcgeU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vcge_u64(*v0, *v1); } void VcgeF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcge_f32(*v0, *v1); } void VcgeF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcge_f64(*v0, *v1); } void VcgedS64(uint64_t* r, int64_t* v0, int64_t* v1) { *r = vcged_s64(*v0, *v1); } void VcgedU64(uint64_t* r, uint64_t* v0, uint64_t* 
v1) { *r = vcged_u64(*v0, *v1); } void VcgedF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcged_f64(*v0, *v1); } void VcgeqS8(uint8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vcgeq_s8(*v0, *v1); } void VcgeqS16(uint16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vcgeq_s16(*v0, *v1); } void VcgeqS32(uint32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vcgeq_s32(*v0, *v1); } void VcgeqS64(uint64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vcgeq_s64(*v0, *v1); } void VcgeqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vcgeq_u8(*v0, *v1); } void VcgeqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vcgeq_u16(*v0, *v1); } void VcgeqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vcgeq_u32(*v0, *v1); } void VcgeqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vcgeq_u64(*v0, *v1); } void VcgeqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcgeq_f32(*v0, *v1); } void VcgeqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcgeq_f64(*v0, *v1); } void VcgesF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcges_f32(*v0, *v1); } void VcgezS8(uint8x8_t* r, int8x8_t* v0) { *r = vcgez_s8(*v0); } void VcgezS16(uint16x4_t* r, int16x4_t* v0) { *r = vcgez_s16(*v0); } void VcgezS32(uint32x2_t* r, int32x2_t* v0) { *r = vcgez_s32(*v0); } void VcgezS64(uint64x1_t* r, int64x1_t* v0) { *r = vcgez_s64(*v0); } void VcgezF32(uint32x2_t* r, float32x2_t* v0) { *r = vcgez_f32(*v0); } void VcgezF64(uint64x1_t* r, float64x1_t* v0) { *r = vcgez_f64(*v0); } void VcgezdS64(uint64_t* r, int64_t* v0) { *r = vcgezd_s64(*v0); } void VcgezdF64(uint64_t* r, float64_t* v0) { *r = vcgezd_f64(*v0); } void VcgezqS8(uint8x16_t* r, int8x16_t* v0) { *r = vcgezq_s8(*v0); } void VcgezqS16(uint16x8_t* r, int16x8_t* v0) { *r = vcgezq_s16(*v0); } void VcgezqS32(uint32x4_t* r, int32x4_t* v0) { *r = vcgezq_s32(*v0); } void VcgezqS64(uint64x2_t* r, int64x2_t* v0) { *r = vcgezq_s64(*v0); } void VcgezqF32(uint32x4_t* r, float32x4_t* v0) { *r = 
vcgezq_f32(*v0); } void VcgezqF64(uint64x2_t* r, float64x2_t* v0) { *r = vcgezq_f64(*v0); } void VcgezsF32(uint32_t* r, float32_t* v0) { *r = vcgezs_f32(*v0); } void VcgtS8(uint8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vcgt_s8(*v0, *v1); } void VcgtS16(uint16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vcgt_s16(*v0, *v1); } void VcgtS32(uint32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vcgt_s32(*v0, *v1); } void VcgtS64(uint64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vcgt_s64(*v0, *v1); } void VcgtU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vcgt_u8(*v0, *v1); } void VcgtU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vcgt_u16(*v0, *v1); } void VcgtU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vcgt_u32(*v0, *v1); } void VcgtU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vcgt_u64(*v0, *v1); } void VcgtF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcgt_f32(*v0, *v1); } void VcgtF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcgt_f64(*v0, *v1); } void VcgtdS64(uint64_t* r, int64_t* v0, int64_t* v1) { *r = vcgtd_s64(*v0, *v1); } void VcgtdU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vcgtd_u64(*v0, *v1); } void VcgtdF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcgtd_f64(*v0, *v1); } void VcgtqS8(uint8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vcgtq_s8(*v0, *v1); } void VcgtqS16(uint16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vcgtq_s16(*v0, *v1); } void VcgtqS32(uint32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vcgtq_s32(*v0, *v1); } void VcgtqS64(uint64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vcgtq_s64(*v0, *v1); } void VcgtqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vcgtq_u8(*v0, *v1); } void VcgtqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vcgtq_u16(*v0, *v1); } void VcgtqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vcgtq_u32(*v0, *v1); } void VcgtqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vcgtq_u64(*v0, 
*v1); } void VcgtqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcgtq_f32(*v0, *v1); } void VcgtqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcgtq_f64(*v0, *v1); } void VcgtsF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcgts_f32(*v0, *v1); } void VcgtzS8(uint8x8_t* r, int8x8_t* v0) { *r = vcgtz_s8(*v0); } void VcgtzS16(uint16x4_t* r, int16x4_t* v0) { *r = vcgtz_s16(*v0); } void VcgtzS32(uint32x2_t* r, int32x2_t* v0) { *r = vcgtz_s32(*v0); } void VcgtzS64(uint64x1_t* r, int64x1_t* v0) { *r = vcgtz_s64(*v0); } void VcgtzF32(uint32x2_t* r, float32x2_t* v0) { *r = vcgtz_f32(*v0); } void VcgtzF64(uint64x1_t* r, float64x1_t* v0) { *r = vcgtz_f64(*v0); } void VcgtzdS64(uint64_t* r, int64_t* v0) { *r = vcgtzd_s64(*v0); } void VcgtzdF64(uint64_t* r, float64_t* v0) { *r = vcgtzd_f64(*v0); } void VcgtzqS8(uint8x16_t* r, int8x16_t* v0) { *r = vcgtzq_s8(*v0); } void VcgtzqS16(uint16x8_t* r, int16x8_t* v0) { *r = vcgtzq_s16(*v0); } void VcgtzqS32(uint32x4_t* r, int32x4_t* v0) { *r = vcgtzq_s32(*v0); } void VcgtzqS64(uint64x2_t* r, int64x2_t* v0) { *r = vcgtzq_s64(*v0); } void VcgtzqF32(uint32x4_t* r, float32x4_t* v0) { *r = vcgtzq_f32(*v0); } void VcgtzqF64(uint64x2_t* r, float64x2_t* v0) { *r = vcgtzq_f64(*v0); } void VcgtzsF32(uint32_t* r, float32_t* v0) { *r = vcgtzs_f32(*v0); } void VcleS8(uint8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vcle_s8(*v0, *v1); } void VcleS16(uint16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vcle_s16(*v0, *v1); } void VcleS32(uint32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vcle_s32(*v0, *v1); } void VcleS64(uint64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vcle_s64(*v0, *v1); } void VcleU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vcle_u8(*v0, *v1); } void VcleU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vcle_u16(*v0, *v1); } void VcleU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vcle_u32(*v0, *v1); } void VcleU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = 
vcle_u64(*v0, *v1); } void VcleF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcle_f32(*v0, *v1); } void VcleF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcle_f64(*v0, *v1); } void VcledS64(uint64_t* r, int64_t* v0, int64_t* v1) { *r = vcled_s64(*v0, *v1); } void VcledU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vcled_u64(*v0, *v1); } void VcledF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcled_f64(*v0, *v1); } void VcleqS8(uint8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vcleq_s8(*v0, *v1); } void VcleqS16(uint16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vcleq_s16(*v0, *v1); } void VcleqS32(uint32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vcleq_s32(*v0, *v1); } void VcleqS64(uint64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vcleq_s64(*v0, *v1); } void VcleqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vcleq_u8(*v0, *v1); } void VcleqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vcleq_u16(*v0, *v1); } void VcleqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vcleq_u32(*v0, *v1); } void VcleqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vcleq_u64(*v0, *v1); } void VcleqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcleq_f32(*v0, *v1); } void VcleqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcleq_f64(*v0, *v1); } void VclesF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vcles_f32(*v0, *v1); } void VclezS8(uint8x8_t* r, int8x8_t* v0) { *r = vclez_s8(*v0); } void VclezS16(uint16x4_t* r, int16x4_t* v0) { *r = vclez_s16(*v0); } void VclezS32(uint32x2_t* r, int32x2_t* v0) { *r = vclez_s32(*v0); } void VclezS64(uint64x1_t* r, int64x1_t* v0) { *r = vclez_s64(*v0); } void VclezF32(uint32x2_t* r, float32x2_t* v0) { *r = vclez_f32(*v0); } void VclezF64(uint64x1_t* r, float64x1_t* v0) { *r = vclez_f64(*v0); } void VclezdS64(uint64_t* r, int64_t* v0) { *r = vclezd_s64(*v0); } void VclezdF64(uint64_t* r, float64_t* v0) { *r = vclezd_f64(*v0); } 
void VclezqS8(uint8x16_t* r, int8x16_t* v0) { *r = vclezq_s8(*v0); } void VclezqS16(uint16x8_t* r, int16x8_t* v0) { *r = vclezq_s16(*v0); } void VclezqS32(uint32x4_t* r, int32x4_t* v0) { *r = vclezq_s32(*v0); } void VclezqS64(uint64x2_t* r, int64x2_t* v0) { *r = vclezq_s64(*v0); } void VclezqF32(uint32x4_t* r, float32x4_t* v0) { *r = vclezq_f32(*v0); } void VclezqF64(uint64x2_t* r, float64x2_t* v0) { *r = vclezq_f64(*v0); } void VclezsF32(uint32_t* r, float32_t* v0) { *r = vclezs_f32(*v0); } void VclsS8(int8x8_t* r, int8x8_t* v0) { *r = vcls_s8(*v0); } void VclsS16(int16x4_t* r, int16x4_t* v0) { *r = vcls_s16(*v0); } void VclsS32(int32x2_t* r, int32x2_t* v0) { *r = vcls_s32(*v0); } void VclsU8(int8x8_t* r, uint8x8_t* v0) { *r = vcls_u8(*v0); } void VclsU16(int16x4_t* r, uint16x4_t* v0) { *r = vcls_u16(*v0); } void VclsU32(int32x2_t* r, uint32x2_t* v0) { *r = vcls_u32(*v0); } void VclsqS8(int8x16_t* r, int8x16_t* v0) { *r = vclsq_s8(*v0); } void VclsqS16(int16x8_t* r, int16x8_t* v0) { *r = vclsq_s16(*v0); } void VclsqS32(int32x4_t* r, int32x4_t* v0) { *r = vclsq_s32(*v0); } void VclsqU8(int8x16_t* r, uint8x16_t* v0) { *r = vclsq_u8(*v0); } void VclsqU16(int16x8_t* r, uint16x8_t* v0) { *r = vclsq_u16(*v0); } void VclsqU32(int32x4_t* r, uint32x4_t* v0) { *r = vclsq_u32(*v0); } void VcltS8(uint8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vclt_s8(*v0, *v1); } void VcltS16(uint16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vclt_s16(*v0, *v1); } void VcltS32(uint32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vclt_s32(*v0, *v1); } void VcltS64(uint64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vclt_s64(*v0, *v1); } void VcltU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vclt_u8(*v0, *v1); } void VcltU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vclt_u16(*v0, *v1); } void VcltU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vclt_u32(*v0, *v1); } void VcltU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vclt_u64(*v0, *v1); } void 
VcltF32(uint32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vclt_f32(*v0, *v1); } void VcltF64(uint64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vclt_f64(*v0, *v1); } void VcltdS64(uint64_t* r, int64_t* v0, int64_t* v1) { *r = vcltd_s64(*v0, *v1); } void VcltdU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vcltd_u64(*v0, *v1); } void VcltdF64(uint64_t* r, float64_t* v0, float64_t* v1) { *r = vcltd_f64(*v0, *v1); } void VcltqS8(uint8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vcltq_s8(*v0, *v1); } void VcltqS16(uint16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vcltq_s16(*v0, *v1); } void VcltqS32(uint32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vcltq_s32(*v0, *v1); } void VcltqS64(uint64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vcltq_s64(*v0, *v1); } void VcltqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vcltq_u8(*v0, *v1); } void VcltqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vcltq_u16(*v0, *v1); } void VcltqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vcltq_u32(*v0, *v1); } void VcltqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vcltq_u64(*v0, *v1); } void VcltqF32(uint32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vcltq_f32(*v0, *v1); } void VcltqF64(uint64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vcltq_f64(*v0, *v1); } void VcltsF32(uint32_t* r, float32_t* v0, float32_t* v1) { *r = vclts_f32(*v0, *v1); } void VcltzS8(uint8x8_t* r, int8x8_t* v0) { *r = vcltz_s8(*v0); } void VcltzS16(uint16x4_t* r, int16x4_t* v0) { *r = vcltz_s16(*v0); } void VcltzS32(uint32x2_t* r, int32x2_t* v0) { *r = vcltz_s32(*v0); } void VcltzS64(uint64x1_t* r, int64x1_t* v0) { *r = vcltz_s64(*v0); } void VcltzF32(uint32x2_t* r, float32x2_t* v0) { *r = vcltz_f32(*v0); } void VcltzF64(uint64x1_t* r, float64x1_t* v0) { *r = vcltz_f64(*v0); } void VcltzdS64(uint64_t* r, int64_t* v0) { *r = vcltzd_s64(*v0); } void VcltzdF64(uint64_t* r, float64_t* v0) { *r = vcltzd_f64(*v0); } void VcltzqS8(uint8x16_t* r, 
int8x16_t* v0) { *r = vcltzq_s8(*v0); } void VcltzqS16(uint16x8_t* r, int16x8_t* v0) { *r = vcltzq_s16(*v0); } void VcltzqS32(uint32x4_t* r, int32x4_t* v0) { *r = vcltzq_s32(*v0); } void VcltzqS64(uint64x2_t* r, int64x2_t* v0) { *r = vcltzq_s64(*v0); } void VcltzqF32(uint32x4_t* r, float32x4_t* v0) { *r = vcltzq_f32(*v0); } void VcltzqF64(uint64x2_t* r, float64x2_t* v0) { *r = vcltzq_f64(*v0); } void VcltzsF32(uint32_t* r, float32_t* v0) { *r = vcltzs_f32(*v0); } void VclzS8(int8x8_t* r, int8x8_t* v0) { *r = vclz_s8(*v0); } void VclzS16(int16x4_t* r, int16x4_t* v0) { *r = vclz_s16(*v0); } void VclzS32(int32x2_t* r, int32x2_t* v0) { *r = vclz_s32(*v0); } void VclzU8(uint8x8_t* r, uint8x8_t* v0) { *r = vclz_u8(*v0); } void VclzU16(uint16x4_t* r, uint16x4_t* v0) { *r = vclz_u16(*v0); } void VclzU32(uint32x2_t* r, uint32x2_t* v0) { *r = vclz_u32(*v0); } void VclzqS8(int8x16_t* r, int8x16_t* v0) { *r = vclzq_s8(*v0); } void VclzqS16(int16x8_t* r, int16x8_t* v0) { *r = vclzq_s16(*v0); } void VclzqS32(int32x4_t* r, int32x4_t* v0) { *r = vclzq_s32(*v0); } void VclzqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vclzq_u8(*v0); } void VclzqU16(uint16x8_t* r, uint16x8_t* v0) { *r = vclzq_u16(*v0); } void VclzqU32(uint32x4_t* r, uint32x4_t* v0) { *r = vclzq_u32(*v0); } void VcntS8(int8x8_t* r, int8x8_t* v0) { *r = vcnt_s8(*v0); } void VcntU8(uint8x8_t* r, uint8x8_t* v0) { *r = vcnt_u8(*v0); } void VcntP8(poly8x8_t* r, poly8x8_t* v0) { *r = vcnt_p8(*v0); } void VcntqS8(int8x16_t* r, int8x16_t* v0) { *r = vcntq_s8(*v0); } void VcntqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vcntq_u8(*v0); } void VcntqP8(poly8x16_t* r, poly8x16_t* v0) { *r = vcntq_p8(*v0); } void VcombineS8(int8x16_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vcombine_s8(*v0, *v1); } void VcombineS16(int16x8_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vcombine_s16(*v0, *v1); } void VcombineS32(int32x4_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vcombine_s32(*v0, *v1); } void VcombineS64(int64x2_t* r, int64x1_t* v0, int64x1_t* 
v1) { *r = vcombine_s64(*v0, *v1); } void VcombineU8(uint8x16_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vcombine_u8(*v0, *v1); } void VcombineU16(uint16x8_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vcombine_u16(*v0, *v1); } void VcombineU32(uint32x4_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vcombine_u32(*v0, *v1); } void VcombineU64(uint64x2_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vcombine_u64(*v0, *v1); } void VcombineF32(float32x4_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vcombine_f32(*v0, *v1); } void VcombineF64(float64x2_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vcombine_f64(*v0, *v1); } void VcombineP16(poly16x8_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vcombine_p16(*v0, *v1); } void VcombineP64(poly64x2_t* r, poly64x1_t* v0, poly64x1_t* v1) { *r = vcombine_p64(*v0, *v1); } void VcombineP8(poly8x16_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vcombine_p8(*v0, *v1); } void VcvtF32S32(float32x2_t* r, int32x2_t* v0) { *r = vcvt_f32_s32(*v0); } void VcvtF32U32(float32x2_t* r, uint32x2_t* v0) { *r = vcvt_f32_u32(*v0); } void VcvtF32F64(float32x2_t* r, float64x2_t* v0) { *r = vcvt_f32_f64(*v0); } void VcvtF64S64(float64x1_t* r, int64x1_t* v0) { *r = vcvt_f64_s64(*v0); } void VcvtF64U64(float64x1_t* r, uint64x1_t* v0) { *r = vcvt_f64_u64(*v0); } void VcvtF64F32(float64x2_t* r, float32x2_t* v0) { *r = vcvt_f64_f32(*v0); } void VcvtHighF32F64(float32x4_t* r, float32x2_t* v0, float64x2_t* v1) { *r = vcvt_high_f32_f64(*v0, *v1); } void VcvtHighF64F32(float64x2_t* r, float32x4_t* v0) { *r = vcvt_high_f64_f32(*v0); } void VcvtS32F32(int32x2_t* r, float32x2_t* v0) { *r = vcvt_s32_f32(*v0); } void VcvtS64F64(int64x1_t* r, float64x1_t* v0) { *r = vcvt_s64_f64(*v0); } void VcvtU32F32(uint32x2_t* r, float32x2_t* v0) { *r = vcvt_u32_f32(*v0); } void VcvtU64F64(uint64x1_t* r, float64x1_t* v0) { *r = vcvt_u64_f64(*v0); } void VcvtaS32F32(int32x2_t* r, float32x2_t* v0) { *r = vcvta_s32_f32(*v0); } void VcvtaS64F64(int64x1_t* r, float64x1_t* v0) { *r = 
vcvta_s64_f64(*v0); } void VcvtaU32F32(uint32x2_t* r, float32x2_t* v0) { *r = vcvta_u32_f32(*v0); } void VcvtaU64F64(uint64x1_t* r, float64x1_t* v0) { *r = vcvta_u64_f64(*v0); } void VcvtadS64F64(int64_t* r, float64_t* v0) { *r = vcvtad_s64_f64(*v0); } void VcvtadU64F64(uint64_t* r, float64_t* v0) { *r = vcvtad_u64_f64(*v0); } void VcvtaqS32F32(int32x4_t* r, float32x4_t* v0) { *r = vcvtaq_s32_f32(*v0); } void VcvtaqS64F64(int64x2_t* r, float64x2_t* v0) { *r = vcvtaq_s64_f64(*v0); } void VcvtaqU32F32(uint32x4_t* r, float32x4_t* v0) { *r = vcvtaq_u32_f32(*v0); } void VcvtaqU64F64(uint64x2_t* r, float64x2_t* v0) { *r = vcvtaq_u64_f64(*v0); } void VcvtasS32F32(int32_t* r, float32_t* v0) { *r = vcvtas_s32_f32(*v0); } void VcvtasU32F32(uint32_t* r, float32_t* v0) { *r = vcvtas_u32_f32(*v0); } void VcvtdF64S64(float64_t* r, int64_t* v0) { *r = vcvtd_f64_s64(*v0); } void VcvtdF64U64(float64_t* r, uint64_t* v0) { *r = vcvtd_f64_u64(*v0); } void VcvtdS64F64(int64_t* r, float64_t* v0) { *r = vcvtd_s64_f64(*v0); } void VcvtdU64F64(uint64_t* r, float64_t* v0) { *r = vcvtd_u64_f64(*v0); } void VcvtmS32F32(int32x2_t* r, float32x2_t* v0) { *r = vcvtm_s32_f32(*v0); } void VcvtmS64F64(int64x1_t* r, float64x1_t* v0) { *r = vcvtm_s64_f64(*v0); } void VcvtmU32F32(uint32x2_t* r, float32x2_t* v0) { *r = vcvtm_u32_f32(*v0); } void VcvtmU64F64(uint64x1_t* r, float64x1_t* v0) { *r = vcvtm_u64_f64(*v0); } void VcvtmdS64F64(int64_t* r, float64_t* v0) { *r = vcvtmd_s64_f64(*v0); } void VcvtmdU64F64(uint64_t* r, float64_t* v0) { *r = vcvtmd_u64_f64(*v0); } void VcvtmqS32F32(int32x4_t* r, float32x4_t* v0) { *r = vcvtmq_s32_f32(*v0); } void VcvtmqS64F64(int64x2_t* r, float64x2_t* v0) { *r = vcvtmq_s64_f64(*v0); } void VcvtmqU32F32(uint32x4_t* r, float32x4_t* v0) { *r = vcvtmq_u32_f32(*v0); } void VcvtmqU64F64(uint64x2_t* r, float64x2_t* v0) { *r = vcvtmq_u64_f64(*v0); } void VcvtmsS32F32(int32_t* r, float32_t* v0) { *r = vcvtms_s32_f32(*v0); } void VcvtmsU32F32(uint32_t* r, float32_t* v0) { *r = 
vcvtms_u32_f32(*v0); } void VcvtnS32F32(int32x2_t* r, float32x2_t* v0) { *r = vcvtn_s32_f32(*v0); } void VcvtnS64F64(int64x1_t* r, float64x1_t* v0) { *r = vcvtn_s64_f64(*v0); } void VcvtnU32F32(uint32x2_t* r, float32x2_t* v0) { *r = vcvtn_u32_f32(*v0); } void VcvtnU64F64(uint64x1_t* r, float64x1_t* v0) { *r = vcvtn_u64_f64(*v0); } void VcvtndS64F64(int64_t* r, float64_t* v0) { *r = vcvtnd_s64_f64(*v0); } void VcvtndU64F64(uint64_t* r, float64_t* v0) { *r = vcvtnd_u64_f64(*v0); } void VcvtnqS32F32(int32x4_t* r, float32x4_t* v0) { *r = vcvtnq_s32_f32(*v0); } void VcvtnqS64F64(int64x2_t* r, float64x2_t* v0) { *r = vcvtnq_s64_f64(*v0); } void VcvtnqU32F32(uint32x4_t* r, float32x4_t* v0) { *r = vcvtnq_u32_f32(*v0); } void VcvtnqU64F64(uint64x2_t* r, float64x2_t* v0) { *r = vcvtnq_u64_f64(*v0); } void VcvtnsS32F32(int32_t* r, float32_t* v0) { *r = vcvtns_s32_f32(*v0); } void VcvtnsU32F32(uint32_t* r, float32_t* v0) { *r = vcvtns_u32_f32(*v0); } void VcvtpS32F32(int32x2_t* r, float32x2_t* v0) { *r = vcvtp_s32_f32(*v0); } void VcvtpS64F64(int64x1_t* r, float64x1_t* v0) { *r = vcvtp_s64_f64(*v0); } void VcvtpU32F32(uint32x2_t* r, float32x2_t* v0) { *r = vcvtp_u32_f32(*v0); } void VcvtpU64F64(uint64x1_t* r, float64x1_t* v0) { *r = vcvtp_u64_f64(*v0); } void VcvtpdS64F64(int64_t* r, float64_t* v0) { *r = vcvtpd_s64_f64(*v0); } void VcvtpdU64F64(uint64_t* r, float64_t* v0) { *r = vcvtpd_u64_f64(*v0); } void VcvtpqS32F32(int32x4_t* r, float32x4_t* v0) { *r = vcvtpq_s32_f32(*v0); } void VcvtpqS64F64(int64x2_t* r, float64x2_t* v0) { *r = vcvtpq_s64_f64(*v0); } void VcvtpqU32F32(uint32x4_t* r, float32x4_t* v0) { *r = vcvtpq_u32_f32(*v0); } void VcvtpqU64F64(uint64x2_t* r, float64x2_t* v0) { *r = vcvtpq_u64_f64(*v0); } void VcvtpsS32F32(int32_t* r, float32_t* v0) { *r = vcvtps_s32_f32(*v0); } void VcvtpsU32F32(uint32_t* r, float32_t* v0) { *r = vcvtps_u32_f32(*v0); } void VcvtqF32S32(float32x4_t* r, int32x4_t* v0) { *r = vcvtq_f32_s32(*v0); } void VcvtqF32U32(float32x4_t* r, 
uint32x4_t* v0) { *r = vcvtq_f32_u32(*v0); } void VcvtqF64S64(float64x2_t* r, int64x2_t* v0) { *r = vcvtq_f64_s64(*v0); } void VcvtqF64U64(float64x2_t* r, uint64x2_t* v0) { *r = vcvtq_f64_u64(*v0); } void VcvtqS32F32(int32x4_t* r, float32x4_t* v0) { *r = vcvtq_s32_f32(*v0); } void VcvtqS64F64(int64x2_t* r, float64x2_t* v0) { *r = vcvtq_s64_f64(*v0); } void VcvtqU32F32(uint32x4_t* r, float32x4_t* v0) { *r = vcvtq_u32_f32(*v0); } void VcvtqU64F64(uint64x2_t* r, float64x2_t* v0) { *r = vcvtq_u64_f64(*v0); } void VcvtsF32S32(float32_t* r, int32_t* v0) { *r = vcvts_f32_s32(*v0); } void VcvtsF32U32(float32_t* r, uint32_t* v0) { *r = vcvts_f32_u32(*v0); } void VcvtsS32F32(int32_t* r, float32_t* v0) { *r = vcvts_s32_f32(*v0); } void VcvtsU32F32(uint32_t* r, float32_t* v0) { *r = vcvts_u32_f32(*v0); } void VcvtxF32F64(float32x2_t* r, float64x2_t* v0) { *r = vcvtx_f32_f64(*v0); } void VcvtxHighF32F64(float32x4_t* r, float32x2_t* v0, float64x2_t* v1) { *r = vcvtx_high_f32_f64(*v0, *v1); } void VcvtxdF32F64(float32_t* r, float64_t* v0) { *r = vcvtxd_f32_f64(*v0); } void VdivF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vdiv_f32(*v0, *v1); } void VdivF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vdiv_f64(*v0, *v1); } void VdivqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vdivq_f32(*v0, *v1); } void VdivqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vdivq_f64(*v0, *v1); } void VdotS32(int32x2_t* r, int32x2_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vdot_s32(*v0, *v1, *v2); } void VdotU32(uint32x2_t* r, uint32x2_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vdot_u32(*v0, *v1, *v2); } void VdotqS32(int32x4_t* r, int32x4_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vdotq_s32(*v0, *v1, *v2); } void VdotqU32(uint32x4_t* r, uint32x4_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vdotq_u32(*v0, *v1, *v2); } void VdupNS8(int8x8_t* r, int8_t* v0) { *r = vdup_n_s8(*v0); } void VdupNS16(int16x4_t* r, int16_t* v0) { *r = 
vdup_n_s16(*v0); } void VdupNS32(int32x2_t* r, int32_t* v0) { *r = vdup_n_s32(*v0); } void VdupNS64(int64x1_t* r, int64_t* v0) { *r = vdup_n_s64(*v0); } void VdupNU8(uint8x8_t* r, uint8_t* v0) { *r = vdup_n_u8(*v0); } void VdupNU16(uint16x4_t* r, uint16_t* v0) { *r = vdup_n_u16(*v0); } void VdupNU32(uint32x2_t* r, uint32_t* v0) { *r = vdup_n_u32(*v0); } void VdupNU64(uint64x1_t* r, uint64_t* v0) { *r = vdup_n_u64(*v0); } void VdupNF32(float32x2_t* r, float32_t* v0) { *r = vdup_n_f32(*v0); } void VdupNF64(float64x1_t* r, float64_t* v0) { *r = vdup_n_f64(*v0); } void VdupNP16(poly16x4_t* r, poly16_t* v0) { *r = vdup_n_p16(*v0); } void VdupNP64(poly64x1_t* r, poly64_t* v0) { *r = vdup_n_p64(*v0); } void VdupNP8(poly8x8_t* r, poly8_t* v0) { *r = vdup_n_p8(*v0); } void VdupqNS8(int8x16_t* r, int8_t* v0) { *r = vdupq_n_s8(*v0); } void VdupqNS16(int16x8_t* r, int16_t* v0) { *r = vdupq_n_s16(*v0); } void VdupqNS32(int32x4_t* r, int32_t* v0) { *r = vdupq_n_s32(*v0); } void VdupqNS64(int64x2_t* r, int64_t* v0) { *r = vdupq_n_s64(*v0); } void VdupqNU8(uint8x16_t* r, uint8_t* v0) { *r = vdupq_n_u8(*v0); } void VdupqNU16(uint16x8_t* r, uint16_t* v0) { *r = vdupq_n_u16(*v0); } void VdupqNU32(uint32x4_t* r, uint32_t* v0) { *r = vdupq_n_u32(*v0); } void VdupqNU64(uint64x2_t* r, uint64_t* v0) { *r = vdupq_n_u64(*v0); } void VdupqNF32(float32x4_t* r, float32_t* v0) { *r = vdupq_n_f32(*v0); } void VdupqNF64(float64x2_t* r, float64_t* v0) { *r = vdupq_n_f64(*v0); } void VdupqNP16(poly16x8_t* r, poly16_t* v0) { *r = vdupq_n_p16(*v0); } void VdupqNP64(poly64x2_t* r, poly64_t* v0) { *r = vdupq_n_p64(*v0); } void VdupqNP8(poly8x16_t* r, poly8_t* v0) { *r = vdupq_n_p8(*v0); } void VeorS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = veor_s8(*v0, *v1); } void VeorS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = veor_s16(*v0, *v1); } void VeorS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = veor_s32(*v0, *v1); } void VeorS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = 
veor_s64(*v0, *v1); } void VeorU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = veor_u8(*v0, *v1); } void VeorU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = veor_u16(*v0, *v1); } void VeorU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = veor_u32(*v0, *v1); } void VeorU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = veor_u64(*v0, *v1); } void Veor3QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = veor3q_s8(*v0, *v1, *v2); } void Veor3QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = veor3q_s16(*v0, *v1, *v2); } void Veor3QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = veor3q_s32(*v0, *v1, *v2); } void Veor3QS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = veor3q_s64(*v0, *v1, *v2); } void Veor3QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = veor3q_u8(*v0, *v1, *v2); } void Veor3QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = veor3q_u16(*v0, *v1, *v2); } void Veor3QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = veor3q_u32(*v0, *v1, *v2); } void Veor3QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = veor3q_u64(*v0, *v1, *v2); } void VeorqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = veorq_s8(*v0, *v1); } void VeorqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = veorq_s16(*v0, *v1); } void VeorqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = veorq_s32(*v0, *v1); } void VeorqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = veorq_s64(*v0, *v1); } void VeorqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = veorq_u8(*v0, *v1); } void VeorqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = veorq_u16(*v0, *v1); } void VeorqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = veorq_u32(*v0, *v1); } void VeorqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = veorq_u64(*v0, *v1); } void 
VfmaF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32x2_t* v2) { *r = vfma_f32(*v0, *v1, *v2); } void VfmaF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1, float64x1_t* v2) { *r = vfma_f64(*v0, *v1, *v2); } void VfmaNF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32_t* v2) { *r = vfma_n_f32(*v0, *v1, *v2); } void VfmaNF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1, float64_t* v2) { *r = vfma_n_f64(*v0, *v1, *v2); } void VfmaqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32x4_t* v2) { *r = vfmaq_f32(*v0, *v1, *v2); } void VfmaqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1, float64x2_t* v2) { *r = vfmaq_f64(*v0, *v1, *v2); } void VfmaqNF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32_t* v2) { *r = vfmaq_n_f32(*v0, *v1, *v2); } void VfmaqNF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1, float64_t* v2) { *r = vfmaq_n_f64(*v0, *v1, *v2); } void VfmsF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32x2_t* v2) { *r = vfms_f32(*v0, *v1, *v2); } void VfmsF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1, float64x1_t* v2) { *r = vfms_f64(*v0, *v1, *v2); } void VfmsNF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32_t* v2) { *r = vfms_n_f32(*v0, *v1, *v2); } void VfmsNF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1, float64_t* v2) { *r = vfms_n_f64(*v0, *v1, *v2); } void VfmsqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32x4_t* v2) { *r = vfmsq_f32(*v0, *v1, *v2); } void VfmsqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1, float64x2_t* v2) { *r = vfmsq_f64(*v0, *v1, *v2); } void VfmsqNF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32_t* v2) { *r = vfmsq_n_f32(*v0, *v1, *v2); } void VfmsqNF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1, float64_t* v2) { *r = vfmsq_n_f64(*v0, *v1, *v2); } void VgetHighS8(int8x8_t* r, int8x16_t* v0) { *r = vget_high_s8(*v0); } void VgetHighS16(int16x4_t* r, int16x8_t* v0) { *r = vget_high_s16(*v0); } void 
VgetHighS32(int32x2_t* r, int32x4_t* v0) { *r = vget_high_s32(*v0); } void VgetHighS64(int64x1_t* r, int64x2_t* v0) { *r = vget_high_s64(*v0); } void VgetHighU8(uint8x8_t* r, uint8x16_t* v0) { *r = vget_high_u8(*v0); } void VgetHighU16(uint16x4_t* r, uint16x8_t* v0) { *r = vget_high_u16(*v0); } void VgetHighU32(uint32x2_t* r, uint32x4_t* v0) { *r = vget_high_u32(*v0); } void VgetHighU64(uint64x1_t* r, uint64x2_t* v0) { *r = vget_high_u64(*v0); } void VgetHighF32(float32x2_t* r, float32x4_t* v0) { *r = vget_high_f32(*v0); } void VgetHighF64(float64x1_t* r, float64x2_t* v0) { *r = vget_high_f64(*v0); } void VgetHighP16(poly16x4_t* r, poly16x8_t* v0) { *r = vget_high_p16(*v0); } void VgetHighP64(poly64x1_t* r, poly64x2_t* v0) { *r = vget_high_p64(*v0); } void VgetHighP8(poly8x8_t* r, poly8x16_t* v0) { *r = vget_high_p8(*v0); } void VgetLowS8(int8x8_t* r, int8x16_t* v0) { *r = vget_low_s8(*v0); } void VgetLowS16(int16x4_t* r, int16x8_t* v0) { *r = vget_low_s16(*v0); } void VgetLowS32(int32x2_t* r, int32x4_t* v0) { *r = vget_low_s32(*v0); } void VgetLowS64(int64x1_t* r, int64x2_t* v0) { *r = vget_low_s64(*v0); } void VgetLowU8(uint8x8_t* r, uint8x16_t* v0) { *r = vget_low_u8(*v0); } void VgetLowU16(uint16x4_t* r, uint16x8_t* v0) { *r = vget_low_u16(*v0); } void VgetLowU32(uint32x2_t* r, uint32x4_t* v0) { *r = vget_low_u32(*v0); } void VgetLowU64(uint64x1_t* r, uint64x2_t* v0) { *r = vget_low_u64(*v0); } void VgetLowF32(float32x2_t* r, float32x4_t* v0) { *r = vget_low_f32(*v0); } void VgetLowF64(float64x1_t* r, float64x2_t* v0) { *r = vget_low_f64(*v0); } void VgetLowP16(poly16x4_t* r, poly16x8_t* v0) { *r = vget_low_p16(*v0); } void VgetLowP64(poly64x1_t* r, poly64x2_t* v0) { *r = vget_low_p64(*v0); } void VgetLowP8(poly8x8_t* r, poly8x16_t* v0) { *r = vget_low_p8(*v0); } void VhaddS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vhadd_s8(*v0, *v1); } void VhaddS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vhadd_s16(*v0, *v1); } void VhaddS32(int32x2_t* r, 
int32x2_t* v0, int32x2_t* v1) { *r = vhadd_s32(*v0, *v1); } void VhaddU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vhadd_u8(*v0, *v1); } void VhaddU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vhadd_u16(*v0, *v1); } void VhaddU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vhadd_u32(*v0, *v1); } void VhaddqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vhaddq_s8(*v0, *v1); } void VhaddqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vhaddq_s16(*v0, *v1); } void VhaddqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vhaddq_s32(*v0, *v1); } void VhaddqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vhaddq_u8(*v0, *v1); } void VhaddqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vhaddq_u16(*v0, *v1); } void VhaddqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vhaddq_u32(*v0, *v1); } void VhsubS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vhsub_s8(*v0, *v1); } void VhsubS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vhsub_s16(*v0, *v1); } void VhsubS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vhsub_s32(*v0, *v1); } void VhsubU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vhsub_u8(*v0, *v1); } void VhsubU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vhsub_u16(*v0, *v1); } void VhsubU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vhsub_u32(*v0, *v1); } void VhsubqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vhsubq_s8(*v0, *v1); } void VhsubqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vhsubq_s16(*v0, *v1); } void VhsubqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vhsubq_s32(*v0, *v1); } void VhsubqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vhsubq_u8(*v0, *v1); } void VhsubqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vhsubq_u16(*v0, *v1); } void VhsubqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vhsubq_u32(*v0, *v1); } void VmaxS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { 
*r = vmax_s8(*v0, *v1); } void VmaxS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vmax_s16(*v0, *v1); } void VmaxS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vmax_s32(*v0, *v1); } void VmaxU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vmax_u8(*v0, *v1); } void VmaxU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vmax_u16(*v0, *v1); } void VmaxU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vmax_u32(*v0, *v1); } void VmaxF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vmax_f32(*v0, *v1); } void VmaxF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vmax_f64(*v0, *v1); } void VmaxnmF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vmaxnm_f32(*v0, *v1); } void VmaxnmF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vmaxnm_f64(*v0, *v1); } void VmaxnmqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vmaxnmq_f32(*v0, *v1); } void VmaxnmqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vmaxnmq_f64(*v0, *v1); } void VmaxnmvF32(float32_t* r, float32x2_t* v0) { *r = vmaxnmv_f32(*v0); } void VmaxnmvqF32(float32_t* r, float32x4_t* v0) { *r = vmaxnmvq_f32(*v0); } void VmaxnmvqF64(float64_t* r, float64x2_t* v0) { *r = vmaxnmvq_f64(*v0); } void VmaxqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vmaxq_s8(*v0, *v1); } void VmaxqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vmaxq_s16(*v0, *v1); } void VmaxqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vmaxq_s32(*v0, *v1); } void VmaxqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vmaxq_u8(*v0, *v1); } void VmaxqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vmaxq_u16(*v0, *v1); } void VmaxqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vmaxq_u32(*v0, *v1); } void VmaxqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vmaxq_f32(*v0, *v1); } void VmaxqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vmaxq_f64(*v0, *v1); } void VmaxvS8(int8_t* 
r, int8x8_t* v0) { *r = vmaxv_s8(*v0); } void VmaxvS16(int16_t* r, int16x4_t* v0) { *r = vmaxv_s16(*v0); } void VmaxvS32(int32_t* r, int32x2_t* v0) { *r = vmaxv_s32(*v0); } void VmaxvU8(uint8_t* r, uint8x8_t* v0) { *r = vmaxv_u8(*v0); } void VmaxvU16(uint16_t* r, uint16x4_t* v0) { *r = vmaxv_u16(*v0); } void VmaxvU32(uint32_t* r, uint32x2_t* v0) { *r = vmaxv_u32(*v0); } void VmaxvF32(float32_t* r, float32x2_t* v0) { *r = vmaxv_f32(*v0); } void VmaxvqS8(int8_t* r, int8x16_t* v0) { *r = vmaxvq_s8(*v0); } void VmaxvqS16(int16_t* r, int16x8_t* v0) { *r = vmaxvq_s16(*v0); } void VmaxvqS32(int32_t* r, int32x4_t* v0) { *r = vmaxvq_s32(*v0); } void VmaxvqU8(uint8_t* r, uint8x16_t* v0) { *r = vmaxvq_u8(*v0); } void VmaxvqU16(uint16_t* r, uint16x8_t* v0) { *r = vmaxvq_u16(*v0); } void VmaxvqU32(uint32_t* r, uint32x4_t* v0) { *r = vmaxvq_u32(*v0); } void VmaxvqF32(float32_t* r, float32x4_t* v0) { *r = vmaxvq_f32(*v0); } void VmaxvqF64(float64_t* r, float64x2_t* v0) { *r = vmaxvq_f64(*v0); } void VminS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vmin_s8(*v0, *v1); } void VminS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vmin_s16(*v0, *v1); } void VminS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vmin_s32(*v0, *v1); } void VminU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vmin_u8(*v0, *v1); } void VminU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vmin_u16(*v0, *v1); } void VminU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vmin_u32(*v0, *v1); } void VminF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vmin_f32(*v0, *v1); } void VminF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vmin_f64(*v0, *v1); } void VminnmF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vminnm_f32(*v0, *v1); } void VminnmF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vminnm_f64(*v0, *v1); } void VminnmqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vminnmq_f32(*v0, *v1); } void 
VminnmqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vminnmq_f64(*v0, *v1); } void VminnmvF32(float32_t* r, float32x2_t* v0) { *r = vminnmv_f32(*v0); } void VminnmvqF32(float32_t* r, float32x4_t* v0) { *r = vminnmvq_f32(*v0); } void VminnmvqF64(float64_t* r, float64x2_t* v0) { *r = vminnmvq_f64(*v0); } void VminqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vminq_s8(*v0, *v1); } void VminqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vminq_s16(*v0, *v1); } void VminqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vminq_s32(*v0, *v1); } void VminqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vminq_u8(*v0, *v1); } void VminqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vminq_u16(*v0, *v1); } void VminqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vminq_u32(*v0, *v1); } void VminqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vminq_f32(*v0, *v1); } void VminqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vminq_f64(*v0, *v1); } void VminvS8(int8_t* r, int8x8_t* v0) { *r = vminv_s8(*v0); } void VminvS16(int16_t* r, int16x4_t* v0) { *r = vminv_s16(*v0); } void VminvS32(int32_t* r, int32x2_t* v0) { *r = vminv_s32(*v0); } void VminvU8(uint8_t* r, uint8x8_t* v0) { *r = vminv_u8(*v0); } void VminvU16(uint16_t* r, uint16x4_t* v0) { *r = vminv_u16(*v0); } void VminvU32(uint32_t* r, uint32x2_t* v0) { *r = vminv_u32(*v0); } void VminvF32(float32_t* r, float32x2_t* v0) { *r = vminv_f32(*v0); } void VminvqS8(int8_t* r, int8x16_t* v0) { *r = vminvq_s8(*v0); } void VminvqS16(int16_t* r, int16x8_t* v0) { *r = vminvq_s16(*v0); } void VminvqS32(int32_t* r, int32x4_t* v0) { *r = vminvq_s32(*v0); } void VminvqU8(uint8_t* r, uint8x16_t* v0) { *r = vminvq_u8(*v0); } void VminvqU16(uint16_t* r, uint16x8_t* v0) { *r = vminvq_u16(*v0); } void VminvqU32(uint32_t* r, uint32x4_t* v0) { *r = vminvq_u32(*v0); } void VminvqF32(float32_t* r, float32x4_t* v0) { *r = vminvq_f32(*v0); } void 
VminvqF64(float64_t* r, float64x2_t* v0) { *r = vminvq_f64(*v0); } void VmlaS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vmla_s8(*v0, *v1, *v2); } void VmlaS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vmla_s16(*v0, *v1, *v2); } void VmlaS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vmla_s32(*v0, *v1, *v2); } void VmlaU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vmla_u8(*v0, *v1, *v2); } void VmlaU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = vmla_u16(*v0, *v1, *v2); } void VmlaU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vmla_u32(*v0, *v1, *v2); } void VmlaF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32x2_t* v2) { *r = vmla_f32(*v0, *v1, *v2); } void VmlaF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1, float64x1_t* v2) { *r = vmla_f64(*v0, *v1, *v2); } void VmlaNS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16_t* v2) { *r = vmla_n_s16(*v0, *v1, *v2); } void VmlaNS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32_t* v2) { *r = vmla_n_s32(*v0, *v1, *v2); } void VmlaNU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1, uint16_t* v2) { *r = vmla_n_u16(*v0, *v1, *v2); } void VmlaNU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1, uint32_t* v2) { *r = vmla_n_u32(*v0, *v1, *v2); } void VmlaNF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32_t* v2) { *r = vmla_n_f32(*v0, *v1, *v2); } void VmlalS8(int16x8_t* r, int16x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vmlal_s8(*v0, *v1, *v2); } void VmlalS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vmlal_s16(*v0, *v1, *v2); } void VmlalS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vmlal_s32(*v0, *v1, *v2); } void VmlalU8(uint16x8_t* r, uint16x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vmlal_u8(*v0, *v1, *v2); } void VmlalU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = 
vmlal_u16(*v0, *v1, *v2); } void VmlalU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vmlal_u32(*v0, *v1, *v2); } void VmlalHighS8(int16x8_t* r, int16x8_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vmlal_high_s8(*v0, *v1, *v2); } void VmlalHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vmlal_high_s16(*v0, *v1, *v2); } void VmlalHighS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vmlal_high_s32(*v0, *v1, *v2); } void VmlalHighU8(uint16x8_t* r, uint16x8_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vmlal_high_u8(*v0, *v1, *v2); } void VmlalHighU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vmlal_high_u16(*v0, *v1, *v2); } void VmlalHighU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vmlal_high_u32(*v0, *v1, *v2); } void VmlalHighNS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16_t* v2) { *r = vmlal_high_n_s16(*v0, *v1, *v2); } void VmlalHighNS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32_t* v2) { *r = vmlal_high_n_s32(*v0, *v1, *v2); } void VmlalHighNU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1, uint16_t* v2) { *r = vmlal_high_n_u16(*v0, *v1, *v2); } void VmlalHighNU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1, uint32_t* v2) { *r = vmlal_high_n_u32(*v0, *v1, *v2); } void VmlalNS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16_t* v2) { *r = vmlal_n_s16(*v0, *v1, *v2); } void VmlalNS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32_t* v2) { *r = vmlal_n_s32(*v0, *v1, *v2); } void VmlalNU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1, uint16_t* v2) { *r = vmlal_n_u16(*v0, *v1, *v2); } void VmlalNU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1, uint32_t* v2) { *r = vmlal_n_u32(*v0, *v1, *v2); } void VmlaqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vmlaq_s8(*v0, *v1, *v2); } void VmlaqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vmlaq_s16(*v0, *v1, *v2); } void 
VmlaqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vmlaq_s32(*v0, *v1, *v2); } void VmlaqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vmlaq_u8(*v0, *v1, *v2); } void VmlaqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vmlaq_u16(*v0, *v1, *v2); } void VmlaqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vmlaq_u32(*v0, *v1, *v2); } void VmlaqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32x4_t* v2) { *r = vmlaq_f32(*v0, *v1, *v2); } void VmlaqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1, float64x2_t* v2) { *r = vmlaq_f64(*v0, *v1, *v2); } void VmlaqNS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16_t* v2) { *r = vmlaq_n_s16(*v0, *v1, *v2); } void VmlaqNS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32_t* v2) { *r = vmlaq_n_s32(*v0, *v1, *v2); } void VmlaqNU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16_t* v2) { *r = vmlaq_n_u16(*v0, *v1, *v2); } void VmlaqNU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32_t* v2) { *r = vmlaq_n_u32(*v0, *v1, *v2); } void VmlaqNF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32_t* v2) { *r = vmlaq_n_f32(*v0, *v1, *v2); } void VmlsS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vmls_s8(*v0, *v1, *v2); } void VmlsS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vmls_s16(*v0, *v1, *v2); } void VmlsS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vmls_s32(*v0, *v1, *v2); } void VmlsU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vmls_u8(*v0, *v1, *v2); } void VmlsU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = vmls_u16(*v0, *v1, *v2); } void VmlsU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vmls_u32(*v0, *v1, *v2); } void VmlsF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32x2_t* v2) { *r = vmls_f32(*v0, *v1, *v2); } void 
VmlsF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1, float64x1_t* v2) { *r = vmls_f64(*v0, *v1, *v2); } void VmlsNS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16_t* v2) { *r = vmls_n_s16(*v0, *v1, *v2); } void VmlsNS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32_t* v2) { *r = vmls_n_s32(*v0, *v1, *v2); } void VmlsNU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1, uint16_t* v2) { *r = vmls_n_u16(*v0, *v1, *v2); } void VmlsNU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1, uint32_t* v2) { *r = vmls_n_u32(*v0, *v1, *v2); } void VmlsNF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1, float32_t* v2) { *r = vmls_n_f32(*v0, *v1, *v2); } void VmlslS8(int16x8_t* r, int16x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vmlsl_s8(*v0, *v1, *v2); } void VmlslS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vmlsl_s16(*v0, *v1, *v2); } void VmlslS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vmlsl_s32(*v0, *v1, *v2); } void VmlslU8(uint16x8_t* r, uint16x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vmlsl_u8(*v0, *v1, *v2); } void VmlslU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1, uint16x4_t* v2) { *r = vmlsl_u16(*v0, *v1, *v2); } void VmlslU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1, uint32x2_t* v2) { *r = vmlsl_u32(*v0, *v1, *v2); } void VmlslHighS8(int16x8_t* r, int16x8_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vmlsl_high_s8(*v0, *v1, *v2); } void VmlslHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vmlsl_high_s16(*v0, *v1, *v2); } void VmlslHighS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vmlsl_high_s32(*v0, *v1, *v2); } void VmlslHighU8(uint16x8_t* r, uint16x8_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vmlsl_high_u8(*v0, *v1, *v2); } void VmlslHighU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vmlsl_high_u16(*v0, *v1, *v2); } void VmlslHighU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = 
vmlsl_high_u32(*v0, *v1, *v2); } void VmlslHighNS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16_t* v2) { *r = vmlsl_high_n_s16(*v0, *v1, *v2); } void VmlslHighNS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32_t* v2) { *r = vmlsl_high_n_s32(*v0, *v1, *v2); } void VmlslHighNU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1, uint16_t* v2) { *r = vmlsl_high_n_u16(*v0, *v1, *v2); } void VmlslHighNU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1, uint32_t* v2) { *r = vmlsl_high_n_u32(*v0, *v1, *v2); } void VmlslNS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16_t* v2) { *r = vmlsl_n_s16(*v0, *v1, *v2); } void VmlslNS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32_t* v2) { *r = vmlsl_n_s32(*v0, *v1, *v2); } void VmlslNU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1, uint16_t* v2) { *r = vmlsl_n_u16(*v0, *v1, *v2); } void VmlslNU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1, uint32_t* v2) { *r = vmlsl_n_u32(*v0, *v1, *v2); } void VmlsqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vmlsq_s8(*v0, *v1, *v2); } void VmlsqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vmlsq_s16(*v0, *v1, *v2); } void VmlsqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vmlsq_s32(*v0, *v1, *v2); } void VmlsqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vmlsq_u8(*v0, *v1, *v2); } void VmlsqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vmlsq_u16(*v0, *v1, *v2); } void VmlsqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vmlsq_u32(*v0, *v1, *v2); } void VmlsqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32x4_t* v2) { *r = vmlsq_f32(*v0, *v1, *v2); } void VmlsqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1, float64x2_t* v2) { *r = vmlsq_f64(*v0, *v1, *v2); } void VmlsqNS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16_t* v2) { *r = vmlsq_n_s16(*v0, *v1, *v2); } void VmlsqNS32(int32x4_t* r, int32x4_t* v0, 
int32x4_t* v1, int32_t* v2) { *r = vmlsq_n_s32(*v0, *v1, *v2); } void VmlsqNU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1, uint16_t* v2) { *r = vmlsq_n_u16(*v0, *v1, *v2); } void VmlsqNU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32_t* v2) { *r = vmlsq_n_u32(*v0, *v1, *v2); } void VmlsqNF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1, float32_t* v2) { *r = vmlsq_n_f32(*v0, *v1, *v2); } void VmmlaqS32(int32x4_t* r, int32x4_t* v0, int8x16_t* v1, int8x16_t* v2) { *r = vmmlaq_s32(*v0, *v1, *v2); } void VmmlaqU32(uint32x4_t* r, uint32x4_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vmmlaq_u32(*v0, *v1, *v2); } void VmovNS8(int8x8_t* r, int8_t* v0) { *r = vmov_n_s8(*v0); } void VmovNS16(int16x4_t* r, int16_t* v0) { *r = vmov_n_s16(*v0); } void VmovNS32(int32x2_t* r, int32_t* v0) { *r = vmov_n_s32(*v0); } void VmovNS64(int64x1_t* r, int64_t* v0) { *r = vmov_n_s64(*v0); } void VmovNU8(uint8x8_t* r, uint8_t* v0) { *r = vmov_n_u8(*v0); } void VmovNU16(uint16x4_t* r, uint16_t* v0) { *r = vmov_n_u16(*v0); } void VmovNU32(uint32x2_t* r, uint32_t* v0) { *r = vmov_n_u32(*v0); } void VmovNU64(uint64x1_t* r, uint64_t* v0) { *r = vmov_n_u64(*v0); } void VmovNF32(float32x2_t* r, float32_t* v0) { *r = vmov_n_f32(*v0); } void VmovNF64(float64x1_t* r, float64_t* v0) { *r = vmov_n_f64(*v0); } void VmovNP16(poly16x4_t* r, poly16_t* v0) { *r = vmov_n_p16(*v0); } void VmovNP64(poly64x1_t* r, poly64_t* v0) { *r = vmov_n_p64(*v0); } void VmovNP8(poly8x8_t* r, poly8_t* v0) { *r = vmov_n_p8(*v0); } void VmovlS8(int16x8_t* r, int8x8_t* v0) { *r = vmovl_s8(*v0); } void VmovlS16(int32x4_t* r, int16x4_t* v0) { *r = vmovl_s16(*v0); } void VmovlS32(int64x2_t* r, int32x2_t* v0) { *r = vmovl_s32(*v0); } void VmovlU8(uint16x8_t* r, uint8x8_t* v0) { *r = vmovl_u8(*v0); } void VmovlU16(uint32x4_t* r, uint16x4_t* v0) { *r = vmovl_u16(*v0); } void VmovlU32(uint64x2_t* r, uint32x2_t* v0) { *r = vmovl_u32(*v0); } void VmovlHighS8(int16x8_t* r, int8x16_t* v0) { *r = vmovl_high_s8(*v0); } 
void VmovlHighS16(int32x4_t* r, int16x8_t* v0) { *r = vmovl_high_s16(*v0); } void VmovlHighS32(int64x2_t* r, int32x4_t* v0) { *r = vmovl_high_s32(*v0); } void VmovlHighU8(uint16x8_t* r, uint8x16_t* v0) { *r = vmovl_high_u8(*v0); } void VmovlHighU16(uint32x4_t* r, uint16x8_t* v0) { *r = vmovl_high_u16(*v0); } void VmovlHighU32(uint64x2_t* r, uint32x4_t* v0) { *r = vmovl_high_u32(*v0); } void VmovnS16(int8x8_t* r, int16x8_t* v0) { *r = vmovn_s16(*v0); } void VmovnS32(int16x4_t* r, int32x4_t* v0) { *r = vmovn_s32(*v0); } void VmovnS64(int32x2_t* r, int64x2_t* v0) { *r = vmovn_s64(*v0); } void VmovnU16(uint8x8_t* r, uint16x8_t* v0) { *r = vmovn_u16(*v0); } void VmovnU32(uint16x4_t* r, uint32x4_t* v0) { *r = vmovn_u32(*v0); } void VmovnU64(uint32x2_t* r, uint64x2_t* v0) { *r = vmovn_u64(*v0); } void VmovnHighS16(int8x16_t* r, int8x8_t* v0, int16x8_t* v1) { *r = vmovn_high_s16(*v0, *v1); } void VmovnHighS32(int16x8_t* r, int16x4_t* v0, int32x4_t* v1) { *r = vmovn_high_s32(*v0, *v1); } void VmovnHighS64(int32x4_t* r, int32x2_t* v0, int64x2_t* v1) { *r = vmovn_high_s64(*v0, *v1); } void VmovnHighU16(uint8x16_t* r, uint8x8_t* v0, uint16x8_t* v1) { *r = vmovn_high_u16(*v0, *v1); } void VmovnHighU32(uint16x8_t* r, uint16x4_t* v0, uint32x4_t* v1) { *r = vmovn_high_u32(*v0, *v1); } void VmovnHighU64(uint32x4_t* r, uint32x2_t* v0, uint64x2_t* v1) { *r = vmovn_high_u64(*v0, *v1); } void VmovqNS8(int8x16_t* r, int8_t* v0) { *r = vmovq_n_s8(*v0); } void VmovqNS16(int16x8_t* r, int16_t* v0) { *r = vmovq_n_s16(*v0); } void VmovqNS32(int32x4_t* r, int32_t* v0) { *r = vmovq_n_s32(*v0); } void VmovqNS64(int64x2_t* r, int64_t* v0) { *r = vmovq_n_s64(*v0); } void VmovqNU8(uint8x16_t* r, uint8_t* v0) { *r = vmovq_n_u8(*v0); } void VmovqNU16(uint16x8_t* r, uint16_t* v0) { *r = vmovq_n_u16(*v0); } void VmovqNU32(uint32x4_t* r, uint32_t* v0) { *r = vmovq_n_u32(*v0); } void VmovqNU64(uint64x2_t* r, uint64_t* v0) { *r = vmovq_n_u64(*v0); } void VmovqNF32(float32x4_t* r, float32_t* v0) { *r = 
vmovq_n_f32(*v0); } void VmovqNF64(float64x2_t* r, float64_t* v0) { *r = vmovq_n_f64(*v0); } void VmovqNP16(poly16x8_t* r, poly16_t* v0) { *r = vmovq_n_p16(*v0); } void VmovqNP64(poly64x2_t* r, poly64_t* v0) { *r = vmovq_n_p64(*v0); } void VmovqNP8(poly8x16_t* r, poly8_t* v0) { *r = vmovq_n_p8(*v0); } void VmulS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vmul_s8(*v0, *v1); } void VmulS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vmul_s16(*v0, *v1); } void VmulS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vmul_s32(*v0, *v1); } void VmulU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vmul_u8(*v0, *v1); } void VmulU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vmul_u16(*v0, *v1); } void VmulU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vmul_u32(*v0, *v1); } void VmulF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vmul_f32(*v0, *v1); } void VmulF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vmul_f64(*v0, *v1); } void VmulNS16(int16x4_t* r, int16x4_t* v0, int16_t* v1) { *r = vmul_n_s16(*v0, *v1); } void VmulNS32(int32x2_t* r, int32x2_t* v0, int32_t* v1) { *r = vmul_n_s32(*v0, *v1); } void VmulNU16(uint16x4_t* r, uint16x4_t* v0, uint16_t* v1) { *r = vmul_n_u16(*v0, *v1); } void VmulNU32(uint32x2_t* r, uint32x2_t* v0, uint32_t* v1) { *r = vmul_n_u32(*v0, *v1); } void VmulNF32(float32x2_t* r, float32x2_t* v0, float32_t* v1) { *r = vmul_n_f32(*v0, *v1); } void VmulNF64(float64x1_t* r, float64x1_t* v0, float64_t* v1) { *r = vmul_n_f64(*v0, *v1); } void VmulP8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vmul_p8(*v0, *v1); } void VmullS8(int16x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vmull_s8(*v0, *v1); } void VmullS16(int32x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vmull_s16(*v0, *v1); } void VmullS32(int64x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vmull_s32(*v0, *v1); } void VmullU8(uint16x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vmull_u8(*v0, *v1); } void VmullU16(uint32x4_t* 
r, uint16x4_t* v0, uint16x4_t* v1) { *r = vmull_u16(*v0, *v1); } void VmullU32(uint64x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vmull_u32(*v0, *v1); } void VmullHighS8(int16x8_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vmull_high_s8(*v0, *v1); } void VmullHighS16(int32x4_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vmull_high_s16(*v0, *v1); } void VmullHighS32(int64x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vmull_high_s32(*v0, *v1); } void VmullHighU8(uint16x8_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vmull_high_u8(*v0, *v1); } void VmullHighU16(uint32x4_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vmull_high_u16(*v0, *v1); } void VmullHighU32(uint64x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vmull_high_u32(*v0, *v1); } void VmullHighNS16(int32x4_t* r, int16x8_t* v0, int16_t* v1) { *r = vmull_high_n_s16(*v0, *v1); } void VmullHighNS32(int64x2_t* r, int32x4_t* v0, int32_t* v1) { *r = vmull_high_n_s32(*v0, *v1); } void VmullHighNU16(uint32x4_t* r, uint16x8_t* v0, uint16_t* v1) { *r = vmull_high_n_u16(*v0, *v1); } void VmullHighNU32(uint64x2_t* r, uint32x4_t* v0, uint32_t* v1) { *r = vmull_high_n_u32(*v0, *v1); } void VmullHighP64(poly128_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vmull_high_p64(*v0, *v1); } void VmullHighP8(poly16x8_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vmull_high_p8(*v0, *v1); } void VmullNS16(int32x4_t* r, int16x4_t* v0, int16_t* v1) { *r = vmull_n_s16(*v0, *v1); } void VmullNS32(int64x2_t* r, int32x2_t* v0, int32_t* v1) { *r = vmull_n_s32(*v0, *v1); } void VmullNU16(uint32x4_t* r, uint16x4_t* v0, uint16_t* v1) { *r = vmull_n_u16(*v0, *v1); } void VmullNU32(uint64x2_t* r, uint32x2_t* v0, uint32_t* v1) { *r = vmull_n_u32(*v0, *v1); } void VmullP64(poly128_t* r, poly64_t* v0, poly64_t* v1) { *r = vmull_p64(*v0, *v1); } void VmullP8(poly16x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vmull_p8(*v0, *v1); } void VmulqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vmulq_s8(*v0, *v1); } void VmulqS16(int16x8_t* r, int16x8_t* 
v0, int16x8_t* v1) { *r = vmulq_s16(*v0, *v1); } void VmulqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vmulq_s32(*v0, *v1); } void VmulqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vmulq_u8(*v0, *v1); } void VmulqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vmulq_u16(*v0, *v1); } void VmulqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vmulq_u32(*v0, *v1); } void VmulqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vmulq_f32(*v0, *v1); } void VmulqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vmulq_f64(*v0, *v1); } void VmulqNS16(int16x8_t* r, int16x8_t* v0, int16_t* v1) { *r = vmulq_n_s16(*v0, *v1); } void VmulqNS32(int32x4_t* r, int32x4_t* v0, int32_t* v1) { *r = vmulq_n_s32(*v0, *v1); } void VmulqNU16(uint16x8_t* r, uint16x8_t* v0, uint16_t* v1) { *r = vmulq_n_u16(*v0, *v1); } void VmulqNU32(uint32x4_t* r, uint32x4_t* v0, uint32_t* v1) { *r = vmulq_n_u32(*v0, *v1); } void VmulqNF32(float32x4_t* r, float32x4_t* v0, float32_t* v1) { *r = vmulq_n_f32(*v0, *v1); } void VmulqNF64(float64x2_t* r, float64x2_t* v0, float64_t* v1) { *r = vmulq_n_f64(*v0, *v1); } void VmulqP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vmulq_p8(*v0, *v1); } void VmulxF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vmulx_f32(*v0, *v1); } void VmulxF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vmulx_f64(*v0, *v1); } void VmulxdF64(float64_t* r, float64_t* v0, float64_t* v1) { *r = vmulxd_f64(*v0, *v1); } void VmulxqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vmulxq_f32(*v0, *v1); } void VmulxqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vmulxq_f64(*v0, *v1); } void VmulxsF32(float32_t* r, float32_t* v0, float32_t* v1) { *r = vmulxs_f32(*v0, *v1); } void VmvnS8(int8x8_t* r, int8x8_t* v0) { *r = vmvn_s8(*v0); } void VmvnS16(int16x4_t* r, int16x4_t* v0) { *r = vmvn_s16(*v0); } void VmvnS32(int32x2_t* r, int32x2_t* v0) { *r = vmvn_s32(*v0); } void 
VmvnU8(uint8x8_t* r, uint8x8_t* v0) { *r = vmvn_u8(*v0); } void VmvnU16(uint16x4_t* r, uint16x4_t* v0) { *r = vmvn_u16(*v0); } void VmvnU32(uint32x2_t* r, uint32x2_t* v0) { *r = vmvn_u32(*v0); } void VmvnP8(poly8x8_t* r, poly8x8_t* v0) { *r = vmvn_p8(*v0); } void VmvnqS8(int8x16_t* r, int8x16_t* v0) { *r = vmvnq_s8(*v0); } void VmvnqS16(int16x8_t* r, int16x8_t* v0) { *r = vmvnq_s16(*v0); } void VmvnqS32(int32x4_t* r, int32x4_t* v0) { *r = vmvnq_s32(*v0); } void VmvnqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vmvnq_u8(*v0); } void VmvnqU16(uint16x8_t* r, uint16x8_t* v0) { *r = vmvnq_u16(*v0); } void VmvnqU32(uint32x4_t* r, uint32x4_t* v0) { *r = vmvnq_u32(*v0); } void VmvnqP8(poly8x16_t* r, poly8x16_t* v0) { *r = vmvnq_p8(*v0); } void VnegS8(int8x8_t* r, int8x8_t* v0) { *r = vneg_s8(*v0); } void VnegS16(int16x4_t* r, int16x4_t* v0) { *r = vneg_s16(*v0); } void VnegS32(int32x2_t* r, int32x2_t* v0) { *r = vneg_s32(*v0); } void VnegS64(int64x1_t* r, int64x1_t* v0) { *r = vneg_s64(*v0); } void VnegF32(float32x2_t* r, float32x2_t* v0) { *r = vneg_f32(*v0); } void VnegF64(float64x1_t* r, float64x1_t* v0) { *r = vneg_f64(*v0); } void VnegdS64(int64_t* r, int64_t* v0) { *r = vnegd_s64(*v0); } void VnegqS8(int8x16_t* r, int8x16_t* v0) { *r = vnegq_s8(*v0); } void VnegqS16(int16x8_t* r, int16x8_t* v0) { *r = vnegq_s16(*v0); } void VnegqS32(int32x4_t* r, int32x4_t* v0) { *r = vnegq_s32(*v0); } void VnegqS64(int64x2_t* r, int64x2_t* v0) { *r = vnegq_s64(*v0); } void VnegqF32(float32x4_t* r, float32x4_t* v0) { *r = vnegq_f32(*v0); } void VnegqF64(float64x2_t* r, float64x2_t* v0) { *r = vnegq_f64(*v0); } void VornS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vorn_s8(*v0, *v1); } void VornS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vorn_s16(*v0, *v1); } void VornS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vorn_s32(*v0, *v1); } void VornS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vorn_s64(*v0, *v1); } void VornU8(uint8x8_t* r, uint8x8_t* v0, 
uint8x8_t* v1) { *r = vorn_u8(*v0, *v1); } void VornU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vorn_u16(*v0, *v1); } void VornU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vorn_u32(*v0, *v1); } void VornU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vorn_u64(*v0, *v1); } void VornqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vornq_s8(*v0, *v1); } void VornqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vornq_s16(*v0, *v1); } void VornqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vornq_s32(*v0, *v1); } void VornqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vornq_s64(*v0, *v1); } void VornqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vornq_u8(*v0, *v1); } void VornqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vornq_u16(*v0, *v1); } void VornqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vornq_u32(*v0, *v1); } void VornqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vornq_u64(*v0, *v1); } void VorrS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vorr_s8(*v0, *v1); } void VorrS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vorr_s16(*v0, *v1); } void VorrS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vorr_s32(*v0, *v1); } void VorrS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vorr_s64(*v0, *v1); } void VorrU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vorr_u8(*v0, *v1); } void VorrU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vorr_u16(*v0, *v1); } void VorrU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vorr_u32(*v0, *v1); } void VorrU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vorr_u64(*v0, *v1); } void VorrqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vorrq_s8(*v0, *v1); } void VorrqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vorrq_s16(*v0, *v1); } void VorrqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vorrq_s32(*v0, *v1); } void VorrqS64(int64x2_t* 
r, int64x2_t* v0, int64x2_t* v1) { *r = vorrq_s64(*v0, *v1); } void VorrqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vorrq_u8(*v0, *v1); } void VorrqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vorrq_u16(*v0, *v1); } void VorrqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vorrq_u32(*v0, *v1); } void VorrqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vorrq_u64(*v0, *v1); } void VpadalS8(int16x4_t* r, int16x4_t* v0, int8x8_t* v1) { *r = vpadal_s8(*v0, *v1); } void VpadalS16(int32x2_t* r, int32x2_t* v0, int16x4_t* v1) { *r = vpadal_s16(*v0, *v1); } void VpadalS32(int64x1_t* r, int64x1_t* v0, int32x2_t* v1) { *r = vpadal_s32(*v0, *v1); } void VpadalU8(uint16x4_t* r, uint16x4_t* v0, uint8x8_t* v1) { *r = vpadal_u8(*v0, *v1); } void VpadalU16(uint32x2_t* r, uint32x2_t* v0, uint16x4_t* v1) { *r = vpadal_u16(*v0, *v1); } void VpadalU32(uint64x1_t* r, uint64x1_t* v0, uint32x2_t* v1) { *r = vpadal_u32(*v0, *v1); } void VpadalqS8(int16x8_t* r, int16x8_t* v0, int8x16_t* v1) { *r = vpadalq_s8(*v0, *v1); } void VpadalqS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1) { *r = vpadalq_s16(*v0, *v1); } void VpadalqS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1) { *r = vpadalq_s32(*v0, *v1); } void VpadalqU8(uint16x8_t* r, uint16x8_t* v0, uint8x16_t* v1) { *r = vpadalq_u8(*v0, *v1); } void VpadalqU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1) { *r = vpadalq_u16(*v0, *v1); } void VpadalqU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1) { *r = vpadalq_u32(*v0, *v1); } void VpaddS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vpadd_s8(*v0, *v1); } void VpaddS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vpadd_s16(*v0, *v1); } void VpaddS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vpadd_s32(*v0, *v1); } void VpaddU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vpadd_u8(*v0, *v1); } void VpaddU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vpadd_u16(*v0, *v1); } void VpaddU32(uint32x2_t* r, 
uint32x2_t* v0, uint32x2_t* v1) { *r = vpadd_u32(*v0, *v1); } void VpaddF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vpadd_f32(*v0, *v1); } void VpadddS64(int64_t* r, int64x2_t* v0) { *r = vpaddd_s64(*v0); } void VpadddU64(uint64_t* r, uint64x2_t* v0) { *r = vpaddd_u64(*v0); } void VpadddF64(float64_t* r, float64x2_t* v0) { *r = vpaddd_f64(*v0); } void VpaddlS8(int16x4_t* r, int8x8_t* v0) { *r = vpaddl_s8(*v0); } void VpaddlS16(int32x2_t* r, int16x4_t* v0) { *r = vpaddl_s16(*v0); } void VpaddlS32(int64x1_t* r, int32x2_t* v0) { *r = vpaddl_s32(*v0); } void VpaddlU8(uint16x4_t* r, uint8x8_t* v0) { *r = vpaddl_u8(*v0); } void VpaddlU16(uint32x2_t* r, uint16x4_t* v0) { *r = vpaddl_u16(*v0); } void VpaddlU32(uint64x1_t* r, uint32x2_t* v0) { *r = vpaddl_u32(*v0); } void VpaddlqS8(int16x8_t* r, int8x16_t* v0) { *r = vpaddlq_s8(*v0); } void VpaddlqS16(int32x4_t* r, int16x8_t* v0) { *r = vpaddlq_s16(*v0); } void VpaddlqS32(int64x2_t* r, int32x4_t* v0) { *r = vpaddlq_s32(*v0); } void VpaddlqU8(uint16x8_t* r, uint8x16_t* v0) { *r = vpaddlq_u8(*v0); } void VpaddlqU16(uint32x4_t* r, uint16x8_t* v0) { *r = vpaddlq_u16(*v0); } void VpaddlqU32(uint64x2_t* r, uint32x4_t* v0) { *r = vpaddlq_u32(*v0); } void VpaddqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vpaddq_s8(*v0, *v1); } void VpaddqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vpaddq_s16(*v0, *v1); } void VpaddqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vpaddq_s32(*v0, *v1); } void VpaddqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vpaddq_s64(*v0, *v1); } void VpaddqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vpaddq_u8(*v0, *v1); } void VpaddqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vpaddq_u16(*v0, *v1); } void VpaddqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vpaddq_u32(*v0, *v1); } void VpaddqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vpaddq_u64(*v0, *v1); } void VpaddqF32(float32x4_t* r, float32x4_t* v0, 
float32x4_t* v1) { *r = vpaddq_f32(*v0, *v1); } void VpaddqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vpaddq_f64(*v0, *v1); } void VpaddsF32(float32_t* r, float32x2_t* v0) { *r = vpadds_f32(*v0); } void VpmaxS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vpmax_s8(*v0, *v1); } void VpmaxS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vpmax_s16(*v0, *v1); } void VpmaxS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vpmax_s32(*v0, *v1); } void VpmaxU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vpmax_u8(*v0, *v1); } void VpmaxU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vpmax_u16(*v0, *v1); } void VpmaxU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vpmax_u32(*v0, *v1); } void VpmaxF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vpmax_f32(*v0, *v1); } void VpmaxnmF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vpmaxnm_f32(*v0, *v1); } void VpmaxnmqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vpmaxnmq_f32(*v0, *v1); } void VpmaxnmqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vpmaxnmq_f64(*v0, *v1); } void VpmaxnmqdF64(float64_t* r, float64x2_t* v0) { *r = vpmaxnmqd_f64(*v0); } void VpmaxnmsF32(float32_t* r, float32x2_t* v0) { *r = vpmaxnms_f32(*v0); } void VpmaxqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vpmaxq_s8(*v0, *v1); } void VpmaxqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vpmaxq_s16(*v0, *v1); } void VpmaxqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vpmaxq_s32(*v0, *v1); } void VpmaxqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vpmaxq_u8(*v0, *v1); } void VpmaxqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vpmaxq_u16(*v0, *v1); } void VpmaxqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vpmaxq_u32(*v0, *v1); } void VpmaxqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vpmaxq_f32(*v0, *v1); } void VpmaxqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r 
= vpmaxq_f64(*v0, *v1); } void VpmaxqdF64(float64_t* r, float64x2_t* v0) { *r = vpmaxqd_f64(*v0); } void VpmaxsF32(float32_t* r, float32x2_t* v0) { *r = vpmaxs_f32(*v0); } void VpminS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vpmin_s8(*v0, *v1); } void VpminS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vpmin_s16(*v0, *v1); } void VpminS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vpmin_s32(*v0, *v1); } void VpminU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vpmin_u8(*v0, *v1); } void VpminU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vpmin_u16(*v0, *v1); } void VpminU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vpmin_u32(*v0, *v1); } void VpminF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vpmin_f32(*v0, *v1); } void VpminnmF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vpminnm_f32(*v0, *v1); } void VpminnmqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vpminnmq_f32(*v0, *v1); } void VpminnmqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vpminnmq_f64(*v0, *v1); } void VpminnmqdF64(float64_t* r, float64x2_t* v0) { *r = vpminnmqd_f64(*v0); } void VpminnmsF32(float32_t* r, float32x2_t* v0) { *r = vpminnms_f32(*v0); } void VpminqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vpminq_s8(*v0, *v1); } void VpminqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vpminq_s16(*v0, *v1); } void VpminqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vpminq_s32(*v0, *v1); } void VpminqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vpminq_u8(*v0, *v1); } void VpminqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vpminq_u16(*v0, *v1); } void VpminqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vpminq_u32(*v0, *v1); } void VpminqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vpminq_f32(*v0, *v1); } void VpminqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vpminq_f64(*v0, *v1); } void 
VpminqdF64(float64_t* r, float64x2_t* v0) { *r = vpminqd_f64(*v0); } void VpminsF32(float32_t* r, float32x2_t* v0) { *r = vpmins_f32(*v0); } void VqabsS8(int8x8_t* r, int8x8_t* v0) { *r = vqabs_s8(*v0); } void VqabsS16(int16x4_t* r, int16x4_t* v0) { *r = vqabs_s16(*v0); } void VqabsS32(int32x2_t* r, int32x2_t* v0) { *r = vqabs_s32(*v0); } void VqabsS64(int64x1_t* r, int64x1_t* v0) { *r = vqabs_s64(*v0); } void VqabsbS8(int8_t* r, int8_t* v0) { *r = vqabsb_s8(*v0); } void VqabsdS64(int64_t* r, int64_t* v0) { *r = vqabsd_s64(*v0); } void VqabshS16(int16_t* r, int16_t* v0) { *r = vqabsh_s16(*v0); } void VqabsqS8(int8x16_t* r, int8x16_t* v0) { *r = vqabsq_s8(*v0); } void VqabsqS16(int16x8_t* r, int16x8_t* v0) { *r = vqabsq_s16(*v0); } void VqabsqS32(int32x4_t* r, int32x4_t* v0) { *r = vqabsq_s32(*v0); } void VqabsqS64(int64x2_t* r, int64x2_t* v0) { *r = vqabsq_s64(*v0); } void VqabssS32(int32_t* r, int32_t* v0) { *r = vqabss_s32(*v0); } void VqaddS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vqadd_s8(*v0, *v1); } void VqaddS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqadd_s16(*v0, *v1); } void VqaddS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqadd_s32(*v0, *v1); } void VqaddS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vqadd_s64(*v0, *v1); } void VqaddU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vqadd_u8(*v0, *v1); } void VqaddU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vqadd_u16(*v0, *v1); } void VqaddU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vqadd_u32(*v0, *v1); } void VqaddU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vqadd_u64(*v0, *v1); } void VqaddbS8(int8_t* r, int8_t* v0, int8_t* v1) { *r = vqaddb_s8(*v0, *v1); } void VqaddbU8(uint8_t* r, uint8_t* v0, uint8_t* v1) { *r = vqaddb_u8(*v0, *v1); } void VqadddS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vqaddd_s64(*v0, *v1); } void VqadddU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vqaddd_u64(*v0, *v1); } void 
VqaddhS16(int16_t* r, int16_t* v0, int16_t* v1) { *r = vqaddh_s16(*v0, *v1); } void VqaddhU16(uint16_t* r, uint16_t* v0, uint16_t* v1) { *r = vqaddh_u16(*v0, *v1); } void VqaddqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vqaddq_s8(*v0, *v1); } void VqaddqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vqaddq_s16(*v0, *v1); } void VqaddqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vqaddq_s32(*v0, *v1); } void VqaddqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vqaddq_s64(*v0, *v1); } void VqaddqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vqaddq_u8(*v0, *v1); } void VqaddqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vqaddq_u16(*v0, *v1); } void VqaddqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vqaddq_u32(*v0, *v1); } void VqaddqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vqaddq_u64(*v0, *v1); } void VqaddsS32(int32_t* r, int32_t* v0, int32_t* v1) { *r = vqadds_s32(*v0, *v1); } void VqaddsU32(uint32_t* r, uint32_t* v0, uint32_t* v1) { *r = vqadds_u32(*v0, *v1); } void VqdmlalS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vqdmlal_s16(*v0, *v1, *v2); } void VqdmlalS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vqdmlal_s32(*v0, *v1, *v2); } void VqdmlalHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vqdmlal_high_s16(*v0, *v1, *v2); } void VqdmlalHighS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vqdmlal_high_s32(*v0, *v1, *v2); } void VqdmlalHighNS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16_t* v2) { *r = vqdmlal_high_n_s16(*v0, *v1, *v2); } void VqdmlalHighNS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32_t* v2) { *r = vqdmlal_high_n_s32(*v0, *v1, *v2); } void VqdmlalNS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16_t* v2) { *r = vqdmlal_n_s16(*v0, *v1, *v2); } void VqdmlalNS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32_t* v2) { *r = vqdmlal_n_s32(*v0, *v1, *v2); 
} void VqdmlalhS16(int32_t* r, int32_t* v0, int16_t* v1, int16_t* v2) { *r = vqdmlalh_s16(*v0, *v1, *v2); } void VqdmlalsS32(int64_t* r, int64_t* v0, int32_t* v1, int32_t* v2) { *r = vqdmlals_s32(*v0, *v1, *v2); } void VqdmlslS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vqdmlsl_s16(*v0, *v1, *v2); } void VqdmlslS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vqdmlsl_s32(*v0, *v1, *v2); } void VqdmlslHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vqdmlsl_high_s16(*v0, *v1, *v2); } void VqdmlslHighS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vqdmlsl_high_s32(*v0, *v1, *v2); } void VqdmlslHighNS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1, int16_t* v2) { *r = vqdmlsl_high_n_s16(*v0, *v1, *v2); } void VqdmlslHighNS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1, int32_t* v2) { *r = vqdmlsl_high_n_s32(*v0, *v1, *v2); } void VqdmlslNS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1, int16_t* v2) { *r = vqdmlsl_n_s16(*v0, *v1, *v2); } void VqdmlslNS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1, int32_t* v2) { *r = vqdmlsl_n_s32(*v0, *v1, *v2); } void VqdmlslhS16(int32_t* r, int32_t* v0, int16_t* v1, int16_t* v2) { *r = vqdmlslh_s16(*v0, *v1, *v2); } void VqdmlslsS32(int64_t* r, int64_t* v0, int32_t* v1, int32_t* v2) { *r = vqdmlsls_s32(*v0, *v1, *v2); } void VqdmulhS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqdmulh_s16(*v0, *v1); } void VqdmulhS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqdmulh_s32(*v0, *v1); } void VqdmulhNS16(int16x4_t* r, int16x4_t* v0, int16_t* v1) { *r = vqdmulh_n_s16(*v0, *v1); } void VqdmulhNS32(int32x2_t* r, int32x2_t* v0, int32_t* v1) { *r = vqdmulh_n_s32(*v0, *v1); } void VqdmulhhS16(int16_t* r, int16_t* v0, int16_t* v1) { *r = vqdmulhh_s16(*v0, *v1); } void VqdmulhqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vqdmulhq_s16(*v0, *v1); } void VqdmulhqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vqdmulhq_s32(*v0, 
*v1); } void VqdmulhqNS16(int16x8_t* r, int16x8_t* v0, int16_t* v1) { *r = vqdmulhq_n_s16(*v0, *v1); } void VqdmulhqNS32(int32x4_t* r, int32x4_t* v0, int32_t* v1) { *r = vqdmulhq_n_s32(*v0, *v1); } void VqdmulhsS32(int32_t* r, int32_t* v0, int32_t* v1) { *r = vqdmulhs_s32(*v0, *v1); } void VqdmullS16(int32x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqdmull_s16(*v0, *v1); } void VqdmullS32(int64x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqdmull_s32(*v0, *v1); } void VqdmullHighS16(int32x4_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vqdmull_high_s16(*v0, *v1); } void VqdmullHighS32(int64x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vqdmull_high_s32(*v0, *v1); } void VqdmullHighNS16(int32x4_t* r, int16x8_t* v0, int16_t* v1) { *r = vqdmull_high_n_s16(*v0, *v1); } void VqdmullHighNS32(int64x2_t* r, int32x4_t* v0, int32_t* v1) { *r = vqdmull_high_n_s32(*v0, *v1); } void VqdmullNS16(int32x4_t* r, int16x4_t* v0, int16_t* v1) { *r = vqdmull_n_s16(*v0, *v1); } void VqdmullNS32(int64x2_t* r, int32x2_t* v0, int32_t* v1) { *r = vqdmull_n_s32(*v0, *v1); } void VqdmullhS16(int32_t* r, int16_t* v0, int16_t* v1) { *r = vqdmullh_s16(*v0, *v1); } void VqdmullsS32(int64_t* r, int32_t* v0, int32_t* v1) { *r = vqdmulls_s32(*v0, *v1); } void VqmovnS16(int8x8_t* r, int16x8_t* v0) { *r = vqmovn_s16(*v0); } void VqmovnS32(int16x4_t* r, int32x4_t* v0) { *r = vqmovn_s32(*v0); } void VqmovnS64(int32x2_t* r, int64x2_t* v0) { *r = vqmovn_s64(*v0); } void VqmovnU16(uint8x8_t* r, uint16x8_t* v0) { *r = vqmovn_u16(*v0); } void VqmovnU32(uint16x4_t* r, uint32x4_t* v0) { *r = vqmovn_u32(*v0); } void VqmovnU64(uint32x2_t* r, uint64x2_t* v0) { *r = vqmovn_u64(*v0); } void VqmovnHighS16(int8x16_t* r, int8x8_t* v0, int16x8_t* v1) { *r = vqmovn_high_s16(*v0, *v1); } void VqmovnHighS32(int16x8_t* r, int16x4_t* v0, int32x4_t* v1) { *r = vqmovn_high_s32(*v0, *v1); } void VqmovnHighS64(int32x4_t* r, int32x2_t* v0, int64x2_t* v1) { *r = vqmovn_high_s64(*v0, *v1); } void VqmovnHighU16(uint8x16_t* r, 
uint8x8_t* v0, uint16x8_t* v1) { *r = vqmovn_high_u16(*v0, *v1); } void VqmovnHighU32(uint16x8_t* r, uint16x4_t* v0, uint32x4_t* v1) { *r = vqmovn_high_u32(*v0, *v1); } void VqmovnHighU64(uint32x4_t* r, uint32x2_t* v0, uint64x2_t* v1) { *r = vqmovn_high_u64(*v0, *v1); } void VqmovndS64(int32_t* r, int64_t* v0) { *r = vqmovnd_s64(*v0); } void VqmovndU64(uint32_t* r, uint64_t* v0) { *r = vqmovnd_u64(*v0); } void VqmovnhS16(int8_t* r, int16_t* v0) { *r = vqmovnh_s16(*v0); } void VqmovnhU16(uint8_t* r, uint16_t* v0) { *r = vqmovnh_u16(*v0); } void VqmovnsS32(int16_t* r, int32_t* v0) { *r = vqmovns_s32(*v0); } void VqmovnsU32(uint16_t* r, uint32_t* v0) { *r = vqmovns_u32(*v0); } void VqmovunS16(uint8x8_t* r, int16x8_t* v0) { *r = vqmovun_s16(*v0); } void VqmovunS32(uint16x4_t* r, int32x4_t* v0) { *r = vqmovun_s32(*v0); } void VqmovunS64(uint32x2_t* r, int64x2_t* v0) { *r = vqmovun_s64(*v0); } void VqmovunHighS16(uint8x16_t* r, uint8x8_t* v0, int16x8_t* v1) { *r = vqmovun_high_s16(*v0, *v1); } void VqmovunHighS32(uint16x8_t* r, uint16x4_t* v0, int32x4_t* v1) { *r = vqmovun_high_s32(*v0, *v1); } void VqmovunHighS64(uint32x4_t* r, uint32x2_t* v0, int64x2_t* v1) { *r = vqmovun_high_s64(*v0, *v1); } void VqmovundS64(uint32_t* r, int64_t* v0) { *r = vqmovund_s64(*v0); } void VqmovunhS16(uint8_t* r, int16_t* v0) { *r = vqmovunh_s16(*v0); } void VqmovunsS32(uint16_t* r, int32_t* v0) { *r = vqmovuns_s32(*v0); } void VqnegS8(int8x8_t* r, int8x8_t* v0) { *r = vqneg_s8(*v0); } void VqnegS16(int16x4_t* r, int16x4_t* v0) { *r = vqneg_s16(*v0); } void VqnegS32(int32x2_t* r, int32x2_t* v0) { *r = vqneg_s32(*v0); } void VqnegS64(int64x1_t* r, int64x1_t* v0) { *r = vqneg_s64(*v0); } void VqnegbS8(int8_t* r, int8_t* v0) { *r = vqnegb_s8(*v0); } void VqnegdS64(int64_t* r, int64_t* v0) { *r = vqnegd_s64(*v0); } void VqneghS16(int16_t* r, int16_t* v0) { *r = vqnegh_s16(*v0); } void VqnegqS8(int8x16_t* r, int8x16_t* v0) { *r = vqnegq_s8(*v0); } void VqnegqS16(int16x8_t* r, int16x8_t* v0) { *r 
= vqnegq_s16(*v0); } void VqnegqS32(int32x4_t* r, int32x4_t* v0) { *r = vqnegq_s32(*v0); } void VqnegqS64(int64x2_t* r, int64x2_t* v0) { *r = vqnegq_s64(*v0); } void VqnegsS32(int32_t* r, int32_t* v0) { *r = vqnegs_s32(*v0); } void VqrdmlahS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vqrdmlah_s16(*v0, *v1, *v2); } void VqrdmlahS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vqrdmlah_s32(*v0, *v1, *v2); } void VqrdmlahhS16(int16_t* r, int16_t* v0, int16_t* v1, int16_t* v2) { *r = vqrdmlahh_s16(*v0, *v1, *v2); } void VqrdmlahqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vqrdmlahq_s16(*v0, *v1, *v2); } void VqrdmlahqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vqrdmlahq_s32(*v0, *v1, *v2); } void VqrdmlahsS32(int32_t* r, int32_t* v0, int32_t* v1, int32_t* v2) { *r = vqrdmlahs_s32(*v0, *v1, *v2); } void VqrdmlshS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1, int16x4_t* v2) { *r = vqrdmlsh_s16(*v0, *v1, *v2); } void VqrdmlshS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1, int32x2_t* v2) { *r = vqrdmlsh_s32(*v0, *v1, *v2); } void VqrdmlshhS16(int16_t* r, int16_t* v0, int16_t* v1, int16_t* v2) { *r = vqrdmlshh_s16(*v0, *v1, *v2); } void VqrdmlshqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vqrdmlshq_s16(*v0, *v1, *v2); } void VqrdmlshqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vqrdmlshq_s32(*v0, *v1, *v2); } void VqrdmlshsS32(int32_t* r, int32_t* v0, int32_t* v1, int32_t* v2) { *r = vqrdmlshs_s32(*v0, *v1, *v2); } void VqrdmulhS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqrdmulh_s16(*v0, *v1); } void VqrdmulhS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqrdmulh_s32(*v0, *v1); } void VqrdmulhNS16(int16x4_t* r, int16x4_t* v0, int16_t* v1) { *r = vqrdmulh_n_s16(*v0, *v1); } void VqrdmulhNS32(int32x2_t* r, int32x2_t* v0, int32_t* v1) { *r = vqrdmulh_n_s32(*v0, *v1); } void VqrdmulhhS16(int16_t* r, int16_t* v0, 
int16_t* v1) { *r = vqrdmulhh_s16(*v0, *v1); } void VqrdmulhqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vqrdmulhq_s16(*v0, *v1); } void VqrdmulhqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vqrdmulhq_s32(*v0, *v1); } void VqrdmulhqNS16(int16x8_t* r, int16x8_t* v0, int16_t* v1) { *r = vqrdmulhq_n_s16(*v0, *v1); } void VqrdmulhqNS32(int32x4_t* r, int32x4_t* v0, int32_t* v1) { *r = vqrdmulhq_n_s32(*v0, *v1); } void VqrdmulhsS32(int32_t* r, int32_t* v0, int32_t* v1) { *r = vqrdmulhs_s32(*v0, *v1); } void VqrshlS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vqrshl_s8(*v0, *v1); } void VqrshlS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqrshl_s16(*v0, *v1); } void VqrshlS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqrshl_s32(*v0, *v1); } void VqrshlS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vqrshl_s64(*v0, *v1); } void VqrshlU8(uint8x8_t* r, uint8x8_t* v0, int8x8_t* v1) { *r = vqrshl_u8(*v0, *v1); } void VqrshlU16(uint16x4_t* r, uint16x4_t* v0, int16x4_t* v1) { *r = vqrshl_u16(*v0, *v1); } void VqrshlU32(uint32x2_t* r, uint32x2_t* v0, int32x2_t* v1) { *r = vqrshl_u32(*v0, *v1); } void VqrshlU64(uint64x1_t* r, uint64x1_t* v0, int64x1_t* v1) { *r = vqrshl_u64(*v0, *v1); } void VqrshlbS8(int8_t* r, int8_t* v0, int8_t* v1) { *r = vqrshlb_s8(*v0, *v1); } void VqrshlbU8(uint8_t* r, uint8_t* v0, int8_t* v1) { *r = vqrshlb_u8(*v0, *v1); } void VqrshldS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vqrshld_s64(*v0, *v1); } void VqrshldU64(uint64_t* r, uint64_t* v0, int64_t* v1) { *r = vqrshld_u64(*v0, *v1); } void VqrshlhS16(int16_t* r, int16_t* v0, int16_t* v1) { *r = vqrshlh_s16(*v0, *v1); } void VqrshlhU16(uint16_t* r, uint16_t* v0, int16_t* v1) { *r = vqrshlh_u16(*v0, *v1); } void VqrshlqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vqrshlq_s8(*v0, *v1); } void VqrshlqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vqrshlq_s16(*v0, *v1); } void VqrshlqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r 
= vqrshlq_s32(*v0, *v1); } void VqrshlqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vqrshlq_s64(*v0, *v1); } void VqrshlqU8(uint8x16_t* r, uint8x16_t* v0, int8x16_t* v1) { *r = vqrshlq_u8(*v0, *v1); } void VqrshlqU16(uint16x8_t* r, uint16x8_t* v0, int16x8_t* v1) { *r = vqrshlq_u16(*v0, *v1); } void VqrshlqU32(uint32x4_t* r, uint32x4_t* v0, int32x4_t* v1) { *r = vqrshlq_u32(*v0, *v1); } void VqrshlqU64(uint64x2_t* r, uint64x2_t* v0, int64x2_t* v1) { *r = vqrshlq_u64(*v0, *v1); } void VqrshlsS32(int32_t* r, int32_t* v0, int32_t* v1) { *r = vqrshls_s32(*v0, *v1); } void VqrshlsU32(uint32_t* r, uint32_t* v0, int32_t* v1) { *r = vqrshls_u32(*v0, *v1); } void VqshlS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vqshl_s8(*v0, *v1); } void VqshlS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqshl_s16(*v0, *v1); } void VqshlS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqshl_s32(*v0, *v1); } void VqshlS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vqshl_s64(*v0, *v1); } void VqshlU8(uint8x8_t* r, uint8x8_t* v0, int8x8_t* v1) { *r = vqshl_u8(*v0, *v1); } void VqshlU16(uint16x4_t* r, uint16x4_t* v0, int16x4_t* v1) { *r = vqshl_u16(*v0, *v1); } void VqshlU32(uint32x2_t* r, uint32x2_t* v0, int32x2_t* v1) { *r = vqshl_u32(*v0, *v1); } void VqshlU64(uint64x1_t* r, uint64x1_t* v0, int64x1_t* v1) { *r = vqshl_u64(*v0, *v1); } void VqshlbS8(int8_t* r, int8_t* v0, int8_t* v1) { *r = vqshlb_s8(*v0, *v1); } void VqshlbU8(uint8_t* r, uint8_t* v0, int8_t* v1) { *r = vqshlb_u8(*v0, *v1); } void VqshldS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vqshld_s64(*v0, *v1); } void VqshldU64(uint64_t* r, uint64_t* v0, int64_t* v1) { *r = vqshld_u64(*v0, *v1); } void VqshlhS16(int16_t* r, int16_t* v0, int16_t* v1) { *r = vqshlh_s16(*v0, *v1); } void VqshlhU16(uint16_t* r, uint16_t* v0, int16_t* v1) { *r = vqshlh_u16(*v0, *v1); } void VqshlqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vqshlq_s8(*v0, *v1); } void VqshlqS16(int16x8_t* r, int16x8_t* 
v0, int16x8_t* v1) { *r = vqshlq_s16(*v0, *v1); } void VqshlqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vqshlq_s32(*v0, *v1); } void VqshlqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vqshlq_s64(*v0, *v1); } void VqshlqU8(uint8x16_t* r, uint8x16_t* v0, int8x16_t* v1) { *r = vqshlq_u8(*v0, *v1); } void VqshlqU16(uint16x8_t* r, uint16x8_t* v0, int16x8_t* v1) { *r = vqshlq_u16(*v0, *v1); } void VqshlqU32(uint32x4_t* r, uint32x4_t* v0, int32x4_t* v1) { *r = vqshlq_u32(*v0, *v1); } void VqshlqU64(uint64x2_t* r, uint64x2_t* v0, int64x2_t* v1) { *r = vqshlq_u64(*v0, *v1); } void VqshlsS32(int32_t* r, int32_t* v0, int32_t* v1) { *r = vqshls_s32(*v0, *v1); } void VqshlsU32(uint32_t* r, uint32_t* v0, int32_t* v1) { *r = vqshls_u32(*v0, *v1); } void VqsubS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vqsub_s8(*v0, *v1); } void VqsubS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vqsub_s16(*v0, *v1); } void VqsubS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vqsub_s32(*v0, *v1); } void VqsubS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vqsub_s64(*v0, *v1); } void VqsubU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vqsub_u8(*v0, *v1); } void VqsubU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vqsub_u16(*v0, *v1); } void VqsubU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vqsub_u32(*v0, *v1); } void VqsubU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vqsub_u64(*v0, *v1); } void VqsubbS8(int8_t* r, int8_t* v0, int8_t* v1) { *r = vqsubb_s8(*v0, *v1); } void VqsubbU8(uint8_t* r, uint8_t* v0, uint8_t* v1) { *r = vqsubb_u8(*v0, *v1); } void VqsubdS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vqsubd_s64(*v0, *v1); } void VqsubdU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vqsubd_u64(*v0, *v1); } void VqsubhS16(int16_t* r, int16_t* v0, int16_t* v1) { *r = vqsubh_s16(*v0, *v1); } void VqsubhU16(uint16_t* r, uint16_t* v0, uint16_t* v1) { *r = vqsubh_u16(*v0, *v1); } void VqsubqS8(int8x16_t* 
r, int8x16_t* v0, int8x16_t* v1) { *r = vqsubq_s8(*v0, *v1); } void VqsubqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vqsubq_s16(*v0, *v1); } void VqsubqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vqsubq_s32(*v0, *v1); } void VqsubqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vqsubq_s64(*v0, *v1); } void VqsubqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vqsubq_u8(*v0, *v1); } void VqsubqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vqsubq_u16(*v0, *v1); } void VqsubqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vqsubq_u32(*v0, *v1); } void VqsubqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vqsubq_u64(*v0, *v1); } void VqsubsS32(int32_t* r, int32_t* v0, int32_t* v1) { *r = vqsubs_s32(*v0, *v1); } void VqsubsU32(uint32_t* r, uint32_t* v0, uint32_t* v1) { *r = vqsubs_u32(*v0, *v1); } void Vqtbl1S8(int8x8_t* r, int8x16_t* v0, uint8x8_t* v1) { *r = vqtbl1_s8(*v0, *v1); } void Vqtbl1U8(uint8x8_t* r, uint8x16_t* v0, uint8x8_t* v1) { *r = vqtbl1_u8(*v0, *v1); } void Vqtbl1P8(poly8x8_t* r, poly8x16_t* v0, uint8x8_t* v1) { *r = vqtbl1_p8(*v0, *v1); } void Vqtbl1QS8(int8x16_t* r, int8x16_t* v0, uint8x16_t* v1) { *r = vqtbl1q_s8(*v0, *v1); } void Vqtbl1QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vqtbl1q_u8(*v0, *v1); } void Vqtbl1QP8(poly8x16_t* r, poly8x16_t* v0, uint8x16_t* v1) { *r = vqtbl1q_p8(*v0, *v1); } void Vqtbl2S8(int8x8_t* r, int8x16x2_t* v0, uint8x8_t* v1) { *r = vqtbl2_s8(*v0, *v1); } void Vqtbl2U8(uint8x8_t* r, uint8x16x2_t* v0, uint8x8_t* v1) { *r = vqtbl2_u8(*v0, *v1); } void Vqtbl2P8(poly8x8_t* r, poly8x16x2_t* v0, uint8x8_t* v1) { *r = vqtbl2_p8(*v0, *v1); } void Vqtbl2QS8(int8x16_t* r, int8x16x2_t* v0, uint8x16_t* v1) { *r = vqtbl2q_s8(*v0, *v1); } void Vqtbl2QU8(uint8x16_t* r, uint8x16x2_t* v0, uint8x16_t* v1) { *r = vqtbl2q_u8(*v0, *v1); } void Vqtbl2QP8(poly8x16_t* r, poly8x16x2_t* v0, uint8x16_t* v1) { *r = vqtbl2q_p8(*v0, *v1); } void Vqtbl3S8(int8x8_t* r, 
int8x16x3_t* v0, uint8x8_t* v1) { *r = vqtbl3_s8(*v0, *v1); } void Vqtbl3U8(uint8x8_t* r, uint8x16x3_t* v0, uint8x8_t* v1) { *r = vqtbl3_u8(*v0, *v1); } void Vqtbl3P8(poly8x8_t* r, poly8x16x3_t* v0, uint8x8_t* v1) { *r = vqtbl3_p8(*v0, *v1); } void Vqtbl3QS8(int8x16_t* r, int8x16x3_t* v0, uint8x16_t* v1) { *r = vqtbl3q_s8(*v0, *v1); } void Vqtbl3QU8(uint8x16_t* r, uint8x16x3_t* v0, uint8x16_t* v1) { *r = vqtbl3q_u8(*v0, *v1); } void Vqtbl3QP8(poly8x16_t* r, poly8x16x3_t* v0, uint8x16_t* v1) { *r = vqtbl3q_p8(*v0, *v1); } void Vqtbl4S8(int8x8_t* r, int8x16x4_t* v0, uint8x8_t* v1) { *r = vqtbl4_s8(*v0, *v1); } void Vqtbl4U8(uint8x8_t* r, uint8x16x4_t* v0, uint8x8_t* v1) { *r = vqtbl4_u8(*v0, *v1); } void Vqtbl4P8(poly8x8_t* r, poly8x16x4_t* v0, uint8x8_t* v1) { *r = vqtbl4_p8(*v0, *v1); } void Vqtbl4QS8(int8x16_t* r, int8x16x4_t* v0, uint8x16_t* v1) { *r = vqtbl4q_s8(*v0, *v1); } void Vqtbl4QU8(uint8x16_t* r, uint8x16x4_t* v0, uint8x16_t* v1) { *r = vqtbl4q_u8(*v0, *v1); } void Vqtbl4QP8(poly8x16_t* r, poly8x16x4_t* v0, uint8x16_t* v1) { *r = vqtbl4q_p8(*v0, *v1); } void Vqtbx1S8(int8x8_t* r, int8x8_t* v0, int8x16_t* v1, uint8x8_t* v2) { *r = vqtbx1_s8(*v0, *v1, *v2); } void Vqtbx1U8(uint8x8_t* r, uint8x8_t* v0, uint8x16_t* v1, uint8x8_t* v2) { *r = vqtbx1_u8(*v0, *v1, *v2); } void Vqtbx1P8(poly8x8_t* r, poly8x8_t* v0, poly8x16_t* v1, uint8x8_t* v2) { *r = vqtbx1_p8(*v0, *v1, *v2); } void Vqtbx1QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1, uint8x16_t* v2) { *r = vqtbx1q_s8(*v0, *v1, *v2); } void Vqtbx1QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1, uint8x16_t* v2) { *r = vqtbx1q_u8(*v0, *v1, *v2); } void Vqtbx1QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1, uint8x16_t* v2) { *r = vqtbx1q_p8(*v0, *v1, *v2); } void Vqtbx2S8(int8x8_t* r, int8x8_t* v0, int8x16x2_t* v1, uint8x8_t* v2) { *r = vqtbx2_s8(*v0, *v1, *v2); } void Vqtbx2U8(uint8x8_t* r, uint8x8_t* v0, uint8x16x2_t* v1, uint8x8_t* v2) { *r = vqtbx2_u8(*v0, *v1, *v2); } void Vqtbx2P8(poly8x8_t* r, 
poly8x8_t* v0, poly8x16x2_t* v1, uint8x8_t* v2) { *r = vqtbx2_p8(*v0, *v1, *v2); } void Vqtbx2QS8(int8x16_t* r, int8x16_t* v0, int8x16x2_t* v1, uint8x16_t* v2) { *r = vqtbx2q_s8(*v0, *v1, *v2); } void Vqtbx2QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16x2_t* v1, uint8x16_t* v2) { *r = vqtbx2q_u8(*v0, *v1, *v2); } void Vqtbx2QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16x2_t* v1, uint8x16_t* v2) { *r = vqtbx2q_p8(*v0, *v1, *v2); } void Vqtbx3S8(int8x8_t* r, int8x8_t* v0, int8x16x3_t* v1, uint8x8_t* v2) { *r = vqtbx3_s8(*v0, *v1, *v2); } void Vqtbx3U8(uint8x8_t* r, uint8x8_t* v0, uint8x16x3_t* v1, uint8x8_t* v2) { *r = vqtbx3_u8(*v0, *v1, *v2); } void Vqtbx3P8(poly8x8_t* r, poly8x8_t* v0, poly8x16x3_t* v1, uint8x8_t* v2) { *r = vqtbx3_p8(*v0, *v1, *v2); } void Vqtbx3QS8(int8x16_t* r, int8x16_t* v0, int8x16x3_t* v1, uint8x16_t* v2) { *r = vqtbx3q_s8(*v0, *v1, *v2); } void Vqtbx3QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16x3_t* v1, uint8x16_t* v2) { *r = vqtbx3q_u8(*v0, *v1, *v2); } void Vqtbx3QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16x3_t* v1, uint8x16_t* v2) { *r = vqtbx3q_p8(*v0, *v1, *v2); } void Vqtbx4S8(int8x8_t* r, int8x8_t* v0, int8x16x4_t* v1, uint8x8_t* v2) { *r = vqtbx4_s8(*v0, *v1, *v2); } void Vqtbx4U8(uint8x8_t* r, uint8x8_t* v0, uint8x16x4_t* v1, uint8x8_t* v2) { *r = vqtbx4_u8(*v0, *v1, *v2); } void Vqtbx4P8(poly8x8_t* r, poly8x8_t* v0, poly8x16x4_t* v1, uint8x8_t* v2) { *r = vqtbx4_p8(*v0, *v1, *v2); } void Vqtbx4QS8(int8x16_t* r, int8x16_t* v0, int8x16x4_t* v1, uint8x16_t* v2) { *r = vqtbx4q_s8(*v0, *v1, *v2); } void Vqtbx4QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16x4_t* v1, uint8x16_t* v2) { *r = vqtbx4q_u8(*v0, *v1, *v2); } void Vqtbx4QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16x4_t* v1, uint8x16_t* v2) { *r = vqtbx4q_p8(*v0, *v1, *v2); } void VraddhnS16(int8x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vraddhn_s16(*v0, *v1); } void VraddhnS32(int16x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vraddhn_s32(*v0, *v1); } void VraddhnS64(int32x2_t* r, 
int64x2_t* v0, int64x2_t* v1) { *r = vraddhn_s64(*v0, *v1); } void VraddhnU16(uint8x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vraddhn_u16(*v0, *v1); } void VraddhnU32(uint16x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vraddhn_u32(*v0, *v1); } void VraddhnU64(uint32x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vraddhn_u64(*v0, *v1); } void VraddhnHighS16(int8x16_t* r, int8x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vraddhn_high_s16(*v0, *v1, *v2); } void VraddhnHighS32(int16x8_t* r, int16x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vraddhn_high_s32(*v0, *v1, *v2); } void VraddhnHighS64(int32x4_t* r, int32x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = vraddhn_high_s64(*v0, *v1, *v2); } void VraddhnHighU16(uint8x16_t* r, uint8x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vraddhn_high_u16(*v0, *v1, *v2); } void VraddhnHighU32(uint16x8_t* r, uint16x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vraddhn_high_u32(*v0, *v1, *v2); } void VraddhnHighU64(uint32x4_t* r, uint32x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vraddhn_high_u64(*v0, *v1, *v2); } void Vrax1QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vrax1q_u64(*v0, *v1); } void VrbitS8(int8x8_t* r, int8x8_t* v0) { *r = vrbit_s8(*v0); } void VrbitU8(uint8x8_t* r, uint8x8_t* v0) { *r = vrbit_u8(*v0); } void VrbitP8(poly8x8_t* r, poly8x8_t* v0) { *r = vrbit_p8(*v0); } void VrbitqS8(int8x16_t* r, int8x16_t* v0) { *r = vrbitq_s8(*v0); } void VrbitqU8(uint8x16_t* r, uint8x16_t* v0) { *r = vrbitq_u8(*v0); } void VrbitqP8(poly8x16_t* r, poly8x16_t* v0) { *r = vrbitq_p8(*v0); } void VrecpeU32(uint32x2_t* r, uint32x2_t* v0) { *r = vrecpe_u32(*v0); } void VrecpeF32(float32x2_t* r, float32x2_t* v0) { *r = vrecpe_f32(*v0); } void VrecpeF64(float64x1_t* r, float64x1_t* v0) { *r = vrecpe_f64(*v0); } void VrecpedF64(float64_t* r, float64_t* v0) { *r = vrecped_f64(*v0); } void VrecpeqU32(uint32x4_t* r, uint32x4_t* v0) { *r = vrecpeq_u32(*v0); } void VrecpeqF32(float32x4_t* r, float32x4_t* v0) { *r 
= vrecpeq_f32(*v0); } void VrecpeqF64(float64x2_t* r, float64x2_t* v0) { *r = vrecpeq_f64(*v0); } void VrecpesF32(float32_t* r, float32_t* v0) { *r = vrecpes_f32(*v0); } void VrecpsF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vrecps_f32(*v0, *v1); } void VrecpsF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vrecps_f64(*v0, *v1); } void VrecpsdF64(float64_t* r, float64_t* v0, float64_t* v1) { *r = vrecpsd_f64(*v0, *v1); } void VrecpsqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vrecpsq_f32(*v0, *v1); } void VrecpsqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vrecpsq_f64(*v0, *v1); } void VrecpssF32(float32_t* r, float32_t* v0, float32_t* v1) { *r = vrecpss_f32(*v0, *v1); } void VrecpxdF64(float64_t* r, float64_t* v0) { *r = vrecpxd_f64(*v0); } void VrecpxsF32(float32_t* r, float32_t* v0) { *r = vrecpxs_f32(*v0); } void VreinterpretF32S8(float32x2_t* r, int8x8_t* v0) { *r = vreinterpret_f32_s8(*v0); } void VreinterpretF32S16(float32x2_t* r, int16x4_t* v0) { *r = vreinterpret_f32_s16(*v0); } void VreinterpretF32S32(float32x2_t* r, int32x2_t* v0) { *r = vreinterpret_f32_s32(*v0); } void VreinterpretF32S64(float32x2_t* r, int64x1_t* v0) { *r = vreinterpret_f32_s64(*v0); } void VreinterpretF32U8(float32x2_t* r, uint8x8_t* v0) { *r = vreinterpret_f32_u8(*v0); } void VreinterpretF32U16(float32x2_t* r, uint16x4_t* v0) { *r = vreinterpret_f32_u16(*v0); } void VreinterpretF32U32(float32x2_t* r, uint32x2_t* v0) { *r = vreinterpret_f32_u32(*v0); } void VreinterpretF32U64(float32x2_t* r, uint64x1_t* v0) { *r = vreinterpret_f32_u64(*v0); } void VreinterpretF32F64(float32x2_t* r, float64x1_t* v0) { *r = vreinterpret_f32_f64(*v0); } void VreinterpretF32P16(float32x2_t* r, poly16x4_t* v0) { *r = vreinterpret_f32_p16(*v0); } void VreinterpretF32P64(float32x2_t* r, poly64x1_t* v0) { *r = vreinterpret_f32_p64(*v0); } void VreinterpretF32P8(float32x2_t* r, poly8x8_t* v0) { *r = vreinterpret_f32_p8(*v0); } void 
VreinterpretF64S8(float64x1_t* r, int8x8_t* v0) { *r = vreinterpret_f64_s8(*v0); } void VreinterpretF64S16(float64x1_t* r, int16x4_t* v0) { *r = vreinterpret_f64_s16(*v0); } void VreinterpretF64S32(float64x1_t* r, int32x2_t* v0) { *r = vreinterpret_f64_s32(*v0); } void VreinterpretF64S64(float64x1_t* r, int64x1_t* v0) { *r = vreinterpret_f64_s64(*v0); } void VreinterpretF64U8(float64x1_t* r, uint8x8_t* v0) { *r = vreinterpret_f64_u8(*v0); } void VreinterpretF64U16(float64x1_t* r, uint16x4_t* v0) { *r = vreinterpret_f64_u16(*v0); } void VreinterpretF64U32(float64x1_t* r, uint32x2_t* v0) { *r = vreinterpret_f64_u32(*v0); } void VreinterpretF64U64(float64x1_t* r, uint64x1_t* v0) { *r = vreinterpret_f64_u64(*v0); } void VreinterpretF64F32(float64x1_t* r, float32x2_t* v0) { *r = vreinterpret_f64_f32(*v0); } void VreinterpretF64P16(float64x1_t* r, poly16x4_t* v0) { *r = vreinterpret_f64_p16(*v0); } void VreinterpretF64P64(float64x1_t* r, poly64x1_t* v0) { *r = vreinterpret_f64_p64(*v0); } void VreinterpretF64P8(float64x1_t* r, poly8x8_t* v0) { *r = vreinterpret_f64_p8(*v0); } void VreinterpretP16S8(poly16x4_t* r, int8x8_t* v0) { *r = vreinterpret_p16_s8(*v0); } void VreinterpretP16S16(poly16x4_t* r, int16x4_t* v0) { *r = vreinterpret_p16_s16(*v0); } void VreinterpretP16S32(poly16x4_t* r, int32x2_t* v0) { *r = vreinterpret_p16_s32(*v0); } void VreinterpretP16S64(poly16x4_t* r, int64x1_t* v0) { *r = vreinterpret_p16_s64(*v0); } void VreinterpretP16U8(poly16x4_t* r, uint8x8_t* v0) { *r = vreinterpret_p16_u8(*v0); } void VreinterpretP16U16(poly16x4_t* r, uint16x4_t* v0) { *r = vreinterpret_p16_u16(*v0); } void VreinterpretP16U32(poly16x4_t* r, uint32x2_t* v0) { *r = vreinterpret_p16_u32(*v0); } void VreinterpretP16U64(poly16x4_t* r, uint64x1_t* v0) { *r = vreinterpret_p16_u64(*v0); } void VreinterpretP16F32(poly16x4_t* r, float32x2_t* v0) { *r = vreinterpret_p16_f32(*v0); } void VreinterpretP16F64(poly16x4_t* r, float64x1_t* v0) { *r = vreinterpret_p16_f64(*v0); } void 
VreinterpretP16P64(poly16x4_t* r, poly64x1_t* v0) { *r = vreinterpret_p16_p64(*v0); } void VreinterpretP16P8(poly16x4_t* r, poly8x8_t* v0) { *r = vreinterpret_p16_p8(*v0); } void VreinterpretP64S8(poly64x1_t* r, int8x8_t* v0) { *r = vreinterpret_p64_s8(*v0); } void VreinterpretP64S16(poly64x1_t* r, int16x4_t* v0) { *r = vreinterpret_p64_s16(*v0); } void VreinterpretP64S32(poly64x1_t* r, int32x2_t* v0) { *r = vreinterpret_p64_s32(*v0); } void VreinterpretP64S64(poly64x1_t* r, int64x1_t* v0) { *r = vreinterpret_p64_s64(*v0); } void VreinterpretP64U8(poly64x1_t* r, uint8x8_t* v0) { *r = vreinterpret_p64_u8(*v0); } void VreinterpretP64U16(poly64x1_t* r, uint16x4_t* v0) { *r = vreinterpret_p64_u16(*v0); } void VreinterpretP64U32(poly64x1_t* r, uint32x2_t* v0) { *r = vreinterpret_p64_u32(*v0); } void VreinterpretP64U64(poly64x1_t* r, uint64x1_t* v0) { *r = vreinterpret_p64_u64(*v0); } void VreinterpretP64F32(poly64x1_t* r, float32x2_t* v0) { *r = vreinterpret_p64_f32(*v0); } void VreinterpretP64F64(poly64x1_t* r, float64x1_t* v0) { *r = vreinterpret_p64_f64(*v0); } void VreinterpretP64P16(poly64x1_t* r, poly16x4_t* v0) { *r = vreinterpret_p64_p16(*v0); } void VreinterpretP64P8(poly64x1_t* r, poly8x8_t* v0) { *r = vreinterpret_p64_p8(*v0); } void VreinterpretP8S8(poly8x8_t* r, int8x8_t* v0) { *r = vreinterpret_p8_s8(*v0); } void VreinterpretP8S16(poly8x8_t* r, int16x4_t* v0) { *r = vreinterpret_p8_s16(*v0); } void VreinterpretP8S32(poly8x8_t* r, int32x2_t* v0) { *r = vreinterpret_p8_s32(*v0); } void VreinterpretP8S64(poly8x8_t* r, int64x1_t* v0) { *r = vreinterpret_p8_s64(*v0); } void VreinterpretP8U8(poly8x8_t* r, uint8x8_t* v0) { *r = vreinterpret_p8_u8(*v0); } void VreinterpretP8U16(poly8x8_t* r, uint16x4_t* v0) { *r = vreinterpret_p8_u16(*v0); } void VreinterpretP8U32(poly8x8_t* r, uint32x2_t* v0) { *r = vreinterpret_p8_u32(*v0); } void VreinterpretP8U64(poly8x8_t* r, uint64x1_t* v0) { *r = vreinterpret_p8_u64(*v0); } void VreinterpretP8F32(poly8x8_t* r, float32x2_t* 
v0) { *r = vreinterpret_p8_f32(*v0); } void VreinterpretP8F64(poly8x8_t* r, float64x1_t* v0) { *r = vreinterpret_p8_f64(*v0); } void VreinterpretP8P16(poly8x8_t* r, poly16x4_t* v0) { *r = vreinterpret_p8_p16(*v0); } void VreinterpretP8P64(poly8x8_t* r, poly64x1_t* v0) { *r = vreinterpret_p8_p64(*v0); } void VreinterpretS16S8(int16x4_t* r, int8x8_t* v0) { *r = vreinterpret_s16_s8(*v0); } void VreinterpretS16S32(int16x4_t* r, int32x2_t* v0) { *r = vreinterpret_s16_s32(*v0); } void VreinterpretS16S64(int16x4_t* r, int64x1_t* v0) { *r = vreinterpret_s16_s64(*v0); } void VreinterpretS16U8(int16x4_t* r, uint8x8_t* v0) { *r = vreinterpret_s16_u8(*v0); } void VreinterpretS16U16(int16x4_t* r, uint16x4_t* v0) { *r = vreinterpret_s16_u16(*v0); } void VreinterpretS16U32(int16x4_t* r, uint32x2_t* v0) { *r = vreinterpret_s16_u32(*v0); } void VreinterpretS16U64(int16x4_t* r, uint64x1_t* v0) { *r = vreinterpret_s16_u64(*v0); } void VreinterpretS16F32(int16x4_t* r, float32x2_t* v0) { *r = vreinterpret_s16_f32(*v0); } void VreinterpretS16F64(int16x4_t* r, float64x1_t* v0) { *r = vreinterpret_s16_f64(*v0); } void VreinterpretS16P16(int16x4_t* r, poly16x4_t* v0) { *r = vreinterpret_s16_p16(*v0); } void VreinterpretS16P64(int16x4_t* r, poly64x1_t* v0) { *r = vreinterpret_s16_p64(*v0); } void VreinterpretS16P8(int16x4_t* r, poly8x8_t* v0) { *r = vreinterpret_s16_p8(*v0); } void VreinterpretS32S8(int32x2_t* r, int8x8_t* v0) { *r = vreinterpret_s32_s8(*v0); } void VreinterpretS32S16(int32x2_t* r, int16x4_t* v0) { *r = vreinterpret_s32_s16(*v0); } void VreinterpretS32S64(int32x2_t* r, int64x1_t* v0) { *r = vreinterpret_s32_s64(*v0); } void VreinterpretS32U8(int32x2_t* r, uint8x8_t* v0) { *r = vreinterpret_s32_u8(*v0); } void VreinterpretS32U16(int32x2_t* r, uint16x4_t* v0) { *r = vreinterpret_s32_u16(*v0); } void VreinterpretS32U32(int32x2_t* r, uint32x2_t* v0) { *r = vreinterpret_s32_u32(*v0); } void VreinterpretS32U64(int32x2_t* r, uint64x1_t* v0) { *r = vreinterpret_s32_u64(*v0); } void 
VreinterpretS32F32(int32x2_t* r, float32x2_t* v0) { *r = vreinterpret_s32_f32(*v0); } void VreinterpretS32F64(int32x2_t* r, float64x1_t* v0) { *r = vreinterpret_s32_f64(*v0); } void VreinterpretS32P16(int32x2_t* r, poly16x4_t* v0) { *r = vreinterpret_s32_p16(*v0); } void VreinterpretS32P64(int32x2_t* r, poly64x1_t* v0) { *r = vreinterpret_s32_p64(*v0); } void VreinterpretS32P8(int32x2_t* r, poly8x8_t* v0) { *r = vreinterpret_s32_p8(*v0); } void VreinterpretS64S8(int64x1_t* r, int8x8_t* v0) { *r = vreinterpret_s64_s8(*v0); } void VreinterpretS64S16(int64x1_t* r, int16x4_t* v0) { *r = vreinterpret_s64_s16(*v0); } void VreinterpretS64S32(int64x1_t* r, int32x2_t* v0) { *r = vreinterpret_s64_s32(*v0); } void VreinterpretS64U8(int64x1_t* r, uint8x8_t* v0) { *r = vreinterpret_s64_u8(*v0); } void VreinterpretS64U16(int64x1_t* r, uint16x4_t* v0) { *r = vreinterpret_s64_u16(*v0); } void VreinterpretS64U32(int64x1_t* r, uint32x2_t* v0) { *r = vreinterpret_s64_u32(*v0); } void VreinterpretS64U64(int64x1_t* r, uint64x1_t* v0) { *r = vreinterpret_s64_u64(*v0); } void VreinterpretS64F32(int64x1_t* r, float32x2_t* v0) { *r = vreinterpret_s64_f32(*v0); } void VreinterpretS64F64(int64x1_t* r, float64x1_t* v0) { *r = vreinterpret_s64_f64(*v0); } void VreinterpretS64P16(int64x1_t* r, poly16x4_t* v0) { *r = vreinterpret_s64_p16(*v0); } void VreinterpretS64P64(int64x1_t* r, poly64x1_t* v0) { *r = vreinterpret_s64_p64(*v0); } void VreinterpretS64P8(int64x1_t* r, poly8x8_t* v0) { *r = vreinterpret_s64_p8(*v0); } void VreinterpretS8S16(int8x8_t* r, int16x4_t* v0) { *r = vreinterpret_s8_s16(*v0); } void VreinterpretS8S32(int8x8_t* r, int32x2_t* v0) { *r = vreinterpret_s8_s32(*v0); } void VreinterpretS8S64(int8x8_t* r, int64x1_t* v0) { *r = vreinterpret_s8_s64(*v0); } void VreinterpretS8U8(int8x8_t* r, uint8x8_t* v0) { *r = vreinterpret_s8_u8(*v0); } void VreinterpretS8U16(int8x8_t* r, uint16x4_t* v0) { *r = vreinterpret_s8_u16(*v0); } void VreinterpretS8U32(int8x8_t* r, uint32x2_t* v0) { *r 
= vreinterpret_s8_u32(*v0); } void VreinterpretS8U64(int8x8_t* r, uint64x1_t* v0) { *r = vreinterpret_s8_u64(*v0); } void VreinterpretS8F32(int8x8_t* r, float32x2_t* v0) { *r = vreinterpret_s8_f32(*v0); } void VreinterpretS8F64(int8x8_t* r, float64x1_t* v0) { *r = vreinterpret_s8_f64(*v0); } void VreinterpretS8P16(int8x8_t* r, poly16x4_t* v0) { *r = vreinterpret_s8_p16(*v0); } void VreinterpretS8P64(int8x8_t* r, poly64x1_t* v0) { *r = vreinterpret_s8_p64(*v0); } void VreinterpretS8P8(int8x8_t* r, poly8x8_t* v0) { *r = vreinterpret_s8_p8(*v0); } void VreinterpretU16S8(uint16x4_t* r, int8x8_t* v0) { *r = vreinterpret_u16_s8(*v0); } void VreinterpretU16S16(uint16x4_t* r, int16x4_t* v0) { *r = vreinterpret_u16_s16(*v0); } void VreinterpretU16S32(uint16x4_t* r, int32x2_t* v0) { *r = vreinterpret_u16_s32(*v0); } void VreinterpretU16S64(uint16x4_t* r, int64x1_t* v0) { *r = vreinterpret_u16_s64(*v0); } void VreinterpretU16U8(uint16x4_t* r, uint8x8_t* v0) { *r = vreinterpret_u16_u8(*v0); } void VreinterpretU16U32(uint16x4_t* r, uint32x2_t* v0) { *r = vreinterpret_u16_u32(*v0); } void VreinterpretU16U64(uint16x4_t* r, uint64x1_t* v0) { *r = vreinterpret_u16_u64(*v0); } void VreinterpretU16F32(uint16x4_t* r, float32x2_t* v0) { *r = vreinterpret_u16_f32(*v0); } void VreinterpretU16F64(uint16x4_t* r, float64x1_t* v0) { *r = vreinterpret_u16_f64(*v0); } void VreinterpretU16P16(uint16x4_t* r, poly16x4_t* v0) { *r = vreinterpret_u16_p16(*v0); } void VreinterpretU16P64(uint16x4_t* r, poly64x1_t* v0) { *r = vreinterpret_u16_p64(*v0); } void VreinterpretU16P8(uint16x4_t* r, poly8x8_t* v0) { *r = vreinterpret_u16_p8(*v0); } void VreinterpretU32S8(uint32x2_t* r, int8x8_t* v0) { *r = vreinterpret_u32_s8(*v0); } void VreinterpretU32S16(uint32x2_t* r, int16x4_t* v0) { *r = vreinterpret_u32_s16(*v0); } void VreinterpretU32S32(uint32x2_t* r, int32x2_t* v0) { *r = vreinterpret_u32_s32(*v0); } void VreinterpretU32S64(uint32x2_t* r, int64x1_t* v0) { *r = vreinterpret_u32_s64(*v0); } void 
VreinterpretU32U8(uint32x2_t* r, uint8x8_t* v0) { *r = vreinterpret_u32_u8(*v0); } void VreinterpretU32U16(uint32x2_t* r, uint16x4_t* v0) { *r = vreinterpret_u32_u16(*v0); } void VreinterpretU32U64(uint32x2_t* r, uint64x1_t* v0) { *r = vreinterpret_u32_u64(*v0); } void VreinterpretU32F32(uint32x2_t* r, float32x2_t* v0) { *r = vreinterpret_u32_f32(*v0); } void VreinterpretU32F64(uint32x2_t* r, float64x1_t* v0) { *r = vreinterpret_u32_f64(*v0); } void VreinterpretU32P16(uint32x2_t* r, poly16x4_t* v0) { *r = vreinterpret_u32_p16(*v0); } void VreinterpretU32P64(uint32x2_t* r, poly64x1_t* v0) { *r = vreinterpret_u32_p64(*v0); } void VreinterpretU32P8(uint32x2_t* r, poly8x8_t* v0) { *r = vreinterpret_u32_p8(*v0); } void VreinterpretU64S8(uint64x1_t* r, int8x8_t* v0) { *r = vreinterpret_u64_s8(*v0); } void VreinterpretU64S16(uint64x1_t* r, int16x4_t* v0) { *r = vreinterpret_u64_s16(*v0); } void VreinterpretU64S32(uint64x1_t* r, int32x2_t* v0) { *r = vreinterpret_u64_s32(*v0); } void VreinterpretU64S64(uint64x1_t* r, int64x1_t* v0) { *r = vreinterpret_u64_s64(*v0); } void VreinterpretU64U8(uint64x1_t* r, uint8x8_t* v0) { *r = vreinterpret_u64_u8(*v0); } void VreinterpretU64U16(uint64x1_t* r, uint16x4_t* v0) { *r = vreinterpret_u64_u16(*v0); } void VreinterpretU64U32(uint64x1_t* r, uint32x2_t* v0) { *r = vreinterpret_u64_u32(*v0); } void VreinterpretU64F32(uint64x1_t* r, float32x2_t* v0) { *r = vreinterpret_u64_f32(*v0); } void VreinterpretU64F64(uint64x1_t* r, float64x1_t* v0) { *r = vreinterpret_u64_f64(*v0); } void VreinterpretU64P16(uint64x1_t* r, poly16x4_t* v0) { *r = vreinterpret_u64_p16(*v0); } void VreinterpretU64P64(uint64x1_t* r, poly64x1_t* v0) { *r = vreinterpret_u64_p64(*v0); } void VreinterpretU64P8(uint64x1_t* r, poly8x8_t* v0) { *r = vreinterpret_u64_p8(*v0); } void VreinterpretU8S8(uint8x8_t* r, int8x8_t* v0) { *r = vreinterpret_u8_s8(*v0); } void VreinterpretU8S16(uint8x8_t* r, int16x4_t* v0) { *r = vreinterpret_u8_s16(*v0); } void 
VreinterpretU8S32(uint8x8_t* r, int32x2_t* v0) { *r = vreinterpret_u8_s32(*v0); } void VreinterpretU8S64(uint8x8_t* r, int64x1_t* v0) { *r = vreinterpret_u8_s64(*v0); } void VreinterpretU8U16(uint8x8_t* r, uint16x4_t* v0) { *r = vreinterpret_u8_u16(*v0); } void VreinterpretU8U32(uint8x8_t* r, uint32x2_t* v0) { *r = vreinterpret_u8_u32(*v0); } void VreinterpretU8U64(uint8x8_t* r, uint64x1_t* v0) { *r = vreinterpret_u8_u64(*v0); } void VreinterpretU8F32(uint8x8_t* r, float32x2_t* v0) { *r = vreinterpret_u8_f32(*v0); } void VreinterpretU8F64(uint8x8_t* r, float64x1_t* v0) { *r = vreinterpret_u8_f64(*v0); } void VreinterpretU8P16(uint8x8_t* r, poly16x4_t* v0) { *r = vreinterpret_u8_p16(*v0); } void VreinterpretU8P64(uint8x8_t* r, poly64x1_t* v0) { *r = vreinterpret_u8_p64(*v0); } void VreinterpretU8P8(uint8x8_t* r, poly8x8_t* v0) { *r = vreinterpret_u8_p8(*v0); } void VreinterpretqF32S8(float32x4_t* r, int8x16_t* v0) { *r = vreinterpretq_f32_s8(*v0); } void VreinterpretqF32S16(float32x4_t* r, int16x8_t* v0) { *r = vreinterpretq_f32_s16(*v0); } void VreinterpretqF32S32(float32x4_t* r, int32x4_t* v0) { *r = vreinterpretq_f32_s32(*v0); } void VreinterpretqF32S64(float32x4_t* r, int64x2_t* v0) { *r = vreinterpretq_f32_s64(*v0); } void VreinterpretqF32U8(float32x4_t* r, uint8x16_t* v0) { *r = vreinterpretq_f32_u8(*v0); } void VreinterpretqF32U16(float32x4_t* r, uint16x8_t* v0) { *r = vreinterpretq_f32_u16(*v0); } void VreinterpretqF32U32(float32x4_t* r, uint32x4_t* v0) { *r = vreinterpretq_f32_u32(*v0); } void VreinterpretqF32U64(float32x4_t* r, uint64x2_t* v0) { *r = vreinterpretq_f32_u64(*v0); } void VreinterpretqF32F64(float32x4_t* r, float64x2_t* v0) { *r = vreinterpretq_f32_f64(*v0); } void VreinterpretqF32P128(float32x4_t* r, poly128_t* v0) { *r = vreinterpretq_f32_p128(*v0); } void VreinterpretqF32P16(float32x4_t* r, poly16x8_t* v0) { *r = vreinterpretq_f32_p16(*v0); } void VreinterpretqF32P64(float32x4_t* r, poly64x2_t* v0) { *r = vreinterpretq_f32_p64(*v0); } void 
VreinterpretqF32P8(float32x4_t* r, poly8x16_t* v0) { *r = vreinterpretq_f32_p8(*v0); } void VreinterpretqF64S8(float64x2_t* r, int8x16_t* v0) { *r = vreinterpretq_f64_s8(*v0); } void VreinterpretqF64S16(float64x2_t* r, int16x8_t* v0) { *r = vreinterpretq_f64_s16(*v0); } void VreinterpretqF64S32(float64x2_t* r, int32x4_t* v0) { *r = vreinterpretq_f64_s32(*v0); } void VreinterpretqF64S64(float64x2_t* r, int64x2_t* v0) { *r = vreinterpretq_f64_s64(*v0); } void VreinterpretqF64U8(float64x2_t* r, uint8x16_t* v0) { *r = vreinterpretq_f64_u8(*v0); } void VreinterpretqF64U16(float64x2_t* r, uint16x8_t* v0) { *r = vreinterpretq_f64_u16(*v0); } void VreinterpretqF64U32(float64x2_t* r, uint32x4_t* v0) { *r = vreinterpretq_f64_u32(*v0); } void VreinterpretqF64U64(float64x2_t* r, uint64x2_t* v0) { *r = vreinterpretq_f64_u64(*v0); } void VreinterpretqF64F32(float64x2_t* r, float32x4_t* v0) { *r = vreinterpretq_f64_f32(*v0); } void VreinterpretqF64P128(float64x2_t* r, poly128_t* v0) { *r = vreinterpretq_f64_p128(*v0); } void VreinterpretqF64P16(float64x2_t* r, poly16x8_t* v0) { *r = vreinterpretq_f64_p16(*v0); } void VreinterpretqF64P64(float64x2_t* r, poly64x2_t* v0) { *r = vreinterpretq_f64_p64(*v0); } void VreinterpretqF64P8(float64x2_t* r, poly8x16_t* v0) { *r = vreinterpretq_f64_p8(*v0); } void VreinterpretqP128S8(poly128_t* r, int8x16_t* v0) { *r = vreinterpretq_p128_s8(*v0); } void VreinterpretqP128S16(poly128_t* r, int16x8_t* v0) { *r = vreinterpretq_p128_s16(*v0); } void VreinterpretqP128S32(poly128_t* r, int32x4_t* v0) { *r = vreinterpretq_p128_s32(*v0); } void VreinterpretqP128S64(poly128_t* r, int64x2_t* v0) { *r = vreinterpretq_p128_s64(*v0); } void VreinterpretqP128U8(poly128_t* r, uint8x16_t* v0) { *r = vreinterpretq_p128_u8(*v0); } void VreinterpretqP128U16(poly128_t* r, uint16x8_t* v0) { *r = vreinterpretq_p128_u16(*v0); } void VreinterpretqP128U32(poly128_t* r, uint32x4_t* v0) { *r = vreinterpretq_p128_u32(*v0); } void VreinterpretqP128U64(poly128_t* r, 
uint64x2_t* v0) { *r = vreinterpretq_p128_u64(*v0); } void VreinterpretqP128F32(poly128_t* r, float32x4_t* v0) { *r = vreinterpretq_p128_f32(*v0); } void VreinterpretqP128F64(poly128_t* r, float64x2_t* v0) { *r = vreinterpretq_p128_f64(*v0); } void VreinterpretqP128P16(poly128_t* r, poly16x8_t* v0) { *r = vreinterpretq_p128_p16(*v0); } void VreinterpretqP128P64(poly128_t* r, poly64x2_t* v0) { *r = vreinterpretq_p128_p64(*v0); } void VreinterpretqP128P8(poly128_t* r, poly8x16_t* v0) { *r = vreinterpretq_p128_p8(*v0); } void VreinterpretqP16S8(poly16x8_t* r, int8x16_t* v0) { *r = vreinterpretq_p16_s8(*v0); } void VreinterpretqP16S16(poly16x8_t* r, int16x8_t* v0) { *r = vreinterpretq_p16_s16(*v0); } void VreinterpretqP16S32(poly16x8_t* r, int32x4_t* v0) { *r = vreinterpretq_p16_s32(*v0); } void VreinterpretqP16S64(poly16x8_t* r, int64x2_t* v0) { *r = vreinterpretq_p16_s64(*v0); } void VreinterpretqP16U8(poly16x8_t* r, uint8x16_t* v0) { *r = vreinterpretq_p16_u8(*v0); } void VreinterpretqP16U16(poly16x8_t* r, uint16x8_t* v0) { *r = vreinterpretq_p16_u16(*v0); } void VreinterpretqP16U32(poly16x8_t* r, uint32x4_t* v0) { *r = vreinterpretq_p16_u32(*v0); } void VreinterpretqP16U64(poly16x8_t* r, uint64x2_t* v0) { *r = vreinterpretq_p16_u64(*v0); } void VreinterpretqP16F32(poly16x8_t* r, float32x4_t* v0) { *r = vreinterpretq_p16_f32(*v0); } void VreinterpretqP16F64(poly16x8_t* r, float64x2_t* v0) { *r = vreinterpretq_p16_f64(*v0); } void VreinterpretqP16P128(poly16x8_t* r, poly128_t* v0) { *r = vreinterpretq_p16_p128(*v0); } void VreinterpretqP16P64(poly16x8_t* r, poly64x2_t* v0) { *r = vreinterpretq_p16_p64(*v0); } void VreinterpretqP16P8(poly16x8_t* r, poly8x16_t* v0) { *r = vreinterpretq_p16_p8(*v0); } void VreinterpretqP64S8(poly64x2_t* r, int8x16_t* v0) { *r = vreinterpretq_p64_s8(*v0); } void VreinterpretqP64S16(poly64x2_t* r, int16x8_t* v0) { *r = vreinterpretq_p64_s16(*v0); } void VreinterpretqP64S32(poly64x2_t* r, int32x4_t* v0) { *r = vreinterpretq_p64_s32(*v0); } 
void VreinterpretqP64S64(poly64x2_t* r, int64x2_t* v0) { *r = vreinterpretq_p64_s64(*v0); } void VreinterpretqP64U8(poly64x2_t* r, uint8x16_t* v0) { *r = vreinterpretq_p64_u8(*v0); } void VreinterpretqP64U16(poly64x2_t* r, uint16x8_t* v0) { *r = vreinterpretq_p64_u16(*v0); } void VreinterpretqP64U32(poly64x2_t* r, uint32x4_t* v0) { *r = vreinterpretq_p64_u32(*v0); } void VreinterpretqP64U64(poly64x2_t* r, uint64x2_t* v0) { *r = vreinterpretq_p64_u64(*v0); } void VreinterpretqP64F32(poly64x2_t* r, float32x4_t* v0) { *r = vreinterpretq_p64_f32(*v0); } void VreinterpretqP64F64(poly64x2_t* r, float64x2_t* v0) { *r = vreinterpretq_p64_f64(*v0); } void VreinterpretqP64P128(poly64x2_t* r, poly128_t* v0) { *r = vreinterpretq_p64_p128(*v0); } void VreinterpretqP64P16(poly64x2_t* r, poly16x8_t* v0) { *r = vreinterpretq_p64_p16(*v0); } void VreinterpretqP64P8(poly64x2_t* r, poly8x16_t* v0) { *r = vreinterpretq_p64_p8(*v0); } void VreinterpretqP8S8(poly8x16_t* r, int8x16_t* v0) { *r = vreinterpretq_p8_s8(*v0); } void VreinterpretqP8S16(poly8x16_t* r, int16x8_t* v0) { *r = vreinterpretq_p8_s16(*v0); } void VreinterpretqP8S32(poly8x16_t* r, int32x4_t* v0) { *r = vreinterpretq_p8_s32(*v0); } void VreinterpretqP8S64(poly8x16_t* r, int64x2_t* v0) { *r = vreinterpretq_p8_s64(*v0); } void VreinterpretqP8U8(poly8x16_t* r, uint8x16_t* v0) { *r = vreinterpretq_p8_u8(*v0); } void VreinterpretqP8U16(poly8x16_t* r, uint16x8_t* v0) { *r = vreinterpretq_p8_u16(*v0); } void VreinterpretqP8U32(poly8x16_t* r, uint32x4_t* v0) { *r = vreinterpretq_p8_u32(*v0); } void VreinterpretqP8U64(poly8x16_t* r, uint64x2_t* v0) { *r = vreinterpretq_p8_u64(*v0); } void VreinterpretqP8F32(poly8x16_t* r, float32x4_t* v0) { *r = vreinterpretq_p8_f32(*v0); } void VreinterpretqP8F64(poly8x16_t* r, float64x2_t* v0) { *r = vreinterpretq_p8_f64(*v0); } void VreinterpretqP8P128(poly8x16_t* r, poly128_t* v0) { *r = vreinterpretq_p8_p128(*v0); } void VreinterpretqP8P16(poly8x16_t* r, poly16x8_t* v0) { *r = 
vreinterpretq_p8_p16(*v0); } void VreinterpretqP8P64(poly8x16_t* r, poly64x2_t* v0) { *r = vreinterpretq_p8_p64(*v0); } void VreinterpretqS16S8(int16x8_t* r, int8x16_t* v0) { *r = vreinterpretq_s16_s8(*v0); } void VreinterpretqS16S32(int16x8_t* r, int32x4_t* v0) { *r = vreinterpretq_s16_s32(*v0); } void VreinterpretqS16S64(int16x8_t* r, int64x2_t* v0) { *r = vreinterpretq_s16_s64(*v0); } void VreinterpretqS16U8(int16x8_t* r, uint8x16_t* v0) { *r = vreinterpretq_s16_u8(*v0); } void VreinterpretqS16U16(int16x8_t* r, uint16x8_t* v0) { *r = vreinterpretq_s16_u16(*v0); } void VreinterpretqS16U32(int16x8_t* r, uint32x4_t* v0) { *r = vreinterpretq_s16_u32(*v0); } void VreinterpretqS16U64(int16x8_t* r, uint64x2_t* v0) { *r = vreinterpretq_s16_u64(*v0); } void VreinterpretqS16F32(int16x8_t* r, float32x4_t* v0) { *r = vreinterpretq_s16_f32(*v0); } void VreinterpretqS16F64(int16x8_t* r, float64x2_t* v0) { *r = vreinterpretq_s16_f64(*v0); } void VreinterpretqS16P128(int16x8_t* r, poly128_t* v0) { *r = vreinterpretq_s16_p128(*v0); } void VreinterpretqS16P16(int16x8_t* r, poly16x8_t* v0) { *r = vreinterpretq_s16_p16(*v0); } void VreinterpretqS16P64(int16x8_t* r, poly64x2_t* v0) { *r = vreinterpretq_s16_p64(*v0); } void VreinterpretqS16P8(int16x8_t* r, poly8x16_t* v0) { *r = vreinterpretq_s16_p8(*v0); } void VreinterpretqS32S8(int32x4_t* r, int8x16_t* v0) { *r = vreinterpretq_s32_s8(*v0); } void VreinterpretqS32S16(int32x4_t* r, int16x8_t* v0) { *r = vreinterpretq_s32_s16(*v0); } void VreinterpretqS32S64(int32x4_t* r, int64x2_t* v0) { *r = vreinterpretq_s32_s64(*v0); } void VreinterpretqS32U8(int32x4_t* r, uint8x16_t* v0) { *r = vreinterpretq_s32_u8(*v0); } void VreinterpretqS32U16(int32x4_t* r, uint16x8_t* v0) { *r = vreinterpretq_s32_u16(*v0); } void VreinterpretqS32U32(int32x4_t* r, uint32x4_t* v0) { *r = vreinterpretq_s32_u32(*v0); } void VreinterpretqS32U64(int32x4_t* r, uint64x2_t* v0) { *r = vreinterpretq_s32_u64(*v0); } void VreinterpretqS32F32(int32x4_t* r, float32x4_t* 
v0) { *r = vreinterpretq_s32_f32(*v0); } void VreinterpretqS32F64(int32x4_t* r, float64x2_t* v0) { *r = vreinterpretq_s32_f64(*v0); } void VreinterpretqS32P128(int32x4_t* r, poly128_t* v0) { *r = vreinterpretq_s32_p128(*v0); } void VreinterpretqS32P16(int32x4_t* r, poly16x8_t* v0) { *r = vreinterpretq_s32_p16(*v0); } void VreinterpretqS32P64(int32x4_t* r, poly64x2_t* v0) { *r = vreinterpretq_s32_p64(*v0); } void VreinterpretqS32P8(int32x4_t* r, poly8x16_t* v0) { *r = vreinterpretq_s32_p8(*v0); } void VreinterpretqS64S8(int64x2_t* r, int8x16_t* v0) { *r = vreinterpretq_s64_s8(*v0); } void VreinterpretqS64S16(int64x2_t* r, int16x8_t* v0) { *r = vreinterpretq_s64_s16(*v0); } void VreinterpretqS64S32(int64x2_t* r, int32x4_t* v0) { *r = vreinterpretq_s64_s32(*v0); } void VreinterpretqS64U8(int64x2_t* r, uint8x16_t* v0) { *r = vreinterpretq_s64_u8(*v0); } void VreinterpretqS64U16(int64x2_t* r, uint16x8_t* v0) { *r = vreinterpretq_s64_u16(*v0); } void VreinterpretqS64U32(int64x2_t* r, uint32x4_t* v0) { *r = vreinterpretq_s64_u32(*v0); } void VreinterpretqS64U64(int64x2_t* r, uint64x2_t* v0) { *r = vreinterpretq_s64_u64(*v0); } void VreinterpretqS64F32(int64x2_t* r, float32x4_t* v0) { *r = vreinterpretq_s64_f32(*v0); } void VreinterpretqS64F64(int64x2_t* r, float64x2_t* v0) { *r = vreinterpretq_s64_f64(*v0); } void VreinterpretqS64P128(int64x2_t* r, poly128_t* v0) { *r = vreinterpretq_s64_p128(*v0); } void VreinterpretqS64P16(int64x2_t* r, poly16x8_t* v0) { *r = vreinterpretq_s64_p16(*v0); } void VreinterpretqS64P64(int64x2_t* r, poly64x2_t* v0) { *r = vreinterpretq_s64_p64(*v0); } void VreinterpretqS64P8(int64x2_t* r, poly8x16_t* v0) { *r = vreinterpretq_s64_p8(*v0); } void VreinterpretqS8S16(int8x16_t* r, int16x8_t* v0) { *r = vreinterpretq_s8_s16(*v0); } void VreinterpretqS8S32(int8x16_t* r, int32x4_t* v0) { *r = vreinterpretq_s8_s32(*v0); } void VreinterpretqS8S64(int8x16_t* r, int64x2_t* v0) { *r = vreinterpretq_s8_s64(*v0); } void VreinterpretqS8U8(int8x16_t* r, 
uint8x16_t* v0) { *r = vreinterpretq_s8_u8(*v0); } void VreinterpretqS8U16(int8x16_t* r, uint16x8_t* v0) { *r = vreinterpretq_s8_u16(*v0); } void VreinterpretqS8U32(int8x16_t* r, uint32x4_t* v0) { *r = vreinterpretq_s8_u32(*v0); } void VreinterpretqS8U64(int8x16_t* r, uint64x2_t* v0) { *r = vreinterpretq_s8_u64(*v0); } void VreinterpretqS8F32(int8x16_t* r, float32x4_t* v0) { *r = vreinterpretq_s8_f32(*v0); } void VreinterpretqS8F64(int8x16_t* r, float64x2_t* v0) { *r = vreinterpretq_s8_f64(*v0); } void VreinterpretqS8P128(int8x16_t* r, poly128_t* v0) { *r = vreinterpretq_s8_p128(*v0); } void VreinterpretqS8P16(int8x16_t* r, poly16x8_t* v0) { *r = vreinterpretq_s8_p16(*v0); } void VreinterpretqS8P64(int8x16_t* r, poly64x2_t* v0) { *r = vreinterpretq_s8_p64(*v0); } void VreinterpretqS8P8(int8x16_t* r, poly8x16_t* v0) { *r = vreinterpretq_s8_p8(*v0); } void VreinterpretqU16S8(uint16x8_t* r, int8x16_t* v0) { *r = vreinterpretq_u16_s8(*v0); } void VreinterpretqU16S16(uint16x8_t* r, int16x8_t* v0) { *r = vreinterpretq_u16_s16(*v0); } void VreinterpretqU16S32(uint16x8_t* r, int32x4_t* v0) { *r = vreinterpretq_u16_s32(*v0); } void VreinterpretqU16S64(uint16x8_t* r, int64x2_t* v0) { *r = vreinterpretq_u16_s64(*v0); } void VreinterpretqU16U8(uint16x8_t* r, uint8x16_t* v0) { *r = vreinterpretq_u16_u8(*v0); } void VreinterpretqU16U32(uint16x8_t* r, uint32x4_t* v0) { *r = vreinterpretq_u16_u32(*v0); } void VreinterpretqU16U64(uint16x8_t* r, uint64x2_t* v0) { *r = vreinterpretq_u16_u64(*v0); } void VreinterpretqU16F32(uint16x8_t* r, float32x4_t* v0) { *r = vreinterpretq_u16_f32(*v0); } void VreinterpretqU16F64(uint16x8_t* r, float64x2_t* v0) { *r = vreinterpretq_u16_f64(*v0); } void VreinterpretqU16P128(uint16x8_t* r, poly128_t* v0) { *r = vreinterpretq_u16_p128(*v0); } void VreinterpretqU16P16(uint16x8_t* r, poly16x8_t* v0) { *r = vreinterpretq_u16_p16(*v0); } void VreinterpretqU16P64(uint16x8_t* r, poly64x2_t* v0) { *r = vreinterpretq_u16_p64(*v0); } void 
VreinterpretqU16P8(uint16x8_t* r, poly8x16_t* v0) { *r = vreinterpretq_u16_p8(*v0); } void VreinterpretqU32S8(uint32x4_t* r, int8x16_t* v0) { *r = vreinterpretq_u32_s8(*v0); } void VreinterpretqU32S16(uint32x4_t* r, int16x8_t* v0) { *r = vreinterpretq_u32_s16(*v0); } void VreinterpretqU32S32(uint32x4_t* r, int32x4_t* v0) { *r = vreinterpretq_u32_s32(*v0); } void VreinterpretqU32S64(uint32x4_t* r, int64x2_t* v0) { *r = vreinterpretq_u32_s64(*v0); } void VreinterpretqU32U8(uint32x4_t* r, uint8x16_t* v0) { *r = vreinterpretq_u32_u8(*v0); } void VreinterpretqU32U16(uint32x4_t* r, uint16x8_t* v0) { *r = vreinterpretq_u32_u16(*v0); } void VreinterpretqU32U64(uint32x4_t* r, uint64x2_t* v0) { *r = vreinterpretq_u32_u64(*v0); } void VreinterpretqU32F32(uint32x4_t* r, float32x4_t* v0) { *r = vreinterpretq_u32_f32(*v0); } void VreinterpretqU32F64(uint32x4_t* r, float64x2_t* v0) { *r = vreinterpretq_u32_f64(*v0); } void VreinterpretqU32P128(uint32x4_t* r, poly128_t* v0) { *r = vreinterpretq_u32_p128(*v0); } void VreinterpretqU32P16(uint32x4_t* r, poly16x8_t* v0) { *r = vreinterpretq_u32_p16(*v0); } void VreinterpretqU32P64(uint32x4_t* r, poly64x2_t* v0) { *r = vreinterpretq_u32_p64(*v0); } void VreinterpretqU32P8(uint32x4_t* r, poly8x16_t* v0) { *r = vreinterpretq_u32_p8(*v0); } void VreinterpretqU64S8(uint64x2_t* r, int8x16_t* v0) { *r = vreinterpretq_u64_s8(*v0); } void VreinterpretqU64S16(uint64x2_t* r, int16x8_t* v0) { *r = vreinterpretq_u64_s16(*v0); } void VreinterpretqU64S32(uint64x2_t* r, int32x4_t* v0) { *r = vreinterpretq_u64_s32(*v0); } void VreinterpretqU64S64(uint64x2_t* r, int64x2_t* v0) { *r = vreinterpretq_u64_s64(*v0); } void VreinterpretqU64U8(uint64x2_t* r, uint8x16_t* v0) { *r = vreinterpretq_u64_u8(*v0); } void VreinterpretqU64U16(uint64x2_t* r, uint16x8_t* v0) { *r = vreinterpretq_u64_u16(*v0); } void VreinterpretqU64U32(uint64x2_t* r, uint32x4_t* v0) { *r = vreinterpretq_u64_u32(*v0); } void VreinterpretqU64F32(uint64x2_t* r, float32x4_t* v0) { *r = 
vreinterpretq_u64_f32(*v0); } void VreinterpretqU64F64(uint64x2_t* r, float64x2_t* v0) { *r = vreinterpretq_u64_f64(*v0); } void VreinterpretqU64P128(uint64x2_t* r, poly128_t* v0) { *r = vreinterpretq_u64_p128(*v0); } void VreinterpretqU64P16(uint64x2_t* r, poly16x8_t* v0) { *r = vreinterpretq_u64_p16(*v0); } void VreinterpretqU64P64(uint64x2_t* r, poly64x2_t* v0) { *r = vreinterpretq_u64_p64(*v0); } void VreinterpretqU64P8(uint64x2_t* r, poly8x16_t* v0) { *r = vreinterpretq_u64_p8(*v0); } void VreinterpretqU8S8(uint8x16_t* r, int8x16_t* v0) { *r = vreinterpretq_u8_s8(*v0); } void VreinterpretqU8S16(uint8x16_t* r, int16x8_t* v0) { *r = vreinterpretq_u8_s16(*v0); } void VreinterpretqU8S32(uint8x16_t* r, int32x4_t* v0) { *r = vreinterpretq_u8_s32(*v0); } void VreinterpretqU8S64(uint8x16_t* r, int64x2_t* v0) { *r = vreinterpretq_u8_s64(*v0); } void VreinterpretqU8U16(uint8x16_t* r, uint16x8_t* v0) { *r = vreinterpretq_u8_u16(*v0); } void VreinterpretqU8U32(uint8x16_t* r, uint32x4_t* v0) { *r = vreinterpretq_u8_u32(*v0); } void VreinterpretqU8U64(uint8x16_t* r, uint64x2_t* v0) { *r = vreinterpretq_u8_u64(*v0); } void VreinterpretqU8F32(uint8x16_t* r, float32x4_t* v0) { *r = vreinterpretq_u8_f32(*v0); } void VreinterpretqU8F64(uint8x16_t* r, float64x2_t* v0) { *r = vreinterpretq_u8_f64(*v0); } void VreinterpretqU8P128(uint8x16_t* r, poly128_t* v0) { *r = vreinterpretq_u8_p128(*v0); } void VreinterpretqU8P16(uint8x16_t* r, poly16x8_t* v0) { *r = vreinterpretq_u8_p16(*v0); } void VreinterpretqU8P64(uint8x16_t* r, poly64x2_t* v0) { *r = vreinterpretq_u8_p64(*v0); } void VreinterpretqU8P8(uint8x16_t* r, poly8x16_t* v0) { *r = vreinterpretq_u8_p8(*v0); } void Vrev16S8(int8x8_t* r, int8x8_t* v0) { *r = vrev16_s8(*v0); } void Vrev16U8(uint8x8_t* r, uint8x8_t* v0) { *r = vrev16_u8(*v0); } void Vrev16P8(poly8x8_t* r, poly8x8_t* v0) { *r = vrev16_p8(*v0); } void Vrev16QS8(int8x16_t* r, int8x16_t* v0) { *r = vrev16q_s8(*v0); } void Vrev16QU8(uint8x16_t* r, uint8x16_t* v0) { *r = 
vrev16q_u8(*v0); } void Vrev16QP8(poly8x16_t* r, poly8x16_t* v0) { *r = vrev16q_p8(*v0); } void Vrev32S8(int8x8_t* r, int8x8_t* v0) { *r = vrev32_s8(*v0); } void Vrev32S16(int16x4_t* r, int16x4_t* v0) { *r = vrev32_s16(*v0); } void Vrev32U8(uint8x8_t* r, uint8x8_t* v0) { *r = vrev32_u8(*v0); } void Vrev32U16(uint16x4_t* r, uint16x4_t* v0) { *r = vrev32_u16(*v0); } void Vrev32P16(poly16x4_t* r, poly16x4_t* v0) { *r = vrev32_p16(*v0); } void Vrev32P8(poly8x8_t* r, poly8x8_t* v0) { *r = vrev32_p8(*v0); } void Vrev32QS8(int8x16_t* r, int8x16_t* v0) { *r = vrev32q_s8(*v0); } void Vrev32QS16(int16x8_t* r, int16x8_t* v0) { *r = vrev32q_s16(*v0); } void Vrev32QU8(uint8x16_t* r, uint8x16_t* v0) { *r = vrev32q_u8(*v0); } void Vrev32QU16(uint16x8_t* r, uint16x8_t* v0) { *r = vrev32q_u16(*v0); } void Vrev32QP16(poly16x8_t* r, poly16x8_t* v0) { *r = vrev32q_p16(*v0); } void Vrev32QP8(poly8x16_t* r, poly8x16_t* v0) { *r = vrev32q_p8(*v0); } void Vrev64S8(int8x8_t* r, int8x8_t* v0) { *r = vrev64_s8(*v0); } void Vrev64S16(int16x4_t* r, int16x4_t* v0) { *r = vrev64_s16(*v0); } void Vrev64S32(int32x2_t* r, int32x2_t* v0) { *r = vrev64_s32(*v0); } void Vrev64U8(uint8x8_t* r, uint8x8_t* v0) { *r = vrev64_u8(*v0); } void Vrev64U16(uint16x4_t* r, uint16x4_t* v0) { *r = vrev64_u16(*v0); } void Vrev64U32(uint32x2_t* r, uint32x2_t* v0) { *r = vrev64_u32(*v0); } void Vrev64F32(float32x2_t* r, float32x2_t* v0) { *r = vrev64_f32(*v0); } void Vrev64P16(poly16x4_t* r, poly16x4_t* v0) { *r = vrev64_p16(*v0); } void Vrev64P8(poly8x8_t* r, poly8x8_t* v0) { *r = vrev64_p8(*v0); } void Vrev64QS8(int8x16_t* r, int8x16_t* v0) { *r = vrev64q_s8(*v0); } void Vrev64QS16(int16x8_t* r, int16x8_t* v0) { *r = vrev64q_s16(*v0); } void Vrev64QS32(int32x4_t* r, int32x4_t* v0) { *r = vrev64q_s32(*v0); } void Vrev64QU8(uint8x16_t* r, uint8x16_t* v0) { *r = vrev64q_u8(*v0); } void Vrev64QU16(uint16x8_t* r, uint16x8_t* v0) { *r = vrev64q_u16(*v0); } void Vrev64QU32(uint32x4_t* r, uint32x4_t* v0) { *r = 
vrev64q_u32(*v0); } void Vrev64QF32(float32x4_t* r, float32x4_t* v0) { *r = vrev64q_f32(*v0); } void Vrev64QP16(poly16x8_t* r, poly16x8_t* v0) { *r = vrev64q_p16(*v0); } void Vrev64QP8(poly8x16_t* r, poly8x16_t* v0) { *r = vrev64q_p8(*v0); } void VrhaddS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vrhadd_s8(*v0, *v1); } void VrhaddS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vrhadd_s16(*v0, *v1); } void VrhaddS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vrhadd_s32(*v0, *v1); } void VrhaddU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vrhadd_u8(*v0, *v1); } void VrhaddU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vrhadd_u16(*v0, *v1); } void VrhaddU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vrhadd_u32(*v0, *v1); } void VrhaddqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vrhaddq_s8(*v0, *v1); } void VrhaddqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vrhaddq_s16(*v0, *v1); } void VrhaddqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vrhaddq_s32(*v0, *v1); } void VrhaddqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vrhaddq_u8(*v0, *v1); } void VrhaddqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vrhaddq_u16(*v0, *v1); } void VrhaddqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vrhaddq_u32(*v0, *v1); } void VrndF32(float32x2_t* r, float32x2_t* v0) { *r = vrnd_f32(*v0); } void VrndF64(float64x1_t* r, float64x1_t* v0) { *r = vrnd_f64(*v0); } void Vrnd32XF32(float32x2_t* r, float32x2_t* v0) { *r = vrnd32x_f32(*v0); } void Vrnd32XF64(float64x1_t* r, float64x1_t* v0) { *r = vrnd32x_f64(*v0); } void Vrnd32XqF32(float32x4_t* r, float32x4_t* v0) { *r = vrnd32xq_f32(*v0); } void Vrnd32XqF64(float64x2_t* r, float64x2_t* v0) { *r = vrnd32xq_f64(*v0); } void Vrnd32ZF32(float32x2_t* r, float32x2_t* v0) { *r = vrnd32z_f32(*v0); } void Vrnd32ZF64(float64x1_t* r, float64x1_t* v0) { *r = vrnd32z_f64(*v0); } void Vrnd32ZqF32(float32x4_t* r, float32x4_t* v0) { *r = 
vrnd32zq_f32(*v0); } void Vrnd32ZqF64(float64x2_t* r, float64x2_t* v0) { *r = vrnd32zq_f64(*v0); } void Vrnd64XF32(float32x2_t* r, float32x2_t* v0) { *r = vrnd64x_f32(*v0); } void Vrnd64XF64(float64x1_t* r, float64x1_t* v0) { *r = vrnd64x_f64(*v0); } void Vrnd64XqF32(float32x4_t* r, float32x4_t* v0) { *r = vrnd64xq_f32(*v0); } void Vrnd64XqF64(float64x2_t* r, float64x2_t* v0) { *r = vrnd64xq_f64(*v0); } void Vrnd64ZF32(float32x2_t* r, float32x2_t* v0) { *r = vrnd64z_f32(*v0); } void Vrnd64ZF64(float64x1_t* r, float64x1_t* v0) { *r = vrnd64z_f64(*v0); } void Vrnd64ZqF32(float32x4_t* r, float32x4_t* v0) { *r = vrnd64zq_f32(*v0); } void Vrnd64ZqF64(float64x2_t* r, float64x2_t* v0) { *r = vrnd64zq_f64(*v0); } void VrndaF32(float32x2_t* r, float32x2_t* v0) { *r = vrnda_f32(*v0); } void VrndaF64(float64x1_t* r, float64x1_t* v0) { *r = vrnda_f64(*v0); } void VrndaqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndaq_f32(*v0); } void VrndaqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndaq_f64(*v0); } void VrndiF32(float32x2_t* r, float32x2_t* v0) { *r = vrndi_f32(*v0); } void VrndiF64(float64x1_t* r, float64x1_t* v0) { *r = vrndi_f64(*v0); } void VrndiqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndiq_f32(*v0); } void VrndiqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndiq_f64(*v0); } void VrndmF32(float32x2_t* r, float32x2_t* v0) { *r = vrndm_f32(*v0); } void VrndmF64(float64x1_t* r, float64x1_t* v0) { *r = vrndm_f64(*v0); } void VrndmqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndmq_f32(*v0); } void VrndmqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndmq_f64(*v0); } void VrndnF32(float32x2_t* r, float32x2_t* v0) { *r = vrndn_f32(*v0); } void VrndnF64(float64x1_t* r, float64x1_t* v0) { *r = vrndn_f64(*v0); } void VrndnqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndnq_f32(*v0); } void VrndnqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndnq_f64(*v0); } void VrndnsF32(float32_t* r, float32_t* v0) { *r = vrndns_f32(*v0); } void VrndpF32(float32x2_t* r, float32x2_t* v0) 
{ *r = vrndp_f32(*v0); } void VrndpF64(float64x1_t* r, float64x1_t* v0) { *r = vrndp_f64(*v0); } void VrndpqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndpq_f32(*v0); } void VrndpqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndpq_f64(*v0); } void VrndqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndq_f32(*v0); } void VrndqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndq_f64(*v0); } void VrndxF32(float32x2_t* r, float32x2_t* v0) { *r = vrndx_f32(*v0); } void VrndxF64(float64x1_t* r, float64x1_t* v0) { *r = vrndx_f64(*v0); } void VrndxqF32(float32x4_t* r, float32x4_t* v0) { *r = vrndxq_f32(*v0); } void VrndxqF64(float64x2_t* r, float64x2_t* v0) { *r = vrndxq_f64(*v0); } void VrshlS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vrshl_s8(*v0, *v1); } void VrshlS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vrshl_s16(*v0, *v1); } void VrshlS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vrshl_s32(*v0, *v1); } void VrshlS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vrshl_s64(*v0, *v1); } void VrshlU8(uint8x8_t* r, uint8x8_t* v0, int8x8_t* v1) { *r = vrshl_u8(*v0, *v1); } void VrshlU16(uint16x4_t* r, uint16x4_t* v0, int16x4_t* v1) { *r = vrshl_u16(*v0, *v1); } void VrshlU32(uint32x2_t* r, uint32x2_t* v0, int32x2_t* v1) { *r = vrshl_u32(*v0, *v1); } void VrshlU64(uint64x1_t* r, uint64x1_t* v0, int64x1_t* v1) { *r = vrshl_u64(*v0, *v1); } void VrshldS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vrshld_s64(*v0, *v1); } void VrshldU64(uint64_t* r, uint64_t* v0, int64_t* v1) { *r = vrshld_u64(*v0, *v1); } void VrshlqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vrshlq_s8(*v0, *v1); } void VrshlqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vrshlq_s16(*v0, *v1); } void VrshlqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vrshlq_s32(*v0, *v1); } void VrshlqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vrshlq_s64(*v0, *v1); } void VrshlqU8(uint8x16_t* r, uint8x16_t* v0, int8x16_t* v1) { *r = vrshlq_u8(*v0, *v1); 
} void VrshlqU16(uint16x8_t* r, uint16x8_t* v0, int16x8_t* v1) { *r = vrshlq_u16(*v0, *v1); } void VrshlqU32(uint32x4_t* r, uint32x4_t* v0, int32x4_t* v1) { *r = vrshlq_u32(*v0, *v1); } void VrshlqU64(uint64x2_t* r, uint64x2_t* v0, int64x2_t* v1) { *r = vrshlq_u64(*v0, *v1); } void VrsqrteU32(uint32x2_t* r, uint32x2_t* v0) { *r = vrsqrte_u32(*v0); } void VrsqrteF32(float32x2_t* r, float32x2_t* v0) { *r = vrsqrte_f32(*v0); } void VrsqrteF64(float64x1_t* r, float64x1_t* v0) { *r = vrsqrte_f64(*v0); } void VrsqrtedF64(float64_t* r, float64_t* v0) { *r = vrsqrted_f64(*v0); } void VrsqrteqU32(uint32x4_t* r, uint32x4_t* v0) { *r = vrsqrteq_u32(*v0); } void VrsqrteqF32(float32x4_t* r, float32x4_t* v0) { *r = vrsqrteq_f32(*v0); } void VrsqrteqF64(float64x2_t* r, float64x2_t* v0) { *r = vrsqrteq_f64(*v0); } void VrsqrtesF32(float32_t* r, float32_t* v0) { *r = vrsqrtes_f32(*v0); } void VrsqrtsF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vrsqrts_f32(*v0, *v1); } void VrsqrtsF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vrsqrts_f64(*v0, *v1); } void VrsqrtsdF64(float64_t* r, float64_t* v0, float64_t* v1) { *r = vrsqrtsd_f64(*v0, *v1); } void VrsqrtsqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vrsqrtsq_f32(*v0, *v1); } void VrsqrtsqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vrsqrtsq_f64(*v0, *v1); } void VrsqrtssF32(float32_t* r, float32_t* v0, float32_t* v1) { *r = vrsqrtss_f32(*v0, *v1); } void VrsubhnS16(int8x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vrsubhn_s16(*v0, *v1); } void VrsubhnS32(int16x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vrsubhn_s32(*v0, *v1); } void VrsubhnS64(int32x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vrsubhn_s64(*v0, *v1); } void VrsubhnU16(uint8x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vrsubhn_u16(*v0, *v1); } void VrsubhnU32(uint16x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vrsubhn_u32(*v0, *v1); } void VrsubhnU64(uint32x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r 
= vrsubhn_u64(*v0, *v1); } void VrsubhnHighS16(int8x16_t* r, int8x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vrsubhn_high_s16(*v0, *v1, *v2); } void VrsubhnHighS32(int16x8_t* r, int16x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vrsubhn_high_s32(*v0, *v1, *v2); } void VrsubhnHighS64(int32x4_t* r, int32x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = vrsubhn_high_s64(*v0, *v1, *v2); } void VrsubhnHighU16(uint8x16_t* r, uint8x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vrsubhn_high_u16(*v0, *v1, *v2); } void VrsubhnHighU32(uint16x8_t* r, uint16x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vrsubhn_high_u32(*v0, *v1, *v2); } void VrsubhnHighU64(uint32x4_t* r, uint32x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vrsubhn_high_u64(*v0, *v1, *v2); } void Vsha1CqU32(uint32x4_t* r, uint32x4_t* v0, uint32_t* v1, uint32x4_t* v2) { *r = vsha1cq_u32(*v0, *v1, *v2); } void Vsha1HU32(uint32_t* r, uint32_t* v0) { *r = vsha1h_u32(*v0); } void Vsha1MqU32(uint32x4_t* r, uint32x4_t* v0, uint32_t* v1, uint32x4_t* v2) { *r = vsha1mq_u32(*v0, *v1, *v2); } void Vsha1PqU32(uint32x4_t* r, uint32x4_t* v0, uint32_t* v1, uint32x4_t* v2) { *r = vsha1pq_u32(*v0, *v1, *v2); } void Vsha1Su0QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsha1su0q_u32(*v0, *v1, *v2); } void Vsha1Su1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vsha1su1q_u32(*v0, *v1); } void Vsha256H2QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsha256h2q_u32(*v0, *v1, *v2); } void Vsha256HqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsha256hq_u32(*v0, *v1, *v2); } void Vsha256Su0QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vsha256su0q_u32(*v0, *v1); } void Vsha256Su1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsha256su1q_u32(*v0, *v1, *v2); } void Vsha512H2QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vsha512h2q_u64(*v0, *v1, *v2); } void 
Vsha512HqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vsha512hq_u64(*v0, *v1, *v2); } void Vsha512Su0QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vsha512su0q_u64(*v0, *v1); } void Vsha512Su1QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vsha512su1q_u64(*v0, *v1, *v2); } void VshlS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vshl_s8(*v0, *v1); } void VshlS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vshl_s16(*v0, *v1); } void VshlS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vshl_s32(*v0, *v1); } void VshlS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vshl_s64(*v0, *v1); } void VshlU8(uint8x8_t* r, uint8x8_t* v0, int8x8_t* v1) { *r = vshl_u8(*v0, *v1); } void VshlU16(uint16x4_t* r, uint16x4_t* v0, int16x4_t* v1) { *r = vshl_u16(*v0, *v1); } void VshlU32(uint32x2_t* r, uint32x2_t* v0, int32x2_t* v1) { *r = vshl_u32(*v0, *v1); } void VshlU64(uint64x1_t* r, uint64x1_t* v0, int64x1_t* v1) { *r = vshl_u64(*v0, *v1); } void VshldS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vshld_s64(*v0, *v1); } void VshldU64(uint64_t* r, uint64_t* v0, int64_t* v1) { *r = vshld_u64(*v0, *v1); } void VshlqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vshlq_s8(*v0, *v1); } void VshlqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vshlq_s16(*v0, *v1); } void VshlqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vshlq_s32(*v0, *v1); } void VshlqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vshlq_s64(*v0, *v1); } void VshlqU8(uint8x16_t* r, uint8x16_t* v0, int8x16_t* v1) { *r = vshlq_u8(*v0, *v1); } void VshlqU16(uint16x8_t* r, uint16x8_t* v0, int16x8_t* v1) { *r = vshlq_u16(*v0, *v1); } void VshlqU32(uint32x4_t* r, uint32x4_t* v0, int32x4_t* v1) { *r = vshlq_u32(*v0, *v1); } void VshlqU64(uint64x2_t* r, uint64x2_t* v0, int64x2_t* v1) { *r = vshlq_u64(*v0, *v1); } void Vsm3Partw1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = 
vsm3partw1q_u32(*v0, *v1, *v2); } void Vsm3Partw2QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsm3partw2q_u32(*v0, *v1, *v2); } void Vsm3Ss1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1, uint32x4_t* v2) { *r = vsm3ss1q_u32(*v0, *v1, *v2); } void Vsm4EkeyqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vsm4ekeyq_u32(*v0, *v1); } void Vsm4EqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vsm4eq_u32(*v0, *v1); } void VsqaddU8(uint8x8_t* r, uint8x8_t* v0, int8x8_t* v1) { *r = vsqadd_u8(*v0, *v1); } void VsqaddU16(uint16x4_t* r, uint16x4_t* v0, int16x4_t* v1) { *r = vsqadd_u16(*v0, *v1); } void VsqaddU32(uint32x2_t* r, uint32x2_t* v0, int32x2_t* v1) { *r = vsqadd_u32(*v0, *v1); } void VsqaddU64(uint64x1_t* r, uint64x1_t* v0, int64x1_t* v1) { *r = vsqadd_u64(*v0, *v1); } void VsqaddbU8(uint8_t* r, uint8_t* v0, int8_t* v1) { *r = vsqaddb_u8(*v0, *v1); } void VsqadddU64(uint64_t* r, uint64_t* v0, int64_t* v1) { *r = vsqaddd_u64(*v0, *v1); } void VsqaddhU16(uint16_t* r, uint16_t* v0, int16_t* v1) { *r = vsqaddh_u16(*v0, *v1); } void VsqaddqU8(uint8x16_t* r, uint8x16_t* v0, int8x16_t* v1) { *r = vsqaddq_u8(*v0, *v1); } void VsqaddqU16(uint16x8_t* r, uint16x8_t* v0, int16x8_t* v1) { *r = vsqaddq_u16(*v0, *v1); } void VsqaddqU32(uint32x4_t* r, uint32x4_t* v0, int32x4_t* v1) { *r = vsqaddq_u32(*v0, *v1); } void VsqaddqU64(uint64x2_t* r, uint64x2_t* v0, int64x2_t* v1) { *r = vsqaddq_u64(*v0, *v1); } void VsqaddsU32(uint32_t* r, uint32_t* v0, int32_t* v1) { *r = vsqadds_u32(*v0, *v1); } void VsqrtF32(float32x2_t* r, float32x2_t* v0) { *r = vsqrt_f32(*v0); } void VsqrtF64(float64x1_t* r, float64x1_t* v0) { *r = vsqrt_f64(*v0); } void VsqrtqF32(float32x4_t* r, float32x4_t* v0) { *r = vsqrtq_f32(*v0); } void VsqrtqF64(float64x2_t* r, float64x2_t* v0) { *r = vsqrtq_f64(*v0); } void VsubS8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vsub_s8(*v0, *v1); } void VsubS16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = 
vsub_s16(*v0, *v1); } void VsubS32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vsub_s32(*v0, *v1); } void VsubS64(int64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vsub_s64(*v0, *v1); } void VsubU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vsub_u8(*v0, *v1); } void VsubU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vsub_u16(*v0, *v1); } void VsubU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vsub_u32(*v0, *v1); } void VsubU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vsub_u64(*v0, *v1); } void VsubF32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vsub_f32(*v0, *v1); } void VsubF64(float64x1_t* r, float64x1_t* v0, float64x1_t* v1) { *r = vsub_f64(*v0, *v1); } void VsubdS64(int64_t* r, int64_t* v0, int64_t* v1) { *r = vsubd_s64(*v0, *v1); } void VsubdU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vsubd_u64(*v0, *v1); } void VsubhnS16(int8x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vsubhn_s16(*v0, *v1); } void VsubhnS32(int16x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vsubhn_s32(*v0, *v1); } void VsubhnS64(int32x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vsubhn_s64(*v0, *v1); } void VsubhnU16(uint8x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vsubhn_u16(*v0, *v1); } void VsubhnU32(uint16x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vsubhn_u32(*v0, *v1); } void VsubhnU64(uint32x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vsubhn_u64(*v0, *v1); } void VsubhnHighS16(int8x16_t* r, int8x8_t* v0, int16x8_t* v1, int16x8_t* v2) { *r = vsubhn_high_s16(*v0, *v1, *v2); } void VsubhnHighS32(int16x8_t* r, int16x4_t* v0, int32x4_t* v1, int32x4_t* v2) { *r = vsubhn_high_s32(*v0, *v1, *v2); } void VsubhnHighS64(int32x4_t* r, int32x2_t* v0, int64x2_t* v1, int64x2_t* v2) { *r = vsubhn_high_s64(*v0, *v1, *v2); } void VsubhnHighU16(uint8x16_t* r, uint8x8_t* v0, uint16x8_t* v1, uint16x8_t* v2) { *r = vsubhn_high_u16(*v0, *v1, *v2); } void VsubhnHighU32(uint16x8_t* r, uint16x4_t* v0, uint32x4_t* v1, uint32x4_t* 
v2) { *r = vsubhn_high_u32(*v0, *v1, *v2); } void VsubhnHighU64(uint32x4_t* r, uint32x2_t* v0, uint64x2_t* v1, uint64x2_t* v2) { *r = vsubhn_high_u64(*v0, *v1, *v2); } void VsublS8(int16x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vsubl_s8(*v0, *v1); } void VsublS16(int32x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vsubl_s16(*v0, *v1); } void VsublS32(int64x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vsubl_s32(*v0, *v1); } void VsublU8(uint16x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vsubl_u8(*v0, *v1); } void VsublU16(uint32x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vsubl_u16(*v0, *v1); } void VsublU32(uint64x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vsubl_u32(*v0, *v1); } void VsublHighS8(int16x8_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vsubl_high_s8(*v0, *v1); } void VsublHighS16(int32x4_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vsubl_high_s16(*v0, *v1); } void VsublHighS32(int64x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vsubl_high_s32(*v0, *v1); } void VsublHighU8(uint16x8_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vsubl_high_u8(*v0, *v1); } void VsublHighU16(uint32x4_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vsubl_high_u16(*v0, *v1); } void VsublHighU32(uint64x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vsubl_high_u32(*v0, *v1); } void VsubqS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vsubq_s8(*v0, *v1); } void VsubqS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vsubq_s16(*v0, *v1); } void VsubqS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vsubq_s32(*v0, *v1); } void VsubqS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vsubq_s64(*v0, *v1); } void VsubqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vsubq_u8(*v0, *v1); } void VsubqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vsubq_u16(*v0, *v1); } void VsubqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vsubq_u32(*v0, *v1); } void VsubqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vsubq_u64(*v0, *v1); } 
void VsubqF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vsubq_f32(*v0, *v1); } void VsubqF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vsubq_f64(*v0, *v1); } void VsubwS8(int16x8_t* r, int16x8_t* v0, int8x8_t* v1) { *r = vsubw_s8(*v0, *v1); } void VsubwS16(int32x4_t* r, int32x4_t* v0, int16x4_t* v1) { *r = vsubw_s16(*v0, *v1); } void VsubwS32(int64x2_t* r, int64x2_t* v0, int32x2_t* v1) { *r = vsubw_s32(*v0, *v1); } void VsubwU8(uint16x8_t* r, uint16x8_t* v0, uint8x8_t* v1) { *r = vsubw_u8(*v0, *v1); } void VsubwU16(uint32x4_t* r, uint32x4_t* v0, uint16x4_t* v1) { *r = vsubw_u16(*v0, *v1); } void VsubwU32(uint64x2_t* r, uint64x2_t* v0, uint32x2_t* v1) { *r = vsubw_u32(*v0, *v1); } void VsubwHighS8(int16x8_t* r, int16x8_t* v0, int8x16_t* v1) { *r = vsubw_high_s8(*v0, *v1); } void VsubwHighS16(int32x4_t* r, int32x4_t* v0, int16x8_t* v1) { *r = vsubw_high_s16(*v0, *v1); } void VsubwHighS32(int64x2_t* r, int64x2_t* v0, int32x4_t* v1) { *r = vsubw_high_s32(*v0, *v1); } void VsubwHighU8(uint16x8_t* r, uint16x8_t* v0, uint8x16_t* v1) { *r = vsubw_high_u8(*v0, *v1); } void VsubwHighU16(uint32x4_t* r, uint32x4_t* v0, uint16x8_t* v1) { *r = vsubw_high_u16(*v0, *v1); } void VsubwHighU32(uint64x2_t* r, uint64x2_t* v0, uint32x4_t* v1) { *r = vsubw_high_u32(*v0, *v1); } void Vtbl1S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vtbl1_s8(*v0, *v1); } void Vtbl1U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vtbl1_u8(*v0, *v1); } void Vtbl1P8(poly8x8_t* r, poly8x8_t* v0, uint8x8_t* v1) { *r = vtbl1_p8(*v0, *v1); } void Vtbl2S8(int8x8_t* r, int8x8x2_t* v0, int8x8_t* v1) { *r = vtbl2_s8(*v0, *v1); } void Vtbl2U8(uint8x8_t* r, uint8x8x2_t* v0, uint8x8_t* v1) { *r = vtbl2_u8(*v0, *v1); } void Vtbl2P8(poly8x8_t* r, poly8x8x2_t* v0, uint8x8_t* v1) { *r = vtbl2_p8(*v0, *v1); } void Vtbl3S8(int8x8_t* r, int8x8x3_t* v0, int8x8_t* v1) { *r = vtbl3_s8(*v0, *v1); } void Vtbl3U8(uint8x8_t* r, uint8x8x3_t* v0, uint8x8_t* v1) { *r = vtbl3_u8(*v0, *v1); } void 
Vtbl3P8(poly8x8_t* r, poly8x8x3_t* v0, uint8x8_t* v1) { *r = vtbl3_p8(*v0, *v1); } void Vtbl4S8(int8x8_t* r, int8x8x4_t* v0, int8x8_t* v1) { *r = vtbl4_s8(*v0, *v1); } void Vtbl4U8(uint8x8_t* r, uint8x8x4_t* v0, uint8x8_t* v1) { *r = vtbl4_u8(*v0, *v1); } void Vtbl4P8(poly8x8_t* r, poly8x8x4_t* v0, uint8x8_t* v1) { *r = vtbl4_p8(*v0, *v1); } void Vtbx1S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1, int8x8_t* v2) { *r = vtbx1_s8(*v0, *v1, *v2); } void Vtbx1U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1, uint8x8_t* v2) { *r = vtbx1_u8(*v0, *v1, *v2); } void Vtbx1P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1, uint8x8_t* v2) { *r = vtbx1_p8(*v0, *v1, *v2); } void Vtbx2S8(int8x8_t* r, int8x8_t* v0, int8x8x2_t* v1, int8x8_t* v2) { *r = vtbx2_s8(*v0, *v1, *v2); } void Vtbx2U8(uint8x8_t* r, uint8x8_t* v0, uint8x8x2_t* v1, uint8x8_t* v2) { *r = vtbx2_u8(*v0, *v1, *v2); } void Vtbx2P8(poly8x8_t* r, poly8x8_t* v0, poly8x8x2_t* v1, uint8x8_t* v2) { *r = vtbx2_p8(*v0, *v1, *v2); } void Vtbx3S8(int8x8_t* r, int8x8_t* v0, int8x8x3_t* v1, int8x8_t* v2) { *r = vtbx3_s8(*v0, *v1, *v2); } void Vtbx3U8(uint8x8_t* r, uint8x8_t* v0, uint8x8x3_t* v1, uint8x8_t* v2) { *r = vtbx3_u8(*v0, *v1, *v2); } void Vtbx3P8(poly8x8_t* r, poly8x8_t* v0, poly8x8x3_t* v1, uint8x8_t* v2) { *r = vtbx3_p8(*v0, *v1, *v2); } void Vtbx4S8(int8x8_t* r, int8x8_t* v0, int8x8x4_t* v1, int8x8_t* v2) { *r = vtbx4_s8(*v0, *v1, *v2); } void Vtbx4U8(uint8x8_t* r, uint8x8_t* v0, uint8x8x4_t* v1, uint8x8_t* v2) { *r = vtbx4_u8(*v0, *v1, *v2); } void Vtbx4P8(poly8x8_t* r, poly8x8_t* v0, poly8x8x4_t* v1, uint8x8_t* v2) { *r = vtbx4_p8(*v0, *v1, *v2); } void VtrnS8(int8x8x2_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vtrn_s8(*v0, *v1); } void VtrnS16(int16x4x2_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vtrn_s16(*v0, *v1); } void VtrnS32(int32x2x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vtrn_s32(*v0, *v1); } void VtrnU8(uint8x8x2_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vtrn_u8(*v0, *v1); } void VtrnU16(uint16x4x2_t* r, 
uint16x4_t* v0, uint16x4_t* v1) { *r = vtrn_u16(*v0, *v1); } void VtrnU32(uint32x2x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vtrn_u32(*v0, *v1); } void VtrnF32(float32x2x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vtrn_f32(*v0, *v1); } void Vtrn1S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vtrn1_s8(*v0, *v1); } void Vtrn1S16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vtrn1_s16(*v0, *v1); } void Vtrn1S32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vtrn1_s32(*v0, *v1); } void Vtrn1U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vtrn1_u8(*v0, *v1); } void Vtrn1U16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vtrn1_u16(*v0, *v1); } void Vtrn1U32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vtrn1_u32(*v0, *v1); } void Vtrn1F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vtrn1_f32(*v0, *v1); } void Vtrn1P16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vtrn1_p16(*v0, *v1); } void Vtrn1P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vtrn1_p8(*v0, *v1); } void Vtrn1QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vtrn1q_s8(*v0, *v1); } void Vtrn1QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vtrn1q_s16(*v0, *v1); } void Vtrn1QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vtrn1q_s32(*v0, *v1); } void Vtrn1QS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vtrn1q_s64(*v0, *v1); } void Vtrn1QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vtrn1q_u8(*v0, *v1); } void Vtrn1QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vtrn1q_u16(*v0, *v1); } void Vtrn1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vtrn1q_u32(*v0, *v1); } void Vtrn1QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vtrn1q_u64(*v0, *v1); } void Vtrn1QF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vtrn1q_f32(*v0, *v1); } void Vtrn1QF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vtrn1q_f64(*v0, *v1); } void Vtrn1QP16(poly16x8_t* r, 
poly16x8_t* v0, poly16x8_t* v1) { *r = vtrn1q_p16(*v0, *v1); } void Vtrn1QP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vtrn1q_p64(*v0, *v1); } void Vtrn1QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vtrn1q_p8(*v0, *v1); } void Vtrn2S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vtrn2_s8(*v0, *v1); } void Vtrn2S16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vtrn2_s16(*v0, *v1); } void Vtrn2S32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vtrn2_s32(*v0, *v1); } void Vtrn2U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vtrn2_u8(*v0, *v1); } void Vtrn2U16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vtrn2_u16(*v0, *v1); } void Vtrn2U32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vtrn2_u32(*v0, *v1); } void Vtrn2F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vtrn2_f32(*v0, *v1); } void Vtrn2P16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vtrn2_p16(*v0, *v1); } void Vtrn2P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vtrn2_p8(*v0, *v1); } void Vtrn2QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vtrn2q_s8(*v0, *v1); } void Vtrn2QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vtrn2q_s16(*v0, *v1); } void Vtrn2QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vtrn2q_s32(*v0, *v1); } void Vtrn2QS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vtrn2q_s64(*v0, *v1); } void Vtrn2QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vtrn2q_u8(*v0, *v1); } void Vtrn2QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vtrn2q_u16(*v0, *v1); } void Vtrn2QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vtrn2q_u32(*v0, *v1); } void Vtrn2QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vtrn2q_u64(*v0, *v1); } void Vtrn2QF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vtrn2q_f32(*v0, *v1); } void Vtrn2QF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vtrn2q_f64(*v0, *v1); } void Vtrn2QP16(poly16x8_t* r, 
poly16x8_t* v0, poly16x8_t* v1) { *r = vtrn2q_p16(*v0, *v1); } void Vtrn2QP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vtrn2q_p64(*v0, *v1); } void Vtrn2QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vtrn2q_p8(*v0, *v1); } void VtrnP16(poly16x4x2_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vtrn_p16(*v0, *v1); } void VtrnP8(poly8x8x2_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vtrn_p8(*v0, *v1); } void VtrnqS8(int8x16x2_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vtrnq_s8(*v0, *v1); } void VtrnqS16(int16x8x2_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vtrnq_s16(*v0, *v1); } void VtrnqS32(int32x4x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vtrnq_s32(*v0, *v1); } void VtrnqU8(uint8x16x2_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vtrnq_u8(*v0, *v1); } void VtrnqU16(uint16x8x2_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vtrnq_u16(*v0, *v1); } void VtrnqU32(uint32x4x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vtrnq_u32(*v0, *v1); } void VtrnqF32(float32x4x2_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vtrnq_f32(*v0, *v1); } void VtrnqP16(poly16x8x2_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vtrnq_p16(*v0, *v1); } void VtrnqP8(poly8x16x2_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vtrnq_p8(*v0, *v1); } void VtstS8(uint8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vtst_s8(*v0, *v1); } void VtstS16(uint16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vtst_s16(*v0, *v1); } void VtstS32(uint32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vtst_s32(*v0, *v1); } void VtstS64(uint64x1_t* r, int64x1_t* v0, int64x1_t* v1) { *r = vtst_s64(*v0, *v1); } void VtstU8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vtst_u8(*v0, *v1); } void VtstU16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vtst_u16(*v0, *v1); } void VtstU32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vtst_u32(*v0, *v1); } void VtstU64(uint64x1_t* r, uint64x1_t* v0, uint64x1_t* v1) { *r = vtst_u64(*v0, *v1); } void VtstP16(uint16x4_t* r, poly16x4_t* v0, poly16x4_t* 
v1) { *r = vtst_p16(*v0, *v1); } void VtstP64(uint64x1_t* r, poly64x1_t* v0, poly64x1_t* v1) { *r = vtst_p64(*v0, *v1); } void VtstP8(uint8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vtst_p8(*v0, *v1); } void VtstdS64(uint64_t* r, int64_t* v0, int64_t* v1) { *r = vtstd_s64(*v0, *v1); } void VtstdU64(uint64_t* r, uint64_t* v0, uint64_t* v1) { *r = vtstd_u64(*v0, *v1); } void VtstqS8(uint8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vtstq_s8(*v0, *v1); } void VtstqS16(uint16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vtstq_s16(*v0, *v1); } void VtstqS32(uint32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vtstq_s32(*v0, *v1); } void VtstqS64(uint64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vtstq_s64(*v0, *v1); } void VtstqU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vtstq_u8(*v0, *v1); } void VtstqU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vtstq_u16(*v0, *v1); } void VtstqU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vtstq_u32(*v0, *v1); } void VtstqU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vtstq_u64(*v0, *v1); } void VtstqP16(uint16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vtstq_p16(*v0, *v1); } void VtstqP64(uint64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vtstq_p64(*v0, *v1); } void VtstqP8(uint8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vtstq_p8(*v0, *v1); } void VuqaddS8(int8x8_t* r, int8x8_t* v0, uint8x8_t* v1) { *r = vuqadd_s8(*v0, *v1); } void VuqaddS16(int16x4_t* r, int16x4_t* v0, uint16x4_t* v1) { *r = vuqadd_s16(*v0, *v1); } void VuqaddS32(int32x2_t* r, int32x2_t* v0, uint32x2_t* v1) { *r = vuqadd_s32(*v0, *v1); } void VuqaddS64(int64x1_t* r, int64x1_t* v0, uint64x1_t* v1) { *r = vuqadd_s64(*v0, *v1); } void VuqaddbS8(int8_t* r, int8_t* v0, uint8_t* v1) { *r = vuqaddb_s8(*v0, *v1); } void VuqadddS64(int64_t* r, int64_t* v0, uint64_t* v1) { *r = vuqaddd_s64(*v0, *v1); } void VuqaddhS16(int16_t* r, int16_t* v0, uint16_t* v1) { *r = vuqaddh_s16(*v0, *v1); } void 
VuqaddqS8(int8x16_t* r, int8x16_t* v0, uint8x16_t* v1) { *r = vuqaddq_s8(*v0, *v1); } void VuqaddqS16(int16x8_t* r, int16x8_t* v0, uint16x8_t* v1) { *r = vuqaddq_s16(*v0, *v1); } void VuqaddqS32(int32x4_t* r, int32x4_t* v0, uint32x4_t* v1) { *r = vuqaddq_s32(*v0, *v1); } void VuqaddqS64(int64x2_t* r, int64x2_t* v0, uint64x2_t* v1) { *r = vuqaddq_s64(*v0, *v1); } void VuqaddsS32(int32_t* r, int32_t* v0, uint32_t* v1) { *r = vuqadds_s32(*v0, *v1); } void VusdotS32(int32x2_t* r, int32x2_t* v0, uint8x8_t* v1, int8x8_t* v2) { *r = vusdot_s32(*v0, *v1, *v2); } void VusdotqS32(int32x4_t* r, int32x4_t* v0, uint8x16_t* v1, int8x16_t* v2) { *r = vusdotq_s32(*v0, *v1, *v2); } void VusmmlaqS32(int32x4_t* r, int32x4_t* v0, uint8x16_t* v1, int8x16_t* v2) { *r = vusmmlaq_s32(*v0, *v1, *v2); } void VuzpS8(int8x8x2_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vuzp_s8(*v0, *v1); } void VuzpS16(int16x4x2_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vuzp_s16(*v0, *v1); } void VuzpS32(int32x2x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vuzp_s32(*v0, *v1); } void VuzpU8(uint8x8x2_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vuzp_u8(*v0, *v1); } void VuzpU16(uint16x4x2_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vuzp_u16(*v0, *v1); } void VuzpU32(uint32x2x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vuzp_u32(*v0, *v1); } void VuzpF32(float32x2x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vuzp_f32(*v0, *v1); } void Vuzp1S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vuzp1_s8(*v0, *v1); } void Vuzp1S16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vuzp1_s16(*v0, *v1); } void Vuzp1S32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vuzp1_s32(*v0, *v1); } void Vuzp1U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vuzp1_u8(*v0, *v1); } void Vuzp1U16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vuzp1_u16(*v0, *v1); } void Vuzp1U32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vuzp1_u32(*v0, *v1); } void Vuzp1F32(float32x2_t* r, float32x2_t* v0, float32x2_t* 
v1) { *r = vuzp1_f32(*v0, *v1); } void Vuzp1P16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vuzp1_p16(*v0, *v1); } void Vuzp1P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vuzp1_p8(*v0, *v1); } void Vuzp1QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vuzp1q_s8(*v0, *v1); } void Vuzp1QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vuzp1q_s16(*v0, *v1); } void Vuzp1QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vuzp1q_s32(*v0, *v1); } void Vuzp1QS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vuzp1q_s64(*v0, *v1); } void Vuzp1QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vuzp1q_u8(*v0, *v1); } void Vuzp1QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vuzp1q_u16(*v0, *v1); } void Vuzp1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vuzp1q_u32(*v0, *v1); } void Vuzp1QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vuzp1q_u64(*v0, *v1); } void Vuzp1QF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vuzp1q_f32(*v0, *v1); } void Vuzp1QF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vuzp1q_f64(*v0, *v1); } void Vuzp1QP16(poly16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vuzp1q_p16(*v0, *v1); } void Vuzp1QP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vuzp1q_p64(*v0, *v1); } void Vuzp1QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vuzp1q_p8(*v0, *v1); } void Vuzp2S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vuzp2_s8(*v0, *v1); } void Vuzp2S16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vuzp2_s16(*v0, *v1); } void Vuzp2S32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vuzp2_s32(*v0, *v1); } void Vuzp2U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vuzp2_u8(*v0, *v1); } void Vuzp2U16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vuzp2_u16(*v0, *v1); } void Vuzp2U32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vuzp2_u32(*v0, *v1); } void Vuzp2F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) 
{ *r = vuzp2_f32(*v0, *v1); } void Vuzp2P16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vuzp2_p16(*v0, *v1); } void Vuzp2P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vuzp2_p8(*v0, *v1); } void Vuzp2QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vuzp2q_s8(*v0, *v1); } void Vuzp2QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vuzp2q_s16(*v0, *v1); } void Vuzp2QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vuzp2q_s32(*v0, *v1); } void Vuzp2QS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vuzp2q_s64(*v0, *v1); } void Vuzp2QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vuzp2q_u8(*v0, *v1); } void Vuzp2QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vuzp2q_u16(*v0, *v1); } void Vuzp2QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vuzp2q_u32(*v0, *v1); } void Vuzp2QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vuzp2q_u64(*v0, *v1); } void Vuzp2QF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vuzp2q_f32(*v0, *v1); } void Vuzp2QF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vuzp2q_f64(*v0, *v1); } void Vuzp2QP16(poly16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vuzp2q_p16(*v0, *v1); } void Vuzp2QP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vuzp2q_p64(*v0, *v1); } void Vuzp2QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vuzp2q_p8(*v0, *v1); } void VuzpP16(poly16x4x2_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vuzp_p16(*v0, *v1); } void VuzpP8(poly8x8x2_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vuzp_p8(*v0, *v1); } void VuzpqS8(int8x16x2_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vuzpq_s8(*v0, *v1); } void VuzpqS16(int16x8x2_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vuzpq_s16(*v0, *v1); } void VuzpqS32(int32x4x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vuzpq_s32(*v0, *v1); } void VuzpqU8(uint8x16x2_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vuzpq_u8(*v0, *v1); } void VuzpqU16(uint16x8x2_t* r, uint16x8_t* v0, uint16x8_t* 
v1) { *r = vuzpq_u16(*v0, *v1); }
void VuzpqU32(uint32x4x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vuzpq_u32(*v0, *v1); }
void VuzpqF32(float32x4x2_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vuzpq_f32(*v0, *v1); }
void VuzpqP16(poly16x8x2_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vuzpq_p16(*v0, *v1); }
void VuzpqP8(poly8x16x2_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vuzpq_p8(*v0, *v1); }
void VzipS8(int8x8x2_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vzip_s8(*v0, *v1); }
void VzipS16(int16x4x2_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vzip_s16(*v0, *v1); }
void VzipS32(int32x2x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vzip_s32(*v0, *v1); }
void VzipU8(uint8x8x2_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vzip_u8(*v0, *v1); }
void VzipU16(uint16x4x2_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vzip_u16(*v0, *v1); }
void VzipU32(uint32x2x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vzip_u32(*v0, *v1); }
void VzipF32(float32x2x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vzip_f32(*v0, *v1); }
void Vzip1S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vzip1_s8(*v0, *v1); }
void Vzip1S16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vzip1_s16(*v0, *v1); }
void Vzip1S32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vzip1_s32(*v0, *v1); }
void Vzip1U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vzip1_u8(*v0, *v1); }
void Vzip1U16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vzip1_u16(*v0, *v1); }
void Vzip1U32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vzip1_u32(*v0, *v1); }
void Vzip1F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vzip1_f32(*v0, *v1); }
void Vzip1P16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vzip1_p16(*v0, *v1); }
void Vzip1P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vzip1_p8(*v0, *v1); }
void Vzip1QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vzip1q_s8(*v0, *v1); }
void Vzip1QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vzip1q_s16(*v0,
*v1); }
void Vzip1QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vzip1q_s32(*v0, *v1); }
void Vzip1QS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vzip1q_s64(*v0, *v1); }
void Vzip1QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vzip1q_u8(*v0, *v1); }
void Vzip1QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vzip1q_u16(*v0, *v1); }
void Vzip1QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vzip1q_u32(*v0, *v1); }
void Vzip1QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vzip1q_u64(*v0, *v1); }
void Vzip1QF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vzip1q_f32(*v0, *v1); }
void Vzip1QF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vzip1q_f64(*v0, *v1); }
void Vzip1QP16(poly16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vzip1q_p16(*v0, *v1); }
void Vzip1QP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vzip1q_p64(*v0, *v1); }
void Vzip1QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vzip1q_p8(*v0, *v1); }
void Vzip2S8(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) { *r = vzip2_s8(*v0, *v1); }
void Vzip2S16(int16x4_t* r, int16x4_t* v0, int16x4_t* v1) { *r = vzip2_s16(*v0, *v1); }
void Vzip2S32(int32x2_t* r, int32x2_t* v0, int32x2_t* v1) { *r = vzip2_s32(*v0, *v1); }
void Vzip2U8(uint8x8_t* r, uint8x8_t* v0, uint8x8_t* v1) { *r = vzip2_u8(*v0, *v1); }
void Vzip2U16(uint16x4_t* r, uint16x4_t* v0, uint16x4_t* v1) { *r = vzip2_u16(*v0, *v1); }
void Vzip2U32(uint32x2_t* r, uint32x2_t* v0, uint32x2_t* v1) { *r = vzip2_u32(*v0, *v1); }
void Vzip2F32(float32x2_t* r, float32x2_t* v0, float32x2_t* v1) { *r = vzip2_f32(*v0, *v1); }
void Vzip2P16(poly16x4_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vzip2_p16(*v0, *v1); }
void Vzip2P8(poly8x8_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vzip2_p8(*v0, *v1); }
void Vzip2QS8(int8x16_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vzip2q_s8(*v0, *v1); }
void Vzip2QS16(int16x8_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vzip2q_s16(*v0,
*v1); }
void Vzip2QS32(int32x4_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vzip2q_s32(*v0, *v1); }
void Vzip2QS64(int64x2_t* r, int64x2_t* v0, int64x2_t* v1) { *r = vzip2q_s64(*v0, *v1); }
void Vzip2QU8(uint8x16_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vzip2q_u8(*v0, *v1); }
void Vzip2QU16(uint16x8_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vzip2q_u16(*v0, *v1); }
void Vzip2QU32(uint32x4_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vzip2q_u32(*v0, *v1); }
void Vzip2QU64(uint64x2_t* r, uint64x2_t* v0, uint64x2_t* v1) { *r = vzip2q_u64(*v0, *v1); }
void Vzip2QF32(float32x4_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vzip2q_f32(*v0, *v1); }
void Vzip2QF64(float64x2_t* r, float64x2_t* v0, float64x2_t* v1) { *r = vzip2q_f64(*v0, *v1); }
void Vzip2QP16(poly16x8_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vzip2q_p16(*v0, *v1); }
void Vzip2QP64(poly64x2_t* r, poly64x2_t* v0, poly64x2_t* v1) { *r = vzip2q_p64(*v0, *v1); }
void Vzip2QP8(poly8x16_t* r, poly8x16_t* v0, poly8x16_t* v1) { *r = vzip2q_p8(*v0, *v1); }
void VzipP16(poly16x4x2_t* r, poly16x4_t* v0, poly16x4_t* v1) { *r = vzip_p16(*v0, *v1); }
void VzipP8(poly8x8x2_t* r, poly8x8_t* v0, poly8x8_t* v1) { *r = vzip_p8(*v0, *v1); }
void VzipqS8(int8x16x2_t* r, int8x16_t* v0, int8x16_t* v1) { *r = vzipq_s8(*v0, *v1); }
void VzipqS16(int16x8x2_t* r, int16x8_t* v0, int16x8_t* v1) { *r = vzipq_s16(*v0, *v1); }
void VzipqS32(int32x4x2_t* r, int32x4_t* v0, int32x4_t* v1) { *r = vzipq_s32(*v0, *v1); }
void VzipqU8(uint8x16x2_t* r, uint8x16_t* v0, uint8x16_t* v1) { *r = vzipq_u8(*v0, *v1); }
void VzipqU16(uint16x8x2_t* r, uint16x8_t* v0, uint16x8_t* v1) { *r = vzipq_u16(*v0, *v1); }
void VzipqU32(uint32x4x2_t* r, uint32x4_t* v0, uint32x4_t* v1) { *r = vzipq_u32(*v0, *v1); }
void VzipqF32(float32x4x2_t* r, float32x4_t* v0, float32x4_t* v1) { *r = vzipq_f32(*v0, *v1); }
void VzipqP16(poly16x8x2_t* r, poly16x8_t* v0, poly16x8_t* v1) { *r = vzipq_p16(*v0, *v1); }
void VzipqP8(poly8x16x2_t* r, poly8x16_t* v0, poly8x16_t* v1)
{ *r = vzipq_p8(*v0, *v1); }

================================================
FILE: arm/neon/functions.go
================================================
package neon

import (
	"github.com/alivanz/go-simd/arm"
)

/*
#include <arm_neon.h>
*/
import "C"

// Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaS8 VabaS8
//go:noescape
func VabaS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8, v2 *arm.Int8X8)

// Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaS16 VabaS16
//go:noescape
func VabaS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16X4)

// Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaS32 VabaS32
//go:noescape
func VabaS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32X2)

// Unsigned Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaU8 VabaU8
//go:noescape
func VabaU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)

// Unsigned Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaU16 VabaU16
//go:noescape
func VabaU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4)

// Unsigned Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaU32 VabaU32
//go:noescape
func VabaU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2)

// Signed Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabalS8 VabalS8
//go:noescape
func VabalS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X8, v2 *arm.Int8X8)

// Signed Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register.
// The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabalS16 VabalS16
//go:noescape
func VabalS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16X4)

// Signed Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabalS32 VabalS32
//go:noescape
func VabalS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32X2)

// Unsigned Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabalU8 VabalU8
//go:noescape
func VabalU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)

// Unsigned Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabalU16 VabalU16
//go:noescape
func VabalU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4)

// Unsigned Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabalU32 VabalU32
//go:noescape
func VabalU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2)

// Signed Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabalHighS8 VabalHighS8
//go:noescape
func VabalHighS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X16, v2 *arm.Int8X16)

// Signed Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabalHighS16 VabalHighS16
//go:noescape
func VabalHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Signed Absolute difference and Accumulate Long.
// This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabalHighS32 VabalHighS32
//go:noescape
func VabalHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Unsigned Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabalHighU8 VabalHighU8
//go:noescape
func VabalHighU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X16, v2 *arm.Uint8X16)

// Unsigned Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabalHighU16 VabalHighU16
//go:noescape
func VabalHighU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Unsigned Absolute difference and Accumulate Long.
// This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabalHighU32 VabalHighU32
//go:noescape
func VabalHighU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaqS8 VabaqS8
//go:noescape
func VabaqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16, v2 *arm.Int8X16)

// Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaqS16 VabaqS16
//go:noescape
func VabaqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaqS32 VabaqS32
//go:noescape
func VabaqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Unsigned Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaqU8 VabaqU8
//go:noescape
func VabaqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16)

// Unsigned Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaqU16 VabaqU16
//go:noescape
func VabaqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Unsigned Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the absolute values of the results into the elements of the vector of the destination SIMD&FP register.
//
//go:linkname VabaqU32 VabaqU32
//go:noescape
func VabaqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdS8 VabdS8
//go:noescape
func VabdS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Absolute Difference.
// This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdS16 VabdS16
//go:noescape
func VabdS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdS32 VabdS32
//go:noescape
func VabdS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdU8 VabdU8
//go:noescape
func VabdU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdU16 VabdU16
//go:noescape
func VabdU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned Absolute Difference (vector).
// This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdU32 VabdU32
//go:noescape
func VabdU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdF32 VabdF32
//go:noescape
func VabdF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdF64 VabdF64
//go:noescape
func VabdF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabddF64 VabddF64
//go:noescape
func VabddF64(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64)

// Signed Absolute Difference Long.
// This instruction subtracts the vector elements of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the results into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabdlS8 VabdlS8
//go:noescape
func VabdlS8(r *arm.Int16X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Absolute Difference Long. This instruction subtracts the vector elements of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the results into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabdlS16 VabdlS16
//go:noescape
func VabdlS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Absolute Difference Long. This instruction subtracts the vector elements of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the results into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabdlS32 VabdlS32
//go:noescape
func VabdlS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unsigned Absolute Difference Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
// All the values in this instruction are unsigned integer values.
//
//go:linkname VabdlU8 VabdlU8
//go:noescape
func VabdlU8(r *arm.Uint16X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned Absolute Difference Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabdlU16 VabdlU16
//go:noescape
func VabdlU16(r *arm.Uint32X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned Absolute Difference Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabdlU32 VabdlU32
//go:noescape
func VabdlU32(r *arm.Uint64X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Signed Absolute Difference Long. This instruction subtracts the vector elements of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the results into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabdlHighS8 VabdlHighS8
//go:noescape
func VabdlHighS8(r *arm.Int16X8, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed Absolute Difference Long.
// This instruction subtracts the vector elements of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the results into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabdlHighS16 VabdlHighS16
//go:noescape
func VabdlHighS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed Absolute Difference Long. This instruction subtracts the vector elements of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the results into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VabdlHighS32 VabdlHighS32
//go:noescape
func VabdlHighS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unsigned Absolute Difference Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabdlHighU8 VabdlHighU8
//go:noescape
func VabdlHighU8(r *arm.Uint16X8, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned Absolute Difference Long.
// This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabdlHighU16 VabdlHighU16
//go:noescape
func VabdlHighU16(r *arm.Uint32X4, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned Absolute Difference Long. This instruction subtracts the vector elements in the lower or upper half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VabdlHighU32 VabdlHighU32
//go:noescape
func VabdlHighU32(r *arm.Uint64X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqS8 VabdqS8
//go:noescape
func VabdqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqS16 VabdqS16
//go:noescape
func VabdqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqS32 VabdqS32
//go:noescape
func VabdqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqU8 VabdqU8
//go:noescape
func VabdqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqU16 VabdqU16
//go:noescape
func VabdqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqU32 VabdqU32
//go:noescape
func VabdqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Floating-point Absolute Difference (vector).
// This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqF32 VabdqF32
//go:noescape
func VabdqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqF64 VabdqF64
//go:noescape
func VabdqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdsF32 VabdsF32
//go:noescape
func VabdsF32(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsS8 VabsS8
//go:noescape
func VabsS8(r *arm.Int8X8, v0 *arm.Int8X8)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsS16 VabsS16
//go:noescape
func VabsS16(r *arm.Int16X4, v0 *arm.Int16X4)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsS32 VabsS32
//go:noescape
func VabsS32(r *arm.Int32X2, v0 *arm.Int32X2)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsS64 VabsS64
//go:noescape
func VabsS64(r *arm.Int64X1, v0 *arm.Int64X1)

// Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsF32 VabsF32
//go:noescape
func VabsF32(r *arm.Float32X2, v0 *arm.Float32X2)

// Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsF64 VabsF64
//go:noescape
func VabsF64(r *arm.Float64X1, v0 *arm.Float64X1)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsdS64 VabsdS64
//go:noescape
func VabsdS64(r *arm.Int64, v0 *arm.Int64)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsqS8 VabsqS8
//go:noescape
func VabsqS8(r *arm.Int8X16, v0 *arm.Int8X16)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsqS16 VabsqS16
//go:noescape
func VabsqS16(r *arm.Int16X8, v0 *arm.Int16X8)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsqS32 VabsqS32
//go:noescape
func VabsqS32(r *arm.Int32X4, v0 *arm.Int32X4)

// Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsqS64 VabsqS64
//go:noescape
func VabsqS64(r *arm.Int64X2, v0 *arm.Int64X2)

// Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsqF32 VabsqF32
//go:noescape
func VabsqF32(r *arm.Float32X4, v0 *arm.Float32X4)

// Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabsqF64 VabsqF64
//go:noescape
func VabsqF64(r *arm.Float64X2, v0 *arm.Float64X2)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddS8 VaddS8
//go:noescape
func VaddS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddS16 VaddS16
//go:noescape
func VaddS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddS32 VaddS32
//go:noescape
func VaddS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddS64 VaddS64
//go:noescape
func VaddS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddU8 VaddU8
//go:noescape
func VaddU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddU16 VaddU16
//go:noescape
func VaddU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddU32 VaddU32
//go:noescape
func VaddU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Add (vector).
// This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddU64 VaddU64
//go:noescape
func VaddU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VaddF32 VaddF32
//go:noescape
func VaddF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VaddF64 VaddF64
//go:noescape
func VaddF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VaddP16 VaddP16
//go:noescape
func VaddP16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VaddP64 VaddP64
//go:noescape
func VaddP64(r *arm.Poly64X1, v0 *arm.Poly64X1, v1 *arm.Poly64X1)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VaddP8 VaddP8
//go:noescape
func VaddP8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VadddS64 VadddS64
//go:noescape
func VadddS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VadddU64 VadddU64
//go:noescape
func VadddU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnS16 VaddhnS16
//go:noescape
func VaddhnS16(r *arm.Int8X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnS32 VaddhnS32
//go:noescape
func VaddhnS32(r *arm.Int16X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnS64 VaddhnS64
//go:noescape
func VaddhnS64(r *arm.Int32X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnU16 VaddhnU16
//go:noescape
func VaddhnU16(r *arm.Uint8X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnU32 VaddhnU32
//go:noescape
func VaddhnU32(r *arm.Uint16X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnU64 VaddhnU64
//go:noescape
func VaddhnU64(r *arm.Uint32X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnHighS16 VaddhnHighS16
//go:noescape
func VaddhnHighS16(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Add returning High Narrow.
// This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnHighS32 VaddhnHighS32
//go:noescape
func VaddhnHighS32(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnHighS64 VaddhnHighS64
//go:noescape
func VaddhnHighS64(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int64X2, v2 *arm.Int64X2)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnHighU16 VaddhnHighU16
//go:noescape
func VaddhnHighU16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnHighU32 VaddhnHighU32
//go:noescape
func VaddhnHighU32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Add returning High Narrow.
// This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VaddhnHighU64 VaddhnHighU64
//go:noescape
func VaddhnHighU64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)

// Signed Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlS8 VaddlS8
//go:noescape
func VaddlS8(r *arm.Int16X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlS16 VaddlS16
//go:noescape
func VaddlS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
// All the values in this instruction are signed integer values.
//
//go:linkname VaddlS32 VaddlS32
//go:noescape
func VaddlS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unsigned Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlU8 VaddlU8
//go:noescape
func VaddlU8(r *arm.Uint16X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlU16 VaddlU16
//go:noescape
func VaddlU16(r *arm.Uint32X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlU32 VaddlU32
//go:noescape
func VaddlU32(r *arm.Uint64X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Signed Add Long (vector).
// This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlHighS8 VaddlHighS8
//go:noescape
func VaddlHighS8(r *arm.Int16X8, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlHighS16 VaddlHighS16
//go:noescape
func VaddlHighS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlHighS32 VaddlHighS32
//go:noescape
func VaddlHighS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unsigned Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
// The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlHighU8 VaddlHighU8
//go:noescape
func VaddlHighU8(r *arm.Uint16X8, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlHighU16 VaddlHighU16
//go:noescape
func VaddlHighU16(r *arm.Uint32X4, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlHighU32 VaddlHighU32
//go:noescape
func VaddlHighU32(r *arm.Uint64X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Signed Add Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlvS8 VaddlvS8
//go:noescape
func VaddlvS8(r *arm.Int16, v0 *arm.Int8X8)

// Signed Add Long across Vector.
// This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlvS16 VaddlvS16
//go:noescape
func VaddlvS16(r *arm.Int32, v0 *arm.Int16X4)

// Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VaddlvS32 VaddlvS32
//go:noescape
func VaddlvS32(r *arm.Int64, v0 *arm.Int32X2)

// Unsigned sum Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlvU8 VaddlvU8
//go:noescape
func VaddlvU8(r *arm.Uint16, v0 *arm.Uint8X8)

// Unsigned sum Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlvU16 VaddlvU16
//go:noescape
func VaddlvU16(r *arm.Uint32, v0 *arm.Uint16X4)

// Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
//
//go:linkname VaddlvU32 VaddlvU32
//go:noescape
func VaddlvU32(r *arm.Uint64, v0 *arm.Uint32X2)

// Signed Add Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlvqS8 VaddlvqS8
//go:noescape
func VaddlvqS8(r *arm.Int16, v0 *arm.Int8X16)

// Signed Add Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlvqS16 VaddlvqS16
//go:noescape
func VaddlvqS16(r *arm.Int32, v0 *arm.Int16X8)

// Signed Add Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VaddlvqS32 VaddlvqS32
//go:noescape
func VaddlvqS32(r *arm.Int64, v0 *arm.Int32X4)

// Unsigned sum Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlvqU8 VaddlvqU8
//go:noescape
func VaddlvqU8(r *arm.Uint16, v0 *arm.Uint8X16)

// Unsigned sum Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
// The destination scalar is twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlvqU16 VaddlvqU16
//go:noescape
func VaddlvqU16(r *arm.Uint32, v0 *arm.Uint16X8)

// Unsigned sum Long across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source vector elements. All the values in this instruction are unsigned integer values.
//
//go:linkname VaddlvqU32 VaddlvqU32
//go:noescape
func VaddlvqU32(r *arm.Uint64, v0 *arm.Uint32X4)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqS8 VaddqS8
//go:noescape
func VaddqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqS16 VaddqS16
//go:noescape
func VaddqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqS32 VaddqS32
//go:noescape
func VaddqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqS64 VaddqS64
//go:noescape
func VaddqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Add (vector).
// This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqU8 VaddqU8
//go:noescape
func VaddqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqU16 VaddqU16
//go:noescape
func VaddqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqU32 VaddqU32
//go:noescape
func VaddqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VaddqU64 VaddqU64
//go:noescape
func VaddqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VaddqF32 VaddqF32
//go:noescape
func VaddqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VaddqF64 VaddqF64
//go:noescape
func VaddqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VaddqP128 VaddqP128
//go:noescape
func VaddqP128(r *arm.Poly128, v0 *arm.Poly128, v1 *arm.Poly128)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VaddqP16 VaddqP16
//go:noescape
func VaddqP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VaddqP64 VaddqP64
//go:noescape
func VaddqP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VaddqP8 VaddqP8
//go:noescape
func VaddqP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
//
//go:linkname VaddvS8 VaddvS8
//go:noescape
func VaddvS8(r *arm.Int8, v0 *arm.Int8X8)

// Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register.
// //go:linkname VaddvS16 VaddvS16 //go:noescape func VaddvS16(r *arm.Int16, v0 *arm.Int16X4) // Add across vector // //go:linkname VaddvS32 VaddvS32 //go:noescape func VaddvS32(r *arm.Int32, v0 *arm.Int32X2) // Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. // //go:linkname VaddvU8 VaddvU8 //go:noescape func VaddvU8(r *arm.Uint8, v0 *arm.Uint8X8) // Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. // //go:linkname VaddvU16 VaddvU16 //go:noescape func VaddvU16(r *arm.Uint16, v0 *arm.Uint16X4) // Add across vector // //go:linkname VaddvU32 VaddvU32 //go:noescape func VaddvU32(r *arm.Uint32, v0 *arm.Uint32X2) // Floating-point add across vector // //go:linkname VaddvF32 VaddvF32 //go:noescape func VaddvF32(r *arm.Float32, v0 *arm.Float32X2) // Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. // //go:linkname VaddvqS8 VaddvqS8 //go:noescape func VaddvqS8(r *arm.Int8, v0 *arm.Int8X16) // Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. // //go:linkname VaddvqS16 VaddvqS16 //go:noescape func VaddvqS16(r *arm.Int16, v0 *arm.Int16X8) // Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. // //go:linkname VaddvqS32 VaddvqS32 //go:noescape func VaddvqS32(r *arm.Int32, v0 *arm.Int32X4) // Add across vector // //go:linkname VaddvqS64 VaddvqS64 //go:noescape func VaddvqS64(r *arm.Int64, v0 *arm.Int64X2) // Add across Vector. 
This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. // //go:linkname VaddvqU8 VaddvqU8 //go:noescape func VaddvqU8(r *arm.Uint8, v0 *arm.Uint8X16) // Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. // //go:linkname VaddvqU16 VaddvqU16 //go:noescape func VaddvqU16(r *arm.Uint16, v0 *arm.Uint16X8) // Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. // //go:linkname VaddvqU32 VaddvqU32 //go:noescape func VaddvqU32(r *arm.Uint32, v0 *arm.Uint32X4) // Add across vector // //go:linkname VaddvqU64 VaddvqU64 //go:noescape func VaddvqU64(r *arm.Uint64, v0 *arm.Uint64X2) // Floating-point add across vector // //go:linkname VaddvqF32 VaddvqF32 //go:noescape func VaddvqF32(r *arm.Float32, v0 *arm.Float32X4) // Floating-point add across vector // //go:linkname VaddvqF64 VaddvqF64 //go:noescape func VaddvqF64(r *arm.Float64, v0 *arm.Float64X2) // Signed Add Wide. This instruction adds vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register. // //go:linkname VaddwS8 VaddwS8 //go:noescape func VaddwS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X8) // Signed Add Wide. This instruction adds vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register. 
// //go:linkname VaddwS16 VaddwS16 //go:noescape func VaddwS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4) // Signed Add Wide. This instruction adds vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register. // //go:linkname VaddwS32 VaddwS32 //go:noescape func VaddwS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2) // Unsigned Add Wide. This instruction adds the vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are unsigned integer values. // //go:linkname VaddwU8 VaddwU8 //go:noescape func VaddwU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X8) // Unsigned Add Wide. This instruction adds the vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are unsigned integer values. // //go:linkname VaddwU16 VaddwU16 //go:noescape func VaddwU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4) // Unsigned Add Wide. 
This instruction adds the vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are unsigned integer values. // //go:linkname VaddwU32 VaddwU32 //go:noescape func VaddwU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2) // Signed Add Wide. This instruction adds vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register. // //go:linkname VaddwHighS8 VaddwHighS8 //go:noescape func VaddwHighS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X16) // Signed Add Wide. This instruction adds vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register. // //go:linkname VaddwHighS16 VaddwHighS16 //go:noescape func VaddwHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8) // Signed Add Wide. This instruction adds vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the results in a vector, and writes the vector to the SIMD&FP destination register. // //go:linkname VaddwHighS32 VaddwHighS32 //go:noescape func VaddwHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4) // Unsigned Add Wide. 
This instruction adds the vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are unsigned integer values. // //go:linkname VaddwHighU8 VaddwHighU8 //go:noescape func VaddwHighU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X16) // Unsigned Add Wide. This instruction adds the vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are unsigned integer values. // //go:linkname VaddwHighU16 VaddwHighU16 //go:noescape func VaddwHighU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8) // Unsigned Add Wide. This instruction adds the vector elements of the first source SIMD&FP register to the corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register and the first source register are twice as long as the vector elements of the second source register. All the values in this instruction are unsigned integer values. // //go:linkname VaddwHighU32 VaddwHighU32 //go:noescape func VaddwHighU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4) // AES single round decryption. 
// //go:linkname VaesdqU8 VaesdqU8 //go:noescape func VaesdqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // AES single round encryption. // //go:linkname VaeseqU8 VaeseqU8 //go:noescape func VaeseqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // AES inverse mix columns. // //go:linkname VaesimcqU8 VaesimcqU8 //go:noescape func VaesimcqU8(r *arm.Uint8X16, v0 *arm.Uint8X16) // AES mix columns. // //go:linkname VaesmcqU8 VaesmcqU8 //go:noescape func VaesmcqU8(r *arm.Uint8X16, v0 *arm.Uint8X16) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandS8 VandS8 //go:noescape func VandS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandS16 VandS16 //go:noescape func VandS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandS32 VandS32 //go:noescape func VandS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandS64 VandS64 //go:noescape func VandS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandU8 VandU8 //go:noescape func VandU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8) // Bitwise AND (vector). 
This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandU16 VandU16 //go:noescape func VandU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandU32 VandU32 //go:noescape func VandU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandU64 VandU64 //go:noescape func VandU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandqS8 VandqS8 //go:noescape func VandqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandqS16 VandqS16 //go:noescape func VandqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandqS32 VandqS32 //go:noescape func VandqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandqS64 VandqS64 //go:noescape func VandqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2) // Bitwise AND (vector). 
This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandqU8 VandqU8 //go:noescape func VandqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandqU16 VandqU16 //go:noescape func VandqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandqU32 VandqU32 //go:noescape func VandqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandqU64 VandqU64 //go:noescape func VandqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2) // Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbcaxqS8 VbcaxqS8 //go:noescape func VbcaxqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16, v2 *arm.Int8X16) // Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register. 
// //go:linkname VbcaxqS16 VbcaxqS16 //go:noescape func VbcaxqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8) // Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbcaxqS32 VbcaxqS32 //go:noescape func VbcaxqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32X4) // Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbcaxqS64 VbcaxqS64 //go:noescape func VbcaxqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2, v2 *arm.Int64X2) // Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbcaxqU8 VbcaxqU8 //go:noescape func VbcaxqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16) // Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register. 
// //go:linkname VbcaxqU16 VbcaxqU16 //go:noescape func VbcaxqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8) // Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbcaxqU32 VbcaxqU32 //go:noescape func VbcaxqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4) // Bit Clear and Exclusive OR performs a bitwise AND of the 128-bit vector in a source SIMD&FP register and the complement of the vector in another source SIMD&FP register, then performs a bitwise exclusive OR of the resulting vector and the vector in a third source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbcaxqU64 VbcaxqU64 //go:noescape func VbcaxqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicS8 VbicS8 //go:noescape func VbicS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicS16 VbicS16 //go:noescape func VbicS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4) // Bitwise bit Clear (vector, register). 
This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicS32 VbicS32 //go:noescape func VbicS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicS64 VbicS64 //go:noescape func VbicS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicU8 VbicU8 //go:noescape func VbicU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicU16 VbicU16 //go:noescape func VbicU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicU32 VbicU32 //go:noescape func VbicU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. 
// //go:linkname VbicU64 VbicU64 //go:noescape func VbicU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicqS8 VbicqS8 //go:noescape func VbicqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicqS16 VbicqS16 //go:noescape func VbicqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicqS32 VbicqS32 //go:noescape func VbicqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicqS64 VbicqS64 //go:noescape func VbicqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicqU8 VbicqU8 //go:noescape func VbicqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Bitwise bit Clear (vector, register). 
This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicqU16 VbicqU16 //go:noescape func VbicqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicqU32 VbicqU32 //go:noescape func VbicqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicqU64 VbicqU64 //go:noescape func VbicqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2) // Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslS8 VbslS8 //go:noescape func VbslS8(r *arm.Int8X8, v0 *arm.Uint8X8, v1 *arm.Int8X8, v2 *arm.Int8X8) // Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslS16 VbslS16 //go:noescape func VbslS16(r *arm.Int16X4, v0 *arm.Uint16X4, v1 *arm.Int16X4, v2 *arm.Int16X4) // Bitwise Select. 
This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslS32 VbslS32 //go:noescape func VbslS32(r *arm.Int32X2, v0 *arm.Uint32X2, v1 *arm.Int32X2, v2 *arm.Int32X2) // Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslS64 VbslS64 //go:noescape func VbslS64(r *arm.Int64X1, v0 *arm.Uint64X1, v1 *arm.Int64X1, v2 *arm.Int64X1) // Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslU8 VbslU8 //go:noescape func VbslU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8) // Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslU16 VbslU16 //go:noescape func VbslU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4) // Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslU32 VbslU32 //go:noescape func VbslU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2) // Bitwise Select. 
This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslU64 VbslU64 //go:noescape func VbslU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1, v2 *arm.Uint64X1) // Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslF32 VbslF32 //go:noescape func VbslF32(r *arm.Float32X2, v0 *arm.Uint32X2, v1 *arm.Float32X2, v2 *arm.Float32X2) // Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslF64 VbslF64 //go:noescape func VbslF64(r *arm.Float64X1, v0 *arm.Uint64X1, v1 *arm.Float64X1, v2 *arm.Float64X1) // Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslP16 VbslP16 //go:noescape func VbslP16(r *arm.Poly16X4, v0 *arm.Uint16X4, v1 *arm.Poly16X4, v2 *arm.Poly16X4) // Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslP64 VbslP64 //go:noescape func VbslP64(r *arm.Poly64X1, v0 *arm.Uint64X1, v1 *arm.Poly64X1, v2 *arm.Poly64X1) // Bitwise Select. 
This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslP8 VbslP8 //go:noescape func VbslP8(r *arm.Poly8X8, v0 *arm.Uint8X8, v1 *arm.Poly8X8, v2 *arm.Poly8X8) // Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslqS8 VbslqS8 //go:noescape func VbslqS8(r *arm.Int8X16, v0 *arm.Uint8X16, v1 *arm.Int8X16, v2 *arm.Int8X16) // Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslqS16 VbslqS16 //go:noescape func VbslqS16(r *arm.Int16X8, v0 *arm.Uint16X8, v1 *arm.Int16X8, v2 *arm.Int16X8) // Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslqS32 VbslqS32 //go:noescape func VbslqS32(r *arm.Int32X4, v0 *arm.Uint32X4, v1 *arm.Int32X4, v2 *arm.Int32X4) // Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslqS64 VbslqS64 //go:noescape func VbslqS64(r *arm.Int64X2, v0 *arm.Uint64X2, v1 *arm.Int64X2, v2 *arm.Int64X2) // Bitwise Select. 
This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslqU8 VbslqU8 //go:noescape func VbslqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16) // Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslqU16 VbslqU16 //go:noescape func VbslqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8) // Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslqU32 VbslqU32 //go:noescape func VbslqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4) // Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslqU64 VbslqU64 //go:noescape func VbslqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2) // Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register. // //go:linkname VbslqF32 VbslqF32 //go:noescape func VbslqF32(r *arm.Float32X4, v0 *arm.Uint32X4, v1 *arm.Float32X4, v2 *arm.Float32X4) // Bitwise Select. 
// This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslqF64 VbslqF64
//go:noescape
func VbslqF64(r *arm.Float64X2, v0 *arm.Uint64X2, v1 *arm.Float64X2, v2 *arm.Float64X2)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslqP16 VbslqP16
//go:noescape
func VbslqP16(r *arm.Poly16X8, v0 *arm.Uint16X8, v1 *arm.Poly16X8, v2 *arm.Poly16X8)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslqP64 VbslqP64
//go:noescape
func VbslqP64(r *arm.Poly64X2, v0 *arm.Uint64X2, v1 *arm.Poly64X2, v2 *arm.Poly64X2)

// Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP register.
//
//go:linkname VbslqP8 VbslqP8
//go:noescape
func VbslqP8(r *arm.Poly8X16, v0 *arm.Uint8X16, v1 *arm.Poly8X16, v2 *arm.Poly8X16)

// Floating-point Complex Add.
//
//go:linkname VcaddRot270F32 VcaddRot270F32
//go:noescape
func VcaddRot270F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Complex Add.
//
//go:linkname VcaddRot90F32 VcaddRot90F32
//go:noescape
func VcaddRot90F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Complex Add.
//
//go:linkname VcaddqRot270F32 VcaddqRot270F32
//go:noescape
func VcaddqRot270F32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Complex Add.
//
//go:linkname VcaddqRot270F64 VcaddqRot270F64
//go:noescape
func VcaddqRot270F64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Complex Add.
//
//go:linkname VcaddqRot90F32 VcaddqRot90F32
//go:noescape
func VcaddqRot90F32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Complex Add.
//
//go:linkname VcaddqRot90F64 VcaddqRot90F64
//go:noescape
func VcaddqRot90F64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcageF32 VcageF32
//go:noescape
func VcageF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcageF64 VcageF64
//go:noescape
func VcageF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagedF64 VcagedF64
//go:noescape
func VcagedF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)

// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcageqF32 VcageqF32
//go:noescape
func VcageqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Absolute Compare Greater than or Equal (vector).
// This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcageqF64 VcageqF64
//go:noescape
func VcageqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagesF32 VcagesF32
//go:noescape
func VcagesF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)

// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagtF32 VcagtF32
//go:noescape
func VcagtF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Absolute Compare Greater than (vector).
// This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagtF64 VcagtF64
//go:noescape
func VcagtF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagtdF64 VcagtdF64
//go:noescape
func VcagtdF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)

// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagtqF32 VcagtqF32
//go:noescape
func VcagtqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Absolute Compare Greater than (vector).
// This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagtqF64 VcagtqF64
//go:noescape
func VcagtqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagtsF32 VcagtsF32
//go:noescape
func VcagtsF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)

// Floating-point absolute compare less than or equal
//
//go:linkname VcaleF32 VcaleF32
//go:noescape
func VcaleF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point absolute compare less than or equal
//
//go:linkname VcaleF64 VcaleF64
//go:noescape
func VcaleF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Floating-point absolute compare less than or equal
//
//go:linkname VcaledF64 VcaledF64
//go:noescape
func VcaledF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)

// Floating-point absolute compare less than or equal
//
//go:linkname VcaleqF32 VcaleqF32
//go:noescape
func VcaleqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point absolute compare less than or equal
//
//go:linkname VcaleqF64 VcaleqF64
//go:noescape
func VcaleqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point absolute compare less than or equal
//
//go:linkname VcalesF32 VcalesF32
//go:noescape
func VcalesF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)

// Floating-point absolute compare less than
//
//go:linkname VcaltF32 VcaltF32
//go:noescape
func VcaltF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point absolute compare less than
//
//go:linkname VcaltF64 VcaltF64
//go:noescape
func VcaltF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Floating-point absolute compare less than
//
//go:linkname VcaltdF64 VcaltdF64
//go:noescape
func VcaltdF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)

// Floating-point absolute compare less than
//
//go:linkname VcaltqF32 VcaltqF32
//go:noescape
func VcaltqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point absolute compare less than
//
//go:linkname VcaltqF64 VcaltqF64
//go:noescape
func VcaltqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point absolute compare
// less than
//
//go:linkname VcaltsF32 VcaltsF32
//go:noescape
func VcaltsF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqS8 VceqS8
//go:noescape
func VceqS8(r *arm.Uint8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqS16 VceqS16
//go:noescape
func VceqS16(r *arm.Uint16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqS32 VceqS32
//go:noescape
func VceqS32(r *arm.Uint32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Compare bitwise Equal (vector).
// This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqS64 VceqS64
//go:noescape
func VceqS64(r *arm.Uint64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqU8 VceqU8
//go:noescape
func VceqU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqU16 VceqU16
//go:noescape
func VceqU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Compare bitwise Equal (vector).
// This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqU32 VceqU32
//go:noescape
func VceqU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqU64 VceqU64
//go:noescape
func VceqU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqF32 VceqF32
//go:noescape
func VceqF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Compare Equal (vector).
// This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqF64 VceqF64
//go:noescape
func VceqF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqP64 VceqP64
//go:noescape
func VceqP64(r *arm.Uint64X1, v0 *arm.Poly64X1, v1 *arm.Poly64X1)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqP8 VceqP8
//go:noescape
func VceqP8(r *arm.Uint8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Compare bitwise Equal (vector).
// This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqdS64 VceqdS64
//go:noescape
func VceqdS64(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqdU64 VceqdU64
//go:noescape
func VceqdU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)

// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqdF64 VceqdF64
//go:noescape
func VceqdF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)

// Compare bitwise Equal (vector).
// This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqS8 VceqqS8
//go:noescape
func VceqqS8(r *arm.Uint8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqS16 VceqqS16
//go:noescape
func VceqqS16(r *arm.Uint16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqS32 VceqqS32
//go:noescape
func VceqqS32(r *arm.Uint32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Compare bitwise Equal (vector).
// This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqS64 VceqqS64
//go:noescape
func VceqqS64(r *arm.Uint64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqU8 VceqqU8
//go:noescape
func VceqqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqU16 VceqqU16
//go:noescape
func VceqqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Compare bitwise Equal (vector).
// This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqU32 VceqqU32
//go:noescape
func VceqqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqU64 VceqqU64
//go:noescape
func VceqqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqF32 VceqqF32
//go:noescape
func VceqqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Compare Equal (vector).
// This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqF64 VceqqF64
//go:noescape
func VceqqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqP64 VceqqP64
//go:noescape
func VceqqP64(r *arm.Uint64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqP8 VceqqP8
//go:noescape
func VceqqP8(r *arm.Uint8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)

// Floating-point Compare Equal (vector).
// This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqsF32 VceqsF32
//go:noescape
func VceqsF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzS8 VceqzS8
//go:noescape
func VceqzS8(r *arm.Uint8X8, v0 *arm.Int8X8)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzS16 VceqzS16
//go:noescape
func VceqzS16(r *arm.Uint16X4, v0 *arm.Int16X4)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzS32 VceqzS32
//go:noescape
func VceqzS32(r *arm.Uint32X2, v0 *arm.Int32X2)

// Compare bitwise Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzS64 VceqzS64
//go:noescape
func VceqzS64(r *arm.Uint64X1, v0 *arm.Int64X1)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzU8 VceqzU8
//go:noescape
func VceqzU8(r *arm.Uint8X8, v0 *arm.Uint8X8)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzU16 VceqzU16
//go:noescape
func VceqzU16(r *arm.Uint16X4, v0 *arm.Uint16X4)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzU32 VceqzU32
//go:noescape
func VceqzU32(r *arm.Uint32X2, v0 *arm.Uint32X2)

// Compare bitwise Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzU64 VceqzU64
//go:noescape
func VceqzU64(r *arm.Uint64X1, v0 *arm.Uint64X1)

// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzF32 VceqzF32
//go:noescape
func VceqzF32(r *arm.Uint32X2, v0 *arm.Float32X2)

// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzF64 VceqzF64
//go:noescape
func VceqzF64(r *arm.Uint64X1, v0 *arm.Float64X1)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzP64 VceqzP64
//go:noescape
func VceqzP64(r *arm.Uint64X1, v0 *arm.Poly64X1)

// Compare bitwise Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzP8 VceqzP8
//go:noescape
func VceqzP8(r *arm.Uint8X8, v0 *arm.Poly8X8)

// Compare bitwise Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzdS64 VceqzdS64
//go:noescape
func VceqzdS64(r *arm.Uint64, v0 *arm.Int64)

// Compare bitwise Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzdU64 VceqzdU64
//go:noescape
func VceqzdU64(r *arm.Uint64, v0 *arm.Uint64)

// Floating-point Compare Equal to zero (vector).
// This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzdF64 VceqzdF64
//go:noescape
func VceqzdF64(r *arm.Uint64, v0 *arm.Float64)

// Compare bitwise Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqS8 VceqzqS8
//go:noescape
func VceqzqS8(r *arm.Uint8X16, v0 *arm.Int8X16)

// Compare bitwise Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqS16 VceqzqS16
//go:noescape
func VceqzqS16(r *arm.Uint16X8, v0 *arm.Int16X8)

// Compare bitwise Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqS32 VceqzqS32
//go:noescape
func VceqzqS32(r *arm.Uint32X4, v0 *arm.Int32X4)

// Compare bitwise Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqS64 VceqzqS64
//go:noescape
func VceqzqS64(r *arm.Uint64X2, v0 *arm.Int64X2)

// Compare bitwise Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqU8 VceqzqU8
//go:noescape
func VceqzqU8(r *arm.Uint8X16, v0 *arm.Uint8X16)

// Compare bitwise Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqU16 VceqzqU16
//go:noescape
func VceqzqU16(r *arm.Uint16X8, v0 *arm.Uint16X8)

// Compare bitwise Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqU32 VceqzqU32
//go:noescape
func VceqzqU32(r *arm.Uint32X4, v0 *arm.Uint32X4)

// Compare bitwise Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqU64 VceqzqU64
//go:noescape
func VceqzqU64(r *arm.Uint64X2, v0 *arm.Uint64X2)

// Floating-point Compare Equal to zero (vector).
// This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqF32 VceqzqF32
//go:noescape
func VceqzqF32(r *arm.Uint32X4, v0 *arm.Float32X4)

// Floating-point Compare Equal to zero (vector).
// This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqF64 VceqzqF64
//go:noescape
func VceqzqF64(r *arm.Uint64X2, v0 *arm.Float64X2)

// Compare bitwise Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqP64 VceqzqP64
//go:noescape
func VceqzqP64(r *arm.Uint64X2, v0 *arm.Poly64X2)

// Compare bitwise Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqP8 VceqzqP8
//go:noescape
func VceqzqP8(r *arm.Uint8X16, v0 *arm.Poly8X16)

// Floating-point Compare Equal to zero (vector).
// This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzsF32 VceqzsF32
//go:noescape
func VceqzsF32(r *arm.Uint32, v0 *arm.Float32)

// Compare signed Greater than or Equal (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeS8 VcgeS8
//go:noescape
func VcgeS8(r *arm.Uint8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Compare signed Greater than or Equal (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeS16 VcgeS16
//go:noescape
func VcgeS16(r *arm.Uint16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Compare signed Greater than or Equal (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeS32 VcgeS32
//go:noescape
func VcgeS32(r *arm.Uint32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Compare signed Greater than or Equal (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeS64 VcgeS64
//go:noescape
func VcgeS64(r *arm.Uint64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Compare unsigned Higher or Same (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeU8 VcgeU8
//go:noescape
func VcgeU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Compare unsigned Higher or Same (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeU16 VcgeU16
//go:noescape
func VcgeU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Compare unsigned Higher or Same (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeU32 VcgeU32
//go:noescape
func VcgeU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Compare unsigned Higher or Same (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeU64 VcgeU64
//go:noescape
func VcgeU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Floating-point Compare Greater than or Equal (vector).
// This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeF32 VcgeF32
//go:noescape
func VcgeF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Compare Greater than or Equal (vector).
// This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeF64 VcgeF64
//go:noescape
func VcgeF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Compare signed Greater than or Equal (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgedS64 VcgedS64
//go:noescape
func VcgedS64(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64)

// Compare unsigned Higher or Same (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgedU64 VcgedU64
//go:noescape
func VcgedU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)

// Floating-point Compare Greater than or Equal (vector).
// This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgedF64 VcgedF64
//go:noescape
func VcgedF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)

// Compare signed Greater than or Equal (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeqS8 VcgeqS8
//go:noescape
func VcgeqS8(r *arm.Uint8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Compare signed Greater than or Equal (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeqS16 VcgeqS16
//go:noescape
func VcgeqS16(r *arm.Uint16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Compare signed Greater than or Equal (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeqS32 VcgeqS32
//go:noescape
func VcgeqS32(r *arm.Uint32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Compare signed Greater than or Equal (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeqS64 VcgeqS64
//go:noescape
func VcgeqS64(r *arm.Uint64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Compare unsigned Higher or Same (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeqU8 VcgeqU8
//go:noescape
func VcgeqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Compare unsigned Higher or Same (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeqU16 VcgeqU16
//go:noescape
func VcgeqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Compare unsigned Higher or Same (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeqU32 VcgeqU32
//go:noescape
func VcgeqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Compare unsigned Higher or Same (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeqU64 VcgeqU64
//go:noescape
func VcgeqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Floating-point Compare Greater than or Equal (vector).
// This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeqF32 VcgeqF32
//go:noescape
func VcgeqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Compare Greater than or Equal (vector).
// This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeqF64 VcgeqF64
//go:noescape
func VcgeqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Compare Greater than or Equal (vector).
// This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgesF32 VcgesF32
//go:noescape
func VcgesF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)

// Compare signed Greater than or Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezS8 VcgezS8
//go:noescape
func VcgezS8(r *arm.Uint8X8, v0 *arm.Int8X8)

// Compare signed Greater than or Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezS16 VcgezS16
//go:noescape
func VcgezS16(r *arm.Uint16X4, v0 *arm.Int16X4)

// Compare signed Greater than or Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezS32 VcgezS32
//go:noescape
func VcgezS32(r *arm.Uint32X2, v0 *arm.Int32X2)

// Compare signed Greater than or Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezS64 VcgezS64
//go:noescape
func VcgezS64(r *arm.Uint64X1, v0 *arm.Int64X1)

// Floating-point Compare Greater than or Equal to zero (vector).
// This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezF32 VcgezF32
//go:noescape
func VcgezF32(r *arm.Uint32X2, v0 *arm.Float32X2)

// Floating-point Compare Greater than or Equal to zero (vector).
// This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezF64 VcgezF64
//go:noescape
func VcgezF64(r *arm.Uint64X1, v0 *arm.Float64X1)

// Compare signed Greater than or Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezdS64 VcgezdS64
//go:noescape
func VcgezdS64(r *arm.Uint64, v0 *arm.Int64)

// Floating-point Compare Greater than or Equal to zero (vector).
// This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezdF64 VcgezdF64
//go:noescape
func VcgezdF64(r *arm.Uint64, v0 *arm.Float64)

// Compare signed Greater than or Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezqS8 VcgezqS8
//go:noescape
func VcgezqS8(r *arm.Uint8X16, v0 *arm.Int8X16)

// Compare signed Greater than or Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezqS16 VcgezqS16
//go:noescape
func VcgezqS16(r *arm.Uint16X8, v0 *arm.Int16X8)

// Compare signed Greater than or Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezqS32 VcgezqS32
//go:noescape
func VcgezqS32(r *arm.Uint32X4, v0 *arm.Int32X4)

// Compare signed Greater than or Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezqS64 VcgezqS64
//go:noescape
func VcgezqS64(r *arm.Uint64X2, v0 *arm.Int64X2)

// Floating-point Compare Greater than or Equal to zero (vector).
// This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezqF32 VcgezqF32
//go:noescape
func VcgezqF32(r *arm.Uint32X4, v0 *arm.Float32X4)

// Floating-point Compare Greater than or Equal to zero (vector).
// This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezqF64 VcgezqF64
//go:noescape
func VcgezqF64(r *arm.Uint64X2, v0 *arm.Float64X2)

// Floating-point Compare Greater than or Equal to zero (vector).
// This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgezsF32 VcgezsF32
//go:noescape
func VcgezsF32(r *arm.Uint32, v0 *arm.Float32)

// Compare signed Greater than (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtS8 VcgtS8
//go:noescape
func VcgtS8(r *arm.Uint8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Compare signed Greater than (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtS16 VcgtS16
//go:noescape
func VcgtS16(r *arm.Uint16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Compare signed Greater than (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtS32 VcgtS32
//go:noescape
func VcgtS32(r *arm.Uint32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Compare signed Greater than (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtS64 VcgtS64
//go:noescape
func VcgtS64(r *arm.Uint64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Compare unsigned Higher (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtU8 VcgtU8
//go:noescape
func VcgtU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Compare unsigned Higher (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgtU16 VcgtU16
//go:noescape
func VcgtU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Compare unsigned Higher (vector).
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtU32 VcgtU32 //go:noescape func VcgtU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtU64 VcgtU64 //go:noescape func VcgtU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1) // Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtF32 VcgtF32 //go:noescape func VcgtF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2) // Floating-point Compare Greater than (vector). 
This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtF64 VcgtF64 //go:noescape func VcgtF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1) // Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtdS64 VcgtdS64 //go:noescape func VcgtdS64(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64) // Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtdU64 VcgtdU64 //go:noescape func VcgtdU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64) // Floating-point Compare Greater than (vector). 
This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtdF64 VcgtdF64 //go:noescape func VcgtdF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64) // Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtqS8 VcgtqS8 //go:noescape func VcgtqS8(r *arm.Uint8X16, v0 *arm.Int8X16, v1 *arm.Int8X16) // Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtqS16 VcgtqS16 //go:noescape func VcgtqS16(r *arm.Uint16X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Compare signed Greater than (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtqS32 VcgtqS32 //go:noescape func VcgtqS32(r *arm.Uint32X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtqS64 VcgtqS64 //go:noescape func VcgtqS64(r *arm.Uint64X2, v0 *arm.Int64X2, v1 *arm.Int64X2) // Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtqU8 VcgtqU8 //go:noescape func VcgtqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Compare unsigned Higher (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtqU16 VcgtqU16 //go:noescape func VcgtqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtqU32 VcgtqU32 //go:noescape func VcgtqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtqU64 VcgtqU64 //go:noescape func VcgtqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2) // Floating-point Compare Greater than (vector). 
This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtqF32 VcgtqF32 //go:noescape func VcgtqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4) // Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtqF64 VcgtqF64 //go:noescape func VcgtqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2) // Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtsF32 VcgtsF32 //go:noescape func VcgtsF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32) // Compare signed Greater than zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzS8 VcgtzS8 //go:noescape func VcgtzS8(r *arm.Uint8X8, v0 *arm.Int8X8) // Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzS16 VcgtzS16 //go:noescape func VcgtzS16(r *arm.Uint16X4, v0 *arm.Int16X4) // Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzS32 VcgtzS32 //go:noescape func VcgtzS32(r *arm.Uint32X2, v0 *arm.Int32X2) // Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzS64 VcgtzS64 //go:noescape func VcgtzS64(r *arm.Uint64X1, v0 *arm.Int64X1) // Floating-point Compare Greater than zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzF32 VcgtzF32 //go:noescape func VcgtzF32(r *arm.Uint32X2, v0 *arm.Float32X2) // Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzF64 VcgtzF64 //go:noescape func VcgtzF64(r *arm.Uint64X1, v0 *arm.Float64X1) // Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzdS64 VcgtzdS64 //go:noescape func VcgtzdS64(r *arm.Uint64, v0 *arm.Int64) // Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzdF64 VcgtzdF64 //go:noescape func VcgtzdF64(r *arm.Uint64, v0 *arm.Float64) // Compare signed Greater than zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzqS8 VcgtzqS8 //go:noescape func VcgtzqS8(r *arm.Uint8X16, v0 *arm.Int8X16) // Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzqS16 VcgtzqS16 //go:noescape func VcgtzqS16(r *arm.Uint16X8, v0 *arm.Int16X8) // Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzqS32 VcgtzqS32 //go:noescape func VcgtzqS32(r *arm.Uint32X4, v0 *arm.Int32X4) // Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzqS64 VcgtzqS64 //go:noescape func VcgtzqS64(r *arm.Uint64X2, v0 *arm.Int64X2) // Floating-point Compare Greater than zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzqF32 VcgtzqF32 //go:noescape func VcgtzqF32(r *arm.Uint32X4, v0 *arm.Float32X4) // Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzqF64 VcgtzqF64 //go:noescape func VcgtzqF64(r *arm.Uint64X2, v0 *arm.Float64X2) // Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. 
// //go:linkname VcgtzsF32 VcgtzsF32 //go:noescape func VcgtzsF32(r *arm.Uint32, v0 *arm.Float32) // Compare signed less than or equal // //go:linkname VcleS8 VcleS8 //go:noescape func VcleS8(r *arm.Uint8X8, v0 *arm.Int8X8, v1 *arm.Int8X8) // Compare signed less than or equal // //go:linkname VcleS16 VcleS16 //go:noescape func VcleS16(r *arm.Uint16X4, v0 *arm.Int16X4, v1 *arm.Int16X4) // Compare signed less than or equal // //go:linkname VcleS32 VcleS32 //go:noescape func VcleS32(r *arm.Uint32X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Compare signed less than or equal // //go:linkname VcleS64 VcleS64 //go:noescape func VcleS64(r *arm.Uint64X1, v0 *arm.Int64X1, v1 *arm.Int64X1) // Compare unsigned less than or equal // //go:linkname VcleU8 VcleU8 //go:noescape func VcleU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8) // Compare unsigned less than or equal // //go:linkname VcleU16 VcleU16 //go:noescape func VcleU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4) // Compare unsigned less than or equal // //go:linkname VcleU32 VcleU32 //go:noescape func VcleU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Compare unsigned less than or equal // //go:linkname VcleU64 VcleU64 //go:noescape func VcleU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1) // Floating-point compare less than or equal // //go:linkname VcleF32 VcleF32 //go:noescape func VcleF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2) // Floating-point compare less than or equal // //go:linkname VcleF64 VcleF64 //go:noescape func VcleF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1) // Compare signed less than or equal // //go:linkname VcledS64 VcledS64 //go:noescape func VcledS64(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64) // Compare unsigned less than or equal // //go:linkname VcledU64 VcledU64 //go:noescape func VcledU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64) // Floating-point compare less than or equal // //go:linkname VcledF64 VcledF64 //go:noescape func 
VcledF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64) // Compare signed less than or equal // //go:linkname VcleqS8 VcleqS8 //go:noescape func VcleqS8(r *arm.Uint8X16, v0 *arm.Int8X16, v1 *arm.Int8X16) // Compare signed less than or equal // //go:linkname VcleqS16 VcleqS16 //go:noescape func VcleqS16(r *arm.Uint16X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Compare signed less than or equal // //go:linkname VcleqS32 VcleqS32 //go:noescape func VcleqS32(r *arm.Uint32X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Compare signed less than or equal // //go:linkname VcleqS64 VcleqS64 //go:noescape func VcleqS64(r *arm.Uint64X2, v0 *arm.Int64X2, v1 *arm.Int64X2) // Compare unsigned less than or equal // //go:linkname VcleqU8 VcleqU8 //go:noescape func VcleqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Compare unsigned less than or equal // //go:linkname VcleqU16 VcleqU16 //go:noescape func VcleqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Compare unsigned less than or equal // //go:linkname VcleqU32 VcleqU32 //go:noescape func VcleqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Compare unsigned less than or equal // //go:linkname VcleqU64 VcleqU64 //go:noescape func VcleqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2) // Floating-point compare less than or equal // //go:linkname VcleqF32 VcleqF32 //go:noescape func VcleqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4) // Floating-point compare less than or equal // //go:linkname VcleqF64 VcleqF64 //go:noescape func VcleqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2) // Floating-point compare less than or equal // //go:linkname VclesF32 VclesF32 //go:noescape func VclesF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32) // Compare signed Less than or Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezS8 VclezS8 //go:noescape func VclezS8(r *arm.Uint8X8, v0 *arm.Int8X8) // Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezS16 VclezS16 //go:noescape func VclezS16(r *arm.Uint16X4, v0 *arm.Int16X4) // Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezS32 VclezS32 //go:noescape func VclezS32(r *arm.Uint32X2, v0 *arm.Int32X2) // Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezS64 VclezS64 //go:noescape func VclezS64(r *arm.Uint64X1, v0 *arm.Int64X1) // Floating-point Compare Less than or Equal to zero (vector).
This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezF32 VclezF32 //go:noescape func VclezF32(r *arm.Uint32X2, v0 *arm.Float32X2) // Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezF64 VclezF64 //go:noescape func VclezF64(r *arm.Uint64X1, v0 *arm.Float64X1) // Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezdS64 VclezdS64 //go:noescape func VclezdS64(r *arm.Uint64, v0 *arm.Int64) // Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezdF64 VclezdF64 //go:noescape func VclezdF64(r *arm.Uint64, v0 *arm.Float64) // Compare signed Less than or Equal to zero (vector).
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezqS8 VclezqS8 //go:noescape func VclezqS8(r *arm.Uint8X16, v0 *arm.Int8X16) // Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezqS16 VclezqS16 //go:noescape func VclezqS16(r *arm.Uint16X8, v0 *arm.Int16X8) // Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezqS32 VclezqS32 //go:noescape func VclezqS32(r *arm.Uint32X4, v0 *arm.Int32X4) // Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezqS64 VclezqS64 //go:noescape func VclezqS64(r *arm.Uint64X2, v0 *arm.Int64X2) // Floating-point Compare Less than or Equal to zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezqF32 VclezqF32 //go:noescape func VclezqF32(r *arm.Uint32X4, v0 *arm.Float32X4) // Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezqF64 VclezqF64 //go:noescape func VclezqF64(r *arm.Uint64X2, v0 *arm.Float64X2) // Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezsF32 VclezsF32 //go:noescape func VclezsF32(r *arm.Uint32, v0 *arm.Float32) // Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsS8 VclsS8 //go:noescape func VclsS8(r *arm.Int8X8, v0 *arm.Int8X8) // Count Leading Sign bits (vector). 
This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsS16 VclsS16 //go:noescape func VclsS16(r *arm.Int16X4, v0 *arm.Int16X4) // Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsS32 VclsS32 //go:noescape func VclsS32(r *arm.Int32X2, v0 *arm.Int32X2) // Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsU8 VclsU8 //go:noescape func VclsU8(r *arm.Int8X8, v0 *arm.Uint8X8) // Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsU16 VclsU16 //go:noescape func VclsU16(r *arm.Int16X4, v0 *arm.Uint16X4) // Count Leading Sign bits (vector). 
This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsU32 VclsU32 //go:noescape func VclsU32(r *arm.Int32X2, v0 *arm.Uint32X2) // Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsqS8 VclsqS8 //go:noescape func VclsqS8(r *arm.Int8X16, v0 *arm.Int8X16) // Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsqS16 VclsqS16 //go:noescape func VclsqS16(r *arm.Int16X8, v0 *arm.Int16X8) // Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsqS32 VclsqS32 //go:noescape func VclsqS32(r *arm.Int32X4, v0 *arm.Int32X4) // Count Leading Sign bits (vector). 
// This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsqU8 VclsqU8
//go:noescape
func VclsqU8(r *arm.Int8X16, v0 *arm.Uint8X16)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsqU16 VclsqU16
//go:noescape
func VclsqU16(r *arm.Int16X8, v0 *arm.Uint16X8)

// Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself.
//
//go:linkname VclsqU32 VclsqU32
//go:noescape
func VclsqU32(r *arm.Int32X4, v0 *arm.Uint32X4)

// Compare signed less than
//
//go:linkname VcltS8 VcltS8
//go:noescape
func VcltS8(r *arm.Uint8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Compare signed less than
//
//go:linkname VcltS16 VcltS16
//go:noescape
func VcltS16(r *arm.Uint16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Compare signed less than
//
//go:linkname VcltS32 VcltS32
//go:noescape
func VcltS32(r *arm.Uint32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Compare signed less than
//
//go:linkname VcltS64 VcltS64
//go:noescape
func VcltS64(r *arm.Uint64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Compare unsigned less than
//
//go:linkname VcltU8 VcltU8
//go:noescape
func VcltU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Compare unsigned less than
//
//go:linkname VcltU16 VcltU16
//go:noescape
func VcltU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Compare unsigned less than
//
//go:linkname VcltU32 VcltU32
//go:noescape
func VcltU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Compare unsigned less than
//
//go:linkname VcltU64 VcltU64
//go:noescape
func VcltU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Floating-point compare less than
//
//go:linkname VcltF32 VcltF32
//go:noescape
func VcltF32(r *arm.Uint32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point compare less than
//
//go:linkname VcltF64 VcltF64
//go:noescape
func VcltF64(r *arm.Uint64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Compare signed less than
//
//go:linkname VcltdS64 VcltdS64
//go:noescape
func VcltdS64(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64)

// Compare unsigned less than
//
//go:linkname VcltdU64 VcltdU64
//go:noescape
func VcltdU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)

// Floating-point compare less than
//
//go:linkname VcltdF64 VcltdF64
//go:noescape
func VcltdF64(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64)

// Compare signed less than
//
//go:linkname VcltqS8 VcltqS8
//go:noescape
func VcltqS8(r *arm.Uint8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Compare signed less than
//
//go:linkname VcltqS16 VcltqS16
//go:noescape
func VcltqS16(r *arm.Uint16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Compare signed less than
//
//go:linkname VcltqS32 VcltqS32
//go:noescape
func VcltqS32(r *arm.Uint32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Compare signed less than
//
//go:linkname VcltqS64 VcltqS64
//go:noescape
func VcltqS64(r *arm.Uint64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Compare unsigned less than
//
//go:linkname VcltqU8 VcltqU8
//go:noescape
func VcltqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Compare unsigned less than
//
//go:linkname VcltqU16 VcltqU16
//go:noescape
func VcltqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Compare unsigned less than
//
//go:linkname VcltqU32 VcltqU32
//go:noescape
func VcltqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Compare unsigned less than
//
//go:linkname VcltqU64 VcltqU64
//go:noescape
func VcltqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Floating-point compare less than
//
//go:linkname VcltqF32 VcltqF32
//go:noescape
func VcltqF32(r *arm.Uint32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point compare less than
//
//go:linkname VcltqF64 VcltqF64
//go:noescape
func VcltqF64(r *arm.Uint64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point compare less than
//
//go:linkname VcltsF32 VcltsF32
//go:noescape
func VcltsF32(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzS8 VcltzS8
//go:noescape
func VcltzS8(r *arm.Uint8X8, v0 *arm.Int8X8)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzS16 VcltzS16
//go:noescape
func VcltzS16(r *arm.Uint16X4, v0 *arm.Int16X4)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzS32 VcltzS32
//go:noescape
func VcltzS32(r *arm.Uint32X2, v0 *arm.Int32X2)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzS64 VcltzS64
//go:noescape
func VcltzS64(r *arm.Uint64X1, v0 *arm.Int64X1)

// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzF32 VcltzF32
//go:noescape
func VcltzF32(r *arm.Uint32X2, v0 *arm.Float32X2)

// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzF64 VcltzF64
//go:noescape
func VcltzF64(r *arm.Uint64X1, v0 *arm.Float64X1)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzdS64 VcltzdS64
//go:noescape
func VcltzdS64(r *arm.Uint64, v0 *arm.Int64)

// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzdF64 VcltzdF64
//go:noescape
func VcltzdF64(r *arm.Uint64, v0 *arm.Float64)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzqS8 VcltzqS8
//go:noescape
func VcltzqS8(r *arm.Uint8X16, v0 *arm.Int8X16)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzqS16 VcltzqS16
//go:noescape
func VcltzqS16(r *arm.Uint16X8, v0 *arm.Int16X8)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzqS32 VcltzqS32
//go:noescape
func VcltzqS32(r *arm.Uint32X4, v0 *arm.Int32X4)

// Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzqS64 VcltzqS64
//go:noescape
func VcltzqS64(r *arm.Uint64X2, v0 *arm.Int64X2)

// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzqF32 VcltzqF32
//go:noescape
func VcltzqF32(r *arm.Uint32X4, v0 *arm.Float32X4)

// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzqF64 VcltzqF64
//go:noescape
func VcltzqF64(r *arm.Uint64X2, v0 *arm.Float64X2)

// Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcltzsF32 VcltzsF32
//go:noescape
func VcltzsF32(r *arm.Uint32, v0 *arm.Float32)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzS8 VclzS8
//go:noescape
func VclzS8(r *arm.Int8X8, v0 *arm.Int8X8)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzS16 VclzS16
//go:noescape
func VclzS16(r *arm.Int16X4, v0 *arm.Int16X4)

// Count Leading Zero bits (vector).
// This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzS32 VclzS32
//go:noescape
func VclzS32(r *arm.Int32X2, v0 *arm.Int32X2)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzU8 VclzU8
//go:noescape
func VclzU8(r *arm.Uint8X8, v0 *arm.Uint8X8)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzU16 VclzU16
//go:noescape
func VclzU16(r *arm.Uint16X4, v0 *arm.Uint16X4)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzU32 VclzU32
//go:noescape
func VclzU32(r *arm.Uint32X2, v0 *arm.Uint32X2)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzqS8 VclzqS8
//go:noescape
func VclzqS8(r *arm.Int8X16, v0 *arm.Int8X16)

// Count Leading Zero bits (vector).
// This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzqS16 VclzqS16
//go:noescape
func VclzqS16(r *arm.Int16X8, v0 *arm.Int16X8)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzqS32 VclzqS32
//go:noescape
func VclzqS32(r *arm.Int32X4, v0 *arm.Int32X4)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzqU8 VclzqU8
//go:noescape
func VclzqU8(r *arm.Uint8X16, v0 *arm.Uint8X16)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzqU16 VclzqU16
//go:noescape
func VclzqU16(r *arm.Uint16X8, v0 *arm.Uint16X8)

// Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VclzqU32 VclzqU32
//go:noescape
func VclzqU32(r *arm.Uint32X4, v0 *arm.Uint32X4)

// Population Count per byte.
// This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcntS8 VcntS8
//go:noescape
func VcntS8(r *arm.Int8X8, v0 *arm.Int8X8)

// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcntU8 VcntU8
//go:noescape
func VcntU8(r *arm.Uint8X8, v0 *arm.Uint8X8)

// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcntP8 VcntP8
//go:noescape
func VcntP8(r *arm.Poly8X8, v0 *arm.Poly8X8)

// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcntqS8 VcntqS8
//go:noescape
func VcntqS8(r *arm.Int8X16, v0 *arm.Int8X16)

// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcntqU8 VcntqU8
//go:noescape
func VcntqU8(r *arm.Uint8X16, v0 *arm.Uint8X16)

// Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcntqP8 VcntqP8
//go:noescape
func VcntqP8(r *arm.Poly8X16, v0 *arm.Poly8X16)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineS8 VcombineS8
//go:noescape
func VcombineS8(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineS16 VcombineS16
//go:noescape
func VcombineS16(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineS32 VcombineS32
//go:noescape
func VcombineS32(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineS64 VcombineS64
//go:noescape
func VcombineS64(r *arm.Int64X2, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineU8 VcombineU8
//go:noescape
func VcombineU8(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineU16 VcombineU16
//go:noescape
func VcombineU16(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineU32 VcombineU32
//go:noescape
func VcombineU32(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineU64 VcombineU64
//go:noescape
func VcombineU64(r *arm.Uint64X2, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineF32 VcombineF32
//go:noescape
func VcombineF32(r *arm.Float32X4, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineF64 VcombineF64
//go:noescape
func VcombineF64(r *arm.Float64X2, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineP16 VcombineP16
//go:noescape
func VcombineP16(r *arm.Poly16X8, v0 *arm.Poly16X4, v1 *arm.Poly16X4)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineP64 VcombineP64
//go:noescape
func VcombineP64(r *arm.Poly64X2, v0 *arm.Poly64X1, v1 *arm.Poly64X1)

// Join two smaller vectors into a single larger vector
//
//go:linkname VcombineP8 VcombineP8
//go:noescape
func VcombineP8(r *arm.Poly8X16, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtF32S32 VcvtF32S32
//go:noescape
func VcvtF32S32(r *arm.Float32X2, v0 *arm.Int32X2)

// Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtF32U32 VcvtF32U32
//go:noescape
func VcvtF32U32(r *arm.Float32X2, v0 *arm.Uint32X2)

// Floating-point Convert to lower precision Narrow (vector). This instruction reads each vector element in the SIMD&FP source register, converts each result to half the precision of the source element, writes the final result to a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. The rounding mode is determined by the FPCR.
//
//go:linkname VcvtF32F64 VcvtF32F64
//go:noescape
func VcvtF32F64(r *arm.Float32X2, v0 *arm.Float64X2)

// Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtF64S64 VcvtF64S64
//go:noescape
func VcvtF64S64(r *arm.Float64X1, v0 *arm.Int64X1)

// Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtF64U64 VcvtF64U64
//go:noescape
func VcvtF64U64(r *arm.Float64X1, v0 *arm.Uint64X1)

// Floating-point Convert to higher precision Long (vector). This instruction reads each element in a vector in the SIMD&FP source register, converts each value to double the precision of the source element using the rounding mode that is determined by the FPCR, and writes each result to the equivalent element of the vector in the SIMD&FP destination register.
//
//go:linkname VcvtF64F32 VcvtF64F32
//go:noescape
func VcvtF64F32(r *arm.Float64X2, v0 *arm.Float32X2)

// Floating-point Convert to lower precision Narrow (vector). This instruction reads each vector element in the SIMD&FP source register, converts each result to half the precision of the source element, writes the final result to a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. The rounding mode is determined by the FPCR.
//
//go:linkname VcvtHighF32F64 VcvtHighF32F64
//go:noescape
func VcvtHighF32F64(r *arm.Float32X4, v0 *arm.Float32X2, v1 *arm.Float64X2)

// Floating-point Convert to higher precision Long (vector). This instruction reads each element in a vector in the SIMD&FP source register, converts each value to double the precision of the source element using the rounding mode that is determined by the FPCR, and writes each result to the equivalent element of the vector in the SIMD&FP destination register.
//
//go:linkname VcvtHighF64F32 VcvtHighF64F32
//go:noescape
func VcvtHighF64F32(r *arm.Float64X2, v0 *arm.Float32X4)

// Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtS32F32 VcvtS32F32
//go:noescape
func VcvtS32F32(r *arm.Int32X2, v0 *arm.Float32X2)

// Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtS64F64 VcvtS64F64
//go:noescape
func VcvtS64F64(r *arm.Int64X1, v0 *arm.Float64X1)

// Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtU32F32 VcvtU32F32
//go:noescape
func VcvtU32F32(r *arm.Uint32X2, v0 *arm.Float32X2)

// Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtU64F64 VcvtU64F64
//go:noescape
func VcvtU64F64(r *arm.Uint64X1, v0 *arm.Float64X1)

// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector).
// This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaS32F32 VcvtaS32F32
//go:noescape
func VcvtaS32F32(r *arm.Int32X2, v0 *arm.Float32X2)

// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaS64F64 VcvtaS64F64
//go:noescape
func VcvtaS64F64(r *arm.Int64X1, v0 *arm.Float64X1)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaU32F32 VcvtaU32F32
//go:noescape
func VcvtaU32F32(r *arm.Uint32X2, v0 *arm.Float32X2)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaU64F64 VcvtaU64F64
//go:noescape
func VcvtaU64F64(r *arm.Uint64X1, v0 *arm.Float64X1)

// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtadS64F64 VcvtadS64F64
//go:noescape
func VcvtadS64F64(r *arm.Int64, v0 *arm.Float64)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtadU64F64 VcvtadU64F64
//go:noescape
func VcvtadU64F64(r *arm.Uint64, v0 *arm.Float64)

// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaqS32F32 VcvtaqS32F32
//go:noescape
func VcvtaqS32F32(r *arm.Int32X4, v0 *arm.Float32X4)

// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaqS64F64 VcvtaqS64F64
//go:noescape
func VcvtaqS64F64(r *arm.Int64X2, v0 *arm.Float64X2)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaqU32F32 VcvtaqU32F32
//go:noescape
func VcvtaqU32F32(r *arm.Uint32X4, v0 *arm.Float32X4)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector).
// This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtaqU64F64 VcvtaqU64F64
//go:noescape
func VcvtaqU64F64(r *arm.Uint64X2, v0 *arm.Float64X2)

// Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtasS32F32 VcvtasS32F32
//go:noescape
func VcvtasS32F32(r *arm.Int32, v0 *arm.Float32)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtasU32F32 VcvtasU32F32
//go:noescape
func VcvtasU32F32(r *arm.Uint32, v0 *arm.Float32)

// Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtdF64S64 VcvtdF64S64
//go:noescape
func VcvtdF64S64(r *arm.Float64, v0 *arm.Int64)

// Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtdF64U64 VcvtdF64U64
//go:noescape
func VcvtdF64U64(r *arm.Float64, v0 *arm.Uint64)

// Floating-point Convert to Signed fixed-point, rounding toward Zero (vector).
// This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtdS64F64 VcvtdS64F64
//go:noescape
func VcvtdS64F64(r *arm.Int64, v0 *arm.Float64)

// Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtdU64F64 VcvtdU64F64
//go:noescape
func VcvtdU64F64(r *arm.Uint64, v0 *arm.Float64)

// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmS32F32 VcvtmS32F32
//go:noescape
func VcvtmS32F32(r *arm.Int32X2, v0 *arm.Float32X2)

// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmS64F64 VcvtmS64F64
//go:noescape
func VcvtmS64F64(r *arm.Int64X1, v0 *arm.Float64X1)

// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmU32F32 VcvtmU32F32
//go:noescape
func VcvtmU32F32(r *arm.Uint32X2, v0 *arm.Float32X2)

// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmU64F64 VcvtmU64F64
//go:noescape
func VcvtmU64F64(r *arm.Uint64X1, v0 *arm.Float64X1)

// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmdS64F64 VcvtmdS64F64
//go:noescape
func VcvtmdS64F64(r *arm.Int64, v0 *arm.Float64)

// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmdU64F64 VcvtmdU64F64
//go:noescape
func VcvtmdU64F64(r *arm.Uint64, v0 *arm.Float64)

// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmqS32F32 VcvtmqS32F32
//go:noescape
func VcvtmqS32F32(r *arm.Int32X4, v0 *arm.Float32X4)

// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmqS64F64 VcvtmqS64F64
//go:noescape
func VcvtmqS64F64(r *arm.Int64X2, v0 *arm.Float64X2)

// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmqU32F32 VcvtmqU32F32
//go:noescape
func VcvtmqU32F32(r *arm.Uint32X4, v0 *arm.Float32X4)

// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmqU64F64 VcvtmqU64F64
//go:noescape
func VcvtmqU64F64(r *arm.Uint64X2, v0 *arm.Float64X2)

// Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmsS32F32 VcvtmsS32F32
//go:noescape
func VcvtmsS32F32(r *arm.Int32, v0 *arm.Float32)

// Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtmsU32F32 VcvtmsU32F32
//go:noescape
func VcvtmsU32F32(r *arm.Uint32, v0 *arm.Float32)

// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnS32F32 VcvtnS32F32
//go:noescape
func VcvtnS32F32(r *arm.Int32X2, v0 *arm.Float32X2)

// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnS64F64 VcvtnS64F64
//go:noescape
func VcvtnS64F64(r *arm.Int64X1, v0 *arm.Float64X1)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnU32F32 VcvtnU32F32
//go:noescape
func VcvtnU32F32(r *arm.Uint32X2, v0 *arm.Float32X2)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnU64F64 VcvtnU64F64
//go:noescape
func VcvtnU64F64(r *arm.Uint64X1, v0 *arm.Float64X1)

// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtndS64F64 VcvtndS64F64
//go:noescape
func VcvtndS64F64(r *arm.Int64, v0 *arm.Float64)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtndU64F64 VcvtndU64F64
//go:noescape
func VcvtndU64F64(r *arm.Uint64, v0 *arm.Float64)

// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnqS32F32 VcvtnqS32F32
//go:noescape
func VcvtnqS32F32(r *arm.Int32X4, v0 *arm.Float32X4)

// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnqS64F64 VcvtnqS64F64
//go:noescape
func VcvtnqS64F64(r *arm.Int64X2, v0 *arm.Float64X2)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnqU32F32 VcvtnqU32F32
//go:noescape
func VcvtnqU32F32(r *arm.Uint32X4, v0 *arm.Float32X4)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnqU64F64 VcvtnqU64F64
//go:noescape
func VcvtnqU64F64(r *arm.Uint64X2, v0 *arm.Float64X2)

// Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnsS32F32 VcvtnsS32F32
//go:noescape
func VcvtnsS32F32(r *arm.Int32, v0 *arm.Float32)

// Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtnsU32F32 VcvtnsU32F32
//go:noescape
func VcvtnsU32F32(r *arm.Uint32, v0 *arm.Float32)

// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpS32F32 VcvtpS32F32
//go:noescape
func VcvtpS32F32(r *arm.Int32X2, v0 *arm.Float32X2)

// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpS64F64 VcvtpS64F64
//go:noescape
func VcvtpS64F64(r *arm.Int64X1, v0 *arm.Float64X1)

// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpU32F32 VcvtpU32F32
//go:noescape
func VcvtpU32F32(r *arm.Uint32X2, v0 *arm.Float32X2)

// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpU64F64 VcvtpU64F64
//go:noescape
func VcvtpU64F64(r *arm.Uint64X1, v0 *arm.Float64X1)

// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpdS64F64 VcvtpdS64F64
//go:noescape
func VcvtpdS64F64(r *arm.Int64, v0 *arm.Float64)

// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpdU64F64 VcvtpdU64F64
//go:noescape
func VcvtpdU64F64(r *arm.Uint64, v0 *arm.Float64)

// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpqS32F32 VcvtpqS32F32
//go:noescape
func VcvtpqS32F32(r *arm.Int32X4, v0 *arm.Float32X4)

// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpqS64F64 VcvtpqS64F64
//go:noescape
func VcvtpqS64F64(r *arm.Int64X2, v0 *arm.Float64X2)

// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpqU32F32 VcvtpqU32F32
//go:noescape
func VcvtpqU32F32(r *arm.Uint32X4, v0 *arm.Float32X4)

// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpqU64F64 VcvtpqU64F64
//go:noescape
func VcvtpqU64F64(r *arm.Uint64X2, v0 *arm.Float64X2)

// Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpsS32F32 VcvtpsS32F32
//go:noescape
func VcvtpsS32F32(r *arm.Int32, v0 *arm.Float32)

// Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtpsU32F32 VcvtpsU32F32
//go:noescape
func VcvtpsU32F32(r *arm.Uint32, v0 *arm.Float32)

// Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqF32S32 VcvtqF32S32
//go:noescape
func VcvtqF32S32(r *arm.Float32X4, v0 *arm.Int32X4)

// Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqF32U32 VcvtqF32U32
//go:noescape
func VcvtqF32U32(r *arm.Float32X4, v0 *arm.Uint32X4)

// Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqF64S64 VcvtqF64S64
//go:noescape
func VcvtqF64S64(r *arm.Float64X2, v0 *arm.Int64X2)

// Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqF64U64 VcvtqF64U64
//go:noescape
func VcvtqF64U64(r *arm.Float64X2, v0 *arm.Uint64X2)

// Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqS32F32 VcvtqS32F32
//go:noescape
func VcvtqS32F32(r *arm.Int32X4, v0 *arm.Float32X4)

// Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtqS64F64 VcvtqS64F64
//go:noescape
func VcvtqS64F64(r *arm.Int64X2, v0 *arm.Float64X2)

// Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the general-purpose destination register.
//
//go:linkname VcvtqU32F32 VcvtqU32F32
//go:noescape
func VcvtqU32F32(r *arm.Uint32X4, v0 *arm.Float32X4)

// Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the general-purpose destination register.
//
//go:linkname VcvtqU64F64 VcvtqU64F64
//go:noescape
func VcvtqU64F64(r *arm.Uint64X2, v0 *arm.Float64X2)

// Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtsF32S32 VcvtsF32S32
//go:noescape
func VcvtsF32S32(r *arm.Float32, v0 *arm.Int32)

// Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtsF32U32 VcvtsF32U32
//go:noescape
func VcvtsF32U32(r *arm.Float32, v0 *arm.Uint32)

// Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
//
//go:linkname VcvtsS32F32 VcvtsS32F32
//go:noescape
func VcvtsS32F32(r *arm.Int32, v0 *arm.Float32)

// Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the general-purpose destination register.
//
//go:linkname VcvtsU32F32 VcvtsU32F32
//go:noescape
func VcvtsU32F32(r *arm.Uint32, v0 *arm.Float32)

// Floating-point Convert to lower precision Narrow, rounding to odd (vector). This instruction reads each vector element in the source SIMD&FP register, narrows each value to half the precision of the source element using the Round to Odd rounding mode, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcvtxF32F64 VcvtxF32F64
//go:noescape
func VcvtxF32F64(r *arm.Float32X2, v0 *arm.Float64X2)

// Floating-point Convert to lower precision Narrow, rounding to odd (vector). This instruction reads each vector element in the source SIMD&FP register, narrows each value to half the precision of the source element using the Round to Odd rounding mode, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcvtxHighF32F64 VcvtxHighF32F64
//go:noescape
func VcvtxHighF32F64(r *arm.Float32X4, v0 *arm.Float32X2, v1 *arm.Float64X2)

// Floating-point Convert to lower precision Narrow, rounding to odd (vector). This instruction reads each vector element in the source SIMD&FP register, narrows each value to half the precision of the source element using the Round to Odd rounding mode, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VcvtxdF32F64 VcvtxdF32F64
//go:noescape
func VcvtxdF32F64(r *arm.Float32, v0 *arm.Float64)

// Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VdivF32 VdivF32
//go:noescape
func VdivF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VdivF64 VdivF64
//go:noescape
func VdivF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VdivqF32 VdivqF32
//go:noescape
func VdivqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VdivqF64 VdivqF64
//go:noescape
func VdivqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Dot Product signed arithmetic (vector). This instruction performs the dot product of the four signed 8-bit elements in each 32-bit element of the first source register with the four signed 8-bit elements of the corresponding 32-bit element in the second source register, accumulating the result into the corresponding 32-bit element of the destination register.
//
//go:linkname VdotS32 VdotS32
//go:noescape
func VdotS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int8X8, v2 *arm.Int8X8)

// Dot Product unsigned arithmetic (vector). This instruction performs the dot product of the four unsigned 8-bit elements in each 32-bit element of the first source register with the four unsigned 8-bit elements of the corresponding 32-bit element in the second source register, accumulating the result into the corresponding 32-bit element of the destination register.
//
//go:linkname VdotU32 VdotU32
//go:noescape
func VdotU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint8X8, v2 *arm.Uint8X8)

// Dot Product signed arithmetic (vector). This instruction performs the dot product of the four signed 8-bit elements in each 32-bit element of the first source register with the four signed 8-bit elements of the corresponding 32-bit element in the second source register, accumulating the result into the corresponding 32-bit element of the destination register.
//
//go:linkname VdotqS32 VdotqS32
//go:noescape
func VdotqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int8X16, v2 *arm.Int8X16)

// Dot Product unsigned arithmetic (vector). This instruction performs the dot product of the four unsigned 8-bit elements in each 32-bit element of the first source register with the four unsigned 8-bit elements of the corresponding 32-bit element in the second source register, accumulating the result into the corresponding 32-bit element of the destination register.
//
//go:linkname VdotqU32 VdotqU32
//go:noescape
func VdotqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint8X16, v2 *arm.Uint8X16)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNS8 VdupNS8
//go:noescape
func VdupNS8(r *arm.Int8X8, v0 *arm.Int8)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNS16 VdupNS16
//go:noescape
func VdupNS16(r *arm.Int16X4, v0 *arm.Int16)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNS32 VdupNS32
//go:noescape
func VdupNS32(r *arm.Int32X2, v0 *arm.Int32)

// Insert vector element from another vector element. This instruction copies the vector element of the source SIMD&FP register to the specified vector element of the destination SIMD&FP register.
//
//go:linkname VdupNS64 VdupNS64
//go:noescape
func VdupNS64(r *arm.Int64X1, v0 *arm.Int64)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNU8 VdupNU8
//go:noescape
func VdupNU8(r *arm.Uint8X8, v0 *arm.Uint8)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNU16 VdupNU16
//go:noescape
func VdupNU16(r *arm.Uint16X4, v0 *arm.Uint16)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNU32 VdupNU32
//go:noescape
func VdupNU32(r *arm.Uint32X2, v0 *arm.Uint32)

// Insert vector element from another vector element. This instruction copies the vector element of the source SIMD&FP register to the specified vector element of the destination SIMD&FP register.
//
//go:linkname VdupNU64 VdupNU64
//go:noescape
func VdupNU64(r *arm.Uint64X1, v0 *arm.Uint64)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNF32 VdupNF32
//go:noescape
func VdupNF32(r *arm.Float32X2, v0 *arm.Float32)

// Insert vector element from another vector element. This instruction copies the vector element of the source SIMD&FP register to the specified vector element of the destination SIMD&FP register.
//
//go:linkname VdupNF64 VdupNF64
//go:noescape
func VdupNF64(r *arm.Float64X1, v0 *arm.Float64)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNP16 VdupNP16
//go:noescape
func VdupNP16(r *arm.Poly16X4, v0 *arm.Poly16)

// Insert vector element from another vector element. This instruction copies the vector element of the source SIMD&FP register to the specified vector element of the destination SIMD&FP register.
//
//go:linkname VdupNP64 VdupNP64
//go:noescape
func VdupNP64(r *arm.Poly64X1, v0 *arm.Poly64)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupNP8 VdupNP8
//go:noescape
func VdupNP8(r *arm.Poly8X8, v0 *arm.Poly8)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNS8 VdupqNS8
//go:noescape
func VdupqNS8(r *arm.Int8X16, v0 *arm.Int8)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNS16 VdupqNS16
//go:noescape
func VdupqNS16(r *arm.Int16X8, v0 *arm.Int16)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNS32 VdupqNS32
//go:noescape
func VdupqNS32(r *arm.Int32X4, v0 *arm.Int32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNS64 VdupqNS64
//go:noescape
func VdupqNS64(r *arm.Int64X2, v0 *arm.Int64)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNU8 VdupqNU8
//go:noescape
func VdupqNU8(r *arm.Uint8X16, v0 *arm.Uint8)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNU16 VdupqNU16
//go:noescape
func VdupqNU16(r *arm.Uint16X8, v0 *arm.Uint16)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNU32 VdupqNU32
//go:noescape
func VdupqNU32(r *arm.Uint32X4, v0 *arm.Uint32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNU64 VdupqNU64
//go:noescape
func VdupqNU64(r *arm.Uint64X2, v0 *arm.Uint64)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNF32 VdupqNF32
//go:noescape
func VdupqNF32(r *arm.Float32X4, v0 *arm.Float32)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNF64 VdupqNF64
//go:noescape
func VdupqNF64(r *arm.Float64X2, v0 *arm.Float64)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNP16 VdupqNP16
//go:noescape
func VdupqNP16(r *arm.Poly16X8, v0 *arm.Poly16)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNP64 VdupqNP64
//go:noescape
func VdupqNP64(r *arm.Poly64X2, v0 *arm.Poly64)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VdupqNP8 VdupqNP8
//go:noescape
func VdupqNP8(r *arm.Poly8X16, v0 *arm.Poly8)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorS8 VeorS8
//go:noescape
func VeorS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorS16 VeorS16
//go:noescape
func VeorS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorS32 VeorS32
//go:noescape
func VeorS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorS64 VeorS64
//go:noescape
func VeorS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorU8 VeorU8
//go:noescape
func VeorU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorU16 VeorU16
//go:noescape
func VeorU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorU32 VeorU32
//go:noescape
func VeorU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorU64 VeorU64
//go:noescape
func VeorU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname Veor3QS8 Veor3QS8
//go:noescape
func Veor3QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16, v2 *arm.Int8X16)

// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname Veor3QS16 Veor3QS16
//go:noescape
func Veor3QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname Veor3QS32 Veor3QS32
//go:noescape
func Veor3QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname Veor3QS64 Veor3QS64
//go:noescape
func Veor3QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2, v2 *arm.Int64X2)

// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname Veor3QU8 Veor3QU8
//go:noescape
func Veor3QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16)

// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname Veor3QU16 Veor3QU16
//go:noescape
func Veor3QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname Veor3QU32 Veor3QU32
//go:noescape
func Veor3QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Three-way Exclusive OR performs a three-way exclusive OR of the values in the three source SIMD&FP registers, and writes the result to the destination SIMD&FP register.
//
//go:linkname Veor3QU64 Veor3QU64
//go:noescape
func Veor3QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqS8 VeorqS8
//go:noescape
func VeorqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqS16 VeorqS16
//go:noescape
func VeorqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqS32 VeorqS32
//go:noescape
func VeorqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqS64 VeorqS64
//go:noescape
func VeorqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqU8 VeorqU8
//go:noescape
func VeorqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqU16 VeorqU16
//go:noescape
func VeorqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqU32 VeorqU32
//go:noescape
func VeorqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register.
//
//go:linkname VeorqU64 VeorqU64
//go:noescape
func VeorqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Floating-point fused Multiply-Add to accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmaF32 VfmaF32
//go:noescape
func VfmaF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32X2)

// Floating-point fused Multiply-Add (scalar). This instruction multiplies the values of the first two SIMD&FP source registers, adds the product to the value of the third SIMD&FP source register, and writes the result to the SIMD&FP destination register.
//
//go:linkname VfmaF64 VfmaF64
//go:noescape
func VfmaF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1, v2 *arm.Float64X1)

// Floating-point fused Multiply-Add to accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmaNF32 VfmaNF32
//go:noescape
func VfmaNF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32)

// Floating-point fused Multiply-Add (scalar). This instruction multiplies the values of the first two SIMD&FP source registers, adds the product to the value of the third SIMD&FP source register, and writes the result to the SIMD&FP destination register.
//
//go:linkname VfmaNF64 VfmaNF64
//go:noescape
func VfmaNF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1, v2 *arm.Float64)

// Floating-point fused Multiply-Add to accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmaqF32 VfmaqF32
//go:noescape
func VfmaqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32X4)

// Floating-point fused Multiply-Add to accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmaqF64 VfmaqF64
//go:noescape
func VfmaqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2, v2 *arm.Float64X2)

// Floating-point fused Multiply-Add to accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmaqNF32 VfmaqNF32
//go:noescape
func VfmaqNF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32)

// Floating-point fused Multiply-Add to accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmaqNF64 VfmaqNF64
//go:noescape
func VfmaqNF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2, v2 *arm.Float64)

// Floating-point fused Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, negates the product, adds the result to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmsF32 VfmsF32
//go:noescape
func VfmsF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32X2)

// Floating-point Fused Multiply-Subtract (scalar). This instruction multiplies the values of the first two SIMD&FP source registers, negates the product, adds that to the value of the third SIMD&FP source register, and writes the result to the SIMD&FP destination register.
//
//go:linkname VfmsF64 VfmsF64
//go:noescape
func VfmsF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1, v2 *arm.Float64X1)

// Floating-point fused Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, negates the product, adds the result to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmsNF32 VfmsNF32
//go:noescape
func VfmsNF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32)

// Floating-point Fused Multiply-Subtract (scalar). This instruction multiplies the values of the first two SIMD&FP source registers, negates the product, adds that to the value of the third SIMD&FP source register, and writes the result to the SIMD&FP destination register.
//
//go:linkname VfmsNF64 VfmsNF64
//go:noescape
func VfmsNF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1, v2 *arm.Float64)

// Floating-point fused Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, negates the product, adds the result to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmsqF32 VfmsqF32
//go:noescape
func VfmsqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32X4)

// Floating-point fused Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, negates the product, adds the result to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmsqF64 VfmsqF64
//go:noescape
func VfmsqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2, v2 *arm.Float64X2)

// Floating-point fused Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, negates the product, adds the result to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmsqNF32 VfmsqNF32
//go:noescape
func VfmsqNF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32)

// Floating-point fused Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, negates the product, adds the result to the corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
//
//go:linkname VfmsqNF64 VfmsqNF64
//go:noescape
func VfmsqNF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2, v2 *arm.Float64)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetHighS8 VgetHighS8
//go:noescape
func VgetHighS8(r *arm.Int8X8, v0 *arm.Int8X16)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetHighS16 VgetHighS16
//go:noescape
func VgetHighS16(r *arm.Int16X4, v0 *arm.Int16X8)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetHighS32 VgetHighS32
//go:noescape
func VgetHighS32(r *arm.Int32X2, v0 *arm.Int32X4)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetHighS64 VgetHighS64
//go:noescape
func VgetHighS64(r *arm.Int64X1, v0 *arm.Int64X2)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetHighU8 VgetHighU8
//go:noescape
func VgetHighU8(r *arm.Uint8X8, v0 *arm.Uint8X16)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetHighU16 VgetHighU16
//go:noescape
func VgetHighU16(r *arm.Uint16X4, v0 *arm.Uint16X8)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetHighU32 VgetHighU32
//go:noescape
func VgetHighU32(r *arm.Uint32X2, v0 *arm.Uint32X4)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetHighU64 VgetHighU64
//go:noescape
func VgetHighU64(r *arm.Uint64X1, v0 *arm.Uint64X2)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetHighF32 VgetHighF32
//go:noescape
func VgetHighF32(r *arm.Float32X2, v0 *arm.Float32X4)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetHighF64 VgetHighF64
//go:noescape
func VgetHighF64(r *arm.Float64X1, v0 *arm.Float64X2)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetHighP16 VgetHighP16
//go:noescape
func VgetHighP16(r *arm.Poly16X4, v0 *arm.Poly16X8)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetHighP64 VgetHighP64
//go:noescape
func VgetHighP64(r *arm.Poly64X1, v0 *arm.Poly64X2)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetHighP8 VgetHighP8
//go:noescape
func VgetHighP8(r *arm.Poly8X8, v0 *arm.Poly8X16)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetLowS8 VgetLowS8
//go:noescape
func VgetLowS8(r *arm.Int8X8, v0 *arm.Int8X16)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetLowS16 VgetLowS16
//go:noescape
func VgetLowS16(r *arm.Int16X4, v0 *arm.Int16X8)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetLowS32 VgetLowS32
//go:noescape
func VgetLowS32(r *arm.Int32X2, v0 *arm.Int32X4)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetLowS64 VgetLowS64
//go:noescape
func VgetLowS64(r *arm.Int64X1, v0 *arm.Int64X2)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetLowU8 VgetLowU8
//go:noescape
func VgetLowU8(r *arm.Uint8X8, v0 *arm.Uint8X16)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetLowU16 VgetLowU16
//go:noescape
func VgetLowU16(r *arm.Uint16X4, v0 *arm.Uint16X8)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetLowU32 VgetLowU32
//go:noescape
func VgetLowU32(r *arm.Uint32X2, v0 *arm.Uint32X4)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetLowU64 VgetLowU64
//go:noescape
func VgetLowU64(r *arm.Uint64X1, v0 *arm.Uint64X2)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetLowF32 VgetLowF32
//go:noescape
func VgetLowF32(r *arm.Float32X2, v0 *arm.Float32X4)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetLowF64 VgetLowF64
//go:noescape
func VgetLowF64(r *arm.Float64X1, v0 *arm.Float64X2)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetLowP16 VgetLowP16
//go:noescape
func VgetLowP16(r *arm.Poly16X4, v0 *arm.Poly16X8)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetLowP64 VgetLowP64
//go:noescape
func VgetLowP64(r *arm.Poly64X1, v0 *arm.Poly64X2)

// Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register.
//
//go:linkname VgetLowP8 VgetLowP8
//go:noescape
func VgetLowP8(r *arm.Poly8X8, v0 *arm.Poly8X16)

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddS8 VhaddS8
//go:noescape
func VhaddS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddS16 VhaddS16
//go:noescape
func VhaddS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddS32 VhaddS32
//go:noescape
func VhaddS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddU8 VhaddU8
//go:noescape
func VhaddU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddU16 VhaddU16
//go:noescape
func VhaddU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddU32 VhaddU32
//go:noescape
func VhaddU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqS8 VhaddqS8
//go:noescape
func VhaddqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqS16 VhaddqS16
//go:noescape
func VhaddqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqS32 VhaddqS32
//go:noescape
func VhaddqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqU8 VhaddqU8
//go:noescape
func VhaddqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqU16 VhaddqU16
//go:noescape
func VhaddqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqU32 VhaddqU32
//go:noescape
func VhaddqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubS8 VhsubS8
//go:noescape
func VhsubS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubS16 VhsubS16
//go:noescape
func VhsubS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubS32 VhsubS32
//go:noescape
func VhsubS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubU8 VhsubU8
//go:noescape
func VhsubU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubU16 VhsubU16
//go:noescape
func VhsubU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubU32 VhsubU32
//go:noescape
func VhsubU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqS8 VhsubqS8
//go:noescape
func VhsubqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqS16 VhsubqS16
//go:noescape
func VhsubqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqS32 VhsubqS32
//go:noescape
func VhsubqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqU8 VhsubqU8
//go:noescape
func VhsubqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqU16 VhsubqU16
//go:noescape
func VhsubqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqU32 VhsubqU32
//go:noescape
func VhsubqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxS8 VmaxS8
//go:noescape
func VmaxS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxS16 VmaxS16
//go:noescape
func VmaxS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxS32 VmaxS32
//go:noescape
func VmaxS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxU8 VmaxU8
//go:noescape
func VmaxU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned Maximum (vector).
This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmaxU16 VmaxU16 //go:noescape func VmaxU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4) // Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmaxU32 VmaxU32 //go:noescape func VmaxU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmaxF32 VmaxF32 //go:noescape func VmaxF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2) // Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmaxF64 VmaxF64 //go:noescape func VmaxF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1) // Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmaxnmF32 VmaxnmF32 //go:noescape func VmaxnmF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2) // Floating-point Maximum Number (vector). 
This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmaxnmF64 VmaxnmF64 //go:noescape func VmaxnmF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1) // Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmaxnmqF32 VmaxnmqF32 //go:noescape func VmaxnmqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4) // Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmaxnmqF64 VmaxnmqF64 //go:noescape func VmaxnmqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2) // Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VmaxnmvF32 VmaxnmvF32 //go:noescape func VmaxnmvF32(r *arm.Float32, v0 *arm.Float32X2) // Floating-point Maximum Number across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values. 
// //go:linkname VmaxnmvqF32 VmaxnmvqF32 //go:noescape func VmaxnmvqF32(r *arm.Float32, v0 *arm.Float32X4) // Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VmaxnmvqF64 VmaxnmvqF64 //go:noescape func VmaxnmvqF64(r *arm.Float64, v0 *arm.Float64X2) // Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmaxqS8 VmaxqS8 //go:noescape func VmaxqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16) // Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmaxqS16 VmaxqS16 //go:noescape func VmaxqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmaxqS32 VmaxqS32 //go:noescape func VmaxqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Unsigned Maximum (vector). 
This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmaxqU8 VmaxqU8 //go:noescape func VmaxqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmaxqU16 VmaxqU16 //go:noescape func VmaxqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmaxqU32 VmaxqU32 //go:noescape func VmaxqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmaxqF32 VmaxqF32 //go:noescape func VmaxqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4) // Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmaxqF64 VmaxqF64 //go:noescape func VmaxqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2) // Signed Maximum across Vector. 
This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VmaxvS8 VmaxvS8 //go:noescape func VmaxvS8(r *arm.Int8, v0 *arm.Int8X8) // Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VmaxvS16 VmaxvS16 //go:noescape func VmaxvS16(r *arm.Int16, v0 *arm.Int16X4) // Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmaxvS32 VmaxvS32 //go:noescape func VmaxvS32(r *arm.Int32, v0 *arm.Int32X2) // Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VmaxvU8 VmaxvU8 //go:noescape func VmaxvU8(r *arm.Uint8, v0 *arm.Uint8X8) // Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VmaxvU16 VmaxvU16 //go:noescape func VmaxvU16(r *arm.Uint16, v0 *arm.Uint16X4) // Unsigned Maximum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmaxvU32 VmaxvU32 //go:noescape func VmaxvU32(r *arm.Uint32, v0 *arm.Uint32X2) // Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VmaxvF32 VmaxvF32 //go:noescape func VmaxvF32(r *arm.Float32, v0 *arm.Float32X2) // Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VmaxvqS8 VmaxvqS8 //go:noescape func VmaxvqS8(r *arm.Int8, v0 *arm.Int8X16) // Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VmaxvqS16 VmaxvqS16 //go:noescape func VmaxvqS16(r *arm.Int16, v0 *arm.Int16X8) // Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. 
All the values in this instruction are signed integer values. // //go:linkname VmaxvqS32 VmaxvqS32 //go:noescape func VmaxvqS32(r *arm.Int32, v0 *arm.Int32X4) // Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VmaxvqU8 VmaxvqU8 //go:noescape func VmaxvqU8(r *arm.Uint8, v0 *arm.Uint8X16) // Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VmaxvqU16 VmaxvqU16 //go:noescape func VmaxvqU16(r *arm.Uint16, v0 *arm.Uint16X8) // Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VmaxvqU32 VmaxvqU32 //go:noescape func VmaxvqU32(r *arm.Uint32, v0 *arm.Uint32X4) // Floating-point Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VmaxvqF32 VmaxvqF32 //go:noescape func VmaxvqF32(r *arm.Float32, v0 *arm.Float32X4) // Floating-point Maximum Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VmaxvqF64 VmaxvqF64 //go:noescape func VmaxvqF64(r *arm.Float64, v0 *arm.Float64X2) // Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VminS8 VminS8 //go:noescape func VminS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8) // Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VminS16 VminS16 //go:noescape func VminS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4) // Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VminS32 VminS32 //go:noescape func VminS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VminU8 VminU8 //go:noescape func VminU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8) // Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VminU16 VminU16 //go:noescape func VminU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4) // Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VminU32 VminU32 //go:noescape func VminU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Floating-point Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VminF32 VminF32 //go:noescape func VminF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2) // Floating-point Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VminF64 VminF64 //go:noescape func VminF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1) // Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
// //go:linkname VminnmF32 VminnmF32 //go:noescape func VminnmF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2) // Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VminnmF64 VminnmF64 //go:noescape func VminnmF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1) // Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VminnmqF32 VminnmqF32 //go:noescape func VminnmqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4) // Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VminnmqF64 VminnmqF64 //go:noescape func VminnmqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2) // Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VminnmvF32 VminnmvF32 //go:noescape func VminnmvF32(r *arm.Float32, v0 *arm.Float32X2) // Floating-point Minimum Number across Vector. 
This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VminnmvqF32 VminnmvqF32 //go:noescape func VminnmvqF32(r *arm.Float32, v0 *arm.Float32X4) // Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VminnmvqF64 VminnmvqF64 //go:noescape func VminnmvqF64(r *arm.Float64, v0 *arm.Float64X2) // Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VminqS8 VminqS8 //go:noescape func VminqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16) // Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VminqS16 VminqS16 //go:noescape func VminqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VminqS32 VminqS32 //go:noescape func VminqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VminqU8 VminqU8 //go:noescape func VminqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VminqU16 VminqU16 //go:noescape func VminqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VminqU32 VminqU32 //go:noescape func VminqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Floating-point Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VminqF32 VminqF32 //go:noescape func VminqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4) // Floating-point Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
// //go:linkname VminqF64 VminqF64 //go:noescape func VminqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2) // Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VminvS8 VminvS8 //go:noescape func VminvS8(r *arm.Int8, v0 *arm.Int8X8) // Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VminvS16 VminvS16 //go:noescape func VminvS16(r *arm.Int16, v0 *arm.Int16X4) // Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VminvS32 VminvS32 //go:noescape func VminvS32(r *arm.Int32, v0 *arm.Int32X2) // Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VminvU8 VminvU8 //go:noescape func VminvU8(r *arm.Uint8, v0 *arm.Uint8X8) // Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. 
// //go:linkname VminvU16 VminvU16 //go:noescape func VminvU16(r *arm.Uint16, v0 *arm.Uint16X4) // Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VminvU32 VminvU32 //go:noescape func VminvU32(r *arm.Uint32, v0 *arm.Uint32X2) // Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VminvF32 VminvF32 //go:noescape func VminvF32(r *arm.Float32, v0 *arm.Float32X2) // Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VminvqS8 VminvqS8 //go:noescape func VminvqS8(r *arm.Int8, v0 *arm.Int8X16) // Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VminvqS16 VminvqS16 //go:noescape func VminvqS16(r *arm.Int16, v0 *arm.Int16X8) // Signed Minimum across Vector. 
This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VminvqS32 VminvqS32 //go:noescape func VminvqS32(r *arm.Int32, v0 *arm.Int32X4) // Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VminvqU8 VminvqU8 //go:noescape func VminvqU8(r *arm.Uint8, v0 *arm.Uint8X16) // Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VminvqU16 VminvqU16 //go:noescape func VminvqU16(r *arm.Uint16, v0 *arm.Uint16X8) // Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VminvqU32 VminvqU32 //go:noescape func VminvqU32(r *arm.Uint32, v0 *arm.Uint32X4) // Floating-point Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VminvqF32 VminvqF32 //go:noescape func VminvqF32(r *arm.Float32, v0 *arm.Float32X4) // Floating-point Minimum Pairwise (vector). 
// This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VminvqF64 VminvqF64
//go:noescape
func VminvqF64(r *arm.Float64, v0 *arm.Float64X2)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaS8 VmlaS8
//go:noescape
func VmlaS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8, v2 *arm.Int8X8)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaS16 VmlaS16
//go:noescape
func VmlaS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16X4)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaS32 VmlaS32
//go:noescape
func VmlaS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32X2)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaU8 VmlaU8
//go:noescape
func VmlaU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaU16 VmlaU16
//go:noescape
func VmlaU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaU32 VmlaU32
//go:noescape
func VmlaU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2)

// Floating-point multiply-add to accumulator
//
//go:linkname VmlaF32 VmlaF32
//go:noescape
func VmlaF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32X2)

// Floating-point multiply-add to accumulator
//
//go:linkname VmlaF64 VmlaF64
//go:noescape
func VmlaF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1, v2 *arm.Float64X1)

// Vector multiply accumulate with scalar
//
//go:linkname VmlaNS16 VmlaNS16
//go:noescape
func VmlaNS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16)

// Vector multiply accumulate with scalar
//
//go:linkname VmlaNS32 VmlaNS32
//go:noescape
func VmlaNS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32)

// Vector multiply accumulate with scalar
//
//go:linkname VmlaNU16 VmlaNU16
//go:noescape
func VmlaNU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4, v2 *arm.Uint16)

// Vector multiply accumulate with scalar
//
//go:linkname VmlaNU32 VmlaNU32
//go:noescape
func VmlaNU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2, v2 *arm.Uint32)

// Vector multiply accumulate with scalar
//
//go:linkname VmlaNF32 VmlaNF32
//go:noescape
func VmlaNF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32)

// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalS8 VmlalS8
//go:noescape
func VmlalS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X8, v2 *arm.Int8X8)

// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalS16 VmlalS16
//go:noescape
func VmlalS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16X4)

// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalS32 VmlalS32
//go:noescape
func VmlalS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32X2)

// Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalU8 VmlalU8
//go:noescape
func VmlalU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)

// Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalU16 VmlalU16
//go:noescape
func VmlalU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4)

// Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalU32 VmlalU32
//go:noescape
func VmlalU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2)

// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalHighS8 VmlalHighS8
//go:noescape
func VmlalHighS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X16, v2 *arm.Int8X16)

// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
// The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalHighS16 VmlalHighS16
//go:noescape
func VmlalHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalHighS32 VmlalHighS32
//go:noescape
func VmlalHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalHighU8 VmlalHighU8
//go:noescape
func VmlalHighU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X16, v2 *arm.Uint8X16)

// Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalHighU16 VmlalHighU16
//go:noescape
func VmlalHighU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Unsigned Multiply-Add Long (vector).
// This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalHighU32 VmlalHighU32
//go:noescape
func VmlalHighU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalHighNS16 VmlalHighNS16
//go:noescape
func VmlalHighNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16)

// Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalHighNS32 VmlalHighNS32
//go:noescape
func VmlalHighNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32)

// Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalHighNU16 VmlalHighNU16
//go:noescape
func VmlalHighNU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8, v2 *arm.Uint16)

// Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlalHighNU32 VmlalHighNU32
//go:noescape
func VmlalHighNU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4, v2 *arm.Uint32)

// Vector widening multiply accumulate with scalar
//
//go:linkname VmlalNS16 VmlalNS16
//go:noescape
func VmlalNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16)

// Vector widening multiply accumulate with scalar
//
//go:linkname VmlalNS32 VmlalNS32
//go:noescape
func VmlalNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32)

// Vector widening multiply accumulate with scalar
//
//go:linkname VmlalNU16 VmlalNU16
//go:noescape
func VmlalNU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4, v2 *arm.Uint16)

// Vector widening multiply accumulate with scalar
//
//go:linkname VmlalNU32 VmlalNU32
//go:noescape
func VmlalNU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2, v2 *arm.Uint32)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaqS8 VmlaqS8
//go:noescape
func VmlaqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16, v2 *arm.Int8X16)

// Multiply-Add to accumulator (vector).
// This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaqS16 VmlaqS16
//go:noescape
func VmlaqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaqS32 VmlaqS32
//go:noescape
func VmlaqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaqU8 VmlaqU8
//go:noescape
func VmlaqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaqU16 VmlaqU16
//go:noescape
func VmlaqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlaqU32 VmlaqU32
//go:noescape
func VmlaqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Floating-point multiply-add to accumulator
//
//go:linkname VmlaqF32 VmlaqF32
//go:noescape
func VmlaqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32X4)

// Floating-point multiply-add to accumulator
//
//go:linkname VmlaqF64 VmlaqF64
//go:noescape
func VmlaqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2, v2 *arm.Float64X2)

// Vector multiply accumulate with scalar
//
//go:linkname VmlaqNS16 VmlaqNS16
//go:noescape
func VmlaqNS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16)

// Vector multiply accumulate with scalar
//
//go:linkname VmlaqNS32 VmlaqNS32
//go:noescape
func VmlaqNS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32)

// Vector multiply accumulate with scalar
//
//go:linkname VmlaqNU16 VmlaqNU16
//go:noescape
func VmlaqNU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16)

// Vector multiply accumulate with scalar
//
//go:linkname VmlaqNU32 VmlaqNU32
//go:noescape
func VmlaqNU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32)

// Vector multiply accumulate with scalar
//
//go:linkname VmlaqNF32 VmlaqNF32
//go:noescape
func VmlaqNF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsS8 VmlsS8
//go:noescape
func VmlsS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8, v2 *arm.Int8X8)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsS16 VmlsS16
//go:noescape
func VmlsS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16X4)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsS32 VmlsS32
//go:noescape
func VmlsS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32X2)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsU8 VmlsU8
//go:noescape
func VmlsU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsU16 VmlsU16
//go:noescape
func VmlsU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsU32 VmlsU32
//go:noescape
func VmlsU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2)

// Multiply-subtract from accumulator
//
//go:linkname VmlsF32 VmlsF32
//go:noescape
func VmlsF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32X2)

// Multiply-subtract from accumulator
//
//go:linkname VmlsF64 VmlsF64
//go:noescape
func VmlsF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1, v2 *arm.Float64X1)

// Vector multiply subtract with scalar
//
//go:linkname VmlsNS16 VmlsNS16
//go:noescape
func VmlsNS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16)

// Vector multiply subtract with scalar
//
//go:linkname VmlsNS32 VmlsNS32
//go:noescape
func VmlsNS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32)

// Vector multiply subtract with scalar
//
//go:linkname VmlsNU16 VmlsNU16
//go:noescape
func VmlsNU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4, v2 *arm.Uint16)

// Vector multiply subtract with scalar
//
//go:linkname VmlsNU32 VmlsNU32
//go:noescape
func VmlsNU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2, v2 *arm.Uint32)

// Vector multiply subtract with scalar
//
//go:linkname VmlsNF32 VmlsNF32
//go:noescape
func VmlsNF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2, v2 *arm.Float32)

// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlslS8 VmlslS8
//go:noescape
func VmlslS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X8, v2 *arm.Int8X8)

// Signed Multiply-Subtract Long (vector).
// This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlslS16 VmlslS16
//go:noescape
func VmlslS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16X4)

// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlslS32 VmlslS32
//go:noescape
func VmlslS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32X2)

// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmlslU8 VmlslU8
//go:noescape
func VmlslU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8)

// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmlslU16 VmlslU16
//go:noescape
func VmlslU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4, v2 *arm.Uint16X4)

// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmlslU32 VmlslU32
//go:noescape
func VmlslU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2, v2 *arm.Uint32X2)

// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlslHighS8 VmlslHighS8
//go:noescape
func VmlslHighS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X16, v2 *arm.Int8X16)

// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlslHighS16 VmlslHighS16
//go:noescape
func VmlslHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
// The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlslHighS32 VmlslHighS32
//go:noescape
func VmlslHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmlslHighU8 VmlslHighU8
//go:noescape
func VmlslHighU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X16, v2 *arm.Uint8X16)

// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmlslHighU16 VmlslHighU16
//go:noescape
func VmlslHighU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmlslHighU32 VmlslHighU32
//go:noescape
func VmlslHighU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Signed Multiply-Subtract Long (vector).
// This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlslHighNS16 VmlslHighNS16
//go:noescape
func VmlslHighNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16)

// Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VmlslHighNS32 VmlslHighNS32
//go:noescape
func VmlslHighNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32)

// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmlslHighNU16 VmlslHighNU16
//go:noescape
func VmlslHighNU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8, v2 *arm.Uint16)

// Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values.
//
//go:linkname VmlslHighNU32 VmlslHighNU32
//go:noescape
func VmlslHighNU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4, v2 *arm.Uint32)

// Vector widening multiply subtract with scalar
//
//go:linkname VmlslNS16 VmlslNS16
//go:noescape
func VmlslNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16)

// Vector widening multiply subtract with scalar
//
//go:linkname VmlslNS32 VmlslNS32
//go:noescape
func VmlslNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32)

// Vector widening multiply subtract with scalar
//
//go:linkname VmlslNU16 VmlslNU16
//go:noescape
func VmlslNU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4, v2 *arm.Uint16)

// Vector widening multiply subtract with scalar
//
//go:linkname VmlslNU32 VmlslNU32
//go:noescape
func VmlslNU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2, v2 *arm.Uint32)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsqS8 VmlsqS8
//go:noescape
func VmlsqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16, v2 *arm.Int8X16)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsqS16 VmlsqS16
//go:noescape
func VmlsqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsqS32 VmlsqS32
//go:noescape
func VmlsqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsqU8 VmlsqU8
//go:noescape
func VmlsqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsqU16 VmlsqU16
//go:noescape
func VmlsqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP register.
//
//go:linkname VmlsqU32 VmlsqU32
//go:noescape
func VmlsqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Multiply-subtract from accumulator
//
//go:linkname VmlsqF32 VmlsqF32
//go:noescape
func VmlsqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32X4)

// Multiply-subtract from accumulator
//
//go:linkname VmlsqF64 VmlsqF64
//go:noescape
func VmlsqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2, v2 *arm.Float64X2)

// Vector multiply subtract with scalar
//
//go:linkname VmlsqNS16 VmlsqNS16
//go:noescape
func VmlsqNS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16)

// Vector multiply subtract with scalar
//
//go:linkname VmlsqNS32 VmlsqNS32
//go:noescape
func VmlsqNS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32)

// Vector multiply subtract with scalar
//
//go:linkname VmlsqNU16 VmlsqNU16
//go:noescape
func VmlsqNU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8, v2 *arm.Uint16)

// Vector multiply subtract with scalar
//
//go:linkname VmlsqNU32 VmlsqNU32
//go:noescape
func VmlsqNU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32)

// Vector multiply subtract with scalar
//
//go:linkname VmlsqNF32 VmlsqNF32
//go:noescape
func VmlsqNF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4, v2 *arm.Float32)

// Signed 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of signed 8-bit integer values in the first source vector by the 8x2 matrix of signed 8-bit integer values in the second source vector. The resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the destination vector. This is equivalent to performing an 8-way dot product per destination element.
//
//go:linkname VmmlaqS32 VmmlaqS32
//go:noescape
func VmmlaqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int8X16, v2 *arm.Int8X16)

// Unsigned 8-bit integer matrix multiply-accumulate.
This instruction multiplies the 2x8 matrix of unsigned 8-bit integer values in the first source vector by the 8x2 matrix of unsigned 8-bit integer values in the second source vector. The resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the destination vector. This is equivalent to performing an 8-way dot product per destination element. // //go:linkname VmmlaqU32 VmmlaqU32 //go:noescape func VmmlaqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint8X16, v2 *arm.Uint8X16) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNS8 VmovNS8 //go:noescape func VmovNS8(r *arm.Int8X8, v0 *arm.Int8) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNS16 VmovNS16 //go:noescape func VmovNS16(r *arm.Int16X4, v0 *arm.Int16) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNS32 VmovNS32 //go:noescape func VmovNS32(r *arm.Int32X2, v0 *arm.Int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNS64 VmovNS64 //go:noescape func VmovNS64(r *arm.Int64X1, v0 *arm.Int64) // Duplicate vector element to vector or scalar. 
This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNU8 VmovNU8 //go:noescape func VmovNU8(r *arm.Uint8X8, v0 *arm.Uint8) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNU16 VmovNU16 //go:noescape func VmovNU16(r *arm.Uint16X4, v0 *arm.Uint16) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNU32 VmovNU32 //go:noescape func VmovNU32(r *arm.Uint32X2, v0 *arm.Uint32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNU64 VmovNU64 //go:noescape func VmovNU64(r *arm.Uint64X1, v0 *arm.Uint64) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNF32 VmovNF32 //go:noescape func VmovNF32(r *arm.Float32X2, v0 *arm.Float32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. 
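// //go:linkname VmovNF64 VmovNF64 //go:noescape func VmovNF64(r *arm.Float64X1, v0 *arm.Float64) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNP16 VmovNP16 //go:noescape func VmovNP16(r *arm.Poly16X4, v0 *arm.Poly16) // Duplicate scalar to vector (vmov_n_p64). Writes the 64-bit polynomial value v0 to the single lane of r. // //go:linkname VmovNP64 VmovNP64 //go:noescape func VmovNP64(r *arm.Poly64X1, v0 *arm.Poly64) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNP8 VmovNP8 //go:noescape func VmovNP8(r *arm.Poly8X8, v0 *arm.Poly8) // Vector move long. Sign-extends each element of v0 to twice its width and writes the result to r. // //go:linkname VmovlS8 VmovlS8 //go:noescape func VmovlS8(r *arm.Int16X8, v0 *arm.Int8X8) // Vector move long. Sign-extends each element of v0 to twice its width and writes the result to r. // //go:linkname VmovlS16 VmovlS16 //go:noescape func VmovlS16(r *arm.Int32X4, v0 *arm.Int16X4) // Vector move long. Sign-extends each element of v0 to twice its width and writes the result to r. // //go:linkname VmovlS32 VmovlS32 //go:noescape func VmovlS32(r *arm.Int64X2, v0 *arm.Int32X2) // Vector move long. Zero-extends each element of v0 to twice its width and writes the result to r. // //go:linkname VmovlU8 VmovlU8 //go:noescape func VmovlU8(r *arm.Uint16X8, v0 *arm.Uint8X8) // Vector move long. Zero-extends each element of v0 to twice its width and writes the result to r. // //go:linkname VmovlU16 VmovlU16 //go:noescape func VmovlU16(r *arm.Uint32X4, v0 *arm.Uint16X4) // Vector move long. Zero-extends each element of v0 to twice its width and writes the result to r. // //go:linkname VmovlU32 VmovlU32 //go:noescape func VmovlU32(r *arm.Uint64X2, v0 *arm.Uint32X2) // Vector move long (high). Sign-extends each element in the upper half of v0 to twice its width and writes the result to r. // //go:linkname VmovlHighS8 VmovlHighS8 //go:noescape func VmovlHighS8(r *arm.Int16X8, v0 *arm.Int8X16) // Vector move long (high). Sign-extends each element in the upper half of v0 to twice its width and writes the result to r. // //go:linkname VmovlHighS16 VmovlHighS16 //go:noescape func VmovlHighS16(r *arm.Int32X4, v0 *arm.Int16X8) // Vector move long (high). Sign-extends each element in the upper half of v0 to twice its width and writes the result to r. // //go:linkname VmovlHighS32 VmovlHighS32 //go:noescape func VmovlHighS32(r *arm.Int64X2, v0 *arm.Int32X4) // Vector move long (high). Zero-extends each element in the upper half of v0 to twice its width and writes the result to r. // //go:linkname VmovlHighU8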
VmovlHighU8 //go:noescape func VmovlHighU8(r *arm.Uint16X8, v0 *arm.Uint8X16) // Vector move long (high). Zero-extends each element in the upper half of v0 to twice its width and writes the result to r. // //go:linkname VmovlHighU16 VmovlHighU16 //go:noescape func VmovlHighU16(r *arm.Uint32X4, v0 *arm.Uint16X8) // Vector move long (high). Zero-extends each element in the upper half of v0 to twice its width and writes the result to r. // //go:linkname VmovlHighU32 VmovlHighU32 //go:noescape func VmovlHighU32(r *arm.Uint64X2, v0 *arm.Uint32X4) // Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. // //go:linkname VmovnS16 VmovnS16 //go:noescape func VmovnS16(r *arm.Int8X8, v0 *arm.Int16X8) // Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. // //go:linkname VmovnS32 VmovnS32 //go:noescape func VmovnS32(r *arm.Int16X4, v0 *arm.Int32X4) // Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. // //go:linkname VmovnS64 VmovnS64 //go:noescape func VmovnS64(r *arm.Int32X2, v0 *arm.Int64X2) // Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
// //go:linkname VmovnU16 VmovnU16 //go:noescape func VmovnU16(r *arm.Uint8X8, v0 *arm.Uint16X8) // Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. // //go:linkname VmovnU32 VmovnU32 //go:noescape func VmovnU32(r *arm.Uint16X4, v0 *arm.Uint32X4) // Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. // //go:linkname VmovnU64 VmovnU64 //go:noescape func VmovnU64(r *arm.Uint32X2, v0 *arm.Uint64X2) // Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. // //go:linkname VmovnHighS16 VmovnHighS16 //go:noescape func VmovnHighS16(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int16X8) // Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. // //go:linkname VmovnHighS32 VmovnHighS32 //go:noescape func VmovnHighS32(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int32X4) // Extract Narrow. 
This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. // //go:linkname VmovnHighS64 VmovnHighS64 //go:noescape func VmovnHighS64(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int64X2) // Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. // //go:linkname VmovnHighU16 VmovnHighU16 //go:noescape func VmovnHighU16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint16X8) // Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. // //go:linkname VmovnHighU32 VmovnHighU32 //go:noescape func VmovnHighU32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint32X4) // Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. // //go:linkname VmovnHighU64 VmovnHighU64 //go:noescape func VmovnHighU64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint64X2) // Duplicate vector element to vector or scalar. 
This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovqNS8 VmovqNS8 //go:noescape func VmovqNS8(r *arm.Int8X16, v0 *arm.Int8) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovqNS16 VmovqNS16 //go:noescape func VmovqNS16(r *arm.Int16X8, v0 *arm.Int16) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovqNS32 VmovqNS32 //go:noescape func VmovqNS32(r *arm.Int32X4, v0 *arm.Int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovqNS64 VmovqNS64 //go:noescape func VmovqNS64(r *arm.Int64X2, v0 *arm.Int64) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovqNU8 VmovqNU8 //go:noescape func VmovqNU8(r *arm.Uint8X16, v0 *arm.Uint8) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. 
// //go:linkname VmovqNU16 VmovqNU16 //go:noescape func VmovqNU16(r *arm.Uint16X8, v0 *arm.Uint16) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovqNU32 VmovqNU32 //go:noescape func VmovqNU32(r *arm.Uint32X4, v0 *arm.Uint32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovqNU64 VmovqNU64 //go:noescape func VmovqNU64(r *arm.Uint64X2, v0 *arm.Uint64) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovqNF32 VmovqNF32 //go:noescape func VmovqNF32(r *arm.Float32X4, v0 *arm.Float32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovqNF64 VmovqNF64 //go:noescape func VmovqNF64(r *arm.Float64X2, v0 *arm.Float64) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. 
// //go:linkname VmovqNP16 VmovqNP16 //go:noescape func VmovqNP16(r *arm.Poly16X8, v0 *arm.Poly16) // Duplicate scalar to vector (vmovq_n_p64). Writes the 64-bit polynomial value v0 to both lanes of r. // //go:linkname VmovqNP64 VmovqNP64 //go:noescape func VmovqNP64(r *arm.Poly64X2, v0 *arm.Poly64) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovqNP8 VmovqNP8 //go:noescape func VmovqNP8(r *arm.Poly8X16, v0 *arm.Poly8) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulS8 VmulS8 //go:noescape func VmulS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulS16 VmulS16 //go:noescape func VmulS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulS32 VmulS32 //go:noescape func VmulS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulU8 VmulU8 //go:noescape func VmulU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8) // Multiply (vector).
This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulU16 VmulU16 //go:noescape func VmulU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulU32 VmulU32 //go:noescape func VmulU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulF32 VmulF32 //go:noescape func VmulF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2) // Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VmulF64 VmulF64 //go:noescape func VmulF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1) // Vector multiply by scalar // //go:linkname VmulNS16 VmulNS16 //go:noescape func VmulNS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16) // Vector multiply by scalar // //go:linkname VmulNS32 VmulNS32 //go:noescape func VmulNS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32) // Vector multiply by scalar // //go:linkname VmulNU16 VmulNU16 //go:noescape func VmulNU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16) // Vector multiply by scalar // //go:linkname VmulNU32 VmulNU32 //go:noescape func VmulNU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32) // Vector multiply by scalar // //go:linkname VmulNF32 VmulNF32 //go:noescape func VmulNF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32) // Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulNF64 VmulNF64 //go:noescape func VmulNF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64) // Polynomial Multiply. This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulP8 VmulP8 //go:noescape func VmulP8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8) // Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmullS8 VmullS8 //go:noescape func VmullS8(r *arm.Int16X8, v0 *arm.Int8X8, v1 *arm.Int8X8) // Signed Multiply Long (vector). 
This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmullS16 VmullS16 //go:noescape func VmullS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16X4) // Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmullS32 VmullS32 //go:noescape func VmullS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Unsigned Multiply long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values. // //go:linkname VmullU8 VmullU8 //go:noescape func VmullU8(r *arm.Uint16X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8) // Unsigned Multiply long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values. // //go:linkname VmullU16 VmullU16 //go:noescape func VmullU16(r *arm.Uint32X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4) // Unsigned Multiply long (vector). 
This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values. // //go:linkname VmullU32 VmullU32 //go:noescape func VmullU32(r *arm.Uint64X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmullHighS8 VmullHighS8 //go:noescape func VmullHighS8(r *arm.Int16X8, v0 *arm.Int8X16, v1 *arm.Int8X16) // Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmullHighS16 VmullHighS16 //go:noescape func VmullHighS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16X8) // Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmullHighS32 VmullHighS32 //go:noescape func VmullHighS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32X4) // Unsigned Multiply long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. 
All the values in this instruction are unsigned integer values. // //go:linkname VmullHighU8 VmullHighU8 //go:noescape func VmullHighU8(r *arm.Uint16X8, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Unsigned Multiply long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values. // //go:linkname VmullHighU16 VmullHighU16 //go:noescape func VmullHighU16(r *arm.Uint32X4, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Unsigned Multiply long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values. // //go:linkname VmullHighU32 VmullHighU32 //go:noescape func VmullHighU32(r *arm.Uint64X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmullHighNS16 VmullHighNS16 //go:noescape func VmullHighNS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16) // Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VmullHighNS32 VmullHighNS32 //go:noescape func VmullHighNS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32) // Unsigned Multiply long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values. // //go:linkname VmullHighNU16 VmullHighNU16 //go:noescape func VmullHighNU16(r *arm.Uint32X4, v0 *arm.Uint16X8, v1 *arm.Uint16) // Unsigned Multiply long (vector). This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this instruction are unsigned integer values. // //go:linkname VmullHighNU32 VmullHighNU32 //go:noescape func VmullHighNU32(r *arm.Uint64X2, v0 *arm.Uint32X4, v1 *arm.Uint32) // Polynomial Multiply Long. This instruction multiplies corresponding elements in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. // //go:linkname VmullHighP64 VmullHighP64 //go:noescape func VmullHighP64(r *arm.Poly128, v0 *arm.Poly64X2, v1 *arm.Poly64X2) // Polynomial Multiply Long. This instruction multiplies corresponding elements in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. 
// //go:linkname VmullHighP8 VmullHighP8 //go:noescape func VmullHighP8(r *arm.Poly16X8, v0 *arm.Poly8X16, v1 *arm.Poly8X16) // Vector long multiply with scalar // //go:linkname VmullNS16 VmullNS16 //go:noescape func VmullNS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16) // Vector long multiply with scalar // //go:linkname VmullNS32 VmullNS32 //go:noescape func VmullNS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32) // Vector long multiply with scalar // //go:linkname VmullNU16 VmullNU16 //go:noescape func VmullNU16(r *arm.Uint32X4, v0 *arm.Uint16X4, v1 *arm.Uint16) // Vector long multiply with scalar // //go:linkname VmullNU32 VmullNU32 //go:noescape func VmullNU32(r *arm.Uint64X2, v0 *arm.Uint32X2, v1 *arm.Uint32) // Polynomial Multiply Long. This instruction multiplies corresponding elements in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. // //go:linkname VmullP64 VmullP64 //go:noescape func VmullP64(r *arm.Poly128, v0 *arm.Poly64, v1 *arm.Poly64) // Polynomial Multiply Long. This instruction multiplies corresponding elements in the lower or upper half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. // //go:linkname VmullP8 VmullP8 //go:noescape func VmullP8(r *arm.Poly16X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulqS8 VmulqS8 //go:noescape func VmulqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16) // Multiply (vector). 
This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulqS16 VmulqS16 //go:noescape func VmulqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulqS32 VmulqS32 //go:noescape func VmulqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulqU8 VmulqU8 //go:noescape func VmulqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulqU16 VmulqU16 //go:noescape func VmulqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulqU32 VmulqU32 //go:noescape func VmulqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VmulqF32 VmulqF32 //go:noescape func VmulqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4) // Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulqF64 VmulqF64 //go:noescape func VmulqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2) // Vector multiply by scalar // //go:linkname VmulqNS16 VmulqNS16 //go:noescape func VmulqNS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16) // Vector multiply by scalar // //go:linkname VmulqNS32 VmulqNS32 //go:noescape func VmulqNS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32) // Vector multiply by scalar // //go:linkname VmulqNU16 VmulqNU16 //go:noescape func VmulqNU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16) // Vector multiply by scalar // //go:linkname VmulqNU32 VmulqNU32 //go:noescape func VmulqNU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32) // Vector multiply by scalar // //go:linkname VmulqNF32 VmulqNF32 //go:noescape func VmulqNF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32) // Vector multiply by scalar // //go:linkname VmulqNF64 VmulqNF64 //go:noescape func VmulqNF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64) // Polynomial Multiply. This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulqP8 VmulqP8 //go:noescape func VmulqP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16) // Floating-point Multiply extended.
This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulxF32 VmulxF32 //go:noescape func VmulxF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2) // Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulxF64 VmulxF64 //go:noescape func VmulxF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1) // Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulxdF64 VmulxdF64 //go:noescape func VmulxdF64(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64) // Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulxqF32 VmulxqF32 //go:noescape func VmulxqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4) // Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulxqF64 VmulxqF64 //go:noescape func VmulxqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2) // Floating-point Multiply extended. 
This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulxsF32 VmulxsF32 //go:noescape func VmulxsF32(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnS8 VmvnS8 //go:noescape func VmvnS8(r *arm.Int8X8, v0 *arm.Int8X8) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnS16 VmvnS16 //go:noescape func VmvnS16(r *arm.Int16X4, v0 *arm.Int16X4) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnS32 VmvnS32 //go:noescape func VmvnS32(r *arm.Int32X2, v0 *arm.Int32X2) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnU8 VmvnU8 //go:noescape func VmvnU8(r *arm.Uint8X8, v0 *arm.Uint8X8) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnU16 VmvnU16 //go:noescape func VmvnU16(r *arm.Uint16X4, v0 *arm.Uint16X4) // Bitwise NOT (vector). 
This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnU32 VmvnU32 //go:noescape func VmvnU32(r *arm.Uint32X2, v0 *arm.Uint32X2) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnP8 VmvnP8 //go:noescape func VmvnP8(r *arm.Poly8X8, v0 *arm.Poly8X8) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnqS8 VmvnqS8 //go:noescape func VmvnqS8(r *arm.Int8X16, v0 *arm.Int8X16) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnqS16 VmvnqS16 //go:noescape func VmvnqS16(r *arm.Int16X8, v0 *arm.Int16X8) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnqS32 VmvnqS32 //go:noescape func VmvnqS32(r *arm.Int32X4, v0 *arm.Int32X4) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnqU8 VmvnqU8 //go:noescape func VmvnqU8(r *arm.Uint8X16, v0 *arm.Uint8X16) // Bitwise NOT (vector). 
This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnqU16 VmvnqU16 //go:noescape func VmvnqU16(r *arm.Uint16X8, v0 *arm.Uint16X8) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnqU32 VmvnqU32 //go:noescape func VmvnqU32(r *arm.Uint32X4, v0 *arm.Uint32X4) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnqP8 VmvnqP8 //go:noescape func VmvnqP8(r *arm.Poly8X16, v0 *arm.Poly8X16) // Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegS8 VnegS8 //go:noescape func VnegS8(r *arm.Int8X8, v0 *arm.Int8X8) // Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegS16 VnegS16 //go:noescape func VnegS16(r *arm.Int16X4, v0 *arm.Int16X4) // Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegS32 VnegS32 //go:noescape func VnegS32(r *arm.Int32X2, v0 *arm.Int32X2) // Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VnegS64 VnegS64 //go:noescape func VnegS64(r *arm.Int64X1, v0 *arm.Int64X1) // Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegF32 VnegF32 //go:noescape func VnegF32(r *arm.Float32X2, v0 *arm.Float32X2) // Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegF64 VnegF64 //go:noescape func VnegF64(r *arm.Float64X1, v0 *arm.Float64X1) // Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegdS64 VnegdS64 //go:noescape func VnegdS64(r *arm.Int64, v0 *arm.Int64) // Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegqS8 VnegqS8 //go:noescape func VnegqS8(r *arm.Int8X16, v0 *arm.Int8X16) // Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegqS16 VnegqS16 //go:noescape func VnegqS16(r *arm.Int16X8, v0 *arm.Int16X8) // Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegqS32 VnegqS32 //go:noescape func VnegqS32(r *arm.Int32X4, v0 *arm.Int32X4) // Negate (vector). 
This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegqS64 VnegqS64 //go:noescape func VnegqS64(r *arm.Int64X2, v0 *arm.Int64X2) // Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegqF32 VnegqF32 //go:noescape func VnegqF32(r *arm.Float32X4, v0 *arm.Float32X4) // Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegqF64 VnegqF64 //go:noescape func VnegqF64(r *arm.Float64X2, v0 *arm.Float64X2) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornS8 VornS8 //go:noescape func VornS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornS16 VornS16 //go:noescape func VornS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornS32 VornS32 //go:noescape func VornS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. 
// //go:linkname VornS64 VornS64 //go:noescape func VornS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornU8 VornU8 //go:noescape func VornU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornU16 VornU16 //go:noescape func VornU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornU32 VornU32 //go:noescape func VornU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornU64 VornU64 //go:noescape func VornU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornqS8 VornqS8 //go:noescape func VornqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornqS16 VornqS16 //go:noescape func VornqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Bitwise inclusive OR NOT (vector). 
This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornqS32 VornqS32 //go:noescape func VornqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornqS64 VornqS64 //go:noescape func VornqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornqU8 VornqU8 //go:noescape func VornqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornqU16 VornqU16 //go:noescape func VornqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornqU32 VornqU32 //go:noescape func VornqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornqU64 VornqU64 //go:noescape func VornqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. 
// //go:linkname VorrS8 VorrS8 //go:noescape func VorrS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrS16 VorrS16 //go:noescape func VorrS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrS32 VorrS32 //go:noescape func VorrS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrS64 VorrS64 //go:noescape func VorrS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrU8 VorrU8 //go:noescape func VorrU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrU16 VorrU16 //go:noescape func VorrU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrU32 VorrU32 //go:noescape func VorrU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Bitwise inclusive OR (vector, register). 
This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrU64 VorrU64 //go:noescape func VorrU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrqS8 VorrqS8 //go:noescape func VorrqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrqS16 VorrqS16 //go:noescape func VorrqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrqS32 VorrqS32 //go:noescape func VorrqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrqS64 VorrqS64 //go:noescape func VorrqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrqU8 VorrqU8 //go:noescape func VorrqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. 
// //go:linkname VorrqU16 VorrqU16 //go:noescape func VorrqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrqU32 VorrqU32 //go:noescape func VorrqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrqU64 VorrqU64 //go:noescape func VorrqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2) // Signed Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. // //go:linkname VpadalS8 VpadalS8 //go:noescape func VpadalS8(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int8X8) // Signed Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. // //go:linkname VpadalS16 VpadalS16 //go:noescape func VpadalS16(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int16X4) // Signed Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. 
// //go:linkname VpadalS32 VpadalS32 //go:noescape func VpadalS32(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int32X2) // Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. // //go:linkname VpadalU8 VpadalU8 //go:noescape func VpadalU8(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint8X8) // Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. // //go:linkname VpadalU16 VpadalU16 //go:noescape func VpadalU16(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint16X4) // Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. // //go:linkname VpadalU32 VpadalU32 //go:noescape func VpadalU32(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint32X2) // Signed Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. // //go:linkname VpadalqS8 VpadalqS8 //go:noescape func VpadalqS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X16) // Signed Add and Accumulate Long Pairwise. 
This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. // //go:linkname VpadalqS16 VpadalqS16 //go:noescape func VpadalqS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8) // Signed Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. // //go:linkname VpadalqS32 VpadalqS32 //go:noescape func VpadalqS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4) // Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. // //go:linkname VpadalqU8 VpadalqU8 //go:noescape func VpadalqU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X16) // Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. // //go:linkname VpadalqU16 VpadalqU16 //go:noescape func VpadalqU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8) // Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination SIMD&FP register. 
The destination vector elements are twice as long as the source vector elements. // //go:linkname VpadalqU32 VpadalqU32 //go:noescape func VpadalqU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddS8 VpaddS8 //go:noescape func VpaddS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddS16 VpaddS16 //go:noescape func VpaddS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddS32 VpaddS32 //go:noescape func VpaddS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Add Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddU8 VpaddU8 //go:noescape func VpaddU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddU16 VpaddU16 //go:noescape func VpaddU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddU32 VpaddU32 //go:noescape func VpaddU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. 
// //go:linkname VpaddF32 VpaddF32 //go:noescape func VpaddF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2) // Add Pair of elements (scalar). This instruction adds the two vector elements in the source SIMD&FP register and writes the scalar result to the destination SIMD&FP register. // //go:linkname VpadddS64 VpadddS64 //go:noescape func VpadddS64(r *arm.Int64, v0 *arm.Int64X2) // Add Pair of elements (scalar). This instruction adds the two vector elements in the source SIMD&FP register and writes the scalar result to the destination SIMD&FP register. // //go:linkname VpadddU64 VpadddU64 //go:noescape func VpadddU64(r *arm.Uint64, v0 *arm.Uint64X2) // Floating-point Add Pair of elements (scalar). This instruction adds the two floating-point vector elements in the source SIMD&FP register and writes the scalar result to the destination SIMD&FP register. // //go:linkname VpadddF64 VpadddF64 //go:noescape func VpadddF64(r *arm.Float64, v0 *arm.Float64X2) // Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
The destination vector elements are twice as long as the source vector elements. // //go:linkname VpaddlS8 VpaddlS8 //go:noescape func VpaddlS8(r *arm.Int16X4, v0 *arm.Int8X8) // Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. // //go:linkname VpaddlS16 VpaddlS16 //go:noescape func VpaddlS16(r *arm.Int32X2, v0 *arm.Int16X4) // Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. // //go:linkname VpaddlS32 VpaddlS32 //go:noescape func VpaddlS32(r *arm.Int64X1, v0 *arm.Int32X2) // Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. // //go:linkname VpaddlU8 VpaddlU8 //go:noescape func VpaddlU8(r *arm.Uint16X4, v0 *arm.Uint8X8) // Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. // //go:linkname VpaddlU16 VpaddlU16 //go:noescape func VpaddlU16(r *arm.Uint32X2, v0 *arm.Uint16X4) // Unsigned Add Long Pairwise. 
This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. // //go:linkname VpaddlU32 VpaddlU32 //go:noescape func VpaddlU32(r *arm.Uint64X1, v0 *arm.Uint32X2) // Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. // //go:linkname VpaddlqS8 VpaddlqS8 //go:noescape func VpaddlqS8(r *arm.Int16X8, v0 *arm.Int8X16) // Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. // //go:linkname VpaddlqS16 VpaddlqS16 //go:noescape func VpaddlqS16(r *arm.Int32X4, v0 *arm.Int16X8) // Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. // //go:linkname VpaddlqS32 VpaddlqS32 //go:noescape func VpaddlqS32(r *arm.Int64X2, v0 *arm.Int32X4) // Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. 
// //go:linkname VpaddlqU8 VpaddlqU8 //go:noescape func VpaddlqU8(r *arm.Uint16X8, v0 *arm.Uint8X16) // Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. // //go:linkname VpaddlqU16 VpaddlqU16 //go:noescape func VpaddlqU16(r *arm.Uint32X4, v0 *arm.Uint16X8) // Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements. // //go:linkname VpaddlqU32 VpaddlqU32 //go:noescape func VpaddlqU32(r *arm.Uint64X2, v0 *arm.Uint32X4) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddqS8 VpaddqS8 //go:noescape func VpaddqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddqS16 VpaddqS16 //go:noescape func VpaddqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Add Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddqS32 VpaddqS32 //go:noescape func VpaddqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddqS64 VpaddqS64 //go:noescape func VpaddqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddqU8 VpaddqU8 //go:noescape func VpaddqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VpaddqU16 VpaddqU16 //go:noescape func VpaddqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddqU32 VpaddqU32 //go:noescape func VpaddqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddqU64 VpaddqU64 //go:noescape func VpaddqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2) // Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpaddqF32 VpaddqF32 //go:noescape func VpaddqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4) // Floating-point Add Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpaddqF64 VpaddqF64 //go:noescape func VpaddqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2) // Floating-point Add Pair of elements (scalar). This instruction adds the two floating-point vector elements in the source SIMD&FP register and writes the scalar result to the destination SIMD&FP register. // //go:linkname VpaddsF32 VpaddsF32 //go:noescape func VpaddsF32(r *arm.Float32, v0 *arm.Float32X2) // Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxS8 VpmaxS8 //go:noescape func VpmaxS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8) // Signed Maximum Pairwise.
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxS16 VpmaxS16 //go:noescape func VpmaxS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4) // Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxS32 VpmaxS32 //go:noescape func VpmaxS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxU8 VpmaxU8 //go:noescape func VpmaxU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8) // Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VpmaxU16 VpmaxU16 //go:noescape func VpmaxU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4) // Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxU32 VpmaxU32 //go:noescape func VpmaxU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpmaxF32 VpmaxF32 //go:noescape func VpmaxF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2) // Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpmaxnmF32 VpmaxnmF32 //go:noescape func VpmaxnmF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2) // Floating-point Maximum Number Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpmaxnmqF32 VpmaxnmqF32 //go:noescape func VpmaxnmqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4) // Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpmaxnmqF64 VpmaxnmqF64 //go:noescape func VpmaxnmqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2) // Floating-point Maximum Number of Pair of elements (scalar). This instruction compares the two vector elements in the source SIMD&FP register and writes the larger of the floating-point values as a scalar to the destination SIMD&FP register. // //go:linkname VpmaxnmqdF64 VpmaxnmqdF64 //go:noescape func VpmaxnmqdF64(r *arm.Float64, v0 *arm.Float64X2) // Floating-point Maximum Number of Pair of elements (scalar).
This instruction compares the two vector elements in the source SIMD&FP register and writes the larger of the floating-point values as a scalar to the destination SIMD&FP register. // //go:linkname VpmaxnmsF32 VpmaxnmsF32 //go:noescape func VpmaxnmsF32(r *arm.Float32, v0 *arm.Float32X2) // Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxqS8 VpmaxqS8 //go:noescape func VpmaxqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16) // Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxqS16 VpmaxqS16 //go:noescape func VpmaxqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Signed Maximum Pairwise.
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxqS32 VpmaxqS32 //go:noescape func VpmaxqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxqU8 VpmaxqU8 //go:noescape func VpmaxqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxqU16 VpmaxqU16 //go:noescape func VpmaxqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VpmaxqU32 VpmaxqU32 //go:noescape func VpmaxqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpmaxqF32 VpmaxqF32 //go:noescape func VpmaxqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4) // Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpmaxqF64 VpmaxqF64 //go:noescape func VpmaxqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2) // Floating-point Maximum of Pair of elements (scalar). This instruction compares the two vector elements in the source SIMD&FP register and writes the larger of the floating-point values as a scalar to the destination SIMD&FP register. // //go:linkname VpmaxqdF64 VpmaxqdF64 //go:noescape func VpmaxqdF64(r *arm.Float64, v0 *arm.Float64X2) // Floating-point Maximum of Pair of elements (scalar).
This instruction compares the two vector elements in the source SIMD&FP register and writes the larger of the floating-point values as a scalar to the destination SIMD&FP register. // //go:linkname VpmaxsF32 VpmaxsF32 //go:noescape func VpmaxsF32(r *arm.Float32, v0 *arm.Float32X2) // Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpminS8 VpminS8 //go:noescape func VpminS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8) // Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpminS16 VpminS16 //go:noescape func VpminS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4) // Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
// //go:linkname VpminS32 VpminS32 //go:noescape func VpminS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpminU8 VpminU8 //go:noescape func VpminU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8) // Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpminU16 VpminU16 //go:noescape func VpminU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4) // Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpminU32 VpminU32 //go:noescape func VpminU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Floating-point Minimum Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpminF32 VpminF32 //go:noescape func VpminF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2) // Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpminnmF32 VpminnmF32 //go:noescape func VpminnmF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2) // Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpminnmqF32 VpminnmqF32 //go:noescape func VpminnmqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4) // Floating-point Minimum Number Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpminnmqF64 VpminnmqF64 //go:noescape func VpminnmqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2) // Floating-point Minimum Number of Pair of elements (scalar). This instruction compares the two vector elements in the source SIMD&FP register and writes the smaller of the floating-point values as a scalar to the destination SIMD&FP register. // //go:linkname VpminnmqdF64 VpminnmqdF64 //go:noescape func VpminnmqdF64(r *arm.Float64, v0 *arm.Float64X2) // Floating-point Minimum Number of Pair of elements (scalar). This instruction compares the two vector elements in the source SIMD&FP register and writes the smaller of the floating-point values as a scalar to the destination SIMD&FP register. // //go:linkname VpminnmsF32 VpminnmsF32 //go:noescape func VpminnmsF32(r *arm.Float32, v0 *arm.Float32X2) // Signed Minimum Pairwise.
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpminqS8 VpminqS8 //go:noescape func VpminqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16) // Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpminqS16 VpminqS16 //go:noescape func VpminqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpminqS32 VpminqS32 //go:noescape func VpminqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VpminqU8 VpminqU8 //go:noescape func VpminqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpminqU16 VpminqU16 //go:noescape func VpminqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpminqU32 VpminqU32 //go:noescape func VpminqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpminqF32 VpminqF32 //go:noescape func VpminqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4) // Floating-point Minimum Pairwise (vector). 
// This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminqF64 VpminqF64
//go:noescape
func VpminqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminqdF64 VpminqdF64
//go:noescape
func VpminqdF64(r *arm.Float64, v0 *arm.Float64X2)

// Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminsF32 VpminsF32
//go:noescape
func VpminsF32(r *arm.Float32, v0 *arm.Float32X2)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsS8 VqabsS8
//go:noescape
func VqabsS8(r *arm.Int8X8, v0 *arm.Int8X8)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsS16 VqabsS16
//go:noescape
func VqabsS16(r *arm.Int16X4, v0 *arm.Int16X4)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsS32 VqabsS32
//go:noescape
func VqabsS32(r *arm.Int32X2, v0 *arm.Int32X2)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsS64 VqabsS64
//go:noescape
func VqabsS64(r *arm.Int64X1, v0 *arm.Int64X1)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsbS8 VqabsbS8
//go:noescape
func VqabsbS8(r *arm.Int8, v0 *arm.Int8)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsdS64 VqabsdS64
//go:noescape
func VqabsdS64(r *arm.Int64, v0 *arm.Int64)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabshS16 VqabshS16
//go:noescape
func VqabshS16(r *arm.Int16, v0 *arm.Int16)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsqS8 VqabsqS8
//go:noescape
func VqabsqS8(r *arm.Int8X16, v0 *arm.Int8X16)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsqS16 VqabsqS16
//go:noescape
func VqabsqS16(r *arm.Int16X8, v0 *arm.Int16X8)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsqS32 VqabsqS32
//go:noescape
func VqabsqS32(r *arm.Int32X4, v0 *arm.Int32X4)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsqS64 VqabsqS64
//go:noescape
func VqabsqS64(r *arm.Int64X2, v0 *arm.Int64X2)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabssS32 VqabssS32
//go:noescape
func VqabssS32(r *arm.Int32, v0 *arm.Int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddS8 VqaddS8
//go:noescape
func VqaddS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddS16 VqaddS16
//go:noescape
func VqaddS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddS32 VqaddS32
//go:noescape
func VqaddS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddS64 VqaddS64
//go:noescape
func VqaddS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Unsigned saturating Add.
// This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddU8 VqaddU8
//go:noescape
func VqaddU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddU16 VqaddU16
//go:noescape
func VqaddU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddU32 VqaddU32
//go:noescape
func VqaddU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddU64 VqaddU64
//go:noescape
func VqaddU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddbS8 VqaddbS8
//go:noescape
func VqaddbS8(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddbU8 VqaddbU8
//go:noescape
func VqaddbU8(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8)

// Signed saturating Add.
// This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqadddS64 VqadddS64
//go:noescape
func VqadddS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqadddU64 VqadddU64
//go:noescape
func VqadddU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddhS16 VqaddhS16
//go:noescape
func VqaddhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddhU16 VqaddhU16
//go:noescape
func VqaddhU16(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqS8 VqaddqS8
//go:noescape
func VqaddqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqS16 VqaddqS16
//go:noescape
func VqaddqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed saturating Add.
// This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqS32 VqaddqS32
//go:noescape
func VqaddqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqS64 VqaddqS64
//go:noescape
func VqaddqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqU8 VqaddqU8
//go:noescape
func VqaddqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqU16 VqaddqU16
//go:noescape
func VqaddqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqU32 VqaddqU32
//go:noescape
func VqaddqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqU64 VqaddqU64
//go:noescape
func VqaddqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddsS32 VqaddsS32
//go:noescape
func VqaddsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddsU32 VqaddsU32
//go:noescape
func VqaddsU32(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32)

// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlalS16 VqdmlalS16
//go:noescape
func VqdmlalS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16X4)

// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlalS32 VqdmlalS32
//go:noescape
func VqdmlalS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32X2)

// Signed saturating Doubling Multiply-Add Long.
// This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlalHighS16 VqdmlalHighS16
//go:noescape
func VqdmlalHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlalHighS32 VqdmlalHighS32
//go:noescape
func VqdmlalHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlalHighNS16 VqdmlalHighNS16
//go:noescape
func VqdmlalHighNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16)

// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlalHighNS32 VqdmlalHighNS32
//go:noescape
func VqdmlalHighNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32)

// Vector widening saturating doubling multiply accumulate with scalar
//
//go:linkname VqdmlalNS16 VqdmlalNS16
//go:noescape
func VqdmlalNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16)

// Vector widening saturating doubling multiply accumulate with scalar
//
//go:linkname VqdmlalNS32 VqdmlalNS32
//go:noescape
func VqdmlalNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32)

// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlalhS16 VqdmlalhS16
//go:noescape
func VqdmlalhS16(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int16, v2 *arm.Int16)

// Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlalsS32 VqdmlalsS32
//go:noescape
func VqdmlalsS32(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int32, v2 *arm.Int32)

// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register.
// The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlslS16 VqdmlslS16
//go:noescape
func VqdmlslS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16X4)

// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlslS32 VqdmlslS32
//go:noescape
func VqdmlslS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32X2)

// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlslHighS16 VqdmlslHighS16
//go:noescape
func VqdmlslHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlslHighS32 VqdmlslHighS32
//go:noescape
func VqdmlslHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Signed saturating Doubling Multiply-Subtract Long.
// This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlslHighNS16 VqdmlslHighNS16
//go:noescape
func VqdmlslHighNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8, v2 *arm.Int16)

// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlslHighNS32 VqdmlslHighNS32
//go:noescape
func VqdmlslHighNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4, v2 *arm.Int32)

// Vector widening saturating doubling multiply subtract with scalar
//
//go:linkname VqdmlslNS16 VqdmlslNS16
//go:noescape
func VqdmlslNS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4, v2 *arm.Int16)

// Vector widening saturating doubling multiply subtract with scalar
//
//go:linkname VqdmlslNS32 VqdmlslNS32
//go:noescape
func VqdmlslNS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2, v2 *arm.Int32)

// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlslhS16 VqdmlslhS16
//go:noescape
func VqdmlslhS16(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int16, v2 *arm.Int16)

// Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
//
//go:linkname VqdmlslsS32 VqdmlslsS32
//go:noescape
func VqdmlslsS32(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int32, v2 *arm.Int32)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhS16 VqdmulhS16
//go:noescape
func VqdmulhS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhS32 VqdmulhS32
//go:noescape
func VqdmulhS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Vector saturating doubling multiply high with scalar
//
//go:linkname VqdmulhNS16 VqdmulhNS16
//go:noescape
func VqdmulhNS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16)

// Vector saturating doubling multiply high with scalar
//
//go:linkname VqdmulhNS32 VqdmulhNS32
//go:noescape
func VqdmulhNS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32)

// Signed saturating Doubling Multiply returning High half.
// This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhhS16 VqdmulhhS16
//go:noescape
func VqdmulhhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhqS16 VqdmulhqS16
//go:noescape
func VqdmulhqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhqS32 VqdmulhqS32
//go:noescape
func VqdmulhqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Vector saturating doubling multiply high with scalar
//
//go:linkname VqdmulhqNS16 VqdmulhqNS16
//go:noescape
func VqdmulhqNS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16)

// Vector saturating doubling multiply high with scalar
//
//go:linkname VqdmulhqNS32 VqdmulhqNS32
//go:noescape
func VqdmulhqNS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhsS32 VqdmulhsS32
//go:noescape
func VqdmulhsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32)

// Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmullS16 VqdmullS16
//go:noescape
func VqdmullS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmullS32 VqdmullS32
//go:noescape
func VqdmullS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmullHighS16 VqdmullHighS16
//go:noescape
func VqdmullHighS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmullHighS32 VqdmullHighS32
//go:noescape
func VqdmullHighS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Signed saturating Doubling Multiply Long.
// This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmullHighNS16 VqdmullHighNS16
//go:noescape
func VqdmullHighNS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16)

// Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmullHighNS32 VqdmullHighNS32
//go:noescape
func VqdmullHighNS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32)

// Vector saturating doubling long multiply with scalar
//
//go:linkname VqdmullNS16 VqdmullNS16
//go:noescape
func VqdmullNS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16)

// Vector saturating doubling long multiply with scalar
//
//go:linkname VqdmullNS32 VqdmullNS32
//go:noescape
func VqdmullNS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32)

// Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmullhS16 VqdmullhS16
//go:noescape
func VqdmullhS16(r *arm.Int32, v0 *arm.Int16, v1 *arm.Int16)

// Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmullsS32 VqdmullsS32
//go:noescape
func VqdmullsS32(r *arm.Int64, v0 *arm.Int32, v1 *arm.Int32)

// Signed saturating extract Narrow.
// This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VqmovnS16 VqmovnS16
//go:noescape
func VqmovnS16(r *arm.Int8X8, v0 *arm.Int16X8)

// Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VqmovnS32 VqmovnS32
//go:noescape
func VqmovnS32(r *arm.Int16X4, v0 *arm.Int32X4)

// Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values.
//
//go:linkname VqmovnS64 VqmovnS64
//go:noescape
func VqmovnS64(r *arm.Int32X2, v0 *arm.Int64X2)

// Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VqmovnU16 VqmovnU16
//go:noescape
func VqmovnU16(r *arm.Uint8X8, v0 *arm.Uint16X8)

// Unsigned saturating extract Narrow.
This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VqmovnU32 VqmovnU32 //go:noescape func VqmovnU32(r *arm.Uint16X4, v0 *arm.Uint32X4) // Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VqmovnU64 VqmovnU64 //go:noescape func VqmovnU64(r *arm.Uint32X2, v0 *arm.Uint64X2) // Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values. // //go:linkname VqmovnHighS16 VqmovnHighS16 //go:noescape func VqmovnHighS16(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int16X8) // Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values. // //go:linkname VqmovnHighS32 VqmovnHighS32 //go:noescape func VqmovnHighS32(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int32X4) // Signed saturating extract Narrow. 
This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values. // //go:linkname VqmovnHighS64 VqmovnHighS64 //go:noescape func VqmovnHighS64(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int64X2) // Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VqmovnHighU16 VqmovnHighU16 //go:noescape func VqmovnHighU16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint16X8) // Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VqmovnHighU32 VqmovnHighU32 //go:noescape func VqmovnHighU32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint32X4) // Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VqmovnHighU64 VqmovnHighU64 //go:noescape func VqmovnHighU64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint64X2) // Signed saturating extract Narrow. 
This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values. // //go:linkname VqmovndS64 VqmovndS64 //go:noescape func VqmovndS64(r *arm.Int32, v0 *arm.Int64) // Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VqmovndU64 VqmovndU64 //go:noescape func VqmovndU64(r *arm.Uint32, v0 *arm.Uint64) // Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values. // //go:linkname VqmovnhS16 VqmovnhS16 //go:noescape func VqmovnhS16(r *arm.Int8, v0 *arm.Int16) // Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VqmovnhU16 VqmovnhU16 //go:noescape func VqmovnhU16(r *arm.Uint8, v0 *arm.Uint16) // Signed saturating extract Narrow. 
This instruction reads each vector element from the source SIMD&FP register, saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. All the values in this instruction are signed integer values. // //go:linkname VqmovnsS32 VqmovnsS32 //go:noescape func VqmovnsS32(r *arm.Int16, v0 *arm.Int32) // Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register, saturates each value to half the original width, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VqmovnsU32 VqmovnsU32 //go:noescape func VqmovnsU32(r *arm.Uint16, v0 *arm.Uint32) // Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. // //go:linkname VqmovunS16 VqmovunS16 //go:noescape func VqmovunS16(r *arm.Uint8X8, v0 *arm.Int16X8) // Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. // //go:linkname VqmovunS32 VqmovunS32 //go:noescape func VqmovunS32(r *arm.Uint16X4, v0 *arm.Int32X4) // Signed saturating extract Unsigned Narrow. 
This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. // //go:linkname VqmovunS64 VqmovunS64 //go:noescape func VqmovunS64(r *arm.Uint32X2, v0 *arm.Int64X2) // Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. // //go:linkname VqmovunHighS16 VqmovunHighS16 //go:noescape func VqmovunHighS16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Int16X8) // Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. // //go:linkname VqmovunHighS32 VqmovunHighS32 //go:noescape func VqmovunHighS32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Int32X4) // Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. 
The destination vector elements are half as long as the source vector elements. // //go:linkname VqmovunHighS64 VqmovunHighS64 //go:noescape func VqmovunHighS64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Int64X2) // Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. // //go:linkname VqmovundS64 VqmovundS64 //go:noescape func VqmovundS64(r *arm.Uint32, v0 *arm.Int64) // Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. // //go:linkname VqmovunhS16 VqmovunhS16 //go:noescape func VqmovunhS16(r *arm.Uint8, v0 *arm.Int16) // Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector elements. // //go:linkname VqmovunsS32 VqmovunsS32 //go:noescape func VqmovunsS32(r *arm.Uint16, v0 *arm.Int32) // Signed saturating Negate. 
This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VqnegS8 VqnegS8 //go:noescape func VqnegS8(r *arm.Int8X8, v0 *arm.Int8X8) // Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VqnegS16 VqnegS16 //go:noescape func VqnegS16(r *arm.Int16X4, v0 *arm.Int16X4) // Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VqnegS32 VqnegS32 //go:noescape func VqnegS32(r *arm.Int32X2, v0 *arm.Int32X2) // Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VqnegS64 VqnegS64 //go:noescape func VqnegS64(r *arm.Int64X1, v0 *arm.Int64X1) // Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VqnegbS8 VqnegbS8 //go:noescape func VqnegbS8(r *arm.Int8, v0 *arm.Int8) // Signed saturating Negate. 
This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VqnegdS64 VqnegdS64 //go:noescape func VqnegdS64(r *arm.Int64, v0 *arm.Int64) // Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VqneghS16 VqneghS16 //go:noescape func VqneghS16(r *arm.Int16, v0 *arm.Int16) // Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VqnegqS8 VqnegqS8 //go:noescape func VqnegqS8(r *arm.Int8X16, v0 *arm.Int8X16) // Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VqnegqS16 VqnegqS16 //go:noescape func VqnegqS16(r *arm.Int16X8, v0 *arm.Int16X8) // Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VqnegqS32 VqnegqS32 //go:noescape func VqnegqS32(r *arm.Int32X4, v0 *arm.Int32X4) // Signed saturating Negate. 
This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VqnegqS64 VqnegqS64 //go:noescape func VqnegqS64(r *arm.Int64X2, v0 *arm.Int64X2) // Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VqnegsS32 VqnegsS32 //go:noescape func VqnegsS32(r *arm.Int32, v0 *arm.Int32) // Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD&FP register. The results are rounded. // //go:linkname VqrdmlahS16 VqrdmlahS16 //go:noescape func VqrdmlahS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16X4) // Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD&FP register. The results are rounded. 
// //go:linkname VqrdmlahS32 VqrdmlahS32 //go:noescape func VqrdmlahS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32X2) // Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD&FP register. The results are rounded. // //go:linkname VqrdmlahhS16 VqrdmlahhS16 //go:noescape func VqrdmlahhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, v2 *arm.Int16) // Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD&FP register. The results are rounded. // //go:linkname VqrdmlahqS16 VqrdmlahqS16 //go:noescape func VqrdmlahqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8) // Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD&FP register. The results are rounded.
// //go:linkname VqrdmlahqS32 VqrdmlahqS32 //go:noescape func VqrdmlahqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32X4) // Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant half of the final results with the vector elements of the destination SIMD&FP register. The results are rounded. // //go:linkname VqrdmlahsS32 VqrdmlahsS32 //go:noescape func VqrdmlahsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, v2 *arm.Int32) // Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD&FP register. The results are rounded. // //go:linkname VqrdmlshS16 VqrdmlshS16 //go:noescape func VqrdmlshS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4, v2 *arm.Int16X4) // Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD&FP register. The results are rounded.
// //go:linkname VqrdmlshS32 VqrdmlshS32 //go:noescape func VqrdmlshS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2, v2 *arm.Int32X2) // Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD&FP register. The results are rounded. // //go:linkname VqrdmlshhS16 VqrdmlshhS16 //go:noescape func VqrdmlshhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, v2 *arm.Int16) // Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD&FP register. The results are rounded. // //go:linkname VqrdmlshqS16 VqrdmlshqS16 //go:noescape func VqrdmlshqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8, v2 *arm.Int16X8) // Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD&FP register. The results are rounded. 
// //go:linkname VqrdmlshqS32 VqrdmlshqS32 //go:noescape func VqrdmlshqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4, v2 *arm.Int32X4) // Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half of the final results from the vector elements of the destination SIMD&FP register. The results are rounded. // //go:linkname VqrdmlshsS32 VqrdmlshsS32 //go:noescape func VqrdmlshsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, v2 *arm.Int32) // Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrdmulhS16 VqrdmulhS16 //go:noescape func VqrdmulhS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4) // Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VqrdmulhS32 VqrdmulhS32 //go:noescape func VqrdmulhS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Vector saturating rounding doubling multiply high with scalar // //go:linkname VqrdmulhNS16 VqrdmulhNS16 //go:noescape func VqrdmulhNS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16) // Vector saturating rounding doubling multiply high with scalar // //go:linkname VqrdmulhNS32 VqrdmulhNS32 //go:noescape func VqrdmulhNS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32) // Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrdmulhhS16 VqrdmulhhS16 //go:noescape func VqrdmulhhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16) // Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrdmulhqS16 VqrdmulhqS16 //go:noescape func VqrdmulhqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VqrdmulhqS32 VqrdmulhqS32 //go:noescape func VqrdmulhqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Vector saturating rounding doubling multiply high with scalar // //go:linkname VqrdmulhqNS16 VqrdmulhqNS16 //go:noescape func VqrdmulhqNS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16) // Vector saturating rounding doubling multiply high with scalar // //go:linkname VqrdmulhqNS32 VqrdmulhqNS32 //go:noescape func VqrdmulhqNS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32) // Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrdmulhsS32 VqrdmulhsS32 //go:noescape func VqrdmulhsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32) // Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshlS8 VqrshlS8 //go:noescape func VqrshlS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8) // Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshlS16 VqrshlS16 //go:noescape func VqrshlS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4) // Signed saturating Rounding Shift Left (register). 
This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshlS32 VqrshlS32 //go:noescape func VqrshlS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshlS64 VqrshlS64 //go:noescape func VqrshlS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1) // Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshlU8 VqrshlU8 //go:noescape func VqrshlU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Int8X8) // Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshlU16 VqrshlU16 //go:noescape func VqrshlU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Int16X4) // Unsigned saturating Rounding Shift Left (register). 
This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshlU32 VqrshlU32 //go:noescape func VqrshlU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Int32X2) // Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshlU64 VqrshlU64 //go:noescape func VqrshlU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Int64X1) // Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshlbS8 VqrshlbS8 //go:noescape func VqrshlbS8(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8) // Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshlbU8 VqrshlbU8 //go:noescape func VqrshlbU8(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Int8) // Signed saturating Rounding Shift Left (register). 
This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshldS64 VqrshldS64 //go:noescape func VqrshldS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64) // Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshldU64 VqrshldU64 //go:noescape func VqrshldU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Int64) // Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshlhS16 VqrshlhS16 //go:noescape func VqrshlhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16) // Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshlhU16 VqrshlhU16 //go:noescape func VqrshlhU16(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Int16) // Signed saturating Rounding Shift Left (register). 
// This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlqS8 VqrshlqS8
//go:noescape
func VqrshlqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlqS16 VqrshlqS16
//go:noescape
func VqrshlqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlqS32 VqrshlqS32
//go:noescape
func VqrshlqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlqS64 VqrshlqS64
//go:noescape
func VqrshlqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Unsigned saturating Rounding Shift Left (register).
// This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlqU8 VqrshlqU8
//go:noescape
func VqrshlqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Int8X16)

// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlqU16 VqrshlqU16
//go:noescape
func VqrshlqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Int16X8)

// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlqU32 VqrshlqU32
//go:noescape
func VqrshlqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Int32X4)

// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlqU64 VqrshlqU64
//go:noescape
func VqrshlqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Int64X2)

// Signed saturating Rounding Shift Left (register).
// This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlsS32 VqrshlsS32
//go:noescape
func VqrshlsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32)

// Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlsU32 VqrshlsU32
//go:noescape
func VqrshlsU32(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Int32)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlS8 VqshlS8
//go:noescape
func VqshlS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlS16 VqshlS16
//go:noescape
func VqshlS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed saturating Shift Left (register).
// This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlS32 VqshlS32
//go:noescape
func VqshlS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlS64 VqshlS64
//go:noescape
func VqshlS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlU8 VqshlU8
//go:noescape
func VqshlU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Int8X8)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlU16 VqshlU16
//go:noescape
func VqshlU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Int16X4)

// Unsigned saturating Shift Left (register).
// This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlU32 VqshlU32
//go:noescape
func VqshlU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Int32X2)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlU64 VqshlU64
//go:noescape
func VqshlU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Int64X1)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlbS8 VqshlbS8
//go:noescape
func VqshlbS8(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlbU8 VqshlbU8
//go:noescape
func VqshlbU8(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Int8)

// Signed saturating Shift Left (register).
// This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshldS64 VqshldS64
//go:noescape
func VqshldS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshldU64 VqshldU64
//go:noescape
func VqshldU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Int64)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlhS16 VqshlhS16
//go:noescape
func VqshlhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlhU16 VqshlhU16
//go:noescape
func VqshlhU16(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Int16)

// Signed saturating Shift Left (register).
// This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlqS8 VqshlqS8
//go:noescape
func VqshlqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlqS16 VqshlqS16
//go:noescape
func VqshlqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlqS32 VqshlqS32
//go:noescape
func VqshlqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlqS64 VqshlqS64
//go:noescape
func VqshlqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Unsigned saturating Shift Left (register).
// This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlqU8 VqshlqU8
//go:noescape
func VqshlqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Int8X16)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlqU16 VqshlqU16
//go:noescape
func VqshlqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Int16X8)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlqU32 VqshlqU32
//go:noescape
func VqshlqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Int32X4)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlqU64 VqshlqU64
//go:noescape
func VqshlqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Int64X2)

// Signed saturating Shift Left (register).
// This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlsS32 VqshlsS32
//go:noescape
func VqshlsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32)

// Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqshlsU32 VqshlsU32
//go:noescape
func VqshlsU32(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Int32)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubS8 VqsubS8
//go:noescape
func VqsubS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubS16 VqsubS16
//go:noescape
func VqsubS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubS32 VqsubS32
//go:noescape
func VqsubS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubS64 VqsubS64
//go:noescape
func VqsubS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubU8 VqsubU8
//go:noescape
func VqsubU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubU16 VqsubU16
//go:noescape
func VqsubU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubU32 VqsubU32
//go:noescape
func VqsubU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubU64 VqsubU64
//go:noescape
func VqsubU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubbS8 VqsubbS8
//go:noescape
func VqsubbS8(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubbU8 VqsubbU8
//go:noescape
func VqsubbU8(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubdS64 VqsubdS64
//go:noescape
func VqsubdS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubdU64 VqsubdU64
//go:noescape
func VqsubdU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubhS16 VqsubhS16
//go:noescape
func VqsubhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubhU16 VqsubhU16
//go:noescape
func VqsubhU16(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqS8 VqsubqS8
//go:noescape
func VqsubqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqS16 VqsubqS16
//go:noescape
func VqsubqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqS32 VqsubqS32
//go:noescape
func VqsubqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqS64 VqsubqS64
//go:noescape
func VqsubqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqU8 VqsubqU8
//go:noescape
func VqsubqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqU16 VqsubqU16
//go:noescape
func VqsubqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqU32 VqsubqU32
//go:noescape
func VqsubqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubqU64 VqsubqU64
//go:noescape
func VqsubqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Signed saturating Subtract.
// This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubsS32 VqsubsS32
//go:noescape
func VqsubsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32)

// Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqsubsU32 VqsubsU32
//go:noescape
func VqsubsU32(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl1S8 Vqtbl1S8
//go:noescape
func Vqtbl1S8(r *arm.Int8X8, v0 *arm.Int8X16, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl1U8 Vqtbl1U8
//go:noescape
func Vqtbl1U8(r *arm.Uint8X8, v0 *arm.Uint8X16, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl1P8 Vqtbl1P8
//go:noescape
func Vqtbl1P8(r *arm.Poly8X8, v0 *arm.Poly8X16, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl1QS8 Vqtbl1QS8
//go:noescape
func Vqtbl1QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0.
// If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl1QU8 Vqtbl1QU8
//go:noescape
func Vqtbl1QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl1QP8 Vqtbl1QP8
//go:noescape
func Vqtbl1QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl2S8 Vqtbl2S8
//go:noescape
func Vqtbl2S8(r *arm.Int8X8, v0 *arm.Int8X16X2, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register.
// If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl2U8 Vqtbl2U8
//go:noescape
func Vqtbl2U8(r *arm.Uint8X8, v0 *arm.Uint8X16X2, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl2P8 Vqtbl2P8
//go:noescape
func Vqtbl2P8(r *arm.Poly8X8, v0 *arm.Poly8X16X2, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl2QS8 Vqtbl2QS8
//go:noescape
func Vqtbl2QS8(r *arm.Int8X16, v0 *arm.Int8X16X2, v1 *arm.Uint8X16)

// Table vector Lookup.
// This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl2QU8 Vqtbl2QU8
//go:noescape
func Vqtbl2QU8(r *arm.Uint8X16, v0 *arm.Uint8X16X2, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl2QP8 Vqtbl2QP8
//go:noescape
func Vqtbl2QP8(r *arm.Poly8X16, v0 *arm.Poly8X16X2, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl3S8 Vqtbl3S8
//go:noescape
func Vqtbl3S8(r *arm.Int8X8, v0 *arm.Int8X16X3, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl3U8 Vqtbl3U8
//go:noescape
func Vqtbl3U8(r *arm.Uint8X8, v0 *arm.Uint8X16X3, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl3P8 Vqtbl3P8
//go:noescape
func Vqtbl3P8(r *arm.Poly8X8, v0 *arm.Poly8X16X3, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0.
// If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl3QS8 Vqtbl3QS8
//go:noescape
func Vqtbl3QS8(r *arm.Int8X16, v0 *arm.Int8X16X3, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl3QU8 Vqtbl3QU8
//go:noescape
func Vqtbl3QU8(r *arm.Uint8X16, v0 *arm.Uint8X16X3, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl3QP8 Vqtbl3QP8
//go:noescape
func Vqtbl3QP8(r *arm.Poly8X16, v0 *arm.Poly8X16X3, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register.
// If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl4S8 Vqtbl4S8
//go:noescape
func Vqtbl4S8(r *arm.Int8X8, v0 *arm.Int8X16X4, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl4U8 Vqtbl4U8
//go:noescape
func Vqtbl4U8(r *arm.Uint8X8, v0 *arm.Uint8X16X4, v1 *arm.Uint8X8)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl4P8 Vqtbl4P8
//go:noescape
func Vqtbl4P8(r *arm.Poly8X8, v0 *arm.Poly8X16X4, v1 *arm.Uint8X8)

// Table vector Lookup.
// This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl4QS8 Vqtbl4QS8
//go:noescape
func Vqtbl4QS8(r *arm.Int8X16, v0 *arm.Int8X16X4, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl4QU8 Vqtbl4QU8
//go:noescape
func Vqtbl4QU8(r *arm.Uint8X16, v0 *arm.Uint8X16X4, v1 *arm.Uint8X16)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbl4QP8 Vqtbl4QP8
//go:noescape
func Vqtbl4QP8(r *arm.Poly8X16, v0 *arm.Poly8X16X4, v1 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx1S8 Vqtbx1S8
//go:noescape
func Vqtbx1S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X16, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx1U8 Vqtbx1U8
//go:noescape
func Vqtbx1U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X16, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register.
// If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx1P8 Vqtbx1P8
//go:noescape
func Vqtbx1P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X16, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx1QS8 Vqtbx1QS8
//go:noescape
func Vqtbx1QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16, v2 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx1QU8 Vqtbx1QU8
//go:noescape
func Vqtbx1QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16, v2 *arm.Uint8X16)

// Table vector lookup extension.
// This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx1QP8 Vqtbx1QP8
//go:noescape
func Vqtbx1QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16, v2 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx2S8 Vqtbx2S8
//go:noescape
func Vqtbx2S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X16X2, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged.
// If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx2U8 Vqtbx2U8
//go:noescape
func Vqtbx2U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X16X2, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx2P8 Vqtbx2P8
//go:noescape
func Vqtbx2P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X16X2, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx2QS8 Vqtbx2QS8
//go:noescape
func Vqtbx2QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16X2, v2 *arm.Uint8X16)

// Table vector lookup extension.
// This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx2QU8 Vqtbx2QU8
//go:noescape
func Vqtbx2QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16X2, v2 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx2QP8 Vqtbx2QP8
//go:noescape
func Vqtbx2QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16X2, v2 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged.
// If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx3S8 Vqtbx3S8
//go:noescape
func Vqtbx3S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X16X3, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx3U8 Vqtbx3U8
//go:noescape
func Vqtbx3U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X16X3, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx3P8 Vqtbx3P8
//go:noescape
func Vqtbx3P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X16X3, v2 *arm.Uint8X8)

// Table vector lookup extension.
// This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx3QS8 Vqtbx3QS8
//go:noescape
func Vqtbx3QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16X3, v2 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx3QU8 Vqtbx3QU8
//go:noescape
func Vqtbx3QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16X3, v2 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged.
// If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx3QP8 Vqtbx3QP8
//go:noescape
func Vqtbx3QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16X3, v2 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx4S8 Vqtbx4S8
//go:noescape
func Vqtbx4S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X16X4, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx4U8 Vqtbx4U8
//go:noescape
func Vqtbx4U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X16X4, v2 *arm.Uint8X8)

// Table vector lookup extension.
// This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx4P8 Vqtbx4P8
//go:noescape
func Vqtbx4P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X16X4, v2 *arm.Uint8X8)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx4QS8 Vqtbx4QS8
//go:noescape
func Vqtbx4QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16X4, v2 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged.
// If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx4QU8 Vqtbx4QU8
//go:noescape
func Vqtbx4QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16X4, v2 *arm.Uint8X16)

// Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vqtbx4QP8 Vqtbx4QP8
//go:noescape
func Vqtbx4QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16X4, v2 *arm.Uint8X16)

// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnS16 VraddhnS16
//go:noescape
func VraddhnS16(r *arm.Int8X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnS32 VraddhnS32
//go:noescape
func VraddhnS32(r *arm.Int16X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Rounding Add returning High Narrow.
// This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnS64 VraddhnS64
//go:noescape
func VraddhnS64(r *arm.Int32X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnU16 VraddhnU16
//go:noescape
func VraddhnU16(r *arm.Uint8X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnU32 VraddhnU32
//go:noescape
func VraddhnU32(r *arm.Uint16X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnU64 VraddhnU64
//go:noescape
func VraddhnU64(r *arm.Uint32X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Rounding Add returning High Narrow.
// This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnHighS16 VraddhnHighS16
//go:noescape
func VraddhnHighS16(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int16X8, v2 *arm.Int16X8)

// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnHighS32 VraddhnHighS32
//go:noescape
func VraddhnHighS32(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int32X4, v2 *arm.Int32X4)

// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnHighS64 VraddhnHighS64
//go:noescape
func VraddhnHighS64(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int64X2, v2 *arm.Int64X2)

// Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VraddhnHighU16 VraddhnHighU16
//go:noescape
func VraddhnHighU16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Rounding Add returning High Narrow.
This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. // //go:linkname VraddhnHighU32 VraddhnHighU32 //go:noescape func VraddhnHighU32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4) // Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. // //go:linkname VraddhnHighU64 VraddhnHighU64 //go:noescape func VraddhnHighU64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2) // Rotate and Exclusive OR rotates each 64-bit element of the 128-bit vector in a source SIMD&FP register left by 1, performs a bitwise exclusive OR of the resulting 128-bit vector and the vector in another source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname Vrax1QU64 Vrax1QU64 //go:noescape func Vrax1QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2) // Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrbitS8 VrbitS8 //go:noescape func VrbitS8(r *arm.Int8X8, v0 *arm.Int8X8) // Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VrbitU8 VrbitU8 //go:noescape func VrbitU8(r *arm.Uint8X8, v0 *arm.Uint8X8) // Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrbitP8 VrbitP8 //go:noescape func VrbitP8(r *arm.Poly8X8, v0 *arm.Poly8X8) // Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrbitqS8 VrbitqS8 //go:noescape func VrbitqS8(r *arm.Int8X16, v0 *arm.Int8X16) // Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrbitqU8 VrbitqU8 //go:noescape func VrbitqU8(r *arm.Uint8X16, v0 *arm.Uint8X16) // Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrbitqP8 VrbitqP8 //go:noescape func VrbitqP8(r *arm.Poly8X16, v0 *arm.Poly8X16) // Unsigned Reciprocal Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse for the unsigned integer value, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpeU32 VrecpeU32 //go:noescape func VrecpeU32(r *arm.Uint32X2, v0 *arm.Uint32X2) // Floating-point Reciprocal Estimate. 
This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpeF32 VrecpeF32 //go:noescape func VrecpeF32(r *arm.Float32X2, v0 *arm.Float32X2) // Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpeF64 VrecpeF64 //go:noescape func VrecpeF64(r *arm.Float64X1, v0 *arm.Float64X1) // Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpedF64 VrecpedF64 //go:noescape func VrecpedF64(r *arm.Float64, v0 *arm.Float64) // Unsigned Reciprocal Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse for the unsigned integer value, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpeqU32 VrecpeqU32 //go:noescape func VrecpeqU32(r *arm.Uint32X4, v0 *arm.Uint32X4) // Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpeqF32 VrecpeqF32 //go:noescape func VrecpeqF32(r *arm.Float32X4, v0 *arm.Float32X4) // Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VrecpeqF64 VrecpeqF64 //go:noescape func VrecpeqF64(r *arm.Float64X2, v0 *arm.Float64X2) // Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpesF32 VrecpesF32 //go:noescape func VrecpesF32(r *arm.Float32, v0 *arm.Float32) // Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpsF32 VrecpsF32 //go:noescape func VrecpsF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2) // Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpsF64 VrecpsF64 //go:noescape func VrecpsF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1) // Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpsdF64 VrecpsdF64 //go:noescape func VrecpsdF64(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64) // Floating-point Reciprocal Step. 
This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpsqF32 VrecpsqF32 //go:noescape func VrecpsqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4) // Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpsqF64 VrecpsqF64 //go:noescape func VrecpsqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2) // Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpssF32 VrecpssF32 //go:noescape func VrecpssF32(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32) // Floating-point Reciprocal exponent (scalar). This instruction finds an approximate reciprocal exponent for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpxdF64 VrecpxdF64 //go:noescape func VrecpxdF64(r *arm.Float64, v0 *arm.Float64) // Floating-point Reciprocal exponent (scalar). This instruction finds an approximate reciprocal exponent for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VrecpxsF32 VrecpxsF32 //go:noescape func VrecpxsF32(r *arm.Float32, v0 *arm.Float32) // Vector reinterpret cast operation // //go:linkname VreinterpretF32S8 VreinterpretF32S8 //go:noescape func VreinterpretF32S8(r *arm.Float32X2, v0 *arm.Int8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretF32S16 VreinterpretF32S16 //go:noescape func VreinterpretF32S16(r *arm.Float32X2, v0 *arm.Int16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretF32S32 VreinterpretF32S32 //go:noescape func VreinterpretF32S32(r *arm.Float32X2, v0 *arm.Int32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretF32S64 VreinterpretF32S64 //go:noescape func VreinterpretF32S64(r *arm.Float32X2, v0 *arm.Int64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretF32U8 VreinterpretF32U8 //go:noescape func VreinterpretF32U8(r *arm.Float32X2, v0 *arm.Uint8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretF32U16 VreinterpretF32U16 //go:noescape func VreinterpretF32U16(r *arm.Float32X2, v0 *arm.Uint16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretF32U32 VreinterpretF32U32 //go:noescape func VreinterpretF32U32(r *arm.Float32X2, v0 *arm.Uint32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretF32U64 VreinterpretF32U64 //go:noescape func VreinterpretF32U64(r *arm.Float32X2, v0 *arm.Uint64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretF32F64 VreinterpretF32F64 //go:noescape func VreinterpretF32F64(r *arm.Float32X2, v0 *arm.Float64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretF32P16 VreinterpretF32P16 //go:noescape func VreinterpretF32P16(r *arm.Float32X2, v0 *arm.Poly16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretF32P64 VreinterpretF32P64 //go:noescape func VreinterpretF32P64(r *arm.Float32X2, v0 *arm.Poly64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretF32P8 VreinterpretF32P8
//go:noescape func VreinterpretF32P8(r *arm.Float32X2, v0 *arm.Poly8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretF64S8 VreinterpretF64S8 //go:noescape func VreinterpretF64S8(r *arm.Float64X1, v0 *arm.Int8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretF64S16 VreinterpretF64S16 //go:noescape func VreinterpretF64S16(r *arm.Float64X1, v0 *arm.Int16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretF64S32 VreinterpretF64S32 //go:noescape func VreinterpretF64S32(r *arm.Float64X1, v0 *arm.Int32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretF64S64 VreinterpretF64S64 //go:noescape func VreinterpretF64S64(r *arm.Float64X1, v0 *arm.Int64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretF64U8 VreinterpretF64U8 //go:noescape func VreinterpretF64U8(r *arm.Float64X1, v0 *arm.Uint8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretF64U16 VreinterpretF64U16 //go:noescape func VreinterpretF64U16(r *arm.Float64X1, v0 *arm.Uint16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretF64U32 VreinterpretF64U32 //go:noescape func VreinterpretF64U32(r *arm.Float64X1, v0 *arm.Uint32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretF64U64 VreinterpretF64U64 //go:noescape func VreinterpretF64U64(r *arm.Float64X1, v0 *arm.Uint64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretF64F32 VreinterpretF64F32 //go:noescape func VreinterpretF64F32(r *arm.Float64X1, v0 *arm.Float32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretF64P16 VreinterpretF64P16 //go:noescape func VreinterpretF64P16(r *arm.Float64X1, v0 *arm.Poly16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretF64P64 VreinterpretF64P64 //go:noescape func VreinterpretF64P64(r *arm.Float64X1, v0 *arm.Poly64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretF64P8 VreinterpretF64P8 //go:noescape func 
VreinterpretF64P8(r *arm.Float64X1, v0 *arm.Poly8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretP16S8 VreinterpretP16S8 //go:noescape func VreinterpretP16S8(r *arm.Poly16X4, v0 *arm.Int8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretP16S16 VreinterpretP16S16 //go:noescape func VreinterpretP16S16(r *arm.Poly16X4, v0 *arm.Int16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretP16S32 VreinterpretP16S32 //go:noescape func VreinterpretP16S32(r *arm.Poly16X4, v0 *arm.Int32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretP16S64 VreinterpretP16S64 //go:noescape func VreinterpretP16S64(r *arm.Poly16X4, v0 *arm.Int64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretP16U8 VreinterpretP16U8 //go:noescape func VreinterpretP16U8(r *arm.Poly16X4, v0 *arm.Uint8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretP16U16 VreinterpretP16U16 //go:noescape func VreinterpretP16U16(r *arm.Poly16X4, v0 *arm.Uint16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretP16U32 VreinterpretP16U32 //go:noescape func VreinterpretP16U32(r *arm.Poly16X4, v0 *arm.Uint32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretP16U64 VreinterpretP16U64 //go:noescape func VreinterpretP16U64(r *arm.Poly16X4, v0 *arm.Uint64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretP16F32 VreinterpretP16F32 //go:noescape func VreinterpretP16F32(r *arm.Poly16X4, v0 *arm.Float32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretP16F64 VreinterpretP16F64 //go:noescape func VreinterpretP16F64(r *arm.Poly16X4, v0 *arm.Float64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretP16P64 VreinterpretP16P64 //go:noescape func VreinterpretP16P64(r *arm.Poly16X4, v0 *arm.Poly64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretP16P8 VreinterpretP16P8 //go:noescape func VreinterpretP16P8(r 
*arm.Poly16X4, v0 *arm.Poly8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretP64S8 VreinterpretP64S8 //go:noescape func VreinterpretP64S8(r *arm.Poly64X1, v0 *arm.Int8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretP64S16 VreinterpretP64S16 //go:noescape func VreinterpretP64S16(r *arm.Poly64X1, v0 *arm.Int16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretP64S32 VreinterpretP64S32 //go:noescape func VreinterpretP64S32(r *arm.Poly64X1, v0 *arm.Int32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretP64S64 VreinterpretP64S64 //go:noescape func VreinterpretP64S64(r *arm.Poly64X1, v0 *arm.Int64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretP64U8 VreinterpretP64U8 //go:noescape func VreinterpretP64U8(r *arm.Poly64X1, v0 *arm.Uint8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretP64U16 VreinterpretP64U16 //go:noescape func VreinterpretP64U16(r *arm.Poly64X1, v0 *arm.Uint16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretP64U32 VreinterpretP64U32 //go:noescape func VreinterpretP64U32(r *arm.Poly64X1, v0 *arm.Uint32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretP64U64 VreinterpretP64U64 //go:noescape func VreinterpretP64U64(r *arm.Poly64X1, v0 *arm.Uint64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretP64F32 VreinterpretP64F32 //go:noescape func VreinterpretP64F32(r *arm.Poly64X1, v0 *arm.Float32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretP64F64 VreinterpretP64F64 //go:noescape func VreinterpretP64F64(r *arm.Poly64X1, v0 *arm.Float64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretP64P16 VreinterpretP64P16 //go:noescape func VreinterpretP64P16(r *arm.Poly64X1, v0 *arm.Poly16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretP64P8 VreinterpretP64P8 //go:noescape func VreinterpretP64P8(r *arm.Poly64X1, v0 *arm.Poly8X8) // Vector
reinterpret cast operation // //go:linkname VreinterpretP8S8 VreinterpretP8S8 //go:noescape func VreinterpretP8S8(r *arm.Poly8X8, v0 *arm.Int8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretP8S16 VreinterpretP8S16 //go:noescape func VreinterpretP8S16(r *arm.Poly8X8, v0 *arm.Int16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretP8S32 VreinterpretP8S32 //go:noescape func VreinterpretP8S32(r *arm.Poly8X8, v0 *arm.Int32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretP8S64 VreinterpretP8S64 //go:noescape func VreinterpretP8S64(r *arm.Poly8X8, v0 *arm.Int64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretP8U8 VreinterpretP8U8 //go:noescape func VreinterpretP8U8(r *arm.Poly8X8, v0 *arm.Uint8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretP8U16 VreinterpretP8U16 //go:noescape func VreinterpretP8U16(r *arm.Poly8X8, v0 *arm.Uint16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretP8U32 VreinterpretP8U32 //go:noescape func VreinterpretP8U32(r *arm.Poly8X8, v0 *arm.Uint32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretP8U64 VreinterpretP8U64 //go:noescape func VreinterpretP8U64(r *arm.Poly8X8, v0 *arm.Uint64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretP8F32 VreinterpretP8F32 //go:noescape func VreinterpretP8F32(r *arm.Poly8X8, v0 *arm.Float32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretP8F64 VreinterpretP8F64 //go:noescape func VreinterpretP8F64(r *arm.Poly8X8, v0 *arm.Float64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretP8P16 VreinterpretP8P16 //go:noescape func VreinterpretP8P16(r *arm.Poly8X8, v0 *arm.Poly16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretP8P64 VreinterpretP8P64 //go:noescape func VreinterpretP8P64(r *arm.Poly8X8, v0 *arm.Poly64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretS16S8 
VreinterpretS16S8 //go:noescape func VreinterpretS16S8(r *arm.Int16X4, v0 *arm.Int8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretS16S32 VreinterpretS16S32 //go:noescape func VreinterpretS16S32(r *arm.Int16X4, v0 *arm.Int32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretS16S64 VreinterpretS16S64 //go:noescape func VreinterpretS16S64(r *arm.Int16X4, v0 *arm.Int64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretS16U8 VreinterpretS16U8 //go:noescape func VreinterpretS16U8(r *arm.Int16X4, v0 *arm.Uint8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretS16U16 VreinterpretS16U16 //go:noescape func VreinterpretS16U16(r *arm.Int16X4, v0 *arm.Uint16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretS16U32 VreinterpretS16U32 //go:noescape func VreinterpretS16U32(r *arm.Int16X4, v0 *arm.Uint32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretS16U64 VreinterpretS16U64 //go:noescape func VreinterpretS16U64(r *arm.Int16X4, v0 *arm.Uint64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretS16F32 VreinterpretS16F32 //go:noescape func VreinterpretS16F32(r *arm.Int16X4, v0 *arm.Float32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretS16F64 VreinterpretS16F64 //go:noescape func VreinterpretS16F64(r *arm.Int16X4, v0 *arm.Float64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretS16P16 VreinterpretS16P16 //go:noescape func VreinterpretS16P16(r *arm.Int16X4, v0 *arm.Poly16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretS16P64 VreinterpretS16P64 //go:noescape func VreinterpretS16P64(r *arm.Int16X4, v0 *arm.Poly64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretS16P8 VreinterpretS16P8 //go:noescape func VreinterpretS16P8(r *arm.Int16X4, v0 *arm.Poly8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretS32S8 VreinterpretS32S8 //go:noescape func 
VreinterpretS32S8(r *arm.Int32X2, v0 *arm.Int8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretS32S16 VreinterpretS32S16 //go:noescape func VreinterpretS32S16(r *arm.Int32X2, v0 *arm.Int16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretS32S64 VreinterpretS32S64 //go:noescape func VreinterpretS32S64(r *arm.Int32X2, v0 *arm.Int64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretS32U8 VreinterpretS32U8 //go:noescape func VreinterpretS32U8(r *arm.Int32X2, v0 *arm.Uint8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretS32U16 VreinterpretS32U16 //go:noescape func VreinterpretS32U16(r *arm.Int32X2, v0 *arm.Uint16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretS32U32 VreinterpretS32U32 //go:noescape func VreinterpretS32U32(r *arm.Int32X2, v0 *arm.Uint32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretS32U64 VreinterpretS32U64 //go:noescape func VreinterpretS32U64(r *arm.Int32X2, v0 *arm.Uint64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretS32F32 VreinterpretS32F32 //go:noescape func VreinterpretS32F32(r *arm.Int32X2, v0 *arm.Float32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretS32F64 VreinterpretS32F64 //go:noescape func VreinterpretS32F64(r *arm.Int32X2, v0 *arm.Float64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretS32P16 VreinterpretS32P16 //go:noescape func VreinterpretS32P16(r *arm.Int32X2, v0 *arm.Poly16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretS32P64 VreinterpretS32P64 //go:noescape func VreinterpretS32P64(r *arm.Int32X2, v0 *arm.Poly64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretS32P8 VreinterpretS32P8 //go:noescape func VreinterpretS32P8(r *arm.Int32X2, v0 *arm.Poly8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretS64S8 VreinterpretS64S8 //go:noescape func VreinterpretS64S8(r *arm.Int64X1, v0 
*arm.Int8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretS64S16 VreinterpretS64S16 //go:noescape func VreinterpretS64S16(r *arm.Int64X1, v0 *arm.Int16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretS64S32 VreinterpretS64S32 //go:noescape func VreinterpretS64S32(r *arm.Int64X1, v0 *arm.Int32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretS64U8 VreinterpretS64U8 //go:noescape func VreinterpretS64U8(r *arm.Int64X1, v0 *arm.Uint8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretS64U16 VreinterpretS64U16 //go:noescape func VreinterpretS64U16(r *arm.Int64X1, v0 *arm.Uint16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretS64U32 VreinterpretS64U32 //go:noescape func VreinterpretS64U32(r *arm.Int64X1, v0 *arm.Uint32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretS64U64 VreinterpretS64U64 //go:noescape func VreinterpretS64U64(r *arm.Int64X1, v0 *arm.Uint64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretS64F32 VreinterpretS64F32 //go:noescape func VreinterpretS64F32(r *arm.Int64X1, v0 *arm.Float32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretS64F64 VreinterpretS64F64 //go:noescape func VreinterpretS64F64(r *arm.Int64X1, v0 *arm.Float64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretS64P16 VreinterpretS64P16 //go:noescape func VreinterpretS64P16(r *arm.Int64X1, v0 *arm.Poly16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretS64P64 VreinterpretS64P64 //go:noescape func VreinterpretS64P64(r *arm.Int64X1, v0 *arm.Poly64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretS64P8 VreinterpretS64P8 //go:noescape func VreinterpretS64P8(r *arm.Int64X1, v0 *arm.Poly8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretS8S16 VreinterpretS8S16 //go:noescape func VreinterpretS8S16(r *arm.Int8X8, v0 *arm.Int16X4) // Vector reinterpret cast 
operation // //go:linkname VreinterpretS8S32 VreinterpretS8S32 //go:noescape func VreinterpretS8S32(r *arm.Int8X8, v0 *arm.Int32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretS8S64 VreinterpretS8S64 //go:noescape func VreinterpretS8S64(r *arm.Int8X8, v0 *arm.Int64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretS8U8 VreinterpretS8U8 //go:noescape func VreinterpretS8U8(r *arm.Int8X8, v0 *arm.Uint8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretS8U16 VreinterpretS8U16 //go:noescape func VreinterpretS8U16(r *arm.Int8X8, v0 *arm.Uint16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretS8U32 VreinterpretS8U32 //go:noescape func VreinterpretS8U32(r *arm.Int8X8, v0 *arm.Uint32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretS8U64 VreinterpretS8U64 //go:noescape func VreinterpretS8U64(r *arm.Int8X8, v0 *arm.Uint64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretS8F32 VreinterpretS8F32 //go:noescape func VreinterpretS8F32(r *arm.Int8X8, v0 *arm.Float32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretS8F64 VreinterpretS8F64 //go:noescape func VreinterpretS8F64(r *arm.Int8X8, v0 *arm.Float64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretS8P16 VreinterpretS8P16 //go:noescape func VreinterpretS8P16(r *arm.Int8X8, v0 *arm.Poly16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretS8P64 VreinterpretS8P64 //go:noescape func VreinterpretS8P64(r *arm.Int8X8, v0 *arm.Poly64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretS8P8 VreinterpretS8P8 //go:noescape func VreinterpretS8P8(r *arm.Int8X8, v0 *arm.Poly8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretU16S8 VreinterpretU16S8 //go:noescape func VreinterpretU16S8(r *arm.Uint16X4, v0 *arm.Int8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretU16S16 VreinterpretU16S16 //go:noescape func 
VreinterpretU16S16(r *arm.Uint16X4, v0 *arm.Int16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretU16S32 VreinterpretU16S32 //go:noescape func VreinterpretU16S32(r *arm.Uint16X4, v0 *arm.Int32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretU16S64 VreinterpretU16S64 //go:noescape func VreinterpretU16S64(r *arm.Uint16X4, v0 *arm.Int64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretU16U8 VreinterpretU16U8 //go:noescape func VreinterpretU16U8(r *arm.Uint16X4, v0 *arm.Uint8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretU16U32 VreinterpretU16U32 //go:noescape func VreinterpretU16U32(r *arm.Uint16X4, v0 *arm.Uint32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretU16U64 VreinterpretU16U64 //go:noescape func VreinterpretU16U64(r *arm.Uint16X4, v0 *arm.Uint64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretU16F32 VreinterpretU16F32 //go:noescape func VreinterpretU16F32(r *arm.Uint16X4, v0 *arm.Float32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretU16F64 VreinterpretU16F64 //go:noescape func VreinterpretU16F64(r *arm.Uint16X4, v0 *arm.Float64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretU16P16 VreinterpretU16P16 //go:noescape func VreinterpretU16P16(r *arm.Uint16X4, v0 *arm.Poly16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretU16P64 VreinterpretU16P64 //go:noescape func VreinterpretU16P64(r *arm.Uint16X4, v0 *arm.Poly64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretU16P8 VreinterpretU16P8 //go:noescape func VreinterpretU16P8(r *arm.Uint16X4, v0 *arm.Poly8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretU32S8 VreinterpretU32S8 //go:noescape func VreinterpretU32S8(r *arm.Uint32X2, v0 *arm.Int8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretU32S16 VreinterpretU32S16 //go:noescape func VreinterpretU32S16(r 
*arm.Uint32X2, v0 *arm.Int16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretU32S32 VreinterpretU32S32 //go:noescape func VreinterpretU32S32(r *arm.Uint32X2, v0 *arm.Int32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretU32S64 VreinterpretU32S64 //go:noescape func VreinterpretU32S64(r *arm.Uint32X2, v0 *arm.Int64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretU32U8 VreinterpretU32U8 //go:noescape func VreinterpretU32U8(r *arm.Uint32X2, v0 *arm.Uint8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretU32U16 VreinterpretU32U16 //go:noescape func VreinterpretU32U16(r *arm.Uint32X2, v0 *arm.Uint16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretU32U64 VreinterpretU32U64 //go:noescape func VreinterpretU32U64(r *arm.Uint32X2, v0 *arm.Uint64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretU32F32 VreinterpretU32F32 //go:noescape func VreinterpretU32F32(r *arm.Uint32X2, v0 *arm.Float32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretU32F64 VreinterpretU32F64 //go:noescape func VreinterpretU32F64(r *arm.Uint32X2, v0 *arm.Float64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretU32P16 VreinterpretU32P16 //go:noescape func VreinterpretU32P16(r *arm.Uint32X2, v0 *arm.Poly16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretU32P64 VreinterpretU32P64 //go:noescape func VreinterpretU32P64(r *arm.Uint32X2, v0 *arm.Poly64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretU32P8 VreinterpretU32P8 //go:noescape func VreinterpretU32P8(r *arm.Uint32X2, v0 *arm.Poly8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretU64S8 VreinterpretU64S8 //go:noescape func VreinterpretU64S8(r *arm.Uint64X1, v0 *arm.Int8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretU64S16 VreinterpretU64S16 //go:noescape func VreinterpretU64S16(r *arm.Uint64X1, v0 *arm.Int16X4) 
// Vector reinterpret cast operation // //go:linkname VreinterpretU64S32 VreinterpretU64S32 //go:noescape func VreinterpretU64S32(r *arm.Uint64X1, v0 *arm.Int32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretU64S64 VreinterpretU64S64 //go:noescape func VreinterpretU64S64(r *arm.Uint64X1, v0 *arm.Int64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretU64U8 VreinterpretU64U8 //go:noescape func VreinterpretU64U8(r *arm.Uint64X1, v0 *arm.Uint8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretU64U16 VreinterpretU64U16 //go:noescape func VreinterpretU64U16(r *arm.Uint64X1, v0 *arm.Uint16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretU64U32 VreinterpretU64U32 //go:noescape func VreinterpretU64U32(r *arm.Uint64X1, v0 *arm.Uint32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretU64F32 VreinterpretU64F32 //go:noescape func VreinterpretU64F32(r *arm.Uint64X1, v0 *arm.Float32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretU64F64 VreinterpretU64F64 //go:noescape func VreinterpretU64F64(r *arm.Uint64X1, v0 *arm.Float64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretU64P16 VreinterpretU64P16 //go:noescape func VreinterpretU64P16(r *arm.Uint64X1, v0 *arm.Poly16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretU64P64 VreinterpretU64P64 //go:noescape func VreinterpretU64P64(r *arm.Uint64X1, v0 *arm.Poly64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretU64P8 VreinterpretU64P8 //go:noescape func VreinterpretU64P8(r *arm.Uint64X1, v0 *arm.Poly8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretU8S8 VreinterpretU8S8 //go:noescape func VreinterpretU8S8(r *arm.Uint8X8, v0 *arm.Int8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretU8S16 VreinterpretU8S16 //go:noescape func VreinterpretU8S16(r *arm.Uint8X8, v0 *arm.Int16X4) // Vector reinterpret cast operation // 
//go:linkname VreinterpretU8S32 VreinterpretU8S32 //go:noescape func VreinterpretU8S32(r *arm.Uint8X8, v0 *arm.Int32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretU8S64 VreinterpretU8S64 //go:noescape func VreinterpretU8S64(r *arm.Uint8X8, v0 *arm.Int64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretU8U16 VreinterpretU8U16 //go:noescape func VreinterpretU8U16(r *arm.Uint8X8, v0 *arm.Uint16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretU8U32 VreinterpretU8U32 //go:noescape func VreinterpretU8U32(r *arm.Uint8X8, v0 *arm.Uint32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretU8U64 VreinterpretU8U64 //go:noescape func VreinterpretU8U64(r *arm.Uint8X8, v0 *arm.Uint64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretU8F32 VreinterpretU8F32 //go:noescape func VreinterpretU8F32(r *arm.Uint8X8, v0 *arm.Float32X2) // Vector reinterpret cast operation // //go:linkname VreinterpretU8F64 VreinterpretU8F64 //go:noescape func VreinterpretU8F64(r *arm.Uint8X8, v0 *arm.Float64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretU8P16 VreinterpretU8P16 //go:noescape func VreinterpretU8P16(r *arm.Uint8X8, v0 *arm.Poly16X4) // Vector reinterpret cast operation // //go:linkname VreinterpretU8P64 VreinterpretU8P64 //go:noescape func VreinterpretU8P64(r *arm.Uint8X8, v0 *arm.Poly64X1) // Vector reinterpret cast operation // //go:linkname VreinterpretU8P8 VreinterpretU8P8 //go:noescape func VreinterpretU8P8(r *arm.Uint8X8, v0 *arm.Poly8X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqF32S8 VreinterpretqF32S8 //go:noescape func VreinterpretqF32S8(r *arm.Float32X4, v0 *arm.Int8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqF32S16 VreinterpretqF32S16 //go:noescape func VreinterpretqF32S16(r *arm.Float32X4, v0 *arm.Int16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqF32S32 VreinterpretqF32S32 
//go:noescape func VreinterpretqF32S32(r *arm.Float32X4, v0 *arm.Int32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqF32S64 VreinterpretqF32S64 //go:noescape func VreinterpretqF32S64(r *arm.Float32X4, v0 *arm.Int64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqF32U8 VreinterpretqF32U8 //go:noescape func VreinterpretqF32U8(r *arm.Float32X4, v0 *arm.Uint8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqF32U16 VreinterpretqF32U16 //go:noescape func VreinterpretqF32U16(r *arm.Float32X4, v0 *arm.Uint16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqF32U32 VreinterpretqF32U32 //go:noescape func VreinterpretqF32U32(r *arm.Float32X4, v0 *arm.Uint32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqF32U64 VreinterpretqF32U64 //go:noescape func VreinterpretqF32U64(r *arm.Float32X4, v0 *arm.Uint64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqF32F64 VreinterpretqF32F64 //go:noescape func VreinterpretqF32F64(r *arm.Float32X4, v0 *arm.Float64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqF32P128 VreinterpretqF32P128 //go:noescape func VreinterpretqF32P128(r *arm.Float32X4, v0 *arm.Poly128) // Vector reinterpret cast operation // //go:linkname VreinterpretqF32P16 VreinterpretqF32P16 //go:noescape func VreinterpretqF32P16(r *arm.Float32X4, v0 *arm.Poly16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqF32P64 VreinterpretqF32P64 //go:noescape func VreinterpretqF32P64(r *arm.Float32X4, v0 *arm.Poly64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqF32P8 VreinterpretqF32P8 //go:noescape func VreinterpretqF32P8(r *arm.Float32X4, v0 *arm.Poly8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqF64S8 VreinterpretqF64S8 //go:noescape func VreinterpretqF64S8(r *arm.Float64X2, v0 *arm.Int8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqF64S16 VreinterpretqF64S16 
//go:noescape func VreinterpretqF64S16(r *arm.Float64X2, v0 *arm.Int16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqF64S32 VreinterpretqF64S32 //go:noescape func VreinterpretqF64S32(r *arm.Float64X2, v0 *arm.Int32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqF64S64 VreinterpretqF64S64 //go:noescape func VreinterpretqF64S64(r *arm.Float64X2, v0 *arm.Int64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqF64U8 VreinterpretqF64U8 //go:noescape func VreinterpretqF64U8(r *arm.Float64X2, v0 *arm.Uint8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqF64U16 VreinterpretqF64U16 //go:noescape func VreinterpretqF64U16(r *arm.Float64X2, v0 *arm.Uint16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqF64U32 VreinterpretqF64U32 //go:noescape func VreinterpretqF64U32(r *arm.Float64X2, v0 *arm.Uint32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqF64U64 VreinterpretqF64U64 //go:noescape func VreinterpretqF64U64(r *arm.Float64X2, v0 *arm.Uint64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqF64F32 VreinterpretqF64F32 //go:noescape func VreinterpretqF64F32(r *arm.Float64X2, v0 *arm.Float32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqF64P128 VreinterpretqF64P128 //go:noescape func VreinterpretqF64P128(r *arm.Float64X2, v0 *arm.Poly128) // Vector reinterpret cast operation // //go:linkname VreinterpretqF64P16 VreinterpretqF64P16 //go:noescape func VreinterpretqF64P16(r *arm.Float64X2, v0 *arm.Poly16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqF64P64 VreinterpretqF64P64 //go:noescape func VreinterpretqF64P64(r *arm.Float64X2, v0 *arm.Poly64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqF64P8 VreinterpretqF64P8 //go:noescape func VreinterpretqF64P8(r *arm.Float64X2, v0 *arm.Poly8X16) // Vector reinterpret cast operation // //go:linkname 
VreinterpretqP128S8 VreinterpretqP128S8 //go:noescape func VreinterpretqP128S8(r *arm.Poly128, v0 *arm.Int8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqP128S16 VreinterpretqP128S16 //go:noescape func VreinterpretqP128S16(r *arm.Poly128, v0 *arm.Int16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqP128S32 VreinterpretqP128S32 //go:noescape func VreinterpretqP128S32(r *arm.Poly128, v0 *arm.Int32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqP128S64 VreinterpretqP128S64 //go:noescape func VreinterpretqP128S64(r *arm.Poly128, v0 *arm.Int64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqP128U8 VreinterpretqP128U8 //go:noescape func VreinterpretqP128U8(r *arm.Poly128, v0 *arm.Uint8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqP128U16 VreinterpretqP128U16 //go:noescape func VreinterpretqP128U16(r *arm.Poly128, v0 *arm.Uint16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqP128U32 VreinterpretqP128U32 //go:noescape func VreinterpretqP128U32(r *arm.Poly128, v0 *arm.Uint32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqP128U64 VreinterpretqP128U64 //go:noescape func VreinterpretqP128U64(r *arm.Poly128, v0 *arm.Uint64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqP128F32 VreinterpretqP128F32 //go:noescape func VreinterpretqP128F32(r *arm.Poly128, v0 *arm.Float32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqP128F64 VreinterpretqP128F64 //go:noescape func VreinterpretqP128F64(r *arm.Poly128, v0 *arm.Float64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqP128P16 VreinterpretqP128P16 //go:noescape func VreinterpretqP128P16(r *arm.Poly128, v0 *arm.Poly16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqP128P64 VreinterpretqP128P64 //go:noescape func VreinterpretqP128P64(r *arm.Poly128, v0 *arm.Poly64X2) // Vector reinterpret cast 
operation // //go:linkname VreinterpretqP128P8 VreinterpretqP128P8 //go:noescape func VreinterpretqP128P8(r *arm.Poly128, v0 *arm.Poly8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqP16S8 VreinterpretqP16S8 //go:noescape func VreinterpretqP16S8(r *arm.Poly16X8, v0 *arm.Int8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqP16S16 VreinterpretqP16S16 //go:noescape func VreinterpretqP16S16(r *arm.Poly16X8, v0 *arm.Int16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqP16S32 VreinterpretqP16S32 //go:noescape func VreinterpretqP16S32(r *arm.Poly16X8, v0 *arm.Int32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqP16S64 VreinterpretqP16S64 //go:noescape func VreinterpretqP16S64(r *arm.Poly16X8, v0 *arm.Int64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqP16U8 VreinterpretqP16U8 //go:noescape func VreinterpretqP16U8(r *arm.Poly16X8, v0 *arm.Uint8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqP16U16 VreinterpretqP16U16 //go:noescape func VreinterpretqP16U16(r *arm.Poly16X8, v0 *arm.Uint16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqP16U32 VreinterpretqP16U32 //go:noescape func VreinterpretqP16U32(r *arm.Poly16X8, v0 *arm.Uint32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqP16U64 VreinterpretqP16U64 //go:noescape func VreinterpretqP16U64(r *arm.Poly16X8, v0 *arm.Uint64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqP16F32 VreinterpretqP16F32 //go:noescape func VreinterpretqP16F32(r *arm.Poly16X8, v0 *arm.Float32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqP16F64 VreinterpretqP16F64 //go:noescape func VreinterpretqP16F64(r *arm.Poly16X8, v0 *arm.Float64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqP16P128 VreinterpretqP16P128 //go:noescape func VreinterpretqP16P128(r *arm.Poly16X8, v0 *arm.Poly128) // Vector 
reinterpret cast operation // //go:linkname VreinterpretqP16P64 VreinterpretqP16P64 //go:noescape func VreinterpretqP16P64(r *arm.Poly16X8, v0 *arm.Poly64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqP16P8 VreinterpretqP16P8 //go:noescape func VreinterpretqP16P8(r *arm.Poly16X8, v0 *arm.Poly8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqP64S8 VreinterpretqP64S8 //go:noescape func VreinterpretqP64S8(r *arm.Poly64X2, v0 *arm.Int8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqP64S16 VreinterpretqP64S16 //go:noescape func VreinterpretqP64S16(r *arm.Poly64X2, v0 *arm.Int16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqP64S32 VreinterpretqP64S32 //go:noescape func VreinterpretqP64S32(r *arm.Poly64X2, v0 *arm.Int32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqP64S64 VreinterpretqP64S64 //go:noescape func VreinterpretqP64S64(r *arm.Poly64X2, v0 *arm.Int64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqP64U8 VreinterpretqP64U8 //go:noescape func VreinterpretqP64U8(r *arm.Poly64X2, v0 *arm.Uint8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqP64U16 VreinterpretqP64U16 //go:noescape func VreinterpretqP64U16(r *arm.Poly64X2, v0 *arm.Uint16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqP64U32 VreinterpretqP64U32 //go:noescape func VreinterpretqP64U32(r *arm.Poly64X2, v0 *arm.Uint32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqP64U64 VreinterpretqP64U64 //go:noescape func VreinterpretqP64U64(r *arm.Poly64X2, v0 *arm.Uint64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqP64F32 VreinterpretqP64F32 //go:noescape func VreinterpretqP64F32(r *arm.Poly64X2, v0 *arm.Float32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqP64F64 VreinterpretqP64F64 //go:noescape func VreinterpretqP64F64(r *arm.Poly64X2, v0 *arm.Float64X2) // 
Vector reinterpret cast operation // //go:linkname VreinterpretqP64P128 VreinterpretqP64P128 //go:noescape func VreinterpretqP64P128(r *arm.Poly64X2, v0 *arm.Poly128) // Vector reinterpret cast operation // //go:linkname VreinterpretqP64P16 VreinterpretqP64P16 //go:noescape func VreinterpretqP64P16(r *arm.Poly64X2, v0 *arm.Poly16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqP64P8 VreinterpretqP64P8 //go:noescape func VreinterpretqP64P8(r *arm.Poly64X2, v0 *arm.Poly8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqP8S8 VreinterpretqP8S8 //go:noescape func VreinterpretqP8S8(r *arm.Poly8X16, v0 *arm.Int8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqP8S16 VreinterpretqP8S16 //go:noescape func VreinterpretqP8S16(r *arm.Poly8X16, v0 *arm.Int16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqP8S32 VreinterpretqP8S32 //go:noescape func VreinterpretqP8S32(r *arm.Poly8X16, v0 *arm.Int32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqP8S64 VreinterpretqP8S64 //go:noescape func VreinterpretqP8S64(r *arm.Poly8X16, v0 *arm.Int64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqP8U8 VreinterpretqP8U8 //go:noescape func VreinterpretqP8U8(r *arm.Poly8X16, v0 *arm.Uint8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqP8U16 VreinterpretqP8U16 //go:noescape func VreinterpretqP8U16(r *arm.Poly8X16, v0 *arm.Uint16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqP8U32 VreinterpretqP8U32 //go:noescape func VreinterpretqP8U32(r *arm.Poly8X16, v0 *arm.Uint32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqP8U64 VreinterpretqP8U64 //go:noescape func VreinterpretqP8U64(r *arm.Poly8X16, v0 *arm.Uint64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqP8F32 VreinterpretqP8F32 //go:noescape func VreinterpretqP8F32(r *arm.Poly8X16, v0 *arm.Float32X4) // Vector reinterpret cast 
operation // //go:linkname VreinterpretqP8F64 VreinterpretqP8F64 //go:noescape func VreinterpretqP8F64(r *arm.Poly8X16, v0 *arm.Float64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqP8P128 VreinterpretqP8P128 //go:noescape func VreinterpretqP8P128(r *arm.Poly8X16, v0 *arm.Poly128) // Vector reinterpret cast operation // //go:linkname VreinterpretqP8P16 VreinterpretqP8P16 //go:noescape func VreinterpretqP8P16(r *arm.Poly8X16, v0 *arm.Poly16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqP8P64 VreinterpretqP8P64 //go:noescape func VreinterpretqP8P64(r *arm.Poly8X16, v0 *arm.Poly64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqS16S8 VreinterpretqS16S8 //go:noescape func VreinterpretqS16S8(r *arm.Int16X8, v0 *arm.Int8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqS16S32 VreinterpretqS16S32 //go:noescape func VreinterpretqS16S32(r *arm.Int16X8, v0 *arm.Int32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqS16S64 VreinterpretqS16S64 //go:noescape func VreinterpretqS16S64(r *arm.Int16X8, v0 *arm.Int64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqS16U8 VreinterpretqS16U8 //go:noescape func VreinterpretqS16U8(r *arm.Int16X8, v0 *arm.Uint8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqS16U16 VreinterpretqS16U16 //go:noescape func VreinterpretqS16U16(r *arm.Int16X8, v0 *arm.Uint16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqS16U32 VreinterpretqS16U32 //go:noescape func VreinterpretqS16U32(r *arm.Int16X8, v0 *arm.Uint32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqS16U64 VreinterpretqS16U64 //go:noescape func VreinterpretqS16U64(r *arm.Int16X8, v0 *arm.Uint64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqS16F32 VreinterpretqS16F32 //go:noescape func VreinterpretqS16F32(r *arm.Int16X8, v0 *arm.Float32X4) // Vector reinterpret cast 
operation // //go:linkname VreinterpretqS16F64 VreinterpretqS16F64 //go:noescape func VreinterpretqS16F64(r *arm.Int16X8, v0 *arm.Float64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqS16P128 VreinterpretqS16P128 //go:noescape func VreinterpretqS16P128(r *arm.Int16X8, v0 *arm.Poly128) // Vector reinterpret cast operation // //go:linkname VreinterpretqS16P16 VreinterpretqS16P16 //go:noescape func VreinterpretqS16P16(r *arm.Int16X8, v0 *arm.Poly16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqS16P64 VreinterpretqS16P64 //go:noescape func VreinterpretqS16P64(r *arm.Int16X8, v0 *arm.Poly64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqS16P8 VreinterpretqS16P8 //go:noescape func VreinterpretqS16P8(r *arm.Int16X8, v0 *arm.Poly8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqS32S8 VreinterpretqS32S8 //go:noescape func VreinterpretqS32S8(r *arm.Int32X4, v0 *arm.Int8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqS32S16 VreinterpretqS32S16 //go:noescape func VreinterpretqS32S16(r *arm.Int32X4, v0 *arm.Int16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqS32S64 VreinterpretqS32S64 //go:noescape func VreinterpretqS32S64(r *arm.Int32X4, v0 *arm.Int64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqS32U8 VreinterpretqS32U8 //go:noescape func VreinterpretqS32U8(r *arm.Int32X4, v0 *arm.Uint8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqS32U16 VreinterpretqS32U16 //go:noescape func VreinterpretqS32U16(r *arm.Int32X4, v0 *arm.Uint16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqS32U32 VreinterpretqS32U32 //go:noescape func VreinterpretqS32U32(r *arm.Int32X4, v0 *arm.Uint32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqS32U64 VreinterpretqS32U64 //go:noescape func VreinterpretqS32U64(r *arm.Int32X4, v0 *arm.Uint64X2) // Vector reinterpret cast 
operation // //go:linkname VreinterpretqS32F32 VreinterpretqS32F32 //go:noescape func VreinterpretqS32F32(r *arm.Int32X4, v0 *arm.Float32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqS32F64 VreinterpretqS32F64 //go:noescape func VreinterpretqS32F64(r *arm.Int32X4, v0 *arm.Float64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqS32P128 VreinterpretqS32P128 //go:noescape func VreinterpretqS32P128(r *arm.Int32X4, v0 *arm.Poly128) // Vector reinterpret cast operation // //go:linkname VreinterpretqS32P16 VreinterpretqS32P16 //go:noescape func VreinterpretqS32P16(r *arm.Int32X4, v0 *arm.Poly16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqS32P64 VreinterpretqS32P64 //go:noescape func VreinterpretqS32P64(r *arm.Int32X4, v0 *arm.Poly64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqS32P8 VreinterpretqS32P8 //go:noescape func VreinterpretqS32P8(r *arm.Int32X4, v0 *arm.Poly8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqS64S8 VreinterpretqS64S8 //go:noescape func VreinterpretqS64S8(r *arm.Int64X2, v0 *arm.Int8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqS64S16 VreinterpretqS64S16 //go:noescape func VreinterpretqS64S16(r *arm.Int64X2, v0 *arm.Int16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqS64S32 VreinterpretqS64S32 //go:noescape func VreinterpretqS64S32(r *arm.Int64X2, v0 *arm.Int32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqS64U8 VreinterpretqS64U8 //go:noescape func VreinterpretqS64U8(r *arm.Int64X2, v0 *arm.Uint8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqS64U16 VreinterpretqS64U16 //go:noescape func VreinterpretqS64U16(r *arm.Int64X2, v0 *arm.Uint16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqS64U32 VreinterpretqS64U32 //go:noescape func VreinterpretqS64U32(r *arm.Int64X2, v0 *arm.Uint32X4) // Vector reinterpret cast 
operation // //go:linkname VreinterpretqS64U64 VreinterpretqS64U64 //go:noescape func VreinterpretqS64U64(r *arm.Int64X2, v0 *arm.Uint64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqS64F32 VreinterpretqS64F32 //go:noescape func VreinterpretqS64F32(r *arm.Int64X2, v0 *arm.Float32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqS64F64 VreinterpretqS64F64 //go:noescape func VreinterpretqS64F64(r *arm.Int64X2, v0 *arm.Float64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqS64P128 VreinterpretqS64P128 //go:noescape func VreinterpretqS64P128(r *arm.Int64X2, v0 *arm.Poly128) // Vector reinterpret cast operation // //go:linkname VreinterpretqS64P16 VreinterpretqS64P16 //go:noescape func VreinterpretqS64P16(r *arm.Int64X2, v0 *arm.Poly16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqS64P64 VreinterpretqS64P64 //go:noescape func VreinterpretqS64P64(r *arm.Int64X2, v0 *arm.Poly64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqS64P8 VreinterpretqS64P8 //go:noescape func VreinterpretqS64P8(r *arm.Int64X2, v0 *arm.Poly8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqS8S16 VreinterpretqS8S16 //go:noescape func VreinterpretqS8S16(r *arm.Int8X16, v0 *arm.Int16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqS8S32 VreinterpretqS8S32 //go:noescape func VreinterpretqS8S32(r *arm.Int8X16, v0 *arm.Int32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqS8S64 VreinterpretqS8S64 //go:noescape func VreinterpretqS8S64(r *arm.Int8X16, v0 *arm.Int64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqS8U8 VreinterpretqS8U8 //go:noescape func VreinterpretqS8U8(r *arm.Int8X16, v0 *arm.Uint8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqS8U16 VreinterpretqS8U16 //go:noescape func VreinterpretqS8U16(r *arm.Int8X16, v0 *arm.Uint16X8) // Vector reinterpret cast operation // 
//go:linkname VreinterpretqS8U32 VreinterpretqS8U32 //go:noescape func VreinterpretqS8U32(r *arm.Int8X16, v0 *arm.Uint32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqS8U64 VreinterpretqS8U64 //go:noescape func VreinterpretqS8U64(r *arm.Int8X16, v0 *arm.Uint64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqS8F32 VreinterpretqS8F32 //go:noescape func VreinterpretqS8F32(r *arm.Int8X16, v0 *arm.Float32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqS8F64 VreinterpretqS8F64 //go:noescape func VreinterpretqS8F64(r *arm.Int8X16, v0 *arm.Float64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqS8P128 VreinterpretqS8P128 //go:noescape func VreinterpretqS8P128(r *arm.Int8X16, v0 *arm.Poly128) // Vector reinterpret cast operation // //go:linkname VreinterpretqS8P16 VreinterpretqS8P16 //go:noescape func VreinterpretqS8P16(r *arm.Int8X16, v0 *arm.Poly16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqS8P64 VreinterpretqS8P64 //go:noescape func VreinterpretqS8P64(r *arm.Int8X16, v0 *arm.Poly64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqS8P8 VreinterpretqS8P8 //go:noescape func VreinterpretqS8P8(r *arm.Int8X16, v0 *arm.Poly8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqU16S8 VreinterpretqU16S8 //go:noescape func VreinterpretqU16S8(r *arm.Uint16X8, v0 *arm.Int8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqU16S16 VreinterpretqU16S16 //go:noescape func VreinterpretqU16S16(r *arm.Uint16X8, v0 *arm.Int16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqU16S32 VreinterpretqU16S32 //go:noescape func VreinterpretqU16S32(r *arm.Uint16X8, v0 *arm.Int32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqU16S64 VreinterpretqU16S64 //go:noescape func VreinterpretqU16S64(r *arm.Uint16X8, v0 *arm.Int64X2) // Vector reinterpret cast operation // //go:linkname 
VreinterpretqU16U8 VreinterpretqU16U8 //go:noescape func VreinterpretqU16U8(r *arm.Uint16X8, v0 *arm.Uint8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqU16U32 VreinterpretqU16U32 //go:noescape func VreinterpretqU16U32(r *arm.Uint16X8, v0 *arm.Uint32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqU16U64 VreinterpretqU16U64 //go:noescape func VreinterpretqU16U64(r *arm.Uint16X8, v0 *arm.Uint64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqU16F32 VreinterpretqU16F32 //go:noescape func VreinterpretqU16F32(r *arm.Uint16X8, v0 *arm.Float32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqU16F64 VreinterpretqU16F64 //go:noescape func VreinterpretqU16F64(r *arm.Uint16X8, v0 *arm.Float64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqU16P128 VreinterpretqU16P128 //go:noescape func VreinterpretqU16P128(r *arm.Uint16X8, v0 *arm.Poly128) // Vector reinterpret cast operation // //go:linkname VreinterpretqU16P16 VreinterpretqU16P16 //go:noescape func VreinterpretqU16P16(r *arm.Uint16X8, v0 *arm.Poly16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqU16P64 VreinterpretqU16P64 //go:noescape func VreinterpretqU16P64(r *arm.Uint16X8, v0 *arm.Poly64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqU16P8 VreinterpretqU16P8 //go:noescape func VreinterpretqU16P8(r *arm.Uint16X8, v0 *arm.Poly8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqU32S8 VreinterpretqU32S8 //go:noescape func VreinterpretqU32S8(r *arm.Uint32X4, v0 *arm.Int8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqU32S16 VreinterpretqU32S16 //go:noescape func VreinterpretqU32S16(r *arm.Uint32X4, v0 *arm.Int16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqU32S32 VreinterpretqU32S32 //go:noescape func VreinterpretqU32S32(r *arm.Uint32X4, v0 *arm.Int32X4) // Vector reinterpret cast operation // 
//go:linkname VreinterpretqU32S64 VreinterpretqU32S64 //go:noescape func VreinterpretqU32S64(r *arm.Uint32X4, v0 *arm.Int64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqU32U8 VreinterpretqU32U8 //go:noescape func VreinterpretqU32U8(r *arm.Uint32X4, v0 *arm.Uint8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqU32U16 VreinterpretqU32U16 //go:noescape func VreinterpretqU32U16(r *arm.Uint32X4, v0 *arm.Uint16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqU32U64 VreinterpretqU32U64 //go:noescape func VreinterpretqU32U64(r *arm.Uint32X4, v0 *arm.Uint64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqU32F32 VreinterpretqU32F32 //go:noescape func VreinterpretqU32F32(r *arm.Uint32X4, v0 *arm.Float32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqU32F64 VreinterpretqU32F64 //go:noescape func VreinterpretqU32F64(r *arm.Uint32X4, v0 *arm.Float64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqU32P128 VreinterpretqU32P128 //go:noescape func VreinterpretqU32P128(r *arm.Uint32X4, v0 *arm.Poly128) // Vector reinterpret cast operation // //go:linkname VreinterpretqU32P16 VreinterpretqU32P16 //go:noescape func VreinterpretqU32P16(r *arm.Uint32X4, v0 *arm.Poly16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqU32P64 VreinterpretqU32P64 //go:noescape func VreinterpretqU32P64(r *arm.Uint32X4, v0 *arm.Poly64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqU32P8 VreinterpretqU32P8 //go:noescape func VreinterpretqU32P8(r *arm.Uint32X4, v0 *arm.Poly8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqU64S8 VreinterpretqU64S8 //go:noescape func VreinterpretqU64S8(r *arm.Uint64X2, v0 *arm.Int8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqU64S16 VreinterpretqU64S16 //go:noescape func VreinterpretqU64S16(r *arm.Uint64X2, v0 *arm.Int16X8) // Vector reinterpret cast 
operation // //go:linkname VreinterpretqU64S32 VreinterpretqU64S32 //go:noescape func VreinterpretqU64S32(r *arm.Uint64X2, v0 *arm.Int32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqU64S64 VreinterpretqU64S64 //go:noescape func VreinterpretqU64S64(r *arm.Uint64X2, v0 *arm.Int64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqU64U8 VreinterpretqU64U8 //go:noescape func VreinterpretqU64U8(r *arm.Uint64X2, v0 *arm.Uint8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqU64U16 VreinterpretqU64U16 //go:noescape func VreinterpretqU64U16(r *arm.Uint64X2, v0 *arm.Uint16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqU64U32 VreinterpretqU64U32 //go:noescape func VreinterpretqU64U32(r *arm.Uint64X2, v0 *arm.Uint32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqU64F32 VreinterpretqU64F32 //go:noescape func VreinterpretqU64F32(r *arm.Uint64X2, v0 *arm.Float32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqU64F64 VreinterpretqU64F64 //go:noescape func VreinterpretqU64F64(r *arm.Uint64X2, v0 *arm.Float64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqU64P128 VreinterpretqU64P128 //go:noescape func VreinterpretqU64P128(r *arm.Uint64X2, v0 *arm.Poly128) // Vector reinterpret cast operation // //go:linkname VreinterpretqU64P16 VreinterpretqU64P16 //go:noescape func VreinterpretqU64P16(r *arm.Uint64X2, v0 *arm.Poly16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqU64P64 VreinterpretqU64P64 //go:noescape func VreinterpretqU64P64(r *arm.Uint64X2, v0 *arm.Poly64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqU64P8 VreinterpretqU64P8 //go:noescape func VreinterpretqU64P8(r *arm.Uint64X2, v0 *arm.Poly8X16) // Vector reinterpret cast operation // //go:linkname VreinterpretqU8S8 VreinterpretqU8S8 //go:noescape func VreinterpretqU8S8(r *arm.Uint8X16, v0 *arm.Int8X16) // Vector reinterpret 
cast operation // //go:linkname VreinterpretqU8S16 VreinterpretqU8S16 //go:noescape func VreinterpretqU8S16(r *arm.Uint8X16, v0 *arm.Int16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqU8S32 VreinterpretqU8S32 //go:noescape func VreinterpretqU8S32(r *arm.Uint8X16, v0 *arm.Int32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqU8S64 VreinterpretqU8S64 //go:noescape func VreinterpretqU8S64(r *arm.Uint8X16, v0 *arm.Int64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqU8U16 VreinterpretqU8U16 //go:noescape func VreinterpretqU8U16(r *arm.Uint8X16, v0 *arm.Uint16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqU8U32 VreinterpretqU8U32 //go:noescape func VreinterpretqU8U32(r *arm.Uint8X16, v0 *arm.Uint32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqU8U64 VreinterpretqU8U64 //go:noescape func VreinterpretqU8U64(r *arm.Uint8X16, v0 *arm.Uint64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqU8F32 VreinterpretqU8F32 //go:noescape func VreinterpretqU8F32(r *arm.Uint8X16, v0 *arm.Float32X4) // Vector reinterpret cast operation // //go:linkname VreinterpretqU8F64 VreinterpretqU8F64 //go:noescape func VreinterpretqU8F64(r *arm.Uint8X16, v0 *arm.Float64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqU8P128 VreinterpretqU8P128 //go:noescape func VreinterpretqU8P128(r *arm.Uint8X16, v0 *arm.Poly128) // Vector reinterpret cast operation // //go:linkname VreinterpretqU8P16 VreinterpretqU8P16 //go:noescape func VreinterpretqU8P16(r *arm.Uint8X16, v0 *arm.Poly16X8) // Vector reinterpret cast operation // //go:linkname VreinterpretqU8P64 VreinterpretqU8P64 //go:noescape func VreinterpretqU8P64(r *arm.Uint8X16, v0 *arm.Poly64X2) // Vector reinterpret cast operation // //go:linkname VreinterpretqU8P8 VreinterpretqU8P8 //go:noescape func VreinterpretqU8P8(r *arm.Uint8X16, v0 *arm.Poly8X16) // Reverse elements in 16-bit halfwords 
(vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev16S8 Vrev16S8 //go:noescape func Vrev16S8(r *arm.Int8X8, v0 *arm.Int8X8) // Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev16U8 Vrev16U8 //go:noescape func Vrev16U8(r *arm.Uint8X8, v0 *arm.Uint8X8) // Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev16P8 Vrev16P8 //go:noescape func Vrev16P8(r *arm.Poly8X8, v0 *arm.Poly8X8) // Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev16QS8 Vrev16QS8 //go:noescape func Vrev16QS8(r *arm.Int8X16, v0 *arm.Int8X16) // Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev16QU8 Vrev16QU8 //go:noescape func Vrev16QU8(r *arm.Uint8X16, v0 *arm.Uint8X16) // Reverse elements in 16-bit halfwords (vector). 
This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev16QP8 Vrev16QP8 //go:noescape func Vrev16QP8(r *arm.Poly8X16, v0 *arm.Poly8X16) // Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev32S8 Vrev32S8 //go:noescape func Vrev32S8(r *arm.Int8X8, v0 *arm.Int8X8) // Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev32S16 Vrev32S16 //go:noescape func Vrev32S16(r *arm.Int16X4, v0 *arm.Int16X4) // Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev32U8 Vrev32U8 //go:noescape func Vrev32U8(r *arm.Uint8X8, v0 *arm.Uint8X8) // Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev32U16 Vrev32U16 //go:noescape func Vrev32U16(r *arm.Uint16X4, v0 *arm.Uint16X4) // Reverse elements in 32-bit words (vector). 
This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev32P16 Vrev32P16 //go:noescape func Vrev32P16(r *arm.Poly16X4, v0 *arm.Poly16X4) // Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev32P8 Vrev32P8 //go:noescape func Vrev32P8(r *arm.Poly8X8, v0 *arm.Poly8X8) // Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev32QS8 Vrev32QS8 //go:noescape func Vrev32QS8(r *arm.Int8X16, v0 *arm.Int8X16) // Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev32QS16 Vrev32QS16 //go:noescape func Vrev32QS16(r *arm.Int16X8, v0 *arm.Int16X8) // Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev32QU8 Vrev32QU8 //go:noescape func Vrev32QU8(r *arm.Uint8X16, v0 *arm.Uint8X16) // Reverse elements in 32-bit words (vector). 
This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev32QU16 Vrev32QU16 //go:noescape func Vrev32QU16(r *arm.Uint16X8, v0 *arm.Uint16X8) // Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev32QP16 Vrev32QP16 //go:noescape func Vrev32QP16(r *arm.Poly16X8, v0 *arm.Poly16X8) // Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev32QP8 Vrev32QP8 //go:noescape func Vrev32QP8(r *arm.Poly8X16, v0 *arm.Poly8X16) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64S8 Vrev64S8 //go:noescape func Vrev64S8(r *arm.Int8X8, v0 *arm.Int8X8) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64S16 Vrev64S16 //go:noescape func Vrev64S16(r *arm.Int16X4, v0 *arm.Int16X4) // Reverse elements in 64-bit doublewords (vector). 
This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64S32 Vrev64S32 //go:noescape func Vrev64S32(r *arm.Int32X2, v0 *arm.Int32X2) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64U8 Vrev64U8 //go:noescape func Vrev64U8(r *arm.Uint8X8, v0 *arm.Uint8X8) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64U16 Vrev64U16 //go:noescape func Vrev64U16(r *arm.Uint16X4, v0 *arm.Uint16X4) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64U32 Vrev64U32 //go:noescape func Vrev64U32(r *arm.Uint32X2, v0 *arm.Uint32X2) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64F32 Vrev64F32 //go:noescape func Vrev64F32(r *arm.Float32X2, v0 *arm.Float32X2) // Reverse elements in 64-bit doublewords (vector). 
This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64P16 Vrev64P16 //go:noescape func Vrev64P16(r *arm.Poly16X4, v0 *arm.Poly16X4) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64P8 Vrev64P8 //go:noescape func Vrev64P8(r *arm.Poly8X8, v0 *arm.Poly8X8) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64QS8 Vrev64QS8 //go:noescape func Vrev64QS8(r *arm.Int8X16, v0 *arm.Int8X16) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64QS16 Vrev64QS16 //go:noescape func Vrev64QS16(r *arm.Int16X8, v0 *arm.Int16X8) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64QS32 Vrev64QS32 //go:noescape func Vrev64QS32(r *arm.Int32X4, v0 *arm.Int32X4) // Reverse elements in 64-bit doublewords (vector). 
This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64QU8 Vrev64QU8 //go:noescape func Vrev64QU8(r *arm.Uint8X16, v0 *arm.Uint8X16) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64QU16 Vrev64QU16 //go:noescape func Vrev64QU16(r *arm.Uint16X8, v0 *arm.Uint16X8) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64QU32 Vrev64QU32 //go:noescape func Vrev64QU32(r *arm.Uint32X4, v0 *arm.Uint32X4) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64QF32 Vrev64QF32 //go:noescape func Vrev64QF32(r *arm.Float32X4, v0 *arm.Float32X4) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64QP16 Vrev64QP16 //go:noescape func Vrev64QP16(r *arm.Poly16X8, v0 *arm.Poly16X8) // Reverse elements in 64-bit doublewords (vector). 
This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64QP8 Vrev64QP8 //go:noescape func Vrev64QP8(r *arm.Poly8X16, v0 *arm.Poly8X16) // Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddS8 VrhaddS8 //go:noescape func VrhaddS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8) // Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddS16 VrhaddS16 //go:noescape func VrhaddS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4) // Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddS32 VrhaddS32 //go:noescape func VrhaddS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddU8 VrhaddU8 //go:noescape func VrhaddU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8) // Unsigned Rounding Halving Add. 
This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddU16 VrhaddU16 //go:noescape func VrhaddU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4) // Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddU32 VrhaddU32 //go:noescape func VrhaddU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddqS8 VrhaddqS8 //go:noescape func VrhaddqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16) // Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddqS16 VrhaddqS16 //go:noescape func VrhaddqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddqS32 VrhaddqS32 //go:noescape func VrhaddqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Unsigned Rounding Halving Add. 
This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddqU8 VrhaddqU8 //go:noescape func VrhaddqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddqU16 VrhaddqU16 //go:noescape func VrhaddqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddqU32 VrhaddqU32 //go:noescape func VrhaddqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndF32 VrndF32 //go:noescape func VrndF32(r *arm.Float32X2, v0 *arm.Float32X2) // Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndF64 VrndF64 //go:noescape func VrndF64(r *arm.Float64X1, v0 *arm.Float64X1) // Floating-point Round to 32-bit Integer, using current rounding mode (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd32XF32 Vrnd32XF32 //go:noescape func Vrnd32XF32(r *arm.Float32X2, v0 *arm.Float32X2) // Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd32XF64 Vrnd32XF64 //go:noescape func Vrnd32XF64(r *arm.Float64X1, v0 *arm.Float64X1) // Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd32XqF32 Vrnd32XqF32 //go:noescape func Vrnd32XqF32(r *arm.Float32X4, v0 *arm.Float32X4) // Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd32XqF64 Vrnd32XqF64 //go:noescape func Vrnd32XqF64(r *arm.Float64X2, v0 *arm.Float64X2) // Floating-point Round to 32-bit Integer toward Zero (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd32ZF32 Vrnd32ZF32 //go:noescape func Vrnd32ZF32(r *arm.Float32X2, v0 *arm.Float32X2) // Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd32ZF64 Vrnd32ZF64 //go:noescape func Vrnd32ZF64(r *arm.Float64X1, v0 *arm.Float64X1) // Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd32ZqF32 Vrnd32ZqF32 //go:noescape func Vrnd32ZqF32(r *arm.Float32X4, v0 *arm.Float32X4) // Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd32ZqF64 Vrnd32ZqF64 //go:noescape func Vrnd32ZqF64(r *arm.Float64X2, v0 *arm.Float64X2) // Floating-point Round to 64-bit Integer, using current rounding mode (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd64XF32 Vrnd64XF32 //go:noescape func Vrnd64XF32(r *arm.Float32X2, v0 *arm.Float32X2) // Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd64XF64 Vrnd64XF64 //go:noescape func Vrnd64XF64(r *arm.Float64X1, v0 *arm.Float64X1) // Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd64XqF32 Vrnd64XqF32 //go:noescape func Vrnd64XqF32(r *arm.Float32X4, v0 *arm.Float32X4) // Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd64XqF64 Vrnd64XqF64 //go:noescape func Vrnd64XqF64(r *arm.Float64X2, v0 *arm.Float64X2) // Floating-point Round to 64-bit Integer toward Zero (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd64ZF32 Vrnd64ZF32 //go:noescape func Vrnd64ZF32(r *arm.Float32X2, v0 *arm.Float32X2) // Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd64ZF64 Vrnd64ZF64 //go:noescape func Vrnd64ZF64(r *arm.Float64X1, v0 *arm.Float64X1) // Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd64ZqF32 Vrnd64ZqF32 //go:noescape func Vrnd64ZqF32(r *arm.Float32X4, v0 *arm.Float32X4) // Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd64ZqF64 Vrnd64ZqF64 //go:noescape func Vrnd64ZqF64(r *arm.Float64X2, v0 *arm.Float64X2) // Floating-point Round to Integral, to nearest with ties to Away (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndaF32 VrndaF32 //go:noescape func VrndaF32(r *arm.Float32X2, v0 *arm.Float32X2) // Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndaF64 VrndaF64 //go:noescape func VrndaF64(r *arm.Float64X1, v0 *arm.Float64X1) // Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndaqF32 VrndaqF32 //go:noescape func VrndaqF32(r *arm.Float32X4, v0 *arm.Float32X4) // Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndaqF64 VrndaqF64 //go:noescape func VrndaqF64(r *arm.Float64X2, v0 *arm.Float64X2) // Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. 
// //go:linkname VrndiF32 VrndiF32 //go:noescape func VrndiF32(r *arm.Float32X2, v0 *arm.Float32X2) // Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VrndiF64 VrndiF64 //go:noescape func VrndiF64(r *arm.Float64X1, v0 *arm.Float64X1) // Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VrndiqF32 VrndiqF32 //go:noescape func VrndiqF32(r *arm.Float32X4, v0 *arm.Float32X4) // Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VrndiqF64 VrndiqF64 //go:noescape func VrndiqF64(r *arm.Float64X2, v0 *arm.Float64X2) // Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndmF32 VrndmF32 //go:noescape func VrndmF32(r *arm.Float32X2, v0 *arm.Float32X2) // Floating-point Round to Integral, toward Minus infinity (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndmF64 VrndmF64 //go:noescape func VrndmF64(r *arm.Float64X1, v0 *arm.Float64X1) // Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndmqF32 VrndmqF32 //go:noescape func VrndmqF32(r *arm.Float32X4, v0 *arm.Float32X4) // Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndmqF64 VrndmqF64 //go:noescape func VrndmqF64(r *arm.Float64X2, v0 *arm.Float64X2) // Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndnF32 VrndnF32 //go:noescape func VrndnF32(r *arm.Float32X2, v0 *arm.Float32X2) // Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. 
// //go:linkname VrndnF64 VrndnF64 //go:noescape func VrndnF64(r *arm.Float64X1, v0 *arm.Float64X1) // Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndnqF32 VrndnqF32 //go:noescape func VrndnqF32(r *arm.Float32X4, v0 *arm.Float32X4) // Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndnqF64 VrndnqF64 //go:noescape func VrndnqF64(r *arm.Float64X2, v0 *arm.Float64X2) // Floating-point Round to Integral, to nearest with ties to even (scalar). This instruction rounds a floating-point value in the SIMD&FP source register to an integral floating-point value of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndnsF32 VrndnsF32 //go:noescape func VrndnsF32(r *arm.Float32, v0 *arm.Float32) // Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndpF32 VrndpF32 //go:noescape func VrndpF32(r *arm.Float32X2, v0 *arm.Float32X2) // Floating-point Round to Integral, toward Plus infinity (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndpF64 VrndpF64 //go:noescape func VrndpF64(r *arm.Float64X1, v0 *arm.Float64X1) // Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndpqF32 VrndpqF32 //go:noescape func VrndpqF32(r *arm.Float32X4, v0 *arm.Float32X4) // Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndpqF64 VrndpqF64 //go:noescape func VrndpqF64(r *arm.Float64X2, v0 *arm.Float64X2) // Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndqF32 VrndqF32 //go:noescape func VrndqF32(r *arm.Float32X4, v0 *arm.Float32X4) // Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. 
// //go:linkname VrndqF64 VrndqF64 //go:noescape func VrndqF64(r *arm.Float64X2, v0 *arm.Float64X2) // Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VrndxF32 VrndxF32 //go:noescape func VrndxF32(r *arm.Float32X2, v0 *arm.Float32X2) // Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VrndxF64 VrndxF64 //go:noescape func VrndxF64(r *arm.Float64X1, v0 *arm.Float64X1) // Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VrndxqF32 VrndxqF32 //go:noescape func VrndxqF32(r *arm.Float32X4, v0 *arm.Float32X4) // Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VrndxqF64 VrndxqF64 //go:noescape func VrndxqF64(r *arm.Float64X2, v0 *arm.Float64X2) // Signed Rounding Shift Left (register). 
This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlS8 VrshlS8 //go:noescape func VrshlS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8) // Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlS16 VrshlS16 //go:noescape func VrshlS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4) // Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlS32 VrshlS32 //go:noescape func VrshlS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlS64 VrshlS64 //go:noescape func VrshlS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1) // Unsigned Rounding Shift Left (register). 
This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlU8 VrshlU8 //go:noescape func VrshlU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Int8X8) // Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlU16 VrshlU16 //go:noescape func VrshlU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Int16X4) // Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlU32 VrshlU32 //go:noescape func VrshlU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Int32X2) // Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlU64 VrshlU64 //go:noescape func VrshlU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Int64X1) // Signed Rounding Shift Left (register). 
This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshldS64 VrshldS64 //go:noescape func VrshldS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64) // Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshldU64 VrshldU64 //go:noescape func VrshldU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Int64) // Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlqS8 VrshlqS8 //go:noescape func VrshlqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16) // Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlqS16 VrshlqS16 //go:noescape func VrshlqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Signed Rounding Shift Left (register). 
This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlqS32 VrshlqS32 //go:noescape func VrshlqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlqS64 VrshlqS64 //go:noescape func VrshlqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2) // Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlqU8 VrshlqU8 //go:noescape func VrshlqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Int8X16) // Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlqU16 VrshlqU16 //go:noescape func VrshlqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Int16X8) // Unsigned Rounding Shift Left (register). 
This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrshlqU32 VrshlqU32
//go:noescape
func VrshlqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Int32X4)

// Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrshlqU64 VrshlqU64
//go:noescape
func VrshlqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Int64X2)

// Unsigned Reciprocal Square Root Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse square root for each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VrsqrteU32 VrsqrteU32
//go:noescape
func VrsqrteU32(r *arm.Uint32X2, v0 *arm.Uint32X2)

// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrteF32 VrsqrteF32
//go:noescape
func VrsqrteF32(r *arm.Float32X2, v0 *arm.Float32X2)

// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrteF64 VrsqrteF64
//go:noescape
func VrsqrteF64(r *arm.Float64X1, v0 *arm.Float64X1)

// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtedF64 VrsqrtedF64
//go:noescape
func VrsqrtedF64(r *arm.Float64, v0 *arm.Float64)

// Unsigned Reciprocal Square Root Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse square root for each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VrsqrteqU32 VrsqrteqU32
//go:noescape
func VrsqrteqU32(r *arm.Uint32X4, v0 *arm.Uint32X4)

// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrteqF32 VrsqrteqF32
//go:noescape
func VrsqrteqF32(r *arm.Float32X4, v0 *arm.Float32X4)

// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrteqF64 VrsqrteqF64
//go:noescape
func VrsqrteqF64(r *arm.Float64X2, v0 *arm.Float64X2)

// Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtesF32 VrsqrtesF32
//go:noescape
func VrsqrtesF32(r *arm.Float32, v0 *arm.Float32)

// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtsF32 VrsqrtsF32
//go:noescape
func VrsqrtsF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtsF64 VrsqrtsF64
//go:noescape
func VrsqrtsF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1)

// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtsdF64 VrsqrtsdF64
//go:noescape
func VrsqrtsdF64(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64)

// Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VrsqrtsqF32 VrsqrtsqF32
//go:noescape
func VrsqrtsqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Floating-point Reciprocal Square Root Step.
This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrsqrtsqF64 VrsqrtsqF64 //go:noescape func VrsqrtsqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2) // Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrsqrtssF32 VrsqrtssF32 //go:noescape func VrsqrtssF32(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32) // Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. // //go:linkname VrsubhnS16 VrsubhnS16 //go:noescape func VrsubhnS16(r *arm.Int8X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. // //go:linkname VrsubhnS32 VrsubhnS32 //go:noescape func VrsubhnS32(r *arm.Int16X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Rounding Subtract returning High Narrow. 
This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. // //go:linkname VrsubhnS64 VrsubhnS64 //go:noescape func VrsubhnS64(r *arm.Int32X2, v0 *arm.Int64X2, v1 *arm.Int64X2) // Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. // //go:linkname VrsubhnU16 VrsubhnU16 //go:noescape func VrsubhnU16(r *arm.Uint8X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. // //go:linkname VrsubhnU32 VrsubhnU32 //go:noescape func VrsubhnU32(r *arm.Uint16X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. // //go:linkname VrsubhnU64 VrsubhnU64 //go:noescape func VrsubhnU64(r *arm.Uint32X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2) // Rounding Subtract returning High Narrow. 
This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. // //go:linkname VrsubhnHighS16 VrsubhnHighS16 //go:noescape func VrsubhnHighS16(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int16X8, v2 *arm.Int16X8) // Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. // //go:linkname VrsubhnHighS32 VrsubhnHighS32 //go:noescape func VrsubhnHighS32(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int32X4, v2 *arm.Int32X4) // Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. // //go:linkname VrsubhnHighS64 VrsubhnHighS64 //go:noescape func VrsubhnHighS64(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int64X2, v2 *arm.Int64X2) // Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. 
//
//go:linkname VrsubhnHighU16 VrsubhnHighU16
//go:noescape
func VrsubhnHighU16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8)

// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VrsubhnHighU32 VrsubhnHighU32
//go:noescape
func VrsubhnHighU32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
//
//go:linkname VrsubhnHighU64 VrsubhnHighU64
//go:noescape
func VrsubhnHighU64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)

// SHA1 hash update (choose).
//
//go:linkname Vsha1CqU32 Vsha1CqU32
//go:noescape
func Vsha1CqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32, v2 *arm.Uint32X4)

// SHA1 fixed rotate.
//
//go:linkname Vsha1HU32 Vsha1HU32
//go:noescape
func Vsha1HU32(r *arm.Uint32, v0 *arm.Uint32)

// SHA1 hash update (majority).
//
//go:linkname Vsha1MqU32 Vsha1MqU32
//go:noescape
func Vsha1MqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32, v2 *arm.Uint32X4)

// SHA1 hash update (parity).
//
//go:linkname Vsha1PqU32 Vsha1PqU32
//go:noescape
func Vsha1PqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32, v2 *arm.Uint32X4)

// SHA1 schedule update 0.
//
//go:linkname Vsha1Su0QU32 Vsha1Su0QU32
//go:noescape
func Vsha1Su0QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// SHA1 schedule update 1.
//
//go:linkname Vsha1Su1QU32 Vsha1Su1QU32
//go:noescape
func Vsha1Su1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// SHA256 hash update (part 2).
//
//go:linkname Vsha256H2QU32 Vsha256H2QU32
//go:noescape
func Vsha256H2QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// SHA256 hash update (part 1).
//
//go:linkname Vsha256HqU32 Vsha256HqU32
//go:noescape
func Vsha256HqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// SHA256 schedule update 0.
//
//go:linkname Vsha256Su0QU32 Vsha256Su0QU32
//go:noescape
func Vsha256Su0QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// SHA256 schedule update 1.
//
//go:linkname Vsha256Su1QU32 Vsha256Su1QU32
//go:noescape
func Vsha256Su1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4)

// SHA512 Hash update part 2 takes the values from the three 128-bit source SIMD&FP registers and produces a 128-bit output value that combines the sigma0 and majority functions of two iterations of the SHA512 computation. It returns this value to the destination SIMD&FP register.
//
//go:linkname Vsha512H2QU64 Vsha512H2QU64
//go:noescape
func Vsha512H2QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)

// SHA512 Hash update part 1 takes the values from the three 128-bit source SIMD&FP registers and produces a 128-bit output value that combines the sigma1 and chi functions of two iterations of the SHA512 computation. It returns this value to the destination SIMD&FP register.
//
//go:linkname Vsha512HqU64 Vsha512HqU64
//go:noescape
func Vsha512HqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)

// SHA512 Schedule Update 0 takes the values from the two 128-bit source SIMD&FP registers and produces a 128-bit output value that combines the gamma0 functions of two iterations of the SHA512 schedule update that are performed after the first 16 iterations within a block. It returns this value to the destination SIMD&FP register.
//
//go:linkname Vsha512Su0QU64 Vsha512Su0QU64
//go:noescape
func Vsha512Su0QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// SHA512 Schedule Update 1 takes the values from the three source SIMD&FP registers and produces a 128-bit output value that combines the gamma1 functions of two iterations of the SHA512 schedule update that are performed after the first 16 iterations within a block. It returns this value to the destination SIMD&FP register.
//
//go:linkname Vsha512Su1QU64 Vsha512Su1QU64
//go:noescape
func Vsha512Su1QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VshlS8 VshlS8
//go:noescape
func VshlS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VshlS16 VshlS16
//go:noescape
func VshlS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
// //go:linkname VshlS32 VshlS32 //go:noescape func VshlS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlS64 VshlS64 //go:noescape func VshlS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1) // Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlU8 VshlU8 //go:noescape func VshlU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Int8X8) // Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlU16 VshlU16 //go:noescape func VshlU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Int16X4) // Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlU32 VshlU32 //go:noescape func VshlU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Int32X2) // Unsigned Shift Left (register). 
This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlU64 VshlU64 //go:noescape func VshlU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Int64X1) // Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshldS64 VshldS64 //go:noescape func VshldS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64) // Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshldU64 VshldU64 //go:noescape func VshldU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Int64) // Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlqS8 VshlqS8 //go:noescape func VshlqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16) // Signed Shift Left (register). 
This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlqS16 VshlqS16 //go:noescape func VshlqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlqS32 VshlqS32 //go:noescape func VshlqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlqS64 VshlqS64 //go:noescape func VshlqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2) // Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlqU8 VshlqU8 //go:noescape func VshlqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Int8X16) // Unsigned Shift Left (register). 
This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlqU16 VshlqU16 //go:noescape func VshlqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Int16X8) // Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlqU32 VshlqU32 //go:noescape func VshlqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Int32X4) // Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlqU64 VshlqU64 //go:noescape func VshlqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Int64X2) // SM3PARTW1 takes three 128-bit vectors from the three source SIMD&FP registers and returns a 128-bit result in the destination SIMD&FP register. The result is obtained by a three-way exclusive OR of the elements within the input vectors with some fixed rotations, see the Operation pseudocode for more information. // //go:linkname Vsm3Partw1QU32 Vsm3Partw1QU32 //go:noescape func Vsm3Partw1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4) // SM3PARTW2 takes three 128-bit vectors from three source SIMD&FP registers and returns a 128-bit result in the destination SIMD&FP register. 
The result is obtained by a three-way exclusive OR of the elements within the input vectors with some fixed rotations, see the Operation pseudocode for more information. // //go:linkname Vsm3Partw2QU32 Vsm3Partw2QU32 //go:noescape func Vsm3Partw2QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4) // SM3SS1 rotates the top 32 bits of the 128-bit vector in the first source SIMD&FP register by 12, and adds that 32-bit value to the two other 32-bit values held in the top 32 bits of each of the 128-bit vectors in the second and third source SIMD&FP registers, rotating this result left by 7 and writing the final result into the top 32 bits of the vector in the destination SIMD&FP register, with the bottom 96 bits of the vector being written to 0. // //go:linkname Vsm3Ss1QU32 Vsm3Ss1QU32 //go:noescape func Vsm3Ss1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4) // SM4 Key takes an input as a 128-bit vector from the first source SIMD&FP register and a 128-bit constant from the second SIMD&FP register. It derives four iterations of the output key, in accordance with the SM4 standard, returning the 128-bit result to the destination SIMD&FP register. // //go:linkname Vsm4EkeyqU32 Vsm4EkeyqU32 //go:noescape func Vsm4EkeyqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // SM4 Encode takes input data as a 128-bit vector from the first source SIMD&FP register, and four iterations of the round key held as the elements of the 128-bit vector in the second source SIMD&FP register. It encrypts the data by four rounds, in accordance with the SM4 standard, returning the 128-bit result to the destination SIMD&FP register. // //go:linkname Vsm4EqU32 Vsm4EqU32 //go:noescape func Vsm4EqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Unsigned saturating Accumulate of Signed value. 
This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register. // //go:linkname VsqaddU8 VsqaddU8 //go:noescape func VsqaddU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Int8X8) // Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register. // //go:linkname VsqaddU16 VsqaddU16 //go:noescape func VsqaddU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Int16X4) // Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register. // //go:linkname VsqaddU32 VsqaddU32 //go:noescape func VsqaddU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Int32X2) // Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register. // //go:linkname VsqaddU64 VsqaddU64 //go:noescape func VsqaddU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Int64X1) // Unsigned saturating Accumulate of Signed value. 
This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register. // //go:linkname VsqaddbU8 VsqaddbU8 //go:noescape func VsqaddbU8(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Int8) // Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register. // //go:linkname VsqadddU64 VsqadddU64 //go:noescape func VsqadddU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Int64) // Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register. // //go:linkname VsqaddhU16 VsqaddhU16 //go:noescape func VsqaddhU16(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Int16) // Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register. // //go:linkname VsqaddqU8 VsqaddqU8 //go:noescape func VsqaddqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Int8X16) // Unsigned saturating Accumulate of Signed value. 
This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register. // //go:linkname VsqaddqU16 VsqaddqU16 //go:noescape func VsqaddqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Int16X8) // Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register. // //go:linkname VsqaddqU32 VsqaddqU32 //go:noescape func VsqaddqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Int32X4) // Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register. // //go:linkname VsqaddqU64 VsqaddqU64 //go:noescape func VsqaddqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Int64X2) // Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector elements in the source SIMD&FP register to corresponding unsigned integer values of the vector elements in the destination SIMD&FP register, and accumulates the resulting unsigned integer values with the vector elements of the destination SIMD&FP register. // //go:linkname VsqaddsU32 VsqaddsU32 //go:noescape func VsqaddsU32(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Int32) // Floating-point Square Root (vector). 
This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsqrtF32 VsqrtF32 //go:noescape func VsqrtF32(r *arm.Float32X2, v0 *arm.Float32X2) // Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsqrtF64 VsqrtF64 //go:noescape func VsqrtF64(r *arm.Float64X1, v0 *arm.Float64X1) // Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsqrtqF32 VsqrtqF32 //go:noescape func VsqrtqF32(r *arm.Float32X4, v0 *arm.Float32X4) // Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsqrtqF64 VsqrtqF64 //go:noescape func VsqrtqF64(r *arm.Float64X2, v0 *arm.Float64X2) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubS8 VsubS8 //go:noescape func VsubS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VsubS16 VsubS16 //go:noescape func VsubS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubS32 VsubS32 //go:noescape func VsubS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubS64 VsubS64 //go:noescape func VsubS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Int64X1) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubU8 VsubU8 //go:noescape func VsubU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubU16 VsubU16 //go:noescape func VsubU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VsubU32 VsubU32 //go:noescape func VsubU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubU64 VsubU64 //go:noescape func VsubU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1) // Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register, from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubF32 VsubF32 //go:noescape func VsubF32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2) // Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register, from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubF64 VsubF64 //go:noescape func VsubF64(r *arm.Float64X1, v0 *arm.Float64X1, v1 *arm.Float64X1) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubdS64 VsubdS64 //go:noescape func VsubdS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64) // Subtract (vector). 
This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubdU64 VsubdU64 //go:noescape func VsubdU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64) // Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VsubhnS16 VsubhnS16 //go:noescape func VsubhnS16(r *arm.Int8X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VsubhnS32 VsubhnS32 //go:noescape func VsubhnS32(r *arm.Int16X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VsubhnS64 VsubhnS64 //go:noescape func VsubhnS64(r *arm.Int32X2, v0 *arm.Int64X2, v1 *arm.Int64X2) // Subtract returning High Narrow. 
This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VsubhnU16 VsubhnU16 //go:noescape func VsubhnU16(r *arm.Uint8X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VsubhnU32 VsubhnU32 //go:noescape func VsubhnU32(r *arm.Uint16X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VsubhnU64 VsubhnU64 //go:noescape func VsubhnU64(r *arm.Uint32X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2) // Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are signed integer values.
// //go:linkname VsubhnHighS16 VsubhnHighS16 //go:noescape func VsubhnHighS16(r *arm.Int8X16, v0 *arm.Int8X8, v1 *arm.Int16X8, v2 *arm.Int16X8) // Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VsubhnHighS32 VsubhnHighS32 //go:noescape func VsubhnHighS32(r *arm.Int16X8, v0 *arm.Int16X4, v1 *arm.Int32X4, v2 *arm.Int32X4) // Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VsubhnHighS64 VsubhnHighS64 //go:noescape func VsubhnHighS64(r *arm.Int32X4, v0 *arm.Int32X2, v1 *arm.Int64X2, v2 *arm.Int64X2) // Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VsubhnHighU16 VsubhnHighU16 //go:noescape func VsubhnHighU16(r *arm.Uint8X16, v0 *arm.Uint8X8, v1 *arm.Uint16X8, v2 *arm.Uint16X8) // Subtract returning High Narrow.
This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VsubhnHighU32 VsubhnHighU32 //go:noescape func VsubhnHighU32(r *arm.Uint16X8, v0 *arm.Uint16X4, v1 *arm.Uint32X4, v2 *arm.Uint32X4) // Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the most significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VsubhnHighU64 VsubhnHighU64 //go:noescape func VsubhnHighU64(r *arm.Uint32X4, v0 *arm.Uint32X2, v1 *arm.Uint64X2, v2 *arm.Uint64X2) // Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. The destination vector elements are twice as long as the source vector elements. // //go:linkname VsublS8 VsublS8 //go:noescape func VsublS8(r *arm.Int16X8, v0 *arm.Int8X8, v1 *arm.Int8X8) // Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
The destination vector elements are twice as long as the source vector elements. // //go:linkname VsublS16 VsublS16 //go:noescape func VsublS16(r *arm.Int32X4, v0 *arm.Int16X4, v1 *arm.Int16X4) // Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. The destination vector elements are twice as long as the source vector elements. // //go:linkname VsublS32 VsublS32 //go:noescape func VsublS32(r *arm.Int64X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Unsigned Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The destination vector elements are twice as long as the source vector elements. // //go:linkname VsublU8 VsublU8 //go:noescape func VsublU8(r *arm.Uint16X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8) // Unsigned Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The destination vector elements are twice as long as the source vector elements. // //go:linkname VsublU16 VsublU16 //go:noescape func VsublU16(r *arm.Uint32X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4) // Unsigned Subtract Long. 
This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The destination vector elements are twice as long as the source vector elements. // //go:linkname VsublU32 VsublU32 //go:noescape func VsublU32(r *arm.Uint64X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. The destination vector elements are twice as long as the source vector elements. // //go:linkname VsublHighS8 VsublHighS8 //go:noescape func VsublHighS8(r *arm.Int16X8, v0 *arm.Int8X16, v1 *arm.Int8X16) // Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values. The destination vector elements are twice as long as the source vector elements. // //go:linkname VsublHighS16 VsublHighS16 //go:noescape func VsublHighS16(r *arm.Int32X4, v0 *arm.Int16X8, v1 *arm.Int16X8) // Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. 
All the values in this instruction are signed integer values. The destination vector elements are twice as long as the source vector elements. // //go:linkname VsublHighS32 VsublHighS32 //go:noescape func VsublHighS32(r *arm.Int64X2, v0 *arm.Int32X4, v1 *arm.Int32X4) // Unsigned Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The destination vector elements are twice as long as the source vector elements. // //go:linkname VsublHighU8 VsublHighU8 //go:noescape func VsublHighU8(r *arm.Uint16X8, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Unsigned Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The destination vector elements are twice as long as the source vector elements. // //go:linkname VsublHighU16 VsublHighU16 //go:noescape func VsublHighU16(r *arm.Uint32X4, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Unsigned Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The destination vector elements are twice as long as the source vector elements. 
// //go:linkname VsublHighU32 VsublHighU32 //go:noescape func VsublHighU32(r *arm.Uint64X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubqS8 VsubqS8 //go:noescape func VsubqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubqS16 VsubqS16 //go:noescape func VsubqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubqS32 VsubqS32 //go:noescape func VsubqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubqS64 VsubqS64 //go:noescape func VsubqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VsubqU8 VsubqU8 //go:noescape func VsubqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubqU16 VsubqU16 //go:noescape func VsubqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubqU32 VsubqU32 //go:noescape func VsubqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubqU64 VsubqU64 //go:noescape func VsubqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2) // Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register, from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubqF32 VsubqF32 //go:noescape func VsubqF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4) // Floating-point Subtract (vector). 
This instruction subtracts the elements in the vector in the second source SIMD&FP register, from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubqF64 VsubqF64 //go:noescape func VsubqF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2) // Signed Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer values. // //go:linkname VsubwS8 VsubwS8 //go:noescape func VsubwS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X8) // Signed Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer values. // //go:linkname VsubwS16 VsubwS16 //go:noescape func VsubwS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X4) // Signed Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer values. // //go:linkname VsubwS32 VsubwS32 //go:noescape func VsubwS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X2) // Unsigned Subtract Wide. 
This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are unsigned integer values. // //go:linkname VsubwU8 VsubwU8 //go:noescape func VsubwU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X8) // Unsigned Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are unsigned integer values. // //go:linkname VsubwU16 VsubwU16 //go:noescape func VsubwU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X4) // Unsigned Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are unsigned integer values. // //go:linkname VsubwU32 VsubwU32 //go:noescape func VsubwU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X2) // Signed Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer values. // //go:linkname VsubwHighS8 VsubwHighS8 //go:noescape func VsubwHighS8(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int8X16) // Signed Subtract Wide.
This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer values. // //go:linkname VsubwHighS16 VsubwHighS16 //go:noescape func VsubwHighS16(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int16X8) // Signed Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer values. // //go:linkname VsubwHighS32 VsubwHighS32 //go:noescape func VsubwHighS32(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int32X4) // Unsigned Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are unsigned integer values. // //go:linkname VsubwHighU8 VsubwHighU8 //go:noescape func VsubwHighU8(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint8X16) // Unsigned Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are unsigned integer values. // //go:linkname VsubwHighU16 VsubwHighU16 //go:noescape func VsubwHighU16(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint16X8) // Unsigned Subtract Wide.
This instruction subtracts each vector element in the lower or upper half of the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are unsigned integer values. // //go:linkname VsubwHighU32 VsubwHighU32 //go:noescape func VsubwHighU32(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint32X4) // Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table. // //go:linkname Vtbl1S8 Vtbl1S8 //go:noescape func Vtbl1S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8) // Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table. // //go:linkname Vtbl1U8 Vtbl1U8 //go:noescape func Vtbl1U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8) // Table vector Lookup.
This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table. // //go:linkname Vtbl1P8 Vtbl1P8 //go:noescape func Vtbl1P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Uint8X8) // Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table. // //go:linkname Vtbl2S8 Vtbl2S8 //go:noescape func Vtbl2S8(r *arm.Int8X8, v0 *arm.Int8X8X2, v1 *arm.Int8X8) // Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table. 
// //go:linkname Vtbl2U8 Vtbl2U8 //go:noescape func Vtbl2U8(r *arm.Uint8X8, v0 *arm.Uint8X8X2, v1 *arm.Uint8X8) // Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table. // //go:linkname Vtbl2P8 Vtbl2P8 //go:noescape func Vtbl2P8(r *arm.Poly8X8, v0 *arm.Poly8X8X2, v1 *arm.Uint8X8) // Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table. // //go:linkname Vtbl3S8 Vtbl3S8 //go:noescape func Vtbl3S8(r *arm.Int8X8, v0 *arm.Int8X8X3, v1 *arm.Int8X8) // Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. 
If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table. // //go:linkname Vtbl3U8 Vtbl3U8 //go:noescape func Vtbl3U8(r *arm.Uint8X8, v0 *arm.Uint8X8X3, v1 *arm.Uint8X8) // Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table. // //go:linkname Vtbl3P8 Vtbl3P8 //go:noescape func Vtbl3P8(r *arm.Poly8X8, v0 *arm.Poly8X8X3, v1 *arm.Uint8X8) // Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table. // //go:linkname Vtbl4S8 Vtbl4S8 //go:noescape func Vtbl4S8(r *arm.Int8X8, v0 *arm.Int8X8X4, v1 *arm.Int8X8) // Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. 
If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table. // //go:linkname Vtbl4U8 Vtbl4U8 //go:noescape func Vtbl4U8(r *arm.Uint8X8, v0 *arm.Uint8X8X4, v1 *arm.Uint8X8) // Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table. // //go:linkname Vtbl4P8 Vtbl4P8 //go:noescape func Vtbl4P8(r *arm.Poly8X8, v0 *arm.Poly8X8X4, v1 *arm.Uint8X8) // Table vector lookup extension // //go:linkname Vtbx1S8 Vtbx1S8 //go:noescape func Vtbx1S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8, v2 *arm.Int8X8) // Table vector lookup extension // //go:linkname Vtbx1U8 Vtbx1U8 //go:noescape func Vtbx1U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8, v2 *arm.Uint8X8) // Table vector lookup extension // //go:linkname Vtbx1P8 Vtbx1P8 //go:noescape func Vtbx1P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8, v2 *arm.Uint8X8) // Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. 
If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table. // //go:linkname Vtbx2S8 Vtbx2S8 //go:noescape func Vtbx2S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8X2, v2 *arm.Int8X8) // Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table. // //go:linkname Vtbx2U8 Vtbx2U8 //go:noescape func Vtbx2U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8X2, v2 *arm.Uint8X8) // Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table. 
// //go:linkname Vtbx2P8 Vtbx2P8 //go:noescape func Vtbx2P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8X2, v2 *arm.Uint8X8) // Table vector lookup extension // //go:linkname Vtbx3S8 Vtbx3S8 //go:noescape func Vtbx3S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8X3, v2 *arm.Int8X8) // Table vector lookup extension // //go:linkname Vtbx3U8 Vtbx3U8 //go:noescape func Vtbx3U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8X3, v2 *arm.Uint8X8) // Table vector lookup extension // //go:linkname Vtbx3P8 Vtbx3P8 //go:noescape func Vtbx3P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8X3, v2 *arm.Uint8X8) // Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table. // //go:linkname Vtbx4S8 Vtbx4S8 //go:noescape func Vtbx4S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8X4, v2 *arm.Int8X8) // Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table. 
// //go:linkname Vtbx4U8 Vtbx4U8 //go:noescape func Vtbx4U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8X4, v2 *arm.Uint8X8) // Table vector lookup extension. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination register is left unchanged. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table. // //go:linkname Vtbx4P8 Vtbx4P8 //go:noescape func Vtbx4P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8X4, v2 *arm.Uint8X8) // Transpose elements // //go:linkname VtrnS8 VtrnS8 //go:noescape func VtrnS8(r *arm.Int8X8X2, v0 *arm.Int8X8, v1 *arm.Int8X8) // Transpose elements // //go:linkname VtrnS16 VtrnS16 //go:noescape func VtrnS16(r *arm.Int16X4X2, v0 *arm.Int16X4, v1 *arm.Int16X4) // Transpose elements // //go:linkname VtrnS32 VtrnS32 //go:noescape func VtrnS32(r *arm.Int32X2X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Transpose elements // //go:linkname VtrnU8 VtrnU8 //go:noescape func VtrnU8(r *arm.Uint8X8X2, v0 *arm.Uint8X8, v1 *arm.Uint8X8) // Transpose elements // //go:linkname VtrnU16 VtrnU16 //go:noescape func VtrnU16(r *arm.Uint16X4X2, v0 *arm.Uint16X4, v1 *arm.Uint16X4) // Transpose elements // //go:linkname VtrnU32 VtrnU32 //go:noescape func VtrnU32(r *arm.Uint32X2X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Transpose elements // //go:linkname VtrnF32 VtrnF32 //go:noescape func VtrnF32(r *arm.Float32X2X2, v0 *arm.Float32X2, v1 *arm.Float32X2) // Transpose vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector. // //go:linkname Vtrn1S8 Vtrn1S8 //go:noescape func Vtrn1S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8) // Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector. // //go:linkname Vtrn1S16 Vtrn1S16 //go:noescape func Vtrn1S16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4) // Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector. // //go:linkname Vtrn1S32 Vtrn1S32 //go:noescape func Vtrn1S32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Transpose vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector. // //go:linkname Vtrn1U8 Vtrn1U8 //go:noescape func Vtrn1U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8) // Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector. // //go:linkname Vtrn1U16 Vtrn1U16 //go:noescape func Vtrn1U16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4) // Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector. // //go:linkname Vtrn1U32 Vtrn1U32 //go:noescape func Vtrn1U32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Transpose vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector. // //go:linkname Vtrn1F32 Vtrn1F32 //go:noescape func Vtrn1F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2) // Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector. // //go:linkname Vtrn1P16 Vtrn1P16 //go:noescape func Vtrn1P16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4) // Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector. // //go:linkname Vtrn1P8 Vtrn1P8 //go:noescape func Vtrn1P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8) // Transpose vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector. // //go:linkname Vtrn1QS8 Vtrn1QS8 //go:noescape func Vtrn1QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16) // Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector. // //go:linkname Vtrn1QS16 Vtrn1QS16 //go:noescape func Vtrn1QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector. // //go:linkname Vtrn1QS32 Vtrn1QS32 //go:noescape func Vtrn1QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Transpose vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector. // //go:linkname Vtrn1QS64 Vtrn1QS64 //go:noescape func Vtrn1QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2) // Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector. // //go:linkname Vtrn1QU8 Vtrn1QU8 //go:noescape func Vtrn1QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector. // //go:linkname Vtrn1QU16 Vtrn1QU16 //go:noescape func Vtrn1QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Transpose vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector. // //go:linkname Vtrn1QU32 Vtrn1QU32 //go:noescape func Vtrn1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector. // //go:linkname Vtrn1QU64 Vtrn1QU64 //go:noescape func Vtrn1QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2) // Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector. // //go:linkname Vtrn1QF32 Vtrn1QF32 //go:noescape func Vtrn1QF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4) // Transpose vectors (primary). 
// This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QF64 Vtrn1QF64
//go:noescape
func Vtrn1QF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QP16 Vtrn1QP16
//go:noescape
func Vtrn1QP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QP64 Vtrn1QP64
//go:noescape
func Vtrn1QP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)

// Transpose vectors (primary).
// This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QP8 Vtrn1QP8
//go:noescape
func Vtrn1QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2S8 Vtrn2S8
//go:noescape
func Vtrn2S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2S16 Vtrn2S16
//go:noescape
func Vtrn2S16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Transpose vectors (secondary).
// This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2S32 Vtrn2S32
//go:noescape
func Vtrn2S32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2U8 Vtrn2U8
//go:noescape
func Vtrn2U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2U16 Vtrn2U16
//go:noescape
func Vtrn2U16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Transpose vectors (secondary).
// This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2U32 Vtrn2U32
//go:noescape
func Vtrn2U32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2F32 Vtrn2F32
//go:noescape
func Vtrn2F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2P16 Vtrn2P16
//go:noescape
func Vtrn2P16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)

// Transpose vectors (secondary).
// This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2P8 Vtrn2P8
//go:noescape
func Vtrn2P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QS8 Vtrn2QS8
//go:noescape
func Vtrn2QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QS16 Vtrn2QS16
//go:noescape
func Vtrn2QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Transpose vectors (secondary).
// This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QS32 Vtrn2QS32
//go:noescape
func Vtrn2QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QS64 Vtrn2QS64
//go:noescape
func Vtrn2QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QU8 Vtrn2QU8
//go:noescape
func Vtrn2QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Transpose vectors (secondary).
// This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QU16 Vtrn2QU16
//go:noescape
func Vtrn2QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QU32 Vtrn2QU32
//go:noescape
func Vtrn2QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QU64 Vtrn2QU64
//go:noescape
func Vtrn2QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Transpose vectors (secondary).
// This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QF32 Vtrn2QF32
//go:noescape
func Vtrn2QF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QF64 Vtrn2QF64
//go:noescape
func Vtrn2QF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QP16 Vtrn2QP16
//go:noescape
func Vtrn2QP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)

// Transpose vectors (secondary).
// This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QP64 Vtrn2QP64
//go:noescape
func Vtrn2QP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QP8 Vtrn2QP8
//go:noescape
func Vtrn2QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)

// Transpose elements
//
//go:linkname VtrnP16 VtrnP16
//go:noescape
func VtrnP16(r *arm.Poly16X4X2, v0 *arm.Poly16X4, v1 *arm.Poly16X4)

// Transpose elements
//
//go:linkname VtrnP8 VtrnP8
//go:noescape
func VtrnP8(r *arm.Poly8X8X2, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Transpose elements
//
//go:linkname VtrnqS8 VtrnqS8
//go:noescape
func VtrnqS8(r *arm.Int8X16X2, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Transpose elements
//
//go:linkname VtrnqS16 VtrnqS16
//go:noescape
func VtrnqS16(r *arm.Int16X8X2, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Transpose elements
//
//go:linkname VtrnqS32 VtrnqS32
//go:noescape
func VtrnqS32(r *arm.Int32X4X2, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Transpose elements
//
//go:linkname VtrnqU8 VtrnqU8
//go:noescape
func VtrnqU8(r *arm.Uint8X16X2, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Transpose elements
//
//go:linkname VtrnqU16 VtrnqU16
//go:noescape
func VtrnqU16(r *arm.Uint16X8X2, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Transpose elements
//
//go:linkname VtrnqU32 VtrnqU32
//go:noescape
func VtrnqU32(r *arm.Uint32X4X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Transpose elements
//
//go:linkname VtrnqF32 VtrnqF32
//go:noescape
func VtrnqF32(r *arm.Float32X4X2, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Transpose elements
//
//go:linkname VtrnqP16 VtrnqP16
//go:noescape
func VtrnqP16(r *arm.Poly16X8X2, v0 *arm.Poly16X8, v1 *arm.Poly16X8)

// Transpose elements
//
//go:linkname VtrnqP8 VtrnqP8
//go:noescape
func VtrnqP8(r *arm.Poly8X16X2, v0 *arm.Poly8X16, v1 *arm.Poly8X16)

// Compare bitwise Test bits nonzero (vector).
// This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstS8 VtstS8
//go:noescape
func VtstS8(r *arm.Uint8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstS16 VtstS16
//go:noescape
func VtstS16(r *arm.Uint16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstS32 VtstS32
//go:noescape
func VtstS32(r *arm.Uint32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Compare bitwise Test bits nonzero (vector).
// This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstS64 VtstS64
//go:noescape
func VtstS64(r *arm.Uint64X1, v0 *arm.Int64X1, v1 *arm.Int64X1)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstU8 VtstU8
//go:noescape
func VtstU8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstU16 VtstU16
//go:noescape
func VtstU16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Compare bitwise Test bits nonzero (vector).
// This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstU32 VtstU32
//go:noescape
func VtstU32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstU64 VtstU64
//go:noescape
func VtstU64(r *arm.Uint64X1, v0 *arm.Uint64X1, v1 *arm.Uint64X1)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstP16 VtstP16
//go:noescape
func VtstP16(r *arm.Uint16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstP64 VtstP64
//go:noescape
func VtstP64(r *arm.Uint64X1, v0 *arm.Poly64X1, v1 *arm.Poly64X1)

// Compare bitwise Test bits nonzero (vector).
// This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstP8 VtstP8
//go:noescape
func VtstP8(r *arm.Uint8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstdS64 VtstdS64
//go:noescape
func VtstdS64(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstdU64 VtstdU64
//go:noescape
func VtstdU64(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64)

// Compare bitwise Test bits nonzero (vector).
// This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqS8 VtstqS8
//go:noescape
func VtstqS8(r *arm.Uint8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqS16 VtstqS16
//go:noescape
func VtstqS16(r *arm.Uint16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqS32 VtstqS32
//go:noescape
func VtstqS32(r *arm.Uint32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Compare bitwise Test bits nonzero (vector).
// This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqS64 VtstqS64
//go:noescape
func VtstqS64(r *arm.Uint64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqU8 VtstqU8
//go:noescape
func VtstqU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqU16 VtstqU16
//go:noescape
func VtstqU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Compare bitwise Test bits nonzero (vector).
// This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqU32 VtstqU32
//go:noescape
func VtstqU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqU64 VtstqU64
//go:noescape
func VtstqU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqP16 VtstqP16
//go:noescape
func VtstqP16(r *arm.Uint16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqP64 VtstqP64
//go:noescape
func VtstqP64(r *arm.Uint64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)

// Compare bitwise Test bits nonzero (vector).
// This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqP8 VtstqP8
//go:noescape
func VtstqP8(r *arm.Uint8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
//
//go:linkname VuqaddS8 VuqaddS8
//go:noescape
func VuqaddS8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Uint8X8)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
//
//go:linkname VuqaddS16 VuqaddS16
//go:noescape
func VuqaddS16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Uint16X4)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
//
//go:linkname VuqaddS32 VuqaddS32
//go:noescape
func VuqaddS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Uint32X2)

// Signed saturating Accumulate of Unsigned value.
// This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
//
//go:linkname VuqaddS64 VuqaddS64
//go:noescape
func VuqaddS64(r *arm.Int64X1, v0 *arm.Int64X1, v1 *arm.Uint64X1)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
//
//go:linkname VuqaddbS8 VuqaddbS8
//go:noescape
func VuqaddbS8(r *arm.Int8, v0 *arm.Int8, v1 *arm.Uint8)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
//
//go:linkname VuqadddS64 VuqadddS64
//go:noescape
func VuqadddS64(r *arm.Int64, v0 *arm.Int64, v1 *arm.Uint64)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
//
//go:linkname VuqaddhS16 VuqaddhS16
//go:noescape
func VuqaddhS16(r *arm.Int16, v0 *arm.Int16, v1 *arm.Uint16)

// Signed saturating Accumulate of Unsigned value.
// This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
//
//go:linkname VuqaddqS8 VuqaddqS8
//go:noescape
func VuqaddqS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Uint8X16)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
//
//go:linkname VuqaddqS16 VuqaddqS16
//go:noescape
func VuqaddqS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Uint16X8)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
//
//go:linkname VuqaddqS32 VuqaddqS32
//go:noescape
func VuqaddqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Uint32X4)

// Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
//
//go:linkname VuqaddqS64 VuqaddqS64
//go:noescape
func VuqaddqS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Uint64X2)

// Signed saturating Accumulate of Unsigned value.
// This instruction adds the unsigned integer values of the vector elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
//
//go:linkname VuqaddsS32 VuqaddsS32
//go:noescape
func VuqaddsS32(r *arm.Int32, v0 *arm.Int32, v1 *arm.Uint32)

// Dot Product vector form with unsigned and signed integers. This instruction performs the dot product of the four unsigned 8-bit integer values in each 32-bit element of the first source register with the four signed 8-bit integer values in the corresponding 32-bit element of the second source register, accumulating the result into the corresponding 32-bit element of the destination register.
//
//go:linkname VusdotS32 VusdotS32
//go:noescape
func VusdotS32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Uint8X8, v2 *arm.Int8X8)

// Dot Product vector form with unsigned and signed integers. This instruction performs the dot product of the four unsigned 8-bit integer values in each 32-bit element of the first source register with the four signed 8-bit integer values in the corresponding 32-bit element of the second source register, accumulating the result into the corresponding 32-bit element of the destination register.
//
//go:linkname VusdotqS32 VusdotqS32
//go:noescape
func VusdotqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Uint8X16, v2 *arm.Int8X16)

// Unsigned and signed 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of unsigned 8-bit integer values in the first source vector by the 8x2 matrix of signed 8-bit integer values in the second source vector. The resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the destination vector. This is equivalent to performing an 8-way dot product per destination element.
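//
// A rough scalar model of the dot-product and matrix multiply-accumulate
// arithmetic described above, in plain Go. This is a sketch under stated
// assumptions, not the binding itself: the helper names usdot and usmmla and
// the plain-array operand types are hypothetical and are not part of this
// package, and the operand layout for the matrix form (two rows of eight
// bytes in each source, the second source holding the columns of the 8x2
// matrix) is inferred from the "8-way dot product per destination element"
// wording.

```go
package main

import "fmt"

// usdot models the mixed-sign dot product: 32-bit lane i accumulates the
// products of four unsigned bytes of a with four signed bytes of b.
func usdot(acc [4]int32, a [16]uint8, b [16]int8) [4]int32 {
	for i := 0; i < 4; i++ {
		for k := 0; k < 4; k++ {
			acc[i] += int32(a[4*i+k]) * int32(b[4*i+k])
		}
	}
	return acc
}

// usmmla models the matrix multiply-accumulate. Assumed layout: a holds two
// rows of eight unsigned bytes, b holds two groups of eight signed bytes
// (one per column of the 8x2 matrix), and acc is a row-major 2x2 matrix.
func usmmla(acc [4]int32, a [16]uint8, b [16]int8) [4]int32 {
	for i := 0; i < 2; i++ {
		for j := 0; j < 2; j++ {
			for k := 0; k < 8; k++ {
				acc[2*i+j] += int32(a[8*i+k]) * int32(b[8*j+k])
			}
		}
	}
	return acc
}

func main() {
	var a [16]uint8
	var b [16]int8
	for i := range a {
		a[i] = uint8(i + 1) // 1..16
		b[i] = -1
	}
	fmt.Println(usdot([4]int32{}, a, b))
	fmt.Println(usmmla([4]int32{}, a, b))
}
```

// Both helpers widen each byte to int32 before multiplying, so the unsigned
// operand keeps its full 0..255 range and the signed operand keeps its sign.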
//
//go:linkname VusmmlaqS32 VusmmlaqS32
//go:noescape
func VusmmlaqS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Uint8X16, v2 *arm.Int8X16)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VuzpS8 VuzpS8
//go:noescape
func VuzpS8(r *arm.Int8X8X2, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VuzpS16 VuzpS16
//go:noescape
func VuzpS16(r *arm.Int16X4X2, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VuzpS32 VuzpS32
//go:noescape
func VuzpS32(r *arm.Int32X2X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unzip vectors (secondary).
// This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VuzpU8 VuzpU8
//go:noescape
func VuzpU8(r *arm.Uint8X8X2, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VuzpU16 VuzpU16
//go:noescape
func VuzpU16(r *arm.Uint16X4X2, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VuzpU32 VuzpU32
//go:noescape
func VuzpU32(r *arm.Uint32X2X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VuzpF32 VuzpF32
//go:noescape
func VuzpF32(r *arm.Float32X2X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1S8 Vuzp1S8
//go:noescape
func Vuzp1S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1S16 Vuzp1S16
//go:noescape
func Vuzp1S16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1S32 Vuzp1S32
//go:noescape
func Vuzp1S32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unzip vectors (primary).
// This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1U8 Vuzp1U8
//go:noescape
func Vuzp1U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1U16 Vuzp1U16
//go:noescape
func Vuzp1U16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1U32 Vuzp1U32
//go:noescape
func Vuzp1U32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
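//
// The lane selection described above can be pictured with a small scalar
// model in plain Go. This is only an illustrative sketch: the helper name
// uzp1 and the use of a plain [8]int8 array in place of the package's vector
// types are assumptions made for the example and are not part of this
// package.

```go
package main

import "fmt"

// uzp1 is a hypothetical scalar stand-in for the "primary" unzip on eight
// int8 lanes: the even-indexed lanes of a fill the lower half of the result
// and the even-indexed lanes of b fill the upper half.
func uzp1(a, b [8]int8) [8]int8 {
	var r [8]int8
	for i := 0; i < 4; i++ {
		r[i] = a[2*i]   // lanes 0, 2, 4, 6 of the first source
		r[4+i] = b[2*i] // lanes 0, 2, 4, 6 of the second source
	}
	return r
}

func main() {
	a := [8]int8{0, 1, 2, 3, 4, 5, 6, 7}
	b := [8]int8{10, 11, 12, 13, 14, 15, 16, 17}
	fmt.Println(uzp1(a, b)) // [0 2 4 6 10 12 14 16]
}
```

// The "secondary" form would differ only in reading index 2*i+1 instead of
// 2*i from each source.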
//
//go:linkname Vuzp1F32 Vuzp1F32
//go:noescape
func Vuzp1F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1P16 Vuzp1P16
//go:noescape
func Vuzp1P16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1P8 Vuzp1P8
//go:noescape
func Vuzp1P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QS8 Vuzp1QS8
//go:noescape
func Vuzp1QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Unzip vectors (primary).
// This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QS16 Vuzp1QS16
//go:noescape
func Vuzp1QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QS32 Vuzp1QS32
//go:noescape
func Vuzp1QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QS64 Vuzp1QS64
//go:noescape
func Vuzp1QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Unzip vectors (primary).
// This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QU8 Vuzp1QU8
//go:noescape
func Vuzp1QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QU16 Vuzp1QU16
//go:noescape
func Vuzp1QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QU32 Vuzp1QU32
//go:noescape
func Vuzp1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Unzip vectors (primary).
// This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QU64 Vuzp1QU64
//go:noescape
func Vuzp1QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QF32 Vuzp1QF32
//go:noescape
func Vuzp1QF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QF64 Vuzp1QF64
//go:noescape
func Vuzp1QF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Unzip vectors (primary).
// This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QP16 Vuzp1QP16
//go:noescape
func Vuzp1QP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QP64 Vuzp1QP64
//go:noescape
func Vuzp1QP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QP8 Vuzp1QP8
//go:noescape
func Vuzp1QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2S8 Vuzp2S8
//go:noescape
func Vuzp2S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2S16 Vuzp2S16
//go:noescape
func Vuzp2S16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2S32 Vuzp2S32
//go:noescape
func Vuzp2S32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2U8 Vuzp2U8
//go:noescape
func Vuzp2U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8)

// Unzip vectors (secondary).
// This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2U16 Vuzp2U16
//go:noescape
func Vuzp2U16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2U32 Vuzp2U32
//go:noescape
func Vuzp2U32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2F32 Vuzp2F32
//go:noescape
func Vuzp2F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2P16 Vuzp2P16
//go:noescape
func Vuzp2P16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2P8 Vuzp2P8
//go:noescape
func Vuzp2P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QS8 Vuzp2QS8
//go:noescape
func Vuzp2QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QS16 Vuzp2QS16
//go:noescape
func Vuzp2QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8)

// Unzip vectors (secondary).
// This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QS32 Vuzp2QS32
//go:noescape
func Vuzp2QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QS64 Vuzp2QS64
//go:noescape
func Vuzp2QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QU8 Vuzp2QU8
//go:noescape
func Vuzp2QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QU16 Vuzp2QU16
//go:noescape
func Vuzp2QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QU32 Vuzp2QU32
//go:noescape
func Vuzp2QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QU64 Vuzp2QU64
//go:noescape
func Vuzp2QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QF32 Vuzp2QF32
//go:noescape
func Vuzp2QF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4)

// Unzip vectors (secondary).
// This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QF64 Vuzp2QF64
//go:noescape
func Vuzp2QF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QP16 Vuzp2QP16
//go:noescape
func Vuzp2QP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QP64 Vuzp2QP64
//go:noescape
func Vuzp2QP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp2QP8 Vuzp2QP8
//go:noescape
func Vuzp2QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VuzpP16 VuzpP16
//go:noescape
func VuzpP16(r *arm.Poly16X4X2, v0 *arm.Poly16X4, v1 *arm.Poly16X4)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VuzpP8 VuzpP8
//go:noescape
func VuzpP8(r *arm.Poly8X8X2, v0 *arm.Poly8X8, v1 *arm.Poly8X8)

// Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VuzpqS8 VuzpqS8
//go:noescape
func VuzpqS8(r *arm.Int8X16X2, v0 *arm.Int8X16, v1 *arm.Int8X16)

// Unzip vectors (secondary).
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VuzpqS16 VuzpqS16 //go:noescape func VuzpqS16(r *arm.Int16X8X2, v0 *arm.Int16X8, v1 *arm.Int16X8) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VuzpqS32 VuzpqS32 //go:noescape func VuzpqS32(r *arm.Int32X4X2, v0 *arm.Int32X4, v1 *arm.Int32X4) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VuzpqU8 VuzpqU8 //go:noescape func VuzpqU8(r *arm.Uint8X16X2, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VuzpqU16 VuzpqU16 //go:noescape func VuzpqU16(r *arm.Uint16X8X2, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VuzpqU32 VuzpqU32 //go:noescape func VuzpqU32(r *arm.Uint32X4X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VuzpqF32 VuzpqF32 //go:noescape func VuzpqF32(r *arm.Float32X4X2, v0 *arm.Float32X4, v1 *arm.Float32X4) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VuzpqP16 VuzpqP16 //go:noescape func VuzpqP16(r *arm.Poly16X8X2, v0 *arm.Poly16X8, v1 *arm.Poly16X8) // Unzip vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VuzpqP8 VuzpqP8 //go:noescape func VuzpqP8(r *arm.Poly8X16X2, v0 *arm.Poly8X16, v1 *arm.Poly8X16) // Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector. // //go:linkname VzipS8 VzipS8 //go:noescape func VzipS8(r *arm.Int8X8X2, v0 *arm.Int8X8, v1 *arm.Int8X8) // Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector. // //go:linkname VzipS16 VzipS16 //go:noescape func VzipS16(r *arm.Int16X4X2, v0 *arm.Int16X4, v1 *arm.Int16X4) // Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector. // //go:linkname VzipS32 VzipS32 //go:noescape func VzipS32(r *arm.Int32X2X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector. // //go:linkname VzipU8 VzipU8 //go:noescape func VzipU8(r *arm.Uint8X8X2, v0 *arm.Uint8X8, v1 *arm.Uint8X8) // Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector. // //go:linkname VzipU16 VzipU16 //go:noescape func VzipU16(r *arm.Uint16X4X2, v0 *arm.Uint16X4, v1 *arm.Uint16X4) // Interleave two vectors. 
This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector. // //go:linkname VzipU32 VzipU32 //go:noescape func VzipU32(r *arm.Uint32X2X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector. // //go:linkname VzipF32 VzipF32 //go:noescape func VzipF32(r *arm.Float32X2X2, v0 *arm.Float32X2, v1 *arm.Float32X2) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1S8 Vzip1S8 //go:noescape func Vzip1S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1S16 Vzip1S16 //go:noescape func Vzip1S16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. 
The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1S32 Vzip1S32 //go:noescape func Vzip1S32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1U8 Vzip1U8 //go:noescape func Vzip1U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1U16 Vzip1U16 //go:noescape func Vzip1U16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1U32 Vzip1U32 //go:noescape func Vzip1U32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Zip vectors (primary). 
This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1F32 Vzip1F32 //go:noescape func Vzip1F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1P16 Vzip1P16 //go:noescape func Vzip1P16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1P8 Vzip1P8 //go:noescape func Vzip1P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. 
// //go:linkname Vzip1QS8 Vzip1QS8 //go:noescape func Vzip1QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1QS16 Vzip1QS16 //go:noescape func Vzip1QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1QS32 Vzip1QS32 //go:noescape func Vzip1QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1QS64 Vzip1QS64 //go:noescape func Vzip1QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. 
The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1QU8 Vzip1QU8 //go:noescape func Vzip1QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1QU16 Vzip1QU16 //go:noescape func Vzip1QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1QU32 Vzip1QU32 //go:noescape func Vzip1QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1QU64 Vzip1QU64 //go:noescape func Vzip1QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2) // Zip vectors (primary). 
This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1QF32 Vzip1QF32 //go:noescape func Vzip1QF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1QF64 Vzip1QF64 //go:noescape func Vzip1QF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1QP16 Vzip1QP16 //go:noescape func Vzip1QP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. 
// //go:linkname Vzip1QP64 Vzip1QP64 //go:noescape func Vzip1QP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1QP8 Vzip1QP8 //go:noescape func Vzip1QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2S8 Vzip2S8 //go:noescape func Vzip2S8(r *arm.Int8X8, v0 *arm.Int8X8, v1 *arm.Int8X8) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2S16 Vzip2S16 //go:noescape func Vzip2S16(r *arm.Int16X4, v0 *arm.Int16X4, v1 *arm.Int16X4) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. 
The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2S32 Vzip2S32 //go:noescape func Vzip2S32(r *arm.Int32X2, v0 *arm.Int32X2, v1 *arm.Int32X2) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2U8 Vzip2U8 //go:noescape func Vzip2U8(r *arm.Uint8X8, v0 *arm.Uint8X8, v1 *arm.Uint8X8) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2U16 Vzip2U16 //go:noescape func Vzip2U16(r *arm.Uint16X4, v0 *arm.Uint16X4, v1 *arm.Uint16X4) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2U32 Vzip2U32 //go:noescape func Vzip2U32(r *arm.Uint32X2, v0 *arm.Uint32X2, v1 *arm.Uint32X2) // Zip vectors (secondary). 
This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2F32 Vzip2F32 //go:noescape func Vzip2F32(r *arm.Float32X2, v0 *arm.Float32X2, v1 *arm.Float32X2) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2P16 Vzip2P16 //go:noescape func Vzip2P16(r *arm.Poly16X4, v0 *arm.Poly16X4, v1 *arm.Poly16X4) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2P8 Vzip2P8 //go:noescape func Vzip2P8(r *arm.Poly8X8, v0 *arm.Poly8X8, v1 *arm.Poly8X8) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. 
// //go:linkname Vzip2QS8 Vzip2QS8 //go:noescape func Vzip2QS8(r *arm.Int8X16, v0 *arm.Int8X16, v1 *arm.Int8X16) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2QS16 Vzip2QS16 //go:noescape func Vzip2QS16(r *arm.Int16X8, v0 *arm.Int16X8, v1 *arm.Int16X8) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2QS32 Vzip2QS32 //go:noescape func Vzip2QS32(r *arm.Int32X4, v0 *arm.Int32X4, v1 *arm.Int32X4) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2QS64 Vzip2QS64 //go:noescape func Vzip2QS64(r *arm.Int64X2, v0 *arm.Int64X2, v1 *arm.Int64X2) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. 
The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2QU8 Vzip2QU8 //go:noescape func Vzip2QU8(r *arm.Uint8X16, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2QU16 Vzip2QU16 //go:noescape func Vzip2QU16(r *arm.Uint16X8, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2QU32 Vzip2QU32 //go:noescape func Vzip2QU32(r *arm.Uint32X4, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2QU64 Vzip2QU64 //go:noescape func Vzip2QU64(r *arm.Uint64X2, v0 *arm.Uint64X2, v1 *arm.Uint64X2) // Zip vectors (secondary). 
This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2QF32 Vzip2QF32 //go:noescape func Vzip2QF32(r *arm.Float32X4, v0 *arm.Float32X4, v1 *arm.Float32X4) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2QF64 Vzip2QF64 //go:noescape func Vzip2QF64(r *arm.Float64X2, v0 *arm.Float64X2, v1 *arm.Float64X2) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2QP16 Vzip2QP16 //go:noescape func Vzip2QP16(r *arm.Poly16X8, v0 *arm.Poly16X8, v1 *arm.Poly16X8) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. 
// //go:linkname Vzip2QP64 Vzip2QP64 //go:noescape func Vzip2QP64(r *arm.Poly64X2, v0 *arm.Poly64X2, v1 *arm.Poly64X2) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2QP8 Vzip2QP8 //go:noescape func Vzip2QP8(r *arm.Poly8X16, v0 *arm.Poly8X16, v1 *arm.Poly8X16) // Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector. // //go:linkname VzipP16 VzipP16 //go:noescape func VzipP16(r *arm.Poly16X4X2, v0 *arm.Poly16X4, v1 *arm.Poly16X4) // Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector. // //go:linkname VzipP8 VzipP8 //go:noescape func VzipP8(r *arm.Poly8X8X2, v0 *arm.Poly8X8, v1 *arm.Poly8X8) // Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector. // //go:linkname VzipqS8 VzipqS8 //go:noescape func VzipqS8(r *arm.Int8X16X2, v0 *arm.Int8X16, v1 *arm.Int8X16) // Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector. // //go:linkname VzipqS16 VzipqS16 //go:noescape func VzipqS16(r *arm.Int16X8X2, v0 *arm.Int16X8, v1 *arm.Int16X8) // Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector. 
// //go:linkname VzipqS32 VzipqS32 //go:noescape func VzipqS32(r *arm.Int32X4X2, v0 *arm.Int32X4, v1 *arm.Int32X4) // Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector. // //go:linkname VzipqU8 VzipqU8 //go:noescape func VzipqU8(r *arm.Uint8X16X2, v0 *arm.Uint8X16, v1 *arm.Uint8X16) // Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector. // //go:linkname VzipqU16 VzipqU16 //go:noescape func VzipqU16(r *arm.Uint16X8X2, v0 *arm.Uint16X8, v1 *arm.Uint16X8) // Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector. // //go:linkname VzipqU32 VzipqU32 //go:noescape func VzipqU32(r *arm.Uint32X4X2, v0 *arm.Uint32X4, v1 *arm.Uint32X4) // Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector. // //go:linkname VzipqF32 VzipqF32 //go:noescape func VzipqF32(r *arm.Float32X4X2, v0 *arm.Float32X4, v1 *arm.Float32X4) // Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector. // //go:linkname VzipqP16 VzipqP16 //go:noescape func VzipqP16(r *arm.Poly16X8X2, v0 *arm.Poly16X8, v1 *arm.Poly16X8) // Interleave two vectors. This intrinsic reads corresponding elements from the two source vectors as pairs, interleaves the pairs, and returns the resulting interleaved vector. 
//
//go:linkname VzipqP8 VzipqP8
//go:noescape
func VzipqP8(r *arm.Poly8X16X2, v0 *arm.Poly8X16, v1 *arm.Poly8X16)

================================================
FILE: arm/neon/functions_bypass.go
================================================
package neon

/*
#include <arm_neon.h>

void vmulS8_bypass(int8x8_t* r, int8x8_t* v0, int8x8_t* v1) {
	*r = vmul_s8(*v0, *v1);
}

void vmulS8_full(int8_t* r, int8_t* v0, int8_t* v1, int n) {
	int8x8_t* pr = (int8x8_t*)r;
	int8x8_t* pa = (int8x8_t*)v0;
	int8x8_t* pb = (int8x8_t*)v1;
	for (int i=0; i
*/
import "C"

type int8x8 = C.int8x8_t

func vmulS8_cgo(r, v0, v1 *int8x8) {
	*r = C.vmul_s8(*v0, *v1)
}

================================================
FILE: arm/neon/functions_test.go
================================================
package neon

import (
	"math/rand"
	"reflect"
	"runtime"
	"testing"
	"unsafe"

	"github.com/alivanz/go-simd/arm"
)

func TestMult(t *testing.T) {
	var (
		a      = arm.Int8X8{0, 1, 2, 3, 4, 5, 6, 7}
		b      = arm.Int8X8{7, 6, 5, 4, 3, 2, 1, 0}
		r      = arm.Int8X8{0, 6, 10, 12, 12, 10, 6, 0}
		result arm.Int8X8
	)
	VmulS8(&result, &a, &b)
	if !reflect.DeepEqual(result, r) {
		t.Fatal(result)
	}
}

func TestMultFull(t *testing.T) {
	const N = 1024 * 16
	var (
		a      [N]int8
		b      [N]int8
		ref    [N]int8
		result [N]int8
	)
	for i := 0; i < N; i++ {
		a[i] = int8(rand.Int())
		b[i] = int8(rand.Int())
		ref[i] = a[i] * b[i]
	}
	vmulS8_full(&result[0], &a[0], &b[0], N)
	if !reflect.DeepEqual(result, ref) {
		t.Fail()
	}
}

func BenchmarkMultRef(t *testing.B) {
	const N = 1024 * 16
	var (
		a      [N]int8
		b      [N]int8
		result [N]int8
	)
	for j := range a[:] {
		a[j] = int8(rand.Int())
		b[j] = int8(rand.Int())
	}
	t.ResetTimer()
	for i := 0; i < t.N; i++ {
		for j := 0; j < N; j++ {
			result[j] = a[j] * b[j]
		}
	}
	runtime.KeepAlive(&result)
}

func BenchmarkMultSimd(t *testing.B) {
	const N = 1024 * 16
	var (
		a      [N]int8
		b      [N]int8
		result [N]int8
	)
	for i := 0; i < t.N; i++ {
		for j := 0; j < N; j += 8 {
			VmulS8(
				(*arm.Int8X8)(unsafe.Pointer(&result[j])),
				(*arm.Int8X8)(unsafe.Pointer(&a[j])),
				(*arm.Int8X8)(unsafe.Pointer(&b[j])),
			)
		}
	}
}

func BenchmarkMultSimdBypass(t *testing.B) {
	const N = 1024 * 16
	var (
		a      [N]int8
		b      [N]int8
		result [N]int8
	)
	for i := 0; i < t.N; i++ {
		for j := 0; j < N; j += 8 {
			vmulS8_bypass(
				(*arm.Int8X8)(unsafe.Pointer(&result[j])),
				(*arm.Int8X8)(unsafe.Pointer(&a[j])),
				(*arm.Int8X8)(unsafe.Pointer(&b[j])),
			)
		}
	}
}

func BenchmarkMultSimdFull(t *testing.B) {
	const N = 1024 * 16
	var (
		a      [N]int8
		b      [N]int8
		result [N]int8
	)
	for i := 0; i < t.N; i++ {
		vmulS8_full(
			&result[0],
			&a[0],
			&b[0],
			N,
		)
	}
}

func BenchmarkMultSimdCgo(t *testing.B) {
	const N = 1024 * 16
	var (
		a      [N]int8
		b      [N]int8
		result [N]int8
	)
	for i := 0; i < t.N; i++ {
		for j := 0; j < N; j += 8 {
			vmulS8_cgo(
				(*int8x8)(unsafe.Pointer(&result[j])),
				(*int8x8)(unsafe.Pointer(&a[j])),
				(*int8x8)(unsafe.Pointer(&b[j])),
			)
		}
	}
}

================================================
FILE: arm/neon/loops.c
================================================
#include <arm_neon.h>

#define save(dst, src) *dst = src
#define load(src) (*src)

#define LOOP1(name, rtype, itype, f, set, load, rstep, istep) \
    void name(rtype *r, itype *v, int32_t n) \
    { \
        while (n >= rstep) \
        { \
            set(r, f(load(v))); \
            r += rstep; \
            n -= rstep; \
            v += istep; \
        } \
    }

LOOP1(VabsS8N, int8_t, int8_t, vabs_s8, vst1_s8, vld1_s8, 8, 8)
LOOP1(VabsS16N, int16_t, int16_t, vabs_s16, vst1_s16, vld1_s16, 4, 4)
LOOP1(VabsS32N, int32_t, int32_t, vabs_s32, vst1_s32, vld1_s32, 2, 2)
LOOP1(VabsS64N, int64_t, int64_t, vabs_s64, vst1_s64, vld1_s64, 1, 1)
LOOP1(VabsF32N, float32_t, float32_t, vabs_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(VabsF64N, float64_t, float64_t, vabs_f64, vst1_f64, vld1_f64, 1, 1)
LOOP1(VabsdS64N, int64_t, int64_t, vabsd_s64, save, load, 1, 1)
LOOP1(VabsqS8N, int8_t, int8_t, vabsq_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP1(VabsqS16N, int16_t, int16_t, vabsq_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP1(VabsqS32N, int32_t, int32_t, vabsq_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP1(VabsqS64N, int64_t, int64_t, vabsq_s64, vst1q_s64, vld1q_s64, 2, 2)
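// Illustration only: a hand expansion of the LOOP1 macro defined earlier in this
// file, written out as a reading aid. It is not generated code.
// LOOP1(VabsS8N, int8_t, int8_t, vabs_s8, vst1_s8, vld1_s8, 8, 8) expands to roughly:
//
//   void VabsS8N(int8_t *r, int8_t *v, int32_t n)
//   {
//       while (n >= 8)
//       {
//           vst1_s8(r, vabs_s8(vld1_s8(v)));
//           r += 8;
//           n -= 8;
//           v += 8;
//       }
//   }
//
// Reading the macro, n appears to count result elements: each pass writes rstep
// results and consumes istep inputs, and a tail of fewer than rstep results is
// left unwritten. For a reduction such as LOOP1(VaddvS8N, ..., 1, 8), each
// result therefore consumes 8 inputs.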
LOOP1(VabsqF32N, float32_t, float32_t, vabsq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP1(VabsqF64N, float64_t, float64_t, vabsq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP1(VaddvS8N, int8_t, int8_t, vaddv_s8, save, vld1_s8, 1, 8) LOOP1(VaddvS16N, int16_t, int16_t, vaddv_s16, save, vld1_s16, 1, 4) LOOP1(VaddvS32N, int32_t, int32_t, vaddv_s32, save, vld1_s32, 1, 2) LOOP1(VaddvU8N, uint8_t, uint8_t, vaddv_u8, save, vld1_u8, 1, 8) LOOP1(VaddvU16N, uint16_t, uint16_t, vaddv_u16, save, vld1_u16, 1, 4) LOOP1(VaddvU32N, uint32_t, uint32_t, vaddv_u32, save, vld1_u32, 1, 2) LOOP1(VaddvF32N, float32_t, float32_t, vaddv_f32, save, vld1_f32, 1, 2) LOOP1(VaddvqS8N, int8_t, int8_t, vaddvq_s8, save, vld1q_s8, 1, 16) LOOP1(VaddvqS16N, int16_t, int16_t, vaddvq_s16, save, vld1q_s16, 1, 8) LOOP1(VaddvqS32N, int32_t, int32_t, vaddvq_s32, save, vld1q_s32, 1, 4) LOOP1(VaddvqS64N, int64_t, int64_t, vaddvq_s64, save, vld1q_s64, 1, 2) LOOP1(VaddvqU8N, uint8_t, uint8_t, vaddvq_u8, save, vld1q_u8, 1, 16) LOOP1(VaddvqU16N, uint16_t, uint16_t, vaddvq_u16, save, vld1q_u16, 1, 8) LOOP1(VaddvqU32N, uint32_t, uint32_t, vaddvq_u32, save, vld1q_u32, 1, 4) LOOP1(VaddvqU64N, uint64_t, uint64_t, vaddvq_u64, save, vld1q_u64, 1, 2) LOOP1(VaddvqF32N, float32_t, float32_t, vaddvq_f32, save, vld1q_f32, 1, 4) LOOP1(VaddvqF64N, float64_t, float64_t, vaddvq_f64, save, vld1q_f64, 1, 2) LOOP1(VaesimcqU8N, uint8_t, uint8_t, vaesimcq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP1(VaesmcqU8N, uint8_t, uint8_t, vaesmcq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP1(VceqzS8N, uint8_t, int8_t, vceqz_s8, vst1_u8, vld1_s8, 8, 8) LOOP1(VceqzS16N, uint16_t, int16_t, vceqz_s16, vst1_u16, vld1_s16, 4, 4) LOOP1(VceqzS32N, uint32_t, int32_t, vceqz_s32, vst1_u32, vld1_s32, 2, 2) LOOP1(VceqzS64N, uint64_t, int64_t, vceqz_s64, vst1_u64, vld1_s64, 1, 1) LOOP1(VceqzU8N, uint8_t, uint8_t, vceqz_u8, vst1_u8, vld1_u8, 8, 8) LOOP1(VceqzU16N, uint16_t, uint16_t, vceqz_u16, vst1_u16, vld1_u16, 4, 4) LOOP1(VceqzU32N, uint32_t, uint32_t, vceqz_u32, vst1_u32, vld1_u32, 
2, 2) LOOP1(VceqzU64N, uint64_t, uint64_t, vceqz_u64, vst1_u64, vld1_u64, 1, 1) LOOP1(VceqzF32N, uint32_t, float32_t, vceqz_f32, vst1_u32, vld1_f32, 2, 2) LOOP1(VceqzF64N, uint64_t, float64_t, vceqz_f64, vst1_u64, vld1_f64, 1, 1) LOOP1(VceqzdS64N, uint64_t, int64_t, vceqzd_s64, save, load, 1, 1) LOOP1(VceqzdU64N, uint64_t, uint64_t, vceqzd_u64, save, load, 1, 1) LOOP1(VceqzdF64N, uint64_t, float64_t, vceqzd_f64, save, load, 1, 1) LOOP1(VceqzqS8N, uint8_t, int8_t, vceqzq_s8, vst1q_u8, vld1q_s8, 16, 16) LOOP1(VceqzqS16N, uint16_t, int16_t, vceqzq_s16, vst1q_u16, vld1q_s16, 8, 8) LOOP1(VceqzqS32N, uint32_t, int32_t, vceqzq_s32, vst1q_u32, vld1q_s32, 4, 4) LOOP1(VceqzqS64N, uint64_t, int64_t, vceqzq_s64, vst1q_u64, vld1q_s64, 2, 2) LOOP1(VceqzqU8N, uint8_t, uint8_t, vceqzq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP1(VceqzqU16N, uint16_t, uint16_t, vceqzq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP1(VceqzqU32N, uint32_t, uint32_t, vceqzq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP1(VceqzqU64N, uint64_t, uint64_t, vceqzq_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP1(VceqzqF32N, uint32_t, float32_t, vceqzq_f32, vst1q_u32, vld1q_f32, 4, 4) LOOP1(VceqzqF64N, uint64_t, float64_t, vceqzq_f64, vst1q_u64, vld1q_f64, 2, 2) LOOP1(VceqzsF32N, uint32_t, float32_t, vceqzs_f32, save, load, 1, 1) LOOP1(VcgezS8N, uint8_t, int8_t, vcgez_s8, vst1_u8, vld1_s8, 8, 8) LOOP1(VcgezS16N, uint16_t, int16_t, vcgez_s16, vst1_u16, vld1_s16, 4, 4) LOOP1(VcgezS32N, uint32_t, int32_t, vcgez_s32, vst1_u32, vld1_s32, 2, 2) LOOP1(VcgezS64N, uint64_t, int64_t, vcgez_s64, vst1_u64, vld1_s64, 1, 1) LOOP1(VcgezF32N, uint32_t, float32_t, vcgez_f32, vst1_u32, vld1_f32, 2, 2) LOOP1(VcgezF64N, uint64_t, float64_t, vcgez_f64, vst1_u64, vld1_f64, 1, 1) LOOP1(VcgezdS64N, uint64_t, int64_t, vcgezd_s64, save, load, 1, 1) LOOP1(VcgezdF64N, uint64_t, float64_t, vcgezd_f64, save, load, 1, 1) LOOP1(VcgezqS8N, uint8_t, int8_t, vcgezq_s8, vst1q_u8, vld1q_s8, 16, 16) LOOP1(VcgezqS16N, uint16_t, int16_t, vcgezq_s16, vst1q_u16, vld1q_s16, 8, 8) 
LOOP1(VcgezqS32N, uint32_t, int32_t, vcgezq_s32, vst1q_u32, vld1q_s32, 4, 4) LOOP1(VcgezqS64N, uint64_t, int64_t, vcgezq_s64, vst1q_u64, vld1q_s64, 2, 2) LOOP1(VcgezqF32N, uint32_t, float32_t, vcgezq_f32, vst1q_u32, vld1q_f32, 4, 4) LOOP1(VcgezqF64N, uint64_t, float64_t, vcgezq_f64, vst1q_u64, vld1q_f64, 2, 2) LOOP1(VcgezsF32N, uint32_t, float32_t, vcgezs_f32, save, load, 1, 1) LOOP1(VcgtzS8N, uint8_t, int8_t, vcgtz_s8, vst1_u8, vld1_s8, 8, 8) LOOP1(VcgtzS16N, uint16_t, int16_t, vcgtz_s16, vst1_u16, vld1_s16, 4, 4) LOOP1(VcgtzS32N, uint32_t, int32_t, vcgtz_s32, vst1_u32, vld1_s32, 2, 2) LOOP1(VcgtzS64N, uint64_t, int64_t, vcgtz_s64, vst1_u64, vld1_s64, 1, 1) LOOP1(VcgtzF32N, uint32_t, float32_t, vcgtz_f32, vst1_u32, vld1_f32, 2, 2) LOOP1(VcgtzF64N, uint64_t, float64_t, vcgtz_f64, vst1_u64, vld1_f64, 1, 1) LOOP1(VcgtzdS64N, uint64_t, int64_t, vcgtzd_s64, save, load, 1, 1) LOOP1(VcgtzdF64N, uint64_t, float64_t, vcgtzd_f64, save, load, 1, 1) LOOP1(VcgtzqS8N, uint8_t, int8_t, vcgtzq_s8, vst1q_u8, vld1q_s8, 16, 16) LOOP1(VcgtzqS16N, uint16_t, int16_t, vcgtzq_s16, vst1q_u16, vld1q_s16, 8, 8) LOOP1(VcgtzqS32N, uint32_t, int32_t, vcgtzq_s32, vst1q_u32, vld1q_s32, 4, 4) LOOP1(VcgtzqS64N, uint64_t, int64_t, vcgtzq_s64, vst1q_u64, vld1q_s64, 2, 2) LOOP1(VcgtzqF32N, uint32_t, float32_t, vcgtzq_f32, vst1q_u32, vld1q_f32, 4, 4) LOOP1(VcgtzqF64N, uint64_t, float64_t, vcgtzq_f64, vst1q_u64, vld1q_f64, 2, 2) LOOP1(VcgtzsF32N, uint32_t, float32_t, vcgtzs_f32, save, load, 1, 1) LOOP1(VclezS8N, uint8_t, int8_t, vclez_s8, vst1_u8, vld1_s8, 8, 8) LOOP1(VclezS16N, uint16_t, int16_t, vclez_s16, vst1_u16, vld1_s16, 4, 4) LOOP1(VclezS32N, uint32_t, int32_t, vclez_s32, vst1_u32, vld1_s32, 2, 2) LOOP1(VclezS64N, uint64_t, int64_t, vclez_s64, vst1_u64, vld1_s64, 1, 1) LOOP1(VclezF32N, uint32_t, float32_t, vclez_f32, vst1_u32, vld1_f32, 2, 2) LOOP1(VclezF64N, uint64_t, float64_t, vclez_f64, vst1_u64, vld1_f64, 1, 1) LOOP1(VclezdS64N, uint64_t, int64_t, vclezd_s64, save, load, 1, 1) 
LOOP1(VclezdF64N, uint64_t, float64_t, vclezd_f64, save, load, 1, 1) LOOP1(VclezqS8N, uint8_t, int8_t, vclezq_s8, vst1q_u8, vld1q_s8, 16, 16) LOOP1(VclezqS16N, uint16_t, int16_t, vclezq_s16, vst1q_u16, vld1q_s16, 8, 8) LOOP1(VclezqS32N, uint32_t, int32_t, vclezq_s32, vst1q_u32, vld1q_s32, 4, 4) LOOP1(VclezqS64N, uint64_t, int64_t, vclezq_s64, vst1q_u64, vld1q_s64, 2, 2) LOOP1(VclezqF32N, uint32_t, float32_t, vclezq_f32, vst1q_u32, vld1q_f32, 4, 4) LOOP1(VclezqF64N, uint64_t, float64_t, vclezq_f64, vst1q_u64, vld1q_f64, 2, 2) LOOP1(VclezsF32N, uint32_t, float32_t, vclezs_f32, save, load, 1, 1) LOOP1(VclsS8N, int8_t, int8_t, vcls_s8, vst1_s8, vld1_s8, 8, 8) LOOP1(VclsS16N, int16_t, int16_t, vcls_s16, vst1_s16, vld1_s16, 4, 4) LOOP1(VclsS32N, int32_t, int32_t, vcls_s32, vst1_s32, vld1_s32, 2, 2) LOOP1(VclsU8N, int8_t, uint8_t, vcls_u8, vst1_s8, vld1_u8, 8, 8) LOOP1(VclsU16N, int16_t, uint16_t, vcls_u16, vst1_s16, vld1_u16, 4, 4) LOOP1(VclsU32N, int32_t, uint32_t, vcls_u32, vst1_s32, vld1_u32, 2, 2) LOOP1(VclsqS8N, int8_t, int8_t, vclsq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP1(VclsqS16N, int16_t, int16_t, vclsq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP1(VclsqS32N, int32_t, int32_t, vclsq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP1(VclsqU8N, int8_t, uint8_t, vclsq_u8, vst1q_s8, vld1q_u8, 16, 16) LOOP1(VclsqU16N, int16_t, uint16_t, vclsq_u16, vst1q_s16, vld1q_u16, 8, 8) LOOP1(VclsqU32N, int32_t, uint32_t, vclsq_u32, vst1q_s32, vld1q_u32, 4, 4) LOOP1(VcltzS8N, uint8_t, int8_t, vcltz_s8, vst1_u8, vld1_s8, 8, 8) LOOP1(VcltzS16N, uint16_t, int16_t, vcltz_s16, vst1_u16, vld1_s16, 4, 4) LOOP1(VcltzS32N, uint32_t, int32_t, vcltz_s32, vst1_u32, vld1_s32, 2, 2) LOOP1(VcltzS64N, uint64_t, int64_t, vcltz_s64, vst1_u64, vld1_s64, 1, 1) LOOP1(VcltzF32N, uint32_t, float32_t, vcltz_f32, vst1_u32, vld1_f32, 2, 2) LOOP1(VcltzF64N, uint64_t, float64_t, vcltz_f64, vst1_u64, vld1_f64, 1, 1) LOOP1(VcltzdS64N, uint64_t, int64_t, vcltzd_s64, save, load, 1, 1) LOOP1(VcltzdF64N, uint64_t, float64_t, 
vcltzd_f64, save, load, 1, 1) LOOP1(VcltzqS8N, uint8_t, int8_t, vcltzq_s8, vst1q_u8, vld1q_s8, 16, 16) LOOP1(VcltzqS16N, uint16_t, int16_t, vcltzq_s16, vst1q_u16, vld1q_s16, 8, 8) LOOP1(VcltzqS32N, uint32_t, int32_t, vcltzq_s32, vst1q_u32, vld1q_s32, 4, 4) LOOP1(VcltzqS64N, uint64_t, int64_t, vcltzq_s64, vst1q_u64, vld1q_s64, 2, 2) LOOP1(VcltzqF32N, uint32_t, float32_t, vcltzq_f32, vst1q_u32, vld1q_f32, 4, 4) LOOP1(VcltzqF64N, uint64_t, float64_t, vcltzq_f64, vst1q_u64, vld1q_f64, 2, 2) LOOP1(VcltzsF32N, uint32_t, float32_t, vcltzs_f32, save, load, 1, 1) LOOP1(VclzS8N, int8_t, int8_t, vclz_s8, vst1_s8, vld1_s8, 8, 8) LOOP1(VclzS16N, int16_t, int16_t, vclz_s16, vst1_s16, vld1_s16, 4, 4) LOOP1(VclzS32N, int32_t, int32_t, vclz_s32, vst1_s32, vld1_s32, 2, 2) LOOP1(VclzU8N, uint8_t, uint8_t, vclz_u8, vst1_u8, vld1_u8, 8, 8) LOOP1(VclzU16N, uint16_t, uint16_t, vclz_u16, vst1_u16, vld1_u16, 4, 4) LOOP1(VclzU32N, uint32_t, uint32_t, vclz_u32, vst1_u32, vld1_u32, 2, 2) LOOP1(VclzqS8N, int8_t, int8_t, vclzq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP1(VclzqS16N, int16_t, int16_t, vclzq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP1(VclzqS32N, int32_t, int32_t, vclzq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP1(VclzqU8N, uint8_t, uint8_t, vclzq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP1(VclzqU16N, uint16_t, uint16_t, vclzq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP1(VclzqU32N, uint32_t, uint32_t, vclzq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP1(VcntS8N, int8_t, int8_t, vcnt_s8, vst1_s8, vld1_s8, 8, 8) LOOP1(VcntU8N, uint8_t, uint8_t, vcnt_u8, vst1_u8, vld1_u8, 8, 8) LOOP1(VcntqS8N, int8_t, int8_t, vcntq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP1(VcntqU8N, uint8_t, uint8_t, vcntq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP1(VcvtF32S32N, float32_t, int32_t, vcvt_f32_s32, vst1_f32, vld1_s32, 2, 2) LOOP1(VcvtF32U32N, float32_t, uint32_t, vcvt_f32_u32, vst1_f32, vld1_u32, 2, 2) LOOP1(VcvtF64S64N, float64_t, int64_t, vcvt_f64_s64, vst1_f64, vld1_s64, 1, 1) LOOP1(VcvtF64U64N, float64_t, uint64_t, vcvt_f64_u64, vst1_f64, 
vld1_u64, 1, 1) LOOP1(VcvtS32F32N, int32_t, float32_t, vcvt_s32_f32, vst1_s32, vld1_f32, 2, 2) LOOP1(VcvtS64F64N, int64_t, float64_t, vcvt_s64_f64, vst1_s64, vld1_f64, 1, 1) LOOP1(VcvtU32F32N, uint32_t, float32_t, vcvt_u32_f32, vst1_u32, vld1_f32, 2, 2) LOOP1(VcvtU64F64N, uint64_t, float64_t, vcvt_u64_f64, vst1_u64, vld1_f64, 1, 1) LOOP1(VcvtaS32F32N, int32_t, float32_t, vcvta_s32_f32, vst1_s32, vld1_f32, 2, 2) LOOP1(VcvtaS64F64N, int64_t, float64_t, vcvta_s64_f64, vst1_s64, vld1_f64, 1, 1) LOOP1(VcvtaU32F32N, uint32_t, float32_t, vcvta_u32_f32, vst1_u32, vld1_f32, 2, 2) LOOP1(VcvtaU64F64N, uint64_t, float64_t, vcvta_u64_f64, vst1_u64, vld1_f64, 1, 1) LOOP1(VcvtadS64F64N, int64_t, float64_t, vcvtad_s64_f64, save, load, 1, 1) LOOP1(VcvtadU64F64N, uint64_t, float64_t, vcvtad_u64_f64, save, load, 1, 1) LOOP1(VcvtaqS32F32N, int32_t, float32_t, vcvtaq_s32_f32, vst1q_s32, vld1q_f32, 4, 4) LOOP1(VcvtaqS64F64N, int64_t, float64_t, vcvtaq_s64_f64, vst1q_s64, vld1q_f64, 2, 2) LOOP1(VcvtaqU32F32N, uint32_t, float32_t, vcvtaq_u32_f32, vst1q_u32, vld1q_f32, 4, 4) LOOP1(VcvtaqU64F64N, uint64_t, float64_t, vcvtaq_u64_f64, vst1q_u64, vld1q_f64, 2, 2) LOOP1(VcvtasS32F32N, int32_t, float32_t, vcvtas_s32_f32, save, load, 1, 1) LOOP1(VcvtasU32F32N, uint32_t, float32_t, vcvtas_u32_f32, save, load, 1, 1) LOOP1(VcvtdF64S64N, float64_t, int64_t, vcvtd_f64_s64, save, load, 1, 1) LOOP1(VcvtdF64U64N, float64_t, uint64_t, vcvtd_f64_u64, save, load, 1, 1) LOOP1(VcvtdS64F64N, int64_t, float64_t, vcvtd_s64_f64, save, load, 1, 1) LOOP1(VcvtdU64F64N, uint64_t, float64_t, vcvtd_u64_f64, save, load, 1, 1) LOOP1(VcvtmS32F32N, int32_t, float32_t, vcvtm_s32_f32, vst1_s32, vld1_f32, 2, 2) LOOP1(VcvtmS64F64N, int64_t, float64_t, vcvtm_s64_f64, vst1_s64, vld1_f64, 1, 1) LOOP1(VcvtmU32F32N, uint32_t, float32_t, vcvtm_u32_f32, vst1_u32, vld1_f32, 2, 2) LOOP1(VcvtmU64F64N, uint64_t, float64_t, vcvtm_u64_f64, vst1_u64, vld1_f64, 1, 1) LOOP1(VcvtmdS64F64N, int64_t, float64_t, vcvtmd_s64_f64, save, load, 1, 1) 
LOOP1(VcvtmdU64F64N, uint64_t, float64_t, vcvtmd_u64_f64, save, load, 1, 1) LOOP1(VcvtmqS32F32N, int32_t, float32_t, vcvtmq_s32_f32, vst1q_s32, vld1q_f32, 4, 4) LOOP1(VcvtmqS64F64N, int64_t, float64_t, vcvtmq_s64_f64, vst1q_s64, vld1q_f64, 2, 2) LOOP1(VcvtmqU32F32N, uint32_t, float32_t, vcvtmq_u32_f32, vst1q_u32, vld1q_f32, 4, 4) LOOP1(VcvtmqU64F64N, uint64_t, float64_t, vcvtmq_u64_f64, vst1q_u64, vld1q_f64, 2, 2) LOOP1(VcvtmsS32F32N, int32_t, float32_t, vcvtms_s32_f32, save, load, 1, 1) LOOP1(VcvtmsU32F32N, uint32_t, float32_t, vcvtms_u32_f32, save, load, 1, 1) LOOP1(VcvtnS32F32N, int32_t, float32_t, vcvtn_s32_f32, vst1_s32, vld1_f32, 2, 2) LOOP1(VcvtnS64F64N, int64_t, float64_t, vcvtn_s64_f64, vst1_s64, vld1_f64, 1, 1) LOOP1(VcvtnU32F32N, uint32_t, float32_t, vcvtn_u32_f32, vst1_u32, vld1_f32, 2, 2) LOOP1(VcvtnU64F64N, uint64_t, float64_t, vcvtn_u64_f64, vst1_u64, vld1_f64, 1, 1) LOOP1(VcvtndS64F64N, int64_t, float64_t, vcvtnd_s64_f64, save, load, 1, 1) LOOP1(VcvtndU64F64N, uint64_t, float64_t, vcvtnd_u64_f64, save, load, 1, 1) LOOP1(VcvtnqS32F32N, int32_t, float32_t, vcvtnq_s32_f32, vst1q_s32, vld1q_f32, 4, 4) LOOP1(VcvtnqS64F64N, int64_t, float64_t, vcvtnq_s64_f64, vst1q_s64, vld1q_f64, 2, 2) LOOP1(VcvtnqU32F32N, uint32_t, float32_t, vcvtnq_u32_f32, vst1q_u32, vld1q_f32, 4, 4) LOOP1(VcvtnqU64F64N, uint64_t, float64_t, vcvtnq_u64_f64, vst1q_u64, vld1q_f64, 2, 2) LOOP1(VcvtnsS32F32N, int32_t, float32_t, vcvtns_s32_f32, save, load, 1, 1) LOOP1(VcvtnsU32F32N, uint32_t, float32_t, vcvtns_u32_f32, save, load, 1, 1) LOOP1(VcvtpS32F32N, int32_t, float32_t, vcvtp_s32_f32, vst1_s32, vld1_f32, 2, 2) LOOP1(VcvtpS64F64N, int64_t, float64_t, vcvtp_s64_f64, vst1_s64, vld1_f64, 1, 1) LOOP1(VcvtpU32F32N, uint32_t, float32_t, vcvtp_u32_f32, vst1_u32, vld1_f32, 2, 2) LOOP1(VcvtpU64F64N, uint64_t, float64_t, vcvtp_u64_f64, vst1_u64, vld1_f64, 1, 1) LOOP1(VcvtpdS64F64N, int64_t, float64_t, vcvtpd_s64_f64, save, load, 1, 1) LOOP1(VcvtpdU64F64N, uint64_t, float64_t, vcvtpd_u64_f64, 
save, load, 1, 1) LOOP1(VcvtpqS32F32N, int32_t, float32_t, vcvtpq_s32_f32, vst1q_s32, vld1q_f32, 4, 4) LOOP1(VcvtpqS64F64N, int64_t, float64_t, vcvtpq_s64_f64, vst1q_s64, vld1q_f64, 2, 2) LOOP1(VcvtpqU32F32N, uint32_t, float32_t, vcvtpq_u32_f32, vst1q_u32, vld1q_f32, 4, 4) LOOP1(VcvtpqU64F64N, uint64_t, float64_t, vcvtpq_u64_f64, vst1q_u64, vld1q_f64, 2, 2) LOOP1(VcvtpsS32F32N, int32_t, float32_t, vcvtps_s32_f32, save, load, 1, 1) LOOP1(VcvtpsU32F32N, uint32_t, float32_t, vcvtps_u32_f32, save, load, 1, 1) LOOP1(VcvtqF32S32N, float32_t, int32_t, vcvtq_f32_s32, vst1q_f32, vld1q_s32, 4, 4) LOOP1(VcvtqF32U32N, float32_t, uint32_t, vcvtq_f32_u32, vst1q_f32, vld1q_u32, 4, 4) LOOP1(VcvtqF64S64N, float64_t, int64_t, vcvtq_f64_s64, vst1q_f64, vld1q_s64, 2, 2) LOOP1(VcvtqF64U64N, float64_t, uint64_t, vcvtq_f64_u64, vst1q_f64, vld1q_u64, 2, 2) LOOP1(VcvtqS32F32N, int32_t, float32_t, vcvtq_s32_f32, vst1q_s32, vld1q_f32, 4, 4) LOOP1(VcvtqS64F64N, int64_t, float64_t, vcvtq_s64_f64, vst1q_s64, vld1q_f64, 2, 2) LOOP1(VcvtqU32F32N, uint32_t, float32_t, vcvtq_u32_f32, vst1q_u32, vld1q_f32, 4, 4) LOOP1(VcvtqU64F64N, uint64_t, float64_t, vcvtq_u64_f64, vst1q_u64, vld1q_f64, 2, 2) LOOP1(VcvtsF32S32N, float32_t, int32_t, vcvts_f32_s32, save, load, 1, 1) LOOP1(VcvtsF32U32N, float32_t, uint32_t, vcvts_f32_u32, save, load, 1, 1) LOOP1(VcvtsS32F32N, int32_t, float32_t, vcvts_s32_f32, save, load, 1, 1) LOOP1(VcvtsU32F32N, uint32_t, float32_t, vcvts_u32_f32, save, load, 1, 1) LOOP1(VdupNS8N, int8_t, int8_t, vdup_n_s8, vst1_s8, load, 8, 1) LOOP1(VdupNS16N, int16_t, int16_t, vdup_n_s16, vst1_s16, load, 4, 1) LOOP1(VdupNS32N, int32_t, int32_t, vdup_n_s32, vst1_s32, load, 2, 1) LOOP1(VdupNS64N, int64_t, int64_t, vdup_n_s64, vst1_s64, load, 1, 1) LOOP1(VdupNU8N, uint8_t, uint8_t, vdup_n_u8, vst1_u8, load, 8, 1) LOOP1(VdupNU16N, uint16_t, uint16_t, vdup_n_u16, vst1_u16, load, 4, 1) LOOP1(VdupNU32N, uint32_t, uint32_t, vdup_n_u32, vst1_u32, load, 2, 1) LOOP1(VdupNU64N, uint64_t, uint64_t, 
vdup_n_u64, vst1_u64, load, 1, 1) LOOP1(VdupNF32N, float32_t, float32_t, vdup_n_f32, vst1_f32, load, 2, 1) LOOP1(VdupNF64N, float64_t, float64_t, vdup_n_f64, vst1_f64, load, 1, 1) LOOP1(VdupqNS8N, int8_t, int8_t, vdupq_n_s8, vst1q_s8, load, 16, 1) LOOP1(VdupqNS16N, int16_t, int16_t, vdupq_n_s16, vst1q_s16, load, 8, 1) LOOP1(VdupqNS32N, int32_t, int32_t, vdupq_n_s32, vst1q_s32, load, 4, 1) LOOP1(VdupqNS64N, int64_t, int64_t, vdupq_n_s64, vst1q_s64, load, 2, 1) LOOP1(VdupqNU8N, uint8_t, uint8_t, vdupq_n_u8, vst1q_u8, load, 16, 1) LOOP1(VdupqNU16N, uint16_t, uint16_t, vdupq_n_u16, vst1q_u16, load, 8, 1) LOOP1(VdupqNU32N, uint32_t, uint32_t, vdupq_n_u32, vst1q_u32, load, 4, 1) LOOP1(VdupqNU64N, uint64_t, uint64_t, vdupq_n_u64, vst1q_u64, load, 2, 1) LOOP1(VdupqNF32N, float32_t, float32_t, vdupq_n_f32, vst1q_f32, load, 4, 1) LOOP1(VdupqNF64N, float64_t, float64_t, vdupq_n_f64, vst1q_f64, load, 2, 1) LOOP1(VgetHighS8N, int8_t, int8_t, vget_high_s8, vst1_s8, vld1q_s8, 8, 16) LOOP1(VgetHighS16N, int16_t, int16_t, vget_high_s16, vst1_s16, vld1q_s16, 4, 8) LOOP1(VgetHighS32N, int32_t, int32_t, vget_high_s32, vst1_s32, vld1q_s32, 2, 4) LOOP1(VgetHighS64N, int64_t, int64_t, vget_high_s64, vst1_s64, vld1q_s64, 1, 2) LOOP1(VgetHighU8N, uint8_t, uint8_t, vget_high_u8, vst1_u8, vld1q_u8, 8, 16) LOOP1(VgetHighU16N, uint16_t, uint16_t, vget_high_u16, vst1_u16, vld1q_u16, 4, 8) LOOP1(VgetHighU32N, uint32_t, uint32_t, vget_high_u32, vst1_u32, vld1q_u32, 2, 4) LOOP1(VgetHighU64N, uint64_t, uint64_t, vget_high_u64, vst1_u64, vld1q_u64, 1, 2) LOOP1(VgetHighF32N, float32_t, float32_t, vget_high_f32, vst1_f32, vld1q_f32, 2, 4) LOOP1(VgetHighF64N, float64_t, float64_t, vget_high_f64, vst1_f64, vld1q_f64, 1, 2) LOOP1(VgetLowS8N, int8_t, int8_t, vget_low_s8, vst1_s8, vld1q_s8, 8, 16) LOOP1(VgetLowS16N, int16_t, int16_t, vget_low_s16, vst1_s16, vld1q_s16, 4, 8) LOOP1(VgetLowS32N, int32_t, int32_t, vget_low_s32, vst1_s32, vld1q_s32, 2, 4) LOOP1(VgetLowS64N, int64_t, int64_t, vget_low_s64, 
vst1_s64, vld1q_s64, 1, 2) LOOP1(VgetLowU8N, uint8_t, uint8_t, vget_low_u8, vst1_u8, vld1q_u8, 8, 16) LOOP1(VgetLowU16N, uint16_t, uint16_t, vget_low_u16, vst1_u16, vld1q_u16, 4, 8) LOOP1(VgetLowU32N, uint32_t, uint32_t, vget_low_u32, vst1_u32, vld1q_u32, 2, 4) LOOP1(VgetLowU64N, uint64_t, uint64_t, vget_low_u64, vst1_u64, vld1q_u64, 1, 2) LOOP1(VgetLowF32N, float32_t, float32_t, vget_low_f32, vst1_f32, vld1q_f32, 2, 4) LOOP1(VgetLowF64N, float64_t, float64_t, vget_low_f64, vst1_f64, vld1q_f64, 1, 2) LOOP1(VmaxnmvF32N, float32_t, float32_t, vmaxnmv_f32, save, vld1_f32, 1, 2) LOOP1(VmaxnmvqF32N, float32_t, float32_t, vmaxnmvq_f32, save, vld1q_f32, 1, 4) LOOP1(VmaxnmvqF64N, float64_t, float64_t, vmaxnmvq_f64, save, vld1q_f64, 1, 2) LOOP1(VmaxvS8N, int8_t, int8_t, vmaxv_s8, save, vld1_s8, 1, 8) LOOP1(VmaxvS16N, int16_t, int16_t, vmaxv_s16, save, vld1_s16, 1, 4) LOOP1(VmaxvS32N, int32_t, int32_t, vmaxv_s32, save, vld1_s32, 1, 2) LOOP1(VmaxvU8N, uint8_t, uint8_t, vmaxv_u8, save, vld1_u8, 1, 8) LOOP1(VmaxvU16N, uint16_t, uint16_t, vmaxv_u16, save, vld1_u16, 1, 4) LOOP1(VmaxvU32N, uint32_t, uint32_t, vmaxv_u32, save, vld1_u32, 1, 2) LOOP1(VmaxvF32N, float32_t, float32_t, vmaxv_f32, save, vld1_f32, 1, 2) LOOP1(VmaxvqS8N, int8_t, int8_t, vmaxvq_s8, save, vld1q_s8, 1, 16) LOOP1(VmaxvqS16N, int16_t, int16_t, vmaxvq_s16, save, vld1q_s16, 1, 8) LOOP1(VmaxvqS32N, int32_t, int32_t, vmaxvq_s32, save, vld1q_s32, 1, 4) LOOP1(VmaxvqU8N, uint8_t, uint8_t, vmaxvq_u8, save, vld1q_u8, 1, 16) LOOP1(VmaxvqU16N, uint16_t, uint16_t, vmaxvq_u16, save, vld1q_u16, 1, 8) LOOP1(VmaxvqU32N, uint32_t, uint32_t, vmaxvq_u32, save, vld1q_u32, 1, 4) LOOP1(VmaxvqF32N, float32_t, float32_t, vmaxvq_f32, save, vld1q_f32, 1, 4) LOOP1(VmaxvqF64N, float64_t, float64_t, vmaxvq_f64, save, vld1q_f64, 1, 2) LOOP1(VminnmvF32N, float32_t, float32_t, vminnmv_f32, save, vld1_f32, 1, 2) LOOP1(VminnmvqF32N, float32_t, float32_t, vminnmvq_f32, save, vld1q_f32, 1, 4) LOOP1(VminnmvqF64N, float64_t, float64_t, 
vminnmvq_f64, save, vld1q_f64, 1, 2) LOOP1(VminvS8N, int8_t, int8_t, vminv_s8, save, vld1_s8, 1, 8) LOOP1(VminvS16N, int16_t, int16_t, vminv_s16, save, vld1_s16, 1, 4) LOOP1(VminvS32N, int32_t, int32_t, vminv_s32, save, vld1_s32, 1, 2) LOOP1(VminvU8N, uint8_t, uint8_t, vminv_u8, save, vld1_u8, 1, 8) LOOP1(VminvU16N, uint16_t, uint16_t, vminv_u16, save, vld1_u16, 1, 4) LOOP1(VminvU32N, uint32_t, uint32_t, vminv_u32, save, vld1_u32, 1, 2) LOOP1(VminvF32N, float32_t, float32_t, vminv_f32, save, vld1_f32, 1, 2) LOOP1(VminvqS8N, int8_t, int8_t, vminvq_s8, save, vld1q_s8, 1, 16) LOOP1(VminvqS16N, int16_t, int16_t, vminvq_s16, save, vld1q_s16, 1, 8) LOOP1(VminvqS32N, int32_t, int32_t, vminvq_s32, save, vld1q_s32, 1, 4) LOOP1(VminvqU8N, uint8_t, uint8_t, vminvq_u8, save, vld1q_u8, 1, 16) LOOP1(VminvqU16N, uint16_t, uint16_t, vminvq_u16, save, vld1q_u16, 1, 8) LOOP1(VminvqU32N, uint32_t, uint32_t, vminvq_u32, save, vld1q_u32, 1, 4) LOOP1(VminvqF32N, float32_t, float32_t, vminvq_f32, save, vld1q_f32, 1, 4) LOOP1(VminvqF64N, float64_t, float64_t, vminvq_f64, save, vld1q_f64, 1, 2) LOOP1(VmovNS8N, int8_t, int8_t, vmov_n_s8, vst1_s8, load, 8, 1) LOOP1(VmovNS16N, int16_t, int16_t, vmov_n_s16, vst1_s16, load, 4, 1) LOOP1(VmovNS32N, int32_t, int32_t, vmov_n_s32, vst1_s32, load, 2, 1) LOOP1(VmovNS64N, int64_t, int64_t, vmov_n_s64, vst1_s64, load, 1, 1) LOOP1(VmovNU8N, uint8_t, uint8_t, vmov_n_u8, vst1_u8, load, 8, 1) LOOP1(VmovNU16N, uint16_t, uint16_t, vmov_n_u16, vst1_u16, load, 4, 1) LOOP1(VmovNU32N, uint32_t, uint32_t, vmov_n_u32, vst1_u32, load, 2, 1) LOOP1(VmovNU64N, uint64_t, uint64_t, vmov_n_u64, vst1_u64, load, 1, 1) LOOP1(VmovNF32N, float32_t, float32_t, vmov_n_f32, vst1_f32, load, 2, 1) LOOP1(VmovNF64N, float64_t, float64_t, vmov_n_f64, vst1_f64, load, 1, 1) LOOP1(VmovqNS8N, int8_t, int8_t, vmovq_n_s8, vst1q_s8, load, 16, 1) LOOP1(VmovqNS16N, int16_t, int16_t, vmovq_n_s16, vst1q_s16, load, 8, 1) LOOP1(VmovqNS32N, int32_t, int32_t, vmovq_n_s32, vst1q_s32, load, 4, 1) 
LOOP1(VmovqNS64N, int64_t, int64_t, vmovq_n_s64, vst1q_s64, load, 2, 1) LOOP1(VmovqNU8N, uint8_t, uint8_t, vmovq_n_u8, vst1q_u8, load, 16, 1) LOOP1(VmovqNU16N, uint16_t, uint16_t, vmovq_n_u16, vst1q_u16, load, 8, 1) LOOP1(VmovqNU32N, uint32_t, uint32_t, vmovq_n_u32, vst1q_u32, load, 4, 1) LOOP1(VmovqNU64N, uint64_t, uint64_t, vmovq_n_u64, vst1q_u64, load, 2, 1) LOOP1(VmovqNF32N, float32_t, float32_t, vmovq_n_f32, vst1q_f32, load, 4, 1) LOOP1(VmovqNF64N, float64_t, float64_t, vmovq_n_f64, vst1q_f64, load, 2, 1) LOOP1(VmvnS8N, int8_t, int8_t, vmvn_s8, vst1_s8, vld1_s8, 8, 8) LOOP1(VmvnS16N, int16_t, int16_t, vmvn_s16, vst1_s16, vld1_s16, 4, 4) LOOP1(VmvnS32N, int32_t, int32_t, vmvn_s32, vst1_s32, vld1_s32, 2, 2) LOOP1(VmvnU8N, uint8_t, uint8_t, vmvn_u8, vst1_u8, vld1_u8, 8, 8) LOOP1(VmvnU16N, uint16_t, uint16_t, vmvn_u16, vst1_u16, vld1_u16, 4, 4) LOOP1(VmvnU32N, uint32_t, uint32_t, vmvn_u32, vst1_u32, vld1_u32, 2, 2) LOOP1(VmvnqS8N, int8_t, int8_t, vmvnq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP1(VmvnqS16N, int16_t, int16_t, vmvnq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP1(VmvnqS32N, int32_t, int32_t, vmvnq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP1(VmvnqU8N, uint8_t, uint8_t, vmvnq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP1(VmvnqU16N, uint16_t, uint16_t, vmvnq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP1(VmvnqU32N, uint32_t, uint32_t, vmvnq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP1(VnegS8N, int8_t, int8_t, vneg_s8, vst1_s8, vld1_s8, 8, 8) LOOP1(VnegS16N, int16_t, int16_t, vneg_s16, vst1_s16, vld1_s16, 4, 4) LOOP1(VnegS32N, int32_t, int32_t, vneg_s32, vst1_s32, vld1_s32, 2, 2) LOOP1(VnegS64N, int64_t, int64_t, vneg_s64, vst1_s64, vld1_s64, 1, 1) LOOP1(VnegF32N, float32_t, float32_t, vneg_f32, vst1_f32, vld1_f32, 2, 2) LOOP1(VnegF64N, float64_t, float64_t, vneg_f64, vst1_f64, vld1_f64, 1, 1) LOOP1(VnegdS64N, int64_t, int64_t, vnegd_s64, save, load, 1, 1) LOOP1(VnegqS8N, int8_t, int8_t, vnegq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP1(VnegqS16N, int16_t, int16_t, vnegq_s16, vst1q_s16, vld1q_s16, 8, 
8) LOOP1(VnegqS32N, int32_t, int32_t, vnegq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP1(VnegqS64N, int64_t, int64_t, vnegq_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP1(VnegqF32N, float32_t, float32_t, vnegq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP1(VnegqF64N, float64_t, float64_t, vnegq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP1(VpadddS64N, int64_t, int64_t, vpaddd_s64, save, vld1q_s64, 1, 2) LOOP1(VpadddU64N, uint64_t, uint64_t, vpaddd_u64, save, vld1q_u64, 1, 2) LOOP1(VpadddF64N, float64_t, float64_t, vpaddd_f64, save, vld1q_f64, 1, 2) LOOP1(VpaddsF32N, float32_t, float32_t, vpadds_f32, save, vld1_f32, 1, 2) LOOP1(VpmaxnmqdF64N, float64_t, float64_t, vpmaxnmqd_f64, save, vld1q_f64, 1, 2) LOOP1(VpmaxnmsF32N, float32_t, float32_t, vpmaxnms_f32, save, vld1_f32, 1, 2) LOOP1(VpmaxqdF64N, float64_t, float64_t, vpmaxqd_f64, save, vld1q_f64, 1, 2) LOOP1(VpmaxsF32N, float32_t, float32_t, vpmaxs_f32, save, vld1_f32, 1, 2) LOOP1(VpminnmqdF64N, float64_t, float64_t, vpminnmqd_f64, save, vld1q_f64, 1, 2) LOOP1(VpminnmsF32N, float32_t, float32_t, vpminnms_f32, save, vld1_f32, 1, 2) LOOP1(VpminqdF64N, float64_t, float64_t, vpminqd_f64, save, vld1q_f64, 1, 2) LOOP1(VpminsF32N, float32_t, float32_t, vpmins_f32, save, vld1_f32, 1, 2) LOOP1(VqabsS8N, int8_t, int8_t, vqabs_s8, vst1_s8, vld1_s8, 8, 8) LOOP1(VqabsS16N, int16_t, int16_t, vqabs_s16, vst1_s16, vld1_s16, 4, 4) LOOP1(VqabsS32N, int32_t, int32_t, vqabs_s32, vst1_s32, vld1_s32, 2, 2) LOOP1(VqabsS64N, int64_t, int64_t, vqabs_s64, vst1_s64, vld1_s64, 1, 1) LOOP1(VqabsbS8N, int8_t, int8_t, vqabsb_s8, save, load, 1, 1) LOOP1(VqabsdS64N, int64_t, int64_t, vqabsd_s64, save, load, 1, 1) LOOP1(VqabshS16N, int16_t, int16_t, vqabsh_s16, save, load, 1, 1) LOOP1(VqabsqS8N, int8_t, int8_t, vqabsq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP1(VqabsqS16N, int16_t, int16_t, vqabsq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP1(VqabsqS32N, int32_t, int32_t, vqabsq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP1(VqabsqS64N, int64_t, int64_t, vqabsq_s64, vst1q_s64, vld1q_s64, 2, 2) 
LOOP1(VqabssS32N, int32_t, int32_t, vqabss_s32, save, load, 1, 1) LOOP1(VqnegS8N, int8_t, int8_t, vqneg_s8, vst1_s8, vld1_s8, 8, 8) LOOP1(VqnegS16N, int16_t, int16_t, vqneg_s16, vst1_s16, vld1_s16, 4, 4) LOOP1(VqnegS32N, int32_t, int32_t, vqneg_s32, vst1_s32, vld1_s32, 2, 2) LOOP1(VqnegS64N, int64_t, int64_t, vqneg_s64, vst1_s64, vld1_s64, 1, 1) LOOP1(VqnegbS8N, int8_t, int8_t, vqnegb_s8, save, load, 1, 1) LOOP1(VqnegdS64N, int64_t, int64_t, vqnegd_s64, save, load, 1, 1) LOOP1(VqneghS16N, int16_t, int16_t, vqnegh_s16, save, load, 1, 1) LOOP1(VqnegqS8N, int8_t, int8_t, vqnegq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP1(VqnegqS16N, int16_t, int16_t, vqnegq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP1(VqnegqS32N, int32_t, int32_t, vqnegq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP1(VqnegqS64N, int64_t, int64_t, vqnegq_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP1(VqnegsS32N, int32_t, int32_t, vqnegs_s32, save, load, 1, 1) LOOP1(VrbitS8N, int8_t, int8_t, vrbit_s8, vst1_s8, vld1_s8, 8, 8) LOOP1(VrbitU8N, uint8_t, uint8_t, vrbit_u8, vst1_u8, vld1_u8, 8, 8) LOOP1(VrbitqS8N, int8_t, int8_t, vrbitq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP1(VrbitqU8N, uint8_t, uint8_t, vrbitq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP1(VrecpeU32N, uint32_t, uint32_t, vrecpe_u32, vst1_u32, vld1_u32, 2, 2) LOOP1(VrecpeF32N, float32_t, float32_t, vrecpe_f32, vst1_f32, vld1_f32, 2, 2) LOOP1(VrecpeF64N, float64_t, float64_t, vrecpe_f64, vst1_f64, vld1_f64, 1, 1) LOOP1(VrecpedF64N, float64_t, float64_t, vrecped_f64, save, load, 1, 1) LOOP1(VrecpeqU32N, uint32_t, uint32_t, vrecpeq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP1(VrecpeqF32N, float32_t, float32_t, vrecpeq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP1(VrecpeqF64N, float64_t, float64_t, vrecpeq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP1(VrecpesF32N, float32_t, float32_t, vrecpes_f32, save, load, 1, 1) LOOP1(VrecpxdF64N, float64_t, float64_t, vrecpxd_f64, save, load, 1, 1) LOOP1(VrecpxsF32N, float32_t, float32_t, vrecpxs_f32, save, load, 1, 1) LOOP1(VreinterpretF32S32N, float32_t, int32_t, 
vreinterpret_f32_s32, vst1_f32, vld1_s32, 2, 2) LOOP1(VreinterpretF32U32N, float32_t, uint32_t, vreinterpret_f32_u32, vst1_f32, vld1_u32, 2, 2) LOOP1(VreinterpretF64S64N, float64_t, int64_t, vreinterpret_f64_s64, vst1_f64, vld1_s64, 1, 1) LOOP1(VreinterpretF64U64N, float64_t, uint64_t, vreinterpret_f64_u64, vst1_f64, vld1_u64, 1, 1) LOOP1(VreinterpretS16U16N, int16_t, uint16_t, vreinterpret_s16_u16, vst1_s16, vld1_u16, 4, 4) LOOP1(VreinterpretS32U32N, int32_t, uint32_t, vreinterpret_s32_u32, vst1_s32, vld1_u32, 2, 2) LOOP1(VreinterpretS32F32N, int32_t, float32_t, vreinterpret_s32_f32, vst1_s32, vld1_f32, 2, 2) LOOP1(VreinterpretS64U64N, int64_t, uint64_t, vreinterpret_s64_u64, vst1_s64, vld1_u64, 1, 1) LOOP1(VreinterpretS64F64N, int64_t, float64_t, vreinterpret_s64_f64, vst1_s64, vld1_f64, 1, 1) LOOP1(VreinterpretS8U8N, int8_t, uint8_t, vreinterpret_s8_u8, vst1_s8, vld1_u8, 8, 8) LOOP1(VreinterpretU16S16N, uint16_t, int16_t, vreinterpret_u16_s16, vst1_u16, vld1_s16, 4, 4) LOOP1(VreinterpretU32S32N, uint32_t, int32_t, vreinterpret_u32_s32, vst1_u32, vld1_s32, 2, 2) LOOP1(VreinterpretU32F32N, uint32_t, float32_t, vreinterpret_u32_f32, vst1_u32, vld1_f32, 2, 2) LOOP1(VreinterpretU64S64N, uint64_t, int64_t, vreinterpret_u64_s64, vst1_u64, vld1_s64, 1, 1) LOOP1(VreinterpretU64F64N, uint64_t, float64_t, vreinterpret_u64_f64, vst1_u64, vld1_f64, 1, 1) LOOP1(VreinterpretU8S8N, uint8_t, int8_t, vreinterpret_u8_s8, vst1_u8, vld1_s8, 8, 8) LOOP1(VreinterpretqF32S32N, float32_t, int32_t, vreinterpretq_f32_s32, vst1q_f32, vld1q_s32, 4, 4) LOOP1(VreinterpretqF32U32N, float32_t, uint32_t, vreinterpretq_f32_u32, vst1q_f32, vld1q_u32, 4, 4) LOOP1(VreinterpretqF64S64N, float64_t, int64_t, vreinterpretq_f64_s64, vst1q_f64, vld1q_s64, 2, 2) LOOP1(VreinterpretqF64U64N, float64_t, uint64_t, vreinterpretq_f64_u64, vst1q_f64, vld1q_u64, 2, 2) LOOP1(VreinterpretqS16U16N, int16_t, uint16_t, vreinterpretq_s16_u16, vst1q_s16, vld1q_u16, 8, 8) LOOP1(VreinterpretqS32U32N, int32_t, uint32_t, 
vreinterpretq_s32_u32, vst1q_s32, vld1q_u32, 4, 4) LOOP1(VreinterpretqS32F32N, int32_t, float32_t, vreinterpretq_s32_f32, vst1q_s32, vld1q_f32, 4, 4) LOOP1(VreinterpretqS64U64N, int64_t, uint64_t, vreinterpretq_s64_u64, vst1q_s64, vld1q_u64, 2, 2) LOOP1(VreinterpretqS64F64N, int64_t, float64_t, vreinterpretq_s64_f64, vst1q_s64, vld1q_f64, 2, 2) LOOP1(VreinterpretqS8U8N, int8_t, uint8_t, vreinterpretq_s8_u8, vst1q_s8, vld1q_u8, 16, 16) LOOP1(VreinterpretqU16S16N, uint16_t, int16_t, vreinterpretq_u16_s16, vst1q_u16, vld1q_s16, 8, 8) LOOP1(VreinterpretqU32S32N, uint32_t, int32_t, vreinterpretq_u32_s32, vst1q_u32, vld1q_s32, 4, 4) LOOP1(VreinterpretqU32F32N, uint32_t, float32_t, vreinterpretq_u32_f32, vst1q_u32, vld1q_f32, 4, 4) LOOP1(VreinterpretqU64S64N, uint64_t, int64_t, vreinterpretq_u64_s64, vst1q_u64, vld1q_s64, 2, 2) LOOP1(VreinterpretqU64F64N, uint64_t, float64_t, vreinterpretq_u64_f64, vst1q_u64, vld1q_f64, 2, 2) LOOP1(VreinterpretqU8S8N, uint8_t, int8_t, vreinterpretq_u8_s8, vst1q_u8, vld1q_s8, 16, 16) LOOP1(Vrev16S8N, int8_t, int8_t, vrev16_s8, vst1_s8, vld1_s8, 8, 8) LOOP1(Vrev16U8N, uint8_t, uint8_t, vrev16_u8, vst1_u8, vld1_u8, 8, 8) LOOP1(Vrev16QS8N, int8_t, int8_t, vrev16q_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP1(Vrev16QU8N, uint8_t, uint8_t, vrev16q_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP1(Vrev32S8N, int8_t, int8_t, vrev32_s8, vst1_s8, vld1_s8, 8, 8) LOOP1(Vrev32S16N, int16_t, int16_t, vrev32_s16, vst1_s16, vld1_s16, 4, 4) LOOP1(Vrev32U8N, uint8_t, uint8_t, vrev32_u8, vst1_u8, vld1_u8, 8, 8) LOOP1(Vrev32U16N, uint16_t, uint16_t, vrev32_u16, vst1_u16, vld1_u16, 4, 4) LOOP1(Vrev32QS8N, int8_t, int8_t, vrev32q_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP1(Vrev32QS16N, int16_t, int16_t, vrev32q_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP1(Vrev32QU8N, uint8_t, uint8_t, vrev32q_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP1(Vrev32QU16N, uint16_t, uint16_t, vrev32q_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP1(Vrev64S8N, int8_t, int8_t, vrev64_s8, vst1_s8, vld1_s8, 8, 8) LOOP1(Vrev64S16N, 
int16_t, int16_t, vrev64_s16, vst1_s16, vld1_s16, 4, 4) LOOP1(Vrev64S32N, int32_t, int32_t, vrev64_s32, vst1_s32, vld1_s32, 2, 2) LOOP1(Vrev64U8N, uint8_t, uint8_t, vrev64_u8, vst1_u8, vld1_u8, 8, 8) LOOP1(Vrev64U16N, uint16_t, uint16_t, vrev64_u16, vst1_u16, vld1_u16, 4, 4) LOOP1(Vrev64U32N, uint32_t, uint32_t, vrev64_u32, vst1_u32, vld1_u32, 2, 2) LOOP1(Vrev64F32N, float32_t, float32_t, vrev64_f32, vst1_f32, vld1_f32, 2, 2) LOOP1(Vrev64QS8N, int8_t, int8_t, vrev64q_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP1(Vrev64QS16N, int16_t, int16_t, vrev64q_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP1(Vrev64QS32N, int32_t, int32_t, vrev64q_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP1(Vrev64QU8N, uint8_t, uint8_t, vrev64q_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP1(Vrev64QU16N, uint16_t, uint16_t, vrev64q_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP1(Vrev64QU32N, uint32_t, uint32_t, vrev64q_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP1(Vrev64QF32N, float32_t, float32_t, vrev64q_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP1(VrndF32N, float32_t, float32_t, vrnd_f32, vst1_f32, vld1_f32, 2, 2) LOOP1(VrndF64N, float64_t, float64_t, vrnd_f64, vst1_f64, vld1_f64, 1, 1) LOOP1(Vrnd32XF32N, float32_t, float32_t, vrnd32x_f32, vst1_f32, vld1_f32, 2, 2) LOOP1(Vrnd32XF64N, float64_t, float64_t, vrnd32x_f64, vst1_f64, vld1_f64, 1, 1) LOOP1(Vrnd32XqF32N, float32_t, float32_t, vrnd32xq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP1(Vrnd32XqF64N, float64_t, float64_t, vrnd32xq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP1(Vrnd32ZF32N, float32_t, float32_t, vrnd32z_f32, vst1_f32, vld1_f32, 2, 2) LOOP1(Vrnd32ZF64N, float64_t, float64_t, vrnd32z_f64, vst1_f64, vld1_f64, 1, 1) LOOP1(Vrnd32ZqF32N, float32_t, float32_t, vrnd32zq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP1(Vrnd32ZqF64N, float64_t, float64_t, vrnd32zq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP1(Vrnd64XF32N, float32_t, float32_t, vrnd64x_f32, vst1_f32, vld1_f32, 2, 2) LOOP1(Vrnd64XF64N, float64_t, float64_t, vrnd64x_f64, vst1_f64, vld1_f64, 1, 1) LOOP1(Vrnd64XqF32N, float32_t, float32_t, vrnd64xq_f32, 
vst1q_f32, vld1q_f32, 4, 4) LOOP1(Vrnd64XqF64N, float64_t, float64_t, vrnd64xq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP1(Vrnd64ZF32N, float32_t, float32_t, vrnd64z_f32, vst1_f32, vld1_f32, 2, 2) LOOP1(Vrnd64ZF64N, float64_t, float64_t, vrnd64z_f64, vst1_f64, vld1_f64, 1, 1) LOOP1(Vrnd64ZqF32N, float32_t, float32_t, vrnd64zq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP1(Vrnd64ZqF64N, float64_t, float64_t, vrnd64zq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP1(VrndaF32N, float32_t, float32_t, vrnda_f32, vst1_f32, vld1_f32, 2, 2) LOOP1(VrndaF64N, float64_t, float64_t, vrnda_f64, vst1_f64, vld1_f64, 1, 1) LOOP1(VrndaqF32N, float32_t, float32_t, vrndaq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP1(VrndaqF64N, float64_t, float64_t, vrndaq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP1(VrndiF32N, float32_t, float32_t, vrndi_f32, vst1_f32, vld1_f32, 2, 2) LOOP1(VrndiF64N, float64_t, float64_t, vrndi_f64, vst1_f64, vld1_f64, 1, 1) LOOP1(VrndiqF32N, float32_t, float32_t, vrndiq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP1(VrndiqF64N, float64_t, float64_t, vrndiq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP1(VrndmF32N, float32_t, float32_t, vrndm_f32, vst1_f32, vld1_f32, 2, 2) LOOP1(VrndmF64N, float64_t, float64_t, vrndm_f64, vst1_f64, vld1_f64, 1, 1) LOOP1(VrndmqF32N, float32_t, float32_t, vrndmq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP1(VrndmqF64N, float64_t, float64_t, vrndmq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP1(VrndnF32N, float32_t, float32_t, vrndn_f32, vst1_f32, vld1_f32, 2, 2) LOOP1(VrndnF64N, float64_t, float64_t, vrndn_f64, vst1_f64, vld1_f64, 1, 1) LOOP1(VrndnqF32N, float32_t, float32_t, vrndnq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP1(VrndnqF64N, float64_t, float64_t, vrndnq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP1(VrndnsF32N, float32_t, float32_t, vrndns_f32, save, load, 1, 1) LOOP1(VrndpF32N, float32_t, float32_t, vrndp_f32, vst1_f32, vld1_f32, 2, 2) LOOP1(VrndpF64N, float64_t, float64_t, vrndp_f64, vst1_f64, vld1_f64, 1, 1) LOOP1(VrndpqF32N, float32_t, float32_t, vrndpq_f32, vst1q_f32, vld1q_f32, 4, 4) 
LOOP1(VrndpqF64N, float64_t, float64_t, vrndpq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP1(VrndqF32N, float32_t, float32_t, vrndq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(VrndqF64N, float64_t, float64_t, vrndq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP1(VrndxF32N, float32_t, float32_t, vrndx_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(VrndxF64N, float64_t, float64_t, vrndx_f64, vst1_f64, vld1_f64, 1, 1)
LOOP1(VrndxqF32N, float32_t, float32_t, vrndxq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(VrndxqF64N, float64_t, float64_t, vrndxq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP1(VrsqrteU32N, uint32_t, uint32_t, vrsqrte_u32, vst1_u32, vld1_u32, 2, 2)
LOOP1(VrsqrteF32N, float32_t, float32_t, vrsqrte_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(VrsqrteF64N, float64_t, float64_t, vrsqrte_f64, vst1_f64, vld1_f64, 1, 1)
LOOP1(VrsqrtedF64N, float64_t, float64_t, vrsqrted_f64, save, load, 1, 1)
LOOP1(VrsqrteqU32N, uint32_t, uint32_t, vrsqrteq_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP1(VrsqrteqF32N, float32_t, float32_t, vrsqrteq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(VrsqrteqF64N, float64_t, float64_t, vrsqrteq_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP1(VrsqrtesF32N, float32_t, float32_t, vrsqrtes_f32, save, load, 1, 1)
LOOP1(Vsha1HU32N, uint32_t, uint32_t, vsha1h_u32, save, load, 1, 1)
LOOP1(VsqrtF32N, float32_t, float32_t, vsqrt_f32, vst1_f32, vld1_f32, 2, 2)
LOOP1(VsqrtF64N, float64_t, float64_t, vsqrt_f64, vst1_f64, vld1_f64, 1, 1)
LOOP1(VsqrtqF32N, float32_t, float32_t, vsqrtq_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP1(VsqrtqF64N, float64_t, float64_t, vsqrtq_f64, vst1q_f64, vld1q_f64, 2, 2)

/*
 * LOOP2 defines name(r, v1, v2, n), which applies the two-operand
 * intrinsic f to v1 and v2 one vector at a time and stores each result in r.
 * n counts result elements. Each pass writes rstep elements to r and reads
 * istep elements from each input. The loop stops once fewer than rstep
 * elements remain, so a shorter tail is left unwritten.
 */
#define LOOP2(name, rtype, itype, f, set, load, rstep, istep) \
    void name(rtype *r, itype *v1, itype *v2, int32_t n)      \
    {                                                         \
        while (n >= rstep)                                    \
        {                                                     \
            set(r, f(load(v1), load(v2)));                    \
            r += rstep;                                       \
            n -= rstep;                                       \
            v1 += istep;                                      \
            v2 += istep;                                      \
        }                                                     \
    }

LOOP2(VabdS8N, int8_t, int8_t, vabd_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(VabdS16N, int16_t, int16_t, vabd_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(VabdS32N, int32_t, int32_t, vabd_s32, vst1_s32,
vld1_s32, 2, 2) LOOP2(VabdU8N, uint8_t, uint8_t, vabd_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VabdU16N, uint16_t, uint16_t, vabd_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VabdU32N, uint32_t, uint32_t, vabd_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VabdF32N, float32_t, float32_t, vabd_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(VabdF64N, float64_t, float64_t, vabd_f64, vst1_f64, vld1_f64, 1, 1) LOOP2(VabddF64N, float64_t, float64_t, vabdd_f64, save, load, 1, 1) LOOP2(VabdqS8N, int8_t, int8_t, vabdq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VabdqS16N, int16_t, int16_t, vabdq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VabdqS32N, int32_t, int32_t, vabdq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VabdqU8N, uint8_t, uint8_t, vabdq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VabdqU16N, uint16_t, uint16_t, vabdq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VabdqU32N, uint32_t, uint32_t, vabdq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VabdqF32N, float32_t, float32_t, vabdq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(VabdqF64N, float64_t, float64_t, vabdq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(VabdsF32N, float32_t, float32_t, vabds_f32, save, load, 1, 1) LOOP2(VaddS8N, int8_t, int8_t, vadd_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VaddS16N, int16_t, int16_t, vadd_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VaddS32N, int32_t, int32_t, vadd_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VaddS64N, int64_t, int64_t, vadd_s64, vst1_s64, vld1_s64, 1, 1) LOOP2(VaddU8N, uint8_t, uint8_t, vadd_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VaddU16N, uint16_t, uint16_t, vadd_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VaddU32N, uint32_t, uint32_t, vadd_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VaddU64N, uint64_t, uint64_t, vadd_u64, vst1_u64, vld1_u64, 1, 1) LOOP2(VaddF32N, float32_t, float32_t, vadd_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(VaddF64N, float64_t, float64_t, vadd_f64, vst1_f64, vld1_f64, 1, 1) LOOP2(VadddS64N, int64_t, int64_t, vaddd_s64, save, load, 1, 1) LOOP2(VadddU64N, uint64_t, uint64_t, vaddd_u64, save, load, 1, 1) LOOP2(VaddqS8N, int8_t, int8_t, vaddq_s8, vst1q_s8, 
vld1q_s8, 16, 16) LOOP2(VaddqS16N, int16_t, int16_t, vaddq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VaddqS32N, int32_t, int32_t, vaddq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VaddqS64N, int64_t, int64_t, vaddq_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP2(VaddqU8N, uint8_t, uint8_t, vaddq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VaddqU16N, uint16_t, uint16_t, vaddq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VaddqU32N, uint32_t, uint32_t, vaddq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VaddqU64N, uint64_t, uint64_t, vaddq_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(VaddqF32N, float32_t, float32_t, vaddq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(VaddqF64N, float64_t, float64_t, vaddq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(VaesdqU8N, uint8_t, uint8_t, vaesdq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VaeseqU8N, uint8_t, uint8_t, vaeseq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VandS8N, int8_t, int8_t, vand_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VandS16N, int16_t, int16_t, vand_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VandS32N, int32_t, int32_t, vand_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VandS64N, int64_t, int64_t, vand_s64, vst1_s64, vld1_s64, 1, 1) LOOP2(VandU8N, uint8_t, uint8_t, vand_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VandU16N, uint16_t, uint16_t, vand_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VandU32N, uint32_t, uint32_t, vand_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VandU64N, uint64_t, uint64_t, vand_u64, vst1_u64, vld1_u64, 1, 1) LOOP2(VandqS8N, int8_t, int8_t, vandq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VandqS16N, int16_t, int16_t, vandq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VandqS32N, int32_t, int32_t, vandq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VandqS64N, int64_t, int64_t, vandq_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP2(VandqU8N, uint8_t, uint8_t, vandq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VandqU16N, uint16_t, uint16_t, vandq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VandqU32N, uint32_t, uint32_t, vandq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VandqU64N, uint64_t, uint64_t, vandq_u64, vst1q_u64, vld1q_u64, 2, 2) 
LOOP2(VbicS8N, int8_t, int8_t, vbic_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VbicS16N, int16_t, int16_t, vbic_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VbicS32N, int32_t, int32_t, vbic_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VbicS64N, int64_t, int64_t, vbic_s64, vst1_s64, vld1_s64, 1, 1) LOOP2(VbicU8N, uint8_t, uint8_t, vbic_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VbicU16N, uint16_t, uint16_t, vbic_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VbicU32N, uint32_t, uint32_t, vbic_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VbicU64N, uint64_t, uint64_t, vbic_u64, vst1_u64, vld1_u64, 1, 1) LOOP2(VbicqS8N, int8_t, int8_t, vbicq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VbicqS16N, int16_t, int16_t, vbicq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VbicqS32N, int32_t, int32_t, vbicq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VbicqS64N, int64_t, int64_t, vbicq_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP2(VbicqU8N, uint8_t, uint8_t, vbicq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VbicqU16N, uint16_t, uint16_t, vbicq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VbicqU32N, uint32_t, uint32_t, vbicq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VbicqU64N, uint64_t, uint64_t, vbicq_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(VcaddRot270F32N, float32_t, float32_t, vcadd_rot270_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(VcaddRot90F32N, float32_t, float32_t, vcadd_rot90_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(VcaddqRot270F32N, float32_t, float32_t, vcaddq_rot270_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(VcaddqRot270F64N, float64_t, float64_t, vcaddq_rot270_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(VcaddqRot90F32N, float32_t, float32_t, vcaddq_rot90_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(VcaddqRot90F64N, float64_t, float64_t, vcaddq_rot90_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(VcageF32N, uint32_t, float32_t, vcage_f32, vst1_u32, vld1_f32, 2, 2) LOOP2(VcageF64N, uint64_t, float64_t, vcage_f64, vst1_u64, vld1_f64, 1, 1) LOOP2(VcagedF64N, uint64_t, float64_t, vcaged_f64, save, load, 1, 1) LOOP2(VcageqF32N, uint32_t, float32_t, vcageq_f32, vst1q_u32, vld1q_f32, 4, 4) 
LOOP2(VcageqF64N, uint64_t, float64_t, vcageq_f64, vst1q_u64, vld1q_f64, 2, 2) LOOP2(VcagesF32N, uint32_t, float32_t, vcages_f32, save, load, 1, 1) LOOP2(VcagtF32N, uint32_t, float32_t, vcagt_f32, vst1_u32, vld1_f32, 2, 2) LOOP2(VcagtF64N, uint64_t, float64_t, vcagt_f64, vst1_u64, vld1_f64, 1, 1) LOOP2(VcagtdF64N, uint64_t, float64_t, vcagtd_f64, save, load, 1, 1) LOOP2(VcagtqF32N, uint32_t, float32_t, vcagtq_f32, vst1q_u32, vld1q_f32, 4, 4) LOOP2(VcagtqF64N, uint64_t, float64_t, vcagtq_f64, vst1q_u64, vld1q_f64, 2, 2) LOOP2(VcagtsF32N, uint32_t, float32_t, vcagts_f32, save, load, 1, 1) LOOP2(VcaleF32N, uint32_t, float32_t, vcale_f32, vst1_u32, vld1_f32, 2, 2) LOOP2(VcaleF64N, uint64_t, float64_t, vcale_f64, vst1_u64, vld1_f64, 1, 1) LOOP2(VcaledF64N, uint64_t, float64_t, vcaled_f64, save, load, 1, 1) LOOP2(VcaleqF32N, uint32_t, float32_t, vcaleq_f32, vst1q_u32, vld1q_f32, 4, 4) LOOP2(VcaleqF64N, uint64_t, float64_t, vcaleq_f64, vst1q_u64, vld1q_f64, 2, 2) LOOP2(VcalesF32N, uint32_t, float32_t, vcales_f32, save, load, 1, 1) LOOP2(VcaltF32N, uint32_t, float32_t, vcalt_f32, vst1_u32, vld1_f32, 2, 2) LOOP2(VcaltF64N, uint64_t, float64_t, vcalt_f64, vst1_u64, vld1_f64, 1, 1) LOOP2(VcaltdF64N, uint64_t, float64_t, vcaltd_f64, save, load, 1, 1) LOOP2(VcaltqF32N, uint32_t, float32_t, vcaltq_f32, vst1q_u32, vld1q_f32, 4, 4) LOOP2(VcaltqF64N, uint64_t, float64_t, vcaltq_f64, vst1q_u64, vld1q_f64, 2, 2) LOOP2(VcaltsF32N, uint32_t, float32_t, vcalts_f32, save, load, 1, 1) LOOP2(VceqS8N, uint8_t, int8_t, vceq_s8, vst1_u8, vld1_s8, 8, 8) LOOP2(VceqS16N, uint16_t, int16_t, vceq_s16, vst1_u16, vld1_s16, 4, 4) LOOP2(VceqS32N, uint32_t, int32_t, vceq_s32, vst1_u32, vld1_s32, 2, 2) LOOP2(VceqS64N, uint64_t, int64_t, vceq_s64, vst1_u64, vld1_s64, 1, 1) LOOP2(VceqU8N, uint8_t, uint8_t, vceq_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VceqU16N, uint16_t, uint16_t, vceq_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VceqU32N, uint32_t, uint32_t, vceq_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VceqU64N, uint64_t, 
uint64_t, vceq_u64, vst1_u64, vld1_u64, 1, 1) LOOP2(VceqF32N, uint32_t, float32_t, vceq_f32, vst1_u32, vld1_f32, 2, 2) LOOP2(VceqF64N, uint64_t, float64_t, vceq_f64, vst1_u64, vld1_f64, 1, 1) LOOP2(VceqdS64N, uint64_t, int64_t, vceqd_s64, save, load, 1, 1) LOOP2(VceqdU64N, uint64_t, uint64_t, vceqd_u64, save, load, 1, 1) LOOP2(VceqdF64N, uint64_t, float64_t, vceqd_f64, save, load, 1, 1) LOOP2(VceqqS8N, uint8_t, int8_t, vceqq_s8, vst1q_u8, vld1q_s8, 16, 16) LOOP2(VceqqS16N, uint16_t, int16_t, vceqq_s16, vst1q_u16, vld1q_s16, 8, 8) LOOP2(VceqqS32N, uint32_t, int32_t, vceqq_s32, vst1q_u32, vld1q_s32, 4, 4) LOOP2(VceqqS64N, uint64_t, int64_t, vceqq_s64, vst1q_u64, vld1q_s64, 2, 2) LOOP2(VceqqU8N, uint8_t, uint8_t, vceqq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VceqqU16N, uint16_t, uint16_t, vceqq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VceqqU32N, uint32_t, uint32_t, vceqq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VceqqU64N, uint64_t, uint64_t, vceqq_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(VceqqF32N, uint32_t, float32_t, vceqq_f32, vst1q_u32, vld1q_f32, 4, 4) LOOP2(VceqqF64N, uint64_t, float64_t, vceqq_f64, vst1q_u64, vld1q_f64, 2, 2) LOOP2(VceqsF32N, uint32_t, float32_t, vceqs_f32, save, load, 1, 1) LOOP2(VcgeS8N, uint8_t, int8_t, vcge_s8, vst1_u8, vld1_s8, 8, 8) LOOP2(VcgeS16N, uint16_t, int16_t, vcge_s16, vst1_u16, vld1_s16, 4, 4) LOOP2(VcgeS32N, uint32_t, int32_t, vcge_s32, vst1_u32, vld1_s32, 2, 2) LOOP2(VcgeS64N, uint64_t, int64_t, vcge_s64, vst1_u64, vld1_s64, 1, 1) LOOP2(VcgeU8N, uint8_t, uint8_t, vcge_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VcgeU16N, uint16_t, uint16_t, vcge_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VcgeU32N, uint32_t, uint32_t, vcge_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VcgeU64N, uint64_t, uint64_t, vcge_u64, vst1_u64, vld1_u64, 1, 1) LOOP2(VcgeF32N, uint32_t, float32_t, vcge_f32, vst1_u32, vld1_f32, 2, 2) LOOP2(VcgeF64N, uint64_t, float64_t, vcge_f64, vst1_u64, vld1_f64, 1, 1) LOOP2(VcgedS64N, uint64_t, int64_t, vcged_s64, save, load, 1, 1) LOOP2(VcgedU64N, 
uint64_t, uint64_t, vcged_u64, save, load, 1, 1) LOOP2(VcgedF64N, uint64_t, float64_t, vcged_f64, save, load, 1, 1) LOOP2(VcgeqS8N, uint8_t, int8_t, vcgeq_s8, vst1q_u8, vld1q_s8, 16, 16) LOOP2(VcgeqS16N, uint16_t, int16_t, vcgeq_s16, vst1q_u16, vld1q_s16, 8, 8) LOOP2(VcgeqS32N, uint32_t, int32_t, vcgeq_s32, vst1q_u32, vld1q_s32, 4, 4) LOOP2(VcgeqS64N, uint64_t, int64_t, vcgeq_s64, vst1q_u64, vld1q_s64, 2, 2) LOOP2(VcgeqU8N, uint8_t, uint8_t, vcgeq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VcgeqU16N, uint16_t, uint16_t, vcgeq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VcgeqU32N, uint32_t, uint32_t, vcgeq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VcgeqU64N, uint64_t, uint64_t, vcgeq_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(VcgeqF32N, uint32_t, float32_t, vcgeq_f32, vst1q_u32, vld1q_f32, 4, 4) LOOP2(VcgeqF64N, uint64_t, float64_t, vcgeq_f64, vst1q_u64, vld1q_f64, 2, 2) LOOP2(VcgesF32N, uint32_t, float32_t, vcges_f32, save, load, 1, 1) LOOP2(VcgtS8N, uint8_t, int8_t, vcgt_s8, vst1_u8, vld1_s8, 8, 8) LOOP2(VcgtS16N, uint16_t, int16_t, vcgt_s16, vst1_u16, vld1_s16, 4, 4) LOOP2(VcgtS32N, uint32_t, int32_t, vcgt_s32, vst1_u32, vld1_s32, 2, 2) LOOP2(VcgtS64N, uint64_t, int64_t, vcgt_s64, vst1_u64, vld1_s64, 1, 1) LOOP2(VcgtU8N, uint8_t, uint8_t, vcgt_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VcgtU16N, uint16_t, uint16_t, vcgt_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VcgtU32N, uint32_t, uint32_t, vcgt_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VcgtU64N, uint64_t, uint64_t, vcgt_u64, vst1_u64, vld1_u64, 1, 1) LOOP2(VcgtF32N, uint32_t, float32_t, vcgt_f32, vst1_u32, vld1_f32, 2, 2) LOOP2(VcgtF64N, uint64_t, float64_t, vcgt_f64, vst1_u64, vld1_f64, 1, 1) LOOP2(VcgtdS64N, uint64_t, int64_t, vcgtd_s64, save, load, 1, 1) LOOP2(VcgtdU64N, uint64_t, uint64_t, vcgtd_u64, save, load, 1, 1) LOOP2(VcgtdF64N, uint64_t, float64_t, vcgtd_f64, save, load, 1, 1) LOOP2(VcgtqS8N, uint8_t, int8_t, vcgtq_s8, vst1q_u8, vld1q_s8, 16, 16) LOOP2(VcgtqS16N, uint16_t, int16_t, vcgtq_s16, vst1q_u16, vld1q_s16, 8, 8) LOOP2(VcgtqS32N, 
uint32_t, int32_t, vcgtq_s32, vst1q_u32, vld1q_s32, 4, 4) LOOP2(VcgtqS64N, uint64_t, int64_t, vcgtq_s64, vst1q_u64, vld1q_s64, 2, 2) LOOP2(VcgtqU8N, uint8_t, uint8_t, vcgtq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VcgtqU16N, uint16_t, uint16_t, vcgtq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VcgtqU32N, uint32_t, uint32_t, vcgtq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VcgtqU64N, uint64_t, uint64_t, vcgtq_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(VcgtqF32N, uint32_t, float32_t, vcgtq_f32, vst1q_u32, vld1q_f32, 4, 4) LOOP2(VcgtqF64N, uint64_t, float64_t, vcgtq_f64, vst1q_u64, vld1q_f64, 2, 2) LOOP2(VcgtsF32N, uint32_t, float32_t, vcgts_f32, save, load, 1, 1) LOOP2(VcleS8N, uint8_t, int8_t, vcle_s8, vst1_u8, vld1_s8, 8, 8) LOOP2(VcleS16N, uint16_t, int16_t, vcle_s16, vst1_u16, vld1_s16, 4, 4) LOOP2(VcleS32N, uint32_t, int32_t, vcle_s32, vst1_u32, vld1_s32, 2, 2) LOOP2(VcleS64N, uint64_t, int64_t, vcle_s64, vst1_u64, vld1_s64, 1, 1) LOOP2(VcleU8N, uint8_t, uint8_t, vcle_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VcleU16N, uint16_t, uint16_t, vcle_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VcleU32N, uint32_t, uint32_t, vcle_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VcleU64N, uint64_t, uint64_t, vcle_u64, vst1_u64, vld1_u64, 1, 1) LOOP2(VcleF32N, uint32_t, float32_t, vcle_f32, vst1_u32, vld1_f32, 2, 2) LOOP2(VcleF64N, uint64_t, float64_t, vcle_f64, vst1_u64, vld1_f64, 1, 1) LOOP2(VcledS64N, uint64_t, int64_t, vcled_s64, save, load, 1, 1) LOOP2(VcledU64N, uint64_t, uint64_t, vcled_u64, save, load, 1, 1) LOOP2(VcledF64N, uint64_t, float64_t, vcled_f64, save, load, 1, 1) LOOP2(VcleqS8N, uint8_t, int8_t, vcleq_s8, vst1q_u8, vld1q_s8, 16, 16) LOOP2(VcleqS16N, uint16_t, int16_t, vcleq_s16, vst1q_u16, vld1q_s16, 8, 8) LOOP2(VcleqS32N, uint32_t, int32_t, vcleq_s32, vst1q_u32, vld1q_s32, 4, 4) LOOP2(VcleqS64N, uint64_t, int64_t, vcleq_s64, vst1q_u64, vld1q_s64, 2, 2) LOOP2(VcleqU8N, uint8_t, uint8_t, vcleq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VcleqU16N, uint16_t, uint16_t, vcleq_u16, vst1q_u16, vld1q_u16, 8, 
8) LOOP2(VcleqU32N, uint32_t, uint32_t, vcleq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VcleqU64N, uint64_t, uint64_t, vcleq_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(VcleqF32N, uint32_t, float32_t, vcleq_f32, vst1q_u32, vld1q_f32, 4, 4) LOOP2(VcleqF64N, uint64_t, float64_t, vcleq_f64, vst1q_u64, vld1q_f64, 2, 2) LOOP2(VclesF32N, uint32_t, float32_t, vcles_f32, save, load, 1, 1) LOOP2(VcltS8N, uint8_t, int8_t, vclt_s8, vst1_u8, vld1_s8, 8, 8) LOOP2(VcltS16N, uint16_t, int16_t, vclt_s16, vst1_u16, vld1_s16, 4, 4) LOOP2(VcltS32N, uint32_t, int32_t, vclt_s32, vst1_u32, vld1_s32, 2, 2) LOOP2(VcltS64N, uint64_t, int64_t, vclt_s64, vst1_u64, vld1_s64, 1, 1) LOOP2(VcltU8N, uint8_t, uint8_t, vclt_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VcltU16N, uint16_t, uint16_t, vclt_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VcltU32N, uint32_t, uint32_t, vclt_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VcltU64N, uint64_t, uint64_t, vclt_u64, vst1_u64, vld1_u64, 1, 1) LOOP2(VcltF32N, uint32_t, float32_t, vclt_f32, vst1_u32, vld1_f32, 2, 2) LOOP2(VcltF64N, uint64_t, float64_t, vclt_f64, vst1_u64, vld1_f64, 1, 1) LOOP2(VcltdS64N, uint64_t, int64_t, vcltd_s64, save, load, 1, 1) LOOP2(VcltdU64N, uint64_t, uint64_t, vcltd_u64, save, load, 1, 1) LOOP2(VcltdF64N, uint64_t, float64_t, vcltd_f64, save, load, 1, 1) LOOP2(VcltqS8N, uint8_t, int8_t, vcltq_s8, vst1q_u8, vld1q_s8, 16, 16) LOOP2(VcltqS16N, uint16_t, int16_t, vcltq_s16, vst1q_u16, vld1q_s16, 8, 8) LOOP2(VcltqS32N, uint32_t, int32_t, vcltq_s32, vst1q_u32, vld1q_s32, 4, 4) LOOP2(VcltqS64N, uint64_t, int64_t, vcltq_s64, vst1q_u64, vld1q_s64, 2, 2) LOOP2(VcltqU8N, uint8_t, uint8_t, vcltq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VcltqU16N, uint16_t, uint16_t, vcltq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VcltqU32N, uint32_t, uint32_t, vcltq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VcltqU64N, uint64_t, uint64_t, vcltq_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(VcltqF32N, uint32_t, float32_t, vcltq_f32, vst1q_u32, vld1q_f32, 4, 4) LOOP2(VcltqF64N, uint64_t, float64_t, 
vcltq_f64, vst1q_u64, vld1q_f64, 2, 2) LOOP2(VcltsF32N, uint32_t, float32_t, vclts_f32, save, load, 1, 1) LOOP2(VcombineS8N, int8_t, int8_t, vcombine_s8, vst1q_s8, vld1_s8, 16, 8) LOOP2(VcombineS16N, int16_t, int16_t, vcombine_s16, vst1q_s16, vld1_s16, 8, 4) LOOP2(VcombineS32N, int32_t, int32_t, vcombine_s32, vst1q_s32, vld1_s32, 4, 2) LOOP2(VcombineS64N, int64_t, int64_t, vcombine_s64, vst1q_s64, vld1_s64, 2, 1) LOOP2(VcombineU8N, uint8_t, uint8_t, vcombine_u8, vst1q_u8, vld1_u8, 16, 8) LOOP2(VcombineU16N, uint16_t, uint16_t, vcombine_u16, vst1q_u16, vld1_u16, 8, 4) LOOP2(VcombineU32N, uint32_t, uint32_t, vcombine_u32, vst1q_u32, vld1_u32, 4, 2) LOOP2(VcombineU64N, uint64_t, uint64_t, vcombine_u64, vst1q_u64, vld1_u64, 2, 1) LOOP2(VcombineF32N, float32_t, float32_t, vcombine_f32, vst1q_f32, vld1_f32, 4, 2) LOOP2(VcombineF64N, float64_t, float64_t, vcombine_f64, vst1q_f64, vld1_f64, 2, 1) LOOP2(VdivF32N, float32_t, float32_t, vdiv_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(VdivF64N, float64_t, float64_t, vdiv_f64, vst1_f64, vld1_f64, 1, 1) LOOP2(VdivqF32N, float32_t, float32_t, vdivq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(VdivqF64N, float64_t, float64_t, vdivq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(VeorS8N, int8_t, int8_t, veor_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VeorS16N, int16_t, int16_t, veor_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VeorS32N, int32_t, int32_t, veor_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VeorS64N, int64_t, int64_t, veor_s64, vst1_s64, vld1_s64, 1, 1) LOOP2(VeorU8N, uint8_t, uint8_t, veor_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VeorU16N, uint16_t, uint16_t, veor_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VeorU32N, uint32_t, uint32_t, veor_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VeorU64N, uint64_t, uint64_t, veor_u64, vst1_u64, vld1_u64, 1, 1) LOOP2(VeorqS8N, int8_t, int8_t, veorq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VeorqS16N, int16_t, int16_t, veorq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VeorqS32N, int32_t, int32_t, veorq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VeorqS64N, 
int64_t, int64_t, veorq_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP2(VeorqU8N, uint8_t, uint8_t, veorq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VeorqU16N, uint16_t, uint16_t, veorq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VeorqU32N, uint32_t, uint32_t, veorq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VeorqU64N, uint64_t, uint64_t, veorq_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(VhaddS8N, int8_t, int8_t, vhadd_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VhaddS16N, int16_t, int16_t, vhadd_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VhaddS32N, int32_t, int32_t, vhadd_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VhaddU8N, uint8_t, uint8_t, vhadd_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VhaddU16N, uint16_t, uint16_t, vhadd_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VhaddU32N, uint32_t, uint32_t, vhadd_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VhaddqS8N, int8_t, int8_t, vhaddq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VhaddqS16N, int16_t, int16_t, vhaddq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VhaddqS32N, int32_t, int32_t, vhaddq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VhaddqU8N, uint8_t, uint8_t, vhaddq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VhaddqU16N, uint16_t, uint16_t, vhaddq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VhaddqU32N, uint32_t, uint32_t, vhaddq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VhsubS8N, int8_t, int8_t, vhsub_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VhsubS16N, int16_t, int16_t, vhsub_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VhsubS32N, int32_t, int32_t, vhsub_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VhsubU8N, uint8_t, uint8_t, vhsub_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VhsubU16N, uint16_t, uint16_t, vhsub_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VhsubU32N, uint32_t, uint32_t, vhsub_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VhsubqS8N, int8_t, int8_t, vhsubq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VhsubqS16N, int16_t, int16_t, vhsubq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VhsubqS32N, int32_t, int32_t, vhsubq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VhsubqU8N, uint8_t, uint8_t, vhsubq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VhsubqU16N, uint16_t, uint16_t, 
vhsubq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VhsubqU32N, uint32_t, uint32_t, vhsubq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VmaxS8N, int8_t, int8_t, vmax_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VmaxS16N, int16_t, int16_t, vmax_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VmaxS32N, int32_t, int32_t, vmax_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VmaxU8N, uint8_t, uint8_t, vmax_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VmaxU16N, uint16_t, uint16_t, vmax_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VmaxU32N, uint32_t, uint32_t, vmax_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VmaxF32N, float32_t, float32_t, vmax_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(VmaxF64N, float64_t, float64_t, vmax_f64, vst1_f64, vld1_f64, 1, 1) LOOP2(VmaxnmF32N, float32_t, float32_t, vmaxnm_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(VmaxnmF64N, float64_t, float64_t, vmaxnm_f64, vst1_f64, vld1_f64, 1, 1) LOOP2(VmaxnmqF32N, float32_t, float32_t, vmaxnmq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(VmaxnmqF64N, float64_t, float64_t, vmaxnmq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(VmaxqS8N, int8_t, int8_t, vmaxq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VmaxqS16N, int16_t, int16_t, vmaxq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VmaxqS32N, int32_t, int32_t, vmaxq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VmaxqU8N, uint8_t, uint8_t, vmaxq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VmaxqU16N, uint16_t, uint16_t, vmaxq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VmaxqU32N, uint32_t, uint32_t, vmaxq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VmaxqF32N, float32_t, float32_t, vmaxq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(VmaxqF64N, float64_t, float64_t, vmaxq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(VminS8N, int8_t, int8_t, vmin_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VminS16N, int16_t, int16_t, vmin_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VminS32N, int32_t, int32_t, vmin_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VminU8N, uint8_t, uint8_t, vmin_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VminU16N, uint16_t, uint16_t, vmin_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VminU32N, uint32_t, uint32_t, vmin_u32, vst1_u32, 
vld1_u32, 2, 2) LOOP2(VminF32N, float32_t, float32_t, vmin_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(VminF64N, float64_t, float64_t, vmin_f64, vst1_f64, vld1_f64, 1, 1) LOOP2(VminnmF32N, float32_t, float32_t, vminnm_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(VminnmF64N, float64_t, float64_t, vminnm_f64, vst1_f64, vld1_f64, 1, 1) LOOP2(VminnmqF32N, float32_t, float32_t, vminnmq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(VminnmqF64N, float64_t, float64_t, vminnmq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(VminqS8N, int8_t, int8_t, vminq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VminqS16N, int16_t, int16_t, vminq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VminqS32N, int32_t, int32_t, vminq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VminqU8N, uint8_t, uint8_t, vminq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VminqU16N, uint16_t, uint16_t, vminq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VminqU32N, uint32_t, uint32_t, vminq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VminqF32N, float32_t, float32_t, vminq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(VminqF64N, float64_t, float64_t, vminq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(VmulS8N, int8_t, int8_t, vmul_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VmulS16N, int16_t, int16_t, vmul_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VmulS32N, int32_t, int32_t, vmul_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VmulU8N, uint8_t, uint8_t, vmul_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VmulU16N, uint16_t, uint16_t, vmul_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VmulU32N, uint32_t, uint32_t, vmul_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VmulF32N, float32_t, float32_t, vmul_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(VmulF64N, float64_t, float64_t, vmul_f64, vst1_f64, vld1_f64, 1, 1) LOOP2(VmulqS8N, int8_t, int8_t, vmulq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VmulqS16N, int16_t, int16_t, vmulq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VmulqS32N, int32_t, int32_t, vmulq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VmulqU8N, uint8_t, uint8_t, vmulq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VmulqU16N, uint16_t, uint16_t, vmulq_u16, vst1q_u16, 
vld1q_u16, 8, 8) LOOP2(VmulqU32N, uint32_t, uint32_t, vmulq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VmulqF32N, float32_t, float32_t, vmulq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(VmulqF64N, float64_t, float64_t, vmulq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(VmulxF32N, float32_t, float32_t, vmulx_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(VmulxF64N, float64_t, float64_t, vmulx_f64, vst1_f64, vld1_f64, 1, 1) LOOP2(VmulxdF64N, float64_t, float64_t, vmulxd_f64, save, load, 1, 1) LOOP2(VmulxqF32N, float32_t, float32_t, vmulxq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(VmulxqF64N, float64_t, float64_t, vmulxq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(VmulxsF32N, float32_t, float32_t, vmulxs_f32, save, load, 1, 1) LOOP2(VornS8N, int8_t, int8_t, vorn_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VornS16N, int16_t, int16_t, vorn_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VornS32N, int32_t, int32_t, vorn_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VornS64N, int64_t, int64_t, vorn_s64, vst1_s64, vld1_s64, 1, 1) LOOP2(VornU8N, uint8_t, uint8_t, vorn_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VornU16N, uint16_t, uint16_t, vorn_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VornU32N, uint32_t, uint32_t, vorn_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VornU64N, uint64_t, uint64_t, vorn_u64, vst1_u64, vld1_u64, 1, 1) LOOP2(VornqS8N, int8_t, int8_t, vornq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VornqS16N, int16_t, int16_t, vornq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VornqS32N, int32_t, int32_t, vornq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VornqS64N, int64_t, int64_t, vornq_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP2(VornqU8N, uint8_t, uint8_t, vornq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VornqU16N, uint16_t, uint16_t, vornq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VornqU32N, uint32_t, uint32_t, vornq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VornqU64N, uint64_t, uint64_t, vornq_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(VorrS8N, int8_t, int8_t, vorr_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VorrS16N, int16_t, int16_t, vorr_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VorrS32N, 
int32_t, int32_t, vorr_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VorrS64N, int64_t, int64_t, vorr_s64, vst1_s64, vld1_s64, 1, 1) LOOP2(VorrU8N, uint8_t, uint8_t, vorr_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VorrU16N, uint16_t, uint16_t, vorr_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VorrU32N, uint32_t, uint32_t, vorr_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VorrU64N, uint64_t, uint64_t, vorr_u64, vst1_u64, vld1_u64, 1, 1) LOOP2(VorrqS8N, int8_t, int8_t, vorrq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VorrqS16N, int16_t, int16_t, vorrq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VorrqS32N, int32_t, int32_t, vorrq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VorrqS64N, int64_t, int64_t, vorrq_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP2(VorrqU8N, uint8_t, uint8_t, vorrq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VorrqU16N, uint16_t, uint16_t, vorrq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VorrqU32N, uint32_t, uint32_t, vorrq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VorrqU64N, uint64_t, uint64_t, vorrq_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(VpaddS8N, int8_t, int8_t, vpadd_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VpaddS16N, int16_t, int16_t, vpadd_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VpaddS32N, int32_t, int32_t, vpadd_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VpaddU8N, uint8_t, uint8_t, vpadd_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VpaddU16N, uint16_t, uint16_t, vpadd_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VpaddU32N, uint32_t, uint32_t, vpadd_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VpaddF32N, float32_t, float32_t, vpadd_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(VpaddqS8N, int8_t, int8_t, vpaddq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VpaddqS16N, int16_t, int16_t, vpaddq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VpaddqS32N, int32_t, int32_t, vpaddq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VpaddqS64N, int64_t, int64_t, vpaddq_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP2(VpaddqU8N, uint8_t, uint8_t, vpaddq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VpaddqU16N, uint16_t, uint16_t, vpaddq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VpaddqU32N, uint32_t, uint32_t, 
vpaddq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VpaddqU64N, uint64_t, uint64_t, vpaddq_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(VpaddqF32N, float32_t, float32_t, vpaddq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(VpaddqF64N, float64_t, float64_t, vpaddq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(VpmaxS8N, int8_t, int8_t, vpmax_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VpmaxS16N, int16_t, int16_t, vpmax_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VpmaxS32N, int32_t, int32_t, vpmax_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VpmaxU8N, uint8_t, uint8_t, vpmax_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VpmaxU16N, uint16_t, uint16_t, vpmax_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VpmaxU32N, uint32_t, uint32_t, vpmax_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VpmaxF32N, float32_t, float32_t, vpmax_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(VpmaxnmF32N, float32_t, float32_t, vpmaxnm_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(VpmaxnmqF32N, float32_t, float32_t, vpmaxnmq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(VpmaxnmqF64N, float64_t, float64_t, vpmaxnmq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(VpmaxqS8N, int8_t, int8_t, vpmaxq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VpmaxqS16N, int16_t, int16_t, vpmaxq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VpmaxqS32N, int32_t, int32_t, vpmaxq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VpmaxqU8N, uint8_t, uint8_t, vpmaxq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VpmaxqU16N, uint16_t, uint16_t, vpmaxq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VpmaxqU32N, uint32_t, uint32_t, vpmaxq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VpmaxqF32N, float32_t, float32_t, vpmaxq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(VpmaxqF64N, float64_t, float64_t, vpmaxq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(VpminS8N, int8_t, int8_t, vpmin_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VpminS16N, int16_t, int16_t, vpmin_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VpminS32N, int32_t, int32_t, vpmin_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VpminU8N, uint8_t, uint8_t, vpmin_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VpminU16N, uint16_t, uint16_t, vpmin_u16, vst1_u16, vld1_u16, 4, 4) 
LOOP2(VpminU32N, uint32_t, uint32_t, vpmin_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VpminF32N, float32_t, float32_t, vpmin_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(VpminnmF32N, float32_t, float32_t, vpminnm_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(VpminnmqF32N, float32_t, float32_t, vpminnmq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(VpminnmqF64N, float64_t, float64_t, vpminnmq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(VpminqS8N, int8_t, int8_t, vpminq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VpminqS16N, int16_t, int16_t, vpminq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VpminqS32N, int32_t, int32_t, vpminq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VpminqU8N, uint8_t, uint8_t, vpminq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VpminqU16N, uint16_t, uint16_t, vpminq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VpminqU32N, uint32_t, uint32_t, vpminq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VpminqF32N, float32_t, float32_t, vpminq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(VpminqF64N, float64_t, float64_t, vpminq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(VqaddS8N, int8_t, int8_t, vqadd_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VqaddS16N, int16_t, int16_t, vqadd_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VqaddS32N, int32_t, int32_t, vqadd_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VqaddS64N, int64_t, int64_t, vqadd_s64, vst1_s64, vld1_s64, 1, 1) LOOP2(VqaddU8N, uint8_t, uint8_t, vqadd_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VqaddU16N, uint16_t, uint16_t, vqadd_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VqaddU32N, uint32_t, uint32_t, vqadd_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VqaddU64N, uint64_t, uint64_t, vqadd_u64, vst1_u64, vld1_u64, 1, 1) LOOP2(VqaddbS8N, int8_t, int8_t, vqaddb_s8, save, load, 1, 1) LOOP2(VqaddbU8N, uint8_t, uint8_t, vqaddb_u8, save, load, 1, 1) LOOP2(VqadddS64N, int64_t, int64_t, vqaddd_s64, save, load, 1, 1) LOOP2(VqadddU64N, uint64_t, uint64_t, vqaddd_u64, save, load, 1, 1) LOOP2(VqaddhS16N, int16_t, int16_t, vqaddh_s16, save, load, 1, 1) LOOP2(VqaddhU16N, uint16_t, uint16_t, vqaddh_u16, save, load, 1, 1) LOOP2(VqaddqS8N, 
int8_t, int8_t, vqaddq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VqaddqS16N, int16_t, int16_t, vqaddq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VqaddqS32N, int32_t, int32_t, vqaddq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VqaddqS64N, int64_t, int64_t, vqaddq_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP2(VqaddqU8N, uint8_t, uint8_t, vqaddq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VqaddqU16N, uint16_t, uint16_t, vqaddq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VqaddqU32N, uint32_t, uint32_t, vqaddq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VqaddqU64N, uint64_t, uint64_t, vqaddq_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(VqaddsS32N, int32_t, int32_t, vqadds_s32, save, load, 1, 1) LOOP2(VqaddsU32N, uint32_t, uint32_t, vqadds_u32, save, load, 1, 1) LOOP2(VqdmulhS16N, int16_t, int16_t, vqdmulh_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VqdmulhS32N, int32_t, int32_t, vqdmulh_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VqdmulhhS16N, int16_t, int16_t, vqdmulhh_s16, save, load, 1, 1) LOOP2(VqdmulhqS16N, int16_t, int16_t, vqdmulhq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VqdmulhqS32N, int32_t, int32_t, vqdmulhq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VqdmulhsS32N, int32_t, int32_t, vqdmulhs_s32, save, load, 1, 1) LOOP2(VqrdmulhS16N, int16_t, int16_t, vqrdmulh_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VqrdmulhS32N, int32_t, int32_t, vqrdmulh_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VqrdmulhhS16N, int16_t, int16_t, vqrdmulhh_s16, save, load, 1, 1) LOOP2(VqrdmulhqS16N, int16_t, int16_t, vqrdmulhq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VqrdmulhqS32N, int32_t, int32_t, vqrdmulhq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VqrdmulhsS32N, int32_t, int32_t, vqrdmulhs_s32, save, load, 1, 1) LOOP2(VqrshlS8N, int8_t, int8_t, vqrshl_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VqrshlS16N, int16_t, int16_t, vqrshl_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VqrshlS32N, int32_t, int32_t, vqrshl_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VqrshlS64N, int64_t, int64_t, vqrshl_s64, vst1_s64, vld1_s64, 1, 1) LOOP2(VqrshlbS8N, int8_t, int8_t, vqrshlb_s8, save, load, 1, 1) 
LOOP2(VqrshldS64N, int64_t, int64_t, vqrshld_s64, save, load, 1, 1) LOOP2(VqrshlhS16N, int16_t, int16_t, vqrshlh_s16, save, load, 1, 1) LOOP2(VqrshlqS8N, int8_t, int8_t, vqrshlq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VqrshlqS16N, int16_t, int16_t, vqrshlq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VqrshlqS32N, int32_t, int32_t, vqrshlq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VqrshlqS64N, int64_t, int64_t, vqrshlq_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP2(VqrshlsS32N, int32_t, int32_t, vqrshls_s32, save, load, 1, 1) LOOP2(VqshlS8N, int8_t, int8_t, vqshl_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VqshlS16N, int16_t, int16_t, vqshl_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VqshlS32N, int32_t, int32_t, vqshl_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VqshlS64N, int64_t, int64_t, vqshl_s64, vst1_s64, vld1_s64, 1, 1) LOOP2(VqshlbS8N, int8_t, int8_t, vqshlb_s8, save, load, 1, 1) LOOP2(VqshldS64N, int64_t, int64_t, vqshld_s64, save, load, 1, 1) LOOP2(VqshlhS16N, int16_t, int16_t, vqshlh_s16, save, load, 1, 1) LOOP2(VqshlqS8N, int8_t, int8_t, vqshlq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VqshlqS16N, int16_t, int16_t, vqshlq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VqshlqS32N, int32_t, int32_t, vqshlq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VqshlqS64N, int64_t, int64_t, vqshlq_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP2(VqshlsS32N, int32_t, int32_t, vqshls_s32, save, load, 1, 1) LOOP2(VqsubS8N, int8_t, int8_t, vqsub_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VqsubS16N, int16_t, int16_t, vqsub_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VqsubS32N, int32_t, int32_t, vqsub_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VqsubS64N, int64_t, int64_t, vqsub_s64, vst1_s64, vld1_s64, 1, 1) LOOP2(VqsubU8N, uint8_t, uint8_t, vqsub_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VqsubU16N, uint16_t, uint16_t, vqsub_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VqsubU32N, uint32_t, uint32_t, vqsub_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VqsubU64N, uint64_t, uint64_t, vqsub_u64, vst1_u64, vld1_u64, 1, 1) LOOP2(VqsubbS8N, int8_t, int8_t, vqsubb_s8, save, load, 1, 1) 
LOOP2(VqsubbU8N, uint8_t, uint8_t, vqsubb_u8, save, load, 1, 1) LOOP2(VqsubdS64N, int64_t, int64_t, vqsubd_s64, save, load, 1, 1) LOOP2(VqsubdU64N, uint64_t, uint64_t, vqsubd_u64, save, load, 1, 1) LOOP2(VqsubhS16N, int16_t, int16_t, vqsubh_s16, save, load, 1, 1) LOOP2(VqsubhU16N, uint16_t, uint16_t, vqsubh_u16, save, load, 1, 1) LOOP2(VqsubqS8N, int8_t, int8_t, vqsubq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VqsubqS16N, int16_t, int16_t, vqsubq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VqsubqS32N, int32_t, int32_t, vqsubq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VqsubqS64N, int64_t, int64_t, vqsubq_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP2(VqsubqU8N, uint8_t, uint8_t, vqsubq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VqsubqU16N, uint16_t, uint16_t, vqsubq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VqsubqU32N, uint32_t, uint32_t, vqsubq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VqsubqU64N, uint64_t, uint64_t, vqsubq_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(VqsubsS32N, int32_t, int32_t, vqsubs_s32, save, load, 1, 1) LOOP2(VqsubsU32N, uint32_t, uint32_t, vqsubs_u32, save, load, 1, 1) LOOP2(Vqtbl1QU8N, uint8_t, uint8_t, vqtbl1q_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(Vrax1QU64N, uint64_t, uint64_t, vrax1q_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(VrecpsF32N, float32_t, float32_t, vrecps_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(VrecpsF64N, float64_t, float64_t, vrecps_f64, vst1_f64, vld1_f64, 1, 1) LOOP2(VrecpsdF64N, float64_t, float64_t, vrecpsd_f64, save, load, 1, 1) LOOP2(VrecpsqF32N, float32_t, float32_t, vrecpsq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(VrecpsqF64N, float64_t, float64_t, vrecpsq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(VrecpssF32N, float32_t, float32_t, vrecpss_f32, save, load, 1, 1) LOOP2(VrhaddS8N, int8_t, int8_t, vrhadd_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VrhaddS16N, int16_t, int16_t, vrhadd_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VrhaddS32N, int32_t, int32_t, vrhadd_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VrhaddU8N, uint8_t, uint8_t, vrhadd_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VrhaddU16N, 
uint16_t, uint16_t, vrhadd_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VrhaddU32N, uint32_t, uint32_t, vrhadd_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VrhaddqS8N, int8_t, int8_t, vrhaddq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VrhaddqS16N, int16_t, int16_t, vrhaddq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VrhaddqS32N, int32_t, int32_t, vrhaddq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VrhaddqU8N, uint8_t, uint8_t, vrhaddq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VrhaddqU16N, uint16_t, uint16_t, vrhaddq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VrhaddqU32N, uint32_t, uint32_t, vrhaddq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VrshlS8N, int8_t, int8_t, vrshl_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VrshlS16N, int16_t, int16_t, vrshl_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VrshlS32N, int32_t, int32_t, vrshl_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VrshlS64N, int64_t, int64_t, vrshl_s64, vst1_s64, vld1_s64, 1, 1) LOOP2(VrshldS64N, int64_t, int64_t, vrshld_s64, save, load, 1, 1) LOOP2(VrshlqS8N, int8_t, int8_t, vrshlq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VrshlqS16N, int16_t, int16_t, vrshlq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VrshlqS32N, int32_t, int32_t, vrshlq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VrshlqS64N, int64_t, int64_t, vrshlq_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP2(VrsqrtsF32N, float32_t, float32_t, vrsqrts_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(VrsqrtsF64N, float64_t, float64_t, vrsqrts_f64, vst1_f64, vld1_f64, 1, 1) LOOP2(VrsqrtsdF64N, float64_t, float64_t, vrsqrtsd_f64, save, load, 1, 1) LOOP2(VrsqrtsqF32N, float32_t, float32_t, vrsqrtsq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(VrsqrtsqF64N, float64_t, float64_t, vrsqrtsq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(VrsqrtssF32N, float32_t, float32_t, vrsqrtss_f32, save, load, 1, 1) LOOP2(Vsha1Su1QU32N, uint32_t, uint32_t, vsha1su1q_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(Vsha256Su0QU32N, uint32_t, uint32_t, vsha256su0q_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(Vsha512Su0QU64N, uint64_t, uint64_t, vsha512su0q_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(VshlS8N, 
int8_t, int8_t, vshl_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VshlS16N, int16_t, int16_t, vshl_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VshlS32N, int32_t, int32_t, vshl_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VshlS64N, int64_t, int64_t, vshl_s64, vst1_s64, vld1_s64, 1, 1) LOOP2(VshldS64N, int64_t, int64_t, vshld_s64, save, load, 1, 1) LOOP2(VshlqS8N, int8_t, int8_t, vshlq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VshlqS16N, int16_t, int16_t, vshlq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VshlqS32N, int32_t, int32_t, vshlq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VshlqS64N, int64_t, int64_t, vshlq_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP2(Vsm4EkeyqU32N, uint32_t, uint32_t, vsm4ekeyq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(Vsm4EqU32N, uint32_t, uint32_t, vsm4eq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VsubS8N, int8_t, int8_t, vsub_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(VsubS16N, int16_t, int16_t, vsub_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(VsubS32N, int32_t, int32_t, vsub_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(VsubS64N, int64_t, int64_t, vsub_s64, vst1_s64, vld1_s64, 1, 1) LOOP2(VsubU8N, uint8_t, uint8_t, vsub_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VsubU16N, uint16_t, uint16_t, vsub_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VsubU32N, uint32_t, uint32_t, vsub_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VsubU64N, uint64_t, uint64_t, vsub_u64, vst1_u64, vld1_u64, 1, 1) LOOP2(VsubF32N, float32_t, float32_t, vsub_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(VsubF64N, float64_t, float64_t, vsub_f64, vst1_f64, vld1_f64, 1, 1) LOOP2(VsubdS64N, int64_t, int64_t, vsubd_s64, save, load, 1, 1) LOOP2(VsubdU64N, uint64_t, uint64_t, vsubd_u64, save, load, 1, 1) LOOP2(VsubqS8N, int8_t, int8_t, vsubq_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(VsubqS16N, int16_t, int16_t, vsubq_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(VsubqS32N, int32_t, int32_t, vsubq_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(VsubqS64N, int64_t, int64_t, vsubq_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP2(VsubqU8N, uint8_t, uint8_t, vsubq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VsubqU16N, 
uint16_t, uint16_t, vsubq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VsubqU32N, uint32_t, uint32_t, vsubq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VsubqU64N, uint64_t, uint64_t, vsubq_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(VsubqF32N, float32_t, float32_t, vsubq_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(VsubqF64N, float64_t, float64_t, vsubq_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(Vtbl1S8N, int8_t, int8_t, vtbl1_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(Vtbl1U8N, uint8_t, uint8_t, vtbl1_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(Vtrn1S8N, int8_t, int8_t, vtrn1_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(Vtrn1S16N, int16_t, int16_t, vtrn1_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(Vtrn1S32N, int32_t, int32_t, vtrn1_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(Vtrn1U8N, uint8_t, uint8_t, vtrn1_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(Vtrn1U16N, uint16_t, uint16_t, vtrn1_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(Vtrn1U32N, uint32_t, uint32_t, vtrn1_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(Vtrn1F32N, float32_t, float32_t, vtrn1_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(Vtrn1QS8N, int8_t, int8_t, vtrn1q_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(Vtrn1QS16N, int16_t, int16_t, vtrn1q_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(Vtrn1QS32N, int32_t, int32_t, vtrn1q_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(Vtrn1QS64N, int64_t, int64_t, vtrn1q_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP2(Vtrn1QU8N, uint8_t, uint8_t, vtrn1q_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(Vtrn1QU16N, uint16_t, uint16_t, vtrn1q_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(Vtrn1QU32N, uint32_t, uint32_t, vtrn1q_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(Vtrn1QU64N, uint64_t, uint64_t, vtrn1q_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(Vtrn1QF32N, float32_t, float32_t, vtrn1q_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(Vtrn1QF64N, float64_t, float64_t, vtrn1q_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(Vtrn2S8N, int8_t, int8_t, vtrn2_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(Vtrn2S16N, int16_t, int16_t, vtrn2_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(Vtrn2S32N, int32_t, int32_t, vtrn2_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(Vtrn2U8N, 
uint8_t, uint8_t, vtrn2_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(Vtrn2U16N, uint16_t, uint16_t, vtrn2_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(Vtrn2U32N, uint32_t, uint32_t, vtrn2_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(Vtrn2F32N, float32_t, float32_t, vtrn2_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(Vtrn2QS8N, int8_t, int8_t, vtrn2q_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(Vtrn2QS16N, int16_t, int16_t, vtrn2q_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(Vtrn2QS32N, int32_t, int32_t, vtrn2q_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(Vtrn2QS64N, int64_t, int64_t, vtrn2q_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP2(Vtrn2QU8N, uint8_t, uint8_t, vtrn2q_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(Vtrn2QU16N, uint16_t, uint16_t, vtrn2q_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(Vtrn2QU32N, uint32_t, uint32_t, vtrn2q_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(Vtrn2QU64N, uint64_t, uint64_t, vtrn2q_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(Vtrn2QF32N, float32_t, float32_t, vtrn2q_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(Vtrn2QF64N, float64_t, float64_t, vtrn2q_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(VtstS8N, uint8_t, int8_t, vtst_s8, vst1_u8, vld1_s8, 8, 8) LOOP2(VtstS16N, uint16_t, int16_t, vtst_s16, vst1_u16, vld1_s16, 4, 4) LOOP2(VtstS32N, uint32_t, int32_t, vtst_s32, vst1_u32, vld1_s32, 2, 2) LOOP2(VtstS64N, uint64_t, int64_t, vtst_s64, vst1_u64, vld1_s64, 1, 1) LOOP2(VtstU8N, uint8_t, uint8_t, vtst_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(VtstU16N, uint16_t, uint16_t, vtst_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(VtstU32N, uint32_t, uint32_t, vtst_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(VtstU64N, uint64_t, uint64_t, vtst_u64, vst1_u64, vld1_u64, 1, 1) LOOP2(VtstdS64N, uint64_t, int64_t, vtstd_s64, save, load, 1, 1) LOOP2(VtstdU64N, uint64_t, uint64_t, vtstd_u64, save, load, 1, 1) LOOP2(VtstqS8N, uint8_t, int8_t, vtstq_s8, vst1q_u8, vld1q_s8, 16, 16) LOOP2(VtstqS16N, uint16_t, int16_t, vtstq_s16, vst1q_u16, vld1q_s16, 8, 8) LOOP2(VtstqS32N, uint32_t, int32_t, vtstq_s32, vst1q_u32, vld1q_s32, 4, 4) LOOP2(VtstqS64N, uint64_t, int64_t, 
vtstq_s64, vst1q_u64, vld1q_s64, 2, 2) LOOP2(VtstqU8N, uint8_t, uint8_t, vtstq_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(VtstqU16N, uint16_t, uint16_t, vtstq_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(VtstqU32N, uint32_t, uint32_t, vtstq_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(VtstqU64N, uint64_t, uint64_t, vtstq_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(Vuzp1S8N, int8_t, int8_t, vuzp1_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(Vuzp1S16N, int16_t, int16_t, vuzp1_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(Vuzp1S32N, int32_t, int32_t, vuzp1_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(Vuzp1U8N, uint8_t, uint8_t, vuzp1_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(Vuzp1U16N, uint16_t, uint16_t, vuzp1_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(Vuzp1U32N, uint32_t, uint32_t, vuzp1_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(Vuzp1F32N, float32_t, float32_t, vuzp1_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(Vuzp1QS8N, int8_t, int8_t, vuzp1q_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(Vuzp1QS16N, int16_t, int16_t, vuzp1q_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(Vuzp1QS32N, int32_t, int32_t, vuzp1q_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(Vuzp1QS64N, int64_t, int64_t, vuzp1q_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP2(Vuzp1QU8N, uint8_t, uint8_t, vuzp1q_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(Vuzp1QU16N, uint16_t, uint16_t, vuzp1q_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(Vuzp1QU32N, uint32_t, uint32_t, vuzp1q_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(Vuzp1QU64N, uint64_t, uint64_t, vuzp1q_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(Vuzp1QF32N, float32_t, float32_t, vuzp1q_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(Vuzp1QF64N, float64_t, float64_t, vuzp1q_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(Vuzp2S8N, int8_t, int8_t, vuzp2_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(Vuzp2S16N, int16_t, int16_t, vuzp2_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(Vuzp2S32N, int32_t, int32_t, vuzp2_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(Vuzp2U8N, uint8_t, uint8_t, vuzp2_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(Vuzp2U16N, uint16_t, uint16_t, vuzp2_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(Vuzp2U32N, uint32_t, uint32_t, 
vuzp2_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(Vuzp2F32N, float32_t, float32_t, vuzp2_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(Vuzp2QS8N, int8_t, int8_t, vuzp2q_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(Vuzp2QS16N, int16_t, int16_t, vuzp2q_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(Vuzp2QS32N, int32_t, int32_t, vuzp2q_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(Vuzp2QS64N, int64_t, int64_t, vuzp2q_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP2(Vuzp2QU8N, uint8_t, uint8_t, vuzp2q_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(Vuzp2QU16N, uint16_t, uint16_t, vuzp2q_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(Vuzp2QU32N, uint32_t, uint32_t, vuzp2q_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(Vuzp2QU64N, uint64_t, uint64_t, vuzp2q_u64, vst1q_u64, vld1q_u64, 2, 2) LOOP2(Vuzp2QF32N, float32_t, float32_t, vuzp2q_f32, vst1q_f32, vld1q_f32, 4, 4) LOOP2(Vuzp2QF64N, float64_t, float64_t, vuzp2q_f64, vst1q_f64, vld1q_f64, 2, 2) LOOP2(Vzip1S8N, int8_t, int8_t, vzip1_s8, vst1_s8, vld1_s8, 8, 8) LOOP2(Vzip1S16N, int16_t, int16_t, vzip1_s16, vst1_s16, vld1_s16, 4, 4) LOOP2(Vzip1S32N, int32_t, int32_t, vzip1_s32, vst1_s32, vld1_s32, 2, 2) LOOP2(Vzip1U8N, uint8_t, uint8_t, vzip1_u8, vst1_u8, vld1_u8, 8, 8) LOOP2(Vzip1U16N, uint16_t, uint16_t, vzip1_u16, vst1_u16, vld1_u16, 4, 4) LOOP2(Vzip1U32N, uint32_t, uint32_t, vzip1_u32, vst1_u32, vld1_u32, 2, 2) LOOP2(Vzip1F32N, float32_t, float32_t, vzip1_f32, vst1_f32, vld1_f32, 2, 2) LOOP2(Vzip1QS8N, int8_t, int8_t, vzip1q_s8, vst1q_s8, vld1q_s8, 16, 16) LOOP2(Vzip1QS16N, int16_t, int16_t, vzip1q_s16, vst1q_s16, vld1q_s16, 8, 8) LOOP2(Vzip1QS32N, int32_t, int32_t, vzip1q_s32, vst1q_s32, vld1q_s32, 4, 4) LOOP2(Vzip1QS64N, int64_t, int64_t, vzip1q_s64, vst1q_s64, vld1q_s64, 2, 2) LOOP2(Vzip1QU8N, uint8_t, uint8_t, vzip1q_u8, vst1q_u8, vld1q_u8, 16, 16) LOOP2(Vzip1QU16N, uint16_t, uint16_t, vzip1q_u16, vst1q_u16, vld1q_u16, 8, 8) LOOP2(Vzip1QU32N, uint32_t, uint32_t, vzip1q_u32, vst1q_u32, vld1q_u32, 4, 4) LOOP2(Vzip1QU64N, uint64_t, uint64_t, vzip1q_u64, vst1q_u64, vld1q_u64, 2, 2) 
LOOP2(Vzip1QF32N, float32_t, float32_t, vzip1q_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(Vzip1QF64N, float64_t, float64_t, vzip1q_f64, vst1q_f64, vld1q_f64, 2, 2)
LOOP2(Vzip2S8N, int8_t, int8_t, vzip2_s8, vst1_s8, vld1_s8, 8, 8)
LOOP2(Vzip2S16N, int16_t, int16_t, vzip2_s16, vst1_s16, vld1_s16, 4, 4)
LOOP2(Vzip2S32N, int32_t, int32_t, vzip2_s32, vst1_s32, vld1_s32, 2, 2)
LOOP2(Vzip2U8N, uint8_t, uint8_t, vzip2_u8, vst1_u8, vld1_u8, 8, 8)
LOOP2(Vzip2U16N, uint16_t, uint16_t, vzip2_u16, vst1_u16, vld1_u16, 4, 4)
LOOP2(Vzip2U32N, uint32_t, uint32_t, vzip2_u32, vst1_u32, vld1_u32, 2, 2)
LOOP2(Vzip2F32N, float32_t, float32_t, vzip2_f32, vst1_f32, vld1_f32, 2, 2)
LOOP2(Vzip2QS8N, int8_t, int8_t, vzip2q_s8, vst1q_s8, vld1q_s8, 16, 16)
LOOP2(Vzip2QS16N, int16_t, int16_t, vzip2q_s16, vst1q_s16, vld1q_s16, 8, 8)
LOOP2(Vzip2QS32N, int32_t, int32_t, vzip2q_s32, vst1q_s32, vld1q_s32, 4, 4)
LOOP2(Vzip2QS64N, int64_t, int64_t, vzip2q_s64, vst1q_s64, vld1q_s64, 2, 2)
LOOP2(Vzip2QU8N, uint8_t, uint8_t, vzip2q_u8, vst1q_u8, vld1q_u8, 16, 16)
LOOP2(Vzip2QU16N, uint16_t, uint16_t, vzip2q_u16, vst1q_u16, vld1q_u16, 8, 8)
LOOP2(Vzip2QU32N, uint32_t, uint32_t, vzip2q_u32, vst1q_u32, vld1q_u32, 4, 4)
LOOP2(Vzip2QU64N, uint64_t, uint64_t, vzip2q_u64, vst1q_u64, vld1q_u64, 2, 2)
LOOP2(Vzip2QF32N, float32_t, float32_t, vzip2q_f32, vst1q_f32, vld1q_f32, 4, 4)
LOOP2(Vzip2QF64N, float64_t, float64_t, vzip2q_f64, vst1q_f64, vld1q_f64, 2, 2)

================================================
FILE: arm/neon/loops.go
================================================
package neon

import (
	"github.com/alivanz/go-simd/arm"
)

/*
#include
*/
import "C"

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdS8N VabdS8N
//go:noescape
func VabdS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdS16N VabdS16N
//go:noescape
func VabdS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdS32N VabdS32N
//go:noescape
func VabdS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdU8N VabdU8N
//go:noescape
func VabdU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdU16N VabdU16N
//go:noescape
func VabdU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Absolute Difference (vector).
// This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdU32N VabdU32N
//go:noescape
func VabdU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdF32N VabdF32N
//go:noescape
func VabdF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdF64N VabdF64N
//go:noescape
func VabdF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabddF64N VabddF64N
//go:noescape
func VabddF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Signed Absolute Difference.
// This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqS8N VabdqS8N
//go:noescape
func VabdqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqS16N VabdqS16N
//go:noescape
func VabdqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqS32N VabdqS32N
//go:noescape
func VabdqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqU8N VabdqU8N
//go:noescape
func VabdqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Absolute Difference (vector).
// This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqU16N VabdqU16N
//go:noescape
func VabdqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqU32N VabdqU32N
//go:noescape
func VabdqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqF32N VabdqF32N
//go:noescape
func VabdqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VabdqF64N VabdqF64N
//go:noescape
func VabdqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Absolute Difference (vector).
This instruction subtracts the floating-point values in the elements of the second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VabdsF32N VabdsF32N //go:noescape func VabdsF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VabsS8N VabsS8N //go:noescape func VabsS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VabsS16N VabsS16N //go:noescape func VabsS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VabsS32N VabsS32N //go:noescape func VabsS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VabsS64N VabsS64N //go:noescape func VabsS64N(r *arm.Int64, v0 *arm.Int64, n int32) // Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VabsF32N VabsF32N //go:noescape func VabsF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VabsF64N VabsF64N //go:noescape func VabsF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VabsdS64N VabsdS64N //go:noescape func VabsdS64N(r *arm.Int64, v0 *arm.Int64, n int32) // Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VabsqS8N VabsqS8N //go:noescape func VabsqS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VabsqS16N VabsqS16N //go:noescape func VabsqS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VabsqS32N VabsqS32N //go:noescape func VabsqS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, puts the result into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VabsqS64N VabsqS64N //go:noescape func VabsqS64N(r *arm.Int64, v0 *arm.Int64, n int32) // Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VabsqF32N VabsqF32N //go:noescape func VabsqF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VabsqF64N VabsqF64N //go:noescape func VabsqF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VaddS8N VaddS8N //go:noescape func VaddS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VaddS16N VaddS16N //go:noescape func VaddS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VaddS32N VaddS32N //go:noescape func VaddS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VaddS64N VaddS64N //go:noescape func VaddS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VaddU8N VaddU8N //go:noescape func VaddU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VaddU16N VaddU16N //go:noescape func VaddU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VaddU32N VaddU32N //go:noescape func VaddU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VaddU64N VaddU64N //go:noescape func VaddU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VaddF32N VaddF32N //go:noescape func VaddF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. 
All the values in this instruction are floating-point values. // //go:linkname VaddF64N VaddF64N //go:noescape func VaddF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VadddS64N VadddS64N //go:noescape func VadddS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VadddU64N VadddU64N //go:noescape func VadddU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VaddqS8N VaddqS8N //go:noescape func VaddqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VaddqS16N VaddqS16N //go:noescape func VaddqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VaddqS32N VaddqS32N //go:noescape func VaddqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VaddqS64N VaddqS64N //go:noescape func VaddqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VaddqU8N VaddqU8N //go:noescape func VaddqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VaddqU16N VaddqU16N //go:noescape func VaddqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VaddqU32N VaddqU32N //go:noescape func VaddqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VaddqU64N VaddqU64N //go:noescape func VaddqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VaddqF32N VaddqF32N //go:noescape func VaddqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. 
All the values in this instruction are floating-point values. // //go:linkname VaddqF64N VaddqF64N //go:noescape func VaddqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. // //go:linkname VaddvS8N VaddvS8N //go:noescape func VaddvS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. // //go:linkname VaddvS16N VaddvS16N //go:noescape func VaddvS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Add across vector // //go:linkname VaddvS32N VaddvS32N //go:noescape func VaddvS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. // //go:linkname VaddvU8N VaddvU8N //go:noescape func VaddvU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. // //go:linkname VaddvU16N VaddvU16N //go:noescape func VaddvU16N(r *arm.Uint16, v0 *arm.Uint16, n int32) // Add across vector // //go:linkname VaddvU32N VaddvU32N //go:noescape func VaddvU32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // Floating-point add across vector // //go:linkname VaddvF32N VaddvF32N //go:noescape func VaddvF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. // //go:linkname VaddvqS8N VaddvqS8N //go:noescape func VaddvqS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Add across Vector. 
This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. // //go:linkname VaddvqS16N VaddvqS16N //go:noescape func VaddvqS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. // //go:linkname VaddvqS32N VaddvqS32N //go:noescape func VaddvqS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Add across vector // //go:linkname VaddvqS64N VaddvqS64N //go:noescape func VaddvqS64N(r *arm.Int64, v0 *arm.Int64, n int32) // Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. // //go:linkname VaddvqU8N VaddvqU8N //go:noescape func VaddvqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. // //go:linkname VaddvqU16N VaddvqU16N //go:noescape func VaddvqU16N(r *arm.Uint16, v0 *arm.Uint16, n int32) // Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the scalar result to the destination SIMD&FP register. // //go:linkname VaddvqU32N VaddvqU32N //go:noescape func VaddvqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // Add across vector // //go:linkname VaddvqU64N VaddvqU64N //go:noescape func VaddvqU64N(r *arm.Uint64, v0 *arm.Uint64, n int32) // Floating-point add across vector // //go:linkname VaddvqF32N VaddvqF32N //go:noescape func VaddvqF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point add across vector // //go:linkname VaddvqF64N VaddvqF64N //go:noescape func VaddvqF64N(r *arm.Float64, v0 *arm.Float64, n int32) // AES single round decryption. 
// //go:linkname VaesdqU8N VaesdqU8N //go:noescape func VaesdqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // AES single round encryption. // //go:linkname VaeseqU8N VaeseqU8N //go:noescape func VaeseqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // AES inverse mix columns. // //go:linkname VaesimcqU8N VaesimcqU8N //go:noescape func VaesimcqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // AES mix columns. // //go:linkname VaesmcqU8N VaesmcqU8N //go:noescape func VaesmcqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandS8N VandS8N //go:noescape func VandS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandS16N VandS16N //go:noescape func VandS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandS32N VandS32N //go:noescape func VandS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandS64N VandS64N //go:noescape func VandS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandU8N VandU8N //go:noescape func VandU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Bitwise AND (vector). 
This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandU16N VandU16N //go:noescape func VandU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandU32N VandU32N //go:noescape func VandU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandU64N VandU64N //go:noescape func VandU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandqS8N VandqS8N //go:noescape func VandqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandqS16N VandqS16N //go:noescape func VandqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandqS32N VandqS32N //go:noescape func VandqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandqS64N VandqS64N //go:noescape func VandqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Bitwise AND (vector). 
This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandqU8N VandqU8N //go:noescape func VandqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandqU16N VandqU16N //go:noescape func VandqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandqU32N VandqU32N //go:noescape func VandqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VandqU64N VandqU64N //go:noescape func VandqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicS8N VbicS8N //go:noescape func VbicS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicS16N VbicS16N //go:noescape func VbicS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Bitwise bit Clear (vector, register). 
This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicS32N VbicS32N //go:noescape func VbicS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicS64N VbicS64N //go:noescape func VbicS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicU8N VbicU8N //go:noescape func VbicU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicU16N VbicU16N //go:noescape func VbicU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicU32N VbicU32N //go:noescape func VbicU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. 
// //go:linkname VbicU64N VbicU64N //go:noescape func VbicU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicqS8N VbicqS8N //go:noescape func VbicqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicqS16N VbicqS16N //go:noescape func VbicqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicqS32N VbicqS32N //go:noescape func VbicqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicqS64N VbicqS64N //go:noescape func VbicqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicqU8N VbicqU8N //go:noescape func VbicqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Bitwise bit Clear (vector, register). 
This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicqU16N VbicqU16N //go:noescape func VbicqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicqU32N VbicqU32N //go:noescape func VbicqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP register and the complement of the second source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname VbicqU64N VbicqU64N //go:noescape func VbicqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Floating-point Complex Add. // //go:linkname VcaddRot270F32N VcaddRot270F32N //go:noescape func VcaddRot270F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Complex Add. // //go:linkname VcaddRot90F32N VcaddRot90F32N //go:noescape func VcaddRot90F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Complex Add. // //go:linkname VcaddqRot270F32N VcaddqRot270F32N //go:noescape func VcaddqRot270F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Complex Add. // //go:linkname VcaddqRot270F64N VcaddqRot270F64N //go:noescape func VcaddqRot270F64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Complex Add. // //go:linkname VcaddqRot90F32N VcaddqRot90F32N //go:noescape func VcaddqRot90F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Complex Add. 
// //go:linkname VcaddqRot90F64N VcaddqRot90F64N //go:noescape func VcaddqRot90F64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcageF32N VcageF32N //go:noescape func VcageF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcageF64N VcageF64N //go:noescape func VcageF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Absolute Compare Greater than or Equal (vector). 
This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcagedF64N VcagedF64N //go:noescape func VcagedF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcageqF32N VcageqF32N //go:noescape func VcageqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. 
// //go:linkname VcageqF64N VcageqF64N //go:noescape func VcageqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point value in the second source SIMD&FP register and if the first value is greater than or equal to the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcagesF32N VcagesF32N //go:noescape func VcagesF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcagtF32N VcagtF32N //go:noescape func VcagtF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. 
//
//go:linkname VcagtF64N VcagtF64N
//go:noescape
func VcagtF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagtdF64N VcagtdF64N
//go:noescape
func VcagtdF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagtqF32N VcagtqF32N
//go:noescape
func VcagtqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagtqF64N VcagtqF64N
//go:noescape
func VcagtqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the second source SIMD&FP register and if the first value is greater than the second value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcagtsF32N VcagtsF32N
//go:noescape
func VcagtsF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point absolute compare less than or equal
//
//go:linkname VcaleF32N VcaleF32N
//go:noescape
func VcaleF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point absolute compare less than or equal
//
//go:linkname VcaleF64N VcaleF64N
//go:noescape
func VcaleF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point absolute compare less than or equal
//
//go:linkname VcaledF64N VcaledF64N
//go:noescape
func VcaledF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point absolute compare less than or equal
//
//go:linkname VcaleqF32N VcaleqF32N
//go:noescape
func VcaleqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point absolute compare less than or equal
//
//go:linkname VcaleqF64N VcaleqF64N
//go:noescape
func VcaleqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point absolute compare less than or equal
//
//go:linkname VcalesF32N VcalesF32N
//go:noescape
func VcalesF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point absolute compare less than
//
//go:linkname VcaltF32N VcaltF32N
//go:noescape
func VcaltF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point absolute compare less than
//
//go:linkname VcaltF64N VcaltF64N
//go:noescape
func VcaltF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point absolute compare less than
//
//go:linkname VcaltdF64N VcaltdF64N
//go:noescape
func VcaltdF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point absolute compare less than
//
//go:linkname VcaltqF32N VcaltqF32N
//go:noescape
func VcaltqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point absolute compare less than
//
//go:linkname VcaltqF64N VcaltqF64N
//go:noescape
func VcaltqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point absolute compare less than
//
//go:linkname VcaltsF32N VcaltsF32N
//go:noescape
func VcaltsF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqS8N VceqS8N
//go:noescape
func VceqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqS16N VceqS16N
//go:noescape
func VceqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqS32N VceqS32N
//go:noescape
func VceqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqS64N VceqS64N
//go:noescape
func VceqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqU8N VceqU8N
//go:noescape
func VceqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Compare bitwise Equal (vector).
// This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqU16N VceqU16N
//go:noescape
func VceqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqU32N VceqU32N
//go:noescape
func VceqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqU64N VceqU64N
//go:noescape
func VceqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point Compare Equal (vector).
// This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqF32N VceqF32N
//go:noescape
func VceqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqF64N VceqF64N
//go:noescape
func VceqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqdS64N VceqdS64N
//go:noescape
func VceqdS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare bitwise Equal (vector).
// This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqdU64N VceqdU64N
//go:noescape
func VceqdU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqdF64N VceqdF64N
//go:noescape
func VceqdF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqS8N VceqqS8N
//go:noescape
func VceqqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Compare bitwise Equal (vector).
// This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqS16N VceqqS16N
//go:noescape
func VceqqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqS32N VceqqS32N
//go:noescape
func VceqqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqS64N VceqqS64N
//go:noescape
func VceqqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare bitwise Equal (vector).
// This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqU8N VceqqU8N
//go:noescape
func VceqqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqU16N VceqqU16N
//go:noescape
func VceqqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqU32N VceqqU32N
//go:noescape
func VceqqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Compare bitwise Equal (vector).
// This instruction compares each vector element from the first source SIMD&FP register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqU64N VceqqU64N
//go:noescape
func VceqqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqF32N VceqqF32N
//go:noescape
func VceqqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqqF64N VceqqF64N
//go:noescape
func VceqqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Compare Equal (vector).
// This instruction compares each floating-point value from the first source SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqsF32N VceqsF32N
//go:noescape
func VceqsF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzS8N VceqzS8N
//go:noescape
func VceqzS8N(r *arm.Uint8, v0 *arm.Int8, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzS16N VceqzS16N
//go:noescape
func VceqzS16N(r *arm.Uint16, v0 *arm.Int16, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzS32N VceqzS32N
//go:noescape
func VceqzS32N(r *arm.Uint32, v0 *arm.Int32, n int32)

// Compare bitwise Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzS64N VceqzS64N
//go:noescape
func VceqzS64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzU8N VceqzU8N
//go:noescape
func VceqzU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzU16N VceqzU16N
//go:noescape
func VceqzU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzU32N VceqzU32N
//go:noescape
func VceqzU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Compare bitwise Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzU64N VceqzU64N
//go:noescape
func VceqzU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)

// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzF32N VceqzF32N
//go:noescape
func VceqzF32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzF64N VceqzF64N
//go:noescape
func VceqzF64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzdS64N VceqzdS64N
//go:noescape
func VceqzdS64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Compare bitwise Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzdU64N VceqzdU64N
//go:noescape
func VceqzdU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)

// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzdF64N VceqzdF64N
//go:noescape
func VceqzdF64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqS8N VceqzqS8N
//go:noescape
func VceqzqS8N(r *arm.Uint8, v0 *arm.Int8, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqS16N VceqzqS16N
//go:noescape
func VceqzqS16N(r *arm.Uint16, v0 *arm.Int16, n int32)

// Compare bitwise Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqS32N VceqzqS32N
//go:noescape
func VceqzqS32N(r *arm.Uint32, v0 *arm.Int32, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqS64N VceqzqS64N
//go:noescape
func VceqzqS64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqU8N VceqzqU8N
//go:noescape
func VceqzqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqU16N VceqzqU16N
//go:noescape
func VceqzqU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Compare bitwise Equal to zero (vector).
// This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqU32N VceqzqU32N
//go:noescape
func VceqzqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqU64N VceqzqU64N
//go:noescape
func VceqzqU64N(r *arm.Uint64, v0 *arm.Uint64, n int32)

// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqF32N VceqzqF32N
//go:noescape
func VceqzqF32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzqF64N VceqzqF64N
//go:noescape
func VceqzqF64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Floating-point Compare Equal to zero (vector).
// This instruction reads each floating-point value in the source SIMD&FP register and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VceqzsF32N VceqzsF32N
//go:noescape
func VceqzsF32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeS8N VcgeS8N
//go:noescape
func VcgeS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeS16N VcgeS16N
//go:noescape
func VcgeS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Compare signed Greater than or Equal (vector).
// This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeS32N VcgeS32N
//go:noescape
func VcgeS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeS64N VcgeS64N
//go:noescape
func VcgeS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VcgeU8N VcgeU8N
//go:noescape
func VcgeU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Compare unsigned Higher or Same (vector).
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgeU16N VcgeU16N //go:noescape func VcgeU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgeU32N VcgeU32N //go:noescape func VcgeU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgeU64N VcgeU64N //go:noescape func VcgeU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Floating-point Compare Greater than or Equal (vector). 
This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgeF32N VcgeF32N //go:noescape func VcgeF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgeF64N VcgeF64N //go:noescape func VcgeF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgedS64N VcgedS64N //go:noescape func VcgedS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Compare unsigned Higher or Same (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgedU64N VcgedU64N //go:noescape func VcgedU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgedF64N VcgedF64N //go:noescape func VcgedF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgeqS8N VcgeqS8N //go:noescape func VcgeqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Compare signed Greater than or Equal (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgeqS16N VcgeqS16N //go:noescape func VcgeqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgeqS32N VcgeqS32N //go:noescape func VcgeqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgeqS64N VcgeqS64N //go:noescape func VcgeqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Compare unsigned Higher or Same (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgeqU8N VcgeqU8N //go:noescape func VcgeqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgeqU16N VcgeqU16N //go:noescape func VcgeqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgeqU32N VcgeqU32N //go:noescape func VcgeqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Compare unsigned Higher or Same (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgeqU64N VcgeqU64N //go:noescape func VcgeqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgeqF32N VcgeqF32N //go:noescape func VcgeqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgeqF64N VcgeqF64N //go:noescape func VcgeqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Compare Greater than or Equal (vector). 
This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgesF32N VcgesF32N //go:noescape func VcgesF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgezS8N VcgezS8N //go:noescape func VcgezS8N(r *arm.Uint8, v0 *arm.Int8, n int32) // Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgezS16N VcgezS16N //go:noescape func VcgezS16N(r *arm.Uint16, v0 *arm.Int16, n int32) // Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. 
// //go:linkname VcgezS32N VcgezS32N //go:noescape func VcgezS32N(r *arm.Uint32, v0 *arm.Int32, n int32) // Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgezS64N VcgezS64N //go:noescape func VcgezS64N(r *arm.Uint64, v0 *arm.Int64, n int32) // Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgezF32N VcgezF32N //go:noescape func VcgezF32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgezF64N VcgezF64N //go:noescape func VcgezF64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Compare signed Greater than or Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgezdS64N VcgezdS64N //go:noescape func VcgezdS64N(r *arm.Uint64, v0 *arm.Int64, n int32) // Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgezdF64N VcgezdF64N //go:noescape func VcgezdF64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgezqS8N VcgezqS8N //go:noescape func VcgezqS8N(r *arm.Uint8, v0 *arm.Int8, n int32) // Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. 
// //go:linkname VcgezqS16N VcgezqS16N //go:noescape func VcgezqS16N(r *arm.Uint16, v0 *arm.Int16, n int32) // Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgezqS32N VcgezqS32N //go:noescape func VcgezqS32N(r *arm.Uint32, v0 *arm.Int32, n int32) // Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgezqS64N VcgezqS64N //go:noescape func VcgezqS64N(r *arm.Uint64, v0 *arm.Int64, n int32) // Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgezqF32N VcgezqF32N //go:noescape func VcgezqF32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Compare Greater than or Equal to zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgezqF64N VcgezqF64N //go:noescape func VcgezqF64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgezsF32N VcgezsF32N //go:noescape func VcgezsF32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtS8N VcgtS8N //go:noescape func VcgtS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Compare signed Greater than (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtS16N VcgtS16N //go:noescape func VcgtS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtS32N VcgtS32N //go:noescape func VcgtS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtS64N VcgtS64N //go:noescape func VcgtS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Compare unsigned Higher (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtU8N VcgtU8N //go:noescape func VcgtU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtU16N VcgtU16N //go:noescape func VcgtU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtU32N VcgtU32N //go:noescape func VcgtU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Compare unsigned Higher (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtU64N VcgtU64N //go:noescape func VcgtU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtF32N VcgtF32N //go:noescape func VcgtF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtF64N VcgtF64N //go:noescape func VcgtF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Compare signed Greater than (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtdS64N VcgtdS64N //go:noescape func VcgtdS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtdU64N VcgtdU64N //go:noescape func VcgtdU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtdF64N VcgtdF64N //go:noescape func VcgtdF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Compare signed Greater than (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtqS8N VcgtqS8N //go:noescape func VcgtqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtqS16N VcgtqS16N //go:noescape func VcgtqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtqS32N VcgtqS32N //go:noescape func VcgtqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Compare signed Greater than (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer value is greater than the second signed integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtqS64N VcgtqS64N //go:noescape func VcgtqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtqU8N VcgtqU8N //go:noescape func VcgtqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtqU16N VcgtqU16N //go:noescape func VcgtqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Compare unsigned Higher (vector). 
This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtqU32N VcgtqU32N //go:noescape func VcgtqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtqU64N VcgtqU64N //go:noescape func VcgtqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtqF32N VcgtqF32N //go:noescape func VcgtqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Compare Greater than (vector). 
This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtqF64N VcgtqF64N //go:noescape func VcgtqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source SIMD&FP register and if the value is greater than the corresponding floating-point value in the second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtsF32N VcgtsF32N //go:noescape func VcgtsF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzS8N VcgtzS8N //go:noescape func VcgtzS8N(r *arm.Uint8, v0 *arm.Int8, n int32) // Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. 
// //go:linkname VcgtzS16N VcgtzS16N //go:noescape func VcgtzS16N(r *arm.Uint16, v0 *arm.Int16, n int32) // Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzS32N VcgtzS32N //go:noescape func VcgtzS32N(r *arm.Uint32, v0 *arm.Int32, n int32) // Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzS64N VcgtzS64N //go:noescape func VcgtzS64N(r *arm.Uint64, v0 *arm.Int64, n int32) // Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzF32N VcgtzF32N //go:noescape func VcgtzF32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. 
// //go:linkname VcgtzF64N VcgtzF64N //go:noescape func VcgtzF64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzdS64N VcgtzdS64N //go:noescape func VcgtzdS64N(r *arm.Uint64, v0 *arm.Int64, n int32) // Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzdF64N VcgtzdF64N //go:noescape func VcgtzdF64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzqS8N VcgtzqS8N //go:noescape func VcgtzqS8N(r *arm.Uint8, v0 *arm.Int8, n int32) // Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. 
// //go:linkname VcgtzqS16N VcgtzqS16N //go:noescape func VcgtzqS16N(r *arm.Uint16, v0 *arm.Int16, n int32) // Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzqS32N VcgtzqS32N //go:noescape func VcgtzqS32N(r *arm.Uint32, v0 *arm.Int32, n int32) // Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzqS64N VcgtzqS64N //go:noescape func VcgtzqS64N(r *arm.Uint64, v0 *arm.Int64, n int32) // Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzqF32N VcgtzqF32N //go:noescape func VcgtzqF32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. 
// //go:linkname VcgtzqF64N VcgtzqF64N //go:noescape func VcgtzqF64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is greater than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcgtzsF32N VcgtzsF32N //go:noescape func VcgtzsF32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Compare signed less than or equal // //go:linkname VcleS8N VcleS8N //go:noescape func VcleS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Compare signed less than or equal // //go:linkname VcleS16N VcleS16N //go:noescape func VcleS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Compare signed less than or equal // //go:linkname VcleS32N VcleS32N //go:noescape func VcleS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Compare signed less than or equal // //go:linkname VcleS64N VcleS64N //go:noescape func VcleS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Compare unsigned less than or equal // //go:linkname VcleU8N VcleU8N //go:noescape func VcleU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Compare unsigned less than or equal // //go:linkname VcleU16N VcleU16N //go:noescape func VcleU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Compare unsigned less than or equal // //go:linkname VcleU32N VcleU32N //go:noescape func VcleU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Compare unsigned less than or equal // //go:linkname VcleU64N VcleU64N //go:noescape func VcleU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Floating-point compare less than or equal // //go:linkname VcleF32N VcleF32N //go:noescape func VcleF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32) // 
Floating-point compare less than or equal // //go:linkname VcleF64N VcleF64N //go:noescape func VcleF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Compare signed less than or equal // //go:linkname VcledS64N VcledS64N //go:noescape func VcledS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Compare unsigned less than or equal // //go:linkname VcledU64N VcledU64N //go:noescape func VcledU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Floating-point compare less than or equal // //go:linkname VcledF64N VcledF64N //go:noescape func VcledF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Compare signed less than or equal // //go:linkname VcleqS8N VcleqS8N //go:noescape func VcleqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Compare signed less than or equal // //go:linkname VcleqS16N VcleqS16N //go:noescape func VcleqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Compare signed less than or equal // //go:linkname VcleqS32N VcleqS32N //go:noescape func VcleqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Compare signed less than or equal // //go:linkname VcleqS64N VcleqS64N //go:noescape func VcleqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Compare unsigned less than or equal // //go:linkname VcleqU8N VcleqU8N //go:noescape func VcleqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Compare unsigned less than or equal // //go:linkname VcleqU16N VcleqU16N //go:noescape func VcleqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Compare unsigned less than or equal // //go:linkname VcleqU32N VcleqU32N //go:noescape func VcleqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Compare unsigned less than or equal // //go:linkname VcleqU64N VcleqU64N //go:noescape func VcleqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Floating-point compare less than or equal // //go:linkname VcleqF32N VcleqF32N 
//go:noescape func VcleqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point compare less than or equal // //go:linkname VcleqF64N VcleqF64N //go:noescape func VcleqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point compare less than or equal // //go:linkname VclesF32N VclesF32N //go:noescape func VclesF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezS8N VclezS8N //go:noescape func VclezS8N(r *arm.Uint8, v0 *arm.Int8, n int32) // Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezS16N VclezS16N //go:noescape func VclezS16N(r *arm.Uint16, v0 *arm.Int16, n int32) // Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezS32N VclezS32N //go:noescape func VclezS32N(r *arm.Uint32, v0 *arm.Int32, n int32) // Compare signed Less than or Equal to zero (vector). 
This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezS64N VclezS64N //go:noescape func VclezS64N(r *arm.Uint64, v0 *arm.Int64, n int32) // Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezF32N VclezF32N //go:noescape func VclezF32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezF64N VclezF64N //go:noescape func VclezF64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
// //go:linkname VclezdS64N VclezdS64N //go:noescape func VclezdS64N(r *arm.Uint64, v0 *arm.Int64, n int32) // Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezdF64N VclezdF64N //go:noescape func VclezdF64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezqS8N VclezqS8N //go:noescape func VclezqS8N(r *arm.Uint8, v0 *arm.Int8, n int32) // Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezqS16N VclezqS16N //go:noescape func VclezqS16N(r *arm.Uint16, v0 *arm.Int16, n int32) // Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. 
// //go:linkname VclezqS32N VclezqS32N //go:noescape func VclezqS32N(r *arm.Uint32, v0 *arm.Int32, n int32) // Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezqS64N VclezqS64N //go:noescape func VclezqS64N(r *arm.Uint64, v0 *arm.Int64, n int32) // Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezqF32N VclezqF32N //go:noescape func VclezqF32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezqF64N VclezqF64N //go:noescape func VclezqF64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Floating-point Compare Less than or Equal to zero (vector). 
This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than or equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VclezsF32N VclezsF32N //go:noescape func VclezsF32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsS8N VclsS8N //go:noescape func VclsS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsS16N VclsS16N //go:noescape func VclsS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsS32N VclsS32N //go:noescape func VclsS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Count Leading Sign bits (vector). 
This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsU8N VclsU8N //go:noescape func VclsU8N(r *arm.Int8, v0 *arm.Uint8, n int32) // Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsU16N VclsU16N //go:noescape func VclsU16N(r *arm.Int16, v0 *arm.Uint16, n int32) // Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsU32N VclsU32N //go:noescape func VclsU32N(r *arm.Int32, v0 *arm.Uint32, n int32) // Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsqS8N VclsqS8N //go:noescape func VclsqS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Count Leading Sign bits (vector). 
This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsqS16N VclsqS16N //go:noescape func VclsqS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsqS32N VclsqS32N //go:noescape func VclsqS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsqU8N VclsqU8N //go:noescape func VclsqU8N(r *arm.Int8, v0 *arm.Uint8, n int32) // Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsqU16N VclsqU16N //go:noescape func VclsqU16N(r *arm.Int16, v0 *arm.Uint16, n int32) // Count Leading Sign bits (vector). 
This instruction counts the number of consecutive bits following the most significant bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most significant bit itself. // //go:linkname VclsqU32N VclsqU32N //go:noescape func VclsqU32N(r *arm.Int32, v0 *arm.Uint32, n int32) // Compare signed less than // //go:linkname VcltS8N VcltS8N //go:noescape func VcltS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Compare signed less than // //go:linkname VcltS16N VcltS16N //go:noescape func VcltS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Compare signed less than // //go:linkname VcltS32N VcltS32N //go:noescape func VcltS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Compare signed less than // //go:linkname VcltS64N VcltS64N //go:noescape func VcltS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Compare unsigned less than // //go:linkname VcltU8N VcltU8N //go:noescape func VcltU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Compare unsigned less than // //go:linkname VcltU16N VcltU16N //go:noescape func VcltU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Compare unsigned less than // //go:linkname VcltU32N VcltU32N //go:noescape func VcltU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Compare unsigned less than // //go:linkname VcltU64N VcltU64N //go:noescape func VcltU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Floating-point compare less than // //go:linkname VcltF32N VcltF32N //go:noescape func VcltF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point compare less than // //go:linkname VcltF64N VcltF64N //go:noescape func VcltF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Compare signed less than // //go:linkname VcltdS64N VcltdS64N 
//go:noescape func VcltdS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Compare unsigned less than // //go:linkname VcltdU64N VcltdU64N //go:noescape func VcltdU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Floating-point compare less than // //go:linkname VcltdF64N VcltdF64N //go:noescape func VcltdF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Compare signed less than // //go:linkname VcltqS8N VcltqS8N //go:noescape func VcltqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Compare signed less than // //go:linkname VcltqS16N VcltqS16N //go:noescape func VcltqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Compare signed less than // //go:linkname VcltqS32N VcltqS32N //go:noescape func VcltqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Compare signed less than // //go:linkname VcltqS64N VcltqS64N //go:noescape func VcltqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Compare unsigned less than // //go:linkname VcltqU8N VcltqU8N //go:noescape func VcltqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Compare unsigned less than // //go:linkname VcltqU16N VcltqU16N //go:noescape func VcltqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Compare unsigned less than // //go:linkname VcltqU32N VcltqU32N //go:noescape func VcltqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Compare unsigned less than // //go:linkname VcltqU64N VcltqU64N //go:noescape func VcltqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Floating-point compare less than // //go:linkname VcltqF32N VcltqF32N //go:noescape func VcltqF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point compare less than // //go:linkname VcltqF64N VcltqF64N //go:noescape func VcltqF64N(r *arm.Uint64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point compare less than // //go:linkname VcltsF32N VcltsF32N //go:noescape func 
VcltsF32N(r *arm.Uint32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcltzS8N VcltzS8N //go:noescape func VcltzS8N(r *arm.Uint8, v0 *arm.Int8, n int32) // Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcltzS16N VcltzS16N //go:noescape func VcltzS16N(r *arm.Uint16, v0 *arm.Int16, n int32) // Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcltzS32N VcltzS32N //go:noescape func VcltzS32N(r *arm.Uint32, v0 *arm.Int32, n int32) // Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. 
// //go:linkname VcltzS64N VcltzS64N //go:noescape func VcltzS64N(r *arm.Uint64, v0 *arm.Int64, n int32) // Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcltzF32N VcltzF32N //go:noescape func VcltzF32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcltzF64N VcltzF64N //go:noescape func VcltzF64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcltzdS64N VcltzdS64N //go:noescape func VcltzdS64N(r *arm.Uint64, v0 *arm.Int64, n int32) // Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. 
// //go:linkname VcltzdF64N VcltzdF64N //go:noescape func VcltzdF64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcltzqS8N VcltzqS8N //go:noescape func VcltzqS8N(r *arm.Uint8, v0 *arm.Int8, n int32) // Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcltzqS16N VcltzqS16N //go:noescape func VcltzqS16N(r *arm.Uint16, v0 *arm.Int16, n int32) // Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcltzqS32N VcltzqS32N //go:noescape func VcltzqS32N(r *arm.Uint32, v0 *arm.Int32, n int32) // Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register and if the signed integer value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. 
// //go:linkname VcltzqS64N VcltzqS64N //go:noescape func VcltzqS64N(r *arm.Uint64, v0 *arm.Int64, n int32) // Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcltzqF32N VcltzqF32N //go:noescape func VcltzqF32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcltzqF64N VcltzqF64N //go:noescape func VcltzqF64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source SIMD&FP register and if the value is less than zero sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero. // //go:linkname VcltzsF32N VcltzsF32N //go:noescape func VcltzsF32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VclzS8N VclzS8N //go:noescape func VclzS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Count Leading Zero bits (vector). 
This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VclzS16N VclzS16N //go:noescape func VclzS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VclzS32N VclzS32N //go:noescape func VclzS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VclzU8N VclzU8N //go:noescape func VclzU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VclzU16N VclzU16N //go:noescape func VclzU16N(r *arm.Uint16, v0 *arm.Uint16, n int32) // Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VclzU32N VclzU32N //go:noescape func VclzU32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // Count Leading Zero bits (vector). 
This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VclzqS8N VclzqS8N //go:noescape func VclzqS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VclzqS16N VclzqS16N //go:noescape func VclzqS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VclzqS32N VclzqS32N //go:noescape func VclzqS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VclzqU8N VclzqU8N //go:noescape func VclzqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VclzqU16N VclzqU16N //go:noescape func VclzqU16N(r *arm.Uint16, v0 *arm.Uint16, n int32) // Count Leading Zero bits (vector). 
This instruction counts the number of consecutive zeros, starting from the most significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VclzqU32N VclzqU32N //go:noescape func VclzqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VcntS8N VcntS8N //go:noescape func VcntS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VcntU8N VcntU8N //go:noescape func VcntU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VcntqS8N VcntqS8N //go:noescape func VcntqS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VcntqU8N VcntqU8N //go:noescape func VcntqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Join two smaller vectors into a single larger vector // //go:linkname VcombineS8N VcombineS8N //go:noescape func VcombineS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Join two smaller vectors into a single larger vector // //go:linkname VcombineS16N VcombineS16N //go:noescape func VcombineS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Join two smaller vectors into a single larger vector // //go:linkname VcombineS32N VcombineS32N //go:noescape func VcombineS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Join two smaller vectors into a single larger vector // //go:linkname VcombineS64N VcombineS64N //go:noescape func VcombineS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Join two smaller vectors into a single larger vector // //go:linkname VcombineU8N VcombineU8N //go:noescape func VcombineU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Join two smaller vectors into a single larger vector // //go:linkname VcombineU16N VcombineU16N //go:noescape func VcombineU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Join two smaller vectors into a single larger vector // //go:linkname VcombineU32N VcombineU32N //go:noescape func VcombineU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Join two smaller vectors into a single larger vector // //go:linkname VcombineU64N VcombineU64N //go:noescape func VcombineU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Join two smaller vectors into a single larger vector // //go:linkname VcombineF32N VcombineF32N //go:noescape func VcombineF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Join two smaller vectors into a single larger vector // //go:linkname VcombineF64N VcombineF64N //go:noescape func VcombineF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Signed fixed-point Convert to 
Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtF32S32N VcvtF32S32N //go:noescape func VcvtF32S32N(r *arm.Float32, v0 *arm.Int32, n int32) // Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtF32U32N VcvtF32U32N //go:noescape func VcvtF32U32N(r *arm.Float32, v0 *arm.Uint32, n int32) // Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtF64S64N VcvtF64S64N //go:noescape func VcvtF64S64N(r *arm.Float64, v0 *arm.Int64, n int32) // Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtF64U64N VcvtF64U64N //go:noescape func VcvtF64U64N(r *arm.Float64, v0 *arm.Uint64, n int32) // Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtS32F32N VcvtS32F32N //go:noescape func VcvtS32F32N(r *arm.Int32, v0 *arm.Float32, n int32) // Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). 
This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtS64F64N VcvtS64F64N //go:noescape func VcvtS64F64N(r *arm.Int64, v0 *arm.Float64, n int32) // Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtU32F32N VcvtU32F32N //go:noescape func VcvtU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtU64F64N VcvtU64F64N //go:noescape func VcvtU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register. // //go:linkname VcvtaS32F32N VcvtaS32F32N //go:noescape func VcvtaS32F32N(r *arm.Int32, v0 *arm.Float32, n int32) // Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register.
// //go:linkname VcvtaS64F64N VcvtaS64F64N //go:noescape func VcvtaS64F64N(r *arm.Int64, v0 *arm.Float64, n int32) // Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register. // //go:linkname VcvtaU32F32N VcvtaU32F32N //go:noescape func VcvtaU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register. // //go:linkname VcvtaU64F64N VcvtaU64F64N //go:noescape func VcvtaU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register. // //go:linkname VcvtadS64F64N VcvtadS64F64N //go:noescape func VcvtadS64F64N(r *arm.Int64, v0 *arm.Float64, n int32) // Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register. // //go:linkname VcvtadU64F64N VcvtadU64F64N //go:noescape func VcvtadU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). 
This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register. // //go:linkname VcvtaqS32F32N VcvtaqS32F32N //go:noescape func VcvtaqS32F32N(r *arm.Int32, v0 *arm.Float32, n int32) // Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register. // //go:linkname VcvtaqS64F64N VcvtaqS64F64N //go:noescape func VcvtaqS64F64N(r *arm.Int64, v0 *arm.Float64, n int32) // Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register. // //go:linkname VcvtaqU32F32N VcvtaqU32F32N //go:noescape func VcvtaqU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register. // //go:linkname VcvtaqU64F64N VcvtaqU64F64N //go:noescape func VcvtaqU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register. 
// //go:linkname VcvtasS32F32N VcvtasS32F32N //go:noescape func VcvtasS32F32N(r *arm.Int32, v0 *arm.Float32, n int32) // Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties to Away rounding mode and writes the result to the SIMD&FP destination register. // //go:linkname VcvtasU32F32N VcvtasU32F32N //go:noescape func VcvtasU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtdF64S64N VcvtdF64S64N //go:noescape func VcvtdF64S64N(r *arm.Float64, v0 *arm.Int64, n int32) // Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtdF64U64N VcvtdF64U64N //go:noescape func VcvtdF64U64N(r *arm.Float64, v0 *arm.Uint64, n int32) // Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtdS64F64N VcvtdS64F64N //go:noescape func VcvtdS64F64N(r *arm.Int64, v0 *arm.Float64, n int32) // Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). 
This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtdU64F64N VcvtdU64F64N //go:noescape func VcvtdU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtmS32F32N VcvtmS32F32N //go:noescape func VcvtmS32F32N(r *arm.Int32, v0 *arm.Float32, n int32) // Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtmS64F64N VcvtmS64F64N //go:noescape func VcvtmS64F64N(r *arm.Int64, v0 *arm.Float64, n int32) // Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtmU32F32N VcvtmU32F32N //go:noescape func VcvtmU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
// //go:linkname VcvtmU64F64N VcvtmU64F64N //go:noescape func VcvtmU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtmdS64F64N VcvtmdS64F64N //go:noescape func VcvtmdS64F64N(r *arm.Int64, v0 *arm.Float64, n int32) // Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtmdU64F64N VcvtmdU64F64N //go:noescape func VcvtmdU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtmqS32F32N VcvtmqS32F32N //go:noescape func VcvtmqS32F32N(r *arm.Int32, v0 *arm.Float32, n int32) // Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtmqS64F64N VcvtmqS64F64N //go:noescape func VcvtmqS64F64N(r *arm.Int64, v0 *arm.Float64, n int32) // Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtmqU32F32N VcvtmqU32F32N //go:noescape func VcvtmqU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtmqU64F64N VcvtmqU64F64N //go:noescape func VcvtmqU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtmsS32F32N VcvtmsS32F32N //go:noescape func VcvtmsS32F32N(r *arm.Int32, v0 *arm.Float32, n int32) // Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtmsU32F32N VcvtmsU32F32N //go:noescape func VcvtmsU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. 
// //go:linkname VcvtnS32F32N VcvtnS32F32N //go:noescape func VcvtnS32F32N(r *arm.Int32, v0 *arm.Float32, n int32) // Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtnS64F64N VcvtnS64F64N //go:noescape func VcvtnS64F64N(r *arm.Int64, v0 *arm.Float64, n int32) // Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtnU32F32N VcvtnU32F32N //go:noescape func VcvtnU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtnU64F64N VcvtnU64F64N //go:noescape func VcvtnU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtndS64F64N VcvtndS64F64N //go:noescape func VcvtndS64F64N(r *arm.Int64, v0 *arm.Float64, n int32) // Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtndU64F64N VcvtndU64F64N //go:noescape func VcvtndU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtnqS32F32N VcvtnqS32F32N //go:noescape func VcvtnqS32F32N(r *arm.Int32, v0 *arm.Float32, n int32) // Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtnqS64F64N VcvtnqS64F64N //go:noescape func VcvtnqS64F64N(r *arm.Int64, v0 *arm.Float64, n int32) // Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtnqU32F32N VcvtnqU32F32N //go:noescape func VcvtnqU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. 
// //go:linkname VcvtnqU64F64N VcvtnqU64F64N //go:noescape func VcvtnqU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtnsS32F32N VcvtnsS32F32N //go:noescape func VcvtnsS32F32N(r *arm.Int32, v0 *arm.Float32, n int32) // Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtnsU32F32N VcvtnsU32F32N //go:noescape func VcvtnsU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtpS32F32N VcvtpS32F32N //go:noescape func VcvtpS32F32N(r *arm.Int32, v0 *arm.Float32, n int32) // Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtpS64F64N VcvtpS64F64N //go:noescape func VcvtpS64F64N(r *arm.Int64, v0 *arm.Float64, n int32) // Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtpU32F32N VcvtpU32F32N //go:noescape func VcvtpU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtpU64F64N VcvtpU64F64N //go:noescape func VcvtpU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtpdS64F64N VcvtpdS64F64N //go:noescape func VcvtpdS64F64N(r *arm.Int64, v0 *arm.Float64, n int32) // Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtpdU64F64N VcvtpdU64F64N //go:noescape func VcvtpdU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register. 
// //go:linkname VcvtpqS32F32N VcvtpqS32F32N //go:noescape func VcvtpqS32F32N(r *arm.Int32, v0 *arm.Float32, n int32) // Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtpqS64F64N VcvtpqS64F64N //go:noescape func VcvtpqS64F64N(r *arm.Int64, v0 *arm.Float64, n int32) // Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtpqU32F32N VcvtpqU32F32N //go:noescape func VcvtpqU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtpqU64F64N VcvtpqU64F64N //go:noescape func VcvtpqU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtpsS32F32N VcvtpsS32F32N //go:noescape func VcvtpsS32F32N(r *arm.Int32, v0 *arm.Float32, n int32) // Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). 
This instruction converts a scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtpsU32F32N VcvtpsU32F32N //go:noescape func VcvtpsU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtqF32S32N VcvtqF32S32N //go:noescape func VcvtqF32S32N(r *arm.Float32, v0 *arm.Int32, n int32) // Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtqF32U32N VcvtqF32U32N //go:noescape func VcvtqF32U32N(r *arm.Float32, v0 *arm.Uint32, n int32) // Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtqF64S64N VcvtqF64S64N //go:noescape func VcvtqF64S64N(r *arm.Float64, v0 *arm.Int64, n int32) // Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtqF64U64N VcvtqF64U64N //go:noescape func VcvtqF64U64N(r *arm.Float64, v0 *arm.Uint64, n int32) // Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). 
This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtqS32F32N VcvtqS32F32N //go:noescape func VcvtqS32F32N(r *arm.Int32, v0 *arm.Float32, n int32) // Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtqS64F64N VcvtqS64F64N //go:noescape func VcvtqS64F64N(r *arm.Int64, v0 *arm.Float64, n int32) // Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtqU32F32N VcvtqU32F32N //go:noescape func VcvtqU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtqU64F64N VcvtqU64F64N //go:noescape func VcvtqU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32) // Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register.
// //go:linkname VcvtsF32S32N VcvtsF32S32N //go:noescape func VcvtsF32S32N(r *arm.Float32, v0 *arm.Int32, n int32) // Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtsF32U32N VcvtsF32U32N //go:noescape func VcvtsF32U32N(r *arm.Float32, v0 *arm.Uint32, n int32) // Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtsS32F32N VcvtsS32F32N //go:noescape func VcvtsS32F32N(r *arm.Int32, v0 *arm.Float32, n int32) // Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VcvtsU32F32N VcvtsU32F32N //go:noescape func VcvtsU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32) // Floating-point Divide (vector).
This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VdivF64N VdivF64N //go:noescape func VdivF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VdivqF32N VdivqF32N //go:noescape func VdivqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VdivqF64N VdivqF64N //go:noescape func VdivqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VdupNS8N VdupNS8N //go:noescape func VdupNS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. 
// //go:linkname VdupNS16N VdupNS16N //go:noescape func VdupNS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VdupNS32N VdupNS32N //go:noescape func VdupNS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Insert vector element from another vector element. This instruction copies the vector element of the source SIMD&FP register to the specified vector element of the destination SIMD&FP register. // //go:linkname VdupNS64N VdupNS64N //go:noescape func VdupNS64N(r *arm.Int64, v0 *arm.Int64, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VdupNU8N VdupNU8N //go:noescape func VdupNU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VdupNU16N VdupNU16N //go:noescape func VdupNU16N(r *arm.Uint16, v0 *arm.Uint16, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VdupNU32N VdupNU32N //go:noescape func VdupNU32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // Insert vector element from another vector element. 
This instruction copies the vector element of the source SIMD&FP register to the specified vector element of the destination SIMD&FP register. // //go:linkname VdupNU64N VdupNU64N //go:noescape func VdupNU64N(r *arm.Uint64, v0 *arm.Uint64, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VdupNF32N VdupNF32N //go:noescape func VdupNF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Insert vector element from another vector element. This instruction copies the vector element of the source SIMD&FP register to the specified vector element of the destination SIMD&FP register. // //go:linkname VdupNF64N VdupNF64N //go:noescape func VdupNF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VdupqNS8N VdupqNS8N //go:noescape func VdupqNS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VdupqNS16N VdupqNS16N //go:noescape func VdupqNS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. 
// //go:linkname VdupqNS32N VdupqNS32N //go:noescape func VdupqNS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VdupqNS64N VdupqNS64N //go:noescape func VdupqNS64N(r *arm.Int64, v0 *arm.Int64, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VdupqNU8N VdupqNU8N //go:noescape func VdupqNU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VdupqNU16N VdupqNU16N //go:noescape func VdupqNU16N(r *arm.Uint16, v0 *arm.Uint16, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VdupqNU32N VdupqNU32N //go:noescape func VdupqNU32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VdupqNU64N VdupqNU64N //go:noescape func VdupqNU64N(r *arm.Uint64, v0 *arm.Uint64, n int32) // Duplicate vector element to vector or scalar. 
This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VdupqNF32N VdupqNF32N //go:noescape func VdupqNF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VdupqNF64N VdupqNF64N //go:noescape func VdupqNF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register. // //go:linkname VeorS8N VeorS8N //go:noescape func VeorS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register. // //go:linkname VeorS16N VeorS16N //go:noescape func VeorS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register. // //go:linkname VeorS32N VeorS32N //go:noescape func VeorS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register. // //go:linkname VeorS64N VeorS64N //go:noescape func VeorS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Bitwise Exclusive OR (vector). 
This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register. // //go:linkname VeorU8N VeorU8N //go:noescape func VeorU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register. // //go:linkname VeorU16N VeorU16N //go:noescape func VeorU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register. // //go:linkname VeorU32N VeorU32N //go:noescape func VeorU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register. // //go:linkname VeorU64N VeorU64N //go:noescape func VeorU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register. // //go:linkname VeorqS8N VeorqS8N //go:noescape func VeorqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register. // //go:linkname VeorqS16N VeorqS16N //go:noescape func VeorqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Bitwise Exclusive OR (vector). 
This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register. // //go:linkname VeorqS32N VeorqS32N //go:noescape func VeorqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register. // //go:linkname VeorqS64N VeorqS64N //go:noescape func VeorqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register. // //go:linkname VeorqU8N VeorqU8N //go:noescape func VeorqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register. // //go:linkname VeorqU16N VeorqU16N //go:noescape func VeorqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register. // //go:linkname VeorqU32N VeorqU32N //go:noescape func VeorqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source SIMD&FP registers, and places the result in the destination SIMD&FP register. // //go:linkname VeorqU64N VeorqU64N //go:noescape func VeorqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Duplicate vector element to vector or scalar. 
This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VgetHighS8N VgetHighS8N //go:noescape func VgetHighS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VgetHighS16N VgetHighS16N //go:noescape func VgetHighS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VgetHighS32N VgetHighS32N //go:noescape func VgetHighS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VgetHighS64N VgetHighS64N //go:noescape func VgetHighS64N(r *arm.Int64, v0 *arm.Int64, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VgetHighU8N VgetHighU8N //go:noescape func VgetHighU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Duplicate vector element to vector or scalar. 
This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VgetHighU16N VgetHighU16N //go:noescape func VgetHighU16N(r *arm.Uint16, v0 *arm.Uint16, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VgetHighU32N VgetHighU32N //go:noescape func VgetHighU32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VgetHighU64N VgetHighU64N //go:noescape func VgetHighU64N(r *arm.Uint64, v0 *arm.Uint64, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VgetHighF32N VgetHighF32N //go:noescape func VgetHighF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VgetHighF64N VgetHighF64N //go:noescape func VgetHighF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Duplicate vector element to vector or scalar. 
This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VgetLowS8N VgetLowS8N //go:noescape func VgetLowS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VgetLowS16N VgetLowS16N //go:noescape func VgetLowS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VgetLowS32N VgetLowS32N //go:noescape func VgetLowS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VgetLowS64N VgetLowS64N //go:noescape func VgetLowS64N(r *arm.Int64, v0 *arm.Int64, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VgetLowU8N VgetLowU8N //go:noescape func VgetLowU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Duplicate vector element to vector or scalar. 
This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VgetLowU16N VgetLowU16N //go:noescape func VgetLowU16N(r *arm.Uint16, v0 *arm.Uint16, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VgetLowU32N VgetLowU32N //go:noescape func VgetLowU32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VgetLowU64N VgetLowU64N //go:noescape func VgetLowU64N(r *arm.Uint64, v0 *arm.Uint64, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VgetLowF32N VgetLowF32N //go:noescape func VgetLowF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VgetLowF64N VgetLowF64N //go:noescape func VgetLowF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Signed Halving Add. 
This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VhaddS8N VhaddS8N //go:noescape func VhaddS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VhaddS16N VhaddS16N //go:noescape func VhaddS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VhaddS32N VhaddS32N //go:noescape func VhaddS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VhaddU8N VhaddU8N //go:noescape func VhaddU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VhaddU16N VhaddU16N //go:noescape func VhaddU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Unsigned Halving Add. 
// This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddU32N VhaddU32N
//go:noescape
func VhaddU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqS8N VhaddqS8N
//go:noescape
func VhaddqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqS16N VhaddqS16N
//go:noescape
func VhaddqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqS32N VhaddqS32N
//go:noescape
func VhaddqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqU8N VhaddqU8N
//go:noescape
func VhaddqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Halving Add.
// This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqU16N VhaddqU16N
//go:noescape
func VhaddqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhaddqU32N VhaddqU32N
//go:noescape
func VhaddqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubS8N VhsubS8N
//go:noescape
func VhsubS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubS16N VhsubS16N
//go:noescape
func VhsubS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubS32N VhsubS32N
//go:noescape
func VhsubS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubU8N VhsubU8N
//go:noescape
func VhsubU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubU16N VhsubU16N
//go:noescape
func VhsubU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubU32N VhsubU32N
//go:noescape
func VhsubU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqS8N VhsubqS8N
//go:noescape
func VhsubqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Halving Subtract.
// This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqS16N VhsubqS16N
//go:noescape
func VhsubqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqS32N VhsubqS32N
//go:noescape
func VhsubqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqU8N VhsubqU8N
//go:noescape
func VhsubqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqU16N VhsubqU16N
//go:noescape
func VhsubqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Halving Subtract.
// This instruction subtracts the vector elements in the second source SIMD&FP register from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places each result into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VhsubqU32N VhsubqU32N
//go:noescape
func VhsubqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxS8N VmaxS8N
//go:noescape
func VmaxS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxS16N VmaxS16N
//go:noescape
func VmaxS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxS32N VmaxS32N
//go:noescape
func VmaxS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxU8N VmaxU8N
//go:noescape
func VmaxU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Maximum (vector).
// This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxU16N VmaxU16N
//go:noescape
func VmaxU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxU32N VmaxU32N
//go:noescape
func VmaxU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxF32N VmaxF32N
//go:noescape
func VmaxF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxF64N VmaxF64N
//go:noescape
func VmaxF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxnmF32N VmaxnmF32N
//go:noescape
func VmaxnmF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Maximum Number (vector).
// This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxnmF64N VmaxnmF64N
//go:noescape
func VmaxnmF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxnmqF32N VmaxnmqF32N
//go:noescape
func VmaxnmqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxnmqF64N VmaxnmqF64N
//go:noescape
func VmaxnmqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VmaxnmvF32N VmaxnmvF32N
//go:noescape
func VmaxnmvF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Maximum Number across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register.
// All the values in this instruction are floating-point values.
//
//go:linkname VmaxnmvqF32N VmaxnmvqF32N
//go:noescape
func VmaxnmvqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VmaxnmvqF64N VmaxnmvqF64N
//go:noescape
func VmaxnmvqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqS8N VmaxqS8N
//go:noescape
func VmaxqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqS16N VmaxqS16N
//go:noescape
func VmaxqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqS32N VmaxqS32N
//go:noescape
func VmaxqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Maximum (vector).
// This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqU8N VmaxqU8N
//go:noescape
func VmaxqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqU16N VmaxqU16N
//go:noescape
func VmaxqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqU32N VmaxqU32N
//go:noescape
func VmaxqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqF32N VmaxqF32N
//go:noescape
func VmaxqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxqF64N VmaxqF64N
//go:noescape
func VmaxqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Signed Maximum across Vector.
// This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VmaxvS8N VmaxvS8N
//go:noescape
func VmaxvS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VmaxvS16N VmaxvS16N
//go:noescape
func VmaxvS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxvS32N VmaxvS32N
//go:noescape
func VmaxvS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VmaxvU8N VmaxvU8N
//go:noescape
func VmaxvU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VmaxvU16N VmaxvU16N
//go:noescape
func VmaxvU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Unsigned Maximum Pairwise.
// This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VmaxvU32N VmaxvU32N
//go:noescape
func VmaxvU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VmaxvF32N VmaxvF32N
//go:noescape
func VmaxvF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VmaxvqS8N VmaxvqS8N
//go:noescape
func VmaxvqS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VmaxvqS16N VmaxvqS16N
//go:noescape
func VmaxvqS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Signed Maximum across Vector.
// This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VmaxvqS32N VmaxvqS32N
//go:noescape
func VmaxvqS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VmaxvqU8N VmaxvqU8N
//go:noescape
func VmaxvqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32)

// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VmaxvqU16N VmaxvqU16N
//go:noescape
func VmaxvqU16N(r *arm.Uint16, v0 *arm.Uint16, n int32)

// Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
//
//go:linkname VmaxvqU32N VmaxvqU32N
//go:noescape
func VmaxvqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32)

// Floating-point Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VmaxvqF32N VmaxvqF32N
//go:noescape
func VmaxvqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Maximum Pairwise (vector).
// This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VmaxvqF64N VmaxvqF64N
//go:noescape
func VmaxvqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminS8N VminS8N
//go:noescape
func VminS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminS16N VminS16N
//go:noescape
func VminS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminS32N VminS32N
//go:noescape
func VminS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminU8N VminU8N
//go:noescape
func VminU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminU16N VminU16N
//go:noescape
func VminU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminU32N VminU32N
//go:noescape
func VminU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminF32N VminF32N
//go:noescape
func VminF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminF64N VminF64N
//go:noescape
func VminF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminnmF32N VminnmF32N
//go:noescape
func VminnmF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminnmF64N VminnmF64N
//go:noescape
func VminnmF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminnmqF32N VminnmqF32N
//go:noescape
func VminnmqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminnmqF64N VminnmqF64N
//go:noescape
func VminnmqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VminnmvF32N VminnmvF32N
//go:noescape
func VminnmvF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Minimum Number across Vector.
// This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VminnmvqF32N VminnmvqF32N
//go:noescape
func VminnmvqF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VminnmvqF64N VminnmvqF64N
//go:noescape
func VminnmvqF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqS8N VminqS8N
//go:noescape
func VminqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqS16N VminqS16N
//go:noescape
func VminqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqS32N VminqS32N
//go:noescape
func VminqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqU8N VminqU8N
//go:noescape
func VminqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqU16N VminqU16N
//go:noescape
func VminqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP registers, places the smaller of each of the two unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqU32N VminqU32N
//go:noescape
func VminqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VminqF32N VminqF32N
//go:noescape
func VminqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point minimum (vector). This instruction compares corresponding elements in the vectors in the two source SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to the destination SIMD&FP register.
// //go:linkname VminqF64N VminqF64N //go:noescape func VminqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VminvS8N VminvS8N //go:noescape func VminvS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VminvS16N VminvS16N //go:noescape func VminvS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values. Because this form has only two 32-bit lanes, the reduction is equivalent to a single pairwise minimum. // //go:linkname VminvS32N VminvS32N //go:noescape func VminvS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VminvU8N VminvU8N //go:noescape func VminvU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register.
All the values in this instruction are unsigned integer values. // //go:linkname VminvU16N VminvU16N //go:noescape func VminvU16N(r *arm.Uint16, v0 *arm.Uint16, n int32) // Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. Because this form has only two 32-bit lanes, the reduction is equivalent to a single pairwise minimum. // //go:linkname VminvU32N VminvU32N //go:noescape func VminvU32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // Floating-point Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values. Because this form has only two 32-bit lanes, the reduction is equivalent to a single pairwise minimum. // //go:linkname VminvF32N VminvF32N //go:noescape func VminvF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VminvqS8N VminvqS8N //go:noescape func VminvqS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values.
// //go:linkname VminvqS16N VminvqS16N //go:noescape func VminvqS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are signed integer values. // //go:linkname VminvqS32N VminvqS32N //go:noescape func VminvqS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VminvqU8N VminvqU8N //go:noescape func VminvqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VminvqU16N VminvqU16N //go:noescape func VminvqU16N(r *arm.Uint16, v0 *arm.Uint16, n int32) // Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VminvqU32N VminvqU32N //go:noescape func VminvqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // Floating-point Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values. 
// //go:linkname VminvqF32N VminvqF32N //go:noescape func VminvqF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction are floating-point values. Because this form has only two 64-bit lanes, the reduction is equivalent to a single pairwise minimum. // //go:linkname VminvqF64N VminvqF64N //go:noescape func VminvqF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNS8N VmovNS8N //go:noescape func VmovNS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNS16N VmovNS16N //go:noescape func VmovNS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNS32N VmovNS32N //go:noescape func VmovNS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Duplicate vector element to vector or scalar.
This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNS64N VmovNS64N //go:noescape func VmovNS64N(r *arm.Int64, v0 *arm.Int64, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNU8N VmovNU8N //go:noescape func VmovNU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNU16N VmovNU16N //go:noescape func VmovNU16N(r *arm.Uint16, v0 *arm.Uint16, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNU32N VmovNU32N //go:noescape func VmovNU32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNU64N VmovNU64N //go:noescape func VmovNU64N(r *arm.Uint64, v0 *arm.Uint64, n int32) // Duplicate vector element to vector or scalar. 
This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNF32N VmovNF32N //go:noescape func VmovNF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovNF64N VmovNF64N //go:noescape func VmovNF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovqNS8N VmovqNS8N //go:noescape func VmovqNS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovqNS16N VmovqNS16N //go:noescape func VmovqNS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovqNS32N VmovqNS32N //go:noescape func VmovqNS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Duplicate vector element to vector or scalar. 
This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovqNS64N VmovqNS64N //go:noescape func VmovqNS64N(r *arm.Int64, v0 *arm.Int64, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovqNU8N VmovqNU8N //go:noescape func VmovqNU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovqNU16N VmovqNU16N //go:noescape func VmovqNU16N(r *arm.Uint16, v0 *arm.Uint16, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovqNU32N VmovqNU32N //go:noescape func VmovqNU32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovqNU64N VmovqNU64N //go:noescape func VmovqNU64N(r *arm.Uint64, v0 *arm.Uint64, n int32) // Duplicate vector element to vector or scalar. 
This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovqNF32N VmovqNF32N //go:noescape func VmovqNF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the destination SIMD&FP register. // //go:linkname VmovqNF64N VmovqNF64N //go:noescape func VmovqNF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulS8N VmulS8N //go:noescape func VmulS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulS16N VmulS16N //go:noescape func VmulS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulS32N VmulS32N //go:noescape func VmulS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VmulU8N VmulU8N //go:noescape func VmulU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulU16N VmulU16N //go:noescape func VmulU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulU32N VmulU32N //go:noescape func VmulU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulF32N VmulF32N //go:noescape func VmulF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulF64N VmulF64N //go:noescape func VmulF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulqS8N VmulqS8N //go:noescape func VmulqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Multiply (vector). 
This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulqS16N VmulqS16N //go:noescape func VmulqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulqS32N VmulqS32N //go:noescape func VmulqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulqU8N VmulqU8N //go:noescape func VmulqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulqU16N VmulqU16N //go:noescape func VmulqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulqU32N VmulqU32N //go:noescape func VmulqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VmulqF32N VmulqF32N //go:noescape func VmulqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulqF64N VmulqF64N //go:noescape func VmulqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulxF32N VmulxF32N //go:noescape func VmulxF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulxF64N VmulxF64N //go:noescape func VmulxF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulxdF64N VmulxdF64N //go:noescape func VmulxdF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VmulxqF32N VmulxqF32N //go:noescape func VmulxqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulxqF64N VmulxqF64N //go:noescape func VmulxqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmulxsF32N VmulxsF32N //go:noescape func VmulxsF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnS8N VmvnS8N //go:noescape func VmvnS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnS16N VmvnS16N //go:noescape func VmvnS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnS32N VmvnS32N //go:noescape func VmvnS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Bitwise NOT (vector). 
This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnU8N VmvnU8N //go:noescape func VmvnU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnU16N VmvnU16N //go:noescape func VmvnU16N(r *arm.Uint16, v0 *arm.Uint16, n int32) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnU32N VmvnU32N //go:noescape func VmvnU32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnqS8N VmvnqS8N //go:noescape func VmvnqS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnqS16N VmvnqS16N //go:noescape func VmvnqS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnqS32N VmvnqS32N //go:noescape func VmvnqS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Bitwise NOT (vector). 
This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnqU8N VmvnqU8N //go:noescape func VmvnqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnqU16N VmvnqU16N //go:noescape func VmvnqU16N(r *arm.Uint16, v0 *arm.Uint16, n int32) // Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the inverse of each value into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VmvnqU32N VmvnqU32N //go:noescape func VmvnqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegS8N VnegS8N //go:noescape func VnegS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegS16N VnegS16N //go:noescape func VnegS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegS32N VnegS32N //go:noescape func VnegS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Negate (vector). 
This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegS64N VnegS64N //go:noescape func VnegS64N(r *arm.Int64, v0 *arm.Int64, n int32) // Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegF32N VnegF32N //go:noescape func VnegF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegF64N VnegF64N //go:noescape func VnegF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegdS64N VnegdS64N //go:noescape func VnegdS64N(r *arm.Int64, v0 *arm.Int64, n int32) // Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegqS8N VnegqS8N //go:noescape func VnegqS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegqS16N VnegqS16N //go:noescape func VnegqS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Negate (vector). 
This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegqS32N VnegqS32N //go:noescape func VnegqS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value, puts the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegqS64N VnegqS64N //go:noescape func VnegqS64N(r *arm.Int64, v0 *arm.Int64, n int32) // Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegqF32N VnegqF32N //go:noescape func VnegqF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VnegqF64N VnegqF64N //go:noescape func VnegqF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornS8N VornS8N //go:noescape func VornS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornS16N VornS16N //go:noescape func VornS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Bitwise inclusive OR NOT (vector). 
This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornS32N VornS32N //go:noescape func VornS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornS64N VornS64N //go:noescape func VornS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornU8N VornU8N //go:noescape func VornU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornU16N VornU16N //go:noescape func VornU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornU32N VornU32N //go:noescape func VornU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornU64N VornU64N //go:noescape func VornU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. 
// //go:linkname VornqS8N VornqS8N //go:noescape func VornqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornqS16N VornqS16N //go:noescape func VornqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornqS32N VornqS32N //go:noescape func VornqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornqS64N VornqS64N //go:noescape func VornqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornqU8N VornqU8N //go:noescape func VornqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornqU16N VornqU16N //go:noescape func VornqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornqU32N VornqU32N //go:noescape func VornqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Bitwise inclusive OR NOT (vector). 
This instruction performs a bitwise OR NOT between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VornqU64N VornqU64N //go:noescape func VornqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrS8N VorrS8N //go:noescape func VorrS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrS16N VorrS16N //go:noescape func VorrS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrS32N VorrS32N //go:noescape func VorrS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrS64N VorrS64N //go:noescape func VorrS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrU8N VorrU8N //go:noescape func VorrU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. 
// //go:linkname VorrU16N VorrU16N //go:noescape func VorrU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrU32N VorrU32N //go:noescape func VorrU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrU64N VorrU64N //go:noescape func VorrU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrqS8N VorrqS8N //go:noescape func VorrqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrqS16N VorrqS16N //go:noescape func VorrqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrqS32N VorrqS32N //go:noescape func VorrqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrqS64N VorrqS64N //go:noescape func VorrqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Bitwise inclusive OR (vector, register). 
This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrqU8N VorrqU8N //go:noescape func VorrqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrqU16N VorrqU16N //go:noescape func VorrqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrqU32N VorrqU32N //go:noescape func VorrqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP registers, and writes the result to the destination SIMD&FP register. // //go:linkname VorrqU64N VorrqU64N //go:noescape func VorrqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddS8N VpaddS8N //go:noescape func VpaddS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Add Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddS16N VpaddS16N //go:noescape func VpaddS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddS32N VpaddS32N //go:noescape func VpaddS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddU8N VpaddU8N //go:noescape func VpaddU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VpaddU16N VpaddU16N //go:noescape func VpaddU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddU32N VpaddU32N //go:noescape func VpaddU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpaddF32N VpaddF32N //go:noescape func VpaddF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpadddS64N VpadddS64N //go:noescape func VpadddS64N(r *arm.Int64, v0 *arm.Int64, n int32) // Add Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpadddU64N VpadddU64N //go:noescape func VpadddU64N(r *arm.Uint64, v0 *arm.Uint64, n int32) // Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpadddF64N VpadddF64N //go:noescape func VpadddF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddqS8N VpaddqS8N //go:noescape func VpaddqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VpaddqS16N VpaddqS16N //go:noescape func VpaddqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddqS32N VpaddqS32N //go:noescape func VpaddqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddqS64N VpaddqS64N //go:noescape func VpaddqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddqU8N VpaddqU8N //go:noescape func VpaddqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Add Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddqU16N VpaddqU16N //go:noescape func VpaddqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddqU32N VpaddqU32N //go:noescape func VpaddqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpaddqU64N VpaddqU64N //go:noescape func VpaddqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. 
All the values in this instruction are floating-point values. // //go:linkname VpaddqF32N VpaddqF32N //go:noescape func VpaddqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpaddqF64N VpaddqF64N //go:noescape func VpaddqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpaddsF32N VpaddsF32N //go:noescape func VpaddsF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxS8N VpmaxS8N //go:noescape func VpmaxS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Signed Maximum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxS16N VpmaxS16N //go:noescape func VpmaxS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxS32N VpmaxS32N //go:noescape func VpmaxS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxU8N VpmaxU8N //go:noescape func VpmaxU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VpmaxU16N VpmaxU16N //go:noescape func VpmaxU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxU32N VpmaxU32N //go:noescape func VpmaxU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpmaxF32N VpmaxF32N //go:noescape func VpmaxF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpmaxnmF32N VpmaxnmF32N //go:noescape func VpmaxnmF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Maximum Number Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpmaxnmqF32N VpmaxnmqF32N //go:noescape func VpmaxnmqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpmaxnmqF64N VpmaxnmqF64N //go:noescape func VpmaxnmqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpmaxnmqdF64N VpmaxnmqdF64N //go:noescape func VpmaxnmqdF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Maximum Number Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpmaxnmsF32N VpmaxnmsF32N //go:noescape func VpmaxnmsF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxqS8N VpmaxqS8N //go:noescape func VpmaxqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxqS16N VpmaxqS16N //go:noescape func VpmaxqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Signed Maximum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxqS32N VpmaxqS32N //go:noescape func VpmaxqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxqU8N VpmaxqU8N //go:noescape func VpmaxqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxqU16N VpmaxqU16N //go:noescape func VpmaxqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Unsigned Maximum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the largest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpmaxqU32N VpmaxqU32N //go:noescape func VpmaxqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpmaxqF32N VpmaxqF32N //go:noescape func VpmaxqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpmaxqF64N VpmaxqF64N //go:noescape func VpmaxqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Maximum Pairwise (vector). 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpmaxqdF64N VpmaxqdF64N //go:noescape func VpmaxqdF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpmaxsF32N VpmaxsF32N //go:noescape func VpmaxsF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpminS8N VpminS8N //go:noescape func VpminS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Signed Minimum Pairwise. 
This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpminS16N VpminS16N //go:noescape func VpminS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpminS32N VpminS32N //go:noescape func VpminS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpminU8N VpminU8N //go:noescape func VpminU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VpminU16N VpminU16N //go:noescape func VpminU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VpminU32N VpminU32N //go:noescape func VpminU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpminF32N VpminF32N //go:noescape func VpminF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values. // //go:linkname VpminnmF32N VpminnmF32N //go:noescape func VpminnmF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Minimum Number Pairwise (vector). 
// This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminnmqF32N VpminnmqF32N
//go:noescape
func VpminnmqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminnmqF64N VpminnmqF64N
//go:noescape
func VpminnmqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminnmqdF64N VpminnmqdF64N
//go:noescape
func VpminnmqdF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Minimum Number Pairwise (vector).
// This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminnmsF32N VpminnmsF32N
//go:noescape
func VpminnmsF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminqS8N VpminqS8N
//go:noescape
func VpminqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminqS16N VpminqS16N
//go:noescape
func VpminqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed Minimum Pairwise.
// This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of signed integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminqS32N VpminqS32N
//go:noescape
func VpminqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminqU8N VpminqU8N
//go:noescape
func VpminqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminqU16N VpminqU16N
//go:noescape
func VpminqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned Minimum Pairwise.
// This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements in the two source SIMD&FP registers, writes the smallest of each pair of unsigned integer values into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VpminqU32N VpminqU32N
//go:noescape
func VpminqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminqF32N VpminqF32N
//go:noescape
func VpminqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminqF64N VpminqF64N
//go:noescape
func VpminqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Floating-point Minimum Pairwise (vector).
// This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminqdF64N VpminqdF64N
//go:noescape
func VpminqdF64N(r *arm.Float64, v0 *arm.Float64, n int32)

// Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
//
//go:linkname VpminsF32N VpminsF32N
//go:noescape
func VpminsF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsS8N VqabsS8N
//go:noescape
func VqabsS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsS16N VqabsS16N
//go:noescape
func VqabsS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Signed saturating Absolute value.
// This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsS32N VqabsS32N
//go:noescape
func VqabsS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsS64N VqabsS64N
//go:noescape
func VqabsS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsbS8N VqabsbS8N
//go:noescape
func VqabsbS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsdS64N VqabsdS64N
//go:noescape
func VqabsdS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabshS16N VqabshS16N
//go:noescape
func VqabshS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Signed saturating Absolute value.
// This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsqS8N VqabsqS8N
//go:noescape
func VqabsqS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsqS16N VqabsqS16N
//go:noescape
func VqabsqS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsqS32N VqabsqS32N
//go:noescape
func VqabsqS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabsqS64N VqabsqS64N
//go:noescape
func VqabsqS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqabssS32N VqabssS32N
//go:noescape
func VqabssS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Signed saturating Add.
// This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddS8N VqaddS8N
//go:noescape
func VqaddS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddS16N VqaddS16N
//go:noescape
func VqaddS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddS32N VqaddS32N
//go:noescape
func VqaddS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddS64N VqaddS64N
//go:noescape
func VqaddS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddU8N VqaddU8N
//go:noescape
func VqaddU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddU16N VqaddU16N
//go:noescape
func VqaddU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddU32N VqaddU32N
//go:noescape
func VqaddU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddU64N VqaddU64N
//go:noescape
func VqaddU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddbS8N VqaddbS8N
//go:noescape
func VqaddbS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddbU8N VqaddbU8N
//go:noescape
func VqaddbU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqadddS64N VqadddS64N
//go:noescape
func VqadddS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Unsigned saturating Add.
// This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqadddU64N VqadddU64N
//go:noescape
func VqadddU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddhS16N VqaddhS16N
//go:noescape
func VqaddhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddhU16N VqaddhU16N
//go:noescape
func VqaddhU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqS8N VqaddqS8N
//go:noescape
func VqaddqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqS16N VqaddqS16N
//go:noescape
func VqaddqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqS32N VqaddqS32N
//go:noescape
func VqaddqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqS64N VqaddqS64N
//go:noescape
func VqaddqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqU8N VqaddqU8N
//go:noescape
func VqaddqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqU16N VqaddqU16N
//go:noescape
func VqaddqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqU32N VqaddqU32N
//go:noescape
func VqaddqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddqU64N VqaddqU64N
//go:noescape
func VqaddqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Signed saturating Add.
// This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddsS32N VqaddsS32N
//go:noescape
func VqaddsS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqaddsU32N VqaddsU32N
//go:noescape
func VqaddsU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhS16N VqdmulhS16N
//go:noescape
func VqdmulhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhS32N VqdmulhS32N
//go:noescape
func VqdmulhS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhhS16N VqdmulhhS16N
//go:noescape
func VqdmulhhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhqS16N VqdmulhqS16N
//go:noescape
func VqdmulhqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhqS32N VqdmulhqS32N
//go:noescape
func VqdmulhqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqdmulhsS32N VqdmulhsS32N
//go:noescape
func VqdmulhsS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegS8N VqnegS8N
//go:noescape
func VqnegS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Signed saturating Negate.
// This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegS16N VqnegS16N
//go:noescape
func VqnegS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegS32N VqnegS32N
//go:noescape
func VqnegS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegS64N VqnegS64N
//go:noescape
func VqnegS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegbS8N VqnegbS8N
//go:noescape
func VqnegbS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegdS64N VqnegdS64N
//go:noescape
func VqnegdS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Signed saturating Negate.
// This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqneghS16N VqneghS16N
//go:noescape
func VqneghS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegqS8N VqnegqS8N
//go:noescape
func VqnegqS8N(r *arm.Int8, v0 *arm.Int8, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegqS16N VqnegqS16N
//go:noescape
func VqnegqS16N(r *arm.Int16, v0 *arm.Int16, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegqS32N VqnegqS32N
//go:noescape
func VqnegqS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegqS64N VqnegqS64N
//go:noescape
func VqnegqS64N(r *arm.Int64, v0 *arm.Int64, n int32)

// Signed saturating Negate.
// This instruction reads each vector element from the source SIMD&FP register, negates each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed integer values.
//
//go:linkname VqnegsS32N VqnegsS32N
//go:noescape
func VqnegsS32N(r *arm.Int32, v0 *arm.Int32, n int32)

// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrdmulhS16N VqrdmulhS16N
//go:noescape
func VqrdmulhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrdmulhS32N VqrdmulhS32N
//go:noescape
func VqrdmulhS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrdmulhhS16N VqrdmulhhS16N
//go:noescape
func VqrdmulhhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Rounding Doubling Multiply returning High half.
// This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrdmulhqS16N VqrdmulhqS16N
//go:noescape
func VqrdmulhqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrdmulhqS32N VqrdmulhqS32N
//go:noescape
func VqrdmulhqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrdmulhsS32N VqrdmulhsS32N
//go:noescape
func VqrdmulhsS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlS8N VqrshlS8N
//go:noescape
func VqrshlS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed saturating Rounding Shift Left (register).
// This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlS16N VqrshlS16N
//go:noescape
func VqrshlS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlS32N VqrshlS32N
//go:noescape
func VqrshlS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlS64N VqrshlS64N
//go:noescape
func VqrshlS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname VqrshlbS8N VqrshlbS8N
//go:noescape
func VqrshlbS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Signed saturating Rounding Shift Left (register).
This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshldS64N VqrshldS64N //go:noescape func VqrshldS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshlhS16N VqrshlhS16N //go:noescape func VqrshlhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshlqS8N VqrshlqS8N //go:noescape func VqrshlqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshlqS16N VqrshlqS16N //go:noescape func VqrshlqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Signed saturating Rounding Shift Left (register). 
This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshlqS32N VqrshlqS32N //go:noescape func VqrshlqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshlqS64N VqrshlqS64N //go:noescape func VqrshlqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqrshlsS32N VqrshlsS32N //go:noescape func VqrshlsS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqshlS8N VqshlS8N //go:noescape func VqshlS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Signed saturating Shift Left (register). 
This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqshlS16N VqshlS16N //go:noescape func VqshlS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqshlS32N VqshlS32N //go:noescape func VqshlS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqshlS64N VqshlS64N //go:noescape func VqshlS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqshlbS8N VqshlbS8N //go:noescape func VqshlbS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Signed saturating Shift Left (register). 
This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqshldS64N VqshldS64N //go:noescape func VqshldS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqshlhS16N VqshlhS16N //go:noescape func VqshlhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqshlqS8N VqshlqS8N //go:noescape func VqshlqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqshlqS16N VqshlqS16N //go:noescape func VqshlqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Signed saturating Shift Left (register). 
This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqshlqS32N VqshlqS32N //go:noescape func VqshlqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqshlqS64N VqshlqS64N //go:noescape func VqshlqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register, shifts each element by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqshlsS32N VqshlsS32N //go:noescape func VqshlsS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubS8N VqsubS8N //go:noescape func VqsubS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Signed saturating Subtract. 
This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubS16N VqsubS16N //go:noescape func VqsubS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubS32N VqsubS32N //go:noescape func VqsubS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubS64N VqsubS64N //go:noescape func VqsubS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubU8N VqsubU8N //go:noescape func VqsubU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VqsubU16N VqsubU16N //go:noescape func VqsubU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubU32N VqsubU32N //go:noescape func VqsubU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubU64N VqsubU64N //go:noescape func VqsubU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubbS8N VqsubbS8N //go:noescape func VqsubbS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubbU8N VqsubbU8N //go:noescape func VqsubbU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Signed saturating Subtract. 
This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubdS64N VqsubdS64N //go:noescape func VqsubdS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubdU64N VqsubdU64N //go:noescape func VqsubdU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubhS16N VqsubhS16N //go:noescape func VqsubhS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubhU16N VqsubhU16N //go:noescape func VqsubhU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VqsubqS8N VqsubqS8N //go:noescape func VqsubqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubqS16N VqsubqS16N //go:noescape func VqsubqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubqS32N VqsubqS32N //go:noescape func VqsubqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubqS64N VqsubqS64N //go:noescape func VqsubqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubqU8N VqsubqU8N //go:noescape func VqsubqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Unsigned saturating Subtract. 
This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubqU16N VqsubqU16N //go:noescape func VqsubqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubqU32N VqsubqU32N //go:noescape func VqsubqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubqU64N VqsubqU64N //go:noescape func VqsubqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VqsubsS32N VqsubsS32N //go:noescape func VqsubsS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. 
If a result would be negative, it saturates to 0 instead of wrapping around.
// //go:linkname VqsubsU32N VqsubsU32N //go:noescape func VqsubsU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table. // //go:linkname Vqtbl1QU8N Vqtbl1QU8N //go:noescape func Vqtbl1QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Rotate and Exclusive OR rotates each 64-bit element of the 128-bit vector in a source SIMD&FP register left by 1, performs a bitwise exclusive OR of the resulting 128-bit vector and the vector in another source SIMD&FP register, and writes the result to the destination SIMD&FP register. // //go:linkname Vrax1QU64N Vrax1QU64N //go:noescape func Vrax1QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrbitS8N VrbitS8N //go:noescape func VrbitS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrbitU8N VrbitU8N //go:noescape func VrbitU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Reverse Bit order (vector). 
This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrbitqS8N VrbitqS8N //go:noescape func VrbitqS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrbitqU8N VrbitqU8N //go:noescape func VrbitqU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Unsigned Reciprocal Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse for the unsigned integer value, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpeU32N VrecpeU32N //go:noescape func VrecpeU32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpeF32N VrecpeF32N //go:noescape func VrecpeF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpeF64N VrecpeF64N //go:noescape func VrecpeF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VrecpedF64N VrecpedF64N //go:noescape func VrecpedF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Unsigned Reciprocal Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse for the unsigned integer value, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpeqU32N VrecpeqU32N //go:noescape func VrecpeqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpeqF32N VrecpeqF32N //go:noescape func VrecpeqF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpeqF64N VrecpeqF64N //go:noescape func VrecpeqF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpesF32N VrecpesF32N //go:noescape func VrecpesF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VrecpsF32N VrecpsF32N //go:noescape func VrecpsF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpsF64N VrecpsF64N //go:noescape func VrecpsF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpsdF64N VrecpsdF64N //go:noescape func VrecpsdF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpsqF32N VrecpsqF32N //go:noescape func VrecpsqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpsqF64N VrecpsqF64N //go:noescape func VrecpsqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Reciprocal Step. 
This instruction multiplies the corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpssF32N VrecpssF32N //go:noescape func VrecpssF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Reciprocal exponent (scalar). This instruction finds an approximate reciprocal exponent for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrecpxdF64N VrecpxdF64N //go:noescape func VrecpxdF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Reciprocal exponent (scalar). This instruction finds an approximate reciprocal exponent for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. 
//
//go:linkname VrecpxsF32N VrecpxsF32N
//go:noescape
func VrecpxsF32N(r *arm.Float32, v0 *arm.Float32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF32S32N VreinterpretF32S32N
//go:noescape
func VreinterpretF32S32N(r *arm.Float32, v0 *arm.Int32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF32U32N VreinterpretF32U32N
//go:noescape
func VreinterpretF32U32N(r *arm.Float32, v0 *arm.Uint32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF64S64N VreinterpretF64S64N
//go:noescape
func VreinterpretF64S64N(r *arm.Float64, v0 *arm.Int64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretF64U64N VreinterpretF64U64N
//go:noescape
func VreinterpretF64U64N(r *arm.Float64, v0 *arm.Uint64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS16U16N VreinterpretS16U16N
//go:noescape
func VreinterpretS16U16N(r *arm.Int16, v0 *arm.Uint16, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS32U32N VreinterpretS32U32N
//go:noescape
func VreinterpretS32U32N(r *arm.Int32, v0 *arm.Uint32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS32F32N VreinterpretS32F32N
//go:noescape
func VreinterpretS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS64U64N VreinterpretS64U64N
//go:noescape
func VreinterpretS64U64N(r *arm.Int64, v0 *arm.Uint64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS64F64N VreinterpretS64F64N
//go:noescape
func VreinterpretS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretS8U8N VreinterpretS8U8N
//go:noescape
func VreinterpretS8U8N(r *arm.Int8, v0 *arm.Uint8, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU16S16N VreinterpretU16S16N
//go:noescape
func VreinterpretU16S16N(r *arm.Uint16, v0 *arm.Int16, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU32S32N VreinterpretU32S32N
//go:noescape
func VreinterpretU32S32N(r *arm.Uint32, v0 *arm.Int32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU32F32N VreinterpretU32F32N
//go:noescape
func VreinterpretU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU64S64N VreinterpretU64S64N
//go:noescape
func VreinterpretU64S64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU64F64N VreinterpretU64F64N
//go:noescape
func VreinterpretU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretU8S8N VreinterpretU8S8N
//go:noescape
func VreinterpretU8S8N(r *arm.Uint8, v0 *arm.Int8, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF32S32N VreinterpretqF32S32N
//go:noescape
func VreinterpretqF32S32N(r *arm.Float32, v0 *arm.Int32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF32U32N VreinterpretqF32U32N
//go:noescape
func VreinterpretqF32U32N(r *arm.Float32, v0 *arm.Uint32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF64S64N VreinterpretqF64S64N
//go:noescape
func VreinterpretqF64S64N(r *arm.Float64, v0 *arm.Int64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqF64U64N VreinterpretqF64U64N
//go:noescape
func VreinterpretqF64U64N(r *arm.Float64, v0 *arm.Uint64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS16U16N VreinterpretqS16U16N
//go:noescape
func VreinterpretqS16U16N(r *arm.Int16, v0 *arm.Uint16, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS32U32N VreinterpretqS32U32N
//go:noescape
func VreinterpretqS32U32N(r *arm.Int32, v0 *arm.Uint32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS32F32N VreinterpretqS32F32N
//go:noescape
func VreinterpretqS32F32N(r *arm.Int32, v0 *arm.Float32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS64U64N VreinterpretqS64U64N
//go:noescape
func VreinterpretqS64U64N(r *arm.Int64, v0 *arm.Uint64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS64F64N VreinterpretqS64F64N
//go:noescape
func VreinterpretqS64F64N(r *arm.Int64, v0 *arm.Float64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqS8U8N VreinterpretqS8U8N
//go:noescape
func VreinterpretqS8U8N(r *arm.Int8, v0 *arm.Uint8, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU16S16N VreinterpretqU16S16N
//go:noescape
func VreinterpretqU16S16N(r *arm.Uint16, v0 *arm.Int16, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU32S32N VreinterpretqU32S32N
//go:noescape
func VreinterpretqU32S32N(r *arm.Uint32, v0 *arm.Int32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU32F32N VreinterpretqU32F32N
//go:noescape
func VreinterpretqU32F32N(r *arm.Uint32, v0 *arm.Float32, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU64S64N VreinterpretqU64S64N
//go:noescape
func VreinterpretqU64S64N(r *arm.Uint64, v0 *arm.Int64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU64F64N VreinterpretqU64F64N
//go:noescape
func VreinterpretqU64F64N(r *arm.Uint64, v0 *arm.Float64, n int32)

// Vector reinterpret cast operation
//
//go:linkname VreinterpretqU8S8N VreinterpretqU8S8N
//go:noescape
func VreinterpretqU8S8N(r *arm.Uint8, v0 *arm.Int8, n int32)

// Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register.
// //go:linkname Vrev16S8N Vrev16S8N //go:noescape func Vrev16S8N(r *arm.Int8, v0 *arm.Int8, n int32) // Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev16U8N Vrev16U8N //go:noescape func Vrev16U8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev16QS8N Vrev16QS8N //go:noescape func Vrev16QS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev16QU8N Vrev16QU8N //go:noescape func Vrev16QU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev32S8N Vrev32S8N //go:noescape func Vrev32S8N(r *arm.Int8, v0 *arm.Int8, n int32) // Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev32S16N Vrev32S16N //go:noescape func Vrev32S16N(r *arm.Int16, v0 *arm.Int16, n int32) // Reverse elements in 32-bit words (vector). 
This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev32U8N Vrev32U8N //go:noescape func Vrev32U8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev32U16N Vrev32U16N //go:noescape func Vrev32U16N(r *arm.Uint16, v0 *arm.Uint16, n int32) // Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev32QS8N Vrev32QS8N //go:noescape func Vrev32QS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev32QS16N Vrev32QS16N //go:noescape func Vrev32QS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev32QU8N Vrev32QU8N //go:noescape func Vrev32QU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Reverse elements in 32-bit words (vector). 
This instruction reverses the order of 8-bit or 16-bit elements in each word of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev32QU16N Vrev32QU16N //go:noescape func Vrev32QU16N(r *arm.Uint16, v0 *arm.Uint16, n int32) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64S8N Vrev64S8N //go:noescape func Vrev64S8N(r *arm.Int8, v0 *arm.Int8, n int32) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64S16N Vrev64S16N //go:noescape func Vrev64S16N(r *arm.Int16, v0 *arm.Int16, n int32) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64S32N Vrev64S32N //go:noescape func Vrev64S32N(r *arm.Int32, v0 *arm.Int32, n int32) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64U8N Vrev64U8N //go:noescape func Vrev64U8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Reverse elements in 64-bit doublewords (vector). 
This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64U16N Vrev64U16N //go:noescape func Vrev64U16N(r *arm.Uint16, v0 *arm.Uint16, n int32) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64U32N Vrev64U32N //go:noescape func Vrev64U32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64F32N Vrev64F32N //go:noescape func Vrev64F32N(r *arm.Float32, v0 *arm.Float32, n int32) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64QS8N Vrev64QS8N //go:noescape func Vrev64QS8N(r *arm.Int8, v0 *arm.Int8, n int32) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64QS16N Vrev64QS16N //go:noescape func Vrev64QS16N(r *arm.Int16, v0 *arm.Int16, n int32) // Reverse elements in 64-bit doublewords (vector). 
This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64QS32N Vrev64QS32N //go:noescape func Vrev64QS32N(r *arm.Int32, v0 *arm.Int32, n int32) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64QU8N Vrev64QU8N //go:noescape func Vrev64QU8N(r *arm.Uint8, v0 *arm.Uint8, n int32) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64QU16N Vrev64QU16N //go:noescape func Vrev64QU16N(r *arm.Uint16, v0 *arm.Uint16, n int32) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64QU32N Vrev64QU32N //go:noescape func Vrev64QU32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vrev64QF32N Vrev64QF32N //go:noescape func Vrev64QF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Signed Rounding Halving Add. 
This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddS8N VrhaddS8N //go:noescape func VrhaddS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddS16N VrhaddS16N //go:noescape func VrhaddS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddS32N VrhaddS32N //go:noescape func VrhaddS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddU8N VrhaddU8N //go:noescape func VrhaddU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddU16N VrhaddU16N //go:noescape func VrhaddU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Unsigned Rounding Halving Add. 
This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddU32N VrhaddU32N //go:noescape func VrhaddU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddqS8N VrhaddqS8N //go:noescape func VrhaddqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddqS16N VrhaddqS16N //go:noescape func VrhaddqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddqS32N VrhaddqS32N //go:noescape func VrhaddqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddqU8N VrhaddqU8N //go:noescape func VrhaddqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Unsigned Rounding Halving Add. 
This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddqU16N VrhaddqU16N //go:noescape func VrhaddqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrhaddqU32N VrhaddqU32N //go:noescape func VrhaddqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndF32N VrndF32N //go:noescape func VrndF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndF64N VrndF64N //go:noescape func VrndF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. 
// //go:linkname Vrnd32XF32N Vrnd32XF32N //go:noescape func Vrnd32XF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd32XF64N Vrnd32XF64N //go:noescape func Vrnd32XF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd32XqF32N Vrnd32XqF32N //go:noescape func Vrnd32XqF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd32XqF64N Vrnd32XqF64N //go:noescape func Vrnd32XqF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. 
// //go:linkname Vrnd32ZF32N Vrnd32ZF32N //go:noescape func Vrnd32ZF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd32ZF64N Vrnd32ZF64N //go:noescape func Vrnd32ZF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd32ZqF32N Vrnd32ZqF32N //go:noescape func Vrnd32ZqF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd32ZqF64N Vrnd32ZqF64N //go:noescape func Vrnd32ZqF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd64XF32N Vrnd64XF32N //go:noescape func Vrnd64XF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to 64-bit Integer, using current rounding mode (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd64XF64N Vrnd64XF64N //go:noescape func Vrnd64XF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd64XqF32N Vrnd64XqF32N //go:noescape func Vrnd64XqF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd64XqF64N Vrnd64XqF64N //go:noescape func Vrnd64XqF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd64ZF32N Vrnd64ZF32N //go:noescape func Vrnd64ZF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to 64-bit Integer toward Zero (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd64ZF64N Vrnd64ZF64N //go:noescape func Vrnd64ZF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd64ZqF32N Vrnd64ZqF32N //go:noescape func Vrnd64ZqF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname Vrnd64ZqF64N Vrnd64ZqF64N //go:noescape func Vrnd64ZqF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndaF32N VrndaF32N //go:noescape func VrndaF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to Integral, to nearest with ties to Away (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndaF64N VrndaF64N //go:noescape func VrndaF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndaqF32N VrndaqF32N //go:noescape func VrndaqF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndaqF64N VrndaqF64N //go:noescape func VrndaqF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VrndiF32N VrndiF32N //go:noescape func VrndiF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to Integral, using current rounding mode (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VrndiF64N VrndiF64N //go:noescape func VrndiF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VrndiqF32N VrndiqF32N //go:noescape func VrndiqF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VrndiqF64N VrndiqF64N //go:noescape func VrndiqF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndmF32N VrndmF32N //go:noescape func VrndmF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register. 
// //go:linkname VrndmF64N VrndmF64N //go:noescape func VrndmF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndmqF32N VrndmqF32N //go:noescape func VrndmqF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndmqF64N VrndmqF64N //go:noescape func VrndmqF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndnF32N VrndnF32N //go:noescape func VrndnF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndnF64N VrndnF64N //go:noescape func VrndnF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to Integral, to nearest with ties to even (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndnqF32N VrndnqF32N //go:noescape func VrndnqF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndnqF64N VrndnqF64N //go:noescape func VrndnqF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndnsF32N VrndnsF32N //go:noescape func VrndnsF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndpF32N VrndpF32N //go:noescape func VrndpF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register. 
// //go:linkname VrndpF64N VrndpF64N //go:noescape func VrndpF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndpqF32N VrndpqF32N //go:noescape func VrndpqF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndpqF64N VrndpqF64N //go:noescape func VrndpqF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndqF32N VrndqF32N //go:noescape func VrndqF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding mode, and writes the result to the SIMD&FP destination register. // //go:linkname VrndqF64N VrndqF64N //go:noescape func VrndqF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to Integral exact, using current rounding mode (vector). 
This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VrndxF32N VrndxF32N //go:noescape func VrndxF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VrndxF64N VrndxF64N //go:noescape func VrndxF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VrndxqF32N VrndxqF32N //go:noescape func VrndxqF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register. // //go:linkname VrndxqF64N VrndxqF64N //go:noescape func VrndxqF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Signed Rounding Shift Left (register). 
This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlS8N VrshlS8N //go:noescape func VrshlS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlS16N VrshlS16N //go:noescape func VrshlS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlS32N VrshlS32N //go:noescape func VrshlS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlS64N VrshlS64N //go:noescape func VrshlS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Signed Rounding Shift Left (register). 
This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshldS64N VrshldS64N //go:noescape func VrshldS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlqS8N VrshlqS8N //go:noescape func VrshlqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlqS16N VrshlqS16N //go:noescape func VrshlqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlqS32N VrshlqS32N //go:noescape func VrshlqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Signed Rounding Shift Left (register). 
This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrshlqS64N VrshlqS64N //go:noescape func VrshlqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Unsigned Reciprocal Square Root Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse square root for each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VrsqrteU32N VrsqrteU32N //go:noescape func VrsqrteU32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrsqrteF32N VrsqrteF32N //go:noescape func VrsqrteF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrsqrteF64N VrsqrteF64N //go:noescape func VrsqrteF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VrsqrtedF64N VrsqrtedF64N //go:noescape func VrsqrtedF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Unsigned Reciprocal Square Root Estimate. This instruction reads each vector element from the source SIMD&FP register, calculates an approximate inverse square root for each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values. // //go:linkname VrsqrteqU32N VrsqrteqU32N //go:noescape func VrsqrteqU32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrsqrteqF32N VrsqrteqF32N //go:noescape func VrsqrteqF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrsqrteqF64N VrsqrteqF64N //go:noescape func VrsqrteqF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrsqrtesF32N VrsqrtesF32N //go:noescape func VrsqrtesF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Reciprocal Square Root Step.
This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrsqrtsF32N VrsqrtsF32N //go:noescape func VrsqrtsF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrsqrtsF64N VrsqrtsF64N //go:noescape func VrsqrtsF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrsqrtsdF64N VrsqrtsdF64N //go:noescape func VrsqrtsdF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrsqrtsqF32N VrsqrtsqF32N //go:noescape func VrsqrtsqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Reciprocal Square Root Step. 
This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrsqrtsqF64N VrsqrtsqF64N //go:noescape func VrsqrtsqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0, places the results into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VrsqrtssF32N VrsqrtssF32N //go:noescape func VrsqrtssF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // SHA1 fixed rotate. // //go:linkname Vsha1HU32N Vsha1HU32N //go:noescape func Vsha1HU32N(r *arm.Uint32, v0 *arm.Uint32, n int32) // SHA1 schedule update 1. // //go:linkname Vsha1Su1QU32N Vsha1Su1QU32N //go:noescape func Vsha1Su1QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // SHA256 schedule update 0. // //go:linkname Vsha256Su0QU32N Vsha256Su0QU32N //go:noescape func Vsha256Su0QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // SHA512 Schedule Update 0 takes the values from the two 128-bit source SIMD&FP registers and produces a 128-bit output value that combines the gamma0 functions of two iterations of the SHA512 schedule update that are performed after the first 16 iterations within a block. It returns this value to the destination SIMD&FP register. // //go:linkname Vsha512Su0QU64N Vsha512Su0QU64N //go:noescape func Vsha512Su0QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Signed Shift Left (register). 
This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlS8N VshlS8N //go:noescape func VshlS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlS16N VshlS16N //go:noescape func VshlS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlS32N VshlS32N //go:noescape func VshlS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlS64N VshlS64N //go:noescape func VshlS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Signed Shift Left (register). 
This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshldS64N VshldS64N //go:noescape func VshldS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlqS8N VshlqS8N //go:noescape func VshlqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlqS16N VshlqS16N //go:noescape func VshlqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlqS32N VshlqS32N //go:noescape func VshlqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Signed Shift Left (register). 
This instruction takes each signed integer value in the vector of the first source SIMD&FP register, shifts each value by a value from the least significant byte of the corresponding element of the second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VshlqS64N VshlqS64N //go:noescape func VshlqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // SM4 Key takes an input as a 128-bit vector from the first source SIMD&FP register and a 128-bit constant from the second SIMD&FP register. It derives four iterations of the output key, in accordance with the SM4 standard, returning the 128-bit result to the destination SIMD&FP register. // //go:linkname Vsm4EkeyqU32N Vsm4EkeyqU32N //go:noescape func Vsm4EkeyqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // SM4 Encode takes input data as a 128-bit vector from the first source SIMD&FP register, and four iterations of the round key held as the elements of the 128-bit vector in the second source SIMD&FP register. It encrypts the data by four rounds, in accordance with the SM4 standard, returning the 128-bit result to the destination SIMD&FP register. // //go:linkname Vsm4EqU32N Vsm4EqU32N //go:noescape func Vsm4EqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsqrtF32N VsqrtF32N //go:noescape func VsqrtF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VsqrtF64N VsqrtF64N //go:noescape func VsqrtF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsqrtqF32N VsqrtqF32N //go:noescape func VsqrtqF32N(r *arm.Float32, v0 *arm.Float32, n int32) // Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsqrtqF64N VsqrtqF64N //go:noescape func VsqrtqF64N(r *arm.Float64, v0 *arm.Float64, n int32) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubS8N VsubS8N //go:noescape func VsubS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubS16N VsubS16N //go:noescape func VsubS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubS32N VsubS32N //go:noescape func VsubS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Subtract (vector). 
This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubS64N VsubS64N //go:noescape func VsubS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubU8N VsubU8N //go:noescape func VsubU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubU16N VsubU16N //go:noescape func VsubU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubU32N VsubU32N //go:noescape func VsubU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubU64N VsubU64N //go:noescape func VsubU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Floating-point Subtract (vector). 
This instruction subtracts the elements in the vector in the second source SIMD&FP register, from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubF32N VsubF32N //go:noescape func VsubF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register, from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubF64N VsubF64N //go:noescape func VsubF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubdS64N VsubdS64N //go:noescape func VsubdS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubdU64N VsubdU64N //go:noescape func VsubdU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VsubqS8N VsubqS8N //go:noescape func VsubqS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubqS16N VsubqS16N //go:noescape func VsubqS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubqS32N VsubqS32N //go:noescape func VsubqS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubqS64N VsubqS64N //go:noescape func VsubqS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubqU8N VsubqU8N //go:noescape func VsubqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname VsubqU16N VsubqU16N //go:noescape func VsubqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubqU32N VsubqU32N //go:noescape func VsubqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubqU64N VsubqU64N //go:noescape func VsubqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register, from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubqF32N VsubqF32N //go:noescape func VsubqF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP register, from the corresponding elements in the vector in the first source SIMD&FP register, places each result into elements of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname VsubqF64N VsubqF64N //go:noescape func VsubqF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Table vector Lookup. 
This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbl1S8N Vtbl1S8N
//go:noescape
func Vtbl1S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is used to describe the table, the first source register describes the lowest bytes of the table.
//
//go:linkname Vtbl1U8N Vtbl1U8N
//go:noescape
func Vtbl1U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1S8N Vtrn1S8N
//go:noescape
func Vtrn1S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1S16N Vtrn1S16N
//go:noescape
func Vtrn1S16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1S32N Vtrn1S32N
//go:noescape
func Vtrn1S32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1U8N Vtrn1U8N
//go:noescape
func Vtrn1U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1U16N Vtrn1U16N
//go:noescape
func Vtrn1U16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1U32N Vtrn1U32N
//go:noescape
func Vtrn1U32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1F32N Vtrn1F32N
//go:noescape
func Vtrn1F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QS8N Vtrn1QS8N
//go:noescape
func Vtrn1QS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QS16N Vtrn1QS16N
//go:noescape
func Vtrn1QS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QS32N Vtrn1QS32N
//go:noescape
func Vtrn1QS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QS64N Vtrn1QS64N
//go:noescape
func Vtrn1QS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QU8N Vtrn1QU8N
//go:noescape
func Vtrn1QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QU16N Vtrn1QU16N
//go:noescape
func Vtrn1QU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QU32N Vtrn1QU32N
//go:noescape
func Vtrn1QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QU64N Vtrn1QU64N
//go:noescape
func Vtrn1QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QF32N Vtrn1QF32N
//go:noescape
func Vtrn1QF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn1QF64N Vtrn1QF64N
//go:noescape
func Vtrn1QF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2S8N Vtrn2S8N
//go:noescape
func Vtrn2S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2S16N Vtrn2S16N
//go:noescape
func Vtrn2S16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Transpose vectors (secondary).
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2S32N Vtrn2S32N
//go:noescape
func Vtrn2S32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2U8N Vtrn2U8N
//go:noescape
func Vtrn2U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2U16N Vtrn2U16N
//go:noescape
func Vtrn2U16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2U32N Vtrn2U32N
//go:noescape
func Vtrn2U32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2F32N Vtrn2F32N
//go:noescape
func Vtrn2F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QS8N Vtrn2QS8N
//go:noescape
func Vtrn2QS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QS16N Vtrn2QS16N
//go:noescape
func Vtrn2QS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QS32N Vtrn2QS32N
//go:noescape
func Vtrn2QS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QS64N Vtrn2QS64N
//go:noescape
func Vtrn2QS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QU8N Vtrn2QU8N
//go:noescape
func Vtrn2QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QU16N Vtrn2QU16N
//go:noescape
func Vtrn2QU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QU32N Vtrn2QU32N
//go:noescape
func Vtrn2QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QU64N Vtrn2QU64N
//go:noescape
func Vtrn2QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QF32N Vtrn2QF32N
//go:noescape
func Vtrn2QF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-numbered elements of the destination vector.
//
//go:linkname Vtrn2QF64N Vtrn2QF64N
//go:noescape
func Vtrn2QF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32)

// Compare bitwise Test bits nonzero (vector).
This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstS8N VtstS8N
//go:noescape
func VtstS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstS16N VtstS16N
//go:noescape
func VtstS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstS32N VtstS32N
//go:noescape
func VtstS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstS64N VtstS64N
//go:noescape
func VtstS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstU8N VtstU8N
//go:noescape
func VtstU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstU16N VtstU16N
//go:noescape
func VtstU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstU32N VtstU32N
//go:noescape
func VtstU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstU64N VtstU64N
//go:noescape
func VtstU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstdS64N VtstdS64N
//go:noescape
func VtstdS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstdU64N VtstdU64N
//go:noescape
func VtstdU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqS8N VtstqS8N
//go:noescape
func VtstqS8N(r *arm.Uint8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqS16N VtstqS16N
//go:noescape
func VtstqS16N(r *arm.Uint16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqS32N VtstqS32N
//go:noescape
func VtstqS32N(r *arm.Uint32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqS64N VtstqS64N
//go:noescape
func VtstqS64N(r *arm.Uint64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqU8N VtstqU8N
//go:noescape
func VtstqU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqU16N VtstqU16N
//go:noescape
func VtstqU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqU32N VtstqU32N
//go:noescape
func VtstqU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP register, performs an AND with the corresponding vector element in the second source SIMD&FP register, and if the result is not zero, sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
//
//go:linkname VtstqU64N VtstqU64N
//go:noescape
func VtstqU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32)

// Unzip vectors (primary).
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1S8N Vuzp1S8N
//go:noescape
func Vuzp1S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1S16N Vuzp1S16N
//go:noescape
func Vuzp1S16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1S32N Vuzp1S32N
//go:noescape
func Vuzp1S32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1U8N Vuzp1U8N
//go:noescape
func Vuzp1U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1U16N Vuzp1U16N
//go:noescape
func Vuzp1U16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1U32N Vuzp1U32N
//go:noescape
func Vuzp1U32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1F32N Vuzp1F32N
//go:noescape
func Vuzp1F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QS8N Vuzp1QS8N
//go:noescape
func Vuzp1QS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QS16N Vuzp1QS16N
//go:noescape
func Vuzp1QS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QS32N Vuzp1QS32N
//go:noescape
func Vuzp1QS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QS64N Vuzp1QS64N
//go:noescape
func Vuzp1QS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32)

// Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register.
//
//go:linkname Vuzp1QU8N Vuzp1QU8N
//go:noescape
func Vuzp1QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32)

// Unzip vectors (primary).
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vuzp1QU16N Vuzp1QU16N //go:noescape func Vuzp1QU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vuzp1QU32N Vuzp1QU32N //go:noescape func Vuzp1QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vuzp1QU64N Vuzp1QU64N //go:noescape func Vuzp1QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Unzip vectors (primary). 
This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vuzp1QF32N Vuzp1QF32N //go:noescape func Vuzp1QF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vuzp1QF64N Vuzp1QF64N //go:noescape func Vuzp1QF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vuzp2S8N Vuzp2S8N //go:noescape func Vuzp2S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname Vuzp2S16N Vuzp2S16N //go:noescape func Vuzp2S16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vuzp2S32N Vuzp2S32N //go:noescape func Vuzp2S32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vuzp2U8N Vuzp2U8N //go:noescape func Vuzp2U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vuzp2U16N Vuzp2U16N //go:noescape func Vuzp2U16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Unzip vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vuzp2U32N Vuzp2U32N //go:noescape func Vuzp2U32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vuzp2F32N Vuzp2F32N //go:noescape func Vuzp2F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vuzp2QS8N Vuzp2QS8N //go:noescape func Vuzp2QS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname Vuzp2QS16N Vuzp2QS16N //go:noescape func Vuzp2QS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vuzp2QS32N Vuzp2QS32N //go:noescape func Vuzp2QS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vuzp2QS64N Vuzp2QS64N //go:noescape func Vuzp2QS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vuzp2QU8N Vuzp2QU8N //go:noescape func Vuzp2QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Unzip vectors (secondary). 
This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vuzp2QU16N Vuzp2QU16N //go:noescape func Vuzp2QU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vuzp2QU32N Vuzp2QU32N //go:noescape func Vuzp2QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vuzp2QU64N Vuzp2QU64N //go:noescape func Vuzp2QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. 
// //go:linkname Vuzp2QF32N Vuzp2QF32N //go:noescape func Vuzp2QF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a vector, and writes the vector to the destination SIMD&FP register. // //go:linkname Vuzp2QF64N Vuzp2QF64N //go:noescape func Vuzp2QF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1S8N Vzip1S8N //go:noescape func Vzip1S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1S16N Vzip1S16N //go:noescape func Vzip1S16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. 
The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1S32N Vzip1S32N //go:noescape func Vzip1S32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1U8N Vzip1U8N //go:noescape func Vzip1U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1U16N Vzip1U16N //go:noescape func Vzip1U16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1U32N Vzip1U32N //go:noescape func Vzip1U32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Zip vectors (primary). 
This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1F32N Vzip1F32N //go:noescape func Vzip1F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1QS8N Vzip1QS8N //go:noescape func Vzip1QS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1QS16N Vzip1QS16N //go:noescape func Vzip1QS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. 
// //go:linkname Vzip1QS32N Vzip1QS32N //go:noescape func Vzip1QS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1QS64N Vzip1QS64N //go:noescape func Vzip1QS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1QU8N Vzip1QU8N //go:noescape func Vzip1QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1QU16N Vzip1QU16N //go:noescape func Vzip1QU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. 
The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1QU32N Vzip1QU32N //go:noescape func Vzip1QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1QU64N Vzip1QU64N //go:noescape func Vzip1QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1QF32N Vzip1QF32N //go:noescape func Vzip1QF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip1QF64N Vzip1QF64N //go:noescape func Vzip1QF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) // Zip vectors (secondary). 
This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2S8N Vzip2S8N //go:noescape func Vzip2S8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2S16N Vzip2S16N //go:noescape func Vzip2S16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2S32N Vzip2S32N //go:noescape func Vzip2S32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. 
// //go:linkname Vzip2U8N Vzip2U8N //go:noescape func Vzip2U8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2U16N Vzip2U16N //go:noescape func Vzip2U16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2U32N Vzip2U32N //go:noescape func Vzip2U32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2F32N Vzip2F32N //go:noescape func Vzip2F32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. 
The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2QS8N Vzip2QS8N //go:noescape func Vzip2QS8N(r *arm.Int8, v0 *arm.Int8, v1 *arm.Int8, n int32) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2QS16N Vzip2QS16N //go:noescape func Vzip2QS16N(r *arm.Int16, v0 *arm.Int16, v1 *arm.Int16, n int32) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2QS32N Vzip2QS32N //go:noescape func Vzip2QS32N(r *arm.Int32, v0 *arm.Int32, v1 *arm.Int32, n int32) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2QS64N Vzip2QS64N //go:noescape func Vzip2QS64N(r *arm.Int64, v0 *arm.Int64, v1 *arm.Int64, n int32) // Zip vectors (secondary). 
This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2QU8N Vzip2QU8N //go:noescape func Vzip2QU8N(r *arm.Uint8, v0 *arm.Uint8, v1 *arm.Uint8, n int32) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2QU16N Vzip2QU16N //go:noescape func Vzip2QU16N(r *arm.Uint16, v0 *arm.Uint16, v1 *arm.Uint16, n int32) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2QU32N Vzip2QU32N //go:noescape func Vzip2QU32N(r *arm.Uint32, v0 *arm.Uint32, v1 *arm.Uint32, n int32) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. 
// //go:linkname Vzip2QU64N Vzip2QU64N //go:noescape func Vzip2QU64N(r *arm.Uint64, v0 *arm.Uint64, v1 *arm.Uint64, n int32) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. // //go:linkname Vzip2QF32N Vzip2QF32N //go:noescape func Vzip2QF32N(r *arm.Float32, v0 *arm.Float32, v1 *arm.Float32, n int32) // Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with subsequent pairs taken alternately from each source register. 
// //go:linkname Vzip2QF64N Vzip2QF64N //go:noescape func Vzip2QF64N(r *arm.Float64, v0 *arm.Float64, v1 *arm.Float64, n int32) ================================================ FILE: arm/neon/loops_test.go ================================================ package neon import ( "math/rand" "reflect" "testing" "unsafe" "github.com/alivanz/go-simd/arm" ) func TestVabsS32N(t *testing.T) { const N = 1024 * 16 var ( r = make([]arm.Int32, N) v = make([]arm.Int32, N) ref = make([]arm.Int32, N) ) for i := 0; i < N; i++ { r[i] = arm.Int32(int32(rand.Int())) v[i] = arm.Int32(int32(rand.Int())) if v[i] < 0 { ref[i] = -v[i] } else { ref[i] = v[i] } } VabsS32N(&r[0], &v[0], N) if !reflect.DeepEqual(r, ref) { t.Fatal(r) } } func TestVmulqF32N(t *testing.T) { const N = 1024 * 16 var ( r = make([]arm.Float32, N) v1 = make([]arm.Float32, N) v2 = make([]arm.Float32, N) ref = make([]arm.Float32, N) ) for i := 0; i < N; i++ { v1[i] = arm.Float32(rand.Float32()) v2[i] = arm.Float32(rand.Float32()) ref[i] = v1[i] * v2[i] } VmulqF32N(&r[0], &v1[0], &v2[0], N) if !reflect.DeepEqual(r, ref) { t.Fatal(r) } } // this benchmark runs the whole loop in C code func BenchmarkVmulqF32N(b *testing.B) { const N = 1024 * 1024 var ( r = make([]arm.Float32, N) v1 = make([]arm.Float32, N) v2 = make([]arm.Float32, N) ) b.SetBytes(N * 4) for i := int32(0); i < N; i++ { v1[i] = arm.Float32(rand.Float32()) v2[i] = arm.Float32(rand.Float32()) } b.ResetTimer() for i := 0; i < b.N; i++ { VmulqF32N(&r[0], &v1[0], &v2[0], N) } } // this benchmark calls the C code once per 4-element vector func BenchmarkVmulqF32C(b *testing.B) { const N = 1024 * 1024 var ( r = make([]arm.Float32, N) v1 = make([]arm.Float32, N) v2 = make([]arm.Float32, N) ) b.SetBytes(N * 4) for i := int32(0); i < N; i++ { v1[i] = arm.Float32(rand.Float32()) v2[i] = arm.Float32(rand.Float32()) } b.ResetTimer() for i := 0; i < b.N; i++ { for j := int32(0); j < N; j += 4 { VmulqF32( (*arm.Float32X4)(unsafe.Pointer(&r[j])), (*arm.Float32X4)(unsafe.Pointer(&v1[j])), 
(*arm.Float32X4)(unsafe.Pointer(&v2[j])), ) } } } // this benchmark is the pure Go reference implementation func BenchmarkVmulqF32Ref(b *testing.B) { const N = 1024 * 1024 var ( r = make([]arm.Float32, N) v1 = make([]arm.Float32, N) v2 = make([]arm.Float32, N) ) b.SetBytes(N * 4) for i := int32(0); i < N; i++ { v1[i] = arm.Float32(rand.Float32()) v2[i] = arm.Float32(rand.Float32()) } b.ResetTimer() for i := 0; i < b.N; i++ { for j := int32(0); j < N; j++ { r[j] = v1[j] * v2[j] } } } ================================================ FILE: arm/types.go ================================================ package arm /* #include <arm_neon.h> */ import "C" // typedef float float32_t; type Float32 = C.float32_t // typedef __attribute__((neon_vector_type(2))) float32_t float32x2_t; type Float32X2 = C.float32x2_t // typedef struct float32x2x2_t { float32x2_t val[2];} float32x2x2_t; type Float32X2X2 = C.float32x2x2_t // typedef __attribute__((neon_vector_type(4))) float32_t float32x4_t; type Float32X4 = C.float32x4_t // typedef struct float32x4x2_t { float32x4_t val[2];} float32x4x2_t; type Float32X4X2 = C.float32x4x2_t // typedef double float64_t; type Float64 = C.float64_t // typedef __attribute__((neon_vector_type(1))) float64_t float64x1_t; type Float64X1 = C.float64x1_t // typedef __attribute__((neon_vector_type(2))) float64_t float64x2_t; type Float64X2 = C.float64x2_t // typedef short int16_t; type Int16 = C.int16_t // typedef __attribute__((neon_vector_type(4))) int16_t int16x4_t; type Int16X4 = C.int16x4_t // typedef struct int16x4x2_t { int16x4_t val[2];} int16x4x2_t; type Int16X4X2 = C.int16x4x2_t // typedef __attribute__((neon_vector_type(8))) int16_t int16x8_t; type Int16X8 = C.int16x8_t // typedef struct int16x8x2_t { int16x8_t val[2];} int16x8x2_t; type Int16X8X2 = C.int16x8x2_t // typedef int int32_t; type Int32 = C.int32_t // typedef __attribute__((neon_vector_type(2))) int32_t int32x2_t; type Int32X2 = C.int32x2_t // typedef struct int32x2x2_t { int32x2_t val[2];} int32x2x2_t; type 
Int32X2X2 = C.int32x2x2_t // typedef __attribute__((neon_vector_type(4))) int32_t int32x4_t; type Int32X4 = C.int32x4_t // typedef struct int32x4x2_t { int32x4_t val[2];} int32x4x2_t; type Int32X4X2 = C.int32x4x2_t // typedef longlong int64_t; type Int64 = C.int64_t // typedef __attribute__((neon_vector_type(1))) int64_t int64x1_t; type Int64X1 = C.int64x1_t // typedef __attribute__((neon_vector_type(2))) int64_t int64x2_t; type Int64X2 = C.int64x2_t // typedef signed char int8_t; type Int8 = C.int8_t // typedef __attribute__((neon_vector_type(16))) int8_t int8x16_t; type Int8X16 = C.int8x16_t // typedef struct int8x16x2_t { int8x16_t val[2];} int8x16x2_t; type Int8X16X2 = C.int8x16x2_t // typedef struct int8x16x3_t { int8x16_t val[3];} int8x16x3_t; type Int8X16X3 = C.int8x16x3_t // typedef struct int8x16x4_t { int8x16_t val[4];} int8x16x4_t; type Int8X16X4 = C.int8x16x4_t // typedef __attribute__((neon_vector_type(8))) int8_t int8x8_t; type Int8X8 = C.int8x8_t // typedef struct int8x8x2_t { int8x8_t val[2];} int8x8x2_t; type Int8X8X2 = C.int8x8x2_t // typedef struct int8x8x3_t { int8x8_t val[3];} int8x8x3_t; type Int8X8X3 = C.int8x8x3_t // typedef struct int8x8x4_t { int8x8_t val[4];} int8x8x4_t; type Int8X8X4 = C.int8x8x4_t // typedef __uint128_t poly128_t; type Poly128 = C.poly128_t // typedef uint16_t poly16_t; type Poly16 = C.poly16_t // typedef __attribute__((neon_polyvector_type(4))) poly16_t poly16x4_t; type Poly16X4 = C.poly16x4_t // typedef struct poly16x4x2_t { poly16x4_t val[2];} poly16x4x2_t; type Poly16X4X2 = C.poly16x4x2_t // typedef __attribute__((neon_polyvector_type(8))) poly16_t poly16x8_t; type Poly16X8 = C.poly16x8_t // typedef struct poly16x8x2_t { poly16x8_t val[2];} poly16x8x2_t; type Poly16X8X2 = C.poly16x8x2_t // typedef uint64_t poly64_t; type Poly64 = C.poly64_t // typedef __attribute__((neon_polyvector_type(1))) poly64_t poly64x1_t; type Poly64X1 = C.poly64x1_t // typedef __attribute__((neon_polyvector_type(2))) poly64_t poly64x2_t; 
type Poly64X2 = C.poly64x2_t // typedef uint8_t poly8_t; type Poly8 = C.poly8_t // typedef __attribute__((neon_polyvector_type(16))) poly8_t poly8x16_t; type Poly8X16 = C.poly8x16_t // typedef struct poly8x16x2_t { poly8x16_t val[2];} poly8x16x2_t; type Poly8X16X2 = C.poly8x16x2_t // typedef struct poly8x16x3_t { poly8x16_t val[3];} poly8x16x3_t; type Poly8X16X3 = C.poly8x16x3_t // typedef struct poly8x16x4_t { poly8x16_t val[4];} poly8x16x4_t; type Poly8X16X4 = C.poly8x16x4_t // typedef __attribute__((neon_polyvector_type(8))) poly8_t poly8x8_t; type Poly8X8 = C.poly8x8_t // typedef struct poly8x8x2_t { poly8x8_t val[2];} poly8x8x2_t; type Poly8X8X2 = C.poly8x8x2_t // typedef struct poly8x8x3_t { poly8x8_t val[3];} poly8x8x3_t; type Poly8X8X3 = C.poly8x8x3_t // typedef struct poly8x8x4_t { poly8x8_t val[4];} poly8x8x4_t; type Poly8X8X4 = C.poly8x8x4_t // typedef ushort uint16_t; type Uint16 = C.uint16_t // typedef __attribute__((neon_vector_type(4))) uint16_t uint16x4_t; type Uint16X4 = C.uint16x4_t // typedef struct uint16x4x2_t { uint16x4_t val[2];} uint16x4x2_t; type Uint16X4X2 = C.uint16x4x2_t // typedef __attribute__((neon_vector_type(8))) uint16_t uint16x8_t; type Uint16X8 = C.uint16x8_t // typedef struct uint16x8x2_t { uint16x8_t val[2];} uint16x8x2_t; type Uint16X8X2 = C.uint16x8x2_t // typedef uint uint32_t; type Uint32 = C.uint32_t // typedef __attribute__((neon_vector_type(2))) uint32_t uint32x2_t; type Uint32X2 = C.uint32x2_t // typedef struct uint32x2x2_t { uint32x2_t val[2];} uint32x2x2_t; type Uint32X2X2 = C.uint32x2x2_t // typedef __attribute__((neon_vector_type(4))) uint32_t uint32x4_t; type Uint32X4 = C.uint32x4_t // typedef struct uint32x4x2_t { uint32x4_t val[2];} uint32x4x2_t; type Uint32X4X2 = C.uint32x4x2_t // typedef ulonglong uint64_t; type Uint64 = C.uint64_t // typedef __attribute__((neon_vector_type(1))) uint64_t uint64x1_t; type Uint64X1 = C.uint64x1_t // typedef __attribute__((neon_vector_type(2))) uint64_t uint64x2_t; type Uint64X2 = 
C.uint64x2_t // typedef uchar uint8_t; type Uint8 = C.uint8_t // typedef __attribute__((neon_vector_type(16))) uint8_t uint8x16_t; type Uint8X16 = C.uint8x16_t // typedef struct uint8x16x2_t { uint8x16_t val[2];} uint8x16x2_t; type Uint8X16X2 = C.uint8x16x2_t // typedef struct uint8x16x3_t { uint8x16_t val[3];} uint8x16x3_t; type Uint8X16X3 = C.uint8x16x3_t // typedef struct uint8x16x4_t { uint8x16_t val[4];} uint8x16x4_t; type Uint8X16X4 = C.uint8x16x4_t // typedef __attribute__((neon_vector_type(8))) uint8_t uint8x8_t; type Uint8X8 = C.uint8x8_t // typedef struct uint8x8x2_t { uint8x8_t val[2];} uint8x8x2_t; type Uint8X8X2 = C.uint8x8x2_t // typedef struct uint8x8x3_t { uint8x8_t val[3];} uint8x8x3_t; type Uint8X8X3 = C.uint8x8x3_t // typedef struct uint8x8x4_t { uint8x8_t val[4];} uint8x8x4_t; type Uint8X8X4 = C.uint8x8x4_t ================================================ FILE: example/neon/main.go ================================================ package main import ( "log" "github.com/alivanz/go-simd/arm" "github.com/alivanz/go-simd/arm/neon" ) func main() { var a, b arm.Int8X8 var add, mul arm.Int16X8 for i := 0; i < 8; i++ { a[i] = arm.Int8(i) b[i] = arm.Int8(i * i) } log.Printf("a = %+v", a) log.Printf("b = %+v", b) neon.VaddlS8(&add, &a, &b) neon.VmullS8(&mul, &a, &b) log.Printf("add = %+v", add) log.Printf("mul = %+v", mul) } ================================================ FILE: example/sse2/main.go ================================================ package main import ( "log" "github.com/alivanz/go-simd/x86" ) func main() { a := x86.MmSetrEpi8(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15) b := x86.MmSetrEpi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0) add := x86.MmAddEpi8(a, b) log.Print(a) log.Print(b) log.Print(add) } ================================================ FILE: generator/arm/arm.go ================================================ package main import ( "encoding/json" "os" "github.com/alivanz/go-simd/generator/utils" ) type 
ArmIntrinsics []ArmIntrinsic type ArmIntrinsic struct { Name string `json:"name"` Description string `json:"description"` } func GetIntrinsics() (ArmIntrinsics, error) { if err := utils.Download( "intrinsics.json", "https://developer.arm.com/architectures/instruction-sets/intrinsics/data/intrinsics.json", ); err != nil { return nil, err } f, err := os.Open("intrinsics.json") if err != nil { return nil, err } defer f.Close() var intrins ArmIntrinsics if err := json.NewDecoder(f).Decode(&intrins); err != nil { return nil, err } return intrins, nil } func (intrins ArmIntrinsics) Find(s string) *ArmIntrinsic { for _, intrin := range intrins { if intrin.Name == s { return &intrin } } return nil } ================================================ FILE: generator/arm/main.go ================================================ package main import ( "bytes" "fmt" "io" "log" "os" "os/exec" "sort" "strconv" "strings" "github.com/alivanz/go-simd/generator/scanner" "github.com/alivanz/go-simd/generator/types" "github.com/alivanz/go-simd/generator/utils" "github.com/alivanz/go-simd/generator/writer" "github.com/iancoleman/strcase" ) func Source() ([]byte, error) { cmd := exec.Command("clang", "-E", "-") cmd.Stdin = bytes.NewBufferString(strings.Join(writer.Includes([]string{ "arm_neon.h", }), "\n")) cmd.Stderr = os.Stderr return cmd.Output() } func main() { src, err := Source() if err != nil { log.Fatal(err) } // write raw if err := writer.WriteToFile("raw.h", func(w io.Writer) error { _, err := w.Write(src) return err }); err != nil { log.Fatal(err) } // scan result, err := scanner.Scan(src) if err != nil { log.Fatal(err) } // filter functions result.Functions = utils.Filter(result.Functions, func(fn types.Function) bool { if strings.HasPrefix(fn.Name, "vbf") { return false } if strings.Contains(fn.Name, "bf16") { return false } return true }) // filter types mtype := make(map[string]bool) for _, fn := range result.Functions { if fn.Return != nil { mtype[fn.Return.Name] = true } 
for _, arg := range fn.Args { mtype[arg.Name] = true } } result.Types = utils.Filter(result.Types, func(t types.Type) bool { return mtype[t.Name] }) // sort functions sort.Slice(result.Functions, func(i, j int) bool { g0, i0, _ := sortGroup(result.Functions[i].Name) g1, i1, _ := sortGroup(result.Functions[j].Name) if g0 != g1 { return g0 < g1 } return i0 < i1 }) // sort types sort.Slice(result.Types, func(i, j int) bool { return result.Types[i].Name < result.Types[j].Name }) // write types if err := writer.WriteToFile("types.go", func(w io.Writer) error { if err := writer.Package(w, "arm"); err != nil { return err } if err := writer.ImportC(w, func(w io.Writer) error { _, err := io.WriteString(w, strings.Join(writer.Includes([]string{ "arm_neon.h", }), "\n")) return err }); err != nil { return err } if err := writer.Types(w, result.Types); err != nil { return err } return nil }); err != nil { log.Fatal(err) } // patch intrinsics info intrins, err := GetIntrinsics() if err != nil { log.Fatal(err) } for i, fn := range result.Functions { if info := intrins.Find(fn.Name); info != nil { result.Functions[i].Comment = info.Description } } // write C if err := writer.WriteToFile("neon/functions.c", func(w io.Writer) error { if _, err := io.WriteString(w, "#include \n\n"); err != nil { return err } for _, fn := range result.Functions { if fn.Blacklisted() { continue } if err := writer.RewriteC(w, fn); err != nil { return err } } return nil }); err != nil { log.Fatal(err) } // write functions if err := writer.WriteToFile("neon/functions.go", func(w io.Writer) error { if err := writer.Package(w, "neon"); err != nil { return err } if err := writer.Import(w, []string{ "github.com/alivanz/go-simd/arm", }); err != nil { return err } if err := writer.ImportC(w, func(w io.Writer) error { if _, err := io.WriteString(w, "#include "); err != nil { return err } return nil }); err != nil { return err } for _, fn := range result.Functions { if fn.Blacklisted() { continue } 
writer.DeclareFuncBypass(w, fn, "arm") } return nil }); err != nil { log.Fatal(err) } // C loops var ( loops = make(map[string]bool) ) if err := writer.WriteToFile("neon/loops.c", func(w io.Writer) error { if _, err := io.WriteString(w, "#include \n\n"); err != nil { return err } if _, err := io.WriteString(w, "#define save(dst, src) *dst = src\n"); err != nil { return err } if _, err := io.WriteString(w, "#define load(src) (*src)\n"); err != nil { return err } if _, err := io.WriteString(w, `#define LOOP1(name, rtype, itype, f, set, load, rstep, istep) \ void name(rtype *r, itype *v, int32_t n) \ { \ while (n >= rstep) \ { \ set(r, f(load(v))); \ r += rstep; \ n -= rstep; \ v += istep; \ } \ } `); err != nil { return err } for _, fn := range result.Functions { if fn.Blacklisted() { continue } if len(fn.Args) != 1 { continue } og, o0, o1 := parseType(fn.Return.Name) if og == "" { continue } ig, i0, i1 := parseType(fn.Args[0].Name) if ig == "" { continue } if o0 != i0 { continue } var rq, iq string if o0*o1 == 128 { rq = "q" } if i0*i1 == 128 { iq = "q" } if o1 == -1 { o1 = 1 } if i1 == -1 { i1 = 1 } group, _, suffix := sortGroup(fn.Name) rg, r0, _ := parseType(fn.Return.Name) if rg == "" { continue } io.WriteString(w, fmt.Sprintf( "LOOP1(%s, %s, %s, %s, %s, %s, %d, %d)\n", strcase.ToCamel(group+suffix+"N"), fmt.Sprintf("%s%d_t", rg, r0), fmt.Sprintf("%s%d_t", ig, i0), fn.Name, setter(fn.Return.Name, "save", fmt.Sprintf("vst1%s_%s%d", rq, typeShort[rg], r0)), setter(fn.Args[0].Name, "load", fmt.Sprintf("vld1%s_%s%d", iq, typeShort[ig], i0)), o1, i1, ), ) loops[fn.Name] = true } io.WriteString(w, "\n") if _, err := io.WriteString(w, `#define LOOP2(name, rtype, itype, f, set, load, rstep, istep) \ void name(rtype *r, itype *v1, itype *v2, int32_t n) \ { \ while (n >= rstep) \ { \ set(r, f(load(v1), load(v2))); \ r += rstep; \ n -= rstep; \ v1 += istep; \ v2 += istep; \ } \ } `); err != nil { return err } for _, fn := range result.Functions { if fn.Blacklisted() { 
continue } if len(fn.Args) != 2 { continue } if fn.Args[0].Name != fn.Args[1].Name { continue } og, o0, o1 := parseType(fn.Return.Name) if og == "" { continue } ig, i0, i1 := parseType(fn.Args[0].Name) if ig == "" { continue } if o0 != i0 { continue } var rq, iq string if o0*o1 == 128 { rq = "q" } if i0*i1 == 128 { iq = "q" } if o1 == -1 { o1 = 1 } if i1 == -1 { i1 = 1 } group, _, suffix := sortGroup(fn.Name) rg, r0, _ := parseType(fn.Return.Name) if rg == "" { continue } io.WriteString(w, fmt.Sprintf( "LOOP2(%s, %s, %s, %s, %s, %s, %d, %d)\n", strcase.ToCamel(group+suffix+"N"), fmt.Sprintf("%s%d_t", rg, r0), fmt.Sprintf("%s%d_t", ig, i0), fn.Name, setter(fn.Return.Name, "save", fmt.Sprintf("vst1%s_%s%d", rq, typeShort[rg], r0)), setter(fn.Args[0].Name, "load", fmt.Sprintf("vld1%s_%s%d", iq, typeShort[ig], i0)), o1, i1, ), ) loops[fn.Name] = true } return nil }); err != nil { log.Fatal(err) } // loop functions if err := writer.WriteToFile("neon/loops.go", func(w io.Writer) error { if err := writer.Package(w, "neon"); err != nil { return err } if err := writer.Import(w, []string{ "github.com/alivanz/go-simd/arm", }); err != nil { return err } if err := writer.ImportC(w, func(w io.Writer) error { if _, err := io.WriteString(w, "#include "); err != nil { return err } return nil }); err != nil { return err } for _, fn := range result.Functions { if !loops[fn.Name] { continue } // add suffix fn.Name += "N" // write fmt.Fprintf(w, "\n") if len(fn.Comment) > 0 { fmt.Fprintf(w, "// %s\n", fn.Comment) } else { fmt.Fprintf(w, "// %s\n", fn.Name) } fmt.Fprintf(w, "//\n") fmt.Fprintf(w, "//go:linkname %s %s\n", strcase.ToCamel(fn.Name), strcase.ToCamel(fn.Name)) fmt.Fprintf(w, "//go:noescape\n") fmt.Fprintf(w, "func %s(", strcase.ToCamel(fn.Name)) if fn.Return != nil { var parts = strings.SplitN(strings.TrimSuffix(fn.Return.Name, "_t"), "x", 2) fmt.Fprintf(w, "r *arm.%s, ", strcase.ToCamel(parts[0])) } fmt.Fprintf(w, "%s, n int32)\n", strings.Join(utils.Transform(fn.Args, 
func(i int, t types.Type) string { var parts = strings.SplitN(strings.TrimSuffix(t.Name, "_t"), "x", 2) return fmt.Sprintf("v%d *arm.%s", i, strcase.ToCamel(parts[0])) }), ", ")) } return nil }); err != nil { log.Fatal(err) } } func setter(t string, direct string, def string) string { _, _, r1 := parseType(t) if r1 == -1 { return direct } return def } func parseType(t string) (string, int, int) { var ( group string ) t = strings.TrimSuffix(t, "_t") if strings.HasPrefix(t, "uint") { group = "uint" t = t[4:] } else if strings.HasPrefix(t, "int") { group = "int" t = t[3:] } else if strings.HasPrefix(t, "float") { group = "float" t = t[5:] } parts := strings.Split(t, "x") switch len(parts) { case 1: w, err := strconv.ParseUint(parts[0], 10, 32) if err != nil { return "", 0, 0 } return group, int(w), -1 case 2: w, err := strconv.ParseUint(parts[0], 10, 32) if err != nil { return "", 0, 0 } h, err := strconv.ParseUint(parts[1], 10, 32) if err != nil { return "", 0, 0 } return group, int(w), int(h) } return "", 0, 0 } var ( typeShort = map[string]string{ "uint": "u", "uint8": "u8", "uint16": "u16", "uint32": "u32", "uint64": "u64", "int": "s", "int8": "s8", "int16": "s16", "int32": "s32", "int64": "s64", "float": "f", "float32": "f32", "float64": "f64", } ) ================================================ FILE: generator/arm/sort.go ================================================ package main import "strings" var ( suffixOrder = []string{ "_s8", "_s16", "_s32", "_s64", "_u8", "_u16", "_u32", "_u64", "_f32", "_f64", } ) func sortGroup(name string) (string, int, string) { var ( group = name index = -1 suffix = "" ) for i, s := range suffixOrder { if strings.HasSuffix(name, s) { group = strings.TrimSuffix(name, s) index = i suffix = s } } return group, index, suffix } ================================================ FILE: generator/scanner/scan.go ================================================ package scanner import ( "bytes" "regexp" 
"github.com/alivanz/go-simd/generator/types" "github.com/alivanz/go-simd/generator/utils" ) var ( name = `(\w+?)` args = `([\w\s,_]*?)` attr = `(?:__attribute__\(\(` + `([\w\s,\(\)"]+?)` + `\)\))` regTypedefSimple = regexp.MustCompile(`typedef\s+` + attr + `?[\w\s]+? ` + name + `\s*` + attr + `?;`) regTypedefStruct = regexp.MustCompile(`typedef struct \w+? {.+?}\s*?` + name + `;`) regFunction = regexp.MustCompile(name + `\s+` + attr + `?\s*` + name + `\s*\(` + args + `\)` + `\s*` + `{.*?}`) regArg = regexp.MustCompile(`\s*(([\w\s]+)\s(?:\w+))`) regWhitespace = regexp.MustCompile(`\s+`) regComma = regexp.MustCompile(`\s*,\s*`) regLongLong = regexp.MustCompile(`long\s+long`) regULongLong = regexp.MustCompile(`unsigned\s+long\s+long`) regUlong = regexp.MustCompile(`unsigned\s+long`) regUint = regexp.MustCompile(`unsigned\s+int`) regUshort = regexp.MustCompile(`unsigned\s+short`) regUchar = regexp.MustCompile(`unsigned\s+char`) ) type ScanResult struct { Types []types.Type Functions []types.Function } func Scan(raw []byte) (*ScanResult, error) { var buf bytes.Buffer // filter # for _, line := range bytes.Split(raw, []byte("\n")) { if !bytes.HasPrefix(line, []byte("#")) { buf.Write(line) } } // remove duplicates whitespace raw = regWhitespace.ReplaceAll(buf.Bytes(), []byte(" ")) // replace known types raw = regLongLong.ReplaceAll(raw, []byte("longlong")) raw = regULongLong.ReplaceAll(raw, []byte("ulonglong")) raw = regUlong.ReplaceAll(raw, []byte("ulong")) raw = regUint.ReplaceAll(raw, []byte("uint")) raw = regUshort.ReplaceAll(raw, []byte("ushort")) raw = regUchar.ReplaceAll(raw, []byte("uchar")) s := string(raw) var result ScanResult // types result.Types = utils.Merge( utils.Transform( regTypedefSimple.FindAllStringSubmatch(s, -1), func(i int, e []string) types.Type { return types.Type{ Name: e[2], Full: e[0], Attributes: commaSplit(e[1], e[3]), } }, ), utils.Transform( regTypedefStruct.FindAllStringSubmatch(s, -1), func(i int, e []string) types.Type { return 
types.Type{ Name: e[1], Full: e[0], } }, ), ) // functions result.Functions = utils.Transform( regFunction.FindAllStringSubmatch(s, -1), func(i int, match []string) types.Function { var args []types.Type for _, arg := range regArg.FindAllStringSubmatch(match[4], -1) { if arg[2] == "void" { continue } args = append(args, types.Type{ Name: arg[2], Full: arg[1], }) } var ret *types.Type if match[1] != "void" { ret = &types.Type{ Name: match[1], Full: match[1], } } return types.Function{ Name: match[3], Attribute: match[2], Return: ret, Args: args, } }, ) return &result, nil } ================================================ FILE: generator/scanner/scan_test.go ================================================ package scanner import ( "reflect" "regexp" "testing" "github.com/alivanz/go-simd/generator/types" ) func TestAttribute(t *testing.T) { reg := regexp.MustCompile(attr + ";") result := reg.FindAllString(` __attribute__((__vector_size__(32), __aligned__(32))); __attribute__((neon_vector_type(8))); `, -1) ref := []string{ "__attribute__((__vector_size__(32), __aligned__(32)));", "__attribute__((neon_vector_type(8)));", } t.Log(result) t.Log(ref) if !reflect.DeepEqual(result, ref) { t.Fail() } } func TestScan(t *testing.T) { result, err := Scan([]byte(` typedef char int8_t; typedef __attribute__((neon_vector_type(8))) int8_t int8x8_t; typedef double __m256d __attribute__((__vector_size__(32), __aligned__(32))); typedef struct int32x4x3_t { int32x4_t val[3]; } int32x4x3_t; int func(int a, int b, int c) { return a+b+c; } static __inline__ __m128 __attribute__((__always_inline__, __nodebug__, __target__("mmx, sse"), __min_vector_width__(128))) _mm_move_ss(__m128 __a, __m128 __b) { __a[0] = __b[0]; return __a; } static __inline__ long long __attribute__((__always_inline__, __nodebug__, __target__("mmx"), __min_vector_width__(64))) _mm_cvtm64_si64(__m64 __m) { return 1; } void lolo(int a, long long b) { } void vovo(void) { } `)) if err != nil { t.Fatal(err) } ref := 
&ScanResult{ Types: []types.Type{ { Name: "int8_t", Full: "typedef char int8_t;", }, { Name: "int8x8_t", Full: "typedef __attribute__((neon_vector_type(8))) int8_t int8x8_t;", Attributes: []string{"neon_vector_type(8)"}, }, { Name: "__m256d", Full: "typedef double __m256d __attribute__((__vector_size__(32), __aligned__(32)));", Attributes: []string{"__vector_size__(32)", "__aligned__(32)"}, }, { Name: "int32x4x3_t", Full: "typedef struct int32x4x3_t { int32x4_t val[3]; } int32x4x3_t;", }, }, Functions: []types.Function{ { Name: "func", Return: &types.Type{ Name: "int", Full: "int", }, Args: []types.Type{ { Name: "int", Full: "int a", }, { Name: "int", Full: "int b", }, { Name: "int", Full: "int c", }, }, }, { Name: "_mm_move_ss", Attribute: `__always_inline__, __nodebug__, __target__("mmx, sse"), __min_vector_width__(128)`, Return: &types.Type{ Name: "__m128", Full: "__m128", }, Args: []types.Type{ { Name: "__m128", Full: "__m128 __a", }, { Name: "__m128", Full: "__m128 __b", }, }, }, { Name: "_mm_cvtm64_si64", Attribute: `__always_inline__, __nodebug__, __target__("mmx"), __min_vector_width__(64)`, Return: &types.Type{ Name: "longlong", Full: "longlong", }, Args: []types.Type{ { Name: "__m64", Full: "__m64 __m", }, }, }, { Name: "lolo", Args: []types.Type{ { Name: "int", Full: "int a", }, { Name: "longlong", Full: "longlong b", }, }, }, { Name: "vovo", }, }, } t.Logf("%+v", result.Functions[4].Return) if !reflect.DeepEqual(result.Types, ref.Types) { t.Logf("%+v", result.Types) t.Logf("%+v", ref.Types) t.Fatal() } if !reflect.DeepEqual(result, ref) { t.Logf("%+v", result.Functions) t.Logf("%+v", ref.Functions) t.Fatal() } } ================================================ FILE: generator/scanner/util.go ================================================ package scanner func commaSplit(ss ...string) []string { switch len(ss) { case 0: return nil case 1: s := regWhitespace.ReplaceAllString(ss[0], " ") if len(s) == 0 { return nil } return regComma.Split(s, -1) default: 
return append(commaSplit(ss[0]), commaSplit(ss[1:]...)...) } } ================================================ FILE: generator/types/function.go ================================================ package types import ( "regexp" "strings" ) type Function struct { Name string Args []Type Return *Type Attribute string Comment string } type Arg struct { Name string Type string } var ( regTarget = regexp.MustCompile(`__target__\("([a-z0-9\s,]+)"\)`) ) func (f *Function) Target() string { match := regTarget.FindStringSubmatch(f.Attribute) if match == nil { return "" } return match[1] } func (fn *Function) Blacklisted() bool { for _, blacklist := range []string{ "f16", "vcmla", "__extension__", } { if strings.Contains(fn.Name, blacklist) { return true } } return false } ================================================ FILE: generator/types/type.go ================================================ package types import ( "strings" "github.com/iancoleman/strcase" ) type Type struct { Name string Full string Attributes []string } func (t *Type) C() string { switch t.Name { case "longlong": return "long long" case "ulonglong": return "unsigned long long" case "ulong": return "unsigned long" case "uint": return "unsigned int" case "ushort": return "unsigned short" case "uchar": return "unsigned char" default: return t.Name } } func (t *Type) CGO() string { if !strings.Contains(t.Name, " ") { return t.Name } s := strings.Replace(t.Name, "unsigned", "u", -1) s = strings.Replace(s, " ", "", -1) return s } func (t *Type) Go(pkg string) string { s := strings.TrimSuffix(string(t.Name), "_t") s = strcase.ToCamel(s) if len(pkg) > 0 { return pkg + "." + s } return s } func (t *Type) Blacklisted() bool { for _, blacklist := range []string{ "__darwin", "__int", "__uint", "__mm_storeh", "_tile", "_aligned", // float16 "float16", "f16", "v8bf", "v8hf", "m128h", "m128bh", // windows? 
"crt", "_pi_", "mbstate_t", } { if strings.Contains(t.Name, blacklist) { return true } } return false } ================================================ FILE: generator/utils/download.go ================================================ package utils import ( "io" "net/http" "os" ) func Download(dst, url string) error { if _, err := os.Stat(dst); !os.IsNotExist(err) { return nil } resp, err := http.Get(url) if err != nil { return err } defer resp.Body.Close() f, err := os.OpenFile(dst, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, os.ModePerm) if err != nil { return err } defer f.Close() if _, err := io.Copy(f, resp.Body); err != nil { return err } return nil } ================================================ FILE: generator/utils/filter.go ================================================ package utils func Filter[T any](arr []T, fn func(e T) bool) []T { out := make([]T, 0, len(arr)) for _, e := range arr { if !fn(e) { continue } out = append(out, e) } return out } ================================================ FILE: generator/utils/slice.go ================================================ package utils func Transform[A, B any](arr []A, fn func(i int, e A) B) []B { if arr == nil { return nil } out := make([]B, len(arr)) for i, e := range arr { out[i] = fn(i, e) } return out } func Merge[T any](lists ...[]T) []T { var out []T for _, l := range lists { out = append(out, l...) 
} return out } ================================================ FILE: generator/writer/cgo.go ================================================ package writer import ( "fmt" "strings" ) func Cflags(flags []string) string { return fmt.Sprintf("#cgo CFLAGS: %s", strings.Join(flags, " ")) } func Includes(headers []string) []string { out := make([]string, len(headers)) for i, h := range headers { out[i] = fmt.Sprintf("#include <%s>", h) } return out } ================================================ FILE: generator/writer/function.go ================================================ package writer import ( "fmt" "io" "strings" "github.com/alivanz/go-simd/generator/types" "github.com/alivanz/go-simd/generator/utils" "github.com/iancoleman/strcase" ) func DeclareFunc(w io.Writer, f types.Function, typePkg string) error { fmt.Fprintf(w, "\n") if len(f.Comment) > 0 { fmt.Fprintf(w, "// %s\n", f.Comment) } else { fmt.Fprintf(w, "// %s\n", f.Name) } // if len(f.Attribute) > 0 { // fmt.Fprintf(w, "// %s\n", f.Attribute) // } fmt.Fprintf(w, "func %s(", strcase.ToCamel(f.Name)) fmt.Fprintf(w, "%s", strings.Join(utils.Transform(f.Args, func(i int, t types.Type) string { return fmt.Sprintf("v%d %s", i, t.Go(typePkg)) }), ", ")) if f.Return == nil { fmt.Fprintf(w, ") {\n") } else { fmt.Fprintf(w, ") %s {\n", f.Return.Go(typePkg)) } if f.Return == nil { fmt.Fprintf(w, "\tC.%s(%s)\n", f.Name, strings.Join(utils.Transform(f.Args, func(i int, t types.Type) string { return fmt.Sprintf("v%d", i) }), ", ")) } else { fmt.Fprintf(w, "\treturn C.%s(%s)\n", f.Name, strings.Join(utils.Transform(f.Args, func(i int, t types.Type) string { return fmt.Sprintf("v%d", i) }), ", ")) } fmt.Fprintf(w, "}\n") return nil } func DeclareFuncBypass(w io.Writer, f types.Function, typePkg string) error { fmt.Fprintf(w, "\n") if len(f.Comment) > 0 { fmt.Fprintf(w, "// %s\n", f.Comment) } else { fmt.Fprintf(w, "// %s\n", f.Name) } fmt.Fprintf(w, "//\n") fmt.Fprintf(w, "//go:linkname %s %s\n", 
strcase.ToCamel(f.Name), strcase.ToCamel(f.Name)) fmt.Fprintf(w, "//go:noescape\n") fmt.Fprintf(w, "func %s(", strcase.ToCamel(f.Name)) if f.Return != nil { fmt.Fprintf(w, "r *%s, ", f.Return.Go(typePkg)) } fmt.Fprintf(w, "%s)\n", strings.Join(utils.Transform(f.Args, func(i int, t types.Type) string { return fmt.Sprintf("v%d *%s", i, t.Go(typePkg)) }), ", ")) return nil } func RewriteC(w io.Writer, f types.Function) error { var cargs []string if f.Return != nil { cargs = append(cargs, fmt.Sprintf("%s* r", f.Return.C())) } for i, t := range f.Args { cargs = append(cargs, fmt.Sprintf("%s* v%d", t.C(), i)) } fmt.Fprintf(w, "void %s(%s) { ", strcase.ToCamel(f.Name), strings.Join(cargs, ", "), ) if f.Return != nil { fmt.Fprintf(w, "*r = ") } fmt.Fprintf(w, "%s(%s); }\n", f.Name, strings.Join(utils.Transform(f.Args, func(i int, t types.Type) string { return fmt.Sprintf("*v%d", i) }), ", ")) return nil } ================================================ FILE: generator/writer/package.go ================================================ package writer import ( "fmt" "io" "strings" "github.com/alivanz/go-simd/generator/types" ) func Package(w io.Writer, pkg string) error { _, err := fmt.Fprintf(w, "package %s\n", pkg) return err } func Import(w io.Writer, pkgs []string) error { if len(pkgs) == 0 { return nil } _, err := fmt.Fprintf(w, "\nimport (\n\t\"%s\"\n)\n", strings.Join(pkgs, "\"\n\t\"")) return err } func ImportC(w io.Writer, fn func(w io.Writer) error) error { if _, err := fmt.Fprintf(w, "\n/*\n"); err != nil { return err } if err := fn(w); err != nil { return err } if _, err := fmt.Fprintf(w, "\n*/\nimport \"C\"\n"); err != nil { return err } return nil } func Types(w io.Writer, types []types.Type) error { for _, t := range types { if t.Blacklisted() { continue } if err := DeclareType(w, t); err != nil { return err } } return nil } func Funcs(w io.Writer, funcs []types.Function, typePkg string) error { for _, fn := range funcs { if fn.Blacklisted() { continue } if 
err := DeclareFunc(w, fn, typePkg); err != nil {
			return err
		}
	}
	return nil
}

================================================
FILE: generator/writer/package_test.go
================================================
package writer

import (
	"bytes"
	"io"
	"strings"
	"testing"
)

func TestPackage(t *testing.T) {
	var buf bytes.Buffer
	Package(&buf, "abc")
	if buf.String() != "package abc\n" {
		t.Fatal(buf.String())
	}
}

func TestImport(t *testing.T) {
	var buf bytes.Buffer
	Import(&buf, []string{
		"pkg1",
		"pkg2",
		"pkg3",
	})
	if buf.String() != `
import (
	"pkg1"
	"pkg2"
	"pkg3"
)
` {
		t.Fatal(buf.String())
	}
}

func TestImportC(t *testing.T) {
	var buf bytes.Buffer
	ImportC(&buf, func(w io.Writer) error {
		io.WriteString(w, strings.Join([]string{
			`#include `,
			`#include `,
		}, "\n"))
		return nil
	})
	ref := `
/*
#include 
#include 
*/
import "C"
`
	if buf.String() != ref {
		t.Fatal(buf.String())
	}
}

================================================
FILE: generator/writer/type.go
================================================
package writer

import (
	"fmt"
	"io"

	"github.com/alivanz/go-simd/generator/types"
)

func DeclareType(w io.Writer, t types.Type) error {
	var err error
	if len(t.Full) > 0 {
		_, err = fmt.Fprintf(w, "\n// %s\ntype %s = C.%s\n", t.Full, t.Go(""), t.CGO())
	} else {
		_, err = fmt.Fprintf(w, "\ntype %s = C.%s\n", t.Go(""), t.CGO())
	}
	return err
}

================================================
FILE: generator/writer/writer.go
================================================
package writer

import (
	"io"
	"os"
	"path/filepath"
)

func WriteToFile(dst string, fn func(w io.Writer) error) error {
	if len(dst) == 0 {
		return nil
	}
	dst, err := filepath.Abs(dst)
	if err != nil {
		return err
	}
	if err := os.MkdirAll(filepath.Dir(dst), os.ModePerm); err != nil {
		return err
	}
	f, err := os.OpenFile(dst, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, os.ModePerm)
	if err != nil {
		return err
	}
	defer f.Close()
	return fn(f)
}

================================================
FILE: generator/x86/info.go
================================================
package main

import (
	"bytes"
	"os"
	"regexp"

	"github.com/alivanz/go-simd/generator/utils"
)

type Intrinsic struct {
	Name        string
	CpuID       string
	Description string
	Operation   string
}

var (
	regIntrinsic   = regexp.MustCompile(``)
	regName        = regexp.MustCompile(`name="(.+?)"`)
	regDescription = regexp.MustCompile(`(.+?)`)
	regCpuID       = regexp.MustCompile(`(.+?)`)
)

func GetIntrinsic() ([]*Intrinsic, error) {
	if err := utils.Download(
		"data.xml",
		"https://www.intel.com/content/dam/develop/public/us/en/include/intrinsics-guide/data-3-6-6.xml",
	); err != nil {
		return nil, err
	}
	// os.ReadFile replaces the deprecated ioutil.ReadFile (Go 1.16+).
	raw, err := os.ReadFile("data.xml")
	if err != nil {
		return nil, err
	}
	raw = bytes.ReplaceAll(raw, []byte("\n"), []byte(""))
	intrins := regIntrinsic.FindAll(raw, -1)
	out := make([]*Intrinsic, len(intrins))
	for i, part := range intrins {
		var intrin Intrinsic
		if match := regName.FindSubmatch(part); match != nil {
			intrin.Name = string(match[1])
		}
		if match := regDescription.FindSubmatch(part); match != nil {
			intrin.Description = string(match[1])
		}
		if match := regCpuID.FindSubmatch(part); match != nil {
			intrin.CpuID = string(match[1])
		}
		out[i] = &intrin
	}
	return out, nil
}

================================================
FILE: generator/x86/main.go
================================================
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"os"
	"os/exec"
	"regexp"
	"strings"

	"github.com/alivanz/go-simd/generator/scanner"
	"github.com/alivanz/go-simd/generator/types"
	"github.com/alivanz/go-simd/generator/utils"
	"github.com/alivanz/go-simd/generator/writer"
)

var (
	regComma = regexp.MustCompile(`\s*,\s*`)
)

func main() {
	// generate
	cmd := exec.Command("clang", "-march=native", "-E", "-")
	cmd.Stdin = bytes.NewBufferString(strings.Join([]string{
		"#include ",
	}, "\n"))
	cmd.Stderr = os.Stderr
	src, err := cmd.Output()
	if err != nil {
		log.Fatal(err)
	}
	// raw
	if err := writer.WriteToFile("raw.h", func(w io.Writer) error {
		_, err := w.Write(src)
		return err
	}); err
!= nil {
		log.Fatal(err)
	}
	// scan
	result, err := scanner.Scan(src)
	if err != nil {
		log.Fatal(err)
	}
	// filter functions
	mfunc := make(map[string]bool)
	result.Functions = utils.Filter(result.Functions, func(fn types.Function) bool {
		if mfunc[fn.Name] {
			return false
		}
		if len(fn.Target()) == 0 {
			return false
		}
		mfunc[fn.Name] = true
		return true
	})
	// filter types
	mtype := make(map[string]bool)
	for _, fn := range result.Functions {
		if fn.Return != nil {
			mtype[fn.Return.Name] = true
			// append type
			result.Types = append(result.Types, *fn.Return)
		}
		for _, arg := range fn.Args {
			mtype[arg.Name] = true
			result.Types = append(result.Types, arg)
		}
	}
	result.Types = utils.Filter(result.Types, func(t types.Type) bool {
		if !mtype[t.Name] {
			return false
		}
		// remove dup
		delete(mtype, t.Name)
		return true
	})
	// types
	if err := writer.WriteToFile("types.go", func(w io.Writer) error {
		if err := writer.Package(w, "x86"); err != nil {
			return err
		}
		if err := writer.ImportC(w, func(w io.Writer) error {
			// Return the write error itself, not the enclosing err.
			_, err := fmt.Fprintf(w, "#include ")
			return err
		}); err != nil {
			return err
		}
		if err := writer.Types(w, result.Types); err != nil {
			return err
		}
		return nil
	}); err != nil {
		log.Fatal(err)
	}
	// patch funcs
	intrins, err := GetIntrinsic()
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("%+v", intrins[0])
	mintrin := make(map[string]*Intrinsic)
	for _, intrin := range intrins {
		mintrin[intrin.Name] = intrin
	}
	log.Printf("%+v", mintrin["_mm_fmsubadd_pd"])
	for i, fn := range result.Functions {
		if intrin, found := mintrin[fn.Name]; found {
			result.Functions[i].Comment = intrin.Description
		}
	}
	// group funcs by target
	mf := make(map[string][]types.Function)
	for _, fn := range result.Functions {
		target := fn.Target()
		mf[target] = append(mf[target], fn)
	}
	// funcs
	for target, funcs := range mf {
		target = regComma.ReplaceAllString(target, "_")
		cname := fmt.Sprintf("%s/functions.c", target)
		fname := fmt.Sprintf("%s/functions.go", target)
		// write C
		if err := writer.WriteToFile(cname, func(w io.Writer) error {
			if _, err
:= io.WriteString(w, "#include \n\n"); err != nil {
				return err
			}
			for _, fn := range funcs {
				if fn.Blacklisted() {
					continue
				}
				if err := writer.RewriteC(w, fn); err != nil {
					return err
				}
			}
			return nil
		}); err != nil {
			log.Fatal(err)
		}
		// write Go
		if err := writer.WriteToFile(fname, func(w io.Writer) error {
			if err := writer.Package(w, target); err != nil {
				return err
			}
			if err := writer.Import(w, []string{
				"github.com/alivanz/go-simd/x86",
			}); err != nil {
				return err
			}
			if err := writer.ImportC(w, func(w io.Writer) error {
				feats := strings.Split(target, "_")
				if len(feats) > 0 {
					fmt.Fprintf(w, "#cgo CFLAGS: %s\n", strings.Join(utils.Transform(feats, func(i int, feat string) string {
						return "-m" + feat
					}), " "))
				}
				// Return the write error itself, not the enclosing err.
				_, err := fmt.Fprintf(w, "#include ")
				return err
			}); err != nil {
				return err
			}
			for _, fn := range funcs {
				if fn.Blacklisted() {
					continue
				}
				if err := writer.DeclareFuncBypass(w, fn, "x86"); err != nil {
					return err
				}
			}
			return nil
		}); err != nil {
			log.Fatal(err)
		}
	}
}

================================================
FILE: go.mod
================================================
module github.com/alivanz/go-simd

go 1.20

require github.com/iancoleman/strcase v0.2.0

================================================
FILE: go.sum
================================================
github.com/iancoleman/strcase v0.2.0 h1:05I4QRnGpI0m37iZQRuskXh+w77mr6Z41lwQzuHLwW0=
github.com/iancoleman/strcase v0.2.0/go.mod h1:iwCmte+B7n89clKwxIoIXy/HfoL7AsD47ZCWhYzw7ho=

================================================
FILE: x86/aes/functions.c
================================================
#include

void MmAesencSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_aesenc_si128(*v0, *v1); }
void MmAesenclastSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_aesenclast_si128(*v0, *v1); }
void MmAesdecSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_aesdec_si128(*v0, *v1); }
void MmAesdeclastSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_aesdeclast_si128(*v0, *v1); }
void
MmAesimcSi128(__m128i* r, __m128i* v0) { *r = _mm_aesimc_si128(*v0); }

================================================
FILE: x86/aes/functions.go
================================================
package aes

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -maes
#include
*/
import "C"

// Perform one round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the result in "dst".
//
//go:linkname MmAesencSi128 MmAesencSi128
//go:noescape
func MmAesencSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Perform the last round of an AES encryption flow on data (state) in "a" using the round key in "RoundKey", and store the result in "dst".
//
//go:linkname MmAesenclastSi128 MmAesenclastSi128
//go:noescape
func MmAesenclastSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Perform one round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the result in "dst".
//
//go:linkname MmAesdecSi128 MmAesdecSi128
//go:noescape
func MmAesdecSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Perform the last round of an AES decryption flow on data (state) in "a" using the round key in "RoundKey", and store the result in "dst".
//
//go:linkname MmAesdeclastSi128 MmAesdeclastSi128
//go:noescape
func MmAesdeclastSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Perform the InvMixColumns transformation on "a" and store the result in "dst".
// //go:linkname MmAesimcSi128 MmAesimcSi128 //go:noescape func MmAesimcSi128(r *x86.M128I, v0 *x86.M128I) ================================================ FILE: x86/avx/functions.c ================================================ #include void Mm256AddPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_add_pd(*v0, *v1); } void Mm256AddPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_add_ps(*v0, *v1); } void Mm256SubPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_sub_pd(*v0, *v1); } void Mm256SubPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_sub_ps(*v0, *v1); } void Mm256AddsubPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_addsub_pd(*v0, *v1); } void Mm256AddsubPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_addsub_ps(*v0, *v1); } void Mm256DivPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_div_pd(*v0, *v1); } void Mm256DivPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_div_ps(*v0, *v1); } void Mm256MaxPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_max_pd(*v0, *v1); } void Mm256MaxPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_max_ps(*v0, *v1); } void Mm256MinPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_min_pd(*v0, *v1); } void Mm256MinPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_min_ps(*v0, *v1); } void Mm256MulPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_mul_pd(*v0, *v1); } void Mm256MulPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_mul_ps(*v0, *v1); } void Mm256SqrtPd(__m256d* r, __m256d* v0) { *r = _mm256_sqrt_pd(*v0); } void Mm256SqrtPs(__m256* r, __m256* v0) { *r = _mm256_sqrt_ps(*v0); } void Mm256RsqrtPs(__m256* r, __m256* v0) { *r = _mm256_rsqrt_ps(*v0); } void Mm256RcpPs(__m256* r, __m256* v0) { *r = _mm256_rcp_ps(*v0); } void Mm256AndPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_and_pd(*v0, *v1); } void Mm256AndPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_and_ps(*v0, *v1); } void Mm256AndnotPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = 
_mm256_andnot_pd(*v0, *v1); } void Mm256AndnotPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_andnot_ps(*v0, *v1); } void Mm256OrPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_or_pd(*v0, *v1); } void Mm256OrPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_or_ps(*v0, *v1); } void Mm256XorPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_xor_pd(*v0, *v1); } void Mm256XorPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_xor_ps(*v0, *v1); } void Mm256HaddPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_hadd_pd(*v0, *v1); } void Mm256HaddPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_hadd_ps(*v0, *v1); } void Mm256HsubPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_hsub_pd(*v0, *v1); } void Mm256HsubPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_hsub_ps(*v0, *v1); } void MmPermutevarPd(__m128d* r, __m128d* v0, __m128i* v1) { *r = _mm_permutevar_pd(*v0, *v1); } void Mm256PermutevarPd(__m256d* r, __m256d* v0, __m256i* v1) { *r = _mm256_permutevar_pd(*v0, *v1); } void MmPermutevarPs(__m128* r, __m128* v0, __m128i* v1) { *r = _mm_permutevar_ps(*v0, *v1); } void Mm256PermutevarPs(__m256* r, __m256* v0, __m256i* v1) { *r = _mm256_permutevar_ps(*v0, *v1); } void Mm256BlendvPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_blendv_pd(*v0, *v1, *v2); } void Mm256BlendvPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_blendv_ps(*v0, *v1, *v2); } void Mm256Cvtepi32Pd(__m256d* r, __m128i* v0) { *r = _mm256_cvtepi32_pd(*v0); } void Mm256Cvtepi32Ps(__m256* r, __m256i* v0) { *r = _mm256_cvtepi32_ps(*v0); } void Mm256CvtpdPs(__m128* r, __m256d* v0) { *r = _mm256_cvtpd_ps(*v0); } void Mm256CvtpsEpi32(__m256i* r, __m256* v0) { *r = _mm256_cvtps_epi32(*v0); } void Mm256CvtpsPd(__m256d* r, __m128* v0) { *r = _mm256_cvtps_pd(*v0); } void Mm256CvttpdEpi32(__m128i* r, __m256d* v0) { *r = _mm256_cvttpd_epi32(*v0); } void Mm256CvtpdEpi32(__m128i* r, __m256d* v0) { *r = _mm256_cvtpd_epi32(*v0); } void 
Mm256CvttpsEpi32(__m256i* r, __m256* v0) { *r = _mm256_cvttps_epi32(*v0); } void Mm256CvtsdF64(double* r, __m256d* v0) { *r = _mm256_cvtsd_f64(*v0); } void Mm256Cvtsi256Si32(int* r, __m256i* v0) { *r = _mm256_cvtsi256_si32(*v0); } void Mm256CvtssF32(float* r, __m256* v0) { *r = _mm256_cvtss_f32(*v0); } void Mm256MovehdupPs(__m256* r, __m256* v0) { *r = _mm256_movehdup_ps(*v0); } void Mm256MoveldupPs(__m256* r, __m256* v0) { *r = _mm256_moveldup_ps(*v0); } void Mm256MovedupPd(__m256d* r, __m256d* v0) { *r = _mm256_movedup_pd(*v0); } void Mm256UnpackhiPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_unpackhi_pd(*v0, *v1); } void Mm256UnpackloPd(__m256d* r, __m256d* v0, __m256d* v1) { *r = _mm256_unpacklo_pd(*v0, *v1); } void Mm256UnpackhiPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_unpackhi_ps(*v0, *v1); } void Mm256UnpackloPs(__m256* r, __m256* v0, __m256* v1) { *r = _mm256_unpacklo_ps(*v0, *v1); } void MmTestzPd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_testz_pd(*v0, *v1); } void MmTestcPd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_testc_pd(*v0, *v1); } void MmTestnzcPd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_testnzc_pd(*v0, *v1); } void MmTestzPs(int* r, __m128* v0, __m128* v1) { *r = _mm_testz_ps(*v0, *v1); } void MmTestcPs(int* r, __m128* v0, __m128* v1) { *r = _mm_testc_ps(*v0, *v1); } void MmTestnzcPs(int* r, __m128* v0, __m128* v1) { *r = _mm_testnzc_ps(*v0, *v1); } void Mm256TestzPd(int* r, __m256d* v0, __m256d* v1) { *r = _mm256_testz_pd(*v0, *v1); } void Mm256TestcPd(int* r, __m256d* v0, __m256d* v1) { *r = _mm256_testc_pd(*v0, *v1); } void Mm256TestnzcPd(int* r, __m256d* v0, __m256d* v1) { *r = _mm256_testnzc_pd(*v0, *v1); } void Mm256TestzPs(int* r, __m256* v0, __m256* v1) { *r = _mm256_testz_ps(*v0, *v1); } void Mm256TestcPs(int* r, __m256* v0, __m256* v1) { *r = _mm256_testc_ps(*v0, *v1); } void Mm256TestnzcPs(int* r, __m256* v0, __m256* v1) { *r = _mm256_testnzc_ps(*v0, *v1); } void Mm256TestzSi256(int* r, __m256i* v0, __m256i* 
v1) { *r = _mm256_testz_si256(*v0, *v1); } void Mm256TestcSi256(int* r, __m256i* v0, __m256i* v1) { *r = _mm256_testc_si256(*v0, *v1); } void Mm256TestnzcSi256(int* r, __m256i* v0, __m256i* v1) { *r = _mm256_testnzc_si256(*v0, *v1); } void Mm256MovemaskPd(int* r, __m256d* v0) { *r = _mm256_movemask_pd(*v0); } void Mm256MovemaskPs(int* r, __m256* v0) { *r = _mm256_movemask_ps(*v0); } void Mm256Zeroall() { _mm256_zeroall(); } void Mm256Zeroupper() { _mm256_zeroupper(); } void Mm256UndefinedPd(__m256d* r) { *r = _mm256_undefined_pd(); } void Mm256UndefinedPs(__m256* r) { *r = _mm256_undefined_ps(); } void Mm256UndefinedSi256(__m256i* r) { *r = _mm256_undefined_si256(); } void Mm256SetPd(__m256d* r, double* v0, double* v1, double* v2, double* v3) { *r = _mm256_set_pd(*v0, *v1, *v2, *v3); } void Mm256SetPs(__m256* r, float* v0, float* v1, float* v2, float* v3, float* v4, float* v5, float* v6, float* v7) { *r = _mm256_set_ps(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); } void Mm256SetEpi32(__m256i* r, int* v0, int* v1, int* v2, int* v3, int* v4, int* v5, int* v6, int* v7) { *r = _mm256_set_epi32(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); } void Mm256SetEpi16(__m256i* r, short* v0, short* v1, short* v2, short* v3, short* v4, short* v5, short* v6, short* v7, short* v8, short* v9, short* v10, short* v11, short* v12, short* v13, short* v14, short* v15) { *r = _mm256_set_epi16(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7, *v8, *v9, *v10, *v11, *v12, *v13, *v14, *v15); } void Mm256SetEpi8(__m256i* r, char* v0, char* v1, char* v2, char* v3, char* v4, char* v5, char* v6, char* v7, char* v8, char* v9, char* v10, char* v11, char* v12, char* v13, char* v14, char* v15, char* v16, char* v17, char* v18, char* v19, char* v20, char* v21, char* v22, char* v23, char* v24, char* v25, char* v26, char* v27, char* v28, char* v29, char* v30, char* v31) { *r = _mm256_set_epi8(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7, *v8, *v9, *v10, *v11, *v12, *v13, *v14, *v15, *v16, *v17, *v18, *v19, *v20, *v21, *v22, 
*v23, *v24, *v25, *v26, *v27, *v28, *v29, *v30, *v31); } void Mm256SetEpi64X(__m256i* r, long long* v0, long long* v1, long long* v2, long long* v3) { *r = _mm256_set_epi64x(*v0, *v1, *v2, *v3); } void Mm256SetrPd(__m256d* r, double* v0, double* v1, double* v2, double* v3) { *r = _mm256_setr_pd(*v0, *v1, *v2, *v3); } void Mm256SetrPs(__m256* r, float* v0, float* v1, float* v2, float* v3, float* v4, float* v5, float* v6, float* v7) { *r = _mm256_setr_ps(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); } void Mm256SetrEpi32(__m256i* r, int* v0, int* v1, int* v2, int* v3, int* v4, int* v5, int* v6, int* v7) { *r = _mm256_setr_epi32(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); } void Mm256SetrEpi16(__m256i* r, short* v0, short* v1, short* v2, short* v3, short* v4, short* v5, short* v6, short* v7, short* v8, short* v9, short* v10, short* v11, short* v12, short* v13, short* v14, short* v15) { *r = _mm256_setr_epi16(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7, *v8, *v9, *v10, *v11, *v12, *v13, *v14, *v15); } void Mm256SetrEpi8(__m256i* r, char* v0, char* v1, char* v2, char* v3, char* v4, char* v5, char* v6, char* v7, char* v8, char* v9, char* v10, char* v11, char* v12, char* v13, char* v14, char* v15, char* v16, char* v17, char* v18, char* v19, char* v20, char* v21, char* v22, char* v23, char* v24, char* v25, char* v26, char* v27, char* v28, char* v29, char* v30, char* v31) { *r = _mm256_setr_epi8(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7, *v8, *v9, *v10, *v11, *v12, *v13, *v14, *v15, *v16, *v17, *v18, *v19, *v20, *v21, *v22, *v23, *v24, *v25, *v26, *v27, *v28, *v29, *v30, *v31); } void Mm256SetrEpi64X(__m256i* r, long long* v0, long long* v1, long long* v2, long long* v3) { *r = _mm256_setr_epi64x(*v0, *v1, *v2, *v3); } void Mm256Set1Pd(__m256d* r, double* v0) { *r = _mm256_set1_pd(*v0); } void Mm256Set1Ps(__m256* r, float* v0) { *r = _mm256_set1_ps(*v0); } void Mm256Set1Epi32(__m256i* r, int* v0) { *r = _mm256_set1_epi32(*v0); } void Mm256Set1Epi16(__m256i* r, short* v0) { *r = 
_mm256_set1_epi16(*v0); } void Mm256Set1Epi8(__m256i* r, char* v0) { *r = _mm256_set1_epi8(*v0); } void Mm256Set1Epi64X(__m256i* r, long long* v0) { *r = _mm256_set1_epi64x(*v0); } void Mm256SetzeroPd(__m256d* r) { *r = _mm256_setzero_pd(); } void Mm256SetzeroPs(__m256* r) { *r = _mm256_setzero_ps(); } void Mm256SetzeroSi256(__m256i* r) { *r = _mm256_setzero_si256(); } void Mm256CastpdPs(__m256* r, __m256d* v0) { *r = _mm256_castpd_ps(*v0); } void Mm256CastpdSi256(__m256i* r, __m256d* v0) { *r = _mm256_castpd_si256(*v0); } void Mm256CastpsPd(__m256d* r, __m256* v0) { *r = _mm256_castps_pd(*v0); } void Mm256CastpsSi256(__m256i* r, __m256* v0) { *r = _mm256_castps_si256(*v0); } void Mm256Castsi256Ps(__m256* r, __m256i* v0) { *r = _mm256_castsi256_ps(*v0); } void Mm256Castsi256Pd(__m256d* r, __m256i* v0) { *r = _mm256_castsi256_pd(*v0); } void Mm256Castpd256Pd128(__m128d* r, __m256d* v0) { *r = _mm256_castpd256_pd128(*v0); } void Mm256Castps256Ps128(__m128* r, __m256* v0) { *r = _mm256_castps256_ps128(*v0); } void Mm256Castsi256Si128(__m128i* r, __m256i* v0) { *r = _mm256_castsi256_si128(*v0); } void Mm256Castpd128Pd256(__m256d* r, __m128d* v0) { *r = _mm256_castpd128_pd256(*v0); } void Mm256Castps128Ps256(__m256* r, __m128* v0) { *r = _mm256_castps128_ps256(*v0); } void Mm256Castsi128Si256(__m256i* r, __m128i* v0) { *r = _mm256_castsi128_si256(*v0); } void Mm256Zextpd128Pd256(__m256d* r, __m128d* v0) { *r = _mm256_zextpd128_pd256(*v0); } void Mm256Zextps128Ps256(__m256* r, __m128* v0) { *r = _mm256_zextps128_ps256(*v0); } void Mm256Zextsi128Si256(__m256i* r, __m128i* v0) { *r = _mm256_zextsi128_si256(*v0); } void Mm256SetM128(__m256* r, __m128* v0, __m128* v1) { *r = _mm256_set_m128(*v0, *v1); } void Mm256SetM128D(__m256d* r, __m128d* v0, __m128d* v1) { *r = _mm256_set_m128d(*v0, *v1); } void Mm256SetM128I(__m256i* r, __m128i* v0, __m128i* v1) { *r = _mm256_set_m128i(*v0, *v1); } void Mm256SetrM128(__m256* r, __m128* v0, __m128* v1) { *r = _mm256_setr_m128(*v0, *v1); 
}
void Mm256SetrM128D(__m256d* r, __m128d* v0, __m128d* v1) { *r = _mm256_setr_m128d(*v0, *v1); }
void Mm256SetrM128I(__m256i* r, __m128i* v0, __m128i* v1) { *r = _mm256_setr_m128i(*v0, *v1); }

================================================
FILE: x86/avx/functions.go
================================================
package avx

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -mavx
#include
*/
import "C"

// Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256AddPd Mm256AddPd
//go:noescape
func Mm256AddPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256AddPs Mm256AddPs
//go:noescape
func Mm256AddPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)

// Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
//
//go:linkname Mm256SubPd Mm256SubPd
//go:noescape
func Mm256SubPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst".
//
//go:linkname Mm256SubPs Mm256SubPs
//go:noescape
func Mm256SubPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)

// Alternately add and subtract packed double-precision (64-bit) floating-point elements in "a" to/from packed elements in "b", and store the results in "dst".
//
//go:linkname Mm256AddsubPd Mm256AddsubPd
//go:noescape
func Mm256AddsubPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Alternately add and subtract packed single-precision (32-bit) floating-point elements in "a" to/from packed elements in "b", and store the results in "dst".
//
//go:linkname Mm256AddsubPs Mm256AddsubPs
//go:noescape
func Mm256AddsubPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)

// Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".
//
//go:linkname Mm256DivPd Mm256DivPd
//go:noescape
func Mm256DivPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst".
//
//go:linkname Mm256DivPs Mm256DivPs
//go:noescape
func Mm256DivPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)

// Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst".
//
//go:linkname Mm256MaxPd Mm256MaxPd
//go:noescape
func Mm256MaxPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst".
//
//go:linkname Mm256MaxPs Mm256MaxPs
//go:noescape
func Mm256MaxPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)

// Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst".
//
//go:linkname Mm256MinPd Mm256MinPd
//go:noescape
func Mm256MinPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst".
//
//go:linkname Mm256MinPs Mm256MinPs
//go:noescape
func Mm256MinPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256)

// Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname Mm256MulPd Mm256MulPd
//go:noescape
func Mm256MulPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D)

// Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst".
// //go:linkname Mm256MulPs Mm256MulPs //go:noescape func Mm256MulPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256) // Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". // //go:linkname Mm256SqrtPd Mm256SqrtPd //go:noescape func Mm256SqrtPd(r *x86.M256D, v0 *x86.M256D) // Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". // //go:linkname Mm256SqrtPs Mm256SqrtPs //go:noescape func Mm256SqrtPs(r *x86.M256, v0 *x86.M256) // Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12. // //go:linkname Mm256RsqrtPs Mm256RsqrtPs //go:noescape func Mm256RsqrtPs(r *x86.M256, v0 *x86.M256) // Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12. // //go:linkname Mm256RcpPs Mm256RcpPs //go:noescape func Mm256RcpPs(r *x86.M256, v0 *x86.M256) // Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". // //go:linkname Mm256AndPd Mm256AndPd //go:noescape func Mm256AndPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D) // Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". // //go:linkname Mm256AndPs Mm256AndPs //go:noescape func Mm256AndPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256) // Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst". 
// //go:linkname Mm256AndnotPd Mm256AndnotPd //go:noescape func Mm256AndnotPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D) // Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst". // //go:linkname Mm256AndnotPs Mm256AndnotPs //go:noescape func Mm256AndnotPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256) // Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". // //go:linkname Mm256OrPd Mm256OrPd //go:noescape func Mm256OrPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D) // Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". // //go:linkname Mm256OrPs Mm256OrPs //go:noescape func Mm256OrPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256) // Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". // //go:linkname Mm256XorPd Mm256XorPd //go:noescape func Mm256XorPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D) // Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". // //go:linkname Mm256XorPs Mm256XorPs //go:noescape func Mm256XorPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256) // Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst". // //go:linkname Mm256HaddPd Mm256HaddPd //go:noescape func Mm256HaddPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D) // Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst". 
// //go:linkname Mm256HaddPs Mm256HaddPs //go:noescape func Mm256HaddPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256) // Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst". // //go:linkname Mm256HsubPd Mm256HsubPd //go:noescape func Mm256HsubPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D) // Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst". // //go:linkname Mm256HsubPs Mm256HsubPs //go:noescape func Mm256HsubPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256) // Shuffle double-precision (64-bit) floating-point elements in "a" using the control in "b", and store the results in "dst". // //go:linkname MmPermutevarPd MmPermutevarPd //go:noescape func MmPermutevarPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128I) // Shuffle double-precision (64-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst". // //go:linkname Mm256PermutevarPd Mm256PermutevarPd //go:noescape func Mm256PermutevarPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256I) // Shuffle single-precision (32-bit) floating-point elements in "a" using the control in "b", and store the results in "dst". // //go:linkname MmPermutevarPs MmPermutevarPs //go:noescape func MmPermutevarPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128I) // Shuffle single-precision (32-bit) floating-point elements in "a" within 128-bit lanes using the control in "b", and store the results in "dst". // //go:linkname Mm256PermutevarPs Mm256PermutevarPs //go:noescape func Mm256PermutevarPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256I) // Blend packed double-precision (64-bit) floating-point elements from "a" and "b" using "mask", and store the results in "dst". 
// //go:linkname Mm256BlendvPd Mm256BlendvPd //go:noescape func Mm256BlendvPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D) // Blend packed single-precision (32-bit) floating-point elements from "a" and "b" using "mask", and store the results in "dst". // //go:linkname Mm256BlendvPs Mm256BlendvPs //go:noescape func Mm256BlendvPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256) // Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". // //go:linkname Mm256Cvtepi32Pd Mm256Cvtepi32Pd //go:noescape func Mm256Cvtepi32Pd(r *x86.M256D, v0 *x86.M128I) // Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". // //go:linkname Mm256Cvtepi32Ps Mm256Cvtepi32Ps //go:noescape func Mm256Cvtepi32Ps(r *x86.M256, v0 *x86.M256I) // Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". // //go:linkname Mm256CvtpdPs Mm256CvtpdPs //go:noescape func Mm256CvtpdPs(r *x86.M128, v0 *x86.M256D) // Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst". // //go:linkname Mm256CvtpsEpi32 Mm256CvtpsEpi32 //go:noescape func Mm256CvtpsEpi32(r *x86.M256I, v0 *x86.M256) // Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". // //go:linkname Mm256CvtpsPd Mm256CvtpsPd //go:noescape func Mm256CvtpsPd(r *x86.M256D, v0 *x86.M128) // Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". 
// //go:linkname Mm256CvttpdEpi32 Mm256CvttpdEpi32 //go:noescape func Mm256CvttpdEpi32(r *x86.M128I, v0 *x86.M256D) // Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst". // //go:linkname Mm256CvtpdEpi32 Mm256CvtpdEpi32 //go:noescape func Mm256CvtpdEpi32(r *x86.M128I, v0 *x86.M256D) // Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". // //go:linkname Mm256CvttpsEpi32 Mm256CvttpsEpi32 //go:noescape func Mm256CvttpsEpi32(r *x86.M256I, v0 *x86.M256) // Copy the lower double-precision (64-bit) floating-point element of "a" to "dst". // //go:linkname Mm256CvtsdF64 Mm256CvtsdF64 //go:noescape func Mm256CvtsdF64(r *x86.Double, v0 *x86.M256D) // Copy the lower 32-bit integer in "a" to "dst". // //go:linkname Mm256Cvtsi256Si32 Mm256Cvtsi256Si32 //go:noescape func Mm256Cvtsi256Si32(r *x86.Int, v0 *x86.M256I) // Copy the lower single-precision (32-bit) floating-point element of "a" to "dst". // //go:linkname Mm256CvtssF32 Mm256CvtssF32 //go:noescape func Mm256CvtssF32(r *x86.Float, v0 *x86.M256) // Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst". // //go:linkname Mm256MovehdupPs Mm256MovehdupPs //go:noescape func Mm256MovehdupPs(r *x86.M256, v0 *x86.M256) // Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst". // //go:linkname Mm256MoveldupPs Mm256MoveldupPs //go:noescape func Mm256MoveldupPs(r *x86.M256, v0 *x86.M256) // Duplicate even-indexed double-precision (64-bit) floating-point elements from "a", and store the results in "dst". 
// //go:linkname Mm256MovedupPd Mm256MovedupPd //go:noescape func Mm256MovedupPd(r *x86.M256D, v0 *x86.M256D) // Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst". // //go:linkname Mm256UnpackhiPd Mm256UnpackhiPd //go:noescape func Mm256UnpackhiPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D) // Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst". // //go:linkname Mm256UnpackloPd Mm256UnpackloPd //go:noescape func Mm256UnpackloPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D) // Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in "a" and "b", and store the results in "dst". // //go:linkname Mm256UnpackhiPs Mm256UnpackhiPs //go:noescape func Mm256UnpackhiPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256) // Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in "a" and "b", and store the results in "dst". // //go:linkname Mm256UnpackloPs Mm256UnpackloPs //go:noescape func Mm256UnpackloPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256) // Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "ZF" value. 
// //go:linkname MmTestzPd MmTestzPd //go:noescape func MmTestzPd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D) // Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "CF" value. // //go:linkname MmTestcPd MmTestcPd //go:noescape func MmTestcPd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D) // Compute the bitwise AND of 128 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0. // //go:linkname MmTestnzcPd MmTestnzcPd //go:noescape func MmTestnzcPd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D) // Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "ZF" value. 
// //go:linkname MmTestzPs MmTestzPs //go:noescape func MmTestzPs(r *x86.Int, v0 *x86.M128, v1 *x86.M128) // Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "CF" value. // //go:linkname MmTestcPs MmTestcPs //go:noescape func MmTestcPs(r *x86.Int, v0 *x86.M128, v1 *x86.M128) // Compute the bitwise AND of 128 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 128-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0. // //go:linkname MmTestnzcPs MmTestnzcPs //go:noescape func MmTestnzcPs(r *x86.Int, v0 *x86.M128, v1 *x86.M128) // Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "ZF" value. 
// //go:linkname Mm256TestzPd Mm256TestzPd //go:noescape func Mm256TestzPd(r *x86.Int, v0 *x86.M256D, v1 *x86.M256D) // Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "CF" value. // //go:linkname Mm256TestcPd Mm256TestcPd //go:noescape func Mm256TestcPd(r *x86.Int, v0 *x86.M256D, v1 *x86.M256D) // Compute the bitwise AND of 256 bits (representing double-precision (64-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 64-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0. // //go:linkname Mm256TestnzcPd Mm256TestnzcPd //go:noescape func Mm256TestnzcPd(r *x86.Int, v0 *x86.M256D, v1 *x86.M256D) // Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "ZF" value. 
// //go:linkname Mm256TestzPs Mm256TestzPs //go:noescape func Mm256TestzPs(r *x86.Int, v0 *x86.M256, v1 *x86.M256) // Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return the "CF" value. // //go:linkname Mm256TestcPs Mm256TestcPs //go:noescape func Mm256TestcPs(r *x86.Int, v0 *x86.M256, v1 *x86.M256) // Compute the bitwise AND of 256 bits (representing single-precision (32-bit) floating-point elements) in "a" and "b", producing an intermediate 256-bit value, and set "ZF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", producing an intermediate value, and set "CF" to 1 if the sign bit of each 32-bit element in the intermediate value is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0. // //go:linkname Mm256TestnzcPs Mm256TestnzcPs //go:noescape func Mm256TestnzcPs(r *x86.Int, v0 *x86.M256, v1 *x86.M256) // Compute the bitwise AND of 256 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return the "ZF" value. // //go:linkname Mm256TestzSi256 Mm256TestzSi256 //go:noescape func Mm256TestzSi256(r *x86.Int, v0 *x86.M256I, v1 *x86.M256I) // Compute the bitwise AND of 256 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. 
Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return the "CF" value. // //go:linkname Mm256TestcSi256 Mm256TestcSi256 //go:noescape func Mm256TestcSi256(r *x86.Int, v0 *x86.M256I, v1 *x86.M256I) // Compute the bitwise AND of 256 bits (representing integer data) in "a" and "b", and set "ZF" to 1 if the result is zero, otherwise set "ZF" to 0. Compute the bitwise NOT of "a" and then AND with "b", and set "CF" to 1 if the result is zero, otherwise set "CF" to 0. Return 1 if both the "ZF" and "CF" values are zero, otherwise return 0. // //go:linkname Mm256TestnzcSi256 Mm256TestnzcSi256 //go:noescape func Mm256TestnzcSi256(r *x86.Int, v0 *x86.M256I, v1 *x86.M256I) // Set each bit of mask "dst" based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in "a". // //go:linkname Mm256MovemaskPd Mm256MovemaskPd //go:noescape func Mm256MovemaskPd(r *x86.Int, v0 *x86.M256D) // Set each bit of mask "dst" based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in "a". // //go:linkname Mm256MovemaskPs Mm256MovemaskPs //go:noescape func Mm256MovemaskPs(r *x86.Int, v0 *x86.M256) // Zero the contents of all XMM or YMM registers. // //go:linkname Mm256Zeroall Mm256Zeroall //go:noescape func Mm256Zeroall() // Zero the upper 128 bits of all YMM registers; the lower 128 bits of the registers are unmodified. // //go:linkname Mm256Zeroupper Mm256Zeroupper //go:noescape func Mm256Zeroupper() // Return vector of type __m256d with undefined elements. // //go:linkname Mm256UndefinedPd Mm256UndefinedPd //go:noescape func Mm256UndefinedPd(r *x86.M256D) // Return vector of type __m256 with undefined elements. // //go:linkname Mm256UndefinedPs Mm256UndefinedPs //go:noescape func Mm256UndefinedPs(r *x86.M256) // Return vector of type __m256i with undefined elements.
// //go:linkname Mm256UndefinedSi256 Mm256UndefinedSi256 //go:noescape func Mm256UndefinedSi256(r *x86.M256I) // Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values. // //go:linkname Mm256SetPd Mm256SetPd //go:noescape func Mm256SetPd(r *x86.M256D, v0 *x86.Double, v1 *x86.Double, v2 *x86.Double, v3 *x86.Double) // Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values. // //go:linkname Mm256SetPs Mm256SetPs //go:noescape func Mm256SetPs(r *x86.M256, v0 *x86.Float, v1 *x86.Float, v2 *x86.Float, v3 *x86.Float, v4 *x86.Float, v5 *x86.Float, v6 *x86.Float, v7 *x86.Float) // Set packed 32-bit integers in "dst" with the supplied values. // //go:linkname Mm256SetEpi32 Mm256SetEpi32 //go:noescape func Mm256SetEpi32(r *x86.M256I, v0 *x86.Int, v1 *x86.Int, v2 *x86.Int, v3 *x86.Int, v4 *x86.Int, v5 *x86.Int, v6 *x86.Int, v7 *x86.Int) // Set packed 16-bit integers in "dst" with the supplied values. // //go:linkname Mm256SetEpi16 Mm256SetEpi16 //go:noescape func Mm256SetEpi16(r *x86.M256I, v0 *x86.Short, v1 *x86.Short, v2 *x86.Short, v3 *x86.Short, v4 *x86.Short, v5 *x86.Short, v6 *x86.Short, v7 *x86.Short, v8 *x86.Short, v9 *x86.Short, v10 *x86.Short, v11 *x86.Short, v12 *x86.Short, v13 *x86.Short, v14 *x86.Short, v15 *x86.Short) // Set packed 8-bit integers in "dst" with the supplied values.
// //go:linkname Mm256SetEpi8 Mm256SetEpi8 //go:noescape func Mm256SetEpi8(r *x86.M256I, v0 *x86.Char, v1 *x86.Char, v2 *x86.Char, v3 *x86.Char, v4 *x86.Char, v5 *x86.Char, v6 *x86.Char, v7 *x86.Char, v8 *x86.Char, v9 *x86.Char, v10 *x86.Char, v11 *x86.Char, v12 *x86.Char, v13 *x86.Char, v14 *x86.Char, v15 *x86.Char, v16 *x86.Char, v17 *x86.Char, v18 *x86.Char, v19 *x86.Char, v20 *x86.Char, v21 *x86.Char, v22 *x86.Char, v23 *x86.Char, v24 *x86.Char, v25 *x86.Char, v26 *x86.Char, v27 *x86.Char, v28 *x86.Char, v29 *x86.Char, v30 *x86.Char, v31 *x86.Char) // Set packed 64-bit integers in "dst" with the supplied values. // //go:linkname Mm256SetEpi64X Mm256SetEpi64X //go:noescape func Mm256SetEpi64X(r *x86.M256I, v0 *x86.Longlong, v1 *x86.Longlong, v2 *x86.Longlong, v3 *x86.Longlong) // Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values in reverse order. // //go:linkname Mm256SetrPd Mm256SetrPd //go:noescape func Mm256SetrPd(r *x86.M256D, v0 *x86.Double, v1 *x86.Double, v2 *x86.Double, v3 *x86.Double) // Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values in reverse order. // //go:linkname Mm256SetrPs Mm256SetrPs //go:noescape func Mm256SetrPs(r *x86.M256, v0 *x86.Float, v1 *x86.Float, v2 *x86.Float, v3 *x86.Float, v4 *x86.Float, v5 *x86.Float, v6 *x86.Float, v7 *x86.Float) // Set packed 32-bit integers in "dst" with the supplied values in reverse order. // //go:linkname Mm256SetrEpi32 Mm256SetrEpi32 //go:noescape func Mm256SetrEpi32(r *x86.M256I, v0 *x86.Int, v1 *x86.Int, v2 *x86.Int, v3 *x86.Int, v4 *x86.Int, v5 *x86.Int, v6 *x86.Int, v7 *x86.Int) // Set packed 16-bit integers in "dst" with the supplied values in reverse order. 
// //go:linkname Mm256SetrEpi16 Mm256SetrEpi16 //go:noescape func Mm256SetrEpi16(r *x86.M256I, v0 *x86.Short, v1 *x86.Short, v2 *x86.Short, v3 *x86.Short, v4 *x86.Short, v5 *x86.Short, v6 *x86.Short, v7 *x86.Short, v8 *x86.Short, v9 *x86.Short, v10 *x86.Short, v11 *x86.Short, v12 *x86.Short, v13 *x86.Short, v14 *x86.Short, v15 *x86.Short) // Set packed 8-bit integers in "dst" with the supplied values in reverse order. // //go:linkname Mm256SetrEpi8 Mm256SetrEpi8 //go:noescape func Mm256SetrEpi8(r *x86.M256I, v0 *x86.Char, v1 *x86.Char, v2 *x86.Char, v3 *x86.Char, v4 *x86.Char, v5 *x86.Char, v6 *x86.Char, v7 *x86.Char, v8 *x86.Char, v9 *x86.Char, v10 *x86.Char, v11 *x86.Char, v12 *x86.Char, v13 *x86.Char, v14 *x86.Char, v15 *x86.Char, v16 *x86.Char, v17 *x86.Char, v18 *x86.Char, v19 *x86.Char, v20 *x86.Char, v21 *x86.Char, v22 *x86.Char, v23 *x86.Char, v24 *x86.Char, v25 *x86.Char, v26 *x86.Char, v27 *x86.Char, v28 *x86.Char, v29 *x86.Char, v30 *x86.Char, v31 *x86.Char) // Set packed 64-bit integers in "dst" with the supplied values in reverse order. // //go:linkname Mm256SetrEpi64X Mm256SetrEpi64X //go:noescape func Mm256SetrEpi64X(r *x86.M256I, v0 *x86.Longlong, v1 *x86.Longlong, v2 *x86.Longlong, v3 *x86.Longlong) // Broadcast double-precision (64-bit) floating-point value "a" to all elements of "dst". // //go:linkname Mm256Set1Pd Mm256Set1Pd //go:noescape func Mm256Set1Pd(r *x86.M256D, v0 *x86.Double) // Broadcast single-precision (32-bit) floating-point value "a" to all elements of "dst". // //go:linkname Mm256Set1Ps Mm256Set1Ps //go:noescape func Mm256Set1Ps(r *x86.M256, v0 *x86.Float) // Broadcast 32-bit integer "a" to all elements of "dst". This intrinsic may generate the "vpbroadcastd" instruction. // //go:linkname Mm256Set1Epi32 Mm256Set1Epi32 //go:noescape func Mm256Set1Epi32(r *x86.M256I, v0 *x86.Int) // Broadcast 16-bit integer "a" to all elements of "dst". This intrinsic may generate the "vpbroadcastw" instruction.
// //go:linkname Mm256Set1Epi16 Mm256Set1Epi16 //go:noescape func Mm256Set1Epi16(r *x86.M256I, v0 *x86.Short) // Broadcast 8-bit integer "a" to all elements of "dst". This intrinsic may generate the "vpbroadcastb" instruction. // //go:linkname Mm256Set1Epi8 Mm256Set1Epi8 //go:noescape func Mm256Set1Epi8(r *x86.M256I, v0 *x86.Char) // Broadcast 64-bit integer "a" to all elements of "dst". This intrinsic may generate the "vpbroadcastq" instruction. // //go:linkname Mm256Set1Epi64X Mm256Set1Epi64X //go:noescape func Mm256Set1Epi64X(r *x86.M256I, v0 *x86.Longlong) // Return vector of type __m256d with all elements set to zero. // //go:linkname Mm256SetzeroPd Mm256SetzeroPd //go:noescape func Mm256SetzeroPd(r *x86.M256D) // Return vector of type __m256 with all elements set to zero. // //go:linkname Mm256SetzeroPs Mm256SetzeroPs //go:noescape func Mm256SetzeroPs(r *x86.M256) // Return vector of type __m256i with all elements set to zero. // //go:linkname Mm256SetzeroSi256 Mm256SetzeroSi256 //go:noescape func Mm256SetzeroSi256(r *x86.M256I) // Cast vector of type __m256d to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. // //go:linkname Mm256CastpdPs Mm256CastpdPs //go:noescape func Mm256CastpdPs(r *x86.M256, v0 *x86.M256D) // Cast vector of type __m256d to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. // //go:linkname Mm256CastpdSi256 Mm256CastpdSi256 //go:noescape func Mm256CastpdSi256(r *x86.M256I, v0 *x86.M256D) // Cast vector of type __m256 to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. // //go:linkname Mm256CastpsPd Mm256CastpsPd //go:noescape func Mm256CastpsPd(r *x86.M256D, v0 *x86.M256) // Cast vector of type __m256 to type __m256i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency.
// //go:linkname Mm256CastpsSi256 Mm256CastpsSi256 //go:noescape func Mm256CastpsSi256(r *x86.M256I, v0 *x86.M256) // Cast vector of type __m256i to type __m256. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. // //go:linkname Mm256Castsi256Ps Mm256Castsi256Ps //go:noescape func Mm256Castsi256Ps(r *x86.M256, v0 *x86.M256I) // Cast vector of type __m256i to type __m256d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. // //go:linkname Mm256Castsi256Pd Mm256Castsi256Pd //go:noescape func Mm256Castsi256Pd(r *x86.M256D, v0 *x86.M256I) // Cast vector of type __m256d to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. // //go:linkname Mm256Castpd256Pd128 Mm256Castpd256Pd128 //go:noescape func Mm256Castpd256Pd128(r *x86.M128D, v0 *x86.M256D) // Cast vector of type __m256 to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. // //go:linkname Mm256Castps256Ps128 Mm256Castps256Ps128 //go:noescape func Mm256Castps256Ps128(r *x86.M128, v0 *x86.M256) // Cast vector of type __m256i to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. // //go:linkname Mm256Castsi256Si128 Mm256Castsi256Si128 //go:noescape func Mm256Castsi256Si128(r *x86.M128I, v0 *x86.M256I) // Cast vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. // //go:linkname Mm256Castpd128Pd256 Mm256Castpd128Pd256 //go:noescape func Mm256Castpd128Pd256(r *x86.M256D, v0 *x86.M128D) // Cast vector of type __m128 to type __m256; the upper 128 bits of the result are undefined. 
This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. // //go:linkname Mm256Castps128Ps256 Mm256Castps128Ps256 //go:noescape func Mm256Castps128Ps256(r *x86.M256, v0 *x86.M128) // Cast vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. // //go:linkname Mm256Castsi128Si256 Mm256Castsi128Si256 //go:noescape func Mm256Castsi128Si256(r *x86.M256I, v0 *x86.M128I) // Cast vector of type __m128d to type __m256d; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. // //go:linkname Mm256Zextpd128Pd256 Mm256Zextpd128Pd256 //go:noescape func Mm256Zextpd128Pd256(r *x86.M256D, v0 *x86.M128D) // Cast vector of type __m128 to type __m256; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. // //go:linkname Mm256Zextps128Ps256 Mm256Zextps128Ps256 //go:noescape func Mm256Zextps128Ps256(r *x86.M256, v0 *x86.M128) // Cast vector of type __m128i to type __m256i; the upper 128 bits of the result are zeroed. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. // //go:linkname Mm256Zextsi128Si256 Mm256Zextsi128Si256 //go:noescape func Mm256Zextsi128Si256(r *x86.M256I, v0 *x86.M128I) // Set packed __m256 vector "dst" with the supplied values. // //go:linkname Mm256SetM128 Mm256SetM128 //go:noescape func Mm256SetM128(r *x86.M256, v0 *x86.M128, v1 *x86.M128) // Set packed __m256d vector "dst" with the supplied values. // //go:linkname Mm256SetM128D Mm256SetM128D //go:noescape func Mm256SetM128D(r *x86.M256D, v0 *x86.M128D, v1 *x86.M128D) // Set packed __m256i vector "dst" with the supplied values. 
// //go:linkname Mm256SetM128I Mm256SetM128I //go:noescape func Mm256SetM128I(r *x86.M256I, v0 *x86.M128I, v1 *x86.M128I) // Set packed __m256 vector "dst" with the supplied values. // //go:linkname Mm256SetrM128 Mm256SetrM128 //go:noescape func Mm256SetrM128(r *x86.M256, v0 *x86.M128, v1 *x86.M128) // Set packed __m256d vector "dst" with the supplied values. // //go:linkname Mm256SetrM128D Mm256SetrM128D //go:noescape func Mm256SetrM128D(r *x86.M256D, v0 *x86.M128D, v1 *x86.M128D) // Set packed __m256i vector "dst" with the supplied values. // //go:linkname Mm256SetrM128I Mm256SetrM128I //go:noescape func Mm256SetrM128I(r *x86.M256I, v0 *x86.M128I, v1 *x86.M128I) ================================================ FILE: x86/avx2/functions.c ================================================ #include void Mm256AbsEpi8(__m256i* r, __m256i* v0) { *r = _mm256_abs_epi8(*v0); } void Mm256AbsEpi16(__m256i* r, __m256i* v0) { *r = _mm256_abs_epi16(*v0); } void Mm256AbsEpi32(__m256i* r, __m256i* v0) { *r = _mm256_abs_epi32(*v0); } void Mm256PacksEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_packs_epi16(*v0, *v1); } void Mm256PacksEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_packs_epi32(*v0, *v1); } void Mm256PackusEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_packus_epi16(*v0, *v1); } void Mm256PackusEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_packus_epi32(*v0, *v1); } void Mm256AddEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_add_epi8(*v0, *v1); } void Mm256AddEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_add_epi16(*v0, *v1); } void Mm256AddEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_add_epi32(*v0, *v1); } void Mm256AddEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_add_epi64(*v0, *v1); } void Mm256AddsEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_adds_epi8(*v0, *v1); } void Mm256AddsEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_adds_epi16(*v0, *v1); } void 
Mm256AddsEpu8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_adds_epu8(*v0, *v1); } void Mm256AddsEpu16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_adds_epu16(*v0, *v1); } void Mm256AndSi256(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_and_si256(*v0, *v1); } void Mm256AndnotSi256(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_andnot_si256(*v0, *v1); } void Mm256AvgEpu8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_avg_epu8(*v0, *v1); } void Mm256AvgEpu16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_avg_epu16(*v0, *v1); } void Mm256BlendvEpi8(__m256i* r, __m256i* v0, __m256i* v1, __m256i* v2) { *r = _mm256_blendv_epi8(*v0, *v1, *v2); } void Mm256CmpeqEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpeq_epi8(*v0, *v1); } void Mm256CmpeqEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpeq_epi16(*v0, *v1); } void Mm256CmpeqEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpeq_epi32(*v0, *v1); } void Mm256CmpeqEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpeq_epi64(*v0, *v1); } void Mm256CmpgtEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpgt_epi8(*v0, *v1); } void Mm256CmpgtEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpgt_epi16(*v0, *v1); } void Mm256CmpgtEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpgt_epi32(*v0, *v1); } void Mm256CmpgtEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_cmpgt_epi64(*v0, *v1); } void Mm256HaddEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_hadd_epi16(*v0, *v1); } void Mm256HaddEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_hadd_epi32(*v0, *v1); } void Mm256HaddsEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_hadds_epi16(*v0, *v1); } void Mm256HsubEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_hsub_epi16(*v0, *v1); } void Mm256HsubEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_hsub_epi32(*v0, *v1); } void Mm256HsubsEpi16(__m256i* r, __m256i* v0, __m256i* 
v1) { *r = _mm256_hsubs_epi16(*v0, *v1); } void Mm256MaddubsEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_maddubs_epi16(*v0, *v1); } void Mm256MaddEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_madd_epi16(*v0, *v1); } void Mm256MaxEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_max_epi8(*v0, *v1); } void Mm256MaxEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_max_epi16(*v0, *v1); } void Mm256MaxEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_max_epi32(*v0, *v1); } void Mm256MaxEpu8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_max_epu8(*v0, *v1); } void Mm256MaxEpu16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_max_epu16(*v0, *v1); } void Mm256MaxEpu32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_max_epu32(*v0, *v1); } void Mm256MinEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_min_epi8(*v0, *v1); } void Mm256MinEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_min_epi16(*v0, *v1); } void Mm256MinEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_min_epi32(*v0, *v1); } void Mm256MinEpu8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_min_epu8(*v0, *v1); } void Mm256MinEpu16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_min_epu16(*v0, *v1); } void Mm256MinEpu32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_min_epu32(*v0, *v1); } void Mm256MovemaskEpi8(int* r, __m256i* v0) { *r = _mm256_movemask_epi8(*v0); } void Mm256Cvtepi8Epi16(__m256i* r, __m128i* v0) { *r = _mm256_cvtepi8_epi16(*v0); } void Mm256Cvtepi8Epi32(__m256i* r, __m128i* v0) { *r = _mm256_cvtepi8_epi32(*v0); } void Mm256Cvtepi8Epi64(__m256i* r, __m128i* v0) { *r = _mm256_cvtepi8_epi64(*v0); } void Mm256Cvtepi16Epi32(__m256i* r, __m128i* v0) { *r = _mm256_cvtepi16_epi32(*v0); } void Mm256Cvtepi16Epi64(__m256i* r, __m128i* v0) { *r = _mm256_cvtepi16_epi64(*v0); } void Mm256Cvtepi32Epi64(__m256i* r, __m128i* v0) { *r = _mm256_cvtepi32_epi64(*v0); } void Mm256Cvtepu8Epi16(__m256i* r, __m128i* v0) 
{ *r = _mm256_cvtepu8_epi16(*v0); } void Mm256Cvtepu8Epi32(__m256i* r, __m128i* v0) { *r = _mm256_cvtepu8_epi32(*v0); } void Mm256Cvtepu8Epi64(__m256i* r, __m128i* v0) { *r = _mm256_cvtepu8_epi64(*v0); } void Mm256Cvtepu16Epi32(__m256i* r, __m128i* v0) { *r = _mm256_cvtepu16_epi32(*v0); } void Mm256Cvtepu16Epi64(__m256i* r, __m128i* v0) { *r = _mm256_cvtepu16_epi64(*v0); } void Mm256Cvtepu32Epi64(__m256i* r, __m128i* v0) { *r = _mm256_cvtepu32_epi64(*v0); } void Mm256MulEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_mul_epi32(*v0, *v1); } void Mm256MulhrsEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_mulhrs_epi16(*v0, *v1); } void Mm256MulhiEpu16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_mulhi_epu16(*v0, *v1); } void Mm256MulhiEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_mulhi_epi16(*v0, *v1); } void Mm256MulloEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_mullo_epi16(*v0, *v1); } void Mm256MulloEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_mullo_epi32(*v0, *v1); } void Mm256MulEpu32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_mul_epu32(*v0, *v1); } void Mm256OrSi256(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_or_si256(*v0, *v1); } void Mm256SadEpu8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sad_epu8(*v0, *v1); } void Mm256ShuffleEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_shuffle_epi8(*v0, *v1); } void Mm256SignEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sign_epi8(*v0, *v1); } void Mm256SignEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sign_epi16(*v0, *v1); } void Mm256SignEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sign_epi32(*v0, *v1); } void Mm256SlliEpi16(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_slli_epi16(*v0, *v1); } void Mm256SllEpi16(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_sll_epi16(*v0, *v1); } void Mm256SlliEpi32(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_slli_epi32(*v0, *v1); } void 
Mm256SllEpi32(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_sll_epi32(*v0, *v1); } void Mm256SlliEpi64(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_slli_epi64(*v0, *v1); } void Mm256SllEpi64(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_sll_epi64(*v0, *v1); } void Mm256SraiEpi16(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_srai_epi16(*v0, *v1); } void Mm256SraEpi16(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_sra_epi16(*v0, *v1); } void Mm256SraiEpi32(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_srai_epi32(*v0, *v1); } void Mm256SraEpi32(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_sra_epi32(*v0, *v1); } void Mm256SrliEpi16(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_srli_epi16(*v0, *v1); } void Mm256SrlEpi16(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_srl_epi16(*v0, *v1); } void Mm256SrliEpi32(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_srli_epi32(*v0, *v1); } void Mm256SrlEpi32(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_srl_epi32(*v0, *v1); } void Mm256SrliEpi64(__m256i* r, __m256i* v0, int* v1) { *r = _mm256_srli_epi64(*v0, *v1); } void Mm256SrlEpi64(__m256i* r, __m256i* v0, __m128i* v1) { *r = _mm256_srl_epi64(*v0, *v1); } void Mm256SubEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sub_epi8(*v0, *v1); } void Mm256SubEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sub_epi16(*v0, *v1); } void Mm256SubEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sub_epi32(*v0, *v1); } void Mm256SubEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sub_epi64(*v0, *v1); } void Mm256SubsEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_subs_epi8(*v0, *v1); } void Mm256SubsEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_subs_epi16(*v0, *v1); } void Mm256SubsEpu8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_subs_epu8(*v0, *v1); } void Mm256SubsEpu16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_subs_epu16(*v0, *v1); } void Mm256UnpackhiEpi8(__m256i* r, 
__m256i* v0, __m256i* v1) { *r = _mm256_unpackhi_epi8(*v0, *v1); } void Mm256UnpackhiEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpackhi_epi16(*v0, *v1); } void Mm256UnpackhiEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpackhi_epi32(*v0, *v1); } void Mm256UnpackhiEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpackhi_epi64(*v0, *v1); } void Mm256UnpackloEpi8(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpacklo_epi8(*v0, *v1); } void Mm256UnpackloEpi16(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpacklo_epi16(*v0, *v1); } void Mm256UnpackloEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpacklo_epi32(*v0, *v1); } void Mm256UnpackloEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_unpacklo_epi64(*v0, *v1); } void Mm256XorSi256(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_xor_si256(*v0, *v1); } void MmBroadcastssPs(__m128* r, __m128* v0) { *r = _mm_broadcastss_ps(*v0); } void MmBroadcastsdPd(__m128d* r, __m128d* v0) { *r = _mm_broadcastsd_pd(*v0); } void Mm256BroadcastssPs(__m256* r, __m128* v0) { *r = _mm256_broadcastss_ps(*v0); } void Mm256BroadcastsdPd(__m256d* r, __m128d* v0) { *r = _mm256_broadcastsd_pd(*v0); } void Mm256Broadcastsi128Si256(__m256i* r, __m128i* v0) { *r = _mm256_broadcastsi128_si256(*v0); } void Mm256BroadcastbEpi8(__m256i* r, __m128i* v0) { *r = _mm256_broadcastb_epi8(*v0); } void Mm256BroadcastwEpi16(__m256i* r, __m128i* v0) { *r = _mm256_broadcastw_epi16(*v0); } void Mm256BroadcastdEpi32(__m256i* r, __m128i* v0) { *r = _mm256_broadcastd_epi32(*v0); } void Mm256BroadcastqEpi64(__m256i* r, __m128i* v0) { *r = _mm256_broadcastq_epi64(*v0); } void MmBroadcastbEpi8(__m128i* r, __m128i* v0) { *r = _mm_broadcastb_epi8(*v0); } void MmBroadcastwEpi16(__m128i* r, __m128i* v0) { *r = _mm_broadcastw_epi16(*v0); } void MmBroadcastdEpi32(__m128i* r, __m128i* v0) { *r = _mm_broadcastd_epi32(*v0); } void MmBroadcastqEpi64(__m128i* r, __m128i* v0) { *r = 
_mm_broadcastq_epi64(*v0); } void Mm256Permutevar8X32Epi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_permutevar8x32_epi32(*v0, *v1); } void Mm256Permutevar8X32Ps(__m256* r, __m256* v0, __m256i* v1) { *r = _mm256_permutevar8x32_ps(*v0, *v1); } void Mm256SllvEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sllv_epi32(*v0, *v1); } void MmSllvEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sllv_epi32(*v0, *v1); } void Mm256SllvEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_sllv_epi64(*v0, *v1); } void MmSllvEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sllv_epi64(*v0, *v1); } void Mm256SravEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_srav_epi32(*v0, *v1); } void MmSravEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_srav_epi32(*v0, *v1); } void Mm256SrlvEpi32(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_srlv_epi32(*v0, *v1); } void MmSrlvEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_srlv_epi32(*v0, *v1); } void Mm256SrlvEpi64(__m256i* r, __m256i* v0, __m256i* v1) { *r = _mm256_srlv_epi64(*v0, *v1); } void MmSrlvEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_srlv_epi64(*v0, *v1); } ================================================ FILE: x86/avx2/functions.go ================================================ package avx2 import ( "github.com/alivanz/go-simd/x86" ) /* #cgo CFLAGS: -mavx2 #include */ import "C" // Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst". // //go:linkname Mm256AbsEpi8 Mm256AbsEpi8 //go:noescape func Mm256AbsEpi8(r *x86.M256I, v0 *x86.M256I) // Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst". // //go:linkname Mm256AbsEpi16 Mm256AbsEpi16 //go:noescape func Mm256AbsEpi16(r *x86.M256I, v0 *x86.M256I) // Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst". 
// //go:linkname Mm256AbsEpi32 Mm256AbsEpi32 //go:noescape func Mm256AbsEpi32(r *x86.M256I, v0 *x86.M256I) // Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst". // //go:linkname Mm256PacksEpi16 Mm256PacksEpi16 //go:noescape func Mm256PacksEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst". // //go:linkname Mm256PacksEpi32 Mm256PacksEpi32 //go:noescape func Mm256PacksEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst". // //go:linkname Mm256PackusEpi16 Mm256PackusEpi16 //go:noescape func Mm256PackusEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using unsigned saturation, and store the results in "dst". // //go:linkname Mm256PackusEpi32 Mm256PackusEpi32 //go:noescape func Mm256PackusEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Add packed 8-bit integers in "a" and "b", and store the results in "dst". // //go:linkname Mm256AddEpi8 Mm256AddEpi8 //go:noescape func Mm256AddEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Add packed 16-bit integers in "a" and "b", and store the results in "dst". // //go:linkname Mm256AddEpi16 Mm256AddEpi16 //go:noescape func Mm256AddEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Add packed 32-bit integers in "a" and "b", and store the results in "dst". // //go:linkname Mm256AddEpi32 Mm256AddEpi32 //go:noescape func Mm256AddEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Add packed 64-bit integers in "a" and "b", and store the results in "dst". 
// //go:linkname Mm256AddEpi64 Mm256AddEpi64 //go:noescape func Mm256AddEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Add packed 8-bit integers in "a" and "b" using saturation, and store the results in "dst". // //go:linkname Mm256AddsEpi8 Mm256AddsEpi8 //go:noescape func Mm256AddsEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Add packed 16-bit integers in "a" and "b" using saturation, and store the results in "dst". // //go:linkname Mm256AddsEpi16 Mm256AddsEpi16 //go:noescape func Mm256AddsEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst". // //go:linkname Mm256AddsEpu8 Mm256AddsEpu8 //go:noescape func Mm256AddsEpu8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst". // //go:linkname Mm256AddsEpu16 Mm256AddsEpu16 //go:noescape func Mm256AddsEpu16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compute the bitwise AND of 256 bits (representing integer data) in "a" and "b", and store the result in "dst". // //go:linkname Mm256AndSi256 Mm256AndSi256 //go:noescape func Mm256AndSi256(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compute the bitwise NOT of 256 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst". // //go:linkname Mm256AndnotSi256 Mm256AndnotSi256 //go:noescape func Mm256AndnotSi256(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst". // //go:linkname Mm256AvgEpu8 Mm256AvgEpu8 //go:noescape func Mm256AvgEpu8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst". 
// //go:linkname Mm256AvgEpu16 Mm256AvgEpu16 //go:noescape func Mm256AvgEpu16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Blend packed 8-bit integers from "a" and "b" using "mask", and store the results in "dst". // //go:linkname Mm256BlendvEpi8 Mm256BlendvEpi8 //go:noescape func Mm256BlendvEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I, v2 *x86.M256I) // Compare packed 8-bit integers in "a" and "b" for equality, and store the results in "dst". // //go:linkname Mm256CmpeqEpi8 Mm256CmpeqEpi8 //go:noescape func Mm256CmpeqEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compare packed 16-bit integers in "a" and "b" for equality, and store the results in "dst". // //go:linkname Mm256CmpeqEpi16 Mm256CmpeqEpi16 //go:noescape func Mm256CmpeqEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compare packed 32-bit integers in "a" and "b" for equality, and store the results in "dst". // //go:linkname Mm256CmpeqEpi32 Mm256CmpeqEpi32 //go:noescape func Mm256CmpeqEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compare packed 64-bit integers in "a" and "b" for equality, and store the results in "dst". // //go:linkname Mm256CmpeqEpi64 Mm256CmpeqEpi64 //go:noescape func Mm256CmpeqEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in "dst". // //go:linkname Mm256CmpgtEpi8 Mm256CmpgtEpi8 //go:noescape func Mm256CmpgtEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in "dst". // //go:linkname Mm256CmpgtEpi16 Mm256CmpgtEpi16 //go:noescape func Mm256CmpgtEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in "dst". 
// //go:linkname Mm256CmpgtEpi32 Mm256CmpgtEpi32 //go:noescape func Mm256CmpgtEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compare packed signed 64-bit integers in "a" and "b" for greater-than, and store the results in "dst". // //go:linkname Mm256CmpgtEpi64 Mm256CmpgtEpi64 //go:noescape func Mm256CmpgtEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Horizontally add adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst". // //go:linkname Mm256HaddEpi16 Mm256HaddEpi16 //go:noescape func Mm256HaddEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Horizontally add adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst". // //go:linkname Mm256HaddEpi32 Mm256HaddEpi32 //go:noescape func Mm256HaddEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Horizontally add adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst". // //go:linkname Mm256HaddsEpi16 Mm256HaddsEpi16 //go:noescape func Mm256HaddsEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Horizontally subtract adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst". // //go:linkname Mm256HsubEpi16 Mm256HsubEpi16 //go:noescape func Mm256HsubEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Horizontally subtract adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst". // //go:linkname Mm256HsubEpi32 Mm256HsubEpi32 //go:noescape func Mm256HsubEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Horizontally subtract adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst". 
// //go:linkname Mm256HsubsEpi16 Mm256HsubsEpi16 //go:noescape func Mm256HsubsEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Vertically multiply each unsigned 8-bit integer from "a" with the corresponding signed 8-bit integer from "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst". // //go:linkname Mm256MaddubsEpi16 Mm256MaddubsEpi16 //go:noescape func Mm256MaddubsEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst". // //go:linkname Mm256MaddEpi16 Mm256MaddEpi16 //go:noescape func Mm256MaddEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compare packed signed 8-bit integers in "a" and "b", and store packed maximum values in "dst". // //go:linkname Mm256MaxEpi8 Mm256MaxEpi8 //go:noescape func Mm256MaxEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst". // //go:linkname Mm256MaxEpi16 Mm256MaxEpi16 //go:noescape func Mm256MaxEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compare packed signed 32-bit integers in "a" and "b", and store packed maximum values in "dst". // //go:linkname Mm256MaxEpi32 Mm256MaxEpi32 //go:noescape func Mm256MaxEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst". // //go:linkname Mm256MaxEpu8 Mm256MaxEpu8 //go:noescape func Mm256MaxEpu8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compare packed unsigned 16-bit integers in "a" and "b", and store packed maximum values in "dst". 
// //go:linkname Mm256MaxEpu16 Mm256MaxEpu16 //go:noescape func Mm256MaxEpu16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compare packed unsigned 32-bit integers in "a" and "b", and store packed maximum values in "dst". // //go:linkname Mm256MaxEpu32 Mm256MaxEpu32 //go:noescape func Mm256MaxEpu32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compare packed signed 8-bit integers in "a" and "b", and store packed minimum values in "dst". // //go:linkname Mm256MinEpi8 Mm256MinEpi8 //go:noescape func Mm256MinEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst". // //go:linkname Mm256MinEpi16 Mm256MinEpi16 //go:noescape func Mm256MinEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compare packed signed 32-bit integers in "a" and "b", and store packed minimum values in "dst". // //go:linkname Mm256MinEpi32 Mm256MinEpi32 //go:noescape func Mm256MinEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst". // //go:linkname Mm256MinEpu8 Mm256MinEpu8 //go:noescape func Mm256MinEpu8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compare packed unsigned 16-bit integers in "a" and "b", and store packed minimum values in "dst". // //go:linkname Mm256MinEpu16 Mm256MinEpu16 //go:noescape func Mm256MinEpu16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compare packed unsigned 32-bit integers in "a" and "b", and store packed minimum values in "dst". // //go:linkname Mm256MinEpu32 Mm256MinEpu32 //go:noescape func Mm256MinEpu32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Create mask from the most significant bit of each 8-bit element in "a", and store the result in "dst". // //go:linkname Mm256MovemaskEpi8 Mm256MovemaskEpi8 //go:noescape func Mm256MovemaskEpi8(r *x86.Int, v0 *x86.M256I) // Sign extend packed 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst". 
// //go:linkname Mm256Cvtepi8Epi16 Mm256Cvtepi8Epi16 //go:noescape func Mm256Cvtepi8Epi16(r *x86.M256I, v0 *x86.M128I) // Sign extend packed 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst". // //go:linkname Mm256Cvtepi8Epi32 Mm256Cvtepi8Epi32 //go:noescape func Mm256Cvtepi8Epi32(r *x86.M256I, v0 *x86.M128I) // Sign extend packed 8-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst". // //go:linkname Mm256Cvtepi8Epi64 Mm256Cvtepi8Epi64 //go:noescape func Mm256Cvtepi8Epi64(r *x86.M256I, v0 *x86.M128I) // Sign extend packed 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst". // //go:linkname Mm256Cvtepi16Epi32 Mm256Cvtepi16Epi32 //go:noescape func Mm256Cvtepi16Epi32(r *x86.M256I, v0 *x86.M128I) // Sign extend packed 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst". // //go:linkname Mm256Cvtepi16Epi64 Mm256Cvtepi16Epi64 //go:noescape func Mm256Cvtepi16Epi64(r *x86.M256I, v0 *x86.M128I) // Sign extend packed 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst". // //go:linkname Mm256Cvtepi32Epi64 Mm256Cvtepi32Epi64 //go:noescape func Mm256Cvtepi32Epi64(r *x86.M256I, v0 *x86.M128I) // Zero extend packed unsigned 8-bit integers in "a" to packed 16-bit integers, and store the results in "dst". // //go:linkname Mm256Cvtepu8Epi16 Mm256Cvtepu8Epi16 //go:noescape func Mm256Cvtepu8Epi16(r *x86.M256I, v0 *x86.M128I) // Zero extend packed unsigned 8-bit integers in "a" to packed 32-bit integers, and store the results in "dst". // //go:linkname Mm256Cvtepu8Epi32 Mm256Cvtepu8Epi32 //go:noescape func Mm256Cvtepu8Epi32(r *x86.M256I, v0 *x86.M128I) // Zero extend packed unsigned 8-bit integers in the low 4 bytes of "a" to packed 64-bit integers, and store the results in "dst". 
// //go:linkname Mm256Cvtepu8Epi64 Mm256Cvtepu8Epi64 //go:noescape func Mm256Cvtepu8Epi64(r *x86.M256I, v0 *x86.M128I) // Zero extend packed unsigned 16-bit integers in "a" to packed 32-bit integers, and store the results in "dst". // //go:linkname Mm256Cvtepu16Epi32 Mm256Cvtepu16Epi32 //go:noescape func Mm256Cvtepu16Epi32(r *x86.M256I, v0 *x86.M128I) // Zero extend packed unsigned 16-bit integers in "a" to packed 64-bit integers, and store the results in "dst". // //go:linkname Mm256Cvtepu16Epi64 Mm256Cvtepu16Epi64 //go:noescape func Mm256Cvtepu16Epi64(r *x86.M256I, v0 *x86.M128I) // Zero extend packed unsigned 32-bit integers in "a" to packed 64-bit integers, and store the results in "dst". // //go:linkname Mm256Cvtepu32Epi64 Mm256Cvtepu32Epi64 //go:noescape func Mm256Cvtepu32Epi64(r *x86.M256I, v0 *x86.M128I) // Multiply the low signed 32-bit integers from each packed 64-bit element in "a" and "b", and store the signed 64-bit results in "dst". // //go:linkname Mm256MulEpi32 Mm256MulEpi32 //go:noescape func Mm256MulEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst". // //go:linkname Mm256MulhrsEpi16 Mm256MulhrsEpi16 //go:noescape func Mm256MulhrsEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst". // //go:linkname Mm256MulhiEpu16 Mm256MulhiEpu16 //go:noescape func Mm256MulhiEpu16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst". 
// //go:linkname Mm256MulhiEpi16 Mm256MulhiEpi16 //go:noescape func Mm256MulhiEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst". // //go:linkname Mm256MulloEpi16 Mm256MulloEpi16 //go:noescape func Mm256MulloEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Multiply the packed signed 32-bit integers in "a" and "b", producing intermediate 64-bit integers, and store the low 32 bits of the intermediate integers in "dst". // //go:linkname Mm256MulloEpi32 Mm256MulloEpi32 //go:noescape func Mm256MulloEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst". // //go:linkname Mm256MulEpu32 Mm256MulEpu32 //go:noescape func Mm256MulEpu32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compute the bitwise OR of 256 bits (representing integer data) in "a" and "b", and store the result in "dst". // //go:linkname Mm256OrSi256 Mm256OrSi256 //go:noescape func Mm256OrSi256(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum each consecutive 8 differences to produce four unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in "dst". // //go:linkname Mm256SadEpu8 Mm256SadEpu8 //go:noescape func Mm256SadEpu8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Shuffle 8-bit integers in "a" within 128-bit lanes according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst". 
// //go:linkname Mm256ShuffleEpi8 Mm256ShuffleEpi8 //go:noescape func Mm256ShuffleEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Negate packed signed 8-bit integers in "a" when the corresponding signed 8-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero. // //go:linkname Mm256SignEpi8 Mm256SignEpi8 //go:noescape func Mm256SignEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Negate packed signed 16-bit integers in "a" when the corresponding signed 16-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero. // //go:linkname Mm256SignEpi16 Mm256SignEpi16 //go:noescape func Mm256SignEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Negate packed signed 32-bit integers in "a" when the corresponding signed 32-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero. // //go:linkname Mm256SignEpi32 Mm256SignEpi32 //go:noescape func Mm256SignEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". // //go:linkname Mm256SlliEpi16 Mm256SlliEpi16 //go:noescape func Mm256SlliEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int) // Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". // //go:linkname Mm256SllEpi16 Mm256SllEpi16 //go:noescape func Mm256SllEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I) // Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". // //go:linkname Mm256SlliEpi32 Mm256SlliEpi32 //go:noescape func Mm256SlliEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int) // Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". 
// //go:linkname Mm256SllEpi32 Mm256SllEpi32 //go:noescape func Mm256SllEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I) // Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". // //go:linkname Mm256SlliEpi64 Mm256SlliEpi64 //go:noescape func Mm256SlliEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int) // Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". // //go:linkname Mm256SllEpi64 Mm256SllEpi64 //go:noescape func Mm256SllEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I) // Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst". // //go:linkname Mm256SraiEpi16 Mm256SraiEpi16 //go:noescape func Mm256SraiEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int) // Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst". // //go:linkname Mm256SraEpi16 Mm256SraEpi16 //go:noescape func Mm256SraEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I) // Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst". // //go:linkname Mm256SraiEpi32 Mm256SraiEpi32 //go:noescape func Mm256SraiEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int) // Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst". // //go:linkname Mm256SraEpi32 Mm256SraEpi32 //go:noescape func Mm256SraEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I) // Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". // //go:linkname Mm256SrliEpi16 Mm256SrliEpi16 //go:noescape func Mm256SrliEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int) // Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". 
// //go:linkname Mm256SrlEpi16 Mm256SrlEpi16 //go:noescape func Mm256SrlEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I) // Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". // //go:linkname Mm256SrliEpi32 Mm256SrliEpi32 //go:noescape func Mm256SrliEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int) // Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". // //go:linkname Mm256SrlEpi32 Mm256SrlEpi32 //go:noescape func Mm256SrlEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I) // Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". // //go:linkname Mm256SrliEpi64 Mm256SrliEpi64 //go:noescape func Mm256SrliEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.Int) // Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". // //go:linkname Mm256SrlEpi64 Mm256SrlEpi64 //go:noescape func Mm256SrlEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M128I) // Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst". // //go:linkname Mm256SubEpi8 Mm256SubEpi8 //go:noescape func Mm256SubEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst". // //go:linkname Mm256SubEpi16 Mm256SubEpi16 //go:noescape func Mm256SubEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst". // //go:linkname Mm256SubEpi32 Mm256SubEpi32 //go:noescape func Mm256SubEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst". 
// //go:linkname Mm256SubEpi64 Mm256SubEpi64 //go:noescape func Mm256SubEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Subtract packed signed 8-bit integers in "b" from packed 8-bit integers in "a" using saturation, and store the results in "dst". // //go:linkname Mm256SubsEpi8 Mm256SubsEpi8 //go:noescape func Mm256SubsEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Subtract packed signed 16-bit integers in "b" from packed 16-bit integers in "a" using saturation, and store the results in "dst". // //go:linkname Mm256SubsEpi16 Mm256SubsEpi16 //go:noescape func Mm256SubsEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst". // //go:linkname Mm256SubsEpu8 Mm256SubsEpu8 //go:noescape func Mm256SubsEpu8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst". // //go:linkname Mm256SubsEpu16 Mm256SubsEpu16 //go:noescape func Mm256SubsEpu16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Unpack and interleave 8-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst". // //go:linkname Mm256UnpackhiEpi8 Mm256UnpackhiEpi8 //go:noescape func Mm256UnpackhiEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Unpack and interleave 16-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst". // //go:linkname Mm256UnpackhiEpi16 Mm256UnpackhiEpi16 //go:noescape func Mm256UnpackhiEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Unpack and interleave 32-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst". 
// //go:linkname Mm256UnpackhiEpi32 Mm256UnpackhiEpi32 //go:noescape func Mm256UnpackhiEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Unpack and interleave 64-bit integers from the high half of each 128-bit lane in "a" and "b", and store the results in "dst". // //go:linkname Mm256UnpackhiEpi64 Mm256UnpackhiEpi64 //go:noescape func Mm256UnpackhiEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Unpack and interleave 8-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst". // //go:linkname Mm256UnpackloEpi8 Mm256UnpackloEpi8 //go:noescape func Mm256UnpackloEpi8(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Unpack and interleave 16-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst". // //go:linkname Mm256UnpackloEpi16 Mm256UnpackloEpi16 //go:noescape func Mm256UnpackloEpi16(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Unpack and interleave 32-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst". // //go:linkname Mm256UnpackloEpi32 Mm256UnpackloEpi32 //go:noescape func Mm256UnpackloEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Unpack and interleave 64-bit integers from the low half of each 128-bit lane in "a" and "b", and store the results in "dst". // //go:linkname Mm256UnpackloEpi64 Mm256UnpackloEpi64 //go:noescape func Mm256UnpackloEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Compute the bitwise XOR of 256 bits (representing integer data) in "a" and "b", and store the result in "dst". // //go:linkname Mm256XorSi256 Mm256XorSi256 //go:noescape func Mm256XorSi256(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst". 
// //go:linkname MmBroadcastssPs MmBroadcastssPs //go:noescape func MmBroadcastssPs(r *x86.M128, v0 *x86.M128) // Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst". // //go:linkname MmBroadcastsdPd MmBroadcastsdPd //go:noescape func MmBroadcastsdPd(r *x86.M128D, v0 *x86.M128D) // Broadcast the low single-precision (32-bit) floating-point element from "a" to all elements of "dst". // //go:linkname Mm256BroadcastssPs Mm256BroadcastssPs //go:noescape func Mm256BroadcastssPs(r *x86.M256, v0 *x86.M128) // Broadcast the low double-precision (64-bit) floating-point element from "a" to all elements of "dst". // //go:linkname Mm256BroadcastsdPd Mm256BroadcastsdPd //go:noescape func Mm256BroadcastsdPd(r *x86.M256D, v0 *x86.M128D) // Broadcast 128 bits of integer data from "a" to all 128-bit lanes in "dst". // //go:linkname Mm256Broadcastsi128Si256 Mm256Broadcastsi128Si256 //go:noescape func Mm256Broadcastsi128Si256(r *x86.M256I, v0 *x86.M128I) // Broadcast the low packed 8-bit integer from "a" to all elements of "dst". // //go:linkname Mm256BroadcastbEpi8 Mm256BroadcastbEpi8 //go:noescape func Mm256BroadcastbEpi8(r *x86.M256I, v0 *x86.M128I) // Broadcast the low packed 16-bit integer from "a" to all elements of "dst". // //go:linkname Mm256BroadcastwEpi16 Mm256BroadcastwEpi16 //go:noescape func Mm256BroadcastwEpi16(r *x86.M256I, v0 *x86.M128I) // Broadcast the low packed 32-bit integer from "a" to all elements of "dst". // //go:linkname Mm256BroadcastdEpi32 Mm256BroadcastdEpi32 //go:noescape func Mm256BroadcastdEpi32(r *x86.M256I, v0 *x86.M128I) // Broadcast the low packed 64-bit integer from "a" to all elements of "dst". // //go:linkname Mm256BroadcastqEpi64 Mm256BroadcastqEpi64 //go:noescape func Mm256BroadcastqEpi64(r *x86.M256I, v0 *x86.M128I) // Broadcast the low packed 8-bit integer from "a" to all elements of "dst". 
// //go:linkname MmBroadcastbEpi8 MmBroadcastbEpi8 //go:noescape func MmBroadcastbEpi8(r *x86.M128I, v0 *x86.M128I) // Broadcast the low packed 16-bit integer from "a" to all elements of "dst". // //go:linkname MmBroadcastwEpi16 MmBroadcastwEpi16 //go:noescape func MmBroadcastwEpi16(r *x86.M128I, v0 *x86.M128I) // Broadcast the low packed 32-bit integer from "a" to all elements of "dst". // //go:linkname MmBroadcastdEpi32 MmBroadcastdEpi32 //go:noescape func MmBroadcastdEpi32(r *x86.M128I, v0 *x86.M128I) // Broadcast the low packed 64-bit integer from "a" to all elements of "dst". // //go:linkname MmBroadcastqEpi64 MmBroadcastqEpi64 //go:noescape func MmBroadcastqEpi64(r *x86.M128I, v0 *x86.M128I) // Shuffle 32-bit integers in "a" across lanes using the corresponding index in "idx", and store the results in "dst". // //go:linkname Mm256Permutevar8X32Epi32 Mm256Permutevar8X32Epi32 //go:noescape func Mm256Permutevar8X32Epi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Shuffle single-precision (32-bit) floating-point elements in "a" across lanes using the corresponding index in "idx". // //go:linkname Mm256Permutevar8X32Ps Mm256Permutevar8X32Ps //go:noescape func Mm256Permutevar8X32Ps(r *x86.M256, v0 *x86.M256, v1 *x86.M256I) // Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". // //go:linkname Mm256SllvEpi32 Mm256SllvEpi32 //go:noescape func Mm256SllvEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Shift packed 32-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". // //go:linkname MmSllvEpi32 MmSllvEpi32 //go:noescape func MmSllvEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". 
// //go:linkname Mm256SllvEpi64 Mm256SllvEpi64 //go:noescape func Mm256SllvEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Shift packed 64-bit integers in "a" left by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". // //go:linkname MmSllvEpi64 MmSllvEpi64 //go:noescape func MmSllvEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst". // //go:linkname Mm256SravEpi32 Mm256SravEpi32 //go:noescape func Mm256SravEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in sign bits, and store the results in "dst". // //go:linkname MmSravEpi32 MmSravEpi32 //go:noescape func MmSravEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". // //go:linkname Mm256SrlvEpi32 Mm256SrlvEpi32 //go:noescape func Mm256SrlvEpi32(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Shift packed 32-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". // //go:linkname MmSrlvEpi32 MmSrlvEpi32 //go:noescape func MmSrlvEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". 
// //go:linkname Mm256SrlvEpi64 Mm256SrlvEpi64 //go:noescape func Mm256SrlvEpi64(r *x86.M256I, v0 *x86.M256I, v1 *x86.M256I) // Shift packed 64-bit integers in "a" right by the amount specified by the corresponding element in "count" while shifting in zeros, and store the results in "dst". // //go:linkname MmSrlvEpi64 MmSrlvEpi64 //go:noescape func MmSrlvEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) ================================================ FILE: x86/bmi/functions.c ================================================ #include void AndnU32(unsigned int* r, unsigned int* v0, unsigned int* v1) { *r = __andn_u32(*v0, *v1); } void BextrU32(unsigned int* r, unsigned int* v0, unsigned int* v1) { *r = __bextr_u32(*v0, *v1); } void BextrU32(unsigned int* r, unsigned int* v0, unsigned int* v1, unsigned int* v2) { *r = _bextr_u32(*v0, *v1, *v2); } void Bextr2U32(unsigned int* r, unsigned int* v0, unsigned int* v1) { *r = _bextr2_u32(*v0, *v1); } void BlsiU32(unsigned int* r, unsigned int* v0) { *r = __blsi_u32(*v0); } void BlsmskU32(unsigned int* r, unsigned int* v0) { *r = __blsmsk_u32(*v0); } void BlsrU32(unsigned int* r, unsigned int* v0) { *r = __blsr_u32(*v0); } void AndnU64(unsigned long long* r, unsigned long long* v0, unsigned long long* v1) { *r = __andn_u64(*v0, *v1); } void BextrU64(unsigned long long* r, unsigned long long* v0, unsigned long long* v1) { *r = __bextr_u64(*v0, *v1); } void BextrU64(unsigned long long* r, unsigned long long* v0, unsigned int* v1, unsigned int* v2) { *r = _bextr_u64(*v0, *v1, *v2); } void Bextr2U64(unsigned long long* r, unsigned long long* v0, unsigned long long* v1) { *r = _bextr2_u64(*v0, *v1); } void BlsiU64(unsigned long long* r, unsigned long long* v0) { *r = __blsi_u64(*v0); } void BlsmskU64(unsigned long long* r, unsigned long long* v0) { *r = __blsmsk_u64(*v0); } void BlsrU64(unsigned long long* r, unsigned long long* v0) { *r = __blsr_u64(*v0); } ================================================ FILE: 
x86/bmi/functions.go ================================================ package bmi import ( "github.com/alivanz/go-simd/x86" ) /* #cgo CFLAGS: -mbmi #include */ import "C" // __andn_u32 // //go:linkname AndnU32 AndnU32 //go:noescape func AndnU32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint) // __bextr_u32 // //go:linkname BextrU32 BextrU32 //go:noescape func BextrU32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint) // Extract contiguous bits from unsigned 32-bit integer "a", and store the result in "dst". Extract the number of bits specified by "len", starting at the bit specified by "start". // //go:linkname BextrU32 BextrU32 //go:noescape func BextrU32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint, v2 *x86.Uint) // Extract contiguous bits from unsigned 32-bit integer "a", and store the result in "dst". Extract the number of bits specified by bits 15:8 of "control", starting at the bit specified by bits 0:7 of "control". // //go:linkname Bextr2U32 Bextr2U32 //go:noescape func Bextr2U32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint) // __blsi_u32 // //go:linkname BlsiU32 BlsiU32 //go:noescape func BlsiU32(r *x86.Uint, v0 *x86.Uint) // __blsmsk_u32 // //go:linkname BlsmskU32 BlsmskU32 //go:noescape func BlsmskU32(r *x86.Uint, v0 *x86.Uint) // __blsr_u32 // //go:linkname BlsrU32 BlsrU32 //go:noescape func BlsrU32(r *x86.Uint, v0 *x86.Uint) // __andn_u64 // //go:linkname AndnU64 AndnU64 //go:noescape func AndnU64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Ulonglong) // __bextr_u64 // //go:linkname BextrU64 BextrU64 //go:noescape func BextrU64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Ulonglong) // Extract contiguous bits from unsigned 64-bit integer "a", and store the result in "dst". Extract the number of bits specified by "len", starting at the bit specified by "start". // //go:linkname BextrU64 BextrU64 //go:noescape func BextrU64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Uint, v2 *x86.Uint) // Extract contiguous bits from unsigned 64-bit integer "a", and store the result in "dst". 
Extract the number of bits specified by bits 15:8 of "control", starting at the bit specified by bits 0:7 of "control". // //go:linkname Bextr2U64 Bextr2U64 //go:noescape func Bextr2U64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Ulonglong) // __blsi_u64 // //go:linkname BlsiU64 BlsiU64 //go:noescape func BlsiU64(r *x86.Ulonglong, v0 *x86.Ulonglong) // __blsmsk_u64 // //go:linkname BlsmskU64 BlsmskU64 //go:noescape func BlsmskU64(r *x86.Ulonglong, v0 *x86.Ulonglong) // __blsr_u64 // //go:linkname BlsrU64 BlsrU64 //go:noescape func BlsrU64(r *x86.Ulonglong, v0 *x86.Ulonglong) ================================================ FILE: x86/bmi2/functions.c ================================================ #include void BzhiU32(unsigned int* r, unsigned int* v0, unsigned int* v1) { *r = _bzhi_u32(*v0, *v1); } void PdepU32(unsigned int* r, unsigned int* v0, unsigned int* v1) { *r = _pdep_u32(*v0, *v1); } void PextU32(unsigned int* r, unsigned int* v0, unsigned int* v1) { *r = _pext_u32(*v0, *v1); } void BzhiU64(unsigned long long* r, unsigned long long* v0, unsigned long long* v1) { *r = _bzhi_u64(*v0, *v1); } void PdepU64(unsigned long long* r, unsigned long long* v0, unsigned long long* v1) { *r = _pdep_u64(*v0, *v1); } void PextU64(unsigned long long* r, unsigned long long* v0, unsigned long long* v1) { *r = _pext_u64(*v0, *v1); } ================================================ FILE: x86/bmi2/functions.go ================================================ package bmi2 import ( "github.com/alivanz/go-simd/x86" ) /* #cgo CFLAGS: -mbmi2 #include */ import "C" // Copy all bits from unsigned 32-bit integer "a" to "dst", and reset (set to 0) the high bits in "dst" starting at "index". // //go:linkname BzhiU32 BzhiU32 //go:noescape func BzhiU32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint) // Deposit contiguous low bits from unsigned 32-bit integer "a" to "dst" at the corresponding bit locations specified by "mask"; all other bits in "dst" are set to zero.
// //go:linkname PdepU32 PdepU32 //go:noescape func PdepU32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint) // Extract bits from unsigned 32-bit integer "a" at the corresponding bit locations specified by "mask" to contiguous low bits in "dst"; the remaining upper bits in "dst" are set to zero. // //go:linkname PextU32 PextU32 //go:noescape func PextU32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint) // Copy all bits from unsigned 64-bit integer "a" to "dst", and reset (set to 0) the high bits in "dst" starting at "index". // //go:linkname BzhiU64 BzhiU64 //go:noescape func BzhiU64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Ulonglong) // Deposit contiguous low bits from unsigned 64-bit integer "a" to "dst" at the corresponding bit locations specified by "mask"; all other bits in "dst" are set to zero. // //go:linkname PdepU64 PdepU64 //go:noescape func PdepU64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Ulonglong) // Extract bits from unsigned 64-bit integer "a" at the corresponding bit locations specified by "mask" to contiguous low bits in "dst"; the remaining upper bits in "dst" are set to zero. 
// //go:linkname PextU64 PextU64 //go:noescape func PextU64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Ulonglong) ================================================ FILE: x86/crc32/functions.c ================================================ #include void MmCrc32U8(unsigned int* r, unsigned int* v0, unsigned char* v1) { *r = _mm_crc32_u8(*v0, *v1); } void MmCrc32U16(unsigned int* r, unsigned int* v0, unsigned short* v1) { *r = _mm_crc32_u16(*v0, *v1); } void MmCrc32U32(unsigned int* r, unsigned int* v0, unsigned int* v1) { *r = _mm_crc32_u32(*v0, *v1); } void MmCrc32U64(unsigned long long* r, unsigned long long* v0, unsigned long long* v1) { *r = _mm_crc32_u64(*v0, *v1); } ================================================ FILE: x86/crc32/functions.go ================================================ package crc32 import ( "github.com/alivanz/go-simd/x86" ) /* #cgo CFLAGS: -mcrc32 #include */ import "C" // Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 8-bit integer "v", and stores the result in "dst". // //go:linkname MmCrc32U8 MmCrc32U8 //go:noescape func MmCrc32U8(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uchar) // Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 16-bit integer "v", and stores the result in "dst". // //go:linkname MmCrc32U16 MmCrc32U16 //go:noescape func MmCrc32U16(r *x86.Uint, v0 *x86.Uint, v1 *x86.Ushort) // Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 32-bit integer "v", and stores the result in "dst". // //go:linkname MmCrc32U32 MmCrc32U32 //go:noescape func MmCrc32U32(r *x86.Uint, v0 *x86.Uint, v1 *x86.Uint) // Starting with the initial value in "crc", accumulates a CRC32 value for unsigned 64-bit integer "v", and stores the result in "dst". 
// //go:linkname MmCrc32U64 MmCrc32U64 //go:noescape func MmCrc32U64(r *x86.Ulonglong, v0 *x86.Ulonglong, v1 *x86.Ulonglong) ================================================ FILE: x86/f16c/functions.c ================================================ #include void CvtshSs(float* r, unsigned short* v0) { *r = _cvtsh_ss(*v0); } void MmCvtphPs(__m128* r, __m128i* v0) { *r = _mm_cvtph_ps(*v0); } void Mm256CvtphPs(__m256* r, __m128i* v0) { *r = _mm256_cvtph_ps(*v0); } ================================================ FILE: x86/f16c/functions.go ================================================ package f16c import ( "github.com/alivanz/go-simd/x86" ) /* #cgo CFLAGS: -mf16c #include */ import "C" // Convert the half-precision (16-bit) floating-point value "a" to a single-precision (32-bit) floating-point value, and store the result in "dst". // //go:linkname CvtshSs CvtshSs //go:noescape func CvtshSs(r *x86.Float, v0 *x86.Ushort) // Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". // //go:linkname MmCvtphPs MmCvtphPs //go:noescape func MmCvtphPs(r *x86.M128, v0 *x86.M128I) // Convert packed half-precision (16-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". 
// //go:linkname Mm256CvtphPs Mm256CvtphPs //go:noescape func Mm256CvtphPs(r *x86.M256, v0 *x86.M128I) ================================================ FILE: x86/fma/functions.c ================================================ #include void MmFmaddPs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fmadd_ps(*v0, *v1, *v2); } void MmFmaddPd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fmadd_pd(*v0, *v1, *v2); } void MmFmaddSs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fmadd_ss(*v0, *v1, *v2); } void MmFmaddSd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fmadd_sd(*v0, *v1, *v2); } void MmFmsubPs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fmsub_ps(*v0, *v1, *v2); } void MmFmsubPd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fmsub_pd(*v0, *v1, *v2); } void MmFmsubSs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fmsub_ss(*v0, *v1, *v2); } void MmFmsubSd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fmsub_sd(*v0, *v1, *v2); } void MmFnmaddPs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fnmadd_ps(*v0, *v1, *v2); } void MmFnmaddPd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fnmadd_pd(*v0, *v1, *v2); } void MmFnmaddSs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fnmadd_ss(*v0, *v1, *v2); } void MmFnmaddSd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fnmadd_sd(*v0, *v1, *v2); } void MmFnmsubPs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fnmsub_ps(*v0, *v1, *v2); } void MmFnmsubPd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fnmsub_pd(*v0, *v1, *v2); } void MmFnmsubSs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fnmsub_ss(*v0, *v1, *v2); } void MmFnmsubSd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fnmsub_sd(*v0, *v1, *v2); } void MmFmaddsubPs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fmaddsub_ps(*v0, *v1, *v2); } void 
MmFmaddsubPd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fmaddsub_pd(*v0, *v1, *v2); } void MmFmsubaddPs(__m128* r, __m128* v0, __m128* v1, __m128* v2) { *r = _mm_fmsubadd_ps(*v0, *v1, *v2); } void MmFmsubaddPd(__m128d* r, __m128d* v0, __m128d* v1, __m128d* v2) { *r = _mm_fmsubadd_pd(*v0, *v1, *v2); } void Mm256FmaddPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_fmadd_ps(*v0, *v1, *v2); } void Mm256FmaddPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_fmadd_pd(*v0, *v1, *v2); } void Mm256FmsubPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_fmsub_ps(*v0, *v1, *v2); } void Mm256FmsubPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_fmsub_pd(*v0, *v1, *v2); } void Mm256FnmaddPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_fnmadd_ps(*v0, *v1, *v2); } void Mm256FnmaddPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_fnmadd_pd(*v0, *v1, *v2); } void Mm256FnmsubPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_fnmsub_ps(*v0, *v1, *v2); } void Mm256FnmsubPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_fnmsub_pd(*v0, *v1, *v2); } void Mm256FmaddsubPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_fmaddsub_ps(*v0, *v1, *v2); } void Mm256FmaddsubPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_fmaddsub_pd(*v0, *v1, *v2); } void Mm256FmsubaddPs(__m256* r, __m256* v0, __m256* v1, __m256* v2) { *r = _mm256_fmsubadd_ps(*v0, *v1, *v2); } void Mm256FmsubaddPd(__m256d* r, __m256d* v0, __m256d* v1, __m256d* v2) { *r = _mm256_fmsubadd_pd(*v0, *v1, *v2); } ================================================ FILE: x86/fma/functions.go ================================================ package fma import ( "github.com/alivanz/go-simd/x86" ) /* #cgo CFLAGS: -mfma #include */ import "C" // Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed 
elements in "c", and store the results in "dst". // //go:linkname MmFmaddPs MmFmaddPs //go:noescape func MmFmaddPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128) // Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst". // //go:linkname MmFmaddPd MmFmaddPd //go:noescape func MmFmaddPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D) // Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmFmaddSs MmFmaddSs //go:noescape func MmFmaddSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128) // Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". // //go:linkname MmFmaddSd MmFmaddSd //go:noescape func MmFmaddSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D) // Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst". // //go:linkname MmFmsubPs MmFmsubPs //go:noescape func MmFmsubPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128) // Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst". 
// //go:linkname MmFmsubPd MmFmsubPd //go:noescape func MmFmsubPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D) // Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmFmsubSs MmFmsubSs //go:noescape func MmFmsubSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128) // Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the intermediate result. Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". // //go:linkname MmFmsubSd MmFmsubSd //go:noescape func MmFmsubSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D) // Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst". // //go:linkname MmFnmaddPs MmFnmaddPs //go:noescape func MmFnmaddPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128) // Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst". // //go:linkname MmFnmaddPd MmFnmaddPd //go:noescape func MmFnmaddPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D) // Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". 
// //go:linkname MmFnmaddSs MmFnmaddSs //go:noescape func MmFnmaddSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128) // Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and add the negated intermediate result to the lower element in "c". Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". // //go:linkname MmFnmaddSd MmFnmaddSd //go:noescape func MmFnmaddSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D) // Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst". // //go:linkname MmFnmsubPs MmFnmsubPs //go:noescape func MmFnmsubPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128) // Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst". // //go:linkname MmFnmsubPd MmFnmsubPd //go:noescape func MmFnmsubPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D) // Multiply the lower single-precision (32-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmFnmsubSs MmFnmsubSs //go:noescape func MmFnmsubSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128) // Multiply the lower double-precision (64-bit) floating-point elements in "a" and "b", and subtract the lower element in "c" from the negated intermediate result. Store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". 
// //go:linkname MmFnmsubSd MmFnmsubSd //go:noescape func MmFnmsubSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D) // Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst". // //go:linkname MmFmaddsubPs MmFmaddsubPs //go:noescape func MmFmaddsubPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128) // Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst". // //go:linkname MmFmaddsubPd MmFmaddsubPd //go:noescape func MmFmaddsubPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D) // Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst". // //go:linkname MmFmsubaddPs MmFmsubaddPs //go:noescape func MmFmsubaddPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128, v2 *x86.M128) // Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst". // //go:linkname MmFmsubaddPd MmFmsubaddPd //go:noescape func MmFmsubaddPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D, v2 *x86.M128D) // Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst". // //go:linkname Mm256FmaddPs Mm256FmaddPs //go:noescape func Mm256FmaddPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256) // Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the intermediate result to packed elements in "c", and store the results in "dst". 
// //go:linkname Mm256FmaddPd Mm256FmaddPd //go:noescape func Mm256FmaddPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D) // Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst". // //go:linkname Mm256FmsubPs Mm256FmsubPs //go:noescape func Mm256FmsubPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256) // Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the intermediate result, and store the results in "dst". // //go:linkname Mm256FmsubPd Mm256FmsubPd //go:noescape func Mm256FmsubPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D) // Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst". // //go:linkname Mm256FnmaddPs Mm256FnmaddPs //go:noescape func Mm256FnmaddPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256) // Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", add the negated intermediate result to packed elements in "c", and store the results in "dst". // //go:linkname Mm256FnmaddPd Mm256FnmaddPd //go:noescape func Mm256FnmaddPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D) // Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst". // //go:linkname Mm256FnmsubPs Mm256FnmsubPs //go:noescape func Mm256FnmsubPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256) // Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", subtract packed elements in "c" from the negated intermediate result, and store the results in "dst". 
// //go:linkname Mm256FnmsubPd Mm256FnmsubPd //go:noescape func Mm256FnmsubPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D) // Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst". // //go:linkname Mm256FmaddsubPs Mm256FmaddsubPs //go:noescape func Mm256FmaddsubPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256) // Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternatively add and subtract packed elements in "c" to/from the intermediate result, and store the results in "dst". // //go:linkname Mm256FmaddsubPd Mm256FmaddsubPd //go:noescape func Mm256FmaddsubPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D) // Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst". // //go:linkname Mm256FmsubaddPs Mm256FmsubaddPs //go:noescape func Mm256FmsubaddPs(r *x86.M256, v0 *x86.M256, v1 *x86.M256, v2 *x86.M256) // Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", alternatively subtract and add packed elements in "c" from/to the intermediate result, and store the results in "dst". 
// //go:linkname Mm256FmsubaddPd Mm256FmsubaddPd //go:noescape func Mm256FmsubaddPd(r *x86.M256D, v0 *x86.M256D, v1 *x86.M256D, v2 *x86.M256D) ================================================ FILE: x86/fsgsbase/functions.c ================================================ #include void ReadfsbaseU32(unsigned int* r) { *r = _readfsbase_u32(); } void ReadfsbaseU64(unsigned long long* r) { *r = _readfsbase_u64(); } void ReadgsbaseU32(unsigned int* r) { *r = _readgsbase_u32(); } void ReadgsbaseU64(unsigned long long* r) { *r = _readgsbase_u64(); } void WritefsbaseU32(unsigned int* v0) { _writefsbase_u32(*v0); } void WritefsbaseU64(unsigned long long* v0) { _writefsbase_u64(*v0); } void WritegsbaseU32(unsigned int* v0) { _writegsbase_u32(*v0); } void WritegsbaseU64(unsigned long long* v0) { _writegsbase_u64(*v0); } ================================================ FILE: x86/fsgsbase/functions.go ================================================ package fsgsbase import ( "github.com/alivanz/go-simd/x86" ) /* #cgo CFLAGS: -mfsgsbase #include */ import "C" // Read the FS segment base register and store the 32-bit result in "dst". // //go:linkname ReadfsbaseU32 ReadfsbaseU32 //go:noescape func ReadfsbaseU32(r *x86.Uint) // Read the FS segment base register and store the 64-bit result in "dst". // //go:linkname ReadfsbaseU64 ReadfsbaseU64 //go:noescape func ReadfsbaseU64(r *x86.Ulonglong) // Read the GS segment base register and store the 32-bit result in "dst". // //go:linkname ReadgsbaseU32 ReadgsbaseU32 //go:noescape func ReadgsbaseU32(r *x86.Uint) // Read the GS segment base register and store the 64-bit result in "dst". // //go:linkname ReadgsbaseU64 ReadgsbaseU64 //go:noescape func ReadgsbaseU64(r *x86.Ulonglong) // Write the unsigned 32-bit integer "a" to the FS segment base register. // //go:linkname WritefsbaseU32 WritefsbaseU32 //go:noescape func WritefsbaseU32(v0 *x86.Uint) // Write the unsigned 64-bit integer "a" to the FS segment base register.
// //go:linkname WritefsbaseU64 WritefsbaseU64 //go:noescape func WritefsbaseU64(v0 *x86.Ulonglong) // Write the unsigned 32-bit integer "a" to the GS segment base register. // //go:linkname WritegsbaseU32 WritegsbaseU32 //go:noescape func WritegsbaseU32(v0 *x86.Uint) // Write the unsigned 64-bit integer "a" to the GS segment base register. // //go:linkname WritegsbaseU64 WritegsbaseU64 //go:noescape func WritegsbaseU64(v0 *x86.Ulonglong) ================================================ FILE: x86/generate.go ================================================ package x86 //go:generate go run ../generator/x86 ================================================ FILE: x86/lzcnt/functions.c ================================================ #include void Lzcnt32(unsigned int* r, unsigned int* v0) { *r = __lzcnt32(*v0); } void LzcntU32(unsigned int* r, unsigned int* v0) { *r = _lzcnt_u32(*v0); } void LzcntU64(unsigned long long* r, unsigned long long* v0) { *r = _lzcnt_u64(*v0); } ================================================ FILE: x86/lzcnt/functions.go ================================================ package lzcnt import ( "github.com/alivanz/go-simd/x86" ) /* #cgo CFLAGS: -mlzcnt #include */ import "C" // __lzcnt32 // //go:linkname Lzcnt32 Lzcnt32 //go:noescape func Lzcnt32(r *x86.Uint, v0 *x86.Uint) // Count the number of leading zero bits in unsigned 32-bit integer "a", and return that count in "dst". // //go:linkname LzcntU32 LzcntU32 //go:noescape func LzcntU32(r *x86.Uint, v0 *x86.Uint) // Count the number of leading zero bits in unsigned 64-bit integer "a", and return that count in "dst". 
// //go:linkname LzcntU64 LzcntU64 //go:noescape func LzcntU64(r *x86.Ulonglong, v0 *x86.Ulonglong) ================================================ FILE: x86/mmx/functions.c ================================================ #include void MmEmpty() { _mm_empty(); } void MmCvtsi32Si64(__m64* r, int* v0) { *r = _mm_cvtsi32_si64(*v0); } void MmCvtsi64Si32(int* r, __m64* v0) { *r = _mm_cvtsi64_si32(*v0); } void MmCvtsi64M64(__m64* r, long long* v0) { *r = _mm_cvtsi64_m64(*v0); } void MmCvtm64Si64(long long* r, __m64* v0) { *r = _mm_cvtm64_si64(*v0); } void MmPacksPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_packs_pi16(*v0, *v1); } void MmPacksPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_packs_pi32(*v0, *v1); } void MmPacksPu16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_packs_pu16(*v0, *v1); } void MmUnpackhiPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_unpackhi_pi8(*v0, *v1); } void MmUnpackhiPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_unpackhi_pi16(*v0, *v1); } void MmUnpackhiPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_unpackhi_pi32(*v0, *v1); } void MmUnpackloPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_unpacklo_pi8(*v0, *v1); } void MmUnpackloPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_unpacklo_pi16(*v0, *v1); } void MmUnpackloPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_unpacklo_pi32(*v0, *v1); } void MmAddPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_add_pi8(*v0, *v1); } void MmAddPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_add_pi16(*v0, *v1); } void MmAddPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_add_pi32(*v0, *v1); } void MmAddsPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_adds_pi8(*v0, *v1); } void MmAddsPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_adds_pi16(*v0, *v1); } void MmAddsPu8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_adds_pu8(*v0, *v1); } void MmAddsPu16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_adds_pu16(*v0, *v1); } void MmSubPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sub_pi8(*v0, *v1); } void 
MmSubPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sub_pi16(*v0, *v1); } void MmSubPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sub_pi32(*v0, *v1); } void MmSubsPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_subs_pi8(*v0, *v1); } void MmSubsPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_subs_pi16(*v0, *v1); } void MmSubsPu8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_subs_pu8(*v0, *v1); } void MmSubsPu16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_subs_pu16(*v0, *v1); } void MmMaddPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_madd_pi16(*v0, *v1); } void MmMulhiPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_mulhi_pi16(*v0, *v1); } void MmMulloPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_mullo_pi16(*v0, *v1); } void MmSllPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sll_pi16(*v0, *v1); } void MmSlliPi16(__m64* r, __m64* v0, int* v1) { *r = _mm_slli_pi16(*v0, *v1); } void MmSllPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sll_pi32(*v0, *v1); } void MmSlliPi32(__m64* r, __m64* v0, int* v1) { *r = _mm_slli_pi32(*v0, *v1); } void MmSllSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sll_si64(*v0, *v1); } void MmSlliSi64(__m64* r, __m64* v0, int* v1) { *r = _mm_slli_si64(*v0, *v1); } void MmSraPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sra_pi16(*v0, *v1); } void MmSraiPi16(__m64* r, __m64* v0, int* v1) { *r = _mm_srai_pi16(*v0, *v1); } void MmSraPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sra_pi32(*v0, *v1); } void MmSraiPi32(__m64* r, __m64* v0, int* v1) { *r = _mm_srai_pi32(*v0, *v1); } void MmSrlPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_srl_pi16(*v0, *v1); } void MmSrliPi16(__m64* r, __m64* v0, int* v1) { *r = _mm_srli_pi16(*v0, *v1); } void MmSrlPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_srl_pi32(*v0, *v1); } void MmSrliPi32(__m64* r, __m64* v0, int* v1) { *r = _mm_srli_pi32(*v0, *v1); } void MmSrlSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_srl_si64(*v0, *v1); } void MmSrliSi64(__m64* r, __m64* v0, int* v1) { *r = 
_mm_srli_si64(*v0, *v1); } void MmAndSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_and_si64(*v0, *v1); } void MmAndnotSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_andnot_si64(*v0, *v1); } void MmOrSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_or_si64(*v0, *v1); } void MmXorSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_xor_si64(*v0, *v1); } void MmCmpeqPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_cmpeq_pi8(*v0, *v1); } void MmCmpeqPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_cmpeq_pi16(*v0, *v1); } void MmCmpeqPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_cmpeq_pi32(*v0, *v1); } void MmCmpgtPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_cmpgt_pi8(*v0, *v1); } void MmCmpgtPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_cmpgt_pi16(*v0, *v1); } void MmCmpgtPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_cmpgt_pi32(*v0, *v1); } void MmSetzeroSi64(__m64* r) { *r = _mm_setzero_si64(); } void MmSetPi32(__m64* r, int* v0, int* v1) { *r = _mm_set_pi32(*v0, *v1); } void MmSetPi16(__m64* r, short* v0, short* v1, short* v2, short* v3) { *r = _mm_set_pi16(*v0, *v1, *v2, *v3); } void MmSetPi8(__m64* r, char* v0, char* v1, char* v2, char* v3, char* v4, char* v5, char* v6, char* v7) { *r = _mm_set_pi8(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); } void MmSet1Pi32(__m64* r, int* v0) { *r = _mm_set1_pi32(*v0); } void MmSet1Pi16(__m64* r, short* v0) { *r = _mm_set1_pi16(*v0); } void MmSet1Pi8(__m64* r, char* v0) { *r = _mm_set1_pi8(*v0); } void MmSetrPi32(__m64* r, int* v0, int* v1) { *r = _mm_setr_pi32(*v0, *v1); } void MmSetrPi16(__m64* r, short* v0, short* v1, short* v2, short* v3) { *r = _mm_setr_pi16(*v0, *v1, *v2, *v3); } void MmSetrPi8(__m64* r, char* v0, char* v1, char* v2, char* v3, char* v4, char* v5, char* v6, char* v7) { *r = _mm_setr_pi8(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); } ================================================ FILE: x86/mmx/functions.go ================================================ package mmx import ( "github.com/alivanz/go-simd/x86" ) /* 
#cgo CFLAGS: -mmmx #include */ import "C" // Empty the MMX state, which marks the x87 FPU registers as available for use by x87 instructions. This instruction must be used at the end of all MMX technology procedures. // //go:linkname MmEmpty MmEmpty //go:noescape func MmEmpty() // Copy 32-bit integer "a" to the lower elements of "dst", and zero the upper element of "dst". // //go:linkname MmCvtsi32Si64 MmCvtsi32Si64 //go:noescape func MmCvtsi32Si64(r *x86.M64, v0 *x86.Int) // Copy the lower 32-bit integer in "a" to "dst". // //go:linkname MmCvtsi64Si32 MmCvtsi64Si32 //go:noescape func MmCvtsi64Si32(r *x86.Int, v0 *x86.M64) // Copy 64-bit integer "a" to "dst". // //go:linkname MmCvtsi64M64 MmCvtsi64M64 //go:noescape func MmCvtsi64M64(r *x86.M64, v0 *x86.Longlong) // Copy 64-bit integer "a" to "dst". // //go:linkname MmCvtm64Si64 MmCvtm64Si64 //go:noescape func MmCvtm64Si64(r *x86.Longlong, v0 *x86.M64) // Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst". // //go:linkname MmPacksPi16 MmPacksPi16 //go:noescape func MmPacksPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst". // //go:linkname MmPacksPi32 MmPacksPi32 //go:noescape func MmPacksPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst". // //go:linkname MmPacksPu16 MmPacksPu16 //go:noescape func MmPacksPu16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Unpack and interleave 8-bit integers from the high half of "a" and "b", and store the results in "dst". // //go:linkname MmUnpackhiPi8 MmUnpackhiPi8 //go:noescape func MmUnpackhiPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Unpack and interleave 16-bit integers from the high half of "a" and "b", and store the results in "dst". 
// //go:linkname MmUnpackhiPi16 MmUnpackhiPi16 //go:noescape func MmUnpackhiPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Unpack and interleave 32-bit integers from the high half of "a" and "b", and store the results in "dst". // //go:linkname MmUnpackhiPi32 MmUnpackhiPi32 //go:noescape func MmUnpackhiPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Unpack and interleave 8-bit integers from the low half of "a" and "b", and store the results in "dst". // //go:linkname MmUnpackloPi8 MmUnpackloPi8 //go:noescape func MmUnpackloPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Unpack and interleave 16-bit integers from the low half of "a" and "b", and store the results in "dst". // //go:linkname MmUnpackloPi16 MmUnpackloPi16 //go:noescape func MmUnpackloPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Unpack and interleave 32-bit integers from the low half of "a" and "b", and store the results in "dst". // //go:linkname MmUnpackloPi32 MmUnpackloPi32 //go:noescape func MmUnpackloPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Add packed 8-bit integers in "a" and "b", and store the results in "dst". // //go:linkname MmAddPi8 MmAddPi8 //go:noescape func MmAddPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Add packed 16-bit integers in "a" and "b", and store the results in "dst". // //go:linkname MmAddPi16 MmAddPi16 //go:noescape func MmAddPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Add packed 32-bit integers in "a" and "b", and store the results in "dst". // //go:linkname MmAddPi32 MmAddPi32 //go:noescape func MmAddPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst". // //go:linkname MmAddsPi8 MmAddsPi8 //go:noescape func MmAddsPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst". 
// //go:linkname MmAddsPi16 MmAddsPi16 //go:noescape func MmAddsPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst". // //go:linkname MmAddsPu8 MmAddsPu8 //go:noescape func MmAddsPu8(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst". // //go:linkname MmAddsPu16 MmAddsPu16 //go:noescape func MmAddsPu16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst". // //go:linkname MmSubPi8 MmSubPi8 //go:noescape func MmSubPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst". // //go:linkname MmSubPi16 MmSubPi16 //go:noescape func MmSubPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst". // //go:linkname MmSubPi32 MmSubPi32 //go:noescape func MmSubPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Subtract packed signed 8-bit integers in "b" from packed 8-bit integers in "a" using saturation, and store the results in "dst". // //go:linkname MmSubsPi8 MmSubsPi8 //go:noescape func MmSubsPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Subtract packed signed 16-bit integers in "b" from packed 16-bit integers in "a" using saturation, and store the results in "dst". // //go:linkname MmSubsPi16 MmSubsPi16 //go:noescape func MmSubsPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst". 
// //go:linkname MmSubsPu8 MmSubsPu8 //go:noescape func MmSubsPu8(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst". // //go:linkname MmSubsPu16 MmSubsPu16 //go:noescape func MmSubsPu16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst". // //go:linkname MmMaddPi16 MmMaddPi16 //go:noescape func MmMaddPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst". // //go:linkname MmMulhiPi16 MmMulhiPi16 //go:noescape func MmMulhiPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst". // //go:linkname MmMulloPi16 MmMulloPi16 //go:noescape func MmMulloPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". // //go:linkname MmSllPi16 MmSllPi16 //go:noescape func MmSllPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". // //go:linkname MmSlliPi16 MmSlliPi16 //go:noescape func MmSlliPi16(r *x86.M64, v0 *x86.M64, v1 *x86.Int) // Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". // //go:linkname MmSllPi32 MmSllPi32 //go:noescape func MmSllPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". 
// //go:linkname MmSlliPi32 MmSlliPi32 //go:noescape func MmSlliPi32(r *x86.M64, v0 *x86.M64, v1 *x86.Int) // Shift 64-bit integer "a" left by "count" while shifting in zeros, and store the result in "dst". // //go:linkname MmSllSi64 MmSllSi64 //go:noescape func MmSllSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Shift 64-bit integer "a" left by "imm8" while shifting in zeros, and store the result in "dst". // //go:linkname MmSlliSi64 MmSlliSi64 //go:noescape func MmSlliSi64(r *x86.M64, v0 *x86.M64, v1 *x86.Int) // Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst". // //go:linkname MmSraPi16 MmSraPi16 //go:noescape func MmSraPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst". // //go:linkname MmSraiPi16 MmSraiPi16 //go:noescape func MmSraiPi16(r *x86.M64, v0 *x86.M64, v1 *x86.Int) // Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst". // //go:linkname MmSraPi32 MmSraPi32 //go:noescape func MmSraPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst". // //go:linkname MmSraiPi32 MmSraiPi32 //go:noescape func MmSraiPi32(r *x86.M64, v0 *x86.M64, v1 *x86.Int) // Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". // //go:linkname MmSrlPi16 MmSrlPi16 //go:noescape func MmSrlPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". // //go:linkname MmSrliPi16 MmSrliPi16 //go:noescape func MmSrliPi16(r *x86.M64, v0 *x86.M64, v1 *x86.Int) // Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". 
// //go:linkname MmSrlPi32 MmSrlPi32 //go:noescape func MmSrlPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". // //go:linkname MmSrliPi32 MmSrliPi32 //go:noescape func MmSrliPi32(r *x86.M64, v0 *x86.M64, v1 *x86.Int) // Shift 64-bit integer "a" right by "count" while shifting in zeros, and store the result in "dst". // //go:linkname MmSrlSi64 MmSrlSi64 //go:noescape func MmSrlSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Shift 64-bit integer "a" right by "imm8" while shifting in zeros, and store the result in "dst". // //go:linkname MmSrliSi64 MmSrliSi64 //go:noescape func MmSrliSi64(r *x86.M64, v0 *x86.M64, v1 *x86.Int) // Compute the bitwise AND of 64 bits (representing integer data) in "a" and "b", and store the result in "dst". // //go:linkname MmAndSi64 MmAndSi64 //go:noescape func MmAndSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Compute the bitwise NOT of 64 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst". // //go:linkname MmAndnotSi64 MmAndnotSi64 //go:noescape func MmAndnotSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Compute the bitwise OR of 64 bits (representing integer data) in "a" and "b", and store the result in "dst". // //go:linkname MmOrSi64 MmOrSi64 //go:noescape func MmOrSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Compute the bitwise XOR of 64 bits (representing integer data) in "a" and "b", and store the result in "dst". // //go:linkname MmXorSi64 MmXorSi64 //go:noescape func MmXorSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Compare packed 8-bit integers in "a" and "b" for equality, and store the results in "dst". // //go:linkname MmCmpeqPi8 MmCmpeqPi8 //go:noescape func MmCmpeqPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Compare packed 16-bit integers in "a" and "b" for equality, and store the results in "dst". 
// //go:linkname MmCmpeqPi16 MmCmpeqPi16 //go:noescape func MmCmpeqPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Compare packed 32-bit integers in "a" and "b" for equality, and store the results in "dst". // //go:linkname MmCmpeqPi32 MmCmpeqPi32 //go:noescape func MmCmpeqPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in "dst". // //go:linkname MmCmpgtPi8 MmCmpgtPi8 //go:noescape func MmCmpgtPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in "dst". // //go:linkname MmCmpgtPi16 MmCmpgtPi16 //go:noescape func MmCmpgtPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in "dst". // //go:linkname MmCmpgtPi32 MmCmpgtPi32 //go:noescape func MmCmpgtPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Return vector of type __m64 with all elements set to zero. // //go:linkname MmSetzeroSi64 MmSetzeroSi64 //go:noescape func MmSetzeroSi64(r *x86.M64, ) // Set packed 32-bit integers in "dst" with the supplied values. // //go:linkname MmSetPi32 MmSetPi32 //go:noescape func MmSetPi32(r *x86.M64, v0 *x86.Int, v1 *x86.Int) // Set packed 16-bit integers in "dst" with the supplied values. // //go:linkname MmSetPi16 MmSetPi16 //go:noescape func MmSetPi16(r *x86.M64, v0 *x86.Short, v1 *x86.Short, v2 *x86.Short, v3 *x86.Short) // Set packed 8-bit integers in "dst" with the supplied values. // //go:linkname MmSetPi8 MmSetPi8 //go:noescape func MmSetPi8(r *x86.M64, v0 *x86.Char, v1 *x86.Char, v2 *x86.Char, v3 *x86.Char, v4 *x86.Char, v5 *x86.Char, v6 *x86.Char, v7 *x86.Char) // Broadcast 32-bit integer "a" to all elements of "dst". // //go:linkname MmSet1Pi32 MmSet1Pi32 //go:noescape func MmSet1Pi32(r *x86.M64, v0 *x86.Int) // Broadcast 16-bit integer "a" to all elements of "dst".
// //go:linkname MmSet1Pi16 MmSet1Pi16 //go:noescape func MmSet1Pi16(r *x86.M64, v0 *x86.Short) // Broadcast 8-bit integer "a" to all elements of "dst". // //go:linkname MmSet1Pi8 MmSet1Pi8 //go:noescape func MmSet1Pi8(r *x86.M64, v0 *x86.Char) // Set packed 32-bit integers in "dst" with the supplied values in reverse order. // //go:linkname MmSetrPi32 MmSetrPi32 //go:noescape func MmSetrPi32(r *x86.M64, v0 *x86.Int, v1 *x86.Int) // Set packed 16-bit integers in "dst" with the supplied values in reverse order. // //go:linkname MmSetrPi16 MmSetrPi16 //go:noescape func MmSetrPi16(r *x86.M64, v0 *x86.Short, v1 *x86.Short, v2 *x86.Short, v3 *x86.Short) // Set packed 8-bit integers in "dst" with the supplied values in reverse order. // //go:linkname MmSetrPi8 MmSetrPi8 //go:noescape func MmSetrPi8(r *x86.M64, v0 *x86.Char, v1 *x86.Char, v2 *x86.Char, v3 *x86.Char, v4 *x86.Char, v5 *x86.Char, v6 *x86.Char, v7 *x86.Char) ================================================ FILE: x86/mmx_sse/functions.c ================================================ #include void MmCvtpsPi32(__m64* r, __m128* v0) { *r = _mm_cvtps_pi32(*v0); } void MmCvtPs2Pi(__m64* r, __m128* v0) { *r = _mm_cvt_ps2pi(*v0); } void MmCvttpsPi32(__m64* r, __m128* v0) { *r = _mm_cvttps_pi32(*v0); } void MmCvttPs2Pi(__m64* r, __m128* v0) { *r = _mm_cvtt_ps2pi(*v0); } void MmCvtpi32Ps(__m128* r, __m128* v0, __m64* v1) { *r = _mm_cvtpi32_ps(*v0, *v1); } void MmCvtPi2Ps(__m128* r, __m128* v0, __m64* v1) { *r = _mm_cvt_pi2ps(*v0, *v1); } void MmMaxPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_max_pi16(*v0, *v1); } void MmMaxPu8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_max_pu8(*v0, *v1); } void MmMinPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_min_pi16(*v0, *v1); } void MmMinPu8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_min_pu8(*v0, *v1); } void MmMovemaskPi8(int* r, __m64* v0) { *r = _mm_movemask_pi8(*v0); } void MmMulhiPu16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_mulhi_pu16(*v0, *v1); } void 
MmAvgPu8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_avg_pu8(*v0, *v1); } void MmAvgPu16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_avg_pu16(*v0, *v1); } void MmSadPu8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sad_pu8(*v0, *v1); } void MmCvtpi16Ps(__m128* r, __m64* v0) { *r = _mm_cvtpi16_ps(*v0); } void MmCvtpu16Ps(__m128* r, __m64* v0) { *r = _mm_cvtpu16_ps(*v0); } void MmCvtpi8Ps(__m128* r, __m64* v0) { *r = _mm_cvtpi8_ps(*v0); } void MmCvtpu8Ps(__m128* r, __m64* v0) { *r = _mm_cvtpu8_ps(*v0); } void MmCvtpi32X2Ps(__m128* r, __m64* v0, __m64* v1) { *r = _mm_cvtpi32x2_ps(*v0, *v1); } void MmCvtpsPi16(__m64* r, __m128* v0) { *r = _mm_cvtps_pi16(*v0); } void MmCvtpsPi8(__m64* r, __m128* v0) { *r = _mm_cvtps_pi8(*v0); } ================================================ FILE: x86/mmx_sse/functions.go ================================================ package mmx_sse import ( "github.com/alivanz/go-simd/x86" ) /* #cgo CFLAGS: -mmmx -msse #include */ import "C" // Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst". // //go:linkname MmCvtpsPi32 MmCvtpsPi32 //go:noescape func MmCvtpsPi32(r *x86.M64, v0 *x86.M128) // Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst". // //go:linkname MmCvtPs2Pi MmCvtPs2Pi //go:noescape func MmCvtPs2Pi(r *x86.M64, v0 *x86.M128) // Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". // //go:linkname MmCvttpsPi32 MmCvttpsPi32 //go:noescape func MmCvttpsPi32(r *x86.M64, v0 *x86.M128) // Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". 
// //go:linkname MmCvttPs2Pi MmCvttPs2Pi //go:noescape func MmCvttPs2Pi(r *x86.M64, v0 *x86.M128) // Convert packed 32-bit integers in "b" to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements of "dst", and copy the upper 2 packed elements from "a" to the upper elements of "dst". // //go:linkname MmCvtpi32Ps MmCvtpi32Ps //go:noescape func MmCvtpi32Ps(r *x86.M128, v0 *x86.M128, v1 *x86.M64) // Convert packed signed 32-bit integers in "b" to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements of "dst", and copy the upper 2 packed elements from "a" to the upper elements of "dst". // //go:linkname MmCvtPi2Ps MmCvtPi2Ps //go:noescape func MmCvtPi2Ps(r *x86.M128, v0 *x86.M128, v1 *x86.M64) // Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst". // //go:linkname MmMaxPi16 MmMaxPi16 //go:noescape func MmMaxPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst". // //go:linkname MmMaxPu8 MmMaxPu8 //go:noescape func MmMaxPu8(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst". // //go:linkname MmMinPi16 MmMinPi16 //go:noescape func MmMinPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst". // //go:linkname MmMinPu8 MmMinPu8 //go:noescape func MmMinPu8(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Create mask from the most significant bit of each 8-bit element in "a", and store the result in "dst". // //go:linkname MmMovemaskPi8 MmMovemaskPi8 //go:noescape func MmMovemaskPi8(r *x86.Int, v0 *x86.M64) // Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst". 
// //go:linkname MmMulhiPu16 MmMulhiPu16 //go:noescape func MmMulhiPu16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst". // //go:linkname MmAvgPu8 MmAvgPu8 //go:noescape func MmAvgPu8(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst". // //go:linkname MmAvgPu16 MmAvgPu16 //go:noescape func MmAvgPu16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum each consecutive 8 differences to produce four unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of "dst". // //go:linkname MmSadPu8 MmSadPu8 //go:noescape func MmSadPu8(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Convert packed 16-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". // //go:linkname MmCvtpi16Ps MmCvtpi16Ps //go:noescape func MmCvtpi16Ps(r *x86.M128, v0 *x86.M64) // Convert packed unsigned 16-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". // //go:linkname MmCvtpu16Ps MmCvtpu16Ps //go:noescape func MmCvtpu16Ps(r *x86.M128, v0 *x86.M64) // Convert the lower packed 8-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". // //go:linkname MmCvtpi8Ps MmCvtpi8Ps //go:noescape func MmCvtpi8Ps(r *x86.M128, v0 *x86.M64) // Convert the lower packed unsigned 8-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". 
// //go:linkname MmCvtpu8Ps MmCvtpu8Ps //go:noescape func MmCvtpu8Ps(r *x86.M128, v0 *x86.M64) // Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements of "dst", then convert the packed signed 32-bit integers in "b" to single-precision (32-bit) floating-point elements, and store the results in the upper 2 elements of "dst". // //go:linkname MmCvtpi32X2Ps MmCvtpi32X2Ps //go:noescape func MmCvtpi32X2Ps(r *x86.M128, v0 *x86.M64, v1 *x86.M64) // Convert packed single-precision (32-bit) floating-point elements in "a" to packed 16-bit integers, and store the results in "dst". Note: this intrinsic will generate 0x7FFF, rather than 0x8000, for input values between 0x7FFF and 0x7FFFFFFF. // //go:linkname MmCvtpsPi16 MmCvtpsPi16 //go:noescape func MmCvtpsPi16(r *x86.M64, v0 *x86.M128) // Convert packed single-precision (32-bit) floating-point elements in "a" to packed 8-bit integers, and store the results in the lower 4 elements of "dst". Note: this intrinsic will generate 0x7F, rather than 0x80, for input values between 0x7F and 0x7FFFFFFF.
// //go:linkname MmCvtpsPi8 MmCvtpsPi8 //go:noescape func MmCvtpsPi8(r *x86.M64, v0 *x86.M128) ================================================ FILE: x86/mmx_sse2/functions.c ================================================ #include void MmCvtpdPi32(__m64* r, __m128d* v0) { *r = _mm_cvtpd_pi32(*v0); } void MmCvttpdPi32(__m64* r, __m128d* v0) { *r = _mm_cvttpd_pi32(*v0); } void MmCvtpi32Pd(__m128d* r, __m64* v0) { *r = _mm_cvtpi32_pd(*v0); } void MmAddSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_add_si64(*v0, *v1); } void MmMulSu32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_mul_su32(*v0, *v1); } void MmSubSi64(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sub_si64(*v0, *v1); } ================================================ FILE: x86/mmx_sse2/functions.go ================================================ package mmx_sse2 import ( "github.com/alivanz/go-simd/x86" ) /* #cgo CFLAGS: -mmmx -msse2 #include */ import "C" // Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst". // //go:linkname MmCvtpdPi32 MmCvtpdPi32 //go:noescape func MmCvtpdPi32(r *x86.M64, v0 *x86.M128D) // Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". // //go:linkname MmCvttpdPi32 MmCvttpdPi32 //go:noescape func MmCvttpdPi32(r *x86.M64, v0 *x86.M128D) // Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". // //go:linkname MmCvtpi32Pd MmCvtpi32Pd //go:noescape func MmCvtpi32Pd(r *x86.M128D, v0 *x86.M64) // Add 64-bit integers "a" and "b", and store the result in "dst". // //go:linkname MmAddSi64 MmAddSi64 //go:noescape func MmAddSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Multiply the low unsigned 32-bit integers from "a" and "b", and store the unsigned 64-bit result in "dst". 
// //go:linkname MmMulSu32 MmMulSu32 //go:noescape func MmMulSu32(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Subtract 64-bit integer "b" from 64-bit integer "a", and store the result in "dst". // //go:linkname MmSubSi64 MmSubSi64 //go:noescape func MmSubSi64(r *x86.M64, v0 *x86.M64, v1 *x86.M64) ================================================ FILE: x86/mmx_ssse3/functions.c ================================================ #include void MmAbsPi8(__m64* r, __m64* v0) { *r = _mm_abs_pi8(*v0); } void MmAbsPi16(__m64* r, __m64* v0) { *r = _mm_abs_pi16(*v0); } void MmAbsPi32(__m64* r, __m64* v0) { *r = _mm_abs_pi32(*v0); } void MmHaddPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_hadd_pi16(*v0, *v1); } void MmHaddPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_hadd_pi32(*v0, *v1); } void MmHaddsPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_hadds_pi16(*v0, *v1); } void MmHsubPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_hsub_pi16(*v0, *v1); } void MmHsubPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_hsub_pi32(*v0, *v1); } void MmHsubsPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_hsubs_pi16(*v0, *v1); } void MmMaddubsPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_maddubs_pi16(*v0, *v1); } void MmMulhrsPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_mulhrs_pi16(*v0, *v1); } void MmShufflePi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_shuffle_pi8(*v0, *v1); } void MmSignPi8(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sign_pi8(*v0, *v1); } void MmSignPi16(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sign_pi16(*v0, *v1); } void MmSignPi32(__m64* r, __m64* v0, __m64* v1) { *r = _mm_sign_pi32(*v0, *v1); } ================================================ FILE: x86/mmx_ssse3/functions.go ================================================ package mmx_ssse3 import ( "github.com/alivanz/go-simd/x86" ) /* #cgo CFLAGS: -mmmx -mssse3 #include */ import "C" // Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst". 
// //go:linkname MmAbsPi8 MmAbsPi8 //go:noescape func MmAbsPi8(r *x86.M64, v0 *x86.M64) // Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst". // //go:linkname MmAbsPi16 MmAbsPi16 //go:noescape func MmAbsPi16(r *x86.M64, v0 *x86.M64) // Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst". // //go:linkname MmAbsPi32 MmAbsPi32 //go:noescape func MmAbsPi32(r *x86.M64, v0 *x86.M64) // Horizontally add adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst". // //go:linkname MmHaddPi16 MmHaddPi16 //go:noescape func MmHaddPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Horizontally add adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst". // //go:linkname MmHaddPi32 MmHaddPi32 //go:noescape func MmHaddPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Horizontally add adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst". // //go:linkname MmHaddsPi16 MmHaddsPi16 //go:noescape func MmHaddsPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Horizontally subtract adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst". // //go:linkname MmHsubPi16 MmHsubPi16 //go:noescape func MmHsubPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Horizontally subtract adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst". // //go:linkname MmHsubPi32 MmHsubPi32 //go:noescape func MmHsubPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Horizontally subtract adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst". 
// //go:linkname MmHsubsPi16 MmHsubsPi16 //go:noescape func MmHsubsPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Vertically multiply each unsigned 8-bit integer from "a" with the corresponding signed 8-bit integer from "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst". // //go:linkname MmMaddubsPi16 MmMaddubsPi16 //go:noescape func MmMaddubsPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst". // //go:linkname MmMulhrsPi16 MmMulhrsPi16 //go:noescape func MmMulhrsPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Shuffle packed 8-bit integers in "a" according to the shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst". // //go:linkname MmShufflePi8 MmShufflePi8 //go:noescape func MmShufflePi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Negate packed 8-bit integers in "a" when the corresponding signed 8-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero. // //go:linkname MmSignPi8 MmSignPi8 //go:noescape func MmSignPi8(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Negate packed 16-bit integers in "a" when the corresponding signed 16-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero. // //go:linkname MmSignPi16 MmSignPi16 //go:noescape func MmSignPi16(r *x86.M64, v0 *x86.M64, v1 *x86.M64) // Negate packed 32-bit integers in "a" when the corresponding signed 32-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
// //go:linkname MmSignPi32 MmSignPi32 //go:noescape func MmSignPi32(r *x86.M64, v0 *x86.M64, v1 *x86.M64) ================================================ FILE: x86/popcnt/functions.c ================================================ #include void MmPopcntU32(int* r, unsigned int* v0) { *r = _mm_popcnt_u32(*v0); } void MmPopcntU64(long long* r, unsigned long long* v0) { *r = _mm_popcnt_u64(*v0); } ================================================ FILE: x86/popcnt/functions.go ================================================ package popcnt import ( "github.com/alivanz/go-simd/x86" ) /* #cgo CFLAGS: -mpopcnt #include */ import "C" // Count the number of bits set to 1 in unsigned 32-bit integer "a", and return that count in "dst". // //go:linkname MmPopcntU32 MmPopcntU32 //go:noescape func MmPopcntU32(r *x86.Int, v0 *x86.Uint) // Count the number of bits set to 1 in unsigned 64-bit integer "a", and return that count in "dst". // //go:linkname MmPopcntU64 MmPopcntU64 //go:noescape func MmPopcntU64(r *x86.Longlong, v0 *x86.Ulonglong) ================================================ FILE: x86/sse/functions.c ================================================ #include void MmAddSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_add_ss(*v0, *v1); } void MmAddPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_add_ps(*v0, *v1); } void MmSubSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_sub_ss(*v0, *v1); } void MmSubPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_sub_ps(*v0, *v1); } void MmMulSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_mul_ss(*v0, *v1); } void MmMulPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_mul_ps(*v0, *v1); } void MmDivSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_div_ss(*v0, *v1); } void MmDivPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_div_ps(*v0, *v1); } void MmSqrtSs(__m128* r, __m128* v0) { *r = _mm_sqrt_ss(*v0); } void MmSqrtPs(__m128* r, __m128* v0) { *r = _mm_sqrt_ps(*v0); } void MmRcpSs(__m128* r, __m128* v0) { *r = _mm_rcp_ss(*v0); } 
void MmRcpPs(__m128* r, __m128* v0) { *r = _mm_rcp_ps(*v0); } void MmRsqrtSs(__m128* r, __m128* v0) { *r = _mm_rsqrt_ss(*v0); } void MmRsqrtPs(__m128* r, __m128* v0) { *r = _mm_rsqrt_ps(*v0); } void MmMinSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_min_ss(*v0, *v1); } void MmMinPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_min_ps(*v0, *v1); } void MmMaxSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_max_ss(*v0, *v1); } void MmMaxPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_max_ps(*v0, *v1); } void MmAndPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_and_ps(*v0, *v1); } void MmAndnotPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_andnot_ps(*v0, *v1); } void MmOrPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_or_ps(*v0, *v1); } void MmXorPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_xor_ps(*v0, *v1); } void MmCmpeqSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpeq_ss(*v0, *v1); } void MmCmpeqPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpeq_ps(*v0, *v1); } void MmCmpltSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmplt_ss(*v0, *v1); } void MmCmpltPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmplt_ps(*v0, *v1); } void MmCmpleSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmple_ss(*v0, *v1); } void MmCmplePs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmple_ps(*v0, *v1); } void MmCmpgtSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpgt_ss(*v0, *v1); } void MmCmpgtPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpgt_ps(*v0, *v1); } void MmCmpgeSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpge_ss(*v0, *v1); } void MmCmpgePs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpge_ps(*v0, *v1); } void MmCmpneqSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpneq_ss(*v0, *v1); } void MmCmpneqPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpneq_ps(*v0, *v1); } void MmCmpnltSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpnlt_ss(*v0, *v1); } void MmCmpnltPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpnlt_ps(*v0, *v1); } 
void MmCmpnleSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpnle_ss(*v0, *v1); } void MmCmpnlePs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpnle_ps(*v0, *v1); } void MmCmpngtSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpngt_ss(*v0, *v1); } void MmCmpngtPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpngt_ps(*v0, *v1); } void MmCmpngeSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpnge_ss(*v0, *v1); } void MmCmpngePs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpnge_ps(*v0, *v1); } void MmCmpordSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpord_ss(*v0, *v1); } void MmCmpordPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpord_ps(*v0, *v1); } void MmCmpunordSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpunord_ss(*v0, *v1); } void MmCmpunordPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_cmpunord_ps(*v0, *v1); } void MmComieqSs(int* r, __m128* v0, __m128* v1) { *r = _mm_comieq_ss(*v0, *v1); } void MmComiltSs(int* r, __m128* v0, __m128* v1) { *r = _mm_comilt_ss(*v0, *v1); } void MmComileSs(int* r, __m128* v0, __m128* v1) { *r = _mm_comile_ss(*v0, *v1); } void MmComigtSs(int* r, __m128* v0, __m128* v1) { *r = _mm_comigt_ss(*v0, *v1); } void MmComigeSs(int* r, __m128* v0, __m128* v1) { *r = _mm_comige_ss(*v0, *v1); } void MmComineqSs(int* r, __m128* v0, __m128* v1) { *r = _mm_comineq_ss(*v0, *v1); } void MmUcomieqSs(int* r, __m128* v0, __m128* v1) { *r = _mm_ucomieq_ss(*v0, *v1); } void MmUcomiltSs(int* r, __m128* v0, __m128* v1) { *r = _mm_ucomilt_ss(*v0, *v1); } void MmUcomileSs(int* r, __m128* v0, __m128* v1) { *r = _mm_ucomile_ss(*v0, *v1); } void MmUcomigtSs(int* r, __m128* v0, __m128* v1) { *r = _mm_ucomigt_ss(*v0, *v1); } void MmUcomigeSs(int* r, __m128* v0, __m128* v1) { *r = _mm_ucomige_ss(*v0, *v1); } void MmUcomineqSs(int* r, __m128* v0, __m128* v1) { *r = _mm_ucomineq_ss(*v0, *v1); } void MmCvtssSi32(int* r, __m128* v0) { *r = _mm_cvtss_si32(*v0); } void MmCvtSs2Si(int* r, __m128* v0) { *r = _mm_cvt_ss2si(*v0); } void 
MmCvtssSi64(long long* r, __m128* v0) { *r = _mm_cvtss_si64(*v0); } void MmCvttssSi32(int* r, __m128* v0) { *r = _mm_cvttss_si32(*v0); } void MmCvttSs2Si(int* r, __m128* v0) { *r = _mm_cvtt_ss2si(*v0); } void MmCvttssSi64(long long* r, __m128* v0) { *r = _mm_cvttss_si64(*v0); } void MmCvtsi32Ss(__m128* r, __m128* v0, int* v1) { *r = _mm_cvtsi32_ss(*v0, *v1); } void MmCvtSi2Ss(__m128* r, __m128* v0, int* v1) { *r = _mm_cvt_si2ss(*v0, *v1); } void MmCvtsi64Ss(__m128* r, __m128* v0, long long* v1) { *r = _mm_cvtsi64_ss(*v0, *v1); } void MmCvtssF32(float* r, __m128* v0) { *r = _mm_cvtss_f32(*v0); } void MmUndefinedPs(__m128* r) { *r = _mm_undefined_ps(); } void MmSetSs(__m128* r, float* v0) { *r = _mm_set_ss(*v0); } void MmSet1Ps(__m128* r, float* v0) { *r = _mm_set1_ps(*v0); } void MmSetPs1(__m128* r, float* v0) { *r = _mm_set_ps1(*v0); } void MmSetPs(__m128* r, float* v0, float* v1, float* v2, float* v3) { *r = _mm_set_ps(*v0, *v1, *v2, *v3); } void MmSetrPs(__m128* r, float* v0, float* v1, float* v2, float* v3) { *r = _mm_setr_ps(*v0, *v1, *v2, *v3); } void MmSetzeroPs(__m128* r) { *r = _mm_setzero_ps(); } void MmUnpackhiPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_unpackhi_ps(*v0, *v1); } void MmUnpackloPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_unpacklo_ps(*v0, *v1); } void MmMoveSs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_move_ss(*v0, *v1); } void MmMovehlPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_movehl_ps(*v0, *v1); } void MmMovelhPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_movelh_ps(*v0, *v1); } void MmMovemaskPs(int* r, __m128* v0) { *r = _mm_movemask_ps(*v0); } ================================================ FILE: x86/sse/functions.go ================================================ package sse import ( "github.com/alivanz/go-simd/x86" ) /* #cgo CFLAGS: -msse #include */ import "C" // Add the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 
packed elements from "a" to the upper elements of "dst". // //go:linkname MmAddSs MmAddSs //go:noescape func MmAddSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Add packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". // //go:linkname MmAddPs MmAddPs //go:noescape func MmAddPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Subtract the lower single-precision (32-bit) floating-point element in "b" from the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmSubSs MmSubSs //go:noescape func MmSubSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Subtract packed single-precision (32-bit) floating-point elements in "b" from packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". // //go:linkname MmSubPs MmSubPs //go:noescape func MmSubPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Multiply the lower single-precision (32-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmMulSs MmMulSs //go:noescape func MmMulSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Multiply packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". // //go:linkname MmMulPs MmMulPs //go:noescape func MmMulPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Divide the lower single-precision (32-bit) floating-point element in "a" by the lower single-precision (32-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". 
// //go:linkname MmDivSs MmDivSs //go:noescape func MmDivSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Divide packed single-precision (32-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst". // //go:linkname MmDivPs MmDivPs //go:noescape func MmDivPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compute the square root of the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmSqrtSs MmSqrtSs //go:noescape func MmSqrtSs(r *x86.M128, v0 *x86.M128) // Compute the square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". // //go:linkname MmSqrtPs MmSqrtPs //go:noescape func MmSqrtPs(r *x86.M128, v0 *x86.M128) // Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12. // //go:linkname MmRcpSs MmRcpSs //go:noescape func MmRcpSs(r *x86.M128, v0 *x86.M128) // Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12. // //go:linkname MmRcpPs MmRcpPs //go:noescape func MmRcpPs(r *x86.M128, v0 *x86.M128) // Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". The maximum relative error for this approximation is less than 1.5*2^-12. 
// //go:linkname MmRsqrtSs MmRsqrtSs //go:noescape func MmRsqrtSs(r *x86.M128, v0 *x86.M128) // Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in "a", and store the results in "dst". The maximum relative error for this approximation is less than 1.5*2^-12. // //go:linkname MmRsqrtPs MmRsqrtPs //go:noescape func MmRsqrtPs(r *x86.M128, v0 *x86.M128) // Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [min_float_note] // //go:linkname MmMinSs MmMinSs //go:noescape func MmMinSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". [min_float_note] // //go:linkname MmMinPs MmMinPs //go:noescape func MmMinPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". [max_float_note] // //go:linkname MmMaxSs MmMaxSs //go:noescape func MmMaxSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare packed single-precision (32-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [max_float_note] // //go:linkname MmMaxPs MmMaxPs //go:noescape func MmMaxPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". // //go:linkname MmAndPs MmAndPs //go:noescape func MmAndPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst".
// //go:linkname MmAndnotPs MmAndnotPs //go:noescape func MmAndnotPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". // //go:linkname MmOrPs MmOrPs //go:noescape func MmOrPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in "a" and "b", and store the results in "dst". // //go:linkname MmXorPs MmXorPs //go:noescape func MmXorPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for equality, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmCmpeqSs MmCmpeqSs //go:noescape func MmCmpeqSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for equality, and store the results in "dst". // //go:linkname MmCmpeqPs MmCmpeqPs //go:noescape func MmCmpeqPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for less-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmCmpltSs MmCmpltSs //go:noescape func MmCmpltSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than, and store the results in "dst". // //go:linkname MmCmpltPs MmCmpltPs //go:noescape func MmCmpltPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for less-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". 
// //go:linkname MmCmpleSs MmCmpleSs //go:noescape func MmCmpleSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in "dst". // //go:linkname MmCmplePs MmCmplePs //go:noescape func MmCmplePs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for greater-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmCmpgtSs MmCmpgtSs //go:noescape func MmCmpgtSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for greater-than, and store the results in "dst". // //go:linkname MmCmpgtPs MmCmpgtPs //go:noescape func MmCmpgtPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for greater-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmCmpgeSs MmCmpgeSs //go:noescape func MmCmpgeSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for greater-than-or-equal, and store the results in "dst". // //go:linkname MmCmpgePs MmCmpgePs //go:noescape func MmCmpgePs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmCmpneqSs MmCmpneqSs //go:noescape func MmCmpneqSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-equal, and store the results in "dst". 
// //go:linkname MmCmpneqPs MmCmpneqPs //go:noescape func MmCmpneqPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmCmpnltSs MmCmpnltSs //go:noescape func MmCmpnltSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in "dst". // //go:linkname MmCmpnltPs MmCmpnltPs //go:noescape func MmCmpnltPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmCmpnleSs MmCmpnleSs //go:noescape func MmCmpnleSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in "dst". // //go:linkname MmCmpnlePs MmCmpnlePs //go:noescape func MmCmpnlePs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmCmpngtSs MmCmpngtSs //go:noescape func MmCmpngtSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than, and store the results in "dst". 
// //go:linkname MmCmpngtPs MmCmpngtPs //go:noescape func MmCmpngtPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmCmpngeSs MmCmpngeSs //go:noescape func MmCmpngeSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare packed single-precision (32-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, and store the results in "dst". // //go:linkname MmCmpngePs MmCmpngePs //go:noescape func MmCmpngePs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" to see if neither is NaN, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmCmpordSs MmCmpordSs //go:noescape func MmCmpordSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in "dst". // //go:linkname MmCmpordPs MmCmpordPs //go:noescape func MmCmpordPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point elements in "a" and "b" to see if either is NaN, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmCmpunordSs MmCmpunordSs //go:noescape func MmCmpunordSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare packed single-precision (32-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in "dst". 
// //go:linkname MmCmpunordPs MmCmpunordPs //go:noescape func MmCmpunordPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1). // //go:linkname MmComieqSs MmComieqSs //go:noescape func MmComieqSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1). // //go:linkname MmComiltSs MmComiltSs //go:noescape func MmComiltSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1). // //go:linkname MmComileSs MmComileSs //go:noescape func MmComileSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1). // //go:linkname MmComigtSs MmComigtSs //go:noescape func MmComigtSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1). // //go:linkname MmComigeSs MmComigeSs //go:noescape func MmComigeSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1). // //go:linkname MmComineqSs MmComineqSs //go:noescape func MmComineqSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. 
// //go:linkname MmUcomieqSs MmUcomieqSs //go:noescape func MmUcomieqSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. // //go:linkname MmUcomiltSs MmUcomiltSs //go:noescape func MmUcomiltSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. // //go:linkname MmUcomileSs MmUcomileSs //go:noescape func MmUcomileSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. // //go:linkname MmUcomigtSs MmUcomigtSs //go:noescape func MmUcomigtSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. // //go:linkname MmUcomigeSs MmUcomigeSs //go:noescape func MmUcomigeSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128) // Compare the lower single-precision (32-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. // //go:linkname MmUcomineqSs MmUcomineqSs //go:noescape func MmUcomineqSs(r *x86.Int, v0 *x86.M128, v1 *x86.M128) // Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst". 
// //go:linkname MmCvtssSi32 MmCvtssSi32 //go:noescape func MmCvtssSi32(r *x86.Int, v0 *x86.M128) // Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst". // //go:linkname MmCvtSs2Si MmCvtSs2Si //go:noescape func MmCvtSs2Si(r *x86.Int, v0 *x86.M128) // Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst". // //go:linkname MmCvtssSi64 MmCvtssSi64 //go:noescape func MmCvtssSi64(r *x86.Longlong, v0 *x86.M128) // Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst". // //go:linkname MmCvttssSi32 MmCvttssSi32 //go:noescape func MmCvttssSi32(r *x86.Int, v0 *x86.M128) // Convert the lower single-precision (32-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst". // //go:linkname MmCvttSs2Si MmCvttSs2Si //go:noescape func MmCvttSs2Si(r *x86.Int, v0 *x86.M128) // Convert the lower single-precision (32-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst". // //go:linkname MmCvttssSi64 MmCvttssSi64 //go:noescape func MmCvttssSi64(r *x86.Longlong, v0 *x86.M128) // Convert the signed 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmCvtsi32Ss MmCvtsi32Ss //go:noescape func MmCvtsi32Ss(r *x86.M128, v0 *x86.M128, v1 *x86.Int) // Convert the signed 32-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". 
// //go:linkname MmCvtSi2Ss MmCvtSi2Ss //go:noescape func MmCvtSi2Ss(r *x86.M128, v0 *x86.M128, v1 *x86.Int) // Convert the signed 64-bit integer "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmCvtsi64Ss MmCvtsi64Ss //go:noescape func MmCvtsi64Ss(r *x86.M128, v0 *x86.M128, v1 *x86.Longlong) // Copy the lower single-precision (32-bit) floating-point element of "a" to "dst". // //go:linkname MmCvtssF32 MmCvtssF32 //go:noescape func MmCvtssF32(r *x86.Float, v0 *x86.M128) // Return vector of type __m128 with undefined elements. // //go:linkname MmUndefinedPs MmUndefinedPs //go:noescape func MmUndefinedPs(r *x86.M128) // Copy single-precision (32-bit) floating-point element "a" to the lower element of "dst", and zero the upper 3 elements. // //go:linkname MmSetSs MmSetSs //go:noescape func MmSetSs(r *x86.M128, v0 *x86.Float) // Broadcast single-precision (32-bit) floating-point value "a" to all elements of "dst". // //go:linkname MmSet1Ps MmSet1Ps //go:noescape func MmSet1Ps(r *x86.M128, v0 *x86.Float) // Broadcast single-precision (32-bit) floating-point value "a" to all elements of "dst". // //go:linkname MmSetPs1 MmSetPs1 //go:noescape func MmSetPs1(r *x86.M128, v0 *x86.Float) // Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values. // //go:linkname MmSetPs MmSetPs //go:noescape func MmSetPs(r *x86.M128, v0 *x86.Float, v1 *x86.Float, v2 *x86.Float, v3 *x86.Float) // Set packed single-precision (32-bit) floating-point elements in "dst" with the supplied values in reverse order. // //go:linkname MmSetrPs MmSetrPs //go:noescape func MmSetrPs(r *x86.M128, v0 *x86.Float, v1 *x86.Float, v2 *x86.Float, v3 *x86.Float) // Return vector of type __m128 with all elements set to zero.
// //go:linkname MmSetzeroPs MmSetzeroPs //go:noescape func MmSetzeroPs(r *x86.M128) // Unpack and interleave single-precision (32-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst". // //go:linkname MmUnpackhiPs MmUnpackhiPs //go:noescape func MmUnpackhiPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Unpack and interleave single-precision (32-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst". // //go:linkname MmUnpackloPs MmUnpackloPs //go:noescape func MmUnpackloPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Move the lower single-precision (32-bit) floating-point element from "b" to the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmMoveSs MmMoveSs //go:noescape func MmMoveSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Move the upper 2 single-precision (32-bit) floating-point elements from "b" to the lower 2 elements of "dst", and copy the upper 2 elements from "a" to the upper 2 elements of "dst". // //go:linkname MmMovehlPs MmMovehlPs //go:noescape func MmMovehlPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Move the lower 2 single-precision (32-bit) floating-point elements from "b" to the upper 2 elements of "dst", and copy the lower 2 elements from "a" to the lower 2 elements of "dst". // //go:linkname MmMovelhPs MmMovelhPs //go:noescape func MmMovelhPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Set each bit of mask "dst" based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in "a".
//
//go:linkname MmMovemaskPs MmMovemaskPs
//go:noescape
func MmMovemaskPs(r *x86.Int, v0 *x86.M128)

================================================
FILE: x86/sse2/functions.c
================================================
#include

void MmAddSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_add_sd(*v0, *v1); }
void MmAddPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_add_pd(*v0, *v1); }
void MmSubSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_sub_sd(*v0, *v1); }
void MmSubPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_sub_pd(*v0, *v1); }
void MmMulSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_mul_sd(*v0, *v1); }
void MmMulPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_mul_pd(*v0, *v1); }
void MmDivSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_div_sd(*v0, *v1); }
void MmDivPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_div_pd(*v0, *v1); }
void MmSqrtSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_sqrt_sd(*v0, *v1); }
void MmSqrtPd(__m128d* r, __m128d* v0) { *r = _mm_sqrt_pd(*v0); }
void MmMinSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_min_sd(*v0, *v1); }
void MmMinPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_min_pd(*v0, *v1); }
void MmMaxSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_max_sd(*v0, *v1); }
void MmMaxPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_max_pd(*v0, *v1); }
void MmAndPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_and_pd(*v0, *v1); }
void MmAndnotPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_andnot_pd(*v0, *v1); }
void MmOrPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_or_pd(*v0, *v1); }
void MmXorPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_xor_pd(*v0, *v1); }
void MmCmpeqPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpeq_pd(*v0, *v1); }
void MmCmpltPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmplt_pd(*v0, *v1); }
void MmCmplePd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmple_pd(*v0, *v1); }
void MmCmpgtPd(__m128d* r, __m128d* v0,
__m128d* v1) { *r = _mm_cmpgt_pd(*v0, *v1); } void MmCmpgePd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpge_pd(*v0, *v1); } void MmCmpordPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpord_pd(*v0, *v1); } void MmCmpunordPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpunord_pd(*v0, *v1); } void MmCmpneqPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpneq_pd(*v0, *v1); } void MmCmpnltPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpnlt_pd(*v0, *v1); } void MmCmpnlePd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpnle_pd(*v0, *v1); } void MmCmpngtPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpngt_pd(*v0, *v1); } void MmCmpngePd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpnge_pd(*v0, *v1); } void MmCmpeqSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpeq_sd(*v0, *v1); } void MmCmpltSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmplt_sd(*v0, *v1); } void MmCmpleSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmple_sd(*v0, *v1); } void MmCmpgtSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpgt_sd(*v0, *v1); } void MmCmpgeSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpge_sd(*v0, *v1); } void MmCmpordSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpord_sd(*v0, *v1); } void MmCmpunordSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpunord_sd(*v0, *v1); } void MmCmpneqSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpneq_sd(*v0, *v1); } void MmCmpnltSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpnlt_sd(*v0, *v1); } void MmCmpnleSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpnle_sd(*v0, *v1); } void MmCmpngtSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpngt_sd(*v0, *v1); } void MmCmpngeSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_cmpnge_sd(*v0, *v1); } void MmComieqSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_comieq_sd(*v0, *v1); } void MmComiltSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_comilt_sd(*v0, *v1); } void MmComileSd(int* r, 
__m128d* v0, __m128d* v1) { *r = _mm_comile_sd(*v0, *v1); } void MmComigtSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_comigt_sd(*v0, *v1); } void MmComigeSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_comige_sd(*v0, *v1); } void MmComineqSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_comineq_sd(*v0, *v1); } void MmUcomieqSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_ucomieq_sd(*v0, *v1); } void MmUcomiltSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_ucomilt_sd(*v0, *v1); } void MmUcomileSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_ucomile_sd(*v0, *v1); } void MmUcomigtSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_ucomigt_sd(*v0, *v1); } void MmUcomigeSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_ucomige_sd(*v0, *v1); } void MmUcomineqSd(int* r, __m128d* v0, __m128d* v1) { *r = _mm_ucomineq_sd(*v0, *v1); } void MmCvtpdPs(__m128* r, __m128d* v0) { *r = _mm_cvtpd_ps(*v0); } void MmCvtpsPd(__m128d* r, __m128* v0) { *r = _mm_cvtps_pd(*v0); } void MmCvtepi32Pd(__m128d* r, __m128i* v0) { *r = _mm_cvtepi32_pd(*v0); } void MmCvtpdEpi32(__m128i* r, __m128d* v0) { *r = _mm_cvtpd_epi32(*v0); } void MmCvtsdSi32(int* r, __m128d* v0) { *r = _mm_cvtsd_si32(*v0); } void MmCvtsdSs(__m128* r, __m128* v0, __m128d* v1) { *r = _mm_cvtsd_ss(*v0, *v1); } void MmCvtsi32Sd(__m128d* r, __m128d* v0, int* v1) { *r = _mm_cvtsi32_sd(*v0, *v1); } void MmCvtssSd(__m128d* r, __m128d* v0, __m128* v1) { *r = _mm_cvtss_sd(*v0, *v1); } void MmCvttpdEpi32(__m128i* r, __m128d* v0) { *r = _mm_cvttpd_epi32(*v0); } void MmCvttsdSi32(int* r, __m128d* v0) { *r = _mm_cvttsd_si32(*v0); } void MmCvtsdF64(double* r, __m128d* v0) { *r = _mm_cvtsd_f64(*v0); } void MmUndefinedPd(__m128d* r) { *r = _mm_undefined_pd(); } void MmSetSd(__m128d* r, double* v0) { *r = _mm_set_sd(*v0); } void MmSet1Pd(__m128d* r, double* v0) { *r = _mm_set1_pd(*v0); } void MmSetPd1(__m128d* r, double* v0) { *r = _mm_set_pd1(*v0); } void MmSetPd(__m128d* r, double* v0, double* v1) { *r = _mm_set_pd(*v0, *v1); } void 
MmSetrPd(__m128d* r, double* v0, double* v1) { *r = _mm_setr_pd(*v0, *v1); } void MmSetzeroPd(__m128d* r) { *r = _mm_setzero_pd(); } void MmMoveSd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_move_sd(*v0, *v1); } void MmAddEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_add_epi8(*v0, *v1); } void MmAddEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_add_epi16(*v0, *v1); } void MmAddEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_add_epi32(*v0, *v1); } void MmAddEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_add_epi64(*v0, *v1); } void MmAddsEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_adds_epi8(*v0, *v1); } void MmAddsEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_adds_epi16(*v0, *v1); } void MmAddsEpu8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_adds_epu8(*v0, *v1); } void MmAddsEpu16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_adds_epu16(*v0, *v1); } void MmAvgEpu8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_avg_epu8(*v0, *v1); } void MmAvgEpu16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_avg_epu16(*v0, *v1); } void MmMaddEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_madd_epi16(*v0, *v1); } void MmMaxEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_max_epi16(*v0, *v1); } void MmMaxEpu8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_max_epu8(*v0, *v1); } void MmMinEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_min_epi16(*v0, *v1); } void MmMinEpu8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_min_epu8(*v0, *v1); } void MmMulhiEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_mulhi_epi16(*v0, *v1); } void MmMulhiEpu16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_mulhi_epu16(*v0, *v1); } void MmMulloEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_mullo_epi16(*v0, *v1); } void MmMulEpu32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_mul_epu32(*v0, *v1); } void MmSadEpu8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sad_epu8(*v0, *v1); } void 
MmSubEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sub_epi8(*v0, *v1); } void MmSubEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sub_epi16(*v0, *v1); } void MmSubEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sub_epi32(*v0, *v1); } void MmSubEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sub_epi64(*v0, *v1); } void MmSubsEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_subs_epi8(*v0, *v1); } void MmSubsEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_subs_epi16(*v0, *v1); } void MmSubsEpu8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_subs_epu8(*v0, *v1); } void MmSubsEpu16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_subs_epu16(*v0, *v1); } void MmAndSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_and_si128(*v0, *v1); } void MmAndnotSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_andnot_si128(*v0, *v1); } void MmOrSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_or_si128(*v0, *v1); } void MmXorSi128(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_xor_si128(*v0, *v1); } void MmSlliEpi16(__m128i* r, __m128i* v0, int* v1) { *r = _mm_slli_epi16(*v0, *v1); } void MmSllEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sll_epi16(*v0, *v1); } void MmSlliEpi32(__m128i* r, __m128i* v0, int* v1) { *r = _mm_slli_epi32(*v0, *v1); } void MmSllEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sll_epi32(*v0, *v1); } void MmSlliEpi64(__m128i* r, __m128i* v0, int* v1) { *r = _mm_slli_epi64(*v0, *v1); } void MmSllEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sll_epi64(*v0, *v1); } void MmSraiEpi16(__m128i* r, __m128i* v0, int* v1) { *r = _mm_srai_epi16(*v0, *v1); } void MmSraEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sra_epi16(*v0, *v1); } void MmSraiEpi32(__m128i* r, __m128i* v0, int* v1) { *r = _mm_srai_epi32(*v0, *v1); } void MmSraEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sra_epi32(*v0, *v1); } void MmSrliEpi16(__m128i* r, __m128i* v0, int* v1) { *r = 
_mm_srli_epi16(*v0, *v1); } void MmSrlEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_srl_epi16(*v0, *v1); } void MmSrliEpi32(__m128i* r, __m128i* v0, int* v1) { *r = _mm_srli_epi32(*v0, *v1); } void MmSrlEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_srl_epi32(*v0, *v1); } void MmSrliEpi64(__m128i* r, __m128i* v0, int* v1) { *r = _mm_srli_epi64(*v0, *v1); } void MmSrlEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_srl_epi64(*v0, *v1); } void MmCmpeqEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmpeq_epi8(*v0, *v1); } void MmCmpeqEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmpeq_epi16(*v0, *v1); } void MmCmpeqEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmpeq_epi32(*v0, *v1); } void MmCmpgtEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmpgt_epi8(*v0, *v1); } void MmCmpgtEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmpgt_epi16(*v0, *v1); } void MmCmpgtEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmpgt_epi32(*v0, *v1); } void MmCmpltEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmplt_epi8(*v0, *v1); } void MmCmpltEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmplt_epi16(*v0, *v1); } void MmCmpltEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_cmplt_epi32(*v0, *v1); } void MmCvtsi64Sd(__m128d* r, __m128d* v0, long long* v1) { *r = _mm_cvtsi64_sd(*v0, *v1); } void MmCvtsdSi64(long long* r, __m128d* v0) { *r = _mm_cvtsd_si64(*v0); } void MmCvttsdSi64(long long* r, __m128d* v0) { *r = _mm_cvttsd_si64(*v0); } void MmCvtepi32Ps(__m128* r, __m128i* v0) { *r = _mm_cvtepi32_ps(*v0); } void MmCvtpsEpi32(__m128i* r, __m128* v0) { *r = _mm_cvtps_epi32(*v0); } void MmCvttpsEpi32(__m128i* r, __m128* v0) { *r = _mm_cvttps_epi32(*v0); } void MmCvtsi32Si128(__m128i* r, int* v0) { *r = _mm_cvtsi32_si128(*v0); } void MmCvtsi64Si128(__m128i* r, long long* v0) { *r = _mm_cvtsi64_si128(*v0); } void MmCvtsi128Si32(int* r, __m128i* v0) { *r = _mm_cvtsi128_si32(*v0); } void 
MmCvtsi128Si64(long long* r, __m128i* v0) { *r = _mm_cvtsi128_si64(*v0); } void MmUndefinedSi128(__m128i* r) { *r = _mm_undefined_si128(); } void MmSetEpi64X(__m128i* r, long long* v0, long long* v1) { *r = _mm_set_epi64x(*v0, *v1); } void MmSetEpi64(__m128i* r, __m64* v0, __m64* v1) { *r = _mm_set_epi64(*v0, *v1); } void MmSetEpi32(__m128i* r, int* v0, int* v1, int* v2, int* v3) { *r = _mm_set_epi32(*v0, *v1, *v2, *v3); } void MmSetEpi16(__m128i* r, short* v0, short* v1, short* v2, short* v3, short* v4, short* v5, short* v6, short* v7) { *r = _mm_set_epi16(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); } void MmSetEpi8(__m128i* r, char* v0, char* v1, char* v2, char* v3, char* v4, char* v5, char* v6, char* v7, char* v8, char* v9, char* v10, char* v11, char* v12, char* v13, char* v14, char* v15) { *r = _mm_set_epi8(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7, *v8, *v9, *v10, *v11, *v12, *v13, *v14, *v15); } void MmSet1Epi64X(__m128i* r, long long* v0) { *r = _mm_set1_epi64x(*v0); } void MmSet1Epi64(__m128i* r, __m64* v0) { *r = _mm_set1_epi64(*v0); } void MmSet1Epi32(__m128i* r, int* v0) { *r = _mm_set1_epi32(*v0); } void MmSet1Epi16(__m128i* r, short* v0) { *r = _mm_set1_epi16(*v0); } void MmSet1Epi8(__m128i* r, char* v0) { *r = _mm_set1_epi8(*v0); } void MmSetrEpi64(__m128i* r, __m64* v0, __m64* v1) { *r = _mm_setr_epi64(*v0, *v1); } void MmSetrEpi32(__m128i* r, int* v0, int* v1, int* v2, int* v3) { *r = _mm_setr_epi32(*v0, *v1, *v2, *v3); } void MmSetrEpi16(__m128i* r, short* v0, short* v1, short* v2, short* v3, short* v4, short* v5, short* v6, short* v7) { *r = _mm_setr_epi16(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7); } void MmSetrEpi8(__m128i* r, char* v0, char* v1, char* v2, char* v3, char* v4, char* v5, char* v6, char* v7, char* v8, char* v9, char* v10, char* v11, char* v12, char* v13, char* v14, char* v15) { *r = _mm_setr_epi8(*v0, *v1, *v2, *v3, *v4, *v5, *v6, *v7, *v8, *v9, *v10, *v11, *v12, *v13, *v14, *v15); } void MmSetzeroSi128(__m128i* r) { *r = 
_mm_setzero_si128(); } void MmPacksEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_packs_epi16(*v0, *v1); } void MmPacksEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_packs_epi32(*v0, *v1); } void MmPackusEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_packus_epi16(*v0, *v1); } void MmMovemaskEpi8(int* r, __m128i* v0) { *r = _mm_movemask_epi8(*v0); } void MmUnpackhiEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpackhi_epi8(*v0, *v1); } void MmUnpackhiEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpackhi_epi16(*v0, *v1); } void MmUnpackhiEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpackhi_epi32(*v0, *v1); } void MmUnpackhiEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpackhi_epi64(*v0, *v1); } void MmUnpackloEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpacklo_epi8(*v0, *v1); } void MmUnpackloEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpacklo_epi16(*v0, *v1); } void MmUnpackloEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpacklo_epi32(*v0, *v1); } void MmUnpackloEpi64(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_unpacklo_epi64(*v0, *v1); } void MmMovepi64Pi64(__m64* r, __m128i* v0) { *r = _mm_movepi64_pi64(*v0); } void MmMovpi64Epi64(__m128i* r, __m64* v0) { *r = _mm_movpi64_epi64(*v0); } void MmMoveEpi64(__m128i* r, __m128i* v0) { *r = _mm_move_epi64(*v0); } void MmUnpackhiPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_unpackhi_pd(*v0, *v1); } void MmUnpackloPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_unpacklo_pd(*v0, *v1); } void MmMovemaskPd(int* r, __m128d* v0) { *r = _mm_movemask_pd(*v0); } void MmCastpdPs(__m128* r, __m128d* v0) { *r = _mm_castpd_ps(*v0); } void MmCastpdSi128(__m128i* r, __m128d* v0) { *r = _mm_castpd_si128(*v0); } void MmCastpsPd(__m128d* r, __m128* v0) { *r = _mm_castps_pd(*v0); } void MmCastpsSi128(__m128i* r, __m128* v0) { *r = _mm_castps_si128(*v0); } void MmCastsi128Ps(__m128* r, __m128i* v0) { *r = _mm_castsi128_ps(*v0); } 
void MmCastsi128Pd(__m128d* r, __m128i* v0) { *r = _mm_castsi128_pd(*v0); }

================================================
FILE: x86/sse2/functions.go
================================================
package sse2

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -msse2
#include
*/
import "C"

// Add the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmAddSd MmAddSd
//go:noescape
func MmAddSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Add packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
//
//go:linkname MmAddPd MmAddPd
//go:noescape
func MmAddPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Subtract the lower double-precision (64-bit) floating-point element in "b" from the lower double-precision (64-bit) floating-point element in "a", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmSubSd MmSubSd
//go:noescape
func MmSubSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Subtract packed double-precision (64-bit) floating-point elements in "b" from packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst".
//
//go:linkname MmSubPd MmSubPd
//go:noescape
func MmSubPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Multiply the lower double-precision (64-bit) floating-point element in "a" and "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst".
//
//go:linkname MmMulSd MmMulSd
//go:noescape
func MmMulSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D)

// Multiply packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst".
// //go:linkname MmMulPd MmMulPd //go:noescape func MmMulPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Divide the lower double-precision (64-bit) floating-point element in "a" by the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". // //go:linkname MmDivSd MmDivSd //go:noescape func MmDivSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Divide packed double-precision (64-bit) floating-point elements in "a" by packed elements in "b", and store the results in "dst". // //go:linkname MmDivPd MmDivPd //go:noescape func MmDivPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compute the square root of the lower double-precision (64-bit) floating-point element in "b", store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". // //go:linkname MmSqrtSd MmSqrtSd //go:noescape func MmSqrtSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compute the square root of packed double-precision (64-bit) floating-point elements in "a", and store the results in "dst". // //go:linkname MmSqrtPd MmSqrtPd //go:noescape func MmSqrtPd(r *x86.M128D, v0 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the minimum value in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [min_float_note] // //go:linkname MmMinSd MmMinSd //go:noescape func MmMinSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed minimum values in "dst". 
[min_float_note] // //go:linkname MmMinPd MmMinPd //go:noescape func MmMinPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point elements in "a" and "b", store the maximum value in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". [max_float_note] // //go:linkname MmMaxSd MmMaxSd //go:noescape func MmMaxSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare packed double-precision (64-bit) floating-point elements in "a" and "b", and store packed maximum values in "dst". [max_float_note] // //go:linkname MmMaxPd MmMaxPd //go:noescape func MmMaxPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". // //go:linkname MmAndPd MmAndPd //go:noescape func MmAndPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in "a" and then AND with "b", and store the results in "dst". // //go:linkname MmAndnotPd MmAndnotPd //go:noescape func MmAndnotPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". // //go:linkname MmOrPd MmOrPd //go:noescape func MmOrPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in "a" and "b", and store the results in "dst". // //go:linkname MmXorPd MmXorPd //go:noescape func MmXorPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for equality, and store the results in "dst". 
// //go:linkname MmCmpeqPd MmCmpeqPd //go:noescape func MmCmpeqPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than, and store the results in "dst". // //go:linkname MmCmpltPd MmCmpltPd //go:noescape func MmCmpltPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for less-than-or-equal, and store the results in "dst". // //go:linkname MmCmplePd MmCmplePd //go:noescape func MmCmplePd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for greater-than, and store the results in "dst". // //go:linkname MmCmpgtPd MmCmpgtPd //go:noescape func MmCmpgtPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for greater-than-or-equal, and store the results in "dst". // //go:linkname MmCmpgePd MmCmpgePd //go:noescape func MmCmpgePd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if neither is NaN, and store the results in "dst". // //go:linkname MmCmpordPd MmCmpordPd //go:noescape func MmCmpordPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare packed double-precision (64-bit) floating-point elements in "a" and "b" to see if either is NaN, and store the results in "dst". // //go:linkname MmCmpunordPd MmCmpunordPd //go:noescape func MmCmpunordPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-equal, and store the results in "dst". // //go:linkname MmCmpneqPd MmCmpneqPd //go:noescape func MmCmpneqPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than, and store the results in "dst". 
// //go:linkname MmCmpnltPd MmCmpnltPd //go:noescape func MmCmpnltPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, and store the results in "dst". // //go:linkname MmCmpnlePd MmCmpnlePd //go:noescape func MmCmpnlePd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than, and store the results in "dst". // //go:linkname MmCmpngtPd MmCmpngtPd //go:noescape func MmCmpngtPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare packed double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, and store the results in "dst". // //go:linkname MmCmpngePd MmCmpngePd //go:noescape func MmCmpngePd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for equality, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". // //go:linkname MmCmpeqSd MmCmpeqSd //go:noescape func MmCmpeqSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for less-than, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". // //go:linkname MmCmpltSd MmCmpltSd //go:noescape func MmCmpltSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for less-than-or-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". 
// //go:linkname MmCmpleSd MmCmpleSd //go:noescape func MmCmpleSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for greater-than, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". // //go:linkname MmCmpgtSd MmCmpgtSd //go:noescape func MmCmpgtSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for greater-than-or-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". // //go:linkname MmCmpgeSd MmCmpgeSd //go:noescape func MmCmpgeSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" to see if neither is NaN, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". // //go:linkname MmCmpordSd MmCmpordSd //go:noescape func MmCmpordSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" to see if either is NaN, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". // //go:linkname MmCmpunordSd MmCmpunordSd //go:noescape func MmCmpunordSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". 
// //go:linkname MmCmpneqSd MmCmpneqSd //go:noescape func MmCmpneqSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". // //go:linkname MmCmpnltSd MmCmpnltSd //go:noescape func MmCmpnltSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-less-than-or-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". // //go:linkname MmCmpnleSd MmCmpnleSd //go:noescape func MmCmpnleSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". // //go:linkname MmCmpngtSd MmCmpngtSd //go:noescape func MmCmpngtSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point elements in "a" and "b" for not-greater-than-or-equal, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". // //go:linkname MmCmpngeSd MmCmpngeSd //go:noescape func MmCmpngeSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1). // //go:linkname MmComieqSd MmComieqSd //go:noescape func MmComieqSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1). 
// //go:linkname MmComiltSd MmComiltSd //go:noescape func MmComiltSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1). // //go:linkname MmComileSd MmComileSd //go:noescape func MmComileSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1). // //go:linkname MmComigtSd MmComigtSd //go:noescape func MmComigtSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1). // //go:linkname MmComigeSd MmComigeSd //go:noescape func MmComigeSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1). // //go:linkname MmComineqSd MmComineqSd //go:noescape func MmComineqSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for equality, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. // //go:linkname MmUcomieqSd MmUcomieqSd //go:noescape func MmUcomieqSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for less-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. // //go:linkname MmUcomiltSd MmUcomiltSd //go:noescape func MmUcomiltSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for less-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. 
// //go:linkname MmUcomileSd MmUcomileSd //go:noescape func MmUcomileSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for greater-than, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. // //go:linkname MmUcomigtSd MmUcomigtSd //go:noescape func MmUcomigtSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for greater-than-or-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. // //go:linkname MmUcomigeSd MmUcomigeSd //go:noescape func MmUcomigeSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D) // Compare the lower double-precision (64-bit) floating-point element in "a" and "b" for not-equal, and return the boolean result (0 or 1). This instruction will not signal an exception for QNaNs. // //go:linkname MmUcomineqSd MmUcomineqSd //go:noescape func MmUcomineqSd(r *x86.Int, v0 *x86.M128D, v1 *x86.M128D) // Convert packed double-precision (64-bit) floating-point elements in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". // //go:linkname MmCvtpdPs MmCvtpdPs //go:noescape func MmCvtpdPs(r *x86.M128, v0 *x86.M128D) // Convert packed single-precision (32-bit) floating-point elements in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". // //go:linkname MmCvtpsPd MmCvtpsPd //go:noescape func MmCvtpsPd(r *x86.M128D, v0 *x86.M128) // Convert packed signed 32-bit integers in "a" to packed double-precision (64-bit) floating-point elements, and store the results in "dst". // //go:linkname MmCvtepi32Pd MmCvtepi32Pd //go:noescape func MmCvtepi32Pd(r *x86.M128D, v0 *x86.M128I) // Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst". 
// //go:linkname MmCvtpdEpi32 MmCvtpdEpi32 //go:noescape func MmCvtpdEpi32(r *x86.M128I, v0 *x86.M128D) // Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer, and store the result in "dst". // //go:linkname MmCvtsdSi32 MmCvtsdSi32 //go:noescape func MmCvtsdSi32(r *x86.Int, v0 *x86.M128D) // Convert the lower double-precision (64-bit) floating-point element in "b" to a single-precision (32-bit) floating-point element, store the result in the lower element of "dst", and copy the upper 3 packed elements from "a" to the upper elements of "dst". // //go:linkname MmCvtsdSs MmCvtsdSs //go:noescape func MmCvtsdSs(r *x86.M128, v0 *x86.M128, v1 *x86.M128D) // Convert the signed 32-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". // //go:linkname MmCvtsi32Sd MmCvtsi32Sd //go:noescape func MmCvtsi32Sd(r *x86.M128D, v0 *x86.M128D, v1 *x86.Int) // Convert the lower single-precision (32-bit) floating-point element in "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". // //go:linkname MmCvtssSd MmCvtssSd //go:noescape func MmCvtssSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128) // Convert packed double-precision (64-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". // //go:linkname MmCvttpdEpi32 MmCvttpdEpi32 //go:noescape func MmCvttpdEpi32(r *x86.M128I, v0 *x86.M128D) // Convert the lower double-precision (64-bit) floating-point element in "a" to a 32-bit integer with truncation, and store the result in "dst". // //go:linkname MmCvttsdSi32 MmCvttsdSi32 //go:noescape func MmCvttsdSi32(r *x86.Int, v0 *x86.M128D) // Copy the lower double-precision (64-bit) floating-point element of "a" to "dst". 
// //go:linkname MmCvtsdF64 MmCvtsdF64 //go:noescape func MmCvtsdF64(r *x86.Double, v0 *x86.M128D) // Return vector of type __m128d with undefined elements. // //go:linkname MmUndefinedPd MmUndefinedPd //go:noescape func MmUndefinedPd(r *x86.M128D, ) // Copy double-precision (64-bit) floating-point element "a" to the lower element of "dst", and zero the upper element. // //go:linkname MmSetSd MmSetSd //go:noescape func MmSetSd(r *x86.M128D, v0 *x86.Double) // Broadcast double-precision (64-bit) floating-point value "a" to all elements of "dst". // //go:linkname MmSet1Pd MmSet1Pd //go:noescape func MmSet1Pd(r *x86.M128D, v0 *x86.Double) // Broadcast double-precision (64-bit) floating-point value "a" to all elements of "dst". // //go:linkname MmSetPd1 MmSetPd1 //go:noescape func MmSetPd1(r *x86.M128D, v0 *x86.Double) // Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values. // //go:linkname MmSetPd MmSetPd //go:noescape func MmSetPd(r *x86.M128D, v0 *x86.Double, v1 *x86.Double) // Set packed double-precision (64-bit) floating-point elements in "dst" with the supplied values in reverse order. // //go:linkname MmSetrPd MmSetrPd //go:noescape func MmSetrPd(r *x86.M128D, v0 *x86.Double, v1 *x86.Double) // Return vector of type __m128d with all elements set to zero. // //go:linkname MmSetzeroPd MmSetzeroPd //go:noescape func MmSetzeroPd(r *x86.M128D, ) // Move the lower double-precision (64-bit) floating-point element from "b" to the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". // //go:linkname MmMoveSd MmMoveSd //go:noescape func MmMoveSd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Add packed 8-bit integers in "a" and "b", and store the results in "dst". // //go:linkname MmAddEpi8 MmAddEpi8 //go:noescape func MmAddEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Add packed 16-bit integers in "a" and "b", and store the results in "dst". 
// //go:linkname MmAddEpi16 MmAddEpi16 //go:noescape func MmAddEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Add packed 32-bit integers in "a" and "b", and store the results in "dst". // //go:linkname MmAddEpi32 MmAddEpi32 //go:noescape func MmAddEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Add packed 64-bit integers in "a" and "b", and store the results in "dst". // //go:linkname MmAddEpi64 MmAddEpi64 //go:noescape func MmAddEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Add packed signed 8-bit integers in "a" and "b" using saturation, and store the results in "dst". // //go:linkname MmAddsEpi8 MmAddsEpi8 //go:noescape func MmAddsEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Add packed signed 16-bit integers in "a" and "b" using saturation, and store the results in "dst". // //go:linkname MmAddsEpi16 MmAddsEpi16 //go:noescape func MmAddsEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Add packed unsigned 8-bit integers in "a" and "b" using saturation, and store the results in "dst". // //go:linkname MmAddsEpu8 MmAddsEpu8 //go:noescape func MmAddsEpu8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Add packed unsigned 16-bit integers in "a" and "b" using saturation, and store the results in "dst". // //go:linkname MmAddsEpu16 MmAddsEpu16 //go:noescape func MmAddsEpu16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Average packed unsigned 8-bit integers in "a" and "b", and store the results in "dst". // //go:linkname MmAvgEpu8 MmAvgEpu8 //go:noescape func MmAvgEpu8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Average packed unsigned 16-bit integers in "a" and "b", and store the results in "dst". // //go:linkname MmAvgEpu16 MmAvgEpu16 //go:noescape func MmAvgEpu16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in "dst". 
// //go:linkname MmMaddEpi16 MmMaddEpi16 //go:noescape func MmMaddEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Compare packed signed 16-bit integers in "a" and "b", and store packed maximum values in "dst". // //go:linkname MmMaxEpi16 MmMaxEpi16 //go:noescape func MmMaxEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Compare packed unsigned 8-bit integers in "a" and "b", and store packed maximum values in "dst". // //go:linkname MmMaxEpu8 MmMaxEpu8 //go:noescape func MmMaxEpu8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Compare packed signed 16-bit integers in "a" and "b", and store packed minimum values in "dst". // //go:linkname MmMinEpi16 MmMinEpi16 //go:noescape func MmMinEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Compare packed unsigned 8-bit integers in "a" and "b", and store packed minimum values in "dst". // //go:linkname MmMinEpu8 MmMinEpu8 //go:noescape func MmMinEpu8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Multiply the packed signed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst". // //go:linkname MmMulhiEpi16 MmMulhiEpi16 //go:noescape func MmMulhiEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Multiply the packed unsigned 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the high 16 bits of the intermediate integers in "dst". // //go:linkname MmMulhiEpu16 MmMulhiEpu16 //go:noescape func MmMulhiEpu16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Multiply the packed 16-bit integers in "a" and "b", producing intermediate 32-bit integers, and store the low 16 bits of the intermediate integers in "dst". // //go:linkname MmMulloEpi16 MmMulloEpi16 //go:noescape func MmMulloEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Multiply the low unsigned 32-bit integers from each packed 64-bit element in "a" and "b", and store the unsigned 64-bit results in "dst". 
// //go:linkname MmMulEpu32 MmMulEpu32 //go:noescape func MmMulEpu32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Compute the absolute differences of packed unsigned 8-bit integers in "a" and "b", then horizontally sum each consecutive 8 differences to produce two unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in "dst". // //go:linkname MmSadEpu8 MmSadEpu8 //go:noescape func MmSadEpu8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Subtract packed 8-bit integers in "b" from packed 8-bit integers in "a", and store the results in "dst". // //go:linkname MmSubEpi8 MmSubEpi8 //go:noescape func MmSubEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Subtract packed 16-bit integers in "b" from packed 16-bit integers in "a", and store the results in "dst". // //go:linkname MmSubEpi16 MmSubEpi16 //go:noescape func MmSubEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Subtract packed 32-bit integers in "b" from packed 32-bit integers in "a", and store the results in "dst". // //go:linkname MmSubEpi32 MmSubEpi32 //go:noescape func MmSubEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Subtract packed 64-bit integers in "b" from packed 64-bit integers in "a", and store the results in "dst". // //go:linkname MmSubEpi64 MmSubEpi64 //go:noescape func MmSubEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Subtract packed signed 8-bit integers in "b" from packed 8-bit integers in "a" using saturation, and store the results in "dst". // //go:linkname MmSubsEpi8 MmSubsEpi8 //go:noescape func MmSubsEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Subtract packed signed 16-bit integers in "b" from packed 16-bit integers in "a" using saturation, and store the results in "dst". 
// //go:linkname MmSubsEpi16 MmSubsEpi16 //go:noescape func MmSubsEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Subtract packed unsigned 8-bit integers in "b" from packed unsigned 8-bit integers in "a" using saturation, and store the results in "dst". // //go:linkname MmSubsEpu8 MmSubsEpu8 //go:noescape func MmSubsEpu8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Subtract packed unsigned 16-bit integers in "b" from packed unsigned 16-bit integers in "a" using saturation, and store the results in "dst". // //go:linkname MmSubsEpu16 MmSubsEpu16 //go:noescape func MmSubsEpu16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Compute the bitwise AND of 128 bits (representing integer data) in "a" and "b", and store the result in "dst". // //go:linkname MmAndSi128 MmAndSi128 //go:noescape func MmAndSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Compute the bitwise NOT of 128 bits (representing integer data) in "a" and then AND with "b", and store the result in "dst". // //go:linkname MmAndnotSi128 MmAndnotSi128 //go:noescape func MmAndnotSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Compute the bitwise OR of 128 bits (representing integer data) in "a" and "b", and store the result in "dst". // //go:linkname MmOrSi128 MmOrSi128 //go:noescape func MmOrSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Compute the bitwise XOR of 128 bits (representing integer data) in "a" and "b", and store the result in "dst". // //go:linkname MmXorSi128 MmXorSi128 //go:noescape func MmXorSi128(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Shift packed 16-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". // //go:linkname MmSlliEpi16 MmSlliEpi16 //go:noescape func MmSlliEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int) // Shift packed 16-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". 
// //go:linkname MmSllEpi16 MmSllEpi16 //go:noescape func MmSllEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Shift packed 32-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". // //go:linkname MmSlliEpi32 MmSlliEpi32 //go:noescape func MmSlliEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int) // Shift packed 32-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". // //go:linkname MmSllEpi32 MmSllEpi32 //go:noescape func MmSllEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Shift packed 64-bit integers in "a" left by "imm8" while shifting in zeros, and store the results in "dst". // //go:linkname MmSlliEpi64 MmSlliEpi64 //go:noescape func MmSlliEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int) // Shift packed 64-bit integers in "a" left by "count" while shifting in zeros, and store the results in "dst". // //go:linkname MmSllEpi64 MmSllEpi64 //go:noescape func MmSllEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Shift packed 16-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst". // //go:linkname MmSraiEpi16 MmSraiEpi16 //go:noescape func MmSraiEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int) // Shift packed 16-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst". // //go:linkname MmSraEpi16 MmSraEpi16 //go:noescape func MmSraEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Shift packed 32-bit integers in "a" right by "imm8" while shifting in sign bits, and store the results in "dst". // //go:linkname MmSraiEpi32 MmSraiEpi32 //go:noescape func MmSraiEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int) // Shift packed 32-bit integers in "a" right by "count" while shifting in sign bits, and store the results in "dst". 
// //go:linkname MmSraEpi32 MmSraEpi32 //go:noescape func MmSraEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Shift packed 16-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". // //go:linkname MmSrliEpi16 MmSrliEpi16 //go:noescape func MmSrliEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int) // Shift packed 16-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". // //go:linkname MmSrlEpi16 MmSrlEpi16 //go:noescape func MmSrlEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Shift packed 32-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". // //go:linkname MmSrliEpi32 MmSrliEpi32 //go:noescape func MmSrliEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int) // Shift packed 32-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". // //go:linkname MmSrlEpi32 MmSrlEpi32 //go:noescape func MmSrlEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Shift packed 64-bit integers in "a" right by "imm8" while shifting in zeros, and store the results in "dst". // //go:linkname MmSrliEpi64 MmSrliEpi64 //go:noescape func MmSrliEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.Int) // Shift packed 64-bit integers in "a" right by "count" while shifting in zeros, and store the results in "dst". // //go:linkname MmSrlEpi64 MmSrlEpi64 //go:noescape func MmSrlEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Compare packed 8-bit integers in "a" and "b" for equality, and store the results in "dst". // //go:linkname MmCmpeqEpi8 MmCmpeqEpi8 //go:noescape func MmCmpeqEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Compare packed 16-bit integers in "a" and "b" for equality, and store the results in "dst". // //go:linkname MmCmpeqEpi16 MmCmpeqEpi16 //go:noescape func MmCmpeqEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Compare packed 32-bit integers in "a" and "b" for equality, and store the results in "dst". 
// //go:linkname MmCmpeqEpi32 MmCmpeqEpi32 //go:noescape func MmCmpeqEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Compare packed signed 8-bit integers in "a" and "b" for greater-than, and store the results in "dst". // //go:linkname MmCmpgtEpi8 MmCmpgtEpi8 //go:noescape func MmCmpgtEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Compare packed signed 16-bit integers in "a" and "b" for greater-than, and store the results in "dst". // //go:linkname MmCmpgtEpi16 MmCmpgtEpi16 //go:noescape func MmCmpgtEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Compare packed signed 32-bit integers in "a" and "b" for greater-than, and store the results in "dst". // //go:linkname MmCmpgtEpi32 MmCmpgtEpi32 //go:noescape func MmCmpgtEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Compare packed signed 8-bit integers in "a" and "b" for less-than, and store the results in "dst". Note: This intrinsic emits the pcmpgtb instruction with the order of the operands switched. // //go:linkname MmCmpltEpi8 MmCmpltEpi8 //go:noescape func MmCmpltEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Compare packed signed 16-bit integers in "a" and "b" for less-than, and store the results in "dst". Note: This intrinsic emits the pcmpgtw instruction with the order of the operands switched. // //go:linkname MmCmpltEpi16 MmCmpltEpi16 //go:noescape func MmCmpltEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Compare packed signed 32-bit integers in "a" and "b" for less-than, and store the results in "dst". Note: This intrinsic emits the pcmpgtd instruction with the order of the operands switched. // //go:linkname MmCmpltEpi32 MmCmpltEpi32 //go:noescape func MmCmpltEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Convert the signed 64-bit integer "b" to a double-precision (64-bit) floating-point element, store the result in the lower element of "dst", and copy the upper element from "a" to the upper element of "dst". 
// //go:linkname MmCvtsi64Sd MmCvtsi64Sd //go:noescape func MmCvtsi64Sd(r *x86.M128D, v0 *x86.M128D, v1 *x86.Longlong) // Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer, and store the result in "dst". // //go:linkname MmCvtsdSi64 MmCvtsdSi64 //go:noescape func MmCvtsdSi64(r *x86.Longlong, v0 *x86.M128D) // Convert the lower double-precision (64-bit) floating-point element in "a" to a 64-bit integer with truncation, and store the result in "dst". // //go:linkname MmCvttsdSi64 MmCvttsdSi64 //go:noescape func MmCvttsdSi64(r *x86.Longlong, v0 *x86.M128D) // Convert packed signed 32-bit integers in "a" to packed single-precision (32-bit) floating-point elements, and store the results in "dst". // //go:linkname MmCvtepi32Ps MmCvtepi32Ps //go:noescape func MmCvtepi32Ps(r *x86.M128, v0 *x86.M128I) // Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers, and store the results in "dst". // //go:linkname MmCvtpsEpi32 MmCvtpsEpi32 //go:noescape func MmCvtpsEpi32(r *x86.M128I, v0 *x86.M128) // Convert packed single-precision (32-bit) floating-point elements in "a" to packed 32-bit integers with truncation, and store the results in "dst". // //go:linkname MmCvttpsEpi32 MmCvttpsEpi32 //go:noescape func MmCvttpsEpi32(r *x86.M128I, v0 *x86.M128) // Copy 32-bit integer "a" to the lower elements of "dst", and zero the upper elements of "dst". // //go:linkname MmCvtsi32Si128 MmCvtsi32Si128 //go:noescape func MmCvtsi32Si128(r *x86.M128I, v0 *x86.Int) // Copy 64-bit integer "a" to the lower element of "dst", and zero the upper element. // //go:linkname MmCvtsi64Si128 MmCvtsi64Si128 //go:noescape func MmCvtsi64Si128(r *x86.M128I, v0 *x86.Longlong) // Copy the lower 32-bit integer in "a" to "dst". // //go:linkname MmCvtsi128Si32 MmCvtsi128Si32 //go:noescape func MmCvtsi128Si32(r *x86.Int, v0 *x86.M128I) // Copy the lower 64-bit integer in "a" to "dst". 
// //go:linkname MmCvtsi128Si64 MmCvtsi128Si64 //go:noescape func MmCvtsi128Si64(r *x86.Longlong, v0 *x86.M128I) // Return vector of type __m128i with undefined elements. // //go:linkname MmUndefinedSi128 MmUndefinedSi128 //go:noescape func MmUndefinedSi128(r *x86.M128I, ) // Set packed 64-bit integers in "dst" with the supplied values. // //go:linkname MmSetEpi64X MmSetEpi64X //go:noescape func MmSetEpi64X(r *x86.M128I, v0 *x86.Longlong, v1 *x86.Longlong) // Set packed 64-bit integers in "dst" with the supplied values. // //go:linkname MmSetEpi64 MmSetEpi64 //go:noescape func MmSetEpi64(r *x86.M128I, v0 *x86.M64, v1 *x86.M64) // Set packed 32-bit integers in "dst" with the supplied values. // //go:linkname MmSetEpi32 MmSetEpi32 //go:noescape func MmSetEpi32(r *x86.M128I, v0 *x86.Int, v1 *x86.Int, v2 *x86.Int, v3 *x86.Int) // Set packed 16-bit integers in "dst" with the supplied values. // //go:linkname MmSetEpi16 MmSetEpi16 //go:noescape func MmSetEpi16(r *x86.M128I, v0 *x86.Short, v1 *x86.Short, v2 *x86.Short, v3 *x86.Short, v4 *x86.Short, v5 *x86.Short, v6 *x86.Short, v7 *x86.Short) // Set packed 8-bit integers in "dst" with the supplied values. // //go:linkname MmSetEpi8 MmSetEpi8 //go:noescape func MmSetEpi8(r *x86.M128I, v0 *x86.Char, v1 *x86.Char, v2 *x86.Char, v3 *x86.Char, v4 *x86.Char, v5 *x86.Char, v6 *x86.Char, v7 *x86.Char, v8 *x86.Char, v9 *x86.Char, v10 *x86.Char, v11 *x86.Char, v12 *x86.Char, v13 *x86.Char, v14 *x86.Char, v15 *x86.Char) // Broadcast 64-bit integer "a" to all elements of "dst". This intrinsic may generate "vpbroadcastq". // //go:linkname MmSet1Epi64X MmSet1Epi64X //go:noescape func MmSet1Epi64X(r *x86.M128I, v0 *x86.Longlong) // Broadcast 64-bit integer "a" to all elements of "dst". // //go:linkname MmSet1Epi64 MmSet1Epi64 //go:noescape func MmSet1Epi64(r *x86.M128I, v0 *x86.M64) // Broadcast 32-bit integer "a" to all elements of "dst". This intrinsic may generate "vpbroadcastd".
// //go:linkname MmSet1Epi32 MmSet1Epi32 //go:noescape func MmSet1Epi32(r *x86.M128I, v0 *x86.Int) // Broadcast 16-bit integer "a" to all elements of "dst". This intrinsic may generate "vpbroadcastw". // //go:linkname MmSet1Epi16 MmSet1Epi16 //go:noescape func MmSet1Epi16(r *x86.M128I, v0 *x86.Short) // Broadcast 8-bit integer "a" to all elements of "dst". This intrinsic may generate "vpbroadcastb". // //go:linkname MmSet1Epi8 MmSet1Epi8 //go:noescape func MmSet1Epi8(r *x86.M128I, v0 *x86.Char) // Set packed 64-bit integers in "dst" with the supplied values in reverse order. // //go:linkname MmSetrEpi64 MmSetrEpi64 //go:noescape func MmSetrEpi64(r *x86.M128I, v0 *x86.M64, v1 *x86.M64) // Set packed 32-bit integers in "dst" with the supplied values in reverse order. // //go:linkname MmSetrEpi32 MmSetrEpi32 //go:noescape func MmSetrEpi32(r *x86.M128I, v0 *x86.Int, v1 *x86.Int, v2 *x86.Int, v3 *x86.Int) // Set packed 16-bit integers in "dst" with the supplied values in reverse order. // //go:linkname MmSetrEpi16 MmSetrEpi16 //go:noescape func MmSetrEpi16(r *x86.M128I, v0 *x86.Short, v1 *x86.Short, v2 *x86.Short, v3 *x86.Short, v4 *x86.Short, v5 *x86.Short, v6 *x86.Short, v7 *x86.Short) // Set packed 8-bit integers in "dst" with the supplied values in reverse order. // //go:linkname MmSetrEpi8 MmSetrEpi8 //go:noescape func MmSetrEpi8(r *x86.M128I, v0 *x86.Char, v1 *x86.Char, v2 *x86.Char, v3 *x86.Char, v4 *x86.Char, v5 *x86.Char, v6 *x86.Char, v7 *x86.Char, v8 *x86.Char, v9 *x86.Char, v10 *x86.Char, v11 *x86.Char, v12 *x86.Char, v13 *x86.Char, v14 *x86.Char, v15 *x86.Char) // Return vector of type __m128i with all elements set to zero. // //go:linkname MmSetzeroSi128 MmSetzeroSi128 //go:noescape func MmSetzeroSi128(r *x86.M128I, ) // Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using signed saturation, and store the results in "dst".
// //go:linkname MmPacksEpi16 MmPacksEpi16 //go:noescape func MmPacksEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Convert packed signed 32-bit integers from "a" and "b" to packed 16-bit integers using signed saturation, and store the results in "dst". // //go:linkname MmPacksEpi32 MmPacksEpi32 //go:noescape func MmPacksEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Convert packed signed 16-bit integers from "a" and "b" to packed 8-bit integers using unsigned saturation, and store the results in "dst". // //go:linkname MmPackusEpi16 MmPackusEpi16 //go:noescape func MmPackusEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Create mask from the most significant bit of each 8-bit element in "a", and store the result in "dst". // //go:linkname MmMovemaskEpi8 MmMovemaskEpi8 //go:noescape func MmMovemaskEpi8(r *x86.Int, v0 *x86.M128I) // Unpack and interleave 8-bit integers from the high half of "a" and "b", and store the results in "dst". // //go:linkname MmUnpackhiEpi8 MmUnpackhiEpi8 //go:noescape func MmUnpackhiEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Unpack and interleave 16-bit integers from the high half of "a" and "b", and store the results in "dst". // //go:linkname MmUnpackhiEpi16 MmUnpackhiEpi16 //go:noescape func MmUnpackhiEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Unpack and interleave 32-bit integers from the high half of "a" and "b", and store the results in "dst". // //go:linkname MmUnpackhiEpi32 MmUnpackhiEpi32 //go:noescape func MmUnpackhiEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Unpack and interleave 64-bit integers from the high half of "a" and "b", and store the results in "dst". // //go:linkname MmUnpackhiEpi64 MmUnpackhiEpi64 //go:noescape func MmUnpackhiEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Unpack and interleave 8-bit integers from the low half of "a" and "b", and store the results in "dst". 
// //go:linkname MmUnpackloEpi8 MmUnpackloEpi8 //go:noescape func MmUnpackloEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Unpack and interleave 16-bit integers from the low half of "a" and "b", and store the results in "dst". // //go:linkname MmUnpackloEpi16 MmUnpackloEpi16 //go:noescape func MmUnpackloEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Unpack and interleave 32-bit integers from the low half of "a" and "b", and store the results in "dst". // //go:linkname MmUnpackloEpi32 MmUnpackloEpi32 //go:noescape func MmUnpackloEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Unpack and interleave 64-bit integers from the low half of "a" and "b", and store the results in "dst". // //go:linkname MmUnpackloEpi64 MmUnpackloEpi64 //go:noescape func MmUnpackloEpi64(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I) // Copy the lower 64-bit integer in "a" to "dst". // //go:linkname MmMovepi64Pi64 MmMovepi64Pi64 //go:noescape func MmMovepi64Pi64(r *x86.M64, v0 *x86.M128I) // Copy the 64-bit integer "a" to the lower element of "dst", and zero the upper element. // //go:linkname MmMovpi64Epi64 MmMovpi64Epi64 //go:noescape func MmMovpi64Epi64(r *x86.M128I, v0 *x86.M64) // Copy the lower 64-bit integer in "a" to the lower element of "dst", and zero the upper element. // //go:linkname MmMoveEpi64 MmMoveEpi64 //go:noescape func MmMoveEpi64(r *x86.M128I, v0 *x86.M128I) // Unpack and interleave double-precision (64-bit) floating-point elements from the high half of "a" and "b", and store the results in "dst". // //go:linkname MmUnpackhiPd MmUnpackhiPd //go:noescape func MmUnpackhiPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Unpack and interleave double-precision (64-bit) floating-point elements from the low half of "a" and "b", and store the results in "dst". 
// //go:linkname MmUnpackloPd MmUnpackloPd //go:noescape func MmUnpackloPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Set each bit of mask "dst" based on the most significant bit of the corresponding packed double-precision (64-bit) floating-point element in "a". // //go:linkname MmMovemaskPd MmMovemaskPd //go:noescape func MmMovemaskPd(r *x86.Int, v0 *x86.M128D) // Cast vector of type __m128d to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. // //go:linkname MmCastpdPs MmCastpdPs //go:noescape func MmCastpdPs(r *x86.M128, v0 *x86.M128D) // Cast vector of type __m128d to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. // //go:linkname MmCastpdSi128 MmCastpdSi128 //go:noescape func MmCastpdSi128(r *x86.M128I, v0 *x86.M128D) // Cast vector of type __m128 to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. // //go:linkname MmCastpsPd MmCastpsPd //go:noescape func MmCastpsPd(r *x86.M128D, v0 *x86.M128) // Cast vector of type __m128 to type __m128i. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. // //go:linkname MmCastpsSi128 MmCastpsSi128 //go:noescape func MmCastpsSi128(r *x86.M128I, v0 *x86.M128) // Cast vector of type __m128i to type __m128. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. // //go:linkname MmCastsi128Ps MmCastsi128Ps //go:noescape func MmCastsi128Ps(r *x86.M128, v0 *x86.M128I) // Cast vector of type __m128i to type __m128d. This intrinsic is only used for compilation and does not generate any instructions, thus it has zero latency. 
// //go:linkname MmCastsi128Pd MmCastsi128Pd //go:noescape func MmCastsi128Pd(r *x86.M128D, v0 *x86.M128I) ================================================ FILE: x86/sse3/functions.c ================================================ #include void MmAddsubPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_addsub_ps(*v0, *v1); } void MmHaddPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_hadd_ps(*v0, *v1); } void MmHsubPs(__m128* r, __m128* v0, __m128* v1) { *r = _mm_hsub_ps(*v0, *v1); } void MmMovehdupPs(__m128* r, __m128* v0) { *r = _mm_movehdup_ps(*v0); } void MmMoveldupPs(__m128* r, __m128* v0) { *r = _mm_moveldup_ps(*v0); } void MmAddsubPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_addsub_pd(*v0, *v1); } void MmHaddPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_hadd_pd(*v0, *v1); } void MmHsubPd(__m128d* r, __m128d* v0, __m128d* v1) { *r = _mm_hsub_pd(*v0, *v1); } void MmMovedupPd(__m128d* r, __m128d* v0) { *r = _mm_movedup_pd(*v0); } void MmMwait(unsigned* v0, unsigned* v1) { _mm_mwait(*v0, *v1); } ================================================ FILE: x86/sse3/functions.go ================================================ package sse3 import ( "github.com/alivanz/go-simd/x86" ) /* #cgo CFLAGS: -msse3 #include */ import "C" // Alternatively add and subtract packed single-precision (32-bit) floating-point elements in "a" to/from packed elements in "b", and store the results in "dst". // //go:linkname MmAddsubPs MmAddsubPs //go:noescape func MmAddsubPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst". // //go:linkname MmHaddPs MmHaddPs //go:noescape func MmHaddPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point elements in "a" and "b", and pack the results in "dst".
// //go:linkname MmHsubPs MmHsubPs //go:noescape func MmHsubPs(r *x86.M128, v0 *x86.M128, v1 *x86.M128) // Duplicate odd-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst". // //go:linkname MmMovehdupPs MmMovehdupPs //go:noescape func MmMovehdupPs(r *x86.M128, v0 *x86.M128) // Duplicate even-indexed single-precision (32-bit) floating-point elements from "a", and store the results in "dst". // //go:linkname MmMoveldupPs MmMoveldupPs //go:noescape func MmMoveldupPs(r *x86.M128, v0 *x86.M128) // Alternatively add and subtract packed double-precision (64-bit) floating-point elements in "a" to/from packed elements in "b", and store the results in "dst". // //go:linkname MmAddsubPd MmAddsubPd //go:noescape func MmAddsubPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst". // //go:linkname MmHaddPd MmHaddPd //go:noescape func MmHaddPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements in "a" and "b", and pack the results in "dst". // //go:linkname MmHsubPd MmHsubPd //go:noescape func MmHsubPd(r *x86.M128D, v0 *x86.M128D, v1 *x86.M128D) // Duplicate the low double-precision (64-bit) floating-point element from "a", and store the results in "dst". // //go:linkname MmMovedupPd MmMovedupPd //go:noescape func MmMovedupPd(r *x86.M128D, v0 *x86.M128D) // Hint to the processor that it can enter an implementation-dependent-optimized state while waiting for an event or store operation to the address range specified by MONITOR. 
//
//go:linkname MmMwait MmMwait
//go:noescape
func MmMwait(v0 *x86.Unsigned, v1 *x86.Unsigned)

================================================
FILE: x86/ssse3/functions.c
================================================
#include <immintrin.h>

void MmAbsEpi8(__m128i* r, __m128i* v0) { *r = _mm_abs_epi8(*v0); }
void MmAbsEpi16(__m128i* r, __m128i* v0) { *r = _mm_abs_epi16(*v0); }
void MmAbsEpi32(__m128i* r, __m128i* v0) { *r = _mm_abs_epi32(*v0); }
void MmHaddEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_hadd_epi16(*v0, *v1); }
void MmHaddEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_hadd_epi32(*v0, *v1); }
void MmHaddsEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_hadds_epi16(*v0, *v1); }
void MmHsubEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_hsub_epi16(*v0, *v1); }
void MmHsubEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_hsub_epi32(*v0, *v1); }
void MmHsubsEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_hsubs_epi16(*v0, *v1); }
void MmMaddubsEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_maddubs_epi16(*v0, *v1); }
void MmMulhrsEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_mulhrs_epi16(*v0, *v1); }
void MmShuffleEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_shuffle_epi8(*v0, *v1); }
void MmSignEpi8(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sign_epi8(*v0, *v1); }
void MmSignEpi16(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sign_epi16(*v0, *v1); }
void MmSignEpi32(__m128i* r, __m128i* v0, __m128i* v1) { *r = _mm_sign_epi32(*v0, *v1); }

================================================
FILE: x86/ssse3/functions.go
================================================
package ssse3

import (
	"github.com/alivanz/go-simd/x86"
)

/*
#cgo CFLAGS: -mssse3
#include <immintrin.h>
*/
import "C"

// Compute the absolute value of packed signed 8-bit integers in "a", and store the unsigned results in "dst".
//
//go:linkname MmAbsEpi8 MmAbsEpi8
//go:noescape
func MmAbsEpi8(r *x86.M128I, v0 *x86.M128I)

// Compute the absolute value of packed signed 16-bit integers in "a", and store the unsigned results in "dst".
//
//go:linkname MmAbsEpi16 MmAbsEpi16
//go:noescape
func MmAbsEpi16(r *x86.M128I, v0 *x86.M128I)

// Compute the absolute value of packed signed 32-bit integers in "a", and store the unsigned results in "dst".
//
//go:linkname MmAbsEpi32 MmAbsEpi32
//go:noescape
func MmAbsEpi32(r *x86.M128I, v0 *x86.M128I)

// Horizontally add adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst".
//
//go:linkname MmHaddEpi16 MmHaddEpi16
//go:noescape
func MmHaddEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Horizontally add adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst".
//
//go:linkname MmHaddEpi32 MmHaddEpi32
//go:noescape
func MmHaddEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Horizontally add adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst".
//
//go:linkname MmHaddsEpi16 MmHaddsEpi16
//go:noescape
func MmHaddsEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Horizontally subtract adjacent pairs of 16-bit integers in "a" and "b", and pack the signed 16-bit results in "dst".
//
//go:linkname MmHsubEpi16 MmHsubEpi16
//go:noescape
func MmHsubEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Horizontally subtract adjacent pairs of 32-bit integers in "a" and "b", and pack the signed 32-bit results in "dst".
//
//go:linkname MmHsubEpi32 MmHsubEpi32
//go:noescape
func MmHsubEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Horizontally subtract adjacent pairs of signed 16-bit integers in "a" and "b" using saturation, and pack the signed 16-bit results in "dst".
//
//go:linkname MmHsubsEpi16 MmHsubsEpi16
//go:noescape
func MmHsubsEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Vertically multiply each unsigned 8-bit integer from "a" with the corresponding signed 8-bit integer from "b", producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results in "dst".
//
//go:linkname MmMaddubsEpi16 MmMaddubsEpi16
//go:noescape
func MmMaddubsEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Multiply packed signed 16-bit integers in "a" and "b", producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and store bits [16:1] to "dst".
//
//go:linkname MmMulhrsEpi16 MmMulhrsEpi16
//go:noescape
func MmMulhrsEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Shuffle packed 8-bit integers in "a" according to shuffle control mask in the corresponding 8-bit element of "b", and store the results in "dst".
//
//go:linkname MmShuffleEpi8 MmShuffleEpi8
//go:noescape
func MmShuffleEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Negate packed 8-bit integers in "a" when the corresponding signed 8-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
//
//go:linkname MmSignEpi8 MmSignEpi8
//go:noescape
func MmSignEpi8(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Negate packed 16-bit integers in "a" when the corresponding signed 16-bit integer in "b" is negative, and store the results in "dst". Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
//
//go:linkname MmSignEpi16 MmSignEpi16
//go:noescape
func MmSignEpi16(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

// Negate packed 32-bit integers in "a" when the corresponding signed 32-bit integer in "b" is negative, and store the results in "dst".
// Elements in "dst" are zeroed out when the corresponding element in "b" is zero.
//
//go:linkname MmSignEpi32 MmSignEpi32
//go:noescape
func MmSignEpi32(r *x86.M128I, v0 *x86.M128I, v1 *x86.M128I)

================================================
FILE: x86/types.go
================================================
package x86

/*
#include <immintrin.h>
*/
import "C"

// typedef longlong __m64 __attribute__((__vector_size__(8), __aligned__(8)));
type M64 = C.__m64

// typedef float __m128 __attribute__((__vector_size__(16), __aligned__(16)));
type M128 = C.__m128

// typedef double __m128d __attribute__((__vector_size__(16), __aligned__(16)));
type M128D = C.__m128d

// typedef longlong __m128i __attribute__((__vector_size__(16), __aligned__(16)));
type M128I = C.__m128i

// typedef double __m256d __attribute__((__vector_size__(32), __aligned__(32)));
type M256D = C.__m256d

// typedef longlong __m256i __attribute__((__vector_size__(32), __aligned__(32)));
type M256I = C.__m256i

// uint
type Uint = C.uint

// uchar __D
type Uchar = C.uchar

// ushort __D
type Ushort = C.ushort

// ulonglong
type Ulonglong = C.ulonglong

// int __i
type Int = C.int

// longlong __i
type Longlong = C.longlong

// short __s3
type Short = C.short

// char __b7
type Char = C.char

// float
type Float = C.float

// double
type Double = C.double

// unsigned __extensions
type Unsigned = C.unsigned

// __m256
type M256 = C.__m256